 4f0b3e0b95
			
		
	
	
		4f0b3e0b95
		
	
	
	
	
		
			
			Convert docs/devel/multiple-iothreads.txt to rST format. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20240816132212.3602106-5-peter.maydell@linaro.org
		
			
				
	
	
		
			140 lines
		
	
	
		
			6.4 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			140 lines
		
	
	
		
			6.4 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| Using Multiple ``IOThread``\ s
 | |
| ==============================
 | |
| 
 | |
| ..
 | |
|    Copyright (c) 2014-2017 Red Hat Inc.
 | |
| 
 | |
|    This work is licensed under the terms of the GNU GPL, version 2 or later.  See
 | |
|    the COPYING file in the top-level directory.
 | |
| 
 | |
| 
 | |
| This document explains the ``IOThread`` feature and how to write code that runs
 | |
| outside the BQL.
 | |
| 
 | |
| The main loop and ``IOThread``\ s
 | |
| ---------------------------------
 | |
| QEMU is an event-driven program that can do several things at once using an
 | |
| event loop.  The VNC server and the QMP monitor are both processed from the
 | |
| same event loop, which monitors their file descriptors until they become
 | |
| readable and then invokes a callback.
 | |
| 
 | |
| The default event loop is called the main loop (see ``main-loop.c``).  It is
 | |
| possible to create additional event loop threads using
 | |
| ``-object iothread,id=my-iothread``.
 | |
| 
 | |
| Side note: The main loop and ``IOThread`` are both event loops but their code is
 | |
| not shared completely.  Sometimes it is useful to remember that although they
 | |
| are conceptually similar they are currently not interchangeable.
 | |
| 
 | |
| Why ``IOThread``\ s are useful
 | |
| ------------------------------
 | |
| ``IOThread``\ s allow the user to control the placement of work.  The main loop is a
 | |
| scalability bottleneck on hosts with many CPUs.  Work can be spread across
 | |
| several ``IOThread``\ s instead of just one main loop.  When set up correctly this
 | |
| can improve I/O latency and reduce jitter seen by the guest.
 | |
| 
 | |
| The main loop is also deeply associated with the BQL, which is a
 | |
| scalability bottleneck in itself.  vCPU threads and the main loop use the BQL
 | |
| to serialize execution of QEMU code.  This mutex is necessary because a lot of
 | |
| QEMU's code historically was not thread-safe.
 | |
| 
 | |
| The fact that all I/O processing is done in a single main loop and that the
 | |
| BQL is contended by all vCPU threads and the main loop explain
 | |
| why it is desirable to place work into ``IOThread``\ s.
 | |
| 
 | |
| The experimental ``virtio-blk`` data-plane implementation has been benchmarked and
 | |
| shows these effects:
 | |
| ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
 | |
| 
 | |
| .. _how-to-program:
 | |
| 
 | |
| How to program for ``IOThread``\ s
 | |
| ----------------------------------
 | |
| The main difference between legacy code and new code that can run in an
 | |
| ``IOThread`` is dealing explicitly with the event loop object, ``AioContext``
 | |
| (see ``include/block/aio.h``).  Code that only works in the main loop
 | |
| implicitly uses the main loop's ``AioContext``.  Code that supports running
 | |
| in ``IOThread``\ s must be aware of its ``AioContext``.
 | |
| 
 | |
| AioContext supports the following services:
 | |
|  * File descriptor monitoring (read/write/error on POSIX hosts)
 | |
|  * Event notifiers (inter-thread signalling)
 | |
|  * Timers
 | |
|  * Bottom Halves (BH) deferred callbacks
 | |
| 
 | |
| There are several old APIs that use the main loop AioContext:
 | |
|  * LEGACY ``qemu_aio_set_fd_handler()`` - monitor a file descriptor
 | |
|  * LEGACY ``qemu_aio_set_event_notifier()`` - monitor an event notifier
 | |
|  * LEGACY ``timer_new_ms()`` - create a timer
 | |
|  * LEGACY ``qemu_bh_new()`` - create a BH
 | |
|  * LEGACY ``qemu_bh_new_guarded()`` - create a BH with a device re-entrancy guard
 | |
|  * LEGACY ``qemu_aio_wait()`` - run an event loop iteration
 | |
| 
 | |
| Since they implicitly work on the main loop they cannot be used in code that
 | |
| runs in an ``IOThread``.  They might cause a crash or deadlock if called from an
 | |
| ``IOThread`` since the BQL is not held.
 | |
| 
 | |
| Instead, use the ``AioContext`` functions directly (see ``include/block/aio.h``):
 | |
|  * ``aio_set_fd_handler()`` - monitor a file descriptor
 | |
|  * ``aio_set_event_notifier()`` - monitor an event notifier
 | |
|  * ``aio_timer_new()`` - create a timer
 | |
|  * ``aio_bh_new()`` - create a BH
 | |
|  * ``aio_bh_new_guarded()`` - create a BH with a device re-entrancy guard
 | |
|  * ``aio_poll()`` - run an event loop iteration
 | |
| 
 | |
| The ``qemu_bh_new_guarded``/``aio_bh_new_guarded`` APIs accept a
 | |
| ``MemReentrancyGuard``
 | |
| argument, which is used to check for and prevent re-entrancy problems. For
 | |
| BHs associated with devices, the reentrancy-guard is contained in the
 | |
| corresponding ``DeviceState`` and named ``mem_reentrancy_guard``.
 | |
| 
 | |
| The ``AioContext`` can be obtained from the ``IOThread`` using
 | |
| ``iothread_get_aio_context()`` or for the main loop using
 | |
| ``qemu_get_aio_context()``. Code that takes an ``AioContext`` argument
 | |
| works both in ``IOThread``\ s or the main loop, depending on which ``AioContext``
 | |
| instance the caller passes in.
 | |
| 
 | |
| How to synchronize with an ``IOThread``
 | |
| ---------------------------------------
 | |
| Variables that can be accessed by multiple threads require some form of
 | |
| synchronization such as ``qemu_mutex_lock()``, ``rcu_read_lock()``, etc.
 | |
| 
 | |
| ``AioContext`` functions like ``aio_set_fd_handler()``,
 | |
| ``aio_set_event_notifier()``, ``aio_bh_new()``, and ``aio_timer_new()``
 | |
| are thread-safe. They can be used to trigger activity in an ``IOThread``.
 | |
| 
 | |
| Side note: the best way to schedule a function call across threads is to call
 | |
| ``aio_bh_schedule_oneshot()``.
 | |
| 
 | |
| The main loop thread can wait synchronously for a condition using
 | |
| ``AIO_WAIT_WHILE()``.
 | |
| 
 | |
| ``AioContext`` and the block layer
 | |
| ----------------------------------
 | |
| The ``AioContext`` originates from the QEMU block layer, even though nowadays
 | |
| ``AioContext`` is a generic event loop that can be used by any QEMU subsystem.
 | |
| 
 | |
| The block layer has support for ``AioContext`` integrated.  Each
 | |
| ``BlockDriverState`` is associated with an ``AioContext`` using
 | |
| ``bdrv_try_change_aio_context()`` and ``bdrv_get_aio_context()``.
 | |
| This allows block layer code to process I/O inside the
 | |
| right ``AioContext``.  Other subsystems may wish to follow a similar approach.
 | |
| 
 | |
| Block layer code must therefore expect to run in an ``IOThread`` and avoid using
 | |
| old APIs that implicitly use the main loop.  See
 | |
| `How to program for IOThreads`_ for information on how to do that.
 | |
| 
 | |
| Code running in the monitor typically needs to ensure that past
 | |
| requests from the guest are completed.  When a block device is running
 | |
| in an ``IOThread``, the ``IOThread`` can also process requests from the guest
 | |
| (via ioeventfd).  To achieve both objects, wrap the code between
 | |
| ``bdrv_drained_begin()`` and ``bdrv_drained_end()``, thus creating a "drained
 | |
| section".
 | |
| 
 | |
| Long-running jobs (usually in the form of coroutines) are often scheduled in
 | |
| the ``BlockDriverState``'s ``AioContext``.  The functions
 | |
| ``bdrv_add``/``remove_aio_context_notifier``, or alternatively
 | |
| ``blk_add``/``remove_aio_context_notifier`` if you use ``BlockBackends``,
 | |
| can be used to get a notification whenever ``bdrv_try_change_aio_context()``
 | |
| moves a ``BlockDriverState`` to a different ``AioContext``.
 |