
Copyright (c) 2014-2017 Red Hat Inc.

This work is licensed under the terms of the GNU GPL, version 2 or later. See
the COPYING file in the top-level directory.

This document explains the IOThread feature and how to write code that runs
outside the BQL.

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop. The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.

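The "monitor file descriptors, then invoke a callback" pattern described above
can be sketched in a few lines of POSIX C. This is only an illustration and not
QEMU's main loop code: FdHandler, loop_iterate() and read_one_byte() are
hypothetical names, and the fixed-size handler table is an assumption made to
keep the example short.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical names for illustration; this is not QEMU's main loop. */
#define MAX_HANDLERS 16

typedef void FdCallback(int fd, void *opaque);

typedef struct {
    int fd;              /* file descriptor to watch */
    FdCallback *on_read; /* invoked once fd becomes readable */
    void *opaque;        /* user data handed back to the callback */
} FdHandler;

/*
 * Run one event loop iteration: wait up to timeout_ms for any watched fd to
 * become readable, then invoke the callback of each readable fd.
 * Returns the number of callbacks invoked, or -1 if poll() failed.
 */
static int loop_iterate(FdHandler *handlers, int n, int timeout_ms)
{
    struct pollfd pfds[MAX_HANDLERS];
    int dispatched = 0;

    assert(n >= 0 && n <= MAX_HANDLERS);
    for (int i = 0; i < n; i++) {
        pfds[i].fd = handlers[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            handlers[i].on_read(handlers[i].fd, handlers[i].opaque);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example callback: consume one byte and store it in *opaque. */
static void read_one_byte(int fd, void *opaque)
{
    char *out = opaque;

    if (read(fd, out, 1) != 1) {
        *out = '\0';
    }
}
```

A caller would register one FdHandler per connection and call loop_iterate()
repeatedly; several independent services can then share a single thread.
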
The default event loop is called the main loop (see main-loop.c). It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: The main loop and IOThread are both event loops but their code is
not shared completely. Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work. The main loop is a
scalability bottleneck on hosts with many CPUs. Work can be spread across
several IOThreads instead of just one main loop. When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the BQL, which is a
scalability bottleneck in itself. vCPU threads and the main loop use the BQL
to serialize execution of QEMU code. This mutex is necessary because a lot of
QEMU's code historically was not thread-safe.

The fact that all I/O processing is done in a single main loop and that the
BQL is contended by all vCPU threads and the main loop explains
why it is desirable to place work into IOThreads.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h). Code that only works in the main loop
implicitly uses the main loop's AioContext. Code that supports running
in IOThreads must be aware of its AioContext.

AioContext supports the following services:
* File descriptor monitoring (read/write/error on POSIX hosts)
* Event notifiers (inter-thread signalling)
* Timers
* Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop AioContext:
* LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
* LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
* LEGACY timer_new_ms() - create a timer
* LEGACY qemu_bh_new() - create a BH
* LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
* LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop they cannot be used in code that
runs in an IOThread. They might cause a crash or deadlock if called from an
IOThread since the BQL is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
* aio_set_fd_handler() - monitor a file descriptor
* aio_set_event_notifier() - monitor an event notifier
* aio_timer_new() - create a timer
* aio_bh_new() - create a BH
* aio_bh_new_guarded() - create a BH with a device re-entrancy guard
* aio_poll() - run an event loop iteration

The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
argument, which is used to check for and prevent re-entrancy problems. For
BHs associated with devices, the re-entrancy guard is contained in the
corresponding DeviceState and named "mem_reentrancy_guard".

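To make the idea of a re-entrancy guard concrete, here is a toy model of one
way such a guard can work: a per-device flag is set while a guarded BH runs,
and device register accesses are refused while the flag is set. All names
(ToyGuard, ToyDevice, toy_mmio_write(), toy_bh_call_guarded()) are hypothetical
and the logic is an assumption for illustration; QEMU's actual implementation
may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a per-device guard; not QEMU's MemReentrancyGuard. */
typedef struct {
    bool engaged_in_io; /* true while the device is handling I/O */
} ToyGuard;

typedef struct {
    ToyGuard guard;
    int mmio_writes;    /* number of register writes that were accepted */
    bool last_write_ok; /* result of the write attempted from the BH */
} ToyDevice;

/* Device register write: refused while the device is already doing I/O. */
static bool toy_mmio_write(ToyDevice *dev)
{
    if (dev->guard.engaged_in_io) {
        return false; /* re-entrant access detected and blocked */
    }
    dev->guard.engaged_in_io = true;
    dev->mmio_writes++;
    dev->guard.engaged_in_io = false;
    return true;
}

/* Run a BH callback with the device's guard engaged for its duration. */
static void toy_bh_call_guarded(ToyGuard *guard, void (*cb)(void *),
                                void *opaque)
{
    assert(guard != NULL && cb != NULL);
    bool prev = guard->engaged_in_io;

    guard->engaged_in_io = true;
    cb(opaque);
    guard->engaged_in_io = prev;
}

/* A misbehaving BH that tries to write to its own device's registers. */
static void toy_bh_reenter(void *opaque)
{
    ToyDevice *dev = opaque;

    dev->last_write_ok = toy_mmio_write(dev);
}
```

In this model a write issued from inside toy_bh_reenter() is rejected, while
the same write succeeds before and after the BH has run.
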
The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
Code that takes an AioContext argument works both in IOThreads and in the main
loop, depending on which AioContext instance the caller passes in.

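The difference between the legacy style (implicit main loop) and the explicit
style (context passed as an argument) can be shown with a small stand-alone
model. ToyContext and the toy_*() functions are hypothetical and only mimic
the shape of the real APIs; nothing here calls into QEMU.

```c
#include <assert.h>
#include <stddef.h>

/* Toy event loop context; it only counts the handlers registered with it. */
typedef struct {
    const char *name;
    int nr_handlers;
} ToyContext;

static ToyContext toy_main_context = { "main-loop", 0 };

/* Counterpart of "get the main loop's context" in this model. */
static ToyContext *toy_get_main_context(void)
{
    return &toy_main_context;
}

/* Explicit style: the caller decides which event loop gets the handler. */
static void toy_ctx_add_handler(ToyContext *ctx)
{
    assert(ctx != NULL);
    ctx->nr_handlers++;
}

/* Legacy style: a wrapper that is hard-wired to the main loop. */
static void toy_legacy_add_handler(void)
{
    toy_ctx_add_handler(toy_get_main_context());
}

/* Device code written against the explicit API works with either loop. */
static void toy_device_start(ToyContext *ctx)
{
    toy_ctx_add_handler(ctx);
}
```

Calling toy_device_start() with an IOThread's context or with the main
context runs the same code; only the argument changes. The legacy wrapper
offers no such choice.
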
How to synchronize with an IOThread
-----------------------------------
Variables that can be accessed by multiple threads require some form of
synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.

AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
activity in an IOThread.

Side note: the best way to schedule a function call across threads is to call
aio_bh_schedule_oneshot().

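The idea behind scheduling a one-shot function call in another thread's event
loop can be modelled with a mutex-protected queue: any thread may enqueue a
callback, and only the thread that owns the loop runs it. This is a simplified
sketch with hypothetical names (ToyContext, toy_bh_schedule_oneshot(),
toy_ctx_run_pending()); it is not how QEMU implements BHs, and it omits the
wake-up that a real event loop needs when it is blocked waiting for events.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef void ToyBHFunc(void *opaque);

typedef struct ToyBH {
    ToyBHFunc *cb;
    void *opaque;
    struct ToyBH *next;
} ToyBH;

typedef struct {
    pthread_mutex_t lock; /* protects head and tail */
    ToyBH *head;
    ToyBH *tail;
} ToyContext;

static void toy_ctx_init(ToyContext *ctx)
{
    pthread_mutex_init(&ctx->lock, NULL);
    ctx->head = ctx->tail = NULL;
}

/* May be called from any thread. Returns 0 on success, -1 if out of memory. */
static int toy_bh_schedule_oneshot(ToyContext *ctx, ToyBHFunc *cb, void *opaque)
{
    assert(cb != NULL);
    ToyBH *bh = malloc(sizeof(*bh));

    if (!bh) {
        return -1;
    }
    bh->cb = cb;
    bh->opaque = opaque;
    bh->next = NULL;

    pthread_mutex_lock(&ctx->lock);
    if (ctx->tail) {
        ctx->tail->next = bh;
    } else {
        ctx->head = bh;
    }
    ctx->tail = bh;
    pthread_mutex_unlock(&ctx->lock);
    return 0;
}

/* Called only by the thread that owns ctx. Runs and frees each pending BH. */
static int toy_ctx_run_pending(ToyContext *ctx)
{
    int n = 0;

    pthread_mutex_lock(&ctx->lock);
    ToyBH *bh = ctx->head;
    ctx->head = ctx->tail = NULL;
    pthread_mutex_unlock(&ctx->lock);

    while (bh) {
        ToyBH *next = bh->next;

        bh->cb(bh->opaque); /* lock not held, so cb may schedule more BHs */
        free(bh);
        bh = next;
        n++;
    }
    return n;
}

/* Example callback: increment the int that opaque points to. */
static void toy_add_one(void *opaque)
{
    ++*(int *)opaque;
}
```

Because the callback always runs in the owning thread, the data it touches
needs no further locking as long as only that thread accesses it.
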
The main loop thread can wait synchronously for a condition using
AIO_WAIT_WHILE().

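Waiting synchronously for a condition boils down to "check the condition, run
one event loop iteration, repeat". The following toy model shows only that
control flow. ToyContext, toy_poll() and toy_wait_while() are hypothetical
names, and the real AIO_WAIT_WHILE() macro handles cross-thread wake-ups and
other details that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int in_flight;  /* outstanding requests */
    int iterations; /* event loop iterations executed so far */
} ToyContext;

/* One event loop iteration: completes at most one outstanding request. */
static bool toy_poll(ToyContext *ctx)
{
    ctx->iterations++;
    if (ctx->in_flight > 0) {
        ctx->in_flight--;
        return true; /* progress was made */
    }
    return false;
}

typedef bool ToyCond(void *opaque);

/*
 * Run event loop iterations for as long as cond() holds. The condition is
 * re-evaluated before every iteration. Returns true if at least one
 * iteration was needed.
 */
static bool toy_wait_while(ToyContext *ctx, ToyCond *cond, void *opaque)
{
    bool waited = false;

    assert(ctx != NULL && cond != NULL);
    while (cond(opaque)) {
        toy_poll(ctx);
        waited = true;
    }
    return waited;
}

/* Example condition: requests are still outstanding. */
static bool toy_requests_pending(void *opaque)
{
    ToyContext *ctx = opaque;

    return ctx->in_flight > 0;
}
```

Note that the loop can only terminate if running the event loop eventually
makes the condition false; waiting on a condition that nothing will change
hangs the caller.
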
AioContext and the block layer
------------------------------
The AioContext originates from the QEMU block layer, even though nowadays
AioContext is a generic event loop that can be used by any QEMU subsystem.

The block layer has support for AioContext integrated. Each BlockDriverState
is associated with an AioContext using bdrv_try_change_aio_context() and
bdrv_get_aio_context(). This allows block layer code to process I/O inside the
right AioContext. Other subsystems may wish to follow a similar approach.

Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop. See the "How to program for
IOThreads" section above for information on how to do that.

Code running in the monitor typically needs to ensure that past
requests from the guest are completed. When a block device is running
in an IOThread, the IOThread can also process requests from the guest
(via ioeventfd). To achieve both objectives, wrap the code between
bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
section".

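The two guarantees of a drained section (past requests have completed, and no
new requests start until the section ends) can be illustrated with a toy
model. ToyNode and the toy_*() functions are hypothetical, and the model is
single-threaded: a real drain polls the event loop until in-flight requests
complete, whereas this sketch simply completes them in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int quiesce_counter; /* > 0 while inside a drained section */
    int in_flight;       /* requests currently being processed */
    int queued;          /* requests held back while drained */
} ToyNode;

/* A guest request arrives. It starts only if the node is not drained. */
static bool toy_submit(ToyNode *node)
{
    if (node->quiesce_counter > 0) {
        node->queued++;
        return false;
    }
    node->in_flight++;
    return true;
}

static void toy_complete_one(ToyNode *node)
{
    assert(node->in_flight > 0);
    node->in_flight--;
}

/* Enter a drained section: stop new requests, then wait for old ones. */
static void toy_drained_begin(ToyNode *node)
{
    node->quiesce_counter++;
    while (node->in_flight > 0) {
        toy_complete_one(node); /* stands in for polling the event loop */
    }
}

/* Leave a drained section. Sections nest; the outermost end resumes I/O. */
static void toy_drained_end(ToyNode *node)
{
    assert(node->quiesce_counter > 0);
    if (--node->quiesce_counter == 0) {
        node->in_flight += node->queued;
        node->queued = 0;
    }
}
```

Code placed between toy_drained_begin() and toy_drained_end() can rely on
in_flight being zero for the whole section.
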
Long-running jobs (usually in the form of coroutines) are often scheduled in
the BlockDriverState's AioContext. The functions
bdrv_add/remove_aio_context_notifier, or alternatively
blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to
get a notification whenever bdrv_try_change_aio_context() moves a
BlockDriverState to a different AioContext.

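The notifier mechanism can be pictured as a pair of callbacks: one invoked
before the node leaves its old context and one invoked after it has been
attached to the new one. The sketch below is a stand-alone model with
hypothetical names (ToyNode, toy_set_notifier(), toy_change_context()) and a
single notifier slot for brevity; it does not reproduce the real function
signatures.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
} ToyContext;

typedef void ToyAttached(ToyContext *new_ctx, void *opaque);
typedef void ToyDetach(void *opaque);

typedef struct {
    ToyContext *ctx;       /* context the node currently runs in */
    ToyAttached *attached; /* called after moving to a new context */
    ToyDetach *detach;     /* called before leaving the old context */
    void *opaque;
} ToyNode;

static void toy_set_notifier(ToyNode *node, ToyAttached *attached,
                             ToyDetach *detach, void *opaque)
{
    node->attached = attached;
    node->detach = detach;
    node->opaque = opaque;
}

/* Move the node to new_ctx and tell the registered notifier about it. */
static void toy_change_context(ToyNode *node, ToyContext *new_ctx)
{
    assert(new_ctx != NULL);
    if (node->ctx == new_ctx) {
        return; /* nothing to do */
    }
    if (node->detach) {
        node->detach(node->opaque);
    }
    node->ctx = new_ctx;
    if (node->attached) {
        node->attached(new_ctx, node->opaque);
    }
}

/* Example user: a job that must follow its node from context to context. */
typedef struct {
    ToyContext *ctx; /* NULL while the job is between contexts */
    int moves;
} ToyJob;

static void toy_job_detach(void *opaque)
{
    ToyJob *job = opaque;

    job->ctx = NULL; /* stop using the old context */
}

static void toy_job_attached(ToyContext *new_ctx, void *opaque)
{
    ToyJob *job = opaque;

    job->ctx = new_ctx; /* resume work in the new context */
    job->moves++;
}
```

A job registered this way never keeps scheduling work in a context that its
node has already left.
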