FRET-qemu/docs/devel/vfio-iommufd.rst
Romain Malmain 7c3c7877d8 Update to QEMU 9.0.0 (#67)
* Update to QEMU v9.0.0

---------

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Ido Plat <ido.plat@ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Lorenz Brun <lorenz@brun.one>
Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Signed-off-by: Arnaud Minier <arnaud.minier@telecom-paris.fr>
Signed-off-by: Inès Varhol <ines.varhol@telecom-paris.fr>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Joonas Kankaala <joonas.a.kankaala@gmail.com>
Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Glenn Miles <milesg@linux.ibm.com>
Signed-off-by: Oleg Sviridov <oleg.sviridov@red-soft.ru>
Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Signed-off-by: Yajun Wu <yajunw@nvidia.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Pierre-Clément Tosi <ptosi@google.com>
Signed-off-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Martin Hundebøll <martin@geanix.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Wafer <wafer@jaguarmicro.com>
Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Signed-off-by: Zack Buhman <zack@buhman.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Yuquan Wang wangyuquan1236@phytium.com.cn
Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
Signed-off-by: Cindy Lu <lulu@redhat.com>
Co-authored-by: Peter Maydell <peter.maydell@linaro.org>
Co-authored-by: Fabiano Rosas <farosas@suse.de>
Co-authored-by: Peter Xu <peterx@redhat.com>
Co-authored-by: Thomas Huth <thuth@redhat.com>
Co-authored-by: Cédric Le Goater <clg@redhat.com>
Co-authored-by: Zheyu Ma <zheyuma97@gmail.com>
Co-authored-by: Ido Plat <ido.plat@ibm.com>
Co-authored-by: Ilya Leoshkevich <iii@linux.ibm.com>
Co-authored-by: Markus Armbruster <armbru@redhat.com>
Co-authored-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Co-authored-by: Paolo Bonzini <pbonzini@redhat.com>
Co-authored-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Co-authored-by: David Hildenbrand <david@redhat.com>
Co-authored-by: Kevin Wolf <kwolf@redhat.com>
Co-authored-by: Stefan Reiter <s.reiter@proxmox.com>
Co-authored-by: Fiona Ebner <f.ebner@proxmox.com>
Co-authored-by: Gregory Price <gregory.price@memverge.com>
Co-authored-by: Lorenz Brun <lorenz@brun.one>
Co-authored-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Co-authored-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Co-authored-by: Arnaud Minier <arnaud.minier@telecom-paris.fr>
Co-authored-by: BALATON Zoltan <balaton@eik.bme.hu>
Co-authored-by: Igor Mammedov <imammedo@redhat.com>
Co-authored-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Co-authored-by: Sven Schnelle <svens@stackframe.org>
Co-authored-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Co-authored-by: Helge Deller <deller@kernel.org>
Co-authored-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Co-authored-by: Benjamin Gray <bgray@linux.ibm.com>
Co-authored-by: Nicholas Piggin <npiggin@gmail.com>
Co-authored-by: Avihai Horon <avihaih@nvidia.com>
Co-authored-by: Michael Tokarev <mjt@tls.msk.ru>
Co-authored-by: Joonas Kankaala <joonas.a.kankaala@gmail.com>
Co-authored-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Co-authored-by: Stefan Weil <sw@weilnetz.de>
Co-authored-by: Dayu Liu <liu.dayu@zte.com.cn>
Co-authored-by: Zhao Liu <zhao1.liu@intel.com>
Co-authored-by: Glenn Miles <milesg@linux.vnet.ibm.com>
Co-authored-by: Artem Chernyshev <artem.chernyshev@red-soft.ru>
Co-authored-by: Yajun Wu <yajunw@nvidia.com>
Co-authored-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Co-authored-by: Pierre-Clément Tosi <ptosi@google.com>
Co-authored-by: Wei Wang <wei.w.wang@intel.com>
Co-authored-by: Martin Hundebøll <martin@geanix.com>
Co-authored-by: Michael S. Tsirkin <mst@redhat.com>
Co-authored-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Co-authored-by: Wafer <wafer@jaguarmicro.com>
Co-authored-by: lyx634449800 <yuxue.liu@jaguarmicro.com>
Co-authored-by: Gerd Hoffmann <kraxel@redhat.com>
Co-authored-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Co-authored-by: Zack Buhman <zack@buhman.org>
Co-authored-by: Keith Packard <keithp@keithp.com>
Co-authored-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
Co-authored-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
Co-authored-by: Cindy Lu <lulu@redhat.com>
2024-05-01 16:10:20 +02:00

167 lines
6.2 KiB
ReStructuredText

===============================
IOMMUFD BACKEND usage with VFIO
===============================
(Same meaning for backend/container/BE)
With the introduction of iommufd, the Linux kernel provides a generic
interface for user space drivers to propagate their DMA mappings to kernel
for assigned devices. While the legacy kernel interface is group-centric,
the new iommufd interface is device-centric, relying on device fd and iommufd.
To support both interfaces in the QEMU VFIO device, introduce a base container
to abstract the common part of VFIO legacy and iommufd container. So that the
generic VFIO code can use either container.
The base container implements generic functions such as memory_listener and
address space management whereas the derived container implements callbacks
specific to either legacy or iommufd. Each container has its own way to setup
secure context and dma management interface. The below diagram shows how it
looks like with both containers.
::
VFIO AddressSpace/Memory
+-------+ +----------+ +-----+ +-----+
| pci | | platform | | ap | | ccw |
+---+---+ +----+-----+ +--+--+ +--+--+ +----------------------+
| | | | | AddressSpace |
| | | | +------------+---------+
+---V-----------V-----------V--------V----+ /
| VFIOAddressSpace | <------------+
| | | MemoryListener
| VFIOContainerBase list |
+-------+----------------------------+----+
| |
| |
+-------V------+ +--------V----------+
| iommufd | | vfio legacy |
| container | | container |
+-------+------+ +--------+----------+
| |
| /dev/iommu | /dev/vfio/vfio
| /dev/vfio/devices/vfioX | /dev/vfio/$group_id
Userspace | |
============+============================+===========================
Kernel | device fd |
+---------------+ | group/container fd
| (BIND_IOMMUFD | | (SET_CONTAINER/SET_IOMMU)
| ATTACH_IOAS) | | device fd
| | |
| +-------V------------V-----------------+
iommufd | | vfio |
(map/unmap | +---------+--------------------+-------+
ioas_copy) | | | map/unmap
| | |
+------V------+ +-----V------+ +------V--------+
| iommfd core | | device | | vfio iommu |
+-------------+ +------------+ +---------------+
* Secure Context setup
- iommufd BE: uses device fd and iommufd to setup secure context
(bind_iommufd, attach_ioas)
- vfio legacy BE: uses group fd and container fd to setup secure context
(set_container, set_iommu)
* Device access
- iommufd BE: device fd is opened through ``/dev/vfio/devices/vfioX``
- vfio legacy BE: device fd is retrieved from group fd ioctl
* DMA Mapping flow
1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
2. VFIO populates DMA map/unmap via the container BEs
* iommufd BE: uses iommufd
* vfio legacy BE: uses container fd
Example configuration
=====================
Step 1: configure the host device
---------------------------------
It's exactly same as the VFIO device with legacy VFIO container.
Step 2: configure QEMU
----------------------
Interactions with the ``/dev/iommu`` are abstracted by a new iommufd
object (compiled in with the ``CONFIG_IOMMUFD`` option).
Any QEMU device (e.g. VFIO device) wishing to use ``/dev/iommu`` must
be linked with an iommufd object. It gets a new optional property
named iommufd which allows to pass an iommufd object. Take ``vfio-pci``
device for example:
.. code-block:: bash
-object iommufd,id=iommufd0
-device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
Note the ``/dev/iommu`` and VFIO cdev can be externally opened by a
management layer. In such a case the fd is passed, the fd supports a
string naming the fd or a number, for example:
.. code-block:: bash
-object iommufd,id=iommufd0,fd=22
-device vfio-pci,iommufd=iommufd0,fd=23
If the ``fd`` property is not passed, the fd is opened by QEMU.
If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd
is not used and the user gets the behavior based on the legacy VFIO
container:
.. code-block:: bash
-device vfio-pci,host=0000:02:00.0
Supported platform
==================
Supports x86, ARM and s390x currently.
Caveats
=======
Dirty page sync
---------------
Dirty page sync with iommufd backend is unsupported yet, live migration is
disabled by default. But it can be force enabled like below, low efficient
though.
.. code-block:: bash
-object iommufd,id=iommufd0
-device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on
P2P DMA
-------
PCI p2p DMA is unsupported as IOMMUFD doesn't support mapping hardware PCI
BAR region yet. Below warning shows for assigned PCI device, it's not a bug.
.. code-block:: none
qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)
FD passing with mdev
--------------------
``vfio-pci`` device checks sysfsdev property to decide if backend is a mdev.
If FD passing is used, there is no way to know that and the mdev is treated
like a real PCI device. There is an error as below if user wants to enable
RAM discarding for mdev.
.. code-block:: none
qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices
``vfio-ap`` and ``vfio-ccw`` devices don't have same issue as their backend
devices are always mdev and RAM discarding is force enabled.