 f21e95ee97
			
		
	
	
		f21e95ee97
		
	
	
	
	
		
			
			The current model of memory mapping at the back-end works fine where a standard call to mmap() (for the respective file descriptor) is enough before the front-end can start accessing the guest memory. There are other complex cases though where the back-end needs more information and simple mmap() isn't enough. For example Xen, a type-1 hypervisor, currently supports memory mapping via two different methods, foreign-mapping (via /dev/privcmd) and grant-dev (via /dev/gntdev). In both these cases, the back-end needs to call mmap() and ioctl(), with extra information like the Xen domain-id of the guest whose memory we are trying to map. Add a new protocol feature, 'VHOST_USER_PROTOCOL_F_XEN_MMAP', which lets the back-end know about the additional memory mapping requirements. When this feature is negotiated, the front-end will send the additional information within the memory regions themselves. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Message-Id: <6d0bd7f0e1aeec3ddb603ae4ff334c75c7d0d7b3.1678351495.git.viresh.kumar@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
		
			
				
	
	
		
			1678 lines
		
	
	
		
			58 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			1678 lines
		
	
	
		
			58 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| .. _vhost_user_proto:
 | |
| 
 | |
| ===================
 | |
| Vhost-user Protocol
 | |
| ===================
 | |
| 
 | |
| ..
 | |
|   Copyright 2014 Virtual Open Systems Sarl.
 | |
|   Copyright 2019 Intel Corporation
 | |
|   Licence: This work is licensed under the terms of the GNU GPL,
 | |
|            version 2 or later. See the COPYING file in the top-level
 | |
|            directory.
 | |
| 
 | |
| .. contents:: Table of Contents
 | |
| 
 | |
| Introduction
 | |
| ============
 | |
| 
 | |
| This protocol is aiming to complement the ``ioctl`` interface used to
 | |
| control the vhost implementation in the Linux kernel. It implements
 | |
| the control plane needed to establish virtqueue sharing with a user
 | |
| space process on the same host. It uses communication over a Unix
 | |
| domain socket to share file descriptors in the ancillary data of the
 | |
| message.
 | |
| 
 | |
| The protocol defines 2 sides of the communication, *front-end* and
 | |
| *back-end*. The *front-end* is the application that shares its virtqueues, in
 | |
| our case QEMU. The *back-end* is the consumer of the virtqueues.
 | |
| 
 | |
| In the current implementation QEMU is the *front-end*, and the *back-end*
 | |
| is the external process consuming the virtio queues, for example a
 | |
| software Ethernet switch running in user space, such as Snabbswitch,
 | |
| or a block device back-end processing read & write to a virtual
 | |
| disk. In order to facilitate interoperability between various back-end
 | |
| implementations, it is recommended to follow the :ref:`Backend program
 | |
| conventions <backend_conventions>`.
 | |
| 
 | |
| The *front-end* and *back-end* can be either a client (i.e. connecting) or
 | |
| server (listening) in the socket communication.
 | |
| 
 | |
| Support for platforms other than Linux
 | |
| --------------------------------------
 | |
| 
 | |
| While vhost-user was initially developed targeting Linux, nowadays it
 | |
| is supported on any platform that provides the following features:
 | |
| 
 | |
| - A way for requesting shared memory represented by a file descriptor
 | |
|   so it can be passed over a UNIX domain socket and then mapped by the
 | |
|   other process.
 | |
| 
 | |
| - AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
 | |
|   exchange messages through it, including ancillary data when needed.
 | |
| 
 | |
| - Either eventfd or pipe/pipe2. On platforms where eventfd is not
 | |
|   available, QEMU will automatically fall back to pipe2 or, as a last
 | |
|   resort, pipe. Each file descriptor will be used for receiving or
 | |
|   sending events by reading or writing (respectively) an 8-byte value
 | |
|   to the corresponding it. The 8-value itself has no meaning and
 | |
|   should not be interpreted.
 | |
| 
 | |
| Message Specification
 | |
| =====================
 | |
| 
 | |
| .. Note:: All numbers are in the machine native byte order.
 | |
| 
 | |
| A vhost-user message consists of 3 header fields and a payload.
 | |
| 
 | |
| +---------+-------+------+---------+
 | |
| | request | flags | size | payload |
 | |
| +---------+-------+------+---------+
 | |
| 
 | |
| Header
 | |
| ------
 | |
| 
 | |
| :request: 32-bit type of the request
 | |
| 
 | |
| :flags: 32-bit bit field
 | |
| 
 | |
| - Lower 2 bits are the version (currently 0x01)
 | |
| - Bit 2 is the reply flag - needs to be sent on each reply from the back-end
 | |
| - Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for
 | |
|   details.
 | |
| 
 | |
| :size: 32-bit size of the payload
 | |
| 
 | |
| Payload
 | |
| -------
 | |
| 
 | |
| Depending on the request type, **payload** can be:
 | |
| 
 | |
| A single 64-bit integer
 | |
| ^^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +-----+
 | |
| | u64 |
 | |
| +-----+
 | |
| 
 | |
| :u64: a 64-bit unsigned integer
 | |
| 
 | |
| A vring state description
 | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +-------+-----+
 | |
| | index | num |
 | |
| +-------+-----+
 | |
| 
 | |
| :index: a 32-bit index
 | |
| 
 | |
| :num: a 32-bit number
 | |
| 
 | |
| A vring address description
 | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +-------+-------+------+------------+------+-----------+-----+
 | |
| | index | flags | size | descriptor | used | available | log |
 | |
| +-------+-------+------+------------+------+-----------+-----+
 | |
| 
 | |
| :index: a 32-bit vring index
 | |
| 
 | |
| :flags: a 32-bit vring flags
 | |
| 
 | |
| :descriptor: a 64-bit ring address of the vring descriptor table
 | |
| 
 | |
| :used: a 64-bit ring address of the vring used ring
 | |
| 
 | |
| :available: a 64-bit ring address of the vring available ring
 | |
| 
 | |
| :log: a 64-bit guest address for logging
 | |
| 
 | |
| Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
 | |
| been negotiated. Otherwise it is a user address.
 | |
| 
 | |
| Memory region description
 | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +---------------+------+--------------+-------------+
 | |
| | guest address | size | user address | mmap offset |
 | |
| +---------------+------+--------------+-------------+
 | |
| 
 | |
| :guest address: a 64-bit guest address of the region
 | |
| 
 | |
| :size: a 64-bit size
 | |
| 
 | |
| :user address: a 64-bit user address
 | |
| 
 | |
| :mmap offset: 64-bit offset where region starts in the mapped memory
 | |
| 
 | |
| When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been
 | |
| successfully negotiated, the memory region description contains two extra
 | |
| fields at the end.
 | |
| 
 | |
| +---------------+------+--------------+-------------+----------------+-------+
 | |
| | guest address | size | user address | mmap offset | xen mmap flags | domid |
 | |
| +---------------+------+--------------+-------------+----------------+-------+
 | |
| 
 | |
| :xen mmap flags: 32-bit bit field
 | |
| 
 | |
| - Bit 0 is set for Xen foreign memory mapping.
 | |
| - Bit 1 is set for Xen grant memory mapping.
 | |
| - Bit 8 is set if the memory region can not be mapped in advance, and memory
 | |
|   areas within this region must be mapped / unmapped only when required by the
 | |
|   back-end. The back-end shouldn't try to map the entire region at once, as the
 | |
|   front-end may not allow it. The back-end should rather map only the required
 | |
|   amount of memory at once and unmap it after it is used.
 | |
| 
 | |
| :domid: a 32-bit Xen hypervisor specific domain id.
 | |
| 
 | |
| Single memory region description
 | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +---------+--------+
 | |
| | padding | region |
 | |
| +---------+--------+
 | |
| 
 | |
| :padding: 64-bit
 | |
| 
 | |
| A region is represented by Memory region description.
 | |
| 
 | |
| Multiple Memory regions description
 | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +-------------+---------+---------+-----+---------+
 | |
| | num regions | padding | region0 | ... | region7 |
 | |
| +-------------+---------+---------+-----+---------+
 | |
| 
 | |
| :num regions: a 32-bit number of regions
 | |
| 
 | |
| :padding: 32-bit
 | |
| 
 | |
| A region is represented by Memory region description.
 | |
| 
 | |
| Log description
 | |
| ^^^^^^^^^^^^^^^
 | |
| 
 | |
| +----------+------------+
 | |
| | log size | log offset |
 | |
| +----------+------------+
 | |
| 
 | |
| :log size: size of area used for logging
 | |
| 
 | |
| :log offset: offset from start of supplied file descriptor where
 | |
|              logging starts (i.e. where guest address 0 would be
 | |
|              logged)
 | |
| 
 | |
| An IOTLB message
 | |
| ^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +------+------+--------------+-------------------+------+
 | |
| | iova | size | user address | permissions flags | type |
 | |
| +------+------+--------------+-------------------+------+
 | |
| 
 | |
| :iova: a 64-bit I/O virtual address programmed by the guest
 | |
| 
 | |
| :size: a 64-bit size
 | |
| 
 | |
| :user address: a 64-bit user address
 | |
| 
 | |
| :permissions flags: an 8-bit value:
 | |
|   - 0: No access
 | |
|   - 1: Read access
 | |
|   - 2: Write access
 | |
|   - 3: Read/Write access
 | |
| 
 | |
| :type: an 8-bit IOTLB message type:
 | |
|   - 1: IOTLB miss
 | |
|   - 2: IOTLB update
 | |
|   - 3: IOTLB invalidate
 | |
|   - 4: IOTLB access fail
 | |
| 
 | |
| Virtio device config space
 | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +--------+------+-------+---------+
 | |
| | offset | size | flags | payload |
 | |
| +--------+------+-------+---------+
 | |
| 
 | |
| :offset: a 32-bit offset of virtio device's configuration space
 | |
| 
 | |
| :size: a 32-bit configuration space access size in bytes
 | |
| 
 | |
| :flags: a 32-bit value:
 | |
|   - 0: Vhost front-end messages used for writable fields
 | |
|   - 1: Vhost front-end messages used for live migration
 | |
| 
 | |
| :payload: Size bytes array holding the contents of the virtio
 | |
|           device's configuration space
 | |
| 
 | |
| Vring area description
 | |
| ^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +-----+------+--------+
 | |
| | u64 | size | offset |
 | |
| +-----+------+--------+
 | |
| 
 | |
| :u64: a 64-bit integer contains vring index and flags
 | |
| 
 | |
| :size: a 64-bit size of this area
 | |
| 
 | |
| :offset: a 64-bit offset of this area from the start of the
 | |
|          supplied file descriptor
 | |
| 
 | |
| Inflight description
 | |
| ^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
| +-----------+-------------+------------+------------+
 | |
| | mmap size | mmap offset | num queues | queue size |
 | |
| +-----------+-------------+------------+------------+
 | |
| 
 | |
| :mmap size: a 64-bit size of area to track inflight I/O
 | |
| 
 | |
| :mmap offset: a 64-bit offset of this area from the start
 | |
|               of the supplied file descriptor
 | |
| 
 | |
| :num queues: a 16-bit number of virtqueues
 | |
| 
 | |
| :queue size: a 16-bit size of virtqueues
 | |
| 
 | |
| C structure
 | |
| -----------
 | |
| 
 | |
| In QEMU the vhost-user message is implemented with the following struct:
 | |
| 
 | |
| .. code:: c
 | |
| 
 | |
|   typedef struct VhostUserMsg {
 | |
|       VhostUserRequest request;
 | |
|       uint32_t flags;
 | |
|       uint32_t size;
 | |
|       union {
 | |
|           uint64_t u64;
 | |
|           struct vhost_vring_state state;
 | |
|           struct vhost_vring_addr addr;
 | |
|           VhostUserMemory memory;
 | |
|           VhostUserLog log;
 | |
|           struct vhost_iotlb_msg iotlb;
 | |
|           VhostUserConfig config;
 | |
|           VhostUserVringArea area;
 | |
|           VhostUserInflight inflight;
 | |
|       };
 | |
|   } QEMU_PACKED VhostUserMsg;
 | |
| 
 | |
| Communication
 | |
| =============
 | |
| 
 | |
| The protocol for vhost-user is based on the existing implementation of
 | |
| vhost for the Linux Kernel. Most messages that can be sent via the
 | |
| Unix domain socket implementing vhost-user have an equivalent ioctl to
 | |
| the kernel implementation.
 | |
| 
 | |
| The communication consists of the *front-end* sending message requests and
 | |
| the *back-end* sending message replies. Most of the requests don't require
 | |
| replies. Here is a list of the ones that do:
 | |
| 
 | |
| * ``VHOST_USER_GET_FEATURES``
 | |
| * ``VHOST_USER_GET_PROTOCOL_FEATURES``
 | |
| * ``VHOST_USER_GET_VRING_BASE``
 | |
| * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
 | |
| * ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
 | |
| 
 | |
| .. seealso::
 | |
| 
 | |
|    :ref:`REPLY_ACK <reply_ack>`
 | |
|        The section on ``REPLY_ACK`` protocol extension.
 | |
| 
 | |
| There are several messages that the front-end sends with file descriptors passed
 | |
| in the ancillary data:
 | |
| 
 | |
| * ``VHOST_USER_ADD_MEM_REG``
 | |
| * ``VHOST_USER_SET_MEM_TABLE``
 | |
| * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
 | |
| * ``VHOST_USER_SET_LOG_FD``
 | |
| * ``VHOST_USER_SET_VRING_KICK``
 | |
| * ``VHOST_USER_SET_VRING_CALL``
 | |
| * ``VHOST_USER_SET_VRING_ERR``
 | |
| * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
 | |
| * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
 | |
| 
 | |
| If *front-end* is unable to send the full message or receives a wrong
 | |
| reply it will close the connection. An optional reconnection mechanism
 | |
| can be implemented.
 | |
| 
 | |
| If *back-end* detects some error such as incompatible features, it may also
 | |
| close the connection. This should only happen in exceptional circumstances.
 | |
| 
 | |
| Any protocol extensions are gated by protocol feature bits, which
 | |
| allows full backwards compatibility on both front-end and back-end.  As
 | |
| older back-ends don't support negotiating protocol features, a feature
 | |
| bit was dedicated for this purpose::
 | |
| 
 | |
|   #define VHOST_USER_F_PROTOCOL_FEATURES 30
 | |
| 
 | |
| Note that VHOST_USER_F_PROTOCOL_FEATURES is the UNUSED (30) feature
 | |
| bit defined in `VIRTIO 1.1 6.3 Legacy Interface: Reserved Feature Bits
 | |
| <https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_.
 | |
| VIRTIO devices do not advertise this feature bit and therefore VIRTIO
 | |
| drivers cannot negotiate it.
 | |
| 
 | |
| This reserved feature bit was reused by the vhost-user protocol to add
 | |
| vhost-user protocol feature negotiation in a backwards compatible
 | |
| fashion. Old vhost-user front-end and back-end implementations continue to
 | |
| work even though they are not aware of vhost-user protocol feature
 | |
| negotiation.
 | |
| 
 | |
| Ring states
 | |
| -----------
 | |
| 
 | |
| Rings can be in one of three states:
 | |
| 
 | |
| * stopped: the back-end must not process the ring at all.
 | |
| 
 | |
| * started but disabled: the back-end must process the ring without
 | |
|   causing any side effects.  For example, for a networking device,
 | |
|   in the disabled state the back-end must not supply any new RX packets,
 | |
|   but must process and discard any TX packets.
 | |
| 
 | |
| * started and enabled.
 | |
| 
 | |
| Each ring is initialized in a stopped state.  The back-end must start
 | |
| ring upon receiving a kick (that is, detecting that file descriptor is
 | |
| readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK``
 | |
| or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated,
 | |
| and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``.
 | |
| 
 | |
| Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.
 | |
| 
 | |
| If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
 | |
| ring starts directly in the enabled state.
 | |
| 
 | |
| If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
 | |
| initialized in a disabled state and is enabled by
 | |
| ``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
 | |
| 
 | |
| While processing the rings (whether they are enabled or not), the back-end
 | |
| must support changing some configuration aspects on the fly.
 | |
| 
 | |
| Multiple queue support
 | |
| ----------------------
 | |
| 
 | |
| Many devices have a fixed number of virtqueues.  In this case the front-end
 | |
| already knows the number of available virtqueues without communicating with the
 | |
| back-end.
 | |
| 
 | |
| Some devices do not have a fixed number of virtqueues.  Instead the maximum
 | |
| number of virtqueues is chosen by the back-end.  The number can depend on host
 | |
| resource availability or back-end implementation details.  Such devices are called
 | |
| multiple queue devices.
 | |
| 
 | |
| Multiple queue support allows the back-end to advertise the maximum number of
 | |
| queues.  This is treated as a protocol extension, hence the back-end has to
 | |
| implement protocol features first. The multiple queues feature is supported
 | |
| only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set.
 | |
| 
 | |
| The max number of queues the back-end supports can be queried with message
 | |
| ``VHOST_USER_GET_QUEUE_NUM``. Front-end should stop when the number of requested
 | |
| queues is bigger than that.
 | |
| 
 | |
| As all queues share one connection, the front-end uses a unique index for each
 | |
| queue in the sent message to identify a specified queue.
 | |
| 
 | |
| The front-end enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
 | |
| vhost-user-net has historically automatically enabled the first queue pair.
 | |
| 
 | |
| Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
 | |
| feature, even for devices with a fixed number of virtqueues, since it is simple
 | |
| to implement and offers a degree of introspection.
 | |
| 
 | |
| Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
 | |
| devices with a fixed number of virtqueues.  Only true multiqueue devices
 | |
| require this protocol feature.
 | |
| 
 | |
| Migration
 | |
| ---------
 | |
| 
 | |
| During live migration, the front-end may need to track the modifications
 | |
| the back-end makes to the memory mapped regions. The front-end should mark
 | |
| the dirty pages in a log. Once it complies to this logging, it may
 | |
| declare the ``VHOST_F_LOG_ALL`` vhost feature.
 | |
| 
 | |
| To start/stop logging of data/used ring writes, the front-end may send
 | |
| messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and
 | |
| ``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's
 | |
| flags set to 1/0, respectively.
 | |
| 
 | |
| All the modifications to memory pointed by vring "descriptor" should
 | |
| be marked. Modifications to "used" vring should be marked if
 | |
| ``VHOST_VRING_F_LOG`` is part of ring's flags.
 | |
| 
 | |
| Dirty pages are of size::
 | |
| 
 | |
|   #define VHOST_LOG_PAGE 0x1000
 | |
| 
 | |
| The log memory fd is provided in the ancillary data of
 | |
| ``VHOST_USER_SET_LOG_BASE`` message when the back-end has
 | |
| ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature.
 | |
| 
 | |
| The size of the log is supplied as part of ``VhostUserMsg`` which
 | |
| should be large enough to cover all known guest addresses. Log starts
 | |
| at the supplied offset in the supplied file descriptor.  The log
 | |
| covers from address 0 to the maximum of guest regions. In pseudo-code,
 | |
| to mark page at ``addr`` as dirty::
 | |
| 
 | |
|   page = addr / VHOST_LOG_PAGE
 | |
|   log[page / 8] |= 1 << page % 8
 | |
| 
 | |
| Where ``addr`` is the guest physical address.
 | |
| 
 | |
| Use atomic operations, as the log may be concurrently manipulated.
 | |
| 
 | |
| Note that when logging modifications to the used ring (when
 | |
| ``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should
 | |
| be used to calculate the log offset: the write to first byte of the
 | |
| used ring is logged at this offset from log start. Also note that this
 | |
| value might be outside the legal guest physical address range
 | |
| (i.e. does not have to be covered by the ``VhostUserMemory`` table), but
 | |
| the bit offset of the last byte of the ring must fall within the size
 | |
| supplied by ``VhostUserLog``.
 | |
| 
 | |
| ``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in
 | |
| ancillary data, it may be used to inform the front-end that the log has
 | |
| been modified.
 | |
| 
 | |
| Once the source has finished migration, rings will be stopped by the
 | |
| source. No further update must be done before rings are restarted.
 | |
| 
 | |
| In postcopy migration the back-end is started before all the memory has
 | |
| been received from the source host, and care must be taken to avoid
 | |
| accessing pages that have yet to be received.  The back-end opens a
 | |
| 'userfault'-fd and registers the memory with it; this fd is then
 | |
| passed back over to the front-end.  The front-end services requests on the
 | |
| userfaultfd for pages that are accessed and when the page is available
 | |
| it performs WAKE ioctl's on the userfaultfd to wake the stalled
 | |
| back-end.  The front-end indicates support for this via the
 | |
| ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
 | |
| 
 | |
| Memory access
 | |
| -------------
 | |
| 
 | |
| The front-end sends a list of vhost memory regions to the back-end using the
 | |
| ``VHOST_USER_SET_MEM_TABLE`` message.  Each region has two base
 | |
| addresses: a guest address and a user address.
 | |
| 
 | |
| Messages contain guest addresses and/or user addresses to reference locations
 | |
| within the shared memory.  The mapping of these addresses works as follows.
 | |
| 
 | |
| User addresses map to the vhost memory region containing that user address.
 | |
| 
 | |
| When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated:
 | |
| 
 | |
| * Guest addresses map to the vhost memory region containing that guest
 | |
|   address.
 | |
| 
 | |
| When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated:
 | |
| 
 | |
| * Guest addresses are also called I/O virtual addresses (IOVAs).  They are
 | |
|   translated to user addresses via the IOTLB.
 | |
| 
 | |
| * The vhost memory region guest address is not used.
 | |
| 
 | |
| IOMMU support
 | |
| -------------
 | |
| 
 | |
| When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the
 | |
| front-end sends IOTLB entries update & invalidation by sending
 | |
| ``VHOST_USER_IOTLB_MSG`` requests to the back-end with a ``struct
 | |
| vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload
 | |
| has to be filled with the update message type (2), the I/O virtual
 | |
| address, the size, the user virtual address, and the permissions
 | |
| flags. Addresses and size must be within vhost memory regions set via
 | |
| the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the
 | |
| ``iotlb`` payload has to be filled with the invalidation message type
 | |
| (3), the I/O virtual address and the size. On success, the back-end is
 | |
| expected to reply with a zero payload, non-zero otherwise.
 | |
| 
 | |
| The back-end relies on the back-end communication channel (see :ref:`Back-end
 | |
| communication <backend_communication>` section below) to send IOTLB miss
 | |
| and access failure events, by sending ``VHOST_USER_BACKEND_IOTLB_MSG``
 | |
| requests to the front-end with a ``struct vhost_iotlb_msg`` as
 | |
| payload. For miss events, the iotlb payload has to be filled with the
 | |
| miss message type (1), the I/O virtual address and the permissions
 | |
| flags. For access failure event, the iotlb payload has to be filled
 | |
| with the access failure message type (4), the I/O virtual address and
 | |
| the permissions flags.  For synchronization purpose, the back-end may
 | |
| rely on the reply-ack feature, so the front-end may send a reply when
 | |
| operation is completed if the reply-ack feature is negotiated and
 | |
| back-ends requests a reply. For miss events, completed operation means
 | |
| either front-end sent an update message containing the IOTLB entry
 | |
| containing requested address and permission, or front-end sent nothing if
 | |
| the IOTLB miss message is invalid (invalid IOVA or permission).
 | |
| 
 | |
| The front-end isn't expected to take the initiative to send IOTLB update
 | |
| messages, as the back-end sends IOTLB miss messages for the guest virtual
 | |
| memory areas it needs to access.
 | |
| 
 | |
| .. _backend_communication:
 | |
| 
 | |
| Back-end communication
 | |
| ----------------------
 | |
| 
 | |
| An optional communication channel is provided if the back-end declares
 | |
| ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` protocol feature, to allow the
 | |
| back-end to make requests to the front-end.
 | |
| 
 | |
| The fd is provided via ``VHOST_USER_SET_BACKEND_REQ_FD`` ancillary data.
 | |
| 
 | |
| A back-end may then send ``VHOST_USER_BACKEND_*`` messages to the front-end
 | |
| using this fd communication channel.
 | |
| 
 | |
| If ``VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD`` protocol feature is
 | |
| negotiated, back-end can send file descriptors (at most 8 descriptors in
 | |
| each message) to front-end via ancillary data using this fd communication
 | |
| channel.
 | |
| 
 | |
| Inflight I/O tracking
 | |
| ---------------------
 | |
| 
 | |
| To support reconnecting after restart or crash, back-end may need to
 | |
| resubmit inflight I/Os. If virtqueue is processed in order, we can
 | |
| easily achieve that by getting the inflight descriptors from
 | |
| descriptor table (split virtqueue) or descriptor ring (packed
 | |
| virtqueue). However, it can't work when we process descriptors
 | |
| out-of-order because some entries which store the information of
 | |
| inflight descriptors in available ring (split virtqueue) or descriptor
 | |
| ring (packed virtqueue) might be overridden by new entries. To solve
 | |
| this problem, the back-end need to allocate an extra buffer to store this
 | |
| information of inflight descriptors and share it with front-end for
 | |
| persistent. ``VHOST_USER_GET_INFLIGHT_FD`` and
 | |
| ``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer
 | |
| between front-end and back-end. And the format of this buffer is described
 | |
| below:
 | |
| 
 | |
| +---------------+---------------+-----+---------------+
 | |
| | queue0 region | queue1 region | ... | queueN region |
 | |
| +---------------+---------------+-----+---------------+
 | |
| 
 | |
| N is the number of available virtqueues. The back-end could get it from num
 | |
| queues field of ``VhostUserInflight``.
 | |
| 
 | |
| For split virtqueue, queue region can be implemented as:
 | |
| 
 | |
| .. code:: c
 | |
| 
 | |
|   typedef struct DescStateSplit {
 | |
|       /* Indicate whether this descriptor is inflight or not.
 | |
|        * Only available for head-descriptor. */
 | |
|       uint8_t inflight;
 | |
| 
 | |
|       /* Padding */
 | |
|       uint8_t padding[5];
 | |
| 
 | |
|       /* Maintain a list for the last batch of used descriptors.
 | |
|        * Only available when batching is used for submitting */
 | |
|       uint16_t next;
 | |
| 
 | |
|       /* Used to preserve the order of fetching available descriptors.
 | |
|        * Only available for head-descriptor. */
 | |
|       uint64_t counter;
 | |
|   } DescStateSplit;
 | |
| 
 | |
|   typedef struct QueueRegionSplit {
 | |
|       /* The feature flags of this region. Now it's initialized to 0. */
 | |
|       uint64_t features;
 | |
| 
 | |
|       /* The version of this region. It's 1 currently.
 | |
|        * Zero value indicates an uninitialized buffer */
 | |
|       uint16_t version;
 | |
| 
 | |
|       /* The size of DescStateSplit array. It's equal to the virtqueue size.
 | |
|        * The back-end could get it from queue size field of VhostUserInflight. */
 | |
|       uint16_t desc_num;
 | |
| 
 | |
|       /* The head of list that track the last batch of used descriptors. */
 | |
|       uint16_t last_batch_head;
 | |
| 
 | |
|       /* Store the idx value of used ring */
 | |
|       uint16_t used_idx;
 | |
| 
 | |
|       /* Used to track the state of each descriptor in descriptor table */
 | |
|       DescStateSplit desc[];
 | |
|   } QueueRegionSplit;
 | |
| 
 | |
| To track inflight I/O, the queue region should be processed as follows:
 | |
| 
 | |
| When receiving available buffers from the driver:
 | |
| 
 | |
| #. Get the next available head-descriptor index from available ring, ``i``
 | |
| 
 | |
| #. Set ``desc[i].counter`` to the value of global counter
 | |
| 
 | |
| #. Increase global counter by 1
 | |
| 
 | |
| #. Set ``desc[i].inflight`` to 1
 | |
| 
 | |
| When supplying used buffers to the driver:
 | |
| 
 | |
| 1. Get corresponding used head-descriptor index, i
 | |
| 
 | |
| 2. Set ``desc[i].next`` to ``last_batch_head``
 | |
| 
 | |
| 3. Set ``last_batch_head`` to ``i``
 | |
| 
 | |
| #. Steps 1,2,3 may be performed repeatedly if batching is possible
 | |
| 
 | |
| #. Increase the ``idx`` value of used ring by the size of the batch
 | |
| 
 | |
| #. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0
 | |
| 
 | |
| #. Set ``used_idx`` to the ``idx`` value of used ring
 | |
| 
 | |
| When reconnecting:
 | |
| 
 | |
| #. If the value of ``used_idx`` does not match the ``idx`` value of
 | |
|    used ring (means the inflight field of ``DescStateSplit`` entries in
 | |
|    last batch may be incorrect),
 | |
| 
 | |
|    a. Subtract the value of ``used_idx`` from the ``idx`` value of
 | |
|       used ring to get last batch size of ``DescStateSplit`` entries
 | |
| 
 | |
|    #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch
 | |
|       list which starts from ``last_batch_head``
 | |
| 
 | |
|    #. Set ``used_idx`` to the ``idx`` value of used ring
 | |
| 
 | |
| #. Resubmit inflight ``DescStateSplit`` entries in order of their
 | |
|    counter value
 | |
| 
 | |
| For packed virtqueue, queue region can be implemented as:
 | |
| 
 | |
| .. code:: c
 | |
| 
 | |
|   typedef struct DescStatePacked {
 | |
|       /* Indicate whether this descriptor is inflight or not.
 | |
|        * Only available for head-descriptor. */
 | |
|       uint8_t inflight;
 | |
| 
 | |
|       /* Padding */
 | |
|       uint8_t padding;
 | |
| 
 | |
|       /* Link to the next free entry */
 | |
|       uint16_t next;
 | |
| 
 | |
|       /* Link to the last entry of descriptor list.
 | |
|        * Only available for head-descriptor. */
 | |
|       uint16_t last;
 | |
| 
 | |
|       /* The length of descriptor list.
 | |
|        * Only available for head-descriptor. */
 | |
|       uint16_t num;
 | |
| 
 | |
|       /* Used to preserve the order of fetching available descriptors.
 | |
|        * Only available for head-descriptor. */
 | |
|       uint64_t counter;
 | |
| 
 | |
|       /* The buffer id */
 | |
|       uint16_t id;
 | |
| 
 | |
|       /* The descriptor flags */
 | |
|       uint16_t flags;
 | |
| 
 | |
|       /* The buffer length */
 | |
|       uint32_t len;
 | |
| 
 | |
|       /* The buffer address */
 | |
|       uint64_t addr;
 | |
|   } DescStatePacked;
 | |
| 
 | |
|   typedef struct QueueRegionPacked {
 | |
|       /* The feature flags of this region. Now it's initialized to 0. */
 | |
|       uint64_t features;
 | |
| 
 | |
|       /* The version of this region. It's 1 currently.
 | |
|        * Zero value indicates an uninitialized buffer */
 | |
|       uint16_t version;
 | |
| 
 | |
|       /* The size of DescStatePacked array. It's equal to the virtqueue size.
 | |
|        * The back-end could get it from queue size field of VhostUserInflight. */
 | |
|       uint16_t desc_num;
 | |
| 
 | |
|       /* The head of free DescStatePacked entry list */
 | |
|       uint16_t free_head;
 | |
| 
 | |
|       /* The old head of free DescStatePacked entry list */
 | |
|       uint16_t old_free_head;
 | |
| 
 | |
|       /* The used index of descriptor ring */
 | |
|       uint16_t used_idx;
 | |
| 
 | |
|       /* The old used index of descriptor ring */
 | |
|       uint16_t old_used_idx;
 | |
| 
 | |
|       /* Device ring wrap counter */
 | |
|       uint8_t used_wrap_counter;
 | |
| 
 | |
|       /* The old device ring wrap counter */
 | |
|       uint8_t old_used_wrap_counter;
 | |
| 
 | |
|       /* Padding */
 | |
|       uint8_t padding[7];
 | |
| 
 | |
|       /* Used to track the state of each descriptor fetched from descriptor ring */
 | |
|       DescStatePacked desc[];
 | |
|   } QueueRegionPacked;
 | |
| 
 | |
| To track inflight I/O, the queue region should be processed as follows:
 | |
| 
 | |
| When receiving available buffers from the driver:
 | |
| 
 | |
| #. Get the next available descriptor entry from descriptor ring, ``d``
 | |
| 
 | |
| #. If ``d`` is head descriptor,
 | |
| 
 | |
|    a. Set ``desc[old_free_head].num`` to 0
 | |
| 
 | |
|    #. Set ``desc[old_free_head].counter`` to the value of global counter
 | |
| 
 | |
|    #. Increase global counter by 1
 | |
| 
 | |
|    #. Set ``desc[old_free_head].inflight`` to 1
 | |
| 
 | |
| #. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to
 | |
|    ``free_head``
 | |
| 
 | |
| #. Increase ``desc[old_free_head].num`` by 1
 | |
| 
 | |
| #. Set ``desc[free_head].addr``, ``desc[free_head].len``,
 | |
|    ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``,
 | |
|    ``d.len``, ``d.flags``, ``d.id``
 | |
| 
 | |
| #. Set ``free_head`` to ``desc[free_head].next``
 | |
| 
 | |
| #. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head``
 | |
| 
 | |
| When supplying used buffers to the driver:
 | |
| 
 | |
| 1. Get corresponding used head-descriptor entry from descriptor ring,
 | |
|    ``d``
 | |
| 
 | |
| 2. Get corresponding ``DescStatePacked`` entry, ``e``
 | |
| 
 | |
| 3. Set ``desc[e.last].next`` to ``free_head``
 | |
| 
 | |
| 4. Set ``free_head`` to the index of ``e``
 | |
| 
 | |
| #. Steps 1,2,3,4 may be performed repeatedly if batching is possible
 | |
| 
 | |
| #. Increase ``used_idx`` by the size of the batch and update
 | |
|    ``used_wrap_counter`` if needed
 | |
| 
 | |
| #. Update ``d.flags``
 | |
| 
 | |
| #. Set the ``inflight`` field of each head ``DescStatePacked`` entry
 | |
|    in the batch to 0
 | |
| 
 | |
| #. Set ``old_free_head``,  ``old_used_idx``, ``old_used_wrap_counter``
 | |
|    to ``free_head``, ``used_idx``, ``used_wrap_counter``
 | |
| 
 | |
| When reconnecting:
 | |
| 
 | |
| #. If ``used_idx`` does not match ``old_used_idx`` (means the
 | |
|    ``inflight`` field of ``DescStatePacked`` entries in last batch may
 | |
|    be incorrect),
 | |
| 
 | |
|    a. Get the next descriptor ring entry through ``old_used_idx``, ``d``
 | |
| 
 | |
|    #. Use ``old_used_wrap_counter`` to calculate the available flags
 | |
| 
 | |
|    #. If ``d.flags`` is not equal to the calculated flags value (means
 | |
|       back-end has submitted the buffer to guest driver before crash, so
 | |
|       it has to commit the in-progres update), set ``old_free_head``,
 | |
|       ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``,
 | |
|       ``used_idx``, ``used_wrap_counter``
 | |
| 
 | |
| #. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to
 | |
|    ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter``
 | |
|    (roll back any in-progress update)
 | |
| 
 | |
| #. Set the ``inflight`` field of each ``DescStatePacked`` entry in
 | |
|    free list to 0
 | |
| 
 | |
| #. Resubmit inflight ``DescStatePacked`` entries in order of their
 | |
|    counter value
 | |
| 
 | |
| In-band notifications
 | |
| ---------------------
 | |
| 
 | |
| In some limited situations (e.g. for simulation) it is desirable to
 | |
| have the kick, call and error (if used) signals done via in-band
 | |
| messages instead of asynchronous eventfd notifications. This can be
 | |
| done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS``
 | |
| protocol feature.
 | |
| 
 | |
| Note that due to the fact that too many messages on the sockets can
 | |
| cause the sending application(s) to block, it is not advised to use
 | |
| this feature unless absolutely necessary. It is also considered an
 | |
| error to negotiate this feature without also negotiating
 | |
| ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``,
 | |
| the former is necessary for getting a message channel from the back-end
 | |
| to the front-end, while the latter needs to be used with the in-band
 | |
| notification messages to block until they are processed, both to avoid
 | |
| blocking later and for proper processing (at least in the simulation
 | |
| use case.) As it has no other way of signalling this error, the back-end
 | |
| should close the connection as a response to a
 | |
| ``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band
 | |
| notifications feature flag without the other two.
 | |
| 
 | |
| Protocol features
 | |
| -----------------
 | |
| 
 | |
| .. code:: c
 | |
| 
 | |
|   #define VHOST_USER_PROTOCOL_F_MQ                    0
 | |
|   #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
 | |
|   #define VHOST_USER_PROTOCOL_F_RARP                  2
 | |
|   #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
 | |
|   #define VHOST_USER_PROTOCOL_F_MTU                   4
 | |
|   #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
 | |
|   #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
 | |
|   #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
 | |
|   #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
 | |
|   #define VHOST_USER_PROTOCOL_F_CONFIG                9
 | |
|   #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD      10
 | |
|   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
 | |
|   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
 | |
|   #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
 | |
|   #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
 | |
|   #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
 | |
|   #define VHOST_USER_PROTOCOL_F_STATUS               16
 | |
|   #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
 | |
| 
 | |
| Front-end message types
 | |
| -----------------------
 | |
| 
 | |
| ``VHOST_USER_GET_FEATURES``
 | |
|   :id: 1
 | |
|   :equivalent ioctl: ``VHOST_GET_FEATURES``
 | |
|   :request payload: N/A
 | |
|   :reply payload: ``u64``
 | |
| 
 | |
|   Get from the underlying vhost implementation the features bitmask.
 | |
|   Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support
 | |
|   for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
 | |
|   ``VHOST_USER_SET_PROTOCOL_FEATURES``.
 | |
| 
 | |
| ``VHOST_USER_SET_FEATURES``
 | |
|   :id: 2
 | |
|   :equivalent ioctl: ``VHOST_SET_FEATURES``
 | |
|   :request payload: ``u64``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Enable features in the underlying vhost implementation using a
 | |
|   bitmask.  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals
 | |
|   back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and
 | |
|   ``VHOST_USER_SET_PROTOCOL_FEATURES``.
 | |
| 
 | |
| ``VHOST_USER_GET_PROTOCOL_FEATURES``
 | |
|   :id: 15
 | |
|   :equivalent ioctl: ``VHOST_GET_FEATURES``
 | |
|   :request payload: N/A
 | |
|   :reply payload: ``u64``
 | |
| 
 | |
|   Get the protocol feature bitmask from the underlying vhost
 | |
|   implementation.  Only legal if feature bit
 | |
|   ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
 | |
|   ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by
 | |
|   ``VHOST_USER_SET_FEATURES``.
 | |
| 
 | |
| .. Note::
 | |
|    Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must
 | |
|    support this message even before ``VHOST_USER_SET_FEATURES`` was
 | |
|    called.
 | |
| 
 | |
| ``VHOST_USER_SET_PROTOCOL_FEATURES``
 | |
|   :id: 16
 | |
|   :equivalent ioctl: ``VHOST_SET_FEATURES``
 | |
|   :request payload: ``u64``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Enable protocol features in the underlying vhost implementation.
 | |
| 
 | |
|   Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in
 | |
|   ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by
 | |
|   ``VHOST_USER_SET_FEATURES``.
 | |
| 
 | |
| .. Note::
 | |
|    Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support
 | |
|    this message even before ``VHOST_USER_SET_FEATURES`` was called.
 | |
| 
 | |
| ``VHOST_USER_SET_OWNER``
 | |
|   :id: 3
 | |
|   :equivalent ioctl: ``VHOST_SET_OWNER``
 | |
|   :request payload: N/A
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Issued when a new connection is established. It marks the sender
 | |
|   as the front-end that owns of the session. This can be used on the *back-end*
 | |
|   as a "session start" flag.
 | |
| 
 | |
| ``VHOST_USER_RESET_OWNER``
 | |
|   :id: 4
 | |
|   :request payload: N/A
 | |
|   :reply payload: N/A
 | |
| 
 | |
| .. admonition:: Deprecated
 | |
| 
 | |
|    This is no longer used. Used to be sent to request disabling all
 | |
|    rings, but some back-ends interpreted it to also discard connection
 | |
|    state (this interpretation would lead to bugs).  It is recommended
 | |
|    that back-ends either ignore this message, or use it to disable all
 | |
|    rings.
 | |
| 
 | |
| ``VHOST_USER_SET_MEM_TABLE``
 | |
|   :id: 5
 | |
|   :equivalent ioctl: ``VHOST_SET_MEM_TABLE``
 | |
|   :request payload: multiple memory regions description
 | |
|   :reply payload: (postcopy only) multiple memory regions description
 | |
| 
 | |
|   Sets the memory map regions on the back-end so it can translate the
 | |
|   vring addresses. In the ancillary data there is an array of file
 | |
|   descriptors for each memory mapped region. The size and ordering of
 | |
|   the fds matches the number and ordering of memory regions.
 | |
| 
 | |
|   When ``VHOST_USER_POSTCOPY_LISTEN`` has been received,
 | |
|   ``SET_MEM_TABLE`` replies with the bases of the memory mapped
 | |
|   regions to the front-end.  The back-end must have mmap'd the regions but
 | |
|   not yet accessed them and should not yet generate a userfault
 | |
|   event.
 | |
| 
 | |
| .. Note::
 | |
|    ``NEED_REPLY_MASK`` is not set in this case.  QEMU will then
 | |
|    reply back to the list of mappings with an empty
 | |
|    ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon
 | |
|    reception of this message may the guest start accessing the memory
 | |
|    and generating faults.
 | |
| 
 | |
| ``VHOST_USER_SET_LOG_BASE``
 | |
|   :id: 6
 | |
|   :equivalent ioctl: ``VHOST_SET_LOG_BASE``
 | |
|   :request payload: u64
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Sets logging shared memory space.
 | |
| 
 | |
|   When the back-end has ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature,
 | |
|   the log memory fd is provided in the ancillary data of
 | |
|   ``VHOST_USER_SET_LOG_BASE`` message, the size and offset of shared
 | |
|   memory area provided in the message.
 | |
| 
 | |
| ``VHOST_USER_SET_LOG_FD``
 | |
|   :id: 7
 | |
|   :equivalent ioctl: ``VHOST_SET_LOG_FD``
 | |
|   :request payload: N/A
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Sets the logging file descriptor, which is passed as ancillary data.
 | |
| 
 | |
| ``VHOST_USER_SET_VRING_NUM``
 | |
|   :id: 8
 | |
|   :equivalent ioctl: ``VHOST_SET_VRING_NUM``
 | |
|   :request payload: vring state description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Set the size of the queue.
 | |
| 
 | |
| ``VHOST_USER_SET_VRING_ADDR``
 | |
|   :id: 9
 | |
|   :equivalent ioctl: ``VHOST_SET_VRING_ADDR``
 | |
|   :request payload: vring address description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Sets the addresses of the different aspects of the vring.
 | |
| 
 | |
| ``VHOST_USER_SET_VRING_BASE``
 | |
|   :id: 10
 | |
|   :equivalent ioctl: ``VHOST_SET_VRING_BASE``
 | |
|   :request payload: vring state description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Sets the base offset in the available vring.
 | |
| 
 | |
| ``VHOST_USER_GET_VRING_BASE``
 | |
|   :id: 11
 | |
|   :equivalent ioctl: ``VHOST_USER_GET_VRING_BASE``
 | |
|   :request payload: vring state description
 | |
|   :reply payload: vring state description
 | |
| 
 | |
|   Get the available vring base offset.
 | |
| 
 | |
| ``VHOST_USER_SET_VRING_KICK``
 | |
|   :id: 12
 | |
|   :equivalent ioctl: ``VHOST_SET_VRING_KICK``
 | |
|   :request payload: ``u64``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Set the event file descriptor for adding buffers to the vring. It is
 | |
|   passed in the ancillary data.
 | |
| 
 | |
|   Bits (0-7) of the payload contain the vring index. Bit 8 is the
 | |
|   invalid FD flag. This flag is set when there is no file descriptor
 | |
|   in the ancillary data. This signals that polling should be used
 | |
|   instead of waiting for the kick. Note that if the protocol feature
 | |
|   ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated
 | |
|   this message isn't necessary as the ring is also started on the
 | |
|   ``VHOST_USER_VRING_KICK`` message, it may however still be used to
 | |
|   set an event file descriptor (which will be preferred over the
 | |
|   message) or to enable polling.
 | |
| 
 | |
| ``VHOST_USER_SET_VRING_CALL``
 | |
|   :id: 13
 | |
|   :equivalent ioctl: ``VHOST_SET_VRING_CALL``
 | |
|   :request payload: ``u64``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Set the event file descriptor to signal when buffers are used. It is
 | |
|   passed in the ancillary data.
 | |
| 
 | |
|   Bits (0-7) of the payload contain the vring index. Bit 8 is the
 | |
|   invalid FD flag. This flag is set when there is no file descriptor
 | |
|   in the ancillary data. This signals that polling will be used
 | |
|   instead of waiting for the call. Note that if the protocol features
 | |
|   ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
 | |
|   ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message
 | |
|   isn't necessary as the ``VHOST_USER_BACKEND_VRING_CALL`` message can be
 | |
|   used, it may however still be used to set an event file descriptor
 | |
|   or to enable polling.
 | |
| 
 | |
| ``VHOST_USER_SET_VRING_ERR``
 | |
|   :id: 14
 | |
|   :equivalent ioctl: ``VHOST_SET_VRING_ERR``
 | |
|   :request payload: ``u64``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Set the event file descriptor to signal when error occurs. It is
 | |
|   passed in the ancillary data.
 | |
| 
 | |
|   Bits (0-7) of the payload contain the vring index. Bit 8 is the
 | |
|   invalid FD flag. This flag is set when there is no file descriptor
 | |
|   in the ancillary data. Note that if the protocol features
 | |
|   ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and
 | |
|   ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message
 | |
|   isn't necessary as the ``VHOST_USER_BACKEND_VRING_ERR`` message can be
 | |
|   used, it may however still be used to set an event file descriptor
 | |
|   (which will be preferred over the message).
 | |
| 
 | |
| ``VHOST_USER_GET_QUEUE_NUM``
 | |
|   :id: 17
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: N/A
 | |
|   :reply payload: u64
 | |
| 
 | |
|   Query how many queues the back-end supports.
 | |
| 
 | |
|   This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ``
 | |
|   is set in queried protocol features by
 | |
|   ``VHOST_USER_GET_PROTOCOL_FEATURES``.
 | |
| 
 | |
| ``VHOST_USER_SET_VRING_ENABLE``
 | |
|   :id: 18
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: vring state description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Signal the back-end to enable or disable corresponding vring.
 | |
| 
 | |
|   This request should be sent only when
 | |
|   ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated.
 | |
| 
 | |
| ``VHOST_USER_SEND_RARP``
 | |
|   :id: 19
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: ``u64``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Ask vhost user back-end to broadcast a fake RARP to notify the migration
 | |
|   is terminated for guest that does not support GUEST_ANNOUNCE.
 | |
| 
 | |
|   Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is
 | |
|   present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
 | |
|   ``VHOST_USER_PROTOCOL_F_RARP`` is present in
 | |
|   ``VHOST_USER_GET_PROTOCOL_FEATURES``.  The first 6 bytes of the
 | |
|   payload contain the mac address of the guest to allow the vhost user
 | |
|   back-end to construct and broadcast the fake RARP.
 | |
| 
 | |
| ``VHOST_USER_NET_SET_MTU``
 | |
|   :id: 20
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: ``u64``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Set host MTU value exposed to the guest.
 | |
| 
 | |
|   This request should be sent only when ``VIRTIO_NET_F_MTU`` feature
 | |
|   has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES``
 | |
|   is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit
 | |
|   ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in
 | |
|   ``VHOST_USER_GET_PROTOCOL_FEATURES``.
 | |
| 
 | |
|   If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
 | |
|   respond with zero in case the specified MTU is valid, or non-zero
 | |
|   otherwise.
 | |
| 
 | |
| ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
 | |
|   :id: 21
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: N/A
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Set the socket file descriptor for back-end initiated requests. It is passed
 | |
|   in the ancillary data.
 | |
| 
 | |
|   This request should be sent only when
 | |
|   ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol
 | |
|   feature bit ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` bit is present in
 | |
|   ``VHOST_USER_GET_PROTOCOL_FEATURES``.  If
 | |
|   ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must
 | |
|   respond with zero for success, non-zero otherwise.
 | |
| 
 | |
| ``VHOST_USER_IOTLB_MSG``
 | |
|   :id: 22
 | |
|   :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
 | |
|   :request payload: ``struct vhost_iotlb_msg``
 | |
|   :reply payload: ``u64``
 | |
| 
 | |
|   Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
 | |
| 
 | |
|   The front-end sends such requests to update and invalidate entries in the
 | |
|   device IOTLB. The back-end has to acknowledge the request with sending
 | |
|   zero as ``u64`` payload for success, non-zero otherwise.
 | |
| 
 | |
|   This request should be send only when ``VIRTIO_F_IOMMU_PLATFORM``
 | |
|   feature has been successfully negotiated.
 | |
| 
 | |
| ``VHOST_USER_SET_VRING_ENDIAN``
 | |
|   :id: 23
 | |
|   :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN``
 | |
|   :request payload: vring state description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Set the endianness of a VQ for legacy devices. Little-endian is
 | |
|   indicated with state.num set to 0 and big-endian is indicated with
 | |
|   state.num set to 1. Other values are invalid.
 | |
| 
 | |
|   This request should be sent only when
 | |
|   ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated.
 | |
|   Backends that negotiated this feature should handle both
 | |
|   endiannesses and expect this message once (per VQ) during device
 | |
|   configuration (ie. before the front-end starts the VQ).
 | |
| 
 | |
| ``VHOST_USER_GET_CONFIG``
 | |
|   :id: 24
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: virtio device config space
 | |
|   :reply payload: virtio device config space
 | |
| 
 | |
|   When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
 | |
|   submitted by the vhost-user front-end to fetch the contents of the
 | |
|   virtio device configuration space, vhost-user back-end's payload size
 | |
|   MUST match the front-end's request, vhost-user back-end uses zero length of
 | |
|   payload to indicate an error to the vhost-user front-end. The vhost-user
 | |
|   front-end may cache the contents to avoid repeated
 | |
|   ``VHOST_USER_GET_CONFIG`` calls.
 | |
| 
 | |
| ``VHOST_USER_SET_CONFIG``
 | |
|   :id: 25
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: virtio device config space
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is
 | |
|   submitted by the vhost-user front-end when the Guest changes the virtio
 | |
|   device configuration space and also can be used for live migration
 | |
|   on the destination host. The vhost-user back-end must check the flags
 | |
|   field, and back-ends MUST NOT accept SET_CONFIG for read-only
 | |
|   configuration space fields unless the live migration bit is set.
 | |
| 
 | |
| ``VHOST_USER_CREATE_CRYPTO_SESSION``
 | |
|   :id: 26
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: crypto session description
 | |
|   :reply payload: crypto session description
 | |
| 
 | |
|   Create a session for crypto operation. The back-end must return
 | |
|   the session id, 0 or positive for success, negative for failure.
 | |
|   This request should be sent only when
 | |
|   ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
 | |
|   successfully negotiated.  It's a required feature for crypto
 | |
|   devices.
 | |
| 
 | |
| ``VHOST_USER_CLOSE_CRYPTO_SESSION``
 | |
|   :id: 27
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: ``u64``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Close a session for crypto operation which was previously
 | |
|   created by ``VHOST_USER_CREATE_CRYPTO_SESSION``.
 | |
| 
 | |
|   This request should be sent only when
 | |
|   ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been
 | |
|   successfully negotiated.  It's a required feature for crypto
 | |
|   devices.
 | |
| 
 | |
| ``VHOST_USER_POSTCOPY_ADVISE``
 | |
|   :id: 28
 | |
|   :request payload: N/A
 | |
|   :reply payload: userfault fd
 | |
| 
 | |
|   When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end
 | |
|   advises back-end that a migration with postcopy enabled is underway,
 | |
|   the back-end must open a userfaultfd for later use.  Note that at this
 | |
|   stage the migration is still in precopy mode.
 | |
| 
 | |
| ``VHOST_USER_POSTCOPY_LISTEN``
 | |
|   :id: 29
 | |
|   :request payload: N/A
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   The front-end advises back-end that a transition to postcopy mode has
 | |
|   happened.  The back-end must ensure that shared memory is registered
 | |
|   with userfaultfd to cause faulting of non-present pages.
 | |
| 
 | |
|   This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``,
 | |
|   and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported.
 | |
| 
 | |
| ``VHOST_USER_POSTCOPY_END``
 | |
|   :id: 30
 | |
|   :request payload: N/A
 | |
|   :reply payload: ``u64``
 | |
| 
 | |
|   The front-end advises that postcopy migration has now completed.  The back-end
 | |
|   must disable the userfaultfd. The reply is an acknowledgement
 | |
|   only.
 | |
| 
 | |
|   When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message
 | |
|   is sent at the end of the migration, after
 | |
|   ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent.
 | |
| 
 | |
|   The value returned is an error indication; 0 is success.
 | |
| 
 | |
| ``VHOST_USER_GET_INFLIGHT_FD``
 | |
|   :id: 31
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: inflight description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
 | |
|   been successfully negotiated, this message is submitted by the front-end to
 | |
|   get a shared buffer from back-end. The shared buffer will be used to
 | |
|   track inflight I/O by back-end. QEMU should retrieve a new one when vm
 | |
|   reset.
 | |
| 
 | |
| ``VHOST_USER_SET_INFLIGHT_FD``
 | |
|   :id: 32
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: inflight description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has
 | |
|   been successfully negotiated, this message is submitted by the front-end to
 | |
|   send the shared inflight buffer back to the back-end so that the back-end
 | |
|   could get inflight I/O after a crash or restart.
 | |
| 
 | |
| ``VHOST_USER_GPU_SET_SOCKET``
 | |
|   :id: 33
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: N/A
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Sets the GPU protocol socket file descriptor, which is passed as
 | |
|   ancillary data. The GPU protocol is used to inform the front-end of
 | |
|   rendering state and updates. See vhost-user-gpu.rst for details.
 | |
| 
 | |
| ``VHOST_USER_RESET_DEVICE``
 | |
|   :id: 34
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: N/A
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Ask the vhost user back-end to disable all rings and reset all
 | |
|   internal device state to the initial state, ready to be
 | |
|   reinitialized. The back-end retains ownership of the device
 | |
|   throughout the reset operation.
 | |
| 
 | |
|   Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol
 | |
|   feature is set by the back-end.
 | |
| 
 | |
| ``VHOST_USER_VRING_KICK``
 | |
|   :id: 35
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: vring state description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
 | |
|   feature has been successfully negotiated, this message may be
 | |
|   submitted by the front-end to indicate that a buffer was added to
 | |
|   the vring instead of signalling it using the vring's kick file
 | |
|   descriptor or having the back-end rely on polling.
 | |
| 
 | |
|   The state.num field is currently reserved and must be set to 0.
 | |
| 
 | |
| ``VHOST_USER_GET_MAX_MEM_SLOTS``
 | |
|   :id: 36
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: N/A
 | |
|   :reply payload: u64
 | |
| 
 | |
|   When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
 | |
|   feature has been successfully negotiated, this message is submitted
 | |
|   by the front-end to the back-end. The back-end should return the message with a
 | |
|   u64 payload containing the maximum number of memory slots for
 | |
|   QEMU to expose to the guest. The value returned by the back-end
 | |
|   will be capped at the maximum number of ram slots which can be
 | |
|   supported by the target platform.
 | |
| 
 | |
| ``VHOST_USER_ADD_MEM_REG``
 | |
|   :id: 37
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: N/A
 | |
|   :reply payload: single memory region description
 | |
| 
 | |
|   When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
 | |
|   feature has been successfully negotiated, this message is submitted
 | |
|   by the front-end to the back-end. The message payload contains a memory
 | |
|   region descriptor struct, describing a region of guest memory which
 | |
|   the back-end device must map in. When the
 | |
|   ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
 | |
|   been successfully negotiated, along with the
 | |
|   ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and
 | |
|   update the memory tables of the back-end device.
 | |
| 
 | |
|   Exactly one file descriptor from which the memory is mapped is
 | |
|   passed in the ancillary data.
 | |
| 
 | |
|   In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end
 | |
|   replies with the bases of the memory mapped region to the front-end.
 | |
|   For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``.
 | |
|   They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly.
 | |
| 
 | |
| ``VHOST_USER_REM_MEM_REG``
 | |
|   :id: 38
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: N/A
 | |
|   :reply payload: single memory region description
 | |
| 
 | |
|   When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol
 | |
|   feature has been successfully negotiated, this message is submitted
 | |
|   by the front-end to the back-end. The message payload contains a memory
 | |
|   region descriptor struct, describing a region of guest memory which
 | |
|   the back-end device must unmap. When the
 | |
|   ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has
 | |
|   been successfully negotiated, along with the
 | |
|   ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and
 | |
|   update the memory tables of the back-end device.
 | |
| 
 | |
|   The memory region to be removed is identified by its guest address,
 | |
|   user address and size. The mmap offset is ignored.
 | |
| 
 | |
|   No file descriptors SHOULD be passed in the ancillary data. For
 | |
|   compatibility with existing incorrect implementations, the back-end MAY
 | |
|   accept messages with one file descriptor. If a file descriptor is
 | |
|   passed, the back-end MUST close it without using it otherwise.
 | |
| 
 | |
| ``VHOST_USER_SET_STATUS``
 | |
|   :id: 39
 | |
|   :equivalent ioctl: VHOST_VDPA_SET_STATUS
 | |
|   :request payload: ``u64``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
 | |
|   successfully negotiated, this message is submitted by the front-end to
 | |
|   notify the back-end with updated device status as defined in the Virtio
 | |
|   specification.
 | |
| 
 | |
| ``VHOST_USER_GET_STATUS``
 | |
|   :id: 40
 | |
|   :equivalent ioctl: VHOST_VDPA_GET_STATUS
 | |
|   :request payload: N/A
 | |
|   :reply payload: ``u64``
 | |
| 
 | |
|   When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been
 | |
|   successfully negotiated, this message is submitted by the front-end to
 | |
|   query the back-end for its device status as defined in the Virtio
 | |
|   specification.
 | |
| 
 | |
| 
 | |
| Back-end message types
 | |
| ----------------------
 | |
| 
 | |
| For this type of message, the request is sent by the back-end and the reply
 | |
| is sent by the front-end.
 | |
| 
 | |
| ``VHOST_USER_BACKEND_IOTLB_MSG`` (previous name ``VHOST_USER_SLAVE_IOTLB_MSG``)
 | |
|   :id: 1
 | |
|   :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type)
 | |
|   :request payload: ``struct vhost_iotlb_msg``
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload.
 | |
|   The back-end sends such requests to notify of an IOTLB miss, or an IOTLB
 | |
|   access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is
 | |
|   negotiated, and back-end set the ``VHOST_USER_NEED_REPLY`` flag, the front-end
 | |
|   must respond with zero when operation is successfully completed, or
 | |
|   non-zero otherwise.  This request should be send only when
 | |
|   ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully
 | |
|   negotiated.
 | |
| 
 | |
| ``VHOST_USER_BACKEND_CONFIG_CHANGE_MSG`` (previous name ``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``)
 | |
|   :id: 2
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: N/A
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, vhost-user
 | |
|   back-end sends such messages to notify that the virtio device's
 | |
|   configuration space has changed, for those host devices which can
 | |
|   support such feature, host driver can send ``VHOST_USER_GET_CONFIG``
 | |
|   message to the back-end to get the latest content. If
 | |
|   ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the
 | |
|   ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when
 | |
|   operation is successfully completed, or non-zero otherwise.
 | |
| 
 | |
| ``VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG`` (previous name ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``)
 | |
|   :id: 3
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: vring area description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   Sets host notifier for a specified queue. The queue index is
 | |
|   contained in the ``u64`` field of the vring area description. The
 | |
|   host notifier is described by the file descriptor (typically it's a
 | |
|   VFIO device fd) which is passed as ancillary data and the size
 | |
|   (which is mmap size and should be the same as host page size) and
 | |
|   offset (which is mmap offset) carried in the vring area
 | |
|   description. QEMU can mmap the file descriptor based on the size and
 | |
|   offset to get a memory range. Registering a host notifier means
 | |
|   mapping this memory range to the VM as the specified queue's notify
 | |
|   MMIO region. The back-end sends this request to tell QEMU to de-register
 | |
|   the existing notifier if any and register the new notifier if the
 | |
|   request is sent with a file descriptor.
 | |
| 
 | |
|   This request should be sent only when
 | |
|   ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been
 | |
|   successfully negotiated.
 | |
| 
 | |
| ``VHOST_USER_BACKEND_VRING_CALL`` (previous name ``VHOST_USER_SLAVE_VRING_CALL``)
 | |
|   :id: 4
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: vring state description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
 | |
|   feature has been successfully negotiated, this message may be
 | |
|   submitted by the back-end to indicate that a buffer was used from
 | |
|   the vring instead of signalling this using the vring's call file
 | |
|   descriptor or having the front-end relying on polling.
 | |
| 
 | |
|   The state.num field is currently reserved and must be set to 0.
 | |
| 
 | |
| ``VHOST_USER_BACKEND_VRING_ERR`` (previous name ``VHOST_USER_SLAVE_VRING_ERR``)
 | |
|   :id: 5
 | |
|   :equivalent ioctl: N/A
 | |
|   :request payload: vring state description
 | |
|   :reply payload: N/A
 | |
| 
 | |
|   When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol
 | |
|   feature has been successfully negotiated, this message may be
 | |
|   submitted by the back-end to indicate that an error occurred on the
 | |
|   specific vring, instead of signalling the error file descriptor
 | |
|   set by the front-end via ``VHOST_USER_SET_VRING_ERR``.
 | |
| 
 | |
|   The state.num field is currently reserved and must be set to 0.
 | |
| 
 | |
| .. _reply_ack:
 | |
| 
 | |
| VHOST_USER_PROTOCOL_F_REPLY_ACK
 | |
| -------------------------------
 | |
| 
 | |
| The original vhost-user specification only demands replies for certain
 | |
| commands. This differs from the vhost protocol implementation where
 | |
| commands are sent over an ``ioctl()`` call and block until the back-end
 | |
| has completed.
 | |
| 
 | |
| With this protocol extension negotiated, the sender (QEMU) can set the
 | |
| ``need_reply`` [Bit 3] flag to any command. This indicates that the
 | |
| back-end MUST respond with a Payload ``VhostUserMsg`` indicating success
 | |
| or failure. The payload should be set to zero on success or non-zero
 | |
| on failure, unless the message already has an explicit reply body.
 | |
| 
 | |
| The reply payload gives QEMU a deterministic indication of the result
 | |
| of the command. Today, QEMU is expected to terminate the main vhost-user
 | |
| loop upon receiving such errors. In future, qemu could be taught to be more
 | |
| resilient for selective requests.
 | |
| 
 | |
| For the message types that already solicit a reply from the back-end,
 | |
| the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or need_reply bit
 | |
| being set brings no behavioural change. (See the Communication_
 | |
| section for details.)
 | |
| 
 | |
| .. _backend_conventions:
 | |
| 
 | |
| Backend program conventions
 | |
| ===========================
 | |
| 
 | |
| vhost-user back-ends can provide various devices & services and may
 | |
| need to be configured manually depending on the use case. However, it
 | |
| is a good idea to follow the conventions listed here when
 | |
| possible. Users, QEMU or libvirt, can then rely on some common
 | |
| behaviour to avoid heterogeneous configuration and management of the
 | |
| back-end programs and facilitate interoperability.
 | |
| 
 | |
| Each back-end installed on a host system should come with at least one
 | |
| JSON file that conforms to the vhost-user.json schema. Each file
 | |
| informs the management applications about the back-end type, and binary
 | |
| location. In addition, it defines rules for management apps for
 | |
| picking the highest priority back-end when multiple match the search
 | |
| criteria (see ``@VhostUserBackend`` documentation in the schema file).
 | |
| 
 | |
| If the back-end is not capable of enabling a requested feature on the
 | |
| host (such as 3D acceleration with virgl), or the initialization
 | |
| failed, the back-end should fail to start early and exit with a status
 | |
| != 0. It may also print a message to stderr for further details.
 | |
| 
 | |
| The back-end program must not daemonize itself, but it may be
 | |
| daemonized by the management layer. It may also have a restricted
 | |
| access to the system.
 | |
| 
 | |
| File descriptors 0, 1 and 2 will exist, and have regular
 | |
| stdin/stdout/stderr usage (they may have been redirected to /dev/null
 | |
| by the management layer, or to a log handler).
 | |
| 
 | |
| The back-end program must end (as quickly and cleanly as possible) when
 | |
| the SIGTERM signal is received. Eventually, it may receive SIGKILL by
 | |
| the management layer after a few seconds.
 | |
| 
 | |
| The following command line options have an expected behaviour. They
 | |
| are mandatory, unless explicitly said differently:
 | |
| 
 | |
| --socket-path=PATH
 | |
| 
 | |
|   This option specify the location of the vhost-user Unix domain socket.
 | |
|   It is incompatible with --fd.
 | |
| 
 | |
| --fd=FDNUM
 | |
| 
 | |
|   When this argument is given, the back-end program is started with the
 | |
|   vhost-user socket as file descriptor FDNUM. It is incompatible with
 | |
|   --socket-path.
 | |
| 
 | |
| --print-capabilities
 | |
| 
 | |
|   Output to stdout the back-end capabilities in JSON format, and then
 | |
|   exit successfully. Other options and arguments should be ignored, and
 | |
|   the back-end program should not perform its normal function.  The
 | |
|   capabilities can be reported dynamically depending on the host
 | |
|   capabilities.
 | |
| 
 | |
| The JSON output is described in the ``vhost-user.json`` schema, by
 | |
| ```@VHostUserBackendCapabilities``.  Example:
 | |
| 
 | |
| .. code:: json
 | |
| 
 | |
|   {
 | |
|     "type": "foo",
 | |
|     "features": [
 | |
|       "feature-a",
 | |
|       "feature-b"
 | |
|     ]
 | |
|   }
 | |
| 
 | |
| vhost-user-input
 | |
| ----------------
 | |
| 
 | |
| Command line options:
 | |
| 
 | |
| --evdev-path=PATH
 | |
| 
 | |
|   Specify the linux input device.
 | |
| 
 | |
|   (optional)
 | |
| 
 | |
| --no-grab
 | |
| 
 | |
|   Do no request exclusive access to the input device.
 | |
| 
 | |
|   (optional)
 | |
| 
 | |
| vhost-user-gpu
 | |
| --------------
 | |
| 
 | |
| Command line options:
 | |
| 
 | |
| --render-node=PATH
 | |
| 
 | |
|   Specify the GPU DRM render node.
 | |
| 
 | |
|   (optional)
 | |
| 
 | |
| --virgl
 | |
| 
 | |
|   Enable virgl rendering support.
 | |
| 
 | |
|   (optional)
 | |
| 
 | |
| vhost-user-blk
 | |
| --------------
 | |
| 
 | |
| Command line options:
 | |
| 
 | |
| --blk-file=PATH
 | |
| 
 | |
|   Specify block device or file path.
 | |
| 
 | |
|   (optional)
 | |
| 
 | |
| --read-only
 | |
| 
 | |
|   Enable read-only.
 | |
| 
 | |
|   (optional)
 |