Targets know whether they are big-endian more than they know if
the endianness is different from the host: the former is mostly
a constant, at least in machine creation code, while the latter
has to be computed with TARGET_BIG_ENDIAN != HOST_BIG_ENDIAN or
something like that.
load_aout, however, takes a "bswap_needed" argument. Replace
it with a "big_endian" argument; even though all users are
big-endian, it is cheap enough to keep the optional swapping
functionality even for little-endian boards.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow the device being added to loongarch virt VMs.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20250319141159.1461621-6-kraxel@redhat.com>
Allow the device being added to riscv virt VMs.
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20250319141159.1461621-5-kraxel@redhat.com>
Catch lseek errors. Return on errors.
Use autoptr for the GString to simplify cleanup.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20250319141159.1461621-3-kraxel@redhat.com>
Make live migration more robust. Commit 4c0cfc72b31a ("pflash_cfi01:
write flash contents to bdrv on incoming migration") elaborates in
detail on the motivation.
Cc: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-ID: <20250319141159.1461621-2-kraxel@redhat.com>
When POWER10 CPU was made as default, we missed keeping POWER9 as
default for older pseries releases (pre-9.0) at that time.
This caused breakge in default cpu evaluation for older pseries
machines and hence this fix.
Fixes: 51113013f3 ("ppc/spapr: change pseries machine default to POWER10 CPU")
Cc: qemu-stable@nongnu.org
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250313094705.2361997-1-harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
The variable holding default env is not supposed to be written.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250314200145.08E0F4E6067@zero.eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Coverity reported that return value of blk_pwrite() maybe should not
be ignored. We can't do much if this happens other than report an
error but let's do that to silence this report.
Resolves: Coverity CID 1593725
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Message-ID: <20250314200140.2DBE74E6069@zero.eik.bme.hu>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
* Update functional tests parts of the documenation
* Some minor fixes for functional tests
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmfawycRHHRodXRoQHJl
ZGhhdC5jb20ACgkQLtnXdP5wLbU2vA//UV2RdKVIQDS7MbMYRjmUr0NK9/9dLmrn
/lZVWXCBDEB7seu/VOGZmr1H0zoQ8XYJTSbrmp2cW0NRPhCVeAz9Zpg7+jt3Qy6/
ahbiNQyhYztMbSa4XOOUEoLZBsfZILjWgqBilrRn7ng6wJoNabEIs/KqMP3O9qsx
TYCCnu5JkMF85Bf0l3kUJlLX0b5+BnpUNDke1cipvTa7u/Coz0mDBBZZtgW1bBj8
TETuMC1JtCg3aj1ey7k0pK4nCd740mr5w659C4LE8NCE0/juc3AtRM5RCqU9tAGh
tXpfrZziyvSrAhyWieRQlgzLvrt2gTF/5FrqhPUssts+vkH1EgB56FiPXdqMtLRo
zU+SVRuOMHQZn7E6L9KQ7Gz5w98PSVGYxUUpWIvOx/0d9wgoIfYPjgtJz5UV11mV
Nnt304UV4FKw94V8S8JYUClamP4SMTMLZNRIsd46Ef+DOL1CI+jcDZBntijwSgs1
5fs0IZyl6ZXtmUibVWJ+PqyYW6YiAfi/wY/mJlfnvKVZjoudbhNkNOtC9hi4YTQd
yJ7gVy9A4OeQqXgiQcymFvlseggds7uPQ9/szuGC1RwrW2NYH1YLisKpNzPtqq16
TEOnsozlDa9OUDshKxrA5rwHiDcSuqJjkP26N91AmdEQDgoQcbIKWghriTxkOV9Q
d2aJt+3KF04=
=cNi4
-----END PGP SIGNATURE-----
Merge tag 'pull-request-2025-03-19' of https://gitlab.com/thuth/qemu into staging
* Fix linking problem when CONFIG_VIRTIO_PCI is not set for s390x
* Update functional tests parts of the documenation
* Some minor fixes for functional tests
# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmfawycRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbU2vA//UV2RdKVIQDS7MbMYRjmUr0NK9/9dLmrn
# /lZVWXCBDEB7seu/VOGZmr1H0zoQ8XYJTSbrmp2cW0NRPhCVeAz9Zpg7+jt3Qy6/
# ahbiNQyhYztMbSa4XOOUEoLZBsfZILjWgqBilrRn7ng6wJoNabEIs/KqMP3O9qsx
# TYCCnu5JkMF85Bf0l3kUJlLX0b5+BnpUNDke1cipvTa7u/Coz0mDBBZZtgW1bBj8
# TETuMC1JtCg3aj1ey7k0pK4nCd740mr5w659C4LE8NCE0/juc3AtRM5RCqU9tAGh
# tXpfrZziyvSrAhyWieRQlgzLvrt2gTF/5FrqhPUssts+vkH1EgB56FiPXdqMtLRo
# zU+SVRuOMHQZn7E6L9KQ7Gz5w98PSVGYxUUpWIvOx/0d9wgoIfYPjgtJz5UV11mV
# Nnt304UV4FKw94V8S8JYUClamP4SMTMLZNRIsd46Ef+DOL1CI+jcDZBntijwSgs1
# 5fs0IZyl6ZXtmUibVWJ+PqyYW6YiAfi/wY/mJlfnvKVZjoudbhNkNOtC9hi4YTQd
# yJ7gVy9A4OeQqXgiQcymFvlseggds7uPQ9/szuGC1RwrW2NYH1YLisKpNzPtqq16
# TEOnsozlDa9OUDshKxrA5rwHiDcSuqJjkP26N91AmdEQDgoQcbIKWghriTxkOV9Q
# d2aJt+3KF04=
# =cNi4
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 19 Mar 2025 09:14:15 EDT
# gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg: issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* tag 'pull-request-2025-03-19' of https://gitlab.com/thuth/qemu:
tests/functional/test_migration: Use "ncat" instead of "nc" in the exec test
tests/functional/test_x86_64_kvm_xen: Remove avocado tags
docs/devel/testing/functional: Add a section about logging
docs/system/arm: Use "functional tests" instead of "integration tests"
docs/system: Use the meson binary from the pyvenv
tests/functional: remove all class level fields
tests/functional/test_arm_orangepi: rename test class to 'OrangePiMachine'
hw/virtio: Also include md stubs in case CONFIG_VIRTIO_PCI is not set
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
These definitions were taken from skiboot firmware. I naively thought it
would be nicer to keep the code similar by using the preprocessor, but
it was pointed out that system headers might still use those symbols and
cause something unexpected. Also just nicer to keep the QEMU tree clean.
Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>
Cc: "Stefan Hajnoczi" <stefanha@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Fixes: 70bc5c2498f46 ("ppc/pnv: Make HOMER memory a RAM region")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Rather than use the hardcoded define throughout the tree for the
PNOR LPC address, keep it within the PnvPnor object.
This should solve a dead code issue in the BMC HIOMAP checks where
Coverity (correctly) reported that the sanity checks are dead code.
We would like to keep the sanity checks without turning them into a
compile time assert in case we would like to make them configurable
in future.
Fixes: 4c84a0a4a6e5 ("ppc/pnv: Add a PNOR address and size sanity checks")
Resolves: Coverity CID 1593723
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Coverity reports a possible memory overflow in spapr_dt_pa_features().
This should not be a true bug since DAWR1 cap is only be true for
CPU_POWERPC_LOGICAL_3_10. Add an assertion to ensure any bug there is
caught.
Resolves: Coverity CID 1593722
Fixes: 5f361ea187ba ("ppc: spapr: Enable 2nd DAWR on Power10 pSeries machine")
Reviewed-By: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
The comparison as written is always false (perhaps confusingly, because
the functions/macros are not really booleans but return 0 or the tested
bit value). Change to use logical-and.
Resolves: Coverity CID 1593721
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Coverity discovered a potential shift overflow in group size calculation
in the case of a guest error. Add checks and logs to ensure a issues are
caught.
Make the group and crowd error checking code more similar to one another
while here.
Resolves: Coverity CID 1593724
Fixes: 9cb7f6ebed60 ("ppc/xive2: Support group-matching when looking for target")
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
I introduced this bug when "tidying" the original patch, not Frederic.
Paper bag for me.
Fixes: 9cb7f6ebed60 ("ppc/xive2: Support group-matching when looking for target")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Currently we require everywhere that wants to know if there
is an HPET device to check for "CONFIG_HPET || CONFIG_X_HPET_RUST".
Factor out whether the HPET device is Rust or C into a separate
Kconfig stanza, so that CONFIG_HPET means "there is an HPET",
and whether this has pulled in CONFIG_X_HPET_RUST or CONFIG_HPET_C
is something the rest of QEMU can ignore.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/r/20250319193110.1565578-3-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently every board that uses the PL011 duplicates the logic that
selects the Rust implementation if Rust was enabled and the C
implementation if it does not. Factor this out into a separate
Kconfig stanza, so that boards can go back to simply doing "select
PL011" and get whichever implementation is correct for the build.
This fixes a compilation failure if CONFIG_VMAPPLE is enabled
in a Rust build, because hw/vmapple/Kconfig didn't have the
"pick the Rust PL011 if Rust is enabled" logic in it.
Fixes: 59f4d65584bd33 ("hw/vmapple/vmapple: Add vmapple machine type")
Reported-by: Tanish Desai <tanishdesai37@gmail.com>
Analyzed-by: Tanish Desai <tanishdesai37@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20250319193110.1565578-2-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL. Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.
virt_cpu_irq_init() is wrong that way: it passes &err to
hotplug_handler_plug() twice. If both calls failed, this could trip
error_setv()'s assertion. Moreover, if just one fails, the Error
object leaks. Fortunately, these calls can't actually fail.
Messed up in commit 50ebc3fc47f7 (hw/intc/loongarch_ipi: Notify ipi
object when cpu is plugged) and commit 087a23a87c57
(hw/intc/loongarch_extioi: Use cpu plug notification).
Clean this up by passing &error_abort instead.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250320032158.1762751-7-maobibo@loongson.cn>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
When there is an error, it is put into a local variable and then
propagated to somewhere else. Instead the error can be set right
away, error propagation can be removed.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20250320032158.1762751-5-maobibo@loongson.cn>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This change takes the CPUPPCState 'quiesced' field added for powernv
hardware CPU core controls (used to stop and start cores), and extends
it to spapr to model the "RTAS stopped" state. This prevents the
schedulers attempting to run stopped CPUs unexpectedly, which can cause
hangs and possibly other unexpected behaviour.
The detail of the problematic situation is this:
A KVM spapr guest boots with all secondary CPUs defined to be in the
"RTAS stopped" state. In this state, the CPU is only responsive to the
start-cpu RTAS call. This behaviour is modeled in QEMU with the
start_powered_off feature, which sets ->halted on secondary CPUs at
boot. ->halted=true looks like an idle / sleep / power-save state which
typically is responsive to asynchronous interrupts, but spapr clears
wake-on-interrupt bits in the LPCR SPR. This more-or-less works.
Commit e8291ec16da8 ("target/ppc: fix timebase register reset state")
recently caused the decrementer to expire sooner at boot, causing a
decrementer exception on secondary CPUs in RTAS stopped state. This
was not a problem on TCG, but KVM limits how a guest can modify LPCR, in
particular it prevents the clearing of wake-on-interrupt bits, and so in
the course of CPU register synchronisation, the LPCR as set by spapr to
model the RTAS stopped state is overwritten with KVM's LPCR value, and
that then causes QEMU's interrupt code to notice the expired decrementer
exception, turn that into an interrupt, and set CPU_INTERRUPT_HARD.
That causes the CPU to be kicked, and the KVM vCPU thread to loop
calling kvm_cpu_exec(). kvm_cpu_exec() calls
kvm_arch_process_async_events(), which on ppc just returns ->halted.
This is still true, so it returns immediately with EXCP_HLT, and the
vCPU never goes to sleep because qemu_wait_io_event() sees
CPU_INTERRUPT_HARD is set. All this while the vCPU holds the bql. This
causes the boot CPU to eventually lock up when it needs the bql.
So make 'quiesced' represent the "RTAS stopped" state, and have it
explicitly not respond to exceptions (interrupt conditions) rather than
rely on machine register state to model that state. This matches the
powernv quiesced state very well because it essentially turns off the
CPU core via a side-band control unit.
There are still issues with QEMU and KVM idea of LPCR diverging and that
is quite ugly and fragile that should be fixed. spapr should synchronize
its LPCR properly with KVM, and not try to use values that KVM does not
support.
Reported-by: Misbah Anjum N <misanjum@linux.ibm.com>
Tested-by: Misbah Anjum N <misanjum@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
xen_bus_realize() reports a failure to set up a watch as error, but it
doesn't treat it as one: it simply continues. Report a warning
instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250314143500.2449658-3-armbru@redhat.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
The Error ** argument must be NULL, &error_abort, &error_fatal, or a
pointer to a variable containing NULL. Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.
xen_bus_realize() is wrong that way: it passes &local_err to
xs_node_watch() in a loop. If this fails in more than one iteration,
it can trip error_setv()'s assertion.
Fix by clearing @local_err.
Fixes: c4583c8c394e (xen-bus: reduce scope of backend watch)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250314143500.2449658-2-armbru@redhat.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
For the s390x target, it's possible to build the QEMU binary without
CONFIG_VIRTIO_PCI and only have the virtio-mem device via the ccw
transport. In that case, QEMU currently fails to link correctly:
/usr/bin/ld: libqemu-s390x-softmmu.a.p/hw_s390x_s390-virtio-ccw.c.o: in function `s390_machine_device_pre_plug':
../hw/s390x/s390-virtio-ccw.c:579:(.text+0x1e96): undefined reference to `virtio_md_pci_pre_plug'
/usr/bin/ld: libqemu-s390x-softmmu.a.p/hw_s390x_s390-virtio-ccw.c.o: in function `s390_machine_device_plug':
../hw/s390x/s390-virtio-ccw.c:608:(.text+0x21a4): undefined reference to `virtio_md_pci_plug'
/usr/bin/ld: libqemu-s390x-softmmu.a.p/hw_s390x_s390-virtio-ccw.c.o: in function `s390_machine_device_unplug_request':
../hw/s390x/s390-virtio-ccw.c:622:(.text+0x2334): undefined reference to `virtio_md_pci_unplug_request'
/usr/bin/ld: libqemu-s390x-softmmu.a.p/hw_s390x_s390-virtio-ccw.c.o: in function `s390_machine_device_unplug':
../hw/s390x/s390-virtio-ccw.c:633:(.text+0x2436): undefined reference to `virtio_md_pci_unplug'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
We also need to include the stubs when CONFIG_VIRTIO_PCI is missing.
Fixes: aa910c20ec5 ("s390x: virtio-mem support")
Message-ID: <20250313063522.1348288-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The PPN field in a non-leaf PDT entry is positioned differently from that
in a leaf PDT entry. The original implementation incorrectly used the leaf
entry's PPN mask to extract the PPN from a non-leaf entry, leading to an
erroneous page table walk.
This commit introduces new macros to properly define the fields for
non-leaf PDT entries and corrects the page table walk.
Signed-off-by: Jason Chien <jason.chien@sifive.com>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Message-ID: <20250301173751.9446-1-jason.chien@sifive.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
1 << i is casted to uint64_t while bitwise and with val.
So this value may become 0xffffffff80000000 but only
31th "start" bit is required.
Use the bitfield extract() API instead.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Tigran Sogomonian <tsogomonian@astralinux.ru>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Link: https://lore.kernel.org/r/20241227104618.2526-1-tsogomonian@astralinux.ru
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The guest does not control whether characters are sent on the UART.
Sending them before the guest happens to boot will now result in a
"guest error" log entry that is only because of timing, even if the
guest _would_ later setup the receiver correctly.
This reverts the bulk of commit abf2b6a028670bd2890bb3aee7e103fe53e4b0df,
and instead adds a comment about why we don't check the enable bits.
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20250311153717.206129-1-pbonzini@redhat.com
[PMM: expanded comment]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
During normal migration, new QEMU creates and initializes memory regions,
then loads the preserved contents of the region from vmstate.
During CPR, memory regions are preserved in place, then the realize
method initializes the regions contents, losing the old contents. To
fix, skip writes to the qxl memory regions during CPR load.
Reported-by: andrey.drobyshev@virtuozzo.com
Tested-by: andrey.drobyshev@virtuozzo.com
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <1741380954-341079-5-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
During normal migration, new QEMU creates and initializes memory regions,
then loads the preserved contents of the region from vmstate.
During CPR, memory regions are preserved in place, then the realize
method initializes the regions contents, losing the old contents. To
fix, skip the re-init during CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <1741380954-341079-4-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
During normal migration, new QEMU creates and initializes memory regions,
then loads the preserved contents of the region from vmstate.
During CPR, memory regions are preserved in place, then the realize
method initializes the regions contents, losing the old contents. To
fix, skip the re-init during CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <1741380954-341079-3-git-send-email-steven.sistare@oracle.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The functions arm_current_el() and arm_el_is_aa64() are used only in
target/arm and in hw/intc/arm_gicv3_cpuif.c. They're functions that
query internal state of the CPU. Move them out of cpu.h and into
internals.h.
This means we need to include internals.h in arm_gicv3_cpuif.c, but
this is justifiable because that file is implementing the GICv3 CPU
interface, which really is part of the CPU proper; we just ended up
implementing it in code in hw/intc/ for historical reasons.
The motivation for this move is that we'd like to change
arm_el_is_aa64() to add a condition that uses cpu_isar_feature();
but we don't want to include cpu-features.h in cpu.h.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Peter Krempa and Kevin Wolf observed that iothread-vq-mapping is
confusing to use because the control and event virtqueues have a fixed
location before the command virtqueues but need to be treated
differently.
Only expose the command virtqueues via iothread-vq-mapping so that the
command-line parameter is intuitive: it controls where SCSI requests are
processed.
The control virtqueue needs to be hardcoded to the main loop thread for
technical reasons anyway. Kevin also pointed out that it's better to
place the event virtqueue in the main loop thread since its no poll
behavior would prevent polling if assigned to an IOThread.
This change is its own commit to avoid squashing the previous commit.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250311132616.1049687-14-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Previously the ctrl virtqueue was handled in the AioContext where SCSI
requests are processed. When IOThread Virtqueue Mapping was added things
become more complicated because SCSI requests could run in other
AioContexts.
Simplify by handling the ctrl virtqueue in the main loop where reset
operations can be performed. Note that BHs are still used canceling SCSI
requests in their AioContexts but at least the mean loop activity
doesn't need BHs anymore.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250311132616.1049687-13-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow virtio-scsi virtqueues to be assigned to different IOThreads. This
makes it possible to take advantage of host multi-queue block layer
scalability by assigning virtqueues that have affinity with vCPUs to
different IOThreads that have affinity with host CPUs. The same feature
was introduced for virtio-blk in the past:
https://developers.redhat.com/articles/2024/09/05/scaling-virtio-blk-disk-io-iothread-virtqueue-mapping
Here are fio randread 4k iodepth=64 results from a 4 vCPU guest with an
Intel P4800X SSD:
iothreads IOPS
------------------------------
1 189576
2 312698
4 346744
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250311132616.1049687-12-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
[kwolf: Updated 051 output, virtio-scsi can now use any iothread]
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The code that builds an array of AioContext pointers indexed by the
virtqueue is not specific to virtio-blk. virtio-scsi will need to do the
same thing, so extract the functions.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-11-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Use noun_verb() function naming instead of verb_noun() because the
former is the most common naming style for APIs. The next commit will
move these functions into a header file so that virtio-scsi can call
them.
Shorten iothread_vq_mapping_apply()'s iothread_vq_mapping_list argument
to just "list" like in the other functions.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-10-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is the cleanup function that must be called after
apply_iothread_vq_mapping() succeeds. virtio-scsi will need this
function too, so extract it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-9-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
With IOThread Virtqueue Mapping there will be multiple AioContexts
processing SCSI requests. scsi_req_cancel() and other SCSI request
operations must be performed from the AioContext where the request is
running.
Introduce a virtio_scsi_defer_tmf_to_aio_context() function and the
necessary VirtIOSCSIReq->remaining refcount infrastructure to move the
TMF code into the AioContext where the request is running.
For the time being there is still just one AioContext: the main loop or
the IOThread. When the iothread-vq-mapping parameter is added in a later
patch this will be changed to per-virtqueue AioContexts.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-8-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The block layer can invoke the resize callback from any AioContext that
is processing requests. The virtqueue is already protected but the
events_dropped field also needs to be protected against races. Cover it
using the event virtqueue lock because it is closely associated with
accesses to the virtqueue.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-7-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Virtqueues are not thread-safe. Until now this was not a major issue
since all virtqueue processing happened in the same thread. The ctrl
queue's Task Management Function (TMF) requests sometimes need the main
loop, so a BH was used to schedule the virtqueue completion back in the
thread that has virtqueue access.
When IOThread Virtqueue Mapping is introduced in later commits, event
and ctrl virtqueue accesses from other threads will become necessary.
Introduce an optional per-virtqueue lock so the event and ctrl
virtqueues can be protected in the commits that follow.
The addition of the ctrl virtqueue lock makes
virtio_scsi_complete_req_from_main_loop() and its BH unnecessary.
Instead, take the ctrl virtqueue lock from the main loop thread.
The cmd virtqueue does not have a lock because the entirety of SCSI
command processing happens in one thread. Only one thread accesses the
cmd virtqueue and a lock is unnecessary.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-6-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
SCSIDevice keeps track of in-flight requests for device reset and Task
Management Functions (TMFs). The request list requires protection so
that multi-threaded SCSI emulation can be implemented in commits that
follow.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-5-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Until now, a SCSIDevice's I/O requests have run in a single AioContext.
In order to support multiple IOThreads it will be necessary to move to
the concept of a per-SCSIRequest AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-4-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the past a single AioContext was used for block I/O and it was
fetched using blk_get_aio_context(). Nowadays the block layer supports
running I/O from any AioContext and multiple AioContexts at the same
time. Remove the dma_blk_io() AioContext argument and use the current
AioContext instead.
This makes calling the function easier and enables multiple IOThreads to
use dma_blk_io() concurrently for the same block device.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-3-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 71544d30a6f8 ("scsi: push request restart to SCSIDevice") removed
the only user of SCSIDiskState->bh.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250311132616.1049687-2-stefanha@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Fixed endianness of VFIO device state packets
* Improved IGD passthrough support with legacy mode
* Improved build
* Added support for old AMD GPUs (x550)
* Updated property documentation
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmfQfQcACgkQUaNDx8/7
7KEUNw/+PjFpHrz5muQ8itkbyd36eJJdcxCl+9IPIWfnUfB582epkLcgvWyswGUo
krFTregoRG0PKtgZDtv95owGtVJOgK6XYFadGHiYkvvsb41twOYsP7/SuI+KMiEv
IDFLMvCTyorSIIoEF8i2EexfGPRV1VoWwvBoHgRRmYlzwzXnufjABpoZ0a25DTye
DQ4yhSfqoIh1gOcdL9tPictnZg9OxKr2ePXNdrtymtEIhg3ZobD3Jd8J4WCcsfKT
fxxBO5NsGgA8oM7i02fYN9kgMwqTnVhSAu1wq9PXsbrnNXam+trywAWSO6CjL+rV
++STWNSrRoHzuotRBr7BzrTpTFyQyfwBWqUT5L4NlhgXB3Xybk+M6Zj08Yva8pjE
w78JQKvKp54gU34AWBW0/J6+u3v+iE8l1Eywx6xueF9Q+YSUDeW9B1LDdjFJryhF
d8j3J+vuglbdsp05D+tVErf5cqFvFDfrjTkXkZNtmx7wky45XS9ZvNazYW1KI3f9
bg8Wjb7ZujuvxpSjycPRZzdKa8kqSgSZg7fg91Wimiy1Iqe3SZVVWNchLYiPp8Dm
nXMfOEpVHQZ1vzeo7dVWyxu9Y1ujgvUQy8kMa9q2W2S7HQ5Sna79n7eMVJxqZQ4G
m0ETFToOcPPOnZBWgqNOSUlSQncFuIVgNTDvycQ9dMhGorYcBDI=
=Vh0m
-----END PGP SIGNATURE-----
Merge tag 'pull-vfio-20250311' of https://github.com/legoater/qemu into staging
vfio queue:
* Fixed endianness of VFIO device state packets
* Improved IGD passthrough support with legacy mode
* Improved build
* Added support for old AMD GPUs (x550)
* Updated property documentation
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmfQfQcACgkQUaNDx8/7
# 7KEUNw/+PjFpHrz5muQ8itkbyd36eJJdcxCl+9IPIWfnUfB582epkLcgvWyswGUo
# krFTregoRG0PKtgZDtv95owGtVJOgK6XYFadGHiYkvvsb41twOYsP7/SuI+KMiEv
# IDFLMvCTyorSIIoEF8i2EexfGPRV1VoWwvBoHgRRmYlzwzXnufjABpoZ0a25DTye
# DQ4yhSfqoIh1gOcdL9tPictnZg9OxKr2ePXNdrtymtEIhg3ZobD3Jd8J4WCcsfKT
# fxxBO5NsGgA8oM7i02fYN9kgMwqTnVhSAu1wq9PXsbrnNXam+trywAWSO6CjL+rV
# ++STWNSrRoHzuotRBr7BzrTpTFyQyfwBWqUT5L4NlhgXB3Xybk+M6Zj08Yva8pjE
# w78JQKvKp54gU34AWBW0/J6+u3v+iE8l1Eywx6xueF9Q+YSUDeW9B1LDdjFJryhF
# d8j3J+vuglbdsp05D+tVErf5cqFvFDfrjTkXkZNtmx7wky45XS9ZvNazYW1KI3f9
# bg8Wjb7ZujuvxpSjycPRZzdKa8kqSgSZg7fg91Wimiy1Iqe3SZVVWNchLYiPp8Dm
# nXMfOEpVHQZ1vzeo7dVWyxu9Y1ujgvUQy8kMa9q2W2S7HQ5Sna79n7eMVJxqZQ4G
# m0ETFToOcPPOnZBWgqNOSUlSQncFuIVgNTDvycQ9dMhGorYcBDI=
# =Vh0m
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 12 Mar 2025 02:12:23 HKT
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full]
# gpg: aka "Cédric Le Goater <clg@kaod.org>" [full]
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-vfio-20250311' of https://github.com/legoater/qemu: (21 commits)
vfio/pci: Drop debug commentary from x-device-dirty-page-tracking
vfio/pci-quirks: Exclude non-ioport BAR from ATI quirk
hw/vfio: Compile display.c once
hw/vfio: Compile iommufd.c once
hw/vfio: Compile more objects once
hw/vfio: Compile some common objects once
hw/vfio/common: Get target page size using runtime helpers
hw/vfio/common: Include missing 'system/tcg.h' header
hw/vfio/spapr: Do not include <linux/kvm.h>
system: Declare qemu_[min/max]rampagesize() in 'system/hostmem.h'
vfio/migration: Use BE byte order for device state wire packets
vfio/igd: Fix broken KVMGT OpRegion support
vfio/igd: Introduce x-igd-lpc option for LPC bridge ID quirk
vfio/igd: Handle x-igd-opregion option in config quirk
vfio/igd: Decouple common quirks from legacy mode
vfio/igd: Refactor vfio_probe_igd_bar4_quirk into pci config quirk
vfio/pci: Add placeholder for device-specific config space quirks
vfio/igd: Move LPC bridge initialization to a separate function
vfio/igd: Consolidate OpRegion initialization into a single function
vfio/igd: Do not include GTT stolen size in etc/igd-bdsm-size
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>