qcow2_refresh_limits() assumes that s->crypto is non-NULL whenever
bs->encrypted is true. This is actually not the case: qcow2_do_open()
allows to open an image with a missing crypto header for BDRV_O_NO_IO,
and then bs->encrypted is true, but s->crypto is still NULL.
It doesn't make sense to open an invalid image, so remove the exception
for BDRV_O_NO_IO. This catches the problem early and any code that makes
the same assumption is safe now.
At the same time, in the name of defensive programming, we shouldn't
make the assumption in the first place. Let qcow2_refresh_limits() check
s->crypto rather than bs->encrypted. If s->crypto is NULL, it also can't
make any requirement on request alignment.
Finally, start a qcow2-encryption test case that only serves as a
regression test for this crash for now.
Reported-by: Leonid Reviakin <L.reviakin@fobos-nt.ru>
Reported-by: Denis Rastyogin <gerben@altlinux.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250318201143.70657-1-kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For block drivers that don't advertise FUA support, we already call
bdrv_co_flush(), which considers BDRV_O_NO_FLUSH. However, drivers that
do support FUA still see the FUA flag with BDRV_O_NO_FLUSH and get the
associated performance penalty that cache.no-flush=on was supposed to
avoid.
Clear FUA for write requests if BDRV_O_NO_FLUSH is set.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-3-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Until now, FUA was always emulated with a separate flush after the write
for file-posix. The overhead of processing a second request can reduce
performance significantly for a guest disk that has disabled the write
cache, especially if the host disk is already write through, too, and
the flush isn't actually doing anything.
Advertise support for REQ_FUA in write requests and implement it for
Linux AIO and io_uring using the RWF_DSYNC flag for write requests. The
thread pool still performs a separate fdatasync() call. This can be
improved later by using the pwritev2() syscall if available.
As an example, this is how fio numbers can be improved in some scenarios
with this patch (all using virtio-blk with cache=directsync on an nvme
block device for the VM, fio with ioengine=libaio,direct=1,sync=1):
| old | with FUA support
------------------------------+---------------+-------------------
bs=4k, iodepth=1, numjobs=1 | 45.6k iops | 56.1k iops
bs=4k, iodepth=1, numjobs=16 | 183.3k iops | 236.0k iops
bs=4k, iodepth=16, numjobs=1 | 258.4k iops | 311.1k iops
However, not all scenarios are clear wins. On another slower disk I saw
little to no improvment. In fact, in two corner case scenarios, I even
observed a regression, which I however consider acceptable:
1. On slow host disks in a write through cache mode, when the guest is
using virtio-blk in a separate iothread so that polling can be
enabled, and each completion is quickly followed up with a new
request (so that polling gets it), it can happen that enabling FUA
makes things slower - the additional very fast no-op flush we used to
have gave the adaptive polling algorithm a success so that it kept
polling. Without it, we only have the slow write request, which
disables polling. This is a problem in the polling algorithm that
will be fixed later in this series.
2. With a high queue depth, it can be beneficial to have flush requests
for another reason: The optimisation in bdrv_co_flush() that flushes
only once per write generation acts as a synchronisation mechanism
that lets all requests complete at the same time. This can result in
better batching and if the disk is very fast (I only saw this with a
null_blk backend), this can make up for the overhead of the flush and
improve throughput. In theory, we could optionally introduce a
similar artificial latency in the normal completion path to achieve
the same kind of completion batching. This is not implemented in this
series.
Compatibility is not a concern for the kernel side of io_uring, it has
supported RWF_DSYNC from the start. However, io_uring_prep_writev2() is
not available before liburing 2.2.
Linux AIO started supporting it in Linux 4.13 and libaio 0.3.111. The
kernel is not a problem for any supported build platform, so it's not
necessary to add runtime checks. However, openSUSE is still stuck with
an older libaio version that would break the build.
We must detect the presence of the writev2 functions in the user space
libraries at build time to avoid build failures.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250307221634.71951-2-kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block drivers assume in their .bdrv_open() implementation that their
state in bs->opaque has been zeroed; it is initially allocated with
g_malloc0() in bdrv_open_driver().
bdrv_snapshot_goto() needs to make sure that it is zeroed again before
calling drv->bdrv_open() to avoid that block drivers use stale values.
One symptom of this bug is VMDK running into a double free when the user
tries to apply an internal snapshot like 'qemu-img snapshot -a test
test.vmdk'. This should be a graceful error because VMDK doesn't support
internal snapshots.
==25507== Invalid free() / delete / delete[] / realloc()
==25507== at 0x484B347: realloc (vg_replace_malloc.c:1801)
==25507== by 0x54B592A: g_realloc (gmem.c:171)
==25507== by 0x1B221D: vmdk_add_extent (../block/vmdk.c:570)
==25507== by 0x1B1084: vmdk_open_sparse (../block/vmdk.c:1059)
==25507== by 0x1AF3D8: vmdk_open (../block/vmdk.c:1371)
==25507== by 0x1A2AE0: bdrv_snapshot_goto (../block/snapshot.c:299)
==25507== by 0x205C77: img_snapshot (../qemu-img.c:3500)
==25507== by 0x58FA087: (below main) (libc_start_call_main.h:58)
==25507== Address 0x832f3e0 is 0 bytes inside a block of size 272 free'd
==25507== at 0x4846B83: free (vg_replace_malloc.c:989)
==25507== by 0x54AEAC4: g_free (gmem.c:208)
==25507== by 0x1AF629: vmdk_close (../block/vmdk.c:2889)
==25507== by 0x1A2A9C: bdrv_snapshot_goto (../block/snapshot.c:290)
==25507== by 0x205C77: img_snapshot (../qemu-img.c:3500)
==25507== by 0x58FA087: (below main) (libc_start_call_main.h:58)
This error was discovered by fuzzing qemu-img.
Cc: qemu-stable@nongnu.org
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2853
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2851
Reported-by: Denis Rastyogin <gerben@altlinux.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250310104858.28221-1-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit fc4e394b28 removed the last caller of blk_op_is_blocked(). Remove
the now unused function.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20250206165331.379033-1-kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
- Merge "qemu/clang-tsa.h" within "qemu/compiler.h"
- Various cleanups around accelerators initialization code
(better user/system split)
- Various trivial cleanups in accel/tcg/,
Guard few TCG calls with tcg_enabled()
- Explicit disassemble_info endianness
- Improve dual-endianness support for MicroBlaze
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmfJw08ACgkQ4+MsLN6t
wN70whAAtfcdWtqseFfb6fvDtjflgxN51Ui0iaOECXUA18USKriGy34eBcMYMiM2
+eKgU7+jI6JGE4+burcgWUsPpFFF951/A8+lyIbFgO5yToTDmC+qNe4XfmMAIyXq
uf9Obr2c0Xk9luh4odb+jPAQodw/7G1fKgcCVIJNDCl/xEcPhS9eNpTaHwcVnkWI
K6KrxWXOsqG6+evJBPWYoXtOOyt0+JcwAsJoGhprwtGm3P9+jSVXsgeGsJVyZcna
f32JtjWL754O8XeMkOn4x6rt58VrCIMKI9xT7keDyuhTCq0Zki9RO2nMU2dSw5mN
AfL9hxqUy0Nijnyslg3ugujDfTePsNyLdwwH7n0mnoD72ELi6WnhDsmOThuEB3Rd
4/kdwTJfA/rlWk/GF1tbKW7AvQZokRARtzmL3V0HmGJu57lX+2JuszEdYBkqDEP7
GH1I10B2yANUm+C9y3X8qWOU7Ws433ebJeJoZuyfnbZ9Me+UfRmql/oS+V8ata2i
fArEItpldUFrWRyYLkTbXrh2dgyV9yJTEir/lzOzeAZZzyabTbjf2z9qnh976GGO
1QnDy5QA4f54kDBUZe7JK26TZsHPch7cgqXW6f8tRlJF7A9hxGK8d2TUV/lC3/vx
LUOlWNu03PhiruYmZEcWOsY3Jt9jRCF6lIryrnaJsqnVOVmMUMM=
=3TRh
-----END PGP SIGNATURE-----
Merge tag 'accel-cpus-20250306' of https://github.com/philmd/qemu into staging
Generic CPUs / accelerators patch queue
- Merge "qemu/clang-tsa.h" within "qemu/compiler.h"
- Various cleanups around accelerators initialization code
(better user/system split)
- Various trivial cleanups in accel/tcg/,
Guard few TCG calls with tcg_enabled()
- Explicit disassemble_info endianness
- Improve dual-endianness support for MicroBlaze
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmfJw08ACgkQ4+MsLN6t
# wN70whAAtfcdWtqseFfb6fvDtjflgxN51Ui0iaOECXUA18USKriGy34eBcMYMiM2
# +eKgU7+jI6JGE4+burcgWUsPpFFF951/A8+lyIbFgO5yToTDmC+qNe4XfmMAIyXq
# uf9Obr2c0Xk9luh4odb+jPAQodw/7G1fKgcCVIJNDCl/xEcPhS9eNpTaHwcVnkWI
# K6KrxWXOsqG6+evJBPWYoXtOOyt0+JcwAsJoGhprwtGm3P9+jSVXsgeGsJVyZcna
# f32JtjWL754O8XeMkOn4x6rt58VrCIMKI9xT7keDyuhTCq0Zki9RO2nMU2dSw5mN
# AfL9hxqUy0Nijnyslg3ugujDfTePsNyLdwwH7n0mnoD72ELi6WnhDsmOThuEB3Rd
# 4/kdwTJfA/rlWk/GF1tbKW7AvQZokRARtzmL3V0HmGJu57lX+2JuszEdYBkqDEP7
# GH1I10B2yANUm+C9y3X8qWOU7Ws433ebJeJoZuyfnbZ9Me+UfRmql/oS+V8ata2i
# fArEItpldUFrWRyYLkTbXrh2dgyV9yJTEir/lzOzeAZZzyabTbjf2z9qnh976GGO
# 1QnDy5QA4f54kDBUZe7JK26TZsHPch7cgqXW6f8tRlJF7A9hxGK8d2TUV/lC3/vx
# LUOlWNu03PhiruYmZEcWOsY3Jt9jRCF6lIryrnaJsqnVOVmMUMM=
# =3TRh
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 06 Mar 2025 23:46:23 HKT
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE
* tag 'accel-cpus-20250306' of https://github.com/philmd/qemu: (54 commits)
include: Poison TARGET_PHYS_ADDR_SPACE_BITS definition
system: Open-code qemu_init_arch_modules() using target_name()
target/i386: Mark WHPX APIC region as little-endian
target/alpha: Do not mix exception flags and FPCR bits
target/riscv: Convert misa_mxl_max using GLib macros
target/riscv: Declare RISCVCPUClass::misa_mxl_max as RISCVMXL
target/xtensa: Finalize config in xtensa_register_core()
target/sparc: Constify SPARCCPUClass::cpu_def
target/i386: Constify X86CPUModel uses
disas: Remove target_words_bigendian() call in initialize_debug_target()
target/xtensa: Set disassemble_info::endian value in disas_set_info()
target/sh4: Set disassemble_info::endian value in disas_set_info()
target/riscv: Set disassemble_info::endian value in disas_set_info()
target/ppc: Set disassemble_info::endian value in disas_set_info()
target/mips: Set disassemble_info::endian value in disas_set_info()
target/microblaze: Set disassemble_info::endian value in disas_set_info
target/arm: Set disassemble_info::endian value in disas_set_info()
target: Set disassemble_info::endian value for big-endian targets
target: Set disassemble_info::endian value for little-endian targets
target/mips: Fix possible MSA int overflow
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We already have "qemu/compiler.h" for compiler-specific arrangements,
automatically included by "qemu/osdep.h" for each source file. No
need to explicitly include a header for a Clang particularity.
Suggested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250117170201.91182-1-philmd@linaro.org>
This error was discovered by fuzzing qemu-img.
In the QED block driver, the need_check_timer timer is freed in
bdrv_qed_detach_aio_context, but the pointer to the timer is not
set to NULL. This can lead to a use-after-free scenario
in bdrv_qed_drain_begin().
The need_check_timer pointer is set to NULL after freeing the timer.
Which helps catch this condition when checking in bdrv_qed_drain_begin().
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2852
Signed-off-by: Denis Rastyogin <gerben@altlinux.org>
Message-ID: <20250304083927.37681-1-gerben@altlinux.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Although defaulting the handshake limit to 10 seconds was a nice QoI
change to weed out intentionally slow clients, it can interfere with
integration testing done with manual NBD_OPT commands over 'nbdsh
--opt-mode'. Expose a QMP knob 'handshake-max-secs' to allow the user
to alter the timeout away from the default.
The parameter name here intentionally matches the spelling of the
constant added in commit fb1c2aaa98, and not the command-line spelling
added in the previous patch for qemu-nbd; that's because in QMP,
longer names serve as good self-documentation, and unlike the command
line, machines don't have problems generating longer spellings.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250203222722.650694-6-eblake@redhat.com>
[eblake: s/max-secs/max-seconds/ in QMP]
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Commit 7452162adec25c10 introduced 'qom-path' argument to BLOCK_IO_ERROR
event but when the event is instantiated in 'send_qmp_error_event()' the
arguments for 'device' and 'qom_path' in
qapi_event_send_block_io_error() were reversed :
Generated code for sending event:
void qapi_event_send_block_io_error(const char *qom_path,
const char *device,
const char *node_name,
IoOperationType operation,
[...]
Call inside send_qmp_error_event():
qapi_event_send_block_io_error(blk_name(blk),
blk_get_attached_dev_path(blk),
bs ? bdrv_get_node_name(bs) : NULL, optype,
[...]
This results into reporting the QOM path as the device alias and vice
versa which in turn breaks libvirt, which expects the device alias being
either a valid alias or empty (which would make libvirt do the lookup by
node-name instead).
Cc: qemu-stable@nongnu.org
Fixes: 7452162adec2 ("qapi: add qom-path to BLOCK_IO_ERROR event")
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-ID: <09728d784888b38d7a8f09ee5e9e9c542c875e1e.1737973614.git.pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 107c551de0d7bc3aa8e926c557b66b9549616f42)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
ASAN detected a leak when running the ahci-test
/ahci/io/dma/lba28/retry:
Direct leak of 35 byte(s) in 1 object(s) allocated from:
#0 in malloc
#1 in __vasprintf_internal
#2 in vasprintf
#3 in g_vasprintf
#4 in g_strdup_vprintf
#5 in g_strdup_printf
#6 in object_get_canonical_path ../qom/object.c:2096:19
#7 in blk_get_attached_dev_id_or_path ../block/block-backend.c:1033:12
#8 in blk_get_attached_dev_path ../block/block-backend.c:1047:12
#9 in send_qmp_error_event ../block/block-backend.c:2140:36
#10 in blk_error_action ../block/block-backend.c:2172:9
#11 in ide_handle_rw_error ../hw/ide/core.c:875:5
#12 in ide_dma_cb ../hw/ide/core.c:894:13
#13 in dma_complete ../system/dma-helpers.c:107:9
#14 in dma_blk_cb ../system/dma-helpers.c:129:9
#15 in blk_aio_complete ../block/block-backend.c:1552:9
#16 in blk_aio_write_entry ../block/block-backend.c:1619:5
#17 in coroutine_trampoline ../util/coroutine-ucontext.c:175:9
Plug the leak by freeing the device path string.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241111145214.8261-1-farosas@suse.de>
[PMD: Use g_autofree]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241111170333.43833-3-philmd@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 23ea425c14d3b89a002e0127b17456eee3102ab7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The general expectation is that header files should follow the same
file/path naming scheme as the corresponding source file. There are
various historical exceptions to this practice in QEMU, with one of
the most notable being the include/qapi/qmp/ directory. Most of the
headers there correspond to source files in qobject/.
This patch corrects most of that inconsistency by creating
include/qobject/ and moving the headers for qobject/ there.
This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h:
scripts/get_maintainer.pl now reports "QAPI" instead of "No
maintainers found".
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20241118151235.2665921-2-armbru@redhat.com>
[Rebased]
BLOCK_OP_TYPE_DATAPLANE prevents BlockDriverState from being used by
virtio-blk/virtio-scsi with IOThread. Commit b112a65c52aa ("block:
declare blockjobs and dataplane friends!") eliminated the main reason
for this blocker in 2014.
Nowadays the block layer supports I/O from multiple AioContexts, so
there is even less reason to block IOThread users. Any legitimate
reasons related to interference would probably also apply to
non-IOThread users.
The only remaining users are bdrv_op_unblock(BLOCK_OP_TYPE_DATAPLANE)
calls after bdrv_op_block_all(). If we remove BLOCK_OP_TYPE_DATAPLANE
their behavior doesn't change.
Existing bdrv_op_block_all() callers that don't explicitly unblock
BLOCK_OP_TYPE_DATAPLANE seem to do so simply because no one bothered to
rather than because it is necessary to keep BLOCK_OP_TYPE_DATAPLANE
blocked.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250203182529.269066-1-stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add an option in BlockExportOptions to allow creating an export on an
inactive node without activating the node. This mode needs to be
explicitly supported by the export type (so that it doesn't perform any
operations that are forbidden for inactive nodes), so this patch alone
doesn't allow this option to be successfully used yet.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-13-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, block exports can't handle inactive images correctly.
Incoming write requests would run into assertion failures. Make sure
that we return an error when creating an export can't activate the
image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-11-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Device models have a relatively complex way to set up their block
backends, in which blk_attach_dev() sets blk->disable_perm = true.
We want to support inactive images in exports, too, so that
qemu-storage-daemon can be used with migration. Because they don't use
blk_attach_dev(), they need another way to set this flag. The most
convenient is to do this automatically when an inactive node is attached
to a BlockBackend that can be inactivated.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-10-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In order for block_resize to fail gracefully on an inactive node instead
of crashing with an assertion failure in bdrv_co_write_req_prepare()
(called from bdrv_co_truncate()), we need to check for inactive nodes
also when they are attached as a root node and make sure that
BLK_PERM_RESIZE isn't among the permissions allowed for inactive nodes.
To this effect, don't enumerate the permissions that are incompatible
with inactive nodes any more, but allow only BLK_PERM_CONSISTENT_READ
for them.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-7-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This allows querying from QMP (and also HMP) whether an image is
currently active or inactive (in the sense of BDRV_O_INACTIVE).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20250204211407.381505-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 7452162adec25c10 introduced 'qom-path' argument to BLOCK_IO_ERROR
event but when the event is instantiated in 'send_qmp_error_event()' the
arguments for 'device' and 'qom_path' in
qapi_event_send_block_io_error() were reversed :
Generated code for sending event:
void qapi_event_send_block_io_error(const char *qom_path,
const char *device,
const char *node_name,
IoOperationType operation,
[...]
Call inside send_qmp_error_event():
qapi_event_send_block_io_error(blk_name(blk),
blk_get_attached_dev_path(blk),
bs ? bdrv_get_node_name(bs) : NULL, optype,
[...]
This results into reporting the QOM path as the device alias and vice
versa which in turn breaks libvirt, which expects the device alias being
either a valid alias or empty (which would make libvirt do the lookup by
node-name instead).
Cc: qemu-stable@nongnu.org
Fixes: 7452162adec2 ("qapi: add qom-path to BLOCK_IO_ERROR event")
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Message-ID: <09728d784888b38d7a8f09ee5e9e9c542c875e1e.1737973614.git.pkrempa@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
ASAN detected a leak when running the ahci-test
/ahci/io/dma/lba28/retry:
Direct leak of 35 byte(s) in 1 object(s) allocated from:
#0 in malloc
#1 in __vasprintf_internal
#2 in vasprintf
#3 in g_vasprintf
#4 in g_strdup_vprintf
#5 in g_strdup_printf
#6 in object_get_canonical_path ../qom/object.c:2096:19
#7 in blk_get_attached_dev_id_or_path ../block/block-backend.c:1033:12
#8 in blk_get_attached_dev_path ../block/block-backend.c:1047:12
#9 in send_qmp_error_event ../block/block-backend.c:2140:36
#10 in blk_error_action ../block/block-backend.c:2172:9
#11 in ide_handle_rw_error ../hw/ide/core.c:875:5
#12 in ide_dma_cb ../hw/ide/core.c:894:13
#13 in dma_complete ../system/dma-helpers.c:107:9
#14 in dma_blk_cb ../system/dma-helpers.c:129:9
#15 in blk_aio_complete ../block/block-backend.c:1552:9
#16 in blk_aio_write_entry ../block/block-backend.c:1619:5
#17 in coroutine_trampoline ../util/coroutine-ucontext.c:175:9
Plug the leak by freeing the device path string.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241111145214.8261-1-farosas@suse.de>
[PMD: Use g_autofree]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241111170333.43833-3-philmd@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Expose the method docstring in the header, and mention
returned value must be free'd by caller.
Reported-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241111170333.43833-2-philmd@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It was found that 'qemu-nbd' is not able to work with some disk images
exported from Azure. Looking at the 512b footer (which contains VPC
metadata):
00000000 63 6f 6e 65 63 74 69 78 00 00 00 02 00 01 00 00 |conectix........|
00000010 ff ff ff ff ff ff ff ff 2e c7 9b 96 77 61 00 00 |............wa..|
00000020 00 07 00 00 57 69 32 6b 00 00 00 01 40 00 00 00 |....Wi2k....@...|
00000030 00 00 00 01 40 00 00 00 28 a2 10 3f 00 00 00 02 |....@...(..?....|
00000040 ff ff e7 47 8c 54 df 94 bd 35 71 4c 94 5f e5 44 |...G.T...5qL._.D|
00000050 44 53 92 1a 00 00 00 00 00 00 00 00 00 00 00 00 |DS..............|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
we can see that Azure uses a different 'Creator application' --
'wa\0\0' (offset 0x1c, likely reads as 'Windows Azure') and QEMU uses this
field to determine how it can get image size. Apparently, Azure uses 'new'
method, just like Hyper-V.
Overall, it seems that only VPC and old QEMUs need to be ignored as all new
creator apps seem to have reliable current_size. Invert the logic and make
'current_size' method the default to avoid adding every new creator app to
the list.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-ID: <20241212134504.1983757-3-vkuznets@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In preparation to making changes to the logic deciding whether CHS or
'current_size' need to be used in determining the image size, split off
vpc_ignore_current_size() helper.
No functional change intended.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-ID: <20241212134504.1983757-2-vkuznets@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This error was discovered by fuzzing qemu-img.
When ph.ext_off has a sufficiently large value, the operation
le64_to_cpu(ph.ext_off) << BDRV_SECTOR_BITS in
parallels_read_format_extension() can cause an overflow in int64_t.
This overflow triggers the assert(ext_off > 0)
check in block/parallels-ext.c: parallels_read_format_extension(),
leading to a crash.
This commit adds a check to prevent overflow when shifting ph.ext_off
by BDRV_SECTOR_BITS, ensuring that the value remains within a valid range.
Reported-by: Leonid Reviakin <L.reviakin@fobos-nt.ru>
Signed-off-by: Denis Rastyogin <gerben@altlinux.org>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-ID: <20241212104212.513947-2-gerben@altlinux.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
create_long_filename() intentionally uses direntry_t->name[8+3] array
as a larger array. This works, but makes static code analysis tools
unhappy. The problem here is that a directory entry holding long file
name is significantly different from regular directory entry, and the
name is split into several parts within the entry, not just in regular
8+3 name field.
Treat the entry as array of bytes instead. This fixes the OOB access
from the compiler/tools PoV, but does not change the resulting code
in any way.
Keep the existing code style.
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This reverts commit 0cb3ff7c22671aa1e1e227318799ccf6762c3bea.
The original code was right in that long name in LFN directory
entry uses other parts of the entry for the name too, not just
the original "name" field. So it is wrong to limit the offset
to be within the name field. Some other mechanism is needed
to fix the ubsan report and whole messy usage of bytes past the
given field.
Reported-by: Volker Rümelin <vr_qemu@t-online.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Found with test sbsaref introduced in [1].
[1] https://patchew.org/QEMU/20241203213629.2482806-1-pierrick.bouvier@linaro.org/
../block/vvfat.c:433:24: runtime error: index 14 out of bounds for type 'uint8_t [11]'
#0 0x56151a66b93a in create_long_filename ../block/vvfat.c:433
#1 0x56151a66f3d7 in create_short_and_long_name ../block/vvfat.c:725
#2 0x56151a670403 in read_directory ../block/vvfat.c:804
#3 0x56151a674432 in init_directories ../block/vvfat.c:964
#4 0x56151a67867b in vvfat_open ../block/vvfat.c:1258
#5 0x56151a3b8e19 in bdrv_open_driver ../block.c:1660
#6 0x56151a3bb666 in bdrv_open_common ../block.c:1985
#7 0x56151a3cadb9 in bdrv_open_inherit ../block.c:4153
#8 0x56151a3c8850 in bdrv_open_child_bs ../block.c:3731
#9 0x56151a3ca832 in bdrv_open_inherit ../block.c:4098
#10 0x56151a3cbe40 in bdrv_open ../block.c:4248
#11 0x56151a46344f in blk_new_open ../block/block-backend.c:457
#12 0x56151a388bd9 in blockdev_init ../blockdev.c:612
#13 0x56151a38ab2d in drive_new ../blockdev.c:1006
#14 0x5615190fca41 in drive_init_func ../system/vl.c:649
#15 0x56151aa796dd in qemu_opts_foreach ../util/qemu-option.c:1135
#16 0x5615190fd2b6 in configure_blockdev ../system/vl.c:708
#17 0x56151910a307 in qemu_create_early_backends ../system/vl.c:2004
#18 0x561519113fcf in qemu_init ../system/vl.c:3685
#19 0x56151a7e438e in main ../system/main.c:47
#20 0x7f72d1a46249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#21 0x7f72d1a46304 in __libc_start_main_impl ../csu/libc-start.c:360
#22 0x561517e98510 in _start (/home/user/.work/qemu/build/qemu-system-aarch64+0x3b9b510)
The offset used can easily go beyond entry->name size. It's probably a
bug, but I don't have the time to dive into vfat specifics for now.
This change solves the ubsan issue, and is functionally equivalent, as
anything written past the entry->name array would not be read anyway.
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The next commit will remove "qemu/clang-tsa.h" of "exec/exec-all.h",
however the following files indirectly include it:
$ git grep -L qemu/clang-tsa.h $(git grep -wl TSA_NO_TSA)
block/create.c
include/block/block_int-common.h
tests/unit/test-bdrv-drain.c
tests/unit/test-block-iothread.c
util/qemu-thread-posix.c
Explicitly include it so we can process with the removal in the
next commit.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20241212185341.2857-4-philmd@linaro.org>
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.
Files renamed manually then mechanical change using sed tool.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
The libssh does not handle non-blocking mode in SFTP correctly. The
driver code already changes the mode to blocking for the SFTP
initialization, but for some reason changes to non-blocking mode.
This used to work accidentally until libssh in 0.11 branch merged
the patch to avoid infinite looping in case of network errors:
https://gitlab.com/libssh/libssh-mirror/-/merge_requests/498
Since then, the ssh driver in qemu fails to read files over SFTP
as the first SFTP messages exchanged after switching the session
to non-blocking mode return SSH_AGAIN, but that message is lost
int the SFTP internals and interpretted as SSH_ERROR, which is
returned to the caller:
https://gitlab.com/libssh/libssh-mirror/-/issues/280
This is indeed an issue in libssh that we should address in the
long term, but it will require more work on the internals. For
now, the SFTP is not supported in non-blocking mode.
Fixes: https://gitlab.com/libssh/libssh-mirror/-/issues/280
Signed-off-by: Jakub Jelen <jjelen@redhat.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-ID: <20241113125526.2495731-1-rjones@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The sum "cluster_index + count" may overflow uint32_t.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Dmitry Frolov <frolov@swemel.ru>
Message-ID: <20241106080521.219255-2-frolov@swemel.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
s->offset and s->size are only set at the end of the function and still
contain the old values when formatting the error message. Print the
parameters with the new values that we actually checked instead.
Fixes: 500e2434207d ('raw-format: Split raw_read_options()')
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20240829185527.47152-1-kwolf@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We need something more reliable than "device" (which absent in modern
interfaces) and "node-name" (which may absent, and actually don't
specify the device, which is a source of error) to make a per-device
throttling for the event in the following commit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20241002151806.592469-2-vsementsov@yandex-team.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make the VDI SECTOR_SIZE define be a 64-bit constant; this matches
how we define BDRV_SECTOR_SIZE. The benefit is that it means that we
don't need to carefully cast to 64-bits when doing operations like
"n_sectors * SECTOR_SIZE" to avoid doing a 32x32->32 multiply, which
might overflow, and which Coverity and other static analysers tend to
warn about.
The specific potential overflow Coverity is highlighting is the one
at the end of vdi_co_pwritev() where we write out n_sectors sectors
to the block map. This is very unlikely to actually overflow, since
the block map has 4 bytes per block and the maximum number of blocks
in the image must fit into a 32-bit integer. So this commit is not
fixing a real-world bug.
An inspection of all the places currently using SECTOR_SIZE in the
file shows none which care about the change in its type, except for
one call to error_setg() which needs the format string adjusting.
Resolves: Coverity CID 1508076
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20241008164708.2966400-5-peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In compare_fingerprint() we effectively check whether the characters
in the fingerprint are valid hex digits twice: first we do so with
qemu_isxdigit(), but then the hex2decimal() function also has a code
path where it effectively detects an invalid digit and returns -1.
This causes Coverity to complain because it thinks that we might use
that -1 value in an expression where it would be an integer overflow.
Avoid the double-check of hex digit validity by testing the return
values from hex2decimal() rather than doing separate calls to
qemu_isxdigit().
Since this means we now use the illegal-character return value
from hex2decimal(), rewrite it from "-1" to "UINT_MAX", which
has the same effect since the return type is "unsigned" but
looks less confusing at the callsites when we detect it with
"c0 > 0xf".
Resolves: Coverity CID 1547813
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20241008164708.2966400-3-peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In the loop in qemu_gluster_parse_json() we do:
char *str = NULL;
for(...) {
str = g_strdup_printf(...);
...
if (various errors) {
goto out;
}
...
g_free(str);
str = NULL;
}
return 0;
out:
various cleanups;
g_free(str);
...
return -errno;
Coverity correctly complains that the assignment "str = NULL" at the
end of the loop is unnecessary, because we will either go back to the
top of the loop and overwrite it, or else we will exit the loop and
then exit the function without ever reading str again. The assignment
is there as defensive coding to ensure that str is only non-NULL if
it's a live allocation, so this is intentional.
We can make Coverity happier and simplify the code here by using
g_autofree, since we never need 'str' outside the loop.
Resolves: Coverity CID 1527385
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20241008164708.2966400-2-peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Parameter @id is no longer used, drop. Return a bool to indicate
success / failure, as recommended by qapi/error.h.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20241010150144.986655-4-armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
According to https://marc.info/?l=fedora-devel-list&m=171934833215726
the GlusterFS development effectively ended. Thus mark it as deprecated
in QEMU, so we can remove it in a future release if the project does
not gain momentum again.
Acked-by: Niels de Vos <ndevos@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20241002082033.129022-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
blk_by_public last use was removed in 2017 by
c61791fc23 ("block: add aio_context field in ThrottleGroupMember")
blk_activate last use was removed earlier this year by
eef0bae3a7 ("migration: Remove block migration")
blk_add_insert_bs_notifier, blk_op_block_all, blk_op_unblock_all
last uses were removed in 2016 by
ef8875b549 ("virtio-scsi: Remove op blocker for dataplane")
blk_iostatus_disable last use was removed in 2016 by
66a0fae438 ("blockjob: Don't touch BDS iostatus")
Remove them.
Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
../block/block-copy.c:591:12: error: ‘ret’ may be used uninitialized [-Werror=maybe-uninitialized]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
../block/stream.c:193:19: error: ‘unfiltered_bs’ may be used uninitialized [-Werror=maybe-uninitialized]
../block/stream.c:176:5: error: ‘len’ may be used uninitialized [-Werror=maybe-uninitialized]
trace/trace-block.h:906:9: error: ‘ret’ may be used uninitialized [-Werror=maybe-uninitialized]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
../block/mirror.c:404:5: error: ‘ret’ may be used uninitialized [-Werror=maybe-uninitialized]
../block/mirror.c:895:12: error: ‘ret’ may be used uninitialized [-Werror=maybe-uninitialized]
../block/mirror.c:578:12: error: ‘ret’ may be used uninitialized [-Werror=maybe-uninitialized]
Change a variable to int, as suggested by Manos: "bdrv_co_preadv()
which is int and is passed as an int argument to mirror_read_complete()"
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
../block/mirror.c:1066:22: error: ‘iostatus’ may be used uninitialized [-Werror=maybe-uninitialized]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>