* scsi-disk: add new quirks bitmap to SCSIDiskState
Since the MacOS SCSI implementation is quite old (and Apple added some firmware
customisations to their drives for m68k Macs) there is need to add a mechanism
to correctly handle Apple-specific quirks.
Add a new quirks bitmap to SCSIDiskState that can be used to enable these
features as required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* scsi-disk: add MODE_PAGE_APPLE_VENDOR quirk for Macintosh
One of the mechanisms MacOS uses to identify CDROM drives compatible with MacOS
is to send a custom MODE SELECT command for page 0x30 to the drive. The
response to this is a hard-coded manufacturer string which must match in order
for the CDROM to be usable within MacOS.
Add an implementation of the MODE SELECT page 0x30 response guarded by a newly
defined SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR quirk bit so that CDROM drives
attached to non-Apple machines function exactly as before.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* q800: implement compat_props to enable quirk_mode_page_apple_vendor for scsi-cd devices
By default quirk_mode_page_apple_vendor should be enabled for all scsi-cd devices
connected to the q800 machine to enable MacOS to detect and use them.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* scsi-disk: add SCSI_DISK_QUIRK_MODE_SENSE_ROM_USE_DBD quirk for Macintosh
During SCSI bus enumeration A/UX sends a MODE SENSE command to the CDROM with
the DBD bit unset and expects the response to include a block descriptor. As per
the latest SCSI documentation, QEMU currently force-disables the block
descriptor for CDROM devices but the A/UX driver expects the requested block
descriptor to be returned.
If the block descriptor is not returned in the response then A/UX becomes
confused, since the block descriptor returned in the MODE SENSE response is
used to generate a subsequent MODE SELECT command which is then invalid.
Add a new SCSI_DISK_QUIRK_MODE_SENSE_ROM_USE_DBD quirk to allow this behaviour
to be enabled as required. Note that an additional workaround is required for
the previous SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR quirk which must never
return a block descriptor even though the DBD bit is left unset.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-5-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* q800: implement compat_props to enable quirk_mode_sense_rom_use_dbd for scsi-cd devices
By default quirk_mode_sense_rom_use_dbd should be enabled for all scsi-cd devices
connected to the q800 machine to correctly report the CDROM block descriptor back
to A/UX.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* scsi-disk: add SCSI_DISK_QUIRK_MODE_PAGE_VENDOR_SPECIFIC_APPLE quirk for Macintosh
Both MacOS and A/UX make use of vendor-specific MODE SELECT commands with PF=0
to identify SCSI devices:
- MacOS sends a MODE SELECT command with PF=0 for the MODE_PAGE_VENDOR_SPECIFIC
(0x0) mode page containing 2 bytes before initialising a disk
- A/UX (installed on disk) sends a MODE SELECT command with PF=0 during SCSI
bus enumeration, and gets stuck in an infinite loop if it fails
Add a new SCSI_DISK_QUIRK_MODE_PAGE_VENDOR_SPECIFIC_APPLE quirk to allow both
PF=0 MODE SELECT commands and implement a MODE_PAGE_VENDOR_SPECIFIC (0x0)
mode page which is compatible with MacOS.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* q800: implement compat_props to enable quirk_mode_page_vendor_specific_apple for scsi devices
By default quirk_mode_page_vendor_specific_apple should be enabled for both scsi-hd
and scsi-cd devices to allow MacOS to format SCSI disk devices, and A/UX to
enumerate SCSI CDROM devices succesfully without getting stuck in a loop.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-8-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* scsi-disk: add FORMAT UNIT command
When initialising a drive ready to install MacOS, Apple HD SC Setup first attempts
to format the drive. Add a simple FORMAT UNIT command which simply returns success
to allow the format to succeed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220622105314.802852-9-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* scsi-disk: add SCSI_DISK_QUIRK_MODE_PAGE_TRUNCATED quirk for Macintosh
When A/UX configures the CDROM device it sends a truncated MODE SELECT request
for page 1 (MODE_PAGE_R_W_ERROR) which is only 6 bytes in length rather than
10. This seems to be due to bug in Apple's code which calculates the CDB message
length incorrectly.
The work at [1] suggests that this truncated request is accepted on real
hardware whereas in QEMU it generates an INVALID_PARAM_LEN sense code which
causes A/UX to get stuck in a loop retrying the command in an attempt to succeed.
Alter the mode page request length check so that truncated requests are allowed
if the SCSI_DISK_QUIRK_MODE_PAGE_TRUNCATED quirk is enabled, whilst also adding a
trace event to enable the condition to be detected.
[1] https://68kmla.org/bb/index.php?threads/scsi2sd-project-anyone-interested.29040/page-7#post-316444
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-10-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* q800: implement compat_props to enable quirk_mode_page_truncated for scsi-cd devices
By default quirk_mode_page_truncated should be enabled for all scsi-cd devices
connected to the q800 machine to allow A/UX to enumerate SCSI CDROM devices
without hanging.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-11-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* scsi-disk: allow the MODE_PAGE_R_W_ERROR AWRE bit to be changeable for CDROM drives
A/UX sends a MODE_PAGE_R_W_ERROR command with the AWRE bit set to 0 when enumerating
CDROM drives. Since the bit is currently hardcoded to 1 then indicate that the AWRE
bit can be changed (even though we don't care about the value) so that
the MODE_PAGE_R_W_ERROR page can be set successfully.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-12-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* scsi-disk: allow MODE SELECT block descriptor to set the block size
The MODE SELECT command can contain an optional block descriptor that can be used
to set the device block size. If the block descriptor is present then update the
block size on the SCSI device accordingly.
This allows CDROMs to be used with A/UX which requires a CDROM drive which is
capable of switching from a 2048 byte sector size to a 512 byte sector size.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220622105314.802852-13-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* q800: add default vendor and product information for scsi-hd devices
The Apple HD SC Setup program uses a SCSI INQUIRY command to check that any SCSI
hard disks detected match a whitelist of vendors and products before allowing
the "Initialise" button to prepare an empty disk.
Add known-good default vendor and product information using the existing
compat_prop mechanism so the user doesn't have to use long command lines to set
the qdev properties manually.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-14-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* q800: add default vendor and product information for scsi-cd devices
The MacOS CDROM driver uses a SCSI INQUIRY command to check that any SCSI CDROMs
detected match a whitelist of vendors and products before adding them to the
list of available devices.
Add known-good default vendor and product information using the existing
compat_prop mechanism so the user doesn't have to use long command lines to set
the qdev properties manually.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220622105314.802852-15-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* pc-bios/s390-ccw: add -Wno-array-bounds
The option generates a lot of warnings for integers casted to pointers,
for example:
/home/pbonzini/work/upstream/qemu/pc-bios/s390-ccw/dasd-ipl.c:174:19: warning: array subscript 0 is outside array bounds of ‘CcwSeekData[0]’ [-Warray-bounds]
174 | seekData->cyl = 0x00;
| ~~~~~~~~~~~~~~^~~~~~
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* aspeed: sbc: Allow per-machine settings
In order to correctly report secure boot running firmware the values
of certain registers must be set.
We don't yet have documentation from ASPEED on what they mean. The
meaning is inferred from u-boot's use of them.
Introduce properties so the settings can be configured per-machine.
Reviewed-by: Peter Delevoryas <pdel@fb.com>
Tested-by: Peter Delevoryas <pdel@fb.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Message-Id: <20220628154740.1117349-4-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* hw/i2c/pmbus: Add idle state to return 0xff's
Signed-off-by: Peter Delevoryas <pdel@fb.com>
Reviewed-by: Titus Rwantare <titusr@google.com>
Message-Id: <20220701000626.77395-2-me@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* hw/sensor: Add IC_DEVICE_ID to ISL voltage regulators
This commit adds a passthrough for PMBUS_IC_DEVICE_ID to allow Renesas
voltage regulators to return the integrated circuit device ID if they
would like to.
The behavior is very device specific, so it hasn't been added to the
general PMBUS model. Additionally, if the device ID hasn't been set,
then the voltage regulator will respond with the error byte value. The
guest error message will change slightly for IC_DEVICE_ID with this
commit.
Signed-off-by: Peter Delevoryas <pdel@fb.com>
Reviewed-by: Titus Rwantare <titusr@google.com>
Message-Id: <20220701000626.77395-3-me@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* hw/sensor: Add Renesas ISL69259 device model
This adds the ISL69259, using all the same functionality as the existing
ISL69260 but overriding the IC_DEVICE_ID.
Signed-off-by: Peter Delevoryas <pdel@fb.com>
Reviewed-by: Titus Rwantare <titusr@google.com>
Message-Id: <20220701000626.77395-4-me@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* aspeed: Create SRAM name from first CPU index
To support multiple SoC's running simultaneously, we need a unique name for
each RAM region. DRAM is created by the machine, but SRAM is created by the
SoC, since in hardware it is part of the SoC's internals.
We need a way to uniquely identify each SRAM region though, for VM
migration. Since each of the SoC's CPU's has an index which identifies it
uniquely from other CPU's in the machine, we can use the index of any of the
CPU's in the SoC to uniquely identify differentiate the SRAM name from other
SoC SRAM's. In this change, I just elected to use the index of the first CPU
in each SoC.
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-3-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* aspeed: Refactor UART init for multi-SoC machines
This change moves the code that connects the SoC UART's to serial_hd's
to the machine.
It makes each UART a proper child member of the SoC, and then allows the
machine to selectively initialize the chardev for each UART with a
serial_hd.
This should preserve backwards compatibility, but also allow multi-SoC
boards to completely change the wiring of serial devices from the
command line to specific SoC UART's.
This also removes the uart-default property from the SoC, since the SoC
doesn't need to know what UART is the "default" on the machine anymore.
I tested this using the images and commands from the previous
refactoring, and another test image for the ast1030:
wget https://github.com/facebook/openbmc/releases/download/v2021.49.0/fuji.mtd
wget https://github.com/facebook/openbmc/releases/download/v2021.49.0/wedge100.mtd
wget https://github.com/peterdelevoryas/OpenBIC/releases/download/oby35-cl-2022.13.01/Y35BCL.elf
Fuji uses UART1:
qemu-system-arm -machine fuji-bmc \
-drive file=fuji.mtd,format=raw,if=mtd \
-nographic
ast2600-evb uses uart-default=UART5:
qemu-system-arm -machine ast2600-evb \
-drive file=fuji.mtd,format=raw,if=mtd \
-serial null -serial mon:stdio -display none
Wedge100 uses UART3:
qemu-system-arm -machine palmetto-bmc \
-drive file=wedge100.mtd,format=raw,if=mtd \
-serial null -serial null -serial null \
-serial mon:stdio -display none
AST1030 EVB uses UART5:
qemu-system-arm -machine ast1030-evb \
-kernel Y35BCL.elf -nographic
Fixes: 6827ff20b2975 ("hw: aspeed: Init all UART's with serial devices")
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-4-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* aspeed: Make aspeed_board_init_flashes public
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-5-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* aspeed: Add fby35 skeleton
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-6-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* aspeed: Add AST2600 (BMC) to fby35
You can test booting the BMC with both '-device loader' and '-drive
file'. This is necessary because of how the fb-openbmc boot sequence
works (jump to 0x20000000 after U-Boot SPL).
wget https://github.com/facebook/openbmc/releases/download/openbmc-e2294ff5d31d/fby35.mtd
qemu-system-arm -machine fby35 -nographic \
-device loader,file=fby35.mtd,addr=0,cpu-num=0 -drive file=fby35.mtd,format=raw,if=mtd
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-7-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* aspeed: fby35: Add a bootrom for the BMC
The BMC boots from the first flash device by fetching instructions
from the flash contents. Add an alias region on 0x0 for this
purpose. There are currently performance issues with this method (TBs
being flushed too often), so as a faster alternative, install the
flash contents as a ROM in the BMC memory space.
See commit 1a15311a12fa ("hw/arm/aspeed: add a 'execute-in-place'
property to boot directly from CE0")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
[ clg: blk_pread() fixes ]
Message-Id: <20220705191400.41632-8-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* aspeed: Add AST1030 (BIC) to fby35
With the BIC, the easiest way to run everything is to create two pty's
for each SoC and reserve stdin/stdout for the monitor:
wget https://github.com/facebook/openbmc/releases/download/openbmc-e2294ff5d31d/fby35.mtd
wget https://github.com/peterdelevoryas/OpenBIC/releases/download/oby35-cl-2022.13.01/Y35BCL.elf
qemu-system-arm -machine fby35 \
-drive file=fby35.mtd,format=raw,if=mtd \
-device loader,file=fby35.mtd,addr=0,cpu-num=0 \
-serial pty -serial pty -serial mon:stdio -display none -S
screen /dev/ttys0
screen /dev/ttys1
(qemu) c
This commit only adds the the first server board's Bridge IC, but in the
future we'll try to include the other three server board Bridge IC's
too.
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220705191400.41632-9-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* docs: aspeed: Add fby35 multi-SoC machine section
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[ clg: - fixed URL links
- Moved Facebook Yosemite section at the end of the file ]
Message-Id: <20220705191400.41632-10-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* docs: aspeed: Minor updates
Some more controllers have been modeled recently. Reflect that in the
list of supported devices. New machines were also added.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-Id: <20220706172131.809255-1-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* test/avocado/machine_aspeed.py: Add SDK tests
The Aspeed SDK kernel usually includes support for the lastest HW
features. This is interesting to exercise QEMU and discover the gaps
in the models.
Add extra I2C tests for the AST2600 EVB machine to check the new
register interface.
Message-Id: <20220707091239.1029561-1-clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* hw: m25p80: Add Block Protect and Top Bottom bits for write protect
Signed-off-by: Iris Chen <irischenlj@fb.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-Id: <20220708164552.3462620-1-irischenlj@fb.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* hw: m25p80: add tests for BP and TB bit write protect
Signed-off-by: Iris Chen <irischenlj@fb.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220627185234.1911337-3-irischenlj@fb.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* qtest/aspeed_gpio: Add input pin modification test
Verify the current behavior, which is that input pins can be modified by
guest OS register writes.
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220712023219.41065-2-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* hw/gpio/aspeed: Don't let guests modify input pins
Up until now, guests could modify input pins by overwriting the data
value register. The guest OS should only be allowed to modify output pin
values, and the QOM property setter should only be permitted to modify
input pins.
This change also updates the gpio input pin test to match this
expectation.
Andrew suggested this particularly refactoring here:
https://lore.kernel.org/qemu-devel/23523aa1-ba81-412b-92cc-8174faba3612@www.fastmail.com/
Suggested-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Fixes: 4b7f956862dc ("hw/gpio: Add basic Aspeed GPIO model for AST2400 and AST2500")
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220712023219.41065-3-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* aspeed: Add fby35-bmc slot GPIO's
Signed-off-by: Peter Delevoryas <peter@pjd.dev>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20220712023219.41065-4-peter@pjd.dev>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
* hw/nvme: Implement shadow doorbell buffer support
Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3)
and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13
in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config
command, the nvme_dbbuf_config function tries to associate each existing
SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address.
Queues created after the Doorbell Buffer Config command will have the
doorbell buffers associated with them when they are initialized.
In nvme_process_sq and nvme_post_cqe, proactively check for Shadow
Doorbell buffer changes instead of wait for doorbell register changes.
This reduces the number of MMIOs.
In nvme_process_db(), update the shadow doorbell buffer value with
the doorbell register value if it is the admin queue. This is a hack
since hosts like Linux NVMe driver and SPDK do not use shadow
doorbell buffer for the admin queue. Copying the doorbell register
value to the shadow doorbell buffer allows us to support these hosts
as well as spec-compliant hosts that use shadow doorbell buffer for
the admin queue.
Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[k.jensen: rebased]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/nvme: Add trace events for shadow doorbell buffer
When shadow doorbell buffer is enabled, doorbell registers are lazily
updated. The actual queue head and tail pointers are stored in Shadow
Doorbell buffers.
Add trace events for updates on the Shadow Doorbell buffers and EventIdx
buffers. Also add trace event for the Doorbell Buffer Config command.
Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[k.jensen: rebased]
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/nvme: fix example serial in documentation
The serial prop on the controller is actually describing the nvme
subsystem serial, which has to be identical for all controllers within
the same nvme subsystem.
This is enforced since commit a859eb9f8f64 ("hw/nvme: enforce common
serial per subsystem").
Fix the documentation, so that people copying the qemu command line
example won't get an error on qemu start.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/nvme: force nvme-ns param 'shared' to false if no nvme-subsys node
Since commit 916b0f0b5264 ("hw/nvme: change nvme-ns 'shared' default")
the default value of nvme-ns param 'shared' is set to true, regardless
if there is a nvme-subsys node or not.
On a system without a nvme-subsys node, a namespace will never be able
to be attached to more than one controller, so for this configuration,
it is counterintuitive for this parameter to be set by default.
Force the nvme-ns param 'shared' to false for configurations where
there is no nvme-subsys node, as the namespace will never be able to
attach to more than one controller anyway.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* nvme: Fix misleading macro when mixed with ternary operator
Using the Parfait source code analyser and issue was found in
hw/nvme/ctrl.c where the macros NVME_CAP_SET_CMBS and NVME_CAP_SET_PMRS
are called with a ternary operatore in the second parameter, resulting
in a potentially unexpected expansion of the form:
x ? a: b & FLAG_TEST
which will result in a different result to:
(x ? a: b) & FLAG_TEST.
The macros should wrap each of the parameters in brackets to ensure the
correct result on expansion.
Signed-off-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/nvme: Use ioeventfd to handle doorbell updates
Add property "ioeventfd" which is enabled by default. When this is
enabled, updates on the doorbell registers will cause KVM to signal
an event to the QEMU main loop to handle the doorbell updates.
Therefore, instead of letting the vcpu thread run both guest VM and
IO emulation, we now use the main loop thread to do IO emulation and
thus the vcpu thread has more cycles for the guest VM.
Since ioeventfd does not tell us the exact value that is written, it is
only useful when shadow doorbell buffer is enabled, where we check
for the value in the shadow doorbell buffer when we get the doorbell
update event.
IOPS comparison on Linux 5.19-rc2: (Unit: KIOPS)
qd 1 4 16 64
qemu 35 121 176 153
ioeventfd 41 133 258 313
Changes since v3:
- Do not deregister ioeventfd when it was not enabled on a SQ/CQ
Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* MAINTAINERS: Add myself as Guest Agent co-maintainer
Signed-off-by: Konstantin Kostiuk <kkostiuk@redhat.com>
Acked-by: Michael Roth <michael.roth@amd.com>
* hw/intc/armv7m_nvic: ICPRn must not unpend an IRQ that is being held high
In the M-profile Arm ARM, rule R_CVJS defines when an interrupt should
be set to the Pending state:
A) when the input line is high and the interrupt is not Active
B) when the input line transitions from low to high and the interrupt
is Active
(Note that the first of these is an ongoing condition, and the
second is a point-in-time event.)
This can be rephrased as:
1 when the line goes from low to high, set Pending
2 when Active goes from 1 to 0, if line is high then set Pending
3 ignore attempts to clear Pending when the line is high
and Active is 0
where 1 covers both B and one of the "transition into condition A"
cases, 2 deals with the other "transition into condition A"
possibility, and 3 is "don't drop Pending if we're already in
condition A". Transitions out of condition A don't affect Pending
state.
We handle case 1 in set_irq_level(). For an interrupt (as opposed
to other kinds of exception) the only place where we clear Active
is in armv7m_nvic_complete_irq(), where we handle case 2 by
checking for whether we need to re-pend the exception. For case 3,
the only places where we clear Pending state on an interrupt are in
armv7m_nvic_acknowledge_irq() (where we are setting Active so it
doesn't count) and for writes to NVIC_ICPRn.
It is the "write to NVIC_ICPRn" case that we missed: we must ignore
this if the input line is high and the interrupt is not Active.
(This required behaviour is differently and perhaps more clearly
stated in the v7M Arm ARM, which has pseudocode in section B3.4.1
that implies it.)
Reported-by: Igor Kotrasiński <i.kotrasinsk@samsung.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220628154724.3297442-1-peter.maydell@linaro.org
* target/arm: Fill in VL for tbflags when SME enabled and SVE disabled
When PSTATE.SM, VL = SVL even if SVE is disabled.
This is visible in kselftest ssve-test.
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220713045848.217364-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* target/arm: Fix aarch64_sve_change_el for SME
We were only checking for SVE disabled and not taking into
account PSTATE.SM to check SME disabled, which resulted in
vectors being incorrectly truncated.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220713045848.217364-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* linux-user/aarch64: Do not clear PROT_MTE on mprotect
The documentation for PROT_MTE says that it cannot be cleared
by mprotect. Further, the implementation of the VM_ARCH_CLEAR bit,
contains PROT_BTI confiming that bit should be cleared.
Introduce PAGE_TARGET_STICKY to allow target/arch/cpu.h to control
which bits may be reset during page_set_flags. This is sort of the
opposite of VM_ARCH_CLEAR, but works better with qemu's PAGE_* bits
that are separate from PROT_* bits.
Reported-by: Vitaly Buka <vitalybuka@google.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220711031420.17820-1-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* target/arm: Define and use new regime_tcr_value() function
The regime_tcr() function returns a pointer to a struct TCR
corresponding to the TCR controlling a translation regime. The
struct TCR has the raw value of the register, plus two fields mask
and base_mask which are used as a small optimization in the case of
32-bit short-descriptor lookups. Almost all callers of regime_tcr()
only want the raw register value. Define and use a new
regime_tcr_value() function which returns only the raw 64-bit
register value.
This is a preliminary to removing the 32-bit short descriptor
optimization -- it only saves a handful of bit operations, which is
tiny compared to the overhead of doing a page table walk at all, and
the TCR struct is awkward and makes fixing
https://gitlab.com/qemu-project/qemu/-/issues/1103 unnecessarily
difficult.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-2-peter.maydell@linaro.org
* target/arm: Calculate mask/base_mask in get_level1_table_address()
In get_level1_table_address(), instead of using precalculated values
of mask and base_mask from the TCR struct, calculate them directly
(in the same way we currently do in vmsa_ttbcr_raw_write() to
populate the TCR struct fields).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-3-peter.maydell@linaro.org
* target/arm: Fold regime_tcr() and regime_tcr_value() together
The only caller of regime_tcr() is now regime_tcr_value(); fold the
two together, and use the shorter and more natural 'regime_tcr'
name for the new function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-4-peter.maydell@linaro.org
* target/arm: Fix big-endian host handling of VTCR
We have a bug in our handling of accesses to the AArch32 VTCR
register on big-endian hosts: we were not adjusting the part of the
uint64_t field within TCR that the generated code would access. That
can be done with offsetoflow32(), by using an ARM_CP_STATE_BOTH cpreg
struct, or by defining a full set of read/write/reset functions --
the various other TCR cpreg structs used one or another of those
strategies, but for VTCR we did not, so on a big-endian host VTCR
accesses would touch the wrong half of the register.
Use offsetoflow32() in the VTCR register struct. This works even
though the field in the CPU struct is currently a struct TCR, because
the first field in that struct is the uint64_t raw_tcr.
None of the other TCR registers have this bug -- either they are
AArch64 only, or else they define resetfn, writefn, etc, and
expect to be passed the full struct pointer.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-5-peter.maydell@linaro.org
* target/arm: Store VTCR_EL2, VSTCR_EL2 registers as uint64_t
Change the representation of the VSTCR_EL2 and VTCR_EL2 registers in
the CPU state struct from struct TCR to uint64_t.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-6-peter.maydell@linaro.org
* target/arm: Store TCR_EL* registers as uint64_t
Change the representation of the TCR_EL* registers in the CPU state
struct from struct TCR to uint64_t. This allows us to drop the
custom vmsa_ttbcr_raw_write() function, moving the "enforce RES0"
checks to their more usual location in the writefn
vmsa_ttbcr_write(). We also don't need the resetfn any more.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-7-peter.maydell@linaro.org
* target/arm: Honour VTCR_EL2 bits in Secure EL2
In regime_tcr() we return the appropriate TCR register for the
translation regime. For Secure EL2, we return the VSTCR_EL2 value,
but in this translation regime some fields that control behaviour are
in VTCR_EL2. When this code was originally written (as the comment
notes), QEMU didn't care about any of those fields, but we have since
added support for features such as LPA2 which do need the values from
those fields.
Synthesize a TCR value by merging in the relevant VTCR_EL2 fields to
the VSTCR_EL2 value.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1103
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220714132303.1287193-8-peter.maydell@linaro.org
* hw/adc: Fix CONV bit in NPCM7XX ADC CON register
The correct bit for the CONV bit in NPCM7XX ADC is bit 13. This patch
fixes that in the module, and also lower the IRQ when the guest
is done handling an interrupt event from the ADC module.
Signed-off-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Patrick Venture<venture@google.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20220714182836.89602-4-wuhaotsh@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* hw/adc: Make adci[*] R/W in NPCM7XX ADC
Our sensor test requires both reading and writing from a sensor's
QOM property. So we need to make the input of ADC module R/W instead
of write only for that to work.
Signed-off-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Titus Rwantare <titusr@google.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20220714182836.89602-5-wuhaotsh@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* target/arm: Don't set syndrome ISS for loads and stores with writeback
The architecture requires that for faults on loads and stores which
do writeback, the syndrome information does not have the ISS
instruction syndrome information (i.e. ISV is 0). We got this wrong
for the load and store instructions covered by disas_ldst_reg_imm9().
Calculate iss_valid correctly so that if the insn is a writeback one
it is false.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1057
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220715123323.1550983-1-peter.maydell@linaro.org
* Align Raspberry Pi DMA interrupts with Linux DTS
There is nothing in the specs on DMA engine interrupt lines: it should have
been in the "BCM2835 ARM Peripherals" datasheet but the appropriate
"ARM peripherals interrupt table" (p.113) is nearly empty.
All Raspberry Pi models 1-3 (based on bcm2835) have
Linux device tree (arch/arm/boot/dts/bcm2835-common.dtsi +25):
/* dma channel 11-14 share one irq */
This information is repeated in the driver code
(drivers/dma/bcm2835-dma.c +1344):
/*
* in case of channel >= 11
* use the 11th interrupt and that is shared
*/
In this patch channels 0--10 and 11--14 are handled separately.
Signed-off-by: Andrey Makarov <andrey.makarov@auriga.com>
Message-id: 20220716113210.349153-1-andrey.makarov@auriga.com
[PMM: fixed checkpatch nits]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* monitor: add support for boolean statistics
The next version of Linux will introduce boolean statistics, which
can only have 0 or 1 values. Support them in the schema and in
the HMP command.
Suggested-by: Amneesh Singh <natto@weirdnatto.in>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: add support for boolean statistics
The next version of Linux will introduce boolean statistics, which
can only have 0 or 1 values. Convert them to the new QAPI fields
added in the previous commit.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* ppc64: Allocate IRQ lines with qdev_init_gpio_in()
This replaces the IRQ array 'irq_inputs' with GPIO lines, the goal
being to remove 'irq_inputs' when all CPUs have been converted.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-2-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* ppc/40x: Allocate IRQ lines with qdev_init_gpio_in()
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-3-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* ppc/6xx: Allocate IRQ lines with qdev_init_gpio_in()
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-4-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* ppc/e500: Allocate IRQ lines with qdev_init_gpio_in()
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-5-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* ppc: Remove unused irq_inputs
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220705145814.461723-6-clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* hw/ppc: pass random seed to fdt
If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This is confirmed to successfully initialize the
RNG on Linux 5.19-rc6. The rng-seed node is part of the DT spec. Set
this on the paravirt platforms, spapr and e500, just as is done on other
architectures with paravirt hardware.
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712135114.289855-1-Jason@zx2c4.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc/kvm: Skip current and parent directories in kvmppc_find_cpu_dt
Some systems have /proc/device-tree/cpus/../clock-frequency. However,
this is not the expected path for a CPU device tree directory.
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712210810.35514-1-muriloo@linux.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Fix gen_priv_exception error value in mfspr/mtspr
The code in linux-user/ppc/cpu_loop.c expects POWERPC_EXCP_PRIV
exception with error POWERPC_EXCP_PRIV_OPC or POWERPC_EXCP_PRIV_REG,
while POWERPC_EXCP_INVAL_SPR is expected in POWERPC_EXCP_INVAL
exceptions. This mismatch caused an EXCP_DUMP with the message "Unknown
privilege violation (03)", as seen in [1].
[1] https://gitlab.com/qemu-project/qemu/-/issues/588
Fixes: 9b2fadda3e01 ("ppc: Rework generation of priv and inval interrupts")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/588
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220627141104.669152-2-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: fix exception error value in slbfee
Testing on a POWER9 DD2.3, we observed that the Linux kernel delivers a
signal with si_code ILL_PRVOPC (5) when a userspace application tries to
use slbfee. To obtain this behavior on linux-user, we should use
POWERPC_EXCP_PRIV with POWERPC_EXCP_PRIV_OPC.
No functional change is intended for softmmu targets as
gen_hvpriv_exception uses the same 'exception' argument
(POWERPC_EXCP_HV_EMU) for raise_exception_*, and the powerpc_excp_*
methods do not use lower bits of the exception error code when handling
POWERPC_EXCP_{INVAL,PRIV}.
Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220627141104.669152-3-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: remove mfdcrux and mtdcrux
The only PowerPC implementations with these insns were the 460 and 460F,
which had their definitions removed in [1].
[1] 7ff26aa6c657 ("target/ppc: Remove unused PPC 460 and 460F definitions")
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220627141104.669152-4-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: fix exception error code in helper_{load, store}_dcr
POWERPC_EXCP_INVAL should only be or-ed with other constants prefixed
with POWERPC_EXCP_INVAL_. Also, take the opportunity to move both
helpers under #if !defined(CONFIG_USER_ONLY) as the instructions that
use them are privileged.
No functional change is intended, the lower 4 bits of the error code are
ignored by all powerpc_excp_* methods on POWERPC_EXCP_INVAL exceptions.
Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220627141104.669152-5-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: fix PMU Group A register read/write exceptions
A call to "gen_(hv)priv_exception" should use POWERPC_EXCP_PRIV_* as the
'error' argument instead of POWERPC_EXCP_INVAL_*, and POWERPC_EXCP_FU is
an exception type, not an exception error code. To correctly set
FSCR[IC], we should raise Facility Unavailable with this exception type
and IC value as the error code.
Fixes: 565cb1096733 ("target/ppc: add user read/write functions for MMCR0")
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220627141104.669152-6-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: fix exception error code in spr_write_excp_vector
The 'error' argument of gen_inval_exception will be or-ed with
POWERPC_EXCP_INVAL, so it should always be a constant prefixed with
POWERPC_EXCP_INVAL_. No functional change is intended,
spr_write_excp_vector is only used by register_BookE_sprs, and
powerpc_excp_booke ignores the lower 4 bits of the error code on
POWERPC_EXCP_INVAL exceptions.
Also, take the opportunity to replace printf with qemu_log_mask.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220627141104.669152-7-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Move tlbie[l] to decode tree
Also decode RIC, PRS and R operands.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712193741.59134-2-leandro.lupori@eldorado.org.br>
[danielhb: mark bit 31 in @X_tlbie pattern as ignored]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Implement ISA 3.00 tlbie[l]
This initial version supports the invalidation of one or all
TLB entries. Flush by PID/LPID, or based in process/partition
scope is not supported, because it would make using the
generic QEMU TLB implementation hard. In these cases, all
entries are flushed.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712193741.59134-3-leandro.lupori@eldorado.org.br>
[danielhb: moved 'set' declaration to TLBIE_RIC_PWC block]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: receive DisasContext explicitly in GEN_PRIV
GEN_PRIV and related CHK_* macros just assumed that variable named
"ctx" would be in scope when they are used, and that it would be a
pointer to DisasContext. Change these macros to receive the pointer
explicitly.
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-2-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: add macros to check privilege level
Equivalent to CHK_SV and CHK_HV, but can be used in decodetree methods.
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-3-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Move slbie to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-4-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Move slbieg to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-5-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Move slbia to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-6-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Move slbmte to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-7-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Move slbmfev to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-8-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Move slbmfee to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-9-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Move slbfee to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-10-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Move slbsync to decodetree
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-11-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Implement slbiag
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-12-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: check tb_env != 0 before printing TBU/TBL/DECR
When using "-machine none", env->tb_env is not allocated, causing the
segmentation fault reported in issue #85 (launchpad bug #811683). To
avoid this problem, check if the pointer != NULL before calling the
methods to print TBU/TBL/DECR.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/85
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220714172343.80539-1-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* ppc: Check partition and process table alignment
Check if partition and process tables are properly aligned, in
their size, according to PowerISA 3.1B, Book III 6.7.6 programming
note. Hardware and KVM also raise an exception in these cases.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220628133959.15131-2-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Improve Radix xlate level validation
Check if the number and size of Radix levels are valid on
POWER9/POWER10 CPUs, according to the supported Radix Tree
Configurations described in their User Manuals.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20220628133959.15131-3-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* target/ppc: Check page dir/table base alignment
According to PowerISA 3.1B, Book III 6.7.6 programming note, the
page directory base addresses are expected to be aligned to their
size. Real hardware seems to rely on that and will access the
wrong address if they are misaligned. This results in a
translation failure even if the page tables seem to be properly
populated.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220628133959.15131-4-leandro.lupori@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* qga: treat get-guest-fsinfo as "best effort"
In some container environments, there may be references to block devices
witnessable from a container through /proc/self/mountinfo that reference
devices we simply don't have access to in the container, and cannot
provide information about.
Instead of failing the entire fsinfo command, return stub information
for these failed lookups.
This allows test-qga to pass under docker tests, which are in turn used
by the CentOS VM tests.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220708153503.18864-2-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/vm: use 'cp' instead of 'ln' for temporary vm images
If the initial setup fails, you've permanently altered the state of the
downloaded image in an unknowable way. Use 'cp' like our other test
setup scripts do.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-3-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/vm: switch CentOS 8 to CentOS 8 Stream
The old CentOS image didn't work anymore because it was already EOL at
the beginning of 2022.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-4-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/vm: switch centos.aarch64 to CentOS 8 Stream
Switch this test over to using a cloud image like the base CentOS8 VM
test, which helps make this script a bit simpler too.
Note: At time of writing, this test seems pretty flaky when run without
KVM support for aarch64. Certain unit tests like migration-test,
virtio-net-failover, test-hmp and qom-test seem quite prone to fail
under TCG. Still, this is an improvement in that at least pure build
tests are functional.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-5-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/vm: upgrade Ubuntu 18.04 VM to 20.04
18.04 has fallen out of our support window, so move ubuntu.aarch64
forward to ubuntu 20.04, which is now our oldest supported Ubuntu
release.
Notes:
This checksum changes periodically; use a fixed point image with a known
checksum so that the image isn't re-downloaded on every single
invocation. (The checksum for the 18.04 image was already incorrect at
the time of writing.)
Just like the centos.aarch64 test, this test currently seems very
flaky when run as a TCG test.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-6-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/vm: remove ubuntu.i386 VM test
Ubuntu 18.04 is out of our support window, and Ubuntu 20.04 does not
support i386 anymore. The debian project does, but they do not provide
any cloud images for it, a new expect-style script would have to be
written.
Since we have i386 cross-compiler tests hosted on GitLab CI, we don't
need to support this VM test anymore.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-7-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/vm: remove duplicate 'centos' VM test
This is listed twice by accident; we require genisoimage to run the
test, so remove the unconditional entry.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-8-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/vm: add 1GB extra memory per core
If you try to run a 16 or 32 threaded test, you're going to run out of
memory very quickly with qom-test and a few others. Bump the memory
limit to try to scale with larger-core machines.
Granted, this means that a 16 core processor is going to ask for 16GB,
but you *probably* meet that requirement if you have such a machine.
512MB per core didn't seem to be enough to avoid ENOMEM and SIGABRTs in
the test cases in practice on a six core machine; so I bumped it up to
1GB which seemed to help.
Add this magic in early to the configuration process so that the
config file, if provided, can still override it.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-9-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/vm: Remove docker cross-compile test from CentOS VM
The fedora container has since been split apart, so there's no suitable
nearby target that would support "test-mingw" as it requires both x32
and x64 support -- so either fedora-cross-win32 nor fedora-cross-win64
would be truly suitable.
Just remove this test as superfluous with our current CI infrastructure.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220708153503.18864-10-jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* qtest/machine-none: Add LoongArch support
Update the cpu_maps[] to support the LoongArch target.
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220713020258.601424-1-gaosong@loongson.cn>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/unit: Replace g_memdup() by g_memdup2()
Per https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538
The old API took the size of the memory to duplicate as a guint,
whereas most memory functions take memory sizes as a gsize. This
made it easy to accidentally pass a gsize to g_memdup(). For large
values, that would lead to a silent truncation of the size from 64
to 32 bits, and result in a heap area being returned which is
significantly smaller than what the caller expects. This can likely
be exploited in various modules to cause a heap buffer overflow.
Replace g_memdup() by the safer g_memdup2() wrapper.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210903174510.751630-24-philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* Replace 'whitelist' with 'allow'
Let's use more inclusive language here and avoid terms
that are frowned upon nowadays.
Message-Id: <20220711095300.60462-1-thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* util: Fix broken build on Haiku
A recent commit moved some Haiku-specific code parts from oslib-posix.c
to cutils.c, but failed to move the corresponding header #include
statement, too, so "make vm-build-haiku.x86_64" is currently broken.
Fix it by moving the header #include, too.
Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils")
Message-Id: <20220718172026.139004-1-thuth@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* python/qemu/qmp/legacy: Replace 'returns-whitelist' with the correct type
'returns-whitelist' has been renamed to 'command-returns-exceptions' in
commit b86df3747848 ("qapi: Rename pragma *-whitelist to *-exceptions").
Message-Id: <20220711095721.61280-1-thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* pl050: move PL050State from pl050.c to new pl050.h header file
This allows the QOM types in pl050.c to be used elsewhere by simply including
pl050.h.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-2-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: rename pl050_keyboard_init() to pl050_kbd_init()
This is for consistency with all of the other devices that use the PS2 keyboard
device.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-3-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: change PL050State dev pointer from void to PS2State
This allows the compiler to enforce that the PS2 device pointer is always of
type PS2State. Update the name of the pointer from dev to ps2dev to emphasise
this type change.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-4-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: introduce new PL050_KBD_DEVICE QOM type
This will be soon be used to hold the underlying PS2_KBD_DEVICE object.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-5-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: introduce new PL050_MOUSE_DEVICE QOM type
This will be soon be used to hold the underlying PS2_MOUSE_DEVICE object.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-6-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: move logic from pl050_realize() to pl050_init()
The logic for initialising the register memory region and the sysbus output IRQ
does not depend upon any device properties and so can be moved from
pl050_realize() to pl050_init().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-7-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: introduce PL050DeviceClass for the PL050 device
This will soon be used to store the reference to the PL050 parent device
for PL050_KBD_DEVICE and PL050_MOUSE_DEVICE.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-8-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: introduce pl050_kbd_class_init() and pl050_kbd_realize()
Introduce a new pl050_kbd_class_init() function containing a call to
device_class_set_parent_realize() which calls a new pl050_kbd_realize()
function to initialise the PS2 keyboard device.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-9-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: introduce pl050_mouse_class_init() and pl050_mouse_realize()
Introduce a new pl050_mouse_class_init() function containing a call to
device_class_set_parent_realize() which calls a new pl050_mouse_realize()
function to initialise the PS2 mouse device.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-10-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: don't use legacy ps2_kbd_init() function
Instantiate the PS2 keyboard device within PL050KbdState using
object_initialize_child() in pl050_kbd_init() and realize it in
pl050_kbd_realize() accordingly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-11-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pl050: don't use legacy ps2_mouse_init() function
Instantiate the PS2 mouse device within PL050MouseState using
object_initialize_child() in pl050_mouse_init() and realize it in
pl050_mouse_realize() accordingly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-12-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: don't use vmstate_register() in lasips2_realize()
Since lasips2 is a qdev device then vmstate_ps2_mouse can be registered using
the DeviceClass vmsd field instead.
Note that due to the use of the base parameter in the original vmstate_register()
function call, this is actually a migration break for the HPPA B160L machine.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-13-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: remove the qdev base property and the lasips2_properties array
The base property was only needed for use by vmstate_register() in order to
preserve migration compatibility. Now that the lasips2 migration state is
registered through the DeviceClass vmsd field, the base property and also
the lasips2_properties array can be removed completely as they are no longer
required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-14-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: remove legacy lasips2_initfn() function
There is only one user of the legacy lasips2_initfn() function which is in
machine_hppa_init(), so inline its functionality into machine_hppa_init() and
then remove it.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-15-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: change LASIPS2State dev pointer from void to PS2State
This allows the compiler to enforce that the PS2 device pointer is always of
type PS2State. Update the name of the pointer from dev to ps2dev to emphasise
this type change.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-16-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: QOMify LASIPS2Port
This becomes an abstract QOM type which will be a parent type for separate
keyboard and mouse port types.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-17-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: introduce new LASIPS2_KBD_PORT QOM type
This will be soon be used to hold the underlying PS2_KBD_DEVICE object.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-18-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: introduce new LASIPS2_MOUSE_PORT QOM type
This will be soon be used to hold the underlying PS2_MOUSE_DEVICE object.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-19-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: move keyboard port initialisation to new lasips2_kbd_port_init() function
Move the initialisation of the keyboard port from lasips2_init() to
a new lasips2_kbd_port_init() function which will be invoked using
object_initialize_child() during the LASIPS2 device init.
Update LASIPS2State so that it now holds the new LASIPS2KbdPort child object and
ensure that it is realised in lasips2_realize().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-20-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: move mouse port initialisation to new lasips2_mouse_port_init() function
Move the initialisation of the mouse port from lasips2_init() to
a new lasips2_mouse_port_init() function which will be invoked using
object_initialize_child() during the LASIPS2 device init.
Update LASIPS2State so that it now holds the new LASIPS2MousePort child object and
ensure that it is realised in lasips2_realize().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-21-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: introduce lasips2_kbd_port_class_init() and lasips2_kbd_port_realize()
Introduce a new lasips2_kbd_port_class_init() function which uses a new
lasips2_kbd_port_realize() function to initialise the PS2 keyboard device.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-22-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: introduce lasips2_mouse_port_class_init() and lasips2_mouse_port_realize()
Introduce a new lasips2_mouse_port_class_init() function which uses a new
lasips2_mouse_port_realize() function to initialise the PS2 mouse device.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-23-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: rename LASIPS2Port irq field to birq
The existing boolean irq field in LASIPS2Port will soon be replaced by a proper
qemu_irq, so rename the field to birq to allow the upcoming qemu_irq to use the
irq name.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-24-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: introduce port IRQ and new lasips2_port_init() function
Introduce a new lasips2_port_init() QOM init function for the LASIPS2_PORT type
and use it to initialise a new gpio for use as a port IRQ. Add a new qemu_irq
representing the gpio as a new irq field within LASIPS2Port.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-25-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: introduce LASIPS2PortDeviceClass for the LASIPS2_PORT device
This will soon be used to store the reference to the LASIPS2_PORT parent device
for LASIPS2_KBD_PORT and LASIPS2_MOUSE_PORT.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-26-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: add named input gpio to port for downstream PS2 device IRQ
The named input gpio is to be connected to the IRQ output of the downstream
PS2 device and used to drive the port IRQ. Initialise the named input gpio
in lasips2_port_init() and add new lasips2_port_class_init() and
lasips2_port_realize() functions to connect the PS2 device output gpio to
the new named input gpio.
Note that the reference to lasips2_port_realize() is stored in
LASIPS2PortDeviceClass but not yet used.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-27-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: add named input gpio to handle incoming port IRQs
The LASIPS2 device named input gpio is soon to be connected to the port output
IRQs. Add a new int_status field to LASIPS2State which is a bitmap representing
the port input IRQ status which will be enabled in the next patch.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20220712215251.7944-28-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: switch to using port-based IRQs
Now we can implement port-based IRQs by wiring the PS2 device IRQs to the
LASI2Port named input gpios rather than directly to the LASIPS2 device, and
generate the LASIPS2 output IRQ from the int_status bitmap representing the
individual port IRQs instead of the birq boolean.
This enables us to remove the separate PS2 keyboard and PS2 mouse named input
gpios from the LASIPS2 device and simplify the register implementation to
drive the port IRQ using qemu_set_irq() rather than accessing the LASIPS2
device IRQs directly. As a consequence the IRQ level logic in lasips2_set_irq()
can also be simplified accordingly.
For now this patch ignores adding the int_status bitmap and simply drops the
birq boolean from the vmstate_lasips2 VMStateDescription. This is because the
migration stream is already missing some required LASIPS2 fields, and as this
series already introduces a migration break for the lasips2 device it is
easiest to fix this in a follow-up patch.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20220712215251.7944-29-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: rename LASIPS2Port parent pointer to lasips2
This makes it clearer that the pointer is a reference to the LASIPS2 container
device rather than an implied part of the QOM hierarchy.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-30-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: standardise on lp name for LASIPS2Port variables
This is shorter to type and keeps the naming convention consistent within the
LASIPS2 device.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-31-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: switch register memory region to DEVICE_BIG_ENDIAN
The LASI device (and so also the LASIPS2 device) are only used for the HPPA
B160L machine which is a big endian architecture.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-32-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: don't use legacy ps2_kbd_init() function
Instantiate the PS2 keyboard device within LASIPS2KbdPort using
object_initialize_child() in lasips2_kbd_port_init() and realize it in
lasips2_kbd_port_realize() accordingly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-33-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: don't use legacy ps2_mouse_init() function
Instantiate the PS2 mouse device within LASIPS2MousePort using
object_initialize_child() in lasips2_mouse_port_init() and realize it in
lasips2_mouse_port_realize() accordingly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-34-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* lasips2: update VMStateDescription for LASIPS2 device
Since this series has already introduced a migration break for the HPPA B160L
machine, we can use this opportunity to improve the VMStateDescription for
the LASIPS2 device.
Add the new int_status field to the VMStateDescription and remodel the ports
as separate VMSTATE_STRUCT instances representing each LASIPS2Port. Once this
is done, the migration stream can be updated to include buf and loopback_rbne
for each port (which is necessary since the values are accessed across separate
IO accesses), and drop the port id as this is hardcoded for each port type.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20220712215251.7944-35-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pckbd: introduce new vmstate_kbd_mmio VMStateDescription for the I8042_MMIO device
This enables us to register the VMStateDescription using the DeviceClass vmsd
property rather than having to call vmstate_register() from i8042_mmio_realize().
Note that this is a migration break for the MIPS magnum machine which is the only
user of the I8042_MMIO device.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-36-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pckbd: don't use legacy ps2_kbd_init() function
Instantiate the PS2 keyboard device within KBDState using
object_initialize_child() in i8042_initfn() and i8042_mmio_init() and realize
it in i8042_realizefn() and i8042_mmio_realize() accordingly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-37-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* ps2: remove unused legacy ps2_kbd_init() function
Now that the legacy ps2_kbd_init() function is no longer used, it can be completely
removed along with its associated trace-event.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-38-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pckbd: don't use legacy ps2_mouse_init() function
Instantiate the PS2 mouse device within KBDState using
object_initialize_child() in i8042_initfn() and i8042_mmio_init() and realize
it in i8042_realizefn() and i8042_mmio_realize() accordingly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-39-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* ps2: remove unused legacy ps2_mouse_init() function
Now that the legacy ps2_mouse_init() function is no longer used, it can be completely
removed along with its associated trace-event.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-40-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* pckbd: remove legacy i8042_mm_init() function
This legacy function is only used during the initialisation of the MIPS magnum
machine, so inline its functionality directly into mips_jazz_init() and then
remove it.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Helge Deller <deller@gmx.de>
Acked-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220712215251.7944-41-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* util: Fix broken build on Haiku
A recent commit moved some Haiku-specific code parts from oslib-posix.c
to cutils.c, but failed to move the corresponding header #include
statement, too, so "make vm-build-haiku.x86_64" is currently broken.
Fix it by moving the header #include, too.
Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220718172026.139004-1-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/s390x: fix handling of zeroes in vfmin/vfmax
vfmin_res() / vfmax_res() are trying to check whether a and b are both
zeroes, but in reality they check that they are the same kind of zero.
This causes incorrect results when comparing positive and negative
zeroes.
Fixes: da4807527f3b ("s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)")
Co-developed-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220713182612.3780050-2-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* target/s390x: fix NaN propagation rules
s390x has the same NaN propagation rules as ARM, and not as x86.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220713182612.3780050-3-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* tests/tcg/s390x: test signed vfmin/vfmax
Add a test to prevent regressions. Try all floating point value sizes
and all combinations of floating point value classes. Verify the results
against PoP tables, which are represented as close to the original as
possible - this produces a lot of checkpatch complaints, but it seems
to be justified in this case.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220713182612.3780050-4-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
* dbus-display: fix test race when initializing p2p connection
The D-Bus connection starts processing messages before QEMU has the time
to set the object manager server. This is causing dbus-display-test to
fail randomly with:
ERROR:../tests/qtest/dbus-display-test.c:68:test_dbus_display_vm:
assertion failed
(qemu_dbus_display1_vm_get_name(QEMU_DBUS_DISPLAY1_VM(vm)) ==
"dbus-test"): (NULL == "dbus-test") ERROR
Use the delayed message processing flag and method to avoid that
situation.
(the bus connection doesn't need a fix, as the initialization is done
synchronously)
Reported-by: Robinson, Cole <crobinso@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Cole Robinson <crobinso@redhat.com>
Message-Id: <20220609152647.870373-1-marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
* microvm: turn off io reservations for pcie root ports
The pcie host bridge has no io window on microvm,
so io reservations will not work.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220701091516.43489-1-kraxel@redhat.com>
* usb/hcd-xhci: check slotid in xhci_wakeup_endpoint()
This prevents an OOB read (followed by an assertion failure in
xhci_kick_ep) when slotid > xhci->numslots.
Reported-by: Soul Chen <soulchen8650@gmail.com>
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Message-Id: <20220705174734.2348829-1-mcascell@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
* usb: document guest-reset and guest-reset-all
Suggested-by: Michal Prívozník <mprivozn@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <20220711094437.3995927-2-kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
* usb: document pcap (aka usb traffic capture)
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220711094437.3995927-3-kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
* gtk: Add show_tabs=on|off command line option.
The patch adds "show_tabs" command line option for GTK ui similar to
"grab_on_hover". This option allows tabbed view mode to not have to be
enabled by hand at each start of the VM.
Signed-off-by: Felix "xq" Queißner <xq@random-projects.net>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220712133753.18937-1-xq@random-projects.net>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
* tests/docker/dockerfiles: Add debian-loongarch-cross.docker
Use the pre-packaged toolchain provided by Loongson via github.
Tested-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220704070824.965429-1-richard.henderson@linaro.org>
* target/loongarch: Fix loongarch_cpu_class_by_name
The cpu_model argument may already have the '-loongarch-cpu' suffix,
e.g. when using the default for the LS7A1000 machine. If that fails,
try again with the suffix. Validate that the object created by the
function is derived from the proper base class.
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715060740.1500628-2-yangxiaojuan@loongson.cn>
[rth: Try without and then with the suffix, to avoid testsuite breakage.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* hw/intc/loongarch_pch_pic: Fix bugs for update_irq function
Fix such errors:
1. We should not use 'unsigned long' type as argument when we use
find_first_bit(), and we use ctz64() to replace find_first_bit()
to fix this bug.
2. It is not standard to use '1ULL << irq' to generate a irq mask.
So, we replace it with 'MAKE_64BIT_MASK(irq, 1)'.
Fix coverity CID: 1489761 1489764 1489765
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220715060740.1500628-3-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* target/loongarch/cpu: Fix coverity errors about excp_names
Fix out-of-bounds errors when access excp_names[] array. the valid
boundary size of excp_names should be 0 to ARRAY_SIZE(excp_names)-1.
However, the general code do not consider the max boundary.
Fix coverity CID: 1489758
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715060740.1500628-4-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* target/loongarch/tlb_helper: Fix coverity integer overflow error
Replace '1 << shift' with 'MAKE_64BIT_MASK(shift, 1)' to fix
unintentional integer overflow errors in tlb_helper file.
Fix coverity CID: 1489759 1489762
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715060740.1500628-5-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* target/loongarch/op_helper: Fix coverity cond_at_most error
The boundary size of cpucfg array should be 0 to ARRAY_SIZE(cpucfg)-1.
So, using index bigger than max boundary to access cpucfg[] must be
forbidden.
Fix coverity CID: 1489760
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715060740.1500628-6-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* target/loongarch/cpu: Fix cpucfg default value
We should config cpucfg[20] to set value for the scache's ways, sets,
and size arguments when loongarch cpu init. However, the old code
wirte 'sets argument' twice, so we change one of them to 'size argument'.
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220715064829.1521482-1-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* fpu/softfloat: Add LoongArch specializations for pickNaN*
The muladd (inf,zero,nan) case sets InvalidOp and returns the
input value 'c', and prefer sNaN over qNaN, in c,a,b order.
Binary operations prefer sNaN over qNaN and a,b order.
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-3-gaosong@loongson.cn>
[rth: Add specialization for pickNaN]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* target/loongarch: Fix float_convd/float_convs test failing
We should result zero when exception is invalid and operation is nan
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-4-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tests/tcg/loongarch64: Add float reference files
Generated on Loongson-3A5000 (CPU revision 0x0014c011).
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220104132022.2146857-1-f4bug@amsat.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-2-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tests/tcg/loongarch64: Add clo related instructions test
This includes:
- CL{O/Z}.{W/D}
- CT{O/Z}.{W/D}
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-5-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tests/tcg/loongarch64: Add div and mod related instructions test
This includes:
- DIV.{W[U]/D[U]}
- MOD.{W[U]/D[U]}
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-6-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tests/tcg/loongarch64: Add fclass test
This includes:
- FCLASS.{S/D}
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-7-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tests/tcg/loongarch64: Add fp comparison instructions test
Choose some instructions to test:
- FCMP.cond.S
- cond: ceq clt cle cne seq slt sle sne
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-8-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tests/tcg/loongarch64: Add pcadd related instructions test
This includes:
- PCADDI
- PCADDU12I
- PCADDU18I
- PCALAU12I
Signed-off-by: Song Gao <gaosong@loongson.cn>
Message-Id: <20220716085426.3098060-9-gaosong@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* hw/loongarch: Add fw_cfg table support
Add fw_cfg table for loongarch virt machine, including memmap table.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-2-yangxiaojuan@loongson.cn>
[rth: Replace fprintf with assert; drop unused return value;
initialize reserved slot to zero.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* hw/loongarch: Add uefi bios loading support
Add uefi bios loading support, now only uefi bios is porting to
loongarch virt machine.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-3-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* hw/loongarch: Add linux kernel booting support
There are two situations to start system by kernel file. If exists bios
option, system will boot from loaded bios file, else system will boot
from hardcoded auxcode, and jump to kernel elf entry.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-4-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* hw/loongarch: Add smbios support
Add smbios support for loongarch virt machine, and put them into fw_cfg
table so that bios can parse them quickly. The weblink of smbios spec:
https://www.dmtf.org/dsp/DSP0134, the version is 3.6.0.
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-5-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* hw/loongarch: Add acpi ged support
Loongarch virt machine uses general hardware reduces acpi method, rather
than LS7A acpi device. Now only power management function is used in
acpi ged device, memory hotplug will be added later. Also acpi tables
such as RSDP/RSDT/FADT etc.
The acpi table has submited to acpi spec, and will release soon.
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-6-yangxiaojuan@loongson.cn>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* hw/loongarch: Add fdt support
Add LoongArch flatted device tree, adding cpu device node, firmware cfg node,
pcie node into it, and create fdt rom memory region. Now fdt info is not
full since only uefi bios uses fdt, linux kernel does not use fdt.
Loongarch Linux kernel uses acpi table which is full in qemu virt
machine.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Message-Id: <20220712083206.4187715-7-yangxiaojuan@loongson.cn>
[rth: Set TARGET_NEED_FDT, add fdt to meson.build]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* Hexagon (target/hexagon) fix store w/mem_noshuf & predicated load
Call the CHECK_NOSHUF macro multiple times: once in the
fGEN_TCG_PRED_LOAD() and again in fLOAD().
Before this commit, a packet with a store and a predicated
load with mem_noshuf that gets encoded like this:
{ P0 = cmp.eq(R17,#0x0)
memw(R18+#0x0) = R2
if (!P0.new) R3 = memw(R17+#0x4) }
... would end up generating a branch over both the load
and the store like so:
...
brcond_i32 loc17,$0x0,eq,$L1
mov_i32 loc18,store_addr_1
qemu_st_i32 store_val32_1,store_addr_1,leul,0
qemu_ld_i32 loc16,loc7,leul,0
set_label $L1
...
Test cases added to tests/tcg/hexagon/mem_noshuf.c
Co-authored-by: Taylor Simpson <tsimpson@quicinc.com>
Signed-off-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220707210546.15985-2-tsimpson@quicinc.com>
* Hexagon (target/hexagon) fix bug in mem_noshuf load exception
The semantics of a mem_noshuf packet are that the store effectively
happens before the load. However, in cases where the load raises an
exception, we cannot simply execute the store first.
This change adds a probe to check that the load will not raise an
exception before executing the store.
If the load is predicated, this requires special handling. We check
the condition before performing the probe. Since, we need the EA to
perform the check, we move the GET_EA portion inside CHECK_NOSHUF_PRED.
Test case added in tests/tcg/hexagon/mem_noshuf_exception.c
Suggested-by: Alessandro Di Federico <ale@rev.ng>
Suggested-by: Anton Johansson <anjo@rev.ng>
Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220707210546.15985-3-tsimpson@quicinc.com>
* vhost: move descriptor translation to vhost_svq_vring_write_descs
It's done for both in and out descriptors so it's better placed here.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* virtio-net: Expose MAC_TABLE_ENTRIES
vhost-vdpa control virtqueue needs to know the maximum entries supported
by the virtio-net device, so we know if it is possible to apply the
filter.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* virtio-net: Expose ctrl virtqueue logic
This allows external vhost-net devices to modify the state of the
VirtIO device model once the vhost-vdpa device has acknowledged the
control commands.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vdpa: Avoid compiler to squash reads to used idx
In the next patch we will allow busypolling of this value. The compiler
have a running path where shadow_used_idx, last_used_idx, and vring used
idx are not modified within the same thread busypolling.
This was not an issue before since we always cleared device event
notifier before checking it, and that could act as memory barrier.
However, the busypoll needs something similar to kernel READ_ONCE.
Let's add it here, sepparated from the polling.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: Reorder vhost_svq_kick
Future code needs to call it from vhost_svq_add.
No functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: Move vhost_svq_kick call to vhost_svq_add
The series needs to expose vhost_svq_add with full functionality,
including kick
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: Check for queue full at vhost_svq_add
The series need to expose vhost_svq_add with full functionality,
including checking for full queue.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: Decouple vhost_svq_add from VirtQueueElement
VirtQueueElement comes from the guest, but we're heading SVQ to be able
to modify the element presented to the device without the guest's
knowledge.
To do so, make SVQ accept sg buffers directly, instead of using
VirtQueueElement.
Add vhost_svq_add_element to maintain element convenience.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: Add SVQDescState
This will allow SVQ to add context to the different queue elements.
This patch only store the actual element, no functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: Track number of descs in SVQDescState
A guest's buffer continuos on GPA may need multiple descriptors on
qemu's VA, so SVQ should track its length sepparatedly.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: add vhost_svq_push_elem
This function allows external SVQ users to return guest's available
buffers.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: Expose vhost_svq_add
This allows external parts of SVQ to forward custom buffers to the
device.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: add vhost_svq_poll
It allows the Shadow Control VirtQueue to wait for the device to use the
available buffers.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost: Add svq avail_handler callback
This allows external handlers to be aware of new buffers that the guest
places in the virtqueue.
When this callback is defined the ownership of the guest's virtqueue
element is transferred to the callback. This means that if the user
wants to forward the descriptor it needs to manually inject it. The
callback is also free to process the command by itself and use the
element with svq_push.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vdpa: Export vhost_vdpa_dma_map and unmap calls
Shadow CVQ will copy buffers on qemu VA, so we avoid TOCTOU attacks from
the guest that could set a different state in qemu device model and vdpa
device.
To do so, it needs to be able to map these new buffers to the device.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vhost-net-vdpa: add stubs for when no virtio-net device is present
net/vhost-vdpa.c will need functions that are declared in
vhost-shadow-virtqueue.c, that needs functions of virtio-net.c.
Copy the vhost-vdpa-stub.c code so
only the constructor net_init_vhost_vdpa needs to be defined.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vdpa: manual forward CVQ buffers
Do a simple forwarding of CVQ buffers, the same work SVQ could do but
through callbacks. No functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vdpa: Buffer CVQ support on shadow virtqueue
Introduce the control virtqueue support for vDPA shadow virtqueue. This
is needed for advanced networking features like rx filtering.
Virtio-net control VQ copies the descriptors to qemu's VA, so we avoid
TOCTOU with the guest's or device's memory every time there is a device
model change. Otherwise, the guest could change the memory content in
the time between qemu and the device read it.
To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is
implemented. If the virtio-net driver changes MAC the virtio-net device
model will be updated with the new one, and a rx filtering change event
will be raised.
More cvq commands could be added here straightforwardly but they have
not been tested.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
To know the device features is needed for CVQ SVQ, so SVQ knows if it
can handle all commands or not. Extract from
vhost_vdpa_get_max_queue_pairs so we can reuse it.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vdpa: Add device migration blocker
Since the vhost-vdpa device is exposing _F_LOG, adding a migration blocker if
it uses CVQ.
However, qemu is able to migrate simple devices with no CVQ as long as
they use SVQ. To allow it, add a placeholder error to vhost_vdpa, and
only add to vhost_dev when used. vhost_dev machinery place the migration
blocker if needed.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* vdpa: Add x-svq to NetdevVhostVDPAOptions
Finally offering the possibility to enable SVQ from the command line.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH
If the checkpoint occurs when the guest finishes restarting
but has not started running, the runstate_set() may reject
the transition from COLO to PRELAUNCH with the crash log:
{"timestamp": {"seconds": 1593484591, "microseconds": 26605},\
"event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
qemu-system-x86_64: invalid runstate transition: 'colo' -> 'prelaunch'
Long-term testing says that it's pretty safe.
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* net/colo: Fix a "double free" crash to clear the conn_list
We notice the QEMU may crash when the guest has too many
incoming network connections with the following log:
15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
free(): invalid pointer
[1] 15195 abort (core dumped) qemu-system-x86_64 ....
This is because we create the s->connection_track_table with
g_hash_table_new_full() which is defined as:
GHashTable * g_hash_table_new_full (GHashFunc hash_func,
GEqualFunc key_equal_func,
GDestroyNotify key_destroy_func,
GDestroyNotify value_destroy_func);
The fourth parameter connection_destroy() will be called to free the
memory allocated for all 'Connection' values in the hashtable when
we call g_hash_table_remove_all() in the connection_hashtable_reset().
But both connection_track_table and conn_list reference to the same
conn instance. It will trigger double free in conn_list clear. So this
patch remove free action on hash table side to avoid double free the
conn.
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* net/colo.c: No need to track conn_list for filter-rewriter
Filter-rewriter no need to track connection in conn_list.
This patch fix the glib g_queue_is_empty assertion when COLO guest
keep a lot of network connection.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* net/colo.c: fix segmentation fault when packet is not parsed correctly
When COLO use only one vnet_hdr_support parameter between
filter-redirector and filter-mirror(or colo-compare), COLO will crash
with segmentation fault. Back track as follow:
Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
296 uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
(gdb) bt
0 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
1 0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
net/colo.c:49
2 0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
net/filter-rewriter.c:63
So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
raise error and add trace-events to track vnet_hdr_len.
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
* accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping
Add a non-required argument 'CPUState' to kvm_dirty_ring_reap so
that it can cover single vcpu dirty-ring-reaping scenario.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <c32001242875e83b0d9f78f396fe2dcd380ba9e8.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* cpus: Introduce cpu_list_generation_id
Introduce cpu_list_generation_id to track cpu list generation so
that cpu hotplug/unplug can be detected during measurement of
dirty page rate.
cpu_list_generation_id could be used to detect changes of cpu
list, which is prepared for dirty page rate measurement.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <06e1f1362b2501a471dce796abb065b04f320fa5.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/dirtyrate: Refactor dirty page rate calculation
abstract out dirty log change logic into function
global_dirty_log_change.
abstract out dirty page rate calculation logic via
dirty-ring into function vcpu_calculate_dirtyrate.
abstract out mathematical dirty page rate calculation
into do_calculate_dirtyrate, decouple it from DirtyStat.
rename set_sample_page_period to dirty_stat_wait, which
is well-understood and will be reused in dirtylimit.
handle cpu hotplug/unplug scenario during measurement of
dirty page rate.
export util functions outside migration.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <7b6f6f4748d5b3d017b31a0429e630229ae97538.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically
Introduce the third method GLOBAL_DIRTY_LIMIT of dirty
tracking for calculate dirtyrate periodly for dirty page
rate limit.
Add dirtylimit.c to implement dirtyrate calculation periodly,
which will be used for dirty page rate limit.
Add dirtylimit.h to export util functions for dirty page rate
limit implementation.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <5d0d641bffcb9b1c4cc3e323b6dfecb36050d948.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function
Introduce kvm_dirty_ring_size util function to help calculate
dirty ring ful time.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <f9ce1f550bfc0e3a1f711e17b1dbc8f701700e56.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* softmmu/dirtylimit: Implement virtual CPU throttle
Setup a negative feedback system when vCPU thread
handling KVM_EXIT_DIRTY_RING_FULL exit by introducing
throttle_us_per_full field in struct CPUState. Sleep
throttle_us_per_full microseconds to throttle vCPU
if dirtylimit is in service.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <977e808e03a1cef5151cae75984658b6821be618.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* softmmu/dirtylimit: Implement dirty page rate limit
Implement dirtyrate calculation periodically basing on
dirty-ring and throttle virtual CPU until it reachs the quota
dirty page rate given by user.
Introduce qmp commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit", "query-vcpu-dirty-limit"
to enable, disable, query dirty page limit for virtual CPU.
Meanwhile, introduce corresponding hmp commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit",
"info vcpu_dirty_limit" so the feature can be more usable.
"query-vcpu-dirty-limit" success depends on enabling dirty
page rate limit, so just add it to the list of skipped
command to ensure qmp-cmd-test run successfully.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <4143f26706d413dd29db0b672fe58b3d3fbe34bc.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* tests: Add dirty page rate limit test
Add dirty page rate limit test if kernel support dirty ring,
The following qmp commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <eed5b847a6ef0a9c02a36383dbdd7db367dd1e7e.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Copy pages before compressing them with zlib
zlib_send_prepare() compresses pages of a running VM. zlib does not
make any thread-safety guarantees with respect to changing deflate()
input concurrently with deflate() [1].
One can observe problems due to this with the IBM zEnterprise Data
Compression accelerator capable zlib [2]. When the hardware
acceleration is enabled, migration/multifd/tcp/plain/zlib test fails
intermittently [3] due to sliding window corruption. The accelerator's
architecture explicitly discourages concurrent accesses [4]:
Page 26-57, "Other Conditions":
As observed by this CPU, other CPUs, and channel
programs, references to the parameter block, first,
second, and third operands may be multiple-access
references, accesses to these storage locations are
not necessarily block-concurrent, and the sequence
of these accesses or references is undefined.
Mark Adler pointed out that vanilla zlib performs double fetches under
certain circumstances as well [5], therefore we need to copy data
before passing it to deflate().
[1] https://zlib.net/manual.html
[2] https://github.com/madler/zlib/pull/410
[3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
[4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
[5] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00889.html
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20220705203559.2960949-1-iii@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Add postcopy-preempt capability
Firstly, postcopy already preempts precopy due to the fact that we do
unqueue_page() first before looking into dirty bits.
However that's not enough, e.g., when there're host huge page enabled, when
sending a precopy huge page, a postcopy request needs to wait until the whole
huge page that is sending to finish. That could introduce quite some delay,
the bigger the huge page is the larger delay it'll bring.
This patch adds a new capability to allow postcopy requests to preempt existing
precopy page during sending a huge page, so that postcopy requests can be
serviced even faster.
Meanwhile to send it even faster, bypass the precopy stream by providing a
standalone postcopy socket for sending requested pages.
Since the new behavior will not be compatible with the old behavior, this will
not be the default, it's enabled only when the new capability is set on both
src/dst QEMUs.
This patch only adds the capability itself, the logic will be added in follow
up patches.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185342.26794-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Postcopy preemption preparation on channel creation
Create a new socket for postcopy to be prepared to send postcopy requested
pages via this specific channel, so as to not get blocked by precopy pages.
A new thread is also created on dest qemu to receive data from this new channel
based on the ram_load_postcopy() routine.
The ram_load_postcopy(POSTCOPY) branch and the thread has not started to
function, and that'll be done in follow up patches.
Cleanup the new sockets on both src/dst QEMUs, meanwhile look after the new
thread too to make sure it'll be recycled properly.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185502.27149-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: With Peter's fix to quieten compiler warning on
start_migration
* migration: Postcopy preemption enablement
This patch enables postcopy-preempt feature.
It contains two major changes to the migration logic:
(1) Postcopy requests are now sent via a different socket from precopy
background migration stream, so as to be isolated from very high page
request delays.
(2) For huge page enabled hosts: when there's postcopy requests, they can now
intercept a partial sending of huge host pages on src QEMU.
After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.
There's no strict rule of which channel to use, e.g., if a requested page is
already being transferred on precopy channel, then we will keep using the same
precopy channel to transfer the page even if it's explicitly requested. In 99%
of the cases we'll prioritize the channels so we send requested page via the
postcopy channel as long as possible.
On the source QEMU, when we found a postcopy request, we'll interrupt the
PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
After we serviced all the high priority postcopy pages, we'll switch back to
PRECOPY channel so that we'll continue to send the interrupted huge page again.
There's no new thread introduced on src QEMU.
On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).
This patch has a side effect: after sending postcopy pages, previously we'll
assume the guest will access follow up pages so we'll keep sending from there.
Now it's changed. Instead of going on with a postcopy requested page, we'll go
back and continue sending the precopy huge page (which can be intercepted by a
postcopy request so the huge page can be sent partially before).
Whether that's a problem is debatable, because "assuming the guest will
continue to access the next page" may not really suite when huge pages are
used, especially if the huge page is large (e.g. 1GB pages). So that locality
hint is much meaningless if huge pages are used.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185504.27203-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Postcopy recover with preempt enabled
To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar handling on fault tolerance. When ram_load_postcopy() fails,
instead of stopping the thread it halts with a semaphore, preparing to be
kicked again when recovery is detected.
A mutex is introduced to make sure there's no concurrent operation upon the
socket. To make it simple, the fast ram load thread will take the mutex during
its whole procedure, and only release it if it's paused. The fast-path socket
will be properly released by the main loading thread safely when there's
network failures during postcopy with that mutex held.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185506.27257-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Create the postcopy preempt channel asynchronously
This patch allows the postcopy preempt channel to be created
asynchronously. The benefit is that when the connection is slow, we won't
take the BQL (and potentially block all things like QMP) for a long time
without releasing.
A function postcopy_preempt_wait_channel() is introduced, allowing the
migration thread to be able to wait on the channel creation. The channel
is always created by the main thread, in which we'll kick a new semaphore
to tell the migration thread that the channel has created.
We'll need to wait for the new channel in two places: (1) when there's a
new postcopy migration that is starting, or (2) when there's a postcopy
migration to resume.
For the start of migration, we don't need to wait for this channel until
when we want to start postcopy, aka, postcopy_start(). We'll fail the
migration if we found that the channel creation failed (which should
probably not happen at all in 99% of the cases, because the main channel is
using the same network topology).
For a postcopy recovery, we'll need to wait in postcopy_pause(). In that
case if the channel creation failed, we can't fail the migration or we'll
crash the VM, instead we keep in PAUSED state, waiting for yet another
recovery.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185509.27311-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Add property x-postcopy-preempt-break-huge
Add a property field that can conditionally disable the "break sending huge
page" behavior in postcopy preemption. By default it's enabled.
It should only be used for debugging purposes, and we should never remove
the "x-" prefix.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185511.27366-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Add helpers to detect TLS capability
Add migrate_channel_requires_tls() to detect whether the specific channel
requires TLS, leveraging the recently introduced migrate_use_tls(). No
functional change intended.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185513.27421-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Export tls-[creds|hostname|authz] params to cmdline too
It's useful for specifying tls credentials all in the cmdline (along with
the -object tls-creds-*), especially for debugging purpose.
The trick here is we must remember to not free these fields again in the
finalize() function of migration object, otherwise it'll cause double-free.
The thing is when destroying an object, we'll first destroy the properties
that bound to the object, then the object itself. To be explicit, when
destroy the object in object_finalize() we have such sequence of
operations:
object_property_del_all(obj);
object_deinit(obj, ti);
So after this change the two fields are properly released already even
before reaching the finalize() function but in object_property_del_all(),
hence we don't need to free them anymore in finalize() or it's double-free.
This also fixes a trivial memory leak for tls-authz as we forgot to free it
before this patch.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185515.27475-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Enable TLS for preempt channel
This patch is based on the async preempt channel creation. It continues
wiring up the new channel with TLS handshake to destionation when enabled.
Note that only the src QEMU needs such operation; the dest QEMU does not
need any change for TLS support due to the fact that all channels are
established synchronously there, so all the TLS magic is already properly
handled by migration_tls_channel_process_incoming().
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185518.27529-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Respect postcopy request order in preemption mode
With preemption mode on, when we see a postcopy request that was requesting
for exactly the page that we have preempted before (so we've partially sent
the page already via PRECOPY channel and it got preempted by another
postcopy request), currently we drop the request so that after all the
other postcopy requests are serviced then we'll go back to precopy stream
and start to handle that.
We dropped the request because we can't send it via postcopy channel since
the precopy channel already contains partial of the data, and we can only
send a huge page via one channel as a whole. We can't split a huge page
into two channels.
That's a very corner case and that works, but there's a change on the order
of postcopy requests that we handle since we're postponing this (unlucky)
postcopy request to be later than the other queued postcopy requests. The
problem is there's a possibility that when the guest was very busy, the
postcopy queue can be always non-empty, it means this dropped request will
never be handled until the end of postcopy migration. So, there's a chance
that there's one dest QEMU vcpu thread waiting for a page fault for an
extremely long time just because it's unluckily accessing the specific page
that was preempted before.
The worst case time it needs can be as long as the whole postcopy migration
procedure. It's extremely unlikely to happen, but when it happens it's not
good.
The root cause of this problem is because we treat pss->postcopy_requested
variable as with two meanings bound together, as the variable shows:
1. Whether this page request is urgent, and,
2. Which channel we should use for this page request.
With the old code, when we set postcopy_requested it means either both (1)
and (2) are true, or both (1) and (2) are false. We can never have (1)
and (2) to have different values.
However it doesn't necessarily need to be like that. It's very legal that
there's one request that has (1) very high urgency, but (2) we'd like to
use the precopy channel. Just like the corner case we were discussing
above.
To differenciate the two meanings better, introduce a new field called
postcopy_target_channel, showing which channel we should use for this page
request, so as to cover the old meaning (2) only. Then we leave the
postcopy_requested variable to stand only for meaning (1), which is the
urgency of this page request.
With this change, we can easily boost priority of a preempted precopy page
as long as we know that page is also requested as a postcopy page. So with
the new approach in get_queued_page() instead of dropping that request, we
send it right away with the precopy channel so we get back the ordering of
the page faults just like how they're requested on dest.
Reported-by: Manish Mishra <manish.mishra@nutanix.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185520.27583-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* tests: Move MigrateCommon upper
So that it can be used in postcopy tests too soon.
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185522.27638-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* tests: Add postcopy tls migration test
We just added TLS tests for precopy but not postcopy. Add the
corresponding test for vanilla postcopy.
Rename the vanilla postcopy to "postcopy/plain" because all postcopy tests
will only use unix sockets as channel.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185525.27692-1-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Manual merge
* tests: Add postcopy tls recovery migration test
It's easy to build this upon the postcopy tls test. Rename the old
postcopy recovery test to postcopy/recovery/plain.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185527.27747-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Manual merge
* tests: Add postcopy preempt tests
Four tests are added for preempt mode:
- Postcopy plain
- Postcopy recovery
- Postcopy tls
- Postcopy tls+recovery
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185530.27801-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Manual merge
* migration: remove unreachable code after reading data
The code calls qio_channel_read() in a loop when it reports
QIO_CHANNEL_ERR_BLOCK. This code is reported when errno==EAGAIN.
As such the later block of code will always hit the 'errno != EAGAIN'
condition, making the final 'else' unreachable.
Fixes: Coverity CID 1490203
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220627135318.156121-1-berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
returns 1. This return code should be used only when Linux fails to use
MSG_ZEROCOPY on a lot of sendmsg().
Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
was attempted.
Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-2-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* Add dirty-sync-missed-zero-copy migration stat
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220711211112.18951-3-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration/multifd: Report to user when zerocopy not working
Some errors, like the lack of Scatter-Gather support by the network
interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
zero-copy, which causes it to fall back to the default copying mechanism.
After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks for errors each of the previous calls to
sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
increment dirty_sync_missed_zero_copy migration stat to let the user know
about it.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* multifd: Document the locking of MultiFD{Send/Recv}Params
Reorder the structures so we can know if the fields are:
- Read only
- Their own locking (i.e. sems)
- Protected by 'mutex'
- Only for the multifd channel
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220531104318.7494-2-quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Typo fixes from Chen Zhang
* migration: Avoid false-positive on non-supported scenarios for zero-copy-send
Migration with zero-copy-send currently has it's limitations, as it can't
be used with TLS nor any kind of compression. In such scenarios, it should
output errors during parameter / capability setting.
But currently there are some ways of setting this not-supported scenarios
without printing the error message:
!) For 'compression' capability, it works by enabling it together with
zero-copy-send. This happens because the validity test for zero-copy uses
the helper unction migrate_use_compression(), which check for compression
presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].
The point here is: the validity test happens before the capability gets
enabled. If all of them get enabled together, this test will not return
error.
In order to fix that, replace migrate_use_compression() by directly testing
the cap_list parameter migrate_caps_check().
2) For features enabled by parameters such as TLS & 'multifd_compression',
there was also a possibility of setting non-supported scenarios: setting
zero-copy-send first, then setting the unsupported parameter.
In order to fix that, also add a check for parameters conflicting with
zero-copy-send on migrate_params_check().
3) XBZRLE is also a compression capability, so it makes sense to also add
it to the list of capabilities which are not supported with zero-copy-send.
Fixes: 1abaec9a1b2c ("migration: Change zero_copy_send from migration parameter to migration capability")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220719122345.253713-1-leobras@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* Revert "gitlab: disable accelerated zlib for s390x"
This reverts commit 309df6acb29346f89e1ee542b1986f60cab12b87.
With Ilya's 'multifd: Copy pages before compressing them with zlib'
in the latest migration series, this shouldn't be a problem any more.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
* slow snapshots api
Co-authored-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Co-authored-by: Paolo Bonzini <pbonzini@redhat.com>
Co-authored-by: Peter Maydell <peter.maydell@linaro.org>
Co-authored-by: Joel Stanley <joel@jms.id.au>
Co-authored-by: Peter Delevoryas <pdel@fb.com>
Co-authored-by: Peter Delevoryas <peter@pjd.dev>
Co-authored-by: Cédric Le Goater <clg@kaod.org>
Co-authored-by: Iris Chen <irischenlj@fb.com>
Co-authored-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Co-authored-by: Niklas Cassel <niklas.cassel@wdc.com>
Co-authored-by: Darren Kenny <darren.kenny@oracle.com>
Co-authored-by: Konstantin Kostiuk <kkostiuk@redhat.com>
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Co-authored-by: Hao Wu <wuhaotsh@google.com>
Co-authored-by: Andrey Makarov <ph.makarov@gmail.com>
Co-authored-by: Jason A. Donenfeld <Jason@zx2c4.com>
Co-authored-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Co-authored-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Co-authored-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Co-authored-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Co-authored-by: John Snow <jsnow@redhat.com>
Co-authored-by: Song Gao <gaosong@loongson.cn>
Co-authored-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Co-authored-by: Thomas Huth <thuth@redhat.com>
Co-authored-by: Ilya Leoshkevich <iii@linux.ibm.com>
Co-authored-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Co-authored-by: Gerd Hoffmann <kraxel@redhat.com>
Co-authored-by: Mauro Matteo Cascella <mcascell@redhat.com>
Co-authored-by: Felix xq Queißner <xq@random-projects.net>
Co-authored-by: Xiaojuan Yang <yangxiaojuan@loongson.cn>
Co-authored-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Co-authored-by: Taylor Simpson <tsimpson@quicinc.com>
Co-authored-by: Eugenio Pérez <eperezma@redhat.com>
Co-authored-by: Zhang Chen <chen.zhang@intel.com>
Co-authored-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Co-authored-by: Peter Xu <peterx@redhat.com>
Co-authored-by: Daniel P. Berrangé <berrange@redhat.com>
Co-authored-by: Leonardo Bras <leobras@redhat.com>
Co-authored-by: Juan Quintela <quintela@redhat.com>
Co-authored-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.
We recently added support for write zeroes and without the
workaround we can have problems with qcow2.
So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220317162638.41192-1-sgarzare@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
librbd had a bug until early 2022 that affected all versions of ceph that
supported fast-diff. This bug results in reporting of incorrect offsets
if the offset parameter to rbd_diff_iterate2 is not object aligned.
This patch works around this bug for pre Quincy versions of librbd.
Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20220113144426.4036493-3-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
the assumption that we can't hit a hole if we do not diff against a snapshot was wrong.
We can see a hole in an image if we diff against base if there exists an older snapshot
of the image and we have discarded blocks in the image where the snapshot has data.
Fix this by simply handling a hole like an unallocated area. There are no callbacks
for unallocated areas so just bail out if we hit a hole.
Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b
Suggested-by: Ilya Dryomov <idryomov@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20220113144426.4036493-2-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
the qemu rbd driver currently lacks support for bdrv_co_block_status.
This results mainly in incorrect progress during block operations (e.g.
qemu-img convert with an rbd image as source).
This patch utilizes the rbd_diff_iterate2 call from librbd to detect
allocated and unallocated (all zero areas).
To avoid querying the ceph OSDs for the answer this is only done if
the image has the fast-diff feature which depends on the object-map and
exclusive-lock features. In this case it is guaranteed that the information
is present in memory in the librbd client and thus very fast.
If fast-diff is not available all areas are reported to be allocated
which is the current behaviour if bdrv_co_block_status is not implemented.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20211012152231.24868-1-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver discard handlers bytes parameter to int64_t.
The only caller of all updated function is bdrv_co_pdiscard in
block/io.c. It is already prepared to work with 64bit requests, but
pass at most max(bs->bl.max_pdiscard, INT_MAX) to the driver.
Let's look at all updated functions:
blkdebug: all calculations are still OK, thanks to
bdrv_check_qiov_request().
both rule_check and bdrv_co_pdiscard are 64bit
blklogwrites: pass to blk_loc_writes_co_log which is 64bit
blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK
copy-before-write: pass to bdrv_co_pdiscard which is 64bit and to
cbw_do_copy_before_write which is 64bit
file-posix: one handler calls raw_account_discard() is 64bit and both
handlers calls raw_do_pdiscard(). Update raw_do_pdiscard, which pass
to RawPosixAIOData::aio_nbytes, which is 64bit (and calls
raw_account_discard())
gluster: somehow, third argument of glfs_discard_async is size_t.
Let's set max_pdiscard accordingly.
iscsi: iscsi_allocmap_set_invalid is 64bit,
!is_byte_request_lun_aligned is 64bit.
list.num is uint32_t. Let's clarify max_pdiscard and
pdiscard_alignment.
mirror_top: pass to bdrv_mirror_top_do_write() which is
64bit
nbd: protocol limitation. max_pdiscard is alredy set strict enough,
keep it as is for now.
nvme: buf.nlb is uint32_t and we do shift. So, add corresponding limits
to nvme_refresh_limits().
preallocate: pass to bdrv_co_pdiscard() which is 64bit.
rbd: pass to qemu_rbd_start_co() which is 64bit.
qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(),
qcow2_cluster_discard() is 64bit.
raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too.
throttle: pass to bdrv_co_pdiscard() which is 64bit and to
throttle_group_co_io_limits_intercept() which is 64bit as well.
test-block-iothread: bytes argument is unused
Great! Now all drivers are prepared to handle 64bit discard requests,
or else have explicit max_pdiscard limits.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-11-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver write_zeroes handlers bytes parameter to int64_t.
The only caller of all updated function is bdrv_co_do_pwrite_zeroes().
bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of
callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s
max_write_zeroes is limited to INT_MAX. So, updated functions all are
safe, they will not get "bytes" larger than before.
Still, let's look through all updated functions, and add assertions to
the ones which are actually unprepared to values larger than INT_MAX.
For these drivers also set explicit max_pwrite_zeroes limit.
Let's go:
blkdebug: calculations can't overflow, thanks to
bdrv_check_qiov_request() in generic layer. rule_check() and
bdrv_co_pwrite_zeroes() both have 64bit argument.
blklogwrites: pass to blk_log_writes_co_log() with 64bit argument.
blkreplay, copy-on-read, filter-compress: pass to
bdrv_co_pwrite_zeroes() which is OK
copy-before-write: Calls cbw_do_copy_before_write() and
bdrv_co_pwrite_zeroes, both have 64bit argument.
file-posix: both handler calls raw_do_pwrite_zeroes, which is updated.
In raw_do_pwrite_zeroes() calculations are OK due to
bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes
which is uint64_t.
Check also where that uint64_t gets handed:
handle_aiocb_write_zeroes_block() passes a uint64_t[2] to
ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate()
which takes off_t (and we compile to always have 64-bit off_t), as
does handle_aiocb_write_zeroes_unmap. All look safe.
gluster: bytes go to GlusterAIOCB::size which is int64_t and to
glfs_zerofill_async works with off_t.
iscsi: Aha, here we deal with iscsi_writesame16_task() that has
uint32_t num_blocks argument and iscsi_writesame16_task() has
uint16_t argument. Make comments, add assertions and clarify
max_pwrite_zeroes calculation.
iscsi_allocmap_() functions already has int64_t argument
is_byte_request_lun_aligned is simple to update, do it.
mirror_top: pass to bdrv_mirror_top_do_write which has uint64_t
argument
nbd: Aha, here we have protocol limitation, and NBDRequest::len is
uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are
OK for now.
nvme: Again, protocol limitation. And no inherent limit for
write-zeroes at all. But from code that calculates cdw12 it's obvious
that we do have limit and alignment. Let's clarify it. Also,
obviously the code is not prepared to handle bytes=0. Let's handle
this case too.
trace events already 64bit
preallocate: pass to handle_write() and bdrv_co_pwrite_zeroes(), both
64bit.
rbd: pass to qemu_rbd_start_co() which is 64bit.
qcow2: offset + bytes and alignment still works good (thanks to
bdrv_check_qiov_request()), so tail calculation is OK
qcow2_subcluster_zeroize() has 64bit argument, should be OK
trace events updated
qed: qed_co_request wants int nb_sectors. Also in code we have size_t
used for request length which may be 32bit. So, let's just keep
INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and
don't care.
raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both
64bit.
throttle: Both throttle_group_co_io_limits_intercept() and
bdrv_co_pwrite_zeroes() are 64bit.
vmdk: pass to vmdk_pwritev which is 64bit
quorum: pass to quorum_co_pwritev() which is 64bit
Hooray!
At this point all block drivers are prepared to support 64bit
write-zero requests, or have explicitly set max_pwrite_zeroes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: use <= rather than < in assertions relying on max_pwrite_zeroes]
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver write handlers parameters which are already 64bit to
signed type.
While being here, convert also flags parameter to be BdrvRequestFlags.
Now let's consider all callers. Simple
git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?'
shows that's there three callers of driver function:
bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in
block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to
be non-negative.
qcow2_save_vmstate() does bdrv_check_qiov_request().
Still, the functions may be called directly, not only by drv->...
Let's check:
git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \
awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
while read func; do git grep "$func(" | \
grep -v "$func(BlockDriverState"; done
shows several callers:
qcow2:
qcow2_co_truncate() write at most up to @offset, which is checked in
generic qcow2_co_truncate() by bdrv_check_request().
qcow2_co_pwritev_compressed_task() pass the request (or part of the
request) that already went through normal write path, so it should
be OK
qcow:
qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch
quorum:
quorum_co_pwrite_zeroes() pass int64_t and int - OK
throttle:
throttle_co_pwritev_compressed() pass int64_t, it's updated by this
patch
vmdk:
vmdk_co_pwritev_compressed() pass int64_t, it's updated by this
patch
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).
So, convert driver read handlers parameters which are already 64bit to
signed type.
While being here, convert also flags parameter to be BdrvRequestFlags.
Now let's consider all callers. Simple
git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?'
shows that's there three callers of driver function:
bdrv_driver_preadv() in block/io.c, passes int64_t, checked by
bdrv_check_qiov_request() to be non-negative.
qcow2_load_vmstate() does bdrv_check_qiov_request().
do_perform_cow_read() has uint64_t argument. And a lot of things in
qcow2 driver are uint64_t, so converting it is big job. But we must
not work with requests that don't satisfy bdrv_check_qiov_request(),
so let's just assert it here.
Still, the functions may be called directly, not only by drv->...
Let's check:
git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \
awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
while read func; do git grep "$func(" | \
grep -v "$func(BlockDriverState"; done
The only one such caller:
QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
...
ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
in tests/unit/test-bdrv-drain.c, and it's OK obviously.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix typos]
Signed-off-by: Eric Blake <eblake@redhat.com>
task->complete is a bool not an integer.
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20210707180449.32665-1-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
librbd supports 1 byte alignment for all aio operations.
Currently, there is no API call to query limits from the Ceph
ObjectStore backend. So drop the bdrv_refresh_limits completely
until there is such an API call.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-7-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores
BDRV_REQ_MAY_UNMAP for older librbd versions.
The rationale for this is as follows (citing Ilya Dryomov current RBD
maintainer):
---8<---
a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
and as a consequence always unmap if librbd is too old
It's not clear what qemu's expectation is but in general Write
Zeroes is allowed to unmap. The only guarantee is that subsequent
reads return zeroes, everything else is a hint. This is how it is
specified in the kernel and in the NVMe spec.
In particular, block/nvme.c implements it as follows:
if (flags & BDRV_REQ_MAY_UNMAP) {
cdw12 |= (1 << 25);
}
This sets the Deallocate bit. But if it's not set, the device may
still deallocate:
"""
If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
command, and the namespace supports clearing all bytes to 0h in the
values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
from a deallocated logical block and its metadata (excluding
protection information), then for each specified logical block, the
controller:
- should deallocate that logical block;
...
If the Deallocate bit is cleared to '0' in a Write Zeroes command,
and the namespace supports clearing all bytes to 0h in the values
read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
a deallocated logical block and its metadata (excluding protection
information), then, for each specified logical block, the
controller:
- may deallocate that logical block;
"""
https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf
b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags
Again, it's not clear what qemu expects here, but without it we end
up in a ridiculous situation where specifying the "don't allow slow
fallback" switch immediately fails all efficient zeroing requests on
a device where Write Zeroes is always efficient:
$ qemu-io -c 'help write' | grep -- '-[zun]'
-n, -- with -z, don't allow slow fallback
-u, -- with -z, allow unmapping
-z, -- write zeroes using blk_co_pwrite_zeroes
$ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar
write failed: Operation not supported
--->8---
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-6-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-5-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
While at it just call rbd_get_size and avoid rbd_image_info_t.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-4-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-3-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Ceph Luminous (version 12.2.z) is almost 4 years old at this point.
Bump the requirement to get rid of the ifdef'ry in the code.
Qemu 6.1 dropped the support for RHEL-7 which was the last supported
OS that required an older librbd.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Message-Id: <20210702172356.11574-2-idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Starting from ceph Pacific, RBD has built-in support for image-level encryption.
Currently supported formats are LUKS version 1 and 2.
There are 2 new relevant librbd APIs for controlling encryption, both expect an
open image context:
rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack
This commit extends the qemu rbd driver API to support the above.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Message-Id: <20210627114635.39326-1-oro@il.ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sometimes the parser needs to further split a token it has collected
from the token input stream. Right now, it does a cursory check to see
if the relevant characters appear in the token to determine if it should
break it down further.
However, qemu_rbd_next_tok() will escape characters as it removes tokens
from the token stream and plain strchr() won't. This can make the
initial strchr() check slightly misleading since it implies
qemu_rbd_next_tok() will find the token and split on it, except the
reality is that qemu_rbd_next_tok() will pass over it if it is escaped.
Use a custom strchr to avoid mixing escaped and unescaped string
operations. Furthermore, this code is identical to how
qemu_rbd_next_tok() seeks its next token, so incorporate this custom
strchr into the body of that function to reduce duplication.
Reported-by: Han Han <hhan@redhat.com>
Fixes: https://bugzilla.redhat.com/1873913
Signed-off-by: Connor Kuehl <ckuehl@redhat.com>
Message-Id: <20210421212343.85524-3-ckuehl@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
When we allocate 'q_namespace', we forgot to set 'has_q_namespace'
to true. This can cause several issues, including a memory leak,
since qapi_free_BlockdevCreateOptions() does not deallocate that
memory, as reported by valgrind:
13 bytes in 1 blocks are definitely lost in loss record 7 of 96
at 0x4839809: malloc (vg_replace_malloc.c:307)
by 0x48CEBB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
by 0x48E3FE3: g_strdup (in /usr/lib64/libglib-2.0.so.0.6600.8)
by 0x180010: qemu_rbd_co_create_opts (rbd.c:446)
by 0x1AE72C: bdrv_create_co_entry (block.c:492)
by 0x241902: coroutine_trampoline (coroutine-ucontext.c:173)
by 0x57530AF: ??? (in /usr/lib64/libc-2.32.so)
by 0x1FFEFFFA6F: ???
Fix setting 'has_q_namespace' to true when we allocate 'q_namespace'.
Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210329150129.121182-3-sgarzare@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In qemu_rbd_connect(), 'mon_host' is allocated by qemu_rbd_mon_host()
using g_strjoinv(), but it's only freed in the error path, leaking
memory in the success path as reported by valgrind:
80 bytes in 4 blocks are definitely lost in loss record 5,028 of 6,516
at 0x4839809: malloc (vg_replace_malloc.c:307)
by 0x5315BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
by 0x532B6FF: g_strjoinv (in /usr/lib64/libglib-2.0.so.0.6600.8)
by 0x87D07E: qemu_rbd_mon_host (rbd.c:538)
by 0x87D07E: qemu_rbd_connect (rbd.c:562)
by 0x87E1CE: qemu_rbd_open (rbd.c:740)
by 0x840EB1: bdrv_open_driver (block.c:1528)
by 0x8453A9: bdrv_open_common (block.c:1802)
by 0x8453A9: bdrv_open_inherit (block.c:3444)
by 0x8464C2: bdrv_open (block.c:3537)
by 0x8108CD: qmp_blockdev_add (blockdev.c:3569)
by 0x8EA61B: qmp_marshal_blockdev_add (qapi-commands-block-core.c:1086)
by 0x90B528: do_qmp_dispatch_bh (qmp-dispatch.c:131)
by 0x907EA4: aio_bh_poll (async.c:164)
Fix freeing 'mon_host' also when qemu_rbd_connect() ends correctly.
Fixes: 0a55679b4a5061f4d74bdb1a0e81611ba3390b00
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210329150129.121182-2-sgarzare@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qobject_to_json() and qobject_to_json_pretty() build a GString, then
covert it to QString. Just one of the callers actually needs a
QString: qemu_rbd_parse_filename(). A few others need a string they
can modify: qmp_send_response(), qga's send_response(), to_json_str(),
and qmp_fd_vsend_fds(). The remainder just need a string.
Change qobject_to_json() and qobject_to_json_pretty() to return the
GString.
qemu_rbd_parse_filename() now has to convert to QString. All others
save a QString temporary. to_json_str() actually becomes a bit
simpler, because GString provides more convenient modification
functions.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20201211171152.146877-6-armbru@redhat.com>
Commit 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
introduced namespace support for RBD, but we forgot to add the
new 'namespace' options to qemu_rbd_strong_runtime_opts[].
The 'namespace' is used to identify the image, so it is a strong
option since it can changes the data of a BDS.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528
Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces")
Cc: Florian Florensa <fflorensa@online.net>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20200914190553.74871-1-sgarzare@redhat.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This saw its last use in 4bfb274165ba.
Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <20200806211345.2925343-2-jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Convert
visit_type_FOO(v, ..., &ptr, &err);
...
if (err) {
...
}
to
visit_type_FOO(v, ..., &ptr, errp);
...
if (!ptr) {
...
}
for functions that set @ptr to non-null / null on success / error.
Eliminate error_propagate() that are now unnecessary. Delete @err
that are now unused.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200707160613.848843-40-armbru@redhat.com>
Now that there are no clients of bdrv_has_zero_init_truncate, none of
the drivers need to worry about providing it.
What's more, this eliminates a source of some confusion: a literal
reading of the documentation as written in ceaca56f and implemented in
commit 1dcaf527 claims that a driver which returns 0 for
bdrv_has_zero_init_truncate() must not return 1 for
bdrv_has_zero_init(); this condition was violated for parallels, qcow,
and sometimes for vdi, although in practice it did not matter since
those drivers also lacked .bdrv_co_truncate.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-10-eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Our .bdrv_has_zero_init_truncate always returns 1 because rbd always
0-fills; we can use that same knowledge to implement
BDRV_REQ_ZERO_WRITE by ignoring it.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200428202905.770727-5-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate()
driver callbacks, and a supported_truncate_flags field in
BlockDriverState that allows drivers to advertise support for request
flags in the context of truncate.
For now, we always pass 0 and no drivers declare support for any flag.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200424125448.63318-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This will allow the reuse of a single generic .bdrv_co_create
implementation for several drivers.
No functional changes.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Starting from ceph Nautilus, RBD has support for namespaces, allowing
for finer grain ACLs on images inside a pool, and tenant isolation.
In the rbd cli tool documentation, the new image-spec and snap-spec are :
- [pool-name/[namespace-name/]]image-name
- [pool-name/[namespace-name/]]image-name@snap-name
When using an non namespace's enabled qemu, it complains about not
finding the image called namespace-name/image-name, thus we only need to
parse the image once again to find if there is a '/' in its name, and if
there is, use what is before it as the name of the namespace to later
pass it to rados_ioctx_set_namespace.
rados_ioctx_set_namespace if called with en empty string or a null
pointer as the namespace parameters pretty much does nothing, as it then
defaults to the default namespace.
The namespace is extracted inside qemu_rbd_parse_filename, stored in the
qdict, and used in qemu_rbd_connect to make it work with both qemu-img,
and qemu itself.
Signed-off-by: Florian Florensa <fflorensa@online.net>
Message-Id: <20200110111513.321728-2-fflorensa@online.net>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have two drivers (iscsi and file-posix) that (in some cases) return
success from their .bdrv_co_truncate() implementation if the block
device is larger than the requested offset, but cannot be shrunk. Some
callers do not want that behavior, so this patch adds a new parameter
that they can use to turn off that behavior.
This patch just adds the parameter and lets the block/io.c and
block/block-backend.c functions pass it around. All other callers
always pass false and none of the implementations evaluate it, so that
this patch does not change existing behavior. Future patches take care
of that.
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190918095144.955-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Replay is capable of recording normal BH events, but sometimes
there are single use callbacks scheduled with aio_bh_schedule_oneshot
function. This patch enables recording and replaying such callbacks.
Block layer uses these events for calling the completion function.
Replaying these calls makes the execution deterministic.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We need to implement .bdrv_has_zero_init_truncate() for every block
driver that supports truncation and has a .bdrv_has_zero_init()
implementation.
Implement it the same way each driver implements .bdrv_has_zero_init().
This is at least not any more unsafe than what we had before.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190724171239.8764-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
RBD APIs don't allow us to write more than the size set with
rbd_create() or rbd_resize().
In order to support growing images (eg. qcow2), we resize the
image before write operations that exceed the current size.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20190509145927.293369-1-sgarzare@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
This new field can be set by block drivers to list the runtime options
they accept that may influence the contents of the respective BDS. As of
a follow-up patch, this list will be used by the common
bdrv_refresh_filename() implementation to decide which options to put
into BDS.full_open_options (and consequently whether a JSON filename has
to be created), thus freeing the drivers of having to implement that
logic themselves.
Additionally, this patch adds the field to all of the block drivers that
need it and sets it accordingly.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190201192935.18394-22-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Some block drivers have traditionally changed their node to read-only
mode without asking the user. This behaviour has been marked deprecated
since 2.11, expecting users to provide an explicit read-only=on option.
Now that we have auto-read-only=on, enable these drivers to make use of
the option.
This is the only use of bdrv_set_read_only(), so we can make it a bit
more specific and turn it into a bdrv_apply_auto_read_only() that is
more convenient for drivers to use.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Commit e2b8247a322 introduced an error path in qemu_rbd_open() after
calling rbd_open(), but neglected to close the image again in this error
path. The error path should contain everything that the regular close
function qemu_rbd_close() contains.
This adds the missing rbd_close() call.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Calling error_report() in a function that takes an Error ** argument
is suspicious. Convert a few that are actually warnings to
warn_report().
While there, split warnings consisting of multiple sentences to
conform to conventions spelled out in warn_report()'s contract, and
improve a rather useless warning in sheepdog.c.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Lieven <pl@kamp.de>
Cc: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-4-armbru@redhat.com>
Drop changes to "without an explicit read-only=on" warnings, because
there's a series removing them pending. Also drop a cc: to a former
Sheepdog maintainer.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
When we converted rbd to get rid of the older key/value-centric
encoding format, we broke compatibility with image files with backing
file strings encoded in the old format.
This leaves a bit of an ugly conundrum, and a hacky solution.
If the initial attempt to parse the "proper" options fails, it assumes
that we may have an older key/value encoded filename. Fall back to
attempting to parse the filename, and extract the required options from
it. If that fails, pass along the original error message.
We do not support mixed modern usage alongside legacy keyvalue pair
usage.
A deprecation warning has been added, although care should be taken
when actually deprecating since the impact is not limited to
commandline or qapi usage, but also opening existing images.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 15b332e5432ad069441f7275a46080f465d789a0.1536704901.git.jcody@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Code movement to pull the conversion from Qdict to BlockdevOptionsRbd
into a helper function.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 5b49a980f2cde6610ab1df41bb0277d00b5db893.1536704901.git.jcody@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
bdrv_truncate() is an operation that can block (even for a quite long
time, depending on the PreallocMode) in I/O paths that shouldn't block.
Convert it to a coroutine_fn so that we have the infrastructure for
drivers to make their .bdrv_co_truncate implementation asynchronous.
This change could potentially introduce new race conditions because
bdrv_truncate() isn't necessarily executed atomically any more. Whether
this is a problem needs to be evaluated for each block driver that
supports truncate:
* file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The
protocol drivers are trivially safe because they don't actually yield
yet, so there is no change in behaviour.
* copy-on-read, crypto, raw-format: Essentially just filter drivers that
pass the request to a child node, no problem.
* qcow2: The implementation modifies metadata, so it needs to hold
s->lock to be safe with concurrent I/O requests. In order to avoid
double locking, this requires pulling the locking out into
preallocate_co() and using qcow2_write_caches() instead of
bdrv_flush().
* qed: Does a single header update, this is fine without locking.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Legacy -drive supports "password-secret" parameter that isn't
available with -blockdev / blockdev-add. That's because we backed out
our first try to provide it there due to interface design doubts, in
commit 577d8c9a811, v2.9.0.
This is the second try. It brings back the parameter, except it's
named "key-secret" now.
Let's review our reasons for backing out the first try, as stated in
the commit message:
* BlockdevOptionsRbd member @password-secret isn't actually a
password, it's a key generated by Ceph.
Addressed by the rename.
* We're not sure where member @password-secret belongs (see the
previous commit).
See previous commit.
* How @password-secret interacts with settings from a configuration
file specified with @conf is undocumented.
Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Parameter auth-client-required lets you configure authentication
methods. We tried to provide that in v2.9.0, but backed out due to
interface design doubts (commit 464444fcc16).
This commit is similar to what we backed out, but simpler: we use a
list of enumeration values instead of a list of objects with a member
of enumeration type.
Let's review our reasons for backing out the first try, as stated in
the commit message:
* The implementation uses deprecated rados_conf_set() key
"auth_supported". No biggie.
Fixed: we use "auth-client-required".
* The implementation makes -drive silently ignore invalid parameters
"auth" and "auth-supported.*.X" where X isn't "auth". Fixable (in
fact I'm going to fix similar bugs around parameter server), so
again no biggie.
That fix is commit 2836284db60. This commit doesn't bring the bugs
back.
* BlockdevOptionsRbd member @password-secret applies only to
authentication method cephx. Should it be a variant member of
RbdAuthMethod?
We've had time to ponder, and we decided to stick to the way Ceph
configuration works: the key configured separately, and silently
ignored if the authentication method doesn't use it.
* BlockdevOptionsRbd member @user could apply to both methods cephx
and none, but I'm not sure it's actually used with none. If it
isn't, should it be a variant member of RbdAuthMethod?
Likewise.
* The client offers a *set* of authentication methods, not a list.
Should the methods be optional members of BlockdevOptionsRbd instead
of members of list @auth-supported? The latter begs the question
what multiple entries for the same method mean. Trivial question
now that RbdAuthMethod contains nothing but @type, but less so when
RbdAuthMethod acquires other members, such the ones discussed above.
Again, we decided to stick to the way Ceph configuration works, except
we make auth-client-required a list of enumeration values instead of a
string containing keywords separated by delimiters.
* How BlockdevOptionsRbd member @auth-supported interacts with
settings from a configuration file specified with @conf is
undocumented. I suspect it's untested, too.
Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Configuration flows through the block subsystem in a rather peculiar
way. Configuration made with -drive enters it as QemuOpts.
Configuration made with -blockdev / blockdev-add enters it as QAPI
type BlockdevOptions. The block subsystem uses QDict, QemuOpts and
QAPI types internally. The precise flow is next to impossible to
explain (I tried for this commit message, but gave up after wasting
several hours). What I can explain is a flaw in the BlockDriver
interface that leads to this bug:
$ qemu-system-x86_64 -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234
qemu-system-x86_64: -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234: Internal error: parameter user invalid
QMP blockdev-add is broken the same way.
Here's what happens. The block layer passes configuration represented
as flat QDict (with dotted keys) to BlockDriver methods
.bdrv_file_open(). The QDict's members are typed according to the
QAPI schema.
nfs_file_open() converts it to QAPI type BlockdevOptionsNfs, with
qdict_crumple() and a qobject input visitor.
This visitor comes in two flavors. The plain flavor requires scalars
to be typed according to the QAPI schema. That's the case here. The
keyval flavor requires string scalars. That's not the case here.
nfs_file_open() uses the latter, and promptly falls apart for members
@user, @group, @tcp-syn-count, @readahead-size, @page-cache-size,
@debug.
Switching to the plain flavor would fix -blockdev, but break -drive,
because there the scalars arrive in nfs_file_open() as strings.
The proper fix would be to replace the QDict by QAPI type
BlockdevOptions in the BlockDriver interface. Sadly, that's beyond my
reach right now.
Next best would be to fix the block layer to always pass correctly
typed QDicts to the BlockDriver methods. Also beyond my reach.
What I can do is throw another hack onto the pile: have
nfs_file_open() convert all members to string, so use of the keyval
flavor actually works, by replacing qdict_crumple() by new function
qdict_crumple_for_keyval_qiv().
The pattern "pass result of qdict_crumple() to
qobject_input_visitor_new_keyval()" occurs several times more:
* qemu_rbd_open()
Same issue as nfs_file_open(), but since BlockdevOptionsRbd has only
string members, its only a latent bug. Fix it anyway.
* parallels_co_create_opts(), qcow_co_create_opts(),
qcow2_co_create_opts(), bdrv_qed_co_create_opts(),
sd_co_create_opts(), vhdx_co_create_opts(), vpc_co_create_opts()
These work, because they create the QDict with
qemu_opts_to_qdict_filtered(), which creates only string scalars.
The function sports a TODO comment asking for better typing; that's
going to be fun. Use qdict_crumple_for_keyval_qiv() to be safe.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There are numerous QDict functions that have been introduced for and are
used only by the block layer. Move their declarations into an own
header file to reflect that.
While qdict_extract_subqdict() is in fact used outside of the block
layer (in util/qemu-config.c), it is still a function related very
closely to how the block layer works with nested QDicts, namely by
sometimes flattening them. Therefore, its declaration is put into this
header as well and util/qemu-config.c includes it with a comment stating
exactly which function it needs.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20180509165530.29561-7-mreitz@redhat.com>
[Copyright note tweaked, superfluous includes dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Parameter "filename" is deprecated since commit 91589d9e5ca, v2.10.0.
Time to get rid of it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually moving away from sector-based interfaces, towards
byte-based. Make the change for the last few sector-based callbacks
in the rbd driver.
Note that the driver was already using byte-based calls for
performing actual I/O, so this just gets rid of a round trip
of scaling; however, as I don't know if RBD is tolerant of
non-sector AIO operations, I went with the conservate approach
of adding .bdrv_refresh_limits to override the block layer
defaults back to the pre-patch value of 512.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we can safely call QOBJECT() on QObject * as well as its
subtypes, we can have macros qobject_ref() / qobject_unref() that work
everywhere instead of having to use QINCREF() / QDECREF() for QObject
and qobject_incref() / qobject_decref() for its subtypes.
The replacement is mechanical, except I broke a long line, and added a
cast in monitor_qmp_cleanup_req_queue_locked(). Unlike
qobject_decref(), qobject_unref() doesn't accept void *.
Note that the new macros evaluate their argument exactly once, thus no
need to shout them.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Rebased, semantic conflict resolved, commit message improved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Commit 4bfb274 added some QAPIfication of option parsing in
qemu_rbd_open(). We need to remove all the options we processed,
otherwise in bdrv_open_inherit() we will think the remaining options are
invalid.
(This needs to go in 2.12 to avoid a regression that prevents rbd
from being opened.)
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>