Since the cost of gen_update_cc_op() must be paid anyway, it's easier
to place them manually and not rely on spilling that is buried under
multiple levels of function calls. While at it, clarify the circumstances
in which the gen_update_cc_op() is needed, and why it is not for REPxx
SCAS and REPxx CMPS.
And since cc_op will have been spilled at the point of a fault, just
make the whole insn CC_OP_DYNAMIC. Once repz_opt is reintroduced,
a fault could happen either before or after the first execution of
CMPS/SCAS, and CC_OP_DYNAMIC sidesteps the complicated matter of what
x86_restore_state_to_opc would do.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241215090613.89588-9-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
RF must be set on traps and interrupts from a string instruction,
except if they occur after the last iteration. Ensure it is set
before giving the main loop a chance to execute.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-8-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow using them in the code that translates REP/REPZ, without
forward declarations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-7-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The condition for optimizing repeat instruction is more or less the
opposite of what you imagine: almost always the string instruction
was _not_ optimized and optimizing the loop relied on goto_tb.
This is obviously not great for performance, due to the cost of the
exit-to-main-loop check, but also wrong. In fact, after expanding
dc->jmp_opt and simplifying "!!x" to "x", the condition for looping used
to be:
((cflags & CF_NO_GOTO_TB) ||
(flags & (HF_RF_MASK | HF_TF_MASK | HF_INHIBIT_IRQ_MASK))) && !(cflags & CF_USE_ICOUNT)
In other words, setting aside RF (it requires special handling for REP
instructions and it was completely missing), repeat instruction were
being optimized if TF or inhibit IRQ flags were set. This is certainly
wrong for TF, because string instructions trap after every execution,
and probably for interrupt shadow too.
Get rid of repz_opt completely. The next patches will reintroduce the
optimization, applying it in the common case instead of the unlikely
and wrong one.
While at it, place the CX/ECX/RCX=0 case is at the end of the function,
which saves a label and is clearer when reading the generated ops.
For clarity, mark the cc_op explicitly as DYNAMIC even if at the end
of the translation block; the cc_op can come from either the previous
instruction or the string instruction, and currently we rely on
a gen_update_cc_op() that is hidden in the bowels of gen_jcc() to
spill cc_op and mark it clean.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-6-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The same "if" is present in all generator functions for string instructions.
Push it inside gen_repz() and gen_repz_nz() instead.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241215090613.89588-5-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It only differs in a single call to gen_jcc, so use a "bool" argument
to distinguish the two cases; do not duplicate code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-4-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is not needed anymore now that gen_jcc has been eliminated
(merged into the similarly-named gen_Jcc, where the uppercase letter
gives away that it is an emission function).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-3-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The code of gen_Jcc is very similar to gen_LOOP* and gen_JCXZ, but this
is hidden by gen_jcc.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-2-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In x86_cpu_filter_features(), if host doesn't support AVX10, the
configured avx10_version should be marked as filtered regardless of
whether prefix is NULL or not.
Check prefix before warn_report() instead of checking for
have_filtered_features.
Cc: qemu-stable@nongnu.org
Fixes: commit bccfb846fd52 ("target/i386: add AVX10 feature and AVX10 version property")
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-2-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cf4c263551886964c5d58bd7b675b13fd497b402)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Do not reference TCG_TARGET_HAS_* directly.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Avoid direct usage of TCG_TARGET_deposit_*_valid.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The correct usage is tracking and maintaining features in env->features[]
instead of manually set it in cpu_x86_cpuid().
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20241219110125.1266461-11-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently CPUID_HT is evaluated in cpu_x86_cpuid() each time. It's not a
correct usage of how feature bit is maintained and evaluated. The
expected practice is that features are tracked in env->features[] and
cpu_x86_cpuid() should be the consumer of env->features[].
Track CPUID_HT in env->features[FEAT_1_EDX] instead and evaluate it in
cpu's realizefn().
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20241219110125.1266461-10-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now it changes to use env->topo_info.threads_per_core and doesn't depend
on qemu_init_vcpu() anymore. Put it together with other feature checks
before qemu_init_vcpu()
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20241219110125.1266461-8-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The name of nr_modules/nr_dies are ambiguous and they mislead people.
The purpose of them is to record and form the topology information. So
just maintain a X86CPUTopoInfo member in CPUX86State instead. Then
nr_modules and nr_dies can be dropped.
As the benefit, x86 can switch to use information in
CPUX86State::topo_info and get rid of the nr_cores and nr_threads in
CPUState. This helps remove the dependency on qemu_init_vcpu(), so that
x86 can get and use topology info earlier in x86_cpu_realizefn(); drop
the comment that highlighted the depedency.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20241219110125.1266461-7-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce various helpers for getting the topology info of different
semantics. Using the helper is more self-explanatory.
Besides, the semantic of the helper will stay unchanged even when new
topology is added in the future. At that time, updating the
implementation of the helper without affecting the callers.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20241219110125.1266461-6-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Local variable cores_per_pkg is only used to calculate threads_per_pkg.
No need for it. Drop it and open-code it instead.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241219110125.1266461-4-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There are duplicated code to setup the value of MSR_CORE_THREAD_COUNT.
Extract a common function for it.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241219110125.1266461-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_install_msr_filters() uses KVM_MSR_FILTER_MAX_RANGES as the bound
when traversing msr_handlers[], while other places still compute the
size by ARRAY_SIZE(msr_handlers).
In fact, msr_handlers[] is an array with the fixed size
KVM_MSR_FILTER_MAX_RANGES, and this has to be true because
kvm_install_msr_filters copies from one array to the other.
For code consistency, assert that they match and use
ARRAY_SIZE(msr_handlers) everywehere.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, there're following incorrect error handling cases in
kvm_arch_init():
* Missed to handle failure of kvm_get_supported_feature_msrs().
* Missed to return when kvm_vm_enable_disable_exits() fails.
* MSR filter related cases called exit() directly instead of returning
to kvm_init(). (The caller of kvm_arch_init() - kvm_init() - needs to
know if kvm_arch_init() fails in order to perform cleanup).
Fix the above cases.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-11-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is common practice to return a negative value (like -1) to indicate
an error, and other functions in kvm_arch_init() follow this style.
To avoid confusion (sometimes returned -1 indicates failure, and
sometimes -1, in a same function), return -1 when
kvm_msr_energy_thread_init() fails.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-10-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Before commit 0cc42e63bb54 ("kvm/i386: refactor kvm_arch_init and split
it into smaller functions"), error_report() attempts to print the error
code from kvm_filter_msr(). However, printing error code does not work
due to kvm_filter_msr() returns bool instead int.
0cc42e63bb54 fixed the error by removing error code printing, but this
lost useful error messages. Bring it back by making kvm_filter_msr()
return int.
This also makes the function call chain processing clearer, allowing for
better handling of error result propagation from kvm_filter_msr() to
kvm_arch_init(), preparing for the subsequent cleanup work of error
handling in kvm_arch_init().
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-9-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Update the comment to match the X86ConfidentialGuestClass
implementation.
Reported-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-8-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The KVM_X86_DISABLE_EXITS_HTL typo has been fixed in commit
77d361b13c19 ("linux-headers: Update to kernel mainline commit
b357bf602").
Drop the related workaround.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-7-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MSR_KVM_SYSTEM_TIME and MSR_KVM_WALL_CLOCK are attached with the (old)
kvmclock feature (KVM_FEATURE_CLOCKSOURCE).
So, just save/load them only when kvmclock (KVM_FEATURE_CLOCKSOURCE) is
enabled.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-5-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These 2 MSRs have been already defined in kvm_para.h (standard-headers/
asm-x86/kvm_para.h).
Remove QEMU local definitions to avoid duplication.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-4-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add feature definitions for KVM_CPUID_FEATURES in CPUID (
CPUID[4000_0001].EAX and CPUID[4000_0001].EDX), to get rid of lots of
offset calculations.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zide Chen <zide.chen@intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-3-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In x86_cpu_filter_features(), if host doesn't support AVX10, the
configured avx10_version should be marked as filtered regardless of
whether prefix is NULL or not.
Check prefix before warn_report() instead of checking for
have_filtered_features.
Cc: qemu-stable@nongnu.org
Fixes: commit bccfb846fd52 ("target/i386: add AVX10 feature and AVX10 version property")
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-2-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using a sextract or extract operation is only necessary if a
sign or zero extended value is needed. If not, a shift is
enough.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because BT does not write back to the source operand, it can modify it to
ensure that one of the operands of TSTNE is a constant (after either gen_BT
or the optimizer's constant propagation). This produces better and more
optimizable TCG ops. For example, the sequence
movl $0x60013f, %ebx
btl %ecx, %ebx
becomes just
and_i32 tmp1,ecx,$0x1f dead: 1 2 pref=0xffff
shr_i32 tmp0,$0x60013f,tmp1 dead: 1 2 pref=0xffff
and_i32 tmp16,tmp0,$0x1 dead: 1 pref=0xbf80
On s390x, it can use four instructions to isolate bit 0 of 0x60013f >> (ecx & 31):
nilf %r12, 0x1f
lgfi %r11, 0x60013f
srlk %r12, %r11, 0(%r12)
nilf %r12, 1
Previously, it used five instructions to build 1 << (ecx & 31) and compute
TSTEQ, and also needed two more to construct the result of setcond:
nilf %r12, 0x1f
lghi %r11, 1
sllk %r12, %r11, 0(%r12)
lgfi %r9, 0x60013f
nrk %r0, %r12, %r9
lghi %r12, 0
locghilh %r12, 1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 644e3c5d812 ("missing vmx features for Skylake-Server and Cascadelake-Server")
Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 93dcc9390e5ad0696ae7e9b7b3a5b08c2d1b6de6)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Fixes: 644e3c5d812 ("missing vmx features for Skylake-Server and Cascadelake-Server")
Signed-off-by: Han Han <hhan@redhat.com>
Reviewed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Convert all targets simultaneously, as the gen_intermediate_code
function disappears from the target. While there are possible
workarounds, they're larger than simply performing the conversion.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Since commit 5286c3662294 ("target/i386: properly reset TSC on reset")
QEMU writes the special value of "1" to each online vCPU TSC on VM reset
to reset it.
However parked vCPUs don't get that handling and due to that their TSCs
get desynchronized when the VM gets reset.
This in turn causes KVM to turn off PVCLOCK_TSC_STABLE_BIT in its exported
PV clock.
Note that KVM has no understanding of vCPU being currently parked.
Without PVCLOCK_TSC_STABLE_BIT the sched clock is marked unstable in
the guest's kvm_sched_clock_init().
This causes a performance regressions to show in some tests.
Fix this issue by writing the special value of "1" also to TSCs of parked
vCPUs on VM reset.
Reproducing the issue:
1) Boot a VM with "-smp 2,maxcpus=3" or similar
2) device_add host-x86_64-cpu,id=vcpu,node-id=0,socket-id=0,core-id=2,thread-id=0
3) Wait a few seconds
4) device_del vcpu
5) Inside the VM run:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is 1.
6) Reboot the VM
7) Once the VM boots once again run inside it:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is now 0.
Fixes: 5286c3662294 ("target/i386: properly reset TSC on reset")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/5a605a88e9a231386dc803c60f5fed9b48108139.1734014926.git.maciej.szmigiero@oracle.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3f2a05b31ee9ce2ddb6c75a9bc3f5e7f7af9a76f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Remove the single target-specific definition used in
"exec/translator.h" (TARGET_PAGE_MASK) by un-inlining
is_same_page().
Rename the method as translator_is_same_page() and
improve its documentation.
Use it in translator_use_goto_tb().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20241218154145.71353-1-philmd@linaro.org>
TB compile flags, tb_page_addr_t type, tb_cflags() and few
other methods are defined in "exec/translation-block.h".
All these files don't include "exec/translation-block.h" but
include "exec/exec-all.h" which include it. Explicitly include
"exec/translation-block.h" to be able to remove it from
"exec/exec-all.h" later when it won't be necessary. Otherwise
we'd get errors such:
accel/tcg/internal-target.h:59:20: error: a parameter list without types is only allowed in a function definition
59 | void tb_lock_page0(tb_page_addr_t);
| ^
accel/tcg/tb-hash.h:64:23: error: unknown type name 'tb_page_addr_t'
64 | uint32_t tb_hash_func(tb_page_addr_t phys_pc, vaddr pc,
| ^
accel/tcg/tcg-accel-ops.c:62:36: error: use of undeclared identifier 'CF_CLUSTER_SHIFT'
62 | cflags = cpu->cluster_index << CF_CLUSTER_SHIFT;
| ^
accel/tcg/watchpoint.c:102:47: error: use of undeclared identifier 'CF_NOIRQ'
102 | cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
| ^
target/i386/helper.c:536:28: error: use of undeclared identifier 'CF_PCREL'
536 | if (tcg_cflags_has(cs, CF_PCREL)) {
| ^
target/rx/cpu.c:51:21: error: incomplete definition of type 'struct TranslationBlock'
51 | cpu->env.pc = tb->pc;
| ~~^
system/physmem.c:2977:9: error: call to undeclared function 'tb_invalidate_phys_range'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
2977 | tb_invalidate_phys_range(addr, addr + length - 1);
| ^
plugins/api.c:96:12: error: call to undeclared function 'tb_cflags'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
96 | return tb_cflags(tcg_ctx->gen_tb) & CF_MEMI_ONLY;
| ^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20241114011310.3615-5-philmd@linaro.org>
The TranslationBlock flags are defined in 'exec/translation-block.h'.
tcg_cflags_has/set() use them, it is more logical to declare them in
the same place. Move them there too.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20241212144430.66224-2-philmd@linaro.org>
"exec/confidential-guest-support.h" is specific to system
emulation, so move it under the system/ namespace.
Mechanical change doing:
$ sed -i \
-e 's,exec/confidential-guest-support.h,sysemu/confidential-guest-support.h,' \
$(git grep -l exec/confidential-guest-support.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20241218155913.72288-2-philmd@linaro.org>
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.
Files renamed manually then mechanical change using sed tool.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
"system/confidential-guest-support.h" is not needed,
remove it. Reorder #ifdef'ry to reduce declarations
exposed on user emulation.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20241218155913.72288-3-philmd@linaro.org>
Since commit 5286c3662294 ("target/i386: properly reset TSC on reset")
QEMU writes the special value of "1" to each online vCPU TSC on VM reset
to reset it.
However parked vCPUs don't get that handling and due to that their TSCs
get desynchronized when the VM gets reset.
This in turn causes KVM to turn off PVCLOCK_TSC_STABLE_BIT in its exported
PV clock.
Note that KVM has no understanding of vCPU being currently parked.
Without PVCLOCK_TSC_STABLE_BIT the sched clock is marked unstable in
the guest's kvm_sched_clock_init().
This causes a performance regressions to show in some tests.
Fix this issue by writing the special value of "1" also to TSCs of parked
vCPUs on VM reset.
Reproducing the issue:
1) Boot a VM with "-smp 2,maxcpus=3" or similar
2) device_add host-x86_64-cpu,id=vcpu,node-id=0,socket-id=0,core-id=2,thread-id=0
3) Wait a few seconds
4) device_del vcpu
5) Inside the VM run:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is 1.
6) Reboot the VM
7) Once the VM boots once again run inside it:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is now 0.
Fixes: 5286c3662294 ("target/i386: properly reset TSC on reset")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/5a605a88e9a231386dc803c60f5fed9b48108139.1734014926.git.maciej.szmigiero@oracle.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that all of the Property arrays are counted, we can remove
the terminator object from each array. Update the assertions
in device_class_set_props to match.
With struct Property being 88 bytes, this was a rather large
form of terminator. Saves 30k from qemu-system-aarch64.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Set the default NaN pattern explicitly, and remove the ifdef from
parts64_default_nan().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241202131347.498124-38-peter.maydell@linaro.org
Set the Float3NaNPropRule explicitly for i386. We had no
i386-specific behaviour in the old ifdef ladder, so we were using the
default "prefer a then b then c" fallback; this is actually the
correct per-the-spec handling for i386.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241202131347.498124-25-peter.maydell@linaro.org