FRET-qemu

Author	SHA1	Message	Date
Zhao Liu	3568adc995	i386: Support modules_per_die in X86CPUTopoInfo Support module level in i386 cpu topology structure "X86CPUTopoInfo". Since x86 does not yet support the "modules" parameter in "-smp", X86CPUTopoInfo.modules_per_die is currently always 1. Therefore, the module level width in APIC ID, which can be calculated by "apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0 for now, so we can directly add APIC ID related helpers to support module level parsing. In addition, update topology structure in test-x86-topo.c. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-14-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 19:43:29 +02:00
Zhao Liu	81c392ab5c	i386: Introduce module level cpu topology to CPUX86State Intel CPUs implement module level on hybrid client products (e.g., ADL-N, MTL, etc) and E-core server products. A module contains a set of cores that share certain resources (in current products, the resource usually includes L2 cache, as well as module scoped features and MSRs). Module level support is the prerequisite for L2 cache topology on module level. With module level, we can implement the Guest's CPU topology and future cache topology to be consistent with the Host's on Intel hybrid client/E-core server platforms. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-13-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 19:43:29 +02:00
Zhao Liu	822bce9f58	i386/cpu: Decouple CPUID[0x1F] subleaf with specific topology level At present, the subleaf 0x02 of CPUID[0x1F] is bound to the "die" level. In fact, the specific topology level exposed in 0x1F depends on the platform's support for extension levels (module, tile and die). To help expose "module" level in 0x1F, decouple CPUID[0x1F] subleaf with specific topology level. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240424154929.1487382-12-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 19:43:29 +02:00
Zhao Liu	0f6ed7ba13	i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB] CPUID[0xB] defines SMT, Core and Invalid types, and this leaf is shared by Intel and AMD CPUs. But for extended topology levels, Intel CPU (in CPUID[0x1F]) and AMD CPU (in CPUID[0x80000026]) have the different definitions with different enumeration values. Though CPUID[0x80000026] hasn't been implemented in QEMU, to avoid possible misunderstanding, split topology types of CPUID[0x1F] from the definitions of CPUID[0xB] and introduce CPUID[0x1F]-specific topology types. Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-11-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 19:43:29 +02:00
Zhao Liu	6ddeb0ec8c	i386/cpu: Introduce bitmap to cache available CPU topology levels Currently, QEMU checks the specify number of topology domains to detect if there's extended topology levels (e.g., checking nr_dies). With this bitmap, the extended CPU topology (the levels other than SMT, core and package) could be easier to detect without touching the topology details. This is also in preparation for the follow-up to decouple CPUID[0x1F] subleaf with specific topology level. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240424154929.1487382-10-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 19:43:29 +02:00
Zhao Liu	2613747a79	i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid() In cpu_x86_cpuid(), there are many variables in representing the cpu topology, e.g., topo_info, cs->nr_cores and cs->nr_threads. Since the names of cs->nr_cores and cs->nr_threads do not accurately represent its meaning, the use of cs->nr_cores or cs->nr_threads is prone to confusion and mistakes. And the structure X86CPUTopoInfo names its members clearly, thus the variable "topo_info" should be preferred. In addition, in cpu_x86_cpuid(), to uniformly use the topology variable, replace env->dies with topo_info.dies_per_pkg as well. Suggested-by: Robert Hoo <robert.hu@linux.intel.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-9-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 19:43:29 +02:00
Zhao Liu	9a085c4b4a	i386/cpu: Use APIC ID info get NumSharingCache for CPUID[0x8000001D].EAX[bits 25:14] The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information for cpuid 0x8000001D") adds the cache topology for AMD CPU by encoding the number of sharing threads directly. From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14]) means [1]: The number of logical processors sharing this cache is the value of this field incremented by 1. To determine which logical processors are sharing a cache, determine a Share Id for each processor as follows: ShareId = LocalApicId >> log2(NumSharingCache+1) Logical processors with the same ShareId then share a cache. If NumSharingCache+1 is not a power of two, round it up to the next power of two. From the description above, the calculation of this field should be same as CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the offsets of APIC ID to calculate this field. [1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology Information Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240424154929.1487382-8-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 19:43:29 +02:00
Zhao Liu	88dd4ca06c	i386/cpu: Use APIC ID info to encode cache topo in CPUID[4] Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the nearest power-of-2 integer. The nearest power-of-2 integer can be calculated by pow2ceil() or by using APIC ID offset/width (like L3 topology using 1 << die_offset [3]). But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] are associated with APIC ID. For example, in linux kernel, the field "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not matched with actual core numbers and it's calculated by: "(1 << (pkg_offset - core_offset)) - 1". Therefore the topology information of APIC ID should be preferred to calculate nearest power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]: 1. d/i cache is shared in a core, 1 << core_offset should be used instead of "cs->nr_threads" in encode_cache_cpuid4() for CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]. 2. L2 cache is supposed to be shared in a core as for now, thereby 1 << core_offset should also be used instead of "cs->nr_threads" in encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14]. 3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be calculated with the bit width between the package and SMT levels in the APIC ID (1 << (pkg_offset - core_offset) - 1). In addition, use APIC ID bits calculations to replace "pow2ceil()" for cache_info_passthrough case. [1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec") [2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache") [3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support") Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently") Suggested-by: Robert Hoo <robert.hu@linux.intel.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-7-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 19:43:26 +02:00
Zhao Liu	12f6b8280f	i386/cpu: Fix i/d-cache topology to core level for Intel CPU For i-cache and d-cache, current QEMU hardcodes the maximum IDs for CPUs sharing cache (CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]) to 0, and this means i-cache and d-cache are shared in the SMT level. This is correct if there's single thread per core, but is wrong for the hyper threading case (one core contains multiple threads) since the i-cache and d-cache are shared in the core level other than SMT level. For AMD CPU, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information for cpuid 0x8000001D") has already introduced i/d cache topology as core level by default. Therefore, in order to be compatible with both multi-threaded and single-threaded situations, we should set i-cache and d-cache be shared at the core level by default. This fix changes the default i/d cache topology from per-thread to per-core. Potentially, this change in L1 cache topology may affect the performance of the VM if the user does not specifically specify the topology or bind the vCPU. However, the way to achieve optimal performance should be to create a reasonable topology and set the appropriate vCPU affinity without relying on QEMU's default topology structure. Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently") Suggested-by: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20240424154929.1487382-6-zhao1.liu@intel.com> [Add compat property. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 19:39:33 +02:00
Binbin Wu	0117067131	target/i386: add control bits support for LAM LAM uses CR3[61] and CR3[62] to configure/enable LAM on user pointers. LAM uses CR4[28] to configure/enable LAM on supervisor pointers. For CR3 LAM bits, no additional handling needed: - TCG LAM is not supported for TCG of target-i386. helper_write_crN() and helper_vmrun() check max physical address bits before calling cpu_x86_update_cr3(), no change needed, i.e. CR3 LAM bits are not allowed to be set in TCG. - gdbstub x86_cpu_gdb_write_register() will call cpu_x86_update_cr3() to update cr3. Allow gdb to set the LAM bit(s) to CR3, if vcpu doesn't support LAM, KVM_SET_SREGS will fail as other reserved bits. For CR4 LAM bit, its reservation depends on vcpu supporting LAM feature or not. - TCG LAM is not supported for TCG of target-i386. helper_write_crN() and helper_vmrun() check CR4 reserved bit before calling cpu_x86_update_cr4(), i.e. CR4 LAM bit is not allowed to be set in TCG. - gdbstub x86_cpu_gdb_write_register() will call cpu_x86_update_cr4() to update cr4. Mask out LAM bit on CR4 if vcpu doesn't support LAM. - x86_cpu_reset_hold() doesn't need special handling. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240112060042.19925-3-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 15:53:30 +02:00
Robert Hoo	ba67809059	target/i386: add support for LAM in CPUID enumeration Linear Address Masking (LAM) is a new Intel CPU feature, which allows software to use of the untranslated address bits for metadata. The bit definition: CPUID.(EAX=7,ECX=1):EAX[26] Add CPUID definition for LAM. Note LAM feature is not supported for TCG of target-i386, LAM CPIUD bit will not be added to TCG_7_1_EAX_FEATURES. More info can be found in Intel ISE Chapter "LINEAR ADDRESS MASKING(LAM)" https://cdrdv2.intel.com/v1/dl/getContent/671368 Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240112060042.19925-2-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 15:53:30 +02:00
Paolo Bonzini	ec56891984	target/i386: clean up AAM/AAD The 32-bit AAM/AAD opcodes are using helpers that read and write flags and env->regs[R_EAX]. Clean them up so that the table correctly includes AX as a 16-bit input and output. No real reason to do it to be honest, but they are nice one-output helpers and it removes the masking of env->regs[R_EAX] that generic load/writeback code already does. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240522123912.608497-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 15:53:30 +02:00
Paolo Bonzini	d0414d71f6	target/i386: generate simpler code for ROL/ROR with immediate count gen_rot_carry and gen_rot_overflow are meant to be called with count == NULL if the count cannot be zero. However this is not done in gen_ROL and gen_ROR, and writing everywhere "can_be_zero ? count : NULL" is burdensome and less readable. Just pass can_be_zero as a separate argument. gen_RCL and gen_RCR use a conditional branch to skip the computation if count is zero, so they can pass false unconditionally to gen_rot_overflow. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240522123914.608516-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-22 15:53:30 +02:00
Richard Henderson	dfc7228be3	target/i386: Use translator_ldub for everything Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-05-15 08:55:19 +02:00
Richard Henderson	962a145cdc	accel/tcg: Provide default implementation of disas_log Almost all of the disas_log implementations are identical. Unify them within translator_loop. Drop extra Priv/Virt logging from target/riscv. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-05-15 08:55:18 +02:00
Paolo Bonzini	1b1badf3c5	i386: select correct components for no-board build The local APIC is a part of the CPU and has callbacks that are invoked from multiple accelerators. The IOAPIC on the other hand is optional, but ioapic_eoi_broadcast is used by common x86 code to implement the IOAPIC's implicit EOI mode. Add a stub in case the IOAPIC device is not included but the APIC is. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240509170044.190795-13-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-10 15:45:15 +02:00
Paolo Bonzini	fe01af5d47	target/i386: fix feature dependency for WAITPKG The VMX feature bit depends on general availability of WAITPKG, not the other way round. Fixes: 33cc88261c3 ("target/i386: add support for VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE", 2023-08-28) Cc: qemu-stable@nongnu.org Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-10 15:45:14 +02:00
Paolo Bonzini	3fabbe0b7d	target/i386: move prefetch and multi-byte UD/NOP to new decoder These are trivial to add, and moving them to the new decoder fixes some corner cases: raising #UD instead of an instruction fetch page fault for the undefined opcodes, and incorrectly rejecting 0F 18 prefetches with register operands (which are treated as reserved NOPs). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-10 15:45:14 +02:00
Paolo Bonzini	40a3ec7b5f	target/i386: rdpkru/wrpkru are no-prefix instructions Reject 0x66/0xf3/0xf2 in front of them. Cc: qemu-stable@nongnu.org Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-10 15:45:14 +02:00
Paolo Bonzini	41c685dc59	target/i386: fix operand size for DATA16 REX.W POPCNT According to the manual, 32-bit vs 64-bit is governed by REX.W and REX ignores the 0x66 prefix. This can be confirmed with this program: #include <stdio.h> int main() { int x = 0x12340000; int y; asm("popcntl %1, %0" : "=r" (y) : "r" (x)); printf("%x\n", y); asm("mov $-1, %0; .byte 0x66; popcntl %1, %0" : "+r" (y) : "r" (x)); printf("%x\n", y); asm("mov $-1, %0; .byte 0x66; popcntq %q1, %q0" : "+r" (y) : "r" (x)); printf("%x\n", y); } which prints 5/ffff0000/5 on real hardware and 5/ffff0000/ffff0000 on QEMU. Cc: qemu-stable@nongnu.org Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-10 15:45:14 +02:00
Paolo Bonzini	9f07e47a5e	target/i386: remove PCOMMIT from TCG, deprecate property The PCOMMIT instruction was never included in any physical processor. TCG implements it as a no-op instruction, but its utility is debatable to say the least. Drop it from the decoder since it is only available with "-cpu max", which does not guarantee migration compatibility across versions, and deprecate the property just in case someone is using it as "pcommit=off". Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-10 15:45:14 +02:00
Ruihan Li	97ffb29998	target/i386: Give IRQs a chance when resetting HF_INHIBIT_IRQ_MASK When emulated with QEMU, interrupts will never come in the following loop. However, if the NOP instruction is uncommented, interrupts will fire as normal. loop: cli call do_sti jmp loop do_sti: sti # nop ret This behavior is different from that of a real processor. For example, if KVM is enabled, interrupts will always fire regardless of whether the NOP instruction is commented or not. Also, the Intel Software Developer Manual states that after the STI instruction is executed, the interrupt inhibit should end as soon as the next instruction (e.g., the RET instruction if the NOP instruction is commented) is executed. This problem is caused because the previous code may choose not to end the TB even if the HF_INHIBIT_IRQ_MASK has just been reset (e.g., in the case where the STI instruction is immediately followed by the RET instruction), so that IRQs may not have a change to trigger. This commit fixes the problem by always terminating the current TB to give IRQs a chance to trigger when HF_INHIBIT_IRQ_MASK is reset. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Message-ID: <20240415064518.4951-4-lrh2000@pku.edu.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 6a5a63f74ba5c5355b7a8468d3d814bfffe928fb) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2024-05-09 16:35:02 +03:00
Philippe Mathieu-Daudé	8b4d80bb53	misc: Use QEMU header path relative to include/ directory QEMU headers are relative to the include/ directory, not to the project root directory. Remove "include/". See also: https://www.qemu.org/docs/master/devel/style.html#include-directives Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240507142737.95735-1-philmd@linaro.org>	2024-05-09 00:07:21 +02:00
Paolo Bonzini	d4e6d40c36	target/i386: remove duplicate prefix decoding Now that a bulk of opcodes go through the new decoder, it is sensible to do some cleanup. Go immediately through disas_insn_new and only jump back after parsing the prefixes. disas_insn() now only contains the three sigsetjmp cases, and they are more easily managed if they are inlined into i386_tr_translate_insn. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	ef309ec2a6	target/i386: split legacy decoder into a separate function Split the bits that have some duplication with disas_insn_new, from those that should be the main topic of the conversion. This is the first step towards removing duplicate decoding of prefixes between disas_insn and disas_insn_new. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	7795b455dd	target/i386: decode x87 instructions in a separate function These are unlikely to be converted to the table-based decoding soon (perhaps there could be generic ESC decoding in decode-new.c.inc for the Mod/RM byte, but not operand decoding), so keep them separate from the remaining legacy-decoded instructions. Acked-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	aef4f4affd	target/i386: remove now-converted opcodes from old decoder Send all converted opcodes to disas_insn_new() directly from the big decoding switch statement; once more, the debugging/bisecting logic disappears. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	3519b813e1	target/i386: port extensions of one-byte opcodes to new decoder A few two-byte opcodes are simple extensions of existing one-byte opcodes; they are easy to decode and need no change to emit.c.inc. Port them to the new decoder. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	37861fa519	target/i386: move BSWAP to new decoder Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	2b8046f361	target/i386: move remaining conditional operations to new decoder Move long-displacement Jcc, SETcc and CMOVcc to the new decoder. While filling in the tables makes the code seem longer, the new emitters are all just one line of code. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	40c4b92f1c	target/i386: merge and enlarge a few ranges for call to disas_insn_new Since new opcodes are not going to be added in translate.c, round the case labels that call to disas_insn_new(), including whole sets of eight opcodes when possible. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	d7c41a60d0	target/i386: move C0-FF opcodes to new decoder (except for x87) The shift instructions are rewritten instead of reusing code from the old decoder. Rotates use CC_OP_ADCOX more extensively and generally rely more on the optimizer, so that the code generators are shared between the immediate-count and variable-count cases. In particular, this makes gen_RCL and gen_RCR pretty efficient for the count == 1 case, which becomes (apart from a few extra movs) something like: (compute_cc_all if needed) // save old value for OF calculation mov cc_src2, T0 // the bulk of RCL is just this! deposit T0, cc_src, T0, 1, TARGET_LONG_BITS - 1 // compute carry shr cc_dst, cc_src2, length - 1 and cc_dst, cc_dst, 1 // compute overflow xor cc_src2, cc_src2, T0 extract cc_src2, cc_src2, length - 1, 1 32-bit MUL and IMUL are also slightly more efficient on 64-bit hosts. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	b603136402	target/i386: generalize gen_movl_seg_T0 In the new decoder it is sometimes easier to put the segment in T1 instead of T0, usually because another operand was loaded by common code in T0. Genrealize gen_movl_seg_T0 to allow using any source. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:26 +02:00
Paolo Bonzini	5e9e21bcc4	target/i386: move 60-BF opcodes to new decoder Compared to the old decoder, the main differences in translation are for the little-used ARPL instruction. IMUL is adjusted a bit to share more code to produce flags, but is otherwise very similar. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:53:14 +02:00
Paolo Bonzini	2666fbd271	target/i386: allow instructions with more than one immediate While keeping decode->immediate for convenience and for 4-operand instructions, store the immediate in X86DecodedOp as well. This enables instructions with more than one immediate such as ENTER. It can also be used for far calls and jumps. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:52:19 +02:00
Paolo Bonzini	442e38c4fb	target/i386: extract gen_far_call/jmp, reordering temporaries Extract the code into new functions, and swap T0/T1 so that T0 corresponds to the first immediate in the instruction stream. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:52:19 +02:00
Paolo Bonzini	cc1d28bdbe	target/i386: move 00-5F opcodes to new decoder Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:52:19 +02:00
Paolo Bonzini	445457693c	target/i386: reintroduce debugging mechanism Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:52:12 +02:00
Paolo Bonzini	8b5de7ea56	target/i386: cleanup gen_eob Create a new wrapper for syscall/sysret, and do not go through multiple layers of wrappers. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:52:09 +02:00
Paolo Bonzini	ccfabc00e0	target/i386: clarify the "reg" argument of functions returning CCPrepare Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:52:02 +02:00
Paolo Bonzini	89e4e65ac0	target/i386: do not use s->T0 and s->T1 as scratch registers for CCPrepare Instead of using s->T0 or s->T1, create a scratch register when computing the C, NC, L or LE conditions. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:51:59 +02:00
Paolo Bonzini	bccb0c138e	target/i386: extend cc_* when using them to compute flags Instead of using s->tmp0 or s->tmp4 as the result, just extend the cc_* registers in place. It is harmless and, if multiple setcc instructions are used, the optimizer will be able to remove the redundant ones. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:51:53 +02:00
Paolo Bonzini	dd17322be7	target/i386: pull cc_op update to callers of gen_jmp_rel{,_csize} gen_update_cc_op must be called before control flow splits. Doing it in gen_jmp_rel{,_csize} may hide bugs, instead assert that cc_op is clean---even if that means a few more calls to gen_update_cc_op(). With this new invariant, setting cc_op to CC_OP_DYNAMIC is unnecessary since the caller should have done it. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:51:43 +02:00
Paolo Bonzini	bbba9594e8	target/i386: cleanup cc_op changes for REP/REPZ/REPNZ gen_update_cc_op must be called before control flow splits. Do it where the jump on ECX!=0 is translated. On the other hand, remove the call before gen_jcc1, which takes care of it already, and explain why REPZ/REPNZ need not use CC_OP_DYNAMIC---the translation block ends before any control-flow-dependent cc_op could be observed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:51:31 +02:00
Paolo Bonzini	64ddadc6bb	target/i386: cc_op is not dynamic in gen_jcc1 Resetting cc_op to CC_OP_DYNAMIC should be done at control flow junctions, which is not the case here. This translation block is ending and the only effect of calling set_cc_op() would be a discard of s->cc_srcT. This discard is useless (it's a temporary, not a global) and in fact prevents gen_prepare_cc from returning s->cc_srcT. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:51:17 +02:00
Paolo Bonzini	e995f3f944	target/i386: remove mask from CCPrepare With the introduction of TSTEQ and TSTNE the .mask field is always -1, so remove all the now-unnecessary code. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:50:39 +02:00
Paolo Bonzini	9309b53e83	target/i386: use TSTEQ/TSTNE to check flags The new conditions obviously come in handy when testing individual bits of EFLAGS, and they make it possible to remove the .mask field of CCPrepare. Lowering to shift+and is done by the optimizer if necessary. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:50:39 +02:00
Paolo Bonzini	15957eb9ef	target/i386: use TSTEQ/TSTNE to test low bits When testing the sign bit or equality to zero of a partial register, it is useful to use a single TSTEQ or TSTNE operation. It can also be used to test the parity flag, using bit 0 of the population count. Do not do this for target_ulong-sized values however; the optimizer would produce a comparison against zero anyway, and it avoids shifts by 64 which are undefined behavior. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:50:39 +02:00
Babu Moger	b776569a53	target/i386: Fix CPUID encoding of Fn8000001E_ECX Observed the following failure while booting the SEV-SNP guest and the guest fails to boot with the smp parameters: "-smp 192,sockets=1,dies=12,cores=8,threads=2". qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 fw_error=22 'Invalid parameter' qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 0x8000001e, index: 0x0. provided: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000b00, edx: 0x00000000 expected: eax:0x00000000, ebx: 0x00000100, ecx: 0x00000300, edx: 0x00000000 qemu-system-x86_64: SEV-SNP: failed update CPUID page Reason for the failure is due to overflowing of bits used for "Node per processor" in CPUID Fn8000001E_ECX. This field's width is 3 bits wide and can hold maximum value 0x7. With dies=12 (0xB), it overflows and spills over into the reserved bits. In the case of SEV-SNP, this causes CPUID enforcement failure and guest fails to boot. The PPR documentation for CPUID_Fn8000001E_ECX [Node Identifiers] ================================================================= Bits Description 31:11 Reserved. 10:8 NodesPerProcessor: Node per processor. Read-only. ValidValues: Value Description 0h 1 node per processor. 7h-1h Reserved. 7:0 NodeId: Node ID. Read-only. Reset: Fixed,XXh. ================================================================= As in the spec, the valid value for "node per processor" is 0 and rest are reserved. Looking back at the history of decoding of CPUID_Fn8000001E_ECX, noticed that there were cases where "node per processor" can be more than 1. It is valid only for pre-F17h (pre-EPYC) architectures. For EPYC or later CPUs, the linux kernel does not use this information to build the L3 topology. Also noted that the CPUID Function 0x8000001E_ECX is available only when TOPOEXT feature is enabled. This feature is enabled only for EPYC(F17h) or later processors. So, previous generation of processors do not not enumerate 0x8000001E_ECX leaf. There could be some corner cases where the older guests could enable the TOPOEXT feature by running with -cpu host, in which case legacy guests might notice the topology change. To address those cases introduced a new CPU property "legacy-multi-node". It will be true for older machine types to maintain compatibility. By default, it will be false, so new decoding will be used going forward. The documentation is taken from Preliminary Processor Programming Reference (PPR) for AMD Family 19h Model 11h, Revision B1 Processors 55901 Rev 0.25 - Oct 6, 2022. Cc: qemu-stable@nongnu.org Fixes: 31ada106d891 ("Simplify CPUID_8000_001E for AMD") Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Message-ID: <0ee4b0a8293188a53970a2b0e4f4ef713425055e.1714757834.git.babu.moger@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2024-05-07 08:50:38 +02:00
Richard Henderson	873f9ca385	Accelerator patches - Extract page-protection definitions to page-protection.h - Rework in accel/tcg in preparation of extracting TCG fields from CPUState - More uses of get_task_state() in user emulation - Xen refactors in preparation for adding multiple map caches (Juergen & Edgar) - MAINTAINERS updates (Aleksandar and Bin) -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmY40CAACgkQ4+MsLN6t wN5drxAA1oIsuUzpAJmlMIxZwlzbICiuexgn/HH9DwWNlrarKo7V1l4YB8jd9WOg IKuj7c39kJKsDEB8BXApYwcly+l7DYdnAAI8Z7a+eN+ffKNl/0XBaLjsGf58RNwY fb39/cXWI9ZxKxsHMSyjpiu68gOGvZ5JJqa30Fr+eOGuug9Fn/fOe1zC6l/dMagy Dnym72stpD+hcsN5sVwohTBIk+7g9og1O/ctRx6Q3ZCOPz4p0+JNf8VUu43/reaR 294yRK++JrSMhOVFRzP+FH1G25NxiOrVCFXZsUTYU+qPDtdiKtjH1keI/sk7rwZ7 U573lesl7ewQFf1PvMdaVf0TrQyOe6kUGr9Mn2k8+KgjYRAjTAQk8V4Ric/+xXSU 0rd7Cz7lyQ8jm0DoOElROv+lTDQs4dvm3BopF3Bojo4xHLHd3SFhROVPG4tvGQ3H 72Q5UPR2Jr2QZKiImvPceUOg0z5XxoN6KRUkSEpMFOiTRkbwnrH59z/qPijUpe6v 8l5IlI9GjwkL7pcRensp1VC6e9KC7F5Od1J/2RLDw3UQllMQXqVw2bxD3CEtDRJL QSZoS4d1jUCW4iAYdqh/8+2cOIPiCJ4ai5u7lSdjrIJkRErm32FV/pQLZauoHlT5 eTPUgzDoRXVgI1X1slTpVXlEEvRNbhZqSkYLkXr80MLn5hTafo0= =3Qkg -----END PGP SIGNATURE----- Merge tag 'accel-20240506' of https://github.com/philmd/qemu into staging Accelerator patches - Extract page-protection definitions to page-protection.h - Rework in accel/tcg in preparation of extracting TCG fields from CPUState - More uses of get_task_state() in user emulation - Xen refactors in preparation for adding multiple map caches (Juergen & Edgar) - MAINTAINERS updates (Aleksandar and Bin) # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmY40CAACgkQ4+MsLN6t # wN5drxAA1oIsuUzpAJmlMIxZwlzbICiuexgn/HH9DwWNlrarKo7V1l4YB8jd9WOg # IKuj7c39kJKsDEB8BXApYwcly+l7DYdnAAI8Z7a+eN+ffKNl/0XBaLjsGf58RNwY # fb39/cXWI9ZxKxsHMSyjpiu68gOGvZ5JJqa30Fr+eOGuug9Fn/fOe1zC6l/dMagy # Dnym72stpD+hcsN5sVwohTBIk+7g9og1O/ctRx6Q3ZCOPz4p0+JNf8VUu43/reaR # 294yRK++JrSMhOVFRzP+FH1G25NxiOrVCFXZsUTYU+qPDtdiKtjH1keI/sk7rwZ7 # U573lesl7ewQFf1PvMdaVf0TrQyOe6kUGr9Mn2k8+KgjYRAjTAQk8V4Ric/+xXSU # 0rd7Cz7lyQ8jm0DoOElROv+lTDQs4dvm3BopF3Bojo4xHLHd3SFhROVPG4tvGQ3H # 72Q5UPR2Jr2QZKiImvPceUOg0z5XxoN6KRUkSEpMFOiTRkbwnrH59z/qPijUpe6v # 8l5IlI9GjwkL7pcRensp1VC6e9KC7F5Od1J/2RLDw3UQllMQXqVw2bxD3CEtDRJL # QSZoS4d1jUCW4iAYdqh/8+2cOIPiCJ4ai5u7lSdjrIJkRErm32FV/pQLZauoHlT5 # eTPUgzDoRXVgI1X1slTpVXlEEvRNbhZqSkYLkXr80MLn5hTafo0= # =3Qkg # -----END PGP SIGNATURE----- # gpg: Signature made Mon 06 May 2024 05:42:08 AM PDT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] * tag 'accel-20240506' of https://github.com/philmd/qemu: (28 commits) MAINTAINERS: Update my email address MAINTAINERS: Update Aleksandar Rikalo email system: Pass RAM MemoryRegion and is_write in xen_map_cache() xen: mapcache: Break out xen_map_cache_init_single() xen: mapcache: Break out xen_invalidate_map_cache_single() xen: mapcache: Refactor xen_invalidate_map_cache_entry_unlocked xen: mapcache: Refactor xen_replace_cache_entry_unlocked xen: mapcache: Break out xen_ram_addr_from_mapcache_single xen: mapcache: Refactor xen_remap_bucket for multi-instance xen: mapcache: Refactor xen_map_cache for multi-instance xen: mapcache: Refactor lock functions for multi-instance xen: let xen_ram_addr_from_mapcache() return -1 in case of not found entry system: let qemu_map_ram_ptr() use qemu_ram_ptr_length() user: Use get_task_state() helper user: Declare get_task_state() once in 'accel/tcg/vcpu-state.h' user: Forward declare TaskState type definition accel/tcg: Move @plugin_mem_cbs from CPUState to CPUNegativeOffsetState accel/tcg: Restrict cpu_plugin_mem_cbs_enabled() to TCG accel/tcg: Restrict qemu_plugin_vcpu_exit_hook() to TCG plugins accel/tcg: Update CPUNegativeOffsetState::can_do_io field documentation ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-05-06 10:19:10 -07:00

... 3 4 5 6 7 ...

2153 Commits