These have very simple generators and no need for complex group
decoding. Apart from LAR/LSL which are simplified to use
gen_op_deposit_reg_v and movcond, the code is generally lifted
from translate.c into the generators.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is already partly implemented due to VLDMXCSR and VSTMXCSR; finish
the job.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The calculation of FrameTemp is done using the size indicated by mo_pushpop()
before being written back to EBP, but the final writeback to EBP is done using
the size indicated by mo_stacksize().
In the case where mo_pushpop() is MO_32 and mo_stacksize() is MO_16 then the
final writeback to EBP is done using MO_16 which can leave junk in the top
16-bits of EBP after executing ENTER.
Change the writeback of EBP to use the same size indicated by mo_pushpop() to
ensure that the full value is written back.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2198
Message-ID: <20240606095319.229650-5-mark.cave-ayland@ilande.co.uk>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3973615e7fbaeef1deeaa067577e373781ced70a)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The calculation of FrameTemp is done using the size indicated by mo_pushpop()
before being written back to EBP, but the final writeback to EBP is done using
the size indicated by mo_stacksize().
In the case where mo_pushpop() is MO_32 and mo_stacksize() is MO_16 then the
final writeback to EBP is done using MO_16 which can leave junk in the top
16-bits of EBP after executing ENTER.
Change the writeback of EBP to use the same size indicated by mo_pushpop() to
ensure that the full value is written back.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2198
Message-ID: <20240606095319.229650-5-mark.cave-ayland@ilande.co.uk>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
DISAS_NORETURN suppresses the work normally done by gen_eob(), and therefore
must be used in special cases only. Document them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
From vm entry to exit, VMRUN is handled as a single instruction. It
uses DISAS_NORETURN in order to avoid processing TF or RF before
the first instruction executes in the guest. However, the corresponding
handling is missing in vmexit. Add it, and at the same time reorganize
the comments with quotes from the manual about the tasks performed
by a #VMEXIT.
Another gen_eob() task that is missing in VMRUN is preparing the
HF_INHIBIT_IRQ flag for the next instruction, in this case by loading
it from the VMCB control state.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ICEBP generates a trap-like exception, while gen_exception() produces
a fault. Resurrect gen_update_eip_next() to implement the desired
semantics.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When preparing an exception stack frame for a fault exception, the value
pushed for RF is 1. Take that into account. The same should be true
of interrupts for repeated string instructions, but the situation there
is complicated.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
DisasContext.cpuid_ext_features indicates CPUID.01H.ECX.
Use DisasContext.cpuid_7_0_ecx_features field to check RDPID feature bit
(CPUID_7_0_ECX_RDPID).
Fixes: 6750485bf42a ("target/i386: implement RDPID in TCG")
Inspired-by: Xinyu Li <lixinyu20s@ict.ac.cn>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240603080723.1256662-1-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Intel SDM 18.3.1.4 "If an occurrence of the MOV or POP instruction
loads the SS register executes with EFLAGS.TF = 1, no single-step debug
exception occurs following the MOV or POP instruction."
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f0f0136abba688a6516647a79cc91e03fad6d5d7)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If EFLAGS.RF is 1, special processing in gen_eob_worker() is needed and
therefore goto_tb cannot be used.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 8225bff7c5db504f50e54ef66b079854635dba70)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reject 0x66/0xf3/0xf2 in front of them.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 40a3ec7b5ffde500789d016660a171057d6b467c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
According to the manual, 32-bit vs 64-bit is governed by REX.W
and REX ignores the 0x66 prefix. This can be confirmed with this
program:
#include <stdio.h>
int main()
{
int x = 0x12340000;
int y;
asm("popcntl %1, %0" : "=r" (y) : "r" (x)); printf("%x\n", y);
asm("mov $-1, %0; .byte 0x66; popcntl %1, %0" : "+r" (y) : "r" (x)); printf("%x\n", y);
asm("mov $-1, %0; .byte 0x66; popcntq %q1, %q0" : "+r" (y) : "r" (x)); printf("%x\n", y);
}
which prints 5/ffff0000/5 on real hardware and 5/ffff0000/ffff0000
on QEMU.
Cc: qemu-stable@nongnu.org
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 41c685dc59bb611096f3bb6a663cfa82e4cba97b)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: drop removal of mo_64_32() helper function in target/i386/tcg/translate.c
due to missing-in-9.0 v9.0.0-542-gaef4f4affde2
"target/i386: remove now-converted opcodes from old decoder"
which removed other user of it)
Do not bother generating inline wrappers for gen_repz and gen_repz2;
use s->prefix to separate REPZ from REPNZ in the case of SCAS and
CMPS.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Generalize gen_stack_A0() to include an initial add and to use an arbitrary
destination. This is a common pattern and it is not a huge burden to
add the extra arguments to the only caller of gen_stack_A0().
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use mo_stacksize for all stack accesses, including when
a 64-bit code segment is impossible and the code is
therefore checking only for SS32(s).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is only used in MONITOR, where a direct call of gen_lea_v_seg
is simpler, and in XLAT. Inline it in the latter.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The is_store argument of gen_ldst_modrm has only ever been passed
a constant. Just split the function in two.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Values other than OR_TMP0 were only ever used by MOV and MOVNTI
opcodes. Now that these have been converted to the new decoder,
remove the argument.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make gen_eob take the DISAS_* constant as an argument, so that
it is not necessary to have wrappers around it.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is an invariant now that there are no calls to gen_eob_inhibit_irq()
outside tb_stop.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
sti only has one exit, so it does not need to generate the
end-of-translation code inline. It can be deferred to tb_stop.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
syscall and sysret only have one exit, so they do not need to
generate the end-of-translation code inline. It can be
deferred to tb_stop.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Place DISAS_* constants that update cpu_eip first, and
the "jump" ones last. Add comments explaining the differences
and usage.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Mark cc_op as clean and do not spill it at the end of the translation block.
Technically this is a tiny bit less efficient, but:
* it results in translations that are a tiny bit smaller
* for most of these instructions, it is not unlikely that they are close to
the end of the basic block, in which case cc_op would not be overwritten
* anyway the cost is probably dwarfed by that of computing flags.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No need to set it again at the end of the translation block, cc_op_dirty
can be set to false.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is already handled in gen_eob(). Before adding another DISAS_*
case, remove the double calls.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
gen_helper_rsm cannot generate an exception, and reloads the flags.
So there's no need to spill cc_op and update cpu_eip, but on the
other hand cc_op must be reset to CC_OP_EFLAGS before returning.
It all works by chance, because by spilling cc_op before the call
to the helper, it becomes non-dirty and gen_eob will not overwrite
the CC_OP_EFLAGS value that is placed there by the helper. But
let's clean it up.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Intel SDM 18.3.1.4 "If an occurrence of the MOV or POP instruction
loads the SS register executes with EFLAGS.TF = 1, no single-step debug
exception occurs following the MOV or POP instruction."
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If EFLAGS.RF is 1, special processing in gen_eob_worker() is needed and
therefore goto_tb cannot be used.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Almost all of the disas_log implementations are identical.
Unify them within translator_loop.
Drop extra Priv/Virt logging from target/riscv.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
These are trivial to add, and moving them to the new decoder fixes some
corner cases: raising #UD instead of an instruction fetch page fault for
the undefined opcodes, and incorrectly rejecting 0F 18 prefetches with
register operands (which are treated as reserved NOPs).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reject 0x66/0xf3/0xf2 in front of them.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to the manual, 32-bit vs 64-bit is governed by REX.W
and REX ignores the 0x66 prefix. This can be confirmed with this
program:
#include <stdio.h>
int main()
{
int x = 0x12340000;
int y;
asm("popcntl %1, %0" : "=r" (y) : "r" (x)); printf("%x\n", y);
asm("mov $-1, %0; .byte 0x66; popcntl %1, %0" : "+r" (y) : "r" (x)); printf("%x\n", y);
asm("mov $-1, %0; .byte 0x66; popcntq %q1, %q0" : "+r" (y) : "r" (x)); printf("%x\n", y);
}
which prints 5/ffff0000/5 on real hardware and 5/ffff0000/ffff0000
on QEMU.
Cc: qemu-stable@nongnu.org
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The PCOMMIT instruction was never included in any physical processor.
TCG implements it as a no-op instruction, but its utility is debatable
to say the least. Drop it from the decoder since it is only available
with "-cpu max", which does not guarantee migration compatibility
across versions, and deprecate the property just in case someone is
using it as "pcommit=off".
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When emulated with QEMU, interrupts will never come in the following
loop. However, if the NOP instruction is uncommented, interrupts will
fire as normal.
loop:
cli
call do_sti
jmp loop
do_sti:
sti
# nop
ret
This behavior is different from that of a real processor. For example,
if KVM is enabled, interrupts will always fire regardless of whether the
NOP instruction is commented or not. Also, the Intel Software Developer
Manual states that after the STI instruction is executed, the interrupt
inhibit should end as soon as the next instruction (e.g., the RET
instruction if the NOP instruction is commented) is executed.
This problem is caused because the previous code may choose not to end
the TB even if the HF_INHIBIT_IRQ_MASK has just been reset (e.g., in the
case where the STI instruction is immediately followed by the RET
instruction), so that IRQs may not have a change to trigger. This commit
fixes the problem by always terminating the current TB to give IRQs a
chance to trigger when HF_INHIBIT_IRQ_MASK is reset.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
Message-ID: <20240415064518.4951-4-lrh2000@pku.edu.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 6a5a63f74ba5c5355b7a8468d3d814bfffe928fb)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Now that a bulk of opcodes go through the new decoder, it is sensible
to do some cleanup. Go immediately through disas_insn_new and only jump
back after parsing the prefixes.
disas_insn() now only contains the three sigsetjmp cases, and they
are more easily managed if they are inlined into i386_tr_translate_insn.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Split the bits that have some duplication with disas_insn_new, from
those that should be the main topic of the conversion. This is the
first step towards removing duplicate decoding of prefixes between
disas_insn and disas_insn_new.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These are unlikely to be converted to the table-based decoding
soon (perhaps there could be generic ESC decoding in decode-new.c.inc
for the Mod/RM byte, but not operand decoding), so keep them separate
from the remaining legacy-decoded instructions.
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Send all converted opcodes to disas_insn_new() directly from the big
decoding switch statement; once more, the debugging/bisecting logic
disappears.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A few two-byte opcodes are simple extensions of existing one-byte opcodes;
they are easy to decode and need no change to emit.c.inc. Port them to
the new decoder.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move long-displacement Jcc, SETcc and CMOVcc to the new decoder.
While filling in the tables makes the code seem longer, the new
emitters are all just one line of code.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since new opcodes are not going to be added in translate.c, round the
case labels that call to disas_insn_new(), including whole sets of
eight opcodes when possible.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The shift instructions are rewritten instead of reusing code from the old
decoder. Rotates use CC_OP_ADCOX more extensively and generally rely
more on the optimizer, so that the code generators are shared between
the immediate-count and variable-count cases.
In particular, this makes gen_RCL and gen_RCR pretty efficient for the
count == 1 case, which becomes (apart from a few extra movs) something like:
(compute_cc_all if needed)
// save old value for OF calculation
mov cc_src2, T0
// the bulk of RCL is just this!
deposit T0, cc_src, T0, 1, TARGET_LONG_BITS - 1
// compute carry
shr cc_dst, cc_src2, length - 1
and cc_dst, cc_dst, 1
// compute overflow
xor cc_src2, cc_src2, T0
extract cc_src2, cc_src2, length - 1, 1
32-bit MUL and IMUL are also slightly more efficient on 64-bit hosts.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the new decoder it is sometimes easier to put the segment
in T1 instead of T0, usually because another operand was loaded
by common code in T0. Genrealize gen_movl_seg_T0 to allow
using any source.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Compared to the old decoder, the main differences in translation
are for the little-used ARPL instruction. IMUL is adjusted a bit
to share more code to produce flags, but is otherwise very similar.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>