Move CPU TLB related methods to "exec/cputlb.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20241114011310.3615-19-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move CPU TLB related methods to "exec/cputlb.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20241114011310.3615-17-philmd@linaro.org>
AccelCPUClass is for accelerator to initialize target specific
features of a vCPU. Not really related to hardware emulation,
rename "hw/core/accel-cpu.h" as "accel/accel-cpu-target.h"
(using the explicit -target suffix).
More importantly, target specific header often access the
target specific definitions which are in each target/FOO/cpu.h
header, usually included generically as "cpu.h" relative to
target/FOO/. However, there is already a "cpu.h" in hw/core/
which takes precedence. This change allows "accel-cpu-target.h"
to include a target "cpu.h".
Mechanical change doing:
$ git mv include/hw/core/accel-cpu.h \
include/accel/accel-cpu-target.h
$ sed -i -e 's,hw/core/accel-cpu.h,accel/accel-cpu-target.h,' \
$(git grep -l hw/core/accel-cpu.h)
and renaming header guard 'ACCEL_CPU_TARGET_H'.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-12-philmd@linaro.org>
TCGCPUOps structure makes more sense in the accelerator context
rather than hardware emulation. Move it under the accel/tcg/ scope.
Mechanical change doing:
$ sed -i -e 's,hw/core/tcg-cpu-ops.h,accel/tcg/cpu-ops.h,g' \
$(git grep -l hw/core/tcg-cpu-ops.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-11-philmd@linaro.org>
The definition of which floatx80 encodings are invalid is
target-specific. Currently we handle this with an ifdef, but we
would like to defer this decision to runtime. In preparation, pass a
float_status argument to floatx80_invalid_encoding().
We will change the implementation from ifdef to looking at
the status argument in the following commit.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250224111524.1101196-7-peter.maydell@linaro.org
Unlike the other float formats, whether a floatx80 value is
considered to be an Infinity is target-dependent. (On x86 if the
explicit integer bit is clear this is a "pseudo-infinity" and not a
valid infinity; m68k does not care about the value of the integer
bit.)
Currently we select this target-specific logic at compile time with
an ifdef. We're going to want to do this at runtime, so change the
floatx80_is_infinity() function to take a float_status.
This commit doesn't change any logic; we'll do that in the
next commit.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250224111524.1101196-5-peter.maydell@linaro.org
The global const floatx80_infinity is (unlike all the other
float*_infinity values) target-specific, because whether the explicit
Integer bit is set or not varies between m68k and i386. We want to
be able to compile softfloat once for multiple targets, so we can't
continue to use a single global whose value needs to be different
between targets.
Replace the direct uses of floatx80_infinity in target/i386 with
calls to the new floatx80_default_inf() function. Note that because
we can ask the function for either a negative or positive infinity,
we don't need to change the sign of a positive infinity via
floatx80_chs() for the negative-Inf case.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250224111524.1101196-4-peter.maydell@linaro.org
Message-id: 20250217125055.160887-4-peter.maydell@linaro.org
Currently we handle flushing of output denormals in uncanon_normal
always before we deal with rounding. This works for architectures
that detect tininess before rounding, but is usually not the right
place when the architecture detects tininess after rounding. For
example, for x86 the SDM states that the MXCSR FTZ control bit causes
outputs to be flushed to zero "when it detects a floating-point
underflow condition". This means that we mustn't flush to zero if
the input is such that after rounding it is no longer tiny.
At least one of our guest architectures does underflow detection
after rounding but flushing of denormals before rounding (MIPS MSA);
this means we need to have a config knob for this that is separate
from our existing tininess_before_rounding setting.
Add an ftz_detection flag. For consistency with
tininess_before_rounding, we make it default to "detect ftz after
rounding"; this means that we need to explicitly set the flag to
"detect ftz before rounding" on every existing architecture that sets
flush_to_zero, so that this commit has no behaviour change.
(This means more code change here but for the long term a less
confusing API.)
For several architectures the current behaviour is either
definitely or possibly wrong; annotate those with TODO comments.
These architectures are definitely wrong (and should detect
ftz after rounding):
* x86
* Alpha
For these architectures the spec is unclear:
* MIPS (for non-MSA)
* RX
* SH4
PA-RISC makes ftz detection IMPDEF, but we aren't setting the
"tininess before rounding" setting that we ought to.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
In commit 8adcff4ae7 ("fpu: handle raising Invalid for infzero in
pick_nan_muladd") we changed the handling of 0 * Inf + QNaN to always
raise the Invalid exception regardless of target architecture. (This
was a change affecting hppa, i386, sh4 and tricore.) However, this
was incorrect for i386, which documents in the SDM section 14.5.2
that for the 0 * Inf + NaN case that it will only raise the Invalid
exception when the input is an SNaN. (This is permitted by the IEEE
754-2008 specification, which documents that whether we raise Invalid
for 0 * Inf + QNaN is implementation defined.)
Adjust the softfloat pick_nan_muladd code to allow the target to
suppress the raising of Invalid for the inf * zero + NaN case (as an
extra flag orthogonal to its choice for when to use the default NaN),
and enable that for x86.
We do not revert here the behaviour change for hppa, sh4 or tricore:
* The sh4 manual is clear that it should signal Invalid
* The tricore manual is a bit vague but doesn't say it shouldn't
* The hppa manual doesn't talk about fused multiply-add corner
cases at all
Cc: qemu-stable@nongnu.org
Fixes: 8adcff4ae7 (""fpu: handle raising Invalid for infzero in pick_nan_muladd")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20250116112536.4117889-2-peter.maydell@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Our float_flag_output_denormal exception flag is set when
the fpu code flushes an output denormal to zero. Rename
it to float_flag_output_denormal_flushed:
* this keeps it parallel with the flag for flushing
input denormals, which we just renamed
* it makes it clearer that it doesn't mean "set when
the output is a denormal"
Commit created with
for f in `git grep -l float_flag_output_denormal`; do sed -i -e 's/float_flag_output_denormal/float_flag_output_denormal_flushed/' $f; done
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-21-peter.maydell@linaro.org
Our float_flag_input_denormal exception flag is set when the fpu code
flushes an input denormal to zero. This is what many guest
architectures (eg classic Arm behaviour) require, but it is not the
only donarmal-related reason we might want to set an exception flag.
The x86 behaviour (which we do not currently model correctly) wants
to see an exception flag when a denormal input is *not* flushed to
zero and is actually used in an arithmetic operation. Arm's FEAT_AFP
also wants these semantics.
Rename float_flag_input_denormal to float_flag_input_denormal_flushed
to make it clearer when it is set and to allow us to add a new
float_flag_input_denormal_used next to it for the x86/FEAT_AFP
semantics.
Commit created with
for f in `git grep -l float_flag_input_denormal`; do sed -i -e 's/float_flag_input_denormal/float_flag_input_denormal_flushed/' $f; done
and manual editing of softfloat-types.h and softfloat.c to clean
up the indentation afterwards and to fix a comment which wasn't
using the full name of the flag.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250124162836.2332150-20-peter.maydell@linaro.org
For updates to implicit registers (RCX in LOOP instructions, RSI or RDI
in string instructions, or the stack pointer) do the add directly using
the registers (with no temporary) if 32-bit or 64-bit, or use a temporary
created for the occasion if 16-bit. This is more efficient and removes
move instructions for the MO_TL case.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-14-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that everything has been cleaned up, look at DF and prefixes
in a single function, and call that one from gen_repz and gen_repz_nz.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is a common operation that is executed many times in rep
movs or rep stos loops. It can improve performance by several
percentage points.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241215090613.89588-13-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use a TCG loop so that it is not necessary to go through the setup steps
of REP and through the I/O check on every iteration. Interestingly, this
is not a particularly effective optimization on its own, though it avoids
the cost of correct RF emulation that was added in the previous patch.
The main benefit lies in allowing the hoisting of loop invariants outside
the loop, which will happen separately.
The loop exits when the low 16 bits of CX/ECX/RCX are zero (so generally
speaking the string operation runs in 65536 iteration batches) to give
the main loop an opportunity to pick up interrupts.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-12-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In a repeated string operation, CX/ECX will be decremented until it
is 0 but never underflow. Use this observation to avoid a deposit or
zero-extend operation if the address size of the operation is smaller
than MO_TL.
As in the previous patch, the patch is structured to include some
preparatory work for subsequent changes. In particular, introducing
cx_next prepares for when ECX will be decremented *before* calling
fn(s, ot), and therefore cannot yet be written back to cpu_regs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-11-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Explicitly generate a TSTEQ branch (which is optimized to NE x,0 if possible).
This does not make much sense yet, but later we will add more checks and some
will use a temporary to check on the decremented value of CX/ECX/RCX; it will
be clearer for all checks to share the same logic using TSTEQ(reg, cx_mask).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-10-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since the cost of gen_update_cc_op() must be paid anyway, it's easier
to place them manually and not rely on spilling that is buried under
multiple levels of function calls. While at it, clarify the circumstances
in which the gen_update_cc_op() is needed, and why it is not for REPxx
SCAS and REPxx CMPS.
And since cc_op will have been spilled at the point of a fault, just
make the whole insn CC_OP_DYNAMIC. Once repz_opt is reintroduced,
a fault could happen either before or after the first execution of
CMPS/SCAS, and CC_OP_DYNAMIC sidesteps the complicated matter of what
x86_restore_state_to_opc would do.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241215090613.89588-9-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
RF must be set on traps and interrupts from a string instruction,
except if they occur after the last iteration. Ensure it is set
before giving the main loop a chance to execute.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-8-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Allow using them in the code that translates REP/REPZ, without
forward declarations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-7-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The condition for optimizing repeat instruction is more or less the
opposite of what you imagine: almost always the string instruction
was _not_ optimized and optimizing the loop relied on goto_tb.
This is obviously not great for performance, due to the cost of the
exit-to-main-loop check, but also wrong. In fact, after expanding
dc->jmp_opt and simplifying "!!x" to "x", the condition for looping used
to be:
((cflags & CF_NO_GOTO_TB) ||
(flags & (HF_RF_MASK | HF_TF_MASK | HF_INHIBIT_IRQ_MASK))) && !(cflags & CF_USE_ICOUNT)
In other words, setting aside RF (it requires special handling for REP
instructions and it was completely missing), repeat instruction were
being optimized if TF or inhibit IRQ flags were set. This is certainly
wrong for TF, because string instructions trap after every execution,
and probably for interrupt shadow too.
Get rid of repz_opt completely. The next patches will reintroduce the
optimization, applying it in the common case instead of the unlikely
and wrong one.
While at it, place the CX/ECX/RCX=0 case is at the end of the function,
which saves a label and is clearer when reading the generated ops.
For clarity, mark the cc_op explicitly as DYNAMIC even if at the end
of the translation block; the cc_op can come from either the previous
instruction or the string instruction, and currently we rely on
a gen_update_cc_op() that is hidden in the bowels of gen_jcc() to
spill cc_op and mark it clean.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-6-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The same "if" is present in all generator functions for string instructions.
Push it inside gen_repz() and gen_repz_nz() instead.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20241215090613.89588-5-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It only differs in a single call to gen_jcc, so use a "bool" argument
to distinguish the two cases; do not duplicate code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-4-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is not needed anymore now that gen_jcc has been eliminated
(merged into the similarly-named gen_Jcc, where the uppercase letter
gives away that it is an emission function).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-3-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The code of gen_Jcc is very similar to gen_LOOP* and gen_JCXZ, but this
is hidden by gen_jcc.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/20241215090613.89588-2-pbonzini@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Do not reference TCG_TARGET_HAS_* directly.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Avoid direct usage of TCG_TARGET_deposit_*_valid.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There are duplicated code to setup the value of MSR_CORE_THREAD_COUNT.
Extract a common function for it.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20241219110125.1266461-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using a sextract or extract operation is only necessary if a
sign or zero extended value is needed. If not, a shift is
enough.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because BT does not write back to the source operand, it can modify it to
ensure that one of the operands of TSTNE is a constant (after either gen_BT
or the optimizer's constant propagation). This produces better and more
optimizable TCG ops. For example, the sequence
movl $0x60013f, %ebx
btl %ecx, %ebx
becomes just
and_i32 tmp1,ecx,$0x1f dead: 1 2 pref=0xffff
shr_i32 tmp0,$0x60013f,tmp1 dead: 1 2 pref=0xffff
and_i32 tmp16,tmp0,$0x1 dead: 1 pref=0xbf80
On s390x, it can use four instructions to isolate bit 0 of 0x60013f >> (ecx & 31):
nilf %r12, 0x1f
lgfi %r11, 0x60013f
srlk %r12, %r11, 0(%r12)
nilf %r12, 1
Previously, it used five instructions to build 1 << (ecx & 31) and compute
TSTEQ, and also needed two more to construct the result of setcond:
nilf %r12, 0x1f
lghi %r11, 1
sllk %r12, %r11, 0(%r12)
lgfi %r9, 0x60013f
nrk %r0, %r12, %r9
lghi %r12, 0
locghilh %r12, 1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert all targets simultaneously, as the gen_intermediate_code
function disappears from the target. While there are possible
workarounds, they're larger than simply performing the conversion.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Remove the single target-specific definition used in
"exec/translator.h" (TARGET_PAGE_MASK) by un-inlining
is_same_page().
Rename the method as translator_is_same_page() and
improve its documentation.
Use it in translator_use_goto_tb().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20241218154145.71353-1-philmd@linaro.org>
TB compile flags, tb_page_addr_t type, tb_cflags() and few
other methods are defined in "exec/translation-block.h".
All these files don't include "exec/translation-block.h" but
include "exec/exec-all.h" which include it. Explicitly include
"exec/translation-block.h" to be able to remove it from
"exec/exec-all.h" later when it won't be necessary. Otherwise
we'd get errors such:
accel/tcg/internal-target.h:59:20: error: a parameter list without types is only allowed in a function definition
59 | void tb_lock_page0(tb_page_addr_t);
| ^
accel/tcg/tb-hash.h:64:23: error: unknown type name 'tb_page_addr_t'
64 | uint32_t tb_hash_func(tb_page_addr_t phys_pc, vaddr pc,
| ^
accel/tcg/tcg-accel-ops.c:62:36: error: use of undeclared identifier 'CF_CLUSTER_SHIFT'
62 | cflags = cpu->cluster_index << CF_CLUSTER_SHIFT;
| ^
accel/tcg/watchpoint.c:102:47: error: use of undeclared identifier 'CF_NOIRQ'
102 | cpu->cflags_next_tb = 1 | CF_NOIRQ | curr_cflags(cpu);
| ^
target/i386/helper.c:536:28: error: use of undeclared identifier 'CF_PCREL'
536 | if (tcg_cflags_has(cs, CF_PCREL)) {
| ^
target/rx/cpu.c:51:21: error: incomplete definition of type 'struct TranslationBlock'
51 | cpu->env.pc = tb->pc;
| ~~^
system/physmem.c:2977:9: error: call to undeclared function 'tb_invalidate_phys_range'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
2977 | tb_invalidate_phys_range(addr, addr + length - 1);
| ^
plugins/api.c:96:12: error: call to undeclared function 'tb_cflags'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
96 | return tb_cflags(tcg_ctx->gen_tb) & CF_MEMI_ONLY;
| ^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20241114011310.3615-5-philmd@linaro.org>
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.
Files renamed manually then mechanical change using sed tool.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
Set the default NaN pattern explicitly, and remove the ifdef from
parts64_default_nan().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241202131347.498124-38-peter.maydell@linaro.org
Set the Float3NaNPropRule explicitly for i386. We had no
i386-specific behaviour in the old ifdef ladder, so we were using the
default "prefer a then b then c" fallback; this is actually the
correct per-the-spec handling for i386.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241202131347.498124-25-peter.maydell@linaro.org
Set the FloatInfZeroNaNRule explicitly for the x86 target.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241202131347.498124-12-peter.maydell@linaro.org
When instrumenting memory accesses for plugin, we force memory accesses
to use the slow path for mmu [1]. This create a situation where we end
up calling ptw_setl_slow. This was fixed recently in [2] but the issue
still could appear out of plugins use case.
Since this function gets called during a cpu_exec, start_exclusive then
hangs. This exclusive section was introduced initially for security
reasons [3].
I suspect this code path was never triggered, because ptw_setl_slow
would always be called transitively from cpu_exec, resulting in a hang.
[1] 6d03226b42
[2] 115ade42d5
[3] https://gitlab.com/qemu-project/qemu/-/issues/279
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2566
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20241025175857.2554252-2-pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit b56617bbcb4 ("target/i386: Walk NPT in guest real mode") added
logic to run the page table walker even in real mode if we are in NPT
mode. That function then determined whether real mode or paging is
active based on whether the pg_mode variable was 0.
Unfortunately pg_mode is 0 in two situations:
1) Paging is disabled (real mode)
2) Paging is in 2-level paging mode (32bit without PAE)
That means the walker now assumed that 2-level paging mode was real
mode, breaking NetBSD as well as Windows XP.
To fix that, this patch adds a new PG flag to pg_mode which indicates
whether paging is active at all and uses that to determine whether we
are in real mode or not.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2654
Fixes: b56617bbcb4 ("target/i386: Walk NPT in guest real mode")
Signed-off-by: Alexander Graf <graf@amazon.com>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Link: https://lore.kernel.org/r/20241106154329.67218-1-graf@amazon.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Set the NaN propagation rule explicitly for the float_status words
used in the x86 target.
This is a no-behaviour-change commit, so we retain the existing
behaviour of using the x87-style "prefer QNaN over SNaN, then prefer
the NaN with the larger significand" for MMX and SSE. This is
however not the documented hardware behaviour, so we leave a TODO
note about what we should be doing instead.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-16-peter.maydell@linaro.org
This gives greater opportunity for reassociation on x86 targets,
since addition can use the LEA instruction.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the operands of the arithmetic instruction fit within a half-register,
it's easiest to use a comparison instruction to compute the carry.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This removes the 256 byte parity table from the executable.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This makes it easier for the compiler to understand which bits are set,
and it also removes "cltq" instructions to canonicalize the output value
as 32-bit signed.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>