In StreamingMode, fp_access_checked is handled already.
We cannot fall through to fp_access_check lest we fall
foul of the double-check assertion.
Cc: qemu-stable@nongnu.org
Fixes: 285b1d5fcef ("target/arm: Handle SME in sve_access_check")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250307190415.982049-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: move declaration of 'ret' to top of block]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The check for fp_excp_el in assert_fp_access_checked is
incorrect. For SME, with StreamingMode enabled, the access
is really against the streaming mode vectors, and access
to the normal fp registers is allowed to be disabled.
C.f. sme_enabled_check.
Convert sve_access_checked to match, even though we don't
currently check the exception state.
Cc: qemu-stable@nongnu.org
Fixes: 3d74825f4d6 ("target/arm: Add SME enablement checks")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250307190415.982049-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In the Arm ARM, rule R_TYTWB states that returning to AArch32
is an illegal exception return if:
* AArch32 is not supported at any exception level
* the target EL is configured for AArch64 via SCR_EL3.RW
or HCR_EL2.RW or via CPU state at reset
We check the second of these, but not the first (which can only be
relevant for the case of a return to EL0, because if AArch32 is not
supported at one of the higher ELs then the RW bits will have an
effective value of 1 and the the "configured for AArch64" condition
will hold also).
Add the missing condition. Although this is technically a bug
(because we have one AArch64-only CPU: a64fx) it isn't worth
backporting to stable because no sensible guest code will
deliberately try to return to a nonexistent execution state
to check that it gets an illegal exception return.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
We already call env_archcpu() multiple times within the
exception_return helper function, and we're about to want to
add another use of the ARMCPU pointer. Add a local variable
cpu so we can call env_archcpu() just once.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
We would like to move arm_el_is_aa64() to internals.h; however, it is
used by access_secure_reg(). Make that function not be inline, so
that it can stay in cpu.h.
access_secure_reg() is used only in two places:
* in hflags.c
* in the user-mode arm emulators, to decide whether to store
the TLS value in the secure or non-secure banked field
The second of these is not on a super-hot path that would care about
the inlining (and incidentally will always use the NS banked field
because our user-mode CPUs never set ARM_FEATURE_EL3); put the
definition of access_secure_reg() in hflags.c, near its only use
inside target/arm.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
CpuState caches its CPUClass since commit 6fbdff87062
("cpu: cache CPUClass in CPUState for hot code paths"),
use it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250122093028.52416-11-philmd@linaro.org>
Move CPU TLB related methods to "exec/cputlb.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20241114011310.3615-19-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
All the callers of op_addr_rr_post() and op_addr_ri_post() now pass in
zero for the address_offset, so we can remove that argument.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250227142746.1698904-4-peter.maydell@linaro.org
Our STRD implementation doesn't correctly implement the requirement:
* if the address is 8-aligned the access must be a 64-bit
single-copy atomic access, not two 32-bit accesses
Rewrite the handling of STRD to use a single tcg_gen_qemu_st_i64()
of a value produced by concatenating the two 32 bit source registers.
This allows us to get the atomicity right.
As with the LDRD change, now that we don't update 'addr' in the
course of performing the store we need to adjust the offset
we pass to op_addr_ri_post() and op_addr_rr_post().
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250227142746.1698904-3-peter.maydell@linaro.org
Our LDRD implementation is wrong in two respects:
* if the address is 4-aligned and the load crosses a page boundary
and the second load faults and the first load was to the
base register (as in cases like "ldrd r2, r3, [r2]", then we
must not update the base register before taking the fault
* if the address is 8-aligned the access must be a 64-bit
single-copy atomic access, not two 32-bit accesses
Rewrite the handling of the loads in LDRD to use a single
tcg_gen_qemu_ld_i64() and split the result into the destination
registers. This allows us to get the atomicity requirements
right, and also implicitly means that we won't update the
base register too early for the page-crossing case.
Note that because we no longer increment 'addr' by 4 in the course of
performing the LDRD we must change the adjustment value we pass to
op_addr_ri_post() and op_addr_rr_post(): it no longer needs to
subtract 4 to get the correct value to use if doing base register
writeback.
STRD has the same problem with not getting the atomicity right;
we will deal with that in the following commit.
Cc: qemu-stable@nongnu.org
Reported-by: Stu Grossman <stu.grossman@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250227142746.1698904-2-peter.maydell@linaro.org
When reading or writing the timer registers, sometimes we need to
apply one of the timer offsets. Specifically, this happens for
direct reads of the counter registers CNTPCT_EL0 and CNTVCT_EL0 (and
their self-synchronized variants CNTVCTSS_EL0 and CNTPCTSS_EL0). It
also applies for direct reads and writes of the CNT*_TVAL_EL*
registers that provide the 32-bit downcounting view of each timer.
We currently do this with duplicated code in gt_tval_read() and
gt_tval_write() and a special-case in gt_virt_cnt_read() and
gt_cnt_read(). Refactor this so that we handle it all in a single
function gt_direct_access_timer_offset(), to parallel how we handle
the offset for indirect accesses.
The call in the WFIT helper previously to gt_virt_cnt_offset() is
now to gt_direct_access_timer_offset(); this is the correct
behaviour, but it's not immediately obvious that it shouldn't be
considered an indirect access, so we add an explanatory comment.
This commit should make no behavioural changes.
(Cc to stable because the following bugfix commit will
depend on this one.)
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20250204125009.2281315-6-peter.maydell@linaro.org
TCGCPUOps structure makes more sense in the accelerator context
rather than hardware emulation. Move it under the accel/tcg/ scope.
Mechanical change doing:
$ sed -i -e 's,hw/core/tcg-cpu-ops.h,accel/tcg/cpu-ops.h,g' \
$(git grep -l hw/core/tcg-cpu-ops.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-11-philmd@linaro.org>
The softfloat (i.e. TCG) specific handling for the FPCR
and FPSR is abstracted behind five functions:
arm_set_default_fp_behaviours
arm_set_ah_fp_behaviours
vfp_get_fpsr_from_host
vfp_clear_float_status_exc_flags
vfp_set_fpsr_to_host
Currently we rely on the first two calling softfloat functions that
work even in a KVM-only compile because they're defined as inline in
the softfloat header file, and we provide stub versions of the last
three in arm/vfp_helper.c if CONFIG_TCG isn't defined.
Move the softfloat-specific versions of these functions to
tcg/vfp_helper.c, and provide the non-TCG stub versions in
tcg-stubs.c.
This lets us drop the softfloat header include and the last
set of CONFIG_TCG ifdefs from arm/vfp_helper.c.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250221190957.811948-4-peter.maydell@linaro.org
Currently the helper_vfp_get_fpscr() and helper_vfp_set_fpscr()
functions do the actual work of updating the FPSCR, and we have
wrappers vfp_get_fpscr() and vfp_set_fpscr() which we use for calls
from other QEMU C code.
Flip these around so that it is vfp_get_fpscr() and vfp_set_fpscr()
which do the actual work, and helper_vfp_get_fpscr() and
helper_vfp_set_fpscr() which are the wrappers; this allows us to move
them to tcg/vfp_helper.c.
Since this is the last HELPER() we had in arm/vfp_helper.c, we can
drop the include of helper-proto.h.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250221190957.811948-3-peter.maydell@linaro.org
Most of the target/arm/vfp_helper.c file is purely TCG helper code,
guarded by #ifdef CONFIG_TCG. Move this into a new file in
target/arm/tcg/.
This leaves only the code relating to getting and setting the
FPCR/FPSR/FPSCR in the original file. (Some of this also is
TCG-only, but that needs more careful disentangling.)
Having two vfp_helper.c files might seem a bit confusing,
but once we've finished moving all the helper code out
of the old file we are going to rename it to vfp_fpscr.c.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250221190957.811948-2-peter.maydell@linaro.org
In t32_expandimm_imm(), we take an 8 bit value XY and construct a
32-bit value which might be of the form XY, 00XY00XY, XY00XY00, or
XYXYXYXY. We do this with multiplications, and we use an 'int' type.
For the cases where we're setting the high byte of the 32-bit value
to XY, this means that we do an integer multiplication that might
overflow, and rely on the -fwrapv semantics to keep this from being
undefined behaviour.
It's clearer to use an unsigned type here, because we're really
doing operations on the value considered as a set of bits. The
result is the same.
The return value from the function remains 'int', because this
is a decodetree !function function, and follows the API for those
functions.
Signed-off-by: Stephen Longfield <slongfield@google.com>
Signed-off-by: Roque Arcudia Hernandez <roqueh@google.com>
Message-id: 20250219165534.3387376-1-slongfield@google.com
[PMM: Rewrote the commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The code for WFI/WFE trapping has several errors:
* it wasn't using arm_sctlr(), so it would look at SCTLR_EL1
even if the CPU was in the EL2&0 translation regime
* it was raising UNDEF, not Monitor Trap, for traps to
AArch32 EL3 because of SCR.{TWE,TWI}
* it was not honouring SCR.{TWE,TWI} when running in
AArch32 at EL3 not in Monitor mode
* it checked SCR.{TWE,TWI} even on v7 CPUs which don't have
those bits
Fix these bugs.
Cc: qemu-stable@nongnu.org
Fixes: b1eced713d99 ("target-arm: Add WFx instruction trap support")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250130182309.717346-15-peter.maydell@linaro.org
CP_ACCESS_TRAP_UNCATEGORIZED is technically an accurate description
of what this return value from a cpreg accessfn does, but it's liable
to confusion because it doesn't match how the Arm ARM pseudocode
indicates this case. What it does is an EXCP_UDEF with a zero
("uncategorized") syndrome value, which is what an UNDEFINED instruction
does. The pseudocode uses "UNDEFINED" to show this; rename our
constant to CP_ACCESS_UNDEFINED to make the parallel clearer.
Commit created with
sed -i -e 's/CP_ACCESS_TRAP_UNCATEGORIZED/CP_ACCESS_UNDEFINED/' $(git grep -l CP_ACCESS_TRAP_UNCATEGORIZED)
plus manual editing of the comment.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250130182309.717346-14-peter.maydell@linaro.org
There are no longer any uses of CP_ACCESS_TRAP in access functions,
because we have converted them all to use either CP_ACCESS_TRAP_EL1
or CP_ACCESS_TRAP_UNCATEGORIZED, as appropriate. Remove the handling
of bare CP_ACCESS_TRAP from the access_check_cp_reg() helper, so that
it now asserts if an access function returns a value requesting a
trap without a target EL.
Rename CP_ACCESS_TRAP to CP_ACCESS_TRAP_BIT, to make it clearer
that this is an internal-only definition, not something that
it makes sense to return from an access function. This should
help to avoid future bugs where we return the wrong syndrome
value by mistake.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250130182309.717346-13-peter.maydell@linaro.org
On XScale CPUs, there is no EL2 or AArch64, so no syndrome register.
These traps are just UNDEFs in the traditional AArch32 sense, so
CP_ACCESS_TRAP_UNCATEGORIZED is more accurate than CP_ACCESS_TRAP.
This has no visible behavioural change, because the guest doesn't
have a way to see the syndrome value we generate.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250130182309.717346-12-peter.maydell@linaro.org
In the CPAccessResult enum, the CP_ACCESS_TRAP* values indicate the
equivalent of the pseudocode AArch64.SystemAccessTrap(..., 0x18),
causing a trap to a specified exception level with a syndrome value
giving information about the failing instructions. In the
pseudocode, such traps are always taken to a specified target EL. We
support that for target EL of 2 or 3 via CP_ACCESS_TRAP_EL2 and
CP_ACCESS_TRAP_EL3, but the only way to take the access trap to EL1
currently is to use CP_ACCESS_TRAP, which takes the trap to the
"usual target EL" (EL1 if in EL0, otherwise to the current EL).
Add CP_ACCESS_TRAP_EL1 so that access functions can follow the
pseudocode more closely.
(Note that for the common case in the pseudocode of "trap to
EL2 if HCR_EL2.TGE is set, otherwise trap to EL1", we handle
this in raise_exception(), so access functions don't need to
special case it and can use CP_ACCESS_TRAP_EL1.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250130182309.717346-10-peter.maydell@linaro.org
In system register access pseudocode the common pattern for
AArch32 registers with access traps to EL3 is:
at EL1 and EL2:
if HaveEL(EL3) && !ELUsingAArch32(EL3) && (SCR_EL3.TERR == 1) then
AArch64.AArch32SystemAccessTrap(EL3, 0x03);
elsif HaveEL(EL3) && ELUsingAArch32(EL3) && (SCR.TERR == 1) then
AArch32.TakeMonitorTrapException();
at EL3:
if (PSTATE.M != M32_Monitor) && (SCR.TERR == 1) then
AArch32.TakeMonitorTrapException();
(taking as an example the ERRIDR access pseudocode).
This implements the behaviour of (in this case) SCR.TERR that
"Accesses to the specified registers from modes other than Monitor
mode generate a Monitor Trap exception" and of SCR_EL3.TERR that
"Accesses of the specified Error Record registers at EL2 and EL1
are trapped to EL3, unless the instruction generates a higher
priority exception".
In QEMU we don't implement this pattern correctly in two ways:
* in access_check_cp_reg() we turn the CP_ACCESS_TRAP_EL3 into
an UNDEF, not a trap to Monitor mode
* in the access functions, we check trap bits like SCR.TERR
only when arm_current_el(env) < 3 -- this is correct for
AArch64 EL3, but misses the "trap non-Monitor-mode execution
at EL3 into Monitor mode" case for AArch32 EL3
In this commit we fix the first of these two issues, by
making access_check_cp_reg() handle CP_ACCESS_TRAP_EL3
as a Monitor trap. This is a kind of exception that we haven't
yet implemented(!), so we need a new EXCP_MON_TRAP for it.
This diverges from the pseudocode approach, where every access check
function explicitly checks for "if EL3 is AArch32" and takes a
monitor trap; if we wanted to be closer to the pseudocode we could
add a new CP_ACCESS_TRAP_MONITOR and make all the accessfns use it
when appropriate. But because there are no non-standard cases in the
pseudocode (i.e. where either it raises a Monitor trap that doesn't
correspond to an AArch64 SystemAccessTrap or where it raises a
SystemAccessTrap that doesn't correspond to a Monitor trap), handling
this all in one place seems less likely to result in future bugs
where we forgot again about this special case when writing an
accessor.
(The cc of stable here is because "hw/intc/arm_gicv3_cpuif: Don't
downgrade monitor traps for AArch32 EL3" which is also cc:stable
will implicitly use the new EXCP_MON_TRAP code path.)
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250130182309.717346-6-peter.maydell@linaro.org
Sink common code from the callers into do_fmlal
and do_fmlal_idx. Reorder the arguments to minimize
the re-sorting from the caller's arguments.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-35-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Read the bit from the source, rather than from the proxy via
get_flush_inputs_to_zero. This makes it clear that it does
not matter which of the float_status structures is used.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-34-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Select on index instead of pointer.
No functional change.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-16-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace with fp_status[FPST_A32]. As this was the last of the
old structures, we can remove the anonymous union and struct.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-15-richard.henderson@linaro.org
[PMM: tweak to account for change to is_ebf()]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace with fp_status[FPST_A64].
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace with fp_status[FPST_A32_F16].
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace with fp_status[FPST_A64_F16].
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace with fp_status[FPST_STD].
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace with fp_status[FPST_STD_F16].
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Move ARMFPStatusFlavour to cpu.h with which to index
this array. For now, place the array in an anonymous
union with the existing structures. Adjust the order
of the existing structures to match the enum.
Simplify fpstatus_ptr() using the new array.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20250129013857.135256-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Now the emulation is complete, we can enable FEAT_RPRES for the 'max'
CPU type.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
FEAT_RPRES implements an "increased precision" variant of the single
precision FRECPE and FRSQRTE instructions from an 8 bit to a 12
bit mantissa. This applies only when FPCR.AH == 1. Note that the
halfprec and double versions of these insns retain the 8 bit
precision regardless.
In this commit we add all the plumbing to make these instructions
call a new helper function when the increased-precision is in
effect. In the following commit we will provide the actual change
in behaviour in the helpers.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Now that we have completed the handling for FPCR.{AH,FIZ,NEP}, we
can enable FEAT_AFP for '-cpu max', and document that we support it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
FMLSL (indexed), using the usual trick of negating by XOR when AH=0
and by muladd flags when AH=1.
Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need toa pass in the
FPCR.AH value via the SIMD data word.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-33-richard.henderson@linaro.org
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Handle FPCR.AH's requirement to not negate the sign of a NaN in SVE
FMLSL (indexed), using the usual trick of negating by XOR when AH=0
and by muladd flags when AH=1.
Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need toa pass in the
FPCR.AH value via the SIMD data word.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-32-richard.henderson@linaro.org
[PMM: commit message tweaked]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Handle FPCR.AH's requirement to not negate the sign of a NaN
in FMLSL by element and vector, using the usual trick of
negating by XOR when AH=0 and by muladd flags when AH=1.
Since we have the CPUARMState* in the helper anyway, we can
look directly at env->vfp.fpcr and don't need toa pass in the
FPCR.AH value via the SIMD data word.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-31-richard.henderson@linaro.org
[PMM: commit message tweaked]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The negation step in SVE FCMLA mustn't negate a NaN when FPCR.AH is
set. Use the same approach as we did for A64 FCMLA of passing in
FPCR.AH and using it to select whether to negate by XOR or by the
muladd negate_product flag.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-28-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The negation step in FCMLA by index mustn't negate a NaN when
FPCR.AH is set. Use the same approach as vector FCMLA of
passing in FPCR.AH and using it to select whether to negate
by XOR or by the muladd negate_product flag.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-27-richard.henderson@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The negation step in FCMLA mustn't negate a NaN when FPCR.AH
is set. Handle this by passing FPCR.AH to the helper via the
SIMD data field, and use this to select whether to do the
negation via XOR or via the muladd negate_product flag.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-26-richard.henderson@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The negation step in the SVE FTMAD insn mustn't negate a NaN when
FPCR.AH is set. Pass FPCR.AH to the helper via the SIMD data field,
so we can select the correct behaviour.
Because the operand is known to be negative, negating the operand
is the same as taking the absolute value. Defer this to the muladd
operation via flags, so that it happens after NaN detection, which
is correct for FPCR.AH.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
The negation step in the SVE FTSSEL insn mustn't negate a NaN when
FPCR.AH is set. Pass FPCR.AH to the helper via the SIMD data field
and use that to determine whether to do the negation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Handle the FPCR.AH "don't negate the sign of a NaN" semantics fro the
SVE FMLS (vector) insns, by providing new helpers for the AH=1 case
which end up passing fpcr_ah = true to the do_fmla_zpzzz_* functions
that do the work.
The float*_muladd functions have a flags argument that can
perform optional negation of various operand. We don't use
that for "normal" arm fmla, because the muladd flags are not
applied when an input is a NaN. But since FEAT_AFP does not
negate NaNs, this behaviour is exactly what we need.
The non-AH helpers pass in a zero flags argument and control the
negation via the neg1 and neg3 arguments; the AH helpers always pass
in neg1 and neg3 as zero and control the negation via the flags
argument. This allows us to avoid conditional branches within the
inner loop.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>