15153 Commits

Philippe Mathieu-Daudé
a1b4b060d7 target/mips: Inline gen_helper_1e1i() call in op_ld_INSN() macros
gen_helper_1e1i() is one line long and used in only one place:
simply inline it.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210816205107.2051495-6-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
26fe92763a target/mips: Simplify gen_helper() macros by using tcg_constant_i32()
In all call sites the last argument is always used as a
read-only value, so we can replace the tcg_const_i32()
temporary with tcg_constant_i32().

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210816205107.2051495-5-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
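
As background for the tcg_constant_i32() conversions above and below, a
minimal before/after sketch of the difference (gen_helper_foo() is an
illustrative name, not an actual QEMU helper): tcg_const_i32() allocates a
temporary that must be freed, while tcg_constant_i32() returns a read-only
value that needs no cleanup.

    /* before: a temporary holding the constant, freed after use */
    TCGv_i32 t = tcg_const_i32(val);
    gen_helper_foo(cpu_env, arg, t);
    tcg_temp_free_i32(t);

    /* after: a read-only constant; no explicit free needed */
    gen_helper_foo(cpu_env, arg, tcg_constant_i32(val));
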
Philippe Mathieu-Daudé
78bdd38865 target/mips: Use tcg_constant_i32() in gen_helper_0e2i()
The $rt register is used read-only, so we can replace the
tcg_const_i32() temporary with tcg_constant_i32().

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210816205107.2051495-4-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
53152abfc1 target/mips: Remove gen_helper_1e2i()
gen_helper_1e2i() is unused since commit 33a07fa2db6
("target/mips: reimplement SC instruction emulation
and use cmpxchg"), remove it.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210816205107.2051495-3-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
b24339bcd0 target/mips: Remove gen_helper_0e3i()
gen_helper_0e3i() is unused since commit 895c2d04359
("target-mips: switch to AREG0 free mode"), remove it.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210816205107.2051495-2-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
c1feb46d12 target/mips: Remove duplicated check_cp1_enabled() calls in Loongson EXT
We already call check_cp1_enabled() earlier in the "pre-conditions"
checks for GSLWXC1 and GSLDXC1 in the gen_loongson_lsdc2() prologue.
Remove the duplicated calls.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Message-Id: <20210816001031.1720432-1-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
71ed30b7d4 target/mips: Allow Loongson 3A1000 to use up to 48-bit VAddr
Per the manual '龙芯 GS264 处理器核用户手册' (Loongson GS264
Processor Core User Manual) v1.0, chapter 1.1.5 SEGBITS: the
3A1000 (based on the GS464 core) implements 48 virtual address
bits in each 64-bit segment, not 40.

Fixes: af868995e1b ("target/mips: Add Loongson-3 CPU definition")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Message-Id: <20210813110149.1432692-3-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
98d207cf9c target/mips: Document Loongson-3A CPU definitions
Document the cores on which each Loongson-3A CPU is based (see
commit af868995e1b, "target/mips: Add Loongson-3 CPU definition").

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Message-Id: <20210813110149.1432692-2-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
bf7720024c target/mips: Convert Vr54xx MSA* opcodes to decodetree
Convert the following Integer Multiply-Accumulate opcodes:

 * MSAC         Multiply, negate, accumulate, and move LO
 * MSACHI       Multiply, negate, accumulate, and move HI
 * MSACHIU      Unsigned multiply, negate, accumulate, and move HI
 * MSACU        Unsigned multiply, negate, accumulate, and move LO

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210808173018.90960-8-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
a5e2932068 target/mips: Convert Vr54xx MUL* opcodes to decodetree
Convert the following Integer Multiply-Accumulate opcodes:

 * MULHI        Multiply and move HI
 * MULHIU       Unsigned multiply and move HI
 * MULS         Multiply, negate, and move LO
 * MULSHI       Multiply, negate, and move HI
 * MULSHIU      Unsigned multiply, negate, and move HI
 * MULSU        Unsigned multiply, negate, and move LO

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210808173018.90960-7-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
5fa38eedbd target/mips: Convert Vr54xx MACC* opcodes to decodetree
Convert the following Integer Multiply-Accumulate opcodes:

 * MACC         Multiply, accumulate, and move LO
 * MACCHI       Multiply, accumulate, and move HI
 * MACCHIU      Unsigned multiply, accumulate, and move HI
 * MACCU        Unsigned multiply, accumulate, and move LO

Since all opcodes are generated using the same pattern, we
add the gen_helper_mult_acc_t typedef and MULT_ACC() macro
to remove boilerplate code.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210808173018.90960-6-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
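
A rough sketch of the shared pattern described above, assuming the shape of
a typical MIPS decodetree trans function; names such as arg_r, MULT_ACC and
the helper signature are approximations rather than the exact QEMU code.

    typedef void gen_helper_mult_acc_t(TCGv, TCGv_ptr, TCGv, TCGv);

    static bool trans_mult_acc(DisasContext *ctx, arg_r *a,
                               gen_helper_mult_acc_t *gen_helper_mult_acc)
    {
        TCGv t0 = tcg_temp_new();
        TCGv t1 = tcg_temp_new();

        gen_load_gpr(t0, a->rs);
        gen_load_gpr(t1, a->rt);
        gen_helper_mult_acc(t0, cpu_env, t0, t1);  /* e.g. gen_helper_macc */
        gen_store_gpr(t0, a->rd);
        tcg_temp_free(t0);
        tcg_temp_free(t1);
        return true;
    }

    #define MULT_ACC(NAME, GEN_HELPER) \
        static bool trans_##NAME(DisasContext *ctx, arg_r *a) \
        { return trans_mult_acc(ctx, a, GEN_HELPER); }
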
Philippe Mathieu-Daudé
9d00539239 target/mips: Introduce decodetree structure for NEC Vr54xx extension
The decoder is called but doesn't decode anything. This will
ease reviewing the next commit.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210801235926.3178085-3-f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
6629f79f53 target/mips: Extract NEC Vr54xx helpers to vr54xx_helper.c
Extract NEC Vr54xx helpers from op_helper.c to a new file:
'vr54xx_helper.c'.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201120210844.2625602-14-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
07565cbf4a target/mips: Extract NEC Vr54xx helper definitions
Extract the NEC Vr54xx helper definitions to
'vendor-vr54xx_helper.h'.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20201120210844.2625602-15-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
fb3164e412 target/mips: Introduce generic TRANS() macro for decodetree helpers
Plain copy/paste of the TRANS() macro introduced in the PPC
commit f2aabda8ac9 ("target/ppc: Move D/DS/X-form integer
loads to decodetree") to the MIPS target.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210808173018.90960-2-f4bug@amsat.org>
2021-08-25 13:02:14 +02:00
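
For reference, the TRANS() macro being copied is a thin wrapper that
generates a per-opcode trans_*() function forwarding to a shared
implementation; a sketch of its shape (modulo minor details of the original
PPC version):

    #define TRANS(NAME, FUNC, ...) \
        static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
        { return FUNC(ctx, a, ## __VA_ARGS__); }

    /* usage: TRANS(MACC, trans_mult_acc, gen_helper_macc) expands to a
     * trans_MACC() that forwards to the shared trans_mult_acc() */
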
Philippe Mathieu-Daudé
34fe9fa368 target/mips: Rename 'rtype' as 'r'
We'll soon have more opcodes and decoded arguments, and 'rtype'
is not very helpful. Naming it simply 'r' eases reviewing the
.decode files when we have many opcodes.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210801234202.3167676-5-f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 13:02:14 +02:00
Philippe Mathieu-Daudé
12f79f1173 target/mips: Merge 32-bit/64-bit Release6 decodetree definitions
We don't need to maintain 2 sets of decodetree definitions.
Merge them into a single file.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210801234202.3167676-4-f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 13:02:06 +02:00
Philippe Mathieu-Daudé
4919f69c65 target/mips: Decode vendor extensions before MIPS ISAs
In commit ffc672aa977 ("target/mips/tx79: Move MFHI1 / MFLO1
opcodes to decodetree") we misplaced the decoder call. Move
it to the correct place.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210801234202.3167676-3-f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 13:00:43 +02:00
Philippe Mathieu-Daudé
2e176eaf9c target/mips: Simplify PREF opcode
check_insn() checks for any bit in the set, and INSN_R5900 is
just another bit added to the set. No need to special-case it.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210801234202.3167676-2-f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 13:00:37 +02:00
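
A rough before/after sketch of the simplification; the ISA flag names here
are approximate stand-ins, not necessarily the exact constants used.

    /* before: R5900 special-cased around the ISA check */
    if (ctx->insn_flags & INSN_R5900) {
        /* PREF behaves as a NOP on R5900 */
    } else {
        check_insn(ctx, ISA_MIPS4 | ISA_MIPS32);
    }

    /* after: check_insn() tests "any bit in the set", so the R5900 bit
     * simply joins the mask */
    check_insn(ctx, ISA_MIPS4 | ISA_MIPS32 | INSN_R5900);
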
Philippe Mathieu-Daudé
c8b69a2a92 target/mips: Remove JR opcode unused arguments
The JR opcode (Jump Register) takes only one argument, $rs.
JALR (Jump And Link Register) takes three: $rs, $rd and $hint.

Commit 6af0bf9c7c3 added their processing into decode_opc() as:

    case 0x08 ... 0x09: /* Jumps */
        gen_compute_branch(ctx, op1 | EXT_SPECIAL, rs, rd, sa);

having both opcodes handled in the same function: gen_compute_branch.

Per JR encoding, both $rd and $hint ('sa') are decoded as zero.

Later this code was extracted to decode_opc_special(), and
commit 7a387fffce5 replaced the magic values with definitions:

    case OPC_JR ... OPC_JALR:
        gen_compute_branch(ctx, op1, rs, rd, sa);

Finally commit 0aefa33318b moved OPC_JR out of decode_opc_special,
to a new 'decode_opc_special_legacy' function:

  @@ -15851,6 +15851,9 @@ static void decode_opc_special_legacy(CPUMIPSState *env, DisasContext *ctx)
  +    case OPC_JR:
  +        gen_compute_branch(ctx, op1, 4, rs, rd, sa);
  +        break;

  @@ -15933,7 +15936,7 @@ static void decode_opc_special(CPUMIPSState *env, DisasContext *ctx)
  -    case OPC_JR ... OPC_JALR:
  +    case OPC_JALR:
           gen_compute_branch(ctx, op1, 4, rs, rd, sa);
           break;

Since JR is now handled individually, it is pointless to decode
and pass it unused arguments. Replace them with a simple zero
value to avoid confusion with this opcode.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210730225507.2642827-1-f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 12:49:09 +02:00
Hamza Mahfooz
dfa0d9b80e target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
As per commit 5626f8c6d468 ("rcu: Add automatically released rcu_read_lock
variants"), RCU_READ_LOCK_GUARD() should be used instead of
rcu_read_{un}lock().

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210727235201.11491-1-someguy@effective-light.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-08-25 10:48:50 +01:00
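
A minimal sketch of what this conversion looks like in general; the
do_lookup() call stands in for the real critical-section work.

    /* before: explicit lock/unlock pair */
    rcu_read_lock();
    do_lookup();
    rcu_read_unlock();

    /* after: the guard drops the RCU read lock automatically when the
     * enclosing scope is left, including on early return */
    RCU_READ_LOCK_GUARD();
    do_lookup();
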
Peter Maydell
e534629296 target/arm: Implement M-profile trapping on division by zero
Unlike A-profile, for M-profile the UDIV and SDIV insns can be
configured to raise an exception on division by zero, using the CCR
DIV_0_TRP bit.

Implement support for setting this bit by making the helper functions
raise the appropriate exception.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730151636.17254-3-peter.maydell@linaro.org
2021-08-25 10:48:50 +01:00
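
A hedged sketch of the shape such a helper can take; the trap-check and
exception-raising functions below are hypothetical stand-ins, not the
actual QEMU helpers.

    int32_t helper_sdiv(CPUARMState *env, int32_t num, int32_t den)
    {
        if (den == 0) {
            if (m_profile_div0_trap_enabled(env)) { /* hypothetical CCR.DIV_0_TRP check */
                raise_div0_exception(env);          /* hypothetical */
            }
            return 0;           /* untrapped division by zero yields 0 */
        }
        if (num == INT32_MIN && den == -1) {
            return INT32_MIN;   /* avoid C undefined behaviour on overflow */
        }
        return num / den;
    }
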
Peter Maydell
fc7a5038a6 target/arm: Re-indent sdiv and udiv helpers
We're about to make a code change to the sdiv and udiv helper
functions, so first fix their indentation and coding style.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730151636.17254-2-peter.maydell@linaro.org
2021-08-25 10:48:50 +01:00
Peter Maydell
075e7e97e3 target/arm: Implement MVE interleaving loads/stores
Implement the MVE interleaving load/store functions VLD2, VLD4, VST2
and VST4.  VLD2 loads 16 bytes of data from memory and writes to 2
consecutive Qregs; VLD4 loads 16 bytes of data from memory and writes
to 4 consecutive Qregs.  The 'pattern' field in the encoding
determines the offset into memory which is accessed and also which
elements in the Qregs are written to.  (The intention is that a
sequence of four consecutive VLD4 with different pattern values
performs a complete de-interleaving load of 64 bytes into all
elements of the 4 Qregs.) VST2 and VST4 do the same, but for stores.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
fac80f0856 target/arm: Implement MVE scatter-gather immediate forms
Implement the MVE VLDR/VSTR insns which do scatter-gather using base
addresses from Qm plus or minus an immediate offset (possibly with
writeback). Note that writeback is not predicated but it does have
to honour ECI state, so we have to add an eci_mask check to the
VSTR_SG macros (the VLDR_SG macros already needed this to be able
to distinguish "skip beat" from "set predicated element to 0").

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
dc18628b18 target/arm: Implement MVE scatter-gather insns
Implement the MVE gather-loads and scatter-stores which
form the address by adding a base value from a scalar
register to an offset in each element of a vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
0f31e37c7f target/arm: Implement MVE VCTP
Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so
that any element at index Rn or greater is predicated.  As
with VPNOT, this insn itself is predicable and subject to beatwise
execution.

The calculation of the mask is the same as is used to determine
ltpmask in mve_element_mask(), but we precalculate masklen in
generated code to avoid having to have 4 helpers specialized by size.

We put the decode line in with the low-overhead-loop insns in
t32.decode because it's logically part of that collection of insn
patterns, even though it is an MVE only insn.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
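
A sketch of the masklen precalculation mentioned above, assuming the mask
length is the element count from Rn, clamped to what fits in the 16-byte
vector and converted to bytes; the helper and field names are approximate.

    TCGv_i32 rn = load_reg(s, a->rn);
    /* clamp the element count to the number of elements in a 16-byte vector */
    tcg_gen_umin_i32(rn, rn, tcg_constant_i32(16 >> a->size));
    /* convert the element count into a byte length */
    tcg_gen_shli_i32(rn, rn, a->size);
    gen_helper_mve_vctp(cpu_env, rn);
    tcg_temp_free_i32(rn);
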
Peter Maydell
fea3958fa1 target/arm: Implement MVE VPNOT
Implement the MVE VPNOT insn, which inverts the bits in VPR.P0
(subject to both predication and to beatwise execution).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
1241f148d5 target/arm: Implement MVE VMOV to/from 2 general-purpose registers
Implement the MVE VMOV forms that move data between 2 general-purpose
registers and 2 32-bit lanes in a vector register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
d5c571ea6d target/arm: Implement MVE VMAXA, VMINA
Implement the MVE VMAXA and VMINA insns, which take the absolute
value of the signed elements in the input vector and then accumulate
the unsigned max or min into the destination vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
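
A per-element sketch of VMAXA in plain C, with a 32-bit element width
chosen purely for illustration:

    /* take the absolute value of the signed input, then accumulate the
     * unsigned maximum into the destination element */
    static uint32_t vmaxa_elem(uint32_t d, int32_t m)
    {
        uint32_t absm = (m < 0) ? -(uint32_t)m : (uint32_t)m;
        return (d > absm) ? d : absm;
    }
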
Peter Maydell
398e7cd3cd target/arm: Implement MVE VQABS, VQNEG
Implement the MVE 1-operand saturating operations VQABS and VQNEG.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
8be9a25058 target/arm: Implement MVE saturating doubling multiply accumulates
Implement the MVE saturating doubling multiply accumulate insns
VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH.  These perform a multiply,
double, add the accumulator shifted by the element size, possibly
round, saturate to twice the element size, then take the high half of
the result.  The *MLAH insns do vector * scalar + vector, and the
*MLASH insns do vector * vector + scalar.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
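
A worked per-element sketch of the arithmetic for the 16-bit rounding
variant (VQRDMLAH) in plain C; saturation side effects such as setting
FPSCR.QC are omitted.

    static int16_t vqrdmlah_h(int16_t vec, int16_t scalar, int16_t acc)
    {
        /* multiply, double, add the accumulator shifted by the element
         * size (16), add the rounding constant, then saturate to 32 bits */
        int64_t r = (int64_t)vec * scalar * 2 + ((int64_t)acc << 16) + (1 << 15);

        if (r > INT32_MAX) {
            r = INT32_MAX;
        } else if (r < INT32_MIN) {
            r = INT32_MIN;
        }
        return r >> 16;         /* high half of the saturated result */
    }
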
Peter Maydell
c69e34c6de target/arm: Implement MVE VMLA
Implement the MVE VMLA insn, which multiplies a vector by a scalar
and accumulates into another vector.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:50 +01:00
Peter Maydell
f0ffff5163 target/arm: Implement MVE VMLADAV and VMLSLDAV
Implement the MVE VMLADAV and VMLSLDAV insns.  Like the VMLALDAV and
VMLSLDAV insns already implemented, these accumulate multiplied
vector elements; but they accumulate a 32-bit result rather than a
64-bit one.

Note that these encodings overlap with what would be RdaHi=0b111 for
VMLALDAV, VMLSLDAV, VRMLALDAVH and VRMLSLDAVH.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
640cdf20a2 target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
The MVEGenDualAccOpFn is a bit misnamed, since it is used for
the "long dual accumulate" operations that use a 64-bit
accumulator. Rename it to MVEGenLongDualAccOpFn so we can
use the former name for the 32-bit accumulator insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
54dc78a901 target/arm: Implement MVE narrowing moves
Implement the MVE narrowing move insns VMOVN, VQMOVN and VQMOVUN.
These take a double-width input, narrow it (possibly saturating) and
store the result to either the top or bottom half of the output
element.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
7f061c0ab9 target/arm: Implement MVE VABAV
Implement the MVE VABAV insn, which computes absolute differences
between elements of two vectors and accumulates the result into
a general purpose register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
688ba4cf33 target/arm: Implement MVE integer min/max across vector
Implement the MVE integer min/max across vector insns
VMAXV, VMINV, VMAXAV and VMINAV, which find the maximum
from the vector elements and a general purpose register,
and store the maximum back into the general purpose
register.

These insns overlap with VRMLALDAVH (they use what would
be RdaHi=0b110).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
345910f8c1 target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
All the users of the vmlaldav formats have an 'x' bit in bit 12 and an
'a' bit in bit 5; move these to the format rather than specifying them
in each insn pattern.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
1b15a97d4c target/arm: Implement MVE shift-by-scalar
Implement the MVE instructions which perform shifts by a scalar.
These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2.  They take the
shift amount in a general purpose register and shift every element in
the vector by that amount.

Mostly we can reuse the helper functions for shift-by-immediate; we
do need two new helpers for VQRSHL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
6b895bf8fb target/arm: Implement MVE VMLAS
Implement the MVE VMLAS insn, which multiplies a vector by a vector
and adds a scalar.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
c386443b16 target/arm: Implement MVE VPSEL
Implement the MVE VPSEL insn, which sets each byte of the destination
vector Qd to the byte from either Qn or Qm depending on the value of
the corresponding bit in VPR.P0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
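
A byte-level sketch of VPSEL in plain C; p0 stands for the low 16 bits of
the VPR, and beat/ECI handling is omitted.

    static void vpsel(uint8_t *d, const uint8_t *n, const uint8_t *m,
                      uint16_t p0)
    {
        for (int i = 0; i < 16; i++) {
            /* each byte of Qd comes from Qn when its P0 bit is set, else Qm */
            d[i] = (p0 & (1u << i)) ? n[i] : m[i];
        }
    }
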
Peter Maydell
cce81873bc target/arm: Implement MVE integer vector-vs-scalar comparisons
Implement the MVE integer vector comparison instructions that compare
each element against a scalar from a general purpose register.  These
are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)"
encodings T4, T5 and T6.

We have to move the decodetree pattern for VPST, because it
overlaps with VCMP T4 with size = 0b11.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
eff5d9a9bd target/arm: Implement MVE integer vector comparisons
Implement the MVE integer vector comparison instructions.  These are
"VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings
T1, T2 and T3.

These insns compare corresponding elements in each vector, and update
the VPR.P0 predicate bits with the results of the comparison.  VPT
also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively
"VCMP then VPST".

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
552517861c target/arm: Factor out gen_vpst()
Factor out the "generate code to update VPR.MASK01/MASK23" part of
trans_VPST(); we are going to want to reuse it for the VPT insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
395b92d50e target/arm: Implement MVE incrementing/decrementing dup insns
Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP,
VIWDUP and VDWDUP.  These fill the elements of a vector with
successively incrementing values, starting at the offset specified in
a general purpose register.  The final value of the offset is written
back to this register.  The wrapping variants take a second general
purpose register which specifies the point where the count should
wrap back to 0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
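
An element-level sketch of the non-wrapping VIDUP behaviour in plain C
(element count and types are illustrative); the wrapping variants would
additionally reset the offset to 0 when it reaches the wrap value held in
the second register.

    static uint32_t vidup(uint32_t *d, int nelems, uint32_t offset, uint32_t imm)
    {
        for (int i = 0; i < nelems; i++) {
            d[i] = offset;      /* successive elements get incrementing values */
            offset += imm;
        }
        return offset;          /* final offset is written back to the register */
    }
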
Peter Maydell
c1bd78cb06 target/arm: Implement MVE VMULL (polynomial)
Implement the MVE VMULL (polynomial) insn.  Unlike Neon, this comes
in two flavours: an 8x8->16 and a 16x16->32.  Also unlike Neon, the
inputs are in either the low or the high half of each double-width
element.

The assembler for this insn indicates the size with "P8" or "P16",
encoded into bit 28 as size = 0 or 1. We choose to follow the
same encoding as VQDMULL and decode this into a->size as MO_16
or MO_32 indicating the size of the result elements. This then
carries through to the helper function names where it then
matches up with the existing pmull_h() which does an 8x8->16
operation and a new pmull_w() which does the 16x16->32.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
41704cc262 target/arm: Fix VLDRB/H/W for predicated elements
For vector loads, predicated elements are zeroed, instead of
retaining their previous values (as happens for most data
processing operations). This means we need to distinguish
"beat not executed due to ECI" (don't touch destination
element) from "beat executed but predicated out" (zero
destination element).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
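
A per-byte sketch of the distinction for the byte-sized load, with
eci_mask and pred_mask standing in for the two masks described above:

    static void vldrb_predicated(uint8_t *d, const uint8_t *mem,
                                 uint16_t eci_mask, uint16_t pred_mask)
    {
        for (int b = 0; b < 16; b++) {
            if (!(eci_mask & (1 << b))) {
                continue;       /* beat not executed: keep the old value */
            }
            /* beat executed: predicated-out bytes are written as zero */
            d[b] = (pred_mask & (1 << b)) ? mem[b] : 0;
        }
    }
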
Peter Maydell
e3152d02da target/arm: Fix VPT advance when ECI is non-zero
We were not paying attention to the ECI state when advancing the VPT
state.  Architecturally, VPT state advance happens for every beat
(see the pseudocode VPTAdvance()), so on every beat the 4 bits of
VPR.P0 corresponding to the current beat are inverted if required,
and at the end of beats 1 and 3 the VPR MASK fields are updated.
This means that if the ECI state says we should not be executing all
4 beats then we need to skip some of the updating of the VPR that we
currently do in mve_advance_vpt().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
Peter Maydell
e0d40070e1 target/arm: Factor out mve_eci_mask()
In some situations we need a mask telling us which parts of the
vector correspond to beats that are not being executed because of
ECI, separately from the combined "which bytes are predicated away"
mask.  Factor this mask calculation out of mve_element_mask() into
its own function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25 10:48:49 +01:00
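
A sketch of the factored-out mask, assuming 4 mask bits per beat with
already-executed beats knocked out; the ECI constant names and the
condexec_bits layout are approximations, not the exact QEMU code.

    static uint16_t mve_eci_mask(CPUARMState *env)
    {
        /* one bit per byte of the 16-byte vector; a beat already executed
         * by a previous insn in the low-overhead loop is masked off */
        switch (env->condexec_bits >> 4) {
        case ECI_NONE:
            return 0xffff;
        case ECI_A0:
            return 0xfff0;
        case ECI_A0A1:
            return 0xff00;
        case ECI_A0A1A2:
        case ECI_A0A1A2B0:
            return 0xf000;
        default:
            g_assert_not_reached();
        }
    }
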