Load multiple 4-element structures to four registers
This instruction loads multiple 4-element structures from memory and writes the result to the four SIMD&FP registers, with de-interleaving.
For an example of de-interleaving, see LD3 (multiple structures).
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.
Variants: FEAT_AdvSIMD (ARMv8.0)
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||||||||||||
Q | L | opcode | size | Rn | Rt |
---|
LD4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]
if !IsFeatureImplemented(FEAT_AdvSIMD) then EndOfDecode(Decode_UNDEF); constant integer t = UInt(Rt); constant integer n = UInt(Rn); constant integer m = integer UNKNOWN; constant boolean wback = FALSE; constant boolean nontemporal = FALSE; constant boolean tagchecked = wback || n != 31;
Variants: FEAT_AdvSIMD (ARMv8.0)
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ||||||||||||||||||
Q | L | Rm | opcode | size | Rn | Rt |
---|
LD4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>
LD4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>
if !IsFeatureImplemented(FEAT_AdvSIMD) then EndOfDecode(Decode_UNDEF); constant integer t = UInt(Rt); constant integer n = UInt(Rn); constant integer m = UInt(Rm); constant boolean wback = TRUE; constant boolean nontemporal = FALSE; constant boolean tagchecked = wback || n != 31;
constant integer datasize = 64 << UInt(Q); constant integer esize = 8 << UInt(size); constant integer elements = datasize DIV esize; constant integer rpt = 1; constant integer selem = 4; // .1D format only permitted with LD1 & ST1 if size:Q == '110' && selem != 1 then EndOfDecode(Decode_UNDEF);
CheckFPAdvSIMDEnabled64(); bits(64) address; bits(64) eaddr; bits(64) offs; bits(datasize) rval; integer tt; constant integer ebytes = esize DIV 8; constant boolean privileged = PSTATE.EL != EL0; constant AccessDescriptor accdesc = CreateAccDescASIMD(MemOp_LOAD, nontemporal, tagchecked, privileged); if n == 31 then CheckSPAlignment(); address = SP[64]; else address = X[n, 64]; offs = Zeros(64); for r = 0 to rpt-1 for e = 0 to elements-1 tt = (t + r) MOD 32; for s = 0 to selem-1 rval = V[tt, datasize]; eaddr = AddressIncrement(address, offs, accdesc); Elem[rval, e, esize] = Mem[eaddr, ebytes, accdesc]; V[tt, datasize] = rval; offs = offs + ebytes; tt = (tt + 1) MOD 32; if wback then if m != 31 then offs = X[m, 64]; address = AddressAdd(address, offs, accdesc); if n == 31 then SP[64] = address; else X[n, 64] = address;
If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.