UZP (four registers)

Concatenate elements from four vectors

This instruction concatenates every fourth element from each of the four source vectors and places them in the corresponding elements of the four destination vectors.

This instruction is unpredicated.

Encoding: 8-bit to 64-bit elements

Variants: FEAT_SME2 (ARMv9.3)

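Bit positions: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Constant bits (most significant first): 110000011101101110000010
Fields: size, Zn, Zd, op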

UZP { <Zd1>.<T>-<Zd4>.<T> }, { <Zn1>.<T>-<Zn4>.<T> }

Decoding algorithm

if !IsFeatureImplemented(FEAT_SME2) then EndOfDecode(Decode_UNDEF);
if size == '11' && MaxImplementedSVL() < 256 then EndOfDecode(Decode_UNDEF);
constant integer esize = 8 << UInt(size);
constant integer n = UInt(Zn:'00');
constant integer d = UInt(Zd:'00');
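
The decode steps above can be modelled in a few lines of Python. This is only a sketch: the function name `decode_uzp4` and the parameters `max_svl` and `sme2` are hypothetical stand-ins for `MaxImplementedSVL()` and `IsFeatureImplemented(FEAT_SME2)`, and the fields are passed in already extracted rather than pulled from the instruction word.

```python
def decode_uzp4(size, zn, zd, max_svl, sme2=True):
    """Model the 8-bit to 64-bit decode; returns (esize, n, d) or None for UNDEF.

    size is the 2-bit field, zn and zd are the 3-bit register fields.
    max_svl and sme2 are illustrative parameters, not architectural names.
    """
    if not (0 <= size <= 3 and 0 <= zn <= 7 and 0 <= zd <= 7):
        raise ValueError("field value out of range")
    if not sme2:
        return None                      # FEAT_SME2 not implemented
    if size == 0b11 and max_svl < 256:
        return None                      # 64-bit elements need SVL >= 256
    esize = 8 << size                    # 8, 16, 32 or 64 bits
    n = zn << 2                          # UInt(Zn:'00'): first source register
    d = zd << 2                          # UInt(Zd:'00'): first destination register
    return esize, n, d


example = decode_uzp4(size=0b10, zn=0b011, zd=0b001, max_svl=512)
```

With these inputs the fields select 32-bit elements, sources Z12 to Z15, and destinations Z4 to Z7.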

Encoding: 128-bit element

Variants: FEAT_SME2 (ARMv9.3)

Bit positions: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Constant bits (most significant first): 11000001001101111110000010
Fields: Zn, Zd, op

UZP { <Zd1>.Q-<Zd4>.Q }, { <Zn1>.Q-<Zn4>.Q }

Decoding algorithm

if !IsFeatureImplemented(FEAT_SME2) then EndOfDecode(Decode_UNDEF);
if MaxImplementedSVL() < 512 then EndOfDecode(Decode_UNDEF);
constant integer esize = 128;
constant integer n = UInt(Zn:'00');
constant integer d = UInt(Zd:'00');

Operation

CheckStreamingSVEEnabled();
constant integer VL = CurrentVL;
if VL < esize * 4 then EndOfDecode(Decode_UNDEF);
constant integer quads = VL DIV (esize * 4);
bits(VL) result0;
bits(VL) result1;
bits(VL) result2;
bits(VL) result3;

for r = 0 to 3
    constant bits(VL) operand = Z[n+r, VL];
    constant integer base = r * quads;
    for q = 0 to quads-1
        Elem[result0, base+q, esize] = Elem[operand, 4*q+0, esize];
        Elem[result1, base+q, esize] = Elem[operand, 4*q+1, esize];
        Elem[result2, base+q, esize] = Elem[operand, 4*q+2, esize];
        Elem[result3, base+q, esize] = Elem[operand, 4*q+3, esize];

Z[d+0, VL] = result0;
Z[d+1, VL] = result1;
Z[d+2, VL] = result2;
Z[d+3, VL] = result3;
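
As an illustration of the element movement in the loop above, the following Python sketch models each vector as a list of elements. The function name `uzp4` and the list-based representation are assumptions made for this example; element width and the streaming-mode check are not modelled.

```python
def uzp4(sources):
    """Model the UZP (four registers) data movement on four element lists.

    sources[r] stands for Z[n+r]; the return value holds result0..result3.
    """
    if len(sources) != 4:
        raise ValueError("expected four source vectors")
    elems = len(sources[0])
    if elems == 0 or elems % 4 != 0 or any(len(s) != elems for s in sources):
        raise ValueError("vectors must share a non-zero multiple-of-four length")
    quads = elems // 4                     # VL DIV (esize * 4)
    results = [[None] * elems for _ in range(4)]
    for r, operand in enumerate(sources):
        base = r * quads                   # each source fills one quarter
        for q in range(quads):
            for k in range(4):             # element 4*q+k goes to result k
                results[k][base + q] = operand[4 * q + k]
    return results


# Eight elements per vector, e.g. 32-bit elements at VL = 256.
# Source r holds the values 100*r + 0 .. 100*r + 7.
src = [[100 * r + i for i in range(8)] for r in range(4)]
dst = uzp4(src)
```

Here `dst[0]` collects elements 0 and 4 of each source in register order, `dst[1]` collects elements 1 and 5, and so on.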

Explanations

<Zd1>: Is the name of the first scalable vector register of the destination multi-vector group, encoded as "Zd" times 4.
<T>: Is the size specifier, encoded in "size": 00 = B, 01 = H, 10 = S, 11 = D.
<Zd4>: Is the name of the fourth scalable vector register of the destination multi-vector group, encoded as "Zd" times 4 plus 3.
<Zn1>: Is the name of the first scalable vector register of the source multi-vector group, encoded as "Zn" times 4.
<Zn4>: Is the name of the fourth scalable vector register of the source multi-vector group, encoded as "Zn" times 4 plus 3.
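
The register-name rules above can be sketched as a small Python helper that turns field values into the operand lists of the assembly syntax. The helper names `register_group` and `uzp4_syntax` are hypothetical, and the lowercase `zN` spelling is an assumption about how an assembler or disassembler might print the registers.

```python
def register_group(field, t):
    """Format the four-register group selected by a 3-bit Zd or Zn field."""
    if not 0 <= field <= 7:
        raise ValueError("register field must fit in 3 bits")
    first = 4 * field                      # "Zd" or "Zn" times 4
    fourth = first + 3                     # times 4 plus 3
    return f"{{ z{first}.{t}-z{fourth}.{t} }}"


def uzp4_syntax(zd, zn, t):
    """Assemble the operand text for UZP (four registers)."""
    return f"uzp {register_group(zd, t)}, {register_group(zn, t)}"


text = uzp4_syntax(zd=1, zn=3, t="s")
```

Because each field is multiplied by 4, both groups always start at a register number that is a multiple of four.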

Operational Notes

If PSTATE.DIT is 1: