CPYPRTWN, CPYMRTWN, CPYERTWN

Memory Copy, reads unprivileged, writes non-temporal.

These instructions perform a memory copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPRTWN, then CPYMRTWN, and then CPYERTWN.

CPYPRTWN performs some preconditioning of the arguments suitable for using the CPYMRTWN instruction, and performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMRTWN performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYERTWN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be performed.

For CPYPRTWN, the following saturation logic is applied:

If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
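The saturation rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the architectural pseudocode; the names `SATURATED_MAX` and `saturate_copy_size` are hypothetical, and the register value is assumed to be passed as an unsigned 64-bit integer.

```python
# Hypothetical helper names; the rule itself comes from the text above.
SATURATED_MAX = 0x007FFFFFFFFFFFFF  # all of bits 54:0 set, bits 63:55 clear


def saturate_copy_size(xn: int) -> int:
    """Return the copy size after the CPYPRTWN saturation rule.

    xn is treated as an unsigned 64-bit register value.
    """
    xn &= 0xFFFFFFFFFFFFFFFF
    if (xn >> 55) != 0:  # any bit of Xn<63:55> is set
        return SATURATED_MAX
    return xn
```

Sizes that already fit in 55 bits pass through unchanged, so the check is equivalent to clamping at `SATURATED_MAX`.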

After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:

If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward

Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward

Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
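The three direction rules above can be read as the following sketch. It follows only the prose rules and treats the register values as plain non-negative integers; the architecture's own shared pseudocode may handle address bits differently. The function name `copy_direction` and the use of `None` for the IMPLEMENTATION DEFINED case are assumptions made for illustration.

```python
from typing import Optional

SATURATED_MAX = 0x007FFFFFFFFFFFFF  # saturated copy size limit


def copy_direction(xs: int, xd: int, xn: int) -> Optional[str]:
    """Return "forward", "backward", or None when the choice is
    IMPLEMENTATION DEFINED (the buffers do not overlap in a way
    that forces a direction)."""
    size = min(xn, SATURATED_MAX)  # apply the saturation rule first
    if xs > xd and xd + size > xs:
        # Destination starts below the source and runs into it.
        return "forward"
    if xs < xd and xs + size > xd:
        # Source starts below the destination and runs into it.
        return "backward"
    return None
```

For example, copying 0x100 bytes from 0x1010 to 0x1000 must run forward, because a backward copy would overwrite source bytes before they are read.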

The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYPRTWN, option A (which results in encoding PSTATE.C = 0):

After execution of CPYPRTWN, option B (which results in encoding PSTATE.C = 1):

For CPYMRTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:

For CPYMRTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:

For CPYERTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:

For CPYERTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:

Integer
(FEAT_MOPS)

Bits     Field      Fixed value
31:30    sz
29:24    (fixed)    011101
23:22    op1
21       (fixed)    0
20:16    Rs
15:12    op2        0110
11:10    (fixed)    01
9:5      Rn
4:0      Rd

Epilogue (op1 == 10)

CPYERTWN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMRTWN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPRTWN [<Xd>]!, [<Xs>]!, <Xn>!
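As a rough illustration of the encoding diagram, the sketch below packs and unpacks the 32-bit instruction word. The function names are hypothetical, the field positions are taken from the diagram above, and `sz` is fixed at `00` because any other value is UNDEFINED for these instructions.

```python
STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}


def encode_cpy_rtwn(op1: int, rd: int, rs: int, rn: int) -> int:
    """Build the instruction word for CPYPRTWN, CPYMRTWN, or CPYERTWN."""
    assert op1 in STAGES
    for reg in (rd, rs, rn):
        assert 0 <= reg <= 31
    word = 0b00 << 30        # sz
    word |= 0b011101 << 24   # fixed bits 29:24
    word |= op1 << 22        # stage selector
    word |= rs << 16         # source address register (bit 21 stays 0)
    word |= 0b0110 << 12     # op2: reads unprivileged, writes non-temporal
    word |= 0b01 << 10       # fixed bits 11:10
    word |= rn << 5          # size register
    word |= rd               # destination address register
    return word


def decode_fields(word: int) -> dict:
    """Extract the variable fields from an instruction word."""
    return {
        "sz": (word >> 30) & 0x3,
        "op1": (word >> 22) & 0x3,
        "Rs": (word >> 16) & 0x1F,
        "op2": (word >> 12) & 0xF,
        "Rn": (word >> 5) & 0x1F,
        "Rd": word & 0x1F,
    }
```

A call such as `decode_fields(encode_cpy_rtwn(0b00, rd=0, rs=1, rn=2))` returns the same register numbers that were passed in, with `op2` equal to `0b0110`.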

if !IsFeatureImplemented(FEAT_MOPS) || sz != '00' then UNDEFINED;

CPYParams memcpy;
memcpy.d = UInt(Rd);
memcpy.s = UInt(Rs);
memcpy.n = UInt(Rn);
bits(4) options = op2;
boolean rnontemporal = options<3> == '1';
boolean wnontemporal = options<2> == '1';

case op1 of
    when '00' memcpy.stage = MOPSStage_Prologue;
    when '01' memcpy.stage = MOPSStage_Main;
    when '10' memcpy.stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

CheckMOPSEnabled();

if (memcpy.s == memcpy.n || memcpy.s == memcpy.d || memcpy.n == memcpy.d || memcpy.d == 31 || memcpy.s == 31 || memcpy.n == 31) then
    Constraint c = ConstrainUnpredictable(Unpredictable_MOPSOVERLAP31);
    assert c IN {Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP   EndOfInstruction();

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on UNPREDICTABLE behaviors, and particularly Memory Copy and Memory Set CPY*.
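The decode pseudocode above can be paraphrased as the sketch below, which splits the `op2` option bits and checks the register condition that triggers CONSTRAINED UNPREDICTABLE behavior. The function and key names are hypothetical. The meaning of option bits 1 and 0 is inferred from how the Operation pseudocode of this instruction uses `options<1>` and `options<0>`.

```python
def decode_options(op2: int) -> dict:
    """Split the 4-bit op2 field into its individual option flags."""
    return {
        "read_nontemporal": bool(op2 & 0b1000),    # options<3>
        "write_nontemporal": bool(op2 & 0b0100),   # options<2>
        "read_unprivileged": bool(op2 & 0b0010),   # options<1>
        "write_unprivileged": bool(op2 & 0b0001),  # options<0>
    }


def is_constrained_unpredictable(d: int, s: int, n: int) -> bool:
    """True when any two register numbers match or any of them is 31."""
    return len({d, s, n}) < 3 or 31 in (d, s, n)
```

For these instructions `op2` is `0b0110`, which gives unprivileged reads and non-temporal writes, matching the instruction title.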

Assembler Symbols

<Xd>

For the epilogue and main variants: is the 64-bit name of the general-purpose register that holds an encoding of the destination address, encoded in the "Rd" field.

For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination address and is updated by the instruction, encoded in the "Rd" field.

<Xs>

For the epilogue and main variants: is the 64-bit name of the general-purpose register that holds an encoding of the source address, encoded in the "Rs" field.

For the prologue variant: is the 64-bit name of the general-purpose register that holds the source address and is updated by the instruction, encoded in the "Rs" field.

<Xn>

For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the "Rn" field.

For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the number of bytes to be transferred, encoded in the "Rn" field.

For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of bytes to be transferred and is updated by the instruction to encode the remaining size and destination, encoded in the "Rn" field.
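The "encoding" of addresses and size mentioned above follows from the prologue stage of the Operation pseudocode for this instruction. The sketch below mirrors that preconditioning and the direction recovery done by the main and epilogue stages. It is a simplified model: it ignores the IMPLEMENTATION DEFINED number of bytes the prologue itself copies, assumes the size is already saturated and nonzero, and uses hypothetical names (`CpyState`, `precondition`, `recover_forward`).

```python
from dataclasses import dataclass


@dataclass
class CpyState:
    to_addr: int
    from_addr: int
    size: int  # negative after an option A forward prologue
    nzcv: int  # 4-bit value: N is bit 3, C is bit 1


def precondition(to_addr: int, from_addr: int, size: int,
                 forward: bool, option_a: bool) -> CpyState:
    """Model how the prologue rewrites its arguments."""
    if option_a:
        if forward:
            # Addresses point past the end; size becomes negative.
            return CpyState(to_addr + size, from_addr + size, -size, 0b0000)
        return CpyState(to_addr, from_addr, size, 0b0000)
    if not forward:
        # Option B backward: addresses point past the end, N and C set.
        return CpyState(to_addr + size, from_addr + size, size, 0b1010)
    return CpyState(to_addr, from_addr, size, 0b0010)


def recover_forward(state: CpyState, option_a: bool) -> bool:
    """Recover the direction as the main and epilogue stages do."""
    n_flag = (state.nzcv >> 3) & 1
    return state.size < 0 or (not option_a and n_flag == 0)
```

Under option A the direction is carried by the sign of the size, while under option B it is carried by the N flag; in both cases the C flag records which option the prologue used.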

Operation

constant integer N = MaxBlockSizeCopiedBytes();
bits(8*N) readdata;

memcpy.nzcv = PSTATE.<N,Z,C,V>;
memcpy.toaddress = X[memcpy.d, 64];
memcpy.fromaddress = X[memcpy.s, 64];

if memcpy.stage == MOPSStage_Prologue then
    memcpy.cpysize = UInt(X[memcpy.n, 64]);
else
    memcpy.cpysize = SInt(X[memcpy.n, 64]);

memcpy.implements_option_a = CPYOptionA();

boolean rprivileged = if options<1> == '1' then AArch64.IsUnprivAccessPriv() else PSTATE.EL != EL0;
boolean wprivileged = if options<0> == '1' then AArch64.IsUnprivAccessPriv() else PSTATE.EL != EL0;

AccessDescriptor raccdesc = CreateAccDescMOPS(MemOp_LOAD, rprivileged, rnontemporal);
AccessDescriptor waccdesc = CreateAccDescMOPS(MemOp_STORE, wprivileged, wnontemporal);

if memcpy.stage == MOPSStage_Prologue then
    if memcpy.cpysize > 0x007FFFFFFFFFFFFF then
        memcpy.cpysize = 0x007FFFFFFFFFFFFF;

    memcpy.forward = IsMemCpyForward(memcpy);

    if memcpy.implements_option_a then
        memcpy.nzcv = '0000';
        if memcpy.forward then
            // Copy in the forward direction offsets the arguments.
            memcpy.toaddress = memcpy.toaddress + memcpy.cpysize;
            memcpy.fromaddress = memcpy.fromaddress + memcpy.cpysize;
            memcpy.cpysize = 0 - memcpy.cpysize;
    else
        if !memcpy.forward then
            // Copy in the reverse direction offsets the arguments.
            memcpy.toaddress = memcpy.toaddress + memcpy.cpysize;
            memcpy.fromaddress = memcpy.fromaddress + memcpy.cpysize;
            memcpy.nzcv = '1010';
        else
            memcpy.nzcv = '0010';

memcpy.stagecpysize = MemCpyStageSize(memcpy);

if memcpy.stage != MOPSStage_Prologue then
    memcpy.forward = memcpy.cpysize < 0 || (!memcpy.implements_option_a && memcpy.nzcv<3> == '0');
    CheckMemCpyParams(memcpy, options);

integer copied;
boolean iswrite;
AddressDescriptor memaddrdesc;
PhysMemRetStatus memstatus;
boolean fault = FALSE;
integer B;

if memcpy.implements_option_a then
    while memcpy.stagecpysize != 0 && !fault do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(memcpy);

        if memcpy.forward then
            assert B <= -1 * memcpy.stagecpysize;
            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress + memcpy.cpysize, memcpy.fromaddress + memcpy.cpysize, memcpy.forward, B, raccdesc, waccdesc);
            if copied != B then
                fault = TRUE;
            else
                memcpy.cpysize = memcpy.cpysize + B;
                memcpy.stagecpysize = memcpy.stagecpysize + B;

        else
            assert B <= memcpy.stagecpysize;
            memcpy.cpysize = memcpy.cpysize - B;
            memcpy.stagecpysize = memcpy.stagecpysize - B;

            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress + memcpy.cpysize, memcpy.fromaddress + memcpy.cpysize, memcpy.forward, B, raccdesc, waccdesc);
            if copied != B then
                fault = TRUE;
                memcpy.cpysize = memcpy.cpysize + B;
                memcpy.stagecpysize = memcpy.stagecpysize + B;

else
    while memcpy.stagecpysize > 0 && !fault do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(memcpy);
        assert B <= memcpy.stagecpysize;

        if memcpy.forward then
            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress, memcpy.fromaddress, memcpy.forward, B, raccdesc, waccdesc);
            if copied != B then
                fault = TRUE;
            else
                memcpy.fromaddress = memcpy.fromaddress + B;
                memcpy.toaddress = memcpy.toaddress + B;
        else
            (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress - B, memcpy.fromaddress - B, memcpy.forward, B, raccdesc, waccdesc);

            if copied != B then
                fault = TRUE;
            else
                memcpy.fromaddress = memcpy.fromaddress - B;
                memcpy.toaddress = memcpy.toaddress - B;

        if !fault then
            memcpy.cpysize = memcpy.cpysize - B;
            memcpy.stagecpysize = memcpy.stagecpysize - B;

UpdateCpyRegisters(memcpy, fault, copied);

if fault then
    if IsFault(memaddrdesc) then
        AArch64.Abort(memaddrdesc.vaddress, memaddrdesc.fault);
    else
        AccessDescriptor accdesc = if iswrite then waccdesc else raccdesc;
        HandleExternalAbort(memstatus, iswrite, memaddrdesc, B, accdesc);

if memcpy.stage == MOPSStage_Prologue then
    PSTATE.<N,Z,C,V> = memcpy.nzcv;
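To make the option B loop above more concrete, the following sketch replays it on a Python `bytearray` standing in for memory. It is a simplified model under several assumptions: faults and access attributes are not modeled, the three stages are collapsed into one loop, the fixed `block` parameter is a hypothetical stand-in for `CPYSizeChoice`, and each block is read in full before it is written.

```python
def copy_option_b(mem: bytearray, to_addr: int, from_addr: int,
                  size: int, forward: bool, block: int = 4) -> None:
    """Copy size bytes inside mem following the option B loop shape."""
    if not forward:
        # The prologue leaves both addresses one past the end of the buffers.
        to_addr += size
        from_addr += size
    remaining = size
    while remaining > 0:
        b = min(block, remaining)  # assumed block-size choice
        if forward:
            # Copy the next block, then advance both addresses.
            mem[to_addr:to_addr + b] = mem[from_addr:from_addr + b]
            to_addr += b
            from_addr += b
        else:
            # Copy the block just below the addresses, then step back.
            mem[to_addr - b:to_addr] = mem[from_addr - b:from_addr]
            to_addr -= b
            from_addr -= b
        remaining -= b
```

With overlapping buffers, choosing the direction as the prologue does gives the same result as a `memmove`-style copy: a forward copy when the destination is below the source, and a backward copy when it is above.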