LDBFMAX, LDBFMAXA, LDBFMAXAL, LDBFMAXL

BFloat16 floating-point atomic maximum in memory

This instruction atomically loads a 16-bit value from memory, computes the BFloat16 maximum of that value and the value held in the SIMD&FP register <Hs>, and stores the result back to memory. The value originally loaded from memory is returned in the destination register <Ht>.
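The read-modify-write described above can be approximated in portable C11 with a compare-and-swap loop. This is a minimal sketch, not a description of how hardware implements the instruction: LDBFMAX performs the whole update as a single atomic operation, whereas the loop below retries until no other writer intervenes. The names `ldbfmax_model` and `bf16_max_simple` are hypothetical, the ordering used is relaxed (the LDBFMAX form), and the comparison deliberately ignores NaN handling and the sign of zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* BFloat16 is the upper half of an IEEE-754 binary32 value, so widening is exact. */
static float bf16_to_float(uint16_t h) {
    uint32_t w = (uint32_t)h << 16;
    float f;
    memcpy(&f, &w, sizeof f);
    return f;
}

/* Simplified maximum for ordinary (non-NaN) operands only. */
static uint16_t bf16_max_simple(uint16_t a, uint16_t b) {
    return bf16_to_float(a) >= bf16_to_float(b) ? a : b;
}

/* Hypothetical software model of LDBFMAX: atomically replaces *mem with
   max(*mem, value) and returns the value that was in memory beforehand,
   which is what the instruction writes to <Ht>. */
uint16_t ldbfmax_model(_Atomic uint16_t *mem, uint16_t value) {
    assert(mem != NULL);
    uint16_t old = atomic_load_explicit(mem, memory_order_relaxed);
    /* On failure the CAS reloads `old` with the current contents, so the
       maximum is recomputed from fresh data on every retry. */
    while (!atomic_compare_exchange_weak_explicit(
               mem, &old, bf16_max_simple(old, value),
               memory_order_relaxed, memory_order_relaxed)) {
    }
    return old;
}
```

For example, if memory holds 1.0 (0x3F80) and `value` is 2.0 (0x4000), the call returns 0x3F80 and leaves 0x4000 in memory.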

LDBFMAXA and LDBFMAXAL load from memory with acquire semantics.

LDBFMAXL and LDBFMAXAL store to memory with release semantics.

LDBFMAX has neither acquire nor release semantics.
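As a rough analogy only, the four variants line up with the C11 orderings one might pass to an atomic read-modify-write. The helper `ldbfmax_order` is hypothetical, and the architectural guarantees come from the Arm memory model rather than from the C11 model, so treat the mapping as a mnemonic, not an equivalence.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical mapping from the A (acquire) and R (release) bits to the
   C11 ordering with a comparable role for an atomic read-modify-write. */
memory_order ldbfmax_order(unsigned a, unsigned r) {
    assert(a <= 1 && r <= 1);                /* each field is a single bit */
    if (a && r) return memory_order_acq_rel; /* LDBFMAXAL */
    if (a)      return memory_order_acquire; /* LDBFMAXA  */
    if (r)      return memory_order_release; /* LDBFMAXL  */
    return memory_order_relaxed;             /* LDBFMAX   */
}
```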

This instruction:

Disables alternative floating-point behaviors, as if FPCR.AH is 0.

Generates only the default NaN, as if FPCR.DN is 1.

Does not modify the cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).

Disables trapped floating-point exceptions, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE, and IOE) are all zero.
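A sketch of the BFloat16 maximum under the fixed settings listed above, assuming the usual Arm FPMax behavior with FPCR.AH = 0: any NaN operand yields the default NaN (0x7FC0 for BFloat16), and the maximum of +0 and -0 is +0. Because exceptions are neither recorded nor trapped, the sketch has no exception path. Flushing of denormal operands (FPCR.FZ) is not modeled, and the name `bf16_max_dn` is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Default NaN: sign 0, exponent all ones, only the fraction MSB set. */
#define BF16_DEFAULT_NAN 0x7FC0u

/* BFloat16 is the upper half of a binary32 value, so widening is exact. */
static float bf16_to_float(uint16_t h) {
    uint32_t w = (uint32_t)h << 16;
    float f;
    memcpy(&f, &w, sizeof f);
    return f;
}

/* Maximum with AH = 0 and DN = 1; FZ flushing is not modeled. */
uint16_t bf16_max_dn(uint16_t op1, uint16_t op2) {
    float a = bf16_to_float(op1), b = bf16_to_float(op2);
    if (isnan(a) || isnan(b))
        return BF16_DEFAULT_NAN;                 /* any NaN input gives the default NaN */
    if (a == 0.0f && b == 0.0f)
        return (uint16_t)(op1 & op2 & 0x8000u);  /* -0 only if both operands are -0 */
    return a > b ? op1 : op2;                    /* equal non-zero values have equal bits */
}
```

With these rules, a signaling NaN such as 0x7F81 in either operand produces 0x7FC0 rather than a quieted copy of the input NaN.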

For more information about memory ordering semantics, see Load-Acquire, Store-Release.

For information about addressing modes, see Load/Store addressing modes.

Encoding: Floating-point

Variants: FEAT_LSFE (ARMv9.6)

Bit(s)  31-30  29-27  26  25-24  23  22  21  20-16  15  14-12  11-10  9-5  4-0
Field   size          VR         A   R       Rs     o3  opc           Rn   Rt
Value   00     111    1   00     A   R   1   Rs     0   100    00     Rn   Rt
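The fixed bits in the table add up to 0x3C204000, leaving A, R, Rs, Rn, and Rt to be filled in. A minimal C encoder, assuming the bit positions exactly as shown in the table (the helper name `encode_ldbfmax` is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed bits from the table: size=00, bits 29-27=111, VR=1, bits 25-24=00,
   bit 21=1, o3=0, opc=100, bits 11-10=00. */
#define LDBFMAX_BASE 0x3C204000u

/* Hypothetical helper: packs the variable fields into a 32-bit instruction word. */
uint32_t encode_ldbfmax(bool a, bool r, unsigned rs, unsigned rn, unsigned rt) {
    assert(rs < 32 && rn < 32 && rt < 32); /* each register field is 5 bits */
    return LDBFMAX_BASE
         | (uint32_t)a  << 23   /* A: acquire */
         | (uint32_t)r  << 22   /* R: release */
         | (uint32_t)rs << 16   /* Rs: <Hs> */
         | (uint32_t)rn << 5    /* Rn: <Xn|SP> */
         | (uint32_t)rt;        /* Rt: <Ht> */
}
```

Under these assumptions, LDBFMAX H0, H1, [X2] (Rs = 0, Rt = 1, Rn = 2) encodes as 0x3C204041; this value is derived from the table, not from an assembler listing.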

No memory ordering (A == 0 && R == 0)

LDBFMAX <Hs>, <Ht>, [<Xn|SP>]

Acquire (A == 1 && R == 0)

LDBFMAXA <Hs>, <Ht>, [<Xn|SP>]

Acquire-release (A == 1 && R == 1)

LDBFMAXAL <Hs>, <Ht>, [<Xn|SP>]

Release (A == 0 && R == 1)

LDBFMAXL <Hs>, <Ht>, [<Xn|SP>]
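The A and R bits select which of the four mnemonics above is printed, and Rn = 31 is written as SP in the <Xn|SP> operand. A hypothetical disassembly helper, `format_ldbfmax`, sketching that selection:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the assembler form chosen by A and R into buf. */
int format_ldbfmax(char *buf, size_t len, bool a, bool r,
                   unsigned rs, unsigned rt, unsigned rn) {
    assert(rs < 32 && rt < 32 && rn < 32);
    static const char *const names[2][2] = {
        /* indexed as [A][R] */
        { "LDBFMAX",  "LDBFMAXL"  },
        { "LDBFMAXA", "LDBFMAXAL" },
    };
    const char *mn = names[a][r];
    if (rn == 31) /* base register 31 is the stack pointer */
        return snprintf(buf, len, "%s H%u, H%u, [SP]", mn, rs, rt);
    return snprintf(buf, len, "%s H%u, H%u, [X%u]", mn, rs, rt, rn);
}
```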

Decoding algorithm

if !IsFeatureImplemented(FEAT_LSFE) then EndOfDecode(Decode_UNDEF);

constant integer t = UInt(Rt);
constant integer n = UInt(Rn);
constant integer s = UInt(Rs);

constant integer datasize = 16;
constant boolean acquire = A == '1';
constant boolean release = R == '1';
constant boolean tagchecked = n != 31;
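The decoding steps above can be mirrored in C as a field extractor. This sketch assumes the fixed-bit pattern from the encoding table and does not model the FEAT_LSFE check, which in the pseudocode makes the encoding UNDEFINED when the feature is absent. The names `ldbfmax_fields` and `decode_ldbfmax` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form, mirroring the variables in the decoding algorithm. */
typedef struct {
    unsigned t, n, s;          /* Rt, Rn, Rs */
    unsigned datasize;         /* always 16 */
    bool acquire, release, tagchecked;
} ldbfmax_fields;

/* Fixed bits are everything except A (23), R (22), Rs (20-16), Rn (9-5), Rt (4-0). */
#define LDBFMAX_MASK  0xFF20FC00u
#define LDBFMAX_MATCH 0x3C204000u

/* Returns false if insn is not an LDBFMAX, LDBFMAXA, LDBFMAXAL, or LDBFMAXL word. */
bool decode_ldbfmax(uint32_t insn, ldbfmax_fields *out) {
    assert(out != NULL);
    if ((insn & LDBFMAX_MASK) != LDBFMAX_MATCH)
        return false;
    out->t = insn & 0x1Fu;
    out->n = (insn >> 5) & 0x1Fu;
    out->s = (insn >> 16) & 0x1Fu;
    out->datasize = 16;
    out->acquire = (insn >> 23) & 1u;
    out->release = (insn >> 22) & 1u;
    out->tagchecked = out->n != 31;   /* as in the pseudocode: tagchecked = n != 31 */
    return true;
}
```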

Operation

CheckFPEnabled64();
bits(64) address;
bits(datasize) value;
bits(datasize) data;
constant AccessDescriptor accdesc = CreateAccDescFPAtomicOp(MemAtomicOp_BFMAX, acquire,
                                                            release, tagchecked);

value = V[s, datasize];
if n == 31 then
    CheckSPAlignment();
    address = SP[64];
else
    address = X[n, 64];

constant bits(datasize) comparevalue = bits(datasize) UNKNOWN; // Irrelevant when not executing CAS
data = MemAtomic(address, comparevalue, value, accdesc);

V[t, datasize] = data;

Explanations

<Hs>: Is the 16-bit name of the SIMD&FP register holding the data value to be operated on with the contents of the memory location, encoded in the "Rs" field.
<Ht>: Is the 16-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
<Xn|SP>: Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
