Floating-point fused multiply-add long to accumulator (vector)
This instruction multiplies corresponding half-precision floating-point values in the vectors in the two source SIMD&FP registers, and accumulates each product, without intermediate rounding, into the corresponding single-precision vector element of the destination SIMD&FP register. FMLAL operates on the lower part of the source vectors (part = 0) and FMLAL2 on the upper part (part = 1).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point exceptions and exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4, it is mandatory for all implementations to support it.
ID_AA64ISAR0_EL1.FHM indicates whether this instruction is supported.
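As a rough illustration of the ID register check, the sketch below extracts the FHM field from a raw ID_AA64ISAR0_EL1 value. It assumes that FHM occupies bits [51:48] and that any nonzero value means the instruction is implemented; both points should be confirmed against the register description. The name `has_fhm` is hypothetical, and obtaining the register value (a privileged MRS read or an operating system interface) is outside the scope of this sketch.

```python
FHM_SHIFT = 48  # assumed position of ID_AA64ISAR0_EL1.FHM, bits [51:48]
FHM_MASK = 0xF  # the field is assumed to be 4 bits wide


def has_fhm(id_aa64isar0: int) -> bool:
    """Return True if the FHM field of a raw register value is nonzero."""
    return ((id_aa64isar0 >> FHM_SHIFT) & FHM_MASK) != 0
```

For example, `has_fhm(1 << 48)` is true, while `has_fhm(0)` is false.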
Variants: FEAT_FHM (ARMv8.4)
| 31 | 30 | 29 | 28-24 | 23 | 22 | 21 | 20-16 | 15-11 | 10 | 9-5 | 4-0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q | 0 | 0 1 1 1 0 | 0 | 0 | 1 | Rm | 1 1 1 0 1 | 1 | Rn | Rd |
|  |  | U |  | S | sz |  |  | opcode |  |  |  |
FMLAL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>
    if !IsFeatureImplemented(FEAT_FHM) then EndOfDecode(Decode_UNDEF);
    if sz == '1' then EndOfDecode(Decode_UNDEF);
    constant integer d = UInt(Rd);
    constant integer n = UInt(Rn);
    constant integer m = UInt(Rm);
    constant integer esize = 32;
    constant integer datasize = 64 << UInt(Q);
    constant integer elements = datasize DIV esize;
    constant integer part = 0;
Variants: FEAT_FHM (ARMv8.4)
| 31 | 30 | 29 | 28-24 | 23 | 22 | 21 | 20-16 | 15-11 | 10 | 9-5 | 4-0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q | 1 | 0 1 1 1 0 | 0 | 0 | 1 | Rm | 1 1 0 0 1 | 1 | Rn | Rd |
|  |  | U |  | S | sz |  |  | opcode |  |  |  |
FMLAL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>
    if !IsFeatureImplemented(FEAT_FHM) then EndOfDecode(Decode_UNDEF);
    if sz == '1' then EndOfDecode(Decode_UNDEF);
    constant integer d = UInt(Rd);
    constant integer n = UInt(Rn);
    constant integer m = UInt(Rm);
    constant integer esize = 32;
    constant integer datasize = 64 << UInt(Q);
    constant integer elements = datasize DIV esize;
    constant integer part = 1;
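The two decode blocks differ only in the value of `part`, so a software decoder can handle both encodings in one routine. The following is a minimal sketch of that idea, not a reference decoder. It assumes that the opcode field spans bits [15:11] and that the fixed bits are those shown in the encoding diagrams. The function name `decode_fmlal_vector` and the dictionary layout are hypothetical, and the feature check on FEAT_FHM is left out.

```python
# Bits fixed in both encodings: 31 = 0, 28:24 = 01110, 23 (S) = 0, 21 = 1, 10 = 1.
FIXED_MASK = (1 << 31) | (0x1F << 24) | (1 << 23) | (1 << 21) | (1 << 10)
FIXED_BITS = (0b01110 << 24) | (1 << 21) | (1 << 10)


def decode_fmlal_vector(word: int):
    """Decode a 32-bit word as FMLAL/FMLAL2 (vector).

    Returns a dict of decoded values, or None if the word does not match
    either encoding or is UNDEFINED because sz == '1'.
    """
    if (word & FIXED_MASK) != FIXED_BITS:
        return None

    u = (word >> 29) & 1
    opcode = (word >> 11) & 0x1F  # assumed field position: bits [15:11]
    if (u, opcode) == (0, 0b11101):
        mnemonic, part = "FMLAL", 0
    elif (u, opcode) == (1, 0b11001):
        mnemonic, part = "FMLAL2", 1
    else:
        return None

    if (word >> 22) & 1:  # sz == '1' ends decode as UNDEFINED
        return None

    q = (word >> 30) & 1
    esize = 32
    datasize = 64 << q
    return {
        "mnemonic": mnemonic,
        "part": part,
        "d": word & 0x1F,
        "n": (word >> 5) & 0x1F,
        "m": (word >> 16) & 0x1F,
        "datasize": datasize,
        "elements": datasize // esize,
    }
```

With these assumptions, the word `0x4E22EC20` decodes as FMLAL with d = 0, n = 1, m = 2 and four result elements, and `0x2E22CC20` decodes as FMLAL2 with two result elements.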
    CheckFPAdvSIMDEnabled64();
    constant bits(datasize DIV 2) operand1 = Vpart[n, part, datasize DIV 2];
    constant bits(datasize DIV 2) operand2 = Vpart[m, part, datasize DIV 2];
    constant bits(datasize) operand3 = V[d, datasize];
    bits(datasize) result;
    bits(esize DIV 2) element1;
    bits(esize DIV 2) element2;

    for e = 0 to elements-1
        element1 = Elem[operand1, e, esize DIV 2];
        element2 = Elem[operand2, e, esize DIV 2];
        Elem[result, e, esize] = FPMulAddH(Elem[operand3, e, esize], element1, element2, FPCR);

    V[d, datasize] = result;
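The element loop above can be approximated in software to see what the fused accumulate buys. The sketch below is a simplified model under several assumptions, not a replacement for FPMulAddH. It works on lists of Python floats instead of register bits, and expects source values to be exactly representable in half precision and accumulator values in single precision. It assumes that `Vpart` with part = 1 selects the half-precision elements directly above those selected with part = 0. It uses round-to-nearest-even only, and ignores the other FPCR controls, NaN processing and exception flags. The names `round_to_f32` and `fmlal_model` are hypothetical.

```python
import struct


def round_to_f32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fmlal_model(acc, vn, vm, part, q=1):
    """Model FMLAL (part=0) or FMLAL2 (part=1) on lists of element values.

    acc holds the single-precision destination elements; vn and vm hold the
    half-precision source elements, with index 0 the least significant.
    """
    datasize = 64 << q
    elements = datasize // 32   # number of single-precision results
    first = part * elements     # assumed Vpart element selection
    result = []
    for e in range(elements):
        # A product of two binary16 values needs at most 22 significand
        # bits, so it is exact in binary64 and is never rounded on its own.
        product = vn[first + e] * vm[first + e]
        # The sum is rounded to binary64 and then to binary32. For these
        # operand widths this is expected to agree with a single rounding
        # to binary32, which is an assumption of this sketch.
        result.append(round_to_f32(acc[e] + product))
    return result


# The product (1 + 2**-10)**2 = 1 + 2**-9 + 2**-20 is not representable in
# half precision, but the unrounded product keeps the 2**-20 term.
a = 1.0 + 2.0**-10
fused = fmlal_model([-(1.0 + 2.0**-9)] * 4, [a] * 8, [a] * 8, part=0)
# each element of fused is 2**-20
```

A non-fused sequence that first rounded the product to half precision would lose the 2**-20 term and return zero in this example.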