Opcode | Encoding | 16-bit | 32-bit | 64-bit | CPUID Feature Flag(s) | Description |
---|---|---|---|---|---|---|
NP 0F 58 /r ADDPS xmm1, xmm2/m128 | rm | Invalid | Valid | Valid | sse | Add packed single-precision floating-point values from xmm1 and xmm2/m128. Store the result in xmm1. |
VEX.128.NP.0F.WIG 58 /r VADDPS xmm1, xmm2, xmm3/m128 | rvm | Invalid | Valid | Valid | avx | Add packed single-precision floating-point values from xmm2 and xmm3/m128. Store the result in xmm1. |
VEX.256.NP.0F.WIG 58 /r VADDPS ymm1, ymm2, ymm3/m256 | rvm | Invalid | Valid | Valid | avx | Add packed single-precision floating-point values from ymm2 and ymm3/m256. Store the result in ymm1. |
EVEX.128.NP.0F.W0 58 /r VADDPS xmm1 {k1}{z}, xmm2, xmm3/m128/bcst32 | ervm | Invalid | Valid | Valid | avx512-f avx512-vl | Add packed single-precision floating-point values from xmm2 and xmm3/m128/bcst32. Store the result in xmm1. |
EVEX.256.NP.0F.W0 58 /r VADDPS ymm1 {k1}{z}, ymm2, ymm3/m256/bcst32 | ervm | Invalid | Valid | Valid | avx512-f avx512-vl | Add packed single-precision floating-point values from ymm2 and ymm3/m256/bcst32. Store the result in ymm1. |
EVEX.512.NP.0F.W0 58 /r VADDPS zmm1 {k1}{z}, zmm2, zmm3/m512/bcst32{er} | ervm | Invalid | Valid | Valid | avx512-f | Add packed single-precision floating-point values from zmm2 and zmm3/m512/bcst32. Store the result in zmm1. |
Encoding
Encoding | Tuple Type | Operand 1 | Operand 2 | Operand 3 |
---|---|---|---|---|
rm | n/a | ModRM.reg[rw] | ModRM.r/m[r] | n/a |
rvm | n/a | ModRM.reg[rw] | VEX.vvvv[r] | ModRM.r/m[r] |
ervm | full | ModRM.reg[rw] | EVEX.vvvv[r] | ModRM.r/m[r] |
Description
The (V)ADDPS instruction adds four, eight, or sixteen single-precision floating-point values from the two source operands and stores the result in the destination operand.
All forms except the legacy SSE one zero the upper (untouched) bits of the destination register.
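For example, the legacy 128-bit form can be exercised through the C intrinsics listed further below (a minimal sketch; SSE support and the <immintrin.h> header are assumed):
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    // Four single-precision lanes per operand; lane n of the result is a[n] + b[n].
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f); // lanes (low to high): 1, 2, 3, 4
    __m128 b = _mm_set1_ps(10.0f);
    __m128 r = _mm_add_ps(a, b);                   // lanes: 11, 12, 13, 14

    float out[4];
    _mm_storeu_ps(out, r);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]); // prints: 11 12 13 14
    return 0;
}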
Operation
public void ADDPS(SimdF32 dest, SimdF32 src)
{
dest[0] += src[0];
dest[1] += src[1];
dest[2] += src[2];
dest[3] += src[3];
// dest[4..] is unmodified
}
void VADDPS_Vex(SimdF32 dest, SimdF32 src1, SimdF32 src2, int kl)
{
for (int n = 0; n < kl; n++)
dest[n] = src1[n] + src2[n];
dest[kl..] = 0;
}
public void VADDPS_Vex128(SimdF32 dest, SimdF32 src1, SimdF32 src2) =>
VADDPS_Vex(dest, src1, src2, 4);
public void VADDPS_Vex256(SimdF32 dest, SimdF32 src1, SimdF32 src2) =>
VADDPS_Vex(dest, src1, src2, 8);
void VADDPS_EvexMemory(SimdF32 dest, SimdF32 src1, SimdF32 src2, KMask k, int kl)
{
for (int n = 0; n < kl; n++)
{
if (k[n])
dest[n] = src1[n] + (EVEX.b ? src2[0] : src2[n]);
else if (EVEX.z)
dest[n] = 0;
// otherwise unchanged
}
dest[kl..] = 0;
}
public void VADDPS_Evex128Memory(SimdF32 dest, SimdF32 src1, SimdF32 src2, KMask k) =>
VADDPS_EvexMemory(dest, src1, src2, k, 4);
public void VADDPS_Evex256Memory(SimdF32 dest, SimdF32 src1, SimdF32 src2, KMask k) =>
VADDPS_EvexMemory(dest, src1, src2, k, 8);
public void VADDPS_Evex512Memory(SimdF32 dest, SimdF32 src1, SimdF32 src2, KMask k) =>
VADDPS_EvexMemory(dest, src1, src2, k, 16);
void VADDPS_EvexRegister(SimdF32 dest, SimdF32 src1, SimdF32 src2, KMask k, int kl)
{
if (kl == 16 && EVEX.b)
OverrideRoundingModeForThisInstruction(EVEX.rc);
for (int n = 0; n < kl; n++)
{
if (k[n])
dest[n] = src1[n] + src2[n];
else if (EVEX.z)
dest[n] = 0;
// otherwise unchanged
}
dest[kl..] = 0;
}
public void VADDPS_Evex128Register(SimdF32 dest, SimdF32 src1, SimdF32 src2, KMask k) =>
VADDPS_EvexRegister(dest, src1, src2, k, 4);
public void VADDPS_Evex256Register(SimdF32 dest, SimdF32 src1, SimdF32 src2, KMask k) =>
VADDPS_EvexRegister(dest, src1, src2, k, 8);
public void VADDPS_Evex512Register(SimdF32 dest, SimdF32 src1, SimdF32 src2, KMask k) =>
VADDPS_EvexRegister(dest, src1, src2, k, 16);
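The write-mask behavior modeled above corresponds to the _mask_/_maskz_ intrinsic variants listed in the next section: merge masking keeps the inactive lanes of the pass-through operand, while zero masking clears them. A minimal C sketch, assuming AVX-512F support (the mask value 0x00F0 and the function name are illustrative only):
#include <immintrin.h>

void masked_add_demo(const float *a, const float *b, float *merged, float *zeroed)
{
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    __mmask16 k = 0x00F0;                          // only lanes 4..7 are active

    // Merge masking: inactive lanes keep the value of the pass-through operand (va here).
    __m512 vm = _mm512_mask_add_ps(va, k, va, vb);

    // Zero masking: inactive lanes are forced to 0.0f.
    __m512 vz = _mm512_maskz_add_ps(k, va, vb);

    _mm512_storeu_ps(merged, vm);
    _mm512_storeu_ps(zeroed, vz);
}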
Intrinsics
__m128 _mm_add_ps(__m128 a, __m128 b)
__m128 _mm_mask_add_ps(__m128 s, __mmask8 k, __m128 a, __m128 b)
__m128 _mm_maskz_add_ps(__mmask8 k, __m128 a, __m128 b)
__m256 _mm256_add_ps(__m256 a, __m256 b)
__m256 _mm256_mask_add_ps(__m256 s, __mmask8 k, __m256 a, __m256 b)
__m256 _mm256_maskz_add_ps(__mmask8 k, __m256 a, __m256 b)
__m512 _mm512_add_ps(__m512 a, __m512 b)
__m512 _mm512_add_round_ps(__m512 a, __m512 b, const int rounding)
__m512 _mm512_mask_add_ps(__m512 s, __mmask16 k, __m512 a, __m512 b)
__m512 _mm512_mask_add_round_ps(__m512 s, __mmask16 k, __m512 a, __m512 b, const int rounding)
__m512 _mm512_maskz_add_ps(__mmask16 k, __m512 a, __m512 b)
__m512 _mm512_maskz_add_round_ps(__mmask16 k, __m512 a, __m512 b, const int rounding)
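The rounding parameter of the _round_ variants maps to the embedded-rounding ({er}) form of the EVEX.512 encoding and overrides MXCSR.RC for that one instruction. A minimal C sketch, assuming AVX-512F support (the function name is illustrative):
#include <immintrin.h>

__m512 add_round_down(__m512 a, __m512 b)
{
    // Static rounding toward negative infinity for this instruction only;
    // _MM_FROUND_NO_EXC must be combined with the explicit rounding mode.
    return _mm512_add_round_ps(a, b, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
}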
Exceptions
SIMD Floating-Point
#XM
#D - Denormal operand.
#I - Invalid operation.
#O - Numeric overflow.
#P - Inexact result.
#U - Numeric underflow.
Other Exceptions
VEX Encoded Form: See Type 2 Exception Conditions.
EVEX Encoded Form: See Type E2 Exception Conditions.