Arch86

Opcode	Encoding	16-bit	32-bit	64-bit	`CPUID` Feature Flag(s)	Description
`66 0F 3A 40 /r ib` `DPPS xmm1, xmm2/m128, imm8`	`rmi`	Invalid	Valid	Valid	`sse4.1`	Compute the dot product of packed single-precision floating-point values in xmm1 and xmm2/m128. Use imm8 to control the operation. Store the result in xmm1.
`VEX.128.66.0F3A.WIG 40 /r ib` `VDPPS xmm1, xmm2, xmm3/m128, imm8`	`rvmi`	Invalid	Valid	Valid	`avx`	Compute the dot product of packed single-precision floating-point values in xmm2 and xmm3/m128. Use imm8 to control the operation. Store the result in xmm1.
`VEX.256.66.0F3A.WIG 40 /r ib` `VDPPS ymm1, ymm2, ymm3/m256, imm8`	`rvmi`	Invalid	Valid	Valid	`avx`	Compute two dot products of packed single-precision floating-point values in ymm2 and ymm3/m256. Use imm8 to control the operation. Store the results in ymm1.

Encoding

Encoding	Operand 1	Operand 2	Operand 3	Operand 4
`rmi`	`ModRM.reg[rw]`	`ModRM.r/m[r]`	`imm8`
`rvmi`	`ModRM.reg[w]`	`VEX.vvvv[r]`	`ModRM.r/m[r]`	`imm8`

Description

The (V)DPPS instruction conditionally computes the dot product of packed single-precision floating-point values from the two source operands. The operation is controlled by the immediate. The result is stored in the destination operand.

Beginning with a sum of 0, the immediate's bits are interpreted as per this table:

Bit	Meaning if Set	Meaning if Clear
`0`	Store the computed dot product in `dest(0..31)`	Store `0.0` in `dest(0..31)`
`1`	Store the computed dot product in `dest(32..63)`	Store `0.0` in `dest(32..63)`
`2`	Store the computed dot product in `dest(64..95)`	Store `0.0` in `dest(64..95)`
`3`	Store the computed dot product in `dest(96..127)`	Store `0.0` in `dest(96..127)`
`4`	Add `src1(0..31) × src2(0..31)` to the sum	Add `0.0` to the sum
`5`	Add `src1(32..63) × src2(32..63)` to the sum	Add `0.0` to the sum
`6`	Add `src1(64..95) × src2(64..95)` to the sum	Add `0.0` to the sum
`7`	Add `src1(96..127) × src2(96..127)` to the sum	Add `0.0` to the sum
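
As a worked illustration of the table above, the following scalar C sketch models the 128-bit operation. The function name `dpps_model` is hypothetical and exists only for this example. It also assumes a simple left-to-right summation, so the rounding of the sum may differ from what the hardware produces.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 128-bit form (illustration only).
   Bits 4..7 of imm8 select which products enter the sum;
   bits 0..3 select which destination elements receive it. */
static void dpps_model(float dest[4], const float src1[4],
                       const float src2[4], uint8_t imm8)
{
    float sum = 0.0f;

    /* Bits 4..7: add src1[i] * src2[i] when set, add nothing when clear. */
    for (int i = 0; i < 4; i++) {
        if (imm8 & (1u << (4 + i)))
            sum += src1[i] * src2[i];
    }

    /* Bits 0..3: store the sum when set, store 0.0 when clear. */
    for (int i = 0; i < 4; i++)
        dest[i] = (imm8 & (1u << i)) ? sum : 0.0f;
}
```

For example, with `src1 = {1, 2, 3, 4}`, `src2 = {5, 6, 7, 8}`, and an immediate of `0x71`, only the first three products are summed (5 + 12 + 21 = 38) and the result is written to element 0 alone, with the other three elements set to `0.0`.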

The VEX.256 form performs the same operation as the 128-bit forms, but independently on each 128-bit half of the operands. In other words, each bit of the immediate controls two operations: one for the lower half and one for the upper half.
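
The per-half behavior can be sketched in scalar C as follows. The name `vdpps256_model` is hypothetical, and the left-to-right summation order is an assumption of this sketch. The point it illustrates is that the 256-bit form yields two independent four-element dot products, not one eight-element dot product.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 256-bit form (illustration only).
   The same imm8 is applied separately to elements 0..3 and 4..7. */
static void vdpps256_model(float dest[8], const float src1[8],
                           const float src2[8], uint8_t imm8)
{
    for (int half = 0; half < 2; half++) {
        int base = half * 4;   /* first element of this 128-bit half */
        float sum = 0.0f;

        /* Bits 4..7 select the products that enter this half's sum. */
        for (int i = 0; i < 4; i++) {
            if (imm8 & (1u << (4 + i)))
                sum += src1[base + i] * src2[base + i];
        }

        /* Bits 0..3 select where this half's sum is stored. */
        for (int i = 0; i < 4; i++)
            dest[base + i] = (imm8 & (1u << i)) ? sum : 0.0f;
    }
}
```

With an immediate of `0xF1`, element 0 holds the dot product of elements 0 to 3 and element 4 holds the dot product of elements 4 to 7. A caller wanting a full eight-element dot product would still need to add those two values together.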

All forms except the legacy SSE one zero the upper (untouched) bits of the destination register.

Operation

public void DPPS(SimdF32 dest, SimdF32 src, byte imm8)
{
    // see note 1

    F32 partial0 = imm8.Bit[4] ? dest[0] * src[0] : 0.0;
    F32 partial1 = imm8.Bit[5] ? dest[1] * src[1] : 0.0;
    F32 partial2 = imm8.Bit[6] ? dest[2] * src[2] : 0.0;
    F32 partial3 = imm8.Bit[7] ? dest[3] * src[3] : 0.0;

    F32 sum = partial0 + partial1 + partial2 + partial3;

    dest[0] = imm8.Bit[0] ? sum : 0.0;
    dest[1] = imm8.Bit[1] ? sum : 0.0;
    dest[2] = imm8.Bit[2] ? sum : 0.0;
    dest[3] = imm8.Bit[3] ? sum : 0.0;

    // dest[4..] is unmodified
}

public void VDPPS_Vex128(SimdF32 dest, SimdF32 src1, SimdF32 src2, byte imm8)
{
    // see note 1

    F32 partial0 = imm8.Bit[4] ? src1[0] * src2[0] : 0.0;
    F32 partial1 = imm8.Bit[5] ? src1[1] * src2[1] : 0.0;
    F32 partial2 = imm8.Bit[6] ? src1[2] * src2[2] : 0.0;
    F32 partial3 = imm8.Bit[7] ? src1[3] * src2[3] : 0.0;

    F32 sum = partial0 + partial1 + partial2 + partial3;

    dest[0] = imm8.Bit[0] ? sum : 0.0;
    dest[1] = imm8.Bit[1] ? sum : 0.0;
    dest[2] = imm8.Bit[2] ? sum : 0.0;
    dest[3] = imm8.Bit[3] ? sum : 0.0;

    dest[4..] = 0;
}

public void VDPPS_Vex256(SimdF32 dest, SimdF32 src1, SimdF32 src2, byte imm8)
{
    // see note 1

    F32 partial00 = imm8.Bit[4] ? src1[0] * src2[0] : 0.0;
    F32 partial01 = imm8.Bit[5] ? src1[1] * src2[1] : 0.0;
    F32 partial02 = imm8.Bit[6] ? src1[2] * src2[2] : 0.0;
    F32 partial03 = imm8.Bit[7] ? src1[3] * src2[3] : 0.0;

    F32 partial10 = imm8.Bit[4] ? src1[4] * src2[4] : 0.0;
    F32 partial11 = imm8.Bit[5] ? src1[5] * src2[5] : 0.0;
    F32 partial12 = imm8.Bit[6] ? src1[6] * src2[6] : 0.0;
    F32 partial13 = imm8.Bit[7] ? src1[7] * src2[7] : 0.0;

    F32 sum0 = partial00 + partial01 + partial02 + partial03;
    F32 sum1 = partial10 + partial11 + partial12 + partial13;

    dest[0] = imm8.Bit[0] ? sum0 : 0.0;
    dest[1] = imm8.Bit[1] ? sum0 : 0.0;
    dest[2] = imm8.Bit[2] ? sum0 : 0.0;
    dest[3] = imm8.Bit[3] ? sum0 : 0.0;

    dest[4] = imm8.Bit[0] ? sum1 : 0.0;
    dest[5] = imm8.Bit[1] ? sum1 : 0.0;
    dest[6] = imm8.Bit[2] ? sum1 : 0.0;
    dest[7] = imm8.Bit[3] ? sum1 : 0.0;

    dest[8..] = 0;
}

1. The SIMD exception flags are updated after each multiplication (if it occurs), and after the addition. If an unmasked exception is reported during the multiplications, it will be raised before the sums. If the sums report an unmasked exception, it will be raised before the destination is updated. Any unmasked exceptions will leave the destination unmodified.

Intrinsics

__m128 _mm_dp_ps(__m128 a, __m128 b, const int mask)

__m256 _mm256_dp_ps(__m256 a, __m256 b, const int mask)
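
Because `mask` is encoded as the instruction's immediate, it has to be a compile-time constant. One way to keep call sites readable is a small helper that assembles it from two 4-bit lane masks. The macro `DP_MASK` and the enumerator names below are hypothetical, not part of any intrinsics header.

```c
/* Hypothetical helper (illustration only): build the 8-bit immediate.
   mul_lanes   -> bits 4..7, which element pairs are multiplied and summed
   store_lanes -> bits 0..3, which destination elements receive the sum
   The macro expands to an integer constant expression. */
#define DP_MASK(mul_lanes, store_lanes) \
    ((((mul_lanes) & 0xF) << 4) | ((store_lanes) & 0xF))

enum {
    /* Three-element dot product, result in element 0 only. */
    DP_XYZ_TO_LANE0 = DP_MASK(0x7, 0x1),
    /* Four-element dot product, result broadcast to all elements. */
    DP_XYZW_TO_ALL  = DP_MASK(0xF, 0xF)
};
```

A call such as `_mm_dp_ps(a, b, DP_XYZ_TO_LANE0)` would then pass `0x71` as the immediate, assuming the translation unit is built with SSE4.1 enabled.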

Exceptions

SIMD Floating-Point

#XM

#D - Denormal operand.
#I - Invalid operation.
#O - Numeric overflow.
#P - Inexact result.
#U - Numeric underflow.

Other Exceptions

VEX Encoded Form: See Type 2 Exception Conditions.