Arch86

Opcode	Encoding	16-bit	32-bit	64-bit	`CPUID` Feature Flag(s)	Description
`66 0F 3A 41 /r ib` `DPPD xmm1, xmm2/m128, imm8`	`rmi`	Invalid	Valid	Valid	`sse4.1`	Compute the dot product of packed double-precision floating-point values in xmm1 and xmm2/m128. Use bits `0..1` and `4..5` of imm8 to control the operation. Store the result in xmm1.
`VEX.128.66.0F3A.WIG 41 /r ib` `VDPPD xmm1, xmm2, xmm3/m128, imm8`	`rvmi`	Invalid	Valid	Valid	`avx`	Compute the dot product of packed double-precision floating-point values in xmm2 and xmm3/m128. Use bits `0..1` and `4..5` of imm8 to control the operation. Store the result in xmm1.

Encoding

Encoding	Operand 1	Operand 2	Operand 3	Operand 4
`rmi`	`ModRM.reg[rw]`	`ModRM.r/m[r]`	`imm8`	N/A
`rvmi`	`ModRM.reg[w]`	`VEX.vvvv[r]`	`ModRM.r/m[r]`	`imm8`

Description

The (V)DPPD instruction conditionally computes the dot product of the packed double-precision floating-point values in the two source operands. Bits 4..5 of the immediate select which element-wise products contribute to the sum, and bits 0..1 select which elements of the destination operand receive the result.

Beginning with a sum of 0.0, the bits of the immediate are interpreted as follows:

Bit	Meaning if Set	Meaning if Clear
`0`	Store the computed dot product in `dest(0..63)`	Store `0.0` in `dest(0..63)`
`1`	Store the computed dot product in `dest(64..127)`	Store `0.0` in `dest(64..127)`
`2..3`	Reserved
`4`	Add `src1(0..63) × src2(0..63)` to the sum	Add `0.0` to the sum
`5`	Add `src1(64..127) × src2(64..127)` to the sum	Add `0.0` to the sum
`6..7`	Reserved

The VEX-encoded form zeroes the upper bits of the destination register (bits 128 and above). The legacy SSE form leaves them unmodified.
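As an illustration of the table above, the immediate can be assembled from two 2-bit masks. The macro name `DPPD_IMM8` and its parameter names are hypothetical, introduced only for this sketch; they are not part of any compiler header.

```c
#include <assert.h>  /* static_assert (C11) */

/* Hypothetical helper, not part of any standard header.
 * mul_mask:   bit i set => src1[i] * src2[i] is added to the sum (imm8 bits 4..5)
 * store_mask: bit i set => the sum is stored in dest[i]          (imm8 bits 0..1)
 * The reserved bits 2..3 and 6..7 are always left clear. */
#define DPPD_IMM8(mul_mask, store_mask) \
    ((((mul_mask) & 0x3) << 4) | ((store_mask) & 0x3))

/* Full dot product, stored in the low element only. */
static_assert(DPPD_IMM8(3, 1) == 0x31, "both products, low element");

/* Full dot product, broadcast to both elements. */
static_assert(DPPD_IMM8(3, 3) == 0x33, "both products, both elements");
```

For example, `DPPD_IMM8(3, 1)` evaluates to `0x31`: both products are summed, the result is stored in `dest(0..63)`, and `dest(64..127)` is set to `0.0`. Because the macro expands to a constant expression, its result can be used wherever a compile-time immediate is required.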

Operation

public void DPPD(SimdF64 dest, SimdF64 src, byte imm8)
{
    // see note 1

    F64 partial0 = imm8.Bit[4] ? dest[0] * src[0] : 0.0;
    F64 partial1 = imm8.Bit[5] ? dest[1] * src[1] : 0.0;

    F64 sum = partial0 + partial1;

    dest[0] = imm8.Bit[0] ? sum : 0.0;
    dest[1] = imm8.Bit[1] ? sum : 0.0;

    // dest[2..] is unmodified
}

public void VDPPD_Vex128(SimdF64 dest, SimdF64 src1, SimdF64 src2, byte imm8)
{
    // see note 1

    F64 partial0 = imm8.Bit[4] ? src1[0] * src2[0] : 0.0;
    F64 partial1 = imm8.Bit[5] ? src1[1] * src2[1] : 0.0;

    F64 sum = partial0 + partial1;

    dest[0] = imm8.Bit[0] ? sum : 0.0;
    dest[1] = imm8.Bit[1] ? sum : 0.0;

    dest[2..] = 0;
}

Note 1: The SIMD exception flags are updated after each multiplication that occurs and again after the addition. If a multiplication reports an unmasked exception, it is raised before the sum is computed. If the addition reports an unmasked exception, it is raised before the destination is updated. In either case, the destination is left unmodified.
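The operation pseudocode above can be mirrored in portable C for experimentation. This is a minimal sketch under stated assumptions: the function name `dppd_model` is hypothetical, only the 128-bit data path is modeled, and exception flags, `MXCSR` rounding control, and the upper-bit zeroing of the VEX form are not modeled. Results may also differ in the last bit from hardware if the compiler contracts the multiply and add into a fused operation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the (V)DPPD data path.
 * dest may alias src1, as it does in the legacy SSE form. */
static void dppd_model(double dest[2], const double src1[2],
                       const double src2[2], uint8_t imm8)
{
    assert(dest && src1 && src2);

    /* imm8 bits 4..5 select which products contribute to the sum. */
    double partial0 = (imm8 & 0x10) ? src1[0] * src2[0] : 0.0;
    double partial1 = (imm8 & 0x20) ? src1[1] * src2[1] : 0.0;
    double sum = partial0 + partial1;

    /* imm8 bits 0..1 select which elements receive the sum.
     * Both results are computed before any write so aliasing is safe. */
    double lo = (imm8 & 0x01) ? sum : 0.0;
    double hi = (imm8 & 0x02) ? sum : 0.0;
    dest[0] = lo;
    dest[1] = hi;
}
```

With `src1 = {2.0, 3.0}` and `src2 = {5.0, 7.0}`, an immediate of `0x31` yields `{31.0, 0.0}`, while `0x12` uses only the low product and stores it in the high element, yielding `{0.0, 10.0}`.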

Intrinsics

__m128d _mm_dp_pd(__m128d a, __m128d b, const int mask)
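A possible use of the intrinsic is sketched below. The wrapper name `dot2` is hypothetical. The sketch assumes a GCC- or Clang-style compiler that defines `__SSE4_1__` when SSE4.1 code generation is enabled (for example with `-msse4.1`); when the macro is absent, it falls back to plain scalar arithmetic so that it still builds on other targets.

```c
#include <assert.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_dp_pd */
#endif

/* Hypothetical wrapper: dot product of two 2-element double vectors. */
static double dot2(const double a[2], const double b[2])
{
    assert(a && b);
#if defined(__SSE4_1__)
    __m128d va = _mm_loadu_pd(a);  /* unaligned loads: no alignment assumed */
    __m128d vb = _mm_loadu_pd(b);
    /* 0x31: bits 4 and 5 add both products to the sum,
     * bit 0 stores the sum in element 0, element 1 becomes 0.0.
     * The mask must be a compile-time constant. */
    __m128d r = _mm_dp_pd(va, vb, 0x31);
    return _mm_cvtsd_f64(r);       /* extract element 0 */
#else
    /* Scalar fallback when SSE4.1 is not enabled at compile time. */
    return a[0] * b[0] + a[1] * b[1];
#endif
}
```

Calling `dot2` with `{1.0, 2.0}` and `{3.0, 4.0}` returns `11.0` on either path.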

Exceptions

SIMD Floating-Point

#XM

#D - Denormal operand.
#I - Invalid operation.
#O - Numeric overflow.
#P - Inexact result.
#U - Numeric underflow.

Other Exceptions

VEX Encoded Form: See Type 2 Exception Conditions.

#UD

If VEX.L is not 0.