Opcode | Encoding | 16-bit | 32-bit | 64-bit | CPUID Feature Flag(s) | Description |
---|---|---|---|---|---|---|
66 0F 3A 41 /r ib DPPD xmm1, xmm2/m128, imm8 | rmi | Invalid | Valid | Valid | sse4.1 | Compute the dot product of packed double-precision floating-point values in xmm1 and xmm2/m128. Use bits 0..1 and 4..5 of imm8 to control the operation. Store the result in xmm1. |
VEX.128.66.0F3A.WIG 41 /r ib VDPPD xmm1, xmm2, xmm3/m128, imm8 | rvmi | Invalid | Valid | Valid | avx | Compute the dot product of packed double-precision floating-point values in xmm2 and xmm3/m128. Use bits 0..1 and 4..5 of imm8 to control the operation. Store the result in xmm1. |
Encoding
Encoding | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
rmi | ModRM.reg[rw] | ModRM.r/m[r] | imm8 | |
rvmi | ModRM.reg[w] | VEX.vvvv[r] | ModRM.r/m[r] | imm8 |
Description
The (V)DPPD instruction conditionally computes the dot product of packed double-precision floating-point values from the two source operands. The operation is controlled by bits 0..1 and 4..5 of the immediate. The result is stored in the destination operand.
Beginning with a sum of 0, the immediate's bits are interpreted as per this table:
Bit | Meaning if Set | Meaning if Clear |
---|---|---|
0 | Store the computed dot product in dest(0..63) | Store 0.0 in dest(0..63) |
1 | Store the computed dot product in dest(64..127) | Store 0.0 in dest(64..127) |
2..3 | Reserved | Reserved |
4 | Add src1(0..63) × src2(0..63) to the sum | Add 0.0 to the sum |
5 | Add src1(64..127) × src2(64..127) to the sum | Add 0.0 to the sum |
6..7 | Reserved | Reserved |
The VEX-encoded form zeroes the destination register's bits above bit 127; the legacy SSE form leaves them unmodified.
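As an illustration of the bit table above, the following is a minimal scalar sketch in C of the 128-bit data path. The function name `dppd_model` and its array-based signature are hypothetical, chosen only for this example. It does not model exception reporting or the handling of bits above bit 127.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of any library): scalar model of the
   128-bit (V)DPPD data path. Exceptions and upper bits are not modelled. */
void dppd_model(double dest[2], const double src1[2],
                const double src2[2], uint8_t imm8)
{
    assert(dest && src1 && src2);

    /* Bits 4 and 5 select which products contribute to the sum. */
    double partial0 = (imm8 & 0x10) ? src1[0] * src2[0] : 0.0;
    double partial1 = (imm8 & 0x20) ? src1[1] * src2[1] : 0.0;
    double sum = partial0 + partial1;

    /* Bits 0 and 1 select which destination lanes receive the sum;
       an unselected lane is set to 0.0. Both products are computed
       before any write, so dest may alias src1 (legacy SSE form). */
    dest[0] = (imm8 & 0x01) ? sum : 0.0;
    dest[1] = (imm8 & 0x02) ? sum : 0.0;
}
```

With src1 = {1.0, 2.0} and src2 = {3.0, 4.0}, an immediate of 0x31 puts the full dot product 11.0 in the low lane and 0.0 in the high lane, while 0x13 broadcasts only the low product 3.0 to both lanes.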
Operation
public void DPPD(SimdF64 dest, SimdF64 src, byte imm8)
{
// see note 1
F64 partial0 = imm8.Bit[4] ? dest[0] * src[0] : 0.0;
F64 partial1 = imm8.Bit[5] ? dest[1] * src[1] : 0.0;
F64 sum = partial0 + partial1;
dest[0] = imm8.Bit[0] ? sum : 0.0;
dest[1] = imm8.Bit[1] ? sum : 0.0;
// dest[2..] is unmodified
}
public void VDPPD_Vex128(SimdF64 dest, SimdF64 src1, SimdF64 src2, byte imm8)
{
// see note 1
F64 partial0 = imm8.Bit[4] ? src1[0] * src2[0] : 0.0;
F64 partial1 = imm8.Bit[5] ? src1[1] * src2[1] : 0.0;
F64 sum = partial0 + partial1;
dest[0] = imm8.Bit[0] ? sum : 0.0;
dest[1] = imm8.Bit[1] ? sum : 0.0;
dest[2..] = 0;
}
1. The SIMD exception flags are updated after each multiplication (if it occurs), and after the addition. If an unmasked exception is reported during the multiplications, it will be raised before the sum. If the sum reports an unmasked exception, it will be raised before the destination is updated. Any unmasked exceptions will leave the destination unmodified.
Intrinsics
__m128d _mm_dp_pd(__m128d a, __m128d b, const int mask)
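A possible use of the intrinsic is sketched below. The wrapper name `dot2` is hypothetical. The sketch assumes a GCC- or Clang-style compiler that defines `__SSE4_1__` when SSE4.1 code generation is enabled (for example with `-msse4.1`); otherwise it falls back to plain scalar arithmetic so that the file still builds. The mask argument must be a compile-time constant.

```c
#include <assert.h>

#if defined(__SSE4_1__)
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_dp_pd */
#endif

/* Hypothetical helper: dot product of two 2-element double vectors. */
double dot2(const double a[2], const double b[2])
{
    assert(a && b);
#if defined(__SSE4_1__)
    __m128d va = _mm_loadu_pd(a);  /* unaligned loads of both lanes */
    __m128d vb = _mm_loadu_pd(b);
    /* 0x31: bits 4 and 5 include both products in the sum,
       bit 0 stores the sum in the low lane, bit 1 clear zeroes the high lane. */
    __m128d r = _mm_dp_pd(va, vb, 0x31);
    return _mm_cvtsd_f64(r);       /* extract the low lane */
#else
    /* Assumed fallback when SSE4.1 is not enabled at compile time. */
    return a[0] * b[0] + a[1] * b[1];
#endif
}
```

Calling `dot2` with {1.0, 2.0} and {3.0, 4.0} returns 11.0 on either path.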
Exceptions
SIMD Floating-Point
#XM
- #D: Denormal operand.
- #I: Invalid operation.
- #O: Numeric overflow.
- #P: Inexact result.
- #U: Numeric underflow.
Other Exceptions
VEX Encoded Form: See Type 2 Exception Conditions.
#UD
- If VEX.L is not 0.