Dot Product of Packed Single-Precision Floating-Point Values

Encoding

Encoding  Operand 1      Operand 2     Operand 3     Operand 4
rmi       ModRM.reg[rw]  ModRM.r/m[r]  imm8
rvmi      ModRM.reg[w]   VEX.vvvv[r]   ModRM.r/m[r]  imm8

Description

The (V)DPPS instruction conditionally computes the dot product of packed single-precision floating-point values from the two source operands. The operation is controlled by the immediate. The result is stored in the destination operand.

Beginning with a sum of 0, the immediate's bits are interpreted as per this table:

Bit  Meaning if Set                                    Meaning if Clear
0    Store the computed dot product in dest(0..31)     Store 0.0 in dest(0..31)
1    Store the computed dot product in dest(32..63)    Store 0.0 in dest(32..63)
2    Store the computed dot product in dest(64..95)    Store 0.0 in dest(64..95)
3    Store the computed dot product in dest(96..127)   Store 0.0 in dest(96..127)
4    Add src1(0..31) × src2(0..31) to the sum          Add 0.0 to the sum
5    Add src1(32..63) × src2(32..63) to the sum        Add 0.0 to the sum
6    Add src1(64..95) × src2(64..95) to the sum        Add 0.0 to the sum
7    Add src1(96..127) × src2(96..127) to the sum      Add 0.0 to the sum
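The bit assignments in the table can be sketched as a scalar model in C. This is only an illustration: the name `dpps_model` is hypothetical and not part of any real API, and the left-to-right order of the additions follows the pseudocode in this document, so intermediate rounding on actual hardware may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 128-bit operation; not a real API.
 * Bits 4..7 of imm8 select which products enter the sum,
 * bits 0..3 select which destination elements receive the sum. */
static void dpps_model(float dest[4], const float src1[4],
                       const float src2[4], uint8_t imm8)
{
    assert(dest != NULL && src1 != NULL && src2 != NULL);

    float partial[4];
    for (int i = 0; i < 4; i++) {
        /* bit (4 + i) set: use the product; clear: use 0.0 */
        partial[i] = ((imm8 >> (4 + i)) & 1) ? src1[i] * src2[i] : 0.0f;
    }

    /* Assumption: summation order mirrors the pseudocode in this document. */
    float sum = partial[0] + partial[1] + partial[2] + partial[3];

    /* The sum is complete before any store, so dest may alias src1. */
    for (int i = 0; i < 4; i++) {
        /* bit i set: store the sum; clear: store 0.0 */
        dest[i] = ((imm8 >> i) & 1) ? sum : 0.0f;
    }
}
```

With `imm8 = 0xF1`, all four products are summed and only element 0 receives the result. With `imm8 = 0x7F`, the model computes a three-component dot product and broadcasts it to all four elements.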

The VEX.256 form of the instruction operates like the 128-bit forms, but independently on each 128-bit half of the operands. In other words, each bit of the immediate controls two operations: one for the lower half and one for the upper half. The two halves are never summed together.
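The per-half behaviour can be sketched in C as the same four-element routine applied twice with one immediate. The names `dp_lane` and `vdpps256_model` are hypothetical and used only for illustration. The summation order is an assumption taken from the pseudocode in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit half (four floats); not a real API. */
static void dp_lane(float dest[4], const float a[4], const float b[4],
                    uint8_t imm8)
{
    float p[4];
    for (int i = 0; i < 4; i++) {
        p[i] = ((imm8 >> (4 + i)) & 1) ? a[i] * b[i] : 0.0f;
    }
    float sum = p[0] + p[1] + p[2] + p[3];
    for (int i = 0; i < 4; i++) {
        dest[i] = ((imm8 >> i) & 1) ? sum : 0.0f;
    }
}

/* Hypothetical model of the 256-bit form: the same imm8 drives both halves,
 * and each half produces its own, separate sum. */
static void vdpps256_model(float dest[8], const float src1[8],
                           const float src2[8], uint8_t imm8)
{
    assert(dest != NULL && src1 != NULL && src2 != NULL);
    dp_lane(dest, src1, src2, imm8);             /* elements 0..3 */
    dp_lane(dest + 4, src1 + 4, src2 + 4, imm8); /* elements 4..7 */
}
```

Because the halves stay separate, a full eight-element dot product under this model still needs one extra addition of `dest[0]` and `dest[4]` after a call with `imm8 = 0xF1`.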

All forms except the legacy SSE one zero the upper (untouched) bits of the destination register.
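The difference between the legacy and VEX.128 write-back can be sketched in C as follows. The helper `write_low128` and the constant `REG_FLOATS` are hypothetical, and the sketch assumes a 256-bit register modelled as eight floats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define REG_FLOATS 8 /* assumption: a 256-bit register, modelled as 8 floats */

/* Hypothetical helper: store a 128-bit result into a wider register.
 * zero_upper = false models the legacy SSE form (upper elements kept);
 * zero_upper = true  models the VEX.128 form (upper elements cleared). */
static void write_low128(float reg[REG_FLOATS], const float result[4],
                         bool zero_upper)
{
    assert(reg != NULL && result != NULL);

    memcpy(reg, result, 4 * sizeof(float));
    if (zero_upper) {
        /* All-zero bytes represent +0.0 for IEEE 754 floats. */
        memset(reg + 4, 0, (REG_FLOATS - 4) * sizeof(float));
    }
}
```

In this model, a legacy-style write leaves elements 4..7 as they were, while a VEX.128-style write leaves them at 0.0.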

Operation

public void DPPS(SimdF32 dest, SimdF32 src, byte imm8)
{
    // see note 1

    F32 partial0 = imm8.Bit[4] ? dest[0] * src[0] : 0.0;
    F32 partial1 = imm8.Bit[5] ? dest[1] * src[1] : 0.0;
    F32 partial2 = imm8.Bit[6] ? dest[2] * src[2] : 0.0;
    F32 partial3 = imm8.Bit[7] ? dest[3] * src[3] : 0.0;

    F32 sum = partial0 + partial1 + partial2 + partial3;

    dest[0] = imm8.Bit[0] ? sum : 0.0;
    dest[1] = imm8.Bit[1] ? sum : 0.0;
    dest[2] = imm8.Bit[2] ? sum : 0.0;
    dest[3] = imm8.Bit[3] ? sum : 0.0;

    // dest[4..] is unmodified
}

public void VDPPS_Vex128(SimdF32 dest, SimdF32 src1, SimdF32 src2, byte imm8)
{
    // see note 1

    F32 partial0 = imm8.Bit[4] ? src1[0] * src2[0] : 0.0;
    F32 partial1 = imm8.Bit[5] ? src1[1] * src2[1] : 0.0;
    F32 partial2 = imm8.Bit[6] ? src1[2] * src2[2] : 0.0;
    F32 partial3 = imm8.Bit[7] ? src1[3] * src2[3] : 0.0;

    F32 sum = partial0 + partial1 + partial2 + partial3;

    dest[0] = imm8.Bit[0] ? sum : 0.0;
    dest[1] = imm8.Bit[1] ? sum : 0.0;
    dest[2] = imm8.Bit[2] ? sum : 0.0;
    dest[3] = imm8.Bit[3] ? sum : 0.0;

    dest[4..] = 0;
}

public void VDPPS_Vex256(SimdF32 dest, SimdF32 src1, SimdF32 src2, byte imm8)
{
    // see note 1

    F32 partial00 = imm8.Bit[4] ? src1[0] * src2[0] : 0.0;
    F32 partial01 = imm8.Bit[5] ? src1[1] * src2[1] : 0.0;
    F32 partial02 = imm8.Bit[6] ? src1[2] * src2[2] : 0.0;
    F32 partial03 = imm8.Bit[7] ? src1[3] * src2[3] : 0.0;

    F32 partial10 = imm8.Bit[4] ? src1[4] * src2[4] : 0.0;
    F32 partial11 = imm8.Bit[5] ? src1[5] * src2[5] : 0.0;
    F32 partial12 = imm8.Bit[6] ? src1[6] * src2[6] : 0.0;
    F32 partial13 = imm8.Bit[7] ? src1[7] * src2[7] : 0.0;

    F32 sum0 = partial00 + partial01 + partial02 + partial03;
    F32 sum1 = partial10 + partial11 + partial12 + partial13;

    dest[0] = imm8.Bit[0] ? sum0 : 0.0;
    dest[1] = imm8.Bit[1] ? sum0 : 0.0;
    dest[2] = imm8.Bit[2] ? sum0 : 0.0;
    dest[3] = imm8.Bit[3] ? sum0 : 0.0;

    dest[4] = imm8.Bit[0] ? sum1 : 0.0;
    dest[5] = imm8.Bit[1] ? sum1 : 0.0;
    dest[6] = imm8.Bit[2] ? sum1 : 0.0;
    dest[7] = imm8.Bit[3] ? sum1 : 0.0;

    dest[8..] = 0;
}

  1. The SIMD exception flags are updated after each multiplication (if it occurs), and after the addition. If an unmasked exception is reported during the multiplications, it will be raised before the sums. If the sums report an unmasked exception, it will be raised before the destination is updated. Any unmasked exceptions will leave the destination unmodified.

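Intrinsics

__m128 _mm_dp_ps(__m128 a, __m128 b, const int imm8)
__m256 _mm256_dp_ps(__m256 a, __m256 b, const int imm8)

Exceptions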

SIMD Floating-Point

#XM
  • #D - Denormal operand.
  • #I - Invalid operation.
  • #O - Numeric overflow.
  • #P - Inexact result.
  • #U - Numeric underflow.

Other Exceptions

VEX Encoded Form: See Type 2 Exception Conditions.