Opcode | Encoding | 16 bit | 32 bit | 64 bit | `CPUID` Feature Flag(s) | Description |
---|---|---|---|---|---|---|
`66 0F 3A 41 /r` `DPPD` | `rmi` | Invalid | Valid | Valid | `sse4.1` | Compute the dot product of packed double-precision floating-point values in xmm1 and xmm2/m128. Use bits `0..1` and `4..5` of imm8 to control the operation. Store the result in xmm1. |
`VEX.128.66.0F3A.WIG 41 /r` `VDPPD` | `rvmi` | Invalid | Valid | Valid | `avx` | Compute the dot product of packed double-precision floating-point values in xmm2 and xmm3/m128. Use bits `0..1` and `4..5` of imm8 to control the operation. Store the result in xmm1. |

## Encoding

Encoding | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
`rmi` | `ModRM.reg[rw]` | `ModRM.r/m[r]` | `imm8` | |
`rvmi` | `ModRM.reg[w]` | `VEX.vvvv[r]` | `ModRM.r/m[r]` | `imm8` |
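As an illustration of the `rmi` row, the following is a minimal sketch of how the legacy SSE register-to-register form could be assembled by hand. The function name `encode_dppd_rr` is hypothetical, and the sketch assumes both operands are `xmm0` through `xmm7` so that no `REX` prefix is needed; memory operands and the VEX form are not covered.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of any real assembler API).
 * Emits `DPPD xmm<dst>, xmm<src>, imm8` in its legacy SSE,
 * register-to-register form. Returns the number of bytes written (6),
 * or 0 if a register number is outside xmm0..xmm7.
 */
size_t encode_dppd_rr(uint8_t out[6], unsigned dst, unsigned src, uint8_t imm8)
{
    if (dst > 7 || src > 7) {
        return 0; /* xmm8 and above would need a REX prefix; not handled here */
    }
    out[0] = 0x66; /* mandatory prefix */
    out[1] = 0x0F; /* opcode bytes 0F 3A 41 */
    out[2] = 0x3A;
    out[3] = 0x41;
    /* ModRM: mod = 11 (register direct), reg = destination, r/m = source */
    out[4] = (uint8_t)(0xC0 | (dst << 3) | src);
    out[5] = imm8; /* control byte */
    return 6;
}
```

Under these assumptions, `encode_dppd_rr(buf, 0, 1, 0x31)` fills `buf` with `66 0F 3A 41 C1 31`, which corresponds to `DPPD xmm0, xmm1, 0x31`.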

## Description

The `(V)DPPD` instruction conditionally computes the dot product of packed double-precision floating-point values from the two source operands. Bits `0..1` and `4..5` of the immediate control the operation. The result is stored in the destination operand.

Beginning with a sum of 0, the immediate's bits are interpreted as per this table:

Bit | Meaning if Set | Meaning if Clear |
---|---|---|
`0` | Store the computed dot product in `dest(0..63)` | Store `0.0` in `dest(0..63)` |
`1` | Store the computed dot product in `dest(64..127)` | Store `0.0` in `dest(64..127)` |
`2..3` | Reserved | |
`4` | Add `src1(0..63) × src2(0..63)` to the sum | Add `0.0` to the sum |
`5` | Add `src1(64..127) × src2(64..127)` to the sum | Add `0.0` to the sum |
`6..7` | Reserved | |

All forms except the legacy SSE one zero the upper (untouched) bits of the destination register; the legacy SSE form leaves them unmodified.
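To make the bit table concrete, here is a minimal scalar sketch in C that models only the low 128 bits of the data path. The type `vec2d` and the functions `dppd_model` and `dppd_model_examples` are hypothetical names introduced for illustration; the model ignores SIMD exceptions, `MXCSR` state, and the upper-bit behaviour of the destination.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register holding two doubles. */
typedef struct { double lane[2]; } vec2d;

/* Scalar model of the imm8 control bits; not a real intrinsic. */
vec2d dppd_model(vec2d src1, vec2d src2, uint8_t imm8)
{
    /* Bits 4 and 5 choose which products contribute to the sum. */
    double partial0 = (imm8 & 0x10) ? src1.lane[0] * src2.lane[0] : 0.0;
    double partial1 = (imm8 & 0x20) ? src1.lane[1] * src2.lane[1] : 0.0;
    double sum = partial0 + partial1;

    /* Bits 0 and 1 choose which destination lanes receive the sum. */
    vec2d dest;
    dest.lane[0] = (imm8 & 0x01) ? sum : 0.0;
    dest.lane[1] = (imm8 & 0x02) ? sum : 0.0;
    return dest;
}

/* Worked examples with src1 = (1, 2) and src2 = (3, 4). */
void dppd_model_examples(void)
{
    vec2d a = {{1.0, 2.0}};
    vec2d b = {{3.0, 4.0}};

    /* 0x31: both products (3 + 8), stored in the low lane only. */
    vec2d r = dppd_model(a, b, 0x31);
    assert(r.lane[0] == 11.0 && r.lane[1] == 0.0);

    /* 0x13: low product only (3), stored in both lanes. */
    r = dppd_model(a, b, 0x13);
    assert(r.lane[0] == 3.0 && r.lane[1] == 3.0);

    /* 0x22: high product only (8), stored in the high lane only. */
    r = dppd_model(a, b, 0x22);
    assert(r.lane[0] == 0.0 && r.lane[1] == 8.0);
}
```

Calling `dppd_model_examples()` runs the three worked cases. An immediate such as `0x30` selects both products but no store bits, so both destination lanes receive `0.0`.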

## Operation

```
public void DPPD(SimdF64 dest, SimdF64 src, byte imm8)
{
    // see note 1
    F64 partial0 = imm8.Bit[4] ? dest[0] * src[0] : 0.0;
    F64 partial1 = imm8.Bit[5] ? dest[1] * src[1] : 0.0;
    F64 sum = partial0 + partial1;
    dest[0] = imm8.Bit[0] ? sum : 0.0;
    dest[1] = imm8.Bit[1] ? sum : 0.0;
    // dest[2..] is unmodified
}

public void VDPPD_Vex128(SimdF64 dest, SimdF64 src1, SimdF64 src2, byte imm8)
{
    // see note 1
    F64 partial0 = imm8.Bit[4] ? src1[0] * src2[0] : 0.0;
    F64 partial1 = imm8.Bit[5] ? src1[1] * src2[1] : 0.0;
    F64 sum = partial0 + partial1;
    dest[0] = imm8.Bit[0] ? sum : 0.0;
    dest[1] = imm8.Bit[1] ? sum : 0.0;
    dest[2..] = 0;
}
```

1. The SIMD exception flags are updated after each multiplication (if it occurs) and after the addition. If an unmasked exception is reported during the multiplications, it will be raised before the sum. If the sum reports an unmasked exception, it will be raised before the destination is updated. Any unmasked exceptions will leave the destination unmodified.

## Intrinsics

`__m128d _mm_dp_pd(__m128d a, __m128d b, const int mask)`
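A short sketch of how the intrinsic might be used to compute a two-element dot product. The wrapper name `dot2` is hypothetical. The sketch assumes a compiler that defines `__SSE4_1__` when SSE4.1 code generation is enabled (as GCC and Clang do with `-msse4.1`) and falls back to plain scalar code otherwise. The immediate must be a compile-time constant.

```c
#if defined(__SSE4_1__)
#include <smmintrin.h> /* SSE4.1 intrinsics, including _mm_dp_pd */
#endif

/* Hypothetical helper: dot product of two 2-element double arrays. */
double dot2(const double a[2], const double b[2])
{
#if defined(__SSE4_1__)
    __m128d va = _mm_loadu_pd(a); /* unaligned load of a[0], a[1] */
    __m128d vb = _mm_loadu_pd(b); /* unaligned load of b[0], b[1] */
    /* 0x31: bits 4 and 5 select both products,
       bit 0 stores the sum in the low element. */
    __m128d r = _mm_dp_pd(va, vb, 0x31);
    return _mm_cvtsd_f64(r); /* extract the low element */
#else
    /* Portable fallback when SSE4.1 is not enabled at compile time. */
    return a[0] * b[0] + a[1] * b[1];
#endif
}
```

With `a = {1.0, 2.0}` and `b = {3.0, 4.0}`, `dot2(a, b)` returns `11.0` on either path.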

## Exceptions

### SIMD Floating-Point

`#XM`

- `#D`: Denormal operand.
- `#I`: Invalid operation.
- `#O`: Numeric overflow.
- `#P`: Inexact result.
- `#U`: Numeric underflow.

### Other Exceptions

VEX Encoded Form: See Type 2 Exception Conditions.

`#UD`

- If `VEX.L` is not `0`.