Intel® Architecture Instruction Set Extensions Programming Reference
Page Count: 1178
- Chapter 1 Future Intel® Architecture Instruction Extensions
- Chapter 2 Intel® AVX-512 Application Programming Model
- 2.1 Detection of AVX-512 Foundation Instructions
- 2.2 Detection of 512-bit Instruction Groups of Intel® AVX-512 Family
- 2.3 Detection of Intel AVX-512 Instruction Groups Operating at 256 and 128-bit Vector Lengths
- 2.4 Accessing XMM, YMM and ZMM Registers
- 2.5 Enhanced Vector Programming Environment Using EVEX Encoding
- 2.6 Memory Alignment
- 2.7 SIMD Floating-Point Exceptions
- 2.8 Instruction Exception Specification
- 2.9 CPUID Instruction
- Chapter 3 System Programming For Intel® AVX-512
- 3.1 AVX-512 State, EVEX Prefix and Supported Operating Modes
- 3.2 AVX-512 State Management
- 3.2.1 Detection of ZMM and Opmask State Support
- 3.2.2 Enabling of ZMM and Opmask Register State
- 3.2.3 Enabling of SIMD Floating-Point Exception Support
- 3.2.4 The Layout of XSAVE State Save Area
- 3.2.5 XSAVE/XRSTOR Interaction with YMM State and MXCSR
- 3.2.6 XSAVE/XRSTOR/XSAVEOPT and Managing ZMM and Opmask States
- 3.3 Reset Behavior
- 3.4 Emulation
- 3.5 Writing Floating-Point Exception Handlers
- Chapter 4 AVX-512 Instruction Encoding
- 4.1 Overview Section
- 4.2 Instruction Format and EVEX
- 4.3 Register Specifier Encoding and EVEX
- 4.4 Masking Support in EVEX
- 4.5 Compressed Displacement (disp8*N) Support in EVEX
- 4.6 EVEX Encoding of Broadcast/Rounding/SAE Support
- 4.7 #UD equations for EVEX
- 4.8 Device Not Available
- 4.9 Scalar Instructions
- 4.10 Exception Classifications of EVEX-Encoded Instructions
- 4.10.1 Exceptions Type E1 and E1NF of EVEX-Encoded Instructions
- 4.10.2 Exceptions Type E2 of EVEX-Encoded Instructions
- 4.10.3 Exceptions Type E3 and E3NF of EVEX-Encoded Instructions
- 4.10.4 Exceptions Type E4 and E4NF of EVEX-Encoded Instructions
- 4.10.5 Exceptions Type E5 and E5NF
- 4.10.6 Exceptions Type E6 and E6NF
- 4.10.7 Exceptions Type E7NM
- 4.10.8 Exceptions Type E9 and E9NF
- 4.10.9 Exceptions Type E10
- 4.10.10 Exception Type E11 (EVEX-only, mem arg no AC, floating-point exceptions)
- 4.10.11 Exception Type E12 and E12NP (VSIB mem arg, no AC, no floating-point exceptions)
- 4.11 Exception Classifications of Opmask Instructions
- Chapter 5 Instruction Set Reference, A-Z
- 5.1 Interpreting Instruction Reference Pages
- 5.2 Summary of Terms
- 5.3 Ternary Bit Vector Logic Table
- 5.4 Instruction Set Reference
- ADDPD—Add Packed Double-Precision Floating-Point Values
- ADDPS—Add Packed Single-Precision Floating-Point Values
- ADDSD—Add Scalar Double-Precision Floating-Point Values
- ADDSS—Add Scalar Single-Precision Floating-Point Values
- VALIGND/VALIGNQ—Align Doubleword/Quadword Vectors
- VBLENDMPD/VBLENDMPS—Blend Float64/Float32 Vectors Using an Opmask Control
- VPBLENDMB/VPBLENDMW—Blend Byte/Word Vectors Using an Opmask Control
- VPBLENDMD/VPBLENDMQ—Blend Int32/Int64 Vectors Using an Opmask Control
- ANDPD—Bitwise Logical AND of Packed Double Precision Floating-Point Values
- ANDPS—Bitwise Logical AND of Packed Single Precision Floating-Point Values
- ANDNPD—Bitwise Logical AND NOT of Packed Double Precision Floating-Point Values
- ANDNPS—Bitwise Logical AND NOT of Packed Single Precision Floating-Point Values
- VBROADCAST—Load with Broadcast Floating-Point Data
- VPBROADCASTB/W/D/Q—Load with Broadcast Integer Data from General Purpose Register
- VPBROADCAST—Load Integer and Broadcast
- CMPPD—Compare Packed Double-Precision Floating-Point Values
- CMPPS—Compare Packed Single-Precision Floating-Point Values
- CMPSD—Compare Scalar Double-Precision Floating-Point Value
- CMPSS—Compare Scalar Single-Precision Floating-Point Value
- COMISD—Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
- COMISS—Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS
- DIVPD—Divide Packed Double-Precision Floating-Point Values
- DIVPS—Divide Packed Single-Precision Floating-Point Values
- DIVSD—Divide Scalar Double-Precision Floating-Point Value
- DIVSS—Divide Scalar Single-Precision Floating-Point Values
- VCOMPRESSPD—Store Sparse Packed Double-Precision Floating-Point Values into Dense Memory
- VCOMPRESSPS—Store Sparse Packed Single-Precision Floating-Point Values into Dense Memory
- CVTDQ2PD—Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point Values
- CVTDQ2PS—Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point Values
- CVTPD2DQ—Convert Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
- CVTPD2PS—Convert Packed Double-Precision Floating-Point Values to Packed Single-Precision Floating-Point Values
- VCVTPD2QQ—Convert Packed Double-Precision Floating-Point Values to Packed Quadword Integers
- VCVTPD2UDQ—Convert Packed Double-Precision Floating-Point Values to Packed Unsigned Doubleword Integers
- VCVTPD2UQQ—Convert Packed Double-Precision Floating-Point Values to Packed Unsigned Quadword Integers
- VCVTPH2PS—Convert 16-bit FP values to Single-Precision FP values
- VCVTPS2PH—Convert Single-Precision FP value to 16-bit FP value
- CVTPS2DQ—Convert Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
- VCVTPS2UDQ—Convert Packed Single-Precision Floating-Point Values to Packed Unsigned Doubleword Integer Values
- VCVTPS2QQ—Convert Packed Single Precision Floating-Point Values to Packed Signed Quadword Integer Values
- VCVTPS2UQQ—Convert Packed Single Precision Floating-Point Values to Packed Unsigned Quadword Integer Values
- CVTPS2PD—Convert Packed Single-Precision Floating-Point Values to Packed Double-Precision Floating-Point Values
- VCVTQQ2PD—Convert Packed Quadword Integers to Packed Double-Precision Floating-Point Values
- VCVTQQ2PS—Convert Packed Quadword Integers to Packed Single-Precision Floating-Point Values
- CVTSD2SI—Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
- VCVTSD2USI—Convert Scalar Double-Precision Floating-Point Value to Unsigned Doubleword Integer
- CVTSD2SS—Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value
- CVTSI2SD—Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
- CVTSI2SS—Convert Doubleword Integer to Scalar Single-Precision Floating-Point Value
- CVTSS2SD—Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value
- CVTSS2SI—Convert Scalar Single-Precision Floating-Point Value to Doubleword Integer
- VCVTSS2USI—Convert Scalar Single-Precision Floating-Point Value to Unsigned Doubleword Integer
- CVTTPD2DQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Doubleword Integers
- VCVTTPD2QQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Quadword Integers
- VCVTTPD2UDQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Unsigned Doubleword Integers
- VCVTTPD2UQQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Unsigned Quadword Integers
- CVTTPS2DQ—Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values
- VCVTTPS2UDQ—Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Unsigned Doubleword Integer Values
- VCVTTPS2QQ—Convert with Truncation Packed Single Precision Floating-Point Values to Packed Signed Quadword Integer Values
- VCVTTPS2UQQ—Convert with Truncation Packed Single Precision Floating-Point Values to Packed Unsigned Quadword Integer Values
- CVTTSD2SI—Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Integer
- VCVTTSD2USI—Convert with Truncation Scalar Double-Precision Floating-Point Value to Unsigned Integer
- CVTTSS2SI—Convert with Truncation Scalar Single-Precision Floating-Point Value to Integer
- VCVTTSS2USI—Convert with Truncation Scalar Single-Precision Floating-Point Value to Unsigned Integer
- VCVTUDQ2PD—Convert Packed Unsigned Doubleword Integers to Packed Double-Precision Floating-Point Values
- VCVTUDQ2PS—Convert Packed Unsigned Doubleword Integers to Packed Single-Precision Floating-Point Values
- VCVTUQQ2PD—Convert Packed Unsigned Quadword Integers to Packed Double-Precision Floating-Point Values
- VCVTUQQ2PS—Convert Packed Unsigned Quadword Integers to Packed Single-Precision Floating-Point Values
- VCVTUSI2SD—Convert Unsigned Integer to Scalar Double-Precision Floating-Point Value
- VCVTUSI2SS—Convert Unsigned Integer to Scalar Single-Precision Floating-Point Value
- VDBPSADBW—Double Block Packed Sum-Absolute-Differences (SAD) on Unsigned Bytes
- VEXPANDPD—Load Sparse Packed Double-Precision Floating-Point Values from Dense Memory
- VEXPANDPS—Load Sparse Packed Single-Precision Floating-Point Values from Dense Memory
- VEXTRACTF128/VEXTRACTF32x4/VEXTRACTF64x2/VEXTRACTF32x8/VEXTRACTF64x4—Extract Packed Floating-Point Values
- VEXTRACTI128/VEXTRACTI32x4/VEXTRACTI64x2/VEXTRACTI32x8/VEXTRACTI64x4—Extract Packed Integer Values
- EXTRACTPS—Extract Packed Floating-Point Values
- VFIXUPIMMPD—Fix Up Special Packed Float64 Values
- VFIXUPIMMPS—Fix Up Special Packed Float32 Values
- VFIXUPIMMSD—Fix Up Special Scalar Float64 Value
- VFIXUPIMMSS—Fix Up Special Scalar Float32 Value
- VFMADD132PD/VFMADD213PD/VFMADD231PD—Fused Multiply-Add of Packed Double-Precision Floating-Point Values
- VFMADD132PS/VFMADD213PS/VFMADD231PS—Fused Multiply-Add of Packed Single-Precision Floating-Point Values
- VFMADD132SD/VFMADD213SD/VFMADD231SD—Fused Multiply-Add of Scalar Double-Precision Floating-Point Values
- VFMADD132SS/VFMADD213SS/VFMADD231SS—Fused Multiply-Add of Scalar Single-Precision Floating-Point Values
- VFMADDSUB132PD/VFMADDSUB213PD/VFMADDSUB231PD—Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values
- VFMADDSUB132PS/VFMADDSUB213PS/VFMADDSUB231PS—Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values
- VFMSUBADD132PD/VFMSUBADD213PD/VFMSUBADD231PD—Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values
- VFMSUBADD132PS/VFMSUBADD213PS/VFMSUBADD231PS—Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values
- VFMSUB132PD/VFMSUB213PD/VFMSUB231PD—Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values
- VFMSUB132PS/VFMSUB213PS/VFMSUB231PS—Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values
- VFMSUB132SD/VFMSUB213SD/VFMSUB231SD—Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values
- VFMSUB132SS/VFMSUB213SS/VFMSUB231SS—Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values
- VFNMADD132PD/VFNMADD213PD/VFNMADD231PD—Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values
- VFNMADD132PS/VFNMADD213PS/VFNMADD231PS—Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values
- VFNMADD132SD/VFNMADD213SD/VFNMADD231SD—Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values
- VFNMADD132SS/VFNMADD213SS/VFNMADD231SS—Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values
- VFNMSUB132PD/VFNMSUB213PD/VFNMSUB231PD—Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values
- VFNMSUB132PS/VFNMSUB213PS/VFNMSUB231PS—Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
- VFNMSUB132SD/VFNMSUB213SD/VFNMSUB231SD—Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
- VFNMSUB132SS/VFNMSUB213SS/VFNMSUB231SS—Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
- VFPCLASSPD—Tests Types of Packed Float64 Values
- VFPCLASSPS—Tests Types of Packed Float32 Values
- VFPCLASSSD—Tests Type of a Scalar Float64 Value
- VFPCLASSSS—Tests Type of a Scalar Float32 Value
- VPGATHERDD/VPGATHERDQ—Gather Packed Dword, Packed Qword with Signed Dword Indices
- VPGATHERQD/VPGATHERQQ—Gather Packed Dword, Packed Qword with Signed Qword Indices
- VGATHERDPS/VGATHERDPD—Gather Packed Single, Packed Double with Signed Dword Indices
- VGATHERQPS/VGATHERQPD—Gather Packed Single, Packed Double with Signed Qword Indices
- VGETEXPPD—Convert Exponents of Packed DP FP Values to DP FP Values
- VGETEXPPS—Convert Exponents of Packed SP FP Values to SP FP Values
- VGETEXPSD—Convert Exponents of Scalar DP FP Values to DP FP Value
- VGETEXPSS—Convert Exponents of Scalar SP FP Values to SP FP Value
- VGETMANTPD—Extract Float64 Vector of Normalized Mantissas from Float64 Vector
- VGETMANTPS—Extract Float32 Vector of Normalized Mantissas from Float32 Vector
- VGETMANTSD—Extract Float64 of Normalized Mantissas from Float64 Scalar
- VGETMANTSS—Extract Float32 of Normalized Mantissa from Float32 Scalar
- VINSERTF128/VINSERTF32x4/VINSERTF64x2/VINSERTF32x8/VINSERTF64x4—Insert Packed Floating-Point Values
- VINSERTI128/VINSERTI32x4/VINSERTI64x2/VINSERTI32x8/VINSERTI64x4—Insert Packed Integer Values
- INSERTPS—Insert Scalar Single-Precision Floating-Point Value
- MAXPD—Maximum of Packed Double-Precision Floating-Point Values
- MAXPS—Maximum of Packed Single-Precision Floating-Point Values
- MAXSD—Return Maximum Scalar Double-Precision Floating-Point Value
- MAXSS—Return Maximum Scalar Single-Precision Floating-Point Value
- MINPD—Minimum of Packed Double-Precision Floating-Point Values
- MINPS—Minimum of Packed Single-Precision Floating-Point Values
- MINSD—Return Minimum Scalar Double-Precision Floating-Point Value
- MINSS—Return Minimum Scalar Single-Precision Floating-Point Value
- MOVAPD—Move Aligned Packed Double-Precision Floating-Point Values
- MOVAPS—Move Aligned Packed Single-Precision Floating-Point Values
- MOVD/MOVQ—Move Doubleword and Quadword
- MOVQ—Move Quadword
- MOVDDUP—Replicate Double FP Values
- MOVDQA, VMOVDQA32/64—Move Aligned Packed Integer Values
- MOVDQU, VMOVDQU8/16/32/64—Move Unaligned Packed Integer Values
- MOVHLPS—Move Packed Single-Precision Floating-Point Values High to Low
- MOVHPD—Move High Packed Double-Precision Floating-Point Value
- MOVHPS—Move High Packed Single-Precision Floating-Point Values
- MOVLHPS—Move Packed Single-Precision Floating-Point Values Low to High
- MOVLPD—Move Low Packed Double-Precision Floating-Point Value
- MOVLPS—Move Low Packed Single-Precision Floating-Point Values
- MOVNTDQA—Load Double Quadword Non-Temporal Aligned Hint
- MOVNTDQ—Store Packed Integers Using Non-Temporal Hint
- MOVNTPD—Store Packed Double-Precision Floating-Point Values Using Non-Temporal Hint
- MOVNTPS—Store Packed Single-Precision Floating-Point Values Using Non-Temporal Hint
- MOVSD—Move or Merge Scalar Double-Precision Floating-Point Value
- MOVSHDUP—Replicate Single FP Values
- MOVSLDUP—Replicate Single FP Values
- MOVSS—Move or Merge Scalar Single-Precision Floating-Point Value
- MOVUPD—Move Unaligned Packed Double-Precision Floating-Point Values
- MOVUPS—Move Unaligned Packed Single-Precision Floating-Point Values
- PSADBW—Compute Sum of Absolute Differences
- MULPD—Multiply Packed Double-Precision Floating-Point Values
- MULPS—Multiply Packed Single-Precision Floating-Point Values
- MULSD—Multiply Scalar Double-Precision Floating-Point Value
- MULSS—Multiply Scalar Single-Precision Floating-Point Values
- ORPD—Bitwise Logical OR of Packed Double Precision Floating-Point Values
- ORPS—Bitwise Logical OR of Packed Single Precision Floating-Point Values
- PABSB/PABSW/PABSD/PABSQ—Packed Absolute Value
- PACKSSWB/PACKSSDW—Pack with Signed Saturation
- PACKUSDW—Pack with Unsigned Saturation
- PACKUSWB—Pack with Unsigned Saturation
- PADDB/PADDW/PADDD/PADDQ—Add Packed Integers
- PADDSB/PADDSW—Add Packed Signed Integers with Signed Saturation
- PADDUSB/PADDUSW—Add Packed Unsigned Integers with Unsigned Saturation
- PALIGNR—Byte Align
- PAND—Logical AND
- PANDN—Logical AND NOT
- PAVGB/PAVGW—Average Packed Integers
- VPBROADCASTM—Broadcast Mask to Vector Register
- PCMPEQB/PCMPEQW/PCMPEQD/PCMPEQQ—Compare Packed Integers for Equality
- PCMPGTB/PCMPGTW/PCMPGTD/PCMPGTQ—Compare Packed Integers for Greater Than
- VPCMPB/VPCMPUB—Compare Packed Byte Values Into Mask
- VPCMPD/VPCMPUD—Compare Packed Integer Values into Mask
- VPCMPQ/VPCMPUQ—Compare Packed Integer Values into Mask
- VPCMPW/VPCMPUW—Compare Packed Word Values Into Mask
- VPCOMPRESSD—Store Sparse Packed Doubleword Integer Values into Dense Memory/Register
- VPCOMPRESSQ—Store Sparse Packed Quadword Integer Values into Dense Memory/Register
- VPCONFLICTD/Q—Detect Conflicts Within a Vector of Packed Dword/Qword Values into Dense Memory/Register
- VPERMB—Permute Packed Bytes Elements
- VPERMD/VPERMW—Permute Packed Doublewords/Words Elements
- VPERMI2B—Full Permute of Bytes From Two Tables Overwriting the Index
- VPERMI2W/D/Q/PS/PD—Full Permute From Two Tables Overwriting the Index
- VPERMT2B—Full Permute of Bytes From Two Tables Overwriting a Table
- VPERMT2W/D/Q/PS/PD—Full Permute from Two Tables Overwriting one Table
- VPERMILPD—Permute In-Lane of Pairs of Double-Precision Floating-Point Values
- VPERMILPS—Permute In-Lane of Quadruples of Single-Precision Floating-Point Values
- VPERMPD—Permute Double-Precision Floating-Point Elements
- VPERMPS—Permute Single-Precision Floating-Point Elements
- VPERMQ—Qwords Element Permutation
- VPEXPANDD—Load Sparse Packed Doubleword Integer Values from Dense Memory / Register
- VPEXPANDQ—Load Sparse Packed Quadword Integer Values from Dense Memory / Register
- PEXTRB/PEXTRW/PEXTRD/PEXTRQ—Extract Integer
- VPLZCNTD/Q—Count the Number of Leading Zero Bits for Packed Dword, Packed Qword Values
- PMADDUBSW—Multiply and Add Packed Integers
- PMADDWD—Multiply and Add Packed Integers
- PINSRB/PINSRW/PINSRD/PINSRQ—Insert Integer
- VPMADD52LUQ—Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
- VPMADD52HUQ—Packed Multiply of Unsigned 52-bit Integers and Add High 52-bit Products to 64-bit Accumulators
- PMAXSB/PMAXSW/PMAXSD/PMAXSQ—Maximum of Packed Signed Integers
- PMAXUB/PMAXUW—Maximum of Packed Unsigned Integers
- PMAXUD/PMAXUQ—Maximum of Packed Unsigned Integers
- PMINSB/PMINSW—Minimum of Packed Signed Integers
- PMINSD/PMINSQ—Minimum of Packed Signed Integers
- PMINUB/PMINUW—Minimum of Packed Unsigned Integers
- PMINUD/PMINUQ—Minimum of Packed Unsigned Integers
- VPMOVM2B/VPMOVM2W/VPMOVM2D/VPMOVM2Q—Convert a Mask Register to a Vector Register
- VPMOVB2M/VPMOVW2M/VPMOVD2M/VPMOVQ2M—Convert a Vector Register to a Mask
- VPMOVQB/VPMOVSQB/VPMOVUSQB—Down Convert QWord to Byte
- VPMOVQW/VPMOVSQW/VPMOVUSQW—Down Convert QWord to Word
- VPMOVQD/VPMOVSQD/VPMOVUSQD—Down Convert QWord to DWord
- VPMOVDB/VPMOVSDB/VPMOVUSDB—Down Convert DWord to Byte
- VPMOVDW/VPMOVSDW/VPMOVUSDW—Down Convert DWord to Word
- VPMOVWB/VPMOVSWB/VPMOVUSWB—Down Convert Word to Byte
- PMOVSX—Packed Move with Sign Extend
- PMOVZX—Packed Move with Zero Extend
- PMULDQ—Multiply Packed Doubleword Integers
- PMULHRSW—Multiply Packed Signed Integers with Round and Scale
- PMULHUW—Multiply Packed Unsigned Integers and Store High Result
- PMULHW—Multiply Packed Integers and Store High Result
- PMULLD/PMULLQ—Multiply Packed Integers and Store Low Result
- PMULLW—Multiply Packed Integers and Store Low Result
- VPMULTISHIFTQB – Select Packed Unaligned Bytes from Quadword Sources
- PMULUDQ—Multiply Packed Unsigned Doubleword Integers
- POR—Bitwise Logical Or
- PROLD/PROLVD/PROLQ/PROLVQ—Bit Rotate Left
- PRORD/PRORVD/PRORQ/PRORVQ—Bit Rotate Right
- VPSCATTERDD/VPSCATTERDQ/VPSCATTERQD/VPSCATTERQQ—Scatter Packed Dword, Packed Qword with Signed Dword, Signed Qword Indices
- PSHUFB—Packed Shuffle Bytes
- PSHUFHW—Shuffle Packed High Words
- PSHUFLW—Shuffle Packed Low Words
- PSHUFD—Shuffle Packed Doublewords
- PSLLDQ—Byte Shift Left
- PSLLW/PSLLD/PSLLQ—Bit Shift Left
- PSRAW/PSRAD/PSRAQ—Bit Shift Arithmetic Right
- PSRLDQ—Byte Shift Right
- PSRLW/PSRLD/PSRLQ—Shift Packed Data Right Logical
- VPSLLVW/VPSLLVD/VPSLLVQ—Variable Bit Shift Left Logical
- VPSRLVW/VPSRLVD/VPSRLVQ—Variable Bit Shift Right Logical
- PSUBB/PSUBW/PSUBD/PSUBQ—Packed Integer Subtract
- PSUBSB/PSUBSW—Subtract Packed Signed Integers with Signed Saturation
- PSUBUSB/PSUBUSW—Subtract Packed Unsigned Integers with Unsigned Saturation
- VPTESTNMB/W/D/Q—Logical NAND and Set
- PUNPCKHBW/PUNPCKHWD/PUNPCKHDQ/PUNPCKHQDQ—Unpack High Data
- PUNPCKLBW/PUNPCKLWD/PUNPCKLDQ/PUNPCKLQDQ—Unpack Low Data
- SHUFF32x4/SHUFF64x2/SHUFI32x4/SHUFI64x2—Shuffle Packed Values at 128-bit Granularity
- SHUFPD—Packed Interleave Shuffle of Pairs of Double-Precision Floating-Point Values
- SHUFPS—Packed Interleave Shuffle of Quadruplets of Single-Precision Floating-Point Values
- SQRTPD—Square Root of Double-Precision Floating-Point Values
- SQRTPS—Square Root of Single-Precision Floating-Point Values
- SQRTSD—Compute Square Root of Scalar Double-Precision Floating-Point Value
- SQRTSS—Compute Square Root of Scalar Single-Precision Value
- VPTERNLOGD/VPTERNLOGQ—Bitwise Ternary Logic
- VPTESTMB/VPTESTMW/VPTESTMD/VPTESTMQ—Logical AND and Set Mask
- VPSRAVW/VPSRAVD/VPSRAVQ—Variable Bit Shift Right Arithmetic
- PXOR/PXORD/PXORQ—Exclusive Or
- VRANGEPD—Range Restriction Calculation For Packed Pairs of Float64 Values
- VRANGEPS—Range Restriction Calculation For Packed Pairs of Float32 Values
- VRANGESD—Range Restriction Calculation From a Pair of Scalar Float64 Values
- VRANGESS—Range Restriction Calculation From a Pair of Scalar Float32 Values
- VRCP14PD—Compute Approximate Reciprocals of Packed Float64 Values
- VRCP14SD—Compute Approximate Reciprocal of Scalar Float64 Value
- VRCP14PS—Compute Approximate Reciprocals of Packed Float32 Values
- VRCP14SS—Compute Approximate Reciprocal of Scalar Float32 Value
- VREDUCEPD—Perform Reduction Transformation on Packed Float64 Values
- VREDUCESD—Perform a Reduction Transformation on a Scalar Float64 Value
- VREDUCEPS—Perform Reduction Transformation on Packed Float32 Values
- VREDUCESS—Perform a Reduction Transformation on a Scalar Float32 Value
- VRNDSCALEPD—Round Packed Float64 Values To Include A Given Number Of Fraction Bits
- VRNDSCALESD—Round Scalar Float64 Value To Include A Given Number Of Fraction Bits
- VRNDSCALEPS—Round Packed Float32 Values To Include A Given Number Of Fraction Bits
- VRNDSCALESS—Round Scalar Float32 Value To Include A Given Number Of Fraction Bits
- VRSQRT14PD—Compute Approximate Reciprocals of Square Roots of Packed Float64 Values
- VRSQRT14SD—Compute Approximate Reciprocal of Square Root of Scalar Float64 Value
- VRSQRT14PS—Compute Approximate Reciprocals of Square Roots of Packed Float32 Values
- VRSQRT14SS—Compute Approximate Reciprocal of Square Root of Scalar Float32 Value
- VSCALEFPD—Scale Packed Float64 Values With Float64 Values
- VSCALEFSD—Scale Scalar Float64 Values With Float64 Values
- VSCALEFPS—Scale Packed Float32 Values With Float32 Values
- VSCALEFSS—Scale Scalar Float32 Value With Float32 Value
- VSCATTERDPS/VSCATTERDPD/VSCATTERQPS/VSCATTERQPD—Scatter Packed Single, Packed Double with Signed Dword and Qword Indices
- SUBPD—Subtract Packed Double-Precision Floating-Point Values
- SUBPS—Subtract Packed Single-Precision Floating-Point Values
- SUBSD—Subtract Scalar Double-Precision Floating-Point Value
- SUBSS—Subtract Scalar Single-Precision Floating-Point Value
- UCOMISD—Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS
- UCOMISS—Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS
- UNPCKHPD—Unpack and Interleave High Packed Double-Precision Floating-Point Values
- UNPCKHPS—Unpack and Interleave High Packed Single-Precision Floating-Point Values
- UNPCKLPD—Unpack and Interleave Low Packed Double-Precision Floating-Point Values
- UNPCKLPS—Unpack and Interleave Low Packed Single-Precision Floating-Point Values
- XORPD—Bitwise Logical XOR of Packed Double Precision Floating-Point Values
- XORPS—Bitwise Logical XOR of Packed Single Precision Floating-Point Values
- Chapter 6 Instruction Set Reference - Opmask
- 6.1 Mask Instructions
- KADDW/KADDB/KADDQ/KADDD—ADD Two Masks
- KANDW/KANDB/KANDQ/KANDD—Bitwise Logical AND Masks
- KANDNW/KANDNB/KANDNQ/KANDND—Bitwise Logical AND NOT Masks
- KMOVW/KMOVB/KMOVQ/KMOVD—Move from and to Mask Registers
- KUNPCKBW/KUNPCKWD/KUNPCKDQ—Unpack for Mask Registers
- KNOTW/KNOTB/KNOTQ/KNOTD—NOT Mask Register
- KORW/KORB/KORQ/KORD—Bitwise Logical OR Masks
- KORTESTW/KORTESTB/KORTESTQ/KORTESTD—OR Masks And Set Flags
- KSHIFTLW/KSHIFTLB/KSHIFTLQ/KSHIFTLD—Shift Left Mask Registers
- KSHIFTRW/KSHIFTRB/KSHIFTRQ/KSHIFTRD—Shift Right Mask Registers
- KXNORW/KXNORB/KXNORQ/KXNORD—Bitwise Logical XNOR Masks
- KTESTW/KTESTB/KTESTQ/KTESTD—Packed Bit Test Masks and Set Flags
- KXORW/KXORB/KXORQ/KXORD—Bitwise Logical XOR Masks
- Chapter 7 Additional 512-bit Instruction Extensions
- 7.1 Detection of 512-bit Instruction Extensions
- 7.2 Instruction Set Reference
- VEXP2PD—Approximation to the Exponential 2^x of Packed Double-Precision Floating-Point Values with Less Than 2^-23 Relative Error
- VEXP2PS—Approximation to the Exponential 2^x of Packed Single-Precision Floating-Point Values with Less Than 2^-23 Relative Error
- VRCP28PD—Approximation to the Reciprocal of Packed Double-Precision Floating-Point Values with Less Than 2^-28 Relative Error
- VRCP28SD—Approximation to the Reciprocal of Scalar Double-Precision Floating-Point Value with Less Than 2^-28 Relative Error
- VRCP28PS—Approximation to the Reciprocal of Packed Single-Precision Floating-Point Values with Less Than 2^-28 Relative Error
- VRCP28SS—Approximation to the Reciprocal of Scalar Single-Precision Floating-Point Value with Less Than 2^-28 Relative Error
- VRSQRT28PD—Approximation to the Reciprocal Square Root of Packed Double-Precision Floating-Point Values with Less Than 2^-28 Relative Error
- VRSQRT28SD—Approximation to the Reciprocal Square Root of Scalar Double-Precision Floating-Point Value with Less Than 2^-28 Relative Error
- VRSQRT28PS—Approximation to the Reciprocal Square Root of Packed Single-Precision Floating-Point Values with Less Than 2^-28 Relative Error
- VRSQRT28SS—Approximation to the Reciprocal Square Root of Scalar Single-Precision Floating-Point Value with Less Than 2^-28 Relative Error
- VGATHERPF0DPS/VGATHERPF0QPS/VGATHERPF0DPD/VGATHERPF0QPD—Sparse Prefetch Packed SP/DP Data Values with Signed Dword, Signed Qword Indices Using T0 Hint
- VGATHERPF1DPS/VGATHERPF1QPS/VGATHERPF1DPD/VGATHERPF1QPD—Sparse Prefetch Packed SP/DP Data Values with Signed Dword, Signed Qword Indices Using T1 Hint
- VSCATTERPF0DPS/VSCATTERPF0QPS/VSCATTERPF0DPD/VSCATTERPF0QPD—Sparse Prefetch Packed SP/DP Data Values with Signed Dword, Signed Qword Indices Using T0 Hint with Intent to Write
- VSCATTERPF1DPS/VSCATTERPF1QPS/VSCATTERPF1DPD/VSCATTERPF1QPD—Sparse Prefetch Packed SP/DP Data Values with Signed Dword, Signed Qword Indices Using T1 Hint with Intent to Write
- Chapter 8 Intel® SHA Extensions
- 8.1 Overview
- 8.2 Detection of Intel SHA Extensions
- 8.3 SHA Extensions Reference
- SHA1RNDS4—Perform Four Rounds of SHA1 Operation
- SHA1NEXTE—Calculate SHA1 State Variable E after Four Rounds
- SHA1MSG1—Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords
- SHA1MSG2—Perform a Final Calculation for the Next Four SHA1 Message Dwords
- SHA256RNDS2—Perform Two Rounds of SHA256 Operation
- SHA256MSG1—Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords
- SHA256MSG2—Perform a Final Calculation for the Next Four SHA256 Message Dwords
- Chapter 9 Additional New Instructions
- Chapter 10 Memory Instructions