Page Count: 1047

AMD64 Technology
AMD64 Architecture
Programmer’s Manual
Volume 4:
128-Bit and 256-Bit
Media Instructions

Publication No.: 26568
Revision: 3.22
Date: May 2018

Advanced Micro Devices

© 2013–2018 Advanced Micro Devices, Inc. All rights reserved.

The information contained herein is for informational purposes only, and is subject to change without notice.
While every precaution has been taken in the preparation of this document, it may contain technical
inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise
correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to
the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including
the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the
operation or use of AMD hardware, software or other products described herein. No license, including implied
or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations
applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties
or in AMD's Standard Terms and Conditions of Sale.

Trademarks
AMD, the AMD Arrow logo, and combinations thereof, and 3DNow! are trademarks of Advanced
Micro Devices, Inc. Other product names used in this publication are for identification purposes only
and may be trademarks of their respective companies.
MMX is a trademark and Pentium is a registered trademark of Intel Corporation.

26568—Rev. 3.22—May 2018

Contents
Revision History . . . xxiii
Preface . . . xxvii
About This Book . . . xxvii
Audience . . . xxvii
Organization . . . xxvii
Conventions and Definitions . . . xxviii
Related Documents . . . xl

1 Introduction . . . 1
1.1 Syntax and Notation . . . 2
1.2 Extended Instruction Encoding . . . 3
1.2.1 Immediate Byte Usage Unique to the SSE instructions . . . 4
1.2.2 Instruction Format Examples . . . 4
1.3 VSIB Addressing . . . 6
1.3.1 Effective Address Array Computation . . . 7
1.3.2 Notational Conventions Related to VSIB Addressing Mode . . . 8
1.3.3 Memory Ordering and Exception Behavior . . . 9
1.4 Enabling SSE Instruction Execution . . . 10
1.5 String Compare Instructions . . . 10
1.5.1 Source Data Format . . . 13
1.5.2 Comparison Type . . . 14
1.5.3 Comparison Summary Bit Vector . . . 16
1.5.4 Intermediate Result Post-processing . . . 18
1.5.5 Output Option Selection . . . 18
1.5.6 Effect on Flags . . . 19

2 Instruction Reference . . . 21
ADDPD / VADDPD . . . 23
ADDPS / VADDPS . . . 25
ADDSD / VADDSD . . . 27
ADDSS / VADDSS . . . 29
ADDSUBPD / VADDSUBPD . . . 31
ADDSUBPS / VADDSUBPS . . . 33
AESDEC / VAESDEC . . . 35
AESDECLAST / VAESDECLAST . . . 37
AESENC / VAESENC . . . 39
AESENCLAST / VAESENCLAST . . . 41
AESIMC / VAESIMC . . . 43
AESKEYGENASSIST / VAESKEYGENASSIST . . . 45
ANDNPD / VANDNPD . . . 47
ANDNPS / VANDNPS . . . 49
ANDPD / VANDPD . . . 51
ANDPS / VANDPS . . . 53
BLENDPD / VBLENDPD . . . 55
BLENDPS / VBLENDPS . . . 57
BLENDVPD / VBLENDVPD . . . 59
BLENDVPS / VBLENDVPS . . . 61
CMPPD / VCMPPD . . . 63
CMPPS / VCMPPS . . . 67
CMPSD / VCMPSD . . . 71
CMPSS / VCMPSS . . . 75
COMISD / VCOMISD . . . 79
COMISS / VCOMISS . . . 82
CVTDQ2PD / VCVTDQ2PD . . . 84
CVTDQ2PS / VCVTDQ2PS . . . 86
CVTPD2DQ / VCVTPD2DQ . . . 88
CVTPD2PS / VCVTPD2PS . . . 90
CVTPS2DQ / VCVTPS2DQ . . . 92
CVTPS2PD / VCVTPS2PD . . . 94
CVTSD2SI / VCVTSD2SI . . . 96
CVTSD2SS / VCVTSD2SS . . . 99
CVTSI2SD / VCVTSI2SD . . . 101
CVTSI2SS / VCVTSI2SS . . . 104
CVTSS2SD / VCVTSS2SD . . . 107
CVTSS2SI / VCVTSS2SI . . . 109
CVTTPD2DQ / VCVTTPD2DQ . . . 112
CVTTPS2DQ / VCVTTPS2DQ . . . 115
CVTTSD2SI / VCVTTSD2SI . . . 117
CVTTSS2SI / VCVTTSS2SI . . . 120
DIVPD / VDIVPD . . . 123
DIVPS / VDIVPS . . . 125
DIVSD / VDIVSD . . . 127
DIVSS / VDIVSS . . . 129
DPPD / VDPPD . . . 131
DPPS / VDPPS . . . 134
EXTRACTPS / VEXTRACTPS . . . 137
EXTRQ . . . 139
HADDPD / VHADDPD . . . 141
HADDPS / VHADDPS . . . 143
HSUBPD / VHSUBPD . . . 146
HSUBPS / VHSUBPS . . . 149
INSERTPS / VINSERTPS . . . 152
INSERTQ . . . 154
LDDQU / VLDDQU . . . 156
LDMXCSR / VLDMXCSR . . . 158
MASKMOVDQU / VMASKMOVDQU . . . 160
MAXPD / VMAXPD . . . 162
MAXPS / VMAXPS . . . 165
MAXSD / VMAXSD . . . 168
MAXSS / VMAXSS . . . 170
MINPD / VMINPD . . . 172
MINPS / VMINPS . . . 175
MINSD / VMINSD . . . 178
MINSS / VMINSS . . . 180
MOVAPD / VMOVAPD . . . 182
MOVAPS / VMOVAPS . . . 184
MOVD / VMOVD . . . 186
MOVDDUP / VMOVDDUP . . . 188
MOVDQA / VMOVDQA . . . 190
MOVDQU / VMOVDQU . . . 192
MOVHLPS / VMOVHLPS . . . 194
MOVHPD / VMOVHPD . . . 196
MOVHPS / VMOVHPS . . . 198
MOVLHPS / VMOVLHPS . . . 200
MOVLPD / VMOVLPD . . . 202
MOVLPS / VMOVLPS . . . 204
MOVMSKPD / VMOVMSKPD . . . 206
MOVMSKPS / VMOVMSKPS . . . 208
MOVNTDQ / VMOVNTDQ . . . 210
MOVNTDQA / VMOVNTDQA . . . 212
MOVNTPD / VMOVNTPD . . . 214
MOVNTPS / VMOVNTPS . . . 216
MOVNTSD . . . 218
MOVNTSS . . . 220
MOVQ / VMOVQ . . . 222
MOVSD / VMOVSD . . . 224
MOVSHDUP / VMOVSHDUP . . . 226
MOVSLDUP / VMOVSLDUP . . . 228
MOVSS / VMOVSS . . . 230
MOVUPD / VMOVUPD . . . 232
MOVUPS / VMOVUPS . . . 234
MPSADBW / VMPSADBW . . . 236
MULPD / VMULPD . . . 241
MULPS / VMULPS . . . 243
MULSD / VMULSD . . . 245
MULSS / VMULSS . . . 247
ORPD / VORPD . . . 249
ORPS / VORPS . . . 251
PABSB / VPABSB . . . 253
PABSD / VPABSD . . . 255
PABSW / VPABSW . . . 257
PACKSSDW / VPACKSSDW . . . 259
PACKSSWB / VPACKSSWB . . . 261
PACKUSDW / VPACKUSDW . . . 263
PACKUSWB / VPACKUSWB . . . 265
PADDB / VPADDB . . . 267
PADDD / VPADDD . . . 269
PADDQ / VPADDQ . . . 271
PADDSB / VPADDSB . . . 273
PADDSW / VPADDSW . . . 275
PADDUSB / VPADDUSB . . . 277
PADDUSW / VPADDUSW . . . 279
PADDW / VPADDW . . . 281
PALIGNR / VPALIGNR . . . 283
PAND / VPAND . . . 285
PANDN / VPANDN . . . 287
PAVGB / VPAVGB . . . 289
PAVGW / VPAVGW . . . 291
PBLENDVB / VPBLENDVB . . . 293
PBLENDW / VPBLENDW . . . 295
PCLMULQDQ / VPCLMULQDQ . . . 297
PCMPEQB / VPCMPEQB . . . 299
PCMPEQD / VPCMPEQD . . . 301
PCMPEQQ / VPCMPEQQ . . . 303
PCMPEQW / VPCMPEQW . . . 305
PCMPESTRI / VPCMPESTRI . . . 307
PCMPESTRM / VPCMPESTRM . . . 310
PCMPGTB / VPCMPGTB . . . 313
PCMPGTD / VPCMPGTD . . . 315
PCMPGTQ / VPCMPGTQ . . . 317
PCMPGTW / VPCMPGTW . . . 319
PCMPISTRI / VPCMPISTRI . . . 321
PCMPISTRM / VPCMPISTRM . . . 324
PEXTRB / VPEXTRB . . . 327
PEXTRD / VPEXTRD . . . 329
PEXTRQ / VPEXTRQ . . . 331
PEXTRW / VPEXTRW . . . 333
PHADDD / VPHADDD . . . 335
PHADDSW / VPHADDSW . . . 337
PHADDW / VPHADDW . . . 340
PHMINPOSUW / VPHMINPOSUW . . . 343
PHSUBD / VPHSUBD . . . 345
PHSUBSW / VPHSUBSW . . . 347
PHSUBW / VPHSUBW . . . 350
PINSRB / VPINSRB . . . 353
PINSRD / VPINSRD . . . 356
PINSRQ / VPINSRQ . . . 358
PINSRW / VPINSRW . . . 360
PMADDUBSW / VPMADDUBSW . . . 362
PMADDWD / VPMADDWD . . . 365
PMAXSB / VPMAXSB . . . 367
PMAXSD / VPMAXSD . . . 369
PMAXSW / VPMAXSW . . . 371
PMAXUB / VPMAXUB . . . 373
PMAXUD / VPMAXUD . . . 375
PMAXUW / VPMAXUW . . . 377
PMINSB / VPMINSB . . . 379
PMINSD / VPMINSD . . . 381
PMINSW / VPMINSW . . . 383
PMINUB / VPMINUB . . . 385
PMINUD / VPMINUD . . . 387
PMINUW / VPMINUW . . . 389
PMOVMSKB / VPMOVMSKB . . . 391
PMOVSXBD / VPMOVSXBD . . . 393
PMOVSXBQ / VPMOVSXBQ . . . 395
PMOVSXBW / VPMOVSXBW . . . 397
PMOVSXDQ / VPMOVSXDQ . . . 399
PMOVSXWD / VPMOVSXWD . . . 401
PMOVSXWQ / VPMOVSXWQ . . . 403
PMOVZXBD / VPMOVZXBD . . . 405
PMOVZXBQ / VPMOVZXBQ . . . 407
PMOVZXBW / VPMOVZXBW . . . 409
PMOVZXDQ / VPMOVZXDQ . . . 411
PMOVZXWD / VPMOVZXWD . . . 413
PMOVZXWQ / VPMOVZXWQ . . . 415
PMULDQ / VPMULDQ . . . 417
PMULHRSW / VPMULHRSW . . . 419
PMULHUW / VPMULHUW . . . 421
PMULHW / VPMULHW . . . 423
PMULLD / VPMULLD . . . 425
PMULLW / VPMULLW . . . 427
PMULUDQ / VPMULUDQ . . . 429
POR / VPOR . . . 431
PSADBW / VPSADBW . . . 433
PSHUFB / VPSHUFB . . . 435
PSHUFD / VPSHUFD . . . 437
PSHUFHW / VPSHUFHW . . . 440
PSHUFLW / VPSHUFLW . . . 443
PSIGNB / VPSIGNB . . . 446
PSIGND / VPSIGND . . . 448
PSIGNW / VPSIGNW . . . 450
PSLLD / VPSLLD . . . 452
PSLLDQ / VPSLLDQ . . . 455
PSLLQ / VPSLLQ . . . 457
PSLLW
VPSLLW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
PSRAD
VPSRAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
PSRAW
VPSRAW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
PSRLD
VPSRLD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
PSRLDQ
VPSRLDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
PSRLQ
VPSRLQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
PSRLW
VPSRLW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
PSUBB
VPSUBB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
PSUBD
VPSUBD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
PSUBQ
VPSUBQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
PSUBSB
VPSUBSB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
PSUBSW
VPSUBSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
PSUBUSB
VPSUBUSB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
PSUBUSW
VPSUBUSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
PSUBW
VPSUBW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
PTEST
VPTEST. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
PUNPCKHBW
VPUNPCKHBW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
PUNPCKHDQ
VPUNPCKHDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
PUNPCKHQDQ
VPUNPCKHQDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
PUNPCKHWD
VPUNPCKHWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507
PUNPCKLBW
VPUNPCKLBW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
PUNPCKLDQ
VPUNPCKLDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
PUNPCKLQDQ
VPUNPCKLQDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
PUNPCKLWD
VPUNPCKLWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
PXOR
VPXOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
RCPPS
VRCPPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
RCPSS
VRCPSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
ROUNDPD
VROUNDPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
ROUNDPS
VROUNDPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
ROUNDSD
VROUNDSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
ROUNDSS
VROUNDSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
RSQRTPS
VRSQRTPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
RSQRTSS
VRSQRTSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
SHA1RNDS4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
SHA1NEXTE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
SHA1MSG1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
SHA1MSG2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
SHA256RNDS2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
SHA256MSG1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
SHA256MSG2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
SHUFPD
VSHUFPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
SHUFPS
VSHUFPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
SQRTPD
VSQRTPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
SQRTPS
VSQRTPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
SQRTSD
VSQRTSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
SQRTSS
VSQRTSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
STMXCSR
VSTMXCSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
SUBPD
VSUBPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
SUBPS
VSUBPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
SUBSD
VSUBSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
SUBSS
VSUBSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
UCOMISD
VUCOMISD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582
UCOMISS
VUCOMISS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584
UNPCKHPD
VUNPCKHPD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
UNPCKHPS
VUNPCKHPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
UNPCKLPD
VUNPCKLPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
UNPCKLPS
VUNPCKLPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
VBROADCASTF128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
VBROADCASTI128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596
VBROADCASTSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
VBROADCASTSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
VCVTPH2PS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602
VCVTPS2PH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
VEXTRACTF128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
VEXTRACTI128. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
VFMADDPD
VFMADD132PD
VFMADD213PD
VFMADD231PD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
VFMADDPS
VFMADD132PS
VFMADD213PS
VFMADD231PS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
VFMADDSD
VFMADD132SD
VFMADD213SD
VFMADD231SD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
VFMADDSS
VFMADD132SS
VFMADD213SS
VFMADD231SS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
VFMADDSUBPD
VFMADDSUB132PD
VFMADDSUB213PD
VFMADDSUB231PD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
VFMADDSUBPS
VFMADDSUB132PS
VFMADDSUB213PS
VFMADDSUB231PS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628
VFMSUBADDPD
VFMSUBADD132PD
VFMSUBADD213PD
VFMSUBADD231PD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
VFMSUBADDPS
VFMSUBADD132PS
VFMSUBADD213PS
VFMSUBADD231PS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634
VFMSUBPD
VFMSUB132PD
VFMSUB213PD
VFMSUB231PD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637
VFMSUBPS
VFMSUB132PS
VFMSUB213PS
VFMSUB231PS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
VFMSUBSD
VFMSUB132SD
VFMSUB213SD
VFMSUB231SD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
VFMSUBSS
VFMSUB132SS
VFMSUB213SS
VFMSUB231SS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
VFNMADDPD
VFNMADD132PD
VFNMADD213PD
VFNMADD231PD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
VFNMADDPS
VFNMADD132PS
VFNMADD213PS
VFNMADD231PS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
VFNMADDSD
VFNMADD132SD
VFNMADD213SD
VFNMADD231SD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
VFNMADDSS
VFNMADD132SS
VFNMADD213SS
VFNMADD231SS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658
VFNMSUBPD
VFNMSUB132PD
VFNMSUB213PD
VFNMSUB231PD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661
VFNMSUBPS
VFNMSUB132PS
VFNMSUB213PS
VFNMSUB231PS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664
VFNMSUBSD
VFNMSUB132SD
VFNMSUB213SD
VFNMSUB231SD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
VFNMSUBSS
VFNMSUB132SS
VFNMSUB213SS
VFNMSUB231SS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
VFRCZPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673
VFRCZPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
VFRCZSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677
VFRCZSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679
VGATHERDPD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
VGATHERDPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
VGATHERQPD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685
VGATHERQPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
VINSERTF128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689
VINSERTI128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
VMASKMOVPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693
VMASKMOVPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
VPBLENDD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
VPBROADCASTB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
VPBROADCASTD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
VPBROADCASTQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
VPBROADCASTW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
VPCMOV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 707
VPCOMB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 709
VPCOMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
VPCOMQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
VPCOMUB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
VPCOMUD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717
VPCOMUQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
VPCOMUW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
VPCOMW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
VPERM2F128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725
VPERM2I128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727
VPERMD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
VPERMIL2PD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
VPERMIL2PS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
VPERMILPD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
VPERMILPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 742
VPERMPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746
VPERMPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 748
VPERMQ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 750
VPGATHERDD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 752
VPGATHERDQ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754
VPGATHERQD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 756
VPGATHERQQ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758
VPHADDBD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760
VPHADDBQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 762
VPHADDBW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 764
VPHADDDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766
VPHADDUBD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768
VPHADDUBQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 770
VPHADDUBW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772
VPHADDUDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774
VPHADDUWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776
VPHADDUWQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 778
VPHADDWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780
VPHADDWQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782
VPHSUBBW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784
VPHSUBDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 786
VPHSUBWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 788
VPMACSDD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 790
VPMACSDQH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 792
VPMACSDQL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 794
VPMACSSDD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796
VPMACSSDQH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
VPMACSSDQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 800
VPMACSSWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 802
VPMACSSWW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 804
VPMACSWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 806
VPMACSWW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808
VPMADCSSWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 810
VPMADCSWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 812
VPMASKMOVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814
VPMASKMOVQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816
VPPERM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 818
VPROTB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 820
VPROTD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 822
VPROTQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 824
VPROTW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826
VPSHAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828
VPSHAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830
VPSHAQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 832
VPSHAW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834
VPSHLB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 836
VPSHLD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838
VPSHLQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 840
VPSHLW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 842
VPSLLVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 844
VPSLLVQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 846
VPSRAVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848
VPSRLVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 850
VPSRLVQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 852
VTESTPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 854
VTESTPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856
VZEROALL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 858
VZEROUPPER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859
XGETBV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 860
XORPD
VXORPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861
XORPS
VXORPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 863
XRSTOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 865
XRSTORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867
XSAVE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869
XSAVEC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871
XSAVEOPT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 873
XSAVES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 875
XSETBV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 877

3 Exception Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 879

Appendix A AES Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 973
A.1 AES Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 973
A.2 Coding Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 973
A.3 AES Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974
A.4 Algebraic Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974
A.4.1 Multiplication in the Field GF(2⁸) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 975
A.4.2 Multiplication of 4x4 Matrices Over GF(2⁸) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 976
A.5 AES Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 976
A.5.1 Sequence of Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 978
A.6 Initializing the SBox and InvSBox Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 979
A.6.1 Computation of SBox and InvSBox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 980
A.6.2 Initialization of InvSBox[ ] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 982
A.7 Encryption and Decryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 984
A.7.1 The Encrypt( ) and Decrypt( ) Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 984
A.7.2 Round Sequences and Key Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 985
A.8 The Cipher Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 986
A.8.1 Text to Matrix Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 987
A.8.2 Cipher Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 987
A.8.3 Matrix to Text Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 989
A.9 The InvCipher Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 989
A.9.1 Text to Matrix Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 990
A.9.2 InvCipher Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 990
A.9.3 Matrix to Text Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 992
A.10 An Alternative Decryption Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 992
A.11 Computation of GFInv with Euclidean Greatest Common Divisor . . . . . . . . . . . . . . . . . . . 994

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 997


Figures
Figure 1-1. Typical Descriptive Synopsis - Extended SSE Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Figure 1-2. VSIB Byte Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Figure 1-3. Byte-wide Character String – Memory and Register Image. . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Figure 2-1. Typical Instruction Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Figure 2-2. (V)MPSADBW Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
Figure A-1. GFMatrix Representation of 16-byte Block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974
Figure A-2. GFMatrix to Operand Byte Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974


Tables
Table 1-1. Three-Operand Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Table 1-2. Four-Operand Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Table 1-3. Source Data Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Table 1-4. Comparison Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Table 1-5. Post-processing Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Table 1-6. Indexed Output Option Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Table 1-7. Masked Output Option Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Table 1-8. State of Affected Flags After Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Table 3-1. Instructions By Exception Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 879
Table A-1. SBox Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 982
Table A-2. InvSBox Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 984
Table A-3. Cipher Key, Round Sequence, and Round Key Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 985


Revision History

Date            Revision  Description
May 2018        3.22      Updated the packed string compare algorithm.
                          Fixed a number of erroneous references to double precision that
                          should be single precision.
                          Separated MOVQ from MOVD.
December 2017   3.21      Clarifications to XGETBV, XRSTOR, XRSTORS, XSAVE, XSAVEC,
                          XSAVEOPT, XSAVES, and XSETBV instructions.
March 2017      3.20      Corrections to ROUNDPD, VROUNDPD, ROUNDPS, VROUNDPS, ROUNDSD,
                          VROUNDSD, ROUNDSS, VROUNDSS, VPERMD, VPERMPD, VPERMPS, VPERMQ,
                          VTESTPD, VTESTPS, XGETBV, XSETBV, XSAVE, and AVX instruction
                          descriptions.
                          Added SHA1RNDS4, SHA1NEXTE, SHA1MSG1, SHA1MSG2, SHA256RNDS2,
                          SHA256MSG1, SHA256MSG2, XRSTOR, XRSTORS, and XSAVEC instructions.
June 2015       3.19      Corrections to the MOVLPD, PHSUBW, and PHSUBSW instruction
                          descriptions.
October 2013    3.18      Added AVX2 instructions.
                          Added an "Instruction Support" subsection to each instruction
                          reference page that lists CPUID feature bit information in a table.
May 2013        3.17      Removed all references to the CPUID specification, which has been
                          superseded by Volume 3, Appendix E, "Obtaining Processor
                          Information Via the CPUID Instruction."
                          Corrected the exceptions table for the explicitly-aligned
                          load/store instructions. The general-protection exception does
                          not depend on the state of the MXCSR.MM bit.
September 2012  3.16      Corrected REX.W bit encoding for the MOVD instruction (see
                          page 186).
                          Corrected L bit encoding for the VMOVQ (D6h opcode) instruction
                          (see page 222).
                          Corrected statement about zero extension for the third encoding
                          (11h opcode) of the MOVSS instruction (see page 230).
March 2012      3.15      Corrected instruction encoding for VPCOMUB, VPCOMUD, VPCOMUQ,
                          VPCOMUW, and VPHSUBDQ instructions. Other minor corrections.
December 2011   3.14      Reworked Section 1.5, "String Compare Instructions" on page 10.
                          Revised descriptions of the string compare instructions in the
                          instruction reference.
                          Moved AES overview to Appendix A.
                          Clarified trap and exception behavior for elements not selected
                          for writing. See MASKMOVDQU VMASKMOVDQU on page 160.
                          Additional minor corrections and clarifications.
September 2011  3.13      Moved the discussion of extended instruction encoding and the VEX
                          and XOP prefixes to Volume 3.
                          Added FMA instructions, described on the corresponding FMA4
                          reference page.
                          Moved BMI and TBM instructions to Volume 3.
                          Added XSAVEOPT instruction.
                          Corrected descriptions of VSQRTSD and VSQRTSS.
May 2011        3.12      Added F16C, BMI, and TBM instructions.
December 2010   3.11      Complete revision and reformat accommodating 128-bit and 256-bit
                          media instructions. Includes revised definitions of legacy SSE,
                          SSE2, SSE3, SSE4.1, SSE4.2, and SSSE3 instructions, as well as new
                          definitions of extended AES, AVX, CLMUL, FMA4, and XOP
                          instructions. Introduction includes supplemental information
                          concerning encoding of extended instructions, enhanced processor
                          state management provided by the XSAVE/XRSTOR instructions,
                          cryptographic capabilities of the AES instructions, and
                          functionality of extended string comparison instructions.
September 2007  3.10      Added minor clarifications and corrected typographical and
                          formatting errors.
July 2007       3.09      Added the following instructions: EXTRQ, INSERTQ, MOVNTSD, and
                          MOVNTSS.
                          Added misaligned exception mask (MXCSR.MM) information.
                          Added imm8 values with corresponding mnemonics to (V)CMPPD,
                          (V)CMPPS, (V)CMPSD, and (V)CMPSS.
                          Reworded CPUID information in condition tables.
                          Added minor clarifications and corrected typographical and
                          formatting errors.
September 2006  3.08      Made minor corrections.
December 2005   3.07      Made minor editorial and formatting changes.
January 2005    3.06      Added documentation on SSE3 instructions. Corrected numerous
                          minor factual errors and typos.
September 2003  3.05      Made numerous small factual corrections.
April 2003      3.04      Made minor corrections.


Preface
About This Book
This book is part of a multivolume work entitled the AMD64 Architecture Programmer’s Manual.
The complete set includes the following volumes.
Title                                                       Order No.
Volume 1: Application Programming                           24592
Volume 2: System Programming                                24593
Volume 3: General-Purpose and System Instructions           24594
Volume 4: 128-Bit and 256-Bit Media Instructions            26568
Volume 5: 64-Bit Media and x87 Floating-Point Instructions  26569

Audience
This volume is intended for programmers who develop application or system software.

Organization
Volumes 3, 4, and 5 describe the AMD64 instruction set in detail, providing mnemonic syntax,
instruction encoding, functions, affected flags, and possible exceptions.
The AMD64 instruction set is divided into five subsets:
• General-purpose instructions
• System instructions
• Streaming SIMD Extensions (includes 128-bit and 256-bit media instructions)
• 64-bit media instructions (MMX™)
• x87 floating-point instructions

Several instructions belong to, and are described identically in, multiple instruction subsets.
This volume describes the Streaming SIMD Extensions (SSE) instruction set, which includes 128-bit
and 256-bit media instructions. SSE includes both legacy and extended forms. The index at the end
cross-references topics within this volume. For other topics relating to the AMD64 architecture, and
for information on instructions in other subsets, see the tables of contents and indexes of the other
volumes.


Conventions and Definitions
The Notational Conventions section that follows describes the notation used in this volume. The next
section, Definitions, lists terms used in this volume along with their technical definitions. Some of
these definitions assume knowledge of the legacy x86 architecture; see “Related Documents” on
page xl for further information about that architecture. Finally, the Registers section lists the
registers that are part of the system programming model.
Notational Conventions
Section 1.1, “Syntax and Notation” on page 2 describes notation relating specifically to instruction
encoding.
#GP(0)
An instruction exception—in this example, a general-protection exception with error code of 0.
1011b
A binary value, in this example, a 4-bit value.
F0EA_0B40h
A hexadecimal value, in this example a 32-bit value. Underscore characters may be used to
improve readability.
128
Numbers without an alpha suffix are decimal unless the context indicates otherwise.
7:4
A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first. Commas may be inserted
to indicate gaps.
#GP(0)
A general-protection exception (#GP) with error code of 0.
CPUID FnXXXX_XXXX_RRR[FieldName]
Support for optional features or the value of an implementation-specific parameter of a processor
can be discovered by executing the CPUID instruction on that processor. To obtain this value,
software must execute the CPUID instruction with the function code XXXX_XXXXh in EAX and
then examine the field FieldName returned in register RRR. If the “_RRR” notation is followed by
“_xYYY”, register ECX must be set to the value YYYh before executing CPUID. When FieldName
is not given, the entire contents of register RRR contain the desired value. When determining
optional feature support, if the bit identified by FieldName is set to 1, the feature is supported
on that processor.
CR0–CR4
A register range, from register CR0 through CR4, inclusive, with the low-order register first.

xxviii

26568—Rev. 3.22—May 2018

AMD64 Technology

CR4[OSXSAVE], CR4.OSXSAVE
The OSXSAVE bit of the CR4 register.
CR0[PE] = 1, CR0.PE = 1
The PE bit of the CR0 register has a value of 1.
EFER[LME] = 0, EFER.LME = 0
The LME field of the EFER register is cleared (contains a value of 0).
DS:rSI
The content of a memory location whose segment address is in the DS register and whose offset
relative to that segment is in the rSI register.
RFLAGS[13:12]
A field within a register identified by its bit range; in this example, the IOPL field.
Definitions
128-bit media instruction
Instructions that operate on the various 128-bit vector data types. Supported within both the legacy
SSE and extended SSE instruction sets.
256-bit media instruction
Instructions that operate on the various 256-bit vector data types. Supported within the extended
SSE instruction set.
64-bit media instructions
Instructions that operate on the 64-bit vector data types. These are primarily a combination of
MMX and 3DNow!™ instruction sets and their extensions, with some additional instructions from
the SSE1 and SSE2 instruction sets.
16-bit mode
Legacy mode or compatibility mode in which a 16-bit address size is active. See legacy mode and
compatibility mode.
32-bit mode
Legacy mode or compatibility mode in which a 32-bit address size is active. See legacy mode and
compatibility mode.
64-bit mode
A submode of long mode. In 64-bit mode, the default address size is 64 bits and new features, such
as register extensions, are supported for system and application software.


absolute
A displacement that references the base of a code segment rather than an instruction pointer.
See relative.
AES
Advanced Encryption Standard (AES) algorithm acceleration instructions; part of Streaming SIMD
Extensions (SSE).
ASID
Address space identifier.
AVX
Extension of the SSE instruction set supporting 256-bit vector (packed) operands. See Streaming
SIMD Extensions.
biased exponent
The sum of a floating-point value’s exponent and a constant bias for a particular floating-point data
type. The bias makes the range of the biased exponent always positive, which allows reciprocation
without overflow.
byte
Eight bits.
clear, cleared
To write the value 0 to a bit or a range of bits. See set.
compatibility mode
A submode of long mode. In compatibility mode, the default address size is 32 bits, and legacy 16-bit and 32-bit applications run without modification.
commit
To irreversibly write, in program order, an instruction’s result to software-visible storage, such as a
register (including flags), the data cache, an internal write buffer, or memory.
CPL
Current privilege level.
direct
Referencing a memory address included in the instruction syntax as an immediate operand. The
address may be an absolute or relative address. See indirect.
displacement
A signed value that is added to the base of a segment (absolute addressing) or an instruction pointer
(relative addressing). Same as offset.


doubleword
Two words, or four bytes, or 32 bits.
double quadword
Eight words, or 16 bytes, or 128 bits. Also called octword.
effective address size
The address size for the current instruction after accounting for the default address size and any
address-size override prefix.
effective operand size
The operand size for the current instruction after accounting for the default operand size and any
operand-size override prefix.
element
See vector.
exception
An abnormal condition that occurs as the result of instruction execution. Processor response to an
exception depends on the type of exception. For all exceptions except SSE floating-point
exceptions and x87 floating-point exceptions, control is transferred to a handler (or service
routine) for that exception as defined by the exception’s vector. For floating-point exceptions
defined by the IEEE 754 standard, there are both masked and unmasked responses. When
unmasked, the exception handler is called, and when masked, a default response is provided
instead of calling the handler.
extended SSE instructions
Enhanced set of SIMD instructions supporting 256-bit vector data types and allowing the
specification of up to four operands. A subset of the Streaming SIMD Extensions (SSE). Includes
the AVX, FMA, FMA4, and XOP instructions. Compare legacy SSE.
flush
An often ambiguous term meaning (1) writeback, if modified, and invalidate, as in “flush the cache
line,” or (2) invalidate, as in “flush the pipeline,” or (3) change a value, as in “flush to zero.”
FMA4
Fused Multiply Add, four operand. Part of the extended SSE instruction set.
FMA
Fused Multiply Add. Part of the extended SSE instruction set.
GDT
Global descriptor table.


GIF
Global interrupt flag.
IDT
Interrupt descriptor table.
IGN
Ignored. Value written is ignored by hardware. Value returned on a read is indeterminate. See
reserved.
indirect
Referencing a memory location whose address is in a register or other memory location. The
address may be an absolute or relative address. See direct.
IRB
The virtual-8086 mode interrupt-redirection bitmap.
IST
The long-mode interrupt-stack table.
IVT
The real-address mode interrupt-vector table.
LDT
Local descriptor table.
legacy x86
The legacy x86 architecture.
legacy mode
An operating mode of the AMD64 architecture in which existing 16-bit and 32-bit applications and
operating systems run without modification. A processor implementation of the AMD64
architecture can run in either long mode or legacy mode. Legacy mode has three submodes: real
mode, protected mode, and virtual-8086 mode.
legacy SSE instructions
All Streaming SIMD Extensions instructions prior to AVX, XOP, and FMA4. Legacy SSE
instructions primarily use operands held in XMM registers. The legacy SSE instructions
include the original Streaming SIMD Extensions (SSE1) and the subsequent extensions SSE2,
SSE3, SSSE3, SSE4, SSE4A, SSE4.1, and SSE4.2. See Streaming SIMD Extensions.
long mode
An operating mode unique to the AMD64 architecture. A processor implementation of the
AMD64 architecture can run in either long mode or legacy mode. Long mode has two submodes,
64-bit mode and compatibility mode.


lsb
Least-significant bit.
LSB
Least-significant byte.
main memory
Physical memory, such as RAM and ROM (but not cache memory) that is installed in a particular
computer system.
mask
(1) A control bit that prevents the occurrence of a floating-point exception from invoking an
exception-handling routine. (2) A field of bits used for a control purpose.
MBZ
Must be zero. If software attempts to set an MBZ bit to 1, a general-protection exception (#GP)
occurs. See reserved.
memory
Unless otherwise specified, main memory.
moffset
A 16-, 32-, or 64-bit offset that specifies a memory operand directly, without using a ModRM or SIB
byte.
msb
Most-significant bit.
MSB
Most-significant byte.
octword
Same as double quadword.
offset
Same as displacement.
overflow
The condition in which a floating-point number is larger in magnitude than the largest, finite,
positive or negative number that can be represented in the data-type format being used.
packed
See vector.
PAE
Physical-address extensions.


physical memory
Actual memory, consisting of main memory and cache.
probe
A check for an address in processor caches or internal buffers. External probes originate outside
the processor, and internal probes originate within the processor.
protected mode
A submode of legacy mode.
quadword
Four words, eight bytes, or 64 bits.
RAZ
Read as zero. Value returned on a read is always zero (0) regardless of what was previously
written. See reserved.
real-address mode, real mode
A submode of legacy mode. Real mode is the short name for real-address mode.
relative
Referencing with a displacement (offset) from an instruction pointer rather than the base of a code
segment. See absolute.
reserved
Fields marked as reserved may be used at some future time.
To preserve compatibility with future processors, reserved fields require special handling when
read or written by software. Software must not depend on the state of a reserved field (unless
qualified as RAZ), nor upon the ability of such fields to return a previously written state.
If a field is marked reserved without qualification, software must not change the state of that field;
it must reload that field with the same value returned from a prior read.
Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ (see definitions).
REX
A legacy instruction modifier prefix that specifies 64-bit operand size and provides access to
additional registers.
RIP-relative addressing
Addressing relative to the 64-bit instruction pointer (RIP).
SBZ
Should be zero. An attempt by software to set an SBZ bit to 1 results in undefined behavior. See
reserved.


scalar
An atomic value existing independently of any specification of location, direction, etc., as opposed
to vectors.
set
To write the value 1 to a bit or a range of bits. See clear.
SIMD
Single instruction, multiple data. See vector.
Streaming SIMD Extensions (SSE)
Instructions that operate on scalar or vector (packed) integer and floating-point numbers. The SSE
instruction set comprises the legacy SSE and extended SSE instruction sets.
SSE1
Original SSE instruction set. Includes instructions that operate on vector operands in both the
MMX and the XMM registers.
SSE2
Extensions to the SSE instruction set.
SSE3
Further extensions to the SSE instruction set.
SSSE3
Further extensions to the SSE instruction set.
SSE4.1
Further extensions to the SSE instruction set.
SSE4.2
Further extensions to the SSE instruction set.
SSE4A
A minor extension to the SSE instruction set adding the instructions EXTRQ, INSERTQ,
MOVNTSS, and MOVNTSD.
sticky bit
A bit that is set or cleared by hardware and that remains in that state until explicitly changed by
software.
TSS
Task-state segment.


underflow
The condition in which a floating-point number is smaller in magnitude than the smallest nonzero,
positive or negative number that can be represented in the data-type format being used.
vector
(1) A set of integer or floating-point values, called elements, that are packed into a single operand.
Most media instructions use vectors as operands. Also called packed or SIMD operands.
(2) An interrupt descriptor table index, used to access exception handlers. See exception.
VEX prefix
Extended instruction encoding escape prefix. Introduces a two- or three-byte encoding escape
sequence used in the encoding of AVX instructions. Opens a new extended instruction encoding
space. Fields select the opcode map and allow the specification of operand vector length and an
additional operand register. See XOP prefix.
virtual-8086 mode
A submode of legacy mode.
VMCB
Virtual machine control block.
VMM
Virtual machine monitor.
word
Two bytes, or 16 bits.
x86
See legacy x86.
XOP instructions
Part of the extended SSE instruction set using the XOP prefix. See Streaming SIMD Extensions.
XOP prefix
Extended instruction encoding escape prefix. Introduces a three-byte escape sequence used in the
encoding of XOP instructions. Opens a new extended instruction encoding space distinct from the
VEX opcode space. Fields select the opcode map and allow the specification of operand vector
length and an additional operand register. See VEX prefix.
Registers
In the following list of registers, mnemonics refer either to the register itself or to the register content:
AH–DH
The high 8-bit AH, BH, CH, and DH registers. See [AL–DL].


AL–DL
The low 8-bit AL, BL, CL, and DL registers. See [AH–DH].
AL–r15B
The low 8-bit AL, BL, CL, DL, SIL, DIL, BPL, SPL, and [r8B–r15B] registers, available in 64-bit
mode.
BP
Base pointer register.
CRn
Control register number n.
CS
Code segment register.
eAX–eSP
The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers or the 32-bit EAX, EBX, ECX, EDX,
EDI, ESI, EBP, and ESP registers. See [rAX–rSP].
EFER
Extended features enable register.
eFLAGS
16-bit or 32-bit flags register. See rFLAGS.
EFLAGS
32-bit (extended) flags register.
eIP
16-bit or 32-bit instruction-pointer register. See rIP.
EIP
32-bit (extended) instruction-pointer register.
FLAGS
16-bit flags register.
GDTR
Global descriptor table register.
GPRs
General-purpose registers. For the 16-bit data size, these are AX, BX, CX, DX, DI, SI, BP, and SP.
For the 32-bit data size, these are EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP. For the 64-bit
data size, these include RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, and R8–R15.


IDTR
Interrupt descriptor table register.
IP
16-bit instruction-pointer register.
LDTR
Local descriptor table register.
MSR
Model-specific register.
r8–r15
The 8-bit R8B–R15B registers, or the 16-bit R8W–R15W registers, or the 32-bit R8D–R15D
registers, or the 64-bit R8–R15 registers.
rAX–rSP
The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers, or the 32-bit EAX, EBX, ECX, EDX,
EDI, ESI, EBP, and ESP registers, or the 64-bit RAX, RBX, RCX, RDX, RDI, RSI, RBP, and RSP
registers. Replace the placeholder r with nothing for 16-bit size, “E” for 32-bit size, or “R” for 64-bit size.
RAX
64-bit version of the EAX register.
RBP
64-bit version of the EBP register.
RBX
64-bit version of the EBX register.
RCX
64-bit version of the ECX register.
RDI
64-bit version of the EDI register.
RDX
64-bit version of the EDX register.
rFLAGS
16-bit, 32-bit, or 64-bit flags register. See RFLAGS.
RFLAGS
64-bit flags register. See rFLAGS.


rIP
16-bit, 32-bit, or 64-bit instruction-pointer register. See RIP.
RIP
64-bit instruction-pointer register.
RSI
64-bit version of the ESI register.
RSP
64-bit version of the ESP register.
SP
Stack pointer register.
SS
Stack segment register.
TPR
Task priority register (CR8).
TR
Task register.
YMM/XMM
Set of sixteen (eight accessible in legacy and compatibility modes) 256-bit wide registers that hold
scalar and vector operands used by the SSE instructions.
Endian Order
The x86 and AMD64 architectures address memory using little-endian byte-ordering. Multibyte
values are stored with the least-significant byte at the lowest byte address, and illustrated with their
least significant byte at the right side. Strings are illustrated in reverse order, because the addresses of
string bytes increase from right to left.


Related Documents
• Peter Abel, IBM PC Assembly Language and Programming, Prentice-Hall, Englewood Cliffs, NJ, 1995.
• Rakesh Agarwal, 80x86 Architecture & Programming: Volume II, Prentice-Hall, Englewood Cliffs, NJ, 1991.
• AMD, AMD-K6™ MMX™ Enhanced Processor Multimedia Technology, Sunnyvale, CA, 2000.
• AMD, 3DNow!™ Technology Manual, Sunnyvale, CA, 2000.
• AMD, AMD Extensions to the 3DNow!™ and MMX™ Instruction Sets, Sunnyvale, CA, 2000.
• Don Anderson and Tom Shanley, Pentium Processor System Architecture, Addison-Wesley, New York, 1995.
• Nabajyoti Barkakati and Randall Hyde, Microsoft Macro Assembler Bible, Sams, Carmel, Indiana, 1992.
• Barry B. Brey, 8086/8088, 80286, 80386, and 80486 Assembly Language Programming, Macmillan Publishing Co., New York, 1994.
• Barry B. Brey, Programming the 80286, 80386, 80486, and Pentium Based Personal Computer, Prentice-Hall, Englewood Cliffs, NJ, 1995.
• Ralf Brown and Jim Kyle, PC Interrupts, Addison-Wesley, New York, 1994.
• Penn Brumm and Don Brumm, 80386/80486 Assembly Language Programming, Windcrest McGraw-Hill, 1993.
• Geoff Chappell, DOS Internals, Addison-Wesley, New York, 1994.
• Chips and Technologies, Inc., Super386 DX Programmer’s Reference Manual, Chips and Technologies, Inc., San Jose, 1992.
• John Crawford and Patrick Gelsinger, Programming the 80386, Sybex, San Francisco, 1987.
• Cyrix Corporation, 5x86 Processor BIOS Writer's Guide, Cyrix Corporation, Richardson, TX, 1995.
• Cyrix Corporation, M1 Processor Data Book, Cyrix Corporation, Richardson, TX, 1996.
• Cyrix Corporation, MX Processor MMX Extension Opcode Table, Cyrix Corporation, Richardson, TX, 1996.
• Cyrix Corporation, MX Processor Data Book, Cyrix Corporation, Richardson, TX, 1997.
• Ray Duncan, Extending DOS: A Programmer's Guide to Protected-Mode DOS, Addison-Wesley, New York, 1991.
• William B. Giles, Assembly Language Programming for the Intel 80xxx Family, Macmillan, New York, 1991.
• Frank van Gilluwe, The Undocumented PC, Addison-Wesley, New York, 1994.
• John L. Hennessy and David A. Patterson, Computer Architecture, Morgan Kaufmann Publishers, San Mateo, CA, 1996.
• Thom Hogan, The Programmer’s PC Sourcebook, Microsoft Press, Redmond, WA, 1991.
• Hal Katircioglu, Inside the 486, Pentium, and Pentium Pro, Peer-to-Peer Communications, Menlo Park, CA, 1997.
• IBM Corporation, 486SLC Microprocessor Data Sheet, IBM Corporation, Essex Junction, VT, 1993.
• IBM Corporation, 486SLC2 Microprocessor Data Sheet, IBM Corporation, Essex Junction, VT, 1993.
• IBM Corporation, 80486DX2 Processor Floating Point Instructions, IBM Corporation, Essex Junction, VT, 1995.
• IBM Corporation, 80486DX2 Processor BIOS Writer's Guide, IBM Corporation, Essex Junction, VT, 1995.
• IBM Corporation, Blue Lightning 486DX2 Data Book, IBM Corporation, Essex Junction, VT, 1994.
• Institute of Electrical and Electronics Engineers, IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985.
• Institute of Electrical and Electronics Engineers, IEEE Standard for Radix-Independent Floating-Point Arithmetic, ANSI/IEEE Std 854-1987.
• Muhammad Ali Mazidi and Janice Gillispie Mazidi, 80X86 IBM PC and Compatible Computers, Prentice-Hall, Englewood Cliffs, NJ, 1997.
• Hans-Peter Messmer, The Indispensable Pentium Book, Addison-Wesley, New York, 1995.
• Karen Miller, An Assembly Language Introduction to Computer Architecture: Using the Intel Pentium, Oxford University Press, New York, 1999.
• Stephen Morse, Eric Isaacson, and Douglas Albert, The 80386/387 Architecture, John Wiley & Sons, New York, 1987.
• NexGen Inc., Nx586 Processor Data Book, NexGen Inc., Milpitas, CA, 1993.
• NexGen Inc., Nx686 Processor Data Book, NexGen Inc., Milpitas, CA, 1994.
• Bipin Patwardhan, Introduction to the Streaming SIMD Extensions in the Pentium III, www.x86.org/articles/sse_pt1/simd1.htm, June 2000.
• Peter Norton, Peter Aitken, and Richard Wilton, PC Programmer’s Bible, Microsoft Press, Redmond, WA, 1993.
• PharLap 386|ASM Reference Manual, Pharlap, Cambridge, MA, 1993.
• PharLap TNT DOS-Extender Reference Manual, Pharlap, Cambridge, MA, 1995.
• Sen-Cuo Ro and Sheau-Chuen Her, i386/i486 Advanced Programming, Van Nostrand Reinhold, New York, 1993.
• Jeffrey P. Royer, Introduction to Protected Mode Programming, course materials for an onsite class, 1992.
• Tom Shanley, Protected Mode System Architecture, Addison-Wesley, New York, 1996.
• SGS-Thomson Corporation, 80486DX Processor SMM Programming Manual, SGS-Thomson Corporation, 1995.
• Walter A. Triebel, The 80386DX Microprocessor, Prentice-Hall, Englewood Cliffs, NJ, 1992.
• John Wharton, The Complete x86, MicroDesign Resources, Sebastopol, California, 1994.
• Web sites and newsgroups:
  - www.amd.com
  - news.comp.arch
  - news.comp.lang.asm.x86
  - news.intel.microprocessors
  - news.microsoft


1 Introduction
Processors capable of performing the same mathematical operation simultaneously on multiple data
streams are classified as single-instruction, multiple-data (SIMD). Instructions that utilize this
hardware capability are called SIMD instructions.
Software can use SIMD instructions to greatly increase the performance of media applications,
which typically employ algorithms that perform the same mathematical operation on a set of values in
parallel. The original SIMD instruction set, MMX, operated on 64-bit-wide vectors of integer
elements; the 3DNow! extensions later added floating-point operations on the same registers.
Subsequently, a new SIMD instruction set called the Streaming SIMD Extensions (SSE) was added to
the architecture.
The SSE instruction set defines a new programming model with its own array of vector data registers
(YMM/XMM registers) and a control and status register (MXCSR). Most SSE instructions read their
operands from one or more YMM/XMM registers and write their results to a YMM/XMM register,
although some instructions use a GPR as either a source or destination. Most instructions allow one
operand to be loaded from memory. The set includes instructions to load a YMM/XMM register from
memory (aligned or unaligned) and to store the contents of a YMM/XMM register to memory.
An overview of the SSE instruction set is provided in Volume 1, Chapter 4.
This volume provides detailed descriptions of each instruction within the SSE instruction set. The SSE
instruction set comprises the legacy SSE instructions and the extended SSE instructions.
Legacy SSE instructions comprise the following subsets:
• The original Streaming SIMD Extensions (herein referred to as SSE1)
• SSE2
• SSE3
• SSSE3
• SSE4.1
• SSE4.2
• SSE4A
• Advanced Encryption Standard (AES)

Extended SSE instructions comprise the following subsets:
• AVX
• AVX2
• FMA
• FMA4
• XOP


Legacy SSE architecture supports operations involving 128-bit vectors and defines the base
programming model including the SSE registers, the Media eXtension Control and Status Register
(MXCSR), and the instruction exception behavior.
The Streaming SIMD Extensions (SSE) instruction set is extended to include the AVX, AVX2, FMA,
FMA4, and XOP instruction sets. The AVX instruction set provides an extended form for most legacy
SSE instructions, plus several new instructions. The extensions include the ability to specify a
destination register distinct from the sources for operations with two or more source operands, and
support for 256-bit wide vectors. Some AVX instructions also provide enhanced functionality compared
to their legacy counterparts.
A significant feature of the extended SSE instruction set architecture is the doubling of the width of the
XMM registers. These registers are referred to as the YMM registers. The XMM registers overlay the
lower octword (128 bits) of the YMM registers. Registers YMM/XMM0–7 are accessible in legacy
and compatibility mode. Registers YMM/XMM8–15 are available in 64-bit mode (a subset of long
mode). VEX/XOP instruction prefixes allow instruction encodings to address the additional registers.
The SSE instructions can be used in processor legacy mode or long (64-bit) mode. CPUID
Fn8000_0001_EDX[LM] indicates the availability of long mode.
Compilation for execution in 64-bit mode offers the following advantages:
• Access to an additional eight YMM/XMM registers for a total of 16
• Access to an additional eight 64-bit general-purpose registers for a total of 16
• Access to the 64-bit virtual address space and the RIP-relative addressing mode

Hardware support for each of the subsets of SSE instructions listed above is indicated by CPUID
feature flags. Refer to Volume 3, Appendix D, “Instruction Subsets and CPUID Feature Flags,” for a
complete list of instruction-related feature flags. The CPUID feature flags that pertain to each
instruction are also given in the instruction descriptions below. For information on using the CPUID
instruction, see the instruction description in Volume 3.
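As an illustration, the C sketch below tests a few of these feature flags, given raw CPUID register values that the caller has already obtained (for example with GCC/Clang's `__get_cpuid` from `<cpuid.h>`). The bit positions are the standard CPUID assignments; Volume 3, Appendix D remains the authoritative list. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Raw CPUID output registers, gathered by the caller (hypothetical struct). */
typedef struct {
    uint32_t fn1_ecx;   /* CPUID Fn0000_0001_ECX */
    uint32_t fn1_edx;   /* CPUID Fn0000_0001_EDX */
    uint32_t fn7_ebx;   /* CPUID Fn0000_0007_EBX, subleaf 0 */
    uint32_t ext1_ecx;  /* CPUID Fn8000_0001_ECX */
    uint32_t ext1_edx;  /* CPUID Fn8000_0001_EDX */
} CpuidRegs;

static inline bool bit(uint32_t reg, unsigned n) { return (reg >> n) & 1u; }

static inline bool has_sse2(const CpuidRegs *r)  { return bit(r->fn1_edx, 26); }
static inline bool has_sse41(const CpuidRegs *r) { return bit(r->fn1_ecx, 19); }
static inline bool has_sse42(const CpuidRegs *r) { return bit(r->fn1_ecx, 20); }
static inline bool has_aes(const CpuidRegs *r)   { return bit(r->fn1_ecx, 25); }
static inline bool has_fma(const CpuidRegs *r)   { return bit(r->fn1_ecx, 12); }
static inline bool has_avx(const CpuidRegs *r)   { return bit(r->fn1_ecx, 28); }
static inline bool has_avx2(const CpuidRegs *r)  { return bit(r->fn7_ebx, 5); }
static inline bool has_sse4a(const CpuidRegs *r) { return bit(r->ext1_ecx, 6); }
static inline bool has_xop(const CpuidRegs *r)   { return bit(r->ext1_ecx, 11); }
static inline bool has_fma4(const CpuidRegs *r)  { return bit(r->ext1_ecx, 16); }
/* Fn8000_0001_EDX[LM]: long mode available */
static inline bool has_long_mode(const CpuidRegs *r) { return bit(r->ext1_edx, 29); }
```

A processor flag only reports hardware support; whether the operating system has enabled the corresponding state is a separate check (see Section 1.4).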
Chapter 2, “Instruction Reference,” contains detailed descriptions of each instruction, organized in
alphabetic order by mnemonic. For those legacy SSE instructions that have an AVX form, the
extended form of the instruction is described together with the legacy instruction in one entry. For
these instructions, the instruction reference page is located by the mnemonic of the legacy SSE
instruction, not by that of the extended (AVX) form. AVX instructions without a legacy form are
listed in order by their AVX mnemonic. The mnemonics of all extended SSE instructions, including the
FMA and XOP instructions, begin with the letter V.

1.1 Syntax and Notation
The descriptive synopsis of opcode syntax for legacy SSE instructions follows the conventions
described in Volume 3: General Purpose and System Instructions. See Chapter 2 and the section
entitled “Notation.”


For general information on the programming model and overview descriptions of the SSE instruction
set, see:
• “Streaming SIMD Extensions Media and Scientific Programming” in Volume 1.
• “Instruction Encoding” in Volume 3.
• “Summary of Registers and Data Types” in Volume 3.

The syntax of the extended instruction sets requires an expanded synopsis. The expanded synopsis
includes a mnemonic summary and a summary of prefix sequence fields. Figure 1-1 shows the
descriptive synopsis of a typical XOP instruction. The synopsis of VEX-encoded instructions has the
same format, differing only in the encoding escape prefix, that is, VEX instead of XOP.
| Mnemonic | Encoding: XOP | RXB.map_select | W.vvvv.L.pp | Opcode |
|---|---|---|---|---|
| VPCMOV ymm1, ymm2, ymm3/mem256, ymm4 | 8F | RXB.08 | 0.src.1.00 | A2 /r ib |

Field legend:
• VPCMOV ymm1, ymm2, ymm3/mem256, ymm4: assembly language representation
• 8F: encoding escape prefix
• RXB: 3-bit field representing the R, X, and B bit values
• 08: 5-bit map_select field
• 0: W bit
• src: vvvv field
• 1: L bit
• 00: pp field
• A2: opcode
• /r: register/memory type specifier
• ib: immediate operand

Figure 1-1. Typical Descriptive Synopsis - Extended SSE Instructions

1.2 Extended Instruction Encoding
The legacy SSE instructions are encoded using the legacy encoding syntax and the extended
instructions are encoded using an enhanced encoding syntax which is compatible with the legacy
syntax. Both are described in detail in Chapter 1 of Volume 3.
As described in Volume 3, the extended instruction encoding syntax uses multi-byte escape
sequences both to select alternate opcode maps and to augment the encoding of the instruction.
Multi-byte escape sequences are introduced by one of the two VEX prefixes or the XOP prefix.
The AVX and AVX2 instructions utilize either the two-byte (introduced by the VEX C5h prefix) or the
three-byte (introduced by the VEX C4h prefix) encoding escape sequence. XOP instructions are
encoded using a three-byte encoding escape sequence introduced by the XOP prefix (except for the
XOP instructions VPERMIL2PD and VPERMIL2PS which are encoded using the VEX prefix). The
XOP prefix is 8Fh. The three-byte encoding escape sequences utilize the map_select field of the
second byte to select the opcode map used to interpret the opcode byte.


The two-byte VEX prefix sequence implicitly selects the secondary (“two-byte”) opcode map.
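To make the field layout concrete, the C sketch below splits the two bytes that follow a C4h (three-byte VEX) or 8Fh (XOP) escape byte into the fields named in the synopsis (RXB.map_select and W.vvvv.L.pp), and expands the single payload byte of a C5h two-byte VEX prefix into the same structure. It assumes the usual convention that R, X, B, and vvvv are stored inverted (which is why an unused vvvv field is encoded as 1111b), and that the two-byte form implies W = 0 and unextended X and B in addition to map 01h. All names are hypothetical.

```c
#include <stdint.h>

/* Fields of a VEX/XOP escape sequence, kept exactly as encoded. */
typedef struct {
    unsigned r_raw, x_raw, b_raw; /* inverted R, X, B bits */
    unsigned map_select;          /* 5-bit opcode map selector */
    unsigned w;                   /* W bit */
    unsigned vvvv_raw;            /* inverted 4-bit register specifier */
    unsigned l;                   /* 0 = 128-bit, 1 = 256-bit vector length */
    unsigned pp;                  /* implied legacy prefix selector */
} EscFields;

/* Bytes 1 and 2 after C4h (VEX) or 8Fh (XOP): RXB.map_select, W.vvvv.L.pp */
static EscFields decode_three_byte(uint8_t b1, uint8_t b2) {
    EscFields f;
    f.r_raw = (b1 >> 7) & 1u;
    f.x_raw = (b1 >> 6) & 1u;
    f.b_raw = (b1 >> 5) & 1u;
    f.map_select = b1 & 0x1Fu;
    f.w = (b2 >> 7) & 1u;
    f.vvvv_raw = (b2 >> 3) & 0xFu;
    f.l = (b2 >> 2) & 1u;
    f.pp = b2 & 0x3u;
    return f;
}

/* Byte 1 after C5h: R.vvvv.L.pp; map 01h, W = 0, X and B not extended. */
static EscFields decode_two_byte(uint8_t b1) {
    EscFields f;
    f.r_raw = (b1 >> 7) & 1u;
    f.x_raw = 1u;
    f.b_raw = 1u;
    f.map_select = 1u;
    f.w = 0u;
    f.vvvv_raw = (b1 >> 3) & 0xFu;
    f.l = (b1 >> 2) & 1u;
    f.pp = b1 & 0x3u;
    return f;
}

/* Register number named by vvvv (meaningful only when the field is used). */
static unsigned vvvv_register(const EscFields *f) {
    return ~f->vvvv_raw & 0xFu;
}
```

For example, the VCVTDQ2PD ymm1, xmm2/mem128 encoding in Section 1.2.2.2 (C4 RXB.01 0.1111.1.10) with no register extensions has second and third bytes E1h and 7Eh, which decode to map_select = 1, W = 0, vvvv = 1111b (unused), L = 1, and pp = 10b.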

1.2.1 Immediate Byte Usage Unique to the SSE Instructions

An immediate is a value, typically an operand, explicitly provided within the instruction encoding.
Depending on the opcode and the operating mode, the size of an immediate operand can be 1, 2, 4, or 8
bytes. Legacy and extended media instructions typically use an immediate byte operand (imm8).
A one-byte immediate is generally shown in the instruction synopsis as an “ib” suffix. For extended SSE
instructions with four operands (three sources and a destination), the suffix “is4” indicates the presence
of an immediate byte whose bits [7:4] select one of the register source operands.
The VPERMIL2PD and VPERMIL2PS instructions utilize a fifth 2-bit operand which is encoded
along with the fourth register select index in an immediate byte. For this special case the immediate
byte will be shown in the instruction synopsis as “is5”.
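The C sketch below shows how a decoder might pull these operands out of the immediate byte. The use of bits [7:4] for the register index follows Section 1.2.2.4; placing the 2-bit fifth operand of VPERMIL2PD/VPERMIL2PS in bits [1:0] is an assumption made here, so check the individual instruction entries. The function names are hypothetical.

```c
#include <stdint.h>

/* "is4": imm8[7:4] names the register used as an additional source operand. */
static unsigned is4_register(uint8_t imm8) {
    return (imm8 >> 4) & 0xFu;
}

/* "is5" (VPERMIL2PD/PS): register index as for is4, plus a 2-bit operand.
   Assumption: the 2-bit operand is held in imm8[1:0]. */
static unsigned is5_two_bit_operand(uint8_t imm8) {
    return imm8 & 0x3u;
}
```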

1.2.2 Instruction Format Examples

The following sections provide examples of two-, three-, and four-operand extended instructions.
These instructions generally perform nondestructive-source operations, meaning that the result of the
operation is written to a separately specified destination register rather than overwriting one of the
source operands. This preserves the contents of the source registers. Most legacy SSE instructions
perform destructive-source operations, in which a single register is both source and destination, so
source content is lost.
1.2.2.1 XMM Register Destinations
The following general properties apply to YMM/XMM register destination operands.
• For legacy instructions that use XMM registers as a destination: When a result is written to a
destination XMM register, bits [255:128] of the corresponding YMM register are not affected.
• For extended instructions that use XMM registers as a destination: When a result is written to a
destination XMM register, bits [255:128] of the corresponding YMM register are cleared.
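The difference between the two rules can be modeled in C by treating a YMM register as a 32-byte array whose low 16 bytes form the overlaid XMM register. This is only a sketch of the destination write; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* A YMM register modeled as 32 bytes; bytes 0-15 are the XMM overlay. */
typedef struct { uint8_t b[32]; } Ymm;

/* Legacy SSE write to XMMn: bits [255:128] of YMMn are left unchanged. */
static void write_xmm_legacy(Ymm *reg, const uint8_t result[16]) {
    memcpy(reg->b, result, 16);
}

/* Extended (VEX/XOP) write to XMMn: bits [255:128] of YMMn are cleared. */
static void write_xmm_extended(Ymm *reg, const uint8_t result[16]) {
    memcpy(reg->b, result, 16);
    memset(reg->b + 16, 0, 16);
}
```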

1.2.2.2 Two-Operand Instructions
Two-operand instructions use ModRM-based operand assignment. For most instructions, the first
operand is the destination, selected by the ModRM.reg field, and the second operand is either a register
or a memory source, selected by the ModRM.r/m field.
VCVTDQ2PD is an example of a two-operand AVX instruction.
| Mnemonic | Encoding: VEX | RXB.map_select | W.vvvv.L.pp | Opcode |
|---|---|---|---|---|
| VCVTDQ2PD xmm1, xmm2/mem64 | C4 | RXB.01 | 0.1111.0.10 | E6 /r |
| VCVTDQ2PD ymm1, xmm2/mem128 | C4 | RXB.01 | 0.1111.1.10 | E6 /r |


The destination register is selected by ModRM.reg. The size of the destination register is determined
by VEX.L. The source, specified by ModRM.r/m, is either an XMM register or a memory location (64 bits
for the 128-bit form, 128 bits for the 256-bit form). Because this instruction converts packed doubleword
integers to double-precision floating-point values, the source data size is half the destination data size.
VEX.vvvv is not used and must be set to 1111b.
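A scalar C sketch of the element-wise conversion makes the size relationship visible. Each 32-bit integer widens to a 64-bit double, so the 128-bit form consumes two integers and the 256-bit form consumes four. The function name is hypothetical.

```c
#include <stdint.h>

/* n = 2 for the 128-bit form (VEX.L = 0), n = 4 for the 256-bit form (VEX.L = 1). */
static void cvtdq2pd(const int32_t *src, double *dest, int n) {
    for (int i = 0; i < n; i++)
        dest[i] = (double)src[i]; /* every int32 value is exactly representable */
}
```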
1.2.2.3 Three-Operand Instructions
These extended instructions have two source operands and a destination operand.
VPROTB is an example of a three-operand XOP instruction.
There are versions of the instruction for variable-count rotation and for fixed-count rotation.
VPROTB dest, src, variable-count
VPROTB dest, src, fixed-count
| Mnemonic | Encoding: XOP | RXB.map_select | W.vvvv.L.pp | Opcode |
|---|---|---|---|---|
| VPROTB xmm1, xmm2/mem128, xmm3 | 8F | RXB.09 | 0.src.0.00 | 90 /r |
| VPROTB xmm1, xmm2, xmm3/mem128 | 8F | RXB.09 | 1.src.0.00 | 90 /r |
| VPROTB xmm1, xmm2/mem128, imm8 | 8F | RXB.08 | 0.1111.0.00 | C0 /r ib |

For both versions of the instruction, the destination (dest) operand is an XMM register specified by
ModRM.reg.
The variable-count version of the instruction rotates each byte of the source as specified by the
corresponding byte element variable-count.
Selection of src and variable-count is controlled by XOP.W.
• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by
ModRM.r/m, and variable-count is an XMM register specified by XOP.vvvv.
• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an
XMM register or a 128-bit memory location specified by ModRM.r/m.

Table 1-1 summarizes the effect of the XOP.W bit on operand selection.
Table 1-1. Three-Operand Selection

| XOP.W | dest | src | variable-count |
|---|---|---|---|
| 0 | ModRM.reg | ModRM.r/m | XOP.vvvv |
| 1 | ModRM.reg | XOP.vvvv | ModRM.r/m |
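Table 1-1 amounts to swapping which encoding field supplies the operand that may come from memory. A small C sketch of that lookup follows; the enum and function names are hypothetical.

```c
/* Encoding field that supplies each VPROTB variable-count operand (Table 1-1). */
typedef enum { FROM_MODRM_REG, FROM_MODRM_RM, FROM_VVVV } OperandSource;

typedef struct {
    OperandSource dest, src, count;
} ThreeOpSelection;

static ThreeOpSelection select_three_operand(unsigned xop_w) {
    ThreeOpSelection s;
    s.dest  = FROM_MODRM_REG;                      /* always ModRM.reg */
    s.src   = xop_w ? FROM_VVVV : FROM_MODRM_RM;   /* W = 0: src may be memory */
    s.count = xop_w ? FROM_MODRM_RM : FROM_VVVV;   /* W = 1: count may be memory */
    return s;
}
```

Whichever operand is assigned to ModRM.r/m is the one that may be a 128-bit memory location.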

The fixed-count version of the instruction rotates each byte of src as specified by the immediate byte
operand fixed-count. For this version, src is either an XMM register or a 128-bit memory location
specified by ModRM.r/m. Because XOP.vvvv is not used to specify the source register, it must be set
to 1111b or execution of the instruction will cause an Invalid Opcode (#UD) exception.
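A scalar C model of the per-byte rotation follows. The direction convention is an assumption not stated in this section: a positive count is taken to rotate left and a negative count to rotate right, with each count treated as a signed byte. Because rotating right by k equals rotating left by 8 − k, masking the signed count with 7 yields the equivalent left rotation in both cases. The function names are hypothetical.

```c
#include <stdint.h>

/* Rotate one byte by a signed count (assumed: positive = left, negative = right). */
static uint8_t rotate_byte(uint8_t value, int8_t count) {
    unsigned k = (unsigned)count & 7u;  /* right by n == left by (8 - n) mod 8 */
    if (k == 0)
        return value;
    return (uint8_t)((value << k) | (value >> (8u - k)));
}

/* Variable-count form: byte i of src is rotated by byte i of the count vector. */
static void vprotb_variable(const uint8_t src[16], const int8_t count[16],
                            uint8_t dest[16]) {
    for (int i = 0; i < 16; i++)
        dest[i] = rotate_byte(src[i], count[i]);
}

/* Fixed-count form: every byte is rotated by the same immediate count. */
static void vprotb_fixed(const uint8_t src[16], int8_t imm8, uint8_t dest[16]) {
    for (int i = 0; i < 16; i++)
        dest[i] = rotate_byte(src[i], imm8);
}
```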
1.2.2.4 Four-Operand Instructions
Some extended instructions have three source operands and a destination operand. This is
accomplished by using the VEX/XOP.vvvv field, the ModRM.reg and ModRM.r/m fields, and bits
[7:4] of an immediate byte to select the operands. The opcode suffix “is4” is used to identify the
immediate byte, and the selected operands are shown in the synopsis.
VFMSUBPD is an example of a four-operand FMA4 instruction.
VFMSUBPD dest, src1, src2, src3

dest = src1 * src2 - src3

| Mnemonic | Encoding: VEX | RXB.map_select | W.vvvv.L.pp | Opcode |
|---|---|---|---|---|
| VFMSUBPD xmm1, xmm2, xmm3/mem128, xmm4 | C4 | RXB.03 | 0.src.0.01 | 6D /r is4 |
| VFMSUBPD ymm1, ymm2, ymm3/mem256, ymm4 | C4 | RXB.03 | 0.src.1.01 | 6D /r is4 |
| VFMSUBPD xmm1, xmm2, xmm3, xmm4/mem128 | C4 | RXB.03 | 1.src.0.01 | 6D /r is4 |
| VFMSUBPD ymm1, ymm2, ymm3, ymm4/mem256 | C4 | RXB.03 | 1.src.1.01 | 6D /r is4 |

The first operand, the destination (dest), is an XMM register or a YMM register (as determined by
VEX.L) selected by ModRM.reg. The following three operands (src1, src2, src3) are sources.
The src1 operand is an XMM or YMM register specified by VEX.vvvv.
VEX.W determines the configuration of the src2 and src3 operands.
• When VEX.W = 0, src2 is either a register or a memory location specified by ModRM.r/m, and
src3 is a register specified by bits [7:4] of the immediate byte.
• When VEX.W = 1, src2 is a register specified by bits [7:4] of the immediate byte and src3 is either
a register or a memory location specified by ModRM.r/m.

Table 1-2 summarizes the effect of the VEX.W bit on operand selection.
Table 1-2. Four-Operand Selection

| VEX.W | dest | src1 | src2 | src3 |
|---|---|---|---|---|
| 0 | ModRM.reg | VEX.vvvv | ModRM.r/m | is4[7:4] |
| 1 | ModRM.reg | VEX.vvvv | is4[7:4] | ModRM.r/m |
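The C sketch below combines the operand selection of Table 1-2 with the per-element arithmetic dest = src1 * src2 - src3. The arithmetic uses plain C expressions, which round the product before the subtraction. A fused multiply-add rounds only once, so a closer model would use C99 `fma(src1[i], src2[i], -src3[i])`, and the two can differ in the last bit. The type and function names are hypothetical.

```c
/* Encoding field that supplies each four-operand operand (Table 1-2). */
typedef enum { SRC_MODRM_REG, SRC_MODRM_RM, SRC_VVVV, SRC_IS4 } FieldSource;

typedef struct { FieldSource dest, src1, src2, src3; } FourOpSelection;

/* VEX.W swaps ModRM.r/m and is4[7:4] between src2 and src3. */
static FourOpSelection select_four_operand(unsigned vex_w) {
    FourOpSelection s = { SRC_MODRM_REG, SRC_VVVV, SRC_MODRM_RM, SRC_IS4 };
    if (vex_w) {
        s.src2 = SRC_IS4;
        s.src3 = SRC_MODRM_RM;
    }
    return s;
}

/* n = 2 for the xmm form (VEX.L = 0), n = 4 for the ymm form (VEX.L = 1).
   Rounds twice; the fused instruction rounds once. */
static void vfmsubpd_model(const double *src1, const double *src2,
                           const double *src3, double *dest, int n) {
    for (int i = 0; i < n; i++)
        dest[i] = src1[i] * src2[i] - src3[i];
}
```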

1.3 VSIB Addressing
Specific AVX2 instructions utilize a vectorized form of indexed register-indirect addressing called
vector SIB (VSIB) addressing. In contrast to the standard indexed register-indirect address mode,
which generates a single effective address to access a single memory operand, VSIB addressing generates an array of effective addresses which is used to access data from multiple memory locations in
a single operation.


VSIB addressing is encoded using three or six bytes following the opcode byte, augmented by the X
and B bits from the VEX prefix. The first byte is the ModRM byte with the standard mod, reg, and
r/m fields (although allowed values for the mod and r/m fields are restricted). The second is the VSIB
byte which replaces the SIB byte in the encoding. The VSIB byte specifies a GPR which serves as a
base address register and an XMM/YMM register that contains a packed array of index values. The
two-bit scale field specifies a common scaling factor to be applied to all of the index values. A constant displacement value is encoded in the one or four bytes that follow the VSIB byte.
Figure 1-2 shows the format of the VSIB byte.
| Bits | 7:6 | 5:3 | 2:0 |
|---|---|---|---|
| Field | SS | index | base |
| Extension | | VEX.X extends this field to 4 bits | VEX.B extends this field to 4 bits |

Figure 1-2. VSIB Byte Format
VSIB.SS (Bits [7:6]). The SS field specifies the scale factor used in the computation of each of the
effective addresses. The scale factor scale is equal to 2^SS (two raised to the power of the value of the
SS field). Therefore, if SS = 00b, scale = 1; if SS = 01b, scale = 2; if SS = 10b, scale = 4;
and if SS = 11b, scale = 8.
VSIB.index (Bits [5:3]). This field is concatenated with the complement of the VEX.X bit ({X,
index}) to specify the YMM/XMM register that contains the packed array of index values index[i] to
be used in the computation of the array of effective addresses effective address[i].
VSIB.base (Bits [2:0]). This field is concatenated with the complement of the VEX.B bit ({B,
base}) to specify the general-purpose register (base GPR) that contains the base address base to be
used in the computation of each of the effective addresses.
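The field extraction above can be sketched in C as follows. The sketch takes the VEX.X and VEX.B bits as they are encoded (inverted), so their complements supply the fourth register-number bit. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned scale;     /* 1, 2, 4, or 8 */
    unsigned index_reg; /* XMM/YMM register holding the index array */
    unsigned base_reg;  /* base GPR number */
} Vsib;

/* vex_x_raw and vex_b_raw are the X and B bits exactly as encoded (inverted). */
static Vsib decode_vsib(uint8_t vsib, unsigned vex_x_raw, unsigned vex_b_raw) {
    Vsib v;
    v.scale     = 1u << ((vsib >> 6) & 3u);                        /* 2^SS */
    v.index_reg = ((~vex_x_raw & 1u) << 3) | ((vsib >> 3) & 7u);   /* {X, index} */
    v.base_reg  = ((~vex_b_raw & 1u) << 3) | (vsib & 7u);          /* {B, base} */
    return v;
}
```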

1.3.1 Effective Address Array Computation

Each element i of the effective address array is computed using the formula:
effective address[i] = scale * index[i] + base + displacement.

where index[i] is the ith element of the XMM/YMM register specified by {X,VSIB.index}. An index
element is either 32 or 64 bits wide and is treated as a signed integer.
Variants of this mode use either an eight-bit or a 32-bit displacement value. One variant sets the base
to zero. The value of the ModRM.mod field specifies the specific variant of VSIB addressing mode,
as shown in Table 1. In the table, the notation [XMMn/YMMn] indicates the XMM/YMM register
that contains the packed index array and [base GPR] means the contents of the base GPR selected by
{B, base}.
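The formula can be sketched in C for 32-bit indices (vm32x or vm32y). Each index is sign-extended before scaling, and the sum wraps modulo 2^64, which assumes a 64-bit effective address size. For the ModRM.mod = 00b variant in Table 1, pass base = 0. The function name is hypothetical; 64-bit indices would be handled the same way without the sign extension.

```c
#include <stdint.h>

/* effective address[i] = scale * index[i] + base + displacement */
static void vsib_addresses_d(const int32_t *index, int n, unsigned scale,
                             uint64_t base, int32_t disp, uint64_t *ea) {
    for (int i = 0; i < n; i++)
        ea[i] = base
              + (uint64_t)((int64_t)index[i] * scale) /* signed index, scaled */
              + (uint64_t)(int64_t)disp;              /* sign-extended displacement */
}
```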


Table 1: Vectorized Addressing Modes

| Index (Note 1) | ModRM.mod = 00 | ModRM.mod = 01 | ModRM.mod = 10 |
|---|---|---|---|
| 0000 | scale * [XMM0/YMM0] + Disp32 | scale * [XMM0/YMM0] + Disp8 + [base GPR] | scale * [XMM0/YMM0] + Disp32 + [base GPR] |
| 0001 | scale * [XMM1/YMM1] + Disp32 | scale * [XMM1/YMM1] + Disp8 + [base GPR] | scale * [XMM1/YMM1] + Disp32 + [base GPR] |
| 0010 | scale * [XMM2/YMM2] + Disp32 | scale * [XMM2/YMM2] + Disp8 + [base GPR] | scale * [XMM2/YMM2] + Disp32 + [base GPR] |
| 0011 | scale * [XMM3/YMM3] + Disp32 | scale * [XMM3/YMM3] + Disp8 + [base GPR] | scale * [XMM3/YMM3] + Disp32 + [base GPR] |
| 0100 | scale * [XMM4/YMM4] + Disp32 | scale * [XMM4/YMM4] + Disp8 + [base GPR] | scale * [XMM4/YMM4] + Disp32 + [base GPR] |
| 0101 | scale * [XMM5/YMM5] + Disp32 | scale * [XMM5/YMM5] + Disp8 + [base GPR] | scale * [XMM5/YMM5] + Disp32 + [base GPR] |
| 0110 | scale * [XMM6/YMM6] + Disp32 | scale * [XMM6/YMM6] + Disp8 + [base GPR] | scale * [XMM6/YMM6] + Disp32 + [base GPR] |
| 0111 | scale * [XMM7/YMM7] + Disp32 | scale * [XMM7/YMM7] + Disp8 + [base GPR] | scale * [XMM7/YMM7] + Disp32 + [base GPR] |
| 1000 | scale * [XMM8/YMM8] + Disp32 | scale * [XMM8/YMM8] + Disp8 + [base GPR] | scale * [XMM8/YMM8] + Disp32 + [base GPR] |
| 1001 | scale * [XMM9/YMM9] + Disp32 | scale * [XMM9/YMM9] + Disp8 + [base GPR] | scale * [XMM9/YMM9] + Disp32 + [base GPR] |
| 1010 | scale * [XMM10/YMM10] + Disp32 | scale * [XMM10/YMM10] + Disp8 + [base GPR] | scale * [XMM10/YMM10] + Disp32 + [base GPR] |
| 1011 | scale * [XMM11/YMM11] + Disp32 | scale * [XMM11/YMM11] + Disp8 + [base GPR] | scale * [XMM11/YMM11] + Disp32 + [base GPR] |
| 1100 | scale * [XMM12/YMM12] + Disp32 | scale * [XMM12/YMM12] + Disp8 + [base GPR] | scale * [XMM12/YMM12] + Disp32 + [base GPR] |
| 1101 | scale * [XMM13/YMM13] + Disp32 | scale * [XMM13/YMM13] + Disp8 + [base GPR] | scale * [XMM13/YMM13] + Disp32 + [base GPR] |
| 1110 | scale * [XMM14/YMM14] + Disp32 | scale * [XMM14/YMM14] + Disp8 + [base GPR] | scale * [XMM14/YMM14] + Disp32 + [base GPR] |
| 1111 | scale * [XMM15/YMM15] + Disp32 | scale * [XMM15/YMM15] + Disp8 + [base GPR] | scale * [XMM15/YMM15] + Disp32 + [base GPR] |

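Note 1. Index = {X, VSIB.index}, where X is the complement of the encoded VEX.X bit. In 32-bit mode, the encoded VEX.X bit is 1, so only XMM0-7/YMM0-7 can be selected.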

1.3.2 Notational Conventions Related to VSIB Addressing Mode

In the instruction descriptions that follow, the notation vm32x indicates a packed array of four 32-bit
index values contained in the specified XMM index register and vm32y indicates a packed array of
eight 32-bit index values contained in the specified YMM index register. Depending on the instruction, these indices can be used to compute the effective address of up to four (vm32x) or eight
(vm32y) memory-based operands.
The notation vm64x indicates a packed array of two 64-bit index values contained in the specified
XMM index register and vm64y indicates a packed array of four 64-bit index values contained in the
specified YMM index register. Depending on the instruction, these indices can be used to compute
the effective address of up to two (vm64x) or four (vm64y) memory-based operands.


In the body of the description of the instructions, the notation mem32[vm32x] is used to represent a
sparse array of 32-bit memory operands where the packed array of four 32-bit indices used to calculate the effective addresses of the operands is held in an XMM register. The notation mem32[vm32y]
refers to a similar array of 32-bit memory operands where the packed array of eight 32-bit indices is
held in a YMM register. The notation mem32[vm64x] means a sparse array of 32-bit memory operands where the packed array of two 64-bit indices is held in an XMM register and mem32[vm64y]
means a sparse array of 32-bit memory operands where the packed array of four 64-bit indices is held
in a YMM register.
The notation mem64[index_array], where index_array is either vm32x, vm64x, or vm64y, specifies a sparse array of 64-bit memory operands addressed via a packed array of 32-bit or 64-bit indices
held in an XMM/YMM register. If an instruction uses either an XMM or a YMM register, depending
on operand size, to hold the index array, the notation vm32x/y or vm64x/y is used to represent the
array.
In summary, given a maximum operand size of 256 bits, a sparse array of 32-bit memory-based operands can be addressed using a vm32x, vm32y, vm64x, or vm64y index array. A sparse array of 64-bit memory-based operands can be addressed using a vm32x, vm64x, or vm64y index array. Specific instructions may use fewer than the maximum number of memory operands that can be
addressed using the specified index array.
VSIB addressing is only valid in 32-bit or 64-bit effective addressing mode and is only supported for
instruction encodings using the VEX prefix. The ModRM.mod value of 11b is not valid in VSIB
addressing mode and ModRM.r/m must be set to 100b.

1.3.3 Memory Ordering and Exception Behavior

VSIB addressing has some special considerations relative to memory ordering and the signaling of
exceptions.
VSIB addressing specifies an array of addresses that allows an instruction to access multiple memory
locations. The order in which data is read from or written to memory is not specified. Memory ordering with respect to other instructions follows the memory-ordering model described in Volume 2.
Data may be accessed by the instruction in any order, but access-triggered exceptions are delivered in
right-to-left order. That is, if an exception is triggered by the load or store of an element of an
XMM/YMM register and delivered, all elements to the right of that element (all the lower indexed
elements) have been or will be completed without causing an exception. Elements to the left of the
element causing the exception may or may not be completed. If the load or store of a given element
triggers multiple exceptions, they are delivered in the conventional order.
Because data can be accessed in any order, elements to the left of the one that triggered the exception
may be read or written before the exception is delivered. Although the ordering of accesses is not
specified, it is repeatable in a specific processor implementation. Given the same input values and initial architectural state, the same set of elements to the left of the faulting one will be accessed.
VSIB addressing should not be used to access memory mapped I/O as the ordering of the individual
loads is implementation-specific and some implementations may access data larger than the data element size or access elements more than once.


1.4 Enabling SSE Instruction Execution
Application software that utilizes the SSE instructions requires support from operating system
software.
To enable and support SSE instruction execution, operating system software must:
• enable hardware for supported SSE subsets
• manage the SSE hardware architectural state, saving and restoring it as required during and after
task switches
• provide exception handlers for all unmasked SSE exceptions.

See Volume 2, Chapter 11, for details on enabling SSE execution and managing its execution state.
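On the application side, a common way to confirm that the operating system has enabled saving and restoring of the YMM state is to combine the CPUID OSXSAVE and AVX flags with the value the XGETBV instruction returns for XCR0. The C sketch below performs only the bit tests. Reading the inputs (CPUID Fn0000_0001_ECX, and XGETBV with ECX = 0, which is valid only when OSXSAVE is set) is left to the caller. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* fn1_ecx: CPUID Fn0000_0001_ECX; xcr0: XGETBV result for ECX = 0. */
static inline bool os_enabled_avx(uint32_t fn1_ecx, uint64_t xcr0) {
    bool osxsave = (fn1_ecx >> 27) & 1u;  /* OS has set CR4.OSXSAVE */
    bool avx     = (fn1_ecx >> 28) & 1u;  /* processor supports AVX */
    /* XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (upper YMM) state */
    return osxsave && avx && (xcr0 & 0x6u) == 0x6u;
}
```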

1.5 String Compare Instructions
The legacy SSE instructions PCMPESTRI, PCMPESTRM, PCMPISTRI, and PCMPISTRM and the
extended SSE instructions VPCMPESTRI, VPCMPESTRM, VPCMPISTRI, and VPCMPISTRM
provide a versatile means of classifying characters of a string by performing one of several different
types of comparison operations using a second string as a prototype.
This section describes the operation of the legacy string compare instructions. This discussion applies
equally to the extended versions of the instructions. Any difference between the legacy and the
extended version of a given instruction is described in the instruction reference entry for the
instruction in the following chapter.
A character string is a vector of data elements that is normally used to represent an ordered
arrangement of graphemes which may be stored, processed, displayed, or printed. Ordered strings of
graphemes are most often used to convey information in a human-readable manner. The string
compare instructions, however, do not restrict the use or interpretation of their operands.
The first source operand provides the prototype string and the second operand is the string to be
scanned and characterized (referred to herein as the string under test, or SUT). Four string formats and
four types of comparisons are supported. The intermediate result of this processing is a bit vector that
summarizes the characterization of each character in the SUT. This bit vector is then post-processed
based on options specified in the instruction encoding. Instruction variants determine the form of the
final result: either an index or a mask.
Instruction execution affects the arithmetic status flags (ZF, CF, SF, OF, AF, PF), but the significance
of many of the flags is redefined to provide information tailored to the result of the comparison
performed. See Section 1.5.6, “Affect on Flags” on page 19.
The instructions have a defined base function and additional functionality controlled by bit fields in an
immediate byte operand (imm8). The base function determines whether the source strings have
implicitly (PCMPISTRI and PCMPISTRM) or explicitly (PCMPESTRI and PCMPESTRM) defined
lengths, and whether the result is an index (PCMPISTRI and PCMPESTRI) or a mask (PCMPISTRM
and PCMPESTRM).


PCMPISTRI and PCMPESTRI return their final result (an integer value) via the ECX register, while
PCMPISTRM and PCMPESTRM write a bit or character mask, depending on the option selected, to
the XMM0 register.
There are a number of different schemes for encoding a set of graphemes, but the most common ones
use either an 8-bit code (ASCII) or a 16-bit code (Unicode). The string compare instructions support
both character sizes.


Bit fields of the immediate operand control the following functions:
• Source data format: character size (byte or word), signed or unsigned values
• Comparison type
• Intermediate result postprocessing
• Output option selection

This overview description covers functions common to all of the string compare instructions and
describes some of the differentiated features of specific instructions. Information on instruction
encoding and exception behavior is covered in the individual instruction reference pages in the
following chapter.


1.5.1 Source Data Format

The character strings that constitute the source operands for the string compare instructions are
formatted as either 8-bit or 16-bit integer values packed into a 128-bit data type. The figure below
illustrates how a string of byte-wide characters is laid out in memory and how these characters are
arranged when loaded into an XMM register.
Memory image: a 128-bit string of byte-wide characters in memory (ASCII encoded). The lowest address (103h) defines the address of the string; the highest address is 112h.

| Register byte | Register bits | Memory address | Character |
|---|---|---|---|
| 0 | 7:0 | 103h | A (41h) |
| 1 | 15:8 | 104h | [blank] (20h) |
| 2 | 23:16 | 105h | s (73h) |
| 3 | 31:24 | 106h | h (68h) |
| 4 | 39:32 | 107h | o (6Fh) |
| 5 | 47:40 | 108h | r (72h) |
| 6 | 55:48 | 109h | t (74h) |
| 7 | 63:56 | 10Ah | [blank] (20h) |
| 8 | 71:64 | 10Bh | s (73h) |
| 9 | 79:72 | 10Ch | t (74h) |
| 10 | 87:80 | 10Dh | r (72h) |
| 11 | 95:88 | 10Eh | i (69h) |
| 12 | 103:96 | 10Fh | n (6Eh) |
| 13 | 111:104 | 110h | g (67h) |
| 14 | 119:112 | 111h | . (2Eh) |
| 15 | 127:120 | 112h | [null] (00h) |

Figure 1-3. Byte-wide Character String – Memory and Register Image

Note from the figure that the longest string that can be packed in a 128-bit data object is either sixteen
8-bit characters (as illustrated) or eight 16-bit characters. When loaded from memory, the character
read from the lowest address in memory is placed in the least-significant position of the register and
the character read from the highest address is placed in the most-significant position. In other words,
for character i of width w, bits [w−1:0] of the character are placed in bits [iw + (w−1):iw] of the
register.
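The placement rule can be sketched in C by modeling the XMM register as two 64-bit lanes, where lane[0] holds bits [63:0]. Building the lanes with shifts, rather than copying memory, keeps the result independent of the host's byte order. The type and function names are hypothetical.

```c
#include <stdint.h>

/* A 128-bit XMM register modeled as two 64-bit lanes: lane[0] = bits [63:0]. */
typedef struct { uint64_t lane[2]; } Xmm;

/* Load 16 bytes: the byte at the lowest address becomes bits [7:0]. */
static Xmm load_xmm(const uint8_t mem[16]) {
    Xmm r = { { 0, 0 } };
    for (int b = 0; b < 16; b++)
        r.lane[b / 8] |= (uint64_t)mem[b] << (8 * (b % 8));
    return r;
}

/* Character i of width w (8 or 16) occupies bits [i*w + (w-1) : i*w]. */
static unsigned get_char(const Xmm *r, int i, int w) {
    int bit = i * w;
    uint64_t mask = (w == 8) ? 0xFFu : 0xFFFFu;
    return (unsigned)((r->lane[bit / 64] >> (bit % 64)) & mask);
}
```

Loading the 16-byte string from Figure 1-3 places 'A' (41h) in bits [7:0]. Read as 16-bit characters, character 0 is 2041h, because the blank at the next address becomes its upper byte.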


Bits [1:0] of the immediate byte operand specify the source string data format, as shown in Table 1-3.
Table 1-3. Source Data Format

| Imm8[1:0] | Character Format | Maximum String Length |
|---|---|---|
| 00b | unsigned bytes | 16 |
| 01b | unsigned words | 8 |
| 10b | signed bytes | 16 |
| 11b | signed words | 8 |
The string compare instructions are defined with the capability of operating on strings of lengths from
0 to the maximum that can be packed into the 128-bit data type as shown in the table above. Because
strings being processed may be shorter than the maximum string length, a means is provided to
designate the length of each string. As mentioned above, one pair of string compare instructions relies
on an explicit method while the other utilizes an implicit method.
For the explicit method, the length of the first operand (the prototype string) is specified by the
absolute value of the signed integer contained in rAX and the length of the second operand (the SUT)
is specified by the absolute value of the signed integer contained in rDX. If a specified length is greater
than the maximum allowed, the maximum value is used. Using the explicit method of length
specification, null characters (characters whose numerical value is 0) can be included within a string.
Using the implicit method, a string shorter than the maximum length is terminated by a null character.
If no null character is found in the string, its length is implied to be the maximum. For the example
illustrated in Figure 1-3 above, the implicit length of the string is 15 because the final character is null.
However, using the explicit method, a specified length of 16 would include the null character in the
string.
In the following discussion, l1 is the length of the first operand string (the prototype string), l2 is the
length of the second operand string (the SUT) and m is the maximum string length based on the
selected character size.
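A minimal C sketch of the two length rules, together with decoding imm8[1:0] per Table 1-3. It assumes the length register is passed as a signed 64-bit value; with a 32-bit operand size, the caller would sign-extend EAX or EDX first. The absolute value is computed in unsigned arithmetic so that the most negative value does not overflow. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int  char_bits;  /* 8 (bytes) or 16 (words) */
    bool is_signed;  /* signed or unsigned characters */
    int  m;          /* maximum string length: 16 or 8 */
} SrcFormat;

/* Table 1-3: imm8[0] selects bytes/words, imm8[1] selects unsigned/signed. */
static SrcFormat decode_format(uint8_t imm8) {
    SrcFormat f;
    f.char_bits = (imm8 & 1u) ? 16 : 8;
    f.is_signed = (imm8 >> 1) & 1u;
    f.m = 128 / f.char_bits;
    return f;
}

/* Explicit method: |reg| (rAX for l1, rDX for l2), capped at m. */
static int explicit_length(int64_t reg, int m) {
    uint64_t mag = reg < 0 ? 0 - (uint64_t)reg : (uint64_t)reg;
    return mag > (uint64_t)m ? m : (int)mag;
}

/* Implicit method: index of the first null character, or m if there is none. */
static int implicit_length(const uint16_t *chars, int m) {
    for (int i = 0; i < m; i++)
        if (chars[i] == 0)
            return i;
    return m;
}
```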

1.5.2 Comparison Type

Although the string compare instructions can be implemented in many different ways, the instructions
are most easily understood as the sequential processing of the SUT using the characters of the
prototype string as a template. The template is applied at each character index of SUT, processing the
string from the first character (index 0) to the last character (index l2−1).
The result of each comparison is recorded in successive positions of a summary bit vector CmprSumm.
When the sequence of comparisons is complete, this bit vector summarizes the results of comparison
operations that were performed. The length of the CmprSumm bit vector is equal to the maximum
input operand string length (m). The rules for the setting of CmprSumm bits beyond the end of the SUT
(CmprSumm[m−1:l2]) are dependent on the comparison type (see Table 1-4 below.)
Bits [3:2] of the immediate byte operand determine the comparison type, as shown in Table 1-4.


Table 1-4. Comparison Type

| Imm8[3:2] | Comparison Type | Description |
|---|---|---|
| 00b | Subset | Tests each character of the SUT to determine if it is within the subset of characters specified by the prototype string. Each set bit of CmprSumm indicates that the corresponding character of the SUT is within the subset specified by the prototype. Bits [m−1:l2] are cleared. |
| 01b | Ranges | Tests each character of the SUT to determine if it lies within one or more ranges specified by pairs of values within the prototype string. The ranges are inclusive. Each set bit in CmprSumm indicates that the corresponding character of the SUT is within one or more of the inclusive ranges specified. Bits [m−1:l2] are cleared. If the length of the prototype is odd, the last value in the prototype is effectively ignored. |
| 10b | Match | Performs a character-by-character comparison between the SUT and the prototype string. Each set bit of CmprSumm indicates that the corresponding characters in the two strings match. If not, the bit is cleared. Bits [m−1:max(l1, l2)] of CmprSumm are set. |
| 11b | Sub-string | Searches for an exact match between the prototype string and an ordered sequence of characters (a sub-string) in the SUT beginning at the current index i. Bit i of CmprSumm is set for each value of i where the sub-string match is made, otherwise the bit is cleared. See discussion below. |

In the Sub-string comparison type, any matching sub-string of the SUT must match the prototype
string one-for-one, in order, and without gaps. Null characters in the SUT do not match non-null
characters in the prototype. If the prototype and the SUT are equal in length and less than the maximum
length, the two strings must be identical for the comparison to be TRUE. In this case, bit 0 of
CmprSumm is set to one and the remainder are all 0s. If the length of the SUT is less than the prototype
string, no match is possible and CmprSumm is all 0s.
If the prototype string is shorter than the SUT (l1 < l2), a sequential search of the SUT is performed.
For each i from 0 to l2−l1, the prototype is compared to characters [i + l1−1:i] of the SUT. If the
prototype and the sub-string SUT[i + l1−1:i] match exactly, then CmprSumm[i] is set, otherwise the bit
is cleared. When the comparison at i = l2−l1 is complete, no further testing is required because there
are not enough characters remaining in the SUT for a match to be possible. The remaining bits l2−l1+1
through m-1 are all set to 0.
For the Match comparison type, the character-by-character comparison is performed on all m
characters in the 128-bit operand data, which may extend beyond the end of one or both strings. A
character at index i within one string is not considered a match when compared with a position
beyond the end of the other string; in this case, CmprSumm[i] is cleared. For index positions beyond
the end of both strings, CmprSumm[i] is set.
The following section provides more detail on the generation of the comparison summary bit vector
based on the specified comparison type.

1.5.3 Comparison Summary Bit Vector

The following pseudo code provides more detail on the generation of the comparison summary bit
vector CmprSumm. The function CompareStrgs defined below returns a bit vector of length m, the
maximum length of the operand data strings.
bit vector CompareStrgs(ProtoType, length1, SUT, length2, CmpType, signed, m)
    doubleword vector StrUndTst      // temp vector; holds string under test
    doubleword vector StrProto       // temp vector; holds prototype string
    bit vector[m] Result             // length of vector is m
    StrProto = m{0}                  // initialize m elements of StrProto to 0
    StrUndTst = m{0}                 // initialize m elements of StrUndTst to 0
    Result = m{0}                    // initialize result bit vector

    FOR i = 0 to length1 - 1
        StrProto[i] = signed ? SignExtend(ProtoType[i]) : ZeroExtend(ProtoType[i])
    FOR i = 0 to length2 - 1
        StrUndTst[i] = signed ? SignExtend(SUT[i]) : ZeroExtend(SUT[i])
    IF CmpType == Subset
        FOR j = 0 to length2 - 1                 // j indexes SUT
            FOR i = 0 to length1 - 1             // i indexes prototype
                Result[j] |= (StrProto[i] == StrUndTst[j])
    IF CmpType == Ranges
        FOR j = 0 to length2 - 1                 // j indexes SUT
            FOR i = 0 to length1 - 2, BY 2       // i indexes prototype
                Result[j] |= (StrProto[i] <= StrUndTst[j])
                             && (StrProto[i+1] >= StrUndTst[j])
    IF CmpType == Match
        FOR i = 0 to (min(length1, length2) - 1)
            Result[i] = (StrProto[i] == StrUndTst[i])
        FOR i = min(length1, length2) to (max(length1, length2) - 1)
            Result[i] = 0
        FOR i = max(length1, length2) to (m - 1)
            Result[i] = 1
    IF CmpType == Sub-string
        IF (length2 == 16) && (length1 == 16)
            maxlength = 15
        ELSE
            maxlength = length2 - length1
        IF length2 >= length1
            FOR j = 0 to maxlength               // j indexes result bit vector
                Result[j] = 1
                k = j                            // k scans the SUT
                FOR i = 0 to length1 - 1         // i scans the prototype
                    Result[j] &= (StrProto[i] == StrUndTst[k])  // cleared if any comparison does not match
                    k++
    Return Result


Given the above definition of CompareStrgs(), the following pseudo code computes the value of
CmprSumm:
    ProtoType = contents of first source operand (xmm1)
    SUT = contents of xmm2 or 128-bit value read from the specified memory location
    length1 = length of first operand string       // specified implicitly or explicitly
    length2 = length of second operand string      // specified implicitly or explicitly
    m = Maximum String Length from Table 1-3 above
    CmpType = Comparison Type from Table 1-4 above
    signed = (imm8[1] == 1) ? TRUE : FALSE
    bit vector [m] CmprSumm                        // CmprSumm is m bits long
    CmprSumm = CompareStrgs(ProtoType, length1, SUT, length2, CmpType, signed, m)
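The control fields that feed this computation all come from the immediate byte. As a sketch, the C function below splits imm8 into those fields. It assumes, consistent with the operand formats of Table 1-3, that imm8[0] = 0 selects byte characters (m = 16) and imm8[0] = 1 selects word characters (m = 8); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the imm8 control fields of the
   string compare instructions. */
typedef struct {
    unsigned m;          /* maximum string length: 16 (bytes) or 8 (words)   */
    bool     is_signed;  /* imm8[1]: sign-extend characters before comparing */
    unsigned cmp_type;   /* imm8[3:2]: 0 Subset, 1 Ranges, 2 Match, 3 Sub-string */
    unsigned post;       /* imm8[5:4]: post-processing option                */
    unsigned out_sel;    /* imm8[6]: output option selection                 */
} StrCmpCtl;

static StrCmpCtl decode_imm8(uint8_t imm8)
{
    StrCmpCtl c;
    c.m         = (imm8 & 0x01u) ? 8u : 16u;  /* assumed: imm8[0] = 1 selects words */
    c.is_signed = ((imm8 >> 1) & 1u) != 0;
    c.cmp_type  = (imm8 >> 2) & 3u;
    c.post      = (imm8 >> 4) & 3u;
    c.out_sel   = (imm8 >> 6) & 1u;
    return c;
}
```

For example, imm8 = 0Ch selects unsigned bytes and the Sub-string comparison with no post-processing.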

The following examples demonstrate the comparison summary bit vector CmprSumm for each
comparison type. For the sake of illustration, the operand strings are represented as ASCII-encoded
strings. Each character value is represented by its ASCII grapheme. Strings are displayed with the
lowest indexed character on the left as they would appear when printed or displayed. CmprSumm is
shown in reverse order with the least significant bit on the left to agree with the string presentation.

Comparison Type = Subset
Prototype: ZCx
SUT:       aCx%xbZreCx
CmprSumm:  0110101001100000

Comparison Type = Ranges
Prototype: ACax
SUT:       aCx%xbZreCx
CmprSumm:  1110110111100000

Comparison Type = Match
Prototype: ZCx
SUT:       aCx%xbZreCx
CmprSumm:  0110000000011111

Comparison Type = Sub-string
Prototype: ZCx
SUT:       aZCx%xCZreZCxCZ
CmprSumm:  0100000000100000
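The CompareStrgs pseudocode translates fairly directly into C. The sketch below handles byte-format operands only, takes the lengths l1 and l2 as explicit parameters, and, as a simplification, omits the pseudocode's special case for two full-length (16-character) strings in the Sub-string comparison. The function names are illustrative. Run on the four examples above, it reproduces the CmprSumm strings shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SUBSET = 0, RANGES = 1, MATCH = 2, SUBSTRING = 3 };  /* imm8[3:2] */

/* C rendering of CompareStrgs for byte-format operands (m <= 16).
   Bit i of the returned value is CmprSumm[i]. */
static uint32_t compare_strgs(const uint8_t *proto, unsigned len1,
                              const uint8_t *sut, unsigned len2,
                              unsigned cmp_type, int is_signed, unsigned m)
{
    int p[16] = {0}, s[16] = {0};           /* StrProto and StrUndTst */
    uint32_t result = 0;

    for (unsigned i = 0; i < len1; i++)
        p[i] = is_signed ? (int)(int8_t)proto[i] : (int)proto[i];
    for (unsigned i = 0; i < len2; i++)
        s[i] = is_signed ? (int)(int8_t)sut[i] : (int)sut[i];

    switch (cmp_type) {
    case SUBSET:                            /* SUT char equals any prototype char */
        for (unsigned j = 0; j < len2; j++)
            for (unsigned i = 0; i < len1; i++)
                if (p[i] == s[j]) result |= 1u << j;
        break;
    case RANGES:                            /* inclusive [p[i], p[i+1]] pairs */
        for (unsigned j = 0; j < len2; j++)
            for (unsigned i = 0; i + 1 < len1; i += 2)  /* odd last value ignored */
                if (p[i] <= s[j] && s[j] <= p[i + 1]) result |= 1u << j;
        break;
    case MATCH: {
        unsigned lo = len1 < len2 ? len1 : len2;
        unsigned hi = len1 < len2 ? len2 : len1;
        for (unsigned i = 0; i < lo; i++)   /* both strings valid: compare */
            if (p[i] == s[i]) result |= 1u << i;
        for (unsigned i = hi; i < m; i++)   /* beyond both strings: set */
            result |= 1u << i;
        break;                              /* bits lo..hi-1 stay cleared */
    }
    case SUBSTRING:                         /* full-length special case omitted */
        if (len2 >= len1)
            for (unsigned j = 0; j <= len2 - len1; j++) {
                int match = 1;
                for (unsigned i = 0; i < len1; i++)
                    match &= (p[i] == s[j + i]);
                if (match) result |= 1u << j;
            }
        break;
    }
    return result;
}

/* Renders the low m bits least significant first, as in the examples. */
static void bits_lsb_first(uint32_t v, unsigned m, char *out)
{
    for (unsigned i = 0; i < m; i++)
        out[i] = ((v >> i) & 1u) ? '1' : '0';
    out[m] = '\0';
}
```

Holding CmprSumm in an integer rather than a bit array keeps the later post-processing and output-selection steps to simple masks and shifts.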

1.5.4 Intermediate Result Post-processing

Post-processing of the CmprSumm bit vector is controlled by imm8[5:4]. The result of this step is
designated pCmprSumm.
Bit [4] of the immediate operand determines whether a ones’ complement (bit-wise inversion) is
performed on CmprSumm; bit [5] of the immediate operand determines whether the inversion applies
to the entire comparison summary bit vector (CmprSumm) or just to those bits that correspond to
characters within the SUT. See Table 1-5 below for the encoding of the imm8[5:4] field.
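The imm8[5:4] options can be modeled on CmprSumm held as an m-bit integer, with bit i corresponding to character i. The C sketch below follows the encodings listed in Table 1-5; the function name and parameter layout are hypothetical, and m is assumed to be 8 or 16.

```c
#include <assert.h>
#include <stdint.h>

/* Applies the imm8[5:4] post-processing option to an m-bit CmprSumm.
   len2 is l2, the length of the SUT; m is 8 or 16. */
static uint32_t post_process(uint32_t cmpr_summ, unsigned imm8_5_4,
                             unsigned len2, unsigned m)
{
    uint32_t all_bits = (1u << m) - 1u;     /* bits [m-1:0]  */
    uint32_t sut_bits = (1u << len2) - 1u;  /* bits [l2-1:0] */
    switch (imm8_5_4 & 3u) {
    case 1:                                 /* 01b: invert every bit   */
        return ~cmpr_summ & all_bits;
    case 3:                                 /* 11b: invert bits i < l2 */
        return (cmpr_summ ^ sut_bits) & all_bits;
    default:                                /* 00b or 10b: unchanged   */
        return cmpr_summ & all_bits;
    }
}
```

For the Subset example (CmprSumm = 0656h, l2 = 11), option 11b yields 01A9h: the set bits now flag exactly the SUT characters that are not in the prototype (a, %, b, r, e), while bits 11 through 15 stay 0.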
Table 1-5. Post-processing Options

Imm8[5:4]  Post-processing Applied
x0b        pCmprSumm = CmprSumm
01b        pCmprSumm = NOT CmprSumm
11b        pCmprSumm[i] = !CmprSumm[i] for i < l2,
           pCmprSumm[i] = CmprSumm[i] for l2 ≤ i < m

1.5.5 Output Option Selection

For PCMPESTRI and PCMPISTRI, imm8[6] determines whether the index of the lowest set bit or the
highest set bit of pCmprSumm is written to ECX, as shown in Table 1-6.
Table 1-6. Indexed Output Option Selection

Imm8[6]  Description
0b       Return the index of the least significant set bit in pCmprSumm.
1b       Return the index of the most significant set bit in pCmprSumm.
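A minimal C sketch of the index selection in Table 1-6 follows. The function name is hypothetical, and the value returned when pCmprSumm has no set bits is not given in this section; the sketch assumes m is returned in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Index written to ECX by PCMPESTRI/PCMPISTRI for a given imm8[6].
   Assumption (not stated in this section): returns m when pCmprSumm is 0. */
static unsigned string_index(uint32_t p_cmpr_summ, unsigned imm8_6, unsigned m)
{
    uint32_t p = p_cmpr_summ & ((1u << m) - 1u);
    unsigned i;
    if (p == 0)
        return m;
    if (imm8_6 == 0) {
        for (i = 0; !((p >> i) & 1u); i++)      /* least significant set bit */
            ;
    } else {
        for (i = m - 1; !((p >> i) & 1u); i--)  /* most significant set bit */
            ;
    }
    return i;
}
```

With the Sub-string example result (bits 1 and 10 set, 0402h), imm8[6] = 0 selects index 1 and imm8[6] = 1 selects index 10.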

For PCMPESTRM and PCMPISTRM, imm8[6] specifies whether the output from the instruction is a
bit mask or an expanded mask. The bit mask is a copy of pCmprSumm zero-extended to 128 bits. The
expanded mask is a packed vector of byte or word elements, as determined by the string operand
format (as indicated by imm8[0]). The expanded mask is generated by copying each bit of
pCmprSumm to all bits of the element of the same index. Table 1-7 below shows the encoding of
imm8[6].
Table 1-7. Masked Output Option Selection

Imm8[6]  Description
0b       Return pCmprSumm as the output with zero extension to 128 bits.
1b       Return expanded pCmprSumm byte or word mask.

The PCMPESTRM and PCMPISTRM instructions return their output in register XMM0. For the
extended forms of the instructions, bits [255:128] of YMM0 are cleared.
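The two masked output forms can be sketched in C by building the 128-bit XMM0 value as 16 little-endian bytes. The function name is hypothetical, and the sketch assumes imm8[0] = 1 selects word elements (m = 8), as in the earlier decoding example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Builds the XMM0 result of PCMPESTRM/PCMPISTRM as 16 bytes, little-endian.
   word_format mirrors imm8[0] (nonzero = eight 16-bit elements, assumed). */
static void string_mask(uint32_t p_cmpr_summ, unsigned imm8_6,
                        int word_format, uint8_t xmm0[16])
{
    unsigned m = word_format ? 8u : 16u;
    unsigned elem = word_format ? 2u : 1u;      /* element size in bytes */
    uint32_t p = p_cmpr_summ & ((1u << m) - 1u);

    memset(xmm0, 0, 16);
    if (imm8_6 == 0) {                          /* bit mask, zero-extended */
        xmm0[0] = (uint8_t)(p & 0xFFu);
        xmm0[1] = (uint8_t)(p >> 8);
        return;
    }
    for (unsigned i = 0; i < m; i++)            /* expanded byte/word mask */
        if ((p >> i) & 1u)
            memset(xmm0 + i * elem, 0xFF, elem);
}
```

Each set bit i of pCmprSumm becomes an all-ones element i, which makes the expanded mask directly usable as a blend or AND mask.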

1.5.6 Effect on Flags

The execution of a string compare instruction updates the state of the CF, PF, AF, ZF, SF, and OF flags
within the rFLAGS register. All other flags are unaffected. The PF and AF flags are always cleared.
The ZF and SF flags are set or cleared based on attributes of the source strings, and the CF and OF flags
are set or cleared based on attributes of the summary bit vector after post-processing.
The CF flag is cleared if the summary bit vector, after post-processing, is zero; the flag is set if one or
more of the bits in the post-processed bit vector are 1. The OF flag is updated to match the value of the
least significant bit of the post-processed summary bit vector.
The ZF flag is set if the length of the second string operand (SUT) is shorter than m, the maximum
number of 8-bit or 16-bit characters that can be packed into 128 bits. Similarly, the SF flag is set if the
length of the first string operand (prototype) is shorter than m.
This information is summarized in Table 1-8 below.
Table 1-8. State of Affected Flags After Execution

Unconditional  Source String Length   Post-processed Bit Vector
PF    AF       SF          ZF         CF              OF
0     0        (l1 < m)    (l2 < m)   pCmprSumm ≠ 0   pCmprSumm[0]
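The rules in Table 1-8 can be restated as a small C function. The struct and function names are hypothetical; l1, l2, m, and pCmprSumm are the quantities defined earlier in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, pf, af, zf, sf, of; } StrCmpFlags;  /* hypothetical */

static StrCmpFlags string_flags(unsigned len1, unsigned len2, unsigned m,
                                uint32_t p_cmpr_summ)
{
    StrCmpFlags f;
    f.pf = false;                     /* always cleared                     */
    f.af = false;                     /* always cleared                     */
    f.sf = len1 < m;                  /* prototype shorter than m           */
    f.zf = len2 < m;                  /* SUT shorter than m                 */
    f.cf = p_cmpr_summ != 0;          /* any bit set after post-processing  */
    f.of = (p_cmpr_summ & 1u) != 0;   /* pCmprSumm[0]                       */
    return f;
}
```

For the Sub-string example (l1 = 3, l2 = 15, pCmprSumm = 0402h), SF, ZF, and CF are set, and OF is clear because bit 0 is 0.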

2 Instruction Reference

Instructions are listed by mnemonic, in alphabetic order. Each entry describes instruction function,
syntax, opcodes, affected flags and exceptions related to the instruction.
Figure 2-1 shows the conventions used in the descriptions. Items that do not pertain to a particular
instruction, such as a synopsis of the 256-bit form, may be omitted.

Figure 2-1. Typical Instruction Description
[Figure: template reference entry for a generic INST/VINST pair, showing in order the mnemonic expansion and a brief functional description; descriptions of the legacy (INST) and extended (VINST) versions, including the 128-bit (XMM) and 256-bit (YMM) encodings; CPUID information related to the instruction set; synopsis tables giving mnemonic, opcode or VEX encoding fields (RXB.mmmmm, W.vvvv.L.pp), and description; and the Related Instructions, rFLAGS Affected, MXCSR Flags Affected, and Exceptions sections.]


Instruction Exceptions
Under various conditions, the instructions described below can cause exceptions. The conditions that
cause these exceptions can differ based on processor mode and instruction subset. This information is
summarized at the end of each instruction reference page in an Exception Table. Rows list the applicable exceptions and the different conditions that trigger each exception for the instruction. For each
processor mode (real, virtual, and protected), a symbol in the table indicates whether this exception
condition applies.
Each AVX instruction has a legacy form that comes from one of the legacy (SSE1, SSE2, ...) subsets.
An “X” at the intersection of a processor mode column and an exception cause row indicates that the
causing condition and potential exception apply to both the AVX instruction and the legacy SSE
instruction. “A” indicates that the causing condition applies only to the AVX instruction, and “S” indicates that the condition applies only to the SSE legacy instruction.
Note that XOP and FMA4 instructions do not have corresponding instructions from the SSE legacy
subsets. In the exception tables for these instructions, “X” represents the XOP instruction and “F”
represents the FMA4 instruction.

ADDPD
VADDPD

Add Packed Double-Precision Floating-Point

Adds each packed double-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the result of each addition into the corresponding
quadword of the destination.
There are legacy and extended forms of the instruction:
ADDPD

Adds two pairs of values.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VADDPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Adds two pairs of values.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Adds four pairs of values.
The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
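To make the difference between the three forms concrete, the sketch below models a YMM register as four double-precision lanes (lane 0 = bits [63:0]) and applies the legacy and VEX rules for the upper bits described above. It uses the host's C double arithmetic, so MXCSR rounding control, exception flags, and denormal handling are not modeled; the type and function names are hypothetical.

```c
#include <assert.h>

/* A YMM register modeled as four double lanes (lane 0 = bits 63:0). */
typedef struct { double q[4]; } Ymm;

/* Legacy ADDPD xmm1, xmm2/mem128: lanes 0-1 updated, bits 255:128 untouched. */
static void addpd_legacy(Ymm *xmm1, const Ymm *xmm2)
{
    for (int i = 0; i < 2; i++)
        xmm1->q[i] += xmm2->q[i];
}

/* VADDPD with L = 0 (XMM encoding) or L = 1 (YMM encoding).
   For L = 0, bits 255:128 of the destination are cleared. */
static void vaddpd(Ymm *dst, const Ymm *src1, const Ymm *src2, int l)
{
    int n = l ? 4 : 2;
    Ymm r = {{0.0, 0.0, 0.0, 0.0}};   /* temporary allows dst to alias a source */
    for (int i = 0; i < n; i++)
        r.q[i] = src1->q[i] + src2->q[i];
    *dst = r;
}
```

Writing through a temporary lets the destination register be the same as either source, as the three-operand VEX form permits.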
Instruction Support
Form    Subset  Feature Flag
ADDPD   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VADDPD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                 Opcode       Description
ADDPD xmm1, xmm2/mem128  66 0F 58 /r  Adds two packed double-precision floating-point values in xmm1 to corresponding values in xmm2 or mem128. Writes results to xmm1.

                                Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VADDPD xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.01   58 /r
VADDPD ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.01   58 /r
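As an illustration of how these encoding fields become bytes, the sketch below assembles the three-byte (C4) VEX form of VADDPD for register-only operands. It relies on standard VEX conventions that are not spelled out on this page: R, X, B, and vvvv are stored in inverted form, map_select 00001b selects the 0F opcode map, pp = 01b stands for an implied 66h prefix, and ModRM uses mod = 11b with reg = destination and r/m = second source. W is emitted as 0 because the table marks it X (ignored). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: VADDPD dst, src1, src2 with register operands 0-15.
   Writes 5 bytes to out and returns the byte count. */
static unsigned encode_vaddpd_rr(unsigned dst, unsigned src1, unsigned src2,
                                 int ymm, uint8_t out[5])
{
    unsigned r = (dst >> 3) & 1u;    /* extends ModRM.reg */
    unsigned b = (src2 >> 3) & 1u;   /* extends ModRM.r/m */

    out[0] = 0xC4;                                /* three-byte VEX escape  */
    out[1] = (uint8_t)(((r ^ 1u) << 7)            /* R, stored inverted     */
                     | (1u << 6)                  /* X, inverted (unused)   */
                     | ((b ^ 1u) << 5)            /* B, stored inverted     */
                     | 0x01u);                    /* map_select = 00001b    */
    out[2] = (uint8_t)(((~src1 & 0xFu) << 3)      /* W = 0; vvvv = ~src1    */
                     | ((ymm ? 1u : 0u) << 2)     /* L: 0 = XMM, 1 = YMM    */
                     | 0x01u);                    /* pp = 01b (implied 66h) */
    out[3] = 0x58;                                /* opcode                 */
    out[4] = (uint8_t)(0xC0u | ((dst & 7u) << 3) | (src2 & 7u));  /* ModRM  */
    return 5;
}
```

For VADDPD xmm1, xmm2, xmm3 this yields C4 E1 69 58 CB; switching to the YMM encoding changes only the third byte, to 6Dh, through the L bit.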


Related Instructions
(V)ADDPS, (V)ADDSD, (V)ADDSS
rFLAGS Affected
None
MXCSR Flags Affected
MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), ZE (bit 2): not affected
PE (bit 5), UE (bit 4), OE (bit 3), DE (bit 1), IE (bit 0): M
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid operation, IE     S     S     X     A source operand was an SNaN value.
                          S     S     X     Undefined operation.
Denormalized operand, DE  S     S     X     A source operand was a denormal value.
Overflow, OE              S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE             S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE             S     S     X     A result could not be represented exactly in the destination format.
X — AVX and SSE exception
A — AVX exception
S — SSE exception

ADDPS
VADDPS

Add Packed Single-Precision Floating-Point

Adds each packed single-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the result of each addition into the corresponding
elements of the destination.
There are legacy and extended forms of the instruction:
ADDPS

Adds four pairs of values.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VADDPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Adds four pairs of values.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Adds eight pairs of values.
The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
Instruction Support
Form    Subset  Feature Flag
ADDPS   SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VADDPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                 Opcode    Description
ADDPS xmm1, xmm2/mem128  0F 58 /r  Adds four packed single-precision floating-point values in xmm1 to corresponding values in xmm2 or mem128. Writes results to xmm1.

                                Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VADDPS xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.00   58 /r
VADDPS ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.00   58 /r


Related Instructions
(V)ADDPD, (V)ADDSD, (V)ADDSS
rFLAGS Affected
None
MXCSR Flags Affected
MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), ZE (bit 2): not affected
PE (bit 5), UE (bit 4), OE (bit 3), DE (bit 1), IE (bit 0): M
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid operation, IE     S     S     X     A source operand was an SNaN value.
                          S     S     X     Undefined operation.
Denormalized operand, DE  S     S     X     A source operand was a denormal value.
Overflow, OE              S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE             S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE             S     S     X     A result could not be represented exactly in the destination format.
X — AVX and SSE exception
A — AVX exception
S — SSE exception

ADDSD
VADDSD

Add Scalar Double-Precision Floating-Point

Adds the double-precision floating-point value in the low-order quadword of the first source operand
to the corresponding value in the low-order quadword of the second source operand and writes the
result into the low-order quadword of the destination.
There are legacy and extended forms of the instruction:
ADDSD

The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination register. Bits [127:64]
of the destination and bits [255:128] of the corresponding YMM register are not affected.
VADDSD

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the first
source operand are copied to bits [127:64] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
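The different handling of the upper bits in the two forms can be sketched in C by modeling a YMM register as four double lanes (lane 0 = bits [63:0]). The type and function names are hypothetical, and MXCSR rounding and exception behavior are not modeled.

```c
#include <assert.h>

typedef struct { double q[4]; } Ymm;   /* lane 0 = bits 63:0 */

/* Legacy ADDSD xmm1, xmm2/mem64: only bits 63:0 of xmm1 change. */
static void addsd_legacy(Ymm *xmm1, double src2)
{
    xmm1->q[0] += src2;
}

/* VADDSD xmm1, xmm2, xmm3/mem64: bits 127:64 are copied from the first
   source and bits 255:128 of the destination are cleared. */
static void vaddsd(Ymm *dst, const Ymm *src1, double src2)
{
    Ymm r = {{src1->q[0] + src2, src1->q[1], 0.0, 0.0}};
    *dst = r;
}
```

Because the VEX form takes bits [127:64] from the first source rather than from the old destination, it does not depend on the destination's previous contents.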
Instruction Support
Form    Subset  Feature Flag
ADDSD   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VADDSD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                Opcode       Description
ADDSD xmm1, xmm2/mem64  F2 0F 58 /r  Adds low-order double-precision floating-point values in xmm1 to corresponding values in xmm2 or mem64. Writes results to xmm1.

                               Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VADDSD xmm1, xmm2, xmm3/mem64  C4   RXB.00001       X.src.X.11   58 /r

Related Instructions
(V)ADDPD, (V)ADDPS, (V)ADDSS
rFLAGS Affected
None


MXCSR Flags Affected
MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), ZE (bit 2): not affected
PE (bit 5), UE (bit 4), OE (bit 3), DE (bit 1), IE (bit 0): M
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

ADDSS
VADDSS

Add Scalar Single-Precision Floating-Point

Adds the single-precision floating-point value in the low-order doubleword of the first source operand to the corresponding value in the low-order doubleword of the second source operand and writes
the result into the low-order doubleword of the destination.
There are legacy and extended forms of the instruction:
ADDSS

The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the
destination register and bits [255:128] of the corresponding YMM register are not affected.
VADDSS

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the first
source register are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form    Subset  Feature Flag
ADDSS   SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VADDSS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                Opcode       Description
ADDSS xmm1, xmm2/mem32  F3 0F 58 /r  Adds a single-precision floating-point value in the low-order doubleword of xmm1 to a corresponding value in xmm2 or mem32. Writes results to xmm1.

                               Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VADDSS xmm1, xmm2, xmm3/mem32  C4   RXB.00001       X.src.X.10   58 /r

Related Instructions
(V)ADDPD, (V)ADDPS, (V)ADDSD
rFLAGS Affected
None


MXCSR Flags Affected
MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), ZE (bit 2): not affected
PE (bit 5), UE (bit 4), OE (bit 3), DE (bit 1), IE (bit 0): M
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

ADDSUBPD
VADDSUBPD

Alternating Addition and Subtraction Packed Double-Precision Floating-Point

Adds the odd-numbered packed double-precision floating-point values of the first source operand to
the corresponding values of the second source operand and writes the sum to the corresponding odd-numbered element of the destination; subtracts the even-numbered packed double-precision floating-point values of the second source operand from the corresponding values of the first source operand
and writes the differences to the corresponding even-numbered element of the destination.
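The per-element rule above can be sketched in C as a loop over lanes, with lane 0 in bits [63:0]: even lanes take src1 − src2 and odd lanes take src1 + src2. The function name is hypothetical; n is 2 for the 128-bit forms and 4 for the 256-bit form, and MXCSR effects are not modeled.

```c
#include <assert.h>

/* ADDSUBPD / VADDSUBPD lane rule: even lanes subtract (src1 - src2),
   odd lanes add (src1 + src2). n = 2 (XMM forms) or 4 (YMM form). */
static void addsubpd(double *dst, const double *src1, const double *src2, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (i % 2 == 0) ? src1[i] - src2[i] : src1[i] + src2[i];
}
```

Each lane reads only its own inputs, so dst may safely alias src1, as it does in the legacy two-operand form.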
There are legacy and extended forms of the instruction:
ADDSUBPD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VADDSUBPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
Instruction Support
Form       Subset  Feature Flag
ADDSUBPD   SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VADDSUBPD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode       Description
ADDSUBPD xmm1, xmm2/mem128  66 0F D0 /r  Adds the value in the upper 64 bits of xmm1 to the corresponding value in xmm2 or mem128 and writes the result to the upper 64 bits of xmm1; subtracts the value in the lower 64 bits of xmm2 or mem128 from the corresponding value in xmm1 and writes the result to the lower 64 bits of xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VADDSUBPD xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.01   D0 /r
VADDSUBPD ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.01   D0 /r


Related Instructions
(V)ADDSUBPS
rFLAGS Affected
None
MXCSR Flags Affected
MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), ZE (bit 2): not affected
PE (bit 5), UE (bit 4), OE (bit 3), DE (bit 1), IE (bit 0): M
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid operation, IE     S     S     X     A source operand was an SNaN value.
                          S     S     X     Undefined operation.
Denormalized operand, DE  S     S     X     A source operand was a denormal value.
Overflow, OE              S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE             S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE             S     S     X     A result could not be represented exactly in the destination format.
X — AVX and SSE exception
A — AVX exception
S — SSE exception

ADDSUBPS
VADDSUBPS

Alternating Addition and Subtraction Packed Single-Precision Floating-Point

Adds the second and fourth single-precision floating-point values of the first source operand to the
corresponding values of the second source operand and writes the sums to the second and fourth elements of the destination. Subtracts the first and third single-precision floating-point values of the second source operand from the corresponding values of the first source operand and writes the
differences to the first and third elements of the destination. The 256-bit form applies the same pattern to the upper four elements: sums go to the sixth and eighth elements and differences to the fifth and seventh.
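The lane pattern can be modeled in a few lines of Python. This is a sketch only: `addsubps` is a hypothetical helper, values are rounded to single precision with the standard `struct` module, and MXCSR rounding control, exception flags, and overflow (where `struct` would raise `OverflowError`) are not modeled.

```python
import struct


def _to_f32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def addsubps(src1: list[float], src2: list[float]) -> list[float]:
    """Hypothetical model of the (V)ADDSUBPS lane pattern.

    Even-indexed lanes (first, third, ...) receive src1 - src2;
    odd-indexed lanes (second, fourth, ...) receive src1 + src2.
    Four lanes model the XMM forms, eight lanes the YMM form.
    """
    if len(src1) != len(src2) or len(src1) not in (4, 8):
        raise ValueError("expected 4 (XMM) or 8 (YMM) single-precision lanes")
    result = []
    for i, (a, b) in enumerate(zip(src1, src2)):
        a, b = _to_f32(a), _to_f32(b)
        result.append(_to_f32(a - b if i % 2 == 0 else a + b))
    return result


print(addsubps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
# [-9.0, 22.0, -27.0, 44.0]
```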
There are legacy and extended forms of the instruction:
ADDSUBPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VADDSUBPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
Instruction Support
Form        Subset  Feature Flag
ADDSUBPS    SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VADDSUBPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode       Description
ADDSUBPS xmm1, xmm2/mem128  F2 0F D0 /r  Adds the second and fourth packed single-precision
                                         values in xmm2 or mem128 to the corresponding
                                         values in xmm1 and writes results to the
                                         corresponding positions of xmm1. Subtracts the first
                                         and third packed single-precision values in xmm2 or
                                         mem128 from the corresponding values in xmm1 and
                                         writes results to the corresponding positions of xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VADDSUBPS xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.11   D0 /r
VADDSUBPS ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.11   D0 /r
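The C4 rows of the encoding table list the fields of the three-byte VEX prefix. The sketch below packs them for VADDSUBPS xmm1, xmm2, xmm3, assuming the usual VEX conventions: R, X, B, and vvvv are stored inverted, the destination goes in ModRM.reg, the first source in vvvv, and the second source in ModRM.rm with mod = 11b. `vex3` and `modrm_reg_reg` are hypothetical names, W is written as 0 because the table marks it as ignored (X), and an assembler would normally emit the shorter two-byte C5 form for this instruction, so its output may differ.

```python
def vex3(r: int, x: int, b: int, map_select: int,
         w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Pack a three-byte (C4) VEX prefix from logical field values.

    R, X, B and vvvv are stored in inverted (ones'-complement) form.
    """
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])


def modrm_reg_reg(reg: int, rm: int) -> int:
    """ModRM byte for a register-to-register form (mod = 11b)."""
    return 0xC0 | ((reg & 7) << 3) | (rm & 7)


# VADDSUBPS xmm1, xmm2, xmm3: map_select = 00001b (0F map), pp = 11b (implied F2),
# L = 0 (128-bit), vvvv = 2 (first source xmm2), W written as 0.
prefix = vex3(r=0, x=0, b=0, map_select=0b00001, w=0, vvvv=2, l=0, pp=0b11)
encoding = prefix + bytes([0xD0, modrm_reg_reg(reg=1, rm=3)])
print(encoding.hex(" "))  # c4 e1 6b d0 cb
```

Setting L = 1 selects the 256-bit (ymm) form; only the third prefix byte changes.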


Related Instructions
(V)ADDSUBPD
rFLAGS Affected
None
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


AESDEC
VAESDEC

AES
Decryption Round

Performs a single round of AES decryption. Transforms a state value specified by the first source
operand using a round key value specified by the second source operand, and writes the result to the
destination.
See Appendix A on page 973 for more information about the operation of the AES instructions.
Decryption consists of 1, …, Nr – 1 iterations of sequences of operations called rounds, terminated by
a unique final round, Nr. The AESDEC and VAESDEC instructions perform all the rounds except the
last; the AESDECLAST and VAESDECLAST instructions perform the final round.
The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4
matrix of bytes. The transformed state is written to the destination in column-major order. For the
legacy form, the destination register is the same as the first source register.
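A small Python sketch of the column-major interpretation follows. It assumes the FIPS-197 ordering in which byte 0 of the operand (bits [7:0]) is the first byte of the state; `state_matrix` and `matrix_bytes` are hypothetical names.

```python
def state_matrix(block: bytes) -> list[list[int]]:
    """Return the 4x4 AES state as matrix[row][col] from a 16-byte operand.

    Byte i of the operand lands in row i % 4, column i // 4, so the bytes
    fill the matrix one column at a time (column-major order).
    """
    if len(block) != 16:
        raise ValueError("an AES state is exactly 16 bytes")
    return [[block[4 * col + row] for col in range(4)] for row in range(4)]


def matrix_bytes(matrix: list[list[int]]) -> bytes:
    """Inverse of state_matrix: read the matrix back out column by column."""
    return bytes(matrix[row][col] for col in range(4) for row in range(4))


for row in state_matrix(bytes(range(16))):
    print(row)
# [0, 4, 8, 12]
# [1, 5, 9, 13]
# [2, 6, 10, 14]
# [3, 7, 11, 15]
```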
There are legacy and extended forms of the instruction:
AESDEC
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VAESDEC
The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form      Subset  Feature Flag
AESDEC    AES     CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESDEC   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
AESDEC xmm1, xmm2/mem128  66 0F 38 DE /r  Performs one decryption round on a state value
                                          in xmm1 using the key value in xmm2 or
                                          mem128. Writes results to xmm1.

                                 Encoding
Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VAESDEC xmm1, xmm2, xmm3/mem128  C4   RXB.00010       X.src.0.01   DE /r

Related Instructions
(V)AESENC, (V)AESENCLAST, (V)AESIMC, (V)AESKEYGENASSIST


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


AESDECLAST
VAESDECLAST

AES
Last Decryption Round

Performs the final round of AES decryption. Completes transformation of a state value specified by
the first source operand using a round key value specified by the second source operand, and writes
the result to the destination.
See Appendix A on page 973 for more information about the operation of the AES instructions.
Decryption consists of 1, …, Nr – 1 iterations of sequences of operations called rounds, terminated by
a unique final round, Nr. The AESDEC and VAESDEC instructions perform all the rounds before the
final round; the AESDECLAST and VAESDECLAST instructions perform the final round.
The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4
matrix of bytes. The transformed state is written to the destination in column-major order. For the
legacy form, the destination register is the same as the first source register.
There are legacy and extended forms of the instruction:
AESDECLAST

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VAESDECLAST

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form         Subset  Feature Flag
AESDECLAST   AES     CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESDECLAST  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Opcode          Description
AESDECLAST xmm1, xmm2/mem128  66 0F 38 DF /r  Performs the last decryption round on a state
                                              value in xmm1 using the key value in xmm2 or
                                              mem128. Writes results to xmm1.

                                     Encoding
Mnemonic                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VAESDECLAST xmm1, xmm2, xmm3/mem128  C4   RXB.00010       X.src.0.01   DF /r

Related Instructions
(V)AESENC, (V)AESENCLAST, (V)AESIMC, (V)AESKEYGENASSIST


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


AESENC
VAESENC

AES
Encryption Round

Performs a single round of AES encryption. Transforms a state value specified by the first source
operand using a round key value specified by the second source operand, and writes the result to the
destination.
See Appendix A on page 973 for more information about the operation of the AES instructions.
Encryption consists of 1, …, Nr – 1 iterations of sequences of operations called rounds, terminated by
a unique final round, Nr. The AESENC and VAESENC instructions perform all the rounds before the
final round; the AESENCLAST and VAESENCLAST instructions perform the final round.
The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4
matrix of bytes. The transformed state is written to the destination in column-major order. For the
legacy form, the destination register is the same as the first source register.
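A matching Python sketch of one encryption round, assuming AESENC applies ShiftRows, SubBytes, MixColumns, and AddRoundKey as defined in FIPS-197. The helper names are hypothetical; the input state and round key are the round-1 values of the FIPS-197 Appendix B cipher example, and the printed result is the published start-of-round-2 state.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def _build_sbox() -> list[int]:
    """Derive the AES S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for k in range(1, 5):  # XOR of the inverse rotated left by 1..4 bits
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = _build_sbox()


def shift_rows(s: bytes) -> bytes:
    """Row r of the column-major state (byte 4*col + row) rotates left by r."""
    return bytes(s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4))


def mix_columns(s: bytes) -> bytes:
    """Multiply each state column by the fixed MixColumns matrix."""
    out = bytearray()
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += bytes([
            gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
            a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
            a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
            gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
        ])
    return bytes(out)


def aesenc(state: bytes, round_key: bytes) -> bytes:
    """Model of AESENC: ShiftRows, SubBytes (byte-wise, so order is free), MixColumns, AddRoundKey."""
    t = bytes(SBOX[v] for v in shift_rows(state))
    return bytes(x ^ y for x, y in zip(mix_columns(t), round_key))


state = bytes.fromhex("193de3bea0f4e22b9ac68d2ae9f84808")  # start of round 1
key1 = bytes.fromhex("a0fafe1788542cb123a339392a6c7605")   # round key 1
print(aesenc(state, key1).hex())  # a49c7ff2689f352b6b5bea43026a5049
```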
There are legacy and extended forms of the instruction:
AESENC

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VAESENC

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form      Subset  Feature Flag
AESENC    AES     CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESENC   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
AESENC xmm1, xmm2/mem128  66 0F 38 DC /r  Performs one encryption round on a state value
                                          in xmm1 using the key value in xmm2 or
                                          mem128. Writes results to xmm1.

                                 Encoding
Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VAESENC xmm1, xmm2, xmm3/mem128  C4   RXB.00010       X.src.0.01   DC /r

Related Instructions
(V)AESDEC, (V)AESDECLAST, (V)AESIMC, (V)AESKEYGENASSIST


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


AESENCLAST
VAESENCLAST

AES
Last Encryption Round

Performs the final round of AES encryption. Completes transformation of a state value specified by
the first source operand using a round key value specified by the second source operand, and writes
the result to the destination.
See Appendix A on page 973 for more information about the operation of the AES instructions.
Encryption consists of 1, …, Nr – 1 iterations of sequences of operations called rounds, terminated by
a unique final round, Nr. The AESENC and VAESENC instructions perform all the rounds before the
final round; the AESENCLAST and VAESENCLAST instructions perform the final round.
The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4
matrix of bytes. The transformed state is written to the destination in column-major order. For the
legacy form, the destination register is the same as the first source register.
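How AESENC and AESENCLAST divide the work for one block can be sketched as an instruction sequence. The sketch assumes the FIPS-197 round counts (Nr = 10, 12, or 14 for 128-, 192-, or 256-bit keys) and an initial AddRoundKey performed with a plain XOR such as PXOR; `round_key[i]` is a hypothetical name for entry i of the expanded key schedule.

```python
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}  # Nr for each AES key size


def encryption_plan(key_bits: int) -> list[str]:
    """List the instructions that encrypt one block for the given key size."""
    nr = ROUNDS_BY_KEY_BITS[key_bits]
    steps = ["PXOR state, round_key[0]"]  # initial AddRoundKey
    steps += [f"AESENC state, round_key[{r}]" for r in range(1, nr)]  # rounds 1 .. Nr-1
    steps.append(f"AESENCLAST state, round_key[{nr}]")  # final round Nr
    return steps


for step in encryption_plan(128):
    print(step)
# 1 PXOR, 9 AESENC, then 1 AESENCLAST
```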
There are legacy and extended forms of the instruction:
AESENCLAST

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VAESENCLAST

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form         Subset  Feature Flag
AESENCLAST   AES     CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESENCLAST  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Opcode          Description
AESENCLAST xmm1, xmm2/mem128  66 0F 38 DD /r  Performs the last encryption round on a
                                              state value in xmm1 using the key value in xmm2
                                              or mem128. Writes results to xmm1.

                                     Encoding
Mnemonic                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VAESENCLAST xmm1, xmm2, xmm3/mem128  C4   RXB.00010       X.src.0.01   DD /r

Related Instructions
(V)AESDEC, (V)AESDECLAST, (V)AESIMC, (V)AESKEYGENASSIST


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


AESIMC
VAESIMC

AES
InvMixColumn Transformation

Applies the AES InvMixColumns() transformation to expanded round keys in preparation for decryption. Transforms an expanded key specified by the source operand and writes the result to a
destination register.
See Appendix A on page 973 for more information about the operation of the AES instructions.
The 128-bit round key vector is interpreted as 16-byte column-major entries in a 4-by-4 matrix of
bytes. The transformed result is written to the destination in column-major order.
AESIMC and VAESIMC are not used to transform the first and last round keys of a decryption
sequence.
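The treatment of the first and last round keys can be shown by preparing a key schedule for an AESDEC sequence. This sketch assumes the equivalent-inverse-cipher ordering, in which decryption starts with encryption round key Nr and finishes with round key 0 in AESDECLAST; `aesimc` and `decryption_round_keys` are hypothetical Python stand-ins, and the sample key column values were checked by direct GF(2^8) arithmetic.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def aesimc(key: bytes) -> bytes:
    """Model of AESIMC: InvMixColumns applied to a 16-byte round key."""
    out = bytearray()
    for c in range(4):
        a0, a1, a2, a3 = key[4 * c:4 * c + 4]
        out += bytes([
            gf_mul(a0, 14) ^ gf_mul(a1, 11) ^ gf_mul(a2, 13) ^ gf_mul(a3, 9),
            gf_mul(a0, 9) ^ gf_mul(a1, 14) ^ gf_mul(a2, 11) ^ gf_mul(a3, 13),
            gf_mul(a0, 13) ^ gf_mul(a1, 9) ^ gf_mul(a2, 14) ^ gf_mul(a3, 11),
            gf_mul(a0, 11) ^ gf_mul(a1, 13) ^ gf_mul(a2, 9) ^ gf_mul(a3, 14),
        ])
    return bytes(out)


def decryption_round_keys(enc_keys: list[bytes]) -> list[bytes]:
    """Reorder encryption round keys 0..Nr for an AESDEC/AESDECLAST sequence.

    Key Nr (used first) and key 0 (used by AESDECLAST) are left unchanged;
    keys Nr-1 down to 1 (used by AESDEC) are passed through AESIMC.
    """
    nr = len(enc_keys) - 1
    middle = [aesimc(enc_keys[r]) for r in range(nr - 1, 0, -1)]
    return [enc_keys[nr]] + middle + [enc_keys[0]]


# Hypothetical 11-entry schedule (Nr = 10). Keys whose columns repeat one byte
# are unchanged by InvMixColumns; key 5 is a non-trivial sample.
keys = [bytes([r]) * 16 for r in range(11)]
keys[5] = bytes.fromhex("8e4da1bc9fdc589d01010101c6c6c6c6")
dec = decryption_round_keys(keys)
print(dec[5].hex())  # db135345f20a225c01010101c6c6c6c6
```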
There are legacy and extended forms of the instruction:
AESIMC

The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VAESIMC

The extended form of the instruction has a 128-bit encoding only.
The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register; the VEX.vvvv field does not specify an operand and must be 1111b. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form      Subset  Feature Flag
AESIMC    AES     CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESIMC   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
AESIMC xmm1, xmm2/mem128  66 0F 38 DB /r  Performs AES InvMixColumn transformation on
                                          a round key in xmm2 or mem128 and stores
                                          the result in xmm1.

                           Encoding
Mnemonic                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VAESIMC xmm1, xmm2/mem128  C4   RXB.00010       X.1111.0.01  DB /r

Related Instructions
(V)AESDEC, (V)AESDECLAST, (V)AESENC, (V)AESENCLAST, (V)AESKEYGENASSIST
rFLAGS Affected
None

MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


AESKEYGENASSIST
VAESKEYGENASSIST

AES
Assist Round Key Generation

Expands a round key for encryption. Transforms a 128-bit round key operand using an 8-bit round
constant and writes the result to a destination register.
See Appendix A on page 973 for more information about the operation of the AES instructions.
The round key is provided by the second source operand and the round constant is specified by an
immediate operand. The 128-bit round key vector is interpreted as 16-byte column-major entries in a
4-by-4 matrix of bytes. The transformed result is written to the destination in column-major order.
There are legacy and extended forms of the instruction:
AESKEYGENASSIST

The source operand is either an XMM register or a 128-bit memory location, and the round constant is
an 8-bit immediate. The destination is an XMM register. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.
VAESKEYGENASSIST

The extended form of the instruction has a 128-bit encoding only.
The source operand is either an XMM register or a 128-bit memory location, and the round constant is
an 8-bit immediate. The destination is an XMM register; the VEX.vvvv field does not specify an operand and must be 1111b. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form              Subset  Feature Flag
AESKEYGENASSIST   AES     CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESKEYGENASSIST  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                 Opcode             Description
AESKEYGENASSIST xmm1, xmm2/mem128, imm8  66 0F 3A DF /r ib  Expands a round key in xmm2 or mem128 using an
                                                            immediate round constant. Writes the result
                                                            to xmm1.

                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VAESKEYGENASSIST xmm1, xmm2/mem128, imm8  C4   RXB.00011       X.1111.0.01  DF /r ib

Related Instructions
(V)AESDEC, (V)AESDECLAST, (V)AESENC, (V)AESENCLAST, (V)AESIMC
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


ANDNPD
VANDNPD


AND NOT
Packed Double-Precision Floating-Point

Performs a bitwise AND of each packed double-precision floating-point value in the second source
operand with the ones’-complement of the corresponding packed double-precision floating-point
value in the first source operand, and writes the results into the destination. The 128-bit forms operate on two values; the 256-bit form operates on four.
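Because the complement is applied to the first source, operand order matters. The sketch below models the lane operation on raw IEEE-754 bit patterns with Python's `struct` module (`andnpd` and the helper names are hypothetical): with -0.0 (sign bit only) as the first source, the result is the absolute value of the second source; with the operands swapped, a lane keeps only a sign bit, and only where the other value was non-negative.

```python
import struct

MASK64 = (1 << 64) - 1


def to_bits(x: float) -> int:
    """Raw IEEE-754 double-precision bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(bits: int) -> float:
    """Double-precision value with the given 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def andnpd(src1: list[float], src2: list[float]) -> list[float]:
    """Per lane: (NOT src1) AND src2 on the raw bit patterns."""
    return [from_bits(~to_bits(a) & MASK64 & to_bits(b)) for a, b in zip(src1, src2)]


sign_mask = [-0.0, -0.0]  # only the sign bit is set in -0.0
print(andnpd(sign_mask, [-3.5, 2.0]))  # [3.5, 2.0]
print(andnpd([-3.5, 2.0], sign_mask))  # [0.0, -0.0]
```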
There are legacy and extended forms of the instruction:
ANDNPD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VANDNPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
Instruction Support
Form      Subset  Feature Flag
ANDNPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VANDNPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode       Description
ANDNPD xmm1, xmm2/mem128  66 0F 55 /r  Performs bitwise AND of two packed double-precision
                                       floating-point values in xmm2 or mem128 with the
                                       ones’-complement of two packed double-precision
                                       floating-point values in xmm1. Writes the result to xmm1.

                                 Encoding
Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VANDNPD xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.01   55 /r
VANDNPD ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.01   55 /r

Related Instructions
(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


ANDNPS
VANDNPS

AND NOT
Packed Single-Precision Floating-Point

Performs a bitwise AND of each packed single-precision floating-point value in the second source
operand with the ones’-complement of the corresponding packed single-precision floating-point
value in the first source operand, and writes the results in the destination. The 128-bit forms operate on four values; the 256-bit form operates on eight.
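A common use of ANDNPS is the select (blend) idiom (mask AND a) OR ((NOT mask) AND b), where each mask lane is all ones or all zeros, for example the output of a packed compare such as CMPPS. The Python sketch below models it on raw single-precision bit patterns; `blend` and the helper names are hypothetical.

```python
import struct

MASK32 = 0xFFFF_FFFF


def f32_bits(x: float) -> int:
    """Raw IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits: int) -> float:
    """Single-precision value with the given 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def blend(mask: list[int], a: list[float], b: list[float]) -> list[float]:
    """Per lane: (mask AND a) OR ((NOT mask) AND b).

    The (NOT mask) AND b term is what ANDNPS computes with the mask as its
    first source; the other two steps correspond to ANDPS and ORPS.
    """
    out = []
    for m, x, y in zip(mask, a, b):
        kept_a = m & f32_bits(x)              # ANDPS  mask, a
        kept_b = (~m & MASK32) & f32_bits(y)  # ANDNPS mask, b
        out.append(bits_f32(kept_a | kept_b))  # ORPS
    return out


mask = [MASK32, 0, MASK32, 0]  # all-ones / all-zeros lanes, as from a packed compare
print(blend(mask, [1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0]))
# [1.0, -2.0, 3.0, -4.0]
```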
There are legacy and extended forms of the instruction:
ANDNPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VANDNPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
Instruction Support
Form      Subset  Feature Flag
ANDNPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VANDNPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode    Description
ANDNPS xmm1, xmm2/mem128  0F 55 /r  Performs bitwise AND of four packed single-precision
                                    floating-point values in xmm2 or mem128 with the
                                    ones’-complement of four packed single-precision
                                    floating-point values in xmm1. Writes the result to xmm1.

                                 Encoding
Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VANDNPS xmm1, xmm2, xmm3/mem128  C4   RXB.00001       X.src.0.00   55 /r
VANDNPS ymm1, ymm2, ymm3/mem256  C4   RXB.00001       X.src.1.00   55 /r

Related Instructions
(V)ANDNPD, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


ANDPD
VANDPD

AND
Packed Double-Precision Floating-Point

Performs a bitwise AND of the packed double-precision floating-point values in the first source operand (two values for 128-bit operands, four for 256-bit operands) with the corresponding packed double-precision floating-point values in the second source
operand and writes the results into the corresponding elements of the destination.
There are legacy and extended forms of the instruction:
ANDPD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VANDPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
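Because the AND is purely bitwise, a typical use of (V)ANDPD is to combine data with a per-lane mask of all ones or all zeros (for example, a (V)CMPPD result) so that whole elements are kept or cleared. The Python sketch below models this on 64-bit lanes; it is an illustrative scalar model, and `f64_bits`, `bits_f64`, and `andpd` are hypothetical helper names.

```python
import struct

ALL_ONES_64 = (1 << 64) - 1


def f64_bits(x):
    """Return the IEEE 754 double-precision bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_f64(b):
    """Reinterpret a 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", b & ALL_ONES_64))[0]


def andpd(src1, src2):
    """Model (V)ANDPD: bitwise AND of corresponding 64-bit lanes
    (2 lanes for the 128-bit forms, 4 lanes for the 256-bit form)."""
    return [a & b for a, b in zip(src1, src2)]


values = [f64_bits(v) for v in (3.5, -7.25)]
keep_mask = [ALL_ONES_64, 0]  # keep lane 0, clear lane 1
print([bits_f64(r) for r in andpd(values, keep_mask)])  # [3.5, 0.0]
```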
Instruction Support
Form       Subset   Feature Flag
ANDPD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VANDPD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode        Description
ANDPD xmm1, xmm2/mem128    66 0F 54 /r   Performs bitwise AND of two packed double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VANDPD xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.01    54 /r
VANDPD ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.01    54 /r

Related Instructions
(V)ANDNPD, (V)ANDNPS, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS
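To make the RXB.map_select and W.vvvv.L.pp columns concrete, the following Python sketch assembles the three-byte (C4) VEX form of VANDPD xmm1, xmm2, xmm3 by hand. It assumes the standard VEX conventions that R, X, B, and vvvv are stored inverted (ones’ complement) and that vvvv names the first source register; it treats the X in the W position as a don't-care and chooses W = 0. The function names are hypothetical.

```python
def vex3_prefix(r, x, b, map_select, w, vvvv, l, pp):
    """Return the 3-byte VEX prefix: C4, RXB.map_select, W.vvvv.L.pp.

    r, x, b are the register-extension bits (bit 3 of the ModRM.reg,
    SIB.index and ModRM.rm/base register numbers); vvvv is the 4-bit number
    of the extra source register. R, X, B and vvvv are stored inverted.
    """
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])


def modrm(mod, reg, rm):
    """Pack a ModRM byte from its mod, reg and rm fields."""
    return ((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7)


# VANDPD xmm1, xmm2, xmm3: map_select = 00001b, W = 0 (don't care),
# vvvv = 2 (xmm2), L = 0 (128-bit), pp = 01b (implied 66 prefix), opcode 54 /r.
insn = vex3_prefix(0, 0, 0, 0b00001, 0, 2, 0, 0b01) + bytes([0x54, modrm(0b11, 1, 3)])
print(insn.hex(" "))  # c4 e1 69 54 cb
```

Setting L = 1 selects the 256-bit form (VANDPD ymm1, ymm2, ymm3) and changes the third byte from 69h to 6Dh. An assembler may instead emit the shorter two-byte (C5) VEX form for the same instruction.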


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


ANDPS
VANDPS

AND
Packed Single-Precision Floating-Point

Performs a bitwise AND of the packed single-precision floating-point values in the first source
operand (four values for 128-bit operands, eight for 256-bit operands) with the corresponding packed single-precision floating-point values in the second
source operand, and writes the result into the corresponding elements of the destination.
There are legacy and extended forms of the instruction:
ANDPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VANDPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
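One way to use (V)ANDPS is to isolate a bit field of each single-precision value, such as the sign bit. The Python sketch below is an illustrative scalar model that assumes lanes are passed as raw 32-bit patterns; `f32_bits` and `andps` are hypothetical helper names.

```python
import struct


def f32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def andps(src1, src2):
    """Model (V)ANDPS: bitwise AND of corresponding 32-bit lanes
    (4 lanes for the 128-bit forms, 8 lanes for the 256-bit form)."""
    return [a & b for a, b in zip(src1, src2)]


SIGN_MASK = 0x80000000
values = [1.0, -2.0, -0.0, 0.5]
sign_bits = andps([f32_bits(v) for v in values], [SIGN_MASK] * len(values))
print([s != 0 for s in sign_bits])  # [False, True, True, False]
```

Note that -0.0 carries a set sign bit even though it compares equal to 0.0, which a bitwise test exposes and a floating-point comparison does not.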
Instruction Support
Form       Subset   Feature Flag
ANDPS      SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VANDPS     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode      Description
ANDPS xmm1, xmm2/mem128    0F 54 /r    Performs bitwise AND of four packed single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VANDPS xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.00    54 /r
VANDPS ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.00    54 /r

Related Instructions
(V)ANDNPD, (V)ANDNPS, (V)ANDPD, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


BLENDPD
VBLENDPD


Blend
Packed Double-Precision Floating-Point

Copies packed double-precision floating-point values from either of two sources to a destination, as
specified by an 8-bit mask operand.
Each mask bit specifies a 64-bit element in a source location and a corresponding 64-bit element in
the destination register. When a mask bit = 0, the specified element of the first source is copied to the
corresponding position in the destination register. When a mask bit = 1, the specified element of the
second source is copied to the corresponding position in the destination register.
There are legacy and extended forms of the instruction:
BLENDPD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected. Only mask bits [1:0] are used.
VBLENDPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Only mask bits [1:0] are used.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register. Only mask bits [3:0] are used.
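The selection rule can be written as a short reference model. In this Python sketch (a hypothetical `blendpd` function, with lanes as plain Python values), bit i of imm8 picks lane i from the second source when set and from the first source when clear. Passing two lanes models the 128-bit forms and four lanes the 256-bit form, so the unused high imm8 bits are simply never read.

```python
def blendpd(src1, src2, imm8):
    """Model (V)BLENDPD: bit i of imm8 selects lane i of src2 (1) or src1 (0).

    Pass 2 lanes for the 128-bit forms (only imm8[1:0] matter) or 4 lanes for
    the 256-bit form (only imm8[3:0] matter); higher imm8 bits are ignored.
    """
    if len(src1) != len(src2):
        raise ValueError("sources must have the same number of lanes")
    return [b if (imm8 >> i) & 1 else a
            for i, (a, b) in enumerate(zip(src1, src2))]


print(blendpd([1.0, 2.0], [10.0, 20.0], 0b10))  # [1.0, 20.0]
print(blendpd([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 0b0101))
# [10.0, 2.0, 30.0, 4.0]
```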
Instruction Support
Form       Subset   Feature Flag
BLENDPD    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VBLENDPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode              Description
BLENDPD xmm1, xmm2/mem128, imm8   66 0F 3A 0D /r ib   Copies values from xmm1 or xmm2/mem128 to xmm1, as specified by imm8.

Mnemonic                                 Encoding
                                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VBLENDPD xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.00011        X.src.0.01    0D /r ib
VBLENDPD ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.00011        X.src.1.01    0D /r ib


Related Instructions
(V)BLENDPS, (V)BLENDVPD, (V)BLENDVPS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


BLENDPS
VBLENDPS


Blend
Packed Single-Precision Floating-Point

Copies packed single-precision floating-point values from either of two sources to a destination, as
specified by an 8-bit mask operand.
Each mask bit specifies a 32-bit element in a source location and a corresponding 32-bit element in
the destination register. When a mask bit = 0, the specified element of the first source is copied to the
corresponding position in the destination register. When a mask bit = 1, the specified element of the
second source is copied to the corresponding position in the destination register.
There are legacy and extended forms of the instruction:
BLENDPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected. Only mask bits [3:0] are used.
VBLENDPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Only mask bits [3:0] are used.
YMM Encoding

The first operand is a YMM register and the second operand is either a YMM register or a 256-bit
memory location. The destination is a third YMM register. All 8 bits of the mask are used.
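In practice the imm8 is often derived from a per-lane choice. The Python sketch below builds the mask from a list of booleans and applies it; `blendps_imm8` and `blendps` are hypothetical helpers, and the lanes are plain Python values standing in for single-precision elements.

```python
def blendps_imm8(take_src2):
    """Build the (V)BLENDPS imm8 from per-lane choices (True = second source)."""
    if len(take_src2) not in (4, 8):
        raise ValueError("expected 4 (XMM) or 8 (YMM) lanes")
    imm = 0
    for i, flag in enumerate(take_src2):
        if flag:
            imm |= 1 << i  # bit i controls 32-bit lane i
    return imm


def blendps(src1, src2, imm8):
    """Model (V)BLENDPS on 4 or 8 lanes: bit i set selects src2[i]."""
    return [b if (imm8 >> i) & 1 else a
            for i, (a, b) in enumerate(zip(src1, src2))]


imm = blendps_imm8([False, True] * 4)
print(hex(imm))  # 0xaa
print(blendps(list(range(8)), list(range(100, 108)), imm))
# [0, 101, 2, 103, 4, 105, 6, 107]
```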
Instruction Support
Form       Subset   Feature Flag
BLENDPS    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VBLENDPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode              Description
BLENDPS xmm1, xmm2/mem128, imm8   66 0F 3A 0C /r ib   Copies values from xmm1 or xmm2/mem128 to xmm1, as specified by imm8.

Mnemonic                                 Encoding
                                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VBLENDPS xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.00011        X.src.0.01    0C /r ib
VBLENDPS ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.00011        X.src.1.01    0C /r ib


Related Instructions
(V)BLENDPD, (V)BLENDVPD, (V)BLENDVPS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


BLENDVPD
VBLENDVPD


Variable Blend
Packed Double-Precision Floating-Point

Copies packed double-precision floating-point values from either of two sources to a destination, as
specified by a mask operand.
Each mask bit specifies a 64-bit element of a source location and a corresponding 64-bit element of
the destination. The position of a mask bit corresponds to the position of the most significant bit of a
copied value. When a mask bit = 0, the specified element of the first source is copied to the corresponding position in the destination. When a mask bit = 1, the specified element of the second source
is copied to the corresponding position in the destination.
There are legacy and extended forms of the instruction:
BLENDVPD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected. The mask is defined by bits 127
and 63 of the implicit register XMM0.
VBLENDVPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. The mask is defined by bits 127 and 63 of a fourth
XMM register.
YMM Encoding

The first operand is a YMM register and the second operand is either a YMM register or a 256-bit
memory location. The destination is a third YMM register. The mask is defined by bits 255, 191, 127,
and 63 of a fourth YMM register.
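Because only the most significant bit of each 64-bit mask element is examined, an all-ones comparison result works as a mask, and so does any double whose sign bit is set. The Python sketch below is a hypothetical scalar model (`f64_bits` and `blendvpd` are illustrative names) that takes every lane as a 64-bit integer.

```python
import struct

ALL_ONES_64 = (1 << 64) - 1


def f64_bits(x):
    """Return the IEEE 754 double-precision bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def blendvpd(src1, src2, mask):
    """Model (V)BLENDVPD on 64-bit lanes given as integers.

    Only bit 63 (the MSB) of each mask lane is examined: 1 selects the src2
    lane, 0 keeps the src1 lane. For legacy BLENDVPD the mask register is XMM0.
    """
    return [b if (m >> 63) & 1 else a for a, b, m in zip(src1, src2, mask)]


first = [f64_bits(1.0), f64_bits(2.0)]
second = [f64_bits(-1.0), f64_bits(-2.0)]

# An all-ones comparison result selects src2; +0.0 (MSB clear) keeps src1.
print(blendvpd(first, second, [ALL_ONES_64, f64_bits(0.0)]) == [second[0], first[1]])  # True
# Any value with the sign bit set selects src2, including -0.0.
print(blendvpd(first, second, [f64_bits(-0.0), 0x7FFFFFFFFFFFFFFF]) == [second[0], first[1]])  # True
```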
Instruction Support
Form       Subset   Feature Flag
BLENDVPD   SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VBLENDVPD  AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                     Opcode           Description
BLENDVPD xmm1, xmm2/mem128   66 0F 38 15 /r   Copies values from xmm1 or xmm2/mem128 to xmm1, as specified by the MSB of corresponding elements of xmm0.

Mnemonic                                  Encoding
                                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VBLENDVPD xmm1, xmm2, xmm3/mem128, xmm4   C4    RXB.00011        0.src.0.01    4B /r
VBLENDVPD ymm1, ymm2, ymm3/mem256, ymm4   C4    RXB.00011        0.src.1.01    4B /r

Related Instructions
(V)BLENDPD, (V)BLENDPS, (V)BLENDVPS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
S
S
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


BLENDVPS
VBLENDVPS


Variable Blend
Packed Single-Precision Floating-Point

Copies packed single-precision floating-point values from either of two sources to a destination, as
specified by a mask operand.
Each mask bit specifies a 32-bit element of a source location and a corresponding 32-bit element of
the destination register. The position of a mask bit corresponds to the position of the most significant
bit of a copied value. When a mask bit = 0, the specified element of the first source is copied to the
corresponding position in the destination. When a mask bit = 1, the specified element of the second
source is copied to the corresponding position in the destination.
There are legacy and extended forms of the instruction:
BLENDVPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected. The mask is defined by bits 127,
95, 63, and 31 of the implicit register XMM0.
VBLENDVPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. The mask is defined by bits 127, 95, 63, and 31 of
a fourth XMM register.
YMM Encoding

The first operand is a YMM register and the second operand is either a YMM register or a 256-bit
memory location. The destination is a third YMM register. The mask is defined by bits 255, 223, 191,
159, 127, 95, 63, and 31 of a fourth YMM register.
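The mask bit positions listed above follow a simple rule: bit 32i + 31 of the mask register, the most significant bit of element i. The Python sketch below derives the positions and applies the blend; `blendvps_mask_bits` and `blendvps` are hypothetical helpers, and the whole mask register is passed as one wide integer.

```python
def blendvps_mask_bits(vector_bits):
    """Bit positions (highest first) that (V)BLENDVPS reads from the mask
    register: the most significant bit of each 32-bit element."""
    return [32 * i + 31 for i in reversed(range(vector_bits // 32))]


def blendvps(src1, src2, mask_reg, vector_bits=128):
    """Model (V)BLENDVPS with the mask register given as one integer."""
    lanes = vector_bits // 32
    return [src2[i] if (mask_reg >> (32 * i + 31)) & 1 else src1[i]
            for i in range(lanes)]


print(blendvps_mask_bits(128))  # [127, 95, 63, 31]
print(blendvps_mask_bits(256))  # [255, 223, 191, 159, 127, 95, 63, 31]
# Mask with bit 31 (element 0) and bit 95 (element 2) set:
print(blendvps([1, 2, 3, 4], [10, 20, 30, 40], (1 << 31) | (1 << 95)))  # [10, 2, 30, 4]
```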
Instruction Support
Form       Subset   Feature Flag
BLENDVPS   SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VBLENDVPS  AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                     Opcode           Description
BLENDVPS xmm1, xmm2/mem128   66 0F 38 14 /r   Copies packed single-precision floating-point values from xmm1 or xmm2/mem128 to xmm1, as specified by bits in xmm0.

Mnemonic                                  Encoding
                                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VBLENDVPS xmm1, xmm2, xmm3/mem128, xmm4   C4    RXB.00011        0.src.0.01    4A /r
VBLENDVPS ymm1, ymm2, ymm3/mem256, ymm4   C4    RXB.00011        0.src.1.01    4A /r

Related Instructions
(V)BLENDPD, (V)BLENDPS, (V)BLENDVPD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
S
S
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


CMPPD
VCMPPD


Compare
Packed Double-Precision Floating-Point

Compares each packed double-precision floating-point value of the first source operand (two values for 128-bit operands, four for 256-bit operands) to
the corresponding value of the second source operand and writes the result of each comparison to the
corresponding 64-bit element of the destination. When a comparison is TRUE, all 64 bits of the destination element are set; when a comparison is FALSE, all 64 bits of the destination element are
cleared. The type of comparison is specified by an immediate byte operand.
Signed comparisons return TRUE only when both operands are valid numbers and the numbers have
the relation specified by the type of comparison operation. Ordered comparison returns TRUE when
both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison
returns TRUE only when one or both operands are NaN and FALSE otherwise.
QNaN operands generate an Invalid Operation Exception (IE) only if the comparison type isn't Equal,
Unequal, Ordered, or Unordered. SNaN operands always generate an IE.
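The NaN rules above can be captured in a small reference model of the eight basic comparison types (the legacy CMPPD encodings 00h-07h). This Python sketch is a hypothetical illustration: it relies on Python floats being IEEE 754 doubles, returns an all-ones or all-zeros 64-bit mask per lane, and does not model exception signaling or the additional VCMPPD encodings.

```python
import math

ALL_ONES_64 = (1 << 64) - 1

# Basic comparison types selected by imm8[2:0] in the legacy CMPPD form.
PREDICATES = {
    0: lambda a, b: a == b,                                # Equal (FALSE if NaN)
    1: lambda a, b: a < b,                                 # Less than (FALSE if NaN)
    2: lambda a, b: a <= b,                                # Less than or equal (FALSE if NaN)
    3: lambda a, b: math.isnan(a) or math.isnan(b),        # Unordered
    4: lambda a, b: not (a == b),                          # Not equal (TRUE if NaN)
    5: lambda a, b: not (a < b),                           # Not less than (TRUE if NaN)
    6: lambda a, b: not (a <= b),                          # Not less than or equal (TRUE if NaN)
    7: lambda a, b: not (math.isnan(a) or math.isnan(b)),  # Ordered
}


def cmppd(src1, src2, imm8):
    """Model legacy CMPPD: one 64-bit mask per lane (all ones = TRUE)."""
    pred = PREDICATES[imm8 & 0x7]
    return [ALL_ONES_64 if pred(a, b) else 0 for a, b in zip(src1, src2)]


nan = float("nan")
print([hex(m) for m in cmppd([1.0, nan], [2.0, 1.0], 1)])  # ['0xffffffffffffffff', '0x0']
print([hex(m) for m in cmppd([1.0, nan], [2.0, 1.0], 5)])  # ['0x0', '0xffffffffffffffff']
```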
There are legacy and extended forms of the instruction:
CMPPD

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected. Comparison type is specified by
bits [2:0] of an immediate byte operand.
VCMPPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Comparison type is specified by bits [4:0] of an
immediate byte operand.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination operand is a YMM register. Comparison type is specified by bits [4:0] of an immediate byte operand.
Immediate Operand Encoding
CMPPD uses bits [2:0] of the 8-bit immediate operand and VCMPPD uses bits [4:0] of the 8-bit
immediate operand. Although VCMPPD supports 20h encoding values, the comparison types echo
those of CMPPD on 4-bit boundaries. The following table shows the immediate operand value for
CMPPD and each of the VCMPPD echoes.
Some comparison operations that are not directly supported by immediate-byte encodings can be
implemented by swapping the contents of the source and destination operands and executing the
appropriate comparison of the swapped values. These additional comparison operations are shown
with the directly supported comparison operations.
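The swapped-operand comparisons can be checked with a short Python sketch. The four directly encoded predicates are modelled with ordinary float comparisons, which follow IEEE 754 (any ordered comparison involving NaN is False), and each derived comparison simply calls one of them with its arguments exchanged. All function names here are hypothetical.

```python
nan = float("nan")


# Directly encoded predicates.
def lt(a, b):
    return a < b            # 01h: Less than


def le(a, b):
    return a <= b           # 02h: Less than or equal


def nlt(a, b):
    return not (a < b)      # 05h: Not less than


def nle(a, b):
    return not (a <= b)     # 06h: Not less than or equal


# Derived comparisons: evaluate the listed predicate with the operands swapped.
def gt(a, b):
    return lt(b, a)         # Greater than


def ge(a, b):
    return le(b, a)         # Greater than or equal


def ngt(a, b):
    return nlt(b, a)        # Not greater than


def nge(a, b):
    return nle(b, a)        # Not greater than or equal


print(gt(3.0, 2.0), ge(2.0, 2.0), ngt(3.0, 2.0), nge(1.0, 2.0))  # True True False True
print(gt(nan, 1.0), ngt(nan, 1.0))  # False True
```

The NaN results match the "Result If NaN Operand" column: the derived "greater" comparisons return FALSE, and their "not greater" counterparts return TRUE.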


Immediate Operand   Compare Operation                             Result If     QNaN Operand Causes
Value                                                             NaN Operand   Invalid Operation Exception
00h, 08h, 10h, 18h  Equal                                         FALSE         No
01h, 09h, 11h, 19h  Less than                                     FALSE         Yes
                    Greater than (swapped operands)               FALSE         Yes
02h, 0Ah, 12h, 1Ah  Less than or equal                            FALSE         Yes
                    Greater than or equal (swapped operands)      FALSE         Yes
03h, 0Bh, 13h, 1Bh  Unordered                                     TRUE          No
04h, 0Ch, 14h, 1Ch  Not equal                                     TRUE          No
05h, 0Dh, 15h, 1Dh  Not less than                                 TRUE          Yes
                    Not greater than (swapped operands)           TRUE          Yes
06h, 0Eh, 16h, 1Eh  Not less than or equal                        TRUE          Yes
                    Not greater than or equal (swapped operands)  TRUE          Yes
07h, 0Fh, 17h, 1Fh  Ordered                                       FALSE         No

The following alias mnemonics for (V)CMPPD with appropriate value of imm8 are supported.
Mnemonic          Implied Value of imm8
(V)CMPEQPD        00h, 08h, 10h, 18h
(V)CMPLTPD        01h, 09h, 11h, 19h
(V)CMPLEPD        02h, 0Ah, 12h, 1Ah
(V)CMPUNORDPD     03h, 0Bh, 13h, 1Bh
(V)CMPNEQPD       04h, 0Ch, 14h, 1Ch
(V)CMPNLTPD       05h, 0Dh, 15h, 1Dh
(V)CMPNLEPD       06h, 0Eh, 16h, 1Eh
(V)CMPORDPD       07h, 0Fh, 17h, 1Fh

Instruction Support
Form       Subset   Feature Flag
CMPPD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCMPPD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                        Opcode           Description
CMPPD xmm1, xmm2/mem128, imm8   66 0F C2 /r ib   Compares two pairs of values in xmm1 to corresponding values in xmm2 or mem128. Comparison type is determined by imm8. Writes comparison results to xmm1.

Mnemonic                               Encoding
                                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCMPPD xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.00001        X.src.0.01    C2 /r ib
VCMPPD ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.00001        X.src.1.01    C2 /r ib

Related Instructions
(V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS
rFLAGS Affected
None
MXCSR Flags Affected
Flag:  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Bit:   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Only DE (bit 1) and IE (bit 0) may be modified (set or cleared); all other flags are not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real   Virt   Prot   Cause of Exception
Invalid operation, IE      S      S      X      A source operand was an SNaN value.
                           S      S      X      Undefined operation.
Denormalized operand, DE   S      S      X      A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


CMPPS
VCMPPS


Compare
Packed Single-Precision Floating-Point

Compares each packed single-precision floating-point value of the first source operand (four values for 128-bit operands, eight for 256-bit operands) to
the corresponding value of the second source operand and writes the result of each comparison to the
corresponding 32-bit element of the destination. When a comparison is TRUE, all 32 bits of the destination element are set; when a comparison is FALSE, all 32 bits of the destination element are
cleared. The type of comparison is specified by an immediate byte operand.
Signed comparisons return TRUE only when both operands are valid numbers and the numbers have
the relation specified by the type of comparison operation. Ordered comparison returns TRUE when
both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison
returns TRUE only when one or both operands are NaN and FALSE otherwise.
QNaN operands generate an Invalid Operation Exception (IE) only if the comparison type isn't Equal,
Unequal, Ordered, or Unordered. SNaN operands always generate an IE.
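The IE rule above reduces to a NaN classification of each operand plus a lookup on the comparison type. The Python sketch below is a hypothetical model for one pair of single-precision lanes given as raw bit patterns. It treats a NaN with fraction bit 22 set as quiet and one with that bit clear as signaling (the usual x86 convention), covers only the basic types selected by imm8[2:0], and ignores the DE exception and MXCSR masking.

```python
# Basic comparison types (imm8[2:0]) that signal IE on a QNaN operand.
# Equal (0), Unordered (3), Not equal (4) and Ordered (7) do not.
SIGNALS_ON_QNAN = {0: False, 1: True, 2: True, 3: False,
                   4: False, 5: True, 6: True, 7: False}

QUIET_BIT = 1 << 22  # most significant fraction bit of a single-precision value


def is_nan32(bits):
    """True if the pattern has an all-ones exponent and a nonzero fraction."""
    return ((bits >> 23) & 0xFF) == 0xFF and (bits & 0x7FFFFF) != 0


def is_qnan32(bits):
    return is_nan32(bits) and (bits & QUIET_BIT) != 0


def is_snan32(bits):
    return is_nan32(bits) and (bits & QUIET_BIT) == 0


def raises_ie(src1_bits, src2_bits, imm8):
    """Would comparing this lane pair report an invalid-operation exception?"""
    if is_snan32(src1_bits) or is_snan32(src2_bits):
        return True  # SNaN operands always signal
    if is_qnan32(src1_bits) or is_qnan32(src2_bits):
        return SIGNALS_ON_QNAN[imm8 & 0x7]
    return False


ONE = 0x3F800000   # 1.0f
QNAN = 0x7FC00000  # a quiet NaN
SNAN = 0x7F800001  # a signaling NaN (quiet bit clear)

print([raises_ie(QNAN, ONE, t) for t in range(8)])
# [False, True, True, False, False, True, True, False]
print(raises_ie(SNAN, ONE, 0))  # True
```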
There are legacy and extended forms of the instruction:
CMPPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected. Comparison type is specified by
bits [2:0] of an immediate byte operand.
VCMPPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Comparison type is specified by bits [4:0] of an
immediate byte operand.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination operand is a YMM register. Comparison type is specified by bits [4:0] of an immediate byte operand.
Immediate Operand Encoding
CMPPS uses bits [2:0] of the 8-bit immediate operand and VCMPPS uses bits [4:0] of the 8-bit
immediate operand. Although VCMPPS supports 20h encoding values, the comparison types echo
those of CMPPS on 4-bit boundaries. The following table shows the immediate operand value for
CMPPS and each of the VCMPPS echoes.
Some comparison operations that are not directly supported by immediate-byte encodings can be
implemented by swapping the contents of the source and destination operands and executing the
appropriate comparison of the swapped values. These additional comparison operations are shown
with the directly supported comparison operations.


Immediate Operand   Compare Operation                             Result If     QNaN Operand Causes
Value                                                             NaN Operand   Invalid Operation Exception
00h, 08h, 10h, 18h  Equal                                         FALSE         No
01h, 09h, 11h, 19h  Less than                                     FALSE         Yes
                    Greater than (swapped operands)               FALSE         Yes
02h, 0Ah, 12h, 1Ah  Less than or equal                            FALSE         Yes
                    Greater than or equal (swapped operands)      FALSE         Yes
03h, 0Bh, 13h, 1Bh  Unordered                                     TRUE          No
04h, 0Ch, 14h, 1Ch  Not equal                                     TRUE          No
05h, 0Dh, 15h, 1Dh  Not less than                                 TRUE          Yes
                    Not greater than (swapped operands)           TRUE          Yes
06h, 0Eh, 16h, 1Eh  Not less than or equal                        TRUE          Yes
                    Not greater than or equal (swapped operands)  TRUE          Yes
07h, 0Fh, 17h, 1Fh  Ordered                                       FALSE         No

The following alias mnemonics for (V)CMPPS with the appropriate value of imm8 are supported.
Mnemonic          Implied Value of imm8
(V)CMPEQPS        00h, 08h, 10h, 18h
(V)CMPLTPS        01h, 09h, 11h, 19h
(V)CMPLEPS        02h, 0Ah, 12h, 1Ah
(V)CMPUNORDPS     03h, 0Bh, 13h, 1Bh
(V)CMPNEQPS       04h, 0Ch, 14h, 1Ch
(V)CMPNLTPS       05h, 0Dh, 15h, 1Dh
(V)CMPNLEPS       06h, 0Eh, 16h, 1Eh
(V)CMPORDPS       07h, 0Fh, 17h, 1Fh
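The comparison types in the immediate-operand table above can be modeled in a few lines of Python. This is an illustration only. The names `cmpps`, `cmpps_lane` and `_PREDICATES` are hypothetical, not an AMD or library API. The sketch makes these assumptions:
- The inputs are already single-precision values.
- It models only the 32-bit mask written to each lane and ignores exception signaling.
- It follows this manual's statement that imm8 values 08h through 1Fh repeat 00h through 07h.

The last lines show the operand-swap technique for "Greater than".

```python
import math

ALL_ONES_32 = 0xFFFFFFFF  # mask written to a lane when the comparison is TRUE

# imm8[2:0] -> (comparison for two non-NaN values, result when either value is NaN)
_PREDICATES = {
    0x0: (lambda a, b: a == b, False),     # Equal
    0x1: (lambda a, b: a < b, False),      # Less than
    0x2: (lambda a, b: a <= b, False),     # Less than or equal
    0x3: (lambda a, b: False, True),       # Unordered
    0x4: (lambda a, b: a != b, True),      # Not equal
    0x5: (lambda a, b: not a < b, True),   # Not less than
    0x6: (lambda a, b: not a <= b, True),  # Not less than or equal
    0x7: (lambda a, b: True, False),       # Ordered
}


def cmpps_lane(a: float, b: float, imm8: int) -> int:
    """Mask that one 32-bit lane of (V)CMPPS would receive (hypothetical model)."""
    # Assumption from this manual's table: only imm8[2:0] selects the comparison
    # type, and the VEX values 08h-1Fh echo 00h-07h.
    compare, nan_result = _PREDICATES[imm8 & 0x7]
    if math.isnan(a) or math.isnan(b):
        result = nan_result
    else:
        result = compare(a, b)
    return ALL_ONES_32 if result else 0


def cmpps(src1: list[float], src2: list[float], imm8: int) -> list[int]:
    """Compare 4 (XMM) or 8 (YMM) lanes pairwise: src1[i] <op> src2[i]."""
    return [cmpps_lane(a, b, imm8) for a, b in zip(src1, src2)]


# "Greater than" has no encoding: swap the operands and use "Less than" (01h).
a = [3.0, 1.0, float("nan"), 2.0]
b = [1.0, 3.0, 0.0, 2.0]
greater = cmpps(b, a, 0x01)  # lanes where a[i] > b[i]
```

Swapping `a` and `b` turns Less than (01h) into Greater than. The NaN lane stays FALSE because the swapped comparison is still an ordered one.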

Instruction Support
Form     Subset   Feature Flag
CMPPS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCMPPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                               Opcode        Description
CMPPS xmm1, xmm2/mem128, imm8          0F C2 /r ib   Compares four pairs of values in xmm1 to corresponding values in xmm2 or mem128. Comparison type is determined by imm8. Writes comparison results to xmm1.

Mnemonic                               Encoding
                                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCMPPS xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.00001        X.src.0.00    C2 /r ib

Related Instructions
(V)CMPPD, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS
rFLAGS Affected
None
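MXCSR Flags Affected
Modified (M): DE (bit 1), IE (bit 0).
Not affected: MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), PE (bit 5), UE (bit 4), OE (bit 3), ZE (bit 2).
Note: M indicates a flag that may be modified (set or cleared). All other flags are not affected.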


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real   Virt   Prot   Cause of Exception
Invalid operation, IE      S      S      X      A source operand was an SNaN value.
                           S      S      X      Undefined operation.
Denormalized operand, DE   S      S      X      A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


CMPSD
VCMPSD


Compare
Scalar Double-Precision Floating-Point

Compares a double-precision floating-point value in the low-order 64 bits of the first source operand
with a double-precision floating-point value in the low-order 64 bits of the second source operand and
writes the result to the low-order 64 bits of the destination. When a comparison is TRUE, all 64 bits
of the destination element are set; when a comparison is FALSE, all 64 bits of the destination element
are cleared. Comparison type is specified by an immediate byte operand.
Signed comparisons return TRUE only when both operands are valid numbers and the numbers have
the relation specified by the type of comparison operation. Ordered comparison returns TRUE when
both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison
returns TRUE only when one or both operands are NaN and FALSE otherwise.
QNaN operands generate an Invalid Operation Exception (IE) only when the comparison type is not
Equal, Unequal, Ordered, or Unordered. SNaN operands always generate an IE.
There are legacy and extended forms of the instruction:
CMPSD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 64-bit memory location. The first source register is also the destination. Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not
affected. Comparison type is specified by bits [2:0] of an immediate byte operand.
This CMPSD instruction must not be confused with the same-mnemonic CMPSD (compare strings
by doubleword) instruction in the general-purpose instruction set. Assemblers can distinguish the
instructions by the number and type of operands.
VCMPSD

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the destination
are copied from bits [127:64] of the first source. Bits [255:128] of the YMM register that corresponds
to the destination are cleared. Comparison type is specified by bits [4:0] of an immediate byte operand.
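The legacy and VEX forms handle the bits above the scalar result differently, and this is easy to get wrong. The following Python sketch models a YMM register as a 256-bit integer. The function names are hypothetical. The comparison is fixed to Less than (imm8 = 01h) so the focus stays on which bits are preserved, copied or cleared.

```python
import math
import struct

MASK64 = (1 << 64) - 1  # low-order 64 bits of a register


def f64_bits(x: float) -> int:
    """Raw IEEE 754 bit pattern of a double-precision value."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_f64(reg: int) -> float:
    """Interpret the low-order 64 bits of a register value as a double."""
    return struct.unpack("<d", struct.pack("<Q", reg & MASK64))[0]


def _less_than_mask(a: float, b: float) -> int:
    # imm8 = 01h (Less than): FALSE when either operand is a NaN.
    if math.isnan(a) or math.isnan(b):
        return 0
    return MASK64 if a < b else 0


def cmpltsd_legacy(ymm1: int, src2: float) -> int:
    """CMPSD xmm1, xmm2/mem64, 01h: xmm1 is both first source and destination.

    Bits [127:64] of xmm1 and bits [255:128] of ymm1 are left unchanged.
    """
    mask = _less_than_mask(bits_f64(ymm1), src2)
    return (ymm1 & ~MASK64) | mask


def vcmpltsd(ymm2: int, src2: float) -> int:
    """VCMPSD xmm1, xmm2, xmm3/mem64, 01h: returns the new value of ymm1.

    Bits [127:64] are copied from xmm2; bits [255:128] of ymm1 are cleared.
    """
    mask = _less_than_mask(bits_f64(ymm2), src2)
    return (ymm2 & (MASK64 << 64)) | mask
```

With the same input register, the legacy form keeps whatever was in bits [255:64]. The VEX form keeps only bits [127:64] of its first source and zeroes the rest of the YMM register.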
Immediate Operand Encoding
CMPSD uses bits [2:0] of the 8-bit immediate operand and VCMPSD uses bits [4:0] of the 8-bit
immediate operand. Although VCMPSD supports 32 (20h) encoding values, its comparison types repeat
those of CMPSD every 08h. The following table shows the immediate operand values for CMPSD and
each of the VCMPSD echoes.
Some comparison operations that are not directly supported by immediate-byte encodings can be
implemented by swapping the contents of the source and destination operands and executing the
appropriate comparison of the swapped values. These additional comparison operations are shown in
the table together with the directly supported comparison operations. When operands are swapped, the
first source XMM register is overwritten by the result.


Immediate Operand Value   Compare Operation                              Result If NaN Operand   QNaN Operand Causes Invalid Operation Exception
00h, 08h, 10h, 18h        Equal                                          FALSE                   No
01h, 09h, 11h, 19h        Less than                                      FALSE                   Yes
                          Greater than (swapped operands)                FALSE                   Yes
02h, 0Ah, 12h, 1Ah        Less than or equal                             FALSE                   Yes
                          Greater than or equal (swapped operands)       FALSE                   Yes
03h, 0Bh, 13h, 1Bh        Unordered                                      TRUE                    No
04h, 0Ch, 14h, 1Ch        Not equal                                      TRUE                    No
05h, 0Dh, 15h, 1Dh        Not less than                                  TRUE                    Yes
                          Not greater than (swapped operands)            TRUE                    Yes
06h, 0Eh, 16h, 1Eh        Not less than or equal                         TRUE                    Yes
                          Not greater than or equal (swapped operands)   TRUE                    Yes
07h, 0Fh, 17h, 1Fh        Ordered                                        FALSE                   No

The following alias mnemonics for (V)CMPSD with the appropriate value of imm8 are supported.
Mnemonic          Implied Value of imm8
(V)CMPEQSD        00h, 08h, 10h, 18h
(V)CMPLTSD        01h, 09h, 11h, 19h
(V)CMPLESD        02h, 0Ah, 12h, 1Ah
(V)CMPUNORDSD     03h, 0Bh, 13h, 1Bh
(V)CMPNEQSD       04h, 0Ch, 14h, 1Ch
(V)CMPNLTSD       05h, 0Dh, 15h, 1Dh
(V)CMPNLESD       06h, 0Eh, 16h, 1Eh
(V)CMPORDSD       07h, 0Fh, 17h, 1Fh

Instruction Support
Form     Subset   Feature Flag
CMPSD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCMPSD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                              Opcode           Description
CMPSD xmm1, xmm2/mem64, imm8          F2 0F C2 /r ib   Compares double-precision floating-point values in the low-order 64 bits of xmm1 with corresponding values in xmm2 or mem64. Comparison type is determined by imm8. Writes comparison results to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCMPSD xmm1, xmm2, xmm3/mem64, imm8   C4    RXB.00001        X.src.X.11    C2 /r ib

Related Instructions
(V)CMPPD, (V)CMPPS, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS
rFLAGS Affected
None
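MXCSR Flags Affected
Modified (M): DE (bit 1), IE (bit 0).
Not affected: MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), PE (bit 5), UE (bit 4), OE (bit 3), ZE (bit 2).
Note: M indicates a flag that may be modified (set or cleared). All other flags are not affected.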


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real   Virt   Prot   Cause of Exception
Invalid operation, IE      S      S      X      A source operand was an SNaN value.
                           S      S      X      Undefined operation.
Denormalized operand, DE   S      S      X      A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


CMPSS
VCMPSS


Compare
Scalar Single-Precision Floating-Point

Compares a single-precision floating-point value in the low-order 32 bits of the first source operand
with a single-precision floating-point value in the low-order 32 bits of the second source operand and
writes the result to the low-order 32 bits of the destination. When a comparison is TRUE, all 32 bits
of the destination element are set; when a comparison is FALSE, all 32 bits of the destination element
are cleared. Comparison type is specified by an immediate byte operand.
Signed comparisons return TRUE only when both operands are valid numbers and the numbers have
the relation specified by the type of comparison operation. Ordered comparison returns TRUE when
both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison
returns TRUE only when one or both operands are NaN and FALSE otherwise.
QNaN operands generate an Invalid Operation Exception (IE) only when the comparison type is not Equal,
Unequal, Ordered, or Unordered. SNaN operands always generate an IE.
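The QNaN rule above can be written as a small decision function. This hypothetical Python sketch uses the illustrative names `classify_f32` and `cmpss_signals_ie`. It classifies raw single-precision bit patterns: a NaN whose most significant fraction bit is set is quiet, otherwise it is signaling. It then reports whether (V)CMPSS detects an invalid-operation condition.

The sketch assumes the table's interpretation of imm8[2:0]. It does not model whether the condition sets MXCSR.IE or raises #XF; that depends on MXCSR.IM.

```python
# Comparison types (imm8[2:0]) for which a QNaN operand does NOT raise IE:
# Equal (0), Unordered (3), Not equal (4), Ordered (7).
QNAN_QUIET_TYPES = {0x0, 0x3, 0x4, 0x7}


def classify_f32(bits: int) -> str:
    """Classify a single-precision bit pattern as 'snan', 'qnan', or 'other'."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0xFF or fraction == 0:
        return "other"  # finite values, denormals, and infinities
    # The most significant fraction bit (bit 22) distinguishes QNaN from SNaN.
    return "qnan" if fraction & 0x400000 else "snan"


def cmpss_signals_ie(imm8: int, src1_bits: int, src2_bits: int) -> bool:
    """True if (V)CMPSS would detect an invalid-operation (IE) condition."""
    kinds = {classify_f32(src1_bits), classify_f32(src2_bits)}
    if "snan" in kinds:
        return True  # SNaN operands always generate IE
    if "qnan" in kinds:
        return (imm8 & 0x7) not in QNAN_QUIET_TYPES
    return False
```

For example, comparing a QNaN (7FC0_0000h) with 1.0 (3F80_0000h) is quiet under Equal (00h) but signals IE under Less than (01h).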
There are legacy and extended forms of the instruction:
CMPSS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not
affected. Comparison type is specified by bits [2:0] of an immediate byte operand.
VCMPSS

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the destination
are copied from bits [127:32] of the first source. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Comparison type is specified by bits [4:0] of an immediate byte
operand.
Immediate Operand Encoding
CMPSS uses bits [2:0] of the 8-bit immediate operand and VCMPSS uses bits [4:0] of the 8-bit
immediate operand. Although VCMPSS supports 32 (20h) encoding values, its comparison types repeat
those of CMPSS every 08h. The following table shows the immediate operand values for CMPSS and
each of the VCMPSS echoes.
Some comparison operations that are not directly supported by immediate-byte encodings can be
implemented by swapping the contents of the source and destination operands and executing the
appropriate comparison of the swapped values. These additional comparison operations are shown
below with the directly supported comparison operations. When operands are swapped, the first
source XMM register is overwritten by the result.


Immediate Operand Value   Compare Operation                              Result If NaN Operand   QNaN Operand Causes Invalid Operation Exception
00h, 08h, 10h, 18h        Equal                                          FALSE                   No
01h, 09h, 11h, 19h        Less than                                      FALSE                   Yes
                          Greater than (swapped operands)                FALSE                   Yes
02h, 0Ah, 12h, 1Ah        Less than or equal                             FALSE                   Yes
                          Greater than or equal (swapped operands)       FALSE                   Yes
03h, 0Bh, 13h, 1Bh        Unordered                                      TRUE                    No
04h, 0Ch, 14h, 1Ch        Not equal                                      TRUE                    No
05h, 0Dh, 15h, 1Dh        Not less than                                  TRUE                    Yes
                          Not greater than (swapped operands)            TRUE                    Yes
06h, 0Eh, 16h, 1Eh        Not less than or equal                         TRUE                    Yes
                          Not greater than or equal (swapped operands)   TRUE                    Yes
07h, 0Fh, 17h, 1Fh        Ordered                                        FALSE                   No

The following alias mnemonics for (V)CMPSS with the appropriate value of imm8 are supported.
Mnemonic          Implied Value of imm8
(V)CMPEQSS        00h, 08h, 10h, 18h
(V)CMPLTSS        01h, 09h, 11h, 19h
(V)CMPLESS        02h, 0Ah, 12h, 1Ah
(V)CMPUNORDSS     03h, 0Bh, 13h, 1Bh
(V)CMPNEQSS       04h, 0Ch, 14h, 1Ch
(V)CMPNLTSS       05h, 0Dh, 15h, 1Dh
(V)CMPNLESS       06h, 0Eh, 16h, 1Eh
(V)CMPORDSS       07h, 0Fh, 17h, 1Fh

Instruction Support
Form     Subset   Feature Flag
CMPSS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCMPSS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                              Opcode           Description
CMPSS xmm1, xmm2/mem32, imm8          F3 0F C2 /r ib   Compares single-precision floating-point values in the low-order 32 bits of xmm1 with corresponding values in xmm2 or mem32. Comparison type is determined by imm8. Writes comparison results to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCMPSS xmm1, xmm2, xmm3/mem32, imm8   C4    RXB.00001        X.src.X.10    C2 /r ib

Related Instructions
(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS
rFLAGS Affected
None
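MXCSR Flags Affected
Modified (M): DE (bit 1), IE (bit 0).
Not affected: MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), PE (bit 5), UE (bit 4), OE (bit 3), ZE (bit 2).
Note: M indicates a flag that may be modified (set or cleared). All other flags are not affected.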


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real   Virt   Prot   Cause of Exception
Invalid operation, IE      S      S      X      A source operand was an SNaN value.
                           S      S      X      Undefined operation.
Denormalized operand, DE   S      S      X      A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


COMISD
VCOMISD


Compare Ordered
Scalar Double-Precision Floating-Point

Compares a double-precision floating-point value in the low-order 64 bits of the first operand with a
double-precision floating-point value in the low-order 64 bits of the second operand and sets
rFLAGS.ZF, PF, and CF to show the result of the comparison:
Comparison               ZF   PF   CF
NaN input                1    1    1
operand 1 > operand 2    0    0    0
operand 1 < operand 2    0    0    1
operand 1 == operand 2   1    0    0

The result is unordered if one or both of the operand values is a NaN. The rFLAGS.OF, AF, and SF
bits are cleared. If an #XF SIMD floating-point exception occurs, the rFLAGS bits are not updated.
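The flag table above maps directly onto the condition codes that software tests after the compare. The following hypothetical Python helpers (`comisd_flags`, `above`, `ordered_equal`) model the flag results. They also show why an equality test must check PF as well: the unordered case sets ZF, just as equality does.

```python
import math


def comisd_flags(op1: float, op2: float) -> tuple[int, int, int]:
    """(ZF, PF, CF) as set by (V)COMISD; OF, AF, and SF are always cleared."""
    if math.isnan(op1) or math.isnan(op2):
        return (1, 1, 1)  # unordered
    if op1 > op2:
        return (0, 0, 0)
    if op1 < op2:
        return (0, 0, 1)
    return (1, 0, 0)  # equal (note that -0.0 == +0.0)


def above(flags: tuple[int, int, int]) -> bool:
    """JA/SETA condition (CF = 0 and ZF = 0); FALSE for unordered inputs."""
    zf, _pf, cf = flags
    return cf == 0 and zf == 0


def ordered_equal(flags: tuple[int, int, int]) -> bool:
    """ZF = 1 is also set for NaN inputs, so PF = 0 must be checked too."""
    zf, pf, _cf = flags
    return zf == 1 and pf == 0
```

Testing ZF alone (as a plain JE would) treats a NaN operand as equal. Combining it with PF = 0 excludes the unordered case.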
There are legacy and extended forms of the instruction:
COMISD

The first source operand is an XMM register and the second source operand is an XMM register or a
64-bit memory location.
VCOMISD

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location.
Instruction Support
Form      Subset   Feature Flag
COMISD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCOMISD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode        Description
COMISD xmm1, xmm2/mem64    66 0F 2F /r   Compares double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem64 and sets rFLAGS.

Mnemonic                   Encoding
                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCOMISD xmm1, xmm2/mem64   C4    RXB.00001        X.src.X.01    2F /r

Related Instructions
(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISS, (V)UCOMISD, (V)UCOMISS


rFLAGS Affected
Cleared (0): OF (bit 11), SF (bit 7), AF (bit 4).
Modified (M): ZF (bit 6), PF (bit 2), CF (bit 0).
Not affected: ID (bit 21), VIP (bit 20), VIF (bit 19), AC (bit 18), VM (bit 17), RF (bit 16), NT (bit 14), IOPL (bits 13:12), DF (bit 10), IF (bit 9), TF (bit 8).
Note: M indicates a flag that may be modified (set or cleared). Bits 31:22, 15, 5, 3, and 1 are reserved. For #XF, rFLAGS bits are not updated.

MXCSR Flags Affected
Modified (M): DE (bit 1), IE (bit 0).
Not affected: MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), PE (bit 5), UE (bit 4), OE (bit 3), ZE (bit 2).
Note: M indicates a flag that may be modified (set or cleared). All other flags are not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.


COMISS
VCOMISS


Compare
Ordered Scalar Single-Precision Floating-Point

Compares a single-precision floating-point value in the low-order 32 bits of the first operand with a
single-precision floating-point value in the low-order 32 bits of the second operand and sets
rFLAGS.ZF, PF, and CF to show the result of the comparison:
Comparison               ZF   PF   CF
NaN input                1    1    1
operand 1 > operand 2    0    0    0
operand 1 < operand 2    0    0    1
operand 1 == operand 2   1    0    0

The result is unordered if one or both of the operand values is a NaN. The rFLAGS.OF, AF, and SF
bits are cleared. If an #XF SIMD floating-point exception occurs, the rFLAGS bits are not updated.
There are legacy and extended forms of the instruction:
COMISS

The first source operand is an XMM register and the second source operand is an XMM register or a
32-bit memory location.
VCOMISS

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location.
Instruction Support
Form      Subset   Feature Flag
COMISS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCOMISS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode     Description
COMISS xmm1, xmm2/mem32    0F 2F /r   Compares single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem32 and sets rFLAGS.

Mnemonic                   Encoding
                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCOMISS xmm1, xmm2/mem32   C4    RXB.00001        X.src.X.00    2F /r

Related Instructions
(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)UCOMISD, (V)UCOMISS


rFLAGS Affected
Cleared (0): OF (bit 11), SF (bit 7), AF (bit 4).
Modified (M): ZF (bit 6), PF (bit 2), CF (bit 0).
Not affected: ID (bit 21), VIP (bit 20), VIF (bit 19), AC (bit 18), VM (bit 17), RF (bit 16), NT (bit 14), IOPL (bits 13:12), DF (bit 10), IF (bit 9), TF (bit 8).
Note: M indicates a flag that may be modified (set or cleared). Bits 31:22, 15, 5, 3, and 1 are reserved. For #XF, rFLAGS bits are not updated.

MXCSR Flags Affected
Modified (M): DE (bit 1), IE (bit 0).
Not affected: MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), PE (bit 5), UE (bit 4), OE (bit 3), ZE (bit 2).
Note: M indicates a flag that may be modified (set or cleared). All other flags are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real   Virt   Prot   Cause of Exception
Invalid operation, IE      S      S      X      A source operand was an SNaN value.
                           S      S      X      Undefined operation.
Denormalized operand, DE   S      S      X      A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


CVTDQ2PD
VCVTDQ2PD

Convert Packed Doubleword Integers
to Packed Double-Precision Floating-Point

Converts packed 32-bit signed integer values to packed double-precision floating-point values and
writes the converted values to the destination.
There are legacy and extended forms of the instruction:
CVTDQ2PD

Converts two packed 32-bit signed integer values in the low-order 64 bits of an XMM register or in a
64-bit memory location to two packed double-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VCVTDQ2PD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts two packed 32-bit signed integer values in the low-order 64 bits of an XMM register or in a
64-bit memory location to two packed double-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Converts four packed 32-bit signed integer values in an XMM register or a 128-bit memory location
to four packed double-precision floating-point values and writes the converted values to a YMM register.
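This conversion never rounds. A double-precision value has a 53-bit significand, which can represent every 32-bit signed integer exactly, consistent with the instruction affecting no MXCSR flags. The following Python sketch illustrates this; the name `cvtdq2pd` is hypothetical, and Python floats are assumed to be IEEE 754 doubles.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def cvtdq2pd(ints: list[int]) -> list[float]:
    """Convert 2 (XMM) or 4 (YMM) signed doublewords to doubles."""
    for i in ints:
        if not INT32_MIN <= i <= INT32_MAX:
            raise ValueError(f"{i} is not a signed doubleword")
    # A double has a 53-bit significand, so every 32-bit integer converts
    # exactly: there is no rounding and no precision exception.
    return [float(i) for i in ints]
```

Converting back with `int()` recovers every input unchanged, including the extremes -2^31 and 2^31 - 1.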
Instruction Support
Form        Subset   Feature Flag
CVTDQ2PD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTDQ2PD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Opcode        Description
CVTDQ2PD xmm1, xmm2/mem64     F3 0F E6 /r   Converts packed doubleword signed integers in xmm2 or mem64 to double-precision floating-point values in xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTDQ2PD xmm1, xmm2/mem64    C4    RXB.00001        X.1111.0.10   E6 /r
VCVTDQ2PD ymm1, xmm2/mem128   C4    RXB.00001        X.1111.1.10   E6 /r


Related Instructions
(V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTPD2DQ,
(V)CVTTSD2SI
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


CVTDQ2PS
VCVTDQ2PS

Convert Packed Doubleword Integers
to Packed Single-Precision Floating-Point

Converts packed 32-bit signed integer values to packed single-precision floating-point values and
writes the converted values to the destination. When the result is an inexact value, it is rounded as
specified by MXCSR.RC.
There are legacy and extended forms of the instruction:
CVTDQ2PS

Converts four packed 32-bit signed integer values in an XMM register or a 128-bit memory location
to four packed single-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VCVTDQ2PS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts four packed 32-bit signed integer values in an XMM register or a 128-bit memory location
to four packed single-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Converts eight packed 32-bit signed integer values in a YMM register or a 256-bit memory location
to eight packed single-precision floating-point values and writes the converted values to a YMM register.
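Unlike CVTDQ2PD, this conversion can be inexact, because single precision has only a 24-bit significand. The following hypothetical Python sketch (`cvtdq2ps_lane` is an illustrative name) converts one lane and reports whether the precision exception (PE) condition would occur. It assumes these conditions:
- The rounding mode is the default, MXCSR.RC = 00b (round to nearest even). Other RC settings are not modeled.
- The host's double-to-float narrowing inside `struct.pack` rounds to nearest even, which is the default on IEEE 754 platforms.

```python
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def cvtdq2ps_lane(i: int) -> tuple[float, bool]:
    """Convert one signed doubleword to single precision.

    Returns (value, inexact). Assumes MXCSR.RC = 00b (round to nearest even).
    """
    if not INT32_MIN <= i <= INT32_MAX:
        raise ValueError(f"{i} is not a signed doubleword")
    # int -> double is exact for 32-bit values, so the only rounding step is
    # the double -> float32 narrowing performed by struct.pack("<f", ...).
    f32 = struct.unpack("<f", struct.pack("<f", float(i)))[0]
    return f32, int(f32) != i  # an inexact result would set MXCSR.PE
```

For example, 2^24 + 1 (16777217) lies exactly halfway between two representable values and rounds to the even one, 16777216, with PE reported.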
Instruction Support
Form        Subset   Feature Flag
CVTDQ2PS    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTDQ2PS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Opcode     Description
CVTDQ2PS xmm1, xmm2/mem128    0F 5B /r   Converts packed doubleword integer values in xmm2 or mem128 to packed single-precision floating-point values in xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTDQ2PS xmm1, xmm2/mem128   C4    RXB.00001        X.1111.0.00   5B /r
VCVTDQ2PS ymm1, ymm2/mem256   C4    RXB.00001        X.1111.1.00   5B /r

Related Instructions
(V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTPS2DQ, (V)CVTTSS2SI


rFLAGS Affected
None
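MXCSR Flags Affected
Modified (M): PE (bit 5).
Not affected: MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), UE (bit 4), OE (bit 3), ZE (bit 2), DE (bit 1), IE (bit 0).
Note: M indicates a flag that may be modified (set or cleared). All other flags are not affected.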

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception       Real   Virt   Prot   Cause of Exception
Precision, PE   S      S      X      A result could not be represented exactly in the destination format.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


CVTPD2DQ
VCVTPD2DQ


Convert Packed Double-Precision Floating-Point
to Packed Doubleword Integer

Converts packed double-precision floating-point values to packed signed doubleword integers and
writes the converted values to the destination.
When the result is an inexact value, it is rounded as specified by MXCSR.RC. When the floating-point value is a NaN or infinity, or the result of the conversion lies outside the range of a signed doubleword (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value (8000_0000h)
when the invalid-operation exception (IE) is masked.
There are legacy and extended forms of the instruction:
CVTPD2DQ

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory
location to two packed signed doubleword integers and writes the converted values to the two low-order doublewords of the destination XMM register. Bits [127:64] of the destination are cleared. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VCVTPD2DQ

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory
location to two signed doubleword values and writes the converted values to the lower two doubleword elements of the destination XMM register. Bits [127:64] of the destination are cleared. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Converts four packed double-precision floating-point values in a YMM register or a 256-bit memory
location to four signed doubleword values and writes the converted values to an XMM register. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
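As a hedged illustration, the legacy 128-bit form corresponds to the SSE2 intrinsic `_mm_cvtpd_epi32` declared in `<emmintrin.h>`. The sketch below assumes an x86-64 target, GCC or Clang, and the default MXCSR state (RC = round to nearest even, all exceptions masked); the helper name `cvtpd2dq_lanes` is hypothetical. It stores all four doubleword lanes so that the rounding, the integer indefinite value, and the cleared bits [127:64] can all be observed.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: converts {lo, hi} with CVTPD2DQ and stores all four
   32-bit lanes of the result so the cleared bits [127:64] are visible. */
static void cvtpd2dq_lanes(double lo, double hi, int32_t out[4])
{
    __m128d src = _mm_set_pd(hi, lo);      /* _mm_set_pd takes the high lane first */
    __m128i dst = _mm_cvtpd_epi32(src);    /* rounds as specified by MXCSR.RC */
    _mm_storeu_si128((__m128i *)out, dst); /* out[2] and out[3] are always 0 */
}
```

Under these assumptions, converting {2.5, -1.7} yields {2, -2, 0, 0}, and a NaN or 3.0e9 input yields 8000_0000h (INT32_MIN) in the corresponding lane.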
Instruction Support
Form        Subset  Feature Flag
CVTPD2DQ    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPD2DQ   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
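A minimal feature-detection sketch for the two CPUID bits in the table above, assuming GCC or Clang on x86 (both ship `<cpuid.h>` with `__get_cpuid`); the function names are hypothetical. The AVX check also tests OSXSAVE and reads XFEATURE_ENABLED_MASK (XCR0) with XGETBV, because the exception table for this instruction lists CR4.OSXSAVE = 0 and XFEATURE_ENABLED_MASK[2:1] != 11b as #UD conditions for the VEX form.

```c
#include <cpuid.h> /* GCC/Clang: __get_cpuid */
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn0000_0001_EDX[SSE2] (bit 26): required for CVTPD2DQ. */
static bool has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (edx >> 26) & 1u;
}

/* CPUID Fn0000_0001_ECX[AVX] (bit 28) is necessary but not sufficient for
   VCVTPD2DQ: the OS must also have set CR4.OSXSAVE (reported in ECX bit 27)
   and enabled XMM and YMM state, i.e. XFEATURE_ENABLED_MASK[2:1] = 11b. */
static bool avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    if (!((ecx >> 27) & 1u) || !((ecx >> 28) & 1u))
        return false;
    uint32_t xcr0_lo, xcr0_hi;
    /* XGETBV with ECX = 0 reads XCR0; it is only legal once OSXSAVE is set. */
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    return (xcr0_lo & 0x6u) == 0x6u;
}
```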
Instruction Encoding
Mnemonic                      Opcode        Description
CVTPD2DQ xmm1, xmm2/mem128    F2 0F E6 /r   Converts two packed double-precision floating-point values in xmm2 or mem128 to packed doubleword integers in xmm1.

Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTPD2DQ xmm1, xmm2/mem128    C4   RXB.00001       X.1111.0.11  E6 /r
VCVTPD2DQ xmm1, ymm2/mem256    C4   RXB.00001       X.1111.1.11  E6 /r


Related Instructions
(V)CVTDQ2PD, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTPD2DQ,
(V)CVTTSD2SI
rFLAGS Affected
None
MXCSR Flags Affected
MXCSR fields (bit): MM (17), FZ (15), RC (14:13), PM (12), UM (11), OM (10), ZM (9), DM (8), IM (7), DAZ (6), PE (5), UE (4), OE (3), ZE (2), DE (1), IE (0)
Marked M: PE, IE
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid operation, IE     S     S     X     A source operand was an SNaN value.
Invalid operation, IE     S     S     X     Undefined operation.
Precision, PE             S     S     X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


CVTPD2PS
VCVTPD2PS


Convert Packed Double-Precision Floating-Point
to Packed Single-Precision Floating-Point

Converts packed double-precision floating-point values to packed single-precision floating-point values and writes the converted values to the low-order doubleword elements of the destination. When
the result is an inexact value, it is rounded as specified by MXCSR.RC.
There are legacy and extended forms of the instruction:
CVTPD2PS

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory
location to two packed single-precision floating-point values and writes the converted values to an
XMM register. Bits [127:64] of the destination are cleared. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.
VCVTPD2PS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory
location to two packed single-precision floating-point values and writes the converted values to an
XMM register. Bits [127:64] of the destination are cleared. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
YMM Encoding

Converts four packed double-precision floating-point values in a YMM register or a 256-bit memory
location to four packed single-precision floating-point values and writes the converted values to an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
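For the legacy form, the SSE2 intrinsic `_mm_cvtpd_ps` maps to CVTPD2PS. In this sketch (assuming x86-64, GCC or Clang, and the default MXCSR state with round to nearest and all exceptions masked; `cvtpd2ps_lanes` is a hypothetical name), an input too large for single precision overflows to infinity because OE is masked, and the two upper lanes read back as zero.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: narrows {lo, hi} with CVTPD2PS and stores all four
   single-precision lanes; the instruction clears lanes 2 and 3. */
static void cvtpd2ps_lanes(double lo, double hi, float out[4])
{
    __m128 dst = _mm_cvtpd_ps(_mm_set_pd(hi, lo)); /* rounds per MXCSR.RC */
    _mm_storeu_ps(out, dst);
}
```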
Instruction Support
Form        Subset  Feature Flag
CVTPD2PS    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPD2PS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Opcode        Description
CVTPD2PS xmm1, xmm2/mem128    66 0F 5A /r   Converts packed double-precision floating-point values in xmm2 or mem128 to packed single-precision floating-point values in xmm1.

Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTPD2PS xmm1, xmm2/mem128    C4   RXB.00001       X.1111.0.01  5A /r
VCVTPD2PS xmm1, ymm2/mem256    C4   RXB.00001       X.1111.1.01  5A /r


Related Instructions
(V)CVTPS2PD, (V)CVTSD2SS, (V)CVTSS2SD
rFLAGS Affected
None
MXCSR Flags Affected
MXCSR fields (bit): MM (17), FZ (15), RC (14:13), PM (12), UM (11), OM (10), ZM (9), DM (8), IM (7), DAZ (6), PE (5), UE (4), OE (3), ZE (2), DE (1), IE (0)
Marked M: PE, UE, OE, DE, IE
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid operation, IE     S     S     X     A source operand was an SNaN value.
Invalid operation, IE     S     S     X     Undefined operation.
Denormalized operand, DE  S     S     X     A source operand was a denormal value.
Overflow, OE              S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE             S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE             S     S     X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


CVTPS2DQ
VCVTPS2DQ


Convert Packed Single-Precision Floating-Point
to Packed Doubleword Integers

Converts packed single-precision floating-point values to packed signed doubleword integer values
and writes the converted values to the destination.
When the result is an inexact value, it is rounded as specified by MXCSR.RC. When the floating-point value is a NaN or infinity, or the result of the conversion lies outside the range of a signed doubleword (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value (8000_0000h)
when the invalid-operation exception (IE) is masked.
There are legacy and extended forms of the instruction:
CVTPS2DQ

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory
location to four packed signed doubleword integer values and writes the converted values to an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VCVTPS2DQ

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory
location to four packed signed doubleword integer values and writes the converted values to an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Converts eight packed single-precision floating-point values in a YMM register or a 256-bit memory
location to eight packed signed doubleword integer values and writes the converted values to a YMM
register.
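The effect of MXCSR.RC is easiest to see next to the truncating variant CVTTPS2DQ, listed under Related Instructions. The sketch assumes x86-64, GCC or Clang, and the default MXCSR state (round to nearest even, exceptions masked). `_mm_cvtps_epi32` and `_mm_cvttps_epi32` are the standard SSE2 intrinsics; `cvtps2dq_vs_truncate` is a hypothetical helper.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: converts the same four floats twice, once with
   CVTPS2DQ (rounding per MXCSR.RC) and once with CVTTPS2DQ (truncation). */
static void cvtps2dq_vs_truncate(const float in[4], int32_t rounded[4],
                                 int32_t truncated[4])
{
    __m128 src = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)rounded, _mm_cvtps_epi32(src));
    _mm_storeu_si128((__m128i *)truncated, _mm_cvttps_epi32(src));
}
```

Under those assumptions {0.5, 1.5, 2.5, -2.7} rounds to {0, 2, 2, -3} but truncates to {0, 1, 2, -2}; NaN, infinity, and values outside the doubleword range produce 8000_0000h from both instructions.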
Instruction Support
Form        Subset  Feature Flag
CVTPS2DQ    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPS2DQ   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Opcode        Description
CVTPS2DQ xmm1, xmm2/mem128    66 0F 5B /r   Converts four packed single-precision floating-point values in xmm2 or mem128 to four packed doubleword integers in xmm1.

Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTPS2DQ xmm1, xmm2/mem128    C4   RXB.00001       X.1111.0.01  5B /r
VCVTPS2DQ ymm1, ymm2/mem256    C4   RXB.00001       X.1111.1.01  5B /r


Related Instructions
(V)CVTDQ2PS, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTPS2DQ, (V)CVTTSS2SI
rFLAGS Affected
None
MXCSR Flags Affected
MXCSR fields (bit): MM (17), FZ (15), RC (14:13), PM (12), UM (11), OM (10), ZM (9), DM (8), IM (7), DAZ (6), PE (5), UE (4), OE (3), ZE (2), DE (1), IE (0)
Marked M: PE, IE
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid operation, IE     S     S     X     A source operand was an SNaN value.
Invalid operation, IE     S     S     X     Undefined operation.
Precision, PE             S     S     X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


CVTPS2PD
VCVTPS2PD


Convert Packed Single-Precision Floating-Point
to Packed Double-Precision Floating-Point

Converts packed single-precision floating-point values to packed double-precision floating-point values and writes the converted values to the destination.
There are legacy and extended forms of the instruction:
CVTPS2PD

Converts two packed single-precision floating-point values in the two low-order doubleword elements of an XMM register or a 64-bit memory location to two double-precision floating-point values
and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VCVTPS2PD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts two packed single-precision floating-point values in the two low-order doubleword elements of an XMM register or a 64-bit memory location to two double-precision floating-point values
and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory
location to four double-precision floating-point values and writes the converted values to a YMM
register.
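Widening is always exact, because every single-precision value is representable in double precision, so no rounding control applies here. A short sketch using the SSE2 intrinsic `_mm_cvtps_pd` (assuming x86-64 with GCC or Clang; `cvtps2pd_low` is a hypothetical name) shows that the 128-bit form reads only the two low-order single-precision elements.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: widens the two low-order floats f0 and f1 with
   CVTPS2PD; f2 and f3 are ignored by the 128-bit form. */
static void cvtps2pd_low(float f0, float f1, float f2, float f3, double out[2])
{
    __m128 src = _mm_set_ps(f3, f2, f1, f0); /* _mm_set_ps takes the high lane first */
    _mm_storeu_pd(out, _mm_cvtps_pd(src));
}
```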
Instruction Support
Form        Subset  Feature Flag
CVTPS2PD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPS2PD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode     Description
CVTPS2PD xmm1, xmm2/mem64    0F 5A /r   Converts packed single-precision floating-point values in xmm2 or mem64 to packed double-precision floating-point values in xmm1.

Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTPS2PD xmm1, xmm2/mem64     C4   RXB.00001       X.1111.0.00  5A /r
VCVTPS2PD ymm1, xmm2/mem128    C4   RXB.00001       X.1111.1.00  5A /r

Related Instructions
(V)CVTPD2PS, (V)CVTSD2SS, (V)CVTSS2SD


rFLAGS Affected
None
MXCSR Flags Affected
MXCSR fields (bit): MM (17), FZ (15), RC (14:13), PM (12), UM (11), OM (10), ZM (9), DM (8), IM (7), DAZ (6), PE (5), UE (4), OE (3), ZE (2), DE (1), IE (0)
Marked M: DE, IE
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid operation, IE     S     S     X     A source operand was an SNaN value.
Invalid operation, IE     S     S     X     Undefined operation.
Denormalized operand, DE  S     S     X     A source operand was a denormal value.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


CVTSD2SI
VCVTSD2SI


Convert Scalar Double-Precision Floating-Point
to Signed Doubleword or Quadword Integer

Converts a scalar double-precision floating-point value to a 32-bit or 64-bit signed integer value and
writes the converted value to a general-purpose register.
When the result is an inexact value, it is rounded as specified by MXCSR.RC. When the floating-point value is a NaN or infinity, or the result of the conversion lies outside the range of a signed doubleword (-2^31 to +2^31 - 1) or quadword (-2^63 to +2^63 - 1), the instruction returns the indefinite
integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) when the
invalid-operation exception (IE) is masked.
There are legacy and extended forms of the instruction:
CVTSD2SI

The legacy form has two encodings:
• When REX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits
of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted
value to a 32-bit general-purpose register.
• When REX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits
of an XMM register or a 64-bit memory location to a 64-bit signed integer and writes the
converted value to a 64-bit general-purpose register.
VCVTSD2SI

The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits
of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted
value to a 32-bit general-purpose register.
• When VEX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits
of an XMM register or a 64-bit memory location to a 64-bit signed integer and writes the
converted value to a 64-bit general-purpose register.
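The two legacy encodings correspond to the intrinsics `_mm_cvtsd_si32` (REX.W = 0) and `_mm_cvtsd_si64` (REX.W = 1, 64-bit mode only). The wrappers below are hypothetical and assume x86-64, GCC or Clang, and the default MXCSR state (round to nearest even, IE masked), so an out-of-range or NaN input returns the indefinite value instead of raising #XF.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical wrapper for the REX.W = 0 form: double -> 32-bit integer. */
static int32_t cvtsd2si32(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x)); /* reads the low 64 bits of the XMM operand */
}

/* Hypothetical wrapper for the REX.W = 1 form (64-bit mode only). */
static int64_t cvtsd2si64(double x)
{
    return _mm_cvtsd_si64(_mm_set_sd(x));
}
```

For example, -2.5 converts to -2 and 3.5 to 4 (ties go to the even integer), while 3.0e9 gives 8000_0000h in the 32-bit form but converts exactly in the 64-bit form.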
Instruction Support
Form        Subset  Feature Flag
CVTSD2SI    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSD2SI   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode             Description
CVTSD2SI reg32, xmm1/mem64    F2 (W0) 0F 2D /r   Converts a scalar double-precision floating-point value in xmm1 or mem64 to a doubleword integer in reg32.
CVTSD2SI reg64, xmm1/mem64    F2 (W1) 0F 2D /r   Converts a scalar double-precision floating-point value in xmm1 or mem64 to a quadword integer in reg64.
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSD2SI reg32, xmm2/mem64    C4   RXB.00001       0.1111.X.11  2D /r
VCVTSD2SI reg64, xmm2/mem64    C4   RXB.00001       1.1111.X.11  2D /r

Related Instructions
(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSI2SD, (V)CVTTPD2DQ,
(V)CVTTSD2SI
rFLAGS Affected
None
MXCSR Flags Affected
MXCSR fields (bit): MM (17), FZ (15), RC (14:13), PM (12), UM (11), OM (10), ZM (9), DM (8), IM (7), DAZ (6), PE (5), UE (4), OE (3), ZE (2), DE (1), IE (0)
Marked M: PE, IE
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid operation, IE     S     S     X     A source operand was an SNaN value.
Invalid operation, IE     S     S     X     Undefined operation.
Precision, PE             S     S     X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


CVTSD2SS
VCVTSD2SS


Convert Scalar Double-Precision Floating-Point
to Scalar Single-Precision Floating-Point

Converts a scalar double-precision floating-point value to a scalar single-precision floating-point
value and writes the converted value to the low-order 32 bits of the destination. When the result is an
inexact value, it is rounded as specified by MXCSR.RC.
There are legacy and extended forms of the instruction:
CVTSD2SS

Converts a scalar double-precision floating-point value in the low-order 64 bits of a source
XMM register or a 64-bit memory location to a scalar single-precision floating-point value and writes
the converted value to the low-order 32 bits of a destination XMM register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VCVTSD2SS

The extended form of the instruction has a 128-bit encoding only.
Converts a scalar double-precision floating-point value in the low-order 64 bits of a source XMM
register or a 64-bit memory location to a scalar single-precision floating-point value and writes the
converted value to the low-order 32 bits of the destination XMM register. Bits [127:32] of the destination are copied from the first source XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
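The intrinsic `_mm_cvtsd_ss(a, b)` follows the same merge rule: lane 0 of the result is the converted low double of `b`, and lanes 1 to 3 come from `a`. With the legacy encoding the compiler typically places `a` in the destination register first, which matches the "not affected" behavior; with AVX enabled it can emit the three-operand VEX form directly. This is a sketch assuming x86-64 and GCC or Clang with default MXCSR settings; `cvtsd2ss_merge` is a hypothetical name.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: lane 0 receives the narrowed low double of src2;
   lanes 1-3 are copied from src1, as in VCVTSD2SS xmm1, xmm2, xmm3. */
static void cvtsd2ss_merge(const float src1[4], double src2, float out[4])
{
    __m128 r = _mm_cvtsd_ss(_mm_loadu_ps(src1), _mm_set_sd(src2));
    _mm_storeu_ps(out, r);
}
```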
Instruction Support
Form        Subset  Feature Flag
CVTSD2SS    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSD2SS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode        Description
CVTSD2SS xmm1, xmm2/mem64    F2 0F 5A /r   Converts a scalar double-precision floating-point value in xmm2 or mem64 to a scalar single-precision floating-point value in xmm1.

Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSD2SS xmm1, xmm2, xmm3/mem64    C4   RXB.00001       X.src.X.11   5A /r
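The RXB.map_select and W.vvvv.L.pp columns describe the two bytes of the three-byte VEX prefix that follow C4h. A small encoder sketch in plain C (the function names are hypothetical) makes the bit layout explicit, including the inverted register number that `src` denotes in the vvvv field.

```c
#include <stdint.h>

/* First byte after C4h: the stored (already inverted) R, X, B bits
   followed by the 5-bit map_select field (00001b selects the 0F map). */
static uint8_t vex3_byte1(unsigned rxb_stored, unsigned map_select)
{
    return (uint8_t)(((rxb_stored & 0x7u) << 5) | (map_select & 0x1Fu));
}

/* Second byte after C4h: W, vvvv, L, pp. vvvv holds the ones' complement of
   the register number named by "src", so an unused field (register 0)
   reads 1111b. pp = 11b implies F2; 01b = 66, 10b = F3, 00b = none. */
static uint8_t vex3_byte2(unsigned w, unsigned src_reg, unsigned l, unsigned pp)
{
    unsigned vvvv = ~src_reg & 0xFu;
    return (uint8_t)(((w & 1u) << 7) | (vvvv << 3) | ((l & 1u) << 2) | (pp & 3u));
}
```

Assuming no extended registers, so that the stored R, X, and B bits are all 1, VCVTSD2SS xmm1, xmm2, xmm3 encodes as C4 E1 6B 5A CB: vex3_byte1(7, 1) gives E1h, vex3_byte2(0, 2, 0, 3) gives 6Bh (W and L are ignored for this instruction and set to 0 here), and the ModRM byte CBh selects xmm1 (reg) and xmm3 (r/m).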

Related Instructions
(V)CVTPD2PS, (V)CVTPS2PD, (V)CVTSS2SD
rFLAGS Affected
None


MXCSR Flags Affected
MXCSR fields (bit): MM (17), FZ (15), RC (14:13), PM (12), UM (11), OM (10), ZM (9), DM (8), IM (7), DAZ (6), PE (5), UE (4), OE (3), ZE (2), DE (1), IE (0)
Marked M: PE, UE, OE, DE, IE
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid operation, IE     S     S     X     A source operand was an SNaN value.
Invalid operation, IE     S     S     X     Undefined operation.
Denormalized operand, DE  S     S     X     A source operand was a denormal value.
Overflow, OE              S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE             S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE             S     S     X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


CVTSI2SD
VCVTSI2SD


Convert Signed Doubleword or Quadword Integer
to Scalar Double-Precision Floating-Point

Converts a signed integer value to a double-precision floating-point value and writes the converted
value to a destination register. When the result of the conversion is an inexact value, the value is
rounded as specified by MXCSR.RC.
There are legacy and extended forms of the instruction:
CVTSI2SD

The legacy form has two encodings:
• When REX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose register or a 32-bit memory location to a double-precision floating-point value and writes
the converted value to the low-order 64 bits of an XMM register. Bits [127:64] of the destination
XMM register and bits [255:128] of the corresponding YMM register are not affected.
• When REX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose register or a 64-bit memory location to a double-precision floating-point value and
writes the converted value to the low-order 64 bits of an XMM register. Bits [127:64] of the
destination XMM register and bits [255:128] of the corresponding YMM register are not affected.

VCVTSI2SD

The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose register or a 32-bit memory location to a double-precision floating-point value and writes
the converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the
first source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
• When VEX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose
register or a 64-bit memory location to a double-precision floating-point value and writes the
converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the first
source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
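Every 32-bit integer is exactly representable as a double, so only the quadword form (REX.W = 1 or VEX.W = 1) can produce an inexact result. The sketch uses the intrinsic `_mm_cvtsi64_sd`, available only on x86-64, and assumes GCC or Clang with the default round-to-nearest-even mode; `cvtsi2sd_merge` is a hypothetical helper that also shows bits [127:64] being taken from the first operand.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (x86-64 only): lane 0 receives the converted quadword,
   lane 1 keeps `upper`, mirroring how bits [127:64] are preserved. */
static void cvtsi2sd_merge(double upper, int64_t v, double out[2])
{
    __m128d r = _mm_cvtsi64_sd(_mm_set_pd(upper, 0.0), v);
    _mm_storeu_pd(out, r);
}
```

With those assumptions, 2^53 + 1 lies exactly halfway between two representable doubles and rounds to the neighbor with an even significand, 2^53 = 9007199254740992.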
Instruction Support
Form        Subset  Feature Flag
CVTSI2SD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSI2SD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode             Description
CVTSI2SD xmm1, reg32/mem32    F2 (W0) 0F 2A /r   Converts a doubleword integer in reg32 or mem32 to a double-precision floating-point value in xmm1.
CVTSI2SD xmm1, reg64/mem64    F2 (W1) 0F 2A /r   Converts a quadword integer in reg64 or mem64 to a double-precision floating-point value in xmm1.
Mnemonic                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSI2SD xmm1, xmm2, reg32/mem32    C4   RXB.00001       0.src.X.11   2A /r
VCVTSI2SD xmm1, xmm2, reg64/mem64    C4   RXB.00001       1.src.X.11   2A /r

Related Instructions
(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTTPD2DQ,
(V)CVTTSD2SI
rFLAGS Affected
None
MXCSR Flags Affected
MXCSR fields (bit): MM (17), FZ (15), RC (14:13), PM (12), UM (11), OM (10), ZM (9), DM (8), IM (7), DAZ (6), PE (5), UE (4), OE (3), ZE (2), DE (1), IE (0)
Marked M: PE
Note: M indicates a flag that may be modified (set or cleared). Flags not marked M are not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Precision, PE             S     S     X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


CVTSI2SS
VCVTSI2SS


Convert Signed Doubleword or Quadword Integer
to Scalar Single-Precision Floating-Point

Converts a signed integer value to a single-precision floating-point value and writes the converted
value to an XMM register. When the result of the conversion is an inexact value, the value is rounded
as specified by MXCSR.RC.
There are legacy and extended forms of the instruction:
CVTSI2SS

The legacy form has two encodings:
• When REX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose register or a 32-bit memory location to a single-precision floating-point value and writes
the converted value to the low-order 32 bits of an XMM register. Bits [127:32] of the destination
XMM register and bits [255:128] of the corresponding YMM register are not affected.
• When REX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose register or a 64-bit memory location to a single-precision floating-point value and writes
the converted value to the low-order 32 bits of an XMM register. Bits [127:32] of the destination
XMM register and bits [255:128] of the corresponding YMM register are not affected.

VCVTSI2SS

The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose register or a 32-bit memory location to a single-precision floating-point value and writes
the converted value to the low-order 32 bits of the destination XMM register. Bits [127:32] of the
first source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
• When VEX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose
register or a 64-bit memory location to a single-precision floating-point value and writes the
converted value to the low-order 32 bits of the destination XMM register. Bits [127:32] of the first
source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
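The merge behavior described above can be observed from C through the SSE intrinsic _mm_cvtsi32_ss, which compilers commonly lower to CVTSI2SS (or to VCVTSI2SS when AVX code generation is enabled). The sketch below assumes GCC or Clang targeting x86-64; the helper names int32_into_low_lane and store_lanes are hypothetical. It is meant to show that only the low 32 bits receive the converted value, not to serve as a definitive test of the instruction.

```c
#include <xmmintrin.h>  /* SSE: _mm_cvtsi32_ss, _mm_set_ps, _mm_storeu_ps */
#include <stdint.h>

/* Converts x to single precision and places it in lane 0 (bits [31:0]).
 * Lanes 1..3 (bits [127:32]) are taken unchanged from `upper`, which plays
 * the role of the destination (legacy form) or first source (VEX form). */
static __m128 int32_into_low_lane(__m128 upper, int32_t x) {
    return _mm_cvtsi32_ss(upper, x);
}

/* Hypothetical helper: copies the four lanes out for inspection. */
static void store_lanes(__m128 v, float out[4]) {
    _mm_storeu_ps(out, v);
}
```

For example, converting 16777217 (2^24 + 1), which has no exact single-precision representation, into a vector holding {1, 2, 3, 4} should leave 16777216.0f in lane 0 under the default MXCSR.RC (round to nearest even) and keep 2, 3, and 4 in the upper lanes; the inexact result also sets MXCSR.PE.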
Instruction Support
Form        Subset  Feature Flag
CVTSI2SS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCVTSI2SS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
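The Instruction Support table and the #UD causes listed for these instructions amount to a few runtime checks: the CPUID feature bit for the subset, and for the AVX forms also CPUID Fn0000_0001_ECX[OSXSAVE] (bit 27) and XFEATURE_ENABLED_MASK (XCR0) bits [2:1] = 11b. The following is a minimal sketch of those checks, assuming GCC or Clang on x86; the cpuid.h header and its __get_cpuid helper are compiler-provided, XCR0 is read with inline XGETBV, and the names has_sse2 and avx_usable are hypothetical.

```c
#include <cpuid.h>   /* GCC/Clang helper: __get_cpuid */
#include <stdint.h>

/* Returns 1 if CPUID Fn0000_0001_EDX[SSE2] (bit 26) is set, else 0. */
static int has_sse2(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
    return (int)((edx >> 26) & 1u);
}

/* Returns 1 if AVX instructions are usable: CPUID reports AVX (ECX bit 28)
 * and OSXSAVE (ECX bit 27), and XCR0 bits [2:1] are 11b (SSE and AVX state
 * enabled by the operating system). */
static int avx_usable(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
    if (!((ecx >> 28) & 1u) || !((ecx >> 27) & 1u)) return 0;
    uint32_t lo, hi;
    /* XGETBV with ECX = 0 reads XCR0 into EDX:EAX; it is only executed
     * after OSXSAVE has been confirmed, since it would otherwise fault. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    return (lo & 0x6u) == 0x6u;
}
```

A program could call avx_usable() once at startup and select between SSE and AVX code paths based on the result.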


Instruction Encoding
Mnemonic                     Opcode            Description
CVTSI2SS xmm1, reg32/mem32   F3 (W0) 0F 2A /r  Converts a doubleword integer in reg32 or mem32 to a single-precision floating-point value in xmm1.
CVTSI2SS xmm1, reg64/mem64   F3 (W1) 0F 2A /r  Converts a quadword integer in reg64 or mem64 to a single-precision floating-point value in xmm1.

Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSI2SS xmm1, xmm2, reg32/mem32   C4   RXB.00001       0.src.X.10   2A /r
VCVTSI2SS xmm1, xmm2, reg64/mem64   C4   RXB.00001       1.src.X.10   2A /r

Related Instructions
(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSS2SI, (V)CVTTPS2DQ, (V)CVTTSS2SI
rFLAGS Affected
None
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
-    -    -      -    -    -    -    -    -    -    M    -    -    -    -    -
Note: M indicates a flag that may be modified (set or cleared). A dash (-) indicates a flag that is not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Precision, PE
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S

X

A result could not be represented exactly in the destination format.


CVTSS2SD
VCVTSS2SD


Convert Scalar Single-Precision Floating-Point
to Scalar Double-Precision Floating-Point

Converts a scalar single-precision floating-point value to a scalar double-precision floating-point
value and writes the converted value to the low-order 64 bits of the destination.
There are legacy and extended forms of the instruction:
CVTSS2SD

Converts a scalar single-precision floating-point value in the low-order 32 bits of a source XMM register or a 32-bit memory location to a scalar double-precision floating-point value and writes the converted value to the low-order 64 bits of a destination XMM register. Bits [127:64] of the destination
and bits [255:128] of the corresponding YMM register are not affected.
VCVTSS2SD

The extended form of the instruction has a 128-bit encoding only.
Converts a scalar single-precision floating-point value in the low-order 32 bits of the second source
XMM register or a 32-bit memory location to a scalar double-precision floating-point value and writes
the converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the destination are copied from the first source XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
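The operand roles above map onto the SSE2 intrinsic _mm_cvtss_sd(a, b): the low double of the result comes from the low float of b, and the upper double is copied from a. The sketch assumes GCC or Clang on x86-64; the helper name widen_low_float is hypothetical. Because every single-precision value is exactly representable in double precision, the widening itself cannot be inexact, which is consistent with precision (PE) not appearing among this instruction's SIMD floating-point exceptions.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtss_sd, _mm_set_pd, _mm_storeu_pd */

/* Widens the low float of `src` into lane 0 of the result; lane 1
 * (bits [127:64]) is copied from `upper`, mirroring the rule that the
 * upper destination bits come from the first source register. */
static __m128d widen_low_float(__m128d upper, __m128 src) {
    return _mm_cvtss_sd(upper, src);
}
```

Note that widening 0.1f yields the double nearest to the float value 0.1f, not the double nearest to 0.1; the conversion preserves the value exactly rather than recovering decimal precision.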
Instruction Support
Form        Subset  Feature Flag
CVTSS2SD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSS2SD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode       Description
CVTSS2SD xmm1, xmm2/mem32   F3 0F 5A /r  Converts a scalar single-precision floating-point value in xmm2 or mem32 to a scalar double-precision floating-point value in xmm1.

Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSS2SD xmm1, xmm2, xmm3/mem32   C4   RXB.00001       X.src.X.10   5A /r

Related Instructions
(V)CVTPD2PS, (V)CVTPS2PD, (V)CVTSD2SS


MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
-    -    -      -    -    -    -    -    -    -    -    -    -    -    M    M
Note: M indicates a flag that may be modified (set or cleared). A dash (-) indicates a flag that is not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.


CVTSS2SI
VCVTSS2SI


Convert Scalar Single-Precision Floating-Point
to Signed Doubleword or Quadword Integer

Converts a single-precision floating-point value to a signed integer value and writes the converted
value to a general-purpose register.
When the result of the conversion is an inexact value, the value is rounded as specified by
MXCSR.RC. When the floating-point value is a NaN or infinity, or the result of the conversion falls outside
the range of a signed doubleword (–2^31 to +2^31 – 1) or quadword (–2^63 to +2^63 – 1), the
indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers)
is returned when the invalid-operation exception (IE) is masked.
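The rounding and indefinite-value rules above can be checked with the SSE intrinsic _mm_cvtss_si32, which compilers map to CVTSS2SI with a 32-bit destination. This is a sketch assuming GCC or Clang on x86-64 with the default MXCSR (round to nearest even, all exceptions masked); the helper name round_float_to_i32 is hypothetical.

```c
#include <xmmintrin.h>  /* SSE: _mm_cvtss_si32, _mm_set_ss */
#include <stdint.h>

/* Converts using the current MXCSR.RC. With the default round-to-nearest-
 * even mode, halfway cases go to the even integer. NaN, infinity, and
 * out-of-range inputs produce INT32_MIN (8000_0000h) while IE is masked. */
static int32_t round_float_to_i32(float f) {
    return _mm_cvtss_si32(_mm_set_ss(f));
}
```

Under the default rounding mode, 2.5f converts to 2 and 3.5f to 4, unlike C's truncating (int) cast; 3.0e9f exceeds the doubleword range and yields the indefinite value.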
There are legacy and extended forms of the instruction:
CVTSS2SI

The legacy form has two encodings:
• When REX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an
XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the
converted value to a 32-bit general-purpose register.
• When REX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an
XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the
converted value to a 64-bit general-purpose register.
VCVTSS2SI
The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an
XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the
converted value to a 32-bit general-purpose register.
• When VEX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an
XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the
converted value to a 64-bit general-purpose register.
Instruction Support
Form        Subset  Feature Flag
CVTSS2SI    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCVTSS2SI   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                     Opcode            Description
CVTSS2SI reg32, xmm1/mem32   F3 (W0) 0F 2D /r  Converts a single-precision floating-point value in xmm1 or mem32 to a 32-bit integer value in reg32.
CVTSS2SI reg64, xmm1/mem32   F3 (W1) 0F 2D /r  Converts a single-precision floating-point value in xmm1 or mem32 to a 64-bit integer value in reg64.

Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSS2SI reg32, xmm1/mem32   C4   RXB.00001       0.1111.X.10  2D /r
VCVTSS2SI reg64, xmm1/mem32   C4   RXB.00001       1.1111.X.10  2D /r

Related Instructions
(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTTPS2DQ, (V)CVTTSS2SI
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
-    -    -      -    -    -    -    -    -    -    M    -    -    -    -    M
Note: M indicates a flag that may be modified (set or cleared). A dash (-) indicates a flag that is not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.


CVTTPD2DQ
VCVTTPD2DQ

Convert Packed Double-Precision Floating-Point
to Packed Doubleword Integer, Truncated

Converts packed double-precision floating-point values to packed signed doubleword integer values
and writes the converted values to the destination.
When the result is an inexact value, it is truncated (rounded toward zero). When the floating-point
value is a NaN or infinity, or the result of the conversion falls outside the range of a signed doubleword
(–2^31 to +2^31 – 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the
invalid-operation exception (IE) is masked.
There are legacy and extended forms of the instruction:
CVTTPD2DQ

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory
location to two packed signed doubleword integers and writes the converted values to the two low-order doublewords of the destination XMM register. Bits [127:64] of the destination are cleared. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VCVTTPD2DQ

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory
location to two signed doubleword values and writes the converted values to the lower two doubleword elements of the destination XMM register. Bits [255:128] of the YMM register that corresponds
to the destination are cleared.
YMM Encoding

Converts four packed double-precision floating-point values in a YMM register or a 256-bit memory
location to four signed doubleword integer values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
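The 128-bit behavior, including the cleared upper doublewords and the indefinite value, can be illustrated with the SSE2 intrinsic _mm_cvttpd_epi32. This is a sketch assuming GCC or Clang on x86-64 with exceptions masked (the MXCSR default); the helper name truncate_pair is hypothetical. The 256-bit VCVTTPD2DQ form is reachable through the AVX intrinsic _mm256_cvttpd_epi32, which requires AVX code generation and is not used here.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvttpd_epi32, _mm_set_pd, _mm_storeu_si128 */
#include <stdint.h>

/* Truncates two doubles toward zero into dwords 0 and 1 of the result;
 * the instruction clears dwords 2 and 3 (bits [127:64]). */
static void truncate_pair(double lo, double hi, int32_t out[4]) {
    __m128i r = _mm_cvttpd_epi32(_mm_set_pd(hi, lo));
    _mm_storeu_si128((__m128i *)out, r);
}
```

For example, {1.9, -1.9} becomes {1, -1, 0, 0}, while a value such as 4.0e9 lies outside the doubleword range and produces 8000_0000h (INT32_MIN) in its lane.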
Instruction Support
Form        Subset  Feature Flag
CVTTPD2DQ   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTTPD2DQ  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode       Description
CVTTPD2DQ xmm1, xmm2/mem128   66 0F E6 /r  Converts two packed double-precision floating-point values in xmm2 or mem128 to packed doubleword integers in xmm1. Truncates inexact result.

Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTTPD2DQ xmm1, xmm2/mem128   C4   RXB.00001       X.1111.0.01  E6 /r
VCVTTPD2DQ xmm1, ymm2/mem256   C4   RXB.00001       X.1111.1.01  E6 /r

Related Instructions
(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTSD2SI
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
-    -    -      -    -    -    -    -    -    -    M    -    -    -    -    M
Note: M indicates a flag that may be modified (set or cleared). A dash (-) indicates a flag that is not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S
S

S
S
S

X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.


CVTTPS2DQ
VCVTTPS2DQ


Convert Packed Single-Precision Floating-Point
to Packed Doubleword Integers, Truncated

Converts packed single-precision floating-point values to packed signed doubleword integer values
and writes the converted values to the destination.
When the result of the conversion is an inexact value, the value is truncated (rounded toward zero).
When the floating-point value is a NaN or infinity, or the result of the conversion falls outside the range of a signed doubleword (–2^31 to +2^31 – 1), the instruction returns the 32-bit indefinite integer value
(8000_0000h) when the invalid-operation exception (IE) is masked.
There are legacy and extended forms of the instruction:
CVTTPS2DQ

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory
location to four packed signed doubleword integer values and writes the converted values to an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VCVTTPS2DQ

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory
location to four packed signed doubleword integer values and writes the converted values to an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Converts eight packed single-precision floating-point values in a YMM register or a 256-bit memory
location to eight packed signed doubleword integer values and writes the converted values to a YMM
register.
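The difference between this truncating conversion and the MXCSR-rounded (V)CVTPS2DQ listed under Related Instructions can be shown side by side with the SSE2 intrinsics _mm_cvttps_epi32 and _mm_cvtps_epi32. This sketch assumes GCC or Clang on x86-64 and the default MXCSR (round to nearest even, exceptions masked); the helper name convert_four is hypothetical.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvttps_epi32, _mm_cvtps_epi32, _mm_loadu_ps */
#include <stdint.h>

/* Converts four floats twice: CVTTPS2DQ always truncates toward zero,
 * while CVTPS2DQ rounds according to MXCSR.RC. */
static void convert_four(const float in[4], int32_t truncated[4], int32_t rounded[4]) {
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)truncated, _mm_cvttps_epi32(v));
    _mm_storeu_si128((__m128i *)rounded, _mm_cvtps_epi32(v));
}
```

With inputs {2.7f, -2.7f, 0.5f, 1.0e10f}, truncation gives {2, -2, 0, INT32_MIN} and default rounding gives {3, -3, 0, INT32_MIN}; both forms return the indefinite value for the out-of-range lane. Truncation is independent of MXCSR.RC, which is why C compilers typically use the truncating form for (int) casts.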
Instruction Support
Form        Subset  Feature Flag
CVTTPS2DQ   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTTPS2DQ  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Opcode       Description
CVTTPS2DQ xmm1, xmm2/mem128   F3 0F 5B /r  Converts four packed single-precision floating-point values in xmm2 or mem128 to four packed doubleword integers in xmm1. Truncates inexact result.

Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTTPS2DQ xmm1, xmm2/mem128   C4   RXB.00001       X.1111.0.10  5B /r
VCVTTPS2DQ ymm1, ymm2/mem256   C4   RXB.00001       X.1111.1.10  5B /r


Related Instructions
(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTSS2SI
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
-    -    -      -    -    -    -    -    -    -    M    -    -    -    -    M
Note: M indicates a flag that may be modified (set or cleared). A dash (-) indicates a flag that is not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S
S

S
S
S

X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.


CVTTSD2SI
VCVTTSD2SI

Convert Scalar Double-Precision Floating-Point
to Signed Double- or Quadword Integer, Truncated

Converts a scalar double-precision floating-point value to a signed integer value and writes the converted value to a general-purpose register.
When the result of the conversion is an inexact value, the value is truncated (rounded toward zero).
When the floating-point value is a NaN or infinity, or the result of the conversion falls outside the range of a signed doubleword (–2^31 to +2^31 – 1) or quadword (–2^63 to +2^63 – 1), the instruction
returns the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.
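One practical consequence of the rule above: 8000_0000h is also the correct result for an input of –2147483648.0, so the returned value alone cannot distinguish overflow from a legitimate minimum. The sketch below uses the SSE2 intrinsic _mm_cvttsd_si32 and adds a range check before converting. It assumes GCC or Clang on x86-64 with IE masked; both helper names, truncate_to_i32 and checked_truncate_to_i32, are hypothetical, and inspecting MXCSR.IE after the conversion would be an alternative design.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvttsd_si32, _mm_set_sd */
#include <stdint.h>

/* Truncating double -> int32 conversion (CVTTSD2SI, 32-bit destination). */
static int32_t truncate_to_i32(double d) {
    return _mm_cvttsd_si32(_mm_set_sd(d));
}

/* Returns 1 and stores the result only when truncation cannot overflow,
 * i.e. when -2^31 - 1 < d < 2^31 (both bounds are exact doubles).
 * NaN fails both comparisons and is rejected. */
static int checked_truncate_to_i32(double d, int32_t *out) {
    if (!(d > -2147483649.0 && d < 2147483648.0)) return 0;
    *out = truncate_to_i32(d);
    return 1;
}
```

Here -7.99 truncates to -7 and 2147483647.9 to 2147483647, while 1.0e20 and -2147483648.0 both yield INT32_MIN; only the checked helper tells the two cases apart.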
There are legacy and extended forms of the instruction:
CVTTSD2SI

The legacy form of the instruction has two encodings:
• When REX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits
of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted
value to a 32-bit general-purpose register.
• When REX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits
of an XMM register or a 64-bit memory location to a 64-bit signed integer and writes the
converted value to a 64-bit general-purpose register.
VCVTTSD2SI

The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits
of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted
value to a 32-bit general-purpose register.
• When VEX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits
of an XMM register or a 64-bit memory location to a 64-bit signed integer and writes the
converted value to a 64-bit general-purpose register.
Instruction Support
Form        Subset  Feature Flag
CVTTSD2SI   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTTSD2SI  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode            Description
CVTTSD2SI reg32, xmm1/mem64   F2 (W0) 0F 2C /r  Converts a scalar double-precision floating-point value in xmm1 or mem64 to a doubleword integer in reg32. Truncates inexact result.
CVTTSD2SI reg64, xmm1/mem64   F2 (W1) 0F 2C /r  Converts a scalar double-precision floating-point value in xmm1 or mem64 to a quadword integer in reg64. Truncates inexact result.

Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTTSD2SI reg32, xmm2/mem64   C4   RXB.00001       0.1111.X.11  2C /r
VCVTTSD2SI reg64, xmm2/mem64   C4   RXB.00001       1.1111.X.11  2C /r

Related Instructions
(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD,
(V)CVTTPD2DQ
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
-    -    -      -    -    -    -    -    -    -    M    -    -    -    -    M
Note: M indicates a flag that may be modified (set or cleared). A dash (-) indicates a flag that is not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.


CVTTSS2SI
VCVTTSS2SI

Convert Scalar Single-Precision Floating-Point
to Signed Doubleword or Quadword Integer, Truncated

Converts a single-precision floating-point value to a signed integer value and writes the converted
value to a general-purpose register.
When the result of the conversion is an inexact value, the value is truncated (rounded toward zero).
When the floating-point value is a NaN or infinity, or the result of the conversion falls outside the range of a signed doubleword (–2^31 to +2^31 – 1) or quadword (–2^63 to +2^63 – 1), the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) is returned
when the invalid-operation exception (IE) is masked.
There are legacy and extended forms of the instruction:
CVTTSS2SI

The legacy form of the instruction has two encodings:
• When REX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an
XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the
converted value to a 32-bit general-purpose register. Bits [255:128] of the YMM register that
corresponds to the source are not affected.
• When REX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an
XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the
converted value to a 64-bit general-purpose register. Bits [255:128] of the YMM register that
corresponds to the source are not affected.
VCVTTSS2SI

The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an
XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the
converted value to a 32-bit general-purpose register. Bits [255:128] of the YMM register that
corresponds to the source are not affected.
• When VEX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an
XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the
converted value to a 64-bit general-purpose register. Bits [255:128] of the YMM register that
corresponds to the source are not affected.
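The W = 0 and W = 1 encodings differ only in destination width, which determines whether a given value is in range. The sketch below contrasts them through the SSE intrinsics _mm_cvttss_si32 and _mm_cvttss_si64. It assumes GCC or Clang compiling for x86-64, since _mm_cvttss_si64 is only provided in 64-bit mode; the helper names trunc_float_i32 and trunc_float_i64 are hypothetical.

```c
#include <xmmintrin.h>  /* SSE: _mm_cvttss_si32, _mm_cvttss_si64 (x86-64 only) */
#include <stdint.h>

/* W = 0 form: truncates into a 32-bit destination. */
static int32_t trunc_float_i32(float f) {
    return _mm_cvttss_si32(_mm_set_ss(f));
}

/* W = 1 form: truncates into a 64-bit destination. */
static int64_t trunc_float_i64(float f) {
    return (int64_t)_mm_cvttss_si64(_mm_set_ss(f));
}
```

The value 3.0e9f is exactly representable as a float but exceeds the signed doubleword range, so the 32-bit form returns the indefinite value 8000_0000h while the 64-bit form returns 3000000000. Both forms truncate -2.75f to -2.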
Instruction Support
Form        Subset  Feature Flag
CVTTSS2SI   SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCVTTSS2SI  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode            Description
CVTTSS2SI reg32, xmm1/mem32   F3 (W0) 0F 2C /r  Converts a single-precision floating-point value in xmm1 or mem32 to a 32-bit integer value in reg32. Truncates inexact result.
CVTTSS2SI reg64, xmm1/mem32   F3 (W1) 0F 2C /r  Converts a single-precision floating-point value in xmm1 or mem32 to a 64-bit integer value in reg64. Truncates inexact result.

Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTTSS2SI reg32, xmm1/mem32   C4   RXB.00001       0.1111.X.10  2C /r
VCVTTSS2SI reg64, xmm1/mem32   C4   RXB.00001       1.1111.X.10  2C /r

Related Instructions
(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTPS2DQ
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
-    -    -      -    -    -    -    -    -    -    M    -    -    -    -    M
Note: M indicates a flag that may be modified (set or cleared). A dash (-) indicates a flag that is not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.


DIVPD
VDIVPD

Divide
Packed Double-Precision Floating-Point

Divides each of the packed double-precision floating-point values of the first source operand by the
corresponding packed double-precision floating-point values of the second source operand and writes
the quotients to the destination.
There are legacy and extended forms of the instruction:
DIVPD

Divides two packed double-precision floating-point values in the first source XMM register by the
corresponding packed double-precision floating-point values in either a second source XMM register
or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VDIVPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Divides two packed double-precision floating-point values in the first source XMM register by the
corresponding packed double-precision floating-point values in either a second source XMM register
or a 128-bit memory location and writes the two quotients to a destination XMM register. Bits [255:128] of
the YMM register that corresponds to the destination are cleared.
YMM Encoding

Divides four packed double-precision floating-point values in the first source YMM register by the
corresponding packed double-precision floating-point values in either a second source YMM register
or a 256-bit memory location and writes the four quotients to a destination YMM register.
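The register effects of the three forms can be summarized with a small C reference model. It is a sketch, not AMD's definition: the ymm_pd type and the function names are hypothetical, a YMM register is modeled as four double lanes, and floating-point exceptions and MXCSR state are not modeled.

```c
#include <assert.h>

/* Hypothetical model: a YMM register as four double-precision lanes.
   lane[0] = bits [63:0], lane[1] = bits [127:64] (together the XMM
   register); lane[2] and lane[3] = bits [255:128]. */
typedef struct {
    double lane[4];
} ymm_pd;

/* Legacy DIVPD xmm1, xmm2/mem128: the first source is also the
   destination; bits [255:128] are left unchanged. */
static void divpd_legacy(ymm_pd *dst, const ymm_pd *src2)
{
    for (int i = 0; i < 2; i++)
        dst->lane[i] = dst->lane[i] / src2->lane[i];
}

/* VDIVPD: vex_l = 0 models the XMM encoding (bits [255:128] cleared),
   vex_l = 1 models the YMM encoding (all four lanes divided). */
static void vdivpd(ymm_pd *dst, const ymm_pd *src1, const ymm_pd *src2,
                   int vex_l)
{
    assert(vex_l == 0 || vex_l == 1);
    int lanes = vex_l ? 4 : 2;
    ymm_pd result = {{0.0, 0.0, 0.0, 0.0}}; /* +0.0 is an all-zero bit pattern */
    for (int i = 0; i < lanes; i++)
        result.lane[i] = src1->lane[i] / src2->lane[i];
    *dst = result; /* built first so dst may alias a source */
}
```

For example, with a = {8, 9, 5, 7} and b = {2, 3, 1, 1}, the legacy form leaves lanes 2 and 3 of the destination as 5 and 7, while the XMM form of VDIVPD produces {4, 3, 0, 0}.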
Instruction Support
Form     Subset   Feature Flag
DIVPD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VDIVPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        Opcode         Description
DIVPD xmm1, xmm2/mem128         66 0F 5E /r    Divides packed double-precision floating-point values in xmm1 by the packed double-precision floating-point values in xmm2 or mem128. Writes quotients to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDIVPD xmm1, xmm2, xmm3/mem128  C4    RXB.00001        X.src.0.01    5E /r
VDIVPD ymm1, ymm2, ymm3/mem256  C4    RXB.00001        X.src.1.01    5E /r
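To illustrate how the fields in the encoding table become bytes, the following C sketch assembles the three-byte (C4) VEX form for register operands only; memory addressing and the two-byte C5 form are out of scope. The helper name vex3_encode_rr is hypothetical, and W is passed as 0 because the table marks it X (ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: emits "C4 RXB.map_select W.vvvv.L.pp opcode /r"
   for a register-to-register form.
   reg -> ModRM.reg (destination), src1 -> VEX.vvvv, rm -> ModRM.rm.
   Register numbers are 0-15; X is unused for register operands. */
static int vex3_encode_rr(uint8_t out[5], unsigned map_select, unsigned w,
                          unsigned l, unsigned pp, uint8_t opcode,
                          unsigned reg, unsigned src1, unsigned rm)
{
    assert(reg < 16 && src1 < 16 && rm < 16);
    unsigned r = (reg >> 3) & 1u, x = 0, b = (rm >> 3) & 1u;

    out[0] = 0xC4;
    /* R, X, B are stored inverted, followed by the 5-bit map_select. */
    out[1] = (uint8_t)(((~r & 1u) << 7) | ((~x & 1u) << 6) |
                       ((~b & 1u) << 5) | (map_select & 0x1Fu));
    /* W, inverted vvvv (the first source register), L, pp. */
    out[2] = (uint8_t)(((w & 1u) << 7) | ((~src1 & 0xFu) << 3) |
                       ((l & 1u) << 2) | (pp & 3u));
    out[3] = opcode;
    /* ModRM with mod = 11b (register operand). */
    out[4] = (uint8_t)(0xC0u | ((reg & 7u) << 3) | (rm & 7u));
    return 5;
}
```

Under these assumptions, VDIVPD xmm1, xmm2, xmm3 encodes as C4 E1 69 5E CB; the YMM form differs only in the third byte (6D, VEX.L = 1).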


Related Instructions
(V)DIVPS, (V)DIVSD, (V)DIVSS
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M   M   M   M
Note:
M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
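Software that inspects MXCSR after a (V)DIVPD sequence can decode the fields above with plain bit operations. This C sketch uses hypothetical macro and function names; reading the live register (for example with the STMXCSR instruction) is outside its scope. 1F80h is the MXCSR reset value: all exceptions masked, round-to-nearest.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the MXCSR layout above (names are illustrative). */
#define MXCSR_IE  (1u << 0)
#define MXCSR_DE  (1u << 1)
#define MXCSR_ZE  (1u << 2)
#define MXCSR_OE  (1u << 3)
#define MXCSR_UE  (1u << 4)
#define MXCSR_PE  (1u << 5)
#define MXCSR_DAZ (1u << 6)
#define MXCSR_IM  (1u << 7)
#define MXCSR_DM  (1u << 8)
#define MXCSR_ZM  (1u << 9)
#define MXCSR_OM  (1u << 10)
#define MXCSR_UM  (1u << 11)
#define MXCSR_PM  (1u << 12)
#define MXCSR_RC_SHIFT 13 /* RC occupies bits [14:13] */
#define MXCSR_FZ  (1u << 15)
#define MXCSR_MM  (1u << 17)

/* Each mask bit sits 7 positions above its flag (IM/IE ... PM/PE).
   Returns the flags that are set and not masked. */
static uint32_t mxcsr_unmasked_flags(uint32_t mxcsr)
{
    uint32_t flags = mxcsr & 0x3Fu;        /* IE..PE, bits [5:0]  */
    uint32_t masks = (mxcsr >> 7) & 0x3Fu; /* IM..PM, bits [12:7] */
    return flags & ~masks;
}

/* Returns the 2-bit rounding-control field. */
static unsigned mxcsr_rounding_control(uint32_t mxcsr)
{
    return (mxcsr >> MXCSR_RC_SHIFT) & 3u;
}
```

With the reset value plus ZE set, no unmasked flag is reported; clearing ZM makes ZE appear as unmasked.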

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S
S
S
S

S
S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


DIVPS
VDIVPS

Divide
Packed Single-Precision Floating-Point

Divides each of the packed single-precision floating-point values of the first source operand by the
corresponding packed single-precision floating-point values of the second source operand and writes
the quotients to the destination.
There are legacy and extended forms of the instruction:
DIVPS

Divides four packed single-precision floating-point values in the first source XMM register by the
corresponding packed single-precision floating-point values in either a second source XMM register
or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VDIVPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Divides four packed single-precision floating-point values in the first source XMM register by the
corresponding packed single-precision floating-point values in either a second source XMM register
or a 128-bit memory location and writes the four quotients to a destination XMM register. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Divides eight packed single-precision floating-point values in the first source YMM register by the
corresponding packed single-precision floating-point values in either a second source YMM register
or a 256-bit memory location and writes the eight quotients to a destination YMM register.
Instruction Support
Form     Subset   Feature Flag
DIVPS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VDIVPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        Opcode         Description
DIVPS xmm1, xmm2/mem128         0F 5E /r       Divides packed single-precision floating-point values in xmm1 by the corresponding values in xmm2 or mem128. Writes quotients to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDIVPS xmm1, xmm2, xmm3/mem128  C4    RXB.00001        X.src.0.00    5E /r
VDIVPS ymm1, ymm2, ymm3/mem256  C4    RXB.00001        X.src.1.00    5E /r


Related Instructions
(V)DIVPD, (V)DIVSD, (V)DIVSS
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M   M   M   M
Note:
M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S
S
S
S

S
S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


DIVSD
VDIVSD

Divide
Scalar Double-Precision Floating-Point

Divides the double-precision floating-point value in the low-order quadword of the first source operand by the double-precision floating-point value in the low-order quadword of the second source
operand and writes the quotient to the low-order quadword of the destination.
There are legacy and extended forms of the instruction:
DIVSD
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination register. Bits [127:64]
of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VDIVSD
The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. Bits [127:64] of the first source operand are copied to bits [127:64] of
the destination. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
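The two forms differ only in which upper bits survive. A minimal C model, assuming hypothetical names (ymm_sd, divsd_legacy, vdivsd) and ignoring exceptions and MXCSR state; the divisor stands for either the low quadword of the second XMM register or the 64-bit memory operand.

```c
#include <assert.h>

/* Hypothetical model: q[0] = bits [63:0], q[1] = bits [127:64],
   q[2] and q[3] = bits [255:128] of the YMM register. */
typedef struct {
    double q[4];
} ymm_sd;

/* Legacy DIVSD xmm1, xmm2/mem64: only bits [63:0] of xmm1 change;
   bits [255:64] are not affected. */
static void divsd_legacy(ymm_sd *dst, double divisor)
{
    dst->q[0] = dst->q[0] / divisor;
}

/* VDIVSD xmm1, xmm2, xmm3/mem64: low quotient, bits [127:64] copied
   from the first source, bits [255:128] cleared. */
static void vdivsd(ymm_sd *dst, const ymm_sd *src1, double divisor)
{
    ymm_sd result = {{src1->q[0] / divisor, src1->q[1], 0.0, 0.0}};
    *dst = result; /* built first so dst may alias src1 */
}
```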
Instruction Support
Form     Subset   Feature Flag
DIVSD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VDIVSD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        Opcode         Description
DIVSD xmm1, xmm2/mem64          F2 0F 5E /r    Divides the double-precision floating-point value in the low-order 64 bits of xmm1 by the corresponding value in xmm2 or mem64. Writes quotient to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDIVSD xmm1, xmm2, xmm3/mem64   C4    RXB.00001        X.src.X.11    5E /r

Related Instructions
(V)DIVPD, (V)DIVPS, (V)DIVSS


MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M   M   M   M
Note:
M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S
S
S
S
S

S
S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


DIVSS
VDIVSS


Divide Scalar Single-Precision Floating-Point

Divides the single-precision floating-point value in the low-order doubleword of the first source operand by the single-precision floating-point value in the low-order doubleword of the second source
operand and writes the quotient to the low-order doubleword of the destination.
There are legacy and extended forms of the instruction:
DIVSS
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination register. Bits [127:32]
of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VDIVSS
The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the first
source operand are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
Instruction Support
Form     Subset   Feature Flag
DIVSS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VDIVSS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        Opcode         Description
DIVSS xmm1, xmm2/mem32          F3 0F 5E /r    Divides a single-precision floating-point value in the low-order doubleword of xmm1 by a corresponding value in xmm2 or mem32. Writes the quotient to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDIVSS xmm1, xmm2, xmm3/mem32   C4    RXB.00001        X.src.X.10    5E /r
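In the VEX encodings of the four divide instructions, the pp field takes the place of the legacy SIMD prefix (none, 66, F3, F2) that distinguishes DIVPS, DIVPD, DIVSS, and DIVSD, all of which use opcode 0F 5E. The C sketch below records that correspondence as read from the encoding tables; the function name and the use of 0 to mean "no prefix" are assumptions of this example.

```c
#include <assert.h>

/* Hypothetical helper: maps the legacy SIMD prefix of a 0F 5E divide
   to the VEX.pp value used by its VEX form.
   Returns -1 for a byte that is not one of the implied prefixes. */
static int vex_pp_from_legacy_prefix(unsigned prefix)
{
    switch (prefix) {
    case 0x00: return 0; /* no prefix: DIVPS -> VDIVPS, pp = 00b */
    case 0x66: return 1; /* 66: DIVPD -> VDIVPD, pp = 01b */
    case 0xF3: return 2; /* F3: DIVSS -> VDIVSS, pp = 10b */
    case 0xF2: return 3; /* F2: DIVSD -> VDIVSD, pp = 11b */
    default:   return -1;
    }
}
```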

Related Instructions
(V)DIVPD, (V)DIVPS, (V)DIVSD


MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M   M   M   M
Note:
M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S
S
S
S
S

S
S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


DPPD
VDPPD


Dot Product
Packed Double-Precision Floating-Point

Computes the dot-product of the input operands. An immediate operand specifies both the input values and the destination locations to which the sum is written.
Selectively multiplies packed double-precision values in a source operand by the corresponding values in a second source operand, writes the results to a temporary location, adds the results, writes the
sum to a second temporary location and selectively writes the sum to a destination.
Mask bits [5:4] of an 8-bit immediate operand perform multiplicative selection. Bit 5 selects bits
[127:64] of the source operands; bit 4 selects bits [63:0] of the source operands. When a mask bit = 1,
the corresponding packed double-precision floating-point values are multiplied and the product is
written to the corresponding position of a 128-bit temporary location. When a mask bit = 0, the corresponding position of the temporary location is cleared.
After the two 64-bit values in the first temporary location are added and written to the 64-bit second
temporary location, mask bits [1:0] of the same 8-bit immediate operand perform write selection. Bit
1 selects bits [127:64] of the destination; bit 0 selects bits [63:0] of the destination. When a mask bit =
1, the 64-bit value of the second temporary location is written to the corresponding position of the
destination. When a mask bit = 0, the corresponding position of the destination is cleared.
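The two selection steps can be expressed as a short C reference model of the 128-bit operation. It is a sketch under stated assumptions: the function name is hypothetical, NaN propagation and exceptions are not modeled, and a compiler that contracts the multiply and add into a fused multiply-add would round differently from the separate steps described above (GCC and Clang accept -ffp-contract=off to prevent that).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference sketch of DPPD on one XMM register modeled as
   two doubles: x[0] = bits [63:0], x[1] = bits [127:64]. */
static void dppd_model(double dst[2], const double src1[2],
                       const double src2[2], uint8_t imm8)
{
    double temp1[2], temp2;

    /* imm8[5:4]: multiplicative selection (bit 4 -> [63:0],
       bit 5 -> [127:64]); unselected positions are cleared (+0.0). */
    for (int i = 0; i < 2; i++)
        temp1[i] = ((imm8 >> (4 + i)) & 1u) ? src1[i] * src2[i] : 0.0;

    temp2 = temp1[0] + temp1[1];

    /* imm8[1:0]: write selection; unselected lanes are cleared.
       All reads happen above, so dst may alias src1 (legacy form). */
    for (int i = 0; i < 2; i++)
        dst[i] = ((imm8 >> i) & 1u) ? temp2 : 0.0;
}
```

For src1 = {1.0, 2.0} and src2 = {3.0, 4.0}, imm8 = 31h produces {11.0, 0.0}, and imm8 = 13h produces {3.0, 3.0}.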
When the operation produces a NaN, its value is determined as follows.
Source Operands (in either order)            NaN Result¹
QNaN   Any non-NaN floating-point value      Value of QNaN
       (or single-operand instruction)
SNaN   Any non-NaN floating-point value      Value of SNaN, converted to a QNaN²
       (or single-operand instruction)
QNaN   QNaN                                  First operand
QNaN   SNaN                                  First operand (converted to QNaN if SNaN)
SNaN   SNaN                                  First operand converted to a QNaN²
Note:
1. A NaN result produced when the floating-point invalid-operation exception is masked.
2. The conversion is done by changing the most-significant fraction bit to 1.
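Every row of this table resolves to the same rule: the result is the first NaN operand, converted to a QNaN if necessary. The C sketch below applies that rule to raw double-precision bit patterns; the macro and function names are hypothetical, and which operand counts as "first" is decided by the instruction description (for this instruction, the addend of lower bit index).

```c
#include <assert.h>
#include <stdint.h>

/* IEEE 754 binary64 fields. */
#define F64_EXP   0x7FF0000000000000ull
#define F64_FRAC  0x000FFFFFFFFFFFFFull
#define F64_QUIET 0x0008000000000000ull /* most-significant fraction bit */

static int f64_is_nan(uint64_t u)
{
    return (u & F64_EXP) == F64_EXP && (u & F64_FRAC) != 0;
}

static int f64_is_snan(uint64_t u)
{
    return f64_is_nan(u) && (u & F64_QUIET) == 0;
}

/* Note 2: an SNaN becomes a QNaN by setting the most-significant
   fraction bit; a QNaN is unchanged. */
static uint64_t f64_quiet(uint64_t u)
{
    return u | F64_QUIET;
}

/* Table rule: return the first NaN operand, quieted.
   Precondition: at least one operand is a NaN. */
static uint64_t nan_result_f64(uint64_t first, uint64_t second)
{
    assert(f64_is_nan(first) || f64_is_nan(second));
    return f64_quiet(f64_is_nan(first) ? first : second);
}
```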

For the addition in the second step, for the purpose of NaN propagation, the addend of lower bit index
is considered to be the first of the two operands. For example, when both multiplications produce NaNs,
the one that corresponds to bits [63:0] is written to all indicated fields of the destination, regardless of
how those NaNs were generated from the sources.
NaNs in source operands or in computational results result in at least one NaN in the destination.


There are legacy and extended forms of the instruction:
DPPD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VDPPD

The extended form of the instruction has a single 128-bit encoding.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form     Subset   Feature Flag
DPPD     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VDPPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode              Description
DPPD xmm1, xmm2/mem128, imm8          66 0F 3A 41 /r ib   Selectively multiplies packed double-precision floating-point values in xmm2 or mem128 by corresponding values in xmm1, adds interim products, selectively writes results to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDPPD xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.00011        X.src.0.01    41 /r ib

Related Instructions
(V)DPPS
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note:
M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
Exceptions are determined separately for each add-multiply operation.
Unmasked exceptions do not affect the destination.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


DPPS
VDPPS


Dot Product
Packed Single-Precision Floating-Point

Computes the dot-product of the input operands. An immediate operand specifies both the input values and the destination locations to which the sum is written.
Selectively multiplies packed single-precision values in a source operand by corresponding values in
a second source operand, writes results to a temporary location, adds pairs of results, writes the sums
to additional temporary locations, and selectively writes a cumulative sum to a destination.
Mask bits [7:4] of an 8-bit immediate operand perform multiplicative selection. Each bit selects a 32-bit segment of the source operands; bit 7 selects bits [127:96], bit 6 selects bits [95:64], bit 5 selects
bits [63:32], and bit 4 selects bits [31:0]. When a mask bit = 1, the corresponding packed single-precision floating-point values are multiplied and the product is written to the corresponding position of a
128-bit temporary location. When a mask bit = 0, the corresponding position of the temporary location is cleared.
After multiplication, three pairs of 32-bit values are added and written to temporary locations.
Bits [63:32] and [31:0] of temporary location 1 are added and written to 32-bit temporary location 2;
bits [127:96] and [95:64] of temporary location 1 are added and written to 32-bit temporary location
3; then the contents of temporary locations 2 and 3 are added and written to 32-bit temporary location
4.
After addition, mask bits [3:0] of the same 8-bit immediate operand perform write selection. Each bit
selects a 32-bit segment of the destination; bit 3 selects bits [127:96], bit 2 selects bits [95:64],
bit 1 selects bits [63:32], and bit 0 selects bits [31:0]. When a mask bit = 1, the 32-bit value of the fourth temporary location is written to the corresponding position of the destination.
When a mask bit = 0, the corresponding position of the destination is cleared.
For the 256-bit extended encoding, this process is performed on the upper and lower 128 bits of the
affected YMM registers.
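A C reference sketch of the same steps, with the addition order made explicit and the 256-bit form applying one immediate to each 128-bit half. The function names are hypothetical; NaN propagation and exceptions are not modeled, and FMA contraction by the compiler would change rounding, as for any C model of separately rounded operations.

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit DPPS operation; element i holds bits [32*i+31:32*i]. */
static void dpps128_model(float dst[4], const float src1[4],
                          const float src2[4], uint8_t imm8)
{
    float temp1[4];

    /* imm8[7:4]: multiplicative selection; unselected products are +0.0. */
    for (int i = 0; i < 4; i++)
        temp1[i] = ((imm8 >> (4 + i)) & 1u) ? src1[i] * src2[i] : 0.0f;

    float temp2 = temp1[0] + temp1[1]; /* bits [31:0]  + bits [63:32]  */
    float temp3 = temp1[2] + temp1[3]; /* bits [95:64] + bits [127:96] */
    float temp4 = temp2 + temp3;       /* cumulative sum */

    /* imm8[3:0]: write selection; unselected elements are cleared. */
    for (int i = 0; i < 4; i++)
        dst[i] = ((imm8 >> i) & 1u) ? temp4 : 0.0f;
}

/* 256-bit VDPPS: the same imm8 is applied to each 128-bit half independently. */
static void vdpps256_model(float dst[8], const float src1[8],
                           const float src2[8], uint8_t imm8)
{
    dpps128_model(dst, src1, src2, imm8);
    dpps128_model(dst + 4, src1 + 4, src2 + 4, imm8);
}
```

For a low half of a = {1, 2, 3, 4} and b = {1, 1, 1, 1}, imm8 = F1h writes 10.0 to element 0 and clears elements 1 through 3.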
When the operation produces a NaN, its value is determined as follows.
Source Operands (in either order)            NaN Result¹
QNaN   Any non-NaN floating-point value      Value of QNaN
       (or single-operand instruction)
SNaN   Any non-NaN floating-point value      Value of SNaN, converted to a QNaN²
       (or single-operand instruction)
QNaN   QNaN                                  First operand
QNaN   SNaN                                  First operand (converted to QNaN if SNaN)
SNaN   SNaN                                  First operand converted to a QNaN²
Note:
1. A NaN result produced when the floating-point invalid-operation exception is masked.
2. The conversion is done by changing the most-significant fraction bit to 1.

For each addition occurring in either the second or third step, for the purpose of NaN propagation, the
addend of lower bit index is considered to be the first of the two operands. For example, when all four
multiplications produce NaNs, the one that corresponds to bits [31:0] is written to all indicated fields
of the destination, regardless of how those NaNs were generated from the sources. When the two
high-order multiplications produce NaNs and the two low-order multiplications produce
infinities of opposite signs, the real indefinite QNaN (produced as the sum of the infinities) is written
to the destination.
NaNs in source operands or in computational results result in at least one NaN in the destination. For
the 256-bit version, NaNs are propagated within the two independent dot product operations only to
their respective 128-bit results.
There are legacy and extended forms of the instruction:
DPPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VDPPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
Instruction Support
Form     Subset   Feature Flag
DPPS     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VDPPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode              Description
DPPS xmm1, xmm2/mem128, imm8          66 0F 3A 40 /r ib   Selectively multiplies packed single-precision floating-point values in xmm2 or mem128 by corresponding values in xmm1, adds interim products, selectively writes results to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDPPS xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.00011        X.src.0.01    40 /r ib
VDPPS ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.00011        X.src.1.01    40 /r ib

Related Instructions
(V)DPPD


MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note:
M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
Exceptions are determined separately for each add-multiply operation.
Unmasked exceptions do not affect the destination.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


EXTRACTPS
VEXTRACTPS


Extract
Packed Single-Precision Floating-Point

Copies one of four packed single-precision floating-point values from a source XMM register to a
general purpose register or a 32-bit memory location.
Bits [1:0] of an immediate byte operand specify the location of the 32-bit value that is copied. 00b
corresponds to the low doubleword of the source register and 11b corresponds to the high doubleword of the source
register. Bits [7:2] of the immediate operand are ignored.
There are legacy and extended forms of the instruction:
EXTRACTPS

The source operand is an XMM register. The destination can be a general purpose register or a 32-bit
memory location. A 32-bit single-precision value extracted to a general purpose register is zero-extended to 64 bits.
VEXTRACTPS

The extended form of the instruction has a single 128-bit encoding.
The source operand is an XMM register. The destination can be a general purpose register or a 32-bit
memory location.
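The element selection can be illustrated with a small scalar model in C. This is a sketch under stated assumptions, not AMD's definition: `extractps_model` and `extractps_to_reg64_model` are hypothetical helper names, the source XMM register is modeled as an array of four `float` values with element 0 holding bits [31:0], and the model returns the raw 32-bit pattern (no conversion takes place). Exceptions are not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of (V)EXTRACTPS; not an AMD or compiler API.
   xmm[0] holds source bits [31:0], xmm[3] holds bits [127:96]. */
uint32_t extractps_model(const float xmm[4], uint8_t imm8)
{
    uint32_t bits;
    /* Only imm8[1:0] select the element; imm8[7:2] are ignored. */
    memcpy(&bits, &xmm[imm8 & 0x3u], sizeof bits); /* copy raw bits, no conversion */
    return bits;
}

/* A 32-bit value extracted to a 64-bit general-purpose register is zero-extended. */
uint64_t extractps_to_reg64_model(const float xmm[4], uint8_t imm8)
{
    return (uint64_t)extractps_model(xmm, imm8);
}
```

For example, with imm8 = FEh only bits [1:0] = 10b are used, so element 2 is selected.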
Instruction Support
Form         Subset   Feature Flag
EXTRACTPS    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VEXTRACTPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                             Opcode              Description
EXTRACTPS reg32/mem32, xmm1, imm8    66 0F 3A 17 /r ib   Extract the single-precision floating-point element of xmm1 specified by imm8 to reg32/mem32.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VEXTRACTPS reg32/mem32, xmm1, imm8   C4    RXB.00011        X.1111.0.01   17 /r ib

Related Instructions
(V)INSERTPS


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


EXTRQ

Extract Field From Register

Extracts specified bits from the lower 64 bits of the first operand (the destination XMM register). The
extracted bits are saved in the least-significant bit positions of the lower quadword of the destination;
the remaining bits in the lower quadword of the destination register are cleared to 0. The upper quadword of the destination register is undefined.
The portion of the source data being extracted is defined by the bit index and the field length. The bit
index defines the least-significant bit of the source operand being extracted. Bits [bit index + field
length – 1]:[bit index] are extracted. If the sum of the bit index and the field length is greater than 64, the
results are undefined.
For example, if the bit index is 32 (20h) and the field length is 16 (10h), then the result in the destination register will be source [47:32] in bits 15:0, with zeros in bits 63:16.
A value of zero in the field length is defined as a length of 64. If the length field is 0 and the
bit index is 0, bits 63:0 of the source are extracted. For any other value of the bit index, the results are
undefined.
The bit index and field length can be specified as immediate values (second and first immediate operands, respectively, in the case of the three argument version of the instruction), or they can both be
specified by fields in an XMM source operand. In the latter case, bits [5:0] of the XMM register specify the number of bits to extract (the field length) and bits [13:8] of the XMM register specify the
index of the first bit in the field to extract. The bit index and field length are each six bits in length;
other bits of the field are ignored.
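The field-selection rules can be summarized as a scalar C model that operates only on the low quadword of the destination. It is a sketch, not a definitive implementation: `extrq_model` and `extrq_reg_model` are hypothetical names, cases the text calls undefined are reported by returning `false` (the operand is then left unchanged), and the undefined upper quadword is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of EXTRQ on the low quadword of the destination.
   Returns false (without modifying *dst_lo) where the result is undefined. */
bool extrq_model(uint64_t *dst_lo, unsigned length, unsigned index)
{
    length &= 0x3Fu;                 /* only six bits of each field are used */
    index  &= 0x3Fu;
    if (length == 0)
        length = 64;                 /* a length of zero means 64 bits */
    if (index + length > 64)
        return false;                /* undefined result */
    uint64_t field = *dst_lo >> index;             /* shift right by the bit index */
    if (length < 64)
        field &= (UINT64_C(1) << length) - 1;      /* mask to the field length */
    *dst_lo = field;                 /* remaining low-quadword bits become 0 */
    return true;
}

/* Register form: field length in xmm2[5:0], bit index in xmm2[13:8]. */
bool extrq_reg_model(uint64_t *dst_lo, uint64_t xmm2_lo)
{
    return extrq_model(dst_lo, (unsigned)(xmm2_lo & 0x3F),
                       (unsigned)((xmm2_lo >> 8) & 0x3F));
}
```

With a bit index of 32 and a field length of 16, the model returns source bits [47:32] in bits 15:0, matching the worked example above.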
The diagram below illustrates the operation of this instruction.
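[Figure: EXTRQ operation. Immediate form: bits [5:0] of the first imm8 give the field length and bits [5:0] of the second imm8 give the bit index; XMM1[63:0] is shifted right by the bit index and masked to the field length. Register form: XMM2[5:0] gives the field length and XMM2[13:8] gives the bit index.]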


Instruction Support
Form    Subset   Feature Flag
EXTRQ   SSE4A    CPUID Fn8000_0001_ECX[SSE4A] (bit 6)

Software must check the CPUID bit once per program or library initialization before using the
instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain
processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                 Opcode              Description
EXTRQ xmm1, imm8, imm8   66 0F 78 /0 ib ib   Extract field from xmm1, with the least significant bit of the extracted data starting at the bit index specified by [5:0] of the second immediate byte, with the length specified by [5:0] of the first immediate byte.
EXTRQ xmm1, xmm2         66 0F 79 /r         Extract field from xmm1, with the least significant bit of the extracted data starting at the bit index specified by xmm2[13:8], with the length specified by xmm2[5:0].

Related Instructions
INSERTQ, PINSRW, PEXTRW
rFLAGS Affected
None
Exceptions
Exception                   Real   Virtual 8086   Protected   Cause of Exception
Invalid opcode, #UD         X      X              X           SSE4A instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                            X      X              X           The emulate bit (EM) of CR0 was set to 1.
                            X      X              X           The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM   X      X              X           The task-switch bit (TS) of CR0 was set to 1.


HADDPD
VHADDPD


Horizontal Add
Packed Double-Precision Floating-Point

Adds adjacent pairs of double-precision floating-point values in two source operands and writes the
sums to a destination.
There are legacy and extended forms of the instruction:
HADDPD

Adds the packed double-precision values in bits [127:64] and bits [63:0] of the first source XMM register and writes the sum to bits [63:0] of the destination; adds the corresponding values of the
second source XMM register or a 128-bit memory location and writes the sum to bits [127:64] of the
destination. The first source register is also the destination. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.
VHADDPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Adds the packed double-precision values in bits [127:64] and bits [63:0] of the first source XMM register and writes the sum to bits [63:0] of the destination XMM register; adds the corresponding values of the second source XMM register or a 128-bit memory location and writes the sum to bits
[127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination
are cleared.
YMM Encoding

Adds the packed double-precision values in bits [127:64] and bits [63:0] of the first source
YMM register and writes the sum to bits [63:0] of the destination YMM register; adds the corresponding values of the second source YMM register or a 256-bit memory location and writes
the sum to bits [127:64] of the destination. Performs the same process for the upper 128 bits of the
sources and destination.
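The lane-by-lane behavior can be sketched as a scalar C model. The function name `haddpd_model` and the array layout (element 0 holds bits [63:0]) are assumptions for illustration; `lanes` selects the 128-bit (1) or 256-bit (2) form. The model assumes the default round-to-nearest mode and does not reproduce MXCSR rounding control, DAZ/FZ, exception reporting, or the handling of YMM bits [255:128] by the 128-bit forms.

```c
/* Hypothetical scalar model of (V)HADDPD; not an AMD or compiler API.
   a = first source, b = second source; element 0 holds bits [63:0]. */
void haddpd_model(double dst[4], const double a[4], const double b[4], int lanes)
{
    for (int l = 0; l < lanes; ++l) {      /* each 128-bit lane is independent */
        int i = 2 * l;
        double lo = a[i] + a[i + 1];       /* first-source pair  -> dst[i]     */
        double hi = b[i] + b[i + 1];       /* second-source pair -> dst[i + 1] */
        dst[i] = lo;                       /* temporaries let dst alias a or b */
        dst[i + 1] = hi;
    }
}
```

Passing the same array as `dst` and `a` models the legacy form, in which the first source register is also the destination.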
Instruction Support
Form      Subset   Feature Flag
HADDPD    SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHADDPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode        Description
HADDPD xmm1, xmm2/mem128          66 0F 7C /r   Adds adjacent pairs of double-precision values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VHADDPD xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.01    7C /r
VHADDPD ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.01    7C /r


Related Instructions
(V)HADDPS, (V)HSUBPD, (V)HSUBPS
MXCSR Flags Affected
Field:  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Bit:    17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Effect:                                             M   M   M       M   M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


HADDPS
VHADDPS

Horizontal Add
Packed Single-Precision

Adds adjacent pairs of single-precision floating-point values in two source operands and writes the
sums to a destination.
There are legacy and extended forms of the instruction:
HADDPS

Adds the packed single-precision values in bits [63:32] and bits [31:0] of the first source XMM register and writes the sum to bits [31:0] of the destination; adds the packed single-precision values in bits
[127:96] and bits [95:64] of the first source register and writes the sum to bits [63:32] of the destination. Adds the corresponding values in the second source XMM register or a 128-bit memory location
and writes the sums to bits [95:64] and [127:96] of the destination. The first source register is also the
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VHADDPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Adds the packed single-precision values in bits [63:32] and bits [31:0] of the first source XMM register and writes the sum to bits [31:0] of the destination XMM register; adds the packed single-precision values in bits [127:96] and bits [95:64] of the first source register and writes the sum to bits
[63:32] of the destination. Adds the corresponding values in the second source XMM register or a
128-bit memory location and writes the sums to bits [95:64] and [127:96] of the destination. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Adds the packed single-precision values in bits [63:32] and bits [31:0] of the first source YMM register and writes the sum to bits [31:0] of the destination YMM register; adds the packed single-precision values in bits [127:96] and bits [95:64] of the first source register and writes the sum to bits
[63:32] of the destination. Adds the corresponding values in the second source YMM register or a
256-bit memory location and writes the sums to bits [95:64] and [127:96] of the destination. Performs
the same process for the upper 128 bits of the sources and destination.
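The pairing of elements within each 128-bit lane can be sketched as a scalar C model. `haddps_model` is a hypothetical name; element 0 of each array holds bits [31:0], and `lanes` selects the 128-bit (1) or 256-bit (2) form. The sketch assumes the default round-to-nearest mode and ignores MXCSR controls, exceptions, and the upper YMM bits of the 128-bit forms.

```c
/* Hypothetical scalar model of (V)HADDPS; not an AMD or compiler API.
   a = first source, b = second source; element 0 holds bits [31:0]. */
void haddps_model(float dst[8], const float a[8], const float b[8], int lanes)
{
    for (int l = 0; l < lanes; ++l) {      /* each 128-bit lane is independent */
        int i = 4 * l;
        float r0 = a[i]     + a[i + 1];    /* first source,  low pair  -> dst[i]     */
        float r1 = a[i + 2] + a[i + 3];    /* first source,  high pair -> dst[i + 1] */
        float r2 = b[i]     + b[i + 1];    /* second source, low pair  -> dst[i + 2] */
        float r3 = b[i + 2] + b[i + 3];    /* second source, high pair -> dst[i + 3] */
        dst[i] = r0; dst[i + 1] = r1; dst[i + 2] = r2; dst[i + 3] = r3;
    }
}
```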
Instruction Support
Form      Subset   Feature Flag
HADDPS    SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHADDPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode        Description
HADDPS xmm1, xmm2/mem128          F2 0F 7C /r   Adds adjacent pairs of single-precision values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VHADDPS xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.11    7C /r
VHADDPS ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.11    7C /r

Related Instructions
(V)HADDPD, (V)HSUBPD, (V)HSUBPS
MXCSR Flags Affected
Field:  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Bit:    17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Effect:                                             M   M   M       M   M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S
S
S
S
S

S
S
S
S
S
S

X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


HSUBPD
VHSUBPD


Horizontal Subtract
Packed Double-Precision

Subtracts adjacent pairs of double-precision floating-point values in two source operands and writes
the differences to a destination.
There are legacy and extended forms of the instruction:
HSUBPD

The first source register is also the destination.
Subtracts the packed double-precision value in bits [127:64] from the value in bits [63:0] of the first
source XMM register and writes the difference to bits [63:0] of the destination; subtracts the corresponding values of the second source XMM register or a 128-bit memory location and writes the difference to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.
VHSUBPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Subtracts the packed double-precision value in bits [127:64] from the value in bits [63:0] of the first
source XMM register and writes the difference to bits [63:0] of the destination XMM register; subtracts the corresponding values of the second source XMM register or a 128-bit memory location and
writes the difference to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Subtracts the packed double-precision value in bits [127:64] from the value in bits [63:0] of
the first source YMM register and writes the difference to bits [63:0] of the destination YMM register; subtracts the corresponding values of the second source YMM register or a 256-bit memory location and writes the difference to bits [127:64] of the destination. Performs the same process for the
upper 128 bits of the sources and destination.
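The operand order (low element minus high element) can be sketched as a scalar C model. `hsubpd_model` is a hypothetical name; element 0 holds bits [63:0], and `lanes` selects the 128-bit (1) or 256-bit (2) form. The sketch assumes the default round-to-nearest mode and does not model MXCSR controls, exceptions, or the upper YMM bits of the 128-bit forms.

```c
/* Hypothetical scalar model of (V)HSUBPD; not an AMD or compiler API.
   a = first source, b = second source; element 0 holds bits [63:0]. */
void hsubpd_model(double dst[4], const double a[4], const double b[4], int lanes)
{
    for (int l = 0; l < lanes; ++l) {      /* each 128-bit lane is independent */
        int i = 2 * l;
        double lo = a[i] - a[i + 1];       /* [63:0] minus [127:64] of the first source  */
        double hi = b[i] - b[i + 1];       /* same for the second source -> dst[i + 1]    */
        dst[i] = lo;                       /* temporaries let dst alias a or b */
        dst[i + 1] = hi;
    }
}
```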
Instruction Support
Form      Subset   Feature Flag
HSUBPD    SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHSUBPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode        Description
HSUBPD xmm1, xmm2/mem128          66 0F 7D /r   Subtracts adjacent pairs of double-precision floating-point values in xmm1 and xmm2 or mem128. Writes the differences to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VHSUBPD xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.01    7D /r
VHSUBPD ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.01    7D /r

Related Instructions
(V)HSUBPS, (V)HADDPD, (V)HADDPS
MXCSR Flags Affected
Field:  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Bit:    17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Effect:                                             M   M   M       M   M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S
S
S
S
S

S
S
S
S
S
S

X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


HSUBPS
VHSUBPS


Horizontal Subtract Packed Single

Subtracts adjacent pairs of single-precision floating-point values in two source operands and writes
the differences to a destination.
There are legacy and extended forms of the instruction:
HSUBPS

Subtracts the packed single-precision values in bits [63:32] from the values in bits [31:0] of the first
source XMM register and writes the difference to bits [31:0] of the destination; subtracts the packed
single-precision values in bits [127:96] from the value in bits [95:64] of the first source register and
writes the difference to bits [63:32] of the destination. Subtracts the corresponding values of the second source XMM register or a 128-bit memory location and writes the differences to bits [95:64] and
[127:96] of the destination. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VHSUBPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Subtracts the packed single-precision value in bits [63:32] from the value in bits [31:0] of the first
source XMM register and writes the difference to bits [31:0] of the destination XMM register; subtracts the packed single-precision value in bits [127:96] from the value in bits [95:64] of the first source
register and writes the difference to bits [63:32] of the destination. Subtracts the corresponding values of the
second source XMM register or a 128-bit memory location and writes the differences to bits [95:64]
and [127:96] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Subtracts the packed single-precision values in bits [63:32] from the value in bits [31:0] of the first
source YMM register and writes the difference to bits [31:0] of the destination YMM register; subtracts the packed single-precision values in bits [127:96] from the value in bits [95:64] of the first
source register and writes the difference to bits [63:32] of the destination. Subtracts the corresponding
values of the second source YMM register or a 256-bit memory location and writes the differences to
bits [95:64] and [127:96] of the destination. Performs the same process for the upper 128 bits of the
sources and destination.
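The four differences produced in each 128-bit lane can be sketched as a scalar C model. `hsubps_model` is a hypothetical name; element 0 holds bits [31:0], and `lanes` selects the 128-bit (1) or 256-bit (2) form. The sketch assumes the default round-to-nearest mode and does not model MXCSR controls, exceptions, or the upper YMM bits of the 128-bit forms.

```c
/* Hypothetical scalar model of (V)HSUBPS; not an AMD or compiler API.
   a = first source, b = second source; element 0 holds bits [31:0]. */
void hsubps_model(float dst[8], const float a[8], const float b[8], int lanes)
{
    for (int l = 0; l < lanes; ++l) {      /* each 128-bit lane is independent */
        int i = 4 * l;
        float r0 = a[i]     - a[i + 1];    /* [31:0]  - [63:32]  of the first source  */
        float r1 = a[i + 2] - a[i + 3];    /* [95:64] - [127:96] of the first source  */
        float r2 = b[i]     - b[i + 1];    /* same pairs of the second source         */
        float r3 = b[i + 2] - b[i + 3];
        dst[i] = r0; dst[i + 1] = r1; dst[i + 2] = r2; dst[i + 3] = r3;
    }
}
```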
Instruction Support
Form      Subset   Feature Flag
HSUBPS    SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHSUBPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode        Description
HSUBPS xmm1, xmm2/mem128          F2 0F 7D /r   Subtracts adjacent pairs of values in xmm1 and xmm2 or mem128. Writes differences to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VHSUBPS xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.11    7D /r
VHSUBPS ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.11    7D /r

Related Instructions
(V)HSUBPD, (V)HADDPD, (V)HADDPS
MXCSR Flags Affected
Field:  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Bit:    17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Effect:                                             M   M   M       M   M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S
S
S
S
S

S
S
S
S
S
S

X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


INSERTPS
VINSERTPS

Insert
Packed Single-Precision Floating-Point

Copies a selected single-precision floating-point value from a source operand to a selected location in
a destination register and optionally clears selected elements of the destination. The legacy and
extended forms of the instruction treat the remaining elements of the destination in different ways.
Selections are specified by three fields of an immediate 8-bit operand:
Bits 7:6   Bits 5:4   Bits 3:0
COUNT_S    COUNT_D    ZMASK

COUNT_S — The binary value of the field specifies a 32-bit element of a source register, counting
upward from the low-order doubleword. COUNT_S is used only for a register source; when the source
is a memory operand, COUNT_S = 0.
COUNT_D — The binary value of the field specifies a 32-bit destination element, counting upward
from the low-order doubleword.
ZMASK — Setting bit n of the field clears 32-bit element n of the destination, counting upward from the low-order doubleword.
There are legacy and extended forms of the instruction:
INSERTPS

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
When the source operand is a register, the instruction copies the 32-bit element of the source specified
by Count_S to the location in the destination specified by Count_D, and clears destination elements
as specified by ZMask. Elements of the destination that are not cleared are not affected.
When the source operand is a memory location, the instruction copies a 32-bit value from memory to
the location in the destination specified by Count_D, and clears destination elements as specified by
ZMask. Elements of the destination that are not cleared are not affected.
VINSERTPS

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
When the second source operand is a register, the instruction copies the 32-bit element of the source
specified by Count_S to the location in the destination specified by Count_D. The other elements of
the destination are either copied from the first source operand or cleared as specified by ZMask.
When the second source operand is a memory location, the instruction copies a 32-bit value from the
source to the location in the destination specified by Count_D. The other elements of the destination
are either copied from the first source operand or cleared as specified by ZMask.
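The imm8 decoding and element update for a register second source can be sketched as a scalar C model. `insertps_model` is a hypothetical name; `src1` supplies the elements that are not replaced (for legacy INSERTPS, pass the destination array itself as `src1`). The sketch assumes ZMASK is applied after the insertion, so a ZMASK bit that selects the COUNT_D element also clears the inserted value, and it writes +0.0 (all bits zero) for cleared elements. For a memory source, the 32-bit value would be passed as `src2[0]` with COUNT_S = 0.

```c
#include <stdint.h>

/* Hypothetical scalar model of (V)INSERTPS, register-source form.
   Element 0 holds bits [31:0]; dst may alias src1. */
void insertps_model(float dst[4], const float src1[4], const float src2[4], uint8_t imm8)
{
    unsigned count_s = (imm8 >> 6) & 0x3u;  /* imm8[7:6] selects the source element */
    unsigned count_d = (imm8 >> 4) & 0x3u;  /* imm8[5:4] selects the destination element */
    unsigned zmask   = imm8 & 0xFu;         /* imm8[3:0] selects elements to clear */
    float tmp[4];
    for (int i = 0; i < 4; ++i)
        tmp[i] = src1[i];                   /* start from the first source */
    tmp[count_d] = src2[count_s];           /* insert the selected element */
    for (int i = 0; i < 4; ++i)
        dst[i] = ((zmask >> i) & 1u) ? 0.0f : tmp[i]; /* ZMASK bit i clears element i */
}
```

For example, imm8 = D8h selects source element 3 (COUNT_S = 11b), writes it to destination element 1 (COUNT_D = 01b), and clears element 3 (ZMASK = 1000b).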
Instruction Support
Form        Subset   Feature Flag
INSERTPS    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VINSERTPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                 Opcode              Description
INSERTPS xmm1, xmm2/mem32, imm8          66 0F 3A 21 /r ib   Insert a selected single-precision floating-point value from xmm2 or from mem32 at a selected location in xmm1 and clear selected elements of xmm1. Selections specified by imm8.

Mnemonic                                 Encoding
                                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VINSERTPS xmm1, xmm2, xmm3/mem32, imm8   C4    RXB.00011        X.src.0.01    21 /r ib

Related Instructions
(V)EXTRACTPS
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


INSERTQ

Insert Field

Inserts bits from the lower 64 bits of the source operand into the lower 64 bits of the destination operand. No other bits in the lower 64 bits of the destination are modified. The upper 64 bits of the destination are undefined.
The least-significant l bits of the source operand are inserted into the destination, with the least-significant bit of the source operand inserted at bit position n, where l and n are defined as the field length
and bit index, respectively.
Bits (field length – 1):0 of the source operand are inserted into bits (bit index + field length – 1):(bit
index) of the destination. If the sum of the bit index and the field length is greater than 64, the results are
undefined.
For example, if the bit index is 32 (20h) and the field length is 16 (10h), then the result in the destination register will be source operand[15:0] in bits 47:32. Bits 63:48 and bits 31:0 are not modified.
A value of zero in the field length is defined as a length of 64. If the length field is 0 and the bit index
is 0, bits 63:0 of the source operand are inserted. For any other value of the bit index, the results are
undefined.
The bits to insert are located in the XMM2 source operand. The bit index and field length can be specified either as immediate values or in the XMM2 source operand. In the immediate form, the bit index and the field length are specified by the fourth operand (second immediate byte) and the third operand (first immediate byte), respectively. In the register form, the bit index and field length are specified in bits [77:72] and bits [69:64] of the XMM2 source register, respectively. The bit index and field length are each six bits long; the remaining bits of each immediate byte or register byte are ignored.
The diagram below illustrates the operation of this instruction.
[Figure: INSERTQ operation. Immediate form: bits 5:0 of the first imm8 select the number of bits to insert from XMM2[63:0], and bits 5:0 of the second imm8 select the bit position for the insert in XMM1[63:0]. Register form: XMM2[69:64] selects the number of bits to insert and XMM2[77:72] selects the bit position for the insert.]
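The bit-field arithmetic described above can be modeled in portable C. The sketch below is a hypothetical reference model (insertq_model is an illustrative name, not an AMD-provided function). It covers only the low 64 bits, because the upper 64 bits of the destination are undefined, and it asserts on the index/length combinations that the text leaves undefined instead of guessing a result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of INSERTQ on the low 64 bits of the
   destination. len and idx are the raw field-length and bit-index values
   (imm8[5:0] in the immediate form, xmm2[69:64] and xmm2[77:72] in the
   register form). */
static uint64_t insertq_model(uint64_t dst, uint64_t src,
                              unsigned len, unsigned idx)
{
    len &= 0x3Fu;                  /* only six bits are significant */
    idx &= 0x3Fu;
    if (len == 0)
        len = 64;                  /* a length of 0 means 64 */

    /* Combinations with idx + len > 64 are undefined; refuse them here
       instead of guessing what the hardware returns. */
    assert(idx + len <= 64);

    uint64_t mask = (len == 64) ? ~UINT64_C(0)
                                : (UINT64_C(1) << len) - 1;
    /* Clear the target field, then OR in the low len bits of src. */
    return (dst & ~(mask << idx)) | ((src & mask) << idx);
}
```

For the example in the text, insertq_model(d, s, 16, 32) places s[15:0] in bits 47:32 and leaves bits 63:48 and 31:0 of d unchanged.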


Instruction Support
Form       Subset   Feature Flag
INSERTQ    SSE4A    CPUID Fn8000_0001_ECX[SSE4A] (bit 6)

Software must check the CPUID bit once per program or library initialization before using the
instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain
processor feature support information, see Appendix E of Volume 3.
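A minimal sketch of that one-time check follows, assuming a GCC- or Clang-compatible compiler that provides `<cpuid.h>` and its `__get_cpuid` helper on x86 targets. The names sse4a_from_ecx and cpu_has_sse4a are hypothetical. The bit test is kept separate from the CPUID query so that it can be exercised without executing CPUID, and non-x86 builds simply report that SSE4A is absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* SSE4A is reported in CPUID Fn8000_0001_ECX, bit 6. */
static bool sse4a_from_ecx(uint32_t ecx_8000_0001)
{
    return (ecx_8000_0001 >> 6) & 1u;
}

/* Query the processor once, e.g. during library initialization, and
   cache the result. Returns false on non-x86 builds or when the
   extended leaf 8000_0001h is not available. */
static bool cpu_has_sse4a(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return sse4a_from_ecx(ecx);
#else
    return false;
#endif
}
```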
Instruction Encoding
Mnemonic                         Opcode              Description
INSERTQ xmm1, xmm2, imm8, imm8   F2 0F 78 /r ib ib   Insert field starting at bit 0 of xmm2 with the length specified by [5:0] of the first immediate byte. This field is inserted into xmm1 starting at the bit position specified by [5:0] of the second immediate byte.
INSERTQ xmm1, xmm2               F2 0F 79 /r         Insert field starting at bit 0 of xmm2 with the length specified by xmm2[69:64]. This field is inserted into xmm1 starting at the bit position specified by xmm2[77:72].

Related Instructions
EXTRQ, PINSRW, PEXTRW
rFLAGS Affected
None
Exceptions
Exception                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD         X     X             X          SSE4A instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM   X     X             X          The task-switch bit (TS) of CR0 was set to 1.


LDDQU
VLDDQU

Load
Unaligned Double Quadword

Loads unaligned double quadwords from a memory location to a destination register.
Like the (V)MOVUPD instructions, (V)LDDQU loads a 128-bit or 256-bit operand from an
unaligned memory location. However, to improve performance when the memory operand is actually
misaligned, (V)LDDQU may read an aligned 16 or 32 bytes to get the first part of the operand, and an
aligned 16 or 32 bytes to get the second part of the operand. This behavior is implementation-specific,
and (V)LDDQU may only read the exact 16 or 32 bytes needed for the memory operand. If the memory operand is in a memory range where reading extra bytes can cause performance or functional
issues, use (V)MOVUPD instead of (V)LDDQU.
Memory operands that are not aligned on 16-byte or 32-byte boundaries do not cause general-protection exceptions.
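The aligned-read behavior described above can be pictured as a small address calculation. The sketch below is a hypothetical helper (read_window and lddqu_possible_reads are illustrative names, not an AMD interface). It assumes the aligned reads use a block size equal to the operand width (16 bytes for LDDQU and VLDDQU xmm, 32 bytes for VLDDQU ymm); the text allows 16- or 32-byte aligned reads, and an implementation may instead read only the exact operand bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Window of bytes that an implementation MAY touch for one operand. */
struct read_window {
    uintptr_t start;   /* first byte possibly read */
    uintptr_t bytes;   /* number of bytes possibly read */
};

/* Assumption: aligned block size == operand width (16 or 32 bytes). */
static struct read_window lddqu_possible_reads(uintptr_t addr, uintptr_t width)
{
    assert(width == 16 || width == 32);
    /* Aligned block that holds the first byte of the operand. */
    uintptr_t first = addr & ~(width - 1);
    /* Aligned block that holds the last byte of the operand. */
    uintptr_t last = (addr + width - 1) & ~(width - 1);
    struct read_window w = { first, last - first + width };
    return w;
}
```

A misaligned 16-byte operand therefore may cause up to 32 bytes to be read. If that window can reach a range where extra reads are harmful, the text above recommends (V)MOVUPD instead.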
There are legacy and extended forms of the instruction:
LDDQU

The source operand is an unaligned 128-bit memory location. The destination operand is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination register are not
affected.
VLDDQU

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The source operand is an unaligned 128-bit memory location. The destination operand is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination register are cleared.
YMM Encoding

The source operand is an unaligned 256-bit memory location. The destination operand is a YMM register.
Instruction Support
Form      Subset   Feature Flag
LDDQU     SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VLDDQU    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic              Opcode        Description
LDDQU xmm1, mem128    F2 0F F0 /r   Loads a 128-bit value from an unaligned mem128 to xmm1.

Mnemonic              Encoding
                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VLDDQU xmm1, mem128   C4    RXB.00001        X.1111.0.11   F0 /r
VLDDQU ymm1, mem256   C4    RXB.00001        X.1111.1.11   F0 /r


Related Instructions
(V)MOVDQU
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Alignment check, #AC
S
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP


X
S
S
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
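The CR4.OSXSAVE and XFEATURE_ENABLED_MASK causes listed above are usually screened out in software before any VEX-encoded instruction is executed. The following sketch assumes the caller has already read CPUID Fn0000_0001_ECX and the value of XFEATURE_ENABLED_MASK (XCR0, readable with XGETBV); avx_usable is a hypothetical helper name. It covers only these feature-enable checks, not the protected-mode or encoding-related causes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-flight check for VEX-encoded instructions such as
   VLDDQU. ecx1 is CPUID Fn0000_0001_ECX; xfeature_mask is the value of
   XFEATURE_ENABLED_MASK (XCR0) as returned by XGETBV. */
static bool avx_usable(uint32_t ecx1, uint64_t xfeature_mask)
{
    const uint32_t osxsave = UINT32_C(1) << 27;  /* reflects CR4.OSXSAVE */
    const uint32_t avx     = UINT32_C(1) << 28;  /* CPUID Fn0000_0001_ECX[AVX] */

    if ((ecx1 & avx) == 0 || (ecx1 & osxsave) == 0)
        return false;
    /* XFEATURE_ENABLED_MASK[2:1] must be 11b: the OS has enabled both
       SSE state (bit 1) and AVX state (bit 2). */
    return (xfeature_mask & UINT64_C(0x6)) == UINT64_C(0x6);
}
```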


LDMXCSR
VLDMXCSR

Load
MXCSR Control/Status Register

Loads the MXCSR register with a 32-bit value from memory.
For both legacy LDMXCSR and extended VLDMXCSR forms of the instruction, the source operand
is a 32-bit memory location and the destination operand is the MXCSR.
If an MXCSR load clears a SIMD floating-point exception mask bit and sets the corresponding exception flag bit, a SIMD floating-point exception is not generated immediately. The exception is generated only when the next instruction that operates on an XMM or YMM register operand executes and causes that particular SIMD floating-point exception to be reported.
A general protection exception occurs if the instruction attempts to load non-zero values into reserved
MXCSR bits. Software can use MXCSR_MASK to determine which bits are reserved. For details,
see “128-Bit, 64-Bit, and x87 Programming” in Volume 2.
The MXCSR register is described in “Registers” in Volume 1.
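The interaction between mask bits and flag bits described above, and the reserved-bit check against MXCSR_MASK, reduce to bit arithmetic on the 32-bit MXCSR image. The sketch below assumes the field positions shown in the MXCSR Flags Affected table for this instruction; the macro and function names are hypothetical, and the MXCSR_MASK value must be obtained by the caller as described in Volume 2.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions from the MXCSR Flags Affected table. */
#define MXCSR_IE        (UINT32_C(1) << 0)   /* exception flags: bits 5:0 */
#define MXCSR_DE        (UINT32_C(1) << 1)
#define MXCSR_ZE        (UINT32_C(1) << 2)
#define MXCSR_OE        (UINT32_C(1) << 3)
#define MXCSR_UE        (UINT32_C(1) << 4)
#define MXCSR_PE        (UINT32_C(1) << 5)
#define MXCSR_DAZ       (UINT32_C(1) << 6)
#define MXCSR_IM        (UINT32_C(1) << 7)   /* exception masks: bits 12:7 */
#define MXCSR_DM        (UINT32_C(1) << 8)
#define MXCSR_ZM        (UINT32_C(1) << 9)
#define MXCSR_OM        (UINT32_C(1) << 10)
#define MXCSR_UM        (UINT32_C(1) << 11)
#define MXCSR_PM        (UINT32_C(1) << 12)
#define MXCSR_RC_SHIFT  13                   /* rounding control: bits 14:13 */
#define MXCSR_FZ        (UINT32_C(1) << 15)
#define MXCSR_MM        (UINT32_C(1) << 17)  /* misaligned exception mask */

/* Flags that are set while their exception is unmasked. Loading such a
   value does not fault by itself; the exception is reported only when a
   later XMM/YMM instruction causes that same exception. */
static uint32_t mxcsr_unmasked_flags(uint32_t mxcsr)
{
    uint32_t flags = mxcsr & 0x3Fu;          /* IE..PE */
    uint32_t masks = (mxcsr >> 7) & 0x3Fu;   /* IM..PM, lined up with flags */
    return flags & ~masks;
}

/* Replace the 2-bit rounding-control field
   (00 nearest, 01 down, 10 up, 11 toward zero). */
static uint32_t mxcsr_with_rc(uint32_t mxcsr, unsigned rc)
{
    uint32_t field = UINT32_C(3) << MXCSR_RC_SHIFT;
    return (mxcsr & ~field) | (((uint32_t)rc & 3u) << MXCSR_RC_SHIFT);
}

/* LDMXCSR raises #GP if the value sets a reserved bit, i.e. any bit that
   is clear in MXCSR_MASK. */
static int mxcsr_value_ok(uint32_t value, uint32_t mxcsr_mask)
{
    return (value & ~mxcsr_mask) == 0;
}
```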
Instruction Support
Form       Subset   Feature Flag
LDMXCSR    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VLDMXCSR   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic         Opcode     Description
LDMXCSR mem32    0F AE /2   Loads MXCSR register with 32-bit value from memory.

Mnemonic         Encoding
                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VLDMXCSR mem32   C4    RXB.00001        X.1111.0.00   AE /2

Related Instructions
(V)STMXCSR
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
M    M    M      M    M    M    M    M    M    M    M    M    M    M    M    M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A

X
A

S
S

S
S

X
S
S
S
S
S

X
S
S
S
S
S
S

X
A
S
S
A
A
A
A
X
X
X
X
S
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
CR0.EM = 1.
CR4.OSFXSR = 0.
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Attempt to load non-zero values into reserved MXCSR bits.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


MASKMOVDQU
VMASKMOVDQU

Masked Move
Double Quadword Unaligned

Moves bytes from the first source operand to a memory location specified by the DS:rDI register.
Bytes are selected by mask bits in the second source operand. The memory location may be
unaligned.
The mask consists of the most significant bit of each byte of the second source register.
When a mask bit = 1, the corresponding byte of the first source register is written to the destination;
when a mask bit = 0, the corresponding byte is not written.
Exception and trap behavior for elements not selected for storage to memory is implementation-dependent. For instance, a given implementation may signal a data breakpoint or a page fault for bytes that are zero-masked and not actually written.
The instruction implicitly uses weakly-ordered, write-combining buffering for the data, as described
in “Buffering and Combining Memory Writes” in Volume 2. For data that is shared by multiple processors, this instruction should be used together with a fence instruction in order to ensure data coherency (see “Cache and TLB Management” in Volume 2).
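The byte-selection rule (the most significant bit of each mask byte decides whether the corresponding source byte is stored) can be modeled in portable C. The sketch below is hypothetical (maskmovdqu_model is an illustrative name). It models only which bytes are written; it does not model the DS:rDI addressing, the weakly-ordered write-combining behavior, or the implementation-dependent faults for unselected bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte i of src is stored to dst[i] only when bit 7 of mask[i] is 1;
   other destination bytes are left untouched. */
static void maskmovdqu_model(uint8_t *dst, const uint8_t src[16],
                             const uint8_t mask[16])
{
    for (size_t i = 0; i < 16; i++) {
        if (mask[i] & 0x80u)   /* MSB of each mask byte selects the byte */
            dst[i] = src[i];
    }
}
```

On real hardware, the SSE2 intrinsic `_mm_maskmoveu_si128` (declared in `<emmintrin.h>`) generates MASKMOVDQU; when the stored data is shared with other processors, it is typically followed by `_mm_sfence()`, in line with the fence guidance above.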
There are legacy and extended forms of the instruction:
MASKMOVDQU

The first source operand is an XMM register and the second source operand is an XMM register. The
destination is a 128-bit memory location.
VMASKMOVDQU

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is an XMM register. The
destination is a 128-bit memory location.
Instruction Support
Form          Subset   Feature Flag
MASKMOVDQU    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMASKMOVDQU   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                 Opcode        Description
MASKMOVDQU xmm1, xmm2    66 0F F7 /r   Move bytes selected by a mask value in xmm2 from xmm1 to the memory location specified by DS:rDI.

Mnemonic                 Encoding
                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMASKMOVDQU xmm1, xmm2   C4    RXB.00001        X.1111.0.01   F7 /r

Related Instructions
(V)MASKMOVPD, (V)MASKMOVPS


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
S
S
A
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

MAXPD
VMAXPD


Maximum
Packed Double-Precision Floating-Point

Compares each packed double-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding location of the destination.
If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN) and invalid-operation exceptions are masked, the second source
operand is written to the destination.
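The rules above amount to a strict greater-than comparison that falls back to the second source operand. The sketch below models that rule in C for the case where invalid-operation exceptions are masked; maxpd_lane and maxpd_model are hypothetical names, and MXCSR flag updates (IE, DE) as well as DAZ handling are not modeled.

```c
#include <assert.h>
#include <math.h>

/* One MAXPD lane: src1 is returned only if it compares strictly greater,
   so equal values (including +0.0 vs -0.0) and any comparison involving
   a NaN return src2. */
static double maxpd_lane(double src1, double src2)
{
    return (src1 > src2) ? src1 : src2;
}

/* Apply the lane rule to both doubles of a 128-bit operand. */
static void maxpd_model(double dst[2], const double src1[2], const double src2[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = maxpd_lane(src1[i], src2[i]);
}
```

Because the rule is asymmetric, swapping the operands changes the result for signed zeros and NaNs, which is why compilers cannot freely commute MAXPD operands.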
There are legacy and extended forms of the instruction:
MAXPD

Compares two pairs of packed double-precision floating-point values.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VMAXPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Compares two pairs of packed double-precision floating-point values.
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

Compares four pairs of packed double-precision floating-point values.
The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a YMM register.
Instruction Support
Form     Subset   Feature Flag
MAXPD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMAXPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                         Opcode        Description
MAXPD xmm1, xmm2/mem128          66 0F 5F /r   Compares two pairs of packed double-precision values in xmm1 and xmm2 or mem128 and writes the greater value to the corresponding position in xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMAXPD xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.01    5F /r
VMAXPD ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.01    5F /r

Related Instructions
(V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                                        M    M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


MAXPS
VMAXPS


Maximum
Packed Single-Precision Floating-Point

Compares each packed single-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding location of the destination.
If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN) and invalid-operation exceptions are masked, the second source
operand is written to the destination.
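Because a NaN in either operand selects the second source, operand order decides how NaNs propagate. The sketch below is a hypothetical C model of four MAXPS lanes (maxps_lane and relu4_model are illustrative names): passing possibly-NaN data as the first source and a finite floor as the second source replaces NaN lanes, and -0.0, with the floor. Exception flags are not modeled.

```c
#include <assert.h>
#include <math.h>

/* One MAXPS lane: returns src1 only if it compares greater than src2. */
static float maxps_lane(float src1, float src2)
{
    return (src1 > src2) ? src1 : src2;
}

/* Four lanes, as one 128-bit MAXPS would process them. The data is the
   first source and 0.0f is the second, so NaN and -0.0 lanes become +0.0. */
static void relu4_model(float out[4], const float in[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = maxps_lane(in[i], 0.0f);
}
```

Swapping the operands (floor first, data second) would instead let NaN lanes pass through unchanged.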
There are legacy and extended forms of the instruction:
MAXPS

Compares four pairs of packed single-precision floating-point values.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VMAXPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Compares four pairs of packed single-precision floating-point values.
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

Compares eight pairs of packed single-precision floating-point values.
The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a YMM register.
Instruction Support
Form     Subset   Feature Flag
MAXPS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMAXPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                         Opcode     Description
MAXPS xmm1, xmm2/mem128          0F 5F /r   Compares four pairs of packed single-precision values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMAXPS xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.00    5F /r
VMAXPS ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.00    5F /r

Related Instructions
(V)MAXPD, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                                        M    M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


MAXSD
VMAXSD

Maximum
Scalar Double-Precision Floating-Point

Compares the scalar double-precision floating-point value in the low-order 64 bits of the first source
operand to a corresponding value in the second source operand and writes the numerically greater
value into the low-order 64 bits of the destination.
If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN) and invalid-operation exceptions are masked, the second source
operand is written to the destination.
There are legacy and extended forms of the instruction:
MAXSD

The first source operand is an XMM register. The second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination. Bits [127:64] of the destination are not affected, whether the second source is a register or a 64-bit memory location. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMAXSD

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [127:64] of the destination are copied from bits [127:64] of the first source operand, whether the second source is a register or a 64-bit memory location. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
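The different treatment of bits [127:64] by the two forms can be modeled with a plain struct. The sketch below is hypothetical: xmm_pd, maxsd_legacy, and vmaxsd are illustrative names, the register is modeled as two doubles (lane[0] is bits [63:0], lane[1] is bits [127:64]), and the clearing of YMM bits [255:128] by VMAXSD is not modeled.

```c
#include <assert.h>

/* Hypothetical model of a 128-bit XMM register as two doubles. */
typedef struct { double lane[2]; } xmm_pd;

/* Same selection rule as the packed forms: src1 only if strictly greater. */
static double max_scalar(double src1, double src2)
{
    return (src1 > src2) ? src1 : src2;
}

/* Legacy MAXSD xmm1, xmm2/mem64: xmm1 is both first source and
   destination, so lane[1] (bits [127:64]) is left as it was. */
static void maxsd_legacy(xmm_pd *xmm1, double src2)
{
    xmm1->lane[0] = max_scalar(xmm1->lane[0], src2);
}

/* VMAXSD xmm1, xmm2, xmm3/mem64: bits [127:64] of the destination are
   copied from the first source operand. */
static xmm_pd vmaxsd(xmm_pd src1, double src2)
{
    xmm_pd dst = src1;                              /* copies bits [127:64] */
    dst.lane[0] = max_scalar(src1.lane[0], src2);   /* low 64 bits */
    return dst;
}
```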
Instruction Support
Form     Subset   Feature Flag
MAXSD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMAXSD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        Opcode        Description
MAXSD xmm1, xmm2/mem64          F2 0F 5F /r   Compares a pair of scalar double-precision values in the low-order 64 bits of xmm1 and xmm2 or mem64 and writes the greater value to the low-order 64 bits of xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMAXSD xmm1, xmm2, xmm3/mem64   C4    RXB.00001        X.src.X.11    5F /r

Related Instructions
(V)MAXPD, (V)MAXPS, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS


MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                                        M    M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.


MAXSS
VMAXSS

Maximum
Scalar Single-Precision Floating-Point

Compares the scalar single-precision floating-point value in the low-order 32 bits of the first source
operand to a corresponding value in the second source operand and writes the numerically greater
value into the low-order 32 bits of the destination.
If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN) and invalid-operation exceptions are masked, the second source
operand is written to the destination.
There are legacy and extended forms of the instruction:
MAXSS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VMAXSS

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [127:32] of the destination
are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
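For the single-precision scalar forms, the sketch below models a full YMM register as eight floats so that the different treatment of bits [255:128] is visible. The names ymm_ps, maxss_legacy, and vmaxss are hypothetical; the lane selection follows the same rule as the other MAX forms (the second source is returned unless the first compares greater), and MXCSR flag updates are not modeled.

```c
#include <assert.h>

/* Hypothetical model of a YMM register: lane 0 is bits [31:0],
   lanes 1..3 are bits [127:32], lanes 4..7 are bits [255:128]. */
typedef struct { float lane[8]; } ymm_ps;

static float max_ss(float src1, float src2)
{
    return (src1 > src2) ? src1 : src2;
}

/* Legacy MAXSS xmm1, xmm2/mem32: only lane 0 of the destination changes;
   bits [127:32] and the YMM upper half [255:128] are not affected. */
static void maxss_legacy(ymm_ps *dst, float src2)
{
    dst->lane[0] = max_ss(dst->lane[0], src2);
}

/* VMAXSS xmm1, xmm2, xmm3/mem32: lanes 1..3 come from the first source
   and YMM bits [255:128] of the destination are cleared. */
static void vmaxss(ymm_ps *dst, const ymm_ps *src1, float src2)
{
    dst->lane[0] = max_ss(src1->lane[0], src2);
    for (int i = 1; i < 4; i++)
        dst->lane[i] = src1->lane[i];
    for (int i = 4; i < 8; i++)
        dst->lane[i] = 0.0f;
}
```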
Instruction Support
Form     Subset   Feature Flag
MAXSS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMAXSS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        Opcode        Description
MAXSS xmm1, xmm2/mem32          F3 0F 5F /r   Compares a pair of scalar single-precision values in the low-order 32 bits of xmm1 and xmm2 or mem32 and writes the greater value to the low-order 32 bits of xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMAXSS xmm1, xmm2, xmm3/mem32   C4    RXB.00001        X.src.X.10    5F /r

Related Instructions
(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS


MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                                        M    M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.


MINPD
VMINPD


Minimum
Packed Double-Precision Floating-Point

Compares each packed double-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding location of the destination.
If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN) and invalid-operation exceptions are masked, the second source
operand is written to the destination.
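MINPD uses the mirror-image rule: the first source is returned only when it compares strictly less than the second. This differs from C's fmin, which treats a NaN argument as missing data and returns the other argument regardless of order, so fmin is not a drop-in model for MINPD. The sketch below is a hypothetical C model (minpd_lane and minpd_model are illustrative names) for the case where invalid-operation exceptions are masked; flag updates are not modeled.

```c
#include <assert.h>
#include <math.h>

/* One MINPD lane: src1 is returned only if it compares less than src2,
   so ties (including -0.0 vs +0.0) and NaN comparisons return src2. */
static double minpd_lane(double src1, double src2)
{
    return (src1 < src2) ? src1 : src2;
}

/* Apply the lane rule to both doubles of a 128-bit operand. */
static void minpd_model(double dst[2], const double src1[2], const double src2[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = minpd_lane(src1[i], src2[i]);
}
```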
There are legacy and extended forms of the instruction:
MINPD

Compares two pairs of packed double-precision floating-point values.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VMINPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Compares two pairs of packed double-precision floating-point values.
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

Compares four pairs of packed double-precision floating-point values.
The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a YMM register.
Instruction Support
Form     Subset   Feature Flag
MINPD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMINPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                         Opcode        Description
MINPD xmm1, xmm2/mem128          66 0F 5D /r   Compares two pairs of packed double-precision values in xmm1 and xmm2 or mem128 and writes the lesser value to the corresponding position in xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMINPD xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.01    5D /r
VMINPD ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.01    5D /r

Related Instructions
(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPS, (V)MINSD, (V)MINSS
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                                        M    M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR0.EM = 1.
• CR4.OSFXSR = 0.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
• Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Non-aligned memory operand while MXCSR.MM = 0.
• Null data segment used to reference memory.
Alignment check, #AC
• Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
• Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
• Instruction execution caused a page fault.
SIMD floating-point, #XF
• Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
• A source operand was an SNaN value.
• Undefined operation.
Denormalized operand, DE
• A source operand was a denormal value.


MINPS
VMINPS


Minimum
Packed Single-Precision Floating-Point

Compares each packed single-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding location of the destination.
If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source
operand is written to the destination.
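Because the result is chosen by a single ordered comparison, the zero rule and the NaN rule above both fall out of one expression. The sketch below models one XMM register's worth of (V)MINPS lanes in Python, rounding inputs to single precision with the standard `struct` module. The helper names are hypothetical, invalid-operation exceptions are assumed masked, and finite inputs beyond the single-precision range are not supported (`struct.pack` raises `OverflowError` for them).

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_bits(x: float) -> int:
    """Return the 32-bit encoding of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def minps_lane(a: float, b: float) -> float:
    """One single-precision lane: the second operand wins on zeros and NaNs."""
    a, b = to_f32(a), to_f32(b)
    return a if a < b else b


def minps(src1: list[float], src2: list[float]) -> list[float]:
    """Model MINPS/VMINPS (XMM encoding): four independent lanes."""
    if len(src1) != 4 or len(src2) != 4:
        raise ValueError("an XMM register holds four single-precision values")
    return [minps_lane(a, b) for a, b in zip(src1, src2)]


result = minps([1.5, math.nan, 0.0, -0.0], [2.5, 1.0, -0.0, 0.0])
print(result)                              # [1.5, 1.0, -0.0, 0.0]
print([hex(f32_bits(v)) for v in result])  # ['0x3fc00000', '0x3f800000', '0x80000000', '0x0']
```

The bit patterns make the signed-zero behavior visible: lane 2 returns the second operand's sign bit (0x80000000) even though +0.0 and -0.0 compare equal.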
There are legacy and extended forms of the instruction:
MINPS

Compares four pairs of packed single-precision floating-point values.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VMINPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Compares four pairs of packed single-precision floating-point values.
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

Compares eight pairs of packed single-precision floating-point values.
The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a YMM register.
Instruction Support
Form      Subset   Feature Flag
MINPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMINPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode         Description
MINPS xmm1, xmm2/mem128           0F 5D /r       Compares four pairs of packed single-precision values in xmm1 and xmm2 or mem128 and writes the lesser values to the corresponding positions in xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMINPS xmm1, xmm2, xmm3/mem128    C4    RXB.00001        X.src.0.00    5D /r
VMINPS ymm1, ymm2, ymm3/mem256    C4    RXB.00001        X.src.1.00    5D /r

Related Instructions
(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINSD, (V)MINSS
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                                            M   M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.


Exceptions
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR0.EM = 1.
• CR4.OSFXSR = 0.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
• Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Non-aligned memory operand while MXCSR.MM = 0.
• Null data segment used to reference memory.
Alignment check, #AC
• Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
• Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
• Instruction execution caused a page fault.
SIMD floating-point, #XF
• Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
• A source operand was an SNaN value.
• Undefined operation.
Denormalized operand, DE
• A source operand was a denormal value.


MINSD
VMINSD

Minimum
Scalar Double-Precision Floating-Point

Compares the scalar double-precision floating-point value in the low-order 64 bits of the first source
operand to a corresponding value in the second source operand and writes the numerically lesser
value into the low-order 64 bits of the destination.
If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source
operand is written to the destination.
There are legacy and extended forms of the instruction:
MINSD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 64-bit memory location. The first source register is also the destination. Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VMINSD

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [127:64] of the destination
are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
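The legacy and VEX forms differ only in what happens to the bits above the scalar result. This Python sketch models a YMM register as four 64-bit lanes (lane 0 is bits [63:0]). The function names are hypothetical, lanes are represented as Python floats for readability (a cleared lane is +0.0, whose encoding is all zero bits), and invalid-operation exceptions are assumed masked.

```python
def scalar_min(a: float, b: float) -> float:
    """Second operand is returned for NaNs and for a pair of zeros."""
    return a if a < b else b


def minsd_legacy(ymm1: list[float], src2_low: float) -> list[float]:
    """MINSD xmm1, xmm2/mem64: only bits [63:0] of the destination change."""
    result = list(ymm1)  # bits [255:64] are left as they were
    result[0] = scalar_min(ymm1[0], src2_low)
    return result


def vminsd(ymm2: list[float], src3_low: float) -> list[float]:
    """VMINSD xmm1, xmm2, xmm3/mem64: returns the new value of YMM1."""
    return [
        scalar_min(ymm2[0], src3_low),  # bits [63:0]: the minimum
        ymm2[1],                        # bits [127:64]: copied from the first source
        0.0,                            # bits [191:128]: cleared
        0.0,                            # bits [255:192]: cleared
    ]


reg = [3.0, 7.0, 8.0, 9.0]
print(minsd_legacy(reg, 1.0))  # [1.0, 7.0, 8.0, 9.0]
print(vminsd(reg, 1.0))        # [1.0, 7.0, 0.0, 0.0]
```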
Instruction Support
Form      Subset   Feature Flag
MINSD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMINSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode         Description
MINSD xmm1, xmm2/mem64            F2 0F 5D /r    Compares a pair of scalar double-precision values in the low-order 64 bits of xmm1 and xmm2 or mem64 and writes the lesser value to the low-order 64 bits of xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMINSD xmm1, xmm2, xmm3/mem64     C4    RXB.00001        X.src.X.11    5D /r

Related Instructions
(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSS


MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                                            M   M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR0.EM = 1.
• CR4.OSFXSR = 0.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
• Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF
• Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
• A source operand was an SNaN value.
• Undefined operation.
Denormalized operand, DE
• A source operand was a denormal value.


MINSS
VMINSS

Minimum
Scalar Single-Precision Floating-Point

Compares the scalar single-precision floating-point value in the low-order 32 bits of the first source
operand to a corresponding value in the second source operand and writes the numerically lesser
value into the low-order 32 bits of the destination.
If both source operands are equal to zero, the value of the second source operand is returned. If either
operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source
operand is written to the destination.
There are legacy and extended forms of the instruction:
MINSS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VMINSS

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [127:32] of the destination
are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
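Because a NaN in either operand makes (V)MINSS return its second operand, operand order can be chosen deliberately. The Python sketch below uses hypothetical helper names, rounds to single precision with `struct`, assumes invalid-operation exceptions are masked, and models only bits [31:0]. It shows that placing an upper limit in the second operand replaces a NaN input with that limit, while the reversed order propagates the NaN.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def minss_low(a: float, b: float) -> float:
    """Bits [31:0] of (V)MINSS: return a only if a < b, otherwise b."""
    a, b = to_f32(a), to_f32(b)
    return a if a < b else b


def clamp_above(x: float, limit: float) -> float:
    """Hypothetical helper: min(x, limit) with the limit as the second operand."""
    return minss_low(x, limit)


print(clamp_above(0.25, 1.0))      # 0.25
print(clamp_above(3.0, 1.0))       # 1.0
print(clamp_above(math.nan, 1.0))  # 1.0 (NaN input replaced by the limit)
print(minss_low(1.0, math.nan))    # nan (operands reversed)
```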
Instruction Support
Form      Subset   Feature Flag
MINSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMINSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode         Description
MINSS xmm1, xmm2/mem32            F3 0F 5D /r    Compares a pair of scalar single-precision values in the low-order 32 bits of xmm1 and xmm2 or mem32 and writes the lesser value to the low-order 32 bits of xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMINSS xmm1, xmm2, xmm3/mem32     C4    RXB.00001        X.src.X.10    5D /r

Related Instructions
(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD


MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                                            M   M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR0.EM = 1.
• CR4.OSFXSR = 0.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
• Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF
• Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
• A source operand was an SNaN value.
• Undefined operation.
Denormalized operand, DE
• A source operand was a denormal value.


MOVAPD
VMOVAPD


Move Aligned
Packed Double-Precision Floating-Point

Moves packed double-precision floating-point values. Values can be moved from a register or memory location to a register; or from a register to a register or memory location.
A memory operand that is not aligned causes a general-protection exception.
There are legacy and extended forms of the instruction:
MOVAPD

Moves two double-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination
operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a
128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVAPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves two double-precision floating-point values. There are encodings for each type of move:
• The source operand is either an XMM register or a 128-bit memory location. The destination
operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a
128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Moves four double-precision floating-point values. There are encodings for each type of move:
• The source operand is either a YMM register or a 256-bit memory location. The destination
operand is a YMM register.
• The source operand is a YMM register. The destination operand is either a YMM register or a
256-bit memory location.
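The alignment requirement depends on the encoding: the exception list for this instruction gives 16 bytes for the legacy and VEX128 forms and 32 bytes for the VEX256 form. The Python predicate below is a hypothetical sketch that checks only this alignment rule for a given effective address; segment-limit, canonical-address, and page-fault checks are not modeled.

```python
REQUIRED_ALIGNMENT = {
    "MOVAPD": 16,       # legacy SSE2 form, 128-bit operand
    "VMOVAPD.128": 16,  # VEX.L = 0
    "VMOVAPD.256": 32,  # VEX.L = 1
}


def misaligned(form: str, address: int) -> bool:
    """True if the memory operand at `address` violates the form's alignment rule (#GP)."""
    return address % REQUIRED_ALIGNMENT[form] != 0


print(misaligned("VMOVAPD.256", 0x1010))  # True: 16-byte aligned but not 32-byte aligned
print(misaligned("VMOVAPD.128", 0x1010))  # False
```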
Instruction Support
Form       Subset   Feature Flag
MOVAPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVAPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode         Description
MOVAPD xmm1, xmm2/mem128      66 0F 28 /r    Moves two packed double-precision floating-point values from xmm2 or mem128 to xmm1.
MOVAPD xmm1/mem128, xmm2      66 0F 29 /r    Moves two packed double-precision floating-point values from xmm2 to xmm1 or mem128.

                              Encoding
Mnemonic                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVAPD xmm1, xmm2/mem128     C4    RXB.00001        X.1111.0.01   28 /r
VMOVAPD xmm1/mem128, xmm2     C4    RXB.00001        X.1111.0.01   29 /r
VMOVAPD ymm1, ymm2/mem256     C4    RXB.00001        X.1111.1.01   28 /r
VMOVAPD ymm1/mem256, ymm2     C4    RXB.00001        X.1111.1.01   29 /r

Related Instructions
(V)MOVHPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVSD, (V)MOVUPD
Exceptions
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR0.EM = 1.
• CR4.OSFXSR = 0.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.vvvv != 1111b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Memory operand not aligned on a 16-byte boundary.
• Write to a read-only data segment.
• VEX256: Memory operand not 32-byte aligned.
• VEX128: Memory operand not 16-byte aligned.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.


MOVAPS
VMOVAPS


Move Aligned
Packed Single-Precision Floating-Point

Moves packed single-precision floating-point values. Values can be moved from a register or memory
location to a register; or from a register to a register or memory location.
A memory operand that is not aligned causes a general-protection exception.
There are legacy and extended forms of the instruction:
MOVAPS

Moves four single-precision floating-point values.
There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination
operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a
128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVAPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves four single-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination
operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a
128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Moves eight single-precision floating-point values. There are encodings for each type of move.
• The source operand is either a YMM register or a 256-bit memory location. The destination
operand is a YMM register.
• The source operand is a YMM register. The destination operand is either a YMM register or a
256-bit memory location.
Instruction Support
Form       Subset   Feature Flag
MOVAPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVAPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode         Description
MOVAPS xmm1, xmm2/mem128      0F 28 /r       Moves four packed single-precision floating-point values from xmm2 or mem128 to xmm1.
MOVAPS xmm1/mem128, xmm2      0F 29 /r       Moves four packed single-precision floating-point values from xmm2 to xmm1 or mem128.

                              Encoding
Mnemonic                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVAPS xmm1, xmm2/mem128     C4    RXB.00001        X.1111.0.00   28 /r
VMOVAPS xmm1/mem128, xmm2     C4    RXB.00001        X.1111.0.00   29 /r
VMOVAPS ymm1, ymm2/mem256     C4    RXB.00001        X.1111.1.00   28 /r
VMOVAPS ymm1/mem256, ymm2     C4    RXB.00001        X.1111.1.00   29 /r

Related Instructions
(V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS,
(V)MOVUPS
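Because one operand of each legacy MOVAPS form is a register-or-memory operand, a register-to-register move can be encoded with either opcode. The hypothetical Python helpers below build both encodings for XMM0 through XMM7 (no REX prefix). They use the ModRM rule that 28 /r places the destination in ModRM.reg, while 29 /r places the source there and the destination in ModRM.rm.

```python
def modrm_reg(reg: int, rm: int) -> int:
    """ModRM byte with mod = 11b (register-direct) for registers 0-7."""
    if not (0 <= reg < 8 and 0 <= rm < 8):
        raise ValueError("this sketch handles xmm0-xmm7 only")
    return 0xC0 | (reg << 3) | rm


def movaps_load_form(dst: int, src: int) -> bytes:
    """MOVAPS xmm1, xmm2/mem128 (0F 28 /r): ModRM.reg = destination."""
    return bytes([0x0F, 0x28, modrm_reg(dst, src)])


def movaps_store_form(dst: int, src: int) -> bytes:
    """MOVAPS xmm1/mem128, xmm2 (0F 29 /r): ModRM.reg = source, ModRM.rm = destination."""
    return bytes([0x0F, 0x29, modrm_reg(src, dst)])


# Both byte sequences encode MOVAPS xmm1, xmm2 (copy xmm2 into xmm1).
print(movaps_load_form(1, 2).hex(" "))   # 0f 28 ca
print(movaps_store_form(1, 2).hex(" "))  # 0f 29 d1
```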
Exceptions
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR0.EM = 1.
• CR4.OSFXSR = 0.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.vvvv != 1111b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Memory operand not aligned on a 16-byte boundary.
• Write to a read-only data segment.
• VEX256: Memory operand not 32-byte aligned.
• VEX128: Memory operand not 16-byte aligned.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.


MOVD
VMOVD

Move
Doubleword or Quadword

Moves 32-bit and 64-bit values. A value can be moved from a general-purpose register or memory
location to the corresponding low-order bits of an XMM register, with zero-extension to 128 bits; or
from the low-order bits of an XMM register to a general-purpose register or memory location.
The quadword form of this instruction is distinct from the differently-encoded (V)MOVQ instruction.
There are legacy and extended forms of the instruction:
MOVD

There are two encodings for 32-bit moves, characterized by REX.W = 0.
• The source operand is either a 32-bit general-purpose register or a 32-bit memory location. The
destination is an XMM register. The 32-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either a 32-bit general-purpose register
or a 32-bit memory location.
There are two encodings for 64-bit moves, characterized by REX.W = 1.
• The source operand is either a 64-bit general-purpose register or a 64-bit memory location. The
destination is an XMM register. The 64-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either a 64-bit general-purpose register
or a 64-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVD

The extended form of the instruction has four 128-bit encodings:
There are two encodings for 32-bit moves, characterized by VEX.W = 0.
• The source operand is either a 32-bit general-purpose register or a 32-bit memory location. The
destination is an XMM register. The 32-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either a 32-bit general-purpose register
or a 32-bit memory location.
There are two encodings for 64-bit moves, characterized by VEX.W = 1.
• The source operand is either a 64-bit general-purpose register or a 64-bit memory location. The
destination is an XMM register. The 64-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either a 64-bit general-purpose register
or a 64-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
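The zero-extension and upper-bit rules can be captured with Python integers. In this hypothetical sketch a YMM register is a 256-bit non-negative integer, and `width` is 32 for the doubleword forms (W = 0) and 64 for the quadword forms (W = 1).

```python
MASK128 = (1 << 128) - 1


def movd_to_xmm(ymm_old: int, value: int, width: int, vex: bool) -> int:
    """Model (V)MOVD xmm, reg/mem: return the new 256-bit register value."""
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    low = value & ((1 << width) - 1)   # zero-extended into bits [127:0]
    if vex:
        return low                     # VEX form: bits [255:128] cleared
    return (ymm_old & ~MASK128) | low  # legacy form: bits [255:128] preserved


def movd_from_xmm(xmm: int, width: int) -> int:
    """Model (V)MOVD reg/mem, xmm: the low 32 or 64 bits of the source."""
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    return xmm & ((1 << width) - 1)


old = (0xAAAA << 200) | (0xFFFF << 100) | 0x1234
print(hex(movd_to_xmm(old, 0xDEADBEEF, 32, vex=False)))
print(hex(movd_to_xmm(old, 0xDEADBEEF, 32, vex=True)))           # 0xdeadbeef
print(hex(movd_from_xmm(0x112233445566778899AABBCCDDEEFF00, 32)))  # 0xddeeff00
```

In the legacy case the stale bits at [115:100] disappear (they lie inside [127:0]) while the bits at [215:200] survive; in the VEX case both are gone.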
Instruction Support
Form      Subset   Feature Flag
MOVD      SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVD     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)


For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode             Description
MOVD xmm, reg32/mem32        66 (W0) 0F 6E /r   Move a 32-bit value from reg32/mem32 to xmm.
MOVD xmm, reg64/mem64        66 (W1) 0F 6E /r   Move a 64-bit value from reg64/mem64 to xmm.
MOVD reg32/mem32, xmm        66 (W0) 0F 7E /r   Move a 32-bit value from xmm to reg32/mem32.
MOVD reg64/mem64, xmm        66 (W1) 0F 7E /r   Move a 64-bit value from xmm to reg64/mem64.

                             Encoding
Mnemonic                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVD¹ xmm, reg32/mem32      C4    RXB.00001        0.1111.0.01   6E /r
VMOVQ xmm, reg64/mem64       C4    RXB.00001        1.1111.0.01   6E /r
VMOVD¹ reg32/mem32, xmm      C4    RXB.00001        0.1111.0.01   7E /r
VMOVQ reg64/mem64, xmm       C4    RXB.00001        1.1111.0.01   7E /r

Note: ¹ Also known as MOVQ in some developer tools.

Related Instructions
(V)MOVDQA, (V)MOVDQU, (V)MOVQ
Exceptions
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR0.EM = 1.
• CR4.OSFXSR = 0.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.vvvv != 1111b.
• VEX.L = 1.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Write to a read-only data segment.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Unaligned memory reference when alignment checking enabled.


MOVDDUP
VMOVDDUP

Move and Duplicate
Double-Precision Floating-Point

Moves and duplicates double-precision floating-point values.
There are legacy and extended forms of the instruction:
MOVDDUP

Moves and duplicates one quadword value.
The source operand is either the low 64 bits of an XMM register or the address of the least-significant
byte of 64 bits of data in memory. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are not affected.
VMOVDDUP

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves and duplicates one quadword value.
The source operand is either the low 64 bits of an XMM register or the address of the least-significant
byte of 64 bits of data in memory. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

Moves and duplicates two even-indexed quadword values.
The source operand is either a YMM register or the address of the least-significant byte of 256 bits of
data in memory. The destination is a YMM register. Bits [63:0] of the source are written to bits
[127:64] and [63:0] of the destination; bits [191:128] of the source are written to bits [255:192] and
[191:128] of the destination.
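The lane movement of (V)MOVDDUP can be written as a short Python model. The function names are hypothetical, registers are lists of 64-bit quadwords with index 0 holding bits [63:0], and only the data movement is modeled.

```python
def movddup_xmm(src_low: int) -> list[int]:
    """(V)MOVDDUP xmm1, xmm2/mem64: two copies of the low quadword."""
    return [src_low, src_low]


def vmovddup_ymm(src: list[int]) -> list[int]:
    """VMOVDDUP ymm1, ymm2/mem256: duplicate the even-indexed quadwords."""
    if len(src) != 4:
        raise ValueError("a YMM register holds four quadwords")
    q0, _, q2, _ = src
    return [q0, q0, q2, q2]  # bits [127:0] from q0, bits [255:128] from q2


print(movddup_xmm(7))                  # [7, 7]
print(vmovddup_ymm([10, 11, 12, 13]))  # [10, 10, 12, 12]
```

The odd-indexed quadwords (bits [127:64] and [255:192] of the source) are never read, which is why the XMM form can take a 64-bit memory operand.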
Instruction Support
Form        Subset   Feature Flag
MOVDDUP     SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VMOVDDUP    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Opcode         Description
MOVDDUP xmm1, xmm2/mem64      F2 0F 12 /r    Moves two copies of the low 64 bits of xmm2 or mem64 to xmm1.

                              Encoding
Mnemonic                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVDDUP xmm1, xmm2/mem64     C4    RXB.00001        X.1111.0.11   12 /r
VMOVDDUP ymm1, ymm2/mem256    C4    RXB.00001        X.1111.1.11   12 /r


Related Instructions
(V)MOVSHDUP, (V)MOVSLDUP
Exceptions
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR0.EM = 1.
• CR4.OSFXSR = 0.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.vvvv != 1111b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Unaligned memory reference with alignment checking enabled.


MOVDQA
VMOVDQA

Move Aligned
Double Quadword

Moves aligned packed integer values. Values can be moved from a register or a memory location to a
register, or from a register to a register or a memory location.
A memory operand that is not aligned causes a general-protection exception.
There are legacy and extended forms of the instruction:
MOVDQA

Moves two aligned quadwords (128-bit move). There are two encodings.
• The source operand is an XMM register. The destination is either an XMM register or a 128-bit
memory location.
• The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVDQA

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves two aligned quadwords (128-bit move). There are two encodings.
• The source operand is an XMM register. The destination is either an XMM register or a 128-bit
memory location.
• The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Moves four aligned quadwords (256-bit move). There are two encodings.
• The source operand is a YMM register. The destination is either a YMM register or a 256-bit
memory location.
• The source operand is either a YMM register or a 256-bit memory location. The destination is a
YMM register.
Instruction Support
Form       Subset   Feature Flag
MOVDQA     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVDQA    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode         Description
MOVDQA xmm1, xmm2/mem128      66 0F 6F /r    Moves aligned packed integer values from xmm2 or mem128 to xmm1.
MOVDQA xmm1/mem128, xmm2      66 0F 7F /r    Moves aligned packed integer values from xmm2 to xmm1 or mem128.

                              Encoding
Mnemonic                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVDQA xmm1, xmm2/mem128     C4    RXB.00001        X.1111.0.01   6F /r
VMOVDQA xmm1/mem128, xmm2     C4    RXB.00001        X.1111.0.01   7F /r
VMOVDQA ymm1, ymm2/mem256     C4    RXB.00001        X.1111.1.01   6F /r
VMOVDQA ymm1/mem256, ymm2     C4    RXB.00001        X.1111.1.01   7F /r

Related Instructions
(V)MOVD, (V)MOVDQU, (V)MOVQ
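As a check on the VEX encoding table, the hypothetical Python helper below assembles register-to-register VMOVDQA for registers 0 to 7. Because these forms have no extra source operand, the VEX.vvvv field holds 1111b (other values cause #UD), and W, shown as X in the table, is set to 0 here. The load form (6F) puts the destination in ModRM.reg; the store form (7F) puts the source there.

```python
def vex3_no_vvvv(map_select: int, vex_l: int, pp: int) -> bytes:
    """3-byte VEX prefix with R, X, B = 0 (stored inverted as 1), W = 0, vvvv = 1111b."""
    byte1 = (0b111 << 5) | map_select
    byte2 = (0 << 7) | (0b1111 << 3) | (vex_l << 2) | pp
    return bytes([0xC4, byte1, byte2])


def vmovdqa_reg(dst: int, src: int, vex_l: int, store_form: bool = False) -> bytes:
    """Encode VMOVDQA dst, src for registers 0-7 (vex_l = 1 selects YMM)."""
    if not (0 <= dst < 8 and 0 <= src < 8):
        raise ValueError("this sketch handles registers 0-7 only")
    opcode, reg, rm = (0x7F, src, dst) if store_form else (0x6F, dst, src)
    modrm = 0xC0 | (reg << 3) | rm  # mod = 11b: register operands
    return vex3_no_vvvv(map_select=0b00001, vex_l=vex_l, pp=0b01) + bytes([opcode, modrm])


print(vmovdqa_reg(1, 2, vex_l=1).hex(" "))                   # c4 e1 7d 6f ca
print(vmovdqa_reg(1, 2, vex_l=1, store_form=True).hex(" "))  # c4 e1 7d 7f d1
```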
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS

General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
S
X
A

Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

Instruction Reference

S

X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] ! = 11b.
VEX.vvvv ! = 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not aligned on a 16-byte boundary.
Write to a read-only data segment.
VEX256: Memory operand not 32-byte aligned.
VEX128: Memory operand not 16-byte aligned.
Null data segment used to reference memory.
Instruction execution caused a page fault.
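The Python sketch below models one reading of the #UD rows above, for the VEX form only. Under that reading, rows marked A or X apply to VMOVDQA, while CR0.EM and CR4.OSFXSR (marked S) apply only to the legacy form. The VexUdContext fields and the short reason strings are hypothetical names. The processor state they summarize would, in practice, come from CPUID, the control registers, and the instruction bytes.

```python
from dataclasses import dataclass


@dataclass
class VexUdContext:
    avx_supported: bool       # CPUID Fn0000_0001_ECX[AVX] (bit 28)
    protected_mode: bool      # False in real or virtual-8086 mode
    cr4_osxsave: bool         # CR4.OSXSAVE
    xcr0: int                 # XFEATURE_ENABLED_MASK
    vex_vvvv: int             # VEX.vvvv exactly as stored in the prefix
    prefix_before_vex: bool   # REX, F2, F3 or 66 ahead of the VEX prefix
    lock_prefix: bool         # F0h ahead of the opcode


def vmovdqa_ud_reasons(ctx: VexUdContext) -> list:
    """Return every listed #UD cause that holds for a VEX-encoded VMOVDQA."""
    reasons = []
    if not ctx.avx_supported:
        reasons.append("cpuid")
    if not ctx.protected_mode:
        reasons.append("not-protected-mode")
    if not ctx.cr4_osxsave:
        reasons.append("osxsave")
    if ((ctx.xcr0 >> 1) & 0b11) != 0b11:      # XMM (bit 1) and YMM (bit 2) state enabled?
        reasons.append("xcr0")
    if ctx.vex_vvvv != 0b1111:                # VMOVDQA has no second source operand
        reasons.append("vvvv")
    if ctx.prefix_before_vex:
        reasons.append("prefix")
    if ctx.lock_prefix:
        reasons.append("lock")
    return reasons


ok = VexUdContext(True, True, True, 0b111, 0b1111, False, False)
print(vmovdqa_ud_reasons(ok))   # []
```

The hardware raises a single #UD exception. The list only records which of the documented conditions are present.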


MOVDQU
VMOVDQU


Move Unaligned
Double Quadword

Moves unaligned packed integer values. Values can be moved from a register or a memory location to
a register, or from a register to a register or a memory location.
There are legacy and extended forms of the instruction:
MOVDQU

Moves two unaligned quadwords (128-bit move). There are two encodings.
• The source operand is an XMM register. The destination is either an XMM register or a 128-bit
memory location.
• The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVDQU

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves two unaligned quadwords (128-bit move). There are two encodings:
• The source operand is an XMM register. The destination is either an XMM register or a 128-bit
memory location.
• The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Moves four unaligned quadwords (256-bit move). There are two encodings:
• The source operand is a YMM register. The destination is either a YMM register or a 256-bit
memory location.
• The source operand is either a YMM register or a 256-bit memory location. The destination is a
YMM register.
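The legacy and VEX.128 forms differ only in how they treat bits [255:128] of the destination's YMM register. The Python sketch below models both behaviors, representing a YMM register as a 256-bit unsigned integer. The function names are hypothetical.

```python
MASK128 = (1 << 128) - 1


def legacy_xmm_write(ymm_old: int, value128: int) -> int:
    """Legacy SSE form (e.g. MOVDQU xmm1, xmm2/mem128): bits [255:128] are not affected."""
    return (ymm_old & ~MASK128) | (value128 & MASK128)


def vex128_xmm_write(ymm_old: int, value128: int) -> int:
    """VEX.128 form (e.g. VMOVDQU xmm1, xmm2/mem128): bits [255:128] are cleared.

    ymm_old is accepted only to mirror the legacy signature; it does not affect the result.
    """
    return value128 & MASK128


old = (0xAAAA << 128) | 0x1111
print(hex(legacy_xmm_write(old, 0x2222) >> 128))   # 0xaaaa  (upper half kept)
print(hex(vex128_xmm_write(old, 0x2222)))          # 0x2222  (upper half cleared)
```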
Instruction Support
Form      Subset  Feature Flag
MOVDQU    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVDQU   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                    Opcode        Description
MOVDQU xmm1, xmm2/mem128    F3 0F 6F /r   Moves unaligned packed integer values from xmm2 or mem128 to xmm1.
MOVDQU xmm1/mem128, xmm2    F3 0F 7F /r   Moves unaligned packed integer values from xmm2 to xmm1 or mem128.

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVDQU xmm1, xmm2/mem128    C4    RXB.00001        X.1111.0.10   6F /r
VMOVDQU xmm1/mem128, xmm2    C4    RXB.00001        X.1111.0.10   7F /r
VMOVDQU ymm1, ymm2/mem256    C4    RXB.00001        X.1111.1.10   6F /r
VMOVDQU ymm1/mem256, ymm2    C4    RXB.00001        X.1111.1.10   7F /r

Related Instructions
(V)MOVD, (V)MOVDQA, (V)MOVQ
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Alignment check, #AC
S
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP


X
S
S
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


MOVHLPS
VMOVHLPS

Move High to Low
Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of an XMM register.
There are legacy and extended forms of the instruction:
MOVHLPS

The source operand is bits [127:64] of an XMM register. The destination is bits [63:0] of an XMM
register. Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVHLPS

The extended form of the instruction has a 128-bit encoding only.
The source operands are bits [127:64] of two XMM registers. The destination is a third XMM register. Bits [127:64] of the first source are moved to bits [127:64] of the destination; bits [127:64] of the
second source are moved to bits [63:0] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
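The Python sketch below writes out the lane movements of both forms. Each XMM register is modeled as a list of four single-precision lanes, with lane 0 holding bits [31:0]. The function names are hypothetical, and the clearing of YMM bits [255:128] is not modeled. The example shows that VMOVHLPS with the first source equal to the destination reproduces MOVHLPS.

```python
def movhlps(dst, src):
    """MOVHLPS xmm1, xmm2: xmm1[63:0] <- xmm2[127:64]; xmm1[127:64] unchanged."""
    return [src[2], src[3], dst[2], dst[3]]


def vmovhlps(src1, src2):
    """VMOVHLPS xmm1, xmm2, xmm3: dest[127:64] <- xmm2[127:64], dest[63:0] <- xmm3[127:64]."""
    return [src2[2], src2[3], src1[2], src1[3]]


a = [0.0, 1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0, 7.0]
print(movhlps(a, b))    # [6.0, 7.0, 2.0, 3.0]
print(vmovhlps(a, b))   # [6.0, 7.0, 2.0, 3.0]
```

This move is often used in horizontal reductions. Adding a register to the result of MOVHLPS applied to itself leaves the two partial sums in lanes 0 and 1.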
Instruction Support
Form       Subset  Feature Flag
MOVHLPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVHLPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic             Opcode     Description
MOVHLPS xmm1, xmm2   0F 12 /r   Moves two packed single-precision floating-point values from xmm2[127:64] to xmm1[63:0].

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVHLPS xmm1, xmm2, xmm3    C4    RXB.00001        X.src.0.00    12 /r

Related Instructions
(V)MOVAPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS,
(V)MOVUPS


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

X
Device not available, #NM
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S

Invalid opcode, #UD


X
S
S
A
A
A
A
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.


MOVHPD
VMOVHPD


Move High
Packed Double-Precision Floating-Point

Moves a packed double-precision floating-point value. Values can be moved from a 64-bit memory
location to the high-order quadword of an XMM register, or from the high-order quadword of an
XMM register to a 64-bit memory location.
There are legacy and extended forms of the instruction:
MOVHPD

There are two encodings.
• The source operand is a 64-bit memory location. The destination is bits [127:64] of an XMM
register.
• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory
location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVHPD

The extended form of the instruction has two 128-bit encodings:
• There are two source operands. The first source is an XMM register. The second source is a 64-bit
memory location. The destination is an XMM register. Bits [63:0] of the source register are written
to bits [63:0] of the destination; bits [63:0] of the source memory location are written to bits
[127:64] of the destination.
• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory
location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
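The memory operand is a raw little-endian 64-bit value, so Python's struct module can model the VMOVHPD load and the (V)MOVHPD store. In this sketch an XMM register is a pair of doubles, [bits 63:0, bits 127:64]. The function names are hypothetical.

```python
import struct


def vmovhpd_load(src1, mem64: bytes):
    """VMOVHPD xmm1, xmm2, mem64: dest[63:0] <- xmm2[63:0], dest[127:64] <- mem64."""
    (high,) = struct.unpack("<d", mem64)      # AMD64 memory is little-endian
    return [src1[0], high]


def movhpd_store(xmm) -> bytes:
    """(V)MOVHPD mem64, xmm1: only xmm1[127:64] is written, as 8 bytes."""
    return struct.pack("<d", xmm[1])


print(vmovhpd_load([1.5, 2.5], struct.pack("<d", -3.0)))  # [1.5, -3.0]
print(movhpd_store([1.5, 2.5]).hex())                      # 0000000000000440
```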
Instruction Support
Form      Subset  Feature Flag
MOVHPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVHPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic              Opcode        Description
MOVHPD xmm1, mem64    66 0F 16 /r   Moves a packed double-precision floating-point value from mem64 to xmm1[127:64].
MOVHPD mem64, xmm1    66 0F 17 /r   Moves a packed double-precision floating-point value from xmm1[127:64] to mem64.

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVHPD xmm1, xmm2, mem64    C4    RXB.00001        X.src.0.01    16 /r
VMOVHPD mem64, xmm1          C4    RXB.00001        X.1111.0.01   17 /r

Related Instructions
(V)MOVAPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVSD, (V)MOVUPD
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


MOVHPS
VMOVHPS


Move High
Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values. Values can be moved from a 64-bit memory
location to the high-order quadword of an XMM register, or from the high-order quadword of an
XMM register to a 64-bit memory location.
There are legacy and extended forms of the instruction:
MOVHPS

There are two encodings.
• The source operand is a 64-bit memory location. The destination is bits [127:64] of an XMM
register.
• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory
location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVHPS

The extended form of the instruction has two 128-bit encodings:
• There are two source operands. The first source is an XMM register. The second source is a 64-bit
memory location. The destination is an XMM register. Bits [63:0] of the source register are written
to bits [63:0] of the destination; bits [63:0] of the source memory location are written to bits
[127:64] of the destination.
• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory
location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
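For MOVHPS, the 64-bit memory operand holds two single-precision values. Lane 2 sits at the lower address and lane 3 at the higher one. The Python sketch below models the legacy load and the store, numbering lanes from bits [31:0] upward. The function names are hypothetical.

```python
import struct


def movhps_load(xmm, mem64: bytes):
    """MOVHPS xmm1, mem64: lanes 2 and 3 (bits [127:64]) replaced, lanes 0 and 1 kept."""
    lane2, lane3 = struct.unpack("<2f", mem64)
    return [xmm[0], xmm[1], lane2, lane3]


def movhps_store(xmm) -> bytes:
    """(V)MOVHPS mem64, xmm1: lanes 2 and 3 written as two little-endian floats."""
    return struct.pack("<2f", xmm[2], xmm[3])


reg = [0.0, 1.0, 2.0, 3.0]
print(movhps_load(reg, struct.pack("<2f", 8.0, 9.0)))   # [0.0, 1.0, 8.0, 9.0]
print(struct.unpack("<2f", movhps_store(reg)))          # (2.0, 3.0)
```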
Instruction Support
Form      Subset  Feature Flag
MOVHPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVHPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic              Opcode     Description
MOVHPS xmm1, mem64    0F 16 /r   Moves two packed single-precision floating-point values from mem64 to xmm1[127:64].
MOVHPS mem64, xmm1    0F 17 /r   Moves two packed single-precision floating-point values from xmm1[127:64] to mem64.

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVHPS xmm1, xmm2, mem64    C4    RXB.00001        X.src.0.00    16 /r
VMOVHPS mem64, xmm1          C4    RXB.00001        X.1111.0.00   17 /r

Related Instructions
(V)MOVAPS, (V)MOVHLPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS,
(V)MOVUPS
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


MOVLHPS
VMOVLHPS

Move Low to High
Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values from the low quadword of an XMM register
to the high quadword of a second XMM register.
There are legacy and extended forms of the instruction:
MOVLHPS

The source operand is bits [63:0] of an XMM register. The destination is bits [127:64] of an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVLHPS

The extended form of the instruction has a 128-bit encoding only.
The source operands are bits [63:0] of two XMM registers. The destination is a third XMM register.
Bits [63:0] of the first source are moved to bits [63:0] of the destination; bits [63:0] of the second
source are moved to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
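The Python sketch below uses the same four-lane list model (lane 0 = bits [31:0]) and shows that VMOVLHPS packs the low halves of two registers into one destination. The function names are hypothetical, and the clearing of YMM bits [255:128] is not modeled.

```python
def movlhps(dst, src):
    """MOVLHPS xmm1, xmm2: xmm1[127:64] <- xmm2[63:0]; xmm1[63:0] unchanged."""
    return [dst[0], dst[1], src[0], src[1]]


def vmovlhps(src1, src2):
    """VMOVLHPS xmm1, xmm2, xmm3: dest[63:0] <- xmm2[63:0], dest[127:64] <- xmm3[63:0]."""
    return [src1[0], src1[1], src2[0], src2[1]]


lo = [1.0, 2.0, 9.0, 9.0]
hi = [3.0, 4.0, 9.0, 9.0]
print(vmovlhps(lo, hi))   # [1.0, 2.0, 3.0, 4.0]
```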
Instruction Support
Form       Subset  Feature Flag
MOVLHPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVLHPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic             Opcode     Description
MOVLHPS xmm1, xmm2   0F 16 /r   Moves two packed single-precision floating-point values from xmm2[63:0] to xmm1[127:64].

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVLHPS xmm1, xmm2, xmm3    C4    RXB.00001        X.src.0.00    16 /r

Related Instructions
(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS,
(V)MOVUPS


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

X
Device not available, #NM
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S

Invalid opcode, #UD


X
S
S
A
A
A
A
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.


MOVLPD
VMOVLPD

Move Low
Packed Double-Precision Floating-Point

Moves a packed double-precision floating-point value. Values can be moved from a 64-bit memory
location to the low-order quadword of an XMM register, or from the low-order quadword of an XMM
register to a 64-bit memory location.
There are legacy and extended forms of the instruction:
MOVLPD

There are two encodings.
• The source operand is a 64-bit memory location. The destination is bits [63:0] of an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
• The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location.
VMOVLPD

The extended form of the instruction has two 128-bit encodings.
• There are two source operands. The first source is an XMM register. The second source is a 64-bit
memory location. The destination is an XMM register. Bits [127:64] of the source register are
written to bits [127:64] of the destination; bits [63:0] of the source memory location are written to
bits [63:0] of the destination. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
• The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location.
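Pairing the VMOVLPD load with the VMOVHPD load described for MOVHPD fills both halves of an XMM register from two unrelated 64-bit locations. The Python sketch below models that two-instruction sequence over a byte buffer that stands in for memory. The buffer, the offsets, and the function names are hypothetical.

```python
import struct


def vmovlpd_load(src1, mem64: bytes):
    """VMOVLPD xmm1, xmm2, mem64: dest[63:0] <- mem64, dest[127:64] <- xmm2[127:64]."""
    (low,) = struct.unpack("<d", mem64)
    return [low, src1[1]]


def vmovhpd_load(src1, mem64: bytes):
    """VMOVHPD xmm1, xmm2, mem64: dest[63:0] <- xmm2[63:0], dest[127:64] <- mem64."""
    (high,) = struct.unpack("<d", mem64)
    return [src1[0], high]


def gather_two_doubles(memory: bytes, off_lo: int, off_hi: int):
    """Model of VMOVLPD xmm1, xmm0, [off_lo] followed by VMOVHPD xmm1, xmm1, [off_hi].

    xmm0 is taken as all zeros; its high half is overwritten by the second load anyway.
    """
    xmm1 = vmovlpd_load([0.0, 0.0], memory[off_lo:off_lo + 8])
    return vmovhpd_load(xmm1, memory[off_hi:off_hi + 8])


memory = struct.pack("<4d", 10.0, 20.0, 30.0, 40.0)
print(gather_two_doubles(memory, 24, 8))   # [40.0, 20.0]
```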
Instruction Support
Form      Subset  Feature Flag
MOVLPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVLPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic              Opcode        Description
MOVLPD xmm1, mem64    66 0F 12 /r   Moves a packed double-precision floating-point value from mem64 to xmm1[63:0].
MOVLPD mem64, xmm1    66 0F 13 /r   Moves a packed double-precision floating-point value from xmm1[63:0] to mem64.

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVLPD xmm1, xmm2, mem64    C4    RXB.00001        X.src.0.01    12 /r
VMOVLPD mem64, xmm1          C4    RXB.00001        X.1111.0.01   13 /r


Related Instructions
(V)MOVAPD, (V)MOVHPD, (V)MOVMSKPD, (V)MOVSD, (V)MOVUPD
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


MOVLPS
VMOVLPS

Move Low
Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values. Values can be moved from a 64-bit memory
location to the low-order quadword of an XMM register, or from the low-order quadword of an XMM
register to a 64-bit memory location.
There are legacy and extended forms of the instruction:
MOVLPS

There are two encodings.
• The source operand is a 64-bit memory location. The destination is bits [63:0] of an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
• The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location.
VMOVLPS

The extended form of the instruction has two 128-bit encodings.
• There are two source operands. The first source is an XMM register. The second source is a 64-bit
memory location. The destination is an XMM register. Bits [127:64] of the source register are
written to bits [127:64] of the destination; bits [63:0] of the source memory location are written to
bits [63:0] of the destination. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
• The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location.
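The VEX load form is non-destructive: lanes 2 and 3 of the result come from the first source register, and that register is left intact. The Python sketch below models the VMOVLPS load and the (V)MOVLPS store. Lanes are numbered from bits [31:0], and the function names are hypothetical. The legacy MOVLPS xmm1, mem64 behaves like vmovlps_load(xmm1, mem64) with the result written back to xmm1.

```python
import struct


def vmovlps_load(src1, mem64: bytes):
    """VMOVLPS xmm1, xmm2, mem64: lanes 0-1 from memory, lanes 2-3 copied from xmm2."""
    lane0, lane1 = struct.unpack("<2f", mem64)
    return [lane0, lane1, src1[2], src1[3]]


def movlps_store(xmm) -> bytes:
    """(V)MOVLPS mem64, xmm1: lanes 0 and 1 written as two little-endian floats."""
    return struct.pack("<2f", xmm[0], xmm[1])


xmm2 = [5.0, 6.0, 7.0, 8.0]
mem = struct.pack("<2f", 1.0, 2.0)
print(vmovlps_load(xmm2, mem))   # [1.0, 2.0, 7.0, 8.0]
print(xmm2)                      # [5.0, 6.0, 7.0, 8.0]  (first source left intact)
```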
Instruction Support
Form      Subset  Feature Flag
MOVLPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVLPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic              Opcode     Description
MOVLPS xmm1, mem64    0F 12 /r   Moves two packed single-precision floating-point values from mem64 to xmm1[63:0].
MOVLPS mem64, xmm1    0F 13 /r   Moves two packed single-precision floating-point values from xmm1[63:0] to mem64.

Mnemonic                     Encoding
                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVLPS xmm1, xmm2, mem64    C4    RXB.00001        X.src.0.00    12 /r
VMOVLPS mem64, xmm1          C4    RXB.00001        X.1111.0.00   13 /r


Related Instructions
(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVMSKPS, (V)MOVSS,
(V)MOVUPS
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


MOVMSKPD
VMOVMSKPD


Extract Sign Mask
Packed Double-Precision Floating-Point

Extracts the sign bits of packed double-precision floating-point values from an XMM or YMM register, zero-extends the resulting mask, and writes it to the low-order bits of a general-purpose register.
There are legacy and extended forms of the instruction:
MOVMSKPD

Extracts two mask bits.
The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit general-purpose register. Writes the extracted bits to positions [1:0] of the destination and clears the remaining bits. Bits [255:128] of the YMM register that corresponds to the source are not affected.
VMOVMSKPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Extracts two mask bits.
The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit general-purpose register. Writes the extracted bits to positions [1:0] of the destination and clears the remaining bits. Bits [255:128] of the YMM register that corresponds to the source are not affected.
YMM Encoding

Extracts four mask bits.
The source operand is a YMM register. The destination can be either a 64-bit or a 32-bit general-purpose register. Writes the extracted bits to positions [3:0] of the destination and clears the remaining
bits.
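The Python sketch below models the mask that (V)MOVMSKPD produces. Because it reads the raw IEEE 754 sign bit, -0.0 counts as negative, just as it does for the instruction. The function names are hypothetical.

```python
import struct


def sign_bit(x: float) -> int:
    """Raw sign bit of an IEEE 754 double (so -0.0 yields 1)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits >> 63


def movmskpd(lanes) -> int:
    """(V)MOVMSKPD: bit i of the result is the sign of lane i; all higher bits are zero.

    Pass two lanes for the XMM forms and four for the YMM form.
    """
    mask = 0
    for i, x in enumerate(lanes):
        mask |= sign_bit(x) << i
    return mask


print(movmskpd([-1.0, 2.0]))               # 1
print(movmskpd([1.0, -0.0]))               # 2
print(movmskpd([-1.0, -2.0, 3.0, -4.0]))   # 11  (0b1011)
```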
Instruction Support
Form        Subset  Feature Flag
MOVMSKPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVMSKPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic             Opcode        Description
MOVMSKPD reg, xmm    66 0F 50 /r   Move zero-extended sign bits of packed double-precision values from xmm to a general-purpose register.

Mnemonic              Encoding
                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVMSKPD reg, xmm    C4    RXB.00001        X.1111.0.01   50 /r
VMOVMSKPD reg, ymm    C4    RXB.00001        X.1111.1.01   50 /r


Related Instructions
(V)MOVMSKPS, (V)PMOVMSKB
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

X
Device not available, #NM
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S

Invalid opcode, #UD


X
S
S
A
A
A
A
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.


MOVMSKPS
VMOVMSKPS

Extract Sign Mask
Packed Single-Precision Floating-Point

Extracts the sign bits of packed single-precision floating-point values from an XMM or YMM register, zero-extends the resulting mask, and writes it to the low-order bits of a general-purpose register.
There are legacy and extended forms of the instruction:
MOVMSKPS

Extracts four mask bits.
The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit general-purpose register. Writes the extracted bits to positions [3:0] of the destination and clears the remaining bits.
VMOVMSKPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Extracts four mask bits.
The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit general-purpose register. Writes the extracted bits to positions [3:0] of the destination and clears the remaining bits.
YMM Encoding

Extracts eight mask bits.
The source operand is a YMM register. The destination can be either a 64-bit or a 32-bit general-purpose register. Writes the extracted bits to positions [7:0] of the destination and clears the remaining
bits.
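This Python sketch computes the same mask from the raw register bits. The register is represented as an unsigned integer with lane 0 in bits [31:0], and bit i of the result is bit 32*i + 31 of the register. The function name is hypothetical.

```python
def movmskps(reg: int, lanes: int) -> int:
    """(V)MOVMSKPS model: pass lanes=4 for the XMM forms and lanes=8 for the YMM form."""
    mask = 0
    for i in range(lanes):
        mask |= ((reg >> (32 * i + 31)) & 1) << i   # sign bit of 32-bit lane i
    return mask


xmm = 0x80000000_3F800000_BF800000_00000000   # lanes 3..0: -0.0, 1.0, -1.0, 0.0
print(bin(movmskps(xmm, 4)))                  # 0b1010
print(hex(movmskps((1 << 256) - 1, 8)))       # 0xff
```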
Instruction Support
Form        Subset  Feature Flag
MOVMSKPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVMSKPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic             Opcode     Description
MOVMSKPS reg, xmm    0F 50 /r   Move zero-extended sign bits of packed single-precision values from xmm to a general-purpose register.

Mnemonic              Encoding
                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVMSKPS reg, xmm    C4    RXB.00001        X.1111.0.00   50 /r
VMOVMSKPS reg, ymm    C4    RXB.00001        X.1111.1.00   50 /r


Related Instructions
(V)MOVMSKPD, (V)PMOVMSKB
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

X
Device not available, #NM
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S

Invalid opcode, #UD


X
S
S
A
A
A
A
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.


MOVNTDQ
VMOVNTDQ

Move Non-Temporal
Double Quadword

Moves double quadword values from a register to a memory location.
Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The
processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The method of minimization depends on the hardware implementation of the instruction. For
further information, see “Memory Optimization” in Volume 1.
The instruction is weakly-ordered with respect to other instructions that operate on memory. Software
should use an SFENCE or MFENCE instruction to force strong memory ordering of MOVNTDQ
with respect to other stores.
An attempted store to a non-aligned memory location results in a #GP exception.
There are legacy and extended forms of the instruction:
MOVNTDQ

Moves one 128-bit value.
The source operand is an XMM register. The destination is a 128-bit memory location.
VMOVNTDQ

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves one 128-bit value.
The source operand is an XMM register. The destination is a 128-bit memory location.
YMM Encoding

Moves two 128-bit values.
The source operand is a YMM register. The destination is a 256-bit memory location.
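A misaligned (V)MOVNTDQ store raises #GP. A copy routine therefore typically writes the unaligned head and tail with ordinary stores, uses non-temporal stores only on the aligned middle, and then issues an SFENCE as described above. The Python sketch below only computes that split. The function name and the head/body/tail convention are assumptions made for the illustration.

```python
def split_for_nt_stores(address: int, length: int, width: int = 16):
    """Split [address, address + length) into (head, body, tail) byte counts.

    The body starts on a width-byte boundary and is a whole number of width-byte
    stores: width = 16 for MOVNTDQ / VMOVNTDQ mem128, 32 for VMOVNTDQ mem256.
    Head and tail bytes would be written with ordinary (temporal) stores.
    """
    head = min((-address) % width, length)
    body = (length - head) // width * width
    tail = length - head - body
    return head, body, tail


print(split_for_nt_stores(0x1004, 100))       # (12, 80, 8)
print(split_for_nt_stores(0x1000, 64, 32))    # (0, 64, 0)
```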
Instruction Support
Form       Subset  Feature Flag
MOVNTDQ    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVNTDQ   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic
MOVNTDQ mem128, xmm

Opcode
66 0F E7 /r

Description
Moves a 128-bit value from xmm to mem128, minimizing
cache pollution.

Mnemonic                Encoding
                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVNTDQ mem128, xmm    C4    RXB.00001        X.1111.0.01   E7 /r
VMOVNTDQ mem256, ymm    C4    RXB.00001        X.1111.1.01   E7 /r


Related Instructions
(V)MOVNTDQA, (V)MOVNTPD, (V)MOVNTPS
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS

General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
S
X
A

Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S

X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not aligned on a 16-byte boundary.
Write to a read-only data segment.
VEX256: Memory operand not 32-byte aligned.
VEX128: Memory operand not 16-byte aligned.
Null data segment used to reference memory.
Instruction execution caused a page fault.


MOVNTDQA
VMOVNTDQA

Move Non-Temporal
Double Quadword Aligned

Loads an XMM/YMM register from a naturally-aligned 128-bit or 256-bit memory location.
Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The
processor treats the load as a write-combining (WC) memory read, which minimizes cache pollution.
The method of minimization depends on the hardware implementation of the instruction. For further
information, see “Memory Optimization” in Volume 1.
The instruction is weakly-ordered with respect to other instructions that operate on memory. Software
should use an MFENCE instruction to force strong memory ordering of MOVNTDQA with respect
to other reads.
An attempted load from a non-aligned memory location results in a #GP exception.
There are legacy and extended forms of the instruction:
MOVNTDQA

Loads a 128-bit value into the specified XMM register from a 16-byte aligned memory location.
VMOVNTDQA

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Loads a 128-bit value into the specified XMM register from a 16-byte aligned memory location.
YMM Encoding

Loads a 256-bit value into the specified YMM register from a 32-byte aligned memory location.
Instruction Support
Form                Subset   Feature Flag
MOVNTDQA            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VMOVNTDQA 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VMOVNTDQA 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
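The Python sketch below decodes the feature flags in the table above from raw CPUID register values. It does not obtain those values itself; in C that would typically be done with a CPUID intrinsic. The function name and the returned strings are hypothetical. Only the CPUID bits are checked here. The OS-support conditions listed under Exceptions (CR4.OSXSAVE and XFEATURE_ENABLED_MASK) must also hold for the VEX forms.

```python
def movntdqa_forms(leaf1_ecx: int, leaf7_ebx: int) -> list:
    """Forms permitted by the CPUID feature flags.

    leaf1_ecx: ECX returned by CPUID Fn0000_0001.
    leaf7_ebx: EBX returned by CPUID Fn0000_0007 with ECX = 0.
    """
    forms = []
    if (leaf1_ecx >> 19) & 1:                 # SSE4.1
        forms.append("MOVNTDQA xmm, mem128")
    if (leaf1_ecx >> 28) & 1:                 # AVX
        forms.append("VMOVNTDQA xmm, mem128")
        if (leaf7_ebx >> 5) & 1:              # AVX2, checked only when AVX is present
            forms.append("VMOVNTDQA ymm, mem256")
    return forms


print(movntdqa_forms((1 << 19) | (1 << 28), 1 << 5))
```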
Instruction Encoding
Mnemonic                 Opcode           Description
MOVNTDQA xmm, mem128     66 0F 38 2A /r   Loads xmm from an aligned memory location, minimizing cache pollution.

Mnemonic                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVNTDQA xmm, mem128    C4    RXB.02           X.1111.0.01   2A /r
VMOVNTDQA ymm, mem256    C4    RXB.02           X.1111.1.01   2A /r
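To show how the table fields become bytes, the following sketch packs them into the three-byte VEX prefix. It assumes the usual VEX layout, in which the R, X, and B bits are stored inverted (hence the field is written as RXB), and it treats the W bit, marked X (ignored) in the table, as 0. `vex3_prefix` is a hypothetical name, not part of any assembler or library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs encoding-table fields into the prefix C4 xx yy.
 * r, x, b are the logical register-extension bits (0 when no extended register
 * is used); they are stored inverted. vvvv is passed as it appears in the
 * table (1111b for instructions that do not use it). */
static void vex3_prefix(unsigned r, unsigned x, unsigned b, unsigned map_select,
                        unsigned w, unsigned vvvv, unsigned l, unsigned pp,
                        uint8_t out[3])
{
    assert(map_select <= 0x1Fu && vvvv <= 0xFu && pp <= 3u);
    out[0] = 0xC4;
    out[1] = (uint8_t)(((~r & 1u) << 7) | ((~x & 1u) << 6) |
                       ((~b & 1u) << 5) | map_select);
    out[2] = (uint8_t)(((w & 1u) << 7) | (vvvv << 3) | ((l & 1u) << 2) | pp);
}
```

With no register extension, map_select = 02, vvvv = 1111b, pp = 01b, and L = 1, the prefix is C4 E2 7D; appending opcode 2A and ModRM byte 00 gives C4 E2 7D 2A 00, which encodes VMOVNTDQA ymm0, [rax] in 64-bit mode. Clearing L gives C4 E2 79, the XMM form.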

Related Instructions
(V)MOVNTDQ, (V)MOVNTPD, (V)MOVNTPS

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS

X
S
S
A
A
A
A
A
X
X
X
X
S
X

General protection, #GP
A

Page fault, #PF
S
X — AVX, AVX2, and SSE exception
A — AVX, AVX2 exception
S — SSE exception


X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not aligned on a 16-byte boundary.
Write to a read-only data segment.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Null data segment used to reference memory.
Instruction execution caused a page fault.


MOVNTPD
VMOVNTPD


Move Non-Temporal
Packed Double-Precision Floating-Point

Moves packed double-precision floating-point values from a register to a memory location.
The instruction indicates to the processor that the data is non-temporal and unlikely to be used again soon. The
processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. How cache pollution is minimized depends on the hardware implementation of the instruction. For
further information, see “Memory Optimization” in Volume 1.
The instruction is weakly ordered with respect to other instructions that operate on memory. Software
should use an SFENCE or MFENCE instruction to enforce strong memory ordering of MOVNTPD
with respect to other stores.
An attempted store to a misaligned memory location results in a #GP exception.
There are legacy and extended forms of the instruction:
MOVNTPD

Moves two values.
The source operand is an XMM register. The destination is a 128-bit memory location.
VMOVNTPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves two values.
The source operand is an XMM register. The destination is a 128-bit memory location.
YMM Encoding

Moves four values.
The source operand is a YMM register. The destination is a 256-bit memory location.
Instruction Support
Form       Subset   Feature Flag
MOVNTPD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVNTPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                Opcode        Description
MOVNTPD mem128, xmm     66 0F 2B /r   Moves two packed double-precision floating-point values from xmm to mem128, minimizing cache pollution.

Mnemonic                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVNTPD mem128, xmm    C4    RXB.00001        X.1111.0.01   2B /r
VMOVNTPD mem256, ymm    C4    RXB.00001        X.1111.1.01   2B /r
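A decoder performs the reverse step. The sketch below unpacks the W.vvvv.L.pp byte and applies two checks taken from this page: pp must be 01b for opcode 2B in map 00001 to be VMOVNTPD (pp = 00b selects VMOVNTPS instead), and vvvv must be 1111b, otherwise the instruction raises #UD as listed under Exceptions. The struct and function names are hypothetical, and only this one byte of the instruction is examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of the W.vvvv.L.pp byte (third byte of a C4 VEX prefix). */
struct vex_wvlpp {
    unsigned w, vvvv, l, pp;
};

static struct vex_wvlpp decode_wvlpp(uint8_t byte)
{
    struct vex_wvlpp f;
    f.w    = (byte >> 7) & 1u;
    f.vvvv = (byte >> 3) & 0xFu;  /* raw field as stored */
    f.l    = (byte >> 2) & 1u;
    f.pp   = byte & 3u;
    return f;
}

/* Hypothetical check for opcode 2B in map 00001: returns false if the byte does
 * not describe a valid VMOVNTPD, otherwise reports the operand width in bits. */
static bool vmovntpd_byte_ok(uint8_t byte, unsigned *width_bits)
{
    struct vex_wvlpp f = decode_wvlpp(byte);
    if (f.pp != 1u)      /* pp = 00b would select VMOVNTPS instead */
        return false;
    if (f.vvvv != 0xFu)  /* VEX.vvvv != 1111b raises #UD */
        return false;
    *width_bits = f.l ? 256u : 128u;
    return true;
}
```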


Related Instructions
MOVNTDQ, MOVNTI, MOVNTPS, MOVNTQ
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS

General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
S
X
A

Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S

X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not aligned on a 16-byte boundary.
Write to a read-only data segment.
VEX256: Memory operand not 32-byte aligned.
VEX128: Memory operand not 16-byte aligned.
Null data segment used to reference memory.
Instruction execution caused a page fault.


MOVNTPS
VMOVNTPS


Move Non-Temporal
Packed Single-Precision Floating-Point

Moves packed single-precision floating-point values from a register to a memory location.
The instruction indicates to the processor that the data is non-temporal and unlikely to be used again soon. The
processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. How cache pollution is minimized depends on the hardware implementation of the instruction. For
further information, see “Memory Optimization” in Volume 1.
The instruction is weakly ordered with respect to other instructions that operate on memory. Software
should use an SFENCE or MFENCE instruction to enforce strong memory ordering of MOVNTPS
with respect to other stores.
An attempted store to a misaligned memory location results in a #GP exception.
There are legacy and extended forms of the instruction:
MOVNTPS

Moves four values.
The source operand is an XMM register. The destination is a 128-bit memory location.
VMOVNTPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves four values.
The source operand is an XMM register. The destination is a 128-bit memory location.
YMM Encoding

Moves eight values.
The source operand is a YMM register. The destination is a 256-bit memory location.
Instruction Support
Form       Subset   Feature Flag
MOVNTPS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVNTPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                Opcode     Description
MOVNTPS mem128, xmm     0F 2B /r   Moves four packed single-precision floating-point values from xmm to mem128, minimizing cache pollution.

Mnemonic                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVNTPS mem128, xmm    C4    RXB.00001        X.1111.0.00   2B /r
VMOVNTPS mem256, ymm    C4    RXB.00001        X.1111.1.00   2B /r
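The legacy opcode 0F 2B is shared by four non-temporal stores documented on neighboring pages, and the mandatory prefix selects among them. The lookup below is a sketch assembled from those encoding tables (no prefix, 66, F2, F3); the enum and function names are hypothetical, and a real decoder would also have to handle REX prefixes and ModRM decoding.

```c
#include <assert.h>

/* Hypothetical names for the four stores that share legacy opcode 0F 2B. */
enum nt_store { NT_NONE, NT_MOVNTPS, NT_MOVNTPD, NT_MOVNTSD, NT_MOVNTSS };

/* Selects the instruction from the mandatory prefix byte (0 = no prefix). */
static enum nt_store classify_0f2b(unsigned char prefix)
{
    switch (prefix) {
    case 0x00: return NT_MOVNTPS;  /* 0F 2B,    SSE1  */
    case 0x66: return NT_MOVNTPD;  /* 66 0F 2B, SSE2  */
    case 0xF2: return NT_MOVNTSD;  /* F2 0F 2B, SSE4A */
    case 0xF3: return NT_MOVNTSS;  /* F3 0F 2B, SSE4A */
    default:   return NT_NONE;     /* not a mandatory prefix for this opcode */
    }
}

/* Size of the memory destination in bytes, per each encoding table. */
static unsigned nt_store_bytes(enum nt_store s)
{
    switch (s) {
    case NT_MOVNTPS:
    case NT_MOVNTPD: return 16;  /* mem128 */
    case NT_MOVNTSD: return 8;   /* mem64  */
    case NT_MOVNTSS: return 4;   /* mem32  */
    default:         return 0;
    }
}
```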


Related Instructions
(V)MOVNTDQ, (V)MOVNTDQA, (V)MOVNTPD, (V)MOVNTQ
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS

General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
S
X
A

Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S

X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not aligned on a 16-byte boundary.
Write to a read-only data segment.
VEX256: Memory operand not 32-byte aligned.
VEX128: Memory operand not 16-byte aligned.
Null data segment used to reference memory.
Instruction execution caused a page fault.


MOVNTSD

Move Non-Temporal Scalar
Double-Precision Floating-Point

Stores one double-precision floating-point value from an XMM register to a 64-bit memory location.
This instruction indicates to the processor that the data is non-temporal and unlikely to be used
again soon. The processor treats the store as a write-combining memory write, which minimizes cache
pollution.
The following figure illustrates the operation of this instruction:
[Figure: bits [63:0] of the XMM register are copied to mem64; bits [127:64] are not stored.]
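In C terms, the data movement is a 64-bit copy out of the low half of the register. This sketch models the XMM register as a hypothetical two-quadword struct; the non-temporal hint affects caching, not the stored value, so it is not represented, and exception conditions are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an XMM register: q[0] holds bits [63:0], q[1] bits [127:64]. */
typedef struct { uint64_t q[2]; } xmm_t;

/* Models the data movement of MOVNTSD mem64, xmm: only bits [63:0] are written.
 * memcpy lets the model accept any destination address. */
static void model_movntsd(void *mem64, const xmm_t *src)
{
    memcpy(mem64, &src->q[0], sizeof src->q[0]);
}
```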

Instruction Support
Form

Subset

MOVNTSD

SSE4A

Feature Flag
CPUID Fn8000_0001_ECX[SSE4A] (bit 6)

Software must check the CPUID bit once per program or library initialization before using the
instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain
processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic              Opcode        Description
MOVNTSD mem64, xmm    F2 0F 2B /r   Stores one double-precision floating-point XMM register value into a 64-bit memory location. Treats the store as non-temporal.

Related Instructions
MOVNTDQ, MOVNTI, MOVNTPD, MOVNTPS, MOVNTQ, MOVNTSS
rFLAGS Affected
None


Exceptions
Exception                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE4A instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                           X     X             X          The emulate bit (CR0.EM) was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (CR0.TS) was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                                               X          The destination operand was in a non-writable segment.
Page fault, #PF                  X             X          A page fault resulted from executing the instruction.
Alignment check, #AC             X             X          An unaligned memory reference was performed while alignment checking was enabled.


MOVNTSS

Move Non-Temporal Scalar
Single-Precision Floating-Point

Stores one single-precision floating-point value from an XMM register to a 32-bit memory location.
This instruction indicates to the processor that the data is non-temporal and unlikely to be used
again soon. The processor treats the store as a write-combining memory write, which minimizes cache
pollution.
The following figure illustrates the operation of this instruction:
[Figure: bits [31:0] of the XMM register are copied to mem32; bits [127:32] are not stored.]

Instruction Support
Form

Subset

MOVNTSS

SSE4A

Feature Flag
CPUID Fn8000_0001_ECX[SSE4A] (bit 6)

Software must check the CPUID bit once per program or library initialization before using the
instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain
processor feature support information, see Appendix E of Volume 3.
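A once-per-initialization check might look like the sketch below. It assumes the caller obtains ECX from CPUID function 8000_0001h by some platform-specific means (for example, GCC and Clang on x86 provide `__get_cpuid` in `<cpuid.h>`) and passes it in. The `sse4a_*` names and the global flags are illustrative only, and the flags are not protected against concurrent initialization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SSE4A_ECX_BIT 6u   /* CPUID Fn8000_0001_ECX[SSE4A] (bit 6) */

static bool g_sse4a_checked = false;  /* illustrative global state */
static bool g_sse4a = false;

/* Decodes the SSE4A bit from the ECX value of CPUID function 8000_0001h. */
static bool sse4a_from_ecx(uint32_t ecx)
{
    return ((ecx >> SSE4A_ECX_BIT) & 1u) != 0;
}

/* Call once during program or library initialization. */
static void sse4a_init(uint32_t ecx_8000_0001)
{
    g_sse4a = sse4a_from_ecx(ecx_8000_0001);
    g_sse4a_checked = true;
}

/* Later code consults the cached answer before choosing MOVNTSD/MOVNTSS. */
static bool sse4a_usable(void)
{
    assert(g_sse4a_checked);  /* using the result before init is a bug */
    return g_sse4a;
}
```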
Instruction Encoding
Mnemonic              Opcode        Description
MOVNTSS mem32, xmm    F3 0F 2B /r   Stores one single-precision floating-point XMM register value into a 32-bit memory location. Treats the store as non-temporal.

Related Instructions
MOVNTDQ, MOVNTI, MOVNTPD, MOVNTPS, MOVNTQ, MOVNTSD
rFLAGS Affected
None


Exceptions
Exception                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE4A instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                           X     X             X          The emulate bit (CR0.EM) was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (CR0.TS) was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                                               X          The destination operand was in a non-writable segment.
Page fault, #PF                  X             X          A page fault resulted from executing the instruction.
Alignment check, #AC             X             X          An unaligned memory reference was performed while alignment checking was enabled.


MOVQ
VMOVQ

Move
Quadword

Moves 64-bit values. The source is either the low-order quadword of an XMM register or a 64-bit
memory location. The destination is either the low-order quadword of an XMM register or a 64-bit
memory location. When the destination is a register, the 64-bit value is zero-extended to 128 bits.
There are legacy and extended forms of the instruction:
MOVQ

There are two encodings:
• The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. The 64-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either an XMM register or a 64-bit
memory location. When the destination is a register, the 64-bit value is zero-extended to 128 bits.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVQ

The extended form of the instruction has three 128-bit encodings:
• The source operand is an XMM register. The destination is an XMM register. The 64-bit value is
zero-extended to 128 bits.
• The source operand is a 64-bit memory location. The destination is an XMM register. The 64-bit
value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either an XMM register or a 64-bit
memory location. When the destination is a register, the 64-bit value is zero-extended to 128 bits.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
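For a register destination, the legacy and VEX forms differ only in what happens to bits [255:128]. The sketch models a YMM register as a hypothetical array of four quadwords and covers the register-destination cases; the store to mem64 simply writes the low quadword and is not shown. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a YMM register: q[0] = bits [63:0], q[3] = bits [255:192]. */
typedef struct { uint64_t q[4]; } ymm_t;

/* Legacy MOVQ xmm1, xmm2/mem64: zero-extend to 128 bits; YMM bits [255:128] unchanged. */
static void model_movq_legacy(ymm_t *dst, uint64_t src64)
{
    dst->q[0] = src64;
    dst->q[1] = 0;
}

/* VMOVQ xmm1, xmm2/mem64: same 128-bit result, and YMM bits [255:128] cleared. */
static void model_vmovq(ymm_t *dst, uint64_t src64)
{
    model_movq_legacy(dst, src64);
    dst->q[2] = 0;
    dst->q[3] = 0;
}
```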
Instruction Support
Form     Subset   Feature Flag
MOVQ     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVQ    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                  Opcode        Description
MOVQ xmm1, xmm2/mem64     F3 0F 7E /r   Move a zero-extended 64-bit value from xmm2 or mem64 to xmm1.
MOVQ xmm1/mem64, xmm2     66 0F D6 /r   Move a 64-bit value from xmm2 to xmm1 or mem64. Zero-extends for register destination.

Mnemonic                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVQ xmm1, xmm2          C4    RXB.00001        X.1111.0.10   7E /r
VMOVQ xmm1, mem64         C4    RXB.00001        X.1111.0.10   7E /r
VMOVQ xmm1/mem64, xmm2    C4    RXB.00001        X.1111.0.01   D6 /r

Related Instructions
(V)MOVD, (V)MOVDQA, (V)MOVDQU
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


MOVSD
VMOVSD


Move
Scalar Double-Precision Floating-Point

Moves scalar double-precision floating-point values. The source is either the low-order quadword of an
XMM register or a 64-bit memory location. The destination is either the low-order quadword of an
XMM register or a 64-bit memory location.
There are legacy and extended forms of the instruction:
MOVSD

There are two encodings.
• The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. If the source operand is a register, bits [127:64] of the destination are not affected.
If the source operand is a 64-bit memory location, the upper 64 bits of the destination are cleared.
• The source operand is an XMM register. The destination is either an XMM register or a 64-bit
memory location. When the destination is a register, bits [127:64] of the destination are not
affected.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVSD

The extended form of the instruction has four 128-bit encodings. Two of the encodings are functionally equivalent.
• The source operand is a 64-bit memory location. The destination is an XMM register. The 64-bit
value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is a 64-bit memory location.
• Two functionally equivalent encodings:
There are two source XMM registers. The destination is an XMM register. Bits [127:64] of the first
source register are copied to bits [127:64] of the destination; the 64-bit value in bits [63:0] of the
second source register is written to bits [63:0] of the destination.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
This instruction must not be confused with the MOVSD (move string doubleword) instruction of the
general-purpose instruction set. Assemblers can distinguish the instructions by the number and type
of operands.
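The register-destination behaviors described above can be summarized in a short C sketch using a hypothetical four-quadword YMM model. The function names are illustrative; the three-operand function reads both sources before writing, so it stays correct when the destination is also a source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical YMM model: q[0] = bits [63:0], q[3] = bits [255:192]. */
typedef struct { uint64_t q[4]; } ymm_q;

/* Legacy MOVSD xmm1, xmm2: only bits [63:0] of the destination change. */
static void movsd_reg(ymm_q *d, const ymm_q *s) { d->q[0] = s->q[0]; }

/* Legacy MOVSD xmm1, mem64: bits [127:64] cleared, YMM bits [255:128] unchanged. */
static void movsd_load(ymm_q *d, uint64_t m) { d->q[0] = m; d->q[1] = 0; }

/* VMOVSD xmm1, xmm2, xmm3: bits [127:64] from xmm2, bits [63:0] from xmm3,
 * and YMM bits [255:128] of the destination cleared. */
static void vmovsd_3op(ymm_q *d, const ymm_q *s1, const ymm_q *s2)
{
    uint64_t hi = s1->q[1], lo = s2->q[0];  /* read first: d may alias a source */
    d->q[0] = lo;
    d->q[1] = hi;
    d->q[2] = 0;
    d->q[3] = 0;
}
```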
Instruction Support
Form      Subset   Feature Flag
MOVSD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                   Opcode        Description
MOVSD xmm1, xmm2/mem64     F2 0F 10 /r   Moves a 64-bit value from xmm2 or mem64 to xmm1. Zero-extends to 128 bits when the source operand is memory.
MOVSD xmm1/mem64, xmm2     F2 0F 11 /r   Moves a 64-bit value from xmm2 to xmm1 or mem64.

VEX encoding (see Note 1):
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVSD xmm1, mem64                 C4    RXB.00001        X.1111.X.11   10 /r
VMOVSD mem64, xmm1                 C4    RXB.00001        X.1111.X.11   11 /r
VMOVSD xmm1, xmm2, xmm3 (Note 2)   C4    RXB.00001        X.src.X.11    10 /r
VMOVSD xmm1, xmm2, xmm3 (Note 2)   C4    RXB.00001        X.src.X.11    11 /r

Note 1: The addressing mode differentiates between the two-operand form (where one operand is a memory location) and
the three-operand form (where all operands are held in registers).
Note 2: These two encodings are functionally equivalent.

Related Instructions
(V)MOVAPD, (V)MOVHPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVUPD
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


MOVSHDUP
VMOVSHDUP

Move High and Duplicate
Single-Precision

Moves and duplicates odd-indexed single-precision floating-point values.
There are legacy and extended forms of the instruction:
MOVSHDUP

Moves and duplicates two odd-indexed single-precision floating-point values.
The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [127:96] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [63:32] of the source are duplicated and written to bits [63:32] and [31:0] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVSHDUP

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves and duplicates two odd-indexed single-precision floating-point values.
The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [127:96] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [63:32] of the source are duplicated and written to bits [63:32] and [31:0] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Moves and duplicates four odd-indexed single-precision floating-point values.
The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. Bits [255:224] of the source are duplicated and written to bits [255:224] and [223:192] of the
destination. Bits [191:160] of the source are duplicated and written to bits [191:160] and [159:128] of
the destination. Bits [127:96] of the source are duplicated and written to bits [127:96] and [95:64] of
the destination. Bits [63:32] of the source are duplicated and written to bits [63:32] and [31:0] of the
destination.
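Because each odd-indexed element is written both to its own position and to its even-indexed neighbor, the operation reduces to a loop over element pairs. This sketch assumes element k of the array occupies bits [32k+31:32k] of the register and does not model what happens to YMM bits [255:128] in the XMM forms; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of (V)MOVSHDUP over n single-precision elements
 * (n = 4 for the XMM forms, 8 for the YMM form). For each even index k,
 * source element k+1 is written to destination elements k and k+1.
 * dst may be the same array as src. */
static void model_movshdup(float *dst, const float *src, size_t n)
{
    assert(n % 2 == 0);
    for (size_t k = 0; k < n; k += 2) {
        dst[k]     = src[k + 1];
        dst[k + 1] = src[k + 1];
    }
}
```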
Instruction Support
Form        Subset   Feature Flag
MOVSHDUP    SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VMOVSHDUP   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                       Opcode        Description
MOVSHDUP xmm1, xmm2/mem128     F3 0F 16 /r   Moves and duplicates two odd-indexed single-precision floating-point values in xmm2 or mem128. Writes to xmm1.

Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVSHDUP xmm1, xmm2/mem128    C4    RXB.00001        X.1111.0.10   16 /r
VMOVSHDUP ymm1, ymm2/mem256    C4    RXB.00001        X.1111.1.10   16 /r

Related Instructions
(V)MOVDDUP, (V)MOVSLDUP
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
S
S
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


MOVSLDUP
VMOVSLDUP

Move Low and Duplicate
Single-Precision

Moves and duplicates even-indexed single-precision floating-point values.
There are legacy and extended forms of the instruction:
MOVSLDUP

Moves and duplicates two even-indexed single-precision floating-point values.
The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [95:64] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [31:0] of the source are duplicated and written to bits [63:32] and [31:0] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVSLDUP

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves and duplicates two even-indexed single-precision floating-point values.
The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [95:64] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [31:0] of the source are duplicated and written to bits [63:32] and [31:0] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Moves and duplicates four even-indexed single-precision floating-point values.
The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. Bits [223:192] of the source are duplicated and written to bits [255:224] and [223:192] of the
destination. Bits [159:128] of the source are duplicated and written to bits [191:160] and [159:128] of
the destination. Bits [95:64] of the source are duplicated and written to bits [127:96] and [95:64] of
the destination. Bits [31:0] of the source are duplicated and written to bits [63:32] and [31:0] of the
destination.
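The operation reduces to a loop over element pairs in which the even-indexed element of each pair is duplicated. This sketch assumes element k of the array occupies bits [32k+31:32k] of the register, does not model the treatment of YMM bits [255:128] in the XMM forms, and uses a hypothetical function name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of (V)MOVSLDUP over n single-precision elements
 * (n = 4 for the XMM forms, 8 for the YMM form). For each even index k,
 * source element k is written to destination elements k and k+1.
 * dst may be the same array as src. */
static void model_movsldup(float *dst, const float *src, size_t n)
{
    assert(n % 2 == 0);
    for (size_t k = 0; k < n; k += 2) {
        dst[k]     = src[k];
        dst[k + 1] = src[k];
    }
}
```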
Instruction Support
Form        Subset   Feature Flag
MOVSLDUP    SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VMOVSLDUP   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                       Opcode        Description
MOVSLDUP xmm1, xmm2/mem128     F3 0F 12 /r   Moves and duplicates two even-indexed single-precision floating-point values in xmm2 or mem128. Writes to xmm1.

Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVSLDUP xmm1, xmm2/mem128    C4    RXB.00001        X.1111.0.10   12 /r
VMOVSLDUP ymm1, ymm2/mem256    C4    RXB.00001        X.1111.1.10   12 /r

Related Instructions
(V)MOVDDUP, (V)MOVSHDUP
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
S
S
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


MOVSS
VMOVSS


Move
Scalar Single-Precision Floating-Point

Moves scalar single-precision floating-point values. The source is either the low-order doubleword of
an XMM register or a 32-bit memory location. The destination is either the low-order doubleword of an
XMM register or a 32-bit memory location.
There are legacy and extended forms of the instruction:
MOVSS

There are three encodings.
• The source operand is an XMM register. The destination is an XMM register. Bits [127:32] of the
destination are not affected.
• The source operand is a 32-bit memory location. The destination is an XMM register. The 32-bit
value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either an XMM register or a 32-bit
memory location. When the destination is a register, bits [127:32] of the destination are not
affected.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVSS

The extended form of the instruction has four 128-bit encodings. Two of the encodings are functionally equivalent.
• The source operand is a 32-bit memory location. The destination is an XMM register. The 32-bit
value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is a 32-bit memory location.
• Two functionally equivalent encodings:
There are two source XMM registers. The destination is an XMM register. Bits [127:32] of the first
source register are copied to bits [127:32] of the destination; the 32-bit value in bits [31:0] of the
second source register is written to bits [31:0] of the destination.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
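The VEX register and load forms can be expressed over a hypothetical eight-lane YMM model, as below. The three-operand function builds the result in a temporary so that it remains correct when the destination aliases a source. The names are illustrative, and the first and second source are taken to be xmm2 and xmm3 as in the mnemonic.

```c
#include <assert.h>

/* Hypothetical YMM model as eight single-precision lanes; f[0] = bits [31:0]. */
typedef struct { float f[8]; } ymm_ps;

/* VMOVSS xmm1, xmm2, xmm3: bits [31:0] from xmm3, bits [127:32] from xmm2,
 * bits [255:128] of the destination YMM register cleared. */
static void model_vmovss_3op(ymm_ps *d, const ymm_ps *s1, const ymm_ps *s2)
{
    ymm_ps r = { { 0 } };          /* lanes 4..7 stay zero */
    r.f[0] = s2->f[0];
    for (int i = 1; i < 4; i++)
        r.f[i] = s1->f[i];
    *d = r;                        /* safe even if d aliases s1 or s2 */
}

/* VMOVSS xmm1, mem32: zero-extend the 32-bit value; bits [255:32] cleared. */
static void model_vmovss_load(ymm_ps *d, float m)
{
    ymm_ps r = { { 0 } };
    r.f[0] = m;
    *d = r;
}
```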
Instruction Support
Form      Subset   Feature Flag
MOVSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                  Opcode        Description
MOVSS xmm1, xmm2          F3 0F 10 /r   Moves a 32-bit value from xmm2 to xmm1.
MOVSS xmm1, mem32         F3 0F 10 /r   Moves a zero-extended 32-bit value from mem32 to xmm1.
MOVSS xmm2/mem32, xmm1    F3 0F 11 /r   Moves a 32-bit value from xmm1 to xmm2 or mem32.

VEX encoding (see Note 1):
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VMOVSS xmm1, mem32                 C4    RXB.00001        X.1111.X.10   10 /r
VMOVSS mem32, xmm1                 C4    RXB.00001        X.1111.X.10   11 /r
VMOVSS xmm1, xmm2, xmm3 (Note 2)   C4    RXB.00001        X.src.X.10    10 /r
VMOVSS xmm1, xmm2, xmm3 (Note 2)   C4    RXB.00001        X.src.X.10    11 /r

Note 1: The addressing mode differentiates between the two-operand form (where one operand is a memory location) and
the three-operand form (where all operands are held in registers).
Note 2: These two encodings are functionally equivalent.

Related Instructions
(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS,
(V)MOVUPS
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S

X
S
S
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


MOVUPD
VMOVUPD


Move Unaligned
Packed Double-Precision Floating-Point

Moves packed double-precision floating-point values. Values can be moved from a register or memory location to a register; or from a register to a register or memory location.
A memory operand that is not aligned does not cause a general-protection exception.
There are legacy and extended forms of the instruction:
MOVUPD

Moves two double-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination
operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a
128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVUPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves two double-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination
operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a
128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Moves four double-precision floating-point values. There are encodings for each type of move.
• The source operand is either a YMM register or a 256-bit memory location. The destination
operand is a YMM register.
• The source operand is a YMM register. The destination operand is either a YMM register or a
256-bit memory location.
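The legacy form and the VEX.128 form differ only in what happens to bits [255:128] of the destination YMM register. A minimal sketch, assuming a YMM register is modeled as a list of four double-precision lanes (lanes 0 and 1 form the XMM register) and a cleared lane reads back as 0.0; the function names are hypothetical.

```python
def movupd_legacy(dest_ymm: list[float], src: list[float]) -> list[float]:
    """Hypothetical model of MOVUPD xmm1, xmm2/mem128: lanes 2-3 (bits [255:128]) kept."""
    return list(src[:2]) + list(dest_ymm[2:4])

def vmovupd_xmm(src: list[float]) -> list[float]:
    """Hypothetical model of VMOVUPD xmm1, xmm2/mem128 (VEX.L = 0): bits [255:128] cleared."""
    return list(src[:2]) + [0.0, 0.0]

def vmovupd_ymm(src: list[float]) -> list[float]:
    """Hypothetical model of VMOVUPD ymm1, ymm2/mem256 (VEX.L = 1): all four lanes written."""
    return list(src[:4])
```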
Instruction Support
Form      Subset  Feature Flag
MOVUPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVUPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                    Opcode        Description
MOVUPD xmm1, xmm2/mem128    66 0F 10 /r   Moves two packed double-precision floating-point
                                          values from xmm2 or mem128 to xmm1.
MOVUPD xmm1/mem128, xmm2    66 0F 11 /r   Moves two packed double-precision floating-point
                                          values from xmm2 to xmm1 or mem128.

Mnemonic                    Encoding
                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVUPD xmm1, xmm2/mem128   C4   RXB.00001       X.1111.0.01  10 /r
VMOVUPD xmm1/mem128, xmm2   C4   RXB.00001       X.1111.0.01  11 /r
VMOVUPD ymm1, ymm2/mem256   C4   RXB.00001       X.1111.1.01  10 /r
VMOVUPD ymm1/mem256, ymm2   C4   RXB.00001       X.1111.1.01  11 /r

Related Instructions
(V)MOVAPD, (V)MOVHPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVSD
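The VEX fields in the VMOVUPD encoding table can be assembled into bytes as shown below. This sketch assumes the three-byte C4 layout (byte 1 = R, X, B, map_select; byte 2 = W, vvvv, L, pp), with R, X, B and vvvv stored inverted. That inversion is why an unused vvvv appears as 1111b in the table. `vex3` and `modrm_reg_reg` are hypothetical helpers, and a W bit shown as X (ignored) is passed as 0.

```python
def vex3(r: int, x: int, b: int, map_select: int,
         w: int, vvvv_reg: int, l: int, pp: int) -> bytes:
    """Hypothetical helper: build a three-byte (C4) VEX prefix from its fields.

    r, x, b:  extension bits (bit 3) of ModRM.reg, SIB.index and ModRM.rm/base.
    vvvv_reg: register number of the extra source operand; pass 0 when unused,
              which becomes 1111b after inversion.
    """
    byte1 = (((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5)
             | (map_select & 0x1F))                     # R, X, B stored inverted
    byte2 = (((w & 1) << 7) | ((~vvvv_reg & 0xF) << 3)  # vvvv stored inverted
             | ((l & 1) << 2) | (pp & 0b11))
    return bytes([0xC4, byte1, byte2])

def modrm_reg_reg(reg: int, rm: int) -> int:
    """Hypothetical helper: ModRM byte for a register-register pair (mod = 11b)."""
    return 0xC0 | ((reg & 7) << 3) | (rm & 7)

# VMOVUPD ymm1, ymm2: map 00001 (0F), W ignored, vvvv unused, L = 1, pp = 01 (66h)
vmovupd = vex3(0, 0, 0, 0b00001, 0, 0, 1, 0b01) + bytes([0x10, modrm_reg_reg(1, 2)])
# VMULPD xmm1, xmm2, xmm3: vvvv names xmm2, L = 0, pp = 01
vmulpd = vex3(0, 0, 0, 0b00001, 0, 2, 0, 0b01) + bytes([0x59, modrm_reg_reg(1, 3)])
print(vmovupd.hex(" "), vmulpd.hex(" "))
```

With these fields the sketch yields C4 E1 7D 10 CA and C4 E1 69 59 CB. Assemblers typically emit the equivalent two-byte C5 prefix when the fields allow it (for example C5 FD 10 CA for the first instruction), so disassembler output may differ from these bytes.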
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Alignment check, #AC
S
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP


X
S
S
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


MOVUPS
VMOVUPS


Move Unaligned
Packed Single-Precision Floating-Point

Moves packed single-precision floating-point values. Values can be moved from a register or memory
location to a register; or from a register to a register or memory location.
A memory operand that is not aligned does not cause a general-protection exception.
There are legacy and extended forms of the instruction:
MOVUPS

Moves four single-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination
operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a
128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVUPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Moves four single-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination
operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a
128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Moves eight single-precision floating-point values. There are encodings for each type of move.
• The source operand is either a YMM register or a 256-bit memory location. The destination
operand is a YMM register.
• The source operand is a YMM register. The destination operand is either a YMM register or a
256-bit memory location.
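Because MOVUPS places no alignment requirement on its memory operand, a model can read and write 16 bytes at any byte offset. A minimal sketch, assuming memory is a Python `bytearray` and values are little-endian IEEE single precision as handled by the standard `struct` module; `movups_load` and `movups_store` are hypothetical names.

```python
import struct

def movups_load(mem: bytes, addr: int) -> tuple[float, ...]:
    """Hypothetical model of MOVUPS xmm1, mem128: four singles from any byte address."""
    return struct.unpack_from("<4f", mem, addr)

def movups_store(mem: bytearray, addr: int, xmm: tuple[float, ...]) -> None:
    """Hypothetical model of MOVUPS mem128, xmm2: write 16 bytes at any byte address."""
    struct.pack_into("<4f", mem, addr, *xmm)
```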
Instruction Support
Form      Subset  Feature Flag
MOVUPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVUPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                    Opcode     Description
MOVUPS xmm1, xmm2/mem128    0F 10 /r   Moves four packed single-precision floating-point
                                       values from xmm2 or unaligned mem128 to xmm1.
MOVUPS xmm1/mem128, xmm2    0F 11 /r   Moves four packed single-precision floating-point
                                       values from xmm2 to xmm1 or unaligned mem128.

Mnemonic                    Encoding
                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVUPS xmm1, xmm2/mem128   C4   RXB.00001       X.1111.0.00  10 /r
VMOVUPS xmm1/mem128, xmm2   C4   RXB.00001       X.1111.0.00  11 /r
VMOVUPS ymm1, ymm2/mem256   C4   RXB.00001       X.1111.1.00  10 /r
VMOVUPS ymm1/mem256, ymm2   C4   RXB.00001       X.1111.1.00  11 /r

Related Instructions
(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS,
(V)MOVSS
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Alignment check, #AC
S
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP


X
S
S
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] ! = 11b.
VEX.vvvv ! = 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


MPSADBW
VMPSADBW


Multiple Sum of Absolute Differences

Calculates 8 or 16 sums of absolute differences of sequentially selected groups of four contiguous
unsigned byte integers in the first source operand and a selected group of four contiguous unsigned
byte integers in a second source operand and writes the eight or sixteen 16-bit unsigned integer sums
to sequential words of the destination register. The 256-bit form of the instruction additionally performs a similar but independent calculation using the upper 128 bits of the source operands.
Figure 2-2 on page 238 provides a graphical representation of the operation of the instruction. The
following description accompanies it.
The computation uses as inputs 11 bytes from the first source operand and 4 bytes in the second
source operand. Bit fields in the imm8 operand specify the index of the right-most byte of each group.
Bits [1:0] of the immediate operand determine the index of the right-most byte of four contiguous
bytes within the second source operand used in the operation that produces the result (or, in the case
of the 256-bit form of the instruction, the lower 128 bits of the result). Bit 2 of the immediate operand
determines the right-most index of the 11 contiguous bytes in the first source operand used in the same
calculation. In the 128-bit form of the instruction, bits [7:3] of the immediate operand are ignored.
Bits [4:3] of the immediate operand determine the index of the right-most byte of four contiguous
bytes within the second source operand used in the operation that produces the upper 128 bits of the
result in the 256-bit form of the instruction. Bit 5 of the immediate operand determines the right-most
index of the 11 contiguous bytes within the upper half of the first 256-bit source operand used in
the same calculation. In the 256-bit form of the instruction, bits [7:6] of the immediate operand are
ignored.
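The immediate-operand fields described above decode into four byte offsets. A minimal sketch; `mpsadbw_offsets` is a hypothetical name, and the returned values follow the i1, j1, i2 and j2 definitions used in the formal description later in this section.

```python
def mpsadbw_offsets(imm8: int) -> tuple[int, int, int, int]:
    """Hypothetical helper: byte offsets (i1, j1, i2, j2) selected by imm8."""
    i1 = ((imm8 >> 2) & 1) * 4              # imm8[2]: 11-byte group in src1, lower 128 bits
    j1 = (imm8 & 0b11) * 4                  # imm8[1:0]: 4-byte group in src2, lower 128 bits
    i2 = ((imm8 >> 5) & 1) * 4 + 16         # imm8[5]: same, upper 128 bits (256-bit form)
    j2 = ((imm8 >> 3) & 0b11) * 4 + 16      # imm8[4:3]: same, upper 128 bits (256-bit form)
    return i1, j1, i2, j2
```

Because i1 is at most 4, the 11 bytes src1[i1+10:i1] always lie within the lower 16 bytes of the first source operand.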
Each word of the destination register receives the result of a separate computation of the sum of absolute differences function applied to a specific pair of four-element vectors derived from the source
operands. The sum of absolute differences function SumAbsDiff (A, B) takes as input two 4-element
unsigned 8-bit integer vectors and produces a single unsigned 16-bit integer result. The function is
defined as:
SumAbsDiff(A, B) = | A[0]-B[0] | + | A[1]-B[1] | + | A[2]-B[2] | + | A[3]-B[3] |

The sum of absolute differences function produces a quantitative measure of the difference between
two 4-element vectors. Each of the calculations that generates a result uses this metric to assess the
difference between the selected 4-byte vector from operand 2 (B in the above equation) and each of
eight overlapping 4-byte vectors (A in the equation) selected sequentially from the first source operand.
The right-most word (Word 0) of the destination receives the result of the comparison of the right-most 4 bytes of the selected group of 11 from operand 1 (src1[i1+3:i1], as shown in the figure) to
the selected 4 bytes from operand 2 (src2[j1+3:j1], in the figure). Word 1 of the destination receives
the result of the comparison of the four bytes starting at an offset of 1 from the right-most byte of the
group of 11 (src1[i1+4:i1+1], in the figure) to the 4 bytes from operand 2. Word 2 of the destination
receives the result of the comparison of the four bytes starting at an offset of 2 from the right-most
byte of the group of 11 (src1[i1+5:i1+2], in the figure) to the selected 4 bytes from operand 2. This
continues in like manner until the left-most four bytes of the 11 are compared to the 4 bytes from
operand 2, with the result being written to Word 7. This completes the generation of the lower 128 bits
of the result.


The generation of the upper 128 bits of the result for the 256-bit form of the instruction is performed
in like manner using separately selected groups of bytes from the upper half of the 256-bit operands,
as described above.
The following is a more formal description of the operation of the (V)MPSADBW instruction:
For both the 128-bit and 256-bit form of the instruction, the following set of operations is performed:
src1 and src2 are byte vectors that overlay the first and second source operand respectively.
dest is a word vector that overlays the destination register.
tmp1[ ] is an array of 4-element vectors derived from the first source operand.
tmp2 and tmp3 are 4-element vectors derived from the second source operand.
i1 = imm8[2] * 4
j1 = imm8[1:0] * 4
tmp1[0] = {src1[i1+3], src1[i1+2], src1[i1+1], src1[i1]}
tmp1[1] = {src1[i1+4], src1[i1+3], src1[i1+2], src1[i1+1]}
tmp1[2] = {src1[i1+5], src1[i1+4], src1[i1+3], src1[i1+2]}
tmp1[3] = {src1[i1+6], src1[i1+5], src1[i1+4], src1[i1+3]}
tmp1[4] = {src1[i1+7], src1[i1+6], src1[i1+5], src1[i1+4]}
tmp1[5] = {src1[i1+8], src1[i1+7], src1[i1+6], src1[i1+5]}
tmp1[6] = {src1[i1+9], src1[i1+8], src1[i1+7], src1[i1+6]}
tmp1[7] = {src1[i1+10], src1[i1+9], src1[i1+8], src1[i1+7]}
tmp2 = {src2[j1+3], src2[j1+2], src2[j1+1], src2[j1]}
dest[0] = SumAbsDiff(tmp1[0], tmp2)
dest[1] = SumAbsDiff(tmp1[1], tmp2)
dest[2] = SumAbsDiff(tmp1[2], tmp2)
dest[3] = SumAbsDiff(tmp1[3], tmp2)
dest[4] = SumAbsDiff(tmp1[4], tmp2)
dest[5] = SumAbsDiff(tmp1[5], tmp2)
dest[6] = SumAbsDiff(tmp1[6], tmp2)
dest[7] = SumAbsDiff(tmp1[7], tmp2)

Additionally, for the 256-bit form of the instruction, the following set of operations is performed:
i2 = imm8[5] * 4 + 16
j2 = imm8[4:3] * 4 + 16
tmp1[8] = {src1[i2+3], src1[i2+2], src1[i2+1], src1[i2]}
tmp1[9] = {src1[i2+4], src1[i2+3], src1[i2+2], src1[i2+1]}
tmp1[10] = {src1[i2+5], src1[i2+4], src1[i2+3], src1[i2+2]}
tmp1[11] = {src1[i2+6], src1[i2+5], src1[i2+4], src1[i2+3]}
tmp1[12] = {src1[i2+7], src1[i2+6], src1[i2+5], src1[i2+4]}
tmp1[13] = {src1[i2+8], src1[i2+7], src1[i2+6], src1[i2+5]}
tmp1[14] = {src1[i2+9], src1[i2+8], src1[i2+7], src1[i2+6]}
tmp1[15] = {src1[i2+10], src1[i2+9], src1[i2+8], src1[i2+7]}
tmp3 = {src2[j2+3], src2[j2+2], src2[j2+1], src2[j2]}
dest[8] = SumAbsDiff(tmp1[8], tmp3)
dest[9] = SumAbsDiff(tmp1[9], tmp3)
dest[10] = SumAbsDiff(tmp1[10], tmp3)
dest[11] = SumAbsDiff(tmp1[11], tmp3)


dest[12] = SumAbsDiff(tmp1[12], tmp3)
dest[13] = SumAbsDiff(tmp1[13], tmp3)
dest[14] = SumAbsDiff(tmp1[14], tmp3)
dest[15] = SumAbsDiff(tmp1[15], tmp3)
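The formal description above can be turned into an executable reference model for checking results. A minimal sketch, assuming the operands are Python `bytes` objects of 16 bytes (128-bit form) or 32 bytes (256-bit form), with byte 0 the least-significant byte; `mpsadbw` and `sum_abs_diff` are hypothetical names. The result is returned as a list of words, dest[0] first.

```python
def sum_abs_diff(a, b) -> int:
    """SumAbsDiff(A, B) for two 4-element vectors of unsigned bytes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mpsadbw(src1: bytes, src2: bytes, imm8: int) -> list[int]:
    """Hypothetical model of (V)MPSADBW: 8 words (128-bit form) or 16 words (256-bit form)."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("source operands must both be 16 or 32 bytes long")
    # (i, j) byte offsets per 128-bit half: (i1, j1), then (i2, j2) for the 256-bit form
    halves = [(((imm8 >> 2) & 1) * 4, (imm8 & 0b11) * 4)]
    if len(src1) == 32:
        halves.append((((imm8 >> 5) & 1) * 4 + 16, ((imm8 >> 3) & 0b11) * 4 + 16))
    dest = []
    for i, j in halves:
        block = src2[j:j + 4]                # tmp2 (lower half) or tmp3 (upper half)
        for k in range(8):
            window = src1[i + k:i + k + 4]   # tmp1[k] or tmp1[k + 8]: bytes i+k .. i+k+3
            dest.append(sum_abs_diff(window, block))
    return dest
```

Each word is at most 4 × 255 = 1020, so the 16-bit destination elements never overflow.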
[Figure: Lower 128 bits: each of the eight 4-byte groups tmp1[0] = src1[i1+3:i1] through tmp1[7] = src1[i1+10:i1+7] is combined with tmp2 = src2[j1+3:j1] by the Σ |Δ| function to produce word 0 through word 7 of the destination XMM register (lower half of the YMM register). Upper 128 bits (256-bit form only): tmp1[8] = src1[i2+3:i2] through tmp1[15] = src1[i2+10:i2+7] are combined with tmp3 = src2[j2+3:j2] to produce word 8 through word 15 of the destination YMM register (upper half).]
Notes:
• i1 is a byte offset into source operand 1 (i1 = imm8[2] * 4).
• j1 is a byte offset into source operand 2 (j1 = imm8[1:0] * 4).
• i2 is a second byte offset into source operand 1 (i2 = imm8[5] * 4 + 16).
• j2 is a second byte offset into source operand 2 (j2 = imm8[4:3] * 4 + 16).
• Σ |Δ| represents the sum of absolute differences function, which operates on two
4-element unsigned packed byte values and produces an unsigned 16-bit integer.

Figure 2-2. (V)MPSADBW Instruction

There are legacy and extended forms of the instruction:
MPSADBW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.


VMPSADBW

The extended form of the instruction has 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register. Bits [127:0] of the destination
receive the results of the first 8 sums of absolute differences calculation using the selected bytes of the
lower halves of the two source operands. Bits [255:128] of the destination receive the results of the
second 8 sums of absolute differences calculation using selected bytes of the upper halves of the two
source operands.
Instruction Support
Form               Subset   Feature Flag
MPSADBW            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VMPSADBW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VMPSADBW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
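The support table maps each form to a separate CPUID feature bit. A minimal sketch of that mapping, assuming the caller supplies ECX from CPUID Fn0000_0001 and EBX from CPUID Fn0000_0007 with ECX = 0 (the _x0 suffix in the table); `mpsadbw_forms` is a hypothetical name. It checks only the flags listed in the table, not the operating-system conditions listed under Exceptions.

```python
def mpsadbw_forms(fn1_ecx: int, fn7_ebx: int) -> set[str]:
    """Hypothetical helper: forms whose feature flag is set, per the support table."""
    forms = set()
    if (fn1_ecx >> 19) & 1:   # CPUID Fn0000_0001_ECX[SSE41] (bit 19)
        forms.add("MPSADBW")
    if (fn1_ecx >> 28) & 1:   # CPUID Fn0000_0001_ECX[AVX] (bit 28)
        forms.add("VMPSADBW 128-bit")
    if (fn7_ebx >> 5) & 1:    # CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
        forms.add("VMPSADBW 256-bit")
    return forms
```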
Instruction Encoding
Mnemonic                           Opcode              Description
MPSADBW xmm1, xmm2/mem128, imm8    66 0F 3A 42 /r ib   Sums absolute differences of groups of
                                                       four 8-bit integers in xmm1 and xmm2
                                                       or mem128. Writes results to xmm1.

Mnemonic                                 Encoding
                                         VEX  RXB.map_select  W.vvvv.L.pp   Opcode
VMPSADBW xmm1, xmm2, xmm3/mem128, imm8   C4   RXB.03          X.src1.0.01   42 /r ib
VMPSADBW ymm1, ymm2, ymm3/mem256, imm8   C4   RXB.03          X.src1.1.01   42 /r ib

Related Instructions
(V)PSADBW, (V)PABSB, (V)PABSD, (V)PABSW


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


MULPD
VMULPD


Multiply
Packed Double-Precision Floating-Point

Multiplies each packed double-precision floating-point value of the first source operand by the corresponding packed double-precision floating-point value of the second source operand and writes the
product of each multiplication into the corresponding quadword of the destination.
There are legacy and extended forms of the instruction:
MULPD

Multiplies two double-precision floating-point values in the first source XMM register by the corresponding double-precision floating-point values in either a second XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.
VMULPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Multiplies two double-precision floating-point values in the first source XMM register by the corresponding double-precision floating-point values in either a second source XMM register or a 128-bit
memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
YMM Encoding

Multiplies four double-precision floating-point values in the first source YMM register by the corresponding double-precision floating-point values in either a second source YMM register or a 256-bit
memory location. The destination is a third YMM register.
Instruction Support
Form     Subset  Feature Flag
MULPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMULPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode        Description
MULPD xmm1, xmm2/mem128   66 0F 59 /r   Multiplies two packed double-precision floating-point
                                        values in xmm1 by corresponding values in
                                        xmm2 or mem128. Writes results to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMULPD xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src.0.01   59 /r
VMULPD ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src.1.01   59 /r


Related Instructions
(V)MULPS, (V)MULSD, (V)MULSS
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note:
M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
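The MXCSR bit positions in the table above can be decoded as follows. A minimal sketch; `decode_mxcsr` and `MULPD_MAY_MODIFY` are hypothetical names, and obtaining the MXCSR value (for example with STMXCSR) is outside the sketch.

```python
MXCSR_FLAGS = ("IE", "DE", "ZE", "OE", "UE", "PE")   # bits 0-5
MXCSR_MASKS = ("IM", "DM", "ZM", "OM", "UM", "PM")   # bits 7-12
MULPD_MAY_MODIFY = {"IE", "DE", "OE", "UE", "PE"}    # flags marked M in the table

def decode_mxcsr(mxcsr: int) -> dict:
    """Hypothetical helper: split an MXCSR value into the fields shown in the table."""
    return {
        "flags": {n for i, n in enumerate(MXCSR_FLAGS) if (mxcsr >> i) & 1},
        "masks": {n for i, n in enumerate(MXCSR_MASKS) if (mxcsr >> (7 + i)) & 1},
        "DAZ": (mxcsr >> 6) & 1,
        "RC": (mxcsr >> 13) & 0b11,   # bits 14:13
        "FZ": (mxcsr >> 15) & 1,
        "MM": (mxcsr >> 17) & 1,
    }
```

For example, 1F80h has all six exception masks set and RC = 00b, so it decodes to no flags and every mask.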

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.


MULPS
VMULPS

Multiply
Packed Single-Precision Floating-Point

Multiplies each packed single-precision floating-point value of the first source operand by the corresponding packed single-precision floating-point value of the second source operand and writes the
product of each multiplication into the corresponding elements of the destination.
There are legacy and extended forms of the instruction:
MULPS

Multiplies four single-precision floating-point values in the first source XMM register by the corresponding single-precision floating-point values of either a second source XMM register or a 128-bit
memory location. The first source register is also the destination. Bits [255:128] of the YMM register
that corresponds to the destination are not affected.
VMULPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Multiplies four single-precision floating-point values in the first source XMM register by the corresponding single-precision floating-point values of either a second source XMM register or a 128-bit
memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
YMM Encoding

Multiplies eight single-precision floating-point values in the first source YMM register by the corresponding single-precision floating-point values of either a second source YMM register or a 256-bit
memory location. Writes the results to a third YMM register.
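Each lane of (V)MULPS rounds its product to single precision. A minimal sketch, assuming the default rounding mode (round to nearest) with FZ and DAZ clear, and using Python's `struct` module to round binary64 values to binary32; `to_f32` and `mulps` are hypothetical names. MXCSR flag updates and exceptions are not modeled, and a finite product too large for single precision raises `OverflowError` from `struct` instead of producing infinity.

```python
import struct

def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mulps(src1: list[float], src2: list[float]) -> list[float]:
    """Hypothetical model of (V)MULPS: 4 lanes (XMM forms) or 8 lanes (YMM form)."""
    if len(src1) != len(src2) or len(src1) not in (4, 8):
        raise ValueError("expected two vectors of 4 or 8 lanes")
    # The exact product of two binary32 values fits in binary64, so a single
    # rounding to binary32 gives the correctly rounded single-precision result.
    return [to_f32(to_f32(a) * to_f32(b)) for a, b in zip(src1, src2)]
```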
Instruction Support
Form     Subset  Feature Flag
MULPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMULPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode     Description
MULPS xmm1, xmm2/mem128   0F 59 /r   Multiplies four packed single-precision floating-point values
                                     in xmm1 by corresponding values in xmm2 or mem128.
                                     Writes the products to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMULPS xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.00  59 /r
VMULPS ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.00  59 /r


Related Instructions
(V)MULPD, (V)MULSD, (V)MULSS
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note:
M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.


MULSD
VMULSD

Multiply
Scalar Double-Precision Floating-Point

Multiplies the double-precision floating-point value in the low-order quadword of the first source
operand by the double-precision floating-point value in the low-order quadword of the second source
operand and writes the product into the low-order quadword of the destination.
There are legacy and extended forms of the instruction:
MULSD

The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination register. Bits [127:64]
of the destination and bits [255:128] of the corresponding YMM register are not affected.
VMULSD

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the first
source operand are copied to bits [127:64] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
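The legacy and VEX forms differ only in where the upper bits of the destination come from. A minimal sketch, assuming a YMM register is modeled as four binary64 lanes and that Python float multiplication (IEEE binary64, round to nearest) matches the default MXCSR rounding mode; the function names are hypothetical and MXCSR effects are not modeled.

```python
def mulsd_legacy(dest_ymm: list[float], src2_low: float) -> list[float]:
    """Hypothetical model of MULSD xmm1, xmm2/mem64: bits [255:64] unchanged."""
    return [dest_ymm[0] * src2_low] + list(dest_ymm[1:4])

def vmulsd(src1_xmm: list[float], src2_low: float) -> list[float]:
    """Hypothetical model of VMULSD xmm1, xmm2, xmm3/mem64."""
    # Bits [127:64] come from the first source; bits [255:128] are cleared.
    return [src1_xmm[0] * src2_low, src1_xmm[1], 0.0, 0.0]
```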
Instruction Support
Form     Subset  Feature Flag
MULSD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMULSD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                 Opcode        Description
MULSD xmm1, xmm2/mem64   F2 0F 59 /r   Multiplies the low-order double-precision floating-point value
                                       in xmm1 by the corresponding value in xmm2 or mem64.
                                       Writes the product to xmm1.

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMULSD xmm1, xmm2, xmm3/mem64   C4   RXB.01          X.src1.X.11  59 /r

Related Instructions
(V)MULPD, (V)MULPS, (V)MULSS


MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note:
M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.


MULSS
VMULSS


Multiply Scalar Single-Precision Floating-Point

Multiplies the single-precision floating-point value in the low-order doubleword of the first source
operand by the single-precision floating-point value in the low-order doubleword of the second
source operand and writes the product into the low-order doubleword of the destination.
There are legacy and extended forms of the instruction:
MULSS

The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the
destination register and bits [255:128] of the corresponding YMM register are not affected.
VMULSS

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the first
source register are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
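The two forms differ only in what happens to the bits above the low doubleword. The following Python sketch models this, treating a register as a list of float lanes with lane 0 as the low-order doubleword. It is an illustration under stated assumptions, not a reference model: the function names are hypothetical, inputs are assumed to already be single-precision values, rounding is round-to-nearest, and MXCSR effects (DAZ, FZ, flags, exceptions) are not modeled.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value.

    The exact product of two binary32 values fits in binary64, so rounding
    it once here gives the round-to-nearest binary32 product. struct raises
    OverflowError if the result overflows binary32; #XF/OE is not modeled.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mulss(xmm1: list[float], src2_low: float) -> list[float]:
    """Legacy MULSS: xmm1 holds 4 lanes and is also the destination.

    Only lane 0 changes; lanes 1-3 (bits [127:32]) keep their old values.
    Bits [255:128] of the YMM register are untouched, so they are not modeled.
    """
    return [f32(xmm1[0] * src2_low)] + list(xmm1[1:4])

def vmulss(src1: list[float], src2_low: float) -> list[float]:
    """VMULSS: returns all 8 lanes of the destination YMM register.

    Lane 0 is the product, lanes 1-3 are copied from src1, and
    lanes 4-7 (bits [255:128]) are cleared.
    """
    return [f32(src1[0] * src2_low)] + list(src1[1:4]) + [0.0] * 4

print(vmulss([1.0, 2.0, 3.0, 4.0], 0.5))  # [0.5, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
```

The legacy model returns only four lanes because MULSS leaves bits [255:128] untouched, whereas VMULSS defines all eight.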
Instruction Support
Form    Subset  Feature Flag
MULSS   SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMULSS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode       Description
MULSS xmm1, xmm2/mem32         F3 0F 59 /r  Multiplies a single-precision floating-point value in the low-order doubleword of xmm1 by a corresponding value in xmm2 or mem32. Writes the product to xmm1.

                               Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMULSS xmm1, xmm2, xmm3/mem32  C4   RXB.01          X.src1.X.10  59 /r
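As a worked example of the encoding row: in the three-byte VEX form, the byte after C4 holds R, X and B in bits 7:5 and map_select in bits 4:0, and the next byte holds W in bit 7, vvvv in bits 6:3, L in bit 2 and pp in bits 1:0. R, X, B and vvvv are stored in inverted (one's-complement) form. The Python sketch below assembles the register-only form of VMULSS from those fields. The helper vex3_rrr is hypothetical, and W is written as 0 because the table marks it X (ignored).

```python
def vex3_rrr(opcode: int, map_select: int, pp: int, l: int, w: int,
             dst: int, src1: int, src2: int) -> bytes:
    """Encode a register-only VEX instruction with the 3-byte (C4) prefix.

    dst goes in ModRM.reg (extended by VEX.R), src1 in VEX.vvvv, and
    src2 in ModRM.rm (extended by VEX.B). Register numbers are 0-15.
    """
    r = (dst >> 3) & 1
    x = 0                      # no SIB index register in a register-only form
    b = (src2 >> 3) & 1
    # Byte 1: inverted R, X, B in bits 7:5, map_select in bits 4:0.
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | (map_select & 0x1F)
    # Byte 2: W in bit 7, inverted vvvv in bits 6:3, L in bit 2, pp in bits 1:0.
    byte2 = ((w & 1) << 7) | ((~src1 & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    modrm = 0xC0 | ((dst & 7) << 3) | (src2 & 7)   # mod = 11b: register operand
    return bytes([0xC4, byte1, byte2, opcode, modrm])

# VMULSS xmm1, xmm2, xmm3: map_select = 01 (0F map), pp = 10 (implied F3), opcode 59h.
vmulss_enc = vex3_rrr(0x59, map_select=0b00001, pp=0b10, l=0, w=0,
                      dst=1, src1=2, src2=3)
print(vmulss_enc.hex(" "))  # c4 e1 6a 59 cb
```

Assemblers usually emit the equivalent two-byte C5 form (C5 EA 59 CB for this example) when X, B, W and map_select permit it; the sketch builds only the C4 form shown in the table.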

Related Instructions
(V)MULPD, (V)MULPS, (V)MULSD


MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.


ORPD
VORPD

OR
Packed Double-Precision Floating-Point

Performs a bitwise OR of the two (XMM) or four (YMM) packed double-precision floating-point values in the first source operand
with the corresponding values in the second source operand and writes the results into the corresponding elements of the destination.
There are legacy and extended forms of the instruction:
ORPD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VORPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
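Because ORPD operates on raw bits, its effect on floating-point values is easiest to see by reinterpreting each lane as a 64-bit integer. In the Python sketch below (the function names are illustrative), OR-ing with -0.0, whose only set bit is the sign bit, forces every lane negative. This is a common use of ORPD with a sign-bit mask.

```python
import struct

def f64_to_bits(x: float) -> int:
    """Reinterpret a binary64 value as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_to_f64(b: int) -> float:
    """Reinterpret an unsigned 64-bit integer as a binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def orpd(src1: list[float], src2: list[float]) -> list[float]:
    """Lane-wise bitwise OR of packed doubles (2 lanes for XMM, 4 for YMM)."""
    return [bits_to_f64(f64_to_bits(a) | f64_to_bits(b)) for a, b in zip(src1, src2)]

print(orpd([1.0, 2.5], [-0.0, -0.0]))   # [-1.0, -2.5]
print(hex(f64_to_bits(-0.0)))           # 0x8000000000000000
```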
Instruction Support
Form   Subset  Feature Flag
ORPD   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VORPD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode       Description
ORPD xmm1, xmm2/mem128         66 0F 56 /r  Performs bitwise OR of two packed double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

                               Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VORPD xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  56 /r
VORPD ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  56 /r

Related Instructions
(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPS, (V)XORPD, (V)XORPS


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


ORPS
VORPS

OR
Packed Single-Precision Floating-Point

Performs a bitwise OR of the four (XMM) or eight (YMM) packed single-precision floating-point values in the first source operand with the corresponding values in the second source
operand, and writes the result into the corresponding elements of the destination.
There are legacy and extended forms of the instruction:
ORPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VORPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
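ORPS is also purely bitwise, so it can be modeled on raw 32-bit lane patterns. The Python sketch below (hypothetical function names) applies it to a familiar bit trick: OR-ing 23 mantissa bits with the pattern of 1.0 (3F800000h) produces a single-precision value in [1.0, 2.0).

```python
import struct

def orps_bits(src1: list[int], src2: list[int]) -> list[int]:
    """ORPS on raw lanes: each lane is a 32-bit pattern (4 lanes XMM, 8 lanes YMM)."""
    return [(a | b) & 0xFFFFFFFF for a, b in zip(src1, src2)]

def as_f32(bits: int) -> float:
    """View a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# OR-ing mantissa bits with the pattern of 1.0f yields a value in [1.0, 2.0).
mantissas = [0x000000, 0x400000, 0x200000, 0x7FFFFF]
ones = [0x3F800000] * 4
print([as_f32(b) for b in orps_bits(mantissas, ones)][:3])  # [1.0, 1.5, 1.25]
```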
Instruction Support
Form   Subset  Feature Flag
ORPS   SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VORPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode    Description
ORPS xmm1, xmm2/mem128         0F 56 /r  Performs bitwise OR of four packed single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

                               Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VORPS xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.00  56 /r
VORPS ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.00  56 /r

Related Instructions
(V)ANDNPD, (V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)XORPD, (V)XORPS


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


PABSB
VPABSB

Packed Absolute Value
Signed Byte

Computes the absolute value of 16 or 32 packed 8-bit signed integers in the source operand. Each
byte of the destination receives an unsigned 8-bit integer that is the absolute value of the signed 8-bit
integer in the corresponding byte of the source operand.
There are legacy and extended forms of the instruction:
PABSB

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPABSB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. All 32 bytes of the destination are written.
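The one edge case worth noting is -128 (80h). Its absolute value, 128, does not fit in a signed byte, but because the result is defined as unsigned, the destination byte simply holds 80h. A Python sketch of the byte-wise operation (the function name is illustrative):

```python
def pabsb(src: bytes) -> bytes:
    """(V)PABSB on raw bytes: 16 bytes for the XMM form, 32 for the YMM form.

    Each byte is read as a signed 8-bit integer; the destination byte is its
    absolute value, interpreted as unsigned (so 80h stays 80h).
    """
    if len(src) not in (16, 32):
        raise ValueError("expected a 128-bit or 256-bit operand")
    # Iterating over bytes yields 0..255; convert to signed before taking abs().
    return bytes(abs(b - 256 if b >= 0x80 else b) for b in src)

src = bytes([0x80, 0xFF, 0x00, 0x7F]) + bytes(12)
print(pabsb(src)[:4].hex(" "))   # 80 01 00 7f
```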
Instruction Support
Form            Subset  Feature Flag
PABSB           SSSE3   CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPABSB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPABSB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
PABSB xmm1, xmm2/mem128   66 0F 38 1C /r  Computes the absolute value of each packed 8-bit signed integer value in xmm2/mem128 and writes the 8-bit unsigned results to xmm1.

                          Encoding
Mnemonic                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPABSB xmm1, xmm2/mem128  C4   RXB.02          X.1111.0.01  1C /r
VPABSB ymm1, ymm2/mem256  C4   RXB.02          X.1111.1.01  1C /r

Related Instructions
(V)PABSW, (V)PABSD


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PABSD
VPABSD

Packed Absolute Value
Signed Doubleword

Computes the absolute value of four or eight packed 32-bit signed integers in the source operand.
Each doubleword of the destination receives an unsigned 32-bit integer that is the absolute value of
the signed 32-bit integer in the corresponding doubleword of the source operand.
There are legacy and extended forms of the instruction:
PABSD

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPABSD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. All eight doublewords of the destination are written.
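A common branch-free way to express the same per-lane result uses the sign mask s = x >> 31 (arithmetic shift): |x| = (x XOR s) - s, taken modulo 2^32. The Python sketch below (illustrative names; this formula is a standard idiom, not an algorithm defined by this manual) applies it to each doubleword and shows why 80000000h maps to itself.

```python
MASK32 = 0xFFFFFFFF

def abs_dword(x: int) -> int:
    """Absolute value of one signed 32-bit lane, returned as an unsigned 32-bit value."""
    s = x >> 31                        # arithmetic shift: -1 (all ones) if x < 0, else 0
    return ((x ^ s) - s) & MASK32

def pabsd(src: list[int]) -> list[int]:
    """(V)PABSD: 4 signed doublewords (XMM) or 8 (YMM) in, unsigned doublewords out."""
    if len(src) not in (4, 8):
        raise ValueError("expected 4 or 8 doublewords")
    return [abs_dword(x) for x in src]

print([hex(v) for v in pabsd([-(2**31), -1, 5, 2**31 - 1])])
# ['0x80000000', '0x1', '0x5', '0x7fffffff']
```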
Instruction Support
Form            Subset  Feature Flag
PABSD           SSSE3   CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPABSD 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPABSD 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
PABSD xmm1, xmm2/mem128   66 0F 38 1E /r  Computes the absolute value of each packed 32-bit signed integer value in xmm2/mem128 and writes the 32-bit unsigned results to xmm1.

                          Encoding
Mnemonic                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPABSD xmm1, xmm2/mem128  C4   RXB.02          X.1111.0.01  1E /r
VPABSD ymm1, ymm2/mem256  C4   RXB.02          X.1111.1.01  1E /r

Related Instructions
(V)PABSB, (V)PABSW


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PABSW
VPABSW

Packed Absolute Value
Signed Word

Computes the absolute value of eight or sixteen packed 16-bit signed integers in the source operand.
Each word of the destination receives an unsigned 16-bit integer that is the absolute value of the
signed 16-bit integer in the corresponding word of the source operand.
There are legacy and extended forms of the instruction:
PABSW

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPABSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is an XMM register or a 128-bit memory location. The destination is an XMM
register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. All 16 words of the destination are written.
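When the source is a memory operand, the words are read in little-endian order, lane 0 first. The Python sketch below (hypothetical function name) models PABSW on such a byte string using the standard struct module. As with the other PABS instructions, -32768 (8000h) produces 8000h, read as the unsigned value 32768.

```python
import struct

def pabsw(mem: bytes) -> bytes:
    """(V)PABSW on a little-endian operand: 16 bytes (8 words) or 32 bytes (16 words)."""
    if len(mem) not in (16, 32):
        raise ValueError("expected a 128-bit or 256-bit operand")
    n = len(mem) // 2
    words = struct.unpack(f"<{n}h", mem)                     # signed 16-bit lanes, lane 0 first
    return struct.pack(f"<{n}H", *(abs(w) for w in words))   # unsigned 16-bit results

src = struct.pack("<8h", -32768, -2, 3, 0, 1, -1, 100, -100)
print(struct.unpack("<8H", pabsw(src)))
# (32768, 2, 3, 0, 1, 1, 100, 100)
```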
Instruction Support
Form            Subset  Feature Flag
PABSW           SSSE3   CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPABSW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPABSW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
PABSW xmm1, xmm2/mem128   66 0F 38 1D /r  Computes the absolute value of each packed 16-bit signed integer value in xmm2/mem128 and writes the 16-bit unsigned results to xmm1.

                          Encoding
Mnemonic                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPABSW xmm1, xmm2/mem128  C4   RXB.02          X.1111.0.01  1D /r
VPABSW ymm1, ymm2/mem256  C4   RXB.02          X.1111.1.01  1D /r

Related Instructions
(V)PABSB, (V)PABSD


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PACKSSDW
VPACKSSDW

Pack with Signed Saturation
Doubleword to Word

Converts four or eight 32-bit signed integers from the first source operand and the second source
operand into 16-bit signed integers and packs the results into the destination.
Positive source values greater than 7FFFh are saturated to 7FFFh; negative source values less than
8000h (-32768 as a signed word) are saturated to 8000h.
Converted values from the first source operand are packed into the low-order words of the destination; converted values from the second source operand are packed into the high-order words of the
destination.
There are legacy and extended forms of the instruction:
PACKSSDW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPACKSSDW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
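Signed saturation clamps each doubleword to the range of a signed word, -32768 (8000h) to 32767 (7FFFh). The Python sketch below models the 128-bit form. For the 256-bit form, the same packing is applied independently to each 128-bit lane, so the destination interleaves the two sources lane by lane rather than placing all of the first source's results in the low half; vpackssdw_256 models that. The function names are illustrative.

```python
def sat_s16(x: int) -> int:
    """Clamp a signed integer to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, x))

def packssdw(src1: list[int], src2: list[int]) -> list[int]:
    """128-bit form: 4 signed dwords from each source -> 8 signed words.

    Words 0-3 come from src1, words 4-7 from src2.
    """
    return [sat_s16(x) for x in src1] + [sat_s16(x) for x in src2]

def vpackssdw_256(src1: list[int], src2: list[int]) -> list[int]:
    """256-bit form: the 128-bit operation applied separately to each lane."""
    return packssdw(src1[:4], src2[:4]) + packssdw(src1[4:], src2[4:])

print(packssdw([70000, -70000, 1, -1], [32767, 32768, -32768, -32769]))
# [32767, -32768, 1, -1, 32767, 32767, -32768, -32768]
```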
Instruction Support
Form               Subset  Feature Flag
PACKSSDW           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPACKSSDW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKSSDW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                           Opcode       Description
PACKSSDW xmm1, xmm2/mem128         66 0F 6B /r  Converts 32-bit signed integers in xmm1 and xmm2 or mem128 into 16-bit signed integers with saturation. Writes packed results to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPACKSSDW xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  6B /r
VPACKSSDW ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  6B /r


Related Instructions
(V)PACKSSWB, (V)PACKUSDW, (V)PACKUSWB
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PACKSSWB
VPACKSSWB

Pack with Signed Saturation
Word to Byte

Converts eight or sixteen 16-bit signed integers from the first source operand and the second source
operand into sixteen or thirty-two 8-bit signed integers and packs the results into the destination.
Positive source values greater than 7Fh are saturated to 7Fh; negative source values less than 80h (-128 as a signed byte) are
saturated to 80h.
Converted values from the first source operand are packed into the low-order bytes of the destination;
converted values from the second source operand are packed into the high-order bytes of the destination.
There are legacy and extended forms of the instruction:
PACKSSWB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPACKSSWB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
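The Python sketch below models the 128-bit form on little-endian byte strings, as a memory operand would be laid out; the function name is hypothetical. The 256-bit form applies this operation to each 128-bit lane separately, so it can be modeled by calling the function once per lane.

```python
import struct

def packsswb(src1: bytes, src2: bytes) -> bytes:
    """128-bit (V)PACKSSWB on little-endian operands: two 16-byte inputs, one 16-byte result.

    Each signed word is clamped to [-128, 127]; bytes 0-7 come from src1,
    bytes 8-15 from src2.
    """
    words = struct.unpack("<8h", src1) + struct.unpack("<8h", src2)
    return struct.pack("<16b", *(max(-128, min(127, w)) for w in words))

a = struct.pack("<8h", 300, -300, 127, -128, 128, -129, 0, 5)
b = struct.pack("<8h", *range(8))
print(list(struct.unpack("<16b", packsswb(a, b))))
# [127, -128, 127, -128, 127, -128, 0, 5, 0, 1, 2, 3, 4, 5, 6, 7]
```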
Instruction Support
Form               Subset  Feature Flag
PACKSSWB           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPACKSSWB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKSSWB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                           Opcode       Description
PACKSSWB xmm1, xmm2/mem128         66 0F 63 /r  Converts 16-bit signed integers in xmm1 and xmm2 or mem128 into 8-bit signed integers with saturation. Writes packed results to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPACKSSWB xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  63 /r
VPACKSSWB ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  63 /r


Related Instructions
(V)PACKSSDW, (V)PACKUSDW, (V)PACKUSWB
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PACKUSDW
VPACKUSDW

Pack with Unsigned Saturation
Doubleword to Word

Converts four or eight 32-bit signed integers from the first source operand and the second source
operand into eight or sixteen 16-bit unsigned integers and packs the results into the destination.
Source values greater than FFFFh are saturated to FFFFh; source values less than 0000h are saturated
to 0000h.
Packs converted values from the first source operand into the low-order words of the destination;
packs converted values from the second source operand into the high-order words of the destination.
There are legacy and extended forms of the instruction:
PACKUSDW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPACKUSDW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
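The sources are interpreted as signed, so negative doublewords clamp to 0000h, while values from 8000h to FFFFh (which PACKSSDW would saturate to 7FFFh) pass through unchanged. A minimal Python sketch of the 128-bit form follows (hypothetical names). The 256-bit form applies the same operation to each 128-bit lane, as the other pack instructions do.

```python
def sat_u16(x: int) -> int:
    """Clamp a signed integer to the unsigned 16-bit range [0, FFFFh]."""
    return max(0, min(0xFFFF, x))

def packusdw(src1: list[int], src2: list[int]) -> list[int]:
    """128-bit (V)PACKUSDW: 4 signed dwords from each source -> 8 unsigned words.

    Words 0-3 come from src1, words 4-7 from src2.
    """
    if len(src1) != 4 or len(src2) != 4:
        raise ValueError("expected 4 doublewords per source")
    return [sat_u16(x) for x in src1 + src2]

print(packusdw([-5, 0, 40000, 70000], [65535, 65536, -1, 7]))
# [0, 0, 40000, 65535, 65535, 65535, 0, 7]
```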
Instruction Support
Form               Subset  Feature Flag
PACKUSDW           SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPACKUSDW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKUSDW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                           Opcode          Description
PACKUSDW xmm1, xmm2/mem128         66 0F 38 2B /r  Converts 32-bit signed integers in xmm1 and xmm2 or mem128 into 16-bit unsigned integers with saturation. Writes packed results to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPACKUSDW xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  2B /r
VPACKUSDW ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  2B /r


Related Instructions
(V)PACKSSDW, (V)PACKSSWB, (V)PACKUSWB
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PACKUSWB
VPACKUSWB

Pack with Unsigned Saturation
Word to Byte

Converts eight or sixteen 16-bit signed integers from the first source operand and the second source
operand into sixteen or thirty-two 8-bit unsigned integers and packs the results into the destination.
When a source value is greater than FFh, it is saturated to FFh; when a source value is less than 00h, it is
saturated to 00h.
Packs converted values from the first source operand into the low-order bytes of the destination;
packs converted values from the second source operand into the high-order bytes of the destination.
There are legacy and extended forms of the instruction:
PACKUSWB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPACKUSWB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
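The sketch below (hypothetical name) models the 128-bit form and shows the saturation bounds: word values from 00h to FFh, including 80h to FFh, pass through unchanged, larger values become FFh, and negative values become 00h. The 256-bit form applies the same operation to each 128-bit lane.

```python
def packuswb(src1: list[int], src2: list[int]) -> bytes:
    """128-bit (V)PACKUSWB: 8 signed words from each source -> 16 unsigned bytes.

    Words above FFh clamp to FFh and negative words clamp to 00h; bytes 0-7
    come from src1 and bytes 8-15 from src2.
    """
    if len(src1) != 8 or len(src2) != 8:
        raise ValueError("expected 8 words per source")
    return bytes(max(0, min(0xFF, w)) for w in src1 + src2)

print(packuswb([300, 200, 128, 127, -1, -300, 0, 255], [0] * 8)[:8].hex(" "))
# ff c8 80 7f 00 00 00 ff
```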
Instruction Support
Form               Subset  Feature Flag
PACKUSWB           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPACKUSWB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKUSWB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                           Opcode       Description
PACKUSWB xmm1, xmm2/mem128         66 0F 67 /r  Converts 16-bit signed integers in xmm1 and xmm2 or mem128 into 8-bit unsigned integers with saturation. Writes packed results to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPACKUSWB xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  67 /r
VPACKUSWB ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  67 /r


Related Instructions
(V)PACKSSDW, (V)PACKSSWB, (V)PACKUSDW
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PADDB
VPADDB

Packed Add Bytes

Adds 16 or 32 packed 8-bit integer values in the first source operand to corresponding values in the
second source operand and writes the integer sums to the corresponding bytes of the destination.
This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each
result are written to the destination.
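A minimal Python model of this wraparound behavior treats each byte lane as addition modulo 2^8. The function name is illustrative, and operands are assumed to be lists of unsigned byte values. In the example, the carry out of FFh + 01h is simply dropped, and 7Fh + 01h yields 80h with no flag set.

```python
def paddb(src1: list[int], src2: list[int]) -> list[int]:
    """Byte-wise add that keeps only the low-order 8 bits of each sum."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of bytes")
    return [(a + b) & 0xFF for a, b in zip(src1, src2)]


sums = paddb([0xFF, 0x7F, 0x10], [0x01, 0x01, 0x20])
print([hex(s) for s in sums])   # ['0x0', '0x80', '0x30']
```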
There are legacy and extended forms of the instruction:
PADDB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPADDB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PADDB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode       Description
PADDB xmm1, xmm2/mem128    66 0F FC /r  Adds packed byte integer values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDB xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  FC /r
VPADDB ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  FC /r
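The sketch below makes the RXB.map_select and W.vvvv.L.pp fields concrete by assembling the three-byte C4 VEX form. It handles only the register-to-register case (ModRM.mod = 11b, no memory operand) and assumes the usual VEX conventions:

- R, X, B, and vvvv are stored inverted.
- map_select 01 selects the 0F opcode map.
- pp = 01 stands in for the 66h prefix.
- W is shown as X (ignored) for this instruction, so the sketch writes it as 0.

The function name is hypothetical.

```python
def encode_vex3_reg(opcode: int, map_select: int, pp: int, vex_l: int,
                    dst: int, src1: int, src2: int, w: int = 0) -> bytes:
    """Encode 'op dst, src1, src2' (all registers 0..15) with a C4 VEX prefix."""
    for reg in (dst, src1, src2):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers must be 0..15")
    r_bar = 0 if dst & 8 else 1          # ModRM.reg extension, stored inverted
    x_bar = 1                            # no SIB index register
    b_bar = 0 if src2 & 8 else 1         # ModRM.rm extension, stored inverted
    byte1 = (r_bar << 7) | (x_bar << 6) | (b_bar << 5) | (map_select & 0x1F)
    vvvv = ~src1 & 0xF                   # first source register, stored inverted
    byte2 = ((w & 1) << 7) | (vvvv << 3) | ((vex_l & 1) << 2) | (pp & 0x3)
    modrm = 0xC0 | ((dst & 7) << 3) | (src2 & 7)
    return bytes([0xC4, byte1, byte2, opcode, modrm])


# VPADDB xmm1, xmm2, xmm3 (L = 0) and VPADDB ymm1, ymm2, ymm3 (L = 1)
print(encode_vex3_reg(0xFC, 0x01, 0b01, 0, 1, 2, 3).hex(" "))  # c4 e1 69 fc cb
print(encode_vex3_reg(0xFC, 0x01, 0b01, 1, 1, 2, 3).hex(" "))  # c4 e1 6d fc cb
```

The only difference between the two encodings is the L bit in the second VEX byte (69h versus 6Dh).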


Related Instructions
(V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PADDD
VPADDD

Packed Add Doublewords

Adds 4 or 8 packed 32-bit integer values in the first source operand to the corresponding values in the second source operand and writes the integer sums to the corresponding doublewords of the destination.
This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each
result are written to the destination.
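When the second source is a 128-bit memory operand, its 16 bytes are read as four little-endian doublewords. The Python sketch below models that view with the standard `struct` module; the function name is hypothetical. It also shows that a carry out of one doubleword is discarded rather than propagated into the next.

```python
import struct


def paddd_mem(src1: list[int], mem128: bytes) -> list[int]:
    """Add four doublewords from a 16-byte little-endian buffer to src1."""
    if len(mem128) != 16:
        raise ValueError("a 128-bit memory operand is 16 bytes")
    src2 = struct.unpack("<4I", mem128)
    return [(a + b) & 0xFFFF_FFFF for a, b in zip(src1, src2)]


buffer = struct.pack("<4I", 1, 2, 3, 4)
print([hex(v) for v in paddd_mem([0xFFFF_FFFF, 0, 10, 20], buffer)])
# Lane 0 wraps to 0; lane 1 stays 2, because no carry crosses lanes.
```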
There are legacy and extended forms of the instruction:
PADDD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPADDD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PADDD             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode       Description
PADDD xmm1, xmm2/mem128    66 0F FE /r  Adds packed doubleword integer values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDD xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  FE /r
VPADDD ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  FE /r


Related Instructions
(V)PADDB, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PADDQ
VPADDQ

Packed Add Quadwords

Adds 2 or 4 packed 64-bit integer values in the first source operand to corresponding values in the
second source operand and writes the integer sums to the corresponding quadwords of the destination.
This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each
result are written to the destination.
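The sketch below contrasts this lane-by-lane behavior with a true 128-bit addition. It is a hypothetical Python model, with element 0 as the low quadword. When the low quadword overflows, a single 128-bit add would carry into the high half, but the packed add keeps the two lanes independent.

```python
MASK64 = (1 << 64) - 1


def paddq(src1: list[int], src2: list[int]) -> list[int]:
    """Add corresponding quadwords, keeping the low-order 64 bits of each sum."""
    return [(a + b) & MASK64 for a, b in zip(src1, src2)]


def as_uint128(quads: list[int]) -> int:
    """Join two quadwords (element 0 = low) into one 128-bit integer."""
    return quads[0] | (quads[1] << 64)


a = [MASK64, 5]   # low quadword is all ones
b = [1, 0]
lanes = paddq(a, b)
scalar = (as_uint128(a) + as_uint128(b)) & ((1 << 128) - 1)
print(lanes)                              # [0, 5]: carry from lane 0 is dropped
print([scalar & MASK64, scalar >> 64])    # [0, 6]: a true 128-bit add carries
```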
There are legacy and extended forms of the instruction:
PADDQ

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPADDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PADDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode       Description
PADDQ xmm1, xmm2/mem128    66 0F D4 /r  Adds packed quadword integer values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDQ xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  D4 /r
VPADDQ ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  D4 /r


Related Instructions
(V)PADDB, (V)PADDD, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PADDSB
VPADDSB

Packed Add with Signed Saturation Bytes

Adds 16 or 32 packed 8-bit signed integer values in the first source operand to the corresponding values in the second source operand and writes the signed integer sums to the corresponding bytes of the
destination.
Sums greater than +127 (7Fh) are saturated to 7Fh; sums less than -128 (80h) are saturated to 80h.
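The signed saturation rule can be modeled as follows. This hypothetical Python sketch takes raw byte values (00h..FFh), reads each as a two's-complement number, clamps the exact sum to -128..+127, and returns the raw byte pattern.

```python
def to_signed8(value: int) -> int:
    """Read an unsigned byte (0..255) as a two's-complement value (-128..127)."""
    return value - 0x100 if value & 0x80 else value


def paddsb(src1: list[int], src2: list[int]) -> list[int]:
    """Signed-saturating byte add on raw byte values; returns raw bytes."""
    out = []
    for a, b in zip(src1, src2):
        total = to_signed8(a) + to_signed8(b)
        total = max(-128, min(127, total))   # clamp to 80h..7Fh
        out.append(total & 0xFF)             # back to a raw byte pattern
    return out


print([hex(v) for v in paddsb([0x7F, 0x80, 0x10, 0xF0], [0x01, 0xFF, 0x10, 0x20])])
```

In this example, 7Fh + 01h stays at 7Fh and 80h + FFh (that is, -128 + -1) stays at 80h, while sums inside the signed range are unaffected.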
There are legacy and extended forms of the instruction:
PADDSB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPADDSB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form               Subset   Feature Flag
PADDSB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDSB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDSB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode       Description
PADDSB xmm1, xmm2/mem128    66 0F EC /r  Adds packed signed 8-bit integer values in xmm1 and xmm2 or mem128 with signed saturation. Writes the sums to xmm1.

Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDSB xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  EC /r
VPADDSB ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  EC /r


Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PADDSW
VPADDSW

Packed Add with Signed Saturation Words

Adds 8 or 16 packed 16-bit signed integer values in the first source operand to the corresponding values in the second source operand and writes the signed integer sums to the corresponding words of
the destination.
Sums greater than +32,767 (7FFFh) are saturated to 7FFFh; sums less than -32,768 (8000h) are saturated
to 8000h.
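The sketch below compares the saturated result with the wrapped result that a non-saturating word add such as (V)PADDW produces. It is a hypothetical Python model; both inputs are signed words in the range -32,768..+32,767.

```python
def add_words(a: int, b: int) -> tuple[int, int]:
    """Return (saturated, wrapped) 16-bit results for signed words a and b.

    The saturated value models PADDSW; the wrapped value is the low 16 bits
    of the exact sum reinterpreted as a signed word.
    """
    exact = a + b
    saturated = max(-0x8000, min(0x7FFF, exact))
    low16 = exact & 0xFFFF
    wrapped = low16 - 0x10000 if low16 & 0x8000 else low16
    return saturated, wrapped


for a, b in [(0x7FFF, 1), (-0x8000, -1), (1000, -3000)]:
    print(a, b, add_words(a, b))
```

The two results differ only when the exact sum leaves the signed 16-bit range. There, wrapping flips the sign, while saturation pins the result at the nearest limit.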
There are legacy and extended forms of the instruction:
PADDSW
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPADDSW
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form               Subset   Feature Flag
PADDSW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode       Description
PADDSW xmm1, xmm2/mem128    66 0F ED /r  Adds packed signed 16-bit integer values in xmm1 and xmm2 or mem128 with signed saturation. Writes the sums to xmm1.

Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDSW xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  ED /r
VPADDSW ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  ED /r


Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDUSB, (V)PADDUSW, (V)PADDW
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PADDUSB
VPADDUSB

Packed Add with Unsigned Saturation Bytes

Adds 16 or 32 packed 8-bit unsigned integer values in the first source operand to the corresponding
values in the second source operand and writes the unsigned integer sums to the corresponding bytes
of the destination.
Sums greater than FFh are saturated to FFh.
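A minimal Python model of unsigned byte saturation follows; the function name is illustrative. F0h + 20h produces FFh here, whereas a wrapping byte add would produce 10h.

```python
def paddusb(src1: list[int], src2: list[int]) -> list[int]:
    """Unsigned-saturating byte add; every input must be an unsigned byte."""
    if any(not 0 <= v <= 0xFF for v in (*src1, *src2)):
        raise ValueError("inputs must be unsigned bytes (00h..FFh)")
    return [min(a + b, 0xFF) for a, b in zip(src1, src2)]


print([hex(v) for v in paddusb([0xF0, 0x10, 0x80], [0x20, 0x10, 0x80])])
# ['0xff', '0x20', '0xff']
```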
There are legacy and extended forms of the instruction:
PADDUSB
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPADDUSB
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PADDUSB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDUSB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDUSB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode       Description
PADDUSB xmm1, xmm2/mem128    66 0F DC /r  Adds packed unsigned 8-bit integer values in xmm1 and xmm2 or mem128 with unsigned saturation. Writes the sums to xmm1.

Encoding
Mnemonic                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDUSB xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  DC /r
VPADDUSB ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  DC /r


Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSW, (V)PADDW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PADDUSW
VPADDUSW

Packed Add with Unsigned Saturation Words

Adds 8 or 16 packed 16-bit unsigned integer values in the first source operand to the corresponding
values in the second source operand and writes the unsigned integer sums to the corresponding words
of the destination.
Sums greater than FFFFh are saturated to FFFFh.
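The saturated sum can also be derived from a wrapping add plus a carry test: the wrapped sum is smaller than either input exactly when the true sum exceeded FFFFh. The Python sketch below checks that both formulations agree on a set of boundary values; the function names are hypothetical. It illustrates the arithmetic only and makes no claim about how the hardware implements saturation.

```python
def paddusw_reference(a: int, b: int) -> int:
    """Direct definition: clamp the unsigned sum to FFFFh."""
    return min(a + b, 0xFFFF)


def paddusw_from_wrap(a: int, b: int) -> int:
    """Same result derived from a wrapping add: a carry out means overflow."""
    wrapped = (a + b) & 0xFFFF
    carried = wrapped < a            # true exactly when a + b > FFFFh
    return 0xFFFF if carried else wrapped


samples = [0x0000, 0x0001, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF]
for a in samples:
    for b in samples:
        assert paddusw_reference(a, b) == paddusw_from_wrap(a, b)
print("wrap-and-detect-carry matches FFFFh saturation on all samples")
```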
There are legacy and extended forms of the instruction:
PADDUSW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPADDUSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PADDUSW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDUSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDUSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode       Description
PADDUSW xmm1, xmm2/mem128    66 0F DD /r  Adds packed unsigned 16-bit integer values in xmm1 and xmm2 or mem128 with unsigned saturation. Writes the sums to xmm1.

Encoding
Mnemonic                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDUSW xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  DD /r
VPADDUSW ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  DD /r


Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PADDW
VPADDW

Packed Add Words

Adds 8 or 16 packed 16-bit integer values in the first source operand to the corresponding values in the
second source operand and writes the integer sums to the corresponding words of the destination.
This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of each
result are written to the destination.
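The sketch below shows that one wrapping word add serves both signed and unsigned data. It is a hypothetical Python model operating on raw 16-bit values. FFFFh + 0002h wraps to 0001h, which is correct whether FFFFh is read as 65,535 or as -1.

```python
def paddw(src1: list[int], src2: list[int]) -> list[int]:
    """Word-wise wraparound add on raw 16-bit values (0..FFFFh)."""
    return [(a + b) & 0xFFFF for a, b in zip(src1, src2)]


def as_signed16(value: int) -> int:
    """Reinterpret a raw 16-bit value as a two's-complement word."""
    return value - 0x10000 if value & 0x8000 else value


raw = paddw([0xFFFF, 0x8000], [0x0002, 0x8000])
print([hex(v) for v in raw])            # ['0x1', '0x0']
print([as_signed16(v) for v in raw])    # [1, 0]
```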
There are legacy and extended forms of the instruction:
PADDW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPADDW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PADDW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode       Description
PADDW xmm1, xmm2/mem128    66 0F FD /r  Adds packed 16-bit integer values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDW xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  FD /r
VPADDW ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  FD /r


Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PALIGNR
VPALIGNR

Packed Align Right

Concatenates one or two pairs of 16-byte values from the first and second source operands and right-shifts each concatenated value by the number of bytes specified by the unsigned immediate operand.
For the 128-bit form, writes the least-significant 16 bytes of the shifted result to the destination; for the 256-bit form, writes the least-significant 16 bytes of each of the two shifted results to the lower and upper halves of the destination.
For the 128-bit form of the instruction, the first and second 128-bit source operands are concatenated
to form a temporary 256-bit value with the first source operand occupying the most-significant half of
the temporary value. After the right-shift operation, the lower 128 bits of the result are written to the
destination.
For the 256-bit form of the instruction, the lower 16 bytes of the first and second source operands are
concatenated to form a first temporary 256-bit value with the bytes from the first source operand
occupying the most-significant half of the temporary value. The upper 16 bytes of the first and second
source operands are concatenated to form a second temporary 256-bit value with the bytes from the
first source operand occupying the most-significant half of the second temporary value. Both temporary values are right-shifted by the number of bytes specified by the immediate operand. After the right-shift operation, the lower 16 bytes of the first temporary value are written to the lower 128 bits of the
destination and the lower 16 bytes of the second temporary value are written to the upper 128 bits of
the destination.
The unsigned binary value of the immediate operand is the byte shift count. Each byte position vacated at the most-significant end by the shift is filled with zero. When the byte shift count is greater than 31, the destination is zeroed.
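The byte-shift description above can be modeled in Python as shown below. This is a hypothetical sketch, not AMD reference code. Operands are lists of byte values with index 0 as the least-significant byte. `palignr128` models the 128-bit form, and `vpalignr256` applies it separately to the lower and upper 16-byte halves, as described for the 256-bit form.

```python
def palignr128(src1: list[int], src2: list[int], imm8: int) -> list[int]:
    """128-bit PALIGNR model (index 0 = least-significant byte).

    src1 forms the most-significant half of the 32-byte temporary value and
    src2 the least-significant half.
    """
    if len(src1) != 16 or len(src2) != 16 or not 0 <= imm8 <= 255:
        raise ValueError("expected two 16-byte operands and an 8-bit immediate")
    temp = src2 + src1                       # src1:src2, 32 bytes
    shifted = temp[imm8:] + [0] * imm8       # right shift by imm8 bytes, zero fill
    return shifted[:16]                      # keep the low 16 bytes


def vpalignr256(src1: list[int], src2: list[int], imm8: int) -> list[int]:
    """256-bit form: the same operation applied independently to each 128-bit half."""
    low = palignr128(src1[:16], src2[:16], imm8)
    high = palignr128(src1[16:], src2[16:], imm8)
    return low + high


lo_src, hi_src = list(range(16)), list(range(16, 32))
print(palignr128(hi_src, lo_src, 4))    # bytes 4..19 of the concatenation
print(palignr128(hi_src, lo_src, 40))   # shift count > 31: all zeros
```

In the 256-bit model, no bytes cross between the two 128-bit halves, which matches the two separate temporary values in the description.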
There are legacy and extended forms of the instruction:
PALIGNR

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPALIGNR

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PALIGNR             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPALIGNR 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPALIGNR 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                           Opcode             Description
PALIGNR xmm1, xmm2/mem128, imm8    66 0F 3A 0F /r ib  Right-shifts xmm1:xmm2/mem128 by imm8 bytes. Writes the least-significant 16 bytes of the shifted result to xmm1.

Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPALIGNR xmm1, xmm2, xmm3/mem128, imm8    C4   RXB.03          X.src1.0.01  0F /r ib
VPALIGNR ymm1, ymm2, ymm3/mem256, imm8    C4   RXB.03          X.src1.1.01  0F /r ib

Related Instructions
None
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC             S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PAND
VPAND

Packed AND

Performs a bitwise AND of the packed values in the first and second source operands and writes the
result to the destination.
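Because the AND is purely bitwise, element size is irrelevant, and the 256-bit form is the same operation applied to a wider value. A minimal Python sketch, assuming each 128-bit operand is held in an unsigned integer (the names `pand` and `MASK128` are hypothetical), shows a typical masking use:

```python
MASK128 = (1 << 128) - 1  # all-ones value for a 128-bit operand


def pand(src1: int, src2: int) -> int:
    """Model (V)PAND for one 128-bit operand: bitwise AND of both sources."""
    return src1 & src2 & MASK128


if __name__ == "__main__":
    value = 0x1234_5678_9ABC_DEF0_0FED_CBA9_8765_4321
    # Keep only the low byte of each 16-bit word (00FFh repeated eight times).
    low_bytes = int("00FF" * 8, 16)
    print(hex(pand(value, low_bytes)))
```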
There are legacy and extended forms of the instruction:
PAND

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPAND

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PAND                SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPAND 128-bit       AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPAND 256-bit       AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode        Description
PAND xmm1, xmm2/mem128     66 0F DB /r   Performs bitwise AND of values in xmm1 and xmm2 or mem128. Writes the result to xmm1.

Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPAND xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  DB /r
VPAND ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  DB /r

Related Instructions
(V)PANDN, (V)POR, (V)PXOR


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC             S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PANDN
VPANDN

Packed AND NOT

Generates the ones’ complement of the value in the first source operand and performs a bitwise AND
of the complement and the value in the second source operand. Writes the result to the destination.
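Operand order matters here: only the first source is complemented, so PANDN is not commutative. The following Python sketch makes the same assumption as a plain AND model, with 128-bit operands held in unsigned integers. It also shows the common idiom of combining AND, AND NOT, and OR into a bitwise select. The names `pandn`, `bit_select`, and `MASK128` are hypothetical.

```python
MASK128 = (1 << 128) - 1  # all-ones value for a 128-bit operand


def pandn(src1: int, src2: int) -> int:
    """Model (V)PANDN for one 128-bit operand: (NOT src1) AND src2."""
    # ~src1 is negative in Python; masking restores a 128-bit ones' complement.
    return (~src1 & MASK128) & src2


def bit_select(mask: int, a: int, b: int) -> int:
    """Take bits of a where mask is 1 and bits of b where mask is 0.

    Mirrors the instruction sequence PAND (mask, a), PANDN (mask, b), POR.
    """
    return (mask & a) | pandn(mask, b)
```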
There are legacy and extended forms of the instruction:
PANDN
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPANDN
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PANDN               SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPANDN 128-bit      AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPANDN 256-bit      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode        Description
PANDN xmm1, xmm2/mem128    66 0F DF /r   Generates the ones’ complement of xmm1, then performs bitwise AND with the value in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPANDN xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  DF /r
VPANDN ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  DF /r

Related Instructions
(V)PAND, (V)POR, (V)PXOR


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC             S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PAVGB
VPAVGB

Packed Average
Unsigned Bytes

Computes the rounded averages of 16 or 32 packed unsigned 8-bit integer values in the first source
operand and the corresponding values of the second source operand. Writes each average to the corresponding byte of the destination.
An average is computed by adding pairs of 8-bit integer values in corresponding positions in the two
operands, adding 1 to a 9-bit temporary sum, and right-shifting the temporary sum by one bit position.
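The averaging rule above amounts to (a + b + 1) >> 1 for each byte pair. This rounds halves up and cannot overflow, because the sum is kept in 9 bits. The Python sketch below implements that rule, assuming operands are `bytes` objects of length 16 or 32. The name `pavgb` is hypothetical.

```python
def pavgb(src1: bytes, src2: bytes) -> bytes:
    """Model (V)PAVGB: rounded unsigned average of each byte pair.

    Python integers do not overflow, so a + b + 1 plays the role of the
    9-bit temporary sum; the right shift by one divides it by two.
    """
    assert len(src1) == len(src2) and len(src1) in (16, 32)
    return bytes((a + b + 1) >> 1 for a, b in zip(src1, src2))


if __name__ == "__main__":
    a = bytes([255, 0, 1, 10] + [0] * 12)
    b = bytes([255, 0, 2, 13] + [0] * 12)
    print(list(pavgb(a, b))[:4])  # [255, 0, 2, 12]
```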
There are legacy and extended forms of the instruction:
PAVGB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPAVGB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PAVGB               SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPAVGB 128-bit      AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPAVGB 256-bit      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode        Description
PAVGB xmm1, xmm2/mem128    66 0F E0 /r   Averages pairs of packed 8-bit unsigned integer values in xmm1 and xmm2 or mem128. Writes the averages to xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPAVGB xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  E0 /r
VPAVGB ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  E0 /r


Related Instructions
(V)PAVGW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC             S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PAVGW
VPAVGW

Packed Average
Unsigned Words

Computes the rounded average of packed unsigned 16-bit integer values in the first source operand
and the corresponding values of the second source operand. Writes each average to the corresponding
word of the destination.
An average is computed by adding pairs of 16-bit integer values in corresponding positions in the two
operands, adding 1 to a 17-bit temporary sum, and right-shifting the temporary sum by one bit position.
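The word form follows the same (a + b + 1) >> 1 rule, with a 17-bit intermediate sum. The Python sketch below is a hypothetical model that unpacks operands into little-endian 16-bit words with the standard `struct` module. It assumes 16-byte or 32-byte `bytes` operands, and the name `pavgw` is illustrative.

```python
import struct


def pavgw(src1: bytes, src2: bytes) -> bytes:
    """Model (V)PAVGW: rounded unsigned average of each 16-bit word pair."""
    assert len(src1) == len(src2) and len(src1) in (16, 32)
    fmt = f"<{len(src1) // 2}H"  # little-endian unsigned 16-bit words
    words1 = struct.unpack(fmt, src1)
    words2 = struct.unpack(fmt, src2)
    # a + b + 1 fits in 17 bits; the shift brings the result back to 16 bits.
    return struct.pack(fmt, *((a + b + 1) >> 1 for a, b in zip(words1, words2)))
```

For example, averaging FFFFh with FFFEh yields FFFFh: the 17-bit sum 1FFFFh shifted right by one is FFFFh, so no wraparound occurs.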
There are legacy and extended forms of the instruction:
PAVGW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPAVGW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PAVGW               SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPAVGW 128-bit      AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPAVGW 256-bit      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode        Description
PAVGW xmm1, xmm2/mem128    66 0F E3 /r   Averages pairs of packed 16-bit unsigned integer values in xmm1 and xmm2 or mem128. Writes the averages to xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPAVGW xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  E3 /r
VPAVGW ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  E3 /r


Related Instructions
(V)PAVGB
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC             S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PBLENDVB
VPBLENDVB

Variable Blend
Packed Bytes

Copies packed bytes from either of two sources to a destination, as specified by a mask operand.
The mask is defined by the most-significant bit of each byte of the mask operand. The position of a
mask bit corresponds to the position of the most-significant bit of a copied value.
• When a mask bit = 0, the corresponding byte of the first source is copied to the same
position in the destination.
• When a mask bit = 1, the corresponding byte of the second source is copied to the same
position in the destination.
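The byte-selection rule can be sketched in Python as a hypothetical reference model. It assumes the two sources and the mask are passed in as equal-length `bytes` objects (16 or 32 bytes), and the name `pblendvb` is illustrative. The model does not distinguish where the mask comes from (XMM0 for the legacy form, or the register named by imm8[7:4] for the VEX form).

```python
def pblendvb(src1: bytes, src2: bytes, mask: bytes) -> bytes:
    """Model (V)PBLENDVB: bit 7 of each mask byte picks src2 (1) or src1 (0)."""
    assert len(src1) == len(src2) == len(mask) and len(src1) in (16, 32)
    # Only the most-significant bit of each mask byte is examined.
    return bytes(b if m & 0x80 else a for a, b, m in zip(src1, src2, mask))
```

Because only bit 7 of each mask byte is examined, a mask produced by a byte compare such as (V)PCMPEQB (00h or FFh per byte) can be used directly as the selector.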
There are legacy and extended forms of the instruction:
PBLENDVB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected. The mask operand is the implicit
register XMM0.
VPBLENDVB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. The mask operand is a fourth XMM register
selected by bits [7:4] of an immediate byte.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register. The mask operand is a fourth
YMM register selected by bits [7:4] of an immediate byte.
Instruction Support
Form                Subset   Feature Flag
PBLENDVB            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPBLENDVB 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPBLENDVB 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode           Description
PBLENDVB xmm1, xmm2/mem128    66 0F 38 10 /r   Selects byte values from xmm1 or xmm2/mem128, depending on the value of corresponding mask bits in XMM0. Writes the selected values to xmm1.

Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBLENDVB xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  4C /r is4
VPBLENDVB ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  4C /r is4

Related Instructions
(V)BLENDVPD, (V)BLENDVPS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.W = 1.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC             S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PBLENDW
VPBLENDW

Blend
Packed Words

Copies packed words from either of two sources to a destination, as specified by an immediate 8-bit
mask operand. For the 256-bit form, the same 8-bit mask is applied twice; once to select words to be
written to the lower 128 bits of the destination and again to select words to be written to the upper 128
bits of the destination.
Each bit of the mask selects a word from one of the source operands based on the position of the word
within the operand. Bit 0 of the mask selects the least-significant word (word 0) to be copied, bit 1
selects the next-most significant word (word 1), and so forth. Bit 7 selects word 7 (the most-significant word for 128-bit operands).
For 256-bit operands, the mask is reused to select the words in the upper 128 bits of the source operands to be copied. Bit 0 of the mask selects word 8, bit 1 selects word 9, and so forth. Finally, bit 7 of
the mask selects word 15.
• When a mask bit = 0, the corresponding word of the first source is copied to the same
position in the destination.
• When a mask bit = 1, the corresponding word of the second source is copied to the same
position in the destination.
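The mask-reuse rule means that word i is controlled by imm8 bit (i mod 8). The following Python sketch is a hypothetical reference model. It assumes 16-byte or 32-byte `bytes` operands split into little-endian 16-bit words with the standard `struct` module, and the name `pblendw` is illustrative.

```python
import struct


def pblendw(src1: bytes, src2: bytes, imm8: int) -> bytes:
    """Model (V)PBLENDW: bit (i mod 8) of imm8 picks word i from src2 (1) or src1 (0).

    For 256-bit operands the same 8-bit mask is reused for words 8 through 15.
    """
    assert len(src1) == len(src2) and len(src1) in (16, 32) and 0 <= imm8 <= 0xFF
    fmt = f"<{len(src1) // 2}H"  # little-endian unsigned 16-bit words
    words1 = struct.unpack(fmt, src1)
    words2 = struct.unpack(fmt, src2)
    blended = [
        w2 if (imm8 >> (i % 8)) & 1 else w1
        for i, (w1, w2) in enumerate(zip(words1, words2))
    ]
    return struct.pack(fmt, *blended)


if __name__ == "__main__":
    lo = struct.pack("<8H", *range(8))
    hi = struct.pack("<8H", *range(100, 108))
    print(struct.unpack("<8H", pblendw(lo, hi, 0b10000001)))  # (100, 1, 2, 3, 4, 5, 6, 107)
```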
There are legacy and extended forms of the instruction:
PBLENDW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPBLENDW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PBLENDW             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPBLENDW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPBLENDW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode             Description
PBLENDW xmm1, xmm2/mem128, imm8   66 0F 3A 0E /r ib  Selects word values from xmm1 or xmm2/mem128, as specified by imm8. Writes the selected values to xmm1.

Mnemonic                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBLENDW xmm1, xmm2, xmm3/mem128, imm8   C4   RXB.03          X.src1.0.01  0E /r ib
VPBLENDW ymm1, ymm2, ymm3/mem256, imm8   C4   RXB.03          X.src1.1.01  0E /r ib

Related Instructions
(V)BLENDPD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC             S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PCLMULQDQ
VPCLMULQDQ

Carry-less Multiply
Quadwords

Performs a carry-less multiplication of a selected quadword element of the first source operand by a
selected quadword element of the second source operand and writes the product to the destination.
Carry-less multiplication, also known as binary polynomial multiplication, is the mathematical operation of computing the product of two operands without generating or propagating carries. It is an
essential component of cryptographic processing and, without dedicated hardware support, typically requires a large number of cycles.
The instruction provides an efficient means of performing the operation and is particularly useful in
implementing the Galois/Counter Mode (GCM) of operation used with the Advanced Encryption Standard (AES). See
Appendix A on page 973 for additional information.
Bits 4 and 0 of an 8-bit immediate byte operand specify which quadword of each source operand to
multiply, as follows.
Mnemonic           Imm[0]  Imm[4]  Quadword Operands Selected
(V)PCLMULLQLQDQ    0       0       SRC1[63:0], SRC2[63:0]
(V)PCLMULHQLQDQ    1       0       SRC1[127:64], SRC2[63:0]
(V)PCLMULLQHQDQ    0       1       SRC1[63:0], SRC2[127:64]
(V)PCLMULHQHQDQ    1       1       SRC1[127:64], SRC2[127:64]

Alias mnemonics are provided for the various immediate byte combinations.
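The quadword selection and the carry-less product can be sketched as follows. This Python model is hypothetical and written for clarity rather than speed. It represents 128-bit operands as unsigned integers and uses a schoolbook shift-and-XOR loop for the product. The names `clmul64`, `pclmulqdq`, and `MASK64` are illustrative.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values; the result fits in 127 bits."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i  # XOR instead of ADD: no carries are propagated
    return result


def pclmulqdq(src1: int, src2: int, imm8: int) -> int:
    """Model (V)PCLMULQDQ on 128-bit integer operands.

    imm8 bit 0 selects the quadword of src1 and imm8 bit 4 the quadword of
    src2 (0 = bits [63:0], 1 = bits [127:64]); other imm8 bits are not
    examined by this sketch.
    """
    q1 = (src1 >> (64 * (imm8 & 1))) & MASK64
    q2 = (src2 >> (64 * ((imm8 >> 4) & 1))) & MASK64
    return clmul64(q1, q2)


if __name__ == "__main__":
    print(clmul64(0b11, 0b11))  # 5: (x + 1)^2 = x^2 + 1 over GF(2)
```

In this model, the (V)PCLMULHQLQDQ alias corresponds to imm8 = 01h and (V)PCLMULLQHQDQ to imm8 = 10h, matching the table above.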
There are legacy and extended forms of the instruction:
PCLMULQDQ
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPCLMULQDQ
The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form          Subset             Feature Flag
PCLMULQDQ     PCLMULQDQ          CPUID Fn0000_0001_ECX[PCLMULQDQ] (bit 1)
VPCLMULQDQ    AVX or PCLMULQDQ   CPUID Fn0000_0001_ECX[PCLMULQDQ] (bit 1) or CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                            Opcode             Description
PCLMULQDQ xmm1, xmm2/mem128, imm8   66 0F 3A 44 /r ib  Performs carry-less multiplication of a selected quadword element of xmm1 by a selected quadword element of xmm2 or mem128. Elements are selected by bits 4 and 0 of imm8. Writes the product to xmm1.

Mnemonic                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCLMULQDQ xmm1, xmm2, xmm3/mem128, imm8   C4   RXB.03          X.src1.0.01  44 /r ib

Related Instructions
(V)PMULDQ, (V)PMULUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC             S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


PCMPEQB
VPCMPEQB

Packed Compare Equal
Bytes

Compares packed byte values in the first source operand to corresponding values in the second source
operand and writes a comparison result to the corresponding byte of the destination.
When values are equal, the result is FFh; when values are not equal, the result is 00h.
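The per-byte result can be modeled directly. The following Python sketch is hypothetical and assumes 16-byte or 32-byte `bytes` operands. It pairs the compare with an illustrative helper, `match_positions`, which lists the byte offsets that compared equal. This mirrors a common use of the instruction: searching a block of bytes for a value broadcast to every byte of the other operand.

```python
def pcmpeqb(src1: bytes, src2: bytes) -> bytes:
    """Model (V)PCMPEQB: FFh where the bytes are equal, 00h where they differ."""
    assert len(src1) == len(src2) and len(src1) in (16, 32)
    return bytes(0xFF if a == b else 0x00 for a, b in zip(src1, src2))


def match_positions(result: bytes) -> list:
    """Hypothetical helper: byte offsets at which the comparison matched."""
    return [i for i, r in enumerate(result) if r == 0xFF]


if __name__ == "__main__":
    haystack = b"hello, world!!!!"   # exactly 16 bytes
    needle = bytes([ord("l")]) * 16  # 'l' broadcast to every byte
    print(match_positions(pcmpeqb(haystack, needle)))  # [2, 3, 10]
```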
There are legacy and extended forms of the instruction:
PCMPEQB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPCMPEQB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PCMPEQB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPEQB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode        Description
PCMPEQB xmm1, xmm2/mem128    66 0F 74 /r   Compares packed bytes in xmm1 to packed bytes in xmm2 or mem128. Writes results to xmm1.

Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPEQB xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  74 /r
VPCMPEQB ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  74 /r

Related Instructions
(V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC             S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PCMPEQD
VPCMPEQD

Packed Compare Equal
Doublewords

Compares packed doubleword values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding doubleword of the destination.
When values are equal, the result is FFFFFFFFh; when values are not equal, the result is 00000000h.
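The doubleword compare follows the same pattern with 32-bit elements. The following Python sketch is a hypothetical model. It assumes 16-byte or 32-byte `bytes` operands unpacked as little-endian doublewords with the standard `struct` module, and the name `pcmpeqd` is illustrative.

```python
import struct


def pcmpeqd(src1: bytes, src2: bytes) -> bytes:
    """Model (V)PCMPEQD: FFFFFFFFh where doublewords match, 00000000h otherwise."""
    assert len(src1) == len(src2) and len(src1) in (16, 32)
    fmt = f"<{len(src1) // 4}I"  # little-endian unsigned 32-bit doublewords
    pairs = zip(struct.unpack(fmt, src1), struct.unpack(fmt, src2))
    return struct.pack(fmt, *(0xFFFFFFFF if a == b else 0 for a, b in pairs))
```

Comparing a register with itself always yields all ones. This is a widely used way to materialize an all-ones constant without a memory load.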
There are legacy and extended forms of the instruction:
PCMPEQD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPCMPEQD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                Subset   Feature Flag
PCMPEQD             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPEQD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode        Description
PCMPEQD xmm1, xmm2/mem128    66 0F 76 /r   Compares packed doublewords in xmm1 to packed doublewords in xmm2 or mem128. Writes results to xmm1.

Mnemonic

Encoding
W.vvvv.L.pp

Opcode

VPCMPEQD xmm1, xmm2, xmm3/mem128

VEX RXB.map_select
C4

RXB.01

X.src1.0.01

76 /r

VPCMPEQD ymm1, ymm2, ymm3/mem256

C4

RXB.01

X.src1.1.01

76 /r
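The sketch below makes the encoding fields concrete. It assembles the three-byte VEX (C4) form for register-only operands:

- The second byte holds R, X, and B, stored inverted, followed by map_select.
- The third byte holds W, the inverted src1 register number (vvvv), L, and pp.

The helper `vex3_rr` is hypothetical and covers only the register-to-register ModRM case (mod = 11b). W is passed as 0 because the table marks it X (ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Assemble C4-form VEX bytes plus opcode and ModRM for a register-only
 * instruction such as VPCMPEQD xmm1, xmm2, xmm3 (registers 0..15). */
static void vex3_rr(uint8_t out[5], unsigned map_select, unsigned w, unsigned l,
                    unsigned pp, uint8_t opcode,
                    unsigned dst, unsigned src1, unsigned src2)
{
    assert(dst < 16 && src1 < 16 && src2 < 16 && map_select < 32 && pp < 4);
    unsigned r = (dst >> 3) & 1, x = 0, b = (src2 >> 3) & 1;
    out[0] = 0xC4;
    /* R, X, B are inverted; map_select occupies the low five bits. */
    out[1] = (uint8_t)(((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | map_select);
    /* W, inverted vvvv (src1), L, pp. */
    out[2] = (uint8_t)(((w & 1) << 7) | ((~src1 & 0xF) << 3) | ((l & 1) << 2) | pp);
    out[3] = opcode;
    /* ModRM: mod = 11b, reg = dst, r/m = src2. */
    out[4] = (uint8_t)(0xC0 | ((dst & 7) << 3) | (src2 & 7));
}
```

For VPCMPEQD xmm1, xmm2, xmm3 (map_select 00001b, L = 0, pp = 01b, opcode 76h), the helper produces C4 E1 69 76 CB. Setting L = 1 for the ymm form changes only the third byte, to 6Dh.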

Related Instructions
(V)PCMPEQB, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW

rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
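The alignment-check cause for the VEX forms ("Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned") reduces to a simple predicate. This hypothetical sketch models "alignment checking enabled" as CR0.AM = 1, rFLAGS.AC = 1, and CPL = 3. That is the usual x86 definition, which this section does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a VEX-form memory operand would raise #AC under the stated rule. */
static bool ac_for_vex_operand(uint64_t addr, unsigned operand_bits,
                               bool cr0_am, bool rflags_ac, unsigned cpl)
{
    assert(operand_bits == 128 || operand_bits == 256);
    bool checking = cr0_am && rflags_ac && cpl == 3;  /* assumed definition */
    uint64_t align = operand_bits / 8;                /* 16 or 32 bytes */
    return checking && (addr & (align - 1)) != 0;
}
```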


PCMPEQQ
VPCMPEQQ

Packed Compare Equal Quadwords

Compares packed quadword values in the first source operand to corresponding values in the second
source operand and writes a comparison result to the corresponding quadword of the destination.
When values are equal, the result is FFFFFFFFFFFFFFFFh; when values are not equal, the result is
0000000000000000h.
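The lane width determines what counts as a match, so the same pair of 128-bit operands can produce different masks under PCMPEQQ and PCMPEQD. The width-generic model below treats a lane as equal only when every byte in it matches. As a result, a single differing byte clears a whole quadword lane but only one doubleword lane. The name `pcmpeq_model` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compare-equal on one 128-bit operand pair with lane_bytes = 1, 2, 4 or 8
 * (byte, word, doubleword, quadword lanes). Equal lanes become all ones. */
static void pcmpeq_model(uint8_t dst[16], const uint8_t src1[16],
                         const uint8_t src2[16], size_t lane_bytes)
{
    assert(lane_bytes == 1 || lane_bytes == 2 || lane_bytes == 4 || lane_bytes == 8);
    uint8_t tmp[16];
    for (size_t i = 0; i < 16; i += lane_bytes) {
        uint8_t fill = memcmp(src1 + i, src2 + i, lane_bytes) == 0 ? 0xFF : 0x00;
        memset(tmp + i, fill, lane_bytes);
    }
    memcpy(dst, tmp, 16); /* buffered so dst may alias a source */
}
```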
There are legacy and extended forms of the instruction:
PCMPEQQ

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPCMPEQQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PCMPEQQ           SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPCMPEQQ 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQQ 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode          Description
PCMPEQQ xmm1, xmm2/mem128  66 0F 38 29 /r  Compares packed quadwords in xmm1 to packed quadwords in xmm2 or mem128. Writes results to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPEQQ xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  29 /r
VPCMPEQQ ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  29 /r


Related Instructions
(V)PCMPEQB, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PCMPEQW
VPCMPEQW

Packed Compare Equal Words

Compares packed word values in the first source operand to corresponding values in the second
source operand and writes a comparison result to the corresponding word of the destination.
When values are equal, the result is FFFFh; when values are not equal, the result is 0000h.
There are legacy and extended forms of the instruction:
PCMPEQW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPCMPEQW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PCMPEQW           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPEQW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode       Description
PCMPEQW xmm1, xmm2/mem128  66 0F 75 /r  Compares packed words in xmm1 to packed words in xmm2 or mem128. Writes results to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPEQW xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  75 /r
VPCMPEQW ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  75 /r

Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PCMPESTRI
VPCMPESTRI

Packed Compare Explicit Length Strings Return Index

Compares character string data in the first and second source operands. Comparison operations are
carried out as specified by values encoded in the immediate operand. Writes an index to the ECX register.
Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits.
Characters may be treated as either signed or unsigned values. Each operand has its own associated
integer value that specifies the length of its string.
The absolute value of the data in the EAX/RAX register represents the length of the character string
in the first source operand; the absolute value of the data in the EDX/RDX register represents the
length of the character string in the second source operand.
If the absolute value of the data in either register is greater than the maximum string length that fits in
128 bits, the length is set to the maximum: 8, for 16-bit characters, or 16, for 8-bit characters.
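The length rule above can be sketched as a small helper. The sketch assumes the caller passes the register contents already interpreted as a signed 64-bit value; for the 32-bit case, sign-extend EAX or EDX first. The name `effective_length` is hypothetical. Negating through unsigned arithmetic avoids overflow when the register holds the most negative value.

```c
#include <assert.h>
#include <stdint.h>

/* Effective string length for the explicit-length forms: |reg| saturated to
 * the number of characters that fit in 128 bits (16 bytes or 8 words). */
static unsigned effective_length(int64_t reg, unsigned char_bits)
{
    assert(char_bits == 8 || char_bits == 16);
    unsigned max_chars = 128u / char_bits;
    uint64_t mag = reg < 0 ? (uint64_t)0 - (uint64_t)reg : (uint64_t)reg;
    return mag > max_chars ? max_chars : (unsigned)mag;
}
```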
The comparison operations between the two operand strings are summarized in an intermediate
result—a comparison summary bit vector that is post-processed to produce the final output. Data
fields within the immediate byte specify the source data format, comparison type, comparison summary bit vector post-processing, and output option selection.
The index of either the most significant or least significant set bit of the post-processed comparison
summary bit vector is returned in ECX. If no bits are set in the post-processed comparison summary
bit vector, ECX is set to 16 for source operand strings composed of 8-bit characters or 8 for 16-bit
character strings.
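The index selection can be modeled directly from the post-processed vector. Which end is reported is chosen by a field of the immediate byte (see Section 1.5); here that choice is reduced to a `most_significant` flag. The helper name is hypothetical. The same rule is stated for (V)PCMPISTRI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ECX result from the post-processed comparison summary bit vector.
 * num_chars is 16 (8-bit characters) or 8 (16-bit characters). */
static uint32_t result_index(uint16_t post_vector, unsigned num_chars,
                             bool most_significant)
{
    assert(num_chars == 8 || num_chars == 16);
    uint32_t v = post_vector & ((1u << num_chars) - 1u);
    if (v == 0)
        return num_chars;                      /* no bits set: 16 or 8 */
    unsigned idx = 0;
    if (most_significant) {
        for (unsigned i = 0; i < num_chars; i++)
            if (v & (1u << i))
                idx = i;                       /* ends on the highest set bit */
    } else {
        while (!(v & (1u << idx)))
            idx++;                             /* stops at the lowest set bit */
    }
    return idx;
}
```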
See Section 1.5, “String Compare Instructions” for information about source string data format, comparison operations, comparison summary bit vector generation, post-processing, and output selection
options.
The rFLAGS are set to indicate the following conditions:
Flag  Condition
CF    Cleared if the comparison summary bit vector is zero; otherwise set.
PF    Cleared.
AF    Cleared.
ZF    Set if the specified length of the second string is less than the maximum; otherwise cleared.
SF    Set if the specified length of the first string is less than the maximum; otherwise cleared.
OF    Equal to the value of the lsb of the post-processed comparison summary bit vector.
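The flag rules in the table translate almost one-to-one into code. In this sketch the caller supplies four inputs:

- the comparison summary bit vector, used for CF as the table states;
- the post-processed vector, used for OF;
- the two effective lengths.

The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, pf, af, zf, sf, of; } strcmp_flags_t;

/* rFLAGS outcome of the explicit-length string compares, per the table. */
static strcmp_flags_t explicit_length_flags(uint16_t summary, uint16_t post,
                                            unsigned len1, unsigned len2,
                                            unsigned max_chars)
{
    strcmp_flags_t f;
    f.cf = summary != 0;      /* cleared only when the summary vector is zero */
    f.pf = false;             /* always cleared */
    f.af = false;             /* always cleared */
    f.zf = len2 < max_chars;  /* second string shorter than the maximum */
    f.sf = len1 < max_chars;  /* first string shorter than the maximum */
    f.of = (post & 1u) != 0;  /* lsb of the post-processed vector */
    return f;
}
```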

There are legacy and extended forms of the instruction:
PCMPESTRI

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. A result index is written to the ECX register.
VPCMPESTRI

The extended form of the instruction has a 128-bit encoding only.


The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. A result index is written to the ECX register.
Instruction Support
Form        Subset  Feature Flag
PCMPESTRI   SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPESTRI  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                           Opcode             Description
PCMPESTRI xmm1, xmm2/mem128, imm8  66 0F 3A 61 /r ib  Compares packed string data in xmm1 and xmm2 or mem128. Writes a result index to the ECX register.

Mnemonic                            Encoding
                                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPESTRI xmm1, xmm2/mem128, imm8  C4   RXB.00011       X.1111.0.01  61 /r ib

Related Instructions
(V)PCMPESTRM, (V)PCMPISTRI, (V)PCMPISTRM
rFLAGS Affected
Flag  ID   VIP  VIF  AC   VM   RF   NT   IOPL  OF   DF   IF   TF   SF   ZF   AF   PF   CF
Bit   21   20   19   18   17   16   14   13:12 11   10   9    8    7    6    4    2    0
Value                                          M                   M    M    0    0    M
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared according to the result is M (modified); a flag that is always cleared is shown as 0. Unaffected flags are blank. Undefined flags are U.

MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


PCMPESTRM
VPCMPESTRM

Packed Compare Explicit Length Strings Return Mask

Compares character string data in the first and second source operands. Comparison operations are
carried out as specified by values encoded in the immediate operand. Writes a mask value to the
YMM0/XMM0 register.
Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits.
Characters may be treated as either signed or unsigned values. Each operand has its own associated
integer value that specifies the length of its string.
The absolute value of the data in the EAX/RAX register represents the length of the character string
in the first source operand; the absolute value of the data in the EDX/RDX register represents the
length of the character string in the second source operand.
If the absolute value of the data in either register is greater than the maximum string length that fits in
128 bits, the length is set to the maximum: 8, for 16-bit characters, or 16, for 8-bit characters.
The comparison operations between the two operand strings are summarized in an intermediate
result—a comparison summary bit vector that is post-processed to produce the final output. Data
fields within the immediate byte specify the source data format, comparison type, comparison summary bit vector post-processing, and output option selection.
Depending on the output option selected, the post-processed comparison summary bit vector is either
zero-extended to 128 bits or expanded into a byte/word-mask and then written to XMM0.
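The two output options can be sketched as follows. The option is selected by the immediate byte (see Section 1.5) and is modeled here as a plain `expand` flag. The helper name is hypothetical. Only the 128-bit XMM0 value is produced; the treatment of YMM0's upper half is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* XMM0 contents produced from the post-processed vector, for 8-bit
 * (16 lanes) or 16-bit (8 lanes) characters. */
static void pcmpestrm_result(uint8_t xmm0[16], uint16_t post,
                             unsigned char_bits, bool expand)
{
    assert(char_bits == 8 || char_bits == 16);
    unsigned char_bytes = char_bits / 8;
    unsigned num_chars = 16 / char_bytes;
    memset(xmm0, 0, 16);
    if (!expand) {
        /* Zero-extend: the vector occupies the low bits, little-endian. */
        xmm0[0] = (uint8_t)(post & 0xFF);
        if (num_chars == 16)
            xmm0[1] = (uint8_t)(post >> 8);
        return;
    }
    for (unsigned i = 0; i < num_chars; i++)
        if ((post >> i) & 1u)
            memset(xmm0 + i * char_bytes, 0xFF, char_bytes); /* FFh or FFFFh lane */
}
```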
See Section 1.5, “String Compare Instructions” for information about source string data format, comparison operations, comparison summary bit vector generation, post-processing, and output selection
options.
The rFLAGS are set to indicate the following conditions:
Flag  Condition
CF    Cleared if the comparison summary bit vector is zero; otherwise set.
PF    Cleared.
AF    Cleared.
ZF    Set if the specified length of the second string is less than the maximum; otherwise cleared.
SF    Set if the specified length of the first string is less than the maximum; otherwise cleared.
OF    Equal to the value of the lsb of the post-processed summary bit vector.

There are legacy and extended forms of the instruction:
PCMPESTRM

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The mask result is written to the XMM0 register.
VPCMPESTRM

The extended form of the instruction has a 128-bit encoding only.


The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The mask result is written to the XMM0 register. Bits [255:128] of the
YMM0 register are cleared.
Instruction Support
Form        Subset  Feature Flag
PCMPESTRM   SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPESTRM  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                           Opcode             Description
PCMPESTRM xmm1, xmm2/mem128, imm8  66 0F 3A 60 /r ib  Compares packed string data in xmm1 and xmm2 or mem128. Writes a mask value to the XMM0 register.

Mnemonic                            Encoding
                                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPESTRM xmm1, xmm2/mem128, imm8  C4   RXB.00011       X.1111.0.01  60 /r ib

Related Instructions
(V)PCMPESTRI, (V)PCMPISTRI, (V)PCMPISTRM
rFLAGS Affected
Flag  ID   VIP  VIF  AC   VM   RF   NT   IOPL  OF   DF   IF   TF   SF   ZF   AF   PF   CF
Bit   21   20   19   18   17   16   14   13:12 11   10   9    8    7    6    4    2    0
Value                                          M                   M    M    0    0    M
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared according to the result is M (modified); a flag that is always cleared is shown as 0. Unaffected flags are blank. Undefined flags are U.

MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


PCMPGTB
VPCMPGTB

Packed Compare Greater Than Signed Bytes

Compares packed signed byte values in the first source operand to corresponding values in the second
source operand and writes a comparison result to the corresponding byte of the destination.
When a byte in the first source operand is greater than the corresponding byte in the second source
operand, the result is FFh; when it is less than or equal to the corresponding byte, the result is 00h.
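Because PCMPGTB compares signed bytes, 80h (-128) is the smallest value and 7Fh (+127) is the largest. The scalar sketch below converts each byte to a signed value explicitly, which avoids relying on implementation-defined narrowing in C. The name `pcmpgtb_model` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a byte as a two's-complement signed value portably. */
static int s8(uint8_t v) { return v < 0x80 ? (int)v : (int)v - 256; }

/* Scalar model of PCMPGTB on one 128-bit operand pair (16 signed byte lanes). */
static void pcmpgtb_model(uint8_t dst[16], const uint8_t src1[16],
                          const uint8_t src2[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (s8(src1[i]) > s8(src2[i])) ? 0xFF : 0x00;
}
```

For example, 7Fh compared with 80h yields FFh (+127 > -128), while 80h compared with 7Fh yields 00h.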
There are legacy and extended forms of the instruction:
PCMPGTB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPCMPGTB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PCMPGTB           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPGTB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode       Description
PCMPGTB xmm1, xmm2/mem128  66 0F 64 /r  Compares packed bytes in xmm1 to packed bytes in xmm2 or mem128. Writes results to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPGTB xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  64 /r
VPCMPGTB ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  64 /r


Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTD, (V)PCMPGTW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PCMPGTD
VPCMPGTD

Packed Compare Greater Than Signed Doublewords

Compares packed signed doubleword values in the first source operand to corresponding values in the
second source operand and writes a comparison result to the corresponding doubleword of the destination.
When a doubleword in the first source operand is greater than the corresponding doubleword in the
second source operand, the result is FFFFFFFFh; when it is less than or equal to the corresponding
doubleword, the result is 00000000h.
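The instruction tests only "greater than", but a signed "less than" mask follows from swapping the operands, since a < b holds exactly when b > a. The sketch below models PCMPGTD on four doublewords and derives the less-than mask that way. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PCMPGTD on four signed doubleword lanes. */
static void pcmpgtd_model(uint32_t dst[4], const int32_t src1[4],
                          const int32_t src2[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (src1[i] > src2[i]) ? UINT32_C(0xFFFFFFFF) : UINT32_C(0);
}

/* Signed "less than" mask: a < b exactly when b > a, so swap the operands. */
static void cmplt_d(uint32_t dst[4], const int32_t a[4], const int32_t b[4])
{
    pcmpgtd_model(dst, b, a);
}
```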
There are legacy and extended forms of the instruction:
PCMPGTD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPCMPGTD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PCMPGTD           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPGTD 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTD 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode       Description
PCMPGTD xmm1, xmm2/mem128  66 0F 66 /r  Compares packed signed doublewords in xmm1 to packed signed doublewords in xmm2 or mem128. Writes results to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPGTD xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  66 /r
VPCMPGTD ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  66 /r


Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PCMPGTQ
VPCMPGTQ

Packed Compare Greater Than Signed Quadwords

Compares packed signed quadword values in the first source operand to corresponding values in the
second source operand and writes a comparison result to the corresponding quadword of the destination.
When a quadword in the first source operand is greater than the corresponding quadword in the second
source operand, the result is FFFFFFFFFFFFFFFFh; when it is less than or equal to the corresponding
quadword, the result is 0000000000000000h.
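A short scalar model of PCMPGTQ highlights the signed interpretation at the extremes. 8000000000000000h is the smallest quadword, and FFFFFFFFFFFFFFFFh reads as -1. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PCMPGTQ on two signed quadword lanes (128-bit form). */
static void pcmpgtq_model(uint64_t dst[2], const int64_t src1[2],
                          const int64_t src2[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = (src1[i] > src2[i]) ? UINT64_MAX : UINT64_C(0);
}
```

For src1 = {0, -1} and src2 = {INT64_MIN, 0}, the result is {all ones, 0}. An unsigned comparison of the second lane would instead report FFFFFFFFFFFFFFFFh > 0.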
There are legacy and extended forms of the instruction:
PCMPGTQ

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPCMPGTQ

The extended form of the instruction has 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PCMPGTQ           SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPGTQ 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTQ 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode          Description
PCMPGTQ xmm1, xmm2/mem128  66 0F 38 37 /r  Compares packed signed quadwords in xmm1 to packed signed quadwords in xmm2 or mem128. Writes results to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPGTQ xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  37 /r
VPCMPGTQ ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  37 /r


Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PCMPGTW
VPCMPGTW

Packed Compare Greater Than Signed Words

Compares packed signed word values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding word of the destination.
When a word in the first source operand is greater than the corresponding word in the second source
operand, the result is FFFFh; when it is less than or equal to the corresponding word, the result is
0000h.
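Combining this instruction with PCMPEQW yields a signed "greater than or equal" mask. This works because each lane result is either all ones or all zeros, so the two masks can simply be ORed. The sketch below models both comparisons in C; all names are hypothetical. On hardware, the OR step would typically be a packed OR such as (V)POR.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar models of PCMPGTW and PCMPEQW on eight signed word lanes. */
static void pcmpgtw_model(uint16_t dst[8], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (a[i] > b[i]) ? 0xFFFF : 0x0000;
}

static void pcmpeqw_model(uint16_t dst[8], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (a[i] == b[i]) ? 0xFFFF : 0x0000;
}

/* Signed a >= b per lane: OR of the greater-than and equal masks. */
static void cmpge_w(uint16_t dst[8], const int16_t a[8], const int16_t b[8])
{
    uint16_t gt[8], eq[8];
    pcmpgtw_model(gt, a, b);
    pcmpeqw_model(eq, a, b);
    for (int i = 0; i < 8; i++)
        dst[i] = (uint16_t)(gt[i] | eq[i]);
}
```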
There are legacy and extended forms of the instruction:
PCMPGTW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPCMPGTW

The extended form of the instruction has 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PCMPGTW           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPGTW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode       Description
PCMPGTW xmm1, xmm2/mem128  66 0F 65 /r  Compares packed signed words in xmm1 to packed signed words in xmm2 or mem128. Writes results to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPGTW xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  65 /r
VPCMPGTW ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  65 /r


Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PCMPISTRI
VPCMPISTRI


Packed Compare Implicit Length Strings Return Index

Compares character string data in the first and second source operands. Comparison operations are
carried out as specified by values encoded in the immediate operand. Writes an index to the ECX register.
Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits.
Characters may be treated as either signed or unsigned values.
Source operand strings shorter than the maximum that can be packed into a 128-bit value are terminated by a null character (value of 0). The characters prior to the null character constitute the string. If
the first (lowest indexed) character is null, the string length is 0.
The comparison operations between the two operand strings are summarized in an intermediate
result—a comparison summary bit vector that is post-processed to produce the final output. Data
fields within the immediate byte specify the source data format, comparison type, comparison summary bit vector post-processing, and output option selection.
The index of either the most significant or least significant set bit of the post-processed comparison
summary bit vector is returned in ECX. If no bits are set in the post-processed comparison summary
bit vector, ECX is set to 16 for source operand strings composed of 8-bit characters or 8 for 16-bit
character strings.
See Section 1.5, “String Compare Instructions” for information about source string data format, comparison operations, comparison summary bit vector generation, post-processing, and output selection
options.
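The implicit-length rule and the index selection described above can be illustrated with a deliberately narrow C sketch. It models only one immediate setting, imm8 = 00h, which the commonly documented encoding assigns to unsigned bytes, "equal any" comparison, positive polarity, and least significant index; all other formats and modes are defined in Section 1.5 and are not covered. The helper names implicit_len_u8 and pcmpistri_equal_any_u8 are hypothetical, and each operand is modeled as a 16-byte array with element 0 holding bits [7:0].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of characters before the first null byte
   (16 if the operand contains no null byte). */
static int implicit_len_u8(const uint8_t s[16])
{
    int n = 0;
    while (n < 16 && s[n] != 0)
        n++;
    return n;
}

/* Sketch of PCMPISTRI for imm8 = 00h only. src1 holds the set of characters
   to look for; src2 holds the string being searched. Returns the value that
   would be written to ECX. */
static int pcmpistri_equal_any_u8(const uint8_t src1[16], const uint8_t src2[16])
{
    int len1 = implicit_len_u8(src1);
    int len2 = implicit_len_u8(src2);
    uint32_t summary = 0;              /* comparison summary bit vector */

    /* Bit j is set when string character j matches any set character.
       Characters at or beyond a terminating null never match. */
    for (int j = 0; j < len2; j++)
        for (int i = 0; i < len1; i++)
            if (src1[i] == src2[j])
                summary |= 1u << j;

    /* Positive polarity: post-processing leaves the vector unchanged. */
    for (int j = 0; j < 16; j++)
        if (summary & (1u << j))
            return j;                  /* least significant set bit */
    return 16;                         /* no bits set, 8-bit characters */
}
```

For example, searching "xyzok" for any of the characters "aeiou" yields index 3 (the 'o'), while a string whose only vowel follows its terminating null yields 16.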
The rFLAGS are set to indicate the following conditions:
Flag  Condition
CF    Cleared if the post-processed comparison summary bit vector is zero; otherwise set.
PF    Cleared.
AF    Cleared.
ZF    Set if any byte (word) in the second operand is null; otherwise cleared.
SF    Set if any byte (word) in the first operand is null; otherwise cleared.
OF    Equal to the value of the lsb of the post-processed summary bit vector.

There are legacy and extended forms of the instruction:
PCMPISTRI

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. A result index is written to the ECX register.
VPCMPISTRI

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. A result index is written to the ECX register.


Instruction Support
Form        Subset  Feature Flag
PCMPISTRI   SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPISTRI  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode             Description
PCMPISTRI xmm1, xmm2/mem128, imm8   66 0F 3A 63 /r ib  Compares packed string data in xmm1 and xmm2 or mem128. Writes a result index to ECX.

Mnemonic                            Encoding
                                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPISTRI xmm1, xmm2/mem128, imm8  C4   RXB.03          X.1111.0.01  63 /r ib

Related Instructions
(V)PCMPESTRI, (V)PCMPESTRM, (V)PCMPISTRM
rFLAGS Affected
ID    VIP   VIF   AC    VM    RF    NT    IOPL  OF    DF    IF    TF    SF    ZF    AF    PF    CF
                                                M                       M     M     0     0     M
21    20    19    18    17    16    14    13:12 11    10    9     8     7     6     4     2     0
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank.
Undefined flags are U.

MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


PCMPISTRM
VPCMPISTRM


Packed Compare Implicit Length Strings Return Mask

Compares character string data in the first and second source operands. Comparison operations are
carried out as specified by values encoded in the immediate operand. Writes a mask value to the
YMM0/XMM0 register.
Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits.
Characters may be treated as either signed or unsigned values.
Source operand strings shorter than the maximum that can be packed into a 128-bit value are terminated by a null character (value of 0). The characters prior to the null character constitute the string. If
the first (lowest indexed) character is null, the string length is 0.
The comparison operations between the two operand strings are summarized in an intermediate
result—a comparison summary bit vector that is post-processed to produce the final output. Data
fields within the immediate byte specify the source data format, comparison type, comparison summary bit vector post-processing, and output option selection.
Depending on the output option selected, the post-processed comparison summary bit vector is either
zero-extended to 128 bits or expanded into a byte/word-mask and then written to XMM0.
See Section 1.5, “String Compare Instructions” for information about source string data format, comparison operations, comparison summary bit vector generation, post-processing, and output selection
options.
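The final output stage described above can be sketched in C for 8-bit characters. The sketch assumes the commonly documented assignment of imm8 bit 6 as the output selector (0 = post-processed vector zero-extended to 128 bits, 1 = vector expanded into a byte mask); Section 1.5 remains the authority on the encoding. The function name pcmpistrm_output_u8 is hypothetical, and XMM0 is modeled as a 16-byte array with element 0 holding bits [7:0].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the PCMPISTRM output stage for 8-bit characters.
   post_processed is the 16-bit post-processed comparison summary bit vector. */
static void pcmpistrm_output_u8(uint8_t xmm0[16], uint16_t post_processed,
                                uint8_t imm8)
{
    memset(xmm0, 0, 16);
    if (imm8 & 0x40) {
        /* Byte mask: byte i is FFh when bit i of the vector is set. */
        for (int i = 0; i < 16; i++)
            xmm0[i] = ((post_processed >> i) & 1u) ? 0xFF : 0x00;
    } else {
        /* Bit mask: the vector occupies bits [15:0]; the rest stays zero. */
        xmm0[0] = (uint8_t)(post_processed & 0xFFu);
        xmm0[1] = (uint8_t)(post_processed >> 8);
    }
}
```

The byte-mask form is convenient as a blend or AND mask, while the bit-mask form is compact enough to move into a general-purpose register for bit scanning.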
The rFLAGS are set to indicate the following conditions:
Flag  Condition
CF    Cleared if the post-processed comparison summary bit vector is zero; otherwise set.
PF    Cleared.
AF    Cleared.
ZF    Set if any byte (word) in the second operand is null; otherwise cleared.
SF    Set if any byte (word) in the first operand is null; otherwise cleared.
OF    Equal to the value of the lsb of the post-processed summary bit vector.

There are legacy and extended forms of the instruction:
PCMPISTRM

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The mask result is written to the XMM0 register.
VPCMPISTRM

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The mask result is written to the XMM0 register. Bits [255:128] of the
YMM0 register are cleared.


Instruction Support
Form        Subset  Feature Flag
PCMPISTRM   SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPISTRM  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode             Description
PCMPISTRM xmm1, xmm2/mem128, imm8   66 0F 3A 62 /r ib  Compares packed string data in xmm1 and xmm2 or mem128. Writes a result or mask to the XMM0 register.

Mnemonic                            Encoding
                                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPISTRM xmm1, xmm2/mem128, imm8  C4   RXB.03          X.1111.0.01  62 /r ib

Related Instructions
(V)PCMPESTRI, (V)PCMPESTRM, (V)PCMPISTRI
rFLAGS Affected
ID    VIP   VIF   AC    VM    RF    NT    IOPL  OF    DF    IF    TF    SF    ZF    AF    PF    CF
                                                M                       M     M     0     0     M
21    20    19    18    17    16    14    13:12 11    10    9     8     7     6     4     2     0
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank.
Undefined flags are U.

MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


PEXTRB
VPEXTRB

Extract Packed Byte

Extracts a byte from a source register and writes it to an 8-bit memory location or to the low-order
byte of a general-purpose register, with zero-extension to 32 or 64 bits. Bits [3:0] of an immediate
byte operand select the byte to be extracted:
Value of imm8 [3:0]  Source Bits Extracted
0000                 [7:0]
0001                 [15:8]
0010                 [23:16]
0011                 [31:24]
0100                 [39:32]
0101                 [47:40]
0110                 [55:48]
0111                 [63:56]
1000                 [71:64]
1001                 [79:72]
1010                 [87:80]
1011                 [95:88]
1100                 [103:96]
1101                 [111:104]
1110                 [119:112]
1111                 [127:120]

There are legacy and extended forms of the instruction:
PEXTRB

The source operand is an XMM register and the destination is either an 8-bit memory location or the
low-order byte of a general-purpose register. When the destination is a general-purpose register, the
extracted byte is zero-extended to 32 or 64 bits.
VPEXTRB

The extended form of the instruction has a 128-bit encoding only.
The source operand is an XMM register and the destination is either an 8-bit memory location or the
low-order byte of a general-purpose register. When the destination is a general-purpose register, the
extracted byte is zero-extended to 32 or 64 bits.
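The byte selection and zero-extension can be modeled in a few lines of C for the general-purpose register destination. This is a sketch under stated assumptions: the function name pextrb_model is hypothetical, the XMM source is modeled as a 16-byte array with element 0 holding bits [7:0], and the model uses only imm8[3:0] as in the selection table above, masking off the remaining immediate bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (V)PEXTRB with a general-purpose register
   destination: imm8[3:0] selects the byte, which is zero-extended. */
static uint64_t pextrb_model(const uint8_t xmm[16], uint8_t imm8)
{
    return (uint64_t)xmm[imm8 & 0x0Fu];
}
```

Zero-extension means a selected byte of F0h reads back as 00000000_000000F0h rather than being sign-extended.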
Instruction Support
Form     Subset  Feature Flag
PEXTRB   SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRB  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode             Description
PEXTRB reg/m8, xmm, imm8     66 0F 3A 14 /r ib  Extracts an 8-bit value specified by imm8 from xmm and writes it to m8 or the low-order byte of a general-purpose register, with zero-extension.

Mnemonic                     Encoding
                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPEXTRB reg/mem8, xmm, imm8  C4   RXB.03          X.1111.0.01  14 /r ib

Related Instructions
(V)PEXTRD, (V)PEXTRW, (V)PEXTRQ, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF            S     S     X     Instruction execution caused a page fault.
Alignment check, #AC       S     S     X     Unaligned memory reference when alignment checking enabled.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


PEXTRD
VPEXTRD

Extract Packed Doubleword

Extracts a doubleword from a source register and writes it to a 32-bit memory location or a 32-bit
general-purpose register. Bits [1:0] of an immediate byte operand select the doubleword to be
extracted:
Value of imm8 [1:0]  Source Bits Extracted
00                   [31:0]
01                   [63:32]
10                   [95:64]
11                   [127:96]

There are legacy and extended forms of the instruction:
PEXTRD
The encoding is the same as PEXTRQ, with REX.W = 0.
The source operand is an XMM register and the destination is either a 32-bit memory location or a
32-bit general-purpose register.
VPEXTRD
The extended form of the instruction has a 128-bit encoding only.
The encoding is the same as VPEXTRQ, with VEX.W = 0.
The source operand is an XMM register and the destination is either a 32-bit memory location or a
32-bit general-purpose register.
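The mapping from imm8[1:0] to source bits can be checked with a small C model that assembles the selected doubleword from a byte array. This is an illustrative sketch: the function name pextrd_model is hypothetical, the XMM source is modeled as 16 bytes with element 0 holding bits [7:0], and only imm8[1:0] is used, as in the selection table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (V)PEXTRD. The doubleword is assembled
   little-endian, so selector 01b returns source bits [63:32]. */
static uint32_t pextrd_model(const uint8_t xmm[16], uint8_t imm8)
{
    unsigned base = (imm8 & 0x03u) * 4u;   /* first byte of the doubleword */
    uint32_t v = 0;
    for (unsigned i = 0; i < 4; i++)
        v |= (uint32_t)xmm[base + i] << (8u * i);
    return v;
}
```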
Instruction Support
Form     Subset  Feature Flag
PEXTRD   SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        Opcode                  Description
PEXTRD reg32/mem32, xmm, imm8   66 (W0) 0F 3A 16 /r ib  Extracts a 32-bit value specified by imm8 from xmm and writes it to mem32 or reg32.

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPEXTRD reg32/mem32, xmm, imm8  C4   RXB.03          0.1111.0.01  16 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRW, (V)PEXTRQ, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF            S     S     X     Instruction execution caused a page fault.
Alignment check, #AC       S     S     X     Unaligned memory reference when alignment checking enabled.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


PEXTRQ
VPEXTRQ

Extract Packed Quadword

Extracts a quadword from a source register and writes it to a 64-bit memory location or to a 64-bit
general-purpose register. Bit [0] of an immediate byte operand selects the quadword to be extracted:
Value of imm8 [0]  Source Bits Extracted
0                  [63:0]
1                  [127:64]

There are legacy and extended forms of the instruction:
PEXTRQ

The encoding is the same as PEXTRD, with REX.W = 1.
The source operand is an XMM register and the destination is either a 64-bit memory location or a
64-bit general-purpose register.
VPEXTRQ

The extended form of the instruction has a 128-bit encoding only.
The encoding is the same as VPEXTRD, with VEX.W = 1.
The source operand is an XMM register and the destination is either a 64-bit memory location or a
64-bit general-purpose register.
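The single selector bit can be illustrated the same way with a C model that assembles the chosen quadword from a byte array. This is a sketch for illustration only: the function name pextrq_model is hypothetical, the XMM source is modeled as 16 bytes with element 0 holding bits [7:0], and only imm8[0] is used, as in the selection table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (V)PEXTRQ. imm8[0] = 0 selects bits [63:0];
   imm8[0] = 1 selects bits [127:64]. */
static uint64_t pextrq_model(const uint8_t xmm[16], uint8_t imm8)
{
    unsigned base = (imm8 & 0x01u) * 8u;   /* first byte of the quadword */
    uint64_t v = 0;
    for (unsigned i = 0; i < 8; i++)
        v |= (uint64_t)xmm[base + i] << (8u * i);
    return v;
}
```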
Instruction Support
Form     Subset  Feature Flag
PEXTRQ   SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRQ  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        Opcode                  Description
PEXTRQ reg64/mem64, xmm, imm8   66 (W1) 0F 3A 16 /r ib  Extracts a 64-bit value specified by imm8 from xmm and writes it to mem64 or reg64.

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPEXTRQ reg64/mem64, xmm, imm8  C4   RXB.03          1.1111.0.01  16 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRW, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF            S     S     X     Instruction execution caused a page fault.
Alignment check, #AC       S     S     X     Unaligned memory reference when alignment checking enabled.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


PEXTRW
VPEXTRW

Extract Packed Word

Extracts a word from a source register and writes it to a 16-bit memory location or to the low-order
word of a general-purpose register, with zero-extension to 32 or 64 bits. Bits [2:0] of an immediate
byte operand select the word to be extracted:
Value of imm8 [2:0]  Source Bits Extracted
000                  [15:0]
001                  [31:16]
010                  [47:32]
011                  [63:48]
100                  [79:64]
101                  [95:80]
110                  [111:96]
111                  [127:112]

There are legacy and extended forms of the instruction:
PEXTRW

The legacy form of the instruction has SSE2 and SSE4.1 encodings.
The source operand is an XMM register and the destination is the low-order word of a general-purpose register. The extracted word is zero-extended to 32 or 64 bits.
The source operand is an XMM register and the destination is either a 16-bit memory location or the
low-order word of a general-purpose register. When the destination is a general-purpose register, the
extracted word is zero-extended to 32 or 64 bits.
VPEXTRW

The extended form of the instruction has two 128-bit encodings that correspond to the two legacy
encodings.
The source operand is an XMM register and the destination is the low-order word of a general-purpose register. The extracted word is zero-extended to 32 or 64 bits.
The source operand is an XMM register and the destination is either a 16-bit memory location or the
low-order word of a general-purpose register. When the destination is a general-purpose register, the
extracted word is zero-extended to 32 or 64 bits.
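The word selection and zero-extension shared by both encodings can be modeled in C. This is an illustrative sketch: the function name pextrw_model is hypothetical, the XMM source is modeled as 16 bytes with element 0 holding bits [7:0], and only imm8[2:0] is used, as in the selection table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (V)PEXTRW with a general-purpose register
   destination: imm8[2:0] selects the word, which is zero-extended. */
static uint64_t pextrw_model(const uint8_t xmm[16], uint8_t imm8)
{
    unsigned base = (imm8 & 0x07u) * 2u;   /* first byte of the word */
    uint16_t w = (uint16_t)(xmm[base] | (xmm[base + 1] << 8));
    return (uint64_t)w;
}
```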
Instruction Support
Form              Subset  Feature Flag
PEXTRW reg        SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
PEXTRW reg/mem16  SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRW           AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                      Opcode             Description
PEXTRW reg, xmm, imm8         66 0F C5 /r ib     Extracts a 16-bit value specified by imm8 from xmm and writes it to the low-order word of a general-purpose register, with zero-extension.
PEXTRW reg/m16, xmm, imm8     66 0F 3A 15 /r ib  Extracts a 16-bit value specified by imm8 from xmm and writes it to m16 or the low-order word of a general-purpose register, with zero-extension.

Mnemonic                      Encoding
                              VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPEXTRW reg, xmm, imm8        C4   RXB.01          X.1111.0.01  C5 /r ib
VPEXTRW reg/mem16, xmm, imm8  C4   RXB.03          X.1111.0.01  15 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF            S     S     X     Instruction execution caused a page fault.
Alignment check, #AC       S     S     X     Unaligned memory reference when alignment checking enabled.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


PHADDD
VPHADDD

Packed Horizontal Add Doubleword

Adds adjacent 32-bit signed integers in each of two source operands and packs the sums into the destination. If a sum overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set)
and only the low-order 32 bits of the sum are written to the destination.
Adds the 32-bit signed integer values in bits [63:32] and bits [31:0] of the first source operand and
packs the sum into bits [31:0] of the destination; adds the 32-bit signed integer values in bits [127:96]
and bits [95:64] of the first source operand and packs the sum into bits [63:32] of the destination.
Adds the corresponding values in the second source operand and packs the sums into bits [95:64] and
[127:96] of the destination.
Additionally, for the 256-bit form, adds the 32-bit signed integer values in bits [191:160] and bits
[159:128] of the first source operand and packs the sum into bits [159:128] of the destination; adds
the 32-bit signed integer values in bits [255:224] and bits [223:192] of the first source operand and
packs the sum into bits [191:160] of the destination. Adds the corresponding values in the second
source operand and packs the sums into bits [223:192] and [255:224] of the destination.
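The lane-by-lane pairing described above can be captured in a scalar C model. This is a sketch, not AMD's definition: the function name phaddd_model is hypothetical, and registers are modeled as arrays of 32-bit lanes with element 0 holding bits [31:0]. The model uses unsigned arithmetic because it produces the same low-order 32 bits as signed addition and wraps on overflow without undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (V)PHADDD. nlanes is 4 for the 128-bit form and 8
   for the 256-bit form; each 128-bit half is processed on its own. */
static void phaddd_model(uint32_t dst[], const uint32_t src1[],
                         const uint32_t src2[], int nlanes)
{
    for (int base = 0; base < nlanes; base += 4) {
        uint32_t tmp[4];   /* lets dst alias src1, as in the legacy form */
        tmp[0] = src1[base + 1] + src1[base + 0];
        tmp[1] = src1[base + 3] + src1[base + 2];
        tmp[2] = src2[base + 1] + src2[base + 0];
        tmp[3] = src2[base + 3] + src2[base + 2];
        for (int k = 0; k < 4; k++)
            dst[base + k] = tmp[k];
    }
}
```

Note that the 256-bit form does not add across the 128-bit boundary: the upper half of the destination is built only from the upper halves of the two sources.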
There are legacy and extended forms of the instruction:
PHADDD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination register. Bits [255:128] of
the YMM register that corresponds to the destination are not affected.
VPHADDD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form             Subset  Feature Flag
PHADDD           SSSE3   CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHADDD 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHADDD 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                         Opcode          Description
PHADDD xmm1, xmm2/mem128         66 0F 38 02 /r  Adds adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed sums to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDD xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  02 /r
VPHADDD ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  02 /r

Related Instructions
(V)PHADDW, (V)PHADDSW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PHADDSW
VPHADDSW


Packed Horizontal Add with Saturation Word

Adds adjacent 16-bit signed integers in each of two source operands, with saturation, and packs the
16-bit signed sums into the destination.
Positive sums greater than 7FFFh are saturated to 7FFFh; negative sums less than 8000h are saturated
to 8000h.
For the 128-bit form of the instruction, the following operations are performed:
dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.
Ssum() is a function that returns the saturated 16-bit signed sum of its arguments.
dest[15:0] = Ssum(src1[31:16], src1[15:0])
dest[31:16] = Ssum(src1[63:48], src1[47:32])
dest[47:32] = Ssum(src1[95:80], src1[79:64])
dest[63:48] = Ssum(src1[127:112], src1[111:96])
dest[79:64] = Ssum(src2[31:16], src2[15:0])
dest[95:80] = Ssum(src2[63:48], src2[47:32])
dest[111:96] = Ssum(src2[95:80], src2[79:64])
dest[127:112] = Ssum(src2[127:112], src2[111:96])

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[143:128] = Ssum(src1[159:144], src1[143:128])
dest[159:144] = Ssum(src1[191:176], src1[175:160])
dest[175:160] = Ssum(src1[223:208], src1[207:192])
dest[191:176] = Ssum(src1[255:240], src1[239:224])
dest[207:192] = Ssum(src2[159:144], src2[143:128])
dest[223:208] = Ssum(src2[191:176], src2[175:160])
dest[239:224] = Ssum(src2[223:208], src2[207:192])
dest[255:240] = Ssum(src2[255:240], src2[239:224])
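The Ssum() operations listed above translate directly into a scalar C model. This is an illustrative sketch: the names ssum16 and phaddsw_model are hypothetical, and registers are modeled as arrays of 16-bit lanes with element 0 holding bits [15:0]. The sum is computed in 32 bits and then clamped to the 16-bit signed range, which matches the saturation rule of 7FFFh and 8000h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Ssum(): saturated 16-bit signed sum of two words. */
static int16_t ssum16(int16_t x, int16_t y)
{
    int32_t s = (int32_t)x + (int32_t)y;   /* cannot overflow in 32 bits */
    if (s > INT16_MAX) return INT16_MAX;   /* saturate to 7FFFh */
    if (s < INT16_MIN) return INT16_MIN;   /* saturate to 8000h */
    return (int16_t)s;
}

/* Hypothetical model of (V)PHADDSW. nwords is 8 for the 128-bit form and
   16 for the 256-bit form; each 128-bit half is processed on its own. */
static void phaddsw_model(int16_t dst[], const int16_t src1[],
                          const int16_t src2[], int nwords)
{
    for (int base = 0; base < nwords; base += 8) {
        int16_t tmp[8];    /* lets dst alias src1, as in the legacy form */
        for (int k = 0; k < 4; k++) {
            tmp[k]     = ssum16(src1[base + 2 * k + 1], src1[base + 2 * k]);
            tmp[k + 4] = ssum16(src2[base + 2 * k + 1], src2[base + 2 * k]);
        }
        for (int k = 0; k < 8; k++)
            dst[base + k] = tmp[k];
    }
}
```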

There are legacy and extended forms of the instruction:
PHADDSW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPHADDSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.


Instruction Support
Form              Subset  Feature Flag
PHADDSW           SSSE3   CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHADDSW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHADDSW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode          Description
PHADDSW xmm1, xmm2/mem128         66 0F 38 03 /r  Adds adjacent pairs of signed integers in xmm1 and xmm2 or mem128, with saturation. Writes packed sums to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDSW xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  03 /r
VPHADDSW ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  03 /r

Related Instructions
(V)PHADDD, (V)PHADDW
rFLAGS Affected
None
MXCSR Flags Affected
None

338

PHADDSW, VPHADDSW

Instruction Reference

26568—Rev. 3.22—May 2018

AMD64 Technology

Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PHADDW
VPHADDW

Packed Horizontal Add Word

Adds adjacent 16-bit signed integers in each of two source operands and packs the 16-bit sums into
the destination. If a sum overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS
is set).
For the 128-bit form of the instruction, the following operations are performed:
dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.
dest[15:0] = src1[31:16] + src1[15:0]
dest[31:16] = src1[63:48] + src1[47:32]
dest[47:32] = src1[95:80] + src1[79:64]
dest[63:48] = src1[127:112] + src1[111:96]
dest[79:64] = src2[31:16] + src2[15:0]
dest[95:80] = src2[63:48] + src2[47:32]
dest[111:96] = src2[95:80] + src2[79:64]
dest[127:112] = src2[127:112] + src2[111:96]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[143:128] = src1[159:144] + src1[143:128]
dest[159:144] = src1[191:176] + src1[175:160]
dest[175:160] = src1[223:208] + src1[207:192]
dest[191:176] = src1[255:240] + src1[239:224]
dest[207:192] = src2[159:144] + src2[143:128]
dest[223:208] = src2[191:176] + src2[175:160]
dest[239:224] = src2[223:208] + src2[207:192]
dest[255:240] = src2[255:240] + src2[239:224]
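The operation lists above can be expressed as a small C reference model. This is a sketch, not an AMD-provided implementation: the names `phaddw_ref` and `wrap16` are hypothetical, word i of each operand is assumed to occupy bits [16i+15:16i] and to be stored at array index i, and `lanes` selects the 128-bit form (1) or the 256-bit form (2). Each 128-bit half is processed independently, matching the bit ranges listed above. Converting the wrapped sum back to `int16_t` assumes a two's-complement target.

```c
#include <stdint.h>

/* Reduce an int to 16 bits, discarding the carry (two's-complement target assumed). */
static int16_t wrap16(int v)
{
    return (int16_t)(uint16_t)v;
}

/* Reference model of (V)PHADDW. Word i of each operand is array element i
   (bits [16*i+15:16*i]). lanes = 1 for the 128-bit form, 2 for the 256-bit form. */
void phaddw_ref(int16_t dest[16], const int16_t src1[16],
                const int16_t src2[16], int lanes)
{
    for (int lane = 0; lane < lanes; lane++) {
        int base = lane * 8;   /* first word of this 128-bit lane */
        int16_t tmp[8];        /* buffer so dest may alias src1 or src2 */
        for (int i = 0; i < 4; i++) {
            /* Adjacent pairs of src1 fill the low four words of the lane... */
            tmp[i] = wrap16(src1[base + 2 * i] + src1[base + 2 * i + 1]);
            /* ...and adjacent pairs of src2 fill the high four words. */
            tmp[i + 4] = wrap16(src2[base + 2 * i] + src2[base + 2 * i + 1]);
        }
        for (int i = 0; i < 8; i++)
            dest[base + i] = tmp[i];
    }
}
```

With src1 words 1, 2, 3, 4, 5, 6, 7, 8 in the low lane, the low four result words are 3, 7, 11 and 15. Adding 7FFFh and 0001h yields 8000h because the carry is ignored. Whether bits [255:128] are preserved (PHADDW) or cleared (VPHADDW XMM encoding) is left to the caller in this sketch.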

There are legacy and extended forms of the instruction:
PHADDW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPHADDW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.


Instruction Support
Form               Subset   Feature Flag
PHADDW             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHADDW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHADDW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding

Mnemonic                              Opcode           Description
PHADDW xmm1, xmm2/mem128              66 0F 38 01 /r   Adds adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed sums to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDW xmm1, xmm2, xmm3/mem128       C4    RXB.02           X.src1.0.01   01 /r
VPHADDW ymm1, ymm2, ymm3/mem256       C4    RXB.02           X.src1.1.01   01 /r
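To make the VEX columns concrete, the sketch below assembles the register-to-register form of VPHADDW byte by byte. It assumes the standard three-byte VEX layout: the second byte holds the inverted R, X and B bits followed by the five-bit map_select, the third byte holds W, the inverted vvvv field, L and pp, and a ModRM byte with mod = 11b follows the opcode. The helper name `encode_vex3_rr` and its parameter list are hypothetical, and memory operands (SIB, displacement) are deliberately not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes "op dest, src1, src2" where all three
   operands are registers 0-15, using the three-byte (C4) VEX prefix.
   dest goes in ModRM.reg, src1 in VEX.vvvv, src2 in ModRM.rm. */
size_t encode_vex3_rr(uint8_t out[5], unsigned map_select, unsigned w,
                      unsigned l, unsigned pp, uint8_t opcode,
                      unsigned dest, unsigned src1, unsigned src2)
{
    unsigned r = (dest >> 3) & 1u;  /* extension of ModRM.reg */
    unsigned x = 0;                 /* no SIB index in register form */
    unsigned b = (src2 >> 3) & 1u;  /* extension of ModRM.rm */

    out[0] = 0xC4;
    /* R, X and B are stored inverted, followed by map_select[4:0]. */
    out[1] = (uint8_t)(((r ^ 1u) << 7) | ((x ^ 1u) << 6) | ((b ^ 1u) << 5) |
                       (map_select & 0x1Fu));
    /* W, inverted vvvv (= src1), L, pp. */
    out[2] = (uint8_t)(((w & 1u) << 7) | ((~src1 & 0xFu) << 3) |
                       ((l & 1u) << 2) | (pp & 3u));
    out[3] = opcode;
    /* ModRM: mod = 11b (register), reg = dest[2:0], rm = src2[2:0]. */
    out[4] = (uint8_t)(0xC0u | ((dest & 7u) << 3) | (src2 & 7u));
    return 5;
}
```

For VPHADDW xmm1, xmm2, xmm3 (map_select 02, pp 01, L = 0, W ignored and passed as 0) the sketch produces C4 E2 69 01 CB; setting L = 1 selects the YMM form and changes the third byte to 6D.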

Related Instructions
(V)PHADDD, (V)PHADDSW
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
                             S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PHMINPOSUW
VPHMINPOSUW

Horizontal Minimum and Position

Finds the minimum unsigned 16-bit value in the source operand and copies it to the low-order word
element of the destination. Writes the source position index of the value to bits [18:16] of the destination and clears bits [127:19] of the destination.
There are legacy and extended forms of the instruction:
PHMINPOSUW

The source operand is an XMM register or 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPHMINPOSUW

The extended form of the instruction has a 128-bit encoding only.
The source operand is an XMM register or 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
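A minimal C sketch of the operation follows, assuming the 128-bit source is presented as eight unsigned words with word i at bits [16i+15:16i] and array index i. The name `phminposuw_ref` is hypothetical. The text above does not say which index is reported when several words share the minimum; this sketch returns the lowest such index, which is the behavior commonly documented for this instruction, so treat that detail as an assumption to verify.

```c
#include <stdint.h>

/* Reference model of (V)PHMINPOSUW. The result is written as eight words:
   word 0 = minimum (bits [15:0]), word 1 = index (bits [18:16]),
   everything else zero (bits [127:19] cleared). */
void phminposuw_ref(uint16_t dest[8], const uint16_t src[8])
{
    uint16_t min = src[0];
    unsigned idx = 0;
    for (unsigned i = 1; i < 8; i++) {
        if (src[i] < min) {    /* unsigned compare; strict '<' keeps the lowest index on ties */
            min = src[i];
            idx = i;
        }
    }
    for (unsigned i = 0; i < 8; i++)
        dest[i] = 0;           /* clear all result bits first */
    dest[0] = min;             /* bits [15:0]  */
    dest[1] = (uint16_t)idx;   /* bits [18:16]; idx < 8, so bits [31:19] stay zero */
}
```

Because the minimum and index are computed before any result word is written, the sketch also works when `dest` and `src` point to the same array.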
Instruction Support
Form               Subset   Feature Flag
PHMINPOSUW         SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPHMINPOSUW        AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode           Description
PHMINPOSUW xmm1, xmm2/mem128          66 0F 38 41 /r   Finds the minimum unsigned word element in xmm2 or mem128, copies it to xmm1[15:0]; writes its position index to xmm1[18:16], and clears xmm1[127:19].

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHMINPOSUW xmm1, xmm2/mem128         C4    RXB.02           X.1111.0.01   41 /r

Related Instructions
(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUD, (V)PMINUW
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
                             S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                     A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


PHSUBD
VPHSUBD


Packed Horizontal Subtract
Doubleword

Subtracts adjacent 32-bit signed integers in each of two source operands and packs the differences
into the destination. The higher-order doubleword of each pair is subtracted from the lower-order
doubleword.
Subtracts the 32-bit signed integer value in bits [63:32] of the first source operand from the 32-bit
signed integer value in bits [31:0] of the first source operand and packs the difference into bits [31:0]
of the destination; subtracts the 32-bit signed integer value in bits [127:96] of the first source operand
from the 32-bit signed integer value in bits [95:64] of the first source operand and packs the difference into bits [63:32] of the destination. Performs the corresponding operations on pairs of 32-bit
signed integer values in the second source operand and packs the differences into bits [95:64] and
[127:96] of the destination.
Additionally, for the 256-bit form, subtracts the 32-bit signed integer value in bits [191:160] of the
first source operand from the 32-bit signed integer value in bits [159:128] of the first source operand
and packs the difference into bits [159:128] of the destination; subtracts the 32-bit signed integer
value in bits [255:224] of the first source operand from the 32-bit signed integer value in bits [223:192] of
the first source operand and packs the difference into bits [191:160] of the destination. Performs the
corresponding operations on pairs of 32-bit signed integer values in the second source operand and
packs the differences into bits [223:192] and [255:224] of the destination.
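The doubleword pairing above can be written as a short C reference model. This is a sketch under stated assumptions: `phsubd_ref` and `wrap_sub32` are hypothetical names, doubleword i sits at bits [32i+31:32i] and array index i, and `lanes` is 1 for the 128-bit form or 2 for the 256-bit form. Differences wrap modulo 2^32 (no saturation) and are computed in unsigned arithmetic to avoid undefined signed overflow in C; the final conversion assumes a two's-complement target.

```c
#include <stdint.h>

/* lo - hi, wrapping modulo 2^32. */
static int32_t wrap_sub32(int32_t lo, int32_t hi)
{
    return (int32_t)((uint32_t)lo - (uint32_t)hi);
}

/* Reference model of (V)PHSUBD: in each pair, the higher-order doubleword
   is subtracted from the lower-order one. */
void phsubd_ref(int32_t dest[8], const int32_t src1[8],
                const int32_t src2[8], int lanes)
{
    for (int lane = 0; lane < lanes; lane++) {
        int b = lane * 4;  /* first doubleword of this 128-bit lane */
        /* Compute all four results before storing, so dest may alias a source. */
        int32_t t0 = wrap_sub32(src1[b],     src1[b + 1]);
        int32_t t1 = wrap_sub32(src1[b + 2], src1[b + 3]);
        int32_t t2 = wrap_sub32(src2[b],     src2[b + 1]);
        int32_t t3 = wrap_sub32(src2[b + 2], src2[b + 3]);
        dest[b] = t0;
        dest[b + 1] = t1;
        dest[b + 2] = t2;
        dest[b + 3] = t3;
    }
}
```

For example, a src2 pair of 80000000h and 00000001h produces 7FFFFFFFh, since the instruction wraps rather than saturates.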
There are legacy and extended forms of the instruction:
PHSUBD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPHSUBD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form               Subset   Feature Flag
PHSUBD             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHSUBD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHSUBD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                              Opcode           Description
PHSUBD xmm1, xmm2/mem128              66 0F 38 06 /r   Subtracts adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed differences to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHSUBD xmm1, xmm2, xmm3/mem128       C4    RXB.02           X.src1.0.01   06 /r
VPHSUBD ymm1, ymm2, ymm3/mem256       C4    RXB.02           X.src1.1.01   06 /r

Related Instructions
(V)PHSUBW, (V)PHSUBSW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
                             S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PHSUBSW
VPHSUBSW


Packed Horizontal Subtract with Saturation
Word

Subtracts adjacent 16-bit signed integers in each of two source operands, with saturation, and packs
the differences into the destination. The higher-order word of each pair is subtracted from the lower-order word.
Positive differences greater than 7FFFh are saturated to 7FFFh; negative differences less than 8000h
are saturated to 8000h.
For the 128-bit form of the instruction, the following operations are performed:
dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.
Sdiff(A,B) is a function that returns the saturated 16-bit signed difference A − B.
dest[15:0] = Sdiff(src1[15:0], src1[31:16])
dest[31:16] = Sdiff(src1[47:32], src1[63:48])
dest[47:32] = Sdiff(src1[79:64], src1[95:80])
dest[63:48] = Sdiff(src1[111:96], src1[127:112])
dest[79:64] = Sdiff(src2[15:0], src2[31:16])
dest[95:80] = Sdiff(src2[47:32], src2[63:48])
dest[111:96] = Sdiff(src2[79:64], src2[95:80])
dest[127:112] = Sdiff(src2[111:96], src2[127:112])

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[143:128] = Sdiff(src1[143:128], src1[159:144])
dest[159:144] = Sdiff(src1[175:160], src1[191:176])
dest[175:160] = Sdiff(src1[207:192], src1[223:208])
dest[191:176] = Sdiff(src1[239:224], src1[255:240])
dest[207:192] = Sdiff(src2[143:128], src2[159:144])
dest[223:208] = Sdiff(src2[175:160], src2[191:176])
dest[239:224] = Sdiff(src2[207:192], src2[223:208])
dest[255:240] = Sdiff(src2[239:224], src2[255:240])
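The Sdiff function and the operation lists translate directly into C. In the sketch below, `sdiff16` and `phsubsw_ref` are hypothetical names; word i is assumed to sit at bits [16i+15:16i] and array index i, and `lanes` is 1 or 2 for the 128-bit or 256-bit form. The difference is computed exactly in 32 bits and then clamped to the range 8000h to 7FFFh.

```c
#include <stdint.h>

/* Sdiff(A, B): saturated 16-bit signed difference A - B. */
static int16_t sdiff16(int16_t a, int16_t b)
{
    int32_t d = (int32_t)a - (int32_t)b;   /* exact, cannot overflow in 32 bits */
    if (d > INT16_MAX) return INT16_MAX;   /* clamp to 7FFFh */
    if (d < INT16_MIN) return INT16_MIN;   /* clamp to 8000h */
    return (int16_t)d;
}

/* Reference model of (V)PHSUBSW; lanes = 1 (128-bit) or 2 (256-bit). */
void phsubsw_ref(int16_t dest[16], const int16_t src1[16],
                 const int16_t src2[16], int lanes)
{
    for (int lane = 0; lane < lanes; lane++) {
        int base = lane * 8;   /* first word of this 128-bit lane */
        int16_t tmp[8];        /* buffer so dest may alias a source */
        for (int i = 0; i < 4; i++) {
            tmp[i]     = sdiff16(src1[base + 2 * i], src1[base + 2 * i + 1]);
            tmp[i + 4] = sdiff16(src2[base + 2 * i], src2[base + 2 * i + 1]);
        }
        for (int i = 0; i < 8; i++)
            dest[base + i] = tmp[i];
    }
}
```

For instance, Sdiff(8000h, 0001h) stays at 8000h and Sdiff(7FFFh, FFFFh) stays at 7FFFh, where a wrapping subtraction would have flipped the sign.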

There are legacy and extended forms of the instruction:
PHSUBSW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPHSUBSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.


Instruction Support
Form               Subset   Feature Flag
PHSUBSW            SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHSUBSW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHSUBSW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode           Description
PHSUBSW xmm1, xmm2/mem128             66 0F 38 07 /r   Subtracts adjacent pairs of signed integers in xmm1 and xmm2 or mem128, with saturation. Writes packed differences to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHSUBSW xmm1, xmm2, xmm3/mem128      C4    RXB.02           X.src1.0.01   07 /r
VPHSUBSW ymm1, ymm2, ymm3/mem256      C4    RXB.02           X.src1.1.01   07 /r

Related Instructions
(V)PHSUBD, (V)PHSUBW
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
                             S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PHSUBW
VPHSUBW

Packed Horizontal Subtract
Word

Subtracts adjacent 16-bit signed integers in each of two source operands and packs the differences
into the destination. The higher-order word of each pair is subtracted from the lower-order word.
For the 128-bit form of the instruction, the following operations are performed:
dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.
dest[15:0] = src1[15:0] − src1[31:16]
dest[31:16] = src1[47:32] − src1[63:48]
dest[47:32] = src1[79:64] − src1[95:80]
dest[63:48] = src1[111:96] − src1[127:112]
dest[79:64] = src2[15:0] − src2[31:16]
dest[95:80] = src2[47:32] − src2[63:48]
dest[111:96] = src2[79:64] − src2[95:80]
dest[127:112] = src2[111:96] − src2[127:112]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[143:128] = src1[143:128] − src1[159:144]
dest[159:144] = src1[175:160] − src1[191:176]
dest[175:160] = src1[207:192] − src1[223:208]
dest[191:176] = src1[239:224] − src1[255:240]
dest[207:192] = src2[143:128] − src2[159:144]
dest[223:208] = src2[175:160] − src2[191:176]
dest[239:224] = src2[207:192] − src2[223:208]
dest[255:240] = src2[239:224] − src2[255:240]
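Unlike PHSUBSW, this operation does not saturate. A C sketch of the operation lists above, with the hypothetical names `phsubw_ref` and `wrap_sub16`, assumes word i sits at bits [16i+15:16i] and array index i, and takes `lanes` = 1 or 2 for the 128-bit or 256-bit form. The difference wraps modulo 2^16; the final conversion to `int16_t` assumes a two's-complement target.

```c
#include <stdint.h>

/* a - b reduced modulo 2^16; the int subtraction itself cannot overflow. */
static int16_t wrap_sub16(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)(a - b);
}

/* Reference model of (V)PHSUBW; lanes = 1 (128-bit) or 2 (256-bit). */
void phsubw_ref(int16_t dest[16], const int16_t src1[16],
                const int16_t src2[16], int lanes)
{
    for (int lane = 0; lane < lanes; lane++) {
        int base = lane * 8;   /* first word of this 128-bit lane */
        int16_t tmp[8];        /* buffer so dest may alias a source */
        for (int i = 0; i < 4; i++) {
            /* lower-order word minus higher-order word of each pair */
            tmp[i]     = wrap_sub16(src1[base + 2 * i], src1[base + 2 * i + 1]);
            tmp[i + 4] = wrap_sub16(src2[base + 2 * i], src2[base + 2 * i + 1]);
        }
        for (int i = 0; i < 8; i++)
            dest[base + i] = tmp[i];
    }
}
```

Here 8000h minus 0001h wraps to 7FFFh, whereas PHSUBSW would clamp the same pair to 8000h.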

There are legacy and extended forms of the instruction:
PHSUBW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination register. Bits [255:128] of
the YMM register that corresponds to the destination are not affected.
VPHSUBW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.


Instruction Support
Form               Subset   Feature Flag
PHSUBW             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHSUBW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHSUBW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode           Description
PHSUBW xmm1, xmm2/mem128              66 0F 38 05 /r   Subtracts adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed differences to xmm1.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHSUBW xmm1, xmm2, xmm3/mem128       C4    RXB.02           X.src1.0.01   05 /r
VPHSUBW ymm1, ymm2, ymm3/mem256       C4    RXB.02           X.src1.1.01   05 /r

Related Instructions
(V)PHSUBD, (V)PHSUBSW
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
                             S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PINSRB
VPINSRB

Packed Insert
Byte

Inserts a byte from an 8-bit memory location or the low-order byte of a 32-bit general-purpose register into a destination register. Bits [3:0] of an immediate byte operand select the location where the
byte is to be inserted:
Value of imm8 [3:0]    Insertion Location
0000                   [7:0]
0001                   [15:8]
0010                   [23:16]
0011                   [31:24]
0100                   [39:32]
0101                   [47:40]
0110                   [55:48]
0111                   [63:56]
1000                   [71:64]
1001                   [79:72]
1010                   [87:80]
1011                   [95:88]
1100                   [103:96]
1101                   [111:104]
1110                   [119:112]
1111                   [127:120]
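The selection table amounts to indexing a 16-byte array with imm8[3:0]. A minimal C sketch follows; `pinsrb_ref` is a hypothetical name, byte i is assumed to occupy bits [8i+7:8i] at array index i, and this sketch simply masks off imm8 bits [7:4]. For the legacy form, pass the destination array as `src_xmm` as well; for VPINSRB, `src_xmm` is the XMM source operand.

```c
#include <stdint.h>
#include <string.h>

/* Reference model of (V)PINSRB on a 128-bit register viewed as 16 bytes. */
void pinsrb_ref(uint8_t dest[16], const uint8_t src_xmm[16],
                uint32_t src_reg32, uint8_t imm8)
{
    uint8_t tmp[16];
    memcpy(tmp, src_xmm, sizeof tmp);        /* all other bytes are carried over */
    tmp[imm8 & 0x0F] = (uint8_t)src_reg32;   /* low-order byte of reg32, at imm8[3:0] */
    memcpy(dest, tmp, sizeof tmp);           /* safe even when dest == src_xmm */
}
```

With imm8 = 13h only the low nibble (3) is used, so the inserted byte lands in bits [31:24].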

There are legacy and extended forms of the instruction:
PINSRB

The source operand is either an 8-bit memory location or the low-order byte of a 32-bit general-purpose register, and the destination is an XMM register. The other bytes of the destination are not affected.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPINSRB

The extended form of the instruction has a 128-bit encoding only.
There are two source operands. The first source operand is either an 8-bit memory location or the
low-order byte of a 32-bit general-purpose register and the second source operand is an XMM register. The destination is a second XMM register. All the bytes of the second source other than the byte
that corresponds to the location of the inserted byte are copied to the destination. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.


Instruction Support
Form               Subset   Feature Flag
PINSRB             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPINSRB            AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode              Description
PINSRB xmm, reg32/mem8, imm8          66 0F 3A 20 /r ib   Inserts an 8-bit value selected by imm8 from the low-order byte of reg32 or from mem8 into xmm.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPINSRB xmm, reg/mem8, xmm, imm8      C4    RXB.03           X.1111.0.01   20 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRD, (V)PINSRQ, (V)PINSRW
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


PINSRD
VPINSRD

Packed Insert
Doubleword

Inserts a doubleword from a 32-bit memory location or a 32-bit general-purpose register into a destination register. Bits [1:0] of an immediate byte operand select the location where the doubleword is to
be inserted:
Value of imm8 [1:0]    Insertion Location
00                     [31:0]
01                     [63:32]
10                     [95:64]
11                     [127:96]
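In C terms the table is an index into four doublewords. The sketch below uses the hypothetical name `pinsrd_ref`, assumes doubleword i occupies bits [32i+31:32i] at array index i, and masks off imm8 bits [7:2]. For the legacy form, pass the destination as `src_xmm` too; for VPINSRD, `src_xmm` is the XMM source operand.

```c
#include <stdint.h>
#include <string.h>

/* Reference model of (V)PINSRD on a 128-bit register viewed as 4 doublewords. */
void pinsrd_ref(uint32_t dest[4], const uint32_t src_xmm[4],
                uint32_t src32, uint8_t imm8)
{
    uint32_t tmp[4];
    memcpy(tmp, src_xmm, sizeof tmp);   /* other doublewords are carried over */
    tmp[imm8 & 0x3] = src32;            /* imm8[1:0] selects [31:0] .. [127:96] */
    memcpy(dest, tmp, sizeof tmp);      /* safe even when dest == src_xmm */
}
```

An imm8 of FFh behaves like 03h here, placing the value in bits [127:96].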

There are legacy and extended forms of the instruction:
PINSRD

The encoding is the same as PINSRQ, with REX.W = 0.
The source operand is either a 32-bit memory location or a 32-bit general-purpose register, and the
destination is an XMM register. The other doublewords of the destination are not affected. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPINSRD

The extended form of the instruction has a 128-bit encoding only.
The encoding is the same as VPINSRQ, with VEX.W = 0.
There are two source operands. The first source operand is either a 32-bit memory location or a 32-bit
general-purpose register and the second source operand is an XMM register. The destination is a second XMM register. All the doublewords of the second source other than the doubleword that corresponds to the location of the inserted doubleword are copied to the destination. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.
Instruction Support
Form               Subset   Feature Flag
PINSRD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPINSRD            AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                              Opcode                   Description
PINSRD xmm, reg32/mem32, imm8         66 (W0) 0F 3A 22 /r ib   Inserts a 32-bit value selected by imm8 from reg32 or mem32 into xmm.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPINSRD xmm, reg32/mem32, xmm, imm8   C4    RXB.03           0.1111.0.01   22 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRB, (V)PINSRQ, (V)PINSRW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


PINSRQ
VPINSRQ

Packed Insert
Quadword

Inserts a quadword from a 64-bit memory location or a 64-bit general-purpose register into a destination register. Bit [0] of an immediate byte operand selects the location where the quadword is to be
inserted:
Value of imm8 [0]      Insertion Location
0                      [63:0]
1                      [127:64]
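With only one selector bit, the quadword form reduces to choosing one of two array slots. In this sketch `pinsrq_ref` is a hypothetical name, quadword 0 is bits [63:0] and quadword 1 is bits [127:64], and imm8 bits [7:1] are masked off. As with the other insert forms, pass the destination as `src_xmm` for the legacy PINSRQ.

```c
#include <stdint.h>

/* Reference model of (V)PINSRQ on a 128-bit register viewed as 2 quadwords. */
void pinsrq_ref(uint64_t dest[2], const uint64_t src_xmm[2],
                uint64_t src64, uint8_t imm8)
{
    uint64_t tmp[2] = { src_xmm[0], src_xmm[1] };  /* copy first, so aliasing is safe */
    tmp[imm8 & 0x1] = src64;                       /* imm8[0]: 0 -> [63:0], 1 -> [127:64] */
    dest[0] = tmp[0];
    dest[1] = tmp[1];
}
```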

There are legacy and extended forms of the instruction:
PINSRQ

The encoding is the same as PINSRD, with REX.W = 1.
The source operand is either a 64-bit memory location or a 64-bit general-purpose register, and the
destination is an XMM register. The other quadwords of the destination are not affected. Bits [255:128]
of the YMM register that corresponds to the destination are not affected.
VPINSRQ

The extended form of the instruction has a 128-bit encoding only.
The encoding is the same as VPINSRD, with VEX.W = 1.
There are two source operands. The first source operand is either a 64-bit memory location or a 64-bit
general-purpose register and the second source operand is an XMM register. The destination is a second XMM register. All the quadwords of the second source other than the quadword that corresponds
to the location of the inserted quadword are copied to the destination. Bits [255:128] of the YMM register that corresponds to the destination XMM register are cleared.
Instruction Support
Form               Subset   Feature Flag
PINSRQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPINSRQ            AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode                   Description
PINSRQ xmm, reg64/mem64, imm8         66 (W1) 0F 3A 22 /r ib   Inserts a 64-bit value selected by imm8 from reg64 or mem64 into xmm.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPINSRQ xmm, reg64/mem64, xmm, imm8   C4    RXB.03           1.1111.0.01   22 /r ib


Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRB, (V)PINSRD, (V)PINSRW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


PINSRW
VPINSRW

Packed Insert Word

Inserts a word from a 16-bit memory location or the low-order word of a 32-bit general-purpose register into a destination register. Bits [2:0] of an immediate byte operand select the location where the
word is to be inserted:
Value of imm8 [2:0]    Insertion Location
000                    [15:0]
001                    [31:16]
010                    [47:32]
011                    [63:48]
100                    [79:64]
101                    [95:80]
110                    [111:96]
111                    [127:112]
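The word form follows the same pattern with eight slots selected by imm8[2:0]. In the sketch below, `pinsrw_ref` is a hypothetical name, word i is assumed to occupy bits [16i+15:16i] at array index i, and imm8 bits [7:3] are masked off. Only the low-order word of the 32-bit register value is used, as the text above specifies.

```c
#include <stdint.h>
#include <string.h>

/* Reference model of (V)PINSRW on a 128-bit register viewed as 8 words. */
void pinsrw_ref(uint16_t dest[8], const uint16_t src_xmm[8],
                uint32_t src_reg32, uint8_t imm8)
{
    uint16_t tmp[8];
    memcpy(tmp, src_xmm, sizeof tmp);          /* other words are carried over */
    tmp[imm8 & 0x7] = (uint16_t)src_reg32;     /* low-order word of reg32 at imm8[2:0] */
    memcpy(dest, tmp, sizeof tmp);             /* safe even when dest == src_xmm */
}
```

For example, inserting FFFF1234h with imm8 = 6 writes 1234h into bits [111:96] and leaves the other seven words unchanged.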

There are legacy and extended forms of the instruction:
PINSRW

The source operand is either a 16-bit memory location or the low-order word of a 32-bit general-purpose register, and the destination is an XMM register. The other words of the destination are not
affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPINSRW

The extended form of the instruction has a 128-bit encoding only.
There are two source operands. The first source operand is either a 16-bit memory location or the
low-order word of a 32-bit general-purpose register and the second source operand is an XMM register. The destination is an XMM register. All the words of the second source other than the word that
corresponds to the location of the inserted word are copied to the destination. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.
Instruction Support
Form               Subset   Feature Flag
PINSRW             SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VPINSRW            AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                              Opcode           Description
PINSRW xmm, reg32/mem16, imm8         66 0F C4 /r ib   Inserts a 16-bit value selected by imm8 from the low-order word of reg32 or from mem16 into xmm.

Mnemonic                              Encoding
                                      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPINSRW xmm, reg32/mem16, xmm, imm8   C4    RXB.01           X.1111.0.01   C4 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRB, (V)PINSRD, (V)PINSRQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.vvvv != 1111b.
                                         A    VEX.L = 1.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
Alignment check, #AC               S     X    Unaligned memory reference when alignment checking enabled.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


PMADDUBSW
VPMADDUBSW


Packed Multiply and Add
Unsigned Byte to Signed Word

Multiplies each packed 8-bit unsigned value of the first source operand by the corresponding packed 8-bit signed value of the second source operand, adds each adjacent pair of 16-bit products with signed saturation, and writes the 16-bit sums to the destination: eight sums for the 128-bit form, 16 for the 256-bit form.
For the 128-bit form of the instruction, the following operations are performed:
dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.
Ssum() is a function that returns the saturated 16-bit signed sum of its arguments.
dest[15:0] = Ssum(src1[7:0] * src2[7:0], src1[15:8] * src2[15:8])
dest[31:16] = Ssum(src1[23:16] * src2[23:16], src1[31:24] * src2[31:24])
dest[47:32] = Ssum(src1[39:32] * src2[39:32], src1[47:40] * src2[47:40])
dest[63:48] = Ssum(src1[55:48] * src2[55:48], src1[63:56] * src2[63:56])
dest[79:64] = Ssum(src1[71:64] * src2[71:64], src1[79:72] * src2[79:72])
dest[95:80] = Ssum(src1[87:80] * src2[87:80], src1[95:88] * src2[95:88])
dest[111:96] = Ssum(src1[103:96] * src2[103:96], src1[111:104] * src2[111:104])
dest[127:112] = Ssum(src1[119:112] * src2[119:112], src1[127:120] * src2[127:120])

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[143:128] = Ssum(src1[135:128] * src2[135:128], src1[143:136] * src2[143:136])
dest[159:144] = Ssum(src1[151:144] * src2[151:144], src1[159:152] * src2[159:152])
dest[175:160] = Ssum(src1[167:160] * src2[167:160], src1[175:168] * src2[175:168])
dest[191:176] = Ssum(src1[183:176] * src2[183:176], src1[191:184] * src2[191:184])
dest[207:192] = Ssum(src1[199:192] * src2[199:192], src1[207:200] * src2[207:200])
dest[223:208] = Ssum(src1[215:208] * src2[215:208], src1[223:216] * src2[223:216])
dest[239:224] = Ssum(src1[231:224] * src2[231:224], src1[239:232] * src2[239:232])
dest[255:240] = Ssum(src1[247:240] * src2[247:240], src1[255:248] * src2[255:248])
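The per-lane equations above can be checked against a small executable model. This is a sketch under stated assumptions, not AMD's definition. Operands are treated as integer register images with byte 0 in bits [7:0]. `ssum` mirrors the text's Ssum(), which sums its arguments and then saturates to the signed 16-bit range. The helper names are hypothetical.

```python
def _byte_lanes(image, count, signed):
    """Split a register image into bytes, lane 0 taken from bits [7:0]."""
    lanes = []
    for i in range(count):
        b = (image >> (8 * i)) & 0xFF
        lanes.append(b - 0x100 if signed and b & 0x80 else b)
    return lanes


def ssum(*terms):
    """Saturated 16-bit signed sum, corresponding to Ssum() in the text."""
    return max(-0x8000, min(0x7FFF, sum(terms)))


def pmaddubsw(src1, src2, bits=128):
    n = bits // 8                                  # 16 or 32 byte lanes
    u = _byte_lanes(src1, n, signed=False)         # unsigned bytes (first source)
    s = _byte_lanes(src2, n, signed=True)          # signed bytes (second source)
    dest = 0
    for k in range(n // 2):                        # one 16-bit result per byte pair
        word = ssum(u[2 * k] * s[2 * k], u[2 * k + 1] * s[2 * k + 1])
        dest |= (word & 0xFFFF) << (16 * k)
    return dest
```

Each individual product fits in 16 bits, because 255 * 127 = 32385 and 255 * -128 = -32640. Only the pairwise sum can saturate. For example, 255 * 127 + 255 * 127 = 64770, which clamps to 7FFFh.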

There are legacy and extended forms of the instruction:
PMADDUBSW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMADDUBSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.


Instruction Support
Form                 Subset   Feature Flag
PMADDUBSW            SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPMADDUBSW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMADDUBSW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                             Opcode           Description
PMADDUBSW xmm1, xmm2/mem128          66 0F 38 04 /r   Multiplies packed 8-bit unsigned values in xmm1 and packed 8-bit signed values in xmm2 or mem128, adds adjacent products, and writes the saturated sums to xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMADDUBSW xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   04 /r
VPMADDUBSW ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   04 /r

Related Instructions
(V)PMADDWD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC               S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMADDWD
VPMADDWD


Packed Multiply and Add
Word to Doubleword

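Multiplies each packed 16-bit signed value of the first source operand by the corresponding packed 16-bit signed value of the second source operand, adds each adjacent pair of 32-bit products, and writes the 32-bit sums to the destination: four sums for the 128-bit form, eight for the 256-bit form.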
For the 128-bit form of the instruction, the following operations are performed:
dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.
dest[31:0] = (src1[15:0] * src2[15:0]) + (src1[31:16] * src2[31:16])
dest[63:32] = (src1[47:32] * src2[47:32]) + (src1[63:48] * src2[63:48])
dest[95:64] = (src1[79:64] * src2[79:64]) + (src1[95:80] * src2[95:80])
dest[127:96] = (src1[111:96] * src2[111:96]) + (src1[127:112] * src2[127:112])

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[159:128] = (src1[143:128] * src2[143:128]) + (src1[159:144] * src2[159:144])
dest[191:160] = (src1[175:160] * src2[175:160]) + (src1[191:176] * src2[191:176])
dest[223:192] = (src1[207:192] * src2[207:192]) + (src1[223:208] * src2[223:208])
dest[255:224] = (src1[239:224] * src2[239:224]) + (src1[255:240] * src2[255:240])

When all four of the signed 16-bit source values in a set are 8000h (-32768), the exact sum is 2^31, which does not fit in a signed 32-bit result and wraps around to 8000_0000h. There are no other overflow cases.
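The wraparound rule can be checked with a small model that keeps only the low 32 bits of each exact sum. This is a sketch, not AMD's definition. It assumes integer register images with word 0 in bits [15:0], and the function names are hypothetical. The last test case uses the next-largest inputs, -32768 * -32768 + -32768 * -32767 = 7FFF_8000h. That result still fits, which is consistent with the statement that the all-8000h set is the only overflow.

```python
def _word_lanes(image, count):
    """Split a register image into signed 16-bit lanes, lane 0 from bits [15:0]."""
    lanes = []
    for i in range(count):
        w = (image >> (16 * i)) & 0xFFFF
        lanes.append(w - 0x10000 if w & 0x8000 else w)
    return lanes


def pmaddwd(src1, src2, bits=128):
    n = bits // 16                      # 8 or 16 word lanes
    a, b = _word_lanes(src1, n), _word_lanes(src2, n)
    dest = 0
    for k in range(n // 2):             # one doubleword per adjacent word pair
        total = a[2 * k] * b[2 * k] + a[2 * k + 1] * b[2 * k + 1]
        # No saturation: only the low 32 bits of the exact sum are kept.
        dest |= (total & 0xFFFFFFFF) << (32 * k)
    return dest
```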
There are legacy and extended forms of the instruction:
PMADDWD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMADDWD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form               Subset   Feature Flag
PMADDWD            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMADDWD 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMADDWD 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                           Opcode        Description
PMADDWD xmm1, xmm2/mem128          66 0F F5 /r   Multiplies packed 16-bit signed values in xmm1 and xmm2 or mem128, adds the products, and writes the sums to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMADDWD xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   F5 /r
VPMADDWD ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   F5 /r

Related Instructions
(V)PMADDUBSW, (V)PMULHUW, (V)PMULHW, (V)PMULLW, (V)PMULUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC               S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMAXSB
VPMAXSB

Packed Maximum
Signed Bytes

Compares each packed 8-bit signed integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically greater value into the corresponding
byte of the destination.
The 128-bit form of the instruction compares 16 pairs of 8-bit signed integer values; the 256-bit form
compares 32 pairs.
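Signedness is what separates this instruction from (V)PMAXUB. This minimal Python sketch compares lanes as two's-complement bytes. It assumes integer register images with byte 0 in bits [7:0], and the function name is hypothetical. Under this model, 80h (-128) loses to 7Fh (127), whereas an unsigned comparison would pick 80h.

```python
def pmaxsb(src1, src2, bits=128):
    """Sketch of (V)PMAXSB on register images held as Python ints."""
    dest = 0
    for i in range(bits // 8):                 # 16 lanes (128-bit) or 32 lanes (256-bit)
        a = (src1 >> (8 * i)) & 0xFF
        b = (src2 >> (8 * i)) & 0xFF
        # Reinterpret each byte as a two's-complement value in -128..127.
        sa = a - 0x100 if a & 0x80 else a
        sb = b - 0x100 if b & 0x80 else b
        dest |= (max(sa, sb) & 0xFF) << (8 * i)
    return dest
```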
There are legacy and extended forms of the instruction:
PMAXSB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMAXSB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PMAXSB            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXSB 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXSB 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode           Description
PMAXSB xmm1, xmm2/mem128          66 0F 38 3C /r   Compares 16 pairs of packed signed 8-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXSB xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   3C /r
VPMAXSB ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   3C /r


Related Instructions
(V)PMAXSD, (V)PMAXSW, (V)PMAXUB, (V)PMAXUD, (V)PMAXUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC               S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMAXSD
VPMAXSD

Packed Maximum
Signed Doublewords

Compares each packed 32-bit signed integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically greater value into the corresponding
doubleword of the destination.
The 128-bit form of the instruction compares four pairs of 32-bit signed integer values; the 256-bit
form compares eight.
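The same comparison can be expressed on raw operand bytes, which shows how a memory operand maps onto lanes. This sketch assumes little-endian byte order, so the doubleword at the lowest address is lane 0 (bits [31:0]). It uses Python's standard `struct` module with the signed `i` format. The function name and the 16/32-byte length check are illustrative assumptions, not part of the instruction definition.

```python
import struct


def pmaxsd(src1: bytes, src2: bytes) -> bytes:
    """Sketch of (V)PMAXSD on 16-byte (XMM) or 32-byte (YMM) operands."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or 32 bytes")
    fmt = "<%di" % (len(src1) // 4)       # little-endian signed doublewords
    a = struct.unpack(fmt, src1)           # a[0] comes from bytes 0..3, i.e. bits [31:0]
    b = struct.unpack(fmt, src2)
    return struct.pack(fmt, *(max(x, y) for x, y in zip(a, b)))
```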
There are legacy and extended forms of the instruction:
PMAXSD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMAXSD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PMAXSD            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXSD 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXSD 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode           Description
PMAXSD xmm1, xmm2/mem128          66 0F 38 3D /r   Compares four pairs of packed signed 32-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXSD xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   3D /r
VPMAXSD ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   3D /r


Related Instructions
(V)PMAXSB, (V)PMAXSW, (V)PMAXUB, (V)PMAXUD, (V)PMAXUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC               S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMAXSW
VPMAXSW

Packed Maximum
Signed Words

Compares each packed 16-bit signed integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically greater value into the corresponding
word of the destination.
The 128-bit form of the instruction compares eight pairs of 16-bit signed integer values; the 256-bit
form compares 16 pairs.
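Taking the signed maximum against an all-zero second operand clamps negative words to zero while leaving non-negative words unchanged. This Python sketch illustrates that use. It assumes integer register images with word 0 in bits [15:0]. The helper names (`pmaxsw`, `words`) are hypothetical, and the clamping idiom is shown as an illustration rather than as a pattern the manual prescribes.

```python
def pmaxsw(src1, src2, bits=128):
    """Sketch of (V)PMAXSW on register images held as Python ints."""
    dest = 0
    for i in range(bits // 16):                # 8 lanes (128-bit) or 16 lanes (256-bit)
        a = (src1 >> (16 * i)) & 0xFFFF
        b = (src2 >> (16 * i)) & 0xFFFF
        sa = a - 0x10000 if a & 0x8000 else a  # two's-complement view
        sb = b - 0x10000 if b & 0x8000 else b
        dest |= (max(sa, sb) & 0xFFFF) << (16 * i)
    return dest


def words(values):
    """Pack signed 16-bit ints (word 0 first) into a register image."""
    image = 0
    for i, v in enumerate(values):
        image |= (v & 0xFFFF) << (16 * i)
    return image


# Clamp negative words to zero by taking the maximum against an all-zero register.
samples = words([-5, 7, -32768, 32767, 0, -1, 100, -100])
clamped = pmaxsw(samples, 0)
```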
There are legacy and extended forms of the instruction:
PMAXSW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMAXSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PMAXSW            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMAXSW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXSW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode        Description
PMAXSW xmm1, xmm2/mem128          66 0F EE /r   Compares eight pairs of packed signed 16-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXSW xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   EE /r
VPMAXSW ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   EE /r


Related Instructions
(V)PMAXSB, (V)PMAXSD, (V)PMAXUB, (V)PMAXUD, (V)PMAXUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC               S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMAXUB
VPMAXUB

Packed Maximum
Unsigned Bytes

Compares each packed 8-bit unsigned integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically greater value into the corresponding
byte of the destination.
The 128-bit form of the instruction compares 16 pairs of 8-bit unsigned integer values; the 256-bit
form compares 32 pairs.
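Because Python `bytes` objects already yield unsigned values 0..255 when iterated, the unsigned byte maximum is almost a one-liner. This sketch assumes operands are supplied as 16-byte (XMM) or 32-byte (YMM) buffers in memory order. The function name is hypothetical. With this model, 80h beats 7Fh, the opposite of the (V)PMAXSB result.

```python
def pmaxub(src1: bytes, src2: bytes) -> bytes:
    """Sketch of (V)PMAXUB on 16-byte (XMM) or 32-byte (YMM) operands held as bytes."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or 32 bytes")
    # Iterating over a bytes object yields ints 0..255, i.e. the unsigned view.
    return bytes(max(x, y) for x, y in zip(src1, src2))
```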
There are legacy and extended forms of the instruction:
PMAXUB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMAXUB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PMAXUB            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMAXUB 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXUB 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode        Description
PMAXUB xmm1, xmm2/mem128          66 0F DE /r   Compares 16 pairs of packed unsigned 8-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXUB xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   DE /r
VPMAXUB ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   DE /r


Related Instructions
(V)PMAXSB, (V)PMAXSD, (V)PMAXSW, (V)PMAXUD, (V)PMAXUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC               S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMAXUD
VPMAXUD

Packed Maximum
Unsigned Doublewords

Compares each packed 32-bit unsigned integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically greater value into the corresponding
doubleword of the destination.
The 128-bit form of the instruction compares four pairs of 32-bit unsigned integer values; the 256-bit
form compares eight.
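The same bit patterns can give different answers under the unsigned and signed doubleword maximum. This Python sketch models both (V)PMAXUD and, for contrast, (V)PMAXSD with a single helper. It assumes integer register images with doubleword 0 in bits [31:0], and all names are hypothetical. For example, 8000_0000h is the larger unsigned value but the smaller signed one.

```python
def _dword_max(src1, src2, bits, signed):
    dest = 0
    for i in range(bits // 32):                 # 4 lanes (128-bit) or 8 lanes (256-bit)
        a = (src1 >> (32 * i)) & 0xFFFFFFFF
        b = (src2 >> (32 * i)) & 0xFFFFFFFF
        if signed:
            # Subtract 2**32 when bit 31 is set: the two's-complement view used by PMAXSD.
            a -= (a & 0x80000000) << 1
            b -= (b & 0x80000000) << 1
        dest |= (max(a, b) & 0xFFFFFFFF) << (32 * i)
    return dest


def pmaxud(src1, src2, bits=128):
    return _dword_max(src1, src2, bits, signed=False)


def pmaxsd(src1, src2, bits=128):
    return _dword_max(src1, src2, bits, signed=True)
```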
There are legacy and extended forms of the instruction:
PMAXUD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMAXUD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PMAXUD            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXUD 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXUD 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode           Description
PMAXUD xmm1, xmm2/mem128          66 0F 38 3F /r   Compares four pairs of packed unsigned 32-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXUD xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   3F /r
VPMAXUD ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   3F /r


Related Instructions
(V)PMAXSB, (V)PMAXSD, (V)PMAXSW, (V)PMAXUB, (V)PMAXUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC               S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMAXUW
VPMAXUW

Packed Maximum
Unsigned Words

Compares each packed 16-bit unsigned integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically greater value into the corresponding
word of the destination.
The 128-bit form of the instruction compares eight pairs of 16-bit unsigned integer values; the 256-bit
form compares 16 pairs.
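PMAXUW requires SSE4.1. One way to obtain the same per-word result with SSE2 instructions is to add the second operand back to an unsigned saturating difference: (a -sat b) + b = max(a, b). The sum never exceeds FFFFh, so the wrapping add cannot overflow. This Python sketch models (V)PMAXUW and a PSUBUSW-then-PADDW sequence on integer register images, then checks that they agree on random inputs. The helper names are hypothetical, and the sketch makes no claim about what any particular compiler emits.

```python
import random

MASK = 0xFFFF


def _lanewise(src1, src2, fn, bits=128):
    """Apply fn to each pair of unsigned 16-bit lanes and keep the low 16 bits."""
    dest = 0
    for i in range(bits // 16):
        a = (src1 >> (16 * i)) & MASK
        b = (src2 >> (16 * i)) & MASK
        dest |= (fn(a, b) & MASK) << (16 * i)
    return dest


def pmaxuw(src1, src2, bits=128):
    return _lanewise(src1, src2, max, bits)


def psubusw(src1, src2, bits=128):
    # Unsigned saturating subtract: differences below zero become zero.
    return _lanewise(src1, src2, lambda a, b: max(a - b, 0), bits)


def paddw(src1, src2, bits=128):
    # Wrapping add; the & MASK in _lanewise discards any carry.
    return _lanewise(src1, src2, lambda a, b: a + b, bits)


def pmaxuw_sse2(src1, src2):
    return paddw(psubusw(src1, src2), src2)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        x, y = rng.getrandbits(128), rng.getrandbits(128)
        assert pmaxuw(x, y) == pmaxuw_sse2(x, y)
```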
There are legacy and extended forms of the instruction:
PMAXUW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMAXUW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset   Feature Flag
PMAXUW            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXUW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXUW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode           Description
PMAXUW xmm1, xmm2/mem128          66 0F 38 3E /r   Compares eight pairs of packed unsigned 16-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMAXUW xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   3E /r
VPMAXUW ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   3E /r


Related Instructions
(V)PMAXSB, (V)PMAXSD, (V)PMAXSW, (V)PMAXUB, (V)PMAXUD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                             S     S     S    CR0.EM = 1.
                             S     S     S    CR4.OSFXSR = 0.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.L = 1 when AVX2 not supported.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X    CR0.TS = 1.
Stack, #SS                   S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Alignment check, #AC               S     S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMINSB
VPMINSB

Packed Minimum
Signed Bytes

Compares each packed 8-bit signed integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically lesser value into the corresponding
byte of the destination.
The 128-bit form of the instruction compares 16 pairs of 8-bit signed integer values; the 256-bit form
compares 32 pairs.
There are legacy and extended forms of the instruction:
PMINSB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMINSB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
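A minimal Python model of the lane-wise behavior follows. It assumes a vector is represented as a bytes object in memory order, with element 0 first. The function names are hypothetical, and only the data transformation is modeled, not faults or memory access.

The sample inputs show why signedness matters. 80h (-128) is the lesser of 80h and 7Fh, whereas an unsigned comparison would pick 7Fh.

```python
def as_int8(byte: int) -> int:
    """Reinterpret a raw byte (0..255) as a two's-complement signed value."""
    return byte - 256 if byte >= 0x80 else byte


def pminsb(src1: bytes, src2: bytes) -> bytes:
    """Model of (V)PMINSB: signed minimum of each byte pair.

    16-byte operands model the 128-bit forms, 32-byte operands the 256-bit form.
    """
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or 32 bytes long")
    # Compare as signed values, then "& 0xFF" stores each result back as a raw byte.
    return bytes(min(as_int8(a), as_int8(b)) & 0xFF for a, b in zip(src1, src2))


a = bytes([0x80] + [5] * 15)   # lane 0 = -128
b = bytes([0x7F] + [3] * 15)   # lane 0 = +127
print(pminsb(a, b)[:2].hex())  # 8003
```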
Instruction Support
Form                Subset   Feature Flag
PMINSB              SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINSB 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINSB 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
PMINSB xmm1, xmm2/mem128  66 0F 38 38 /r  Compares 16 pairs of packed signed 8-bit values in xmm1
                                          and xmm2 or mem128 and writes the lesser values to the
                                          corresponding positions in xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINSB xmm1, xmm2, xmm3/mem128   C4   RXB.02          X.src1.0.01  38 /r
VPMINSB ymm1, ymm2, ymm3/mem256   C4   RXB.02          X.src1.1.01  38 /r


Related Instructions
(V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUD, (V)PMINUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.L = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
Stack, #SS                     S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP        S     S     X    Memory address exceeding data segment limit or non-canonical.
                                           X    Null data segment used to reference memory.
Alignment check, #AC           S     S     S    Memory operand not 16-byte aligned when alignment checking
                                                enabled and MXCSR.MM = 1.
                                           A    Alignment checking enabled and: 256-bit memory operand not
                                                32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                      S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMINSD
VPMINSD

Packed Minimum Signed Doublewords

Compares each packed 32-bit signed integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically lesser value into the corresponding
doubleword of the destination.
The 128-bit form of the instruction compares four pairs of 32-bit signed integer values; the 256-bit
form compares eight.
There are legacy and extended forms of the instruction:
PMINSD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMINSD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
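The same comparison for doublewords can be sketched with Python's struct module. The sketch assumes operands are little-endian byte strings, as they would appear in memory. pminsd is a hypothetical name, not an existing API.

```python
import struct


def pminsd(src1: bytes, src2: bytes) -> bytes:
    """Model of (V)PMINSD on 16-byte (four lanes) or 32-byte (eight lanes) operands."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or 32 bytes long")
    fmt = "<%di" % (len(src1) // 4)  # little-endian signed 32-bit lanes
    lanes = map(min, struct.unpack(fmt, src1), struct.unpack(fmt, src2))
    return struct.pack(fmt, *lanes)


x = struct.pack("<4i", -1, 0, 7, -2**31)
y = struct.pack("<4i", 1, -5, 7, 2**31 - 1)
print(struct.unpack("<4i", pminsd(x, y)))  # (-1, -5, 7, -2147483648)
```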
Instruction Support
Form                Subset   Feature Flag
PMINSD              SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINSD 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINSD 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
PMINSD xmm1, xmm2/mem128  66 0F 38 39 /r  Compares four pairs of packed signed 32-bit values in
                                          xmm1 and xmm2 or mem128 and writes the lesser values to
                                          the corresponding positions in xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINSD xmm1, xmm2, xmm3/mem128   C4   RXB.02          X.src1.0.01  39 /r
VPMINSD ymm1, ymm2, ymm3/mem256   C4   RXB.02          X.src1.1.01  39 /r


Related Instructions
(V)PMINSB, (V)PMINSW, (V)PMINUB, (V)PMINUD, (V)PMINUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.L = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
Stack, #SS                     S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP        S     S     X    Memory address exceeding data segment limit or non-canonical.
                                           X    Null data segment used to reference memory.
Alignment check, #AC           S     S     S    Memory operand not 16-byte aligned when alignment checking
                                                enabled and MXCSR.MM = 1.
                                           A    Alignment checking enabled and: 256-bit memory operand not
                                                32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                      S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMINSW
VPMINSW

Packed Minimum Signed Words

Compares each packed 16-bit signed integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically lesser value into the corresponding
word of the destination.
The 128-bit form of the instruction compares eight pairs of 16-bit signed integer values; the 256-bit
form compares 16 pairs.
There are legacy and extended forms of the instruction:
PMINSW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMINSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
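The legacy and VEX.128 forms compute the same eight minima. They differ only in bits [255:128] of the destination. The sketch below makes that difference visible by modeling a YMM register as 32 bytes, where bytes 16 through 31 hold bits [255:128]. All function names are hypothetical.

```python
import struct

LANES = "<8h"  # eight little-endian signed 16-bit lanes = one 128-bit vector


def _min_words(a: bytes, b: bytes) -> bytes:
    """Signed minimum of each 16-bit lane of two 16-byte vectors."""
    return struct.pack(LANES, *map(min, struct.unpack(LANES, a), struct.unpack(LANES, b)))


def pminsw_legacy(ymm1: bytes, src: bytes) -> bytes:
    """PMINSW xmm1, xmm2/mem128: xmm1 is source and destination; bits [255:128] unchanged."""
    return _min_words(ymm1[:16], src[:16]) + ymm1[16:32]


def vpminsw_xmm(ymm2: bytes, src: bytes) -> bytes:
    """VPMINSW xmm1, xmm2, xmm3/mem128: new contents of ymm1, bits [255:128] cleared."""
    return _min_words(ymm2[:16], src[:16]) + bytes(16)


old = struct.pack(LANES, -4, -3, -2, -1, 0, 1, 2, 3) + b"\xaa" * 16
zeros = bytes(16)
print(struct.unpack(LANES, pminsw_legacy(old, zeros)[:16]))  # (-4, -3, -2, -1, 0, 0, 0, 0)
print(pminsw_legacy(old, zeros)[16:] == old[16:])             # True
print(vpminsw_xmm(old, zeros)[16:] == bytes(16))              # True
```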
Instruction Support
Form                Subset   Feature Flag
PMINSW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMINSW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINSW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode       Description
PMINSW xmm1, xmm2/mem128  66 0F EA /r  Compares eight pairs of packed signed 16-bit values in
                                       xmm1 and xmm2 or mem128 and writes the lesser values to
                                       the corresponding positions in xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINSW xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  EA /r
VPMINSW ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  EA /r


Related Instructions
(V)PMINSB, (V)PMINSD, (V)PMINUB, (V)PMINUD, (V)PMINUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.L = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
Stack, #SS                     S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP        S     S     X    Memory address exceeding data segment limit or non-canonical.
                                           X    Null data segment used to reference memory.
Alignment check, #AC           S     S     S    Memory operand not 16-byte aligned when alignment checking
                                                enabled and MXCSR.MM = 1.
                                           A    Alignment checking enabled and: 256-bit memory operand not
                                                32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                      S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMINUB
VPMINUB

Packed Minimum Unsigned Bytes

Compares each packed 8-bit unsigned integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically lesser value into the corresponding
byte of the destination.
The 128-bit form of the instruction compares 16 pairs of 8-bit unsigned integer values; the 256-bit
form compares 32 pairs.
There are legacy and extended forms of the instruction:
PMINUB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMINUB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
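Python's bytes objects already iterate as unsigned integers 0 to 255, so in a model the unsigned byte minimum is a one-liner.

The second helper sketches a common idiom, offered as an illustration rather than something this manual prescribes. SSE2 has no unsigned byte compare, but min(a, b) == a (PMINUB followed by PCMPEQB) yields an all-ones byte exactly where a <= b. Both function names are hypothetical.

```python
def pminub(src1: bytes, src2: bytes) -> bytes:
    """Model of (V)PMINUB on 16-byte or 32-byte operands."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or 32 bytes long")
    return bytes(map(min, src1, src2))  # iterating bytes yields unsigned ints


def unsigned_le_mask(a: bytes, b: bytes) -> bytes:
    """FFh in each lane where a <= b (unsigned), else 00h.

    Mirrors PMINUB followed by PCMPEQB against the original a.
    """
    m = pminub(a, b)
    return bytes(0xFF if x == y else 0x00 for x, y in zip(m, a))


print(pminub(bytes([0x80] + [5] * 15), bytes([0x7F] + [3] * 15))[:2].hex())  # 7f03
print(unsigned_le_mask(bytes([1, 200] + [0] * 14), bytes([2, 100] + [0] * 14))[:3].hex())  # ff00ff
```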
Instruction Support
Form                Subset   Feature Flag
PMINUB              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMINUB 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINUB 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode       Description
PMINUB xmm1, xmm2/mem128  66 0F DA /r  Compares 16 pairs of packed unsigned 8-bit values in
                                       xmm1 and xmm2 or mem128 and writes the lesser values to
                                       the corresponding positions in xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINUB xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  DA /r
VPMINUB ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  DA /r


Related Instructions
(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUD, (V)PMINUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.L = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
Stack, #SS                     S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP        S     S     X    Memory address exceeding data segment limit or non-canonical.
                                           X    Null data segment used to reference memory.
Alignment check, #AC           S     S     S    Memory operand not 16-byte aligned when alignment checking
                                                enabled and MXCSR.MM = 1.
                                           A    Alignment checking enabled and: 256-bit memory operand not
                                                32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                      S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMINUD
VPMINUD

Packed Minimum Unsigned Doublewords

Compares each packed 32-bit unsigned integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically lesser value into the corresponding
doubleword of the destination.
The 128-bit form of the instruction compares four pairs of 32-bit unsigned integer values; the 256-bit
form compares eight.
There are legacy and extended forms of the instruction:
PMINUD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMINUD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
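The following is a struct-based sketch of the unsigned doubleword case. It assumes little-endian byte-string operands, and the helper name is hypothetical. FFFFFFFFh is the largest unsigned value, so it loses every comparison here. The signed PMINSD would instead treat it as -1 and select it.

```python
import struct


def pminud(src1: bytes, src2: bytes) -> bytes:
    """Model of (V)PMINUD: unsigned minimum of each 32-bit lane."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or 32 bytes long")
    fmt = "<%dI" % (len(src1) // 4)  # "I" = unsigned 32-bit
    return struct.pack(fmt, *map(min, struct.unpack(fmt, src1), struct.unpack(fmt, src2)))


x = struct.pack("<4I", 0xFFFFFFFF, 0, 10, 0x80000000)
y = struct.pack("<4I", 1, 1, 10, 5)
print([hex(v) for v in struct.unpack("<4I", pminud(x, y))])  # ['0x1', '0x0', '0xa', '0x5']
```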
Instruction Support
Form                Subset   Feature Flag
PMINUD              SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINUD 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINUD 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
PMINUD xmm1, xmm2/mem128  66 0F 38 3B /r  Compares four pairs of packed unsigned 32-bit values
                                          in xmm1 and xmm2 or mem128 and writes the lesser
                                          values to the corresponding positions in xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINUD xmm1, xmm2, xmm3/mem128   C4   RXB.02          X.src1.0.01  3B /r
VPMINUD ymm1, ymm2, ymm3/mem256   C4   RXB.02          X.src1.1.01  3B /r


Related Instructions
(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.L = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
Stack, #SS                     S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP        S     S     X    Memory address exceeding data segment limit or non-canonical.
                                           X    Null data segment used to reference memory.
Alignment check, #AC           S     S     S    Memory operand not 16-byte aligned when alignment checking
                                                enabled and MXCSR.MM = 1.
                                           A    Alignment checking enabled and: 256-bit memory operand not
                                                32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                      S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMINUW
VPMINUW

Packed Minimum Unsigned Words

Compares each packed 16-bit unsigned integer value of the first source operand to the corresponding
value of the second source operand and writes the numerically lesser value into the corresponding
word of the destination.
The 128-bit form of the instruction compares eight pairs of 16-bit unsigned integer values; the 256-bit
form compares 16 pairs.
There are legacy and extended forms of the instruction:
PMINUW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMINUW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
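One typical use of an unsigned minimum is clamping values to an upper bound: compare against a vector that holds the bound in every lane. The Python sketch below models that pattern. pminuw and clamp_u16 are hypothetical names, and the 10-bit limit in the example is arbitrary.

```python
import struct


def pminuw(src1: bytes, src2: bytes) -> bytes:
    """Model of (V)PMINUW: unsigned minimum of each 16-bit lane."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or 32 bytes long")
    fmt = "<%dH" % (len(src1) // 2)  # "H" = unsigned 16-bit
    return struct.pack(fmt, *map(min, struct.unpack(fmt, src1), struct.unpack(fmt, src2)))


def clamp_u16(values, limit):
    """Clamp eight or sixteen unsigned 16-bit values to `limit` with one PMINUW."""
    fmt = "<%dH" % len(values)
    ceiling = struct.pack(fmt, *([limit] * len(values)))  # limit broadcast to every lane
    return list(struct.unpack(fmt, pminuw(struct.pack(fmt, *values), ceiling)))


print(clamp_u16([0, 1023, 1024, 65535, 5, 6, 7, 8], 1023))
# [0, 1023, 1023, 1023, 5, 6, 7, 8]
```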
Instruction Support
Form                Subset   Feature Flag
PMINUW              SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINUW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINUW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                  Opcode          Description
PMINUW xmm1, xmm2/mem128  66 0F 38 3A /r  Compares eight pairs of packed unsigned 16-bit values
                                          in xmm1 and xmm2 or mem128 and writes the lesser
                                          values to the corresponding positions in xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINUW xmm1, xmm2, xmm3/mem128   C4   RXB.02          X.src1.0.01  3A /r
VPMINUW ymm1, ymm2, ymm3/mem256   C4   RXB.02          X.src1.1.01  3A /r


Related Instructions
(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.L = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
Stack, #SS                     S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP        S     S     X    Memory address exceeding data segment limit or non-canonical.
                                           X    Null data segment used to reference memory.
Alignment check, #AC           S     S     S    Memory operand not 16-byte aligned when alignment checking
                                                enabled and MXCSR.MM = 1.
                                           A    Alignment checking enabled and: 256-bit memory operand not
                                                32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                      S     X    Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMOVMSKB
VPMOVMSKB

Packed Move Mask Byte

Copies the most-significant bit of each byte element of the source operand to create a 16-bit (XMM
source) or 32-bit (YMM source) mask value, zero-extends the mask, and writes it to the destination.
There are legacy and extended forms of the instruction:
PMOVMSKB

The source operand is an XMM register. The destination is a 32-bit general-purpose register. The
mask occupies bits [15:0] and is zero-extended to fill the destination register.
VPMOVMSKB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is an XMM register. The destination is a 64-bit general-purpose register. The
mask occupies bits [15:0] and is zero-extended to fill the destination register.
YMM Encoding

The source operand is a YMM register. The destination is a 64-bit general-purpose register. The mask
occupies bits [31:0] and is zero-extended to fill the destination register.
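Below is a Python model of the mask extraction, together with a sketch of a common use. The use case locates the first zero byte in a 16-byte block in three steps:

1. Compare the block against zero. PCMPEQB would produce FFh in each matching byte.
2. Collect the mask.
3. Find the lowest set bit.

This is the core step of many vectorized string-length routines. Both functions are hypothetical illustrations, not library APIs.

```python
def pmovmskb(src: bytes) -> int:
    """Model of (V)PMOVMSKB: bit i of the result is bit 7 of source byte i."""
    if len(src) not in (16, 32):
        raise ValueError("source must be 16 (XMM) or 32 (YMM) bytes long")
    mask = 0
    for i, byte in enumerate(src):
        mask |= (byte >> 7) << i
    return mask  # all higher bits are zero, matching the zero-extension into the GPR


def first_zero_byte(block: bytes) -> int:
    """Index of the first 00h byte in a 16-byte block, or -1 if there is none."""
    eq = bytes(0xFF if b == 0 else 0x00 for b in block)  # what PCMPEQB against zero yields
    mask = pmovmskb(eq)
    return (mask & -mask).bit_length() - 1 if mask else -1  # index of the lowest set bit


print(hex(pmovmskb(bytes([0x80] + [0] * 14 + [0xFF]))))  # 0x8001
print(first_zero_byte(b"hello\x00world!!!!!"))           # 5
```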
Instruction Support
Form                Subset   Feature Flag
PMOVMSKB            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMOVMSKB 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVMSKB 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic              Opcode       Description
PMOVMSKB reg32, xmm1  66 0F D7 /r  Moves a zero-extended mask consisting of the most-significant
                                   bit of each byte in xmm1 to a 32-bit general-purpose register.

                              Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVMSKB reg64, xmm1         C4   RXB.01          X.1111.0.01  D7 /r
VPMOVMSKB reg64, ymm1         C4   RXB.01          X.1111.1.01  D7 /r

Related Instructions
(V)MOVMSKPD, (V)MOVMSKPS


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.vvvv field != 1111b.
                                           A    VEX.L field = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               X     X     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMOVSXBD
VPMOVSXBD

Packed Move with Sign-Extension Byte to Doubleword

Sign-extends four or eight packed 8-bit signed integers in the source operand to 32 bits and writes the
packed doubleword signed integers to the destination.
If the source operand is a register, the 8-bit signed integers are taken from the least-significant bytes
of the register.
There are legacy and extended forms of the instruction:
PMOVSXBD

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVSXBD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is a
YMM register.
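Here is a minimal sketch of the widening, assuming byte-string operands in little-endian memory order. pmovsxbd is a hypothetical name. The lanes argument stands in for the choice between the XMM and YMM destination forms. As described above, only the low bytes of the source are read.

```python
import struct


def pmovsxbd(src: bytes, lanes: int) -> bytes:
    """Model of (V)PMOVSXBD.

    lanes = 4 for the XMM-destination forms, 8 for the YMM form. `src` is either
    the memory operand or the source register image; only its low `lanes` bytes
    are read.
    """
    if lanes not in (4, 8) or len(src) < lanes:
        raise ValueError("need lanes in (4, 8) and at least that many source bytes")
    values = struct.unpack("<%db" % lanes, src[:lanes])  # "b" = signed 8-bit
    return struct.pack("<%di" % lanes, *values)        # "i" = signed 32-bit


reg = bytes([0xFF, 0x7F, 0x80, 0x01]) + b"\x99" * 12  # upper 12 bytes are ignored
print(struct.unpack("<4i", pmovsxbd(reg, 4)))          # (-1, 127, -128, 1)
```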
Instruction Support
Form                Subset   Feature Flag
PMOVSXBD            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXBD 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXBD 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode          Description
PMOVSXBD xmm1, xmm2/mem32  66 0F 38 21 /r  Sign-extends four packed signed 8-bit integers in the
                                           four low bytes of xmm2 or mem32 and writes four packed
                                           signed 32-bit integers to xmm1.

                              Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXBD xmm1, xmm2/mem32    C4   RXB.02          X.1111.0.01  21 /r
VPMOVSXBD ymm1, xmm2/mem64    C4   RXB.02          X.1111.1.01  21 /r


Related Instructions
(V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWD, (V)PMOVSXW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.vvvv != 1111b.
                                           A    VEX.L = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
Stack, #SS                     S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP        S     S     X    Memory address exceeding data segment limit or non-canonical.
                                           X    Null data segment used to reference memory.
Page fault, #PF                      S     X    Instruction execution caused a page fault.
Alignment check, #AC                 S     X    Unaligned memory reference when alignment checking enabled.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMOVSXBQ
VPMOVSXBQ

Packed Move with Sign Extension Byte to Quadword

Sign-extends two or four packed 8-bit signed integers in the source operand to 64 bits and writes the
packed quadword signed integers to the destination.
If the source operand is a register, the 8-bit signed integers are taken from the least-significant bytes
of the register.
There are legacy and extended forms of the instruction:
PMOVSXBQ

The source operand is either an XMM register or a 16-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVSXBQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 16-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 32-bit memory location. The destination is a
YMM register.
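Sign extension itself can be written with the well-known identity (v XOR m) - m, where m is the weight of the field's sign bit. The sketch below uses this identity to model byte-to-quadword widening. Both function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide two's-complement field using (v ^ m) - m."""
    m = 1 << (bits - 1)  # weight of the field's sign bit
    return (value ^ m) - m


def pmovsxbq(src: bytes, lanes: int) -> bytes:
    """Model of (V)PMOVSXBQ: lanes = 2 (XMM destination) or 4 (YMM destination)."""
    if lanes not in (2, 4) or len(src) < lanes:
        raise ValueError("need lanes in (2, 4) and at least that many source bytes")
    return b"".join(sign_extend(byte, 8).to_bytes(8, "little", signed=True)
                    for byte in src[:lanes])


print(pmovsxbq(bytes([0xFE, 0x05]), 2).hex(" "))
# fe ff ff ff ff ff ff ff 05 00 00 00 00 00 00 00
```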
Instruction Support
Form                Subset   Feature Flag
PMOVSXBQ            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXBQ 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXBQ 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode          Description
PMOVSXBQ xmm1, xmm2/mem16  66 0F 38 22 /r  Sign-extends two packed signed 8-bit integers in the
                                           two low bytes of xmm2 or mem16 and writes two packed
                                           signed 64-bit integers to xmm1.

                              Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXBQ xmm1, xmm2/mem16    C4   RXB.02          X.1111.0.01  22 /r
VPMOVSXBQ ymm1, xmm2/mem32    C4   RXB.02          X.1111.1.01  22 /r


Related Instructions
(V)PMOVSXBD, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWD, (V)PMOVSXW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.vvvv != 1111b.
                                           A    VEX.L = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
Stack, #SS                     S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP        S     S     X    Memory address exceeding data segment limit or non-canonical.
                                           X    Null data segment used to reference memory.
Page fault, #PF                      S     X    Instruction execution caused a page fault.
Alignment check, #AC                 S     X    Unaligned memory reference when alignment checking enabled.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMOVSXBW
VPMOVSXBW

Packed Move with Sign Extension Byte to Word

Sign-extends eight or sixteen packed 8-bit signed integers in the source operand to 16 bits and writes
the packed word signed integers to the destination.
If the source operand is a register, the 128-bit forms take the eight 8-bit signed integers from the
lower half of the register; the 256-bit form uses all sixteen bytes of the XMM source register.
There are legacy and extended forms of the instruction:
PMOVSXBW
The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVSXBW
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 128-bit memory location. The destination is a
YMM register.
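The 256-bit form is the one case in which every byte of the XMM source register is consumed. The Python sketch below models both forms and shows the 16-lane case. The name is hypothetical, and little-endian byte strings are assumed.

```python
import struct


def pmovsxbw(src: bytes, lanes: int) -> bytes:
    """Model of (V)PMOVSXBW.

    lanes = 8: XMM destination, reads the low 8 source bytes.
    lanes = 16: YMM destination, reads all 16 bytes of the XMM source.
    """
    if lanes not in (8, 16) or len(src) < lanes:
        raise ValueError("need lanes in (8, 16) and at least that many source bytes")
    return struct.pack("<%dh" % lanes, *struct.unpack("<%db" % lanes, src[:lanes]))


xmm = bytes(range(0xF8, 0x100)) + bytes(range(8))  # -8..-1 followed by 0..7
print(struct.unpack("<16h", pmovsxbw(xmm, 16)))
# (-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7)
```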
Instruction Support
Form                Subset   Feature Flag
PMOVSXBW            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXBW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXBW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode          Description
PMOVSXBW xmm1, xmm2/mem64  66 0F 38 20 /r  Sign-extends eight packed signed 8-bit integers in the
                                           eight low bytes of xmm2 or mem64 and writes eight packed
                                           signed 16-bit integers to xmm1.

                              Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXBW xmm1, xmm2/mem64    C4   RXB.02          X.1111.0.01  20 /r
VPMOVSXBW ymm1, xmm2/mem128   C4   RXB.02          X.1111.1.01  20 /r


Related Instructions
(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXDQ, (V)PMOVSXWD, (V)PMOVSXW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                                    Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD            X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                               A     A          AVX instructions are only recognized in protected mode.
                               S     S     S    CR0.EM = 1.
                               S     S     S    CR4.OSFXSR = 0.
                                           A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                           A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                           A    VEX.vvvv != 1111b.
                                           A    VEX.L = 1 when AVX2 not supported.
                                           A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                               S     S     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM      S     S     X    CR0.TS = 1.
Stack, #SS                     S     S     X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP        S     S     X    Memory address exceeding data segment limit or non-canonical.
                                           X    Null data segment used to reference memory.
Page fault, #PF                      S     X    Instruction execution caused a page fault.
Alignment check, #AC                 S     X    Unaligned memory reference when alignment checking enabled.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PMOVSXDQ
VPMOVSXDQ

Packed Move with Sign-Extension
Doubleword to Quadword

Sign-extends two or four packed 32-bit signed integers in the source operand to 64 bits and writes the
packed quadword signed integers to the destination.
If the source operand is a register, the 128-bit forms take the two 32-bit signed integers from the lower half of the
register; the 256-bit form takes all four doublewords of the source XMM register.
There are legacy and extended forms of the instruction:
PMOVSXDQ

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVSXDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 128-bit memory location. The destination is a
YMM register.
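The legacy and VEX.128 forms produce the same low 128 bits. They differ only in how they treat bits [255:128]. The Python sketch below uses hypothetical function names and works on a 32-byte little-endian YMM image. It models legacy PMOVSXDQ, which preserves the upper half, and then derives the VEX.128 behavior, which clears the upper half.

```python
# Hypothetical behavioral models of PMOVSXDQ (legacy) and VPMOVSXDQ (VEX.128).
def pmovsxdq_legacy(dest: bytes, src: bytes) -> bytes:
    """Model legacy PMOVSXDQ on a 32-byte YMM image; bits [255:128] are preserved."""
    out = bytearray()
    for i in range(2):  # two doublewords from the low 64 bits of src
        value = int.from_bytes(src[4 * i:4 * i + 4], "little", signed=True)
        out += value.to_bytes(8, "little", signed=True)  # sign-extend to 64 bits
    return bytes(out) + dest[16:32]


def vpmovsxdq_128(src: bytes) -> bytes:
    """Model the VEX.128 form: same low 128 bits, but bits [255:128] are cleared."""
    return pmovsxdq_legacy(bytes(32), src)
```

For example, given a destination image filled with 0xAA bytes, `pmovsxdq_legacy` leaves the upper 16 bytes as 0xAA. `vpmovsxdq_128` returns zeros there instead.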
Instruction Support
Form                 Subset   Feature Flag
PMOVSXDQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode           Description
PMOVSXDQ xmm1, xmm2/mem64      66 0F 38 25 /r   Sign-extends two packed signed 32-bit integers
                                                in the two low doublewords of xmm2 or mem64 and
                                                writes two packed signed 64-bit integers to xmm1.

                               Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVSXDQ xmm1, xmm2/mem64     C4    RXB.02           X.1111.0.01   25 /r
VPMOVSXDQ ymm1, xmm2/mem128    C4    RXB.02           X.1111.1.01   25 /r


Related Instructions
(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXWD, (V)PMOVSXWQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X = SSE, AVX, and AVX2 exception
A = AVX, AVX2 exception
S = SSE exception


PMOVSXWD
VPMOVSXWD

Packed Move with Sign-Extension
Word to Doubleword

Sign-extends four or eight packed 16-bit signed integers in the source operand to 32 bits and writes
the packed doubleword signed integers to the destination.
If the source operand is a register, the 128-bit forms take the four 16-bit signed integers from the lower half of the
register; the 256-bit form takes all eight words of the source XMM register.
There are legacy and extended forms of the instruction:
PMOVSXWD

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVSXWD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 128-bit memory location. The destination is a
YMM register.
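The word-to-doubleword widening can be expressed compactly with Python's `struct` module. The format `h` reads signed 16-bit lanes and `i` writes signed 32-bit lanes. The function name `vpmovsxwd` is hypothetical. As in the other sketches, the result is a 32-byte little-endian YMM image, and memory operands are assumed to be preloaded into `src`.

```python
import struct


# Hypothetical behavioral model of VPMOVSXWD (not AMD reference code).
def vpmovsxwd(src: bytes, vex_l: int) -> bytes:
    """Return the 32-byte image of the destination YMM register."""
    count = 8 if vex_l else 4                      # 8 words (128 bits) or 4 words (64 bits)
    words = struct.unpack_from(f"<{count}h", src)  # read signed 16-bit lanes
    result = struct.pack(f"<{count}i", *words)     # write them as signed 32-bit lanes
    return result.ljust(32, b"\x00")               # XMM form: bits [255:128] cleared


src = struct.pack("<4h", -1, 2, -32768, 32767)
print(struct.unpack("<4i", vpmovsxwd(src, 0)[:16]))   # prints (-1, 2, -32768, 32767)
```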
Instruction Support
Form                 Subset   Feature Flag
PMOVSXWD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXWD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXWD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode           Description
PMOVSXWD xmm1, xmm2/mem64      66 0F 38 23 /r   Sign-extends four packed signed 16-bit integers
                                                in the four low words of xmm2 or mem64 and
                                                writes four packed signed 32-bit integers to xmm1.

                               Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVSXWD xmm1, xmm2/mem64     C4    RXB.02           X.1111.0.01   23 /r
VPMOVSXWD ymm1, xmm2/mem128    C4    RXB.02           X.1111.1.01   23 /r


Related Instructions
(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X = SSE, AVX, and AVX2 exception
A = AVX, AVX2 exception
S = SSE exception


PMOVSXWQ
VPMOVSXWQ

Packed Move with Sign-Extension
Word to Quadword

Sign-extends two or four packed 16-bit signed integers in the source operand to 64 bits and writes the packed quadword
signed integers to the destination.
If the source operand is a register, the 16-bit signed integers are taken from the least-significant words of
the register.
There are legacy and extended forms of the instruction:
PMOVSXWQ

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVSXWQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is a
YMM register.
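Widening words to quadwords consumes only a quarter as many source bits as the destination holds. This is why the memory operands above are only 32 or 64 bits wide. The hypothetical Python model below reflects those sizes. It reads two or four signed words and writes signed quadwords into a 32-byte little-endian YMM image.

```python
import struct


# Hypothetical behavioral model of VPMOVSXWQ (not AMD reference code).
def vpmovsxwq(src: bytes, vex_l: int) -> bytes:
    """Return the 32-byte image of the destination YMM register."""
    count = 4 if vex_l else 2                      # mem64 or mem32 worth of words
    words = struct.unpack_from(f"<{count}h", src)  # signed 16-bit source lanes
    return struct.pack(f"<{count}q", *words).ljust(32, b"\x00")  # signed 64-bit lanes


print(struct.unpack("<2q", vpmovsxwq(struct.pack("<2h", -7, 300), 0)[:16]))  # prints (-7, 300)
```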
Instruction Support
Form                 Subset   Feature Flag
PMOVSXWQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXWQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXWQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode           Description
PMOVSXWQ xmm1, xmm2/mem32      66 0F 38 24 /r   Sign-extends two packed signed 16-bit integers
                                                in the two low words of xmm2 or mem32 and
                                                writes two packed signed 64-bit integers to xmm1.

                               Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVSXWQ xmm1, xmm2/mem32     C4    RXB.02           X.1111.0.01   24 /r
VPMOVSXWQ ymm1, xmm2/mem64     C4    RXB.02           X.1111.1.01   24 /r


Related Instructions
(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X = SSE, AVX, and AVX2 exception
A = AVX, AVX2 exception
S = SSE exception


PMOVZXBD
VPMOVZXBD


Packed Move with Zero-Extension
Byte to Doubleword

Zero-extends four or eight packed 8-bit unsigned integers in the source operand to 32 bits and writes
the packed doubleword results, which are always non-negative, to the destination.
If the source operand is a register, the 8-bit unsigned integers are taken from the least-significant bytes
of the register.
There are legacy and extended forms of the instruction:
PMOVZXBD

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVZXBD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is a
YMM register.
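Zero-extension differs from sign-extension only in how the top bit of each source element is treated. The hypothetical Python model of VPMOVZXBD below follows the same little-endian register-image convention as the other sketches. It shows that a source byte of 0xFF becomes 255 rather than -1.

```python
import struct


# Hypothetical behavioral model of VPMOVZXBD (not AMD reference code).
def vpmovzxbd(src: bytes, vex_l: int) -> bytes:
    """Return the 32-byte image of the destination YMM register."""
    count = 8 if vex_l else 4                       # mem64 or mem32 worth of bytes
    values = struct.unpack_from(f"<{count}B", src)  # unsigned 8-bit source lanes
    return struct.pack(f"<{count}I", *values).ljust(32, b"\x00")  # unsigned 32-bit lanes


print(struct.unpack("<4I", vpmovzxbd(bytes([0xFF, 0x80, 0x7F, 0x00]), 0)[:16]))  # prints (255, 128, 127, 0)
```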
Instruction Support
Form                 Subset   Feature Flag
PMOVZXBD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXBD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXBD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode           Description
PMOVZXBD xmm1, xmm2/mem32      66 0F 38 31 /r   Zero-extends four packed unsigned 8-bit integers
                                                in the four low bytes of xmm2 or mem32 and writes
                                                four packed non-negative 32-bit integers to xmm1.

                               Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXBD xmm1, xmm2/mem32     C4    RXB.02           X.1111.0.01   31 /r
VPMOVZXBD ymm1, xmm2/mem64     C4    RXB.02           X.1111.1.01   31 /r


Related Instructions
(V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWD, (V)PMOVZXWQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X = SSE, AVX, and AVX2 exception
A = AVX, AVX2 exception
S = SSE exception


PMOVZXBQ
VPMOVZXBQ

Packed Move Byte to Quadword
with Zero-Extension

Zero-extends two or four packed 8-bit unsigned integers in the source operand to 64 bits and writes
the packed quadword results, which are always non-negative, to the destination.
If the source operand is a register, the 8-bit unsigned integers are taken from the least-significant bytes
of the register.
There are legacy and extended forms of the instruction:
PMOVZXBQ

The source operand is either an XMM register or a 16-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVZXBQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 16-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 32-bit memory location. The destination is a
YMM register.
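In the byte-to-quadword case, each source byte lands in the low byte of a quadword, and the other seven bytes of that quadword are zero. The hypothetical Python model below makes this visible. As in the other sketches, it assumes a 32-byte little-endian YMM image and a preloaded source operand.

```python
import struct


# Hypothetical behavioral model of VPMOVZXBQ (not AMD reference code).
def vpmovzxbq(src: bytes, vex_l: int) -> bytes:
    """Return the 32-byte image of the destination YMM register."""
    count = 4 if vex_l else 2                       # mem32 or mem16 worth of bytes
    values = struct.unpack_from(f"<{count}B", src)  # unsigned 8-bit source lanes
    return struct.pack(f"<{count}Q", *values).ljust(32, b"\x00")  # unsigned 64-bit lanes


print(vpmovzxbq(bytes([0xAB, 0xCD]), 0)[:16].hex())   # prints ab00000000000000cd00000000000000
```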
Instruction Support
Form                 Subset   Feature Flag
PMOVZXBQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXBQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXBQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode           Description
PMOVZXBQ xmm1, xmm2/mem16      66 0F 38 32 /r   Zero-extends two packed unsigned 8-bit integers
                                                in the two low bytes of xmm2 or mem16 and writes
                                                two packed non-negative 64-bit integers to xmm1.

                               Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXBQ xmm1, xmm2/mem16     C4    RXB.02           X.1111.0.01   32 /r
VPMOVZXBQ ymm1, xmm2/mem32     C4    RXB.02           X.1111.1.01   32 /r


Related Instructions
(V)PMOVZXBD, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWD, (V)PMOVZXWQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X = SSE, AVX, and AVX2 exception
A = AVX, AVX2 exception
S = SSE exception


PMOVZXBW
VPMOVZXBW


Packed Move Byte to Word with Zero-Extension

Zero-extends eight or sixteen packed 8-bit unsigned integers in the source operand to 16 bits and
writes the packed word results, which are always non-negative, to the destination.
If the source operand is a register, the 128-bit forms take the eight 8-bit unsigned integers from the lower half of the
register; the 256-bit form takes all sixteen bytes of the source XMM register.
There are legacy and extended forms of the instruction:
PMOVZXBW

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVZXBW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 128-bit memory location. The destination is a
YMM register.
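The contrast between (V)PMOVZXBW and (V)PMOVSXBW is easiest to see when the same bytes are fed to both. The hypothetical Python helper below widens `count` bytes to 16-bit words, either signed or unsigned. It returns the raw 16-bit bit patterns, so the two behaviors can be compared directly. Pass a count of 8 for the XMM form (mem64) or 16 for the YMM form (mem128).

```python
import struct


# Hypothetical helper contrasting zero-extension and sign-extension of bytes to words.
def extend_bytes_to_words(src: bytes, count: int, signed: bool) -> tuple:
    """Widen `count` bytes to 16-bit words, as (V)PMOVSXBW or (V)PMOVZXBW would."""
    byte_fmt, word_fmt = ("b", "h") if signed else ("B", "H")
    values = struct.unpack_from(f"<{count}{byte_fmt}", src)
    words = struct.pack(f"<{count}{word_fmt}", *values)
    return struct.unpack(f"<{count}H", words)      # raw 16-bit bit patterns


raw = bytes([0x80, 0xFF, 0x01, 0x7F, 0, 0, 0, 0])
print([hex(w) for w in extend_bytes_to_words(raw, 8, signed=False)][:4])  # ['0x80', '0xff', '0x1', '0x7f']
print([hex(w) for w in extend_bytes_to_words(raw, 8, signed=True)][:4])   # ['0xff80', '0xffff', '0x1', '0x7f']
```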
Instruction Support
Form                 Subset   Feature Flag
PMOVZXBW             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXBW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXBW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode           Description
PMOVZXBW xmm1, xmm2/mem64      66 0F 38 30 /r   Zero-extends eight packed unsigned 8-bit integers
                                                in the eight low bytes of xmm2 or mem64 and writes
                                                eight packed non-negative 16-bit integers to xmm1.

                               Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXBW xmm1, xmm2/mem64     C4    RXB.02           X.1111.0.01   30 /r
VPMOVZXBW ymm1, xmm2/mem128    C4    RXB.02           X.1111.1.01   30 /r


Related Instructions
(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXDQ, (V)PMOVZXWD, (V)PMOVZXWQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X = SSE, AVX, and AVX2 exception
A = AVX, AVX2 exception
S = SSE exception


PMOVZXDQ
VPMOVZXDQ

Packed Move with Zero-Extension
Doubleword to Quadword

Zero-extends two or four packed 32-bit unsigned integers in the source operand to 64 bits and writes
the packed quadword results, which are always non-negative, to the destination.
If the source operand is a register, the 128-bit forms take the two 32-bit unsigned integers from the lower half of the
register; the 256-bit form takes all four doublewords of the source XMM register.
There are legacy and extended forms of the instruction:
PMOVZXDQ

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVZXDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 128-bit memory location. The destination is a
YMM register.
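The same operation can also be written with shifts and masks when a register is treated as one wide integer, where bit 0 is the least-significant bit. The hypothetical Python sketch below models VPMOVZXDQ this way. It extracts each source doubleword and places it, zero-extended, in the corresponding destination quadword.

```python
# Hypothetical behavioral model of VPMOVZXDQ on integer register values (not AMD reference code).
def vpmovzxdq(src: int, vex_l: int) -> int:
    """Return the destination YMM value; src holds the XMM register or loaded memory operand."""
    count = 4 if vex_l else 2
    result = 0
    for i in range(count):
        dword = (src >> (32 * i)) & 0xFFFF_FFFF     # extract source doubleword i
        result |= dword << (64 * i)                 # place it, zero-extended, in quadword i
    return result  # bits above 64 * count stay zero, matching the cleared upper bits


print(hex(vpmovzxdq(0xFFFFFFFF_00000001, 0)))   # prints 0xffffffff0000000000000001
```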
Instruction Support
Form                 Subset   Feature Flag
PMOVZXDQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode           Description
PMOVZXDQ xmm1, xmm2/mem64      66 0F 38 35 /r   Zero-extends two packed unsigned 32-bit integers
                                                in the two low doublewords of xmm2 or mem64 and
                                                writes two packed non-negative 64-bit integers to xmm1.

                               Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXDQ xmm1, xmm2/mem64     C4    RXB.02           X.1111.0.01   35 /r
VPMOVZXDQ ymm1, xmm2/mem128    C4    RXB.02           X.1111.1.01   35 /r


Related Instructions
(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXWD, (V)PMOVZXWQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X = SSE, AVX, and AVX2 exception
A = AVX, AVX2 exception
S = SSE exception


PMOVZXWD
VPMOVZXWD

Packed Move Word to Doubleword
with Zero-Extension

Zero-extends four or eight packed 16-bit unsigned integers in the source operand to 32 bits and writes
the packed doubleword results, which are always non-negative, to the destination.
If the source operand is a register, the 128-bit forms take the four 16-bit unsigned integers from the lower half of the
register; the 256-bit form takes all eight words of the source XMM register.
There are legacy and extended forms of the instruction:
PMOVZXWD

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVZXWD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 128-bit memory location. The destination is a
YMM register.
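Another way to view word-to-doubleword zero-extension is as interleaving each source word with a zero word. This is the effect of unpacking the low words of the source against an all-zero register. The hypothetical Python sketch below models VPMOVZXWD directly and checks it against that interleaving view. Both functions use the 32-byte little-endian register-image convention from the other sketches.

```python
import struct


# Hypothetical behavioral model of VPMOVZXWD (not AMD reference code).
def vpmovzxwd(src: bytes, vex_l: int) -> bytes:
    """Return the 32-byte image of the destination YMM register."""
    count = 8 if vex_l else 4
    words = struct.unpack_from(f"<{count}H", src)  # unsigned 16-bit source lanes
    return struct.pack(f"<{count}I", *words).ljust(32, b"\x00")


def interleave_with_zero_words(src: bytes, count: int) -> bytes:
    """Equivalent view: each 2-byte source word followed by two zero bytes."""
    return b"".join(src[2 * i:2 * i + 2] + b"\x00\x00" for i in range(count)).ljust(32, b"\x00")


src = bytes(range(0xF0, 0x100))
print(vpmovzxwd(src, 1) == interleave_with_zero_words(src, 8))   # prints True
```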
Instruction Support
Form                 Subset   Feature Flag
PMOVZXWD             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXWD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXWD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode           Description
PMOVZXWD xmm1, xmm2/mem64      66 0F 38 33 /r   Zero-extends four packed unsigned 16-bit integers
                                                in the four low words of xmm2 or mem64 and writes
                                                four packed non-negative 32-bit integers to xmm1.

                               Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXWD xmm1, xmm2/mem64     C4    RXB.02           X.1111.0.01   33 /r
VPMOVZXWD ymm1, xmm2/mem128    C4    RXB.02           X.1111.1.01   33 /r


Related Instructions
(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X = SSE, AVX, and AVX2 exception
A = AVX, AVX2 exception
S = SSE exception


PMOVZXWQ
VPMOVZXWQ

Packed Move with Zero-Extension
Word to Quadword

Zero-extends two or four packed 16-bit unsigned integers in the source operand to 64 bits and writes the packed quadword
results, which are always non-negative, to the destination.
If the source operand is a register, the 16-bit unsigned integers are taken from the least-significant words of
the register.
There are legacy and extended forms of the instruction:
PMOVZXWQ

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPMOVZXWQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is a
YMM register.
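The word-to-quadword zero-extension consumes only 32 or 64 source bits, so a word such as 0xFFFF becomes the quadword 0x000000000000FFFF. The hypothetical Python model below follows the same 32-byte little-endian YMM-image convention as the other sketches.

```python
import struct


# Hypothetical behavioral model of VPMOVZXWQ (not AMD reference code).
def vpmovzxwq(src: bytes, vex_l: int) -> bytes:
    """Return the 32-byte image of the destination YMM register."""
    count = 4 if vex_l else 2                      # mem64 or mem32 worth of words
    words = struct.unpack_from(f"<{count}H", src)  # unsigned 16-bit source lanes
    return struct.pack(f"<{count}Q", *words).ljust(32, b"\x00")  # unsigned 64-bit lanes


print(struct.unpack("<2Q", vpmovzxwq(struct.pack("<2H", 0xFFFF, 2), 0)[:16]))   # prints (65535, 2)
```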
Instruction Support
Form                 Subset   Feature Flag
PMOVZXWQ             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXWQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXWQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Opcode           Description
PMOVZXWQ xmm1, xmm2/mem32      66 0F 38 34 /r   Zero-extends two packed unsigned 16-bit integers
                                                in the two low words of xmm2 or mem32 and writes
                                                two packed non-negative 64-bit integers to xmm1.

                               Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMOVZXWQ xmm1, xmm2/mem32     C4    RXB.02           X.1111.0.01   34 /r
VPMOVZXWQ ymm1, xmm2/mem64     C4    RXB.02           X.1111.1.01   34 /r


Related Instructions
(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     VEX.L = 1 when AVX2 not supported.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.
X = SSE, AVX, and AVX2 exception
A = AVX, AVX2 exception
S = SSE exception


PMULDQ
VPMULDQ

Packed Multiply
Signed Doubleword to Quadword

Multiplies two or four pairs of 32-bit signed integers in the first and second source operands and
writes two or four packed quadword signed integer products to the destination.
For the 128-bit form of the instruction, the following operations are performed:
dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand. src2 is the second source operand.
dest[63:0] = (src1[31:0] * src2[31:0])
dest[127:64] = (src1[95:64] * src2[95:64])

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[191:128] = (src1[159:128] * src2[159:128])
dest[255:192] = (src1[223:192] * src2[223:192])
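The dest formulas above use only the even-indexed doublewords of each source: bits [31:0], [95:64], and, for the 256-bit form, [159:128] and [223:192]. The odd-indexed doublewords are ignored. The hypothetical Python model below makes that explicit. It uses the same 32-byte little-endian register-image convention as the other sketches, and it clears bits [255:128] for the 128-bit VEX form.

```python
import struct


# Hypothetical behavioral model of VPMULDQ (not AMD reference code).
def vpmuldq(src1: bytes, src2: bytes, vex_l: int) -> bytes:
    """Return the 32-byte image of the destination YMM register."""
    lanes = 4 if vex_l else 2                          # number of 64-bit products
    a = struct.unpack_from(f"<{2 * lanes}i", src1)     # signed 32-bit elements
    b = struct.unpack_from(f"<{2 * lanes}i", src2)
    # Only the even-indexed doublewords are multiplied; the odd ones are ignored.
    products = [a[2 * k] * b[2 * k] for k in range(lanes)]
    return struct.pack(f"<{lanes}q", *products).ljust(32, b"\x00")


src1 = struct.pack("<4i", -3, 99, 2**31 - 1, 99)
src2 = struct.pack("<4i", 5, 99, -2**31, 99)
print(struct.unpack("<2q", vpmuldq(src1, src2, 0)[:16]))   # prints (-15, -4611686016279904256)
```

A product of two signed 32-bit values always fits in a signed 64-bit result, so no saturation or wraparound occurs.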

There are legacy and extended forms of the instruction:
PMULDQ

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMULDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                 Subset   Feature Flag
PMULDQ               SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMULDQ 128-bit      AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULDQ 256-bit      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                           Opcode           Description
PMULDQ xmm1, xmm2/mem128           66 0F 38 28 /r   Multiplies two packed 32-bit signed integers in
                                                    xmm1[31:0] and xmm1[95:64] by the corresponding
                                                    values in xmm2 or mem128. Writes packed 64-bit
                                                    signed integer products to xmm1[63:0] and
                                                    xmm1[127:64].

                                   Encoding
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULDQ xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   28 /r
VPMULDQ ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   28 /r

Related Instructions
(V)PMULLD, (V)PMULHW, (V)PMULHUW, (V)PMULUDQ, (V)PMULLW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PMULHRSW
VPMULHRSW


Packed Multiply High with Round and Scale
Words

Multiplies each packed 16-bit signed value in the first source operand by the corresponding value in
the second source operand, truncates each 32-bit product to its 18 most significant bits by shifting it
right 14 bits, then rounds the truncated value by adding 1 to its least-significant bit. Writes bits [16:1]
of each sum to the corresponding word of the destination.
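A minimal Python sketch of the (V)PMULHRSW arithmetic follows. It assumes words are passed as a list of integers (element 0 = bits [15:0]), and `to_signed16` and `pmulhrsw` are hypothetical names. Shifting right 14, adding 1, and keeping bits [16:1] amounts to rounding a*b/2^15 to the nearest integer with ties rounded up, which is why the instruction suits Q15 fixed-point multiplication.

```python
def to_signed16(x: int) -> int:
    """Reinterpret the low 16 bits of x as a two's-complement signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pmulhrsw(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical reference model of (V)PMULHRSW on signed words.

    src1 and src2 hold 8 words (128-bit form) or 16 words (256-bit form).
    """
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of words")
    result = []
    for a, b in zip(src1, src2):
        product = to_signed16(a) * to_signed16(b)  # exact signed 32-bit product
        rounded = (product >> 14) + 1              # keep the 18 MSBs, add 1 to their LSB
        result.append(to_signed16(rounded >> 1))   # bits [16:1] of the sum
    return result
```

For instance, `pmulhrsw([0x4000, 3], [0x4000, 0x2000])` returns `[0x2000, 1]`: 0.5 times 0.5 gives 0.25 in Q15, and 3 * 0x2000 / 2^15 = 0.75 rounds up to 1. The one overflow case, -32768 times -32768, wraps to -32768.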
There are legacy and extended forms of the instruction:
PMULHRSW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMULHRSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PMULHRSW              SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPMULHRSW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULHRSW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode           Description
PMULHRSW xmm1, xmm2/mem128          66 0F 38 0B /r   Multiplies each packed 16-bit signed value in xmm1 by the corresponding value in xmm2 or mem128, truncates each product to 18 bits, and rounds by adding 1. Writes bits [16:1] of each sum to xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULHRSW xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   0B /r
VPMULHRSW ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   0B /r


Related Instructions
None
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PMULHUW
VPMULHUW

Packed Multiply High
Unsigned Word

Multiplies each packed 16-bit unsigned value in the first source operand by the corresponding value
in the second source operand; writes the high-order 16 bits of each 32-bit product to the corresponding word of the destination.
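The following Python sketch models (V)PMULHUW, assuming each operand is a list of 16-bit word values. `pmulhuw` is a hypothetical name, not an AMD interface. Because each result is (a*b) >> 16, the instruction can scale a value by a 16-bit fraction.

```python
def pmulhuw(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical reference model of (V)PMULHUW (not an AMD-supplied API).

    src1 and src2 hold 8 words (128-bit form) or 16 words (256-bit form);
    each result word is bits [31:16] of the unsigned 32-bit product.
    """
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of words")
    # Masking treats every input as an unsigned 16-bit value.
    return [((a & 0xFFFF) * (b & 0xFFFF)) >> 16 for a, b in zip(src1, src2)]
```

For example, `pmulhuw([1000], [0x8000])` returns `[500]`: multiplying by 0x8000 and keeping the high word halves the value.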
There are legacy and extended forms of the instruction:
PMULHUW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMULHUW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PMULHUW               SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULHUW 128-bit      AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULHUW 256-bit      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode           Description
PMULHUW xmm1, xmm2/mem128           66 0F E4 /r      Multiplies packed 16-bit unsigned values in xmm1 by the corresponding values in xmm2 or mem128. Writes bits [31:16] of each product to xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULHUW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   E4 /r
VPMULHUW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   E4 /r


Related Instructions
(V)PMULDQ, (V)PMULHW, (V)PMULLD, (V)PMULLW, (V)PMULUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PMULHW
VPMULHW

Packed Multiply High
Signed Word

Multiplies each packed 16-bit signed value in the first source operand by the corresponding value in
the second source operand; writes the high-order 16 bits of each 32-bit product to the corresponding
word of the destination.
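The signed case can be sketched the same way. This hypothetical `pmulhw` model reinterprets each word as two's complement before multiplying, which is the only difference from the unsigned (V)PMULHUW case. Operands are assumed to be lists of word values.

```python
def to_signed16(x: int) -> int:
    """Reinterpret the low 16 bits of x as a two's-complement signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pmulhw(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical reference model of (V)PMULHW; returns signed result words."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of words")
    # Python's >> on a negative int is an arithmetic shift, so product >> 16
    # yields bits [31:16] of the two's-complement product as a signed value.
    return [(to_signed16(a) * to_signed16(b)) >> 16 for a, b in zip(src1, src2)]
```

`pmulhw([0xFFFF], [0xFFFF])` returns `[0]`, because both words are read as -1 and the product is 1. The unsigned (V)PMULHUW gives 0xFFFE for the same bit patterns.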
There are legacy and extended forms of the instruction:
PMULHW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMULHW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PMULHW                SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULHW 128-bit       AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULHW 256-bit       AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode           Description
PMULHW xmm1, xmm2/mem128            66 0F E5 /r      Multiplies packed 16-bit signed values in xmm1 by the corresponding values in xmm2 or mem128. Writes bits [31:16] of each product to xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULHW xmm1, xmm2, xmm3/mem128     C4    RXB.01           X.src1.0.01   E5 /r
VPMULHW ymm1, ymm2, ymm3/mem256     C4    RXB.01           X.src1.1.01   E5 /r


Related Instructions
(V)PMULDQ, (V)PMULHUW, (V)PMULLD, (V)PMULLW, (V)PMULUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PMULLD
VPMULLD

Packed Multiply and Store Low
Signed Doubleword

Multiplies four packed 32-bit signed integers in the first source operand by the corresponding values
in the second source operand and writes bits [31:0] of each 64-bit product to the corresponding 32-bit
element of the destination.
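A hedged Python model of (V)PMULLD follows. `to_signed32` and `pmulld` are hypothetical helper names, and operands are assumed to be lists of doubleword values. The model also shows that bits [31:0] of a product are the same whether the inputs are read as signed or unsigned, so the low-half result does not depend on signedness.

```python
def to_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def pmulld(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical reference model of (V)PMULLD (4 or 8 doubleword lanes)."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of doublewords")
    # Python's & works on the infinite two's-complement form of an int, so
    # to_signed32 keeps exactly bits [31:0] of each product, whatever its sign.
    return [to_signed32(a * b) for a, b in zip(src1, src2)]
```

Both `pmulld([0xFFFFFFFF], [2])` and `pmulld([-1], [2])` return `[-2]`.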
There are legacy and extended forms of the instruction:
PMULLD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMULLD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PMULLD                SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMULLD 128-bit       AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULLD 256-bit       AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode           Description
PMULLD xmm1, xmm2/mem128            66 0F 38 40 /r   Multiplies four packed 32-bit signed integers in xmm1 by the corresponding values in xmm2 or mem128. Writes bits [31:0] of each 64-bit product to the corresponding 32-bit element of xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULLD xmm1, xmm2, xmm3/mem128     C4    RXB.02           X.src1.0.01   40 /r
VPMULLD ymm1, ymm2, ymm3/mem256     C4    RXB.02           X.src1.1.01   40 /r


Related Instructions
(V)PMULDQ, (V)PMULHUW, (V)PMULHW, (V)PMULLW, (V)PMULUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PMULLW
VPMULLW

Packed Multiply Low
Signed Word

Multiplies eight packed 16-bit signed integers in the first source operand by the corresponding values
in the second source operand and writes bits [15:0] of each 32-bit product to the corresponding 16-bit
element of the destination.
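(V)PMULLW keeps only the low 16 bits of each product and (V)PMULHW keeps the high 16 bits, so together they recover the full 32-bit signed product. The hedged Python sketch below shows this with hypothetical helpers `pmullw`, `pmulhw_bits`, and `full_products`, which operate on lists of word values. On real hardware the halves would typically be interleaved with an unpack instruction such as PUNPCKLWD; that step is not modeled here.

```python
def to_signed16(x: int) -> int:
    """Reinterpret the low 16 bits of x as a two's-complement signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def to_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def pmullw(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical model of (V)PMULLW: bits [15:0] of each product, as unsigned patterns."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of words")
    return [(a * b) & 0xFFFF for a, b in zip(src1, src2)]


def pmulhw_bits(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical model of (V)PMULHW: bits [31:16] of each signed product, as unsigned patterns."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of words")
    return [((to_signed16(a) * to_signed16(b)) >> 16) & 0xFFFF
            for a, b in zip(src1, src2)]


def full_products(src1: list[int], src2: list[int]) -> list[int]:
    """Join the high and low halves into the full signed 32-bit products."""
    lows = pmullw(src1, src2)
    highs = pmulhw_bits(src1, src2)
    return [to_signed32((hi << 16) | lo) for hi, lo in zip(highs, lows)]
```

`full_products([-300, 1234], [200, 5678])` returns `[-60000, 7006652]`, while `pmullw` alone keeps only the truncated low words.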
There are legacy and extended forms of the instruction:
PMULLW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMULLW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PMULLW                SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULLW 128-bit       AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULLW 256-bit       AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode           Description
PMULLW xmm1, xmm2/mem128            66 0F D5 /r      Multiplies eight packed 16-bit signed integers in xmm1 by the corresponding values in xmm2 or mem128. Writes bits [15:0] of each 32-bit product to the corresponding 16-bit element of xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULLW xmm1, xmm2, xmm3/mem128     C4    RXB.01           X.src1.0.01   D5 /r
VPMULLW ymm1, ymm2, ymm3/mem256     C4    RXB.01           X.src1.1.01   D5 /r


Related Instructions
(V)PMULDQ, (V)PMULHUW, (V)PMULHW, (V)PMULLD, (V)PMULUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PMULUDQ
VPMULUDQ

Packed Multiply
Unsigned Doubleword to Quadword

Multiplies two or four pairs of 32-bit unsigned integers in the first and second source operands and
writes two or four packed quadword unsigned integer products to the destination.
In the following operations, dest is the destination register (either an XMM register or the corresponding YMM register), src1 is the first source operand, and src2 is the second source operand. Each product is an unsigned 64-bit value.
For the 128-bit form of the instruction, the following operations are performed:
dest[63:0] = (src1[31:0] * src2[31:0])
dest[127:64] = (src1[95:64] * src2[95:64])

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[191:128] = (src1[159:128] * src2[159:128])
dest[255:192] = (src1[223:192] * src2[223:192])
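The operations above can be modeled in a few lines of Python. This is a sketch under assumed conventions (operands as lists of doublewords, element 0 = bits [31:0]), and `pmuludq` is a hypothetical name. Masking with 0xFFFFFFFF makes every input unsigned before the multiply, which is what distinguishes this instruction from (V)PMULDQ.

```python
def pmuludq(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical reference model of (V)PMULUDQ (not an AMD-supplied API).

    src1 and src2 hold 4 doublewords (128-bit form) or 8 doublewords
    (256-bit form). Returns one unsigned 64-bit product per destination
    quadword, taken from doublewords 0, 2, 4, 6.
    """
    if len(src1) != len(src2) or len(src1) not in (4, 8):
        raise ValueError("expected 4 or 8 doubleword lanes per operand")
    return [(src1[i] & 0xFFFFFFFF) * (src2[i] & 0xFFFFFFFF)
            for i in range(0, len(src1), 2)]
```

`pmuludq([0xFFFFFFFF, 1, 2, 1], [0xFFFFFFFF, 1, 3, 1])` returns `[0xFFFFFFFE00000001, 6]`, the largest possible product in the first quadword.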

There are legacy and extended forms of the instruction:
PMULUDQ

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPMULUDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PMULUDQ               SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULUDQ 128-bit      AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULUDQ 256-bit      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                            Opcode           Description
PMULUDQ xmm1, xmm2/mem128           66 0F F4 /r      Multiplies two packed 32-bit unsigned integers in xmm1[31:0] and xmm1[95:64] by the corresponding values in xmm2 or mem128. Writes packed 64-bit unsigned integer products to xmm1[63:0] and xmm1[127:64].

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPMULUDQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   F4 /r
VPMULUDQ ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   F4 /r

Related Instructions
(V)PMULDQ, (V)PMULHUW, (V)PMULHW, (V)PMULLD, (V)PMULLW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


POR
VPOR

Packed OR

Performs a bitwise OR of the first and second source operands and writes the result to the destination.
When one or both of a pair of corresponding bits in the first and second operands are set, the corresponding bit of the destination is set; when neither source bit is set, the destination bit is cleared.
There are legacy and extended forms of the instruction:
POR

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPOR

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
POR                   SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPOR 128-bit          AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPOR 256-bit          AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode           Description
POR xmm1, xmm2/mem128               66 0F EB /r      Performs a bitwise OR of the values in xmm1 and xmm2 or mem128. Writes the result to xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPOR xmm1, xmm2, xmm3/mem128        C4    RXB.01           X.src1.0.01   EB /r
VPOR ymm1, ymm2, ymm3/mem256        C4    RXB.01           X.src1.1.01   EB /r

Related Instructions
(V)PAND, (V)PANDN, (V)PXOR


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PSADBW
VPSADBW


Packed Sum of Absolute Differences
Bytes to Words

Subtracts the 16 or 32 packed 8-bit unsigned integers in the second source operand from the corresponding values in the first source operand and computes the absolute value of the differences. Computes two or four unsigned 16-bit integer sums of groups of eight absolute differences and writes the
sums to specific words of the destination.
For the 128-bit form of the instruction:
• The unsigned 16-bit integer sum of absolute differences of the eight bytes [7:0] of the source
operands is written to bits [15:0] of the destination; bits [63:16] are cleared.
• The unsigned 16-bit integer sum of absolute differences of the eight bytes [15:8] of the source
operands is written to bits [79:64] of the destination; bits [127:80] are cleared.
Additionally, for the 256-bit form of the instruction:
• The unsigned 16-bit integer sum of absolute differences of the eight bytes [23:16] of the source
operands is written to bits [143:128] of the destination; bits [191:144] are cleared.
• The unsigned 16-bit integer sum of absolute differences of the eight bytes [31:24] of the source
operands is written to bits [207:192] of the destination; bits [255:208] are cleared.
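The grouping described in the bullets above can be sketched in Python. The sketch assumes operands are `bytes` objects with index 0 at bits [7:0], and `psadbw` is a hypothetical name. It returns one integer per destination quadword, where the sum sits in bits [15:0] and the upper 48 bits are zero.

```python
def psadbw(src1: bytes, src2: bytes) -> list[int]:
    """Hypothetical reference model of (V)PSADBW (not an AMD-supplied API).

    src1 and src2 hold 16 bytes (128-bit form) or 32 bytes (256-bit form).
    Each returned value is one destination quadword.
    """
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("expected 16 or 32 bytes per operand")
    sums = []
    for base in range(0, len(src1), 8):  # one group of eight bytes per quadword
        group = zip(src1[base:base + 8], src2[base:base + 8])
        # Largest possible sum is 8 * 255 = 2040, so it always fits in 16 bits.
        sums.append(sum(abs(a - b) for a, b in group))
    return sums
```

`psadbw(bytes(32), bytes(range(32)))` returns `[28, 92, 156, 220]`, one sum per eight-byte group.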
There are legacy and extended forms of the instruction:
PSADBW

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPSADBW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PSADBW                SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSADBW 128-bit       AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSADBW 256-bit       AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                            Opcode           Description
PSADBW xmm1, xmm2/mem128            66 0F F6 /r      Computes the sums of the absolute differences of two sets of packed 8-bit unsigned integer values in xmm1 and xmm2 or mem128. Writes 16-bit unsigned integer sums to xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSADBW xmm1, xmm2, xmm3/mem128     C4    RXB.01           X.src1.0.01   F6 /r
VPSADBW ymm1, ymm2, ymm3/mem256     C4    RXB.01           X.src1.1.01   F6 /r

Related Instructions
(V)MPSADBW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PSHUFB
VPSHUFB

Packed Shuffle
Byte

Copies bytes from the first source operand to the destination or clears bytes in the destination, as
specified by control bytes in the second source operand.
The control bytes occupy positions in the source operand that correspond to positions in the destination. Each control byte has the following fields.
Bits    Field       Description
[7]     FRZ         Set the bit to clear the corresponding byte of the destination.
                    Clear the bit to copy the selected source byte to the corresponding byte of the destination.
[6:4]   Reserved    Reserved.
[3:0]   SRC_Index   Binary value selects the source byte.

For the 256-bit form of the instruction, the SRC_Index fields in the upper 16 bytes of the second
source operand select bytes in the upper 16 bytes of the first source operand to be copied.
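A minimal Python sketch of the control-byte logic follows, assuming `bytes` operands with index 0 at bits [7:0]; `pshufb` is a hypothetical name. Reserved bits [6:4] are simply ignored here, which is an assumption of this model. For the 256-bit form, each index selects only within its own 16-byte half.

```python
def pshufb(src1: bytes, control: bytes) -> bytes:
    """Hypothetical reference model of (V)PSHUFB (not an AMD-supplied API).

    src1 and control hold 16 bytes (128-bit form) or 32 bytes (256-bit form).
    """
    if len(src1) != len(control) or len(src1) not in (16, 32):
        raise ValueError("expected 16 or 32 bytes per operand")
    out = bytearray(len(src1))  # zero-filled, so FRZ bytes need no extra work
    for i, ctrl in enumerate(control):
        if ctrl & 0x80:  # FRZ set: leave this destination byte cleared
            continue
        lane_base = i & ~0x0F  # 0 for the low 16 bytes, 16 for the high 16
        out[i] = src1[lane_base + (ctrl & 0x0F)]  # SRC_Index within the same half
    return bytes(out)
```

`pshufb(bytes(range(16)), bytes(range(15, -1, -1)))` reverses the 16 bytes, and any control byte with bit 7 set yields a zero byte.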
There are legacy and extended forms of the instruction:
PSHUFB

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPSHUFB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PSHUFB                SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPSHUFB 128-bit       AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSHUFB 256-bit       AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                            Opcode           Description
PSHUFB xmm1, xmm2/mem128            66 0F 38 00 /r   Moves bytes in xmm1 as specified by control bytes in xmm2 or mem128.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSHUFB xmm1, xmm2, xmm3/mem128     C4    RXB.02           X.src1.0.01   00 /r
VPSHUFB ymm1, ymm2, ymm3/mem256     C4    RXB.02           X.src1.1.01   00 /r

Related Instructions
(V)PSHUFD, (V)PSHUFHW, (V)PSHUFLW, PSHUFW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

PSHUFD
VPSHUFD

Packed Shuffle
Doublewords

Copies packed doubleword values from the source operand to doublewords in the destination, as specified by
bit fields of an immediate byte operand. A source doubleword can be copied more than once.
Source doublewords are selected by two-bit fields in the immediate-byte operand. Each field corresponds to a destination doubleword, as shown:
Destination   Immediate-Byte   Source Doubleword Selected by Bit-Field Value
Doubleword    Bit Field        00        01        10        11
[31:0]        [1:0]            [31:0]    [63:32]   [95:64]   [127:96]
[63:32]       [3:2]            [31:0]    [63:32]   [95:64]   [127:96]
[95:64]       [5:4]            [31:0]    [63:32]   [95:64]   [127:96]
[127:96]      [7:6]            [31:0]    [63:32]   [95:64]   [127:96]

For the 256-bit form of the instruction, the same immediate byte selects doublewords in the upper
128 bits of the source operand to be copied to the upper 128 bits of the destination:
Destination   Immediate-Byte   Source Doubleword Selected by Bit-Field Value
Doubleword    Bit Field        00          01          10          11
[159:128]     [1:0]            [159:128]   [191:160]   [223:192]   [255:224]
[191:160]     [3:2]            [159:128]   [191:160]   [223:192]   [255:224]
[223:192]     [5:4]            [159:128]   [191:160]   [223:192]   [255:224]
[255:224]     [7:6]            [159:128]   [191:160]   [223:192]   [255:224]
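The selection rule in the PSHUFD tables can be modeled in a few lines. This Python sketch is an illustration only: it represents a register as a list of doubleword values, with element 0 holding bits [31:0]. The function name is hypothetical. The 256-bit form is modeled as the same selection applied independently to each 128-bit half, as the 256-bit table describes.

```python
def pshufd(src: list[int], imm8: int) -> list[int]:
    """Model of (V)PSHUFD. src holds 4 (128-bit form) or 8 (256-bit form)
    doublewords; element 0 corresponds to bits [31:0]."""
    if len(src) not in (4, 8):
        raise ValueError("expected 4 or 8 doublewords")
    dst = []
    for lane in range(0, len(src), 4):        # each 128-bit half uses the same imm8
        for i in range(4):                     # destination doubleword i of this half
            sel = (imm8 >> (2 * i)) & 0b11     # bit field [2i+1:2i] of imm8
            dst.append(src[lane + sel])        # source comes from the same half
    return dst
```

An immediate of 1Bh (bit fields 11b, 10b, 01b, 00b from [1:0] upward) reverses the doublewords of each 128-bit half. An immediate of E4h leaves them in place.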

There are legacy and extended forms of the instruction:
PSHUFD

The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPSHUFD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either a YMM register or a 256-bit memory location. The destination is a
YMM register.
Instruction Support
Form               Subset   Feature Flag
PSHUFD             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSHUFD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSHUFD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                          Opcode            Description
PSHUFD xmm1, xmm2/mem128, imm8    66 0F 70 /r ib    Copies packed 32-bit values from xmm2 or mem128 to xmm1, as specified by imm8.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSHUFD xmm1, xmm2/mem128, imm8   C4    RXB.01           X.1111.0.01   70 /r ib
VPSHUFD ymm1, ymm2/mem256, imm8   C4    RXB.01           X.1111.1.01   70 /r ib
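The VEX columns for VPSHUFD can be turned into bytes mechanically. The following Python sketch encodes only the register-to-register form, VPSHUFD xmm1/ymm1, xmm2/ymm2, imm8. It assumes the standard three-byte VEX layout:

- The second byte holds the inverted R, X, and B bits followed by the 5-bit map_select.
- The third byte holds W, the inverted vvvv field, L, and pp.

The sketch sets the ignored W bit (shown as X) to 0, and the function name is hypothetical. Because this instruction has no second source, vvvv holds 1111b, as the X.1111.L.01 entries show.

```python
def encode_vpshufd_reg(dst: int, src: int, imm8: int, ymm: bool) -> bytes:
    """Hypothetical encoder for VPSHUFD xmm/ymm dst, xmm/ymm src, imm8 with a
    register source (ModRM.mod = 11b), using the three-byte C4 VEX prefix."""
    if not (0 <= dst < 16 and 0 <= src < 16 and 0 <= imm8 <= 0xFF):
        raise ValueError("registers must be 0-15 and imm8 must fit in a byte")
    r = (dst >> 3) & 1                      # high bit of ModRM.reg (destination)
    b = (src >> 3) & 1                      # high bit of ModRM.rm (source)
    map_select = 0b00001                    # RXB.01: map 1 (0F opcodes)
    # R, X, and B are stored inverted; X is unused here, so its stored bit is 1.
    byte1 = ((r ^ 1) << 7) | (1 << 6) | ((b ^ 1) << 5) | map_select
    w, vvvv, vex_l, pp = 0, 0b1111, int(ymm), 0b01   # X.1111.L.01 (W ignored, set to 0)
    byte2 = (w << 7) | (vvvv << 3) | (vex_l << 2) | pp
    modrm = (0b11 << 6) | ((dst & 7) << 3) | (src & 7)
    return bytes([0xC4, byte1, byte2, 0x70, modrm, imm8])
```

For example, VPSHUFD ymm1, ymm2, 1Bh encodes as C4 E1 7D 70 CA 1B. Assemblers typically emit the shorter two-byte C5 prefix for this case (C5 FD 70 CA 1B). Both encodings select the same instruction.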

Related Instructions
(V)PSHUFHW, (V)PSHUFLW, (V)PSHUFW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PSHUFHW
VPSHUFHW

Packed Shuffle
High Words

Copies packed word values from the high quadword of the source operand (or, for the 256-bit form, from the
high quadword of each 128-bit half of the source) to words in the high quadword of the destination (or the
high quadword of each 128-bit half of the destination), as specified by bit fields of an immediate byte operand. A
source word can be copied more than once.
Source words are selected by two-bit fields in the immediate-byte operand. Each field corresponds to
a destination word, as shown:
Destination   Immediate-Byte   Source Word Selected by Bit-Field Value
Word          Bit Field        00         01         10         11
[79:64]       [1:0]            [79:64]    [95:80]    [111:96]   [127:112]
[95:80]       [3:2]            [79:64]    [95:80]    [111:96]   [127:112]
[111:96]      [5:4]            [79:64]    [95:80]    [111:96]   [127:112]
[127:112]     [7:6]            [79:64]    [95:80]    [111:96]   [127:112]

The least-significant quadword of the source is copied to the corresponding quadword of the destination.
For the 256-bit form of the instruction, the same immediate byte selects words in the most-significant
quadword of the source operand to be copied to the destination:

Destination   Immediate-Byte   Source Word Selected by Bit-Field Value
Word          Bit Field        00          01          10          11
[207:192]     [1:0]            [207:192]   [223:208]   [239:224]   [255:240]
[223:208]     [3:2]            [207:192]   [223:208]   [239:224]   [255:240]
[239:224]     [5:4]            [207:192]   [223:208]   [239:224]   [255:240]
[255:240]     [7:6]            [207:192]   [223:208]   [239:224]   [255:240]

The least-significant quadword of the upper 128 bits of the source is copied to the corresponding
quadword of the destination.
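As a sketch of these PSHUFHW rules, the following Python model treats a register as a list of 16-bit word values, with element 0 holding bits [15:0]. The function name is hypothetical. Each 128-bit half keeps its low quadword and shuffles its high quadword using the same immediate.

```python
def pshufhw(src: list[int], imm8: int) -> list[int]:
    """Model of (V)PSHUFHW. src holds 8 (128-bit form) or 16 (256-bit form)
    words; element 0 corresponds to bits [15:0]."""
    if len(src) not in (8, 16):
        raise ValueError("expected 8 or 16 words")
    dst = []
    for lane in range(0, len(src), 8):          # each 128-bit half
        dst.extend(src[lane:lane + 4])            # low quadword copied unchanged
        for i in range(4):                        # words 4..7 of this half
            sel = (imm8 >> (2 * i)) & 0b11        # bit field [2i+1:2i] of imm8
            dst.append(src[lane + 4 + sel])       # chosen from this half's high quadword
    return dst
```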
There are legacy and extended forms of the instruction:
PSHUFHW

The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPSHUFHW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either a YMM register or a 256-bit memory location. The destination is a
YMM register.
Instruction Support
Form               Subset   Feature Flag
PSHUFHW            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSHUFHW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSHUFHW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                           Opcode            Description
PSHUFHW xmm1, xmm2/mem128, imm8    F3 0F 70 /r ib    Copies packed 16-bit values from the high-order quadword of xmm2 or mem128 to the high-order quadword of xmm1, as specified by imm8.

                                   Encoding
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSHUFHW xmm1, xmm2/mem128, imm8   C4    RXB.01           X.1111.0.10   70 /r ib
VPSHUFHW ymm1, ymm2/mem256, imm8   C4    RXB.01           X.1111.1.10   70 /r ib

Related Instructions
(V)PSHUFD, (V)PSHUFLW, (V)PSHUFW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PSHUFLW
VPSHUFLW

Packed Shuffle
Low Words

Copies packed word values from the low quadword of the source operand (or, for the 256-bit form, from the
low quadword of each 128-bit half of the source) to words in the low quadword of the destination (or the
low quadword of each 128-bit half of the destination), as specified by bit fields of an immediate byte operand. A
source word can be copied more than once.
Source words are selected by two-bit fields in the immediate-byte operand. Each bit field corresponds
to a destination word, as shown:
Destination   Immediate-Byte   Source Word Selected by Bit-Field Value
Word          Bit Field        00        01        10        11
[15:0]        [1:0]            [15:0]    [31:16]   [47:32]   [63:48]
[31:16]       [3:2]            [15:0]    [31:16]   [47:32]   [63:48]
[47:32]       [5:4]            [15:0]    [31:16]   [47:32]   [63:48]
[63:48]       [7:6]            [15:0]    [31:16]   [47:32]   [63:48]

The most-significant quadword of the source is copied to the corresponding quadword of the destination.
For the 256-bit form of the instruction, the same immediate byte selects words in the lower quadword
of the upper 128 bits of the source operand to be copied to the destination:
Destination   Immediate-Byte   Source Word Selected by Bit-Field Value
Word          Bit Field        00          01          10          11
[143:128]     [1:0]            [143:128]   [159:144]   [175:160]   [191:176]
[159:144]     [3:2]            [143:128]   [159:144]   [175:160]   [191:176]
[175:160]     [5:4]            [143:128]   [159:144]   [175:160]   [191:176]
[191:176]     [7:6]            [143:128]   [159:144]   [175:160]   [191:176]

The most-significant quadword of the upper 128 bits of the source is copied to the corresponding
quadword of the destination.
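The PSHUFLW rules mirror PSHUFHW, with the roles of the quadwords swapped. This Python sketch treats a register as a list of 16-bit word values, with element 0 holding bits [15:0]; the function name is hypothetical. In each 128-bit half, the low quadword is shuffled by the immediate and the high quadword is copied unchanged.

```python
def pshuflw(src: list[int], imm8: int) -> list[int]:
    """Model of (V)PSHUFLW. src holds 8 (128-bit form) or 16 (256-bit form)
    words; element 0 corresponds to bits [15:0]."""
    if len(src) not in (8, 16):
        raise ValueError("expected 8 or 16 words")
    dst = []
    for lane in range(0, len(src), 8):          # each 128-bit half
        for i in range(4):                        # words 0..3 of this half
            sel = (imm8 >> (2 * i)) & 0b11        # bit field [2i+1:2i] of imm8
            dst.append(src[lane + sel])           # chosen from this half's low quadword
        dst.extend(src[lane + 4:lane + 8])        # high quadword copied unchanged
    return dst
```

With an immediate of 00h, word 0 of each half is broadcast across that half's low quadword.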
There are legacy and extended forms of the instruction:
PSHUFLW

The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not
affected.
VPSHUFLW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is either an XMM register or a 128-bit memory location. The destination is an
XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is either a YMM register or a 256-bit memory location. The destination is a
YMM register.
Instruction Support
Form               Subset   Feature Flag
PSHUFLW            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSHUFLW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSHUFLW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                           Opcode            Description
PSHUFLW xmm1, xmm2/mem128, imm8    F2 0F 70 /r ib    Copies packed 16-bit values from the low-order quadword of xmm2 or mem128 to the low-order quadword of xmm1, as specified by imm8.

                                   Encoding
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSHUFLW xmm1, xmm2/mem128, imm8   C4    RXB.01           X.1111.0.11   70 /r ib
VPSHUFLW ymm1, ymm2/mem256, imm8   C4    RXB.01           X.1111.1.11   70 /r ib

Related Instructions
(V)PSHUFD, (V)PSHUFHW, (V)PSHUFW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PSIGNB
VPSIGNB

Packed Sign
Byte

For each packed signed byte in the first source operand, evaluate the corresponding byte of the second
source operand and perform one of the following operations.
• When a byte of the second source is negative, write the two’s-complement of the corresponding
byte of the first source to the destination.
• When a byte of the second source is positive, copy the corresponding byte of the first source to the
destination.
• When a byte of the second source is zero, clear the corresponding byte of the destination.
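The three PSIGNB rules can be modeled per byte. This Python sketch assumes each register is given as a list of unsigned byte values (0-255, as stored) and treats bit 7 as the sign. The function name is hypothetical. The same per-byte rule applies to every byte of the 128-bit and 256-bit forms.

```python
def psignb(src1: list[int], src2: list[int]) -> list[int]:
    """Model of (V)PSIGNB. Operands are lists of byte values 0-255 as stored in
    the register; bit 7 is the sign bit."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of bytes")
    out = []
    for a, b in zip(src1, src2):
        if b & 0x80:                  # second-source byte is negative
            out.append((-a) & 0xFF)   # two's complement of the first-source byte
        elif b == 0:                  # second-source byte is zero
            out.append(0)
        else:                         # second-source byte is positive
            out.append(a)
    return out
```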
There are legacy and extended forms of the instruction:
PSIGNB

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPSIGNB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form               Subset   Feature Flag
PSIGNB             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPSIGNB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSIGNB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                          Opcode            Description
PSIGNB xmm1, xmm2/mem128          66 0F 38 08 /r    Negates, copies, or clears each packed 8-bit signed integer in xmm1 according to whether the corresponding value in xmm2 or mem128 is negative, positive, or zero. Writes 8-bit signed results to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSIGNB xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   08 /r
VPSIGNB ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   08 /r

Related Instructions
(V)PSIGNW, (V)PSIGND
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PSIGND
VPSIGND

Packed Sign
Doubleword

For each packed signed doubleword in the first source operand, evaluate the corresponding doubleword of the second source operand and perform one of the following operations.
• When a doubleword of the second source is negative, write the two’s-complement of the
corresponding doubleword of the first source to the destination.
• When a doubleword of the second source is positive, copy the corresponding doubleword of the
first source to the destination.
• When a doubleword of the second source is zero, clear the corresponding doubleword of the
destination.
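One edge case follows from two's-complement arithmetic. The most-negative doubleword, 8000_0000h, has no positive counterpart in 32 bits, so negating it returns the same value. The Python sketch below models (V)PSIGND on signed integers and wraps results to 32 bits to show this; the helper names are hypothetical.

```python
INT32_MIN = -(1 << 31)


def wrap32(x: int) -> int:
    """Reduce x to the signed 32-bit range (two's-complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def psignd(src1: list[int], src2: list[int]) -> list[int]:
    """Model of (V)PSIGND on signed 32-bit doublewords."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of doublewords")
    out = []
    for a, b in zip(src1, src2):
        if b < 0:
            out.append(wrap32(-a))    # negate, wrapping to 32 bits
        elif b == 0:
            out.append(0)             # zero in the second source clears the result
        else:
            out.append(a)             # positive second source copies the value
    return out
```

In the test case, the last element (INT32_MIN with a negative selector) comes back unchanged. No flag reports this wraparound: rFLAGS and MXCSR are not affected by this instruction.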
There are legacy and extended forms of the instruction:
PSIGND

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPSIGND

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form               Subset   Feature Flag
PSIGND             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPSIGND 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSIGND 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                          Opcode            Description
PSIGND xmm1, xmm2/mem128          66 0F 38 0A /r    Negates, copies, or clears each packed 32-bit signed integer in xmm1 according to whether the corresponding value in xmm2 or mem128 is negative, positive, or zero. Writes 32-bit signed results to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSIGND xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   0A /r
VPSIGND ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   0A /r

Related Instructions
(V)PSIGNB, (V)PSIGNW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PSIGNW
VPSIGNW

Packed Sign
Word

For each packed signed word in the first source operand, evaluate the corresponding word of the second source operand and perform one of the following operations.
• When a word of the second source is negative, write the two’s-complement of the corresponding
word of the first source to the destination.
• When a word of the second source is positive, copy the corresponding word of the first source to
the destination.
• When a word of the second source is zero, clear the corresponding word of the destination.
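One consequence of the three PSIGNW rules is that passing the same operand as both sources yields the absolute value of each word. Negative words are negated, positive words are copied, and zero stays zero. The only exception is -32768, which wraps to itself. This Python sketch assumes signed integers in the range -32768 to 32767, and the helper names are hypothetical.

```python
def to_int16(x: int) -> int:
    """Wrap an integer to the signed 16-bit range (two's complement)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def psignw(src1: list[int], src2: list[int]) -> list[int]:
    """Model of (V)PSIGNW on signed 16-bit words."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of words")
    out = []
    for a, b in zip(src1, src2):
        if b < 0:
            out.append(to_int16(-a))  # negative second source: negate with wraparound
        elif b == 0:
            out.append(0)             # zero second source: clear
        else:
            out.append(a)             # positive second source: copy
    return out
```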
There are legacy and extended forms of the instruction:
PSIGNW

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPSIGNW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form               Subset   Feature Flag
PSIGNW             SSSE3    CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPSIGNW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSIGNW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                          Opcode            Description
PSIGNW xmm1, xmm2/mem128          66 0F 38 09 /r    Negates, copies, or clears each packed 16-bit signed integer in xmm1 according to whether the corresponding value in xmm2 or mem128 is negative, positive, or zero. Writes 16-bit signed results to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSIGNW xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   09 /r
VPSIGNW ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   09 /r

Related Instructions
(V)PSIGNB, (V)PSIGND
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PSLLD
VPSLLD

Packed Shift Left Logical
Doublewords

Left-shifts each packed 32-bit value in the source operand as specified by a shift-count operand and
writes the shifted values to the destination.
The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered.
Low-order bits emptied by shifting are cleared. When the shift count is greater than 31, the destination is cleared.
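The PSLLD count handling can be sketched as follows. This Python model assumes doublewords are given as unsigned integers. It also assumes the caller supplies the count either as the imm8 or as the full count operand, of which only bits [63:0] are used. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def count_from_operand(count_operand: int) -> int:
    """For a register or memory count operand, only bits [63:0] are used."""
    return count_operand & MASK64


def pslld(src: list[int], count: int) -> list[int]:
    """Model of (V)PSLLD on unsigned 32-bit doublewords with an unsigned count."""
    if count > 31:
        return [0] * len(src)                     # counts above 31 clear the destination
    return [(d << count) & MASK32 for d in src]   # emptied low-order bits are zero
```

A count operand whose bits [63:0] hold 4 shifts by 4 even when higher bits are set. A count of 1_0000_0000h exceeds 31 and clears every element.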
There are legacy and extended forms of the instruction:
PSLLD

There are two forms of the instruction, based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected.
VPSLLD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

There are two 128-bit encodings. These differ based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

There are two 256-bit encodings. These differ based on the type of count operand.
The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSLLD              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLD 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLD 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                          Opcode            Description
PSLLD xmm1, xmm2/mem128           66 0F F2 /r       Left-shifts packed doublewords in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSLLD xmm, imm8                   66 0F 72 /6 ib    Left-shifts packed doublewords in xmm as specified by imm8.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSLLD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   F2 /r
VPSLLD xmm1, xmm2, imm8           C4    RXB.01           X.dest.0.01   72 /6 ib
VPSLLD ymm1, ymm2, xmm3/mem128    C4    RXB.01           X.src1.1.01   F2 /r
VPSLLD ymm1, ymm2, imm8           C4    RXB.01           X.dest.1.01   72 /6 ib
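In the immediate forms of VPSLLD, the encoding fields are rearranged:

- VEX.vvvv names the destination.
- ModRM.reg carries the /6 opcode extension.
- The source register therefore moves to ModRM.rm.

The Python sketch below shows that arrangement for VPSLLD xmm1/ymm1, xmm2/ymm2, imm8. It assumes the standard three-byte VEX layout, with vvvv stored in inverted (ones'-complement) form, and sets W to 0. The function name is hypothetical.

```python
def encode_vpslld_imm(dst: int, src: int, imm8: int, ymm: bool) -> bytes:
    """Hypothetical encoder for VPSLLD xmm/ymm dst, xmm/ymm src, imm8
    (C4 RXB.01 X.dest.L.01 72 /6 ib)."""
    if not (0 <= dst < 16 and 0 <= src < 16 and 0 <= imm8 <= 0xFF):
        raise ValueError("registers must be 0-15 and imm8 must fit in a byte")
    b = (src >> 3) & 1                       # high bit of the source in ModRM.rm
    # R and X are unused (stored inverted as 1); map_select = 00001b (0F map).
    byte1 = (1 << 7) | (1 << 6) | ((b ^ 1) << 5) | 0b00001
    vvvv = ~dst & 0xF                        # destination, stored inverted
    byte2 = (0 << 7) | (vvvv << 3) | (int(ymm) << 2) | 0b01   # W=0, L, pp=01
    modrm = (0b11 << 6) | (6 << 3) | (src & 7)   # reg field holds the /6 extension
    return bytes([0xC4, byte1, byte2, 0x72, modrm, imm8])
```

For example, VPSLLD xmm1, xmm2, 3 encodes as C4 E1 71 72 F2 03 in this three-byte form. The equivalent two-byte form is C5 F1 72 F2 03.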

Related Instructions
(V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                             A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                A     Memory address exceeding data segment limit or non-canonical.
                                       A     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and 128-bit memory operand not 16-byte aligned or 256-bit memory operand not 32-byte aligned.
Page fault, #PF                        A     Instruction execution caused a page fault.
X: AVX, AVX2, and SSE exception
A: AVX and AVX2 exception
S: SSE exception

PSLLDQ
VPSLLDQ

Packed Shift Left Logical
Double Quadword

Left-shifts the double quadword value in the source operand (or, for the 256-bit form, each of the two double
quadword values) by the number of bytes specified by an immediate byte operand and writes the shifted values to the destination.
The immediate byte operand supplies an unsigned shift count. Low-order bytes emptied by shifting
are cleared. When the shift count is greater than 15, the destination is cleared. For the 256-bit form of
the instruction, the shift count is applied to both the upper and the lower double quadword. Bytes
shifted out of the lower 128 bits are not shifted into the upper 128 bits.
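The PSLLDQ lane behavior can be illustrated with a small model. This Python sketch assumes the register contents are given as 16 or 32 bytes in memory order, with byte 0 holding bits [7:0]. Under that layout, a left shift by n bytes moves each byte n positions toward the high end. The function name is hypothetical.

```python
def pslldq(src: bytes, count: int) -> bytes:
    """Model of (V)PSLLDQ. src is 16 or 32 bytes in memory order (byte 0 holds
    bits [7:0]); count is the unsigned imm8 shift count in bytes."""
    if len(src) not in (16, 32):
        raise ValueError("expected 16 or 32 bytes")
    if not 0 <= count <= 0xFF:
        raise ValueError("count must fit in an 8-bit immediate")
    out = bytearray()
    for lane in range(0, len(src), 16):        # each double quadword shifted separately
        chunk = src[lane:lane + 16]
        if count > 15:
            out += bytes(16)                     # counts above 15 clear the lane
        else:
            # Emptied low-order bytes are zero; bytes leaving the top of this
            # lane are discarded rather than carried into the next lane.
            out += bytes(count) + chunk[:16 - count]
    return bytes(out)
```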
There are legacy and extended forms of the instruction:
PSLLDQ

The source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPSLLDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is an XMM register. The destination is an XMM register specified by VEX.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is a YMM register. The destination is a YMM register specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSLLDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode            Description
PSLLDQ xmm, imm8                  66 0F 73 /7 ib    Left-shifts the double quadword value in xmm as specified by imm8.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSLLDQ xmm1, xmm2, imm8          C4    RXB.01           0.dest.0.01   73 /7 ib
VPSLLDQ ymm1, ymm2, imm8          C4    RXB.01           0.dest.1.01   73 /7 ib

Related Instructions
(V)PSLLD, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ,
(V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PSLLQ
VPSLLQ

Packed Shift Left Logical
Quadwords

Left-shifts each packed 64-bit value in the source operand as specified by a shift-count operand and
writes the shifted values to the destination.
The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered.
Low-order bits emptied by shifting are cleared. When the shift count is greater than 63, the destination is cleared.
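A scalar reference model can make the count rules concrete. The sketch below, with the hypothetical name `psllq_model`, treats the count as an unsigned 64-bit value (bits [63:0] of the count operand, or the zero-extended imm8) and clears each quadword when the count exceeds 63. Pass `n = 2` for an XMM destination or `n = 4` for a YMM destination. It models results only, not encodings or exceptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of (V)PSLLQ results (not an intrinsic).
 * n is the number of quadwords: 2 for an XMM destination, 4 for YMM.
 * count is bits [63:0] of the count operand, or the zero-extended imm8. */
void psllq_model(uint64_t *dst, const uint64_t *src, size_t n, uint64_t count) {
    for (size_t i = 0; i < n; i++)
        /* Counts above 63 clear the lane; the explicit check also avoids
         * C's undefined behavior for shifts of 64 or more. */
        dst[i] = (count > 63) ? 0 : (src[i] << count);
}
```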
There are legacy and extended forms of the instruction:
PSLLQ

There are two forms of the instruction, based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected.
VPSLLQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

There are two 128-bit encodings. These differ based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
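To make the upper-bits rule concrete, the sketch below models a YMM register as four quadwords and contrasts a legacy SSE write, which leaves bits [255:128] unchanged, with a VEX.128 write, which clears them. `ymm_model`, `write_legacy`, and `write_vex128` are hypothetical names describing only the architectural result.

```c
#include <stdint.h>

/* Hypothetical model of a YMM register: q[0] holds bits [63:0], q[3] bits [255:192]. */
typedef struct { uint64_t q[4]; } ymm_model;

/* Legacy SSE form (e.g. PSLLQ xmm1, ...): bits [255:128] are not affected. */
void write_legacy(ymm_model *reg, const uint64_t result[2]) {
    reg->q[0] = result[0];
    reg->q[1] = result[1];
}

/* VEX.128 form (e.g. VPSLLQ xmm1, ...): bits [255:128] are cleared. */
void write_vex128(ymm_model *reg, const uint64_t result[2]) {
    reg->q[0] = result[0];
    reg->q[1] = result[1];
    reg->q[2] = 0;
    reg->q[3] = 0;
}
```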
YMM Encoding

There are two 256-bit encodings. These differ based on the type of count operand.
The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSLLQ              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLQ 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLQ 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode           Description
PSLLQ xmm1, xmm2/mem128           66 0F F3 /r      Left-shifts packed quadwords in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSLLQ xmm, imm8                   66 0F 73 /6 ib   Left-shifts packed quadwords in xmm as specified by imm8.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSLLQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   F3 /r
VPSLLQ xmm1, xmm2, imm8           C4    RXB.01           X.dest.0.01   73 /6 ib
VPSLLQ ymm1, ymm2, xmm3/mem128    C4    RXB.01           X.src1.1.01   F3 /r
VPSLLQ ymm1, ymm2, imm8           C4    RXB.01           X.dest.1.01   73 /6 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S

S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
A
A
A
S

Alignment check, #AC
A
Page fault, #PF
X — AVX, AVX2, and SSE exception
A — AVX and AVX2 exception
S — SSE exception


A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.


PSLLW
VPSLLW

Packed Shift Left Logical
Words

Left-shifts each packed 16-bit value in the source operand as specified by a shift-count operand and
writes the shifted values to the destination.
The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered.
Low-order bits emptied by shifting are cleared. When the shift count is greater than 15, the destination is cleared.
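The word form follows the same rules with 16-bit lanes. In the sketch below (hypothetical name `psllw_model`, n = 8 words for XMM or 16 for YMM), each word is widened before shifting so that bits moved past bit 15 are simply discarded, and any count above 15 clears the lane.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of (V)PSLLW: n = 8 words for XMM, 16 for YMM.
 * count is bits [63:0] of the count operand, or the zero-extended imm8. */
void psllw_model(uint16_t *dst, const uint16_t *src, size_t n, uint64_t count) {
    for (size_t i = 0; i < n; i++)
        /* Widen to 32 bits, shift, then truncate back to the low 16 bits. */
        dst[i] = (count > 15) ? 0 : (uint16_t)((uint32_t)src[i] << count);
}
```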
There are legacy and extended forms of the instruction:
PSLLW

There are two forms of the instruction, based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected.
VPSLLW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

There are two 128-bit encodings. These differ based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

There are two 256-bit encodings. These differ based on the type of count operand.
The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSLLW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode           Description
PSLLW xmm1, xmm2/mem128           66 0F F1 /r      Left-shifts packed words in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSLLW xmm, imm8                   66 0F 71 /6 ib   Left-shifts packed words in xmm as specified by imm8.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSLLW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   F1 /r
VPSLLW xmm1, xmm2, imm8           C4    RXB.01           X.dest.0.01   71 /6 ib
VPSLLW ymm1, ymm2, xmm3/mem128    C4    RXB.01           X.src1.1.01   F1 /r
VPSLLW ymm1, ymm2, imm8           C4    RXB.01           X.dest.1.01   71 /6 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S

S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
A
A
A
S

Alignment check, #AC
A
Page fault, #PF
X — AVX, AVX2, and SSE exception
A — AVX and AVX2 exception
S — SSE exception


A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.


PSRAD
VPSRAD


Packed Shift Right Arithmetic
Doublewords

Right-shifts each packed 32-bit value in the source operand as specified by a shift-count operand and
writes the shifted values to the destination.
The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered.
High-order bits emptied by shifting are filled with the sign bit of the initial value. When the shift
count is greater than 31, each doubleword of the destination is filled with the sign bit of its initial
value.
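For the arithmetic form, a count above 31 behaves like a count of 31, because every result bit then equals the original sign bit. The hypothetical `psrad_model` below clamps the count accordingly (n = 4 doublewords for XMM, 8 for YMM). It computes the arithmetic shift with the `~(~x >> c)` idiom because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Arithmetic right shift of a signed doubleword without relying on the
 * implementation-defined result of >> on negative values in C. */
static int32_t sar32(int32_t x, unsigned c) {
    return (x < 0) ? ~(~x >> c) : (x >> c);
}

/* Hypothetical reference model of (V)PSRAD: n = 4 doublewords for XMM, 8 for YMM. */
void psrad_model(int32_t *dst, const int32_t *src, size_t n, uint64_t count) {
    /* A count of 31 already turns every bit into a copy of the sign bit,
     * so larger counts are clamped to 31. */
    unsigned c = (count > 31) ? 31u : (unsigned)count;
    for (size_t i = 0; i < n; i++)
        dst[i] = sar32(src[i], c);
}
```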
There are legacy and extended forms of the instruction:
PSRAD

There are two forms of the instruction, based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected.
VPSRAD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

There are two 128-bit encodings. These differ based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

There are two 256-bit encodings. These differ based on the type of count operand.
The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSRAD              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRAD 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRAD 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode           Description
PSRAD xmm1, xmm2/mem128           66 0F E2 /r      Right-shifts packed doublewords in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSRAD xmm, imm8                   66 0F 72 /4 ib   Right-shifts packed doublewords in xmm as specified by imm8.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRAD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   E2 /r
VPSRAD xmm1, xmm2, imm8           C4    RXB.01           X.dest.0.01   72 /4 ib
VPSRAD ymm1, ymm2, xmm3/mem128    C4    RXB.01           X.src1.1.01   E2 /r
VPSRAD ymm1, ymm2, imm8           C4    RXB.01           X.dest.1.01   72 /4 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAW, (V)PSRLD, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S

S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
A
A
A
S

Alignment check, #AC
A
Page fault, #PF
X — AVX, AVX2, and SSE exception
A — AVX and AVX2 exception
S — SSE exception


A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.


PSRAW
VPSRAW

Packed Shift Right Arithmetic
Words

Right-shifts each packed 16-bit value in the source operand as specified by a shift-count operand and
writes the shifted values to the destination.
The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered.
High-order bits emptied by shifting are filled with the sign bit of the initial value. When the shift
count is greater than 15, each word of the destination is filled with the sign bit of its initial
value.
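The word form follows the same pattern with a 16-bit lane. Any count of 15 or more leaves each word as a copy of its sign bit, so the sketch below (hypothetical `psraw_model`, n = 8 words for XMM or 16 for YMM) clamps the count to 15. As in a doubleword model, it avoids applying C's `>>` directly to negative values.

```c
#include <stddef.h>
#include <stdint.h>

/* Arithmetic right shift of a signed word, computed in a wider int so the
 * result does not depend on implementation-defined >> of negative values. */
static int16_t sar16(int16_t x, unsigned c) {
    int32_t v = x;
    return (int16_t)((v < 0) ? ~(~v >> c) : (v >> c));
}

/* Hypothetical reference model of (V)PSRAW: n = 8 words for XMM, 16 for YMM. */
void psraw_model(int16_t *dst, const int16_t *src, size_t n, uint64_t count) {
    unsigned c = (count > 15) ? 15u : (unsigned)count;  /* 15 already fills with the sign */
    for (size_t i = 0; i < n; i++)
        dst[i] = sar16(src[i], c);
}
```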
There are legacy and extended forms of the instruction:
PSRAW

There are two forms of the instruction, based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected.
VPSRAW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

There are two 128-bit encodings. These differ based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

There are two 256-bit encodings. These differ based on the type of count operand.
The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSRAW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRAW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRAW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode           Description
PSRAW xmm1, xmm2/mem128           66 0F E1 /r      Right-shifts packed words in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSRAW xmm, imm8                   66 0F 71 /4 ib   Right-shifts packed words in xmm as specified by imm8.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRAW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   E1 /r
VPSRAW xmm1, xmm2, imm8           C4    RXB.01           X.dest.0.01   71 /4 ib
VPSRAW ymm1, ymm2, xmm3/mem128    C4    RXB.01           X.src1.1.01   E1 /r
VPSRAW ymm1, ymm2, imm8           C4    RXB.01           X.dest.1.01   71 /4 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRLD, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S

S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
A
A
A
S

Alignment check, #AC
A
Page fault, #PF
X — AVX, AVX2, and SSE exception
A — AVX and AVX2 exception
S — SSE exception


A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.


PSRLD
VPSRLD

Packed Shift Right Logical
Doublewords

Right-shifts each packed 32-bit value in the source operand as specified by a shift-count operand and
writes the shifted values to the destination.
The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered.
High-order bits emptied by shifting are cleared. When the shift count is greater than 31, the destination is cleared.
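In contrast to (V)PSRAD, the logical form fills vacated bits with zeros and clears the lane once the count exceeds 31. A minimal model, with the hypothetical name `psrld_model` and n = 4 doublewords for XMM or 8 for YMM:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of (V)PSRLD: n = 4 doublewords for XMM, 8 for YMM.
 * Vacated high-order bits are zero; counts above 31 clear the lane. */
void psrld_model(uint32_t *dst, const uint32_t *src, size_t n, uint64_t count) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (count > 31) ? 0 : (src[i] >> count);
}
```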
There are legacy and extended forms of the instruction:
PSRLD
There are two forms of the instruction, based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected.
VPSRLD
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

There are two 128-bit encodings. These differ based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

There are two 256-bit encodings. These differ based on the type of count operand.
The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSRLD              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLD 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLD 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode           Description
PSRLD xmm1, xmm2/mem128           66 0F D2 /r      Right-shifts packed doublewords in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSRLD xmm, imm8                   66 0F 72 /2 ib   Right-shifts packed doublewords in xmm as specified by imm8.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRLD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   D2 /r
VPSRLD xmm1, xmm2, imm8           C4    RXB.01           X.dest.0.01   72 /2 ib
VPSRLD ymm1, ymm2, xmm3/mem128    C4    RXB.01           X.src1.1.01   D2 /r
VPSRLD ymm1, ymm2, imm8           C4    RXB.01           X.dest.1.01   72 /2 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLDQ,
(V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S

S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
A
A
A
S

Alignment check, #AC
A
Page fault, #PF
X — AVX, AVX2, and SSE exception
A — AVX and AVX2 exception
S — SSE exception


A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.


PSRLDQ
VPSRLDQ

Packed Shift Right Logical
Double Quadword

Right-shifts one or each of two double quadword values in the source operand the number of bytes
specified by an immediate byte operand and writes the shifted values to the destination.
The immediate byte operand supplies an unsigned shift count. High-order bytes emptied by shifting
are cleared. When the shift value is greater than 15, the destination is cleared. For the 256-bit form of
the instruction, the shift count is applied to both the upper and the lower double quadword. Bytes
shifted out of the upper 128 bits are not shifted into the lower.
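Because the count is in bytes and each 128-bit lane is shifted independently, a byte-array model shows the lane boundary most clearly. The hypothetical `psrldq_model` below assumes the register is stored least-significant byte first, with `lanes` equal to 1 for the XMM form and 2 for the YMM form.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference model of (V)PSRLDQ. b[0] is the least significant
 * byte; lanes = 1 for XMM, 2 for YMM. dst may alias src. */
void psrldq_model(uint8_t *dst, const uint8_t *src, int lanes, uint8_t imm8) {
    for (int lane = 0; lane < lanes; lane++) {
        const uint8_t *s = src + 16 * lane;
        uint8_t *d = dst + 16 * lane;
        uint8_t tmp[16] = {0};  /* emptied high-order bytes stay zero */
        if (imm8 <= 15)         /* counts above 15 leave the lane cleared */
            memcpy(tmp, s + imm8, (size_t)(16 - imm8));  /* stays within this lane */
        memcpy(d, tmp, 16);
    }
}
```

Copying through `tmp` keeps the model correct when the destination and source are the same buffer.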
There are legacy and extended forms of the instruction:
PSRLDQ

The source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPSRLDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The source operand is an XMM register. The destination is an XMM register specified by VEX.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The source operand is a YMM register. The destination is a YMM register specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSRLDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode           Description
PSRLDQ xmm, imm8             66 0F 73 /3 ib   Right-shifts the double quadword value in xmm by the number of bytes specified by imm8.

                             Encoding
Mnemonic                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRLDQ xmm1, xmm2, imm8     C4    RXB.01           X.dest.0.01   73 /3 ib
VPSRLDQ ymm1, ymm2, imm8     C4    RXB.01           X.dest.1.01   73 /3 ib


Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLQ,
(V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X    Instruction not supported, as indicated by CPUID feature identifier.
                              A     A          AVX instructions are only recognized in protected mode.
                              S     S     S    CR0.EM = 1.
                              S     S     S    CR4.OSFXSR = 0.
                                          A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          A    VEX.L = 1 when AVX2 not supported.
                                          A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              X     X     X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X    CR0.TS = 1.
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


PSRLQ
VPSRLQ

Packed Shift Right Logical
Quadwords

Right-shifts each packed 64-bit value in the source operand as specified by a shift-count operand and
writes the shifted values to the destination.
The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered.
High-order bits emptied by shifting are cleared. When the shift count is greater than 63, the destination is cleared.
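When the count comes from a register or memory operand, only its low quadword matters. The sketch below (hypothetical `psrlq_model`, n = 2 quadwords for XMM or 4 for YMM) takes the 128-bit count operand as two quadwords, ignores the upper one, and clears every lane when the count exceeds 63.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of (V)PSRLQ with a register or memory count.
 * count_op[0] holds bits [63:0] of the 128-bit count operand and
 * count_op[1] holds bits [127:64], which the instruction ignores. */
void psrlq_model(uint64_t *dst, const uint64_t *src, size_t n,
                 const uint64_t count_op[2]) {
    uint64_t count = count_op[0];  /* only bits [63:0] are considered */
    for (size_t i = 0; i < n; i++)
        dst[i] = (count > 63) ? 0 : (src[i] >> count);
}
```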
There are legacy and extended forms of the instruction:
PSRLQ

There are two forms of the instruction, based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected.
VPSRLQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

There are two 128-bit encodings. These differ based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

There are two 256-bit encodings. These differ based on the type of count operand.
The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSRLQ              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLQ 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLQ 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          Opcode           Description
PSRLQ xmm1, xmm2/mem128           66 0F D3 /r      Right-shifts packed quadwords in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSRLQ xmm, imm8                   66 0F 73 /2 ib   Right-shifts packed quadwords in xmm as specified by imm8.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRLQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   D3 /r
VPSRLQ xmm1, xmm2, imm8           C4    RXB.01           X.dest.0.01   73 /2 ib
VPSRLQ ymm1, ymm2, xmm3/mem128    C4    RXB.01           X.src1.1.01   D3 /r
VPSRLQ ymm1, ymm2, imm8           C4    RXB.01           X.dest.1.01   73 /2 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD,
(V)PSRLDQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S

S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
A
A
A
S

Alignment check, #AC
A
Page fault, #PF
X — AVX, AVX2, and SSE exception
A — AVX and AVX2 exception
S — SSE exception


A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.


PSRLW
VPSRLW

Packed Shift Right Logical
Words

Right-shifts each packed 16-bit value in the source operand as specified by a shift-count operand and
writes the shifted values to the destination.
The shift-count operand can be an immediate byte, a second register, or a memory location. The shift
count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered.
High-order bits emptied by shifting are cleared. When the shift count is greater than 15, the destination is cleared.
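Because the count is unsigned, an immediate such as FFh means 255, not -1, so every word is cleared. The hypothetical `psrlw_imm_model` below shows the immediate form for n words (8 for XMM, 16 for YMM).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of the immediate form of (V)PSRLW:
 * n = 8 words for XMM, 16 for YMM. imm8 is unsigned, so 0xFF means 255. */
void psrlw_imm_model(uint16_t *dst, const uint16_t *src, size_t n, uint8_t imm8) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (imm8 > 15) ? 0 : (uint16_t)(src[i] >> imm8);
}
```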
There are legacy and extended forms of the instruction:
PSRLW

There are two forms of the instruction, based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are
not affected.
VPSRLW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

There are two 128-bit encodings. These differ based on the type of count operand.
The first source operand is an XMM register. The shift count is specified by either a second XMM
register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM
register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

There are two 256-bit encodings. These differ based on the type of count operand.
The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.
Instruction Support
Form               Subset   Feature Flag
PSRLW              SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLW 128-bit     AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLW 256-bit     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
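A minimal C sketch of the feature check, assuming the caller has already executed CPUID and XGETBV and passes in the raw register values; `psrlw_check` and `struct psrlw_support` are hypothetical names. Besides the feature flags in the table, it also requires CPUID Fn0000_0001_ECX[OSXSAVE] (bit 27) and XFEATURE_ENABLED_MASK (XCR0) bits [2:1] = 11b before reporting a VEX form as usable, matching the #UD conditions listed under Exceptions. It cannot check CR4.OSFXSR, which user-mode code cannot read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of which PSRLW forms the processor and OS allow. */
struct psrlw_support {
    bool sse2;    /* PSRLW          */
    bool avx128;  /* VPSRLW 128-bit */
    bool avx256;  /* VPSRLW 256-bit */
};

/* Inputs are raw register values the caller already read:
 * leaf1_edx/leaf1_ecx from CPUID Fn0000_0001, leaf7_ebx from
 * CPUID Fn0000_0007 (ECX = 0), xcr0 from XGETBV (0 if OSXSAVE is clear). */
struct psrlw_support psrlw_check(uint32_t leaf1_edx, uint32_t leaf1_ecx,
                                 uint32_t leaf7_ebx, uint64_t xcr0)
{
    bool sse2    = (leaf1_edx >> 26) & 1u;  /* Fn0000_0001_EDX[SSE2]    */
    bool osxsave = (leaf1_ecx >> 27) & 1u;  /* Fn0000_0001_ECX[OSXSAVE] */
    bool avx     = (leaf1_ecx >> 28) & 1u;  /* Fn0000_0001_ECX[AVX]     */
    bool avx2    = (leaf7_ebx >> 5) & 1u;   /* Fn0000_0007_EBX[AVX2]    */
    /* VEX forms also need XCR0[2:1] = 11b (SSE and YMM state enabled). */
    bool ymm_ok  = osxsave && (xcr0 & 0x6u) == 0x6u;

    struct psrlw_support s = { sse2, avx && ymm_ok, avx && avx2 && ymm_ok };
    return s;
}
```

XGETBV itself raises #UD when OSXSAVE is clear, so a caller would execute it only after checking bit 27 and otherwise pass 0 for `xcr0`.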
Instruction Encoding
Mnemonic                   Opcode         Description
PSRLW xmm1, xmm2/mem128    66 0F D1 /r    Right-shifts packed words in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSRLW xmm, imm8            66 0F 71 /2 ib Right-shifts packed words in xmm as specified by imm8.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSRLW xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  D1 /r
VPSRLW xmm1, xmm2, imm8            C4   RXB.01          X.dest.0.01  71 /2 ib
VPSRLW ymm1, ymm2, xmm3/mem128     C4   RXB.01          X.src1.1.01  D1 /r
VPSRLW ymm1, ymm2, imm8            C4   RXB.01          X.dest.1.01  71 /2 ib
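To make the VEX field layout in the encoding table concrete, the C sketch below assembles the three-byte (C4) form for register operands only (ModRM.mod = 11b). It is an illustration under stated assumptions, not an assembler: `vex3_reg` is a hypothetical helper, VEX.X is fixed at 1 (no index register), W is written as 0 because the table marks it as ignored (X), and R, B, and vvvv are stored inverted as the VEX format requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for a three-byte (C4) VEX instruction whose ModRM
 * r/m operand is a register (mod = 11b). Registers 0-15 are accepted.
 * Returns the number of bytes written to out. */
size_t vex3_reg(uint8_t out[6], uint8_t map_select, uint8_t vvvv,
                uint8_t L, uint8_t pp, uint8_t opcode,
                uint8_t modrm_reg, uint8_t modrm_rm,
                int has_imm, uint8_t imm8)
{
    size_t n = 0;
    uint8_t R = (modrm_reg >> 3) & 1, B = (modrm_rm >> 3) & 1;

    out[n++] = 0xC4;
    /* Byte 1: inverted R, X, B, then the 5-bit map_select. X is unused (1). */
    out[n++] = (uint8_t)(((!R) << 7) | (1 << 6) | ((!B) << 5) | (map_select & 0x1F));
    /* Byte 2: W (ignored here, written as 0), inverted vvvv, L, pp. */
    out[n++] = (uint8_t)((((~vvvv) & 0xF) << 3) | ((L & 1) << 2) | (pp & 3));
    out[n++] = opcode;
    /* ModRM: mod = 11b, reg = register or /digit extension, rm = register. */
    out[n++] = (uint8_t)(0xC0 | ((modrm_reg & 7) << 3) | (modrm_rm & 7));
    if (has_imm)
        out[n++] = imm8;
    return n;
}
```

Under these assumptions, VPSRLW ymm1, ymm2, xmm3 encodes as C4 E1 6D D1 CB, and VPSRLW ymm1, ymm2, 4 as C4 E1 75 71 D2 04. In the immediate form, ModRM.reg carries the /2 opcode extension and VEX.vvvv names the destination rather than a source.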

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD,
(V)PSRLDQ, (V)PSRLQ, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S

S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
A
A
A
S

Alignment check, #AC
A
Page fault, #PF
X — AVX, AVX2, and SSE exception
A — AVX and AVX2 exception
S — SSE exception


A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.
PSUBB
VPSUBB

Packed Subtract
Bytes

Subtracts 16 or 32 packed 8-bit integer values in the second source operand from the corresponding
values in the first source operand and writes the integer differences to the corresponding bytes of the
destination.
This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each
result are written to the destination.
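A minimal scalar C model of the 128-bit form, for illustration only (`psubb_ref` is a hypothetical name): unsigned 8-bit subtraction that keeps only the low 8 bits produces the same bit pattern whether the bytes are read as signed or unsigned.

```c
#include <stdint.h>

/* Hypothetical scalar model of 128-bit PSUBB: dst[i] = a[i] - b[i] mod 256. */
void psubb_ref(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (uint8_t)(a[i] - b[i]); /* keep only the low 8 bits */
}
```

For example, 00h minus 01h produces FFh, which reads as 255 unsigned or -1 signed, and 80h minus 01h produces 7Fh (signed -128 - 1 wraps to +127).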
There are legacy and extended forms of the instruction:
PSUBB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPSUBB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PSUBB             SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBB 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBB 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode         Description
PSUBB xmm1, xmm2/mem128    66 0F F8 /r    Subtracts packed 8-bit integer values in xmm2 or mem128 from
                                          corresponding values in xmm1. Writes the differences to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSUBB xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  F8 /r
VPSUBB ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  F8 /r
Related Instructions
(V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
PSUBD
VPSUBD

Packed Subtract
Doublewords

Subtracts four or eight packed 32-bit integer values in the second source operand from the corresponding values in the first source operand and writes the integer differences to the corresponding
doubleword of the destination.
This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each
result are written to the destination.
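The wraparound described above can be sketched in C with a hypothetical `psubd_ref`; `n` is assumed to be 4 for the 128-bit forms and 8 for the 256-bit VPSUBD form. Unsigned 32-bit subtraction in C wraps modulo 2^32, which matches discarding the carry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of (V)PSUBD over n doublewords (4 or 8). */
void psubd_ref(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] - b[i]; /* unsigned arithmetic wraps modulo 2^32 */
}
```

Reading a result as `int32_t` gives the signed difference: 5 - 7 yields FFFFFFFEh, which is -2 in two's complement.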
There are legacy and extended forms of the instruction:
PSUBD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPSUBD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PSUBD             SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBD 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBD 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode         Description
PSUBD xmm1, xmm2/mem128    66 0F FA /r    Subtracts packed 32-bit integer values in xmm2 or mem128 from
                                          corresponding values in xmm1. Writes the differences to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSUBD xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  FA /r
VPSUBD ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  FA /r
Related Instructions
(V)PSUBB, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
PSUBQ
VPSUBQ

Packed Subtract
Quadword

Subtracts two or four packed 64-bit integer values in the second source operand from the corresponding values in the first source operand and writes the differences to the corresponding quadword of the
destination.
This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each
result are written to the destination.
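Each quadword is subtracted independently, so a borrow out of one lane never reaches the next; PSUBQ is therefore not a 128-bit subtraction. The C sketch below contrasts the two, using the hypothetical names `psubq_ref` and `sub128` and modeling each 128-bit value as two 64-bit halves (index 0 is the low quadword).

```c
#include <stdint.h>

/* Hypothetical model of 128-bit PSUBQ: two independent 64-bit lanes. */
void psubq_ref(uint64_t dst[2], const uint64_t a[2], const uint64_t b[2])
{
    dst[0] = a[0] - b[0]; /* borrow out of lane 0 is discarded ... */
    dst[1] = a[1] - b[1]; /* ... and never reaches lane 1 */
}

/* For contrast: a full 128-bit subtraction that propagates the borrow. */
void sub128(uint64_t dst[2], const uint64_t a[2], const uint64_t b[2])
{
    uint64_t borrow = a[0] < b[0]; /* computed before dst is written */
    dst[0] = a[0] - b[0];
    dst[1] = a[1] - b[1] - borrow;
}
```

With a = {0, 1} and b = {1, 0}, PSUBQ leaves the high quadword at 1, whereas the 128-bit subtraction borrows and clears it.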
There are legacy and extended forms of the instruction:
PSUBQ

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPSUBQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PSUBQ             SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBQ 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBQ 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode         Description
PSUBQ xmm1, xmm2/mem128    66 0F FB /r    Subtracts packed 64-bit integer values in xmm2 or mem128 from
                                          corresponding values in xmm1. Writes the differences to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSUBQ xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  FB /r
VPSUBQ ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  FB /r
Related Instructions
(V)PSUBB, (V)PSUBD, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
PSUBSB
VPSUBSB

Packed Subtract Signed With Saturation
Bytes

Subtracts 16 or 32 packed 8-bit signed integer values in the second source operand from the corresponding values in the first source operand and writes the signed integer differences to the corresponding bytes of the destination.
If a difference is greater than the largest signed 8-bit integer (7Fh, or 127), it is saturated to 7Fh; if it is
less than the smallest signed 8-bit integer (80h, or -128), it is saturated to 80h.
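A scalar C sketch of the saturation rule, with the hypothetical names `psubsb_ref` and `sat_s8`: the exact difference is computed in `int`, where it cannot overflow, and then clamped to the signed 8-bit range.

```c
#include <stdint.h>

/* Clamp an exact difference to the signed 8-bit range. */
static int8_t sat_s8(int v)
{
    if (v > INT8_MAX) return INT8_MAX; /* 7Fh */
    if (v < INT8_MIN) return INT8_MIN; /* 80h */
    return (int8_t)v;
}

/* Hypothetical scalar model of 128-bit PSUBSB. */
void psubsb_ref(int8_t dst[16], const int8_t a[16], const int8_t b[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = sat_s8((int)a[i] - (int)b[i]); /* exact difference, then clamp */
}
```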
There are legacy and extended forms of the instruction:
PSUBSB

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPSUBSB

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PSUBSB            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBSB 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBSB 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode         Description
PSUBSB xmm1, xmm2/mem128   66 0F E8 /r    Subtracts packed 8-bit signed integer values in xmm2 or mem128 from
                                          corresponding values in xmm1. Writes the differences to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSUBSB xmm1, xmm2, xmm3/mem128    C4   RXB.01          X.src1.0.01  E8 /r
VPSUBSB ymm1, ymm2, ymm3/mem256    C4   RXB.01          X.src1.1.01  E8 /r
Related Instructions
(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
PSUBSW
VPSUBSW

Packed Subtract Signed With Saturation
Words

Subtracts eight or sixteen packed 16-bit signed integer values in the second source operand from the
corresponding values in the first source operand and writes the signed integer differences to the corresponding word of the destination.
Differences greater than 7FFFh (32767) are saturated to 7FFFh; differences less than 8000h (-32768,
read as a signed value) are saturated to 8000h.
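The saturation rule above can be modeled in C by widening each difference to 32 bits, where it cannot overflow, and clamping it to the signed 16-bit range. This is a sketch with a hypothetical `psubsw_ref`; `n` is assumed to be 8 for the 128-bit forms and 16 for the 256-bit VPSUBSW form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of (V)PSUBSW over n signed words (8 or 16). */
void psubsw_ref(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i]; /* exact in 32 bits */
        if (d > INT16_MAX) d = INT16_MAX;          /* saturate to 7FFFh */
        if (d < INT16_MIN) d = INT16_MIN;          /* saturate to 8000h */
        dst[i] = (int16_t)d;
    }
}
```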
There are legacy and extended forms of the instruction:
PSUBSW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPSUBSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PSUBSW            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBSW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBSW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode         Description
PSUBSW xmm1, xmm2/mem128   66 0F E9 /r    Subtracts packed 16-bit signed integer values in xmm2 or mem128 from
                                          corresponding values in xmm1. Writes the differences to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSUBSW xmm1, xmm2, xmm3/mem128    C4   RXB.01          X.src1.0.01  E9 /r
VPSUBSW ymm1, ymm2, ymm3/mem256    C4   RXB.01          X.src1.1.01  E9 /r
Related Instructions
(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
PSUBUSB
VPSUBUSB

Packed Subtract Unsigned With Saturation
Bytes

Subtracts 16 or 32 packed 8-bit unsigned integer values in the second source operand from the corresponding values in the first source operand and writes the unsigned integer differences to the corresponding bytes of the destination.
Differences less than 00h (that is, negative differences) are saturated to 00h.
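Because negative differences clamp to 00h, ORing the results of PSUBUSB(a, b) and PSUBUSB(b, a) gives the per-byte absolute difference |a - b|, a common use of this instruction. The scalar C sketch below models both; `psubusb_ref` and `absdiff_u8` are hypothetical names, not compiler intrinsics.

```c
#include <stdint.h>

/* Hypothetical scalar model of 128-bit PSUBUSB. */
void psubusb_ref(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (a[i] > b[i]) ? (uint8_t)(a[i] - b[i]) : 0; /* floor at 00h */
}

/* |a - b| per byte, built from two saturating subtractions and an OR. */
void absdiff_u8(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    uint8_t t0[16], t1[16];
    psubusb_ref(t0, a, b);
    psubusb_ref(t1, b, a);
    for (int i = 0; i < 16; i++)
        dst[i] = t0[i] | t1[i]; /* at most one of t0[i], t1[i] is nonzero */
}
```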
There are legacy and extended forms of the instruction:
PSUBUSB
The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPSUBUSB
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PSUBUSB           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBUSB 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBUSB 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode         Description
PSUBUSB xmm1, xmm2/mem128  66 0F D8 /r    Subtracts packed byte unsigned integer values in xmm2 or mem128 from
                                          corresponding values in xmm1. Writes the differences to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSUBUSB xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  D8 /r
VPSUBUSB ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  D8 /r
Related Instructions
(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSW, (V)PSUBW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
PSUBUSW
VPSUBUSW

Packed Subtract Unsigned With Saturation
Words

Subtracts eight or sixteen packed 16-bit unsigned integer values in the second source operand from the
corresponding values in the first source operand and writes the unsigned integer differences to the
corresponding words of the destination.
Differences less than 0000h (that is, negative differences) are saturated to 0000h.
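A scalar C sketch of the 0000h floor, using a hypothetical `psubusw_ref`; `n` is assumed to be 8 for the 128-bit forms and 16 for the 256-bit VPSUBUSW form. When the second value is larger, the result is 0000h instead of a wrapped value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of (V)PSUBUSW over n unsigned words (8 or 16). */
void psubusw_ref(uint16_t *dst, const uint16_t *a, const uint16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (a[i] >= b[i]) ? (uint16_t)(a[i] - b[i]) : 0; /* floor at 0000h */
}
```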
There are legacy and extended forms of the instruction:
PSUBUSW

The first source operand is an XMM register and the second source operand is an XMM register or
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of
the YMM register that corresponds to the destination are not affected.
VPSUBUSW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PSUBUSW           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBUSW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBUSW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode         Description
PSUBUSW xmm1, xmm2/mem128  66 0F D9 /r    Subtracts packed 16-bit unsigned integer values in xmm2 or mem128 from
                                          corresponding values in xmm1. Writes the differences to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSUBUSW xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  D9 /r
VPSUBUSW ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  D9 /r
Related Instructions
(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
PSUBW
VPSUBW

Packed Subtract
Words

Subtracts eight or sixteen packed 16-bit integer values in the second source operand from the corresponding values in the first source operand and writes the integer differences to the corresponding
word of the destination.
This instruction operates on both signed and unsigned integers. When a result overflows, the carry is
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of each
result are written to the destination.
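PSUBW keeps the low 16 bits of each difference, while the saturating variants listed under Related Instructions, (V)PSUBSW and (V)PSUBUSW, clamp instead. The one-lane C sketch below compares the three on the same inputs; the function names are hypothetical and each models a single word lane.

```c
#include <stdint.h>

/* PSUBW lane: wraparound, low 16 bits kept. */
uint16_t sub_wrap16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a - b);
}

/* PSUBSW lane: signed saturation to [8000h, 7FFFh]. */
int16_t sub_sat_s16(int16_t a, int16_t b)
{
    int32_t d = (int32_t)a - b;
    return (int16_t)(d > INT16_MAX ? INT16_MAX : d < INT16_MIN ? INT16_MIN : d);
}

/* PSUBUSW lane: unsigned saturation with a 0000h floor. */
uint16_t sub_sat_u16(uint16_t a, uint16_t b)
{
    return a > b ? (uint16_t)(a - b) : 0;
}
```

With inputs 8000h and 0001h, PSUBW and PSUBUSW both give 7FFFh, while PSUBSW reads 8000h as -32768 and saturates at 8000h. With 0000h and 0001h, PSUBW gives FFFFh, PSUBUSW gives 0000h, and PSUBSW gives FFFFh (-1).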
There are legacy and extended forms of the instruction:
PSUBW

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VPSUBW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form              Subset  Feature Flag
PSUBW             SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBW 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBW 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode         Description
PSUBW xmm1, xmm2/mem128    66 0F F9 /r    Subtracts packed 16-bit integer values in xmm2 or mem128 from
                                          corresponding values in xmm1. Writes the differences to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSUBW xmm1, xmm2, xmm3/mem128     C4   RXB.01          X.src1.0.01  F9 /r
VPSUBW ymm1, ymm2, ymm3/mem256     C4   RXB.01          X.src1.1.01  F9 /r
Related Instructions
(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
PTEST
VPTEST

Packed Bit Test

First, performs a bitwise AND of the first source operand with the second source operand.
Sets rFLAGS.ZF if every bit of the result is 0; otherwise, clears ZF.
Second, performs a bitwise AND of the second source operand with the logical complement (NOT)
of the first source operand. Sets rFLAGS.CF if every bit of the result is 0; otherwise, clears CF.
Neither source operand is modified.
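A scalar C sketch of the two flag computations for the 128-bit form, with each operand modeled as two 64-bit halves; `ptest_ref` and `struct ptest_flags` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct ptest_flags { bool zf, cf; };

/* Hypothetical model of 128-bit PTEST; neither operand is modified. */
struct ptest_flags ptest_ref(const uint64_t src1[2], const uint64_t src2[2])
{
    struct ptest_flags f;
    /* ZF: src1 AND src2 is all zeros. */
    f.zf = ((src1[0] & src2[0]) | (src1[1] & src2[1])) == 0;
    /* CF: (NOT src1) AND src2 is all zeros. */
    f.cf = ((~src1[0] & src2[0]) | (~src1[1] & src2[1])) == 0;
    return f;
}
```

Read this way, ZF = 1 means the two operands have no set bits in common, and CF = 1 means every bit set in the second operand is also set in the first.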
There are legacy and extended forms of the instruction:
PTEST

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location.
VPTEST

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location.
YMM Encoding

The first source operand is a YMM register. The second source operand is a YMM register or 256-bit
memory location.
Instruction Support
Form              Subset  Feature Flag
PTEST             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPTEST            AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode           Description
PTEST xmm1, xmm2/mem128     66 0F 38 17 /r   Set ZF if bitwise AND of xmm2/m128 with xmm1 = 0; else, clear ZF.
                                             Set CF if bitwise AND of xmm2/m128 with NOT xmm1 = 0; else, clear CF.

Mnemonic                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPTEST xmm1, xmm2/mem128    C4    RXB.00010        X.1111.0.01   17 /r
VPTEST ymm1, ymm2/mem256    C4    RXB.00010        X.1111.1.01   17 /r
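To make the VEX field notation concrete, the hypothetical Python helper below assembles the register-to-register form of VPTEST from the fields in the encoding table. It assumes the usual VEX conventions: VEX.R and VEX.B are stored inverted and extend ModRM.reg and ModRM.rm, the ModRM byte uses mod = 11b, and the W bit (shown as X, don't care) is encoded as 0. Memory operands are not handled, and `encode_vptest` is an illustrative name.

```python
def encode_vptest(reg: int, rm: int, l256: bool = False) -> bytes:
    """Hypothetical encoder for VPTEST xmm/ymm{reg}, xmm/ymm{rm} (registers only)."""
    if not (0 <= reg < 16 and 0 <= rm < 16):
        raise ValueError("register numbers must be in 0..15")
    r_bar = 0 if reg & 8 else 1    # VEX.R, stored inverted
    x_bar = 1                      # VEX.X, unused without an index register
    b_bar = 0 if rm & 8 else 1     # VEX.B, stored inverted
    map_select = 0b00010           # RXB.00010: the 0F 38 opcode map
    byte1 = (r_bar << 7) | (x_bar << 6) | (b_bar << 5) | map_select
    w = 0                          # don't care for VPTEST; encoded as 0 here
    vvvv = 0b1111                  # must be 1111b, otherwise #UD
    pp = 0b01                      # implied 66 prefix
    byte2 = (w << 7) | (vvvv << 3) | (int(l256) << 2) | pp
    modrm = 0b11000000 | ((reg & 7) << 3) | (rm & 7)   # mod = 11b: register operand
    return bytes([0xC4, byte1, byte2, 0x17, modrm])


print(encode_vptest(0, 1).hex())              # c4e27917c1 (VPTEST xmm0, xmm1)
print(encode_vptest(0, 1, l256=True).hex())   # c4e27d17c1 (VPTEST ymm0, ymm1)
```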

Related Instructions
VTESTPD, VTESTPS

rFLAGS Affected
Flag:  ID    VIP   VIF   AC    VM    RF    NT    IOPL  OF    DF    IF    TF    SF    ZF    AF    PF    CF
Bit:   21    20    19    18    17    16    14    13:12 11    10    9     8     7     6     4     2     0
Value:                                                 0                       0     M     0     0     M
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. Undefined
flags are U.

MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


PUNPCKHBW
VPUNPCKHBW


Unpack and Interleave
High Bytes

Unpacks the 8 high-order bytes of each octword of the first and second source operands and interleaves
the bytes as they are copied to the destination. The low-order bytes of each octword of the source
operands are ignored.
Bytes are interleaved in ascending order, starting from the least-significant byte of the upper 8 bytes of each
octword of the source operands, with bytes from the first source operand occupying the lower byte of
each pair copied to the destination.
For the 128-bit form of the instruction, the following operations are performed:
dest[7:0] = src1[71:64]
dest[15:8] = src2[71:64]
dest[23:16] = src1[79:72]
dest[31:24] = src2[79:72]
dest[39:32] = src1[87:80]
dest[47:40] = src2[87:80]
dest[55:48] = src1[95:88]
dest[63:56] = src2[95:88]
dest[71:64] = src1[103:96]
dest[79:72] = src2[103:96]
dest[87:80] = src1[111:104]
dest[95:88] = src2[111:104]
dest[103:96] = src1[119:112]
dest[111:104] = src2[119:112]
dest[119:112] = src1[127:120]
dest[127:120] = src2[127:120]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[135:128] = src1[199:192]
dest[143:136] = src2[199:192]
dest[151:144] = src1[207:200]
dest[159:152] = src2[207:200]
dest[167:160] = src1[215:208]
dest[175:168] = src2[215:208]
dest[183:176] = src1[223:216]
dest[191:184] = src2[223:216]
dest[199:192] = src1[231:224]
dest[207:200] = src2[231:224]
dest[215:208] = src1[239:232]
dest[223:216] = src2[239:232]
dest[231:224] = src1[247:240]
dest[239:232] = src2[247:240]
dest[247:240] = src1[255:248]
dest[255:248] = src2[255:248]

When the second source operand is all 0s, the destination effectively receives the 8 high-order bytes
of the first source operand (for the 256-bit form, the 8 high-order bytes of each octword of the first source
operand), each zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned
16-bit operands for subsequent processing that requires higher precision.
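The byte assignments and the zero-extension effect can be checked with a small reference model. This hypothetical Python sketch represents each operand as a list of byte values with element 0 holding bits 7:0; `punpckhbw` and `bytes_to_words` are illustrative names.

```python
def punpckhbw(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical model of (V)PUNPCKHBW on byte lists (element 0 = bits 7:0)."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 bytes (XMM) or 32 bytes (YMM)")
    dest: list[int] = []
    for lane in range(0, len(src1), 16):   # each octword is unpacked independently
        for i in range(8, 16):             # only the high 8 bytes of the octword are used
            dest += [src1[lane + i], src2[lane + i]]   # src1 byte takes the lower position
    return dest


def bytes_to_words(b: list[int]) -> list[int]:
    """Reinterpret consecutive little-endian byte pairs as 16-bit words."""
    return [b[k] | (b[k + 1] << 8) for k in range(0, len(b), 2)]


print(punpckhbw(list(range(16)), list(range(100, 116)))[:4])   # [8, 108, 9, 109]
# With an all-zero second operand, the high bytes come out zero-extended:
print(bytes_to_words(punpckhbw(list(range(200, 216)), [0] * 16)))
# [208, 209, 210, 211, 212, 213, 214, 215]
```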

There are legacy and extended forms of the instruction:
PUNPCKHBW

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination register. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPUNPCKHBW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PUNPCKHBW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKHBW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKHBW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
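The support table, read together with the #UD conditions listed under Exceptions, suggests the checks software might make before picking a form. The hypothetical Python sketch below takes raw register values as inputs (obtaining them with CPUID and XGETBV is outside its scope). It assumes OSXSAVE is CPUID Fn0000_0001_ECX bit 27 and that the `xcr0` value passed in is XFEATURE_ENABLED_MASK; CR0.EM and CR4.OSFXSR are operating-system settings and are not modeled.

```python
def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return (value >> n) & 1 == 1


def usable_punpck_forms(fn1_ecx: int, fn1_edx: int, fn7_ebx: int, xcr0: int) -> set[str]:
    """Hypothetical check of which (V)PUNPCKxxx forms avoid #UD.

    fn1_ecx, fn1_edx: CPUID Fn0000_0001 ECX and EDX.
    fn7_ebx: CPUID Fn0000_0007 EBX with ECX = 0.
    xcr0: XFEATURE_ENABLED_MASK.
    """
    forms: set[str] = set()
    if bit(fn1_edx, 26):                                     # SSE2
        forms.add("legacy 128-bit")
    # VEX forms need OSXSAVE and XFEATURE_ENABLED_MASK[2:1] == 11b.
    os_enabled = bit(fn1_ecx, 27) and (xcr0 & 0b110) == 0b110
    if os_enabled and bit(fn1_ecx, 28):                      # AVX
        forms.add("VEX 128-bit")
        if bit(fn7_ebx, 5):                                  # AVX2; else VEX.L = 1 raises #UD
            forms.add("VEX 256-bit")
    return forms


print(sorted(usable_punpck_forms((1 << 28) | (1 << 27), 1 << 26, 0, 0b111)))
# ['VEX 128-bit', 'legacy 128-bit']
```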
Instruction Encoding
Mnemonic                              Opcode        Description
PUNPCKHBW xmm1, xmm2/mem128           66 0F 68 /r   Unpacks and interleaves the high-order bytes of xmm1 and xmm2 or mem128.
                                                    Writes the bytes to xmm1.

Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKHBW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   68 /r
VPUNPCKHBW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   68 /r
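Unlike VPTEST, whose VEX.vvvv field must be 1111b, these three-operand forms carry the first source register in VEX.vvvv (shown as src1 in the table), stored in one's-complement form. The hypothetical Python helper below builds the C4 encoding for the all-register form of VPUNPCKHBW. It assumes `RXB.01` denotes map_select = 00001b (the 0F opcode map) and encodes the don't-care W bit as 0. Assemblers usually emit the equivalent two-byte C5 prefix when it can express the operands, so an assembler's output may differ from these bytes while decoding to the same instruction.

```python
def encode_vpunpckhbw(dest: int, src1: int, src2: int, l256: bool = False) -> bytes:
    """Hypothetical encoder for VPUNPCKHBW dest, src1, src2 (all registers)."""
    for r in (dest, src1, src2):
        if not 0 <= r < 16:
            raise ValueError("register numbers must be in 0..15")
    r_bar = 0 if dest & 8 else 1      # VEX.R (inverted) extends ModRM.reg (destination)
    x_bar = 1                         # VEX.X (inverted) is unused without an index register
    b_bar = 0 if src2 & 8 else 1      # VEX.B (inverted) extends ModRM.rm (second source)
    byte1 = (r_bar << 7) | (x_bar << 6) | (b_bar << 5) | 0b00001   # 0F opcode map
    w = 0                             # don't care; encoded as 0 here
    vvvv = ~src1 & 0b1111             # first source, stored in one's-complement form
    pp = 0b01                         # implied 66 prefix
    byte2 = (w << 7) | (vvvv << 3) | (int(l256) << 2) | pp
    modrm = 0b11000000 | ((dest & 7) << 3) | (src2 & 7)
    return bytes([0xC4, byte1, byte2, 0x68, modrm])


print(encode_vpunpckhbw(0, 1, 2).hex())              # c4e17168c2 (xmm0, xmm1, xmm2)
print(encode_vpunpckhbw(0, 1, 2, l256=True).hex())   # c4e17568c2 (ymm0, ymm1, ymm2)
```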

Related Instructions
(V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ,
(V)PUNPCKLQDQ, (V)PUNPCKLWD
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PUNPCKHDQ
VPUNPCKHDQ

Unpack and Interleave
High Doublewords

Unpacks the two high-order doublewords of each octword of the first and second source operands and
interleaves the doublewords as they are copied to the destination. The low-order doublewords of each
octword of the source operands are ignored.
Doublewords are interleaved in ascending order from the least-significant doubleword of the high
quadword of each octword with doublewords from the first source operand occupying the lower doubleword of each pair copied to the destination.
For the 128-bit form of the instruction, the following operations are performed:
dest[31:0] = src1[95:64]
dest[63:32] = src2[95:64]
dest[95:64] = src1[127:96]
dest[127:96] = src2[127:96]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[159:128] = src1[223:192]
dest[191:160] = src2[223:192]
dest[223:192] = src1[255:224]
dest[255:224] = src2[255:224]

When the second source operand is all 0s, the destination effectively receives the 2 high-order doublewords
of the first source operand (for the 256-bit form, the 2 high-order doublewords of each octword of the
first source operand), each zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit
values to unsigned 64-bit operands for subsequent processing that requires higher precision.
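The following hypothetical Python sketch mirrors the dest[hi:lo] = src[hi:lo] notation directly, treating each operand as an unsigned integer; `field` and `punpckhdq` are illustrative names. The second call passes 0 as the second source to show the zero-extension.

```python
def field(value: int, hi: int, lo: int) -> int:
    """Return value[hi:lo], using the manual's inclusive bit-range notation."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def punpckhdq(src1: int, src2: int, width: int = 128) -> int:
    """Hypothetical model of (V)PUNPCKHDQ for width = 128 (XMM) or 256 (YMM)."""
    if width not in (128, 256):
        raise ValueError("width must be 128 or 256")
    dest = 0
    for base in range(0, width, 128):                            # one pass per octword
        dest |= field(src1, base + 95, base + 64) << (base + 0)
        dest |= field(src2, base + 95, base + 64) << (base + 32)
        dest |= field(src1, base + 127, base + 96) << (base + 64)
        dest |= field(src2, base + 127, base + 96) << (base + 96)
    return dest


a = 0x44444444_33333333_22222222_11111111
b = 0xDDDDDDDD_CCCCCCCC_BBBBBBBB_AAAAAAAA
print(hex(punpckhdq(a, b)))   # 0xdddddddd44444444cccccccc33333333
print(hex(punpckhdq(a, 0)))   # 0x444444440000000033333333 (two zero-extended quadwords)
```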
There are legacy and extended forms of the instruction:
PUNPCKHDQ

The first source operand is an XMM register and the second source operand is an XMM register or
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of
the YMM register that corresponds to the destination are not affected.
VPUNPCKHDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.


Instruction Support
Form                  Subset   Feature Flag
PUNPCKHDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKHDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKHDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode        Description
PUNPCKHDQ xmm1, xmm2/mem128           66 0F 6A /r   Unpacks and interleaves the high-order doublewords of xmm1 and xmm2 or mem128.
                                                    Writes the doublewords to xmm1.

Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKHDQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   6A /r
VPUNPCKHDQ ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   6A /r

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ,
(V)PUNPCKLQDQ, (V)PUNPCKLWD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PUNPCKHQDQ
VPUNPCKHQDQ

Unpack and Interleave
High Quadwords

Unpacks the high-order quadword of each octword of the first and second source operands and interleaves the quadwords as they are copied to the destination. The low-order quadword of each octword
of the source operands is ignored.
Quadwords are interleaved so that the high-order quadword of each octword of the first source operand
occupies the lower quadword of the corresponding destination octword, and the high-order quadword of
the corresponding octword of the second source operand occupies the upper quadword.
For the 128-bit form of the instruction, the following operations are performed:
dest[63:0] = src1[127:64]
dest[127:64] = src2[127:64]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[191:128] = src1[255:192]
dest[255:192] = src2[255:192]

When the second source operand is all 0s, the destination effectively receives the high-order quadword of
the first source operand (for the 256-bit form, the high-order quadword of each octword of the first
source operand), zero-extended to 128 bits. This operation is useful for expanding unsigned 64-bit values to unsigned 128-bit operands for subsequent processing that requires higher precision.
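Because each octword contributes one quadword per source, the model is short. In this hypothetical Python sketch (illustrative names, operands as unsigned integers), passing the same value as both sources copies its high quadword into both halves of the octword, and passing 0 as the second source leaves the zero-extended high quadword.

```python
MASK64 = (1 << 64) - 1


def punpckhqdq(src1: int, src2: int, width: int = 128) -> int:
    """Hypothetical model of (V)PUNPCKHQDQ on unsigned integers."""
    if width not in (128, 256):
        raise ValueError("width must be 128 or 256")
    dest = 0
    for base in range(0, width, 128):               # one pass per octword
        high1 = (src1 >> (base + 64)) & MASK64      # src1[base+127:base+64]
        high2 = (src2 >> (base + 64)) & MASK64      # src2[base+127:base+64]
        dest |= high1 << base                       # lower quadword of the octword
        dest |= high2 << (base + 64)                # upper quadword of the octword
    return dest


x = (0xAAAA << 64) | 0x1111     # high quadword 0xAAAA, low quadword 0x1111
y = (0xBBBB << 64) | 0x2222
print(hex(punpckhqdq(x, y)))    # 0xbbbb000000000000aaaa
print(hex(punpckhqdq(x, x)))    # 0xaaaa000000000000aaaa
print(hex(punpckhqdq(x, 0)))    # 0xaaaa
```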
There are legacy and extended forms of the instruction:
PUNPCKHQDQ

The first source operand is an XMM register and the second source operand is an XMM register or
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of
the YMM register that corresponds to the destination are not affected.
VPUNPCKHQDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                   Subset   Feature Flag
PUNPCKHQDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKHQDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKHQDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                               Opcode        Description
PUNPCKHQDQ xmm1, xmm2/mem128           66 0F 6D /r   Unpacks and interleaves the high-order quadwords of xmm1 and xmm2 or mem128.
                                                     Writes the quadwords to xmm1.

Mnemonic                               VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKHQDQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   6D /r
VPUNPCKHQDQ ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   6D /r

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ,
(V)PUNPCKLQDQ, (V)PUNPCKLWD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PUNPCKHWD
VPUNPCKHWD

Unpack and Interleave
High Words

Unpacks the 4 high-order words of each octword of the first and second source operands and interleaves the words as they are copied to the destination. The low-order words of each octword of the
source operands are ignored.
Words are interleaved in ascending order from the least-significant word of the high quadword of
each octword with words from the first source operand occupying the lower word of each pair copied
to the destination.
For the 128-bit form of the instruction, the following operations are performed:
dest[15:0] = src1[79:64]
dest[31:16] = src2[79:64]
dest[47:32] = src1[95:80]
dest[63:48] = src2[95:80]
dest[79:64] = src1[111:96]
dest[95:80] = src2[111:96]
dest[111:96] = src1[127:112]
dest[127:112] = src2[127:112]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[143:128] = src1[207:192]
dest[159:144] = src2[207:192]
dest[175:160] = src1[223:208]
dest[191:176] = src2[223:208]
dest[207:192] = src1[239:224]
dest[223:208] = src2[239:224]
dest[239:224] = src1[255:240]
dest[255:240] = src2[255:240]

When the second source operand is all 0s, the destination effectively receives the 4 high-order words
of the first source operand (for the 256-bit form, the 4 high-order words of each octword of the first source operand),
each zero-extended to 32 bits. This operation is useful for expanding unsigned 16-bit values to
unsigned 32-bit operands for subsequent processing that requires higher precision.
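Memory operands are little-endian byte sequences, so a model over raw bytes is a useful cross-check. This hypothetical Python sketch uses the standard `struct` module to view each 16-byte octword as eight unsigned words, then applies the assignments above; reading the zero-operand result back as doublewords shows the zero-extension. `punpckhwd` is an illustrative name.

```python
import struct


def punpckhwd(src1: bytes, src2: bytes) -> bytes:
    """Hypothetical model of (V)PUNPCKHWD on 16-byte (XMM) or 32-byte (YMM) buffers."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or 32 bytes long")
    out = bytearray()
    for lane in range(0, len(src1), 16):              # each octword independently
        w1 = struct.unpack_from("<8H", src1, lane)     # eight little-endian words
        w2 = struct.unpack_from("<8H", src2, lane)
        for i in range(4, 8):                         # high four words only
            out += struct.pack("<HH", w1[i], w2[i])   # src1 word in the lower position
    return bytes(out)


a = struct.pack("<8H", *range(1000, 1008))
b = struct.pack("<8H", *range(2000, 2008))
print(struct.unpack("<8H", punpckhwd(a, b)))           # (1004, 2004, 1005, 2005, 1006, 2006, 1007, 2007)
print(struct.unpack("<4I", punpckhwd(a, bytes(16))))   # (1004, 1005, 1006, 1007)
```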
There are legacy and extended forms of the instruction:
PUNPCKHWD

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination register. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPUNPCKHWD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.


YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PUNPCKHWD             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKHWD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKHWD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode        Description
PUNPCKHWD xmm1, xmm2/mem128           66 0F 69 /r   Unpacks and interleaves the high-order words of xmm1 and xmm2 or mem128.
                                                    Writes the words to xmm1.

Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKHWD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   69 /r
VPUNPCKHWD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   69 /r

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKLBW, (V)PUNPCKLDQ,
(V)PUNPCKLQDQ, (V)PUNPCKLWD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PUNPCKLBW
VPUNPCKLBW


Unpack and Interleave
Low Bytes

Unpacks the 8 low-order bytes of each octword of the first and second source operands and interleaves the bytes as they are copied to the destination. The high-order bytes of each octword of the source operands are
ignored.
Bytes are interleaved in ascending order, starting from the least-significant byte of each octword of the source operands, with bytes
from the first source operand occupying the lower byte of each pair copied to the destination.
For the 128-bit form of the instruction, the following operations are performed:
dest[7:0] = src1[7:0]
dest[15:8] = src2[7:0]
dest[23:16] = src1[15:8]
dest[31:24] = src2[15:8]
dest[39:32] = src1[23:16]
dest[47:40] = src2[23:16]
dest[55:48] = src1[31:24]
dest[63:56] = src2[31:24]
dest[71:64] = src1[39:32]
dest[79:72] = src2[39:32]
dest[87:80] = src1[47:40]
dest[95:88] = src2[47:40]
dest[103:96] = src1[55:48]
dest[111:104] = src2[55:48]
dest[119:112] = src1[63:56]
dest[127:120] = src2[63:56]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[135:128] = src1[135:128]
dest[143:136] = src2[135:128]
dest[151:144] = src1[143:136]
dest[159:152] = src2[143:136]
dest[167:160] = src1[151:144]
dest[175:168] = src2[151:144]
dest[183:176] = src1[159:152]
dest[191:184] = src2[159:152]
dest[199:192] = src1[167:160]
dest[207:200] = src2[167:160]
dest[215:208] = src1[175:168]
dest[223:216] = src2[175:168]
dest[231:224] = src1[183:176]
dest[239:232] = src2[183:176]
dest[247:240] = src1[191:184]
dest[255:248] = src2[191:184]

When the second source operand is all 0s, the destination effectively receives the eight low-order
bytes of the first source operand (for the 256-bit form, the eight low-order bytes of each octword of the first source
operand), each zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to
unsigned 16-bit operands for subsequent processing that requires higher precision.
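A common use of the zero-operand form is widening unsigned 8-bit samples before 16-bit arithmetic. This hypothetical Python sketch models (V)PUNPCKLBW on byte lists (element 0 holds bits 7:0) and reads each resulting byte pair back as a little-endian word; `punpcklbw` is an illustrative name and the sample values are arbitrary.

```python
def punpcklbw(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical model of (V)PUNPCKLBW on byte lists (element 0 = bits 7:0)."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 bytes (XMM) or 32 bytes (YMM)")
    dest: list[int] = []
    for lane in range(0, len(src1), 16):   # the 256-bit form repeats per octword
        for i in range(8):                 # only the low 8 bytes of the octword are used
            dest += [src1[lane + i], src2[lane + i]]   # src1 byte takes the lower position
    return dest


samples = [250, 3, 128, 7, 0, 255, 64, 9] + [0] * 8   # unsigned 8-bit values in the low bytes
wide = punpcklbw(samples, [0] * 16)
words = [wide[k] | (wide[k + 1] << 8) for k in range(0, 16, 2)]
print(words)   # [250, 3, 128, 7, 0, 255, 64, 9]
```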


There are legacy and extended forms of the instruction:
PUNPCKLBW

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination register. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPUNPCKLBW

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PUNPCKLBW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKLBW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKLBW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode        Description
PUNPCKLBW xmm1, xmm2/mem128           66 0F 60 /r   Unpacks and interleaves the low-order bytes of xmm1 and xmm2 or mem128.
                                                    Writes the bytes to xmm1.

Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKLBW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   60 /r
VPUNPCKLBW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   60 /r

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLDQ, (V)PUNPCKLQDQ, (V)PUNPCKLWD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PUNPCKLDQ
VPUNPCKLDQ

Unpack and Interleave
Low Doublewords

Unpacks the two low-order doublewords of each octword of the first and second source operands and
interleaves the doublewords as they are copied to the destination. The high-order doublewords of
each octword of the source operands are ignored.
Doublewords are interleaved in ascending order from the least-significant doubleword of the sources
with doublewords from the first source operand occupying the lower doubleword of each pair copied
to the destination.
For the 128-bit form of the instruction, the following operations are performed:
dest[31:0] = src1[31:0]
dest[63:32] = src2[31:0]
dest[95:64] = src1[63:32]
dest[127:96] = src2[63:32]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[159:128] = src1[159:128]
dest[191:160] = src2[159:128]
dest[223:192] = src1[191:160]
dest[255:224] = src2[191:160]

When the second source operand is all 0s, the destination effectively receives the two low-order doublewords of the first source operand (for the 256-bit form, the two low-order doublewords of each octword of the
first source operand), each zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit values to unsigned 64-bit operands for subsequent processing that requires higher precision.
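The hypothetical Python sketch below models (V)PUNPCKLDQ on lists of doubleword values (element 0 holds bits 31:0) and then reads each resulting pair back as a 64-bit quadword, which shows that a value such as 0xFFFFFFFF is zero-extended rather than sign-extended. `punpckldq` is an illustrative name.

```python
def punpckldq(src1: list[int], src2: list[int]) -> list[int]:
    """Hypothetical model of (V)PUNPCKLDQ on doubleword lists (4 = XMM, 8 = YMM)."""
    if len(src1) != len(src2) or len(src1) not in (4, 8):
        raise ValueError("operands must both hold 4 (XMM) or 8 (YMM) doublewords")
    dest: list[int] = []
    for lane in range(0, len(src1), 4):      # four doublewords per octword
        for i in range(2):                   # low two doublewords only
            dest += [src1[lane + i], src2[lane + i]]
    return dest


counts = [0xFFFFFFFF, 7, 1, 2]               # unsigned 32-bit values
pairs = punpckldq(counts, [0, 0, 0, 0])
qwords = [pairs[k] | (pairs[k + 1] << 32) for k in range(0, len(pairs), 2)]
print(qwords)                                # [4294967295, 7]
```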
There are legacy and extended forms of the instruction:
PUNPCKLDQ

The first source operand is an XMM register and the second source operand is an XMM register or
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of
the YMM register that corresponds to the destination are not affected.
VPUNPCKLDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.


Instruction Support
Form                  Subset   Feature Flag
PUNPCKLDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKLDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKLDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                              Opcode        Description
PUNPCKLDQ xmm1, xmm2/mem128           66 0F 62 /r   Unpacks and interleaves the low-order doublewords of xmm1 and xmm2 or mem128.
                                                    Writes the doublewords to xmm1.

Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKLDQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   62 /r
VPUNPCKLDQ ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   62 /r

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW,
(V)PUNPCKLQDQ, (V)PUNPCKLWD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
                           S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC                   A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception


PUNPCKLQDQ
VPUNPCKLQDQ

Unpack and Interleave
Low Quadwords

Unpacks the low-order quadword of each octword of the first and second source operands and interleaves the quadwords as they are copied to the destination. The high-order quadword of each octword
of the source operands is ignored.
Quadwords are interleaved in ascending order from the least-significant quadword of the sources with
quadwords from the first source operand occupying the lower quadword of each pair copied to the
destination.
For the 128-bit form of the instruction, the following operations are performed:
dest[63:0] = src1[63:0]
dest[127:64] = src2[63:0]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[191:128] = src1[191:128]
dest[255:192] = src2[191:128]

When the second source operand is all 0s, the destination effectively receives the low-order quadword
of the first source operand (for the 256-bit form, the low-order quadword of each octword of the first source operand),
zero-extended to 128 bits. This operation is useful for expanding unsigned 64-bit values to unsigned
128-bit operands for subsequent processing that requires higher precision.
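The 128-bit form is also a convenient way to combine two independent 64-bit values in one register. This hypothetical Python sketch models (V)PUNPCKLQDQ on unsigned integers and shows both that combination and the zero-extension case; `punpcklqdq` is an illustrative name.

```python
MASK64 = (1 << 64) - 1


def punpcklqdq(src1: int, src2: int, width: int = 128) -> int:
    """Hypothetical model of (V)PUNPCKLQDQ on unsigned integers."""
    if width not in (128, 256):
        raise ValueError("width must be 128 or 256")
    dest = 0
    for base in range(0, width, 128):                      # one pass per octword
        dest |= ((src1 >> base) & MASK64) << base          # src1[base+63:base] -> lower quadword
        dest |= ((src2 >> base) & MASK64) << (base + 64)   # src2[base+63:base] -> upper quadword
    return dest


lo, hi = 0x1111, 0x2222
print(hex(punpcklqdq(lo, hi)))                   # 0x22220000000000001111
print(hex(punpcklqdq((0xDEAD << 64) | lo, 0)))   # 0x1111 (high quadword of src1 is ignored)
```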
There are legacy and extended forms of the instruction:
PUNPCKLQDQ

The first source operand is an XMM register and the second source operand is an XMM register or
128-bit memory location. The first source operand is also the destination register. Bits [255:128] of
the YMM register that corresponds to the destination are not affected.
VPUNPCKLQDQ

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                   Subset   Feature Flag
PUNPCKLQDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKLQDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKLQDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
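The table maps each form to a CPUID feature bit. Before using the VEX forms, software also has to confirm the OS-support conditions that appear among the #UD causes for this instruction: CR4.OSXSAVE, reported through CPUID Fn0000_0001_ECX[OSXSAVE] (bit 27), and XFEATURE_ENABLED_MASK[2:1] = 11b. The sketch below assumes the raw register values have already been read (CPUID leaf 1, leaf 7 subleaf 0, and XCR0 via XGETBV; reading them is not shown). All names are hypothetical. CR4.OSFXSR, which the legacy form also depends on, is not visible through CPUID and is not checked here.

```python
from dataclasses import dataclass


def _bit(reg: int, n: int) -> bool:
    """True if bit n of a 32-bit register value is set."""
    return ((reg >> n) & 1) == 1


@dataclass
class PunpcklqdqSupport:
    legacy_sse2: bool    # PUNPCKLQDQ
    vex_128: bool        # VPUNPCKLQDQ xmm form (AVX)
    vex_256: bool        # VPUNPCKLQDQ ymm form (AVX2)


def punpcklqdq_support(f1_ecx: int, f1_edx: int, f7_ebx: int, xcr0: int) -> PunpcklqdqSupport:
    """Decode raw CPUID/XCR0 values; obtaining them is outside this sketch."""
    sse2 = _bit(f1_edx, 26)                           # CPUID Fn0000_0001_EDX[SSE2]
    osxsave = _bit(f1_ecx, 27)                        # CPUID Fn0000_0001_ECX[OSXSAVE]
    xmm_ymm_enabled = ((xcr0 >> 1) & 0b11) == 0b11    # XFEATURE_ENABLED_MASK[2:1] == 11b
    avx = _bit(f1_ecx, 28) and osxsave and xmm_ymm_enabled   # CPUID Fn0000_0001_ECX[AVX]
    avx2 = avx and _bit(f7_ebx, 5)                    # CPUID Fn0000_0007_EBX[AVX2]_x0
    return PunpcklqdqSupport(sse2, avx, avx2)
```

The same pattern applies to the other instructions in this section that list SSE2, AVX, and AVX2 subsets.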
Instruction Encoding
Mnemonic                              Opcode        Description
PUNPCKLQDQ xmm1, xmm2/mem128          66 0F 6C /r   Unpacks and interleaves the low-order quadwords of xmm1 and xmm2 or mem128. Writes the quadwords to xmm1.

                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKLQDQ xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   6C /r
VPUNPCKLQDQ ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   6C /r
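To make the VEX column headings concrete, the following sketch assembles the register-to-register form of VPUNPCKLQDQ from the fields in the table. It assumes the usual VEX conventions: R, X, B, and vvvv are stored inverted, map_select 00001b selects the 0F opcode map, and the X in the W column means W is ignored (the sketch writes 0). `encode_vex3_reg_form` is a hypothetical helper. Assemblers typically emit the shorter two-byte C5 VEX form when it can express the same instruction, so their output may differ from the C4 bytes produced here.

```python
def encode_vex3_reg_form(opcode: int, pp: int, L: int, dest: int, src1: int,
                         src2: int, map_select: int = 0b00001, W: int = 0) -> bytes:
    """Encode `op dest, src1, src2` (all registers) with a three-byte C4 VEX prefix.

    dest goes to ModRM.reg (extended by VEX.R), src1 to VEX.vvvv,
    src2 to ModRM.rm (extended by VEX.B). Register numbers are 0-15.
    """
    for reg in (dest, src1, src2):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers must be in 0..15")
    r_bar = 0 if dest & 8 else 1          # VEX.R is stored inverted
    x_bar = 1                             # no index register in a register-only form
    b_bar = 0 if src2 & 8 else 1          # VEX.B is stored inverted
    byte1 = (r_bar << 7) | (x_bar << 6) | (b_bar << 5) | (map_select & 0x1F)
    byte2 = ((W & 1) << 7) | ((~src1 & 0xF) << 3) | ((L & 1) << 2) | (pp & 0b11)
    modrm = (0b11 << 6) | ((dest & 7) << 3) | (src2 & 7)   # mod = 11b: register operand
    return bytes([0xC4, byte1, byte2, opcode, modrm])


# VPUNPCKLQDQ xmm1, xmm2, xmm3: pp = 01b, L = 0, opcode 6C
print(encode_vex3_reg_form(0x6C, pp=0b01, L=0, dest=1, src1=2, src2=3).hex(" "))   # c4 e1 69 6c cb
```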

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, (V)PUNPCKLWD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PUNPCKLWD
VPUNPCKLWD

Unpack and Interleave
Low Words

Unpacks the four low-order words of each octword of the first and second source operands and interleaves the words as they are copied to the destination. The high-order words of each octword of the
source operands are ignored.
Words are interleaved in ascending order from the least-significant word of the source operands with
words from the first source operand occupying the lower word of each pair copied to the destination.
For the 128-bit form of the instruction, the following operations are performed:
dest[15:0] = src1[15:0]
dest[31:16] = src2[15:0]
dest[47:32] = src1[31:16]
dest[63:48] = src2[31:16]
dest[79:64] = src1[47:32]
dest[95:80] = src2[47:32]
dest[111:96] = src1[63:48]
dest[127:112] = src2[63:48]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[143:128] = src1[143:128]
dest[159:144] = src2[143:128]
dest[175:160] = src1[159:144]
dest[191:176] = src2[159:144]
dest[207:192] = src1[175:160]
dest[223:208] = src2[175:160]
dest[239:224] = src1[191:176]
dest[255:240] = src2[191:176]

When the second source operand is all 0s, the destination effectively receives the four low-order
words of the first source operand (of each octword, for the 256-bit form), each zero-extended to
32 bits. This operation is useful for expanding unsigned 16-bit values to unsigned 32-bit operands
for subsequent processing that requires higher precision.
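The word-level interleave can be modelled the same way. In this hypothetical sketch a vector is an unsigned integer with bit 0 as the least-significant bit, and `split` is a small helper that breaks a value into equal-width elements, least significant first. The usage line shows the zero-extension case: an all-zero second source turns the four low words into doublewords.

```python
MASK16 = 0xFFFF


def punpcklwd(src1: int, src2: int, width: int = 128) -> int:
    """Reference model of (V)PUNPCKLWD on vectors held as unsigned ints (bit 0 = LSB)."""
    if width not in (128, 256):
        raise ValueError("width must be 128 or 256")
    dest = 0
    for lane in range(width // 128):
        base = 128 * lane                      # bit offset of this octword
        for i in range(4):                     # only the four low-order words are used
            w1 = (src1 >> (base + 16 * i)) & MASK16
            w2 = (src2 >> (base + 16 * i)) & MASK16
            dest |= w1 << (base + 32 * i)          # src1 word: lower half of the pair
            dest |= w2 << (base + 32 * i + 16)     # src2 word: upper half of the pair
    return dest


def split(value: int, bits: int, count: int) -> list:
    """Split an int into `count` elements of `bits` bits, least significant first."""
    return [(value >> (bits * i)) & ((1 << bits) - 1) for i in range(count)]


packed = sum(w << (16 * i) for i, w in enumerate([1, 2, 0xFFFF, 4, 9, 9, 9, 9]))
print(split(punpcklwd(packed, 0), 32, 4))   # [1, 2, 65535, 4]
```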
There are legacy and extended forms of the instruction:
PUNPCKLWD

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination register. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPUNPCKLWD

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.


YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form                  Subset   Feature Flag
PUNPCKLWD             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKLWD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKLWD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                             Opcode        Description
PUNPCKLWD xmm1, xmm2/mem128          66 0F 61 /r   Unpacks and interleaves the low-order words of xmm1 and xmm2 or mem128. Writes the words to xmm1.

                                     Encoding
Mnemonic                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPUNPCKLWD xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   61 /r
VPUNPCKLWD ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   61 /r

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, (V)PUNPCKLQDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


PXOR
VPXOR

Packed Exclusive OR

Performs a bitwise XOR of the first and second source operands and writes the result to the destination. When exactly one of a pair of corresponding bits in the first and second operands is set, the corresponding bit of the destination is set; when both source bits are set or both are clear, the destination bit is cleared.
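The per-bit rule above is ordinary exclusive OR, so a reference model is short. This hypothetical sketch masks to the operand width and prints the truth table. It also illustrates two well-known XOR properties: a value XORed with itself is zero (which is why XORing a register with itself, for example `pxor xmm0, xmm0`, is a common way to clear it), and XORing twice with the same key restores the original value.

```python
def pxor(src1: int, src2: int, width: int = 128) -> int:
    """Reference model of (V)PXOR on 128-bit or 256-bit values held as ints."""
    if width not in (128, 256):
        raise ValueError("width must be 128 or 256")
    mask = (1 << width) - 1
    return (src1 & mask) ^ (src2 & mask)


# Per-bit truth table: the result bit is 1 only when exactly one source bit is 1.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", pxor(a, b))

# XOR with itself gives zero; XOR with the same key twice restores the original value.
v = 0x0123_4567_89AB_CDEF_FEDC_BA98_7654_3210
key = (1 << 128) - 1
print(pxor(v, v) == 0, pxor(pxor(v, key), key) == v)   # True True
```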
There are legacy and extended forms of the instruction:
PXOR

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits
[255:128] of the YMM register that corresponds to the destination are not affected.
VPXOR

The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form             Subset   Feature Flag
PXOR             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPXOR 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPXOR 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        Opcode        Description
PXOR xmm1, xmm2/mem128          66 0F EF /r   Performs bitwise XOR of values in xmm1 and xmm2 or mem128. Writes the result to xmm1.

                                Encoding
Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPXOR xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   EF /r
VPXOR ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   EF /r

Related Instructions
(V)PAND, (V)PANDN, (V)POR


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception


X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


RCPPS
VRCPPS

Reciprocal
Packed Single-Precision Floating-Point

Computes the approximate reciprocal of each packed single-precision floating-point value in the
source operand and writes the results to the corresponding doubleword of the destination.
MXCSR.RC has no effect on the result.
The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal. A source value that is
±zero or denormal returns an infinity with the same sign as the source value. Results that underflow are
changed to signed zero. For both SNaN and QNaN source operands, a QNaN is returned.
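The error bound is relative to the true reciprocal, so an approximation can be checked against it directly. The sketch below assumes a hypothetical stand-in approximation (the true reciprocal scaled by 1 + 2^-12), because the hardware's table or algorithm is implementation-specific and is not given here. It also shows one Newton-Raphson step, x1 = x0 * (2 - a * x0), a common software technique (not described in this manual) for refining a reciprocal estimate toward full single precision; each step roughly squares the relative error.

```python
import struct

MAX_REL_ERR = 1.5 * 2.0 ** -12      # bound stated above for (V)RCPPS


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def within_rcp_bound(a: float, approx: float) -> bool:
    """True if approx is within 1.5 * 2^-12 of 1/a, relative to 1/a (a finite, nonzero)."""
    exact = 1.0 / a
    return abs(approx - exact) <= MAX_REL_ERR * abs(exact)


def newton_refine(a: float, x0: float) -> float:
    """One Newton-Raphson step for 1/a, x1 = x0 * (2 - a * x0), rounded to binary32."""
    return to_f32(x0 * (2.0 - a * x0))


def stand_in_rcp(a: float) -> float:
    """Hypothetical approximation with a relative error of about 2^-12 (not real hardware)."""
    return to_f32((1.0 / a) * (1.0 + 2.0 ** -12))


a = 3.0
x0 = stand_in_rcp(a)
x1 = newton_refine(a, x0)
print(within_rcp_bound(a, x0), abs(x0 * a - 1.0), abs(x1 * a - 1.0))
```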
There are legacy and extended forms of the instruction:
RCPPS

Computes four reciprocals. The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VRCPPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Computes four reciprocals. The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
YMM Encoding

Computes eight reciprocals. The source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register.
Instruction Support
Form      Subset   Feature Flag
RCPPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRCPPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode     Description
RCPPS xmm1, xmm2/mem128     0F 53 /r   Computes reciprocals of packed single-precision floating-point values in xmm2 or mem128. Writes the result to xmm1.

                            Encoding
Mnemonic                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VRCPPS xmm1, xmm2/mem128    C4    RXB.01           X.1111.0.00   53 /r
VRCPPS ymm1, ymm2/mem256    C4    RXB.01           X.1111.1.00   53 /r


Related Instructions
(V)RCPSS, (V)RSQRTPS, (V)RSQRTSS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
S
S
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


RCPSS
VRCPSS

Reciprocal
Scalar Single-Precision Floating-Point

Computes the approximate reciprocal of the scalar single-precision floating-point value in a source
operand and writes the result to the low-order doubleword of the destination. MXCSR.RC has no
effect on the result.
The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal. A source value that is
±zero or denormal returns an infinity with the same sign as the source value. Results that underflow are
changed to signed zero. For both SNaN and QNaN source operands, a QNaN is returned.
There are legacy and extended forms of the instruction:
RCPSS

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register
that corresponds to the destination are not affected.
VRCPSS

The extended form of the instruction has a 128-bit encoding only.
The first source operand and the destination are XMM registers. The second source operand is either
an XMM register or a 32-bit memory location. Bits [31:0] of the destination contain the reciprocal;
bits [127:32] of the destination are copied from the first source register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
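The special cases listed above, together with the VRCPSS merge rule, can be summarized in a small model. In this sketch (hypothetical names), ordinary inputs use the exactly rounded binary32 reciprocal as a stand-in for the hardware approximation. The underflow threshold is derived from the true reciprocal, so real hardware may differ for inputs right at that boundary, and clearing bits [255:128] of the destination YMM register is not modelled.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126        # smallest normal binary32 magnitude


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rcp_scalar(x: float) -> float:
    """Special cases stated for (V)RCPSS; ordinary inputs use a stand-in reciprocal."""
    if math.isnan(x):
        return math.nan                        # SNaN or QNaN source -> QNaN
    if x == 0.0 or abs(x) < FLT_MIN_NORMAL:
        return math.copysign(math.inf, x)      # +/-zero or denormal -> infinity, same sign
    if abs(x) > 2.0 ** 126:
        return math.copysign(0.0, x)           # true reciprocal below normal range -> signed zero
    return to_f32(1.0 / x)


def vrcpss(src1: list, src2_low: float) -> list:
    """VRCPSS xmm1, xmm2, xmm3/mem32 on four binary32 lanes (lane 0 = bits [31:0]).

    Lane 0 receives the reciprocal of src2_low; lanes 1-3 are copied from src1.
    """
    return [rcp_scalar(src2_low)] + list(src1[1:4])


print(vrcpss([9.0, 1.0, 2.0, 3.0], 4.0))   # [0.25, 1.0, 2.0, 3.0]
```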
Instruction Support
Form      Subset   Feature Flag
RCPSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRCPSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                         Opcode        Description
RCPSS xmm1, xmm2/mem32           F3 0F 53 /r   Computes reciprocal of scalar single-precision floating-point value in xmm2 or mem32. Writes the result to xmm1.

                                 Encoding
Mnemonic                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VRCPSS xmm1, xmm2, xmm3/mem32    C4    RXB.01           X.src1.X.10   53 /r

Related Instructions
(V)RCPPS, (V)RSQRTPS, (V)RSQRTSS
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S
S
S

X
S
S
A
A
A
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


ROUNDPD
VROUNDPD

Round
Packed Double-Precision Floating-Point

Rounds two or four double-precision floating-point values as specified by an immediate byte operand. Source values are rounded to integral values and written to the destination as double-precision
floating-point values.
SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before
rounding.
The immediate byte operand is defined as follows.

 7            4   3     2    1    0
+--------------+-----+-----+--------+
|   Reserved   |  P  |  O  |   RC   |
+--------------+-----+-----+--------+

Bits     Mnemonic   Description
[7:4]    -          Reserved
[3]      P          Precision Exception
[2]      O          Rounding Control Source
[1:0]    RC         Rounding Control

Precision exception definitions:
Value    Description
0        Normal PE exception
1        PE field is not updated. No precision exception is taken when unmasked.

Rounding control source definitions:
Value    Description
0        Use RC from immediate operand
1        Use RC from MXCSR

Rounding control definition:
Value    Description
00       Nearest
01       Downward (toward negative infinity)
10       Upward (toward positive infinity)
11       Truncated
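The tables above can be read as a small decoder. The following is a minimal Python model of how the immediate selects the rounding behavior, assuming Python's binary64 float matches the double-precision lanes. Helper names are hypothetical. The model reports whether MXCSR.PE would be set, but it does not model the invalid-operation (IE) flag for SNaN inputs because Python does not distinguish SNaN from QNaN. Python's built-in round() rounds halfway cases to even, which is how it stands in for the Nearest mode here.

```python
import math
import sys

RC_NEAREST, RC_DOWN, RC_UP, RC_TRUNC = 0b00, 0b01, 0b10, 0b11


def decode_round_imm8(imm8: int) -> dict:
    """Split the rounding immediate into its P, O and RC fields."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    return {
        "P": (imm8 >> 3) & 1,    # 1: PE is not updated (no precision exception)
        "O": (imm8 >> 2) & 1,    # 1: take RC from MXCSR instead of imm8[1:0]
        "RC": imm8 & 0b11,       # rounding control; bits [7:4] are reserved
    }


def round_to_integral(x: float, rc: int) -> float:
    """Round one binary64 value to an integral binary64 value using rc."""
    if math.isnan(x) or math.isinf(x):
        return x                                 # NaNs propagate; infinities are integral
    if rc == RC_NEAREST:
        r = round(x)                             # round-half-to-even
    elif rc == RC_DOWN:
        r = math.floor(x)
    elif rc == RC_UP:
        r = math.ceil(x)
    else:
        r = math.trunc(x)
    return math.copysign(float(r), x)            # zero results keep the source sign


def roundpd(values: list, imm8: int, mxcsr_rc: int = RC_NEAREST, daz: bool = False):
    """Model (V)ROUNDPD on two (xmm) or four (ymm) doubles; returns (results, pe_set)."""
    fields = decode_round_imm8(imm8)
    rc = mxcsr_rc if fields["O"] else fields["RC"]
    srcs = []
    for v in values:
        if daz and v != 0.0 and abs(v) < sys.float_info.min:
            v = math.copysign(0.0, v)            # DAZ = 1: denormal treated as zero
        srcs.append(v)
    results = [round_to_integral(v, rc) for v in srcs]
    inexact = any(r != v for r, v in zip(results, srcs) if not math.isnan(v))
    return results, inexact and not fields["P"]


print(roundpd([2.5, -2.5], 0x00))   # ([2.0, -2.0], True)
```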

There are legacy and extended forms of the instruction:
ROUNDPD

Rounds two source values. The destination is an XMM register and the source operand is either an
XMM register or a 128-bit memory location. There is a third 8-bit immediate operand. Bits [255:128]
of the YMM register that corresponds to the destination are not affected.


VROUNDPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Rounds two source values. The destination is an XMM register and the source operand is either an
XMM register or a 128-bit memory location. There is a third 8-bit immediate operand. Bits [255:128]
of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Rounds four source values. The destination is a YMM register and the source operand is either a YMM register or a 256-bit memory location. There is a third 8-bit immediate operand.
Instruction Support
Form        Subset   Feature Flag
ROUNDPD     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode              Description
ROUNDPD xmm1, xmm2/mem128, imm8     66 0F 3A 09 /r ib   Rounds double-precision floating-point values in xmm2 or mem128. Writes rounded double-precision values to xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VROUNDPD xmm1, xmm2/mem128, imm8    C4    RXB.03           X.1111.0.01   09 /r ib
VROUNDPD ymm1, ymm2/mem256, imm8    C4    RXB.03           X.1111.1.01   09 /r ib

Related Instructions
(V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS
rFLAGS Affected
None
MXCSR Flags Affected
Bit(s)   Flag   Effect
17       MM
15       FZ
14:13    RC
12       PM
11       UM
10       OM
9        ZM
8        DM
7        IM
6        DAZ
5        PE     M
4        UE
3        OE
2        ZE
1        DE
0        IE     M

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.


ROUNDPS
VROUNDPS

Round
Packed Single-Precision Floating-Point

Rounds four or eight single-precision floating-point values as specified by an immediate byte operand. Source values are rounded to integral values and written to the destination as single-precision
floating-point values.
SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before
rounding.
The immediate byte operand is defined as follows.

 7            4   3     2    1    0
+--------------+-----+-----+--------+
|   Reserved   |  P  |  O  |   RC   |
+--------------+-----+-----+--------+

Bits     Mnemonic   Description
[7:4]    -          Reserved
[3]      P          Precision Exception
[2]      O          Rounding Control Source
[1:0]    RC         Rounding Control

Precision exception definitions:
Value    Description
0        Normal PE exception
1        PE field is not updated. No precision exception is taken when unmasked.

Rounding control source definitions:
Value    Description
0        Use RC from immediate operand
1        Use RC from MXCSR

Rounding control definition:
Value    Description
00       Nearest
01       Downward (toward negative infinity)
10       Upward (toward positive infinity)
11       Truncated
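For the single-precision form, the rounding mode can come from MXCSR when imm8[2] (O) is set. This sketch assumes MXCSR.RC occupies bits [14:13], as in the MXCSR layout shown for these instructions, and that 1F80h is the MXCSR reset value; it converts inputs to binary32 with the struct module so each lane holds a genuine single-precision value. Names are hypothetical, and exception flags are not modelled.

```python
import math
import struct

MXCSR_DEFAULT = 0x1F80          # reset value: all exceptions masked, RC = 00b (nearest)


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mxcsr_rc(mxcsr: int) -> int:
    """Extract MXCSR.RC, which occupies bits [14:13]."""
    return (mxcsr >> 13) & 0b11


# Rounding functions indexed by RC (00 nearest-even, 01 down, 10 up, 11 truncate).
_RC_FUNCS = {0b00: round, 0b01: math.floor, 0b10: math.ceil, 0b11: math.trunc}


def roundps(lanes: list, imm8: int, mxcsr: int = MXCSR_DEFAULT) -> list:
    """Model (V)ROUNDPS on four (xmm) or eight (ymm) binary32 lanes."""
    if len(lanes) not in (4, 8):
        raise ValueError("(V)ROUNDPS operates on 4 or 8 lanes")
    rc = mxcsr_rc(mxcsr) if imm8 & 0b100 else imm8 & 0b11   # imm8[2] = O selects the RC source
    out = []
    for v in map(to_f32, lanes):                # treat inputs as binary32 values
        if math.isnan(v) or math.isinf(v):
            out.append(v)
        else:
            # Binary32 values of magnitude >= 2^23 are already integral and every integer
            # up to 2^24 is representable, so the result needs no second rounding.
            out.append(math.copysign(float(_RC_FUNCS[rc](v)), v))
    return out


print(roundps([0.5, 1.5, -0.5, 2.7], 0x04, mxcsr=MXCSR_DEFAULT | (0b01 << 13)))   # [0.0, 1.0, -1.0, 2.0]
```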

There are legacy and extended forms of the instruction:
ROUNDPS

Rounds four source values. The destination is an XMM register and the source operand is either an
XMM register or a 128-bit memory location. There is a third 8-bit immediate operand. Bits [255:128]
of the YMM register that corresponds to the destination are not affected.


VROUNDPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Rounds four source values. The destination is an XMM register and the source operand is either an
XMM register or a 128-bit memory location. There is a third 8-bit immediate operand. Bits [255:128]
of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Rounds eight source values. The destination is a YMM register and the source operand is either a YMM register or a 256-bit memory location. There is a third 8-bit immediate operand.
Instruction Support
Form        Subset   Feature Flag
ROUNDPS     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                            Opcode              Description
ROUNDPS xmm1, xmm2/mem128, imm8     66 0F 3A 08 /r ib   Rounds single-precision floating-point values in xmm2 or mem128. Writes rounded single-precision values to xmm1.

                                    Encoding
Mnemonic                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VROUNDPS xmm1, xmm2/mem128, imm8    C4    RXB.03           X.1111.0.01   08 /r ib
VROUNDPS ymm1, ymm2/mem256, imm8    C4    RXB.03           X.1111.1.01   08 /r ib

Related Instructions
(V)ROUNDPD, (V)ROUNDSD, (V)ROUNDSS
rFLAGS Affected
None
MXCSR Flags Affected
Bit(s)   Flag   Effect
17       MM
15       FZ
14:13    RC
12       PM
11       UM
10       OM
9        ZM
8        DM
7        IM
6        DAZ
5        PE     M
4        UE
3        OE
2        ZE
1        DE
0        IE     M

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.


ROUNDSD
VROUNDSD

Round
Scalar Double-Precision

Rounds a scalar double-precision floating-point value as specified by an immediate byte operand.
The source value is rounded to an integral value and written to the destination as a double-precision floating-point value.
SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before
rounding.
The immediate byte operand is defined as follows.

 7            4   3     2    1    0
+--------------+-----+-----+--------+
|   Reserved   |  P  |  O  |   RC   |
+--------------+-----+-----+--------+

Bits     Mnemonic   Description
[7:4]    -          Reserved
[3]      P          Precision Exception
[2]      O          Rounding Control Source
[1:0]    RC         Rounding Control

Precision exception definitions:
Value    Description
0        Normal PE exception
1        PE field is not updated. No precision exception is taken when unmasked.

Rounding control source definitions:
Value    Description
0        Use RC from immediate operand
1        Use RC from MXCSR

Rounding control definition:
Value    Description
00       Nearest
01       Downward (toward negative infinity)
10       Upward (toward positive infinity)
11       Truncated

There are legacy and extended forms of the instruction:
ROUNDSD

The source operand is either an XMM register or a 64-bit memory location. When the source is an
XMM register, the value to be rounded must be in the low quadword. The destination is an XMM register. There is a third 8-bit immediate operand. Bits [127:64] of the destination are not affected. Bits
[255:128] of the YMM register that corresponds to destination XMM register are not affected.


VROUNDSD

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 64-bit memory location. The destination is a third XMM register. There is a fourth 8-bit immediate
operand. Bits [127:64] of the destination are copied from the first source operand. Bits [255:128] of
the YMM register that corresponds to the destination are cleared.
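The legacy and VEX forms differ only in where the untouched upper bits come from. The sketch below uses hypothetical helpers: registers are modelled as Python lists of [low, high] quadword values plus a separate integer for bits [255:128], and `floor_rc` stands in for any rounding mode. It contrasts the legacy merge, which preserves the destination's existing upper bits, with the VEX form, which takes bits [127:64] from the first source and clears bits [255:128].

```python
import math


def floor_rc(x: float) -> float:
    """Hypothetical stand-in for rounding with RC = 01b (toward negative infinity)."""
    return math.copysign(float(math.floor(x)), x)


def roundsd_legacy(dest_xmm: list, dest_ymm_upper: int, src_low: float, rounder):
    """ROUNDSD xmm1, xmm2/mem64, imm8: only bits [63:0] change.

    Bits [127:64] and bits [255:128] of the destination keep their previous values.
    """
    return [rounder(src_low), dest_xmm[1]], dest_ymm_upper


def vroundsd(src1: list, src2_low: float, rounder):
    """VROUNDSD xmm1, xmm2, xmm3/mem64, imm8.

    Bits [127:64] are copied from src1, and bits [255:128] of the destination are cleared.
    """
    return [rounder(src2_low), src1[1]], 0


print(roundsd_legacy([100.0, 7.0], 0xABCD, -2.5, floor_rc))   # ([-3.0, 7.0], 43981)
print(vroundsd([1.0, 42.0], 2.9, floor_rc))                    # ([2.0, 42.0], 0)
```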
Instruction Support
Form        Subset   Feature Flag
ROUNDSD     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                 Opcode              Description
ROUNDSD xmm1, xmm2/mem64, imm8           66 0F 3A 0B /r ib   Rounds a double-precision floating-point value in xmm2[63:0] or mem64. Writes a rounded double-precision value to xmm1.

                                         Encoding
Mnemonic                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VROUNDSD xmm1, xmm2, xmm3/mem64, imm8    C4    RXB.03           X.src1.X.01   0B /r ib

Related Instructions
(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSS
rFLAGS Affected
None
MXCSR Flags Affected
Bit(s)   Flag   Effect
17       MM
15       FZ
14:13    RC
12       PM
11       UM
10       OM
9        ZM
8        DM
7        IM
6        DAZ
5        PE     M
4        UE
3        OE
2        ZE
1        DE
0        IE     M

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.


ROUNDSS
VROUNDSS

Round
Scalar Single-Precision

Rounds a scalar single-precision floating-point value as specified by an immediate byte operand.
The source value is rounded to an integral value and written to the destination as a single-precision floating-point value.
SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before
rounding.
The immediate byte operand is defined as follows.
Bits   Mnemonic  Description
[7:4]  -         Reserved
[3]    P         Precision Exception
[2]    O         Rounding Control Source
[1:0]  RC        Rounding Control

Precision exception definitions:
Value  Description
0      Normal PE exception
1      PE field is not updated. No precision exception is taken when unmasked.

Rounding control source definitions:
Value  Description
0      Use RC from immediate operand
1      Use RC from MXCSR

Rounding control definitions:
Value  Description
00     Nearest (ties to even)
01     Downward (toward negative infinity)
10     Upward (toward positive infinity)
11     Truncated (toward zero)
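To make the RC and O fields concrete, the following C sketch models only the rounding step of ROUNDSS for one value. It is an illustration, not AMD's definition: the function names and the mxcsr_rc parameter (a stand-in for MXCSR.RC) are hypothetical, the rounding is written out by hand rather than taken from <math.h> so that each RC value is explicit, and DAZ, SNaN quieting, and the P bit are not modeled.

```c
#include <stdint.h>

/* Round toward zero without <math.h>. Values with |x| >= 2^23 are already
   integral, and NaN or infinity passes through unchanged. */
static float trunc_model(float x)
{
    float ax = x < 0.0f ? -x : x;
    if (!(ax < 8388608.0f))           /* 2^23; the negated compare also catches NaN */
        return x;
    float r = (float)(int32_t)x;      /* C float-to-int conversion truncates */
    return r == 0.0f ? x * 0.0f : r;  /* keep the sign of zero: -0.5 -> -0.0 */
}

/* RC = 00b: round to nearest, ties to even. */
static float nearest_even_model(float x)
{
    float t = trunc_model(x);
    float frac = x - t;               /* exact for |x| < 2^23; 0 or NaN otherwise */
    float mag = frac < 0.0f ? -frac : frac;
    if (mag > 0.5f || (mag == 0.5f && ((int32_t)t & 1)))
        t += x < 0.0f ? -1.0f : 1.0f;
    return t;
}

/* Hypothetical model of the ROUNDSS rounding step for one value.
   imm8[1:0] = RC, imm8[2] = O (1 means use mxcsr_rc instead of imm8[1:0]).
   mxcsr_rc stands in for MXCSR.RC; DAZ, SNaN quieting and the P bit are not modeled. */
float roundss_model(float src, uint8_t imm8, unsigned mxcsr_rc)
{
    unsigned rc = (imm8 & 0x4u) ? (mxcsr_rc & 0x3u) : (imm8 & 0x3u);
    float t = trunc_model(src);

    switch (rc) {
    case 0:  return nearest_even_model(src);   /* nearest (even) */
    case 1:  return t > src ? t - 1.0f : t;    /* downward, toward -infinity */
    case 2:  return t < src ? t + 1.0f : t;    /* upward, toward +infinity */
    default: return t;                         /* truncated, toward zero */
    }
}
```

For example, with imm8 = 00h a tie such as 2.5 rounds to the even value 2.0, while imm8 = 04h ignores imm8[1:0] and uses the mode passed as mxcsr_rc.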

There are legacy and extended forms of the instruction:
ROUNDSS

The source operand is either an XMM register or a 32-bit memory location. When the source is an
XMM register, the value to be rounded must be in the low doubleword. The destination is an XMM
register. There is a third 8-bit immediate operand. Bits [127:32] of the destination are not affected.
Bits [255:128] of the YMM register that corresponds to destination XMM register are not affected.


VROUNDSS

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. The destination is a third XMM register. There is a fourth 8-bit immediate
operand. Bits [127:32] of the destination are copied from the first source operand. Bits [255:128] of
the YMM register that corresponds to the destination are cleared.
Instruction Support
Form       Subset   Feature Flag
ROUNDSS    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDSS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                         Opcode             Description
ROUNDSS xmm1, xmm2/mem32, imm8   66 0F 3A 0A /r ib  Rounds a single-precision floating-point value in xmm2[31:0] or mem32. Writes a rounded single-precision value to xmm1.

Mnemonic                               Encoding
                                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VROUNDSS xmm1, xmm2, xmm3/mem32, imm8  C4   RXB.03          X.src1.X.01  0A /r ib

Related Instructions
(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD
rFLAGS Affected
None
MXCSR Flags Affected
MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
                                                            M                             M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
Alignment check, #AC       -     S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.
X - AVX and SSE exception
A - AVX exception
S - SSE exception

RSQRTPS
VRSQRTPS

Reciprocal Square Root
Packed Single-Precision Floating-Point

Computes the approximate reciprocal of the square root of each packed single-precision floating-point value in the source operand and writes the results to the corresponding doublewords of the destination. MXCSR.RC has no effect on the result.
The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal square root. A source
value that is ±zero or denormal returns an infinity with the sign of the source value. Negative source values
other than –zero and –denormal return a QNaN floating-point indefinite value. For both SNaN and
QNaN source operands, a QNaN is returned.
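The accuracy guarantee and special-case rules above can be expressed as a checker, for example to validate results in a test harness. This C sketch uses a hypothetical function name. The +infinity case (returning +0) is an assumption, because the text does not state it, and for negative inputs the sketch only checks that the result is a NaN rather than the exact QNaN indefinite encoding.

```c
#include <float.h>
#include <math.h>      /* INFINITY and signbit are macros; no libm call is made */
#include <stdbool.h>

/* Hypothetical checker: is `approx` an acceptable (V)RSQRTPS/(V)RSQRTSS
   result for the single-precision input x? */
bool rsqrt_result_ok(float x, float approx)
{
    float ax = x < 0.0f ? -x : x;

    if (x != x)                         /* SNaN or QNaN source: result is a QNaN */
        return approx != approx;
    if (ax < FLT_MIN)                   /* +/-zero or denormal: infinity, same sign */
        return approx == (signbit(x) ? -INFINITY : INFINITY);
    if (x < 0.0f)                       /* other negatives: QNaN indefinite */
        return approx != approx;        /* only "is a NaN" is checked */
    if (x == INFINITY)
        return approx == 0.0f;          /* assumption: 1/sqrt(+inf) = +0 */

    /* |approx - r| <= e*r with r = 1/sqrt(x) is equivalent to
       (1-e)^2 <= approx^2 * x <= (1+e)^2 when approx > 0, so no sqrt is needed. */
    const double e = 1.5 / 4096.0;      /* 1.5 * 2^-12 */
    double p = (double)approx * (double)approx * (double)x;
    return approx > 0.0f && p >= (1.0 - e) * (1.0 - e) && p <= (1.0 + e) * (1.0 + e);
}
```

Squaring both sides of the bound avoids computing a reference square root, so the checker itself introduces essentially no error near the 1.5 * 2^-12 limit.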
There are legacy and extended forms of the instruction:
RSQRTPS

Computes four values. The destination is an XMM register. The source operand is either an XMM
register or a 128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VRSQRTPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Computes four values. The destination is an XMM register. The source operand is either an XMM
register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
YMM Encoding

Computes eight values. The destination is a YMM register. The source operand is either a YMM register or a 256-bit memory location.
Instruction Support
Form       Subset  Feature Flag
RSQRTPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRSQRTPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode    Description
RSQRTPS xmm1, xmm2/mem128   0F 52 /r  Computes reciprocals of square roots of packed single-precision floating-point values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                     Encoding
                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VRSQRTPS xmm1, xmm2/mem128   C4   RXB.01          X.1111.0.00  52 /r
VRSQRTPS ymm1, ymm2/mem256   C4   RXB.01          X.1111.1.00  52 /r


Related Instructions
(V)RSQRTSS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSD, (V)SQRTSS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv != 1111b.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
X - AVX and SSE exception
A - AVX exception
S - SSE exception

RSQRTSS
VRSQRTSS

Reciprocal Square Root
Scalar Single-Precision Floating-Point

Computes the approximate reciprocal of the square root of the scalar single-precision floating-point
value in a source operand and writes the result to the low-order doubleword of the destination.
MXCSR.RC has no effect on the result.
The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal square root. A source
value that is ±zero or denormal returns an infinity with the sign of the source value. Negative source values
other than –zero and –denormal return a QNaN floating-point indefinite value. For both SNaN and
QNaN source operands, a QNaN is returned.
There are legacy and extended forms of the instruction:
RSQRTSS

The source operand is either an XMM register or a 32-bit memory location. The destination is an
XMM register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register
that corresponds to the destination are not affected.
VRSQRTSS

The extended form of the instruction has a 128-bit encoding only.
The first source operand and the destination are XMM registers. The second source operand is either
an XMM register or a 32-bit memory location. Bits [31:0] of the destination contain the reciprocal
square root of the single-precision floating-point value held in bits [31:0] of the second source operand; bits [127:32] of the destination are copied from the first source register. Bits [255:128] of the
YMM register that corresponds to the destination are cleared.
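The legacy and VEX forms differ only in how the bits outside [31:0] are handled. The following C sketch models that data flow with a hypothetical ymm_t type (eight 32-bit lanes, lane 0 = bits [31:0]). The reciprocal square root itself is passed in as low_result rather than computed, and both function names are illustrative.

```c
#include <stdint.h>

/* Hypothetical YMM register model: lane 0 holds bits [31:0], lane 7 bits [255:224]. */
typedef struct { uint32_t lane[8]; } ymm_t;

/* Legacy RSQRTSS: only bits [31:0] change; bits [255:32] keep their old contents. */
void rsqrtss_legacy_merge(ymm_t *dst, uint32_t low_result)
{
    dst->lane[0] = low_result;
}

/* VRSQRTSS: bits [31:0] = result, bits [127:32] copied from the first
   source register, bits [255:128] cleared. */
void vrsqrtss_merge(ymm_t *dst, const ymm_t *src1, uint32_t low_result)
{
    ymm_t out = {{0}};                 /* bits [255:128] end up zero */
    out.lane[0] = low_result;
    for (int i = 1; i < 4; i++)
        out.lane[i] = src1->lane[i];   /* bits [127:32] from src1 */
    *dst = out;                        /* correct even when dst == src1 */
}
```

Because the VEX form never reads the destination's old contents, it avoids a dependency on the previous value of the destination register that the legacy form carries.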
Instruction Support
Form       Subset  Feature Flag
RSQRTSS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRSQRTSS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode       Description
RSQRTSS xmm1, xmm2/mem32   F3 0F 52 /r  Computes the reciprocal of the square root of a scalar single-precision floating-point value in xmm2 or mem32. Writes the result to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VRSQRTSS xmm1, xmm2, xmm3/mem32   C4   RXB.01          X.src1.X.10  52 /r

Related Instructions
(V)RSQRTPS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSD, (V)SQRTSS


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
Alignment check, #AC       -     S     X     Unaligned memory reference when alignment checking enabled.
X - AVX and SSE exception
A - AVX exception
S - SSE exception

SHA1RNDS4

Four Rounds of SHA1

Executes four rounds of the SHA1 operation using the four doublewords (A, B, C, D) from the first source
operand and the four message doublewords from the second source operand, the first of which (W0E) already includes state value E. The lower two bits of the immediate byte select the function and constant appropriate for the current round of processing. The resulting (A, B, C,
D) is placed in the destination register, which is the same as the first source register.
The following function is performed:
A <- SRC1[127:96];
B <- SRC1[95:64];
C <- SRC1[63:32];
D <- SRC1[31:0];
W0E <- SRC2[127:96];
W1 <- SRC2[95:64];
W2 <- SRC2[63:32];
W3 <- SRC2[31:0];
i = imm8[1:0], which selects f_i and K_i for all four rounds
First round operation:
A_1 <- f_i(B, C, D) + (A Rotate Left 5) + W0E + K_i;
B_1 <- A;
C_1 <- B Rotate Left 30;
D_1 <- C;
E_1 <- D;
FOR j = 1 to 3
{
A_(j+1) <- f_i(B_j, C_j, D_j) + (A_j Rotate Left 5) + Wj + E_j + K_i;
B_(j+1) <- A_j;
C_(j+1) <- B_j Rotate Left 30;
D_(j+1) <- C_j;
E_(j+1) <- D_j;
}
DEST[127:96] <- A_4;
DEST[95:64] <- B_4;
DEST[63:32] <- C_4;
DEST[31:0] <- D_4;

Mnemonic                          Opcode          Description
SHA1RNDS4 xmm1, xmm2/m128, imm8   0F 3A CC /r ib  Executes four rounds of SHA1.

Related Instructions
SHA1NEXTE, SHA1MSG1, SHA1MSG2


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported by CPUID.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1 or CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     A page fault resulted from the execution of the instruction.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHA1NEXTE

Calculate Next E SHA1

Calculates the value the SHA1 state variable E will have after four rounds of SHA1, using state value A
held in bits [127:96] of the first source operand, and adds it to the message doubleword in bits [127:96] of the second source operand. The
other three doublewords of the second source operand are copied unchanged. The result is placed in the destination register, which is the same as the first source register.
DEST[127:96] <- SRC2[127:96] + (SRC1[127:96] Rotate Left 30);
DEST[95:0] <- SRC2[95:0];

Mnemonic                    Opcode       Description
SHA1NEXTE xmm1, xmm2/m128   0F 38 C8 /r  Calculate Next E of SHA1

Related Instructions
SHA1RNDS4, SHA1MSG1, SHA1MSG2
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported by CPUID.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1 or CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     A page fault resulted from the execution of the instruction.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHA1MSG1

Message Intermediate 1

Performs the first of two intermediate calculations needed before the next four rounds of SHA1
message processing.
DEST[127:96] <- SRC1[63:32] XOR SRC1[127:96];
DEST[95:64] <- SRC1[31:0] XOR SRC1[95:64];
DEST[63:32] <- SRC2[127:96] XOR SRC1[63:32];
DEST[31:0] <- SRC2[95:64] XOR SRC1[31:0];
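A C sketch of this step, with a hypothetical function name and element 0 of each array holding bits [31:0]. All four results are computed into temporaries before any store, so the model stays correct when dst aliases a source, as it does for the instruction itself.

```c
#include <stdint.h>

/* Hypothetical model of SHA1MSG1. */
void sha1msg1_model(uint32_t dst[4], const uint32_t src1[4], const uint32_t src2[4])
{
    uint32_t r3 = src1[1] ^ src1[3];   /* SRC1[63:32]  XOR SRC1[127:96] */
    uint32_t r2 = src1[0] ^ src1[2];   /* SRC1[31:0]   XOR SRC1[95:64]  */
    uint32_t r1 = src2[3] ^ src1[1];   /* SRC2[127:96] XOR SRC1[63:32]  */
    uint32_t r0 = src2[2] ^ src1[0];   /* SRC2[95:64]  XOR SRC1[31:0]   */
    dst[3] = r3; dst[2] = r2; dst[1] = r1; dst[0] = r0;
}
```

In terms of the SHA-1 message schedule W[t] = (W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16]) Rotate Left 1, each result doubleword is the W[t-16] XOR W[t-14] part.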

Mnemonic                   Opcode       Description
SHA1MSG1 xmm1, xmm2/m128   0F 38 C9 /r  Calculate Message Intermediate 1

Related Instructions
SHA1RNDS4, SHA1NEXTE, SHA1MSG2
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported by CPUID.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1 or CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     A page fault resulted from the execution of the instruction.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHA1MSG2

Message Calculation 2

Performs the second of two intermediate calculations needed before the next four rounds of SHA1
message processing.
Temp[31:0] <- (SRC1[127:96] XOR SRC2[95:64]) Rotate Left 1;
DEST[127:96] <- Temp[31:0];
DEST[95:64] <- (SRC1[95:64] XOR SRC2[63:32]) Rotate Left 1;
DEST[63:32] <- (SRC1[63:32] XOR SRC2[31:0]) Rotate Left 1;
DEST[31:0] <- (SRC1[31:0] XOR Temp[31:0]) Rotate Left 1;
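A C sketch of this step, with a hypothetical function name and element 0 of each array holding bits [31:0]. The lowest doubleword depends on Temp, the value just computed for bits [127:96], so Temp must be evaluated first.

```c
#include <stdint.h>

static uint32_t rol1(uint32_t x) { return (x << 1) | (x >> 31); }

/* Hypothetical model of SHA1MSG2. */
void sha1msg2_model(uint32_t dst[4], const uint32_t src1[4], const uint32_t src2[4])
{
    uint32_t temp = rol1(src1[3] ^ src2[2]);  /* (SRC1[127:96] XOR SRC2[95:64]) ROL 1 */
    uint32_t r2 = rol1(src1[2] ^ src2[1]);    /* (SRC1[95:64] XOR SRC2[63:32]) ROL 1  */
    uint32_t r1 = rol1(src1[1] ^ src2[0]);    /* (SRC1[63:32] XOR SRC2[31:0]) ROL 1   */
    uint32_t r0 = rol1(src1[0] ^ temp);       /* (SRC1[31:0] XOR Temp) ROL 1          */
    dst[3] = temp; dst[2] = r2; dst[1] = r1; dst[0] = r0;
}
```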

Mnemonic                   Opcode       Description
SHA1MSG2 xmm1, xmm2/m128   0F 38 CA /r  Calculate Message Intermediate 2

Related Instructions
SHA1RNDS4, SHA1NEXTE, SHA1MSG1
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported by CPUID.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1 or CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     A page fault resulted from the execution of the instruction.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHA256RNDS2

Two Rounds of SHA256

Performs two rounds of the SHA256 operation. The first source operand holds the initial SHA256 state (C,
D, G, H), the second source operand holds the initial SHA256 state (A, B, E, F), and the implicit operand
XMM0 holds, in bits [63:0], the precomputed sums of the next two message doublewords and their corresponding round constants. The resulting SHA256 state (A, B, E, F) is placed in the destination register, which is the same as the first source register.
A_0 SRC2[127:96];
B_0 SRC2[95:64];
C_0 SRC1[127:96];
D_0 SRC1[95:64];
E_0 SRC2[63:32];
F_0 SRC2[31:0];
G_0 SRC1[63:32];
H_0 SRC1[31:0];
K0 XMM0[31: 0];
K1 XMM0[63: 32];
FOR i = 0 to 1
{
A_(i +1) Ch (E_i, F_i, G_i) + Perm1(E_i) +K_i + H_i + Ma(A_i , B_i, C_i) + Perm0(A_i);
B_(i +1) A_i;
C_(i +1) B_i ;
D_(i +1) C_i;
E_(i +1) Ch (E_i, F_i, G_i) + Perm1(E_i) + K_i + H_i + D_i;
F_(i +1) E_i ;
G_(i +1) F_i;
H_(i +1) G_i;
}
DEST[127:96] A_2;
DEST[95:64] B_2;
DEST[63:32] E_2;
DEST[31:0] F_2;

Mnemonic                            Opcode       Description
SHA256RNDS2 xmm1, xmm2/m128, xmm0   0F 38 CB /r  Execute 2 rounds of SHA256

Related Instructions
SHA256MSG1, SHA256MSG2
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported by CPUID.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1 or CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     A page fault resulted from the execution of the instruction.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHA256MSG1

Message Intermediate 1

Performs the first of two intermediate calculations needed for the next four SHA256 message
doublewords.
DEST[127:96] <- SRC1[127:96] + Perm2(SRC2[31:0]);
DEST[95:64] <- SRC1[95:64] + Perm2(SRC1[127:96]);
DEST[63:32] <- SRC1[63:32] + Perm2(SRC1[95:64]);
DEST[31:0] <- SRC1[31:0] + Perm2(SRC1[63:32]);
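Perm2 is not defined in this section. The C sketch below assumes it is the SHA-256 message-schedule function σ0 from FIPS 180-4 (rotate right 7, rotate right 18, shift right 3). The function name is hypothetical, and element 0 of each array holds bits [31:0].

```c
#include <stdint.h>

static uint32_t ror32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Assumed Perm2: sigma0 from FIPS 180-4. */
static uint32_t perm2(uint32_t x) { return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3); }

/* Hypothetical model of SHA256MSG1. */
void sha256msg1_model(uint32_t dst[4], const uint32_t src1[4], const uint32_t src2[4])
{
    uint32_t r3 = src1[3] + perm2(src2[0]);   /* SRC1[127:96] + Perm2(SRC2[31:0])   */
    uint32_t r2 = src1[2] + perm2(src1[3]);   /* SRC1[95:64]  + Perm2(SRC1[127:96]) */
    uint32_t r1 = src1[1] + perm2(src1[2]);   /* SRC1[63:32]  + Perm2(SRC1[95:64])  */
    uint32_t r0 = src1[0] + perm2(src1[1]);   /* SRC1[31:0]   + Perm2(SRC1[63:32])  */
    dst[3] = r3; dst[2] = r2; dst[1] = r1; dst[0] = r0;
}
```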

Mnemonic                     Opcode       Description
SHA256MSG1 xmm1, xmm2/m128   0F 38 CC /r  Calculate Message Intermediate 1

Related Instructions
SHA256RNDS2, SHA256MSG2
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported by CPUID.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1 or CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     A page fault resulted from the execution of the instruction.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHA256MSG2

Message Intermediate 2

Performs the second of two intermediate calculations needed for the next four SHA256 message
doublewords.
Temp0 <- SRC1[31:0] + Perm3(SRC2[95:64]);
Temp1 <- SRC1[63:32] + Perm3(SRC2[127:96]);
DEST[127:96] <- SRC1[127:96] + Perm3(Temp1);
DEST[95:64] <- SRC1[95:64] + Perm3(Temp0);
DEST[63:32] <- SRC1[63:32] + Perm3(SRC2[127:96]);
DEST[31:0] <- SRC1[31:0] + Perm3(SRC2[95:64]);
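Perm3 is not defined in this section. The C sketch below assumes it is the SHA-256 message-schedule function σ1 from FIPS 180-4 (rotate right 17, rotate right 19, shift right 10). The function name is hypothetical, and element 0 of each array holds bits [31:0]. Note that DEST[63:32] and DEST[31:0] equal Temp1 and Temp0.

```c
#include <stdint.h>

static uint32_t ror32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Assumed Perm3: sigma1 from FIPS 180-4. */
static uint32_t perm3(uint32_t x) { return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10); }

/* Hypothetical model of SHA256MSG2. */
void sha256msg2_model(uint32_t dst[4], const uint32_t src1[4], const uint32_t src2[4])
{
    uint32_t temp0 = src1[0] + perm3(src2[2]);   /* SRC1[31:0]  + Perm3(SRC2[95:64])  */
    uint32_t temp1 = src1[1] + perm3(src2[3]);   /* SRC1[63:32] + Perm3(SRC2[127:96]) */
    uint32_t r3 = src1[3] + perm3(temp1);        /* SRC1[127:96] + Perm3(Temp1)       */
    uint32_t r2 = src1[2] + perm3(temp0);        /* SRC1[95:64]  + Perm3(Temp0)       */
    dst[3] = r3; dst[2] = r2; dst[1] = temp1; dst[0] = temp0;
}
```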

Mnemonic                     Opcode       Description
SHA256MSG2 xmm1, xmm2/m128   0F 38 CD /r  Calculate Message Intermediate 2

Related Instructions
SHA256RNDS2, SHA256MSG1
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported by CPUID.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1 or CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     A page fault resulted from the execution of the instruction.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHUFPD
VSHUFPD

Shuffle
Packed Double-Precision Floating-Point

Copies packed double-precision floating-point values from either of two sources to quadwords in the
destination, as specified by bit fields of an immediate byte operand.
Each bit corresponds to a quadword destination. The 128-bit legacy and extended versions of the
instruction use bits [1:0]; the 256-bit extended version uses bits [3:0], as shown.
Destination  Immediate-Byte  Value of   Source 1     Source 2
Quadword     Bit Field       Bit Field  Bits Copied  Bits Copied
Used by 128-bit encoding and 256-bit encoding
[63:0]       [0]             0          [63:0]       -
                             1          [127:64]     -
[127:64]     [1]             0          -            [63:0]
                             1          -            [127:64]
Used only by 256-bit encoding
[191:128]    [2]             0          [191:128]    -
                             1          [255:192]    -
[255:192]    [3]             0          -            [191:128]
                             1          -            [255:192]
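The selection rules in the table reduce to one bit test per destination quadword. This C sketch uses a hypothetical function name and models elements as doubles with element 0 = bits [63:0]; lanes = 1 models the 128-bit forms and lanes = 2 the 256-bit form, and the handling of the upper YMM bits is not modeled.

```c
#include <stdint.h>

/* Hypothetical model of (V)SHUFPD. */
void shufpd_model(double dst[4], const double src1[4], const double src2[4],
                  uint8_t imm8, int lanes)
{
    double out[4] = {0.0, 0.0, 0.0, 0.0};
    for (int l = 0; l < lanes; l++) {
        int base = 2 * l;                                          /* first quadword of the lane */
        out[base]     = src1[base + ((imm8 >> (2 * l)) & 1)];      /* low quadword from source 1 */
        out[base + 1] = src2[base + ((imm8 >> (2 * l + 1)) & 1)];  /* high quadword from source 2 */
    }
    for (int k = 0; k < 2 * lanes; k++)
        dst[k] = out[k];                /* buffered, so dst may alias a source */
}
```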

There are legacy and extended forms of the instruction:
SHUFPD

Shuffles four source values. The first source operand is an XMM register. The second source operand
is either an XMM register or a 128-bit memory location. There is a third 8-bit immediate operand.
The first source register is also the destination. Bits [255:128] of the YMM register that corresponds
to the destination are not affected.
VSHUFPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Shuffles four source values. The first source operand is an XMM register. The second source operand
is either an XMM register or a 128-bit memory location. The destination is a third XMM register.
There is a fourth 8-bit immediate operand. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
YMM Encoding

Shuffles eight source values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
There is a fourth 8-bit immediate operand.


Instruction Support
Form      Subset  Feature Flag
SHUFPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSHUFPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                         Opcode          Description
SHUFPD xmm1, xmm2/mem128, imm8   66 0F C6 /r ib  Shuffles packed double-precision floating-point values in xmm1 and xmm2 or mem128. Writes the result to xmm1.

Mnemonic                               Encoding
                                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSHUFPD xmm1, xmm2, xmm3/mem128, imm8  C4   RXB.01          X.src1.0.01  C6 /r ib
VSHUFPD ymm1, ymm2, ymm3/mem256, imm8  C4   RXB.01          X.src1.1.01  C6 /r ib

Related Instructions
(V)SHUFPS
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
X - AVX and SSE exception
A - AVX exception
S - SSE exception

SHUFPS
VSHUFPS


Shuffle
Packed Single-Precision Floating-Point

Copies packed single-precision floating-point values from either of two sources to doublewords in the
destination, as specified by bit fields of an immediate byte operand.
Each bit field corresponds to a doubleword destination. The 128-bit legacy and extended versions of
the instruction use a single 128-bit destination; the 256-bit extended version performs duplicate operations on bits [127:0] and bits [255:128] of the source and destination.
Destination  Immediate-Byte  Bits copied for each bit-field value
Doubleword   Bit Field       00            01            10            11
Lower 128 bits, used by 128-bit encoding and 256-bit encoding
[31:0]       [1:0]           S1[31:0]      S1[63:32]     S1[95:64]     S1[127:96]
[63:32]      [3:2]           S1[31:0]      S1[63:32]     S1[95:64]     S1[127:96]
[95:64]      [5:4]           S2[31:0]      S2[63:32]     S2[95:64]     S2[127:96]
[127:96]     [7:6]           S2[31:0]      S2[63:32]     S2[95:64]     S2[127:96]
Upper 128 bits of 256-bit source and destination, used only by 256-bit encoding
[159:128]    [1:0]           S1[159:128]   S1[191:160]   S1[223:192]   S1[255:224]
[191:160]    [3:2]           S1[159:128]   S1[191:160]   S1[223:192]   S1[255:224]
[223:192]    [5:4]           S2[159:128]   S2[191:160]   S2[223:192]   S2[255:224]
[255:224]    [7:6]           S2[159:128]   S2[191:160]   S2[223:192]   S2[255:224]
S1 = source 1 bits copied, S2 = source 2 bits copied.
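Each 2-bit field of the immediate byte is simply an index into the four doublewords of one source lane. This C sketch uses a hypothetical function name and models elements as floats with element 0 = bits [31:0]; lanes = 1 models the 128-bit forms and lanes = 2 the 256-bit form, which reuses the same imm8 for the upper lane. The handling of the upper YMM bits is not modeled.

```c
#include <stdint.h>

/* Hypothetical model of (V)SHUFPS. */
void shufps_model(float dst[8], const float src1[8], const float src2[8],
                  uint8_t imm8, int lanes)
{
    float out[8] = {0};
    for (int l = 0; l < lanes; l++) {
        int base = 4 * l;                                  /* first doubleword of the lane */
        out[base + 0] = src1[base + ((imm8 >> 0) & 3)];    /* imm8[1:0] */
        out[base + 1] = src1[base + ((imm8 >> 2) & 3)];    /* imm8[3:2] */
        out[base + 2] = src2[base + ((imm8 >> 4) & 3)];    /* imm8[5:4] */
        out[base + 3] = src2[base + ((imm8 >> 6) & 3)];    /* imm8[7:6] */
    }
    for (int k = 0; k < 4 * lanes; k++)
        dst[k] = out[k];                /* buffered, so dst may alias a source */
}
```

When the same register is used as both sources, the shuffle becomes a permutation of that register; for example, imm8 = 1Bh reverses the four doublewords in each 128-bit lane.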

There are legacy and extended forms of the instruction:
SHUFPS

Shuffles eight source values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. There is a third 8-bit immediate operand. The first source register is also the destination. Bits [255:128] of the YMM register that
corresponds to the destination are not affected.
VSHUFPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Shuffles eight source values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register.
There is a fourth 8-bit immediate operand. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
YMM Encoding

Shuffles 16 source values. The first source operand is a YMM register and the second source operand
is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
There is a fourth 8-bit immediate operand.
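The selection rule in the table above can be expressed as a small reference model. This is a sketch under stated assumptions, not AMD's definition: the function names `shufps` and `vshufps_256` are hypothetical, and each source is modeled as a Python list of doubleword values in which element 0 holds bits [31:0].

```python
def shufps(src1, src2, imm8):
    """Model the 128-bit (V)SHUFPS selection (hypothetical helper).

    src1, src2: four doublewords each; index 0 is bits [31:0].
    imm8: immediate byte; bit field [2i+1:2i] selects destination doubleword i.
    """
    if len(src1) != 4 or len(src2) != 4:
        raise ValueError("each 128-bit source holds exactly four doublewords")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    sel = [(imm8 >> (2 * i)) & 0b11 for i in range(4)]
    # Destination doublewords 0 and 1 come from source 1; 2 and 3 from source 2.
    return [src1[sel[0]], src1[sel[1]], src2[sel[2]], src2[sel[3]]]


def vshufps_256(src1, src2, imm8):
    """Model the 256-bit VSHUFPS: the same imm8 is applied to each 128-bit half."""
    if len(src1) != 8 or len(src2) != 8:
        raise ValueError("each 256-bit source holds exactly eight doublewords")
    return shufps(src1[:4], src2[:4], imm8) + shufps(src1[4:], src2[4:], imm8)


print(shufps([10, 11, 12, 13], [20, 21, 22, 23], 0x1B))  # [13, 12, 21, 20]
```

With imm8 = 1Bh the bit fields [1:0], [3:2], [5:4], and [7:6] hold 11b, 10b, 01b, and 00b, so the two low destination doublewords take source 1 elements 3 and 2 and the two high ones take source 2 elements 1 and 0.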
Instruction Support
Form      Subset  Feature Flag
SHUFPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSHUFPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                Opcode        Description
SHUFPS xmm1, xmm2/mem128, imm8          0F C6 /r ib   Shuffles packed single-precision floating-point values in xmm1 and xmm2 or mem128. Writes the result to xmm1.

                                        Encoding
Mnemonic                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSHUFPS xmm1, xmm2, xmm3/mem128, imm8   C4   RXB.01          X.src1.0.00  C6 /r
VSHUFPS ymm1, ymm2, ymm3/mem256, imm8   C4   RXB.01          X.src1.1.00  C6 /r

Related Instructions
(V)SHUFPD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD         X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                            A    A          AVX instructions are only recognized in protected mode.
                            S    S    S     CR0.EM = 1.
                            S    S    S     CR4.OSFXSR = 0.
                                      A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                      A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                      A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S    S    X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S    S    X     CR0.TS = 1.
Stack, #SS                  S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S    S    X     Memory address exceeding data segment limit or non-canonical.
                            S    S    S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                      X     Null data segment used to reference memory.
Alignment check, #AC             S    S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                      A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S    X     Instruction execution caused a page fault.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


SQRTPD
VSQRTPD

Square Root
Packed Double-Precision Floating-Point

Computes the square root of each packed double-precision floating-point value in a source operand
and writes the result to the corresponding quadword of the destination.
Performing the square root of +infinity returns +infinity.
There are legacy and extended forms of the instruction:
SQRTPD

Computes two values. The destination is an XMM register. The source operand is either an XMM
register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.
VSQRTPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Computes two values. The source operand is either an XMM register or a 128-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Computes four values. The source operand is either a YMM register or a 256-bit memory location.
The destination is a YMM register.
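The per-element behavior, including the +infinity case stated above, can be modeled in Python, whose `float` is an IEEE 754 double. This is a sketch rather than AMD's reference: it assumes the invalid-operation exception (IE) is masked, returns a generic quiet NaN for negative operands instead of the exact default NaN encoding, and does not model SNaN quieting or MXCSR flag updates. The helper names `sqrt_lane` and `sqrtpd` are hypothetical.

```python
import math


def sqrt_lane(x: float) -> float:
    """Square root of one double-precision element (masked-exception model)."""
    if math.isnan(x):
        return x  # NaN inputs propagate (SNaN-to-QNaN quieting not modeled)
    if x < 0.0:
        # Negative nonzero operand: invalid operation (IE). With IE masked the
        # result is a QNaN; the exact default-NaN encoding is not modeled here.
        return math.nan
    # Covers +0.0, -0.0 (whose square root is -0.0), finite values, and +infinity.
    return math.sqrt(x)


def sqrtpd(src):
    """Model (V)SQRTPD: 2 elements for the XMM forms, 4 for the YMM form."""
    if len(src) not in (2, 4):
        raise ValueError("expected 2 (XMM) or 4 (YMM) double-precision elements")
    return [sqrt_lane(x) for x in src]


print(sqrtpd([4.0, math.inf]))  # [2.0, inf]
```

Note that -0.0 is not less than 0.0 in IEEE comparison, so it reaches `math.sqrt` and keeps its sign rather than being treated as an invalid operand.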
Instruction Support
Form      Subset  Feature Flag
SQRTPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSQRTPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode        Description
SQRTPD xmm1, xmm2/mem128     66 0F 51 /r   Computes square roots of packed double-precision floating-point values in xmm2 or mem128. Writes the results to xmm1.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSQRTPD xmm1, xmm2/mem128    C4   RXB.01          X.1111.0.01  51 /r
VSQRTPD ymm1, ymm2/mem256    C4   RXB.01          X.1111.1.01  51 /r

Related Instructions
(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPS, (V)SQRTSD, (V)SQRTSS


rFLAGS Affected
None
MXCSR Flags Affected
Field   MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Effect  -     -     -     -     -     -     -     -     -     -     M     -     -     -     M     M
Note: M indicates a flag that may be set or cleared (modified); - indicates an unaffected flag.

Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD         X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                            A    A          AVX instructions are only recognized in protected mode.
                            S    S    S     CR0.EM = 1.
                            S    S    S     CR4.OSFXSR = 0.
                                      A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                      A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                      A     VEX.vvvv != 1111b.
                                      A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S    S    X     Lock prefix (F0h) preceding opcode.
                            S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S    S    X     CR0.TS = 1.
Stack, #SS                  S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S    S    X     Memory address exceeding data segment limit or non-canonical.
                            S    S    S     Non-aligned memory operand while MXCSR.MM = 0.
                                      X     Null data segment used to reference memory.
Alignment check, #AC             S    S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                      A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S    X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid operation, IE       S    S    X     A source operand was an SNaN value.
                            S    S    X     Undefined operation.
Denormalized operand, DE    S    S    X     A source operand was a denormal value.
Precision, PE               S    S    X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


SQRTPS
VSQRTPS

Square Root
Packed Single-Precision Floating-Point

Computes the square root of each packed single-precision floating-point value in a source operand
and writes the result to the corresponding doubleword of the destination.
Performing the square root of +infinity returns +infinity.
There are legacy and extended forms of the instruction:
SQRTPS

Computes four values. The destination is an XMM register. The source operand is either an XMM
register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the
destination are not affected.
VSQRTPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Computes four values. The destination is an XMM register. The source operand is either an XMM
register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the
destination are cleared.
YMM Encoding

Computes eight values. The destination is a YMM register. The source operand is either a YMM register or a 256-bit memory location.
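Because each element is single precision, a model must round every result to binary32. One way to do this in Python, sketched below, computes the square root in double precision and then rounds it to single precision with the standard `struct` module. For square root, this double rounding is known to give the correctly rounded single-precision result under round-to-nearest, which is the MXCSR.RC default. The helper names are hypothetical. The sketch assumes inputs representable in single precision and, as with the double-precision model, a masked IE that yields a generic NaN for negative operands.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sqrtps(src):
    """Model (V)SQRTPS: 4 elements (XMM forms) or 8 elements (YMM form)."""
    if len(src) not in (4, 8):
        raise ValueError("expected 4 (XMM) or 8 (YMM) single-precision elements")
    out = []
    for x in map(to_f32, src):
        if math.isnan(x):
            out.append(x)  # NaN propagates (quieting not modeled)
        elif x < 0.0:
            # Negative nonzero operand: invalid operation (IE); masked result is a QNaN.
            out.append(math.nan)
        else:
            # Square root in binary64, then one rounding to binary32.
            out.append(to_f32(math.sqrt(x)))
    return out


print(sqrtps([4.0, 2.0, 0.25, math.inf]))
```

The second element shows the effect of the final rounding: the single-precision square root of 2.0 differs from the double-precision value that `math.sqrt` returns.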
Instruction Support
Form      Subset  Feature Flag
SQRTPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSQRTPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Opcode        Description
SQRTPS xmm1, xmm2/mem128     0F 51 /r      Computes square roots of packed single-precision floating-point values in xmm2 or mem128. Writes the results to xmm1.

                             Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSQRTPS xmm1, xmm2/mem128    C4   RXB.01          X.1111.0.00  51 /r
VSQRTPS ymm1, ymm2/mem256    C4   RXB.01          X.1111.1.00  51 /r

Related Instructions
(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPD, (V)SQRTSD, (V)SQRTSS


rFLAGS Affected
None
MXCSR Flags Affected
Field   MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Effect  -     -     -     -     -     -     -     -     -     -     M     -     -     -     M     M
Note: M indicates a flag that may be set or cleared (modified); - indicates an unaffected flag.

Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD         X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                            A    A          AVX instructions are only recognized in protected mode.
                            S    S    S     CR0.EM = 1.
                            S    S    S     CR4.OSFXSR = 0.
                                      A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                      A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                      A     VEX.vvvv != 1111b.
                                      A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S    S    X     Lock prefix (F0h) preceding opcode.
                            S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S    S    X     CR0.TS = 1.
Stack, #SS                  S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S    S    X     Memory address exceeding data segment limit or non-canonical.
                            S    S    S     Non-aligned memory operand while MXCSR.MM = 0.
                                      X     Null data segment used to reference memory.
Alignment check, #AC             S    S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                      A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S    X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid operation, IE       S    S    X     A source operand was an SNaN value.
                            S    S    X     Undefined operation.
Denormalized operand, DE    S    S    X     A source operand was a denormal value.
Precision, PE               S    S    X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


SQRTSD
VSQRTSD

Square Root
Scalar Double-Precision Floating-Point

Computes the square root of a double-precision floating-point value and writes the result to the low
quadword of the destination. The three-operand form of the instruction also writes a copy of the upper
quadword of the first source operand to the upper quadword of the destination.
Performing the square root of +infinity returns +infinity.
There are legacy and extended forms of the instruction:
SQRTSD

The source operand is either an XMM register or a 64-bit memory location. When the source is an
XMM register, the source value must be in the low quadword. The destination is an XMM register.
Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds
to the destination XMM register are not affected.
VSQRTSD

The extended form of the instruction has a single 128-bit encoding that requires three operands:
VSQRTSD xmm1, xmm2, xmm3/mem64

The first source operand is an XMM register. The second source operand is either an XMM register or
a 64-bit memory location. When the second source is an XMM register, the source value must be in
the low quadword. The destination is a third XMM register. The square root of the second source
operand is written to bits [63:0] of the destination register. Bits [127:64] of the destination are copied
from the corresponding bits of the first source operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
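The difference between the legacy and VEX forms is easiest to see across a whole YMM register. The sketch below models a YMM register as a list of four doubles, with index 0 holding bits [63:0]. The function names are hypothetical. The square-root lane uses Python's `math.sqrt`, so the sketch assumes a non-negative operand and does not model exceptions.

```python
import math


def sqrtsd_legacy(dst_ymm, src_low):
    """SQRTSD xmm1, xmm2/mem64: only bits [63:0] of the destination change."""
    result = list(dst_ymm)  # bits [255:64] keep their previous contents
    result[0] = math.sqrt(src_low)
    return result


def vsqrtsd(src1_ymm, src2_low):
    """VSQRTSD xmm1, xmm2, xmm3/mem64 (single 128-bit VEX encoding)."""
    return [
        math.sqrt(src2_low),  # bits [63:0]: square root of the second source
        src1_ymm[1],          # bits [127:64]: copied from the first source
        0.0,                  # bits [191:128]: cleared (all-zero bit pattern)
        0.0,                  # bits [255:192]: cleared
    ]


old = [1.0, 2.0, 3.0, 4.0]
print(sqrtsd_legacy(old, 16.0))  # [4.0, 2.0, 3.0, 4.0]
print(vsqrtsd(old, 16.0))        # [4.0, 2.0, 0.0, 0.0]
```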
Instruction Support
Form      Subset  Feature Flag
SQRTSD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSQRTSD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode        Description
SQRTSD xmm1, xmm2/mem64           F2 0F 51 /r   Computes the square root of a double-precision floating-point value in xmm2 or mem64. Writes the result to xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSQRTSD xmm1, xmm2, xmm3/mem64    C4   RXB.01          X.src1.X.11  51 /r

Related Instructions
(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSS


rFLAGS Affected
None
MXCSR Flags Affected
Field   MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Effect  -     -     -     -     -     -     -     -     -     -     M     -     -     -     M     M
Note: M indicates a flag that may be set or cleared (modified); - indicates an unaffected flag.

Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD         X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                            A    A          AVX instructions are only recognized in protected mode.
                            S    S    S     CR0.EM = 1.
                            S    S    S     CR4.OSFXSR = 0.
                                      A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                      A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                      A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S    S    X     Lock prefix (F0h) preceding opcode.
                            S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S    S    X     CR0.TS = 1.
Stack, #SS                  S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S    S    X     Memory address exceeding data segment limit or non-canonical.
                                      X     Null data segment used to reference memory.
Page fault, #PF                  S    X     Instruction execution caused a page fault.
Alignment check, #AC             S    X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid operation, IE       S    S    X     A source operand was an SNaN value.
                            S    S    X     Undefined operation.
Denormalized operand, DE    S    S    X     A source operand was a denormal value.
Precision, PE               S    S    X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


SQRTSS
VSQRTSS

Square Root
Scalar Single-Precision Floating-Point

Computes the square root of a single-precision floating-point value and writes the result to the low
doubleword of the destination. The three-operand form of the instruction also writes a copy of the
three most significant doublewords of the first source operand to the upper 96 bits of the destination.
Performing the square root of +infinity returns +infinity.
There are legacy and extended forms of the instruction:
SQRTSS

The source operand is either an XMM register or a 32-bit memory location. When the source is an
XMM register, the source value must be in the low doubleword. The destination is an XMM register.
Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds
to the destination XMM register are not affected.
VSQRTSS

The extended form has a single 128-bit encoding that requires three operands:
VSQRTSS xmm1, xmm2, xmm3/mem32

The first source operand is an XMM register. The second source operand is either an XMM register or
a 32-bit memory location. When the second source is an XMM register, the source value must be in
the low doubleword. The destination is a third XMM register. The square root of the second source
operand is written to bits [31:0] of the destination register. Bits [127:32] of the destination are copied
from the corresponding bits of the first source operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
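The upper 96 bits of the VSQRTSS result are a bit-for-bit copy of the first source, not a floating-point operation, so any bit pattern there, including a NaN payload, passes through unchanged. The sketch below works on raw 128-bit integers to make that explicit. The helper names are hypothetical. The sketch assumes round-to-nearest and a non-negative operand, and it does not model exceptions.

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Encode x as a binary32 bit pattern (rounding to nearest)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Decode a binary32 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def vsqrtss(src1_xmm: int, src2_xmm: int) -> int:
    """VSQRTSS xmm1, xmm2, xmm3: return the 128-bit destination value.

    Bits [31:0] hold the square root of bits [31:0] of the second source;
    bits [127:32] are copied unchanged from the first source. Bits [255:128]
    of the YMM destination are cleared and are not represented here.
    """
    x = bits_f32(src2_xmm & 0xFFFF_FFFF)
    low = f32_bits(math.sqrt(x))
    upper = src1_xmm & ~0xFFFF_FFFF & ((1 << 128) - 1)  # integer copy, no FP decode
    return upper | low


# First source: the SNaN pattern 7FA00001h in bits [63:32], zeros elsewhere.
src1 = 0x7FA0_0001 << 32
src2 = f32_bits(9.0)
print(hex(vsqrtss(src1, src2)))  # 0x7fa0000140400000
```

Here 40400000h is the single-precision encoding of 3.0, the square root of the second source. The SNaN pattern from the first source survives because it is never converted to a floating-point value.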
Instruction Support
Form      Subset  Feature Flag
SQRTSS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSQRTSS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode        Description
SQRTSS xmm1, xmm2/mem32           F3 0F 51 /r   Computes the square root of a single-precision floating-point value in xmm2 or mem32. Writes the result to xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSQRTSS xmm1, xmm2, xmm3/mem32    C4   RXB.01          X.src1.X.10  51 /r

Related Instructions
(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSD


rFLAGS Affected
None
MXCSR Flags Affected
Field   MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Effect  -     -     -     -     -     -     -     -     -     -     M     -     -     -     M     M
Note: M indicates a flag that may be set or cleared (modified); - indicates an unaffected flag.

Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD         X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                            A    A          AVX instructions are only recognized in protected mode.
                            S    S    S     CR0.EM = 1.
                            S    S    S     CR4.OSFXSR = 0.
                                      A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                      A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                      A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S    S    X     Lock prefix (F0h) preceding opcode.
                            S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S    S    X     CR0.TS = 1.
Stack, #SS                  S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S    S    X     Memory address exceeding data segment limit or non-canonical.
                                      X     Null data segment used to reference memory.
Page fault, #PF                  S    X     Instruction execution caused a page fault.
Alignment check, #AC             S    X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid operation, IE       S    S    X     A source operand was an SNaN value.
                            S    S    X     Undefined operation.
Denormalized operand, DE    S    S    X     A source operand was a denormal value.
Precision, PE               S    S    X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


STMXCSR
VSTMXCSR

Store MXCSR

Saves the content of the MXCSR extended control/status register to a 32-bit memory location.
Reserved bits are stored as zeroes. The MXCSR is described in “Registers” in Volume 1.
For both legacy STMXCSR and extended VSTMXCSR forms of the instruction, the source operand
is the MXCSR and the destination is a 32-bit memory location.
There is one encoding for each instruction form.
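After STMXCSR or VSTMXCSR stores the register, software usually decodes the 32-bit value field by field. The sketch below uses the MXCSR bit positions given in the flag tables of these instruction pages: MM at bit 17, FZ at 15, RC at 14:13, the exception masks at 12:7, DAZ at 6, and the exception flags at 5:0. The function name `decode_mxcsr` is hypothetical, and the function operates on a value that has already been read from memory. The example value 1F80h is the MXCSR reset default, with all exceptions masked and round-to-nearest selected.

```python
RC_NAMES = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}

# (name, bit position) for the single-bit MXCSR fields.
MXCSR_BITS = [
    ("MM", 17), ("FZ", 15),
    ("PM", 12), ("UM", 11), ("OM", 10), ("ZM", 9), ("DM", 8), ("IM", 7),
    ("DAZ", 6),
    ("PE", 5), ("UE", 4), ("OE", 3), ("ZE", 2), ("DE", 1), ("IE", 0),
]


def decode_mxcsr(value: int) -> dict:
    """Split a 32-bit value stored by (V)STMXCSR into named fields."""
    if not 0 <= value <= 0xFFFF_FFFF:
        raise ValueError("MXCSR image must be a 32-bit value")
    fields = {name: (value >> bit) & 1 for name, bit in MXCSR_BITS}
    # RC is a two-bit field in bits [14:13].
    fields["RC"] = RC_NAMES[(value >> 13) & 0b11]
    return fields


print(decode_mxcsr(0x1F80))
```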
Instruction Support
Form      Subset  Feature Flag
STMXCSR   SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSTMXCSR  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic          Opcode     Description
STMXCSR mem32     0F AE /3   Stores content of MXCSR in mem32.

                  Encoding
Mnemonic          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSTMXCSR mem32    C4   RXB.01          X.1111.0.00  AE /3

Related Instructions
(V)LDMXCSR
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A

X
A

S
S

S
S

X
S
S
S
S
S

X
S
S
S
S
S
S

X
A
S
S
A
A
A
A
X
X
X
X
X
S
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
CR0.EM = 1.
CR4.OSFXSR = 0.
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


SUBPD
VSUBPD

Subtract
Packed Double-Precision Floating-Point

Subtracts each packed double-precision floating-point value of the second source operand from the
corresponding value of the first source operand and writes the difference to the corresponding quadword of the destination.
There are legacy and extended forms of the instruction:
SUBPD

Subtracts two pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VSUBPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Subtracts two pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Subtracts four pairs of values. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
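Operand order matters: each destination element is the first source minus the second source. The minimal lane-wise model below makes that order explicit for both the two-element XMM form and the four-element YMM form. It uses Python's binary64 `float` with round-to-nearest, does not model exceptions, and `subpd` is a hypothetical name. Infinity minus infinity is an invalid operation (IE, "Undefined operation"). With IE masked the hardware produces a QNaN, and Python's float arithmetic likewise returns NaN in that case.

```python
import math


def subpd(src1, src2):
    """Model (V)SUBPD: dest[i] = src1[i] - src2[i] over 2 (XMM) or 4 (YMM) lanes."""
    if len(src1) != len(src2) or len(src1) not in (2, 4):
        raise ValueError("expected two equal-length vectors of 2 or 4 doubles")
    # The second source is subtracted from the first, never the reverse.
    return [a - b for a, b in zip(src1, src2)]


print(subpd([10.0, 1.5], [4.0, 0.25]))          # [6.0, 1.25]
print(subpd([math.inf, 1.0], [math.inf, 1.0]))  # [nan, 0.0]
```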
Instruction Support
Form      Subset  Feature Flag
SUBPD     SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSUBPD    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode        Description
SUBPD xmm1, xmm2/mem128           66 0F 5C /r   Subtracts packed double-precision floating-point values in xmm2 or mem128 from corresponding values of xmm1. Writes differences to xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSUBPD xmm1, xmm2, xmm3/mem128    C4   RXB.01          X.src1.0.01  5C /r
VSUBPD ymm1, ymm2, ymm3/mem256    C4   RXB.01          X.src1.1.01  5C /r

Related Instructions
(V)SUBPS, (V)SUBSD, (V)SUBSS


rFLAGS Affected
None
MXCSR Flags Affected
Field   MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Effect  -     -     -     -     -     -     -     -     -     -     M     M     M     -     M     M
Note: M indicates a flag that may be set or cleared (modified); - indicates an unaffected flag.

Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD         X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                            A    A          AVX instructions are only recognized in protected mode.
                            S    S    S     CR0.EM = 1.
                            S    S    S     CR4.OSFXSR = 0.
                                      A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                      A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                      A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S    S    X     Lock prefix (F0h) preceding opcode.
                            S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S    S    X     CR0.TS = 1.
Stack, #SS                  S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S    S    X     Memory address exceeding data segment limit or non-canonical.
                            S    S    S     Non-aligned memory operand while MXCSR.MM = 0.
                                      X     Null data segment used to reference memory.
Alignment check, #AC             S    S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                      A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S    X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid operation, IE       S    S    X     A source operand was an SNaN value.
                            S    S    X     Undefined operation.
Denormalized operand, DE    S    S    X     A source operand was a denormal value.
Overflow, OE                S    S    X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S    S    X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S    S    X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


SUBPS
VSUBPS

Subtract
Packed Single-Precision Floating-Point

Subtracts each packed single-precision floating-point value of the second source operand from the
corresponding value of the first source operand and writes the difference to the corresponding doubleword of the destination.
There are legacy and extended forms of the instruction:
SUBPS

Subtracts four pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VSUBPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Subtracts four pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Subtracts eight pairs of values. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
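For single-precision elements, the difference must also be rounded to binary32, and a result beyond the single-precision range overflows (OE). The sketch below uses hypothetical helper names. It computes each difference in double precision and rounds it to binary32 with `struct`. For addition and subtraction of binary32 operands, this double rounding is known to match a single correct rounding under round-to-nearest. The sketch also assumes masked exceptions and single-precision-representable inputs. It maps an overflowing result to a signed infinity, which is what a masked OE produces in round-to-nearest mode.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to binary32; overflow becomes a signed infinity."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # struct rejects finite values that round past the binary32 range;
        # with OE masked and round-to-nearest the result is +/-infinity.
        return math.copysign(math.inf, x)


def subps(src1, src2):
    """Model (V)SUBPS: 4 (XMM) or 8 (YMM) single-precision lanes, src1 - src2."""
    if len(src1) != len(src2) or len(src1) not in (4, 8):
        raise ValueError("expected two equal-length vectors of 4 or 8 floats")
    return [to_f32(to_f32(a) - to_f32(b)) for a, b in zip(src1, src2)]


big = 3.0e38  # representable in binary32 (the largest finite value is about 3.4e38)
print(subps([1.0, 0.1, big, -big], [0.5, 0.2, -big, big]))
```

The last two lanes overflow binary32 and become +infinity and -infinity. The second lane shows that 0.1 and 0.2 are rounded to single precision before the subtraction.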
Instruction Support
Form      Subset  Feature Flag
SUBPS     SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSUBPS    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode        Description
SUBPS xmm1, xmm2/mem128           0F 5C /r      Subtracts packed single-precision floating-point values in xmm2 or mem128 from corresponding values of xmm1. Writes differences to xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSUBPS xmm1, xmm2, xmm3/mem128    C4   RXB.01          X.src1.0.00  5C /r
VSUBPS ymm1, ymm2, ymm3/mem256    C4   RXB.01          X.src1.1.00  5C /r

Related Instructions
(V)SUBPD, (V)SUBSD, (V)SUBSS


rFLAGS Affected
None
MXCSR Flags Affected
Field   MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Effect  -     -     -     -     -     -     -     -     -     -     M     M     M     -     M     M
Note: M indicates a flag that may be set or cleared (modified); - indicates an unaffected flag.

Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD         X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                            A    A          AVX instructions are only recognized in protected mode.
                            S    S    S     CR0.EM = 1.
                            S    S    S     CR4.OSFXSR = 0.
                                      A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                      A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                      A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S    S    X     Lock prefix (F0h) preceding opcode.
                            S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S    S    X     CR0.TS = 1.
Stack, #SS                  S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S    S    X     Memory address exceeding data segment limit or non-canonical.
                            S    S    S     Non-aligned memory operand while MXCSR.MM = 0.
                                      X     Null data segment used to reference memory.
Alignment check, #AC             S    S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                      A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S    X     Instruction execution caused a page fault.
SIMD floating-point, #XF    S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid operation, IE       S    S    X     A source operand was an SNaN value.
                            S    S    X     Undefined operation.
Denormalized operand, DE    S    S    X     A source operand was a denormal value.
Overflow, OE                S    S    X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S    S    X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S    S    X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


SUBSD
VSUBSD

Subtract
Scalar Double-Precision Floating-Point

Subtracts the double-precision floating-point value in the low-order quadword of the second source
operand from the corresponding value in the first source operand and writes the result to the low-order quadword of the destination.
There are legacy and extended forms of the instruction:
SUBSD

The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination register. Bits [127:64]
of the destination and bits [255:128] of the corresponding YMM register are not affected.
VSUBSD

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the first
source operand are copied to bits [127:64] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
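With a memory source, SUBSD and VSUBSD read only 64 bits. Unlike the packed forms, their exception table has no #GP entry for a misaligned memory operand. The following sketch models the VSUBSD form with such an operand taken from an arbitrary, unaligned offset. It uses a hypothetical helper name, holds the memory image in a little-endian `bytes` object, and assumes round-to-nearest with no exceptions.

```python
import struct


def vsubsd_mem(src1_xmm, memory: bytes, offset: int):
    """VSUBSD xmm1, xmm2, mem64 on a list of two doubles (index 0 = bits [63:0]).

    Reads exactly 8 little-endian bytes at `offset`; the offset need not be a
    multiple of 16. Bits [255:128] of the YMM destination, which the VEX form
    clears, are not represented in the returned XMM value.
    """
    (m64,) = struct.unpack_from("<d", memory, offset)
    # Low quadword: first source minus memory operand; upper quadword: first source.
    return [src1_xmm[0] - m64, src1_xmm[1]]


buf = bytes(3) + struct.pack("<d", 2.5) + bytes(5)  # operand starts at offset 3
print(vsubsd_mem([10.0, -1.0], buf, 3))  # [7.5, -1.0]
```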
Instruction Support
Form      Subset  Feature Flag
SUBSD     SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSUBSD    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Opcode        Description
SUBSD xmm1, xmm2/mem64            F2 0F 5C /r   Subtracts low-order double-precision floating-point value in xmm2 or mem64 from the corresponding value of xmm1. Writes the difference to xmm1.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSUBSD xmm1, xmm2, xmm3/mem64     C4   RXB.01          X.src1.X.11  5C /r

Related Instructions
(V)SUBPD, (V)SUBPS, (V)SUBSS
rFLAGS Affected
None


MXCSR Flags Affected
Field   MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Effect  -     -     -     -     -     -     -     -     -     -     M     M     M     -     M     M
Note: M indicates a flag that may be set or cleared (modified); - indicates an unaffected flag.

Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD         X    X    X     Instruction not supported, as indicated by CPUID feature identifier.
                            A    A          AVX instructions are only recognized in protected mode.
                            S    S    S     CR0.EM = 1.
                            S    S    S     CR4.OSFXSR = 0.
                                      A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                      A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                      A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S    S    X     Lock prefix (F0h) preceding opcode.
                            S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   S    S    X     CR0.TS = 1.
Stack, #SS                  S    S    X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S    S    X     Memory address exceeding data segment limit or non-canonical.
                                      X     Null data segment used to reference memory.
Page fault, #PF                  S    X     Instruction execution caused a page fault.
Alignment check, #AC             S    X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    S    S    X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                   Real Virt Prot  Cause of Exception
Invalid operation, IE       S    S    X     A source operand was an SNaN value.
                            S    S    X     Undefined operation.
Denormalized operand, DE    S    S    X     A source operand was a denormal value.
Overflow, OE                S    S    X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE               S    S    X     Rounded result too small to fit into the format of the destination operand.
Precision, PE               S    S    X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception


SUBSS
VSUBSS

Subtract
Scalar Single-Precision Floating-Point

Subtracts the single-precision floating-point value in the low-order doubleword of the second source operand from the corresponding value in the first source operand and writes the result to the low-order doubleword of the destination.
There are legacy and extended forms of the instruction:
SUBSS

The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination register. Bits [127:32]
of the destination and bits [255:128] of the corresponding YMM register are not affected.
VSUBSS

The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the first
source operand are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register that
corresponds to the destination are cleared.
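The single-precision result can be reproduced in Python, which only has binary64 floats, by rounding to binary32 with the standard `struct` module. This sketch assumes the default round-to-nearest mode and finite, in-range values, because `struct.pack` raises OverflowError for finite values too large for binary32. It ignores DAZ/FZ and exceptions. The function names are hypothetical.

```python
import struct


def to_binary32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def subss_low(a, b):
    """Low doubleword of SUBSS/VSUBSS: a - b rounded to single precision.

    Subtracting in binary64 and then rounding once to binary32 matches a
    direct single-precision subtraction under round to nearest, because
    binary64 carries more than twice the precision of binary32."""
    return to_binary32(to_binary32(a) - to_binary32(b))


print(1.0 - 2.0 ** -30 == 1.0)            # False: the difference survives in binary64
print(subss_low(1.0, 2.0 ** -30) == 1.0)  # True: it rounds back to 1.0 in binary32
```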
Instruction Support
Form      Subset   Feature Flag
SUBSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSUBSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                         Opcode        Description
SUBSS xmm1, xmm2/mem32           F3 0F 5C /r   Subtracts a low-order single-precision floating-point value in xmm2 or mem32 from the corresponding value of xmm1. Writes the difference to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSUBSS xmm1, xmm2, xmm3/mem32    C4   RXB.01          X.src1.X.10  5C /r

Related Instructions
(V)SUBPD, (V)SUBPS, (V)SUBSD
rFLAGS Affected
None


MXCSR Flags Affected
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Flag    MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Effect                                                              M     M     M           M     M
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


UCOMISD
VUCOMISD

Unordered Compare
Scalar Double-Precision Floating-Point

Performs an unordered comparison of a double-precision floating-point value in the low-order 64 bits
of an XMM register with a double-precision floating-point value in the low-order 64 bits of an XMM
register or a 64-bit memory location.
The ZF, PF, and CF bits in the rFLAGS register reflect the result of the compare as follows.
Result of Compare   ZF   PF   CF
Unordered           1    1    1
Greater Than        0    0    0
Less Than           0    0    1
Equal               1    0    0

The OF, AF, and SF bits in rFLAGS are cleared. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.
The result is unordered when one or both of the operand values is a NaN. UCOMISD signals a SIMD
floating-point invalid operation exception (#I) only when a source operand is an SNaN.
The legacy and extended forms of the instruction operate in the same way.
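The flag encoding in the table maps directly onto the conditional branches that usually follow UCOMISD. This Python sketch models only the rFLAGS outcome. It does not distinguish SNaN from QNaN or model the invalid-operation exception. It relies on the standard Jcc conditions (JA: CF = 0 and ZF = 0; JP: PF = 1). The function names are hypothetical.

```python
import math


def ucomisd_flags(src1, src2):
    """Return (ZF, PF, CF) as UCOMISD sets them when comparing src1 with src2.

    OF, AF and SF are always cleared and are therefore not returned."""
    if math.isnan(src1) or math.isnan(src2):
        return (1, 1, 1)  # unordered
    if src1 > src2:
        return (0, 0, 0)  # greater than
    if src1 < src2:
        return (0, 0, 1)  # less than
    return (1, 0, 0)      # equal (+0.0 and -0.0 compare equal)


def ja_taken(src1, src2):
    """JA branches when CF = 0 and ZF = 0, which happens only for 'greater than'."""
    zf, _pf, cf = ucomisd_flags(src1, src2)
    return cf == 0 and zf == 0


print(ucomisd_flags(float("nan"), 1.0))  # (1, 1, 1)
print(ja_taken(float("nan"), 1.0))       # False
```

Because an unordered result sets CF and ZF as well as PF, a JB or JE taken after UCOMISD does not by itself rule out a NaN operand. Testing PF (JP) first isolates the unordered case.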
Instruction Support
Form       Subset   Feature Flag
UCOMISD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VUCOMISD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode        Description
UCOMISD xmm1, xmm2/mem64    66 0F 2E /r   Compares scalar double-precision floating-point values in xmm1 and xmm2 or mem64. Sets rFLAGS.

Mnemonic                    Encoding
                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VUCOMISD xmm1, xmm2/mem64   C4   RXB.01          X.1111.X.01  2E /r

Related Instructions
(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISS


rFLAGS Affected
Bit     21    20    19    18    17    16    14    13:12 11    10    9     8     7     6     4     2     0
Flag    ID    VIP   VIF   AC    VM    RF    NT    IOPL  OF    DF    IF    TF    SF    ZF    AF    PF    CF
Effect                                                  0                       0     M     0     M     M
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank.
If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

MXCSR Flags Affected
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Flag    MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Effect                                                                                      M     M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


UCOMISS
VUCOMISS

Unordered Compare
Scalar Single-Precision Floating-Point

Performs an unordered comparison of a single-precision floating-point value in the low-order 32 bits
of an XMM register with a single-precision floating-point value in the low-order 32 bits of an XMM
register or a 32-bit memory location.
The ZF, PF, and CF bits in the rFLAGS register reflect the result of the compare as follows.
Result of Compare   ZF   PF   CF
Unordered           1    1    1
Greater Than        0    0    0
Less Than           0    0    1
Equal               1    0    0

The OF, AF, and SF bits in rFLAGS are cleared. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.
The result is unordered when one or both of the operand values is a NaN. UCOMISS signals a SIMD
floating-point invalid operation exception (#I) only when a source operand is an SNaN.
The legacy and extended forms of the instruction operate in the same way.
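Whether UCOMISS signals an invalid operation therefore depends only on how the NaN operand is encoded. This Python sketch classifies raw binary32 bit patterns using the IEEE 754 convention: all-ones exponent, nonzero fraction, and fraction bit 22 set for a QNaN and clear for an SNaN. It ignores the denormal-operand case and MXCSR masking. The function names are hypothetical.

```python
import struct


def classify_binary32(bits):
    """Classify a 32-bit pattern as 'snan', 'qnan' or 'not-nan'.

    A NaN has an all-ones exponent (bits 30:23) and a nonzero fraction
    (bits 22:0); bit 22 is set for a QNaN and clear for an SNaN."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0xFF or fraction == 0:
        return "not-nan"  # zeros, denormals, normal numbers and infinities
    return "qnan" if fraction & (1 << 22) else "snan"


def ucomiss_sets_ie(src1_bits, src2_bits):
    """An SNaN source raises invalid operation; a QNaN only yields 'unordered'."""
    return "snan" in (classify_binary32(src1_bits), classify_binary32(src2_bits))


one = struct.unpack("<I", struct.pack("<f", 1.0))[0]  # 0x3F800000
print(ucomiss_sets_ie(0x7FC00000, one))  # False: QNaN operand
print(ucomiss_sets_ie(0x7F800001, one))  # True: SNaN operand
```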
Instruction Support
Form       Subset   Feature Flag
UCOMISS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VUCOMISS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    Opcode     Description
UCOMISS xmm1, xmm2/mem32    0F 2E /r   Compares scalar single-precision floating-point values in xmm1 and xmm2 or mem32. Sets rFLAGS.

Mnemonic                    Encoding
                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VUCOMISS xmm1, xmm2/mem32   C4   RXB.01          X.1111.X.00  2E /r
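Read in the other direction, the table says a VUCOMISS prefix carries vvvv = 1111b (no register is encoded there) and pp = 00b (no implied prefix). This Python sketch extracts the raw fields of a three-byte VEX prefix so they can be compared against the table. The sample bytes are an illustrative register-register encoding (VUCOMISS xmm1, xmm2 is C4 E1 78 2E CA) built from the table, and the function name is hypothetical.

```python
def vex3_fields(prefix):
    """Split a three-byte VEX prefix (C4, RXB.map_select, W.vvvv.L.pp) into
    the raw field values used in the encoding tables.

    RXB and vvvv are returned exactly as stored, that is, in inverted form."""
    if len(prefix) != 3 or prefix[0] != 0xC4:
        raise ValueError("expected a three-byte VEX prefix starting with C4")
    byte1, byte2 = prefix[1], prefix[2]
    return {
        "RXB": byte1 >> 5,
        "map_select": byte1 & 0x1F,
        "W": byte2 >> 7,
        "vvvv": (byte2 >> 3) & 0xF,
        "L": (byte2 >> 2) & 1,
        "pp": byte2 & 0b11,
    }


fields = vex3_fields(bytes([0xC4, 0xE1, 0x78]))
print(fields)  # vvvv = 15 (1111b), pp = 0, map_select = 1
```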

Related Instructions
(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD


rFLAGS Affected
Bit     21    20    19    18    17    16    14    13:12 11    10    9     8     7     6     4     2     0
Flag    ID    VIP   VIF   AC    VM    RF    NT    IOPL  OF    DF    IF    TF    SF    ZF    AF    PF    CF
Effect                                                  0                       0     M     0     M     M
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank.
If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

MXCSR Flags Affected
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
Flag    MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Effect                                                                                      M     M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
                                             see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
                                             see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception


UNPCKHPD
VUNPCKHPD

Unpack High
Double-Precision Floating-Point

Unpacks the high-order double-precision floating-point values of the first and second source operands and interleaves the values into the destination. Bits [63:0] of the source operands (and, for the 256-bit encoding, bits [191:128]) are ignored.
Values are interleaved in ascending order from the lsb of the sources and the destination. Bits
[127:64] of the first source are written to bits [63:0] of the destination; bits [127:64] of the second
source are written to bits [127:64] of the destination. For the 256-bit encoding, the process is repeated
for bits [255:192] of the sources and bits [255:128] of the destination.
There are legacy and extended forms of the instruction:
UNPCKHPD

Interleaves one pair of values. The first source operand is an XMM register and the second source
operand is either an XMM register or a 128-bit memory location. The first source register is also the
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VUNPCKHPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Interleaves one pair of values. The first source operand is an XMM register and the second source
operand is either an XMM register or a 128-bit memory location. The destination is an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Interleaves two pairs of values. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
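The interleave pattern for both encodings can be checked with a small model. This Python sketch represents each operand as a list of 64-bit lanes (lane 0 = bits [63:0]); two lanes stand for an XMM operand and four for a YMM operand. It models only the data movement, not the clearing or preservation of upper YMM bits described for each form. The function name is hypothetical.

```python
def unpckhpd(src1, src2):
    """Model of (V)UNPCKHPD on lists of 64-bit lanes.

    In each 128-bit half the result is [high lane of src1, high lane of src2]."""
    if len(src1) != len(src2) or len(src1) not in (2, 4):
        raise ValueError("expected two 2-lane (XMM) or two 4-lane (YMM) operands")
    result = []
    for base in range(0, len(src1), 2):  # one pass per 128-bit half
        result += [src1[base + 1], src2[base + 1]]
    return result


print(unpckhpd(["a0", "a1"], ["b0", "b1"]))                          # ['a1', 'b1']
print(unpckhpd(["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]))  # ['a1', 'b1', 'a3', 'b3']
```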
Instruction Support
Form        Subset   Feature Flag
UNPCKHPD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VUNPCKHPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                             Opcode        Description
UNPCKHPD xmm1, xmm2/mem128           66 0F 15 /r   Unpacks the high-order double-precision floating-point values in xmm1 and xmm2 or mem128 and interleaves them into xmm1.

Mnemonic                             Encoding
                                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VUNPCKHPD xmm1, xmm2, xmm3/mem128    C4   RXB.01          X.src1.0.01  15 /r
VUNPCKHPD ymm1, ymm2, ymm3/mem256    C4   RXB.01          X.src1.1.01  15 /r


Related Instructions
(V)UNPCKHPS, (V)UNPCKLPD, (V)UNPCKLPS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


UNPCKHPS
VUNPCKHPS

Unpack High
Single-Precision Floating-Point

Unpacks the high-order single-precision floating-point values of the first and second source operands
and interleaves the values into the destination. Bits [63:0] of the source operands (and, for the 256-bit
encoding, bits [191:128]) are ignored.
Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [95:64]
of the first source are written to bits [31:0] of the destination; bits [95:64] of the second source are
written to bits [63:32] of the destination, and so on, ending with bits [127:96] of the second source in
bits [127:96] of the destination. For the 256-bit encoding, the process continues for bits [255:192] of
the sources and bits [255:128] of the destination.
There are legacy and extended forms of the instruction:
UNPCKHPS

Interleaves two pairs of values. The first source operand is an XMM register and the second source
operand is either an XMM register or a 128-bit memory location. The first source register is also the
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VUNPCKHPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Interleaves two pairs of values. The first source operand is an XMM register and the second source
operand is either an XMM register or a 128-bit memory location. The destination is an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Interleaves four pairs of values. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
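The doubleword ordering described above can be traced with a small model. This Python sketch represents each operand as a list of 32-bit elements (element 0 = bits [31:0]); four elements stand for an XMM operand and eight for a YMM operand. It models only the data movement. The function name is hypothetical.

```python
def unpckhps(src1, src2):
    """Model of (V)UNPCKHPS on lists of 32-bit elements.

    In each 128-bit half with elements e0..e3 (e0 lowest) the result is
    [src1.e2, src2.e2, src1.e3, src2.e3]."""
    if len(src1) != len(src2) or len(src1) not in (4, 8):
        raise ValueError("expected two 4-element (XMM) or two 8-element (YMM) operands")
    result = []
    for base in range(0, len(src1), 4):  # one pass per 128-bit half
        for i in (2, 3):                 # the two high doublewords of the half
            result += [src1[base + i], src2[base + i]]
    return result


a = [f"a{i}" for i in range(8)]
b = [f"b{i}" for i in range(8)]
print(unpckhps(a[:4], b[:4]))  # ['a2', 'b2', 'a3', 'b3']
print(unpckhps(a, b))          # ['a2', 'b2', 'a3', 'b3', 'a6', 'b6', 'a7', 'b7']
```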
Instruction Support
Form        Subset   Feature Flag
UNPCKHPS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VUNPCKHPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                             Opcode     Description
UNPCKHPS xmm1, xmm2/mem128           0F 15 /r   Unpacks the high-order single-precision floating-point values in xmm1 and xmm2 or mem128 and interleaves them into xmm1.

Mnemonic                             Encoding
                                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VUNPCKHPS xmm1, xmm2, xmm3/mem128    C4   RXB.01          X.src1.0.00  15 /r
VUNPCKHPS ymm1, ymm2, ymm3/mem256    C4   RXB.01          X.src1.1.00  15 /r

Related Instructions
(V)UNPCKHPD, (V)UNPCKLPD, (V)UNPCKLPS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


UNPCKLPD
VUNPCKLPD

Unpack Low
Double-Precision Floating-Point

Unpacks the low-order double-precision floating-point values of the first and second source operands
and interleaves the values into the destination. Bits [127:64] of the source operands (and, for the 256-bit
encoding, bits [255:192]) are ignored.
Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [63:0]
of the first source are written to bits [63:0] of the destination; bits [63:0] of the second source are written to bits [127:64] of the destination. For the 256-bit encoding, the process is repeated for bits
[191:128] of the sources and bits [255:128] of the destination.
There are legacy and extended forms of the instruction:
UNPCKLPD

Interleaves one pair of values. The first source operand is an XMM register and the second source
operand is either an XMM register or a 128-bit memory location. The first source register is also the
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VUNPCKLPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Interleaves one pair of values. The first source operand is an XMM register and the second source
operand is either an XMM register or a 128-bit memory location. The destination is an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Interleaves two pairs of values. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
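For the 256-bit encoding the "low" lane of the upper half is bits [191:128], not [255:192]. This Python sketch makes that visible by representing each operand as a list of 64-bit lanes (lane 0 = bits [63:0]); two lanes stand for XMM and four for YMM. It models only the data movement. The function name is hypothetical.

```python
def unpcklpd(src1, src2):
    """Model of (V)UNPCKLPD on lists of 64-bit lanes.

    In each 128-bit half the result is [low lane of src1, low lane of src2]."""
    if len(src1) != len(src2) or len(src1) not in (2, 4):
        raise ValueError("expected two 2-lane (XMM) or two 4-lane (YMM) operands")
    result = []
    for base in range(0, len(src1), 2):  # one pass per 128-bit half
        result += [src1[base], src2[base]]
    return result


print(unpcklpd(["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"]))  # ['a0', 'b0', 'a2', 'b2']
```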
Instruction Support
Form        Subset   Feature Flag
UNPCKLPD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VUNPCKLPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                             Opcode        Description
UNPCKLPD xmm1, xmm2/mem128           66 0F 14 /r   Unpacks the low-order double-precision floating-point values in xmm1 and xmm2 or mem128 and interleaves them into xmm1.

Mnemonic                             Encoding
                                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VUNPCKLPD xmm1, xmm2, xmm3/mem128    C4   RXB.01          X.src1.0.01  14 /r
VUNPCKLPD ymm1, ymm2, ymm3/mem256    C4   RXB.01          X.src1.1.01  14 /r


Related Instructions
(V)UNPCKHPD, (V)UNPCKHPS, (V)UNPCKLPS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


UNPCKLPS
VUNPCKLPS

Unpack Low
Single-Precision Floating-Point

Unpacks the low-order single-precision floating-point values of the first and second source operands
and interleaves the values into the destination. Bits [127:64] of the source operands (and, for the 256-bit
encoding, bits [255:192]) are ignored.
Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [31:0]
of the first source are written to bits [31:0] of the destination; bits [31:0] of the second source are written to bits [63:32] of the destination, and so on, ending with bits [63:32] of the second source in bits
[127:96] of the destination. For the 256-bit encoding, the process continues for bits [191:128] of the
sources and bits [255:128] of the destination.
There are legacy and extended forms of the instruction:
UNPCKLPS

Interleaves two pairs of values. The first source operand is an XMM register and the second source
operand is either an XMM register or a 128-bit memory location. The first source register is also the
destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VUNPCKLPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Interleaves two pairs of values. The first source operand is an XMM register and the second source
operand is either an XMM register or a 128-bit memory location. The destination is an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Interleaves four pairs of values. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
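The interleave rule above is one instance of a pattern shared by the whole UNPCKH/UNPCKL family. This Python sketch writes the pattern once, parameterized by elements per 128-bit half and by which half of each 128-bit lane is taken, and defines UNPCKLPS on top of it. The helper is hypothetical and works on lists of elements (index 0 = least-significant element). It models only the data movement, not the clearing or preservation of upper YMM bits.

```python
def unpack(src1, src2, elems_per_128, high):
    """Generic UNPCK model.

    elems_per_128 is 2 for the PD forms and 4 for the PS forms; high=True
    takes the upper half of each 128-bit lane (UNPCKH*), False the lower
    half (UNPCKL*). Elements of src1 and src2 alternate, src1 first."""
    if len(src1) != len(src2) or len(src1) % elems_per_128:
        raise ValueError("operands must span the same whole number of 128-bit halves")
    half = elems_per_128 // 2
    result = []
    for base in range(0, len(src1), elems_per_128):  # one pass per 128-bit half
        start = base + (half if high else 0)
        for i in range(start, start + half):
            result += [src1[i], src2[i]]
    return result


def unpcklps(src1, src2):
    return unpack(src1, src2, elems_per_128=4, high=False)


a = [f"a{i}" for i in range(8)]
b = [f"b{i}" for i in range(8)]
print(unpcklps(a, b))  # ['a0', 'b0', 'a1', 'b1', 'a4', 'b4', 'a5', 'b5']
```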
Instruction Support
Form        Subset   Feature Flag
UNPCKLPS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VUNPCKLPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                             Opcode     Description
UNPCKLPS xmm1, xmm2/mem128           0F 14 /r   Unpacks the low-order single-precision floating-point values in xmm1 and xmm2 or mem128 and interleaves them into xmm1.

Mnemonic                             Encoding
                                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VUNPCKLPS xmm1, xmm2, xmm3/mem128    C4   RXB.01          X.src1.0.00  14 /r
VUNPCKLPS ymm1, ymm2, ymm3/mem256    C4   RXB.01          X.src1.1.00  14 /r

Related Instructions
(V)UNPCKHPD, (V)UNPCKHPS, (V)UNPCKLPD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


VBROADCASTF128

Load With Broadcast
From 128-bit Memory Location

Loads double-precision floating-point data from a 128-bit memory location and writes it to the two
128-bit elements of a YMM register.
This extended-form instruction has a single 256-bit encoding.
The source operand is a 128-bit memory location. The destination is a YMM register.
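The broadcast can be modeled on raw bytes, which also makes clear that the 16 bytes are copied as-is, whatever values they hold. This Python sketch treats the YMM image as little-endian bytes (byte 0 = bits [7:0]) and uses `struct` only to build and inspect sample doubles. The function name is hypothetical.

```python
import struct


def vbroadcastf128(mem128):
    """Return the 32-byte YMM image produced by VBROADCASTF128 ymm1, mem128:
    the 16 source bytes are written to bits [127:0] and again to [255:128]."""
    if len(mem128) != 16:
        raise ValueError("source must be exactly 16 bytes")
    return bytes(mem128) * 2


src = struct.pack("<2d", 1.0, 2.0)  # two doubles in memory, lowest address first
ymm = vbroadcastf128(src)
print(struct.unpack("<4d", ymm))    # (1.0, 2.0, 1.0, 2.0)
```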
Instruction Support
Form             Subset   Feature Flag
VBROADCASTF128   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VBROADCASTF128 ymm1, mem128    C4   RXB.02          0.1111.1.01  1A /r

Related Instructions
VBROADCASTSD, VBROADCASTSS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    A     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.W = 1.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 0.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                       A     Lock prefix (F0h) preceding opcode.
Device not available, #NM              A     CR0.TS = 1.
Stack, #SS                             A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                A     Memory address exceeding data segment limit or non-canonical.
                                       A     Null data segment used to reference memory.
Page fault, #PF                        A     Instruction execution caused a page fault.
Alignment check, #AC                   A     Unaligned memory reference when alignment checking enabled.
A — AVX exception.


VBROADCASTI128

Load With Broadcast Integer
From 128-bit Memory Location

Loads data from a 128-bit memory location and writes it to the two 128-bit elements of a YMM register.
There is a single form of this instruction:
VBROADCASTI128 dest, mem128

There is a single VEX.L = 1 encoding of this instruction.
The source operand is a 128-bit memory location. The destination is a YMM register.
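Because the only legal source is memory, a register-form encoding (ModRM.mod = 11b) is rejected; the Exceptions list gives this as a #UD cause. This Python sketch models that check alongside the broadcast itself. The ValueError stands in for #UD, the function name is hypothetical, and the ModRM value 08h (mod = 00b, reg = 1 for ymm1, r/m = 0) is only an illustrative memory-form byte.

```python
def vbroadcasti128(modrm, mem128):
    """Model of VBROADCASTI128 ymm1, mem128.

    A register source (ModRM.mod = 11b) is modeled as a ValueError standing
    in for #UD; otherwise the 16 loaded bytes fill both 128-bit halves."""
    if (modrm >> 6) == 0b11:
        raise ValueError("#UD: register-based source operand (ModRM.mod = 11b)")
    if len(mem128) != 16:
        raise ValueError("source must be exactly 16 bytes")
    return bytes(mem128) * 2


data = bytes(range(16))
ymm = vbroadcasti128(0x08, data)
print(ymm[16:] == data)  # True: the upper half repeats the loaded bytes
```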
Instruction Support
Form             Subset   Feature Flag
VBROADCASTI128   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

Instruction Encoding
Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VBROADCASTI128 ymm1, mem128    C4   RXB.02          0.1111.1.01  5A /r

Related Instructions
VBROADCASTF128, VEXTRACTF128, VEXTRACTI128, VINSERTF128, VINSERTI128
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    A     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.W = 1.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 0.
                                       A     Register-based source operand specified (MODRM.mod = 11b).
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                       A     Lock prefix (F0h) preceding opcode.
Device not available, #NM              A     CR0.TS = 1.
Stack, #SS                             A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                A     Memory address exceeding data segment limit or non-canonical.
                                       A     Null data segment used to reference memory.
Page fault, #PF                        A     Instruction execution caused a page fault.
Alignment check, #AC                   A     Unaligned memory reference when alignment checking enabled.
A — AVX exception.


VBROADCASTSD

Load With Broadcast Scalar Double

Loads a double-precision floating-point value from a register or memory and writes it to the four 64-bit elements of a YMM register.
This extended-form instruction has a single 256-bit encoding.
The source operand is the lower half of an XMM register or a 64-bit memory location. The destination is a YMM register.
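The two source forms differ in their feature requirements, as the Instruction Support table and the #UD list show. This Python sketch models both on lists of binary64 lanes. For the register form, `source` is a 2-lane list standing for the XMM register (lane 0 = bits [63:0]); for the memory form it is a single float. A ValueError stands in for #UD, and the function and parameter names are hypothetical.

```python
def vbroadcastsd(source, register_form, avx2_supported):
    """Model of VBROADCASTSD ymm1, xmm2/mem64 returning four 64-bit lanes."""
    if register_form:
        if not avx2_supported:
            # Listed under #UD: register-based source operand without AVX2.
            raise ValueError("#UD: register source requires AVX2")
        value = source[0]  # only bits [63:0] of the XMM source are used
    else:
        value = source     # 64-bit memory operand (AVX)
    return [value] * 4


print(vbroadcastsd(2.5, register_form=False, avx2_supported=False))       # [2.5, 2.5, 2.5, 2.5]
print(vbroadcastsd([1.5, 9.0], register_form=True, avx2_supported=True))  # [1.5, 1.5, 1.5, 1.5]
```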
Instruction Support
Form                       Subset   Feature Flag
VBROADCASTSD ymm1, mem64   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VBROADCASTSD ymm1, xmm     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VBROADCASTSD ymm1, xmm2/mem64 | C4  | RXB.02         | 0.1111.1.01 | 19 /r

Related Instructions
VBROADCASTF128, VBROADCASTSS
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                 | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD       |      |      | A    | Instruction not supported, as indicated by CPUID feature identifier.
                          | A    | A    |      | AVX instructions are only recognized in protected mode.
                          |      |      | A    | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                          |      |      | A    | XFEATURE_ENABLED_MASK[2:1] != 11b.
                          |      |      | A    | VEX.W = 1.
                          |      |      | A    | VEX.vvvv != 1111b.
                          |      |      | A    | VEX.L = 0.
                          |      |      | A    | Register-based source operand specified when AVX2 not supported.
                          |      |      | A    | REX, F2, F3, or 66 prefix preceding VEX prefix.
                          |      |      | A    | Lock prefix (F0h) preceding opcode.
Device not available, #NM |      |      | A    | CR0.TS = 1.
Stack, #SS                |      |      | A    | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP   |      |      | A    | Memory address exceeding data segment limit or non-canonical.
                          |      |      | A    | Null data segment used to reference memory.
Page fault, #PF           |      |      | A    | Instruction execution caused a page fault.
Alignment check, #AC      |      |      | A    | Unaligned memory reference when alignment checking enabled.
A: AVX, AVX2 exception.


VBROADCASTSS

Load With Broadcast Scalar Single

Loads a single-precision floating-point value from a register or memory and writes it to all 4 or 8 doublewords of an XMM or YMM register.
This extended-form instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Copies the source operand to all four 32-bit elements of the destination.
The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location.
The destination is an XMM register.
YMM Encoding

Copies the source operand to all eight 32-bit elements of the destination.
The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location.
The destination is a YMM register.
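The two encodings differ only in how many 32-bit elements receive the copy. The following sketch models the whole YMM destination as eight 32-bit lanes. For the XMM encoding it assumes that bits [255:128] are zeroed. That is the usual behavior of VEX.128 instructions, but this section does not state it explicitly. The function name is hypothetical.

```python
def vbroadcastss(src_dword: int, vex_l: int) -> list[int]:
    """Model VBROADCASTSS; returns the eight 32-bit lanes of the YMM destination.

    vex_l = 1 (YMM encoding): lanes 0-7 all receive the source bits.
    vex_l = 0 (XMM encoding): lanes 0-3 receive the source bits; lanes 4-7
    (bits [255:128]) are assumed to be cleared.
    """
    src_dword &= 0xFFFF_FFFF
    count = 8 if vex_l else 4
    return [src_dword] * count + [0] * (8 - count)


ONE = 0x3F80_0000  # binary32 encoding of 1.0
print([hex(v) for v in vbroadcastss(ONE, vex_l=0)])
```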
Instruction Support
Form               | Subset | Feature Flag
VBROADCASTSS mem32 | AVX    | CPUID Fn0000_0001_ECX[AVX] (bit 28)
VBROADCASTSS xmm   | AVX2   | CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VBROADCASTSS xmm1, xmm2/mem32 | C4  | RXB.02         | 0.1111.0.01 | 18 /r
VBROADCASTSS ymm1, xmm2/mem32 | C4  | RXB.02         | 0.1111.1.01 | 18 /r

Related Instructions
VBROADCASTF128, VBROADCASTSD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                 | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD       |      |      | A    | Instruction not supported, as indicated by CPUID feature identifier.
                          | A    | A    |      | AVX instructions are only recognized in protected mode.
                          |      |      | A    | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                          |      |      | A    | XFEATURE_ENABLED_MASK[2:1] != 11b.
                          |      |      | A    | VEX.W = 1.
                          |      |      | A    | VEX.vvvv != 1111b.
                          |      |      | A    | MODRM.mod = 11b when AVX2 not supported.
                          |      |      | A    | REX, F2, F3, or 66 prefix preceding VEX prefix.
                          |      |      | A    | Lock prefix (F0h) preceding opcode.
Device not available, #NM |      |      | A    | CR0.TS = 1.
Stack, #SS                |      |      | A    | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP   |      |      | A    | Memory address exceeding data segment limit or non-canonical.
                          |      |      | A    | Null data segment used to reference memory.
Page fault, #PF           |      |      | A    | Instruction execution caused a page fault.
Alignment check, #AC      |      |      | A    | Unaligned memory reference when alignment checking enabled.
A: AVX, AVX2 exception.


VCVTPH2PS

Convert Packed 16-Bit Floating-Point to
Single-Precision Floating-Point

Converts packed 16-bit floating-point values to single-precision floating-point values.
A denormal source operand is converted to a normal result in the destination register. MXCSR.DAZ
is ignored and no MXCSR denormal exception is reported.
Because the full range of 16-bit floating-point encodings, including denormal encodings, can be represented exactly in single-precision format, rounding, inexact results, and denormalized results are
not applicable.
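The exactness claim can be checked exhaustively. The following Python sketch decodes every 16-bit pattern with the standard `struct` half-precision format (`e`). It then confirms that each non-NaN value survives conversion to single precision unchanged. This includes the denormals, which become normal single-precision numbers. The sketch checks the number formats only, not the instruction itself.

```python
import struct


def half_to_float(h: int) -> float:
    """Decode a binary16 bit pattern."""
    return struct.unpack("<e", struct.pack("<H", h & 0xFFFF))[0]


def single_bits(x: float) -> int:
    """Encode x as binary32 and return the bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def exact_in_single(h: int) -> bool:
    """True if the binary16 value h round-trips through binary32 unchanged."""
    v = half_to_float(h)
    if v != v:  # NaN patterns are skipped; payload handling is not modeled
        return True
    back = struct.unpack("<f", struct.pack("<f", v))[0]
    return back == v


print(all(exact_in_single(h) for h in range(0x10000)))  # True
print(hex(single_bits(half_to_float(0x0001))))          # 0x33800000 (2**-24)
```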
The operation of this instruction is illustrated in the following diagram.
[Figure: VCVTPH2PS operation. 128-Bit: src = xmm2/mem64 (four 16-bit elements in bits [63:0]), dest = xmm1 (four 32-bit elements); bits [255:128] of the destination are 0s. 256-Bit: src = xmm2/mem128 (eight 16-bit elements), dest = ymm1 (eight 32-bit elements). Each source element is converted to the destination element in the same position.]

This extended-form instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts four packed 16-bit floating-point values in the low-order 64 bits of an XMM register or in a
64-bit memory location to four packed single-precision floating-point values and writes the converted
values to an XMM destination register. When the result operand is written to the destination register,
the upper 128 bits of the corresponding YMM register are zeroed.


YMM Encoding

Converts eight packed 16-bit floating-point values in the low-order 128 bits of a YMM register or in a
128-bit memory location to eight packed single-precision floating-point values and writes the converted values to a YMM destination register.
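In the 256-bit form, element i of the 128-bit source (bits [16i+15:16i]) becomes element i of the destination (bits [32i+31:32i]). Below is a minimal Python model of that layout, assuming little-endian byte order for the memory image. The name `vcvtph2ps_256` is hypothetical.

```python
import struct


def vcvtph2ps_256(src: bytes) -> bytes:
    """Model VCVTPH2PS ymm1, xmm2/mem128 on a 16-byte little-endian image.

    Each of the eight binary16 elements is widened to binary32 in the same
    element position, giving a 32-byte result. The widening is exact, so no
    rounding mode is involved.
    """
    if len(src) != 16:
        raise ValueError("expected 16 bytes (eight binary16 elements)")
    halves = struct.unpack("<8e", src)
    return struct.pack("<8f", *halves)


values = (1.0, -2.0, 0.5, 65504.0, 2.0 ** -24, 0.0, -0.0, 3.140625)
dst = vcvtph2ps_256(struct.pack("<8e", *values))
print(struct.unpack("<8f", dst))
```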
Instruction Support
Form      | Subset | Feature Flag
VCVTPH2PS | F16C   | CPUID Fn0000_0001_ECX[F16C] (bit 29)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                    | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VCVTPH2PS xmm1, xmm2/mem64  | C4  | RXB.02         | 0.1111.0.01 | 13 /r
VCVTPH2PS ymm1, xmm2/mem128 | C4  | RXB.02         | 0.1111.1.01 | 13 /r

Related Instructions
VCVTPS2PH
rFLAGS Affected
None


MXCSR Flags Affected
MM | FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
   |    |       |    |    |    |    |    |    |     |    |    |    |    |    | M
17 | 15 | 14:13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0
Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions
Exception                 | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD       |      |      | F    | Instruction not supported, as indicated by CPUID feature identifier.
                          | F    | F    |      | AVX instructions are only recognized in protected mode.
                          |      |      | F    | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                          |      |      | F    | XFEATURE_ENABLED_MASK[2:1] != 11b.
                          |      |      | F    | VEX.W field = 1.
                          |      |      | F    | VEX.vvvv != 1111b.
                          |      |      | F    | REX, F2, F3, or 66 prefix preceding VEX prefix.
                          |      |      | F    | Lock prefix (F0h) preceding opcode.
                          |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM |      |      | F    | CR0.TS = 1.
Stack, #SS                |      |      | F    | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP   |      |      | F    | Memory address exceeding data segment limit or non-canonical.
                          |      |      | F    | Null data segment used to reference memory.
Alignment check, #AC      |      |      | F    | Unaligned memory reference when alignment checking enabled.
Page fault, #PF           |      |      | F    | Instruction execution caused a page fault.
SIMD floating-point, #XF  |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE     |      |      | F    | A source operand was an SNaN value.
                          |      |      | F    | Undefined operation.
Denormalized operand, DE  |      |      | F    | A source operand was a denormal value.
Overflow, OE              |      |      | F    | Rounded result too large to fit into the format of the destination operand.
Underflow, UE             |      |      | F    | Rounded result too small to fit into the format of the destination operand.
Precision, PE             |      |      | F    | A result could not be represented exactly in the destination format.
F: F16C exception.


VCVTPS2PH


Convert Packed Single-Precision Floating-Point
to 16-Bit Floating-Point

Converts packed single-precision floating-point values to packed 16-bit floating-point values and
writes the converted values to the destination register or to memory. An 8-bit immediate operand provides dynamic control of rounding.
The operation of this instruction is illustrated in the following diagram.
[Figure: VCVTPS2PH operation. 128-Bit: src = xmm2 (four 32-bit elements), dest = xmm1/mem64 (four 16-bit elements in bits [63:0]); each element is converted with rounding controlled by imm8, and bits [127:64] and [255:128] of a register destination are 0s. 256-Bit: src = ymm2 (eight 32-bit elements), dest = xmm1/mem128 (eight 16-bit elements), rounded as specified by imm8; bits [255:128] of a register destination are 0s.]


The handling of rounding is controlled by fields in the immediate byte, as shown in the following
table.
Rounding Control with Immediate Byte Operand
Rounding Source (RS) | Rounding Control (RC) | Description                | Notes
Bit 2                | Bits 1:0              |                            |
0                    | 00                    | Nearest                    | Ignore MXCSR.RC.
0                    | 01                    | Down                       | Ignore MXCSR.RC.
0                    | 10                    | Up                         | Ignore MXCSR.RC.
0                    | 11                    | Truncate                   | Ignore MXCSR.RC.
1                    | XX                    | Use MXCSR.RC for rounding. |

MXCSR.FZ (flush-to-zero) has no effect on this instruction. Values within the half-precision denormal range are
unconditionally converted to denormals.
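The selection logic in the table can be written as a small decoder. This Python sketch makes two assumptions: MXCSR.RC occupies MXCSR bits 14:13, as in the MXCSR Flags Affected layout, and it uses the same two-bit encoding as imm8[1:0]. The sketch does not examine imm8 bits above bit 2. The names are illustrative.

```python
ROUNDING_MODES = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "truncate"}


def vcvtps2ph_rounding(imm8: int, mxcsr: int) -> str:
    """Return the rounding mode selected for VCVTPS2PH.

    imm8[2] (RS) = 0: use imm8[1:0] (RC) and ignore MXCSR.RC.
    imm8[2] (RS) = 1: use MXCSR.RC, taken here from MXCSR bits 14:13.
    """
    if imm8 & 0b100:
        rc = (mxcsr >> 13) & 0b11
    else:
        rc = imm8 & 0b11
    return ROUNDING_MODES[rc]


MXCSR_DEFAULT = 0x1F80  # all exceptions masked, RC = 00b (nearest)
print(vcvtps2ph_rounding(0b011, MXCSR_DEFAULT))                 # truncate
print(vcvtps2ph_rounding(0b100, MXCSR_DEFAULT | (0b10 << 13)))  # up
```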
This extended-form instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Converts four packed single-precision floating-point values in an XMM register to four packed 16-bit
floating-point values and writes the converted values to the low-order 64 bits of the destination XMM
register or to a 64-bit memory location. When the result is written to the destination XMM register,
the high-order 64 bits in the destination XMM register and the upper 128 bits of the corresponding
YMM register are cleared to 0s.
YMM Encoding

Converts eight packed single-precision floating-point values in a YMM register to eight packed 16-bit floating-point values and writes the converted values to the low-order 128 bits of a YMM register
or to a 128-bit memory location. When the result is written to the destination YMM register, the high-order 128 bits in the register are cleared to 0s.
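For round-to-nearest (RS = 0, RC = 00b), you can reproduce the per-element result with Python's `struct` half-precision format. That format rounds ties to even and produces denormals for values below the smallest normal binary16 number. This models the number format only. For out-of-range values, `struct` raises `OverflowError`, whereas the instruction reports overflow in MXCSR and returns a result. The other rounding modes are not covered. The function names are hypothetical.

```python
import struct


def to_half_bits(x: float) -> int:
    """Round x to binary16 (ties to even) and return the 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def vcvtps2ph_128_nearest(src: list[float]) -> bytes:
    """Model VCVTPS2PH xmm1/mem64, xmm2, imm8=0: four values -> 64-bit image."""
    if len(src) != 4:
        raise ValueError("expected four single-precision elements")
    return b"".join(struct.pack("<H", to_half_bits(v)) for v in src)


print(hex(to_half_bits(1 + 2 ** -11)))      # 0x3c00: tie rounds to even
print(hex(to_half_bits(1 + 3 * 2 ** -11)))  # 0x3c02: tie rounds to even
print(hex(to_half_bits(2.0 ** -20)))        # 0x10: half-precision denormal
```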
Instruction Support
Form      | Subset | Feature Flag
VCVTPS2PH | F16C   | CPUID Fn0000_0001_ECX[F16C] (bit 29)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VCVTPS2PH xmm1/mem64, xmm2, imm8  | C4  | RXB.03         | 0.1111.0.01 | 1D /r /imm8
VCVTPS2PH xmm1/mem128, ymm2, imm8 | C4  | RXB.03         | 0.1111.1.01 | 1D /r /imm8

Related Instructions
VCVTPH2PS
rFLAGS Affected
None
MXCSR Flags Affected
MM | FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
   |    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M
17 | 15 | 14:13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0
Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions
Exception                 | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD       |      |      | F    | Instruction not supported, as indicated by CPUID feature identifier.
                          | F    | F    |      | AVX instructions are only recognized in protected mode.
                          |      |      | F    | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                          |      |      | F    | XFEATURE_ENABLED_MASK[2:1] != 11b.
                          |      |      | F    | VEX.W field = 1.
                          |      |      | F    | VEX.vvvv != 1111b.
                          |      |      | F    | REX, F2, F3, or 66 prefix preceding VEX prefix.
                          |      |      | F    | Lock prefix (F0h) preceding opcode.
                          |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM |      |      | F    | CR0.TS = 1.
Stack, #SS                |      |      | F    | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP   |      |      | F    | Memory address exceeding data segment limit or non-canonical.
                          |      |      | F    | Null data segment used to reference memory.
Alignment check, #AC      |      |      | F    | Unaligned memory reference when alignment checking enabled.
Page fault, #PF           |      |      | F    | Instruction execution caused a page fault.
SIMD floating-point, #XF  |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE     |      |      | F    | A source operand was an SNaN value.
                          |      |      | F    | Undefined operation.
Denormalized operand, DE  |      |      | F    | A source operand was a denormal value.
Overflow, OE              |      |      | F    | Rounded result too large to fit into the format of the destination operand.
Underflow, UE             |      |      | F    | Rounded result too small to fit into the format of the destination operand.
Precision, PE             |      |      | F    | A result could not be represented exactly in the destination format.
F: F16C exception.


VEXTRACTF128

Extract
Packed Floating-Point Values

Extracts 128 bits of packed data from a YMM register as specified by an immediate byte operand, and
writes it to either an XMM register or a 128-bit memory location.
Only bit [0] of the immediate operand is used. Operation is as follows.
• When imm8[0] = 0, copy bits [127:0] of the source to the destination.
• When imm8[0] = 1, copy bits [255:128] of the source to the destination.
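The selection rule above amounts to a shift and a mask. Below is a minimal Python model, assuming the 256-bit source is held as a single integer with bit 0 as the least-significant bit. The function name is hypothetical.

```python
MASK128 = (1 << 128) - 1


def vextractf128(ymm: int, imm8: int) -> int:
    """Return the 128-bit half of a 256-bit value selected by imm8[0].

    imm8[0] = 0 selects bits [127:0]; imm8[0] = 1 selects bits [255:128].
    The remaining imm8 bits do not affect the result.
    """
    shift = 128 if imm8 & 1 else 0
    return (ymm >> shift) & MASK128


low = 0x0123_4567_89AB_CDEF_FEDC_BA98_7654_3210
high = 0xDEAD_BEEF_0000_0000_CAFE_F00D_1111_1111
ymm = (high << 128) | low
print(hex(vextractf128(ymm, 0x01)))
```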
This extended-form instruction has a single 256-bit encoding.
The source operand is a YMM register and the destination is either an XMM register or a 128-bit
memory location. There is a third immediate byte operand.
Instruction Support
Form         | Subset | Feature Flag
VEXTRACTF128 | AVX    | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                           | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VEXTRACTF128 xmm/mem128, ymm, imm8 | C4  | RXB.03         | 0.1111.1.01 | 19 /r ib

Related Instructions
VBROADCASTF128, VINSERTF128
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                 | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD       |      |      | A    | Instruction not supported, as indicated by CPUID feature identifier.
                          | A    | A    |      | AVX instructions are only recognized in protected mode.
                          |      |      | A    | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                          |      |      | A    | XFEATURE_ENABLED_MASK[2:1] != 11b.
                          |      |      | A    | VEX.W = 1.
                          |      |      | A    | VEX.L = 0.
                          |      |      | A    | REX, F2, F3, or 66 prefix preceding VEX prefix.
                          |      |      | A    | Lock prefix (F0h) preceding opcode.
Device not available, #NM |      |      | A    | CR0.TS = 1.
Stack, #SS                |      |      | A    | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP   |      |      | A    | Memory address exceeding data segment limit or non-canonical.
                          |      |      | A    | Write to a read-only data segment.
                          |      |      | A    | Null data segment used to reference memory.
Page fault, #PF           |      |      | A    | Instruction execution caused a page fault.
Alignment check, #AC      |      |      | A    | Memory operand not 16-byte aligned when alignment checking enabled.
A: AVX exception.


VEXTRACTI128

Extract 128-bit Integer

Writes a selected 128-bit half of a YMM register to an XMM register or a 128-bit memory location
based on the value of bit 0 of an immediate byte.
There is a single form of this instruction:
VEXTRACTI128 dest, src, imm8

If imm8[0] = 0, the lower half of the source YMM register is selected; if imm8[0] = 1, the upper half
of the source register is selected.
There is a single VEX.L = 1 encoding of this instruction.
The source operand is a YMM register. The destination is either an XMM register or a 128-bit memory location. When the destination is a register, bits [255:128] of the YMM register that corresponds
to the destination are cleared.
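A common use of VEXTRACTI128 is the first step of a horizontal reduction: extract the upper half and add it lane-wise to the lower half. The following Python sketch models the YMM source as eight 32-bit lanes. It models the register destination as a full YMM view with bits [255:128] cleared, as described above. The reduction helper and its names are illustrative. The remaining 128-bit reduction steps are collapsed into a plain sum.

```python
def vextracti128(src_lanes: list[int], imm8: int) -> list[int]:
    """Model VEXTRACTI128 with a register destination.

    src_lanes holds eight 32-bit lanes (lane 0 = bits [31:0]). The result is
    the destination viewed as a full YMM register: the selected half in
    lanes 0-3 and zeros in lanes 4-7 (bits [255:128] are cleared).
    """
    start = 4 if imm8 & 1 else 0
    return src_lanes[start:start + 4] + [0, 0, 0, 0]


def hsum_epi32(src_lanes: list[int]) -> int:
    """Sum eight signed 32-bit lanes with wraparound, upper half first."""
    lower = src_lanes[:4]                   # bits [127:0] need no extraction
    upper = vextracti128(src_lanes, 1)[:4]  # bits [255:128]
    partial = [(a + b) & 0xFFFF_FFFF for a, b in zip(lower, upper)]
    total = sum(partial) & 0xFFFF_FFFF
    return total - (1 << 32) if total & 0x8000_0000 else total


print(hsum_epi32([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```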
Instruction Support
Form         | Subset | Feature Flag
VEXTRACTI128 | AVX2   | CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

Instruction Encoding
Mnemonic                             | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VEXTRACTI128 xmm1/mem128, ymm2, imm8 | C4  | RXB.03         | 0.1111.1.01 | 39 /r ib

Related Instructions
VBROADCASTF128, VBROADCASTI128, VEXTRACTF128, VINSERTF128, VINSERTI128
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                 | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD       |      |      | A    | Instruction not supported, as indicated by CPUID feature identifier.
                          | A    | A    |      | AVX instructions are only recognized in protected mode.
                          |      |      | A    | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                          |      |      | A    | XFEATURE_ENABLED_MASK[2:1] != 11b.
                          |      |      | A    | VEX.W = 1.
                          |      |      | A    | VEX.vvvv != 1111b.
                          |      |      | A    | VEX.L = 0.
                          |      |      | A    | REX, F2, F3, or 66 prefix preceding VEX prefix.
                          |      |      | A    | Lock prefix (F0h) preceding opcode.
Device not available, #NM |      |      | A    | CR0.TS = 1.
Stack, #SS                |      |      | A    | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP   |      |      | A    | Memory address exceeding data segment limit or non-canonical.
                          |      |      | A    | Null data segment used to reference memory.
Page fault, #PF           |      |      | A    | Instruction execution caused a page fault.
Alignment check, #AC      |      |      | A    | Unaligned memory reference when alignment checking enabled.
A: AVX exception.


VFMADDPD
VFMADD132PD
VFMADD213PD
VFMADD231PD


Multiply and Add
Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors and adds the unrounded product to a
third double-precision floating-point vector producing a precise result which is then rounded to double-precision based on the mode specified by the MXCSR[RC] field. The rounded sum is written to
the destination register. The role of each of the source operands specified by the assembly language
prototypes given below is reflected in the vector equation in the comment on the right.
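Because the product is not rounded before the addition, the result can differ from a separate multiply followed by an add. The following Python sketch models one lane with round-to-nearest. It computes a*b + c exactly with `fractions.Fraction` and rounds once, relying on CPython's correctly rounded conversion from `Fraction` to `float`. Infinities, NaNs, and the sign of an exact zero are not modeled, and `fma_nearest` is a hypothetical name.

```python
from fractions import Fraction


def fma_nearest(a: float, b: float, c: float) -> float:
    """a * b + c computed exactly, then rounded once to binary64 (ties to even)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1 + 2.0 ** -30
b = 1 - 2.0 ** -30
c = -1.0
# The exact product is 1 - 2**-60, which rounds to 1.0 when rounded on its own.
print(a * b + c)             # 0.0: product rounded, then sum rounded
print(fma_nearest(a, b, c))  # equals -2**-60: single rounding keeps the low bits
```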
There are two four-operand forms:
VFMADDPD dest, src1, src2/mem, src3
VFMADDPD dest, src1, src2, src3/mem

// dest = (src1* src2/mem) + src3
// dest = (src1* src2) + src3/mem

and three three-operand forms:
VFMADD132PD src1, src2, src3/mem
VFMADD213PD src1, src2, src3/mem
VFMADD231PD src1, src2, src3/mem

// src1 = (src1* src3/mem) + src2
// src1 = (src2* src1) + src3/mem
// src1 = (src2* src3/mem) + src1
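In the three-operand mnemonics, the first two digits name the operands that are multiplied and the third names the operand that is added. The result is always written to the first operand. The following Python sketch restates the comments above element by element, using an exact single-rounding helper. Lists stand in for vector registers, and the function names are hypothetical.

```python
from fractions import Fraction


def _fma(x: float, y: float, z: float) -> float:
    """x * y + z with one rounding to binary64 (round to nearest)."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def vfmaddpd(src1, src2, src3):     # dest = src1 * src2 + src3 (four-operand form)
    return [_fma(s1, s2, s3) for s1, s2, s3 in zip(src1, src2, src3)]


def vfmadd132pd(src1, src2, src3):  # src1 = src1 * src3 + src2
    return [_fma(s1, s3, s2) for s1, s2, s3 in zip(src1, src2, src3)]


def vfmadd213pd(src1, src2, src3):  # src1 = src2 * src1 + src3
    return [_fma(s2, s1, s3) for s1, s2, s3 in zip(src1, src2, src3)]


def vfmadd231pd(src1, src2, src3):  # src1 = src2 * src3 + src1
    return [_fma(s2, s3, s1) for s1, s2, s3 in zip(src1, src2, src3)]


s1, s2, s3 = [2.0, 3.0], [5.0, 7.0], [11.0, 13.0]
print(vfmadd132pd(s1, s2, s3))  # [27.0, 46.0]
print(vfmadd213pd(s1, s2, s3))  # [21.0, 34.0]
print(vfmadd231pd(s1, s2, s3))  # [57.0, 94.0]
```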

When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.
Instruction Support
Form        | Subset | Feature Flag
VFMADDPD    | FMA4   | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnPD | FMA    | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                               | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFMADDPD xmm1, xmm2, xmm3/mem128, xmm4 | C4  | RXB.03         | 0.src1.0.01 | 69 /r /is4
VFMADDPD ymm1, ymm2, ymm3/mem256, ymm4 | C4  | RXB.03         | 0.src1.1.01 | 69 /r /is4
VFMADDPD xmm1, xmm2, xmm3, xmm4/mem128 | C4  | RXB.03         | 1.src1.0.01 | 69 /r /is4
VFMADDPD ymm1, ymm2, ymm3, ymm4/mem256 | C4  | RXB.03         | 1.src1.1.01 | 69 /r /is4
VFMADD132PD xmm0, xmm1, xmm2/m128      | C4  | RXB.02         | 1.src2.0.01 | 98 /r
VFMADD132PD ymm0, ymm1, ymm2/m256      | C4  | RXB.02         | 1.src2.1.01 | 98 /r
VFMADD213PD xmm0, xmm1, xmm2/m128      | C4  | RXB.02         | 1.src2.0.01 | A8 /r
VFMADD213PD ymm0, ymm1, ymm2/m256      | C4  | RXB.02         | 1.src2.1.01 | A8 /r
VFMADD231PD xmm0, xmm1, xmm2/m128      | C4  | RXB.02         | 1.src2.0.01 | B8 /r
VFMADD231PD ymm0, ymm1, ymm2/m256      | C4  | RXB.02         | 1.src2.1.01 | B8 /r

Related Instructions
VFMADDPS, VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSD,
VFMADD132SD, VFMADD213SD, VFMADD231SD, VFMADDSS, VFMADD132SS,
VFMADD213SS, VFMADD231SS
rFLAGS Affected
None
MXCSR Flags Affected
MM | FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
   |    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M
17 | 15 | 14:13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
Exception                 | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD       |      |      | F    | Instruction not supported, as indicated by CPUID feature identifier.
                          | F    | F    |      | FMA instructions are only recognized in protected mode.
                          |      |      | F    | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                          |      |      | F    | XFEATURE_ENABLED_MASK[2:1] != 11b.
                          |      |      | F    | REX, F2, F3, or 66 prefix preceding VEX prefix.
                          |      |      | F    | Lock prefix (F0h) preceding opcode.
                          |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM |      |      | F    | CR0.TS = 1.
Stack, #SS                |      |      | F    | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP   |      |      | F    | Memory address exceeding data segment limit or non-canonical.
                          |      |      | F    | Null data segment used to reference memory.
Page fault, #PF           |      |      | F    | Instruction execution caused a page fault.
Alignment check, #AC      |      |      | F    | Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF  |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE     |      |      | F    | A source operand was an SNaN value.
                          |      |      | F    | Undefined operation.
Denormalized operand, DE  |      |      | F    | A source operand was a denormal value.
Overflow, OE              |      |      | F    | Rounded result too large to fit into the format of the destination operand.
Underflow, UE             |      |      | F    | Rounded result too small to fit into the format of the destination operand.
Precision, PE             |      |      | F    | A result could not be represented exactly in the destination format.
F: FMA, FMA4 exception


VFMADDPS
VFMADD132PS
VFMADD213PS
VFMADD231PS

Multiply and Add
Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors and adds the unrounded product to a
third single-precision floating-point vector producing a precise result which is then rounded to single-precision based on the mode specified by the MXCSR[RC] field. The rounded sum is written to the
destination register. The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:
VFMADDPS dest, src1, src2/mem, src3
VFMADDPS dest, src1, src2, src3/mem

// dest = (src1* src2/mem) + src3
// dest = (src1* src2) + src3/mem

and three three-operand forms:
VFMADD132PS src1, src2, src3/mem
VFMADD213PS src1, src2, src3/mem
VFMADD231PS src1, src2, src3/mem

// src1 = (src1* src3/mem) + src2
// src1 = (src2* src1) + src3/mem
// src1 = (src2* src3/mem) + src1
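The 231 form suits accumulation loops, because the accumulator is the operand that is both added and overwritten. Below is a Python sketch of a dot product built from eight-lane VFMADD231PS steps, assuming the input length is a multiple of eight. It does not model rounding to single precision, so the example uses small integers whose products and sums are exact in binary32. The names are hypothetical.

```python
def vfmadd231ps(acc: list[float], a: list[float], b: list[float]) -> list[float]:
    """One eight-lane step: acc = a * b + acc (rounding not modeled)."""
    return [x * y + z for x, y, z in zip(a, b, acc)]


def dot(xs: list[float], ys: list[float], width: int = 8) -> float:
    """Dot product using one accumulator register of `width` lanes."""
    if len(xs) != len(ys) or len(xs) % width:
        raise ValueError("lengths must match and be a multiple of the width")
    acc = [0.0] * width
    for i in range(0, len(xs), width):
        acc = vfmadd231ps(acc, xs[i:i + width], ys[i:i + width])
    return sum(acc)  # stands in for the final horizontal reduction


xs = [float(i) for i in range(16)]
ys = [2.0] * 16
print(dot(xs, ys))  # 240.0
```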

When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.
Instruction Support
Form        | Subset | Feature Flag
VFMADDPS    | FMA4   | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnPS | FMA    | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                               | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFMADDPS xmm1, xmm2, xmm3/mem128, xmm4 | C4  | RXB.03         | 0.src1.0.01 | 68 /r /is4
VFMADDPS ymm1, ymm2, ymm3/mem256, ymm4 | C4  | RXB.03         | 0.src1.1.01 | 68 /r /is4
VFMADDPS xmm1, xmm2, xmm3, xmm4/mem128 | C4  | RXB.03         | 1.src1.0.01 | 68 /r /is4
VFMADDPS ymm1, ymm2, ymm3, ymm4/mem256 | C4  | RXB.03         | 1.src1.1.01 | 68 /r /is4
VFMADD132PS xmm0, xmm1, xmm2/m128      | C4  | RXB.02         | 0.src2.0.01 | 98 /r
VFMADD132PS ymm0, ymm1, ymm2/m256      | C4  | RXB.02         | 0.src2.1.01 | 98 /r
VFMADD213PS xmm0, xmm1, xmm2/m128      | C4  | RXB.02         | 0.src2.0.01 | A8 /r
VFMADD213PS ymm0, ymm1, ymm2/m256      | C4  | RXB.02         | 0.src2.1.01 | A8 /r
VFMADD231PS xmm0, xmm1, xmm2/m128      | C4  | RXB.02         | 0.src2.0.01 | B8 /r
VFMADD231PS ymm0, ymm1, ymm2/m256      | C4  | RXB.02         | 0.src2.1.01 | B8 /r

Related Instructions
VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADDSD,
VFMADD132SD, VFMADD213SD, VFMADD231SD, VFMADDSS, VFMADD132SS,
VFMADD213SS, VFMADD231SS
rFLAGS Affected
None
MXCSR Flags Affected
MM | FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
   |    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M
17 | 15 | 14:13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
Exception                 | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD       |      |      | F    | Instruction not supported, as indicated by CPUID feature identifier.
                          | F    | F    |      | FMA instructions are only recognized in protected mode.
                          |      |      | F    | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                          |      |      | F    | XFEATURE_ENABLED_MASK[2:1] != 11b.
                          |      |      | F    | REX, F2, F3, or 66 prefix preceding VEX prefix.
                          |      |      | F    | Lock prefix (F0h) preceding opcode.
                          |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM |      |      | F    | CR0.TS = 1.
Stack, #SS                |      |      | F    | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP   |      |      | F    | Memory address exceeding data segment limit or non-canonical.
                          |      |      | F    | Null data segment used to reference memory.
Page fault, #PF           |      |      | F    | Instruction execution caused a page fault.
Alignment check, #AC      |      |      | F    | Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF  |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE     |      |      | F    | A source operand was an SNaN value.
                          |      |      | F    | Undefined operation.
Denormalized operand, DE  |      |      | F    | A source operand was a denormal value.
Overflow, OE              |      |      | F    | Rounded result too large to fit into the format of the destination operand.
Underflow, UE             |      |      | F    | Rounded result too small to fit into the format of the destination operand.
Precision, PE             |      |      | F    | A result could not be represented exactly in the destination format.
F: FMA, FMA4 exception


VFMADDSD
VFMADD132SD
VFMADD213SD
VFMADD231SD


Multiply and Add
Scalar Double-Precision Floating-Point

Multiplies together two double-precision floating-point values and adds the unrounded product to a
third double-precision floating-point value producing a precise result which is then rounded to double-precision based on the mode specified by the MXCSR[RC] field. The rounded sum is written to
the destination register. The role of each of the source operands specified by the assembly language
prototypes given below is reflected in the equation in the comment on the right.
There are two four-operand forms:
VFMADDSD dest, src1, src2/mem64, src3
VFMADDSD dest, src1, src2, src3/mem64

// dest = (src1* src2/mem64) + src3
// dest = (src1* src2) + src3/mem64

and three three-operand forms:
VFMADD132SD src1, src2, src3/mem64
VFMADD213SD src1, src2, src3/mem64
VFMADD231SD src1, src2, src3/mem64

// src1 = (src1* src3/mem64) + src2
// src1 = (src2* src1) + src3/mem64
// src1 = (src2* src3/mem64) + src1

All 64-bit double-precision floating-point register-based operands are held in the lower quadword of
XMM registers. The result is written to the lower quadword of the destination register. For those
instructions that use a memory-based operand, one of the source operands is a 64-bit value read from
memory.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 64-bit memory location and the third
source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 64-bit
memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a 64-bit memory location.
The destination is an XMM register. When the result is written to the destination XMM register, bits
[127:64] of the destination and bits [255:128] of the corresponding YMM register are cleared.
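A classic use of a scalar fused multiply-add is recovering the exact rounding error of a product. Let p = a*b, rounded. Then fma(a, b, -p) is computed with a single rounding and, barring underflow or overflow, equals a*b - p exactly. The following Python sketch models the scalar instruction with round-to-nearest using exact rationals. `fma_sd` and `two_product` are hypothetical names.

```python
from fractions import Fraction


def fma_sd(a: float, b: float, c: float) -> float:
    """Scalar a * b + c rounded once to binary64 (round to nearest)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def two_product(a: float, b: float) -> tuple[float, float]:
    """Return (p, e) with p = round(a * b) and e = a * b - p.

    e is exact when no underflow or overflow occurs, so p + e == a * b
    holds in exact arithmetic.
    """
    p = a * b
    e = fma_sd(a, b, -p)
    return p, e


a = b = 1 + 2.0 ** -30
p, e = two_product(a, b)
print(p == 1 + 2.0 ** -29, e == 2.0 ** -60)  # True True
```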
Instruction Support
Form        | Subset | Feature Flag
VFMADDSD    | FMA4   | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnSD | FMA    | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                              | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFMADDSD xmm1, xmm2, xmm3/mem64, xmm4 | C4  | RXB.03         | 0.src1.X.01 | 6B /r /is4
VFMADDSD xmm1, xmm2, xmm3, xmm4/mem64 | C4  | RXB.03         | 1.src1.X.01 | 6B /r /is4
VFMADD132SD xmm0, xmm1, xmm2/m64      | C4  | RXB.02         | 1.src2.X.01 | 99 /r
VFMADD213SD xmm0, xmm1, xmm2/m64      | C4  | RXB.02         | 1.src2.X.01 | A9 /r
VFMADD231SD xmm0, xmm1, xmm2/m64      | C4  | RXB.02         | 1.src2.X.01 | B9 /r

Related Instructions
VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADDPS,
VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSS, VFMADD132SS,
VFMADD213SS, VFMADD231SS
rFLAGS Affected
None
MXCSR Flags Affected
MM | FZ | RC    | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
   |    |       |    |    |    |    |    |    |     | M  | M  | M  |    | M  | M
17 | 15 | 14:13 | 12 | 11 | 10 | 9  | 8  | 7  | 6   | 5  | 4  | 3  | 2  | 1  | 0
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
Exception                 | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD       |      |      | F    | Instruction not supported, as indicated by CPUID feature identifier.
                          | F    | F    |      | FMA instructions are only recognized in protected mode.
                          |      |      | F    | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                          |      |      | F    | XFEATURE_ENABLED_MASK[2:1] != 11b.
                          |      |      | F    | REX, F2, F3, or 66 prefix preceding VEX prefix.
                          |      |      | F    | Lock prefix (F0h) preceding opcode.
                          |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM |      |      | F    | CR0.TS = 1.
Stack, #SS                |      |      | F    | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP   |      |      | F    | Memory address exceeding data segment limit or non-canonical.
                          |      |      | F    | Null data segment used to reference memory.
Page fault, #PF           |      |      | F    | Instruction execution caused a page fault.
Alignment check, #AC      |      |      | F    | Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF  |      |      | F    | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE     |      |      | F    | A source operand was an SNaN value.
                          |      |      | F    | Undefined operation.
Denormalized operand, DE  |      |      | F    | A source operand was a denormal value.
Overflow, OE              |      |      | F    | Rounded result too large to fit into the format of the destination operand.
Underflow, UE             |      |      | F    | Rounded result too small to fit into the format of the destination operand.
Precision, PE             |      |      | F    | A result could not be represented exactly in the destination format.
F: FMA, FMA4 exception

VFMADDSS
VFMADD132SS
VFMADD213SS
VFMADD231SS

Multiply and Add
Scalar Single-Precision Floating-Point

Multiplies together two single-precision floating-point values and adds the unrounded product to a
third single-precision floating-point value, producing a precise result that is then rounded to single precision based on the mode specified by the MXCSR[RC] field. The rounded sum is written to the
destination register. The role of each source operand specified by the assembly language prototypes below is reflected in the equation in the comment on the right.
There are two four-operand forms:
VFMADDSS dest, src1, src2/mem32, src3    // dest = (src1 * src2/mem32) + src3
VFMADDSS dest, src1, src2, src3/mem32    // dest = (src1 * src2) + src3/mem32

and three three-operand forms:
VFMADD132SS src1, src2, src3/mem32       // src1 = (src1 * src3/mem32) + src2
VFMADD213SS src1, src2, src3/mem32       // src1 = (src2 * src1) + src3/mem32
VFMADD231SS src1, src2, src3/mem32       // src1 = (src2 * src3/mem32) + src1

All 32-bit single-precision floating-point register-based operands are held in the lower doubleword of
XMM registers. The result is written to the low doubleword of the destination register. For those
instructions that use a memory-based operand, one of the source operands is a 32-bit value read from
memory.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 32-bit memory location and the third
source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 32-bit memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a 32-bit memory location.
The destination is an XMM register. When the result is written to the destination XMM register, bits
[127:32] of the destination and bits [255:128] of the corresponding YMM register are cleared.
Instruction Support
Form            Subset  Feature Flag
VFMADDSS        FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnSS     FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                                Encoding
                                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMADDSS xmm1, xmm2, xmm3/mem32, xmm4   C4   RXB.03          0.src1.X.01  6A /r /is4
VFMADDSS xmm1, xmm2, xmm3, xmm4/mem32   C4   RXB.03          1.src1.X.01  6A /r /is4
VFMADD132SS xmm1, xmm2, xmm3/mem32      C4   RXB.02          0.src2.X.01  99 /r
VFMADD213SS xmm1, xmm2, xmm3/mem32      C4   RXB.02          0.src2.X.01  A9 /r
VFMADD231SS xmm1, xmm2, xmm3/mem32      C4   RXB.02          0.src2.X.01  B9 /r
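The encoding columns can be turned into bytes mechanically. The sketch below uses a hypothetical helper, not an AMD-provided routine. It handles only register operands (ModRM.mod = 11b) and the 3-byte C4 prefix: R, X, and B are stored inverted in the second byte next to map_select, vvvv is stored inverted in the third byte together with W, L, and pp, and for the /is4 forms the fourth register goes in bits [7:4] of a trailing immediate byte. Where the L column shows X (ignored), it encodes L = 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a register-only C4-prefixed instruction.
   map_select, w, l, pp and opcode come straight from the table columns;
   reg -> ModRM.reg (destination), vvvv -> VEX.vvvv, rm -> ModRM.rm,
   is4 -> imm8[7:4] for the four-operand forms (pass -1 if absent).
   Returns the number of bytes written to out (at most 6). */
static size_t emit_vex3_rr(uint8_t *out, unsigned map_select, unsigned w,
                           unsigned l, unsigned pp, uint8_t opcode,
                           unsigned reg, unsigned vvvv, unsigned rm, int is4)
{
    assert(reg < 16 && vvvv < 16 && rm < 16 && is4 < 16);
    size_t n = 0;
    unsigned r = (reg >> 3) & 1u, b = (rm >> 3) & 1u;

    out[n++] = 0xC4;
    /* Byte 1: ~R, ~X (no index register, so X = 0), ~B, map_select[4:0]. */
    out[n++] = (uint8_t)(((r ^ 1u) << 7) | (1u << 6) | ((b ^ 1u) << 5) |
                         (map_select & 0x1Fu));
    /* Byte 2: W, ~vvvv, L, pp. */
    out[n++] = (uint8_t)(((w & 1u) << 7) | ((~vvvv & 0xFu) << 3) |
                         ((l & 1u) << 2) | (pp & 3u));
    out[n++] = opcode;
    out[n++] = (uint8_t)(0xC0u | ((reg & 7u) << 3) | (rm & 7u)); /* mod = 11b */
    if (is4 >= 0)
        out[n++] = (uint8_t)((unsigned)is4 << 4);  /* register in imm8[7:4] */
    return n;
}
```

With this helper, `VFMADD231SS xmm1, xmm2, xmm3` (map 02, W = 0, pp = 01, opcode B9) comes out as `C4 E2 69 B9 CB`, and `VFMADDSS xmm1, xmm2, xmm3, xmm4` (map 03, opcode 6A, is4 = 4) as `C4 E3 69 6A CB 40`.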

Related Instructions
VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADDPS,
VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSD, VFMADD132SD,
VFMADD213SD, VFMADD231SD
rFLAGS Affected
None
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                    M    M    M         M    M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
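The flags marked M in the MXCSR table are bits 0, 1, 3, 4, and 5 (IE, DE, OE, UE, PE); ZE is blank, meaning unaffected. A small C sketch for isolating those bits from an MXCSR value follows. The macro and function names are hypothetical. The live-register read uses `_mm_getcsr` from `<xmmintrin.h>`, which exists only on x86 targets with SSE, so it is guarded by `__SSE__`.

```c
#include <assert.h>
#include <stdint.h>

/* MXCSR status-flag masks, bit positions as in the MXCSR table. */
#define MXCSR_IE (1u << 0)  /* invalid operation           */
#define MXCSR_DE (1u << 1)  /* denormalized operand        */
#define MXCSR_ZE (1u << 2)  /* zero divide: blank for FMAs */
#define MXCSR_OE (1u << 3)  /* overflow                    */
#define MXCSR_UE (1u << 4)  /* underflow                   */
#define MXCSR_PE (1u << 5)  /* precision (inexact)         */

/* The flags marked M (modified) for the FMA instructions. */
#define MXCSR_FMA_FLAGS (MXCSR_IE | MXCSR_DE | MXCSR_OE | MXCSR_UE | MXCSR_PE)
static_assert(MXCSR_FMA_FLAGS == 0x3Bu, "bits 5, 4, 3, 1 and 0");

/* Keeps only the status flags that these instructions can modify. */
static uint32_t fma_status_flags(uint32_t mxcsr)
{
    return mxcsr & MXCSR_FMA_FLAGS;
}

#if defined(__SSE__)
#include <xmmintrin.h>
/* Reads the calling thread's MXCSR (x86 targets with SSE only). */
static uint32_t current_fma_status_flags(void)
{
    return fma_status_flags(_mm_getcsr());
}
#endif
```

Because the status flags are sticky, a caller would typically clear them before the FMA sequence and inspect them afterward.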

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      F    Instruction not supported, as indicated by CPUID feature identifier.
                             F     F          FMA instructions are only recognized in protected mode.
                                         F    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         F    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         F    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         F    Lock prefix (F0h) preceding opcode.
                                         F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                F    CR0.TS = 1.
Stack, #SS                               F    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  F    Memory address exceeding data segment limit or non-canonical.
                                         F    Null data segment used to reference memory.
Page fault, #PF                          F    Instruction execution caused a page fault.
Alignment check, #AC                     F    Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                 F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                    F    A source operand was an SNaN value.
                                         F    Undefined operation.
Denormalized operand, DE                 F    A source operand was a denormal value.
Overflow, OE                             F    Rounded result too large to fit into the format of the destination operand.
Underflow, UE                            F    Rounded result too small to fit into the format of the destination operand.
Precision, PE                            F    A result could not be represented exactly in the destination format.
F = FMA, FMA4 exception

VFMADDSUBPD
VFMADDSUB132PD
VFMADDSUB213PD
VFMADDSUB231PD

Multiply with Alternating Add/Subtract
Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors, adds the odd elements of the unrounded
product to the odd elements of a third double-precision floating-point vector, and subtracts the even elements
of the third vector from the even elements of the unrounded product (elements are numbered from 0, so element 0 is even). The precise result of each
addition or subtraction is then rounded to double precision based on the mode specified by the
MXCSR[RC] field and written to the corresponding element of the destination.
The role of each source operand specified by the assembly language prototypes below is
reflected in the equations in the comments on the right.
There are two four-operand forms:
VFMADDSUBPD dest, src1, src2/mem, src3      // dest_odd  = (src1_odd * src2_odd/mem_odd) + src3_odd
                                            // dest_even = (src1_even * src2_even/mem_even) − src3_even
VFMADDSUBPD dest, src1, src2, src3/mem      // dest_odd  = (src1_odd * src2_odd) + src3_odd/mem_odd
                                            // dest_even = (src1_even * src2_even) − src3_even/mem_even

and three three-operand forms:
VFMADDSUB132PD src1, src2, src3/mem         // src1_odd  = (src1_odd * src3_odd/mem_odd) + src2_odd
                                            // src1_even = (src1_even * src3_even/mem_even) − src2_even
VFMADDSUB213PD src1, src2, src3/mem         // src1_odd  = (src2_odd * src1_odd) + src3_odd/mem_odd
                                            // src1_even = (src2_even * src1_even) − src3_even/mem_even
VFMADDSUB231PD src1, src2, src3/mem         // src1_odd  = (src2_odd * src3_odd/mem_odd) + src1_odd
                                            // src1_even = (src2_even * src3_even/mem_even) − src1_even

When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.

Instruction Support
Form              Subset  Feature Flag
VFMADDSUBPD       FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDSUBnnnPD    FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                    Encoding
                                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMADDSUBPD xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  5D /r /is4
VFMADDSUBPD ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  5D /r /is4
VFMADDSUBPD xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  5D /r /is4
VFMADDSUBPD ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  5D /r /is4
VFMADDSUB132PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  96 /r
VFMADDSUB132PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  96 /r
VFMADDSUB213PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  A6 /r
VFMADDSUB213PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  A6 /r
VFMADDSUB231PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  B6 /r
VFMADDSUB231PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  B6 /r

Related Instructions
VFMSUBADDPD, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD,
VFMADDSUBPS, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS, VFMSUBADDPS, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS
rFLAGS Affected
None
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                    M    M    M         M    M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      F    Instruction not supported, as indicated by CPUID feature identifier.
                             F     F          FMA instructions are only recognized in protected mode.
                                         F    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         F    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         F    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         F    Lock prefix (F0h) preceding opcode.
                                         F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                F    CR0.TS = 1.
Stack, #SS                               F    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  F    Memory address exceeding data segment limit or non-canonical.
                                         F    Null data segment used to reference memory.
Page fault, #PF                          F    Instruction execution caused a page fault.
Alignment check, #AC                     F    Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                 F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                    F    A source operand was an SNaN value.
                                         F    Undefined operation.
Denormalized operand, DE                 F    A source operand was a denormal value.
Overflow, OE                             F    Rounded result too large to fit into the format of the destination operand.
Underflow, UE                            F    Rounded result too small to fit into the format of the destination operand.
Precision, PE                            F    A result could not be represented exactly in the destination format.
F = FMA, FMA4 exception

VFMADDSUBPS
VFMADDSUB132PS
VFMADDSUB213PS
VFMADDSUB231PS

Multiply with Alternating Add/Subtract
Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors, adds the odd elements of the unrounded
product to the odd elements of a third single-precision floating-point vector, and subtracts the even elements
of the third vector from the even elements of the unrounded product (elements are numbered from 0, so element 0 is even). The precise result of each
addition or subtraction is then rounded to single precision based on the mode specified by the
MXCSR[RC] field and written to the corresponding element of the destination.
The role of each source operand specified by the assembly language prototypes below is
reflected in the equations in the comments on the right.
There are two four-operand forms:
VFMADDSUBPS dest, src1, src2/mem, src3      // dest_odd  = (src1_odd * src2_odd/mem_odd) + src3_odd
                                            // dest_even = (src1_even * src2_even/mem_even) − src3_even
VFMADDSUBPS dest, src1, src2, src3/mem      // dest_odd  = (src1_odd * src2_odd) + src3_odd/mem_odd
                                            // dest_even = (src1_even * src2_even) − src3_even/mem_even

and three three-operand forms:
VFMADDSUB132PS src1, src2, src3/mem         // src1_odd  = (src1_odd * src3_odd/mem_odd) + src2_odd
                                            // src1_even = (src1_even * src3_even/mem_even) − src2_even
VFMADDSUB213PS src1, src2, src3/mem         // src1_odd  = (src2_odd * src1_odd) + src3_odd/mem_odd
                                            // src1_even = (src2_even * src1_even) − src3_even/mem_even
VFMADDSUB231PS src1, src2, src3/mem         // src1_odd  = (src2_odd * src3_odd/mem_odd) + src1_odd
                                            // src1_even = (src2_even * src3_even/mem_even) − src1_even

When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.

Instruction Support
Form              Subset  Feature Flag
VFMADDSUBPS       FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDSUBnnnPS    FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                    Encoding
                                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMADDSUBPS xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  5C /r /is4
VFMADDSUBPS ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  5C /r /is4
VFMADDSUBPS xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  5C /r /is4
VFMADDSUBPS ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  5C /r /is4
VFMADDSUB132PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  96 /r
VFMADDSUB132PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  96 /r
VFMADDSUB213PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  A6 /r
VFMADDSUB213PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  A6 /r
VFMADDSUB231PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  B6 /r
VFMADDSUB231PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  B6 /r

Related Instructions
VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD, VFMSUBADDPD, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD, VFMSUBADDPS, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS
rFLAGS Affected
None
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                    M    M    M         M    M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      F    Instruction not supported, as indicated by CPUID feature identifier.
                             F     F          FMA instructions are only recognized in protected mode.
                                         F    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         F    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         F    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         F    Lock prefix (F0h) preceding opcode.
                                         F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                F    CR0.TS = 1.
Stack, #SS                               F    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  F    Memory address exceeding data segment limit or non-canonical.
                                         F    Null data segment used to reference memory.
Page fault, #PF                          F    Instruction execution caused a page fault.
Alignment check, #AC                     F    Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                 F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                    F    A source operand was an SNaN value.
                                         F    Undefined operation.
Denormalized operand, DE                 F    A source operand was a denormal value.
Overflow, OE                             F    Rounded result too large to fit into the format of the destination operand.
Underflow, UE                            F    Rounded result too small to fit into the format of the destination operand.
Precision, PE                            F    A result could not be represented exactly in the destination format.
F = FMA, FMA4 exception

VFMSUBADDPD
VFMSUBADD132PD
VFMSUBADD213PD
VFMSUBADD231PD

Multiply with Alternating Subtract/Add
Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors, adds the even elements of the unrounded
product to the even elements of a third double-precision floating-point vector, and subtracts the odd elements
of the third vector from the odd elements of the unrounded product (elements are numbered from 0, so element 0 is even). The precise result of each
addition or subtraction is then rounded to double precision based on the mode specified by the
MXCSR[RC] field and written to the corresponding element of the destination.
The role of each source operand specified by the assembly language prototypes below is
reflected in the equations in the comments on the right.
There are two four-operand forms:
VFMSUBADDPD dest, src1, src2/mem, src3      // dest_odd  = (src1_odd * src2_odd/mem_odd) − src3_odd
                                            // dest_even = (src1_even * src2_even/mem_even) + src3_even
VFMSUBADDPD dest, src1, src2, src3/mem      // dest_odd  = (src1_odd * src2_odd) − src3_odd/mem_odd
                                            // dest_even = (src1_even * src2_even) + src3_even/mem_even

and three three-operand forms:
VFMSUBADD132PD src1, src2, src3/mem         // src1_odd  = (src1_odd * src3_odd/mem_odd) − src2_odd
                                            // src1_even = (src1_even * src3_even/mem_even) + src2_even
VFMSUBADD213PD src1, src2, src3/mem         // src1_odd  = (src2_odd * src1_odd) − src3_odd/mem_odd
                                            // src1_even = (src2_even * src1_even) + src3_even/mem_even
VFMSUBADD231PD src1, src2, src3/mem         // src1_odd  = (src2_odd * src3_odd/mem_odd) − src1_odd
                                            // src1_even = (src2_even * src3_even/mem_even) + src1_even

For VEX.L = 0, vector size is 128 bits and register-based operands are held in XMM registers. For
VEX.L = 1, vector size is 256 bits and register-based operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source operand is either a register
or a memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.
Instruction Support
Form              Subset  Feature Flag
VFMSUBADDPD       FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBADDnnnPD    FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                    Encoding
                                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBADDPD xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  5F /r /is4
VFMSUBADDPD ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  5F /r /is4
VFMSUBADDPD xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  5F /r /is4
VFMSUBADDPD ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  5F /r /is4
VFMSUBADD132PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  97 /r
VFMSUBADD132PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  97 /r
VFMSUBADD213PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  A7 /r
VFMSUBADD213PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  A7 /r
VFMSUBADD231PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  B7 /r
VFMSUBADD231PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  B7 /r

Related Instructions
VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD,
VFMADDSUBPS, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS, VFMSUBADDPS, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS
rFLAGS Affected
None
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                    M    M    M         M    M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      F    Instruction not supported, as indicated by CPUID feature identifier.
                             F     F          FMA instructions are only recognized in protected mode.
                                         F    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         F    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         F    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         F    Lock prefix (F0h) preceding opcode.
                                         F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                F    CR0.TS = 1.
Stack, #SS                               F    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  F    Memory address exceeding data segment limit or non-canonical.
                                         F    Null data segment used to reference memory.
Page fault, #PF                          F    Instruction execution caused a page fault.
Alignment check, #AC                     F    Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                 F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                    F    A source operand was an SNaN value.
                                         F    Undefined operation.
Denormalized operand, DE                 F    A source operand was a denormal value.
Overflow, OE                             F    Rounded result too large to fit into the format of the destination operand.
Underflow, UE                            F    Rounded result too small to fit into the format of the destination operand.
Precision, PE                            F    A result could not be represented exactly in the destination format.
F = FMA, FMA4 exception

VFMSUBADDPS
VFMSUBADD132PS
VFMSUBADD213PS
VFMSUBADD231PS

Multiply with Alternating Subtract/Add
Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors, adds the even elements of the unrounded
product to the even elements of a third single-precision floating-point vector, and subtracts the odd elements
of the third vector from the odd elements of the unrounded product (elements are numbered from 0, so element 0 is even). The precise result of each
addition or subtraction is then rounded to single precision based on the mode specified by the
MXCSR[RC] field and written to the corresponding element of the destination.
The role of each source operand specified by the assembly language prototypes below is
reflected in the equations in the comments on the right.
There are two four-operand forms:
VFMSUBADDPS dest, src1, src2/mem, src3      // dest_odd  = (src1_odd * src2_odd/mem_odd) − src3_odd
                                            // dest_even = (src1_even * src2_even/mem_even) + src3_even
VFMSUBADDPS dest, src1, src2, src3/mem      // dest_odd  = (src1_odd * src2_odd) − src3_odd/mem_odd
                                            // dest_even = (src1_even * src2_even) + src3_even/mem_even

and three three-operand forms:
VFMSUBADD132PS src1, src2, src3/mem         // src1_odd  = (src1_odd * src3_odd/mem_odd) − src2_odd
                                            // src1_even = (src1_even * src3_even/mem_even) + src2_even
VFMSUBADD213PS src1, src2, src3/mem         // src1_odd  = (src2_odd * src1_odd) − src3_odd/mem_odd
                                            // src1_even = (src2_even * src1_even) + src3_even/mem_even
VFMSUBADD231PS src1, src2, src3/mem         // src1_odd  = (src2_odd * src3_odd/mem_odd) − src1_odd
                                            // src1_even = (src2_even * src3_even/mem_even) + src1_even

When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.

Instruction Support
Form              Subset  Feature Flag
VFMSUBADDPS       FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBADDnnnPS    FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                    Encoding
                                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBADDPS xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  5E /r /is4
VFMSUBADDPS ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  5E /r /is4
VFMSUBADDPS xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  5E /r /is4
VFMSUBADDPS ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  5E /r /is4
VFMSUBADD132PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  97 /r
VFMSUBADD132PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  97 /r
VFMSUBADD213PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  A7 /r
VFMSUBADD213PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  A7 /r
VFMSUBADD231PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  B7 /r
VFMSUBADD231PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  B7 /r

Related Instructions
VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD,
VFMADDSUBPS, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS, VFMSUBADDPD, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD
rFLAGS Affected
None
MXCSR Flags Affected
MM   FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
17   15   14:13  12   11   10   9    8    7    6    5    4    3    2    1    0
                                                    M    M    M         M    M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      F    Instruction not supported, as indicated by CPUID feature identifier.
                             F     F          FMA instructions are only recognized in protected mode.
                                         F    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         F    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         F    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         F    Lock prefix (F0h) preceding opcode.
                                         F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                F    CR0.TS = 1.
Stack, #SS                               F    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  F    Memory address exceeding data segment limit or non-canonical.
                                         F    Null data segment used to reference memory.
Page fault, #PF                          F    Instruction execution caused a page fault.
Alignment check, #AC                     F    Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                 F    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                    F    A source operand was an SNaN value.
                                         F    Undefined operation.
Denormalized operand, DE                 F    A source operand was a denormal value.
Overflow, OE                             F    Rounded result too large to fit into the format of the destination operand.
Underflow, UE                            F    Rounded result too small to fit into the format of the destination operand.
Precision, PE                            F    A result could not be represented exactly in the destination format.
F = FMA, FMA4 exception

VFMSUBPD
VFMSUB132PD
VFMSUB213PD
VFMSUB231PD

Multiply and Subtract
Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors and subtracts a third double-precision
floating-point vector from the unrounded product to produce a precise intermediate result. The intermediate result is then rounded to double-precision based on the mode specified by the MXCSR[RC]
field and written to the destination register. The role of each of the source operands specified by the
assembly language prototypes given below is reflected in the vector equation in the comment on the
right.
There are two four-operand forms:
VFMSUBPD dest, src1, src2/mem, src3    // dest = (src1 * src2/mem) − src3
VFMSUBPD dest, src1, src2, src3/mem    // dest = (src1 * src2) − src3/mem

and three three-operand forms:
VFMSUB132PD src1, src2, src3/mem       // src1 = (src1 * src3/mem) − src2
VFMSUB213PD src1, src2, src3/mem       // src1 = (src2 * src1) − src3/mem
VFMSUB231PD src1, src2, src3/mem       // src1 = (src2 * src3/mem) − src1

For VEX.L = 0, vector size is 128 bits and register-based operands are held in XMM registers. For
VEX.L = 1, vector size is 256 bits and register-based operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.
Instruction Support
Form            Subset  Feature Flag
VFMSUBPD        FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnPD     FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                                    Encoding
                                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBPD xmm1, xmm2, xmm3/mem128, xmm4      C4   RXB.03          0.src1.0.01  6D /r /is4
VFMSUBPD ymm1, ymm2, ymm3/mem256, ymm4      C4   RXB.03          0.src1.1.01  6D /r /is4
VFMSUBPD xmm1, xmm2, xmm3, xmm4/mem128      C4   RXB.03          1.src1.0.01  6D /r /is4
VFMSUBPD ymm1, ymm2, ymm3, ymm4/mem256      C4   RXB.03          1.src1.1.01  6D /r /is4
VFMSUB132PD xmm1, xmm2, xmm3/mem128         C4   RXB.02          1.src2.0.01  9A /r
VFMSUB132PD ymm1, ymm2, ymm3/mem256         C4   RXB.02          1.src2.1.01  9A /r
VFMSUB213PD xmm1, xmm2, xmm3/mem128         C4   RXB.02          1.src2.0.01  AA /r
VFMSUB213PD ymm1, ymm2, ymm3/mem256         C4   RXB.02          1.src2.1.01  AA /r
VFMSUB231PD xmm1, xmm2, xmm3/mem128         C4   RXB.02          1.src2.0.01  BA /r
VFMSUB231PD ymm1, ymm2, ymm3/mem256         C4   RXB.02          1.src2.1.01  BA /r

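The three-operand rows share the same field layout. The differences are that map_select = 02h, vvvv names src2, and there is no /is4 byte. The hypothetical helper below assembles these rows. It is a sketch under the same assumptions as before: standard VEX inverted fields and register operands only. Under this model, VFMSUB231PD xmm1, xmm2, xmm3 comes out as C4 E2 E9 BA CB.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: register-only encoding of a three-operand FMA row
// "op src1, src2, src3" from map 02h with pp = 01 (66h).
// opcode: for example 9Ah, AAh, or BAh for VFMSUB132PD, VFMSUB213PD, VFMSUB231PD.
std::array<std::uint8_t, 5> encode_fma3_rr(std::uint8_t opcode, unsigned src1, unsigned src2,
                                           unsigned src3, unsigned w, unsigned l) {
    // Byte 1: inverted R (src1 bit 3), inverted X (no index, so 1), inverted B (src3 bit 3), map_select 02h.
    const auto b1 = static_cast<std::uint8_t>(((~src1 >> 3) & 1u) << 7 | 1u << 6 |
                                              ((~src3 >> 3) & 1u) << 5 | 0x02u);
    // Byte 2: W, inverted vvvv (= src2), L, pp = 01.
    const auto b2 = static_cast<std::uint8_t>((w & 1u) << 7 | ((~src2) & 0xFu) << 3 |
                                              (l & 1u) << 2 | 0x01u);
    // ModRM: mod = 11b, reg = src1 (also the destination), rm = src3.
    const auto modrm = static_cast<std::uint8_t>(0xC0u | (src1 & 7u) << 3 | (src3 & 7u));
    return {0xC4, b1, b2, opcode, modrm};
}
```

The W argument must follow the table. It is 1 for the PD rows shown here and 0 for the corresponding PS rows.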
Related Instructions
VFMSUBPS, VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFMSUBSD,
VFMSUB132SD, VFMSUB213SD, VFMSUB231SD, VFMSUBSS, VFMSUB132SS,
VFMSUB213SS, VFMSUB231SS
rFLAGS Affected
None
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

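The five flags marked M are sticky status bits in MXCSR bits 5:3 and 1:0. The sketch below reads them back from a raw MXCSR value, such as one stored by STMXCSR. `fma_status_flags` is a hypothetical helper. It deliberately ignores ZE, because these instructions leave ZE unchanged.

```cpp
#include <cstdint>
#include <string>

// Returns the names of the MXCSR status flags these instructions may modify,
// in bit order, separated by spaces. Bit positions follow the table above.
std::string fma_status_flags(std::uint32_t mxcsr) {
    struct Flag { unsigned bit; const char* name; };
    static const Flag flags[] = {{0, "IE"}, {1, "DE"}, {3, "OE"}, {4, "UE"}, {5, "PE"}};
    std::string out;
    for (const Flag& f : flags) {
        if ((mxcsr >> f.bit) & 1u) {
            if (!out.empty()) out += ' ';
            out += f.name;
        }
    }
    return out;
}
```

A value of 1F80h (all exceptions masked, no flags set) decodes to an empty string. A value of 1FA0h decodes to "PE", which is the usual outcome after an inexact result.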
Exceptions
                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                       F     Instruction not supported, as indicated by CPUID feature identifier.
                              F     F           FMA instructions are only recognized in protected mode.
                                          F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          F     Lock prefix (F0h) preceding opcode.
                                          F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                 F     CR0.TS = 1.
Stack, #SS                                F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   F     Memory address exceeding data segment limit or non-canonical.
                                          F     Null data segment used to reference memory.
Page fault, #PF                           F     Instruction execution caused a page fault.
Alignment check, #AC                      F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                  F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                     F     A source operand was an SNaN value.
                                          F     Undefined operation.
Denormalized operand, DE                  F     A source operand was a denormal value.
Overflow, OE                              F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                             F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                             F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception

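Several #UD causes depend on operating-system state rather than on the instruction bytes. The sketch below models the two causes that user code can observe. The caller supplies CPUID Fn0000_0001_ECX and, only when OSXSAVE is set, the XFEATURE_ENABLED_MASK (XCR0) value returned by XGETBV. OSXSAVE is reported in bit 27 of that ECX value. `os_enables_ymm_state` is a hypothetical name. CR0.TS is managed by the operating system and is not modeled.

```cpp
#include <cstdint>

// True when CR4.OSXSAVE = 1 (CPUID Fn0000_0001_ECX[OSXSAVE], bit 27) and
// XFEATURE_ENABLED_MASK[2:1] = 11b (SSE and YMM state enabled).
// xcr0 is ignored when OSXSAVE is clear, because XGETBV is then unavailable.
constexpr bool os_enables_ymm_state(std::uint32_t ecx_0000_0001, std::uint64_t xcr0) {
    return ((ecx_0000_0001 >> 27) & 1u) != 0 && ((xcr0 >> 1) & 0x3u) == 0x3u;
}
```

Before executing either form, a caller would also test the FMA or FMA4 feature bit listed under Instruction Support.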
VFMSUBPS
VFMSUB132PS
VFMSUB213PS
VFMSUB231PS

Multiply and Subtract
Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors and subtracts a third single-precision
floating-point vector from the unrounded product to produce a precise intermediate result. The intermediate result is then rounded to single-precision based on the mode specified by the MXCSR[RC]
field and written to the destination register. The role of each of the source operands specified by the
assembly language prototypes given below is reflected in the vector equation in the comment on the
right.
There are two four-operand forms:
VFMSUBPS dest, src1, src2/mem, src3
VFMSUBPS dest, src1, src2, src3/mem

// dest = (src1* src2/mem) − src3
// dest = (src1* src2) − src3/mem

and three three-operand forms:
VFMSUB132PS src1, src2, src3/mem
VFMSUB213PS src1, src2, src3/mem
VFMSUB231PS src1, src2, src3/mem

// src1 = (src1* src3/mem) − src2
// src1 = (src2* src1) − src3/mem
// src1 = (src2* src3/mem) − src1
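The subtraction operates on the unrounded product, so the packed result can differ from a separate multiply followed by a subtract. The sketch below models VFMSUBPS with VEX.L = 1 (eight lanes) using the `float` overload of `std::fma`. `Vec8f`, `vfmsubps`, and `mul_then_sub` are hypothetical names. In the unfused reference, the `volatile` store exists only to force the product to be rounded to single precision before the subtraction.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a YMM register holding eight floats (VEX.L = 1).
using Vec8f = std::array<float, 8>;

// dest = (src1 * src2) - src3, one rounding per lane.
Vec8f vfmsubps(const Vec8f& src1, const Vec8f& src2, const Vec8f& src3) {
    Vec8f dest{};
    for (std::size_t i = 0; i < dest.size(); ++i) dest[i] = std::fma(src1[i], src2[i], -src3[i]);
    return dest;
}

// Unfused reference: the product is rounded to float before the subtraction.
float mul_then_sub(float a, float b, float c) {
    volatile float product = a * b;
    return product - c;
}
```

Take a = 1 + 2^-12 and c = 1 + 2^-11. The exact product a * a is 1 + 2^-11 + 2^-24. The fused form returns 2^-24 in every lane. Rounding the product first gives 1 + 2^-11, because the tie rounds to even, so the unfused result is 0.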

When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.
Instruction Support
Form           Subset   Feature Flag
VFMSUBPS       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnPS    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBPS xmm1, xmm2, xmm3/mem128, xmm4    C4   RXB.03          0.src1.0.01  6C /r /is4
VFMSUBPS ymm1, ymm2, ymm3/mem256, ymm4    C4   RXB.03          0.src1.1.01  6C /r /is4
VFMSUBPS xmm1, xmm2, xmm3, xmm4/mem128    C4   RXB.03          1.src1.0.01  6C /r /is4
VFMSUBPS ymm1, ymm2, ymm3, ymm4/mem256    C4   RXB.03          1.src1.1.01  6C /r /is4
VFMSUB132PS xmm1, xmm2, xmm3/mem128       C4   RXB.02          0.src2.0.01  9A /r
VFMSUB132PS ymm1, ymm2, ymm3/mem256       C4   RXB.02          0.src2.1.01  9A /r
VFMSUB213PS xmm1, xmm2, xmm3/mem128       C4   RXB.02          0.src2.0.01  AA /r
VFMSUB213PS ymm1, ymm2, ymm3/mem256       C4   RXB.02          0.src2.1.01  AA /r
VFMSUB231PS xmm1, xmm2, xmm3/mem128       C4   RXB.02          0.src2.0.01  BA /r
VFMSUB231PS ymm1, ymm2, ymm3/mem256       C4   RXB.02          0.src2.1.01  BA /r

Related Instructions
VFMSUBPD, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUBSD,
VFMSUB132SD, VFMSUB213SD, VFMSUB231SD, VFMSUBSS, VFMSUB132SS,
VFMSUB213SS, VFMSUB231SS
rFLAGS Affected
None
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                       F     Instruction not supported, as indicated by CPUID feature identifier.
                              F     F           FMA instructions are only recognized in protected mode.
                                          F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          F     Lock prefix (F0h) preceding opcode.
                                          F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                 F     CR0.TS = 1.
Stack, #SS                                F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   F     Memory address exceeding data segment limit or non-canonical.
                                          F     Null data segment used to reference memory.
Page fault, #PF                           F     Instruction execution caused a page fault.
Alignment check, #AC                      F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                  F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                     F     A source operand was an SNaN value.
                                          F     Undefined operation.
Denormalized operand, DE                  F     A source operand was a denormal value.
Overflow, OE                              F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                             F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                             F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception


VFMSUBSD
VFMSUB132SD
VFMSUB213SD
VFMSUB231SD

Multiply and Subtract
Scalar Double-Precision Floating-Point

Multiplies together two double-precision floating-point values and subtracts a third double-precision
floating-point value from the unrounded product to produce a precise intermediate result. The intermediate result is then rounded to double-precision based on the mode specified by the MXCSR[RC]
field and written to the destination register. The role of each of the source operands specified by the
assembly language prototypes given below is reflected in the equation in the comment on the
right.
There are two four-operand forms:
VFMSUBSD dest, src1, src2/mem, src3
VFMSUBSD dest, src1, src2, src3/mem

// dest = (src1* src2/mem) − src3
// dest = (src1* src2) − src3/mem

and three three-operand forms:
VFMSUB132SD src1, src2, src3/mem
VFMSUB213SD src1, src2, src3/mem
VFMSUB231SD src1, src2, src3/mem

// src1 = (src1* src3/mem) − src2
// src1 = (src2* src1) − src3/mem
// src1 = (src2* src3/mem) − src1
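A common reason to use the fused scalar form is to recover the exact rounding error of a product. Let p be the rounded value of a * b. Barring underflow, a * b − p is exactly representable, so one fused multiply-subtract returns it without error. This is the well-known TwoProduct error-free transformation. The sketch models the scalar operation with `std::fma`; `fmsub_sd` and `two_product` are hypothetical names.

```cpp
#include <cmath>

// Scalar model of dest = (src1 * src2) - src3 with a single rounding.
inline double fmsub_sd(double src1, double src2, double src3) { return std::fma(src1, src2, -src3); }

// TwoProduct: returns p = fl(a * b) and stores the exact residual a * b - p in err,
// assuming neither overflow nor underflow occurs.
inline double two_product(double a, double b, double& err) {
    const double p = a * b;
    err = fmsub_sd(a, b, p);  // (a * b) - p, rounded once; the result is exact
    return p;
}
```

With a = 1 + 2^-27, p = a * a rounds to 1 + 2^-26 and err is 2^-54. The sum p + err therefore equals the exact product.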

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 64-bit memory location and the third
source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 64-bit
memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is an XMM register. When the result is written to the destination XMM register, bits
[127:64] of the destination and bits [255:128] of the corresponding YMM register are cleared.
Instruction Support
Form           Subset   Feature Flag
VFMSUBSD       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnSD    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBSD xmm1, xmm2, xmm3/mem64, xmm4     C4   RXB.03          0.src1.X.01  6F /r /is4
VFMSUBSD xmm1, xmm2, xmm3, xmm4/mem64     C4   RXB.03          1.src1.X.01  6F /r /is4
VFMSUB132SD xmm1, xmm2, xmm3/mem64        C4   RXB.02          1.src2.X.01  9B /r
VFMSUB213SD xmm1, xmm2, xmm3/mem64        C4   RXB.02          1.src2.X.01  AB /r
VFMSUB231SD xmm1, xmm2, xmm3/mem64        C4   RXB.02          1.src2.X.01  BB /r

Related Instructions
VFMSUBPD, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUBPS,
VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFMSUBSS, VFMSUB132SS,
VFMSUB213SS, VFMSUB231SS
rFLAGS Affected
None
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                       F     Instruction not supported, as indicated by CPUID feature identifier.
                              F     F           FMA instructions are only recognized in protected mode.
                                          F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          F     Lock prefix (F0h) preceding opcode.
                                          F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                 F     CR0.TS = 1.
Stack, #SS                                F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   F     Memory address exceeding data segment limit or non-canonical.
                                          F     Null data segment used to reference memory.
Page fault, #PF                           F     Instruction execution caused a page fault.
Alignment check, #AC                      F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                  F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                     F     A source operand was an SNaN value.
                                          F     Undefined operation.
Denormalized operand, DE                  F     A source operand was a denormal value.
Overflow, OE                              F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                             F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                             F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception


VFMSUBSS
VFMSUB132SS
VFMSUB213SS
VFMSUB231SS

Multiply and Subtract
Scalar Single-Precision Floating-Point

Multiplies together two single-precision floating-point values and subtracts a third single-precision
floating-point value from the unrounded product to produce a precise intermediate result. The intermediate result is then rounded to single-precision based on the mode specified by the MXCSR[RC]
field and written to the destination register. The role of each of the source operands specified by the
assembly language prototypes given below is reflected in the equation in the comment on the
right.
There are two four-operand forms:
VFMSUBSS dest, src1, src2/mem, src3
VFMSUBSS dest, src1, src2, src3/mem

// dest = (src1* src2/mem) − src3
// dest = (src1* src2) − src3/mem

and three three-operand forms:
VFMSUB132SS src1, src2, src3/mem
VFMSUB213SS src1, src2, src3/mem
VFMSUB231SS src1, src2, src3/mem

// src1 = (src1* src3/mem) − src2
// src1 = (src2* src1) − src3/mem
// src1 = (src2* src3/mem) − src1
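The three forms differ only in which role the destination register (src1) plays and which operand (src3) may come from memory. The hypothetical helper below picks a form for r = a * b − c. Its inputs are the role of the value already in the destination register and the role of the value to be read from memory, if any. It simply restates the equations above and is not an assembler feature.

```cpp
#include <optional>
#include <string>

// Roles in r = a * b - c. Multiplication commutes, so a and b are interchangeable.
enum class Role { Multiplier, Subtrahend };

// Hypothetical helper: choose the VFMSUBnnnSS form, or std::nullopt if none fits.
std::optional<std::string> pick_vfmsub_ss(Role in_dest, std::optional<Role> in_memory) {
    if (in_dest == Role::Subtrahend) {
        // 231: src1 = (src2 * src3/mem) - src1; src3 may be either multiplier.
        if (in_memory == Role::Subtrahend) return std::nullopt;  // c cannot be in both places
        return std::string("VFMSUB231SS");
    }
    // The destination holds a multiplier.
    if (in_memory == Role::Subtrahend) return std::string("VFMSUB213SS");  // (src2 * src1) - src3/mem
    return std::string("VFMSUB132SS");  // (src1 * src3/mem) - src2; also fine with no memory operand
}
```

When no operand comes from memory and the destination holds a multiplier, 132 and 213 are equally valid. The helper arbitrarily returns 132 in that case.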

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 32-bit memory location and the third
source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 32-bit
memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is an XMM register. When the result is written to the destination XMM register, bits
[127:32] of the XMM register and bits [255:128] of the corresponding YMM register are cleared.
Instruction Support
Form           Subset   Feature Flag
VFMSUBSS       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnSS    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBSS xmm1, xmm2, xmm3/mem32, xmm4     C4   RXB.03          0.src1.X.01  6E /r /is4
VFMSUBSS xmm1, xmm2, xmm3, xmm4/mem32     C4   RXB.03          1.src1.X.01  6E /r /is4
VFMSUB132SS xmm1, xmm2, xmm3/mem32        C4   RXB.02          0.src2.X.01  9B /r
VFMSUB213SS xmm1, xmm2, xmm3/mem32        C4   RXB.02          0.src2.X.01  AB /r
VFMSUB231SS xmm1, xmm2, xmm3/mem32        C4   RXB.02          0.src2.X.01  BB /r

Related Instructions
VFMSUBPD, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUBPS,
VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFMSUBSD, VFMSUB132SD,
VFMSUB213SD, VFMSUB231SD
rFLAGS Affected
None
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                       F     Instruction not supported, as indicated by CPUID feature identifier.
                              F     F           FMA instructions are only recognized in protected mode.
                                          F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          F     Lock prefix (F0h) preceding opcode.
                                          F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                 F     CR0.TS = 1.
Stack, #SS                                F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   F     Memory address exceeding data segment limit or non-canonical.
                                          F     Null data segment used to reference memory.
Page fault, #PF                           F     Instruction execution caused a page fault.
Alignment check, #AC                      F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                  F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                     F     A source operand was an SNaN value.
                                          F     Undefined operation.
Denormalized operand, DE                  F     A source operand was a denormal value.
Overflow, OE                              F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                             F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                             F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception


VFNMADDPD
VFNMADD132PD
VFNMADD213PD
VFNMADD231PD


Negative Multiply and Add
Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors, negates the unrounded product, and
adds it to a third double-precision floating-point vector. The precise result is then rounded to double-precision based on the mode specified by the MXCSR[RC] field and written to the destination register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:
VFNMADDPD dest, src1, src2/mem, src3
VFNMADDPD dest, src1, src2, src3/mem

// dest = −(src1* src2/mem) + src3
// dest = −(src1* src2) + src3/mem

and three three-operand forms:
VFNMADD132PD src1, src2, src3/mem
VFNMADD213PD src1, src2, src3/mem
VFNMADD231PD src1, src2, src3/mem

// src1 = −(src1* src3/mem) + src2
// src1 = −(src2* src1) + src3/mem
// src1 = −(src2* src3/mem) + src1
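For the negative forms, negating one multiplicand is exact. So −(x * y) + z with a single rounding equals `std::fma(-x, y, z)`. The lane-wise sketch below covers the four-operand form and the 231 accumulator form. `Vec4d` and the function names are hypothetical, and the model uses four double-precision lanes, as for VEX.L = 1.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a YMM register holding four doubles (VEX.L = 1).
using Vec4d = std::array<double, 4>;

// Four-operand form: dest = -(src1 * src2) + src3, one rounding per lane.
Vec4d vfnmaddpd(const Vec4d& src1, const Vec4d& src2, const Vec4d& src3) {
    Vec4d dest{};
    for (std::size_t i = 0; i < dest.size(); ++i) dest[i] = std::fma(-src1[i], src2[i], src3[i]);
    return dest;
}

// 231 form, the usual accumulator update: src1 = -(src2 * src3) + src1.
void vfnmadd231pd(Vec4d& src1, const Vec4d& src2, const Vec4d& src3) {
    for (std::size_t i = 0; i < src1.size(); ++i) src1[i] = std::fma(-src2[i], src3[i], src1[i]);
}
```

In round-to-nearest mode, this result equals the negated VFMSUBPD result, with one exception. An exact zero comes out as +0 here, whereas negating the VFMSUBPD result gives −0.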

When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.
Instruction Support
Form            Subset   Feature Flag
VFNMADDPD       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnPD    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDPD xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  79 /r /is4
VFNMADDPD ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  79 /r /is4
VFNMADDPD xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  79 /r /is4
VFNMADDPD ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  79 /r /is4
VFNMADD132PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  9C /r
VFNMADD132PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  9C /r
VFNMADD213PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  AC /r
VFNMADD213PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  AC /r
VFNMADD231PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  BC /r
VFNMADD231PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  BC /r

Related Instructions
VFNMADDPS, VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMADDSD,
VFNMADD132SD, VFNMADD213SD, VFNMADD231SD, VFNMADDSS, VFNMADD132SS,
VFNMADD213SS, VFNMADD231SS
rFLAGS Affected
None
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                       F     Instruction not supported, as indicated by CPUID feature identifier.
                              F     F           FMA instructions are only recognized in protected mode.
                                          F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          F     Lock prefix (F0h) preceding opcode.
                                          F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                 F     CR0.TS = 1.
Stack, #SS                                F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   F     Memory address exceeding data segment limit or non-canonical.
                                          F     Null data segment used to reference memory.
Page fault, #PF                           F     Instruction execution caused a page fault.
Alignment check, #AC                      F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                  F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                     F     A source operand was an SNaN value.
                                          F     Undefined operation.
Denormalized operand, DE                  F     A source operand was a denormal value.
Overflow, OE                              F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                             F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                             F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception


VFNMADDPS
VFNMADD132PS
VFNMADD213PS
VFNMADD231PS

Negative Multiply and Add
Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors, negates the unrounded product, and
adds it to a third single-precision floating-point vector. The precise result is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the destination register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:
VFNMADDPS dest, src1, src2/mem, src3
VFNMADDPS dest, src1, src2, src3/mem

// dest = −(src1* src2/mem) + src3
// dest = −(src1* src2) + src3/mem

and three three-operand forms:
VFNMADD132PS src1, src2, src3/mem
VFNMADD213PS src1, src2, src3/mem
VFNMADD231PS src1, src2, src3/mem

// src1 = −(src1* src3/mem) + src2
// src1 = −(src2* src1) + src3/mem
// src1 = −(src2* src3/mem) + src1
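A typical use of the negative form is Newton-Raphson refinement of a reciprocal estimate x ≈ 1/d. The error term e = 1 − d * x matches the VFNMADD pattern −(src1 * src2) + src3. The update x + x * e is an ordinary fused multiply-add. The scalar sketch below shows one lane. The function name, the starting estimate, and the step count are illustrative assumptions. Production code would typically start from a hardware approximation such as VRCPPS.

```cpp
#include <cmath>

// Refines x ~ 1/d with Newton-Raphson steps built from fused operations.
float refine_reciprocal(float d, float x, int steps) {
    for (int i = 0; i < steps; ++i) {
        const float e = std::fma(-d, x, 1.0f);  // VFNMADD pattern: -(d * x) + 1
        x = std::fma(x, e, x);                  // x + x * e
    }
    return x;
}
```

Each step roughly squares the relative error. Starting from 0.33 for d = 3, three steps bring the estimate within a few units in the last place of 1/3.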

When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.
Instruction Support
Form            Subset   Feature Flag
VFNMADDPS       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnPS    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDPS xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  78 /r /is4
VFNMADDPS ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  78 /r /is4
VFNMADDPS xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  78 /r /is4
VFNMADDPS ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  78 /r /is4
VFNMADD132PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  9C /r
VFNMADD132PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  9C /r
VFNMADD213PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  AC /r
VFNMADD213PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  AC /r
VFNMADD231PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  BC /r
VFNMADD231PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  BC /r

Related Instructions
VFNMADDPD, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADDSD,
VFNMADD132SD, VFNMADD213SD, VFNMADD231SD, VFNMADDSS, VFNMADD132SS,
VFNMADD213SS, VFNMADD231SS
rFLAGS Affected
None
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                       F     Instruction not supported, as indicated by CPUID feature identifier.
                              F     F           FMA instructions are only recognized in protected mode.
                                          F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          F     Lock prefix (F0h) preceding opcode.
                                          F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                 F     CR0.TS = 1.
Stack, #SS                                F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   F     Memory address exceeding data segment limit or non-canonical.
                                          F     Null data segment used to reference memory.
Page fault, #PF                           F     Instruction execution caused a page fault.
Alignment check, #AC                      F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                  F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                     F     A source operand was an SNaN value.
                                          F     Undefined operation.
Denormalized operand, DE                  F     A source operand was a denormal value.
Overflow, OE                              F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                             F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                             F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception


VFNMADDSD
VFNMADD132SD
VFNMADD213SD
VFNMADD231SD

Negative Multiply and Add
Scalar Double-Precision Floating-Point

Multiplies together two double-precision floating-point values, negates the unrounded product, and
adds it to a third double-precision floating-point value. The precise result is then rounded to double-precision based on the mode specified by the MXCSR[RC] field and written to the destination register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the equation in the comment on the right.
There are two four-operand forms:
VFNMADDSD dest, src1, src2/mem, src3
VFNMADDSD dest, src1, src2, src3/mem

// dest = −(src1* src2/mem) + src3
// dest = −(src1* src2) + src3/mem

and three three-operand forms:
VFNMADD132SD src1, src2, src3/mem
VFNMADD213SD src1, src2, src3/mem
VFNMADD231SD src1, src2, src3/mem

// src1 = −(src1* src3/mem) + src2
// src1 = −(src2* src1) + src3/mem
// src1 = −(src2* src3/mem) + src1
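A classic scalar application is computing the exact remainder of a rounded division. Let q be the round-to-nearest quotient a / b. Barring underflow, the residual a − q * b is exactly representable. A single −(q * b) + a therefore returns it without error, whereas a separate multiply and subtract can lose it entirely. The sketch below uses hypothetical function names and models the scalar operation with `std::fma`.

```cpp
#include <cmath>

// Scalar model of dest = -(src1 * src2) + src3 with a single rounding.
inline double fnmadd_sd(double src1, double src2, double src3) { return std::fma(-src1, src2, src3); }

// Residual of a rounded quotient q = a / b: returns a - q * b (exact, barring underflow).
inline double division_residual(double a, double b, double q) { return fnmadd_sd(q, b, a); }
```

For a = 1 and b = 3, q * 3 is exactly 1 − 2^-54. The fused residual is therefore 2^-54. If the product is rounded first, it becomes 1.0 (a tie that rounds to even), and the residual vanishes.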

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 64-bit memory location and the third
source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 64-bit
memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a 64-bit memory location.
The destination is an XMM register. When the result is written to the destination, bits [127:64] of the
XMM register and bits [255:128] of the corresponding YMM register are cleared.
Instruction Support
Form            Subset   Feature Flag
VFNMADDSD       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnSD    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDSD xmm1, xmm2, xmm3/mem64, xmm4    C4   RXB.03          0.src1.X.01  7B /r /is4
VFNMADDSD xmm1, xmm2, xmm3, xmm4/mem64    C4   RXB.03          1.src1.X.01  7B /r /is4
VFNMADD132SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  9D /r
VFNMADD213SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  AD /r
VFNMADD231SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  BD /r

Related Instructions
VFNMADDPD, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADDPS,
VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMADDSS, VFNMADD132SS,
VFNMADD213SS, VFNMADD231SS
rFLAGS Affected
None
MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                            M   M   M       M   M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.


Exceptions
                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                       F     Instruction not supported, as indicated by CPUID feature identifier.
                              F     F           FMA instructions are only recognized in protected mode.
                                          F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                          F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                          F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                          F     Lock prefix (F0h) preceding opcode.
                                          F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0; see SIMD Floating-Point Exceptions below for details.
Device not available, #NM                 F     CR0.TS = 1.
Stack, #SS                                F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                   F     Memory address exceeding data segment limit or non-canonical.
                                          F     Null data segment used to reference memory.
Page fault, #PF                           F     Instruction execution caused a page fault.
Alignment check, #AC                      F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                  F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1; see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                     F     A source operand was an SNaN value.
                                          F     Undefined operation.
Denormalized operand, DE                  F     A source operand was a denormal value.
Overflow, OE                              F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                             F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                             F     A result could not be represented exactly in the destination format.

F — FMA, FMA4 exception


VFNMADDSS
VFNMADD132SS
VFNMADD213SS
VFNMADD231SS

Negative Multiply and Add
Scalar Single-Precision Floating-Point

Multiplies together two single-precision floating-point values, negates the unrounded product, and
adds it to a third single-precision floating-point value. The precise result is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the destination register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the equation in the comment on the right.
There are two four-operand forms:
VFNMADDSS dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) + src3
VFNMADDSS dest, src1, src2, src3/mem    // dest = −(src1 * src2) + src3/mem

and three three-operand forms:
VFNMADD132SS src1, src2, src3/mem    // src1 = −(src1 * src3/mem) + src2
VFNMADD213SS src1, src2, src3/mem    // src1 = −(src2 * src1) + src3/mem
VFNMADD231SS src1, src2, src3/mem    // src1 = −(src2 * src3/mem) + src1

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 32-bit memory location and the third
source is a register.
• When VEX.W = 1, the second source is a register and the third source is a register or 32-bit
memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a 32-bit memory location.
The destination is an XMM register. When the result is written to the destination, bits [127:32] of the
XMM register and bits [255:128] of the corresponding YMM register are cleared.
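The single rounding step can be illustrated with a small reference model. The Python sketch below is an illustration under stated assumptions, not AMD's definition of the instruction: it assumes round-to-nearest-even (MXCSR.RC = 00b), does not model MXCSR flags, DAZ, FZ, NaNs, or infinities, and all function names (`to_f32`, `round_f32`, `fnmadd_f32`, `vfnmadd132ss`, and so on) are hypothetical. The exact value −(a * b) + c is formed with `fractions.Fraction` and rounded to single precision once; rounding to odd in double precision first is a standard way to keep the final binary64-to-binary32 conversion from double rounding.

```python
# Hypothetical reference model. Assumptions: round-to-nearest-even,
# no MXCSR flag tracking, no DAZ/FZ, finite inputs only.
import math
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_f32(q: Fraction) -> float:
    """Round an exact rational to binary32 (nearest-even) in a single step."""
    d = float(q)  # correctly rounded to binary64
    if Fraction(d) != q:
        # Inexact: switch to round-to-odd in binary64 so that the final
        # binary64 -> binary32 conversion cannot introduce double rounding.
        bits = struct.unpack("<Q", struct.pack("<d", d))[0]
        if (bits & 1) == 0:
            d = math.nextafter(d, math.inf if q > d else -math.inf)
    return to_f32(d)


def fnmadd_f32(a: float, b: float, c: float) -> float:
    """-(a * b) + c: exact, unrounded product, then one rounding to binary32."""
    a, b, c = to_f32(a), to_f32(b), to_f32(c)
    return round_f32(-(Fraction(a) * Fraction(b)) + Fraction(c))


# Three-operand forms: src1 is both a source and the destination.
def vfnmadd132ss(src1: float, src2: float, src3: float) -> float:
    return fnmadd_f32(src1, src3, src2)  # -(src1 * src3) + src2


def vfnmadd213ss(src1: float, src2: float, src3: float) -> float:
    return fnmadd_f32(src2, src1, src3)  # -(src2 * src1) + src3


def vfnmadd231ss(src1: float, src2: float, src3: float) -> float:
    return fnmadd_f32(src2, src3, src1)  # -(src2 * src3) + src1
```

With a = 1 + 2^-13 and b = 1 - 2^-13, the exact product is 1 - 2^-26, which rounds to 1.0 in single precision. Rounding the product before adding c = 1.0 therefore gives 0.0, whereas `fnmadd_f32(a, b, 1.0)` returns 2^-26.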
Instruction Support
Form | Subset | Feature Flag
VFNMADDSS | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnSS | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFNMADDSS xmm1, xmm2, xmm3/mem32, xmm4 | C4 | RXB.03 | 0.src1.X.01 | 7A /r /is4
VFNMADDSS xmm1, xmm2, xmm3, xmm4/mem32 | C4 | RXB.03 | 1.src1.X.01 | 7A /r /is4
VFNMADD132SS xmm1, xmm2, xmm3/mem32 | C4 | RXB.02 | 0.src2.X.01 | 9D /r
VFNMADD213SS xmm1, xmm2, xmm3/mem32 | C4 | RXB.02 | 0.src2.X.01 | AD /r
VFNMADD231SS xmm1, xmm2, xmm3/mem32 | C4 | RXB.02 | 0.src2.X.01 | BD /r
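The register-only rows of the encoding table can be turned into bytes by packing the fields of the three-byte C4 prefix. This Python sketch is hypothetical (the helper names are not from this manual): it assumes register numbers 0 to 15, encodes the X (ignored) L bit as 0, and does not handle memory operands. It relies on the usual VEX conventions that R, X, B, and vvvv are stored inverted, that the destination occupies ModRM.reg and src1 occupies VEX.vvvv, and that the /is4 register is carried in imm8[7:4]. VEX.W decides whether src2 or src3 goes in ModRM.r/m, which is the slot that can also hold the memory operand.

```python
# Hypothetical encoder sketch: register operands only (ModRM.mod = 11b).


def vex3(r: int, x: int, b: int, map_select: int,
         w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Three-byte VEX prefix (C4h). R, X, B and vvvv are stored inverted."""
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])


def encode_fma4_rr(opcode: int, w: int, dest: int, src1: int,
                   src2: int, src3: int, l: int = 0) -> bytes:
    """Register-only FMA4 encoding: map_select 03h, pp = 01b."""
    # VEX.W picks which source occupies ModRM.r/m (the slot that may also be
    # a memory operand) and which one is carried in imm8[7:4] (the /is4 byte).
    rm, is4 = (src2, src3) if w == 0 else (src3, src2)
    prefix = vex3(dest >> 3, 0, rm >> 3, 0x03, w, src1, l, 0x01)
    modrm = 0xC0 | ((dest & 7) << 3) | (rm & 7)
    return prefix + bytes([opcode, modrm, (is4 & 0xF) << 4])
```

Under these assumptions, VFNMADDSS xmm1, xmm2, xmm3, xmm4 has two valid register-only encodings: C4 E3 69 7A CB 40 with VEX.W = 0 and C4 E3 E9 7A CC 30 with VEX.W = 1.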

Related Instructions
VFNMADDPD, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADDPS,
VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMADDSD, VFNMADD132SD,
VFNMADD213SD, VFNMADD231SD
rFLAGS Affected
None
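MXCSR Flags Affected
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Field | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Affected | | | | | | | | | | | M | M | M | | M | M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.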


Exceptions
Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception


VFNMSUBPD
VFNMSUB132PD
VFNMSUB213PD
VFNMSUB231PD


Negative Multiply and Subtract
Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors, negates the unrounded product, and
subtracts a third double-precision floating-point vector from it. The precise result is then rounded to
double-precision based on the mode specified by the MXCSR[RC] field and written to the destination
register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:
VFNMSUBPD dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) − src3
VFNMSUBPD dest, src1, src2, src3/mem    // dest = −(src1 * src2) − src3/mem

and three three-operand forms:
VFNMSUB132PD src1, src2, src3/mem    // src1 = −(src1 * src3/mem) − src2
VFNMSUB213PD src1, src2, src3/mem    // src1 = −(src2 * src1) − src3/mem
VFNMSUB231PD src1, src2, src3/mem    // src1 = −(src2 * src3/mem) − src1

When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.
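The per-element behavior and the role of VEX.L can be sketched in Python. This is a hypothetical reference model, not the hardware definition: vectors are plain lists of floats, rounding is round-to-nearest-even, MXCSR flags, NaNs, and infinities are not modeled, and because `Fraction` has no negative zero, exactly-zero results always come back as +0.0. The function names are illustrative only.

```python
# Hypothetical reference model. Assumptions: round-to-nearest-even,
# finite inputs, no MXCSR flags, signed zero of exact-zero results not tracked.
from fractions import Fraction


def fnmsub_f64(a: float, b: float, c: float) -> float:
    """-(a * b) - c with an exact product and one round-to-nearest-even step."""
    # Fraction arithmetic is exact; float() of a Fraction is correctly rounded.
    return float(-(Fraction(a) * Fraction(b)) - Fraction(c))


def vfnmsubpd(src1, src2, src3, l: int):
    """Four-operand form: dest[i] = -(src1[i] * src2[i]) - src3[i].

    VEX.L = 0 means two elements per vector (XMM); VEX.L = 1 means four (YMM).
    """
    n = 4 if l else 2
    if not len(src1) == len(src2) == len(src3) == n:
        raise ValueError(f"VEX.L = {l} expects {n} elements per vector")
    return [fnmsub_f64(a, b, c) for a, b, c in zip(src1, src2, src3)]


def vfnmsub132pd(src1, src2, src3, l: int):
    return vfnmsubpd(src1, src3, src2, l)  # -(src1 * src3) - src2


def vfnmsub213pd(src1, src2, src3, l: int):
    return vfnmsubpd(src2, src1, src3, l)  # -(src2 * src1) - src3


def vfnmsub231pd(src1, src2, src3, l: int):
    return vfnmsubpd(src2, src3, src1, l)  # -(src2 * src3) - src1
```

The three-operand helpers only permute which inputs feed the multiplier and which one is subtracted, mirroring the equations in the comments above.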
Instruction Support
Form | Subset | Feature Flag
VFNMSUBPD | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnPD | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFNMSUBPD xmm1, xmm2, xmm3/mem128, xmm4 | C4 | RXB.03 | 0.src1.0.01 | 7D /r /is4
VFNMSUBPD ymm1, ymm2, ymm3/mem256, ymm4 | C4 | RXB.03 | 0.src1.1.01 | 7D /r /is4
VFNMSUBPD xmm1, xmm2, xmm3, xmm4/mem128 | C4 | RXB.03 | 1.src1.0.01 | 7D /r /is4
VFNMSUBPD ymm1, ymm2, ymm3, ymm4/mem256 | C4 | RXB.03 | 1.src1.1.01 | 7D /r /is4
VFNMSUB132PD xmm1, xmm2, xmm3/mem128 | C4 | RXB.02 | 1.src2.0.01 | 9E /r
VFNMSUB132PD ymm1, ymm2, ymm3/mem256 | C4 | RXB.02 | 1.src2.1.01 | 9E /r
VFNMSUB213PD xmm1, xmm2, xmm3/mem128 | C4 | RXB.02 | 1.src2.0.01 | AE /r
VFNMSUB213PD ymm1, ymm2, ymm3/mem256 | C4 | RXB.02 | 1.src2.1.01 | AE /r
VFNMSUB231PD xmm1, xmm2, xmm3/mem128 | C4 | RXB.02 | 1.src2.0.01 | BE /r
VFNMSUB231PD ymm1, ymm2, ymm3/mem256 | C4 | RXB.02 | 1.src2.1.01 | BE /r
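The three-operand rows can be turned into bytes by packing the fields of the C4 prefix. This Python sketch uses hypothetical helper names, covers register-only operands numbered 0 to 15, and assumes the standard VEX field packing with R, X, B, and vvvv stored inverted. Following the W.vvvv.L.pp column, src2 goes in VEX.vvvv; src1, which is also the destination, goes in ModRM.reg, and src3 in ModRM.r/m (the slot that may instead hold the memory operand).

```python
# Hypothetical encoder sketch: register operands only (ModRM.mod = 11b).


def vex3(r: int, x: int, b: int, map_select: int,
         w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Three-byte VEX prefix (C4h). R, X, B and vvvv are stored inverted."""
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0xC4, byte1, byte2])


def encode_fma3_rr(opcode: int, w: int, l: int,
                   src1: int, src2: int, src3: int) -> bytes:
    """Register-only three-operand FMA encoding (map_select 02h, pp = 01b).

    src1 (also the destination) goes in ModRM.reg, src2 in VEX.vvvv and
    src3 in ModRM.r/m, matching the RXB.02 / W.src2.L.01 table rows.
    """
    prefix = vex3(src1 >> 3, 0, src3 >> 3, 0x02, w, src2, l, 0x01)
    modrm = 0xC0 | ((src1 & 7) << 3) | (src3 & 7)
    return prefix + bytes([opcode, modrm])
```

For example, `encode_fma3_rr(0xBE, w=1, l=1, src1=1, src2=2, src3=3)` models VFNMSUB231PD ymm1, ymm2, ymm3 and yields C4 E2 ED BE CB under these assumptions.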

Related Instructions
VFNMSUBPS, VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, VFNMSUBSD,
VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD, VFNMSUBSS, VFNMSUB132SS,
VFNMSUB213SS, VFNMSUB231SS
rFLAGS Affected
None
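MXCSR Flags Affected
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Field | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Affected | | | | | | | | | | | M | M | M | | M | M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.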


Exceptions
Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception


VFNMSUBPS
VFNMSUB132PS
VFNMSUB213PS
VFNMSUB231PS

Negative Multiply and Subtract
Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors, negates the unrounded product, and
subtracts a third single-precision floating-point vector from it. The precise result is then rounded to
single-precision based on the mode specified by the MXCSR[RC] field and written to the destination
register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:
VFNMSUBPS dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) − src3
VFNMSUBPS dest, src1, src2, src3/mem    // dest = −(src1 * src2) − src3/mem

and three three-operand forms:
VFNMSUB132PS src1, src2, src3/mem    // src1 = −(src1 * src3/mem) − src2
VFNMSUB213PS src1, src2, src3/mem    // src1 = −(src2 * src1) − src3/mem
VFNMSUB231PS src1, src2, src3/mem    // src1 = −(src2 * src3/mem) − src1

When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source
is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a
memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the
destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are
cleared.
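The lane layout and the clearing of bits [255:128] for VEX.L = 0 can be illustrated on raw register images. This Python sketch is hypothetical: it treats registers as little-endian byte strings, assumes round-to-nearest-even, leaves MXCSR flags, NaNs, and infinities unmodeled, and uses a single-rounding helper that computes the exact result with `Fraction` and rounds it to binary32 through round-to-odd in binary64.

```python
# Hypothetical reference model. Assumptions: round-to-nearest-even,
# finite inputs, no MXCSR flags; registers are little-endian byte images.
import math
import struct
from fractions import Fraction


def round_f32(q: Fraction) -> float:
    """Round an exact rational to binary32 (nearest-even) in a single step."""
    d = float(q)  # correctly rounded to binary64
    if Fraction(d) != q and (struct.unpack("<Q", struct.pack("<d", d))[0] & 1) == 0:
        # Round-to-odd in binary64 prevents double rounding in the next step.
        d = math.nextafter(d, math.inf if q > d else -math.inf)
    return struct.unpack("<f", struct.pack("<f", d))[0]


def vfnmsubps_image(src1: bytes, src2: bytes, src3: bytes, l: int) -> bytes:
    """Four-operand VFNMSUBPS on raw little-endian register images.

    Sources are 16 bytes (VEX.L = 0, four lanes) or 32 bytes (VEX.L = 1,
    eight lanes). The result is a 32-byte YMM image; when VEX.L = 0 its
    upper 16 bytes (bits [255:128]) are zero.
    """
    fmt = "<8f" if l else "<4f"
    a, b, c = (struct.unpack(fmt, s) for s in (src1, src2, src3))
    lanes = [round_f32(-(Fraction(x) * Fraction(y)) - Fraction(z))
             for x, y, z in zip(a, b, c)]
    return struct.pack(fmt, *lanes).ljust(32, b"\x00")
```

Lane 0 sits in the lowest four bytes of each image, so `struct.unpack("<4f", ...)` reads the elements in register order.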
Instruction Support
Form | Subset | Feature Flag
VFNMSUBPS | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnPS | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFNMSUBPS xmm1, xmm2, xmm3/mem128, xmm4 | C4 | RXB.03 | 0.src1.0.01 | 7C /r /is4
VFNMSUBPS ymm1, ymm2, ymm3/mem256, ymm4 | C4 | RXB.03 | 0.src1.1.01 | 7C /r /is4
VFNMSUBPS xmm1, xmm2, xmm3, xmm4/mem128 | C4 | RXB.03 | 1.src1.0.01 | 7C /r /is4
VFNMSUBPS ymm1, ymm2, ymm3, ymm4/mem256 | C4 | RXB.03 | 1.src1.1.01 | 7C /r /is4
VFNMSUB132PS xmm1, xmm2, xmm3/mem128 | C4 | RXB.02 | 0.src2.0.01 | 9E /r
VFNMSUB132PS ymm1, ymm2, ymm3/mem256 | C4 | RXB.02 | 0.src2.1.01 | 9E /r
VFNMSUB213PS xmm1, xmm2, xmm3/mem128 | C4 | RXB.02 | 0.src2.0.01 | AE /r
VFNMSUB213PS ymm1, ymm2, ymm3/mem256 | C4 | RXB.02 | 0.src2.1.01 | AE /r
VFNMSUB231PS xmm1, xmm2, xmm3/mem128 | C4 | RXB.02 | 0.src2.0.01 | BE /r
VFNMSUB231PS ymm1, ymm2, ymm3/mem256 | C4 | RXB.02 | 0.src2.1.01 | BE /r

Related Instructions
VFNMSUBPD, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUBSD,
VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD, VFNMSUBSS, VFNMSUB132SS,
VFNMSUB213SS, VFNMSUB231SS
rFLAGS Affected
None
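MXCSR Flags Affected
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Field | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Affected | | | | | | | | | | | M | M | M | | M | M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.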


Exceptions
Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception


VFNMSUBSD
VFNMSUB132SD
VFNMSUB213SD
VFNMSUB231SD

Negative Multiply and Subtract
Scalar Double-Precision Floating-Point

Multiplies together two double-precision floating-point values, negates the unrounded product, and
subtracts a third double-precision floating-point value from it. The precise result is then rounded to
double-precision based on the mode specified by the MXCSR[RC] field and written to the destination
register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the equation in the comment on the right.
There are two four-operand forms:
VFNMSUBSD dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) − src3
VFNMSUBSD dest, src1, src2, src3/mem    // dest = −(src1 * src2) − src3/mem

and three three-operand forms:
VFNMSUB132SD src1, src2, src3/mem    // src1 = −(src1 * src3/mem) − src2
VFNMSUB213SD src1, src2, src3/mem    // src1 = −(src2 * src1) − src3/mem
VFNMSUB231SD src1, src2, src3/mem    // src1 = −(src2 * src3/mem) − src1

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 64-bit memory location and the third
source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 64-bit
memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third
operand is either a register or a 64-bit memory location.
The destination is an XMM register. Bits [127:64] of the destination XMM register and bits [255:128]
of the corresponding YMM register are cleared.
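Because the product is not rounded, these instructions can return a different result from a separate multiply and subtract. The Python sketch below contrasts the two under round-to-nearest-even; the function names are hypothetical, `Fraction` arithmetic stands in for the exact internal product, and ordinary Python `float` arithmetic stands in for the two-rounding sequence.

```python
# Hypothetical comparison sketch. Assumption: round-to-nearest-even,
# finite inputs; MXCSR flags are not modeled.
from fractions import Fraction


def vfnmsubsd(a: float, b: float, c: float) -> float:
    """-(a * b) - c with the product kept exact and a single final rounding."""
    return float(-(Fraction(a) * Fraction(b)) - Fraction(c))


def nmsub_two_roundings(a: float, b: float, c: float) -> float:
    """The same expression with ordinary binary64 operations:
    the product is rounded before the subtraction."""
    return -(a * b) - c


# Exact product is 1 - 2**-60, which rounds to 1.0 in binary64.
a, b, c = 1 + 2**-30, 1 - 2**-30, -1.0
fused = vfnmsubsd(a, b, c)              # 2**-60
unfused = nmsub_two_roundings(a, b, c)  # 0.0
```

Here the two-step version cancels to 0.0 because the product has already been rounded to 1.0, while the fused form keeps the 2^-60 residual.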
Instruction Support
Form | Subset | Feature Flag
VFNMSUBSD | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnSD | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFNMSUBSD xmm1, xmm2, xmm3/mem64, xmm4 | C4 | RXB.03 | 0.src1.X.01 | 7F /r /is4
VFNMSUBSD xmm1, xmm2, xmm3, xmm4/mem64 | C4 | RXB.03 | 1.src1.X.01 | 7F /r /is4
VFNMSUB132SD xmm1, xmm2, xmm3/mem64 | C4 | RXB.02 | 1.src2.X.01 | 9F /r
VFNMSUB213SD xmm1, xmm2, xmm3/mem64 | C4 | RXB.02 | 1.src2.X.01 | AF /r
VFNMSUB231SD xmm1, xmm2, xmm3/mem64 | C4 | RXB.02 | 1.src2.X.01 | BF /r

Related Instructions
VFNMSUBPD, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUBPS,
VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, VFNMSUBSS, VFNMSUB132SS,
VFNMSUB213SS, VFNMSUB231SS
rFLAGS Affected
None
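MXCSR Flags Affected
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Field | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Affected | | | | | | | | | | | M | M | M | | M | M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.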


Exceptions
Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception


VFNMSUBSS
VFNMSUB132SS
VFNMSUB213SS
VFNMSUB231SS

Negative Multiply and Subtract
Scalar Single-Precision Floating-Point

Multiplies together two single-precision floating-point values, negates the unrounded product, and
subtracts a third single-precision floating-point value from it. The precise result is then rounded to
single-precision based on the mode specified by the MXCSR[RC] field and written to the destination
register. The role of each of the source operands specified by the assembly language prototypes given
below is reflected in the equation in the comment on the right.
There are two four-operand forms:
VFNMSUBSS dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) − src3
VFNMSUBSS dest, src1, src2, src3/mem    // dest = −(src1 * src2) − src3/mem

and three three-operand forms:
VFNMSUB132SS src1, src2, src3/mem    // src1 = −(src1 * src3/mem) − src2
VFNMSUB213SS src1, src2, src3/mem    // src1 = −(src2 * src1) − src3/mem
VFNMSUB231SS src1, src2, src3/mem    // src1 = −(src2 * src3/mem) − src1

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 32-bit memory location and the third
source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 32-bit
memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third
operand is either a register or a 32-bit memory location.
The destination is an XMM register. Bits [127:32] of the destination XMM register and bits [255:128]
of the corresponding YMM register are cleared.
Instruction Support
Form | Subset | Feature Flag
VFNMSUBSS | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnSS | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFNMSUBSS xmm1, xmm2, xmm3/mem32, xmm4 | C4 | RXB.03 | 0.src1.X.01 | 7E /r /is4
VFNMSUBSS xmm1, xmm2, xmm3, xmm4/mem32 | C4 | RXB.03 | 1.src1.X.01 | 7E /r /is4
VFNMSUB132SS xmm1, xmm2, xmm3/mem32 | C4 | RXB.02 | 0.src2.X.01 | 9F /r
VFNMSUB213SS xmm1, xmm2, xmm3/mem32 | C4 | RXB.02 | 0.src2.X.01 | AF /r
VFNMSUB231SS xmm1, xmm2, xmm3/mem32 | C4 | RXB.02 | 0.src2.X.01 | BF /r

Related Instructions
VFNMSUBPD, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUBPS,
VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, VFNMSUBSD, VFNMSUB132SD,
VFNMSUB213SD, VFNMSUB231SD
rFLAGS Affected
None
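MXCSR Flags Affected
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Field | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Affected | | | | | | | | | | | M | M | M | | M | M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.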


Exceptions
Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception


VFRCZPD


Extract Fraction
Packed Double-Precision Floating-Point

Extracts the fractional portion of each double-precision floating-point value of either a source register
or a memory location and writes the resulting values to the corresponding elements of the destination.
The fractional results are precise.
• When XOP.L = 0, the source is either an XMM register or a 128-bit memory location.
• When XOP.L = 1, the source is a YMM register or 256-bit memory location.
When the destination is an XMM register, bits [255:128] of the corresponding YMM register are
cleared.
Exception conditions are the same as for other arithmetic instructions, except with respect to the sign
of a zero result. A zero is returned in the following cases:
• When the operand is a zero.
• When the operand is a normal integer.
• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ.
• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ.
In the first three cases, when MXCSR.RC = 01b (round toward −∞), the sign of the zero result is negative; otherwise it is positive.
In the fourth case, the operand is its own fractional part, which results in underflow, and the result is
forced to zero by MXCSR.FZ; the result has the same sign as the operand.
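The zero-result rules above can be collected into a small reference model. In this Python sketch, which is hypothetical and not AMD's definition, `rc_down`, `daz`, and `fz` stand for MXCSR.RC = 01b, MXCSR.DAZ, and MXCSR.FZ; NaN and infinity inputs are rejected rather than modeled; and, as an assumption, a denormal operand with FZ = 0 is returned unchanged. The fraction is taken as x - trunc(x), which keeps the sign of the operand and is exact in binary64.

```python
# Hypothetical reference model of the VFRCZPD element rules.
import math
import sys


def frcz_f64(x: float, daz: bool = False, fz: bool = False,
             rc_down: bool = False) -> float:
    """Fraction of one binary64 element under the zero-result rules above.

    rc_down stands for MXCSR.RC = 01b (round toward -infinity); daz and fz
    stand for MXCSR.DAZ and MXCSR.FZ. NaN and infinity are not modeled.
    """
    if not math.isfinite(x):
        raise ValueError("NaN and infinity inputs are outside this sketch")
    zero = -0.0 if rc_down else 0.0
    if x == 0.0:
        return zero                       # operand is a zero
    if abs(x) < sys.float_info.min:       # operand is a denormal
        if daz:
            return zero                   # coerced to zero by DAZ
        # The denormal is its own fraction; with FZ the underflowing result
        # becomes a zero with the operand's sign.
        # Assumption: with FZ = 0 the denormal is returned unchanged.
        return math.copysign(0.0, x) if fz else x
    frac = x - math.trunc(x)              # exact; keeps the sign of x
    return zero if frac == 0.0 else frac  # integral operand gives a zero


def vfrczpd(src, l: int, **mxcsr):
    """Packed form: two elements when XOP.L = 0, four when XOP.L = 1."""
    n = 4 if l else 2
    if len(src) != n:
        raise ValueError(f"XOP.L = {l} expects {n} elements")
    return [frcz_f64(x, **mxcsr) for x in src]
```

For example, an operand of -2.75 yields -0.75, while -3.0 yields +0.0 unless `rc_down=True`, in which case it yields -0.0.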
Instruction Support
Form | Subset | Feature Flag
VFRCZPD | XOP | CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic | XOP | RXB.map_select | W.vvvv.L.pp | Opcode
VFRCZPD xmm1, xmm2/mem128 | 8F | RXB.09 | 0.1111.0.00 | 81 /r
VFRCZPD ymm1, ymm2/mem256 | 8F | RXB.09 | 0.1111.1.00 | 81 /r
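As with VEX, the XOP prefix fields in the table can be packed into bytes. This Python sketch uses hypothetical helper names, covers only the register-to-register form, and assumes register numbers 0 to 15 with R, X, B, and vvvv stored inverted; the unused vvvv field is passed as logical 0 so that it is stored as the 1111b the table shows.

```python
# Hypothetical encoder sketch: register operands only (ModRM.mod = 11b).


def xop3(r: int, x: int, b: int, map_select: int,
         w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Three-byte XOP prefix (8Fh); R, X, B and vvvv are stored inverted."""
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0x3)
    return bytes([0x8F, byte1, byte2])


def encode_vfrczpd_rr(dest: int, src: int, l: int) -> bytes:
    """Register-only VFRCZPD: map_select 09h, W = 0, pp = 00b, opcode 81h.

    vvvv is unused, so a logical 0 is passed and stored inverted as 1111b,
    which is the value the table shows (0.1111.L.00).
    """
    prefix = xop3(dest >> 3, 0, src >> 3, 0x09, 0, 0, l, 0x00)
    modrm = 0xC0 | ((dest & 7) << 3) | (src & 7)
    return prefix + bytes([0x81, modrm])
```

Under these assumptions, VFRCZPD xmm1, xmm2 encodes as 8F E9 78 81 CA and VFRCZPD ymm1, ymm2 as 8F E9 7C 81 CA.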

Related Instructions
(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPS, VFRCZSS, VFRCZSD
rFLAGS Affected
None


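MXCSR Flags Affected
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Field | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Affected | | | | | | | | | | | M | M | | | M | M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.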

Exceptions
Exception

Mode
Real Virt Prot
X
X

X
X
X
X
X
X

Invalid opcode, #UD

X
Device not available, #NM
Stack, #SS

X
X
X
X
X
X

General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
XOP instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
XOP.W = 1.
XOP.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding XOP prefix.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.
See SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | | | X | A source operand was an SNaN value.
Invalid operation, IE | | | X | Undefined operation.
Denormalized operand, DE | | | X | A source operand was a denormal value.
Underflow, UE | | | X | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | X | A result could not be represented exactly in the destination format.
X — XOP exception


VFRCZPS


Extract Fraction
Packed Single-Precision Floating-Point

Extracts the fractional portion of each single-precision floating-point value of either a source register
or a memory location and writes the resulting values to the corresponding elements of the destination.
The fractional results are exact.
• When XOP.L = 0, the source is either an XMM register or a 128-bit memory location.
• When XOP.L = 1, the source is a YMM register or 256-bit memory location.
When the destination is an XMM register, bits [255:128] of the corresponding YMM register are
cleared.
Exception conditions are the same as for other arithmetic instructions, except with respect to the sign
of a zero result. A zero is returned in the following cases:
• When the operand is a zero.
• When the operand is a normal integer.
• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ.
• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ.
In the first three cases, when MXCSR.RC = 01b (round toward −∞), the sign of the zero result is negative; otherwise it is positive.
In the fourth case, the operand is its own fractional part, which results in underflow, and the result is
forced to zero by MXCSR.FZ; the result has the same sign as the operand.
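The same rules apply lane by lane to binary32 data. This hypothetical Python sketch works on a raw little-endian register image, uses 2^-126 as the smallest normal binary32 magnitude, rejects NaN and infinity inputs, and, as an assumption, returns a denormal lane unchanged when FZ = 0. With XOP.L = 0 it returns a 32-byte image whose upper 16 bytes are zero, matching the description above.

```python
# Hypothetical reference model of VFRCZPS on a register image.
import math
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest normal binary32 magnitude


def frcz_f32(x: float, daz: bool = False, fz: bool = False,
             rc_down: bool = False) -> float:
    """Fraction of one binary32 element (held in a Python float)."""
    if not math.isfinite(x):
        raise ValueError("NaN and infinity inputs are outside this sketch")
    zero = -0.0 if rc_down else 0.0       # rc_down models MXCSR.RC = 01b
    if x == 0.0:
        return zero
    if abs(x) < F32_MIN_NORMAL:           # binary32 denormal
        if daz:
            return zero
        # Assumption: with FZ = 0 the denormal lane is returned unchanged.
        return math.copysign(0.0, x) if fz else x
    frac = x - math.trunc(x)              # exact, and representable in binary32
    return zero if frac == 0.0 else frac


def vfrczps_image(src: bytes, l: int, **mxcsr) -> bytes:
    """Apply VFRCZPS to a raw little-endian register image.

    src holds four binary32 lanes (16 bytes, XOP.L = 0) or eight (32 bytes,
    XOP.L = 1). The result is a 32-byte YMM image; with XOP.L = 0 the upper
    16 bytes are zero.
    """
    fmt = "<8f" if l else "<4f"
    lanes = (frcz_f32(x, **mxcsr) for x in struct.unpack(fmt, src))
    return struct.pack(fmt, *lanes).ljust(32, b"\x00")
```

Because each lane's fraction has no more significant bits than the lane itself, packing the result back to binary32 is exact.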
Instruction Support
Form | Subset | Feature Flag
VFRCZPS | XOP | CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic | XOP | RXB.map_select | W.vvvv.L.pp | Opcode
VFRCZPS xmm1, xmm2/mem128 | 8F | RXB.09 | 0.1111.0.00 | 80 /r
VFRCZPS ymm1, ymm2/mem256 | 8F | RXB.09 | 0.1111.1.00 | 80 /r

Related Instructions
(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPD, VFRCZSS, VFRCZSD
rFLAGS Affected
None


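MXCSR Flags Affected
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Field | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Affected | | | | | | | | | | | M | M | | | M | M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.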

Exceptions
Exception

Mode
Real Virt Prot
X
X

X
X
X
X
X
X

Invalid opcode, #UD

X
Device not available, #NM
Stack, #SS

X
X
X
X
X
X

General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
XOP instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
XOP.W = 1.
XOP.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding XOP prefix.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.
See SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | | | X | A source operand was an SNaN value.
Invalid operation, IE | | | X | Undefined operation.
Denormalized operand, DE | | | X | A source operand was a denormal value.
Underflow, UE | | | X | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | X | A result could not be represented exactly in the destination format.
X — XOP exception


VFRCZSD


Extract Fraction
Scalar Double-Precision Floating-Point

Extracts the fractional portion of the double-precision floating-point value of either the low-order
quadword of an XMM register or a 64-bit memory location and writes the result to the low-order
quadword of the destination XMM register. The fractional results are precise.
When the result is written to the destination XMM register, bits [127:64] of the destination and bits
[255:128] of the corresponding YMM register are cleared.
Exception conditions are the same as for other arithmetic instructions, except with respect to the sign
of a zero result. A zero is returned in the following cases:
• When the operand is a zero.
• When the operand is a normal integer.
• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ.
• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ.
In the first three cases, the zero result is negative when MXCSR.RC = 01b (round toward −∞) and positive otherwise.
In the fourth case, the operand is its own fractional part, which results in underflow, and the result is
forced to zero by MXCSR.FZ; the result has the same sign as the operand.
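The zero-sign rules above are consistent with computing the fraction as x − trunc(x), because an exact zero difference is +0 in every rounding mode except round toward −∞. The Python sketch below models VFRCZSD for one finite operand under that assumption. The function name and the boolean parameters standing in for MXCSR.RC, MXCSR.DAZ, and MXCSR.FZ are hypothetical, and exception reporting and NaN or infinity operands are not modeled.

```python
import math
import sys


def vfrczsd(x: float, rc_down: bool = False, daz: bool = False, fz: bool = False) -> float:
    """Model the fractional part produced for one finite double-precision operand.

    rc_down stands for MXCSR.RC = 01b; daz and fz stand for MXCSR.DAZ and MXCSR.FZ.
    """
    if not math.isfinite(x):
        raise ValueError("only finite operands are modeled")
    is_denormal = x != 0.0 and abs(x) < sys.float_info.min
    if is_denormal and not daz:
        # Fourth case: the denormal is its own fraction and underflows.
        # With FZ set the result is a zero carrying the operand's sign.
        # Assumption of this sketch: with FZ clear the denormal is returned unchanged.
        return math.copysign(0.0, x) if fz else x
    if is_denormal:
        x = math.copysign(0.0, x)  # DAZ coerces the operand to a signed zero
    frac = x - math.trunc(x)  # exact: the fraction fits in the operand's format
    if frac == 0.0:
        # Cases 1-3 (zero, integer, DAZ-coerced denormal): the sign depends only on RC.
        return -0.0 if rc_down else 0.0
    return frac
```

Because the subtraction is exact, the rounding mode only matters for the sign of a zero result, which is why the sketch handles RC in a single branch.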
Instruction Support
Form          Subset   Feature Flag
VFRCZSD       XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Encoding
                             XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZSD xmm1, xmm2/mem64     8F   RXB.09          0.1111.0.00  83 /r
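The columns of this table map onto the bytes of the instruction: the 8F escape, a byte holding the inverted R, X, and B bits together with map_select, a byte holding W, vvvv, L, and pp, the opcode, and then ModRM. The Python sketch below (hypothetical function name) assembles the register-register form from those fields. It is derived from the table and has not been checked against an assembler.

```python
def encode_xop_rr(map_select: int, w: int, vvvv: int, l: int, pp: int,
                  opcode: int, reg: int, rm: int) -> bytes:
    """Encode a register-register XOP instruction (three-byte 8F escape).

    reg and rm are register numbers 0-15. vvvv is the field exactly as it appears
    in the encoding tables (for VFRCZ* it is 1111b, meaning "unused").
    """
    r_bar = (~reg >> 3) & 1  # R, X and B are stored inverted
    x_bar = 1                # no index register in a register-direct form
    b_bar = (~rm >> 3) & 1
    byte1 = (r_bar << 7) | (x_bar << 6) | (b_bar << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 0b11)
    modrm = 0b11000000 | ((reg & 7) << 3) | (rm & 7)  # mod = 11b: register operand
    return bytes([0x8F, byte1, byte2, opcode, modrm])


# VFRCZSD xmm1, xmm2: map_select 09, W.vvvv.L.pp = 0.1111.0.00, opcode 83 /r
print(encode_xop_rr(0x09, 0, 0b1111, 0, 0b00, 0x83, reg=1, rm=2).hex(" "))  # 8f e9 78 83 ca
```

Selecting xmm8 through xmm15 for either operand clears the corresponding inverted R or B bit in the second byte.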

Related Instructions
(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPS, VFRCZPD, VFRCZSS
rFLAGS Affected
None


MXCSR Flags Affected
Bit    17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0
Flag   MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
Effect                                                       M    M              M    M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            XOP instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            XOP.W = 1.
                            XOP.vvvv != 1111b.
                            REX, F2, F3, or 66 prefix preceding XOP prefix.
                            Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.
                            See SIMD Floating-Point Exceptions below for details.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.
                            See SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       A source operand was an SNaN value.
                            Undefined operation.
Denormalized operand, DE    A source operand was a denormal value.
Underflow, UE               Rounded result too small to fit into the format of the destination operand.
Precision, PE               A result could not be represented exactly in the destination format.


VFRCZSS


Extract Fraction
Scalar Single-Precision Floating-Point

Extracts the fractional portion of the single-precision floating-point value of the low-order doubleword of an XMM register or 32-bit memory location and writes the result in the low-order doubleword of the destination XMM register. The fractional results are precise.
When the result is written to the destination XMM register, bits [127:32] of the destination and bits
[255:128] of the corresponding YMM register are cleared.
Exception conditions are the same as for other arithmetic instructions, except with respect to the sign
of a zero result. A zero is returned in the following cases:
• When the operand is a zero.
• When the operand is a normal integer.
• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ.
• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ.
In the first three cases, the zero result is negative when MXCSR.RC = 01b (round toward −∞) and positive otherwise.
In the fourth case, the operand is its own fractional part, which results in underflow, and the result is
forced to zero by MXCSR.FZ; the result has the same sign as the operand.
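At the register level, only bits [31:0] of the source are read, and every destination bit above bit 31 ends up zero. The sketch below represents XMM and YMM contents as Python integers and uses the `struct` module to reinterpret single-precision bit patterns. The names `vfrczss_reg`, `f32_from_bits`, and `f32_to_bits` are hypothetical, and the model assumes a finite, normal operand with round-to-nearest (DAZ, FZ, and exceptions are not modeled).

```python
import math
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret the low 32 bits of an integer as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def f32_to_bits(value: float) -> int:
    """Return the single-precision bit pattern of a value exactly representable as float32."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def vfrczss_reg(src: int) -> int:
    """Return the 256-bit YMM image written by VFRCZSS for a register source."""
    x = f32_from_bits(src)     # only bits [31:0] of the source are used
    frac = x - math.trunc(x)   # exact, and representable in single precision
    if frac == 0.0:
        frac = 0.0             # round-to-nearest: an exact zero result is +0.0
    return f32_to_bits(frac)   # bits [255:32] of the destination are zero
```

For example, a source whose low doubleword is 40200000h (2.5) produces 3F000000h (0.5), regardless of the contents of the upper source bits.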
Instruction Support
Form          Subset   Feature Flag
VFRCZSS       XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     Encoding
                             XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZSS xmm1, xmm2/mem32     8F   RXB.09          0.1111.0.00  82 /r

Related Instructions
(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPS, VFRCZPD, VFRCZSD
rFLAGS Affected
None


MXCSR Flags Affected
Bit    17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0
Flag   MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
Effect                                                       M    M              M    M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            XOP instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            XOP.W = 1.
                            XOP.vvvv != 1111b.
                            REX, F2, F3, or 66 prefix preceding XOP prefix.
                            Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.
                            See SIMD Floating-Point Exceptions below for details.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.
                            See SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE       A source operand was an SNaN value.
                            Undefined operation.
Denormalized operand, DE    A source operand was a denormal value.
Underflow, UE               Rounded result too small to fit into the format of the destination operand.
Precision, PE               A result could not be represented exactly in the destination format.
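Several #UD causes listed for the VFRCZ* instructions depend only on the encoding and on system state, so they can be checked mechanically. The Python helper below is hypothetical (its name and parameters are illustrative, not part of any API). It assumes protected mode, treats `xcr0` as the XFEATURE_ENABLED_MASK value, and does not model the case that depends on an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT.

```python
def xop_ud_causes(cpuid_xop: bool, cr4_osxsave: bool, xcr0: int,
                  xop_w: int, xop_vvvv: int, legacy_prefix: bool) -> list:
    """Return the encoding and system-state #UD causes that apply to a VFRCZ* instruction."""
    causes = []
    if not cpuid_xop:
        causes.append("instruction not supported (CPUID Fn8000_0001_ECX[XOP] = 0)")
    if not cr4_osxsave:
        causes.append("CR4.OSXSAVE = 0")
    if (xcr0 >> 1) & 0b11 != 0b11:  # both XMM (bit 1) and YMM (bit 2) state must be enabled
        causes.append("XFEATURE_ENABLED_MASK[2:1] != 11b")
    if xop_w == 1:
        causes.append("XOP.W = 1")
    if xop_vvvv != 0b1111:
        causes.append("XOP.vvvv != 1111b")
    if legacy_prefix:
        causes.append("REX, F2, F3, or 66 prefix preceding XOP prefix")
    return causes
```

An empty list means none of these particular conditions applies; other rows of the table (for example #NM or memory faults) still have to be considered separately.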


VGATHERDPD


Conditionally Gather Double-Precision
Floating-Point Values, Doubleword Indices

Conditionally loads double-precision (64-bit) values from memory using VSIB addressing with doubleword indices.
The instruction is of the form:
VGATHERDPD dest, mem64[vm32x], mask

Loading of each element of the destination register is conditional based on the value of the corresponding element of the mask operand. If the most-significant bit of the ith element of the mask is set,
the ith element of the destination is loaded from memory using the ith address of the array of effective
addresses calculated using VSIB addressing.
The index register is treated as an array of signed 32-bit values. Quadword elements of the destination
whose corresponding mask element has its most-significant bit clear are not affected by the operation.
If no exceptions occur, the mask register is set to zero.
Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their
mask elements set to zero. If any traps or faults are pending from elements that have been loaded,
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction
breakpoint is not re-triggered when the instruction execution is resumed.
See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.
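The conditional, restartable load sequence described above can be modeled in a few lines. This Python sketch uses hypothetical names throughout: memory is a flat byte buffer, each 32-bit index is sign-extended, elements are processed from element 0 upward (an assumption of the model, not a statement about hardware order), and a hypothetical `GatherFault` carries the partially updated destination and mask, mirroring how completed elements have their mask elements cleared.

```python
import struct


class GatherFault(Exception):
    """Hypothetical fault carrying the partially updated destination and mask."""

    def __init__(self, element: int, dest: list, mask: list):
        super().__init__(f"fault on element {element}")
        self.element, self.dest, self.mask = element, dest, mask


def sign_extend32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed doubleword index."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vgatherdpd(dest, mask, base, index, scale, memory: bytes):
    """Model VGATHERDPD: dest holds floats, mask holds 64-bit integers, index raw doublewords."""
    dest, mask = list(dest), list(mask)
    for i in range(len(dest)):
        if not (mask[i] >> 63) & 1:
            continue  # element not selected: destination element unchanged
        addr = base + sign_extend32(index[i]) * scale
        if addr < 0 or addr + 8 > len(memory):
            raise GatherFault(i, dest, mask)  # earlier elements stay loaded, masks cleared
        dest[i] = struct.unpack_from("<d", memory, addr)[0]
        mask[i] = 0  # a loaded element has its mask element cleared
    return dest, [0] * len(mask)  # no exception: the whole mask register is cleared
```

In this model, re-executing with the destination and mask carried by a `GatherFault` loads only the elements whose mask bits are still set.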
There are 128-bit and 256-bit forms of this instruction.
XMM Encoding

The destination is an XMM register. The first source operand is up to two 64-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the two
low-order doublewords of an XMM register; the two high-order doublewords of the index register are
not used. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of
the YMM register that corresponds to the second source (mask) operand are cleared.
YMM Encoding

The destination is a YMM register. The first source operand is up to four 64-bit values located in
memory. The second source operand (the mask) is a YMM register. The index vector is the four doublewords of an XMM register.
Instruction Support
Form          Subset   Feature Flag
VGATHERDPD    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERDPD xmm1, vm32x, xmm2    C4   RXB.02          1.src2.0.01  92 /r
VGATHERDPD ymm1, vm32x, ymm2    C4   RXB.02          1.src2.1.01  92 /r

Related Instructions
VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ
rFLAGS Affected
RF
MXCSR Flags Affected
None
Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            MODRM.mod = 11b.
                            MODRM.rm != 100b.
                            YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Alignment checking enabled and:
                            256-bit memory operand not 32-byte aligned or
                            128-bit memory operand not 16-byte aligned.
Page fault, #PF             Instruction execution caused a page fault.


VGATHERDPS


Conditionally Gather Single-Precision
Floating-Point Values, Doubleword Indices

Conditionally loads single-precision (32-bit) values from memory using VSIB addressing with doubleword indices.
The instruction is of the form:
VGATHERDPS dest, mem32[vm32x/y], mask

Loading of each element of the destination register is conditional based on the value of the corresponding element of the mask operand. If the most-significant bit of the ith element of the mask is set,
the ith element of the destination is loaded from memory using the ith address of the array of effective
addresses calculated using VSIB addressing.
The index register is treated as an array of signed 32-bit values. Doubleword elements of the destination whose corresponding mask element has its most-significant bit clear are not affected by the operation. If no exceptions occur, the mask register is set to zero.
Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their
mask elements set to zero. If any traps or faults are pending from elements that have been loaded,
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction
breakpoint is not re-triggered when the instruction execution is resumed.
See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.
There are 128-bit and 256-bit forms of this instruction.
XMM Encoding

The destination is an XMM register. The first source operand is up to four 32-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the four doublewords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination
and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are
cleared.
YMM Encoding

The destination is a YMM register. The first source operand is up to eight 32-bit values located in
memory. The second source operand (the mask) is a YMM register. The index vector is the eight doublewords of a YMM register.
Instruction Support
Form          Subset   Feature Flag
VGATHERDPS    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERDPS xmm1, vm32x, xmm2    C4   RXB.02          0.src2.0.01  92 /r
VGATHERDPS ymm1, vm32y, ymm2    C4   RXB.02          0.src2.1.01  92 /r

Related Instructions
VGATHERDPD, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ
rFLAGS Affected
RF
MXCSR Flags Affected
None
Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            MODRM.mod = 11b.
                            MODRM.rm != 100b.
                            YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Alignment checking enabled and:
                            256-bit memory operand not 32-byte aligned or
                            128-bit memory operand not 16-byte aligned.
Page fault, #PF             Instruction execution caused a page fault.


VGATHERQPD


Conditionally Gather Double-Precision
Floating-Point Values, Quadword Indices

Conditionally loads double-precision (64-bit) values from memory using VSIB addressing with quadword indices.
The instruction is of the form:
VGATHERQPD dest, mem64[vm64x/y], mask

Loading of each element of the destination register is conditional based on the value of the corresponding element of the mask operand. If the most-significant bit of the ith element of the mask is set,
the ith element of the destination is loaded from memory using the ith address of the array of effective
addresses calculated using VSIB addressing.
The index register is treated as an array of signed 64-bit values. Quadword elements of the destination
whose corresponding mask element has its most-significant bit clear are not affected by the operation.
If no exceptions occur, the mask register is set to zero.
Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their
mask elements set to zero. If any traps or faults are pending from elements that have been loaded,
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction
breakpoint is not re-triggered when the instruction execution is resumed.
See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.
There are 128-bit and 256-bit forms of this instruction.
XMM Encoding

The destination is an XMM register. The first source operand is up to two 64-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the two
quadwords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are
cleared.
YMM Encoding

The destination is a YMM register. The first source operand is up to four 64-bit values located in
memory. The second source operand (the mask) is a YMM register. The index vector is the four quadwords of a YMM register.
Instruction Support
Form          Subset   Feature Flag
VGATHERQPD    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERQPD xmm1, vm64x, xmm2    C4   RXB.02          1.src2.0.01  93 /r
VGATHERQPD ymm1, vm64y, ymm2    C4   RXB.02          1.src2.1.01  93 /r

Related Instructions
VGATHERDPD, VGATHERDPS, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ
rFLAGS Affected
RF
MXCSR Flags Affected
None
Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            MODRM.mod = 11b.
                            MODRM.rm != 100b.
                            YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Alignment checking enabled and:
                            256-bit memory operand not 32-byte aligned or
                            128-bit memory operand not 16-byte aligned.
Page fault, #PF             Instruction execution caused a page fault.
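Three of the #UD causes in this table are specific to gathers. They concern the ModRM byte (a VSIB memory operand requires mod != 11b and rm = 100b, which selects a SIB byte) and the choice of registers. The Python checker below is a hypothetical helper; it takes a ModRM byte and the register numbers of the destination, mask, and index operands.

```python
def gather_encoding_ud_causes(modrm: int, dest_reg: int, mask_reg: int, index_reg: int) -> list:
    """Return the gather-specific #UD causes from the table for one encoding."""
    causes = []
    if (modrm >> 6) == 0b11:
        causes.append("MODRM.mod = 11b (a memory operand is required)")
    if (modrm & 0b111) != 0b100:
        causes.append("MODRM.rm != 100b (a SIB byte is required)")
    if len({dest_reg, mask_reg, index_reg}) != 3:
        causes.append("destination, mask, and index registers not unique")
    return causes
```

The same three checks apply to VGATHERDPD, VGATHERDPS, VGATHERQPD, and VGATHERQPS, whose exception tables list identical causes.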


VGATHERQPS


Conditionally Gather Single-Precision
Floating-Point Values, Quadword Indices

Conditionally loads single-precision (32-bit) values from memory using VSIB addressing with quadword indices.
The instruction is of the form:
VGATHERQPS dest, mem32[vm64x/y], mask

Loading of each element of the destination register is conditional based on the value of the corresponding element of the mask operand. If the most-significant bit of the ith element of the mask is set,
the ith element of the destination is loaded from memory using the ith address of the array of effective
addresses calculated using VSIB addressing.
The index register is treated as an array of signed 64-bit values. Doubleword elements of the destination whose corresponding mask element has its most-significant bit clear are not affected by the operation. The upper
half of the destination is zeroed. If no exceptions occur, the mask register is set to zero.
Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their
mask elements set to zero. If any traps or faults are pending from elements that have been loaded,
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction
breakpoint is not re-triggered when the instruction execution is resumed.
See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.
There are 128-bit and 256-bit forms of this instruction.
XMM Encoding

The destination is an XMM register. The first source operand is up to two 32-bit values located in
memory. The second source operand (the mask) is an XMM register. Only the lower half of the mask
is used. The index vector is the two quadwords of an XMM register. Bits [255:64] of the YMM register that corresponds to the destination and bits [255:64] of the YMM register that corresponds to the
second source (mask) operand are cleared.
YMM Encoding

The destination is an XMM register. The first source operand is up to four 32-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the four
quadwords of a YMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are
cleared.
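Across the four gathers, the number of elements loaded and the width of the destination follow from the wider of the index and data elements, as the XMM and YMM encoding descriptions show. The small Python sketch below captures that relationship; `gather_shape` and `GATHERS` are hypothetical names, and `vector_bits` is assumed to be 128 for VEX.L = 0 and 256 for VEX.L = 1.

```python
def gather_shape(index_bits: int, element_bits: int, vector_bits: int) -> tuple:
    """Return (elements gathered, destination bits written) for an AVX2 gather.

    vector_bits is the width of whichever vector is wider, the index or the data.
    """
    count = vector_bits // max(index_bits, element_bits)
    return count, count * element_bits


# The four gathers as (index width, element width).
GATHERS = {"VGATHERDPD": (32, 64), "VGATHERDPS": (32, 32),
           "VGATHERQPD": (64, 64), "VGATHERQPS": (64, 32)}
```

Destination bits above the returned width are cleared; for example, `gather_shape(*GATHERS["VGATHERQPS"], 128)` gives 2 elements and 64 bits, matching the statement that bits [255:64] are cleared.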
Instruction Support
Form          Subset   Feature Flag
VGATHERQPS    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERQPS xmm1, vm64x, xmm2    C4   RXB.02          0.src2.0.01  93 /r
VGATHERQPS xmm1, vm64y, xmm2    C4   RXB.02          0.src2.1.01  93 /r

Related Instructions
VGATHERDPD, VGATHERDPS, VGATHERQPD, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ
rFLAGS Affected
RF
MXCSR Flags Affected
None
Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            MODRM.mod = 11b.
                            MODRM.rm != 100b.
                            YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Alignment checking enabled and:
                            256-bit memory operand not 32-byte aligned or
                            128-bit memory operand not 16-byte aligned.
Page fault, #PF             Instruction execution caused a page fault.


VINSERTF128


Insert Packed Floating-Point Values
128-bit

Combines 128 bits of data from a YMM register with 128-bit packed-value data from an XMM register or a 128-bit memory location, as specified by an immediate byte operand, and writes the combined
data to the destination.
Only bit [0] of the immediate operand is used. Operation is as follows.
• When imm8[0] = 0, copy bits [255:128] of the first source to bits [255:128] of the destination and
copy bits [127:0] of the second source to bits [127:0] of the destination.
• When imm8[0] = 1, copy bits [127:0] of the first source to bits [127:0] of the destination and copy
bits [127:0] of the second source to bits [255:128] of the destination.
This extended-form instruction has a single 256-bit encoding.
The first source operand is a YMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a YMM register. There is a third immediate byte operand.
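The imm8[0] selection above can be written as a pure function on 256-bit integers. This Python sketch (hypothetical name `vinsertf128`) models YMM and XMM contents as integers and does not model the memory form or exceptions. The same merge applies to VINSERTI128, whose description specifies an identical operation.

```python
MASK128 = (1 << 128) - 1


def vinsertf128(src1: int, src2: int, imm8: int) -> int:
    """Return the 256-bit destination; src1 is a 256-bit value, src2 a 128-bit value."""
    src2 &= MASK128
    if (imm8 & 1) == 0:
        # Keep bits [255:128] of the first source; src2 becomes bits [127:0].
        return (src1 & (MASK128 << 128)) | src2
    # Keep bits [127:0] of the first source; src2 becomes bits [255:128].
    return (src2 << 128) | (src1 & MASK128)
```

Only bit 0 of the immediate is examined, so an immediate of FEh behaves like 0.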
Instruction Support
Form          Subset   Feature Flag
VINSERTF128   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                     Encoding
                                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VINSERTF128 ymm1, ymm2, xmm3/mem128, imm8    C4   RXB.03          0.src.1.01   18 /r ib

Related Instructions
VBROADCASTF128, VBROADCASTI128, VEXTRACTF128, VEXTRACTI128, VINSERTI128
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.W = 1.
                            VEX.L = 0.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.


VINSERTI128

Insert Packed Integer Values
128-bit

Combines 128 bits of data from a YMM register with 128-bit packed-value data from an XMM register or a 128-bit memory location, as specified by an immediate byte operand, and writes the combined
data to the destination.
Bit [0] of the immediate operand controls how the 128-bit values from the source operands are
merged into the destination. The operation is as follows.
• When imm8[0] = 0, copy bits [255:128] of the first source to bits [255:128] of the destination and
copy bits [127:0] of the second source to bits [127:0] of the destination.
• When imm8[0] = 1, copy bits [127:0] of the first source to bits [127:0] of the destination and copy
bits [127:0] of the second source to bits [255:128] of the destination.
This instruction has a single 256-bit encoding.
The first source operand is a YMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a YMM register. The immediate byte is encoded in the
instruction.
Instruction Support
Form          Subset   Feature Flag
VINSERTI128   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                     Encoding
                                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VINSERTI128 ymm1, ymm2, xmm3/mem128, imm8    C4   RXB.03          0.src1.1.01  38 /r ib
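For the three-byte VEX form in this table, the C4 escape is followed by a byte holding the inverted R, X, and B bits with map_select, then a byte holding W, the inverted register number of the first source (vvvv), L, and pp, then the opcode, ModRM, and the immediate. The Python sketch below (hypothetical function name) covers only the register-register form; it is derived from the table rather than checked against an assembler.

```python
def encode_vex3_rr_ib(map_select: int, w: int, src1: int, l: int, pp: int,
                      opcode: int, reg: int, rm: int, imm8: int) -> bytes:
    """Encode a register-only three-byte VEX (C4) instruction with an immediate byte.

    src1 is the register number carried in VEX.vvvv, which is stored inverted.
    """
    byte1 = (((~reg >> 3) & 1) << 7) | (1 << 6) | (((~rm >> 3) & 1) << 5) | (map_select & 0x1F)
    byte2 = ((w & 1) << 7) | ((~src1 & 0xF) << 3) | ((l & 1) << 2) | (pp & 0b11)
    modrm = 0xC0 | ((reg & 7) << 3) | (rm & 7)  # mod = 11b: xmm3 is a register operand
    return bytes([0xC4, byte1, byte2, opcode, modrm, imm8 & 0xFF])


# VINSERTI128 ymm1, ymm2, xmm3, 1: map_select 03, W.vvvv.L.pp = 0.src1.1.01, opcode 38 /r ib
print(encode_vex3_rr_ib(0x03, 0, src1=2, l=1, pp=0b01, opcode=0x38, reg=1, rm=3, imm8=1).hex(" "))
# c4 e3 6d 38 cb 01
```

Using opcode 18h with the same fields would produce the corresponding VINSERTF128 encoding.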

Related Instructions
VBROADCASTF128, VBROADCASTI128, VEXTRACTF128, VEXTRACTI128, VINSERTF128
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.W = 1.
                            VEX.L = 0.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.


VMASKMOVPD

Masked Move
Packed Double-Precision

Moves packed double-precision data elements from a source to a destination, as selected by mask bits
in a source register. There are load and store versions of the instruction.
For loads, the data elements are in a source memory location; for stores, the data elements are in a
source register. Each mask bit is the most-significant bit of the corresponding data element of the mask
register.
• For loads, when a mask bit = 1, the corresponding data element is copied from the source to the
same element of the destination; when a mask bit = 0, the corresponding element of the destination
is cleared.
• For stores, when a mask bit = 1, the corresponding data element is copied from the source to the
same element of the destination; when a mask bit = 0, the corresponding element of the destination
is not affected.
Exception and trap behavior for elements not selected for loading or storing from/to memory is
implementation dependent. For instance, a given implementation may signal a data breakpoint or a
page fault for quadwords that are zero-masked and not actually accessed.
XMM Encoding

There are load and store encodings.
• For loads, there are two 64-bit source data elements in a 128-bit memory location, the mask
operand is an XMM register, and the destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
• For stores, there are two 64-bit source data elements in an XMM register, the mask operand is an
XMM register, and the destination is a 128-bit memory location.
YMM Encoding

There are load and store encodings.
• For loads, there are four 64-bit source data elements in a 256-bit memory location, the mask
operand is a YMM register, and the destination is a YMM register.
• For stores, there are four 64-bit source data elements in a YMM register, the mask operand is a
YMM register, and the destination is a 256-bit memory location.
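The per-element load and store rules above can be modeled with plain lists. In this Python sketch (hypothetical function names), data elements are Python floats, mask elements are 64-bit integers of which only bit 63 is tested, and a masked load writes +0.0 (all bits clear) for an unselected element. Faults, and the implementation-dependent behavior for unselected elements, are not modeled.

```python
def vmaskmovpd_load(memory_elems: list, mask_elems: list) -> list:
    """Load form: selected elements are copied, unselected destination elements are cleared."""
    return [m if (k >> 63) & 1 else 0.0 for m, k in zip(memory_elems, mask_elems)]


def vmaskmovpd_store(memory_elems: list, src_elems: list, mask_elems: list) -> list:
    """Store form: selected elements overwrite memory, unselected memory is left as-is."""
    return [s if (k >> 63) & 1 else m
            for m, s, k in zip(memory_elems, src_elems, mask_elems)]
```

The store model returns a new list only to make the result easy to compare; the instruction itself is described as leaving unselected memory elements unaffected.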
Instruction Support
Form          Subset   Feature Flag
VMASKMOVPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
Loads:
VMASKMOVPD xmm1, xmm2, mem128    C4   RXB.02          0.src1.0.01  2D /r
VMASKMOVPD ymm1, ymm2, mem256    C4   RXB.02          0.src1.1.01  2D /r
Stores:
VMASKMOVPD mem128, xmm1, xmm2    C4   RXB.02          0.src1.0.01  2F /r
VMASKMOVPD mem256, ymm1, ymm2    C4   RXB.02          0.src1.1.01  2F /r

Related Instructions
VMASKMOVPS
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.W = 1.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
                            Write to a read-only data segment.
Page fault, #PF             Instruction execution caused a page fault.


VMASKMOVPS

Masked Move
Packed Single-Precision

Moves packed single-precision data elements from a source to a destination, as specified by mask bits in a source operand. There are load and store versions of the instruction.
For loads, the data elements are in a source memory location; for stores, the data elements are in a
source register. The mask bits are the most-significant bits of the corresponding data elements of a
source register.
• For loads, when a mask bit = 1, the corresponding data element is copied from the source to the
same element of the destination; when a mask bit = 0, the corresponding element of the destination
is cleared.
• For stores, when a mask bit = 1, the corresponding data element is copied from the source to the
same element of the destination; when a mask bit = 0, the corresponding element of the destination
is not affected.
Exception and trap behavior for elements not selected for loading or storing from/to memory is
implementation dependent. For instance, a given implementation may signal a data breakpoint or a
page fault for doublewords that are zero-masked and not actually written.
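The store rule is easy to get backwards, because a zero mask bit leaves memory untouched instead of writing zero. A scalar C sketch makes the difference explicit. `maskmovps_store` is a hypothetical name, and the model assumes the mask bit is bit 31 of each 32-bit mask element. It does not model the implementation-dependent fault behaviour for unselected elements. It simply never touches them.

```c
#include <stdint.h>

/* Model of the VMASKMOVPS store forms: lanes = 4 for the XMM form (mem128),
 * lanes = 8 for the YMM form (mem256). The mask bit for element i is
 * bit 31 of mask[i]. */
static void maskmovps_store(float *mem, const float *src, const uint32_t *mask, int lanes)
{
    for (int i = 0; i < lanes; i++) {
        if (mask[i] & 0x80000000u)
            mem[i] = src[i];  /* mask bit = 1: element is written to memory */
        /* mask bit = 0: mem[i] is left unchanged */
    }
}
```

Only the sign bit of each mask element matters. A mask element of 0x7FFFFFFF therefore suppresses the store, even though it is nonzero.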
XMM Encoding

There are load and store encodings.
• For loads, there are four 32-bit source data elements in a 128-bit memory location, the mask
operand is an XMM register, and the destination is an XMM register. Bits [255:128] of the YMM
register that corresponds to the destination are cleared.
• For stores, there are four 32-bit source data elements in an XMM register, the mask operand is an
XMM register, and the destination is a 128-bit memory location.
YMM Encoding

There are load and store encodings.
• For loads, there are eight 32-bit source data elements in a 256-bit memory location, the mask
operand is a YMM register, and the destination is a YMM register.
• For stores, there are eight 32-bit source data elements in a YMM register, the mask operand is a
YMM register, and the destination is a 256-bit memory location.
Instruction Support
Form          Subset   Feature Flag
VMASKMOVPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.


Instruction Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Loads:
VMASKMOVPS xmm1, xmm2, mem128     C4    RXB.02           0.src1.0.01   2C /r
VMASKMOVPS ymm1, ymm2, mem256     C4    RXB.02           0.src1.1.01   2C /r
Stores:
VMASKMOVPS mem128, xmm1, xmm2     C4    RXB.02           0.src1.0.01   2E /r
VMASKMOVPS mem256, ymm1, ymm2     C4    RXB.02           0.src1.1.01   2E /r

Related Instructions
VMASKMOVPD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
A
A

A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
S
Page fault, #PF
A — AVX exception.


S

A
A
A
A
A
A
A
A
A
X
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Write to a read-only data segment.
Instruction execution caused a page fault.


VPBLENDD

Blend
Packed Doublewords

Copies packed doublewords from either of two sources to a destination, as specified by an immediate
8-bit mask operand.
Each bit of the mask selects a doubleword from one of the source operands to be copied to the destination. The least-significant bit controls the selection of the doubleword to be copied to the lowest
doubleword of the destination. For each doubleword i of the destination:
• When mask bit [i] = 0, doubleword i of the first source operand is copied to the corresponding
doubleword of the destination.
• When mask bit [i] = 1, doubleword i of the second source operand is copied to the corresponding
doubleword of the destination.
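The per-doubleword selection rule can be sketched in scalar C. `pblendd` is a hypothetical name, and registers are modelled as arrays of eight doublewords. In the XMM form only imm8[3:0] is consulted, and the upper four doublewords of the YMM destination are cleared.

```c
#include <stdint.h>

/* Model of VPBLENDD: lanes = 4 for the XMM form, lanes = 8 for the YMM form.
 * Bit i of imm8 selects doubleword i: 0 -> src1, 1 -> src2. */
static void pblendd(uint32_t dst[8], const uint32_t src1[8], const uint32_t src2[8],
                    uint8_t imm8, int lanes)
{
    for (int i = 0; i < 8; i++) {
        if (i >= lanes)
            dst[i] = 0;  /* XMM form: bits [255:128] of the YMM destination are cleared */
        else
            dst[i] = ((imm8 >> i) & 1) ? src2[i] : src1[i];
    }
}
```

For example, imm8 = A5h (1010_0101b) takes doublewords 0, 2, 5 and 7 from the second source and the rest from the first.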

The instruction has 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location. The destination is a third YMM register.
Instruction Support
Form          Subset   Feature Flag
VPBLENDD      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBLENDD xmm1, xmm2, xmm3/mem128, imm8     C4    RXB.03           0.src1.0.01   02 /r ib
VPBLENDD ymm1, ymm2, ymm3/mem256, imm8     C4    RXB.03           0.src1.1.01   02 /r ib

Related Instructions
VPBLENDW
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

A
A

A
A
A
A
A
A
A
A
A
A

Alignment check, #AC

A

Page fault, #PF
A — AVX2 exception

A


Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.


VPBROADCASTB

Broadcast Packed Byte

Loads a byte from a register or memory and writes it to all 16 or 32 bytes of an XMM or YMM register.
This instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Copies the source operand to all 16 bytes of the destination.
The source operand is the least-significant 8 bits of an XMM register or an 8-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Copies the source operand to all 32 bytes of the destination.
The source operand is the least-significant 8 bits of an XMM register or an 8-bit memory location.
The destination is a YMM register.
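The broadcast family (VPBROADCASTB, VPBROADCASTW, VPBROADCASTD, VPBROADCASTQ) differs only in element size and destination width. A single scalar C sketch therefore covers all four. `pbroadcast` is a hypothetical name. The destination is modelled as a 32-byte YMM image, and `src` is assumed to point at the low `elem` bytes of the source register or memory operand.

```c
#include <stdint.h>
#include <string.h>

/* Model of VPBROADCAST{B,W,D,Q}: elem = element size in bytes (1, 2, 4, 8);
 * width = 16 for the XMM form, 32 for the YMM form. */
static void pbroadcast(uint8_t dst[32], const uint8_t *src, size_t elem, size_t width)
{
    memset(dst, 0, 32);                 /* XMM form: bits [255:128] end up cleared */
    for (size_t off = 0; off < width; off += elem)
        memcpy(dst + off, src, elem);   /* replicate the element across the destination */
}
```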
Instruction Support
Form            Subset   Feature Flag
VPBROADCASTB    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBROADCASTB xmm1, xmm2/mem8    C4    RXB.02           0.1111.0.01   78 /r
VPBROADCASTB ymm1, xmm2/mem8    C4    RXB.02           0.1111.1.01   78 /r

Related Instructions
VPBROADCASTD, VPBROADCASTQ, VPBROADCASTW
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.


A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


VPBROADCASTD

Broadcast Packed Doubleword

Loads a doubleword from a register or memory and writes it to all 4 or 8 doublewords of an XMM or
YMM register.
This instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Copies the source operand to all 4 doublewords of the destination.
The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Copies the source operand to all 8 doublewords of the destination.
The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location.
The destination is a YMM register.
Instruction Support
Form            Subset   Feature Flag
VPBROADCASTD    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBROADCASTD xmm1, xmm2/mem32    C4    RXB.02           0.1111.0.01   58 /r
VPBROADCASTD ymm1, xmm2/mem32    C4    RXB.02           0.1111.1.01   58 /r
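The encoding rows can be turned into bytes mechanically. The following is a sketch for register-to-register forms only. `vex3_encode_rr` is a hypothetical helper, and it assumes the standard three-byte VEX layout:
- Byte 1 holds inverted R, X and B above the 5-bit map_select.
- Byte 2 holds W, the 4-bit vvvv field, L and pp.

The #UD condition "VEX.vvvv != 1111b" listed for this instruction suggests that the 1111 shown in the W.vvvv.L.pp column is the stored field value. The sketch therefore takes vvvv as already stored.

```c
#include <stdint.h>
#include <stddef.h>

/* Encode C4-form VEX prefix + opcode + register-direct ModRM (mod = 11b).
 * reg -> ModRM.reg (destination), rm -> ModRM.r/m (source), both 0..15. */
static size_t vex3_encode_rr(uint8_t out[5], unsigned map_select, unsigned w,
                             unsigned vvvv_stored, unsigned l, unsigned pp,
                             uint8_t opcode, unsigned reg, unsigned rm)
{
    unsigned r = (reg >> 3) & 1, b = (rm >> 3) & 1;   /* high bits of register numbers */
    out[0] = 0xC4;
    out[1] = (uint8_t)(((~r & 1u) << 7)    /* R, stored inverted */
                       | (1u << 6)         /* X inverted: no index register */
                       | ((~b & 1u) << 5)  /* B, stored inverted */
                       | (map_select & 0x1F));
    out[2] = (uint8_t)(((w & 1u) << 7) | ((vvvv_stored & 0xFu) << 3)
                       | ((l & 1u) << 2) | (pp & 3u));
    out[3] = opcode;
    out[4] = (uint8_t)(0xC0 | ((reg & 7u) << 3) | (rm & 7u));
    return 5;
}
```

With map_select = 02h, W = 0, vvvv = 1111b, L = 1, pp = 01b and opcode 58h, the form `VPBROADCASTD ymm1, xmm2` encodes as C4 E2 7D 58 CA.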

Related Instructions
VPBROADCASTB, VPBROADCASTQ, VPBROADCASTW
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.


A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


VPBROADCASTQ

Broadcast Packed Quadword

Loads a quadword from a register or memory and writes it to all 2 or 4 quadwords of an XMM or
YMM register.
This instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Copies the source operand to both quadwords of the destination.
The source operand is the least-significant 64 bits of an XMM register or a 64-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Copies the source operand to all 4 quadwords of the destination.
The source operand is the least-significant 64 bits of an XMM register or a 64-bit memory location.
The destination is a YMM register.
Instruction Support
Form            Subset   Feature Flag
VPBROADCASTQ    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBROADCASTQ xmm1, xmm2/mem64    C4    RXB.02           0.1111.0.01   59 /r
VPBROADCASTQ ymm1, xmm2/mem64    C4    RXB.02           0.1111.1.01   59 /r

Related Instructions
VPBROADCASTB, VPBROADCASTD, VPBROADCASTW
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.


A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


VPBROADCASTW

Broadcast Packed Word

Loads a word from a register or memory and writes it to all 8 or 16 words of an XMM or YMM register.
This instruction has both 128-bit and 256-bit encodings:
XMM Encoding

Copies the source operand to all 8 words of the destination.
The source operand is the least-significant 16 bits of an XMM register or a 16-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

Copies the source operand to all 16 words of the destination.
The source operand is the least-significant 16 bits of an XMM register or a 16-bit memory location.
The destination is a YMM register.
Instruction Support
Form            Subset   Feature Flag
VPBROADCASTW    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBROADCASTW xmm1, xmm2/mem16    C4    RXB.02           0.1111.0.01   79 /r
VPBROADCASTW ymm1, xmm2/mem16    C4    RXB.02           0.1111.1.01   79 /r

Related Instructions
VPBROADCASTB, VPBROADCASTD, VPBROADCASTQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.


A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.


VPCMOV

Vector Conditional Move

Moves bits of either the first source or the second source to the corresponding positions in the destination, depending on the value of the corresponding bit of a third source.
When a bit of the third source = 1, the corresponding bit of the first source is moved to the destination; when a bit of the third source = 0, the corresponding bit of the second source is moved to the
destination.
This instruction directly implements the C-language ternary “?” operation on each source bit.
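Applied bitwise, the ternary operation becomes the familiar select idiom: take src1 where the selector bit is 1 and src2 where it is 0. The sketch below models one 64-bit chunk. `pcmov64` is a hypothetical name, and a full operation applies it to two chunks (XMM form) or four chunks (YMM form).

```c
#include <stdint.h>

/* Per-bit model of VPCMOV on one 64-bit chunk:
 * result bit = src3 bit ? src1 bit : src2 bit. */
static uint64_t pcmov64(uint64_t src1, uint64_t src2, uint64_t src3)
{
    return (src1 & src3) | (src2 & ~src3);
}
```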
Arbitrary bit-granular predicates can be constructed by any number of methods, or loaded as constants from memory. This instruction may use the results of any SSE instruction as the predicate in the selector. VPCMPEQB (VPCMPGTB), VPCMPEQW (VPCMPGTW), VPCMPEQD (VPCMPGTD), and VPCMPEQQ (VPCMPGTQ) compare byte, word, doubleword, and quadword integers, respectively, and set each element of the destination to a mask of all 1s or all 0s accordingly.
VCMPPS (VCMPSS) and VCMPPD (VCMPSD) compare single-precision and double-precision floating-point
source values, respectively, and provide the predicate for floating-point instructions.
There are four operands: VPCMOV dest, src1, src2, src3.
The first source (src1) is an XMM or YMM register specified by XOP.vvvv.
XOP.W and bits [7:4] of an immediate byte (imm8) configure src2 and src3:
• When XOP.W = 0, src2 is either a register or a memory location specified by ModRM.r/m and src3
is a register specified by imm8[7:4].
• When XOP.W = 1, src2 is a register specified by imm8[7:4] and src3 is either a register or a
memory location specified by ModRM.r/m.
The destination (dest) is either an XMM or a YMM register, as determined by XOP.L. When the destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.
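The XOP.W rule above only swaps which of src2 and src3 may come from memory. A decoder can capture it in a few lines. `pcmov_operands` and `pcmov_decode` are hypothetical names. The sketch assumes imm8[7:4] is used directly as a register number in the range 0 to 15.

```c
#include <stdint.h>

/* Hypothetical decoded view of the VPCMOV source operands. */
typedef struct {
    int src2_from_rm;   /* 1: src2 is ModRM.r/m (register or memory) */
    int src3_from_rm;   /* 1: src3 is ModRM.r/m (register or memory) */
    unsigned imm_reg;   /* register number taken from imm8[7:4] */
} pcmov_operands;

static pcmov_operands pcmov_decode(unsigned xop_w, uint8_t imm8)
{
    pcmov_operands ops;
    ops.src3_from_rm = (int)(xop_w & 1u);   /* XOP.W = 1: src3 = ModRM.r/m, src2 = imm8[7:4] */
    ops.src2_from_rm = !ops.src3_from_rm;   /* XOP.W = 0: src2 = ModRM.r/m, src3 = imm8[7:4] */
    ops.imm_reg = (imm8 >> 4) & 0xFu;
    return ops;
}
```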
Instruction Support
Form      Subset   Feature Flag
VPCMOV    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                 XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCMOV xmm1, xmm2, xmm3/mem128, xmm4     8F    RXB.08           0.src1.0.00   A2 /r ib
VPCMOV ymm1, ymm2, ymm3/mem256, ymm4     8F    RXB.08           0.src1.1.00   A2 /r ib
VPCMOV xmm1, xmm2, xmm3, xmm4/mem128     8F    RXB.08           1.src1.0.00   A2 /r ib
VPCMOV ymm1, ymm2, ymm3, ymm4/mem256     8F    RXB.08           1.src1.1.00   A2 /r ib

Related Instructions
VPCOMUB, VPCOMUD, VPCOMUQ, VPCOMUW, VCMPPD, VCMPPS


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — XOP exception


X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
XOP instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding XOP prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.


VPCOMB

Compare Vector
Signed Bytes

Compares corresponding packed signed bytes in the first and second sources and writes the result of
each comparison to the corresponding byte of the destination. The result of each comparison is an 8-bit value of all 1s (TRUE) or all 0s (FALSE).
There are four operands: VPCOMB dest, src1, src2, imm8
The destination (dest) is an XMM register specified by ModRM.reg. When the comparison results
are written to the destination XMM register, bits [255:128] of the corresponding YMM register are
cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.
The comparison type is specified by bits [2:0] of the immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.
imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTB
001         Less Than or Equal      VPCOMLEB
010         Greater Than            VPCOMGTB
011         Greater Than or Equal   VPCOMGEB
100         Equal                   VPCOMEQB
101         Not Equal               VPCOMNEQB
110         False                   VPCOMFALSEB
111         True                    VPCOMTRUEB
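The table maps directly onto a switch over imm8[2:0]. The scalar C sketch below models VPCOMB on the 16 bytes of an XMM operand. `vpcom_pred` and `pcomb` are hypothetical names. Each TRUE result becomes FFh and each FALSE result 00h, and imm8 bits other than [2:0] are ignored.

```c
#include <stdint.h>

/* Evaluate the VPCOM predicate selected by imm8[2:0]; returns 1 (TRUE) or 0 (FALSE). */
static int vpcom_pred(int a, int b, unsigned imm8)
{
    switch (imm8 & 7u) {
    case 0:  return a <  b;   /* LT */
    case 1:  return a <= b;   /* LE */
    case 2:  return a >  b;   /* GT */
    case 3:  return a >= b;   /* GE */
    case 4:  return a == b;   /* EQ */
    case 5:  return a != b;   /* NEQ */
    case 6:  return 0;        /* FALSE */
    default: return 1;        /* TRUE */
    }
}

/* Model of VPCOMB: signed byte compare, all 1s or all 0s per destination byte. */
static void pcomb(uint8_t dst[16], const int8_t src1[16], const int8_t src2[16], unsigned imm8)
{
    for (int i = 0; i < 16; i++)
        dst[i] = vpcom_pred(src1[i], src2[i], imm8) ? 0xFF : 0x00;
}
```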

Instruction Support
Form      Subset   Feature Flag
VPCOMB    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                               XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMB xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   CC /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMW, VPCOMD, VPCOMQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — XOP exception


X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
XOP instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding XOP prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.


VPCOMD

Compare Vector
Signed Doublewords

Compares corresponding packed signed doublewords in the first and second sources and writes the
result of each comparison to the corresponding doubleword of the destination. The result of each
comparison is a 32-bit value of all 1s (TRUE) or all 0s (FALSE).
There are four operands: VPCOMD dest, src1, src2, imm8
The destination (dest) is an XMM register specified by ModRM.reg. When the results of the comparisons are written to the destination XMM register, bits [255:128] of the corresponding YMM register
are cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.
The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.
imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTD
001         Less Than or Equal      VPCOMLED
010         Greater Than            VPCOMGTD
011         Greater Than or Equal   VPCOMGED
100         Equal                   VPCOMEQD
101         Not Equal               VPCOMNEQD
110         False                   VPCOMFALSED
111         True                    VPCOMTRUED
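The alias mnemonics in these tables follow one pattern: "VPCOM", then a predicate name chosen by imm8[2:0], then an element suffix (B, W, D or Q, with a U prefix for the unsigned forms). An assembler or disassembler could generate them as in the sketch below. `vpcom_alias` is a hypothetical helper, and the pattern is inferred from the tables, not taken from a stated rule.

```c
#include <stdio.h>

/* Build the alias mnemonic for a VPCOM* comparison, e.g. imm8 = 0, suffix "D"
 * -> "VPCOMLTD". Returns the snprintf result (length of the full name). */
static int vpcom_alias(char *buf, size_t len, unsigned imm8, const char *suffix)
{
    static const char *const pred[8] = {
        "LT", "LE", "GT", "GE", "EQ", "NEQ", "FALSE", "TRUE"
    };
    return snprintf(buf, len, "VPCOM%s%s", pred[imm8 & 7u], suffix);
}
```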

Instruction Support
Form      Subset   Feature Flag
VPCOMD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                               XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMD xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   CE /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — XOP exception


X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
XOP instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding XOP prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.


VPCOMQ

Compare Vector
Signed Quadwords

Compares corresponding packed signed quadwords in the first and second sources and writes the
result of each comparison to the corresponding quadword of the destination. The result of each comparison is a 64-bit value of all 1s (TRUE) or all 0s (FALSE).
There are four operands: VPCOMQ dest, src1, src2, imm8
The destination (dest) is an XMM register specified by ModRM.reg. When the result is written to the
destination XMM register, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.
The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.
imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTQ
001         Less Than or Equal      VPCOMLEQ
010         Greater Than            VPCOMGTQ
011         Greater Than or Equal   VPCOMGEQ
100         Equal                   VPCOMEQQ
101         Not Equal               VPCOMNEQQ
110         False                   VPCOMFALSEQ
111         True                    VPCOMTRUEQ

Instruction Support
Form      Subset   Feature Flag
VPCOMQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                               XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMQ xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   CF /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — XOP exception


X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
XOP instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding XOP prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.


VPCOMUB

Compare Vector
Unsigned Bytes

Compares corresponding packed unsigned bytes in the first and second sources and writes the result
of each comparison to the corresponding byte of the destination. The result of each comparison is an
8-bit value of all 1s (TRUE) or all 0s (FALSE).
There are four operands: VPCOMUB dest, src1, src2, imm8
The destination (dest) is an XMM register specified by ModRM.reg. When the result is written to the
destination XMM register, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.
The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.
imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTUB
001         Less Than or Equal      VPCOMLEUB
010         Greater Than            VPCOMGTUB
011         Greater Than or Equal   VPCOMGEUB
100         Equal                   VPCOMEQUB
101         Not Equal               VPCOMNEQUB
110         False                   VPCOMFALSEUB
111         True                    VPCOMTRUEUB
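The unsigned forms use the same predicate encoding as the signed forms. Only the interpretation of each element changes. For example, the byte 80h is 128 here but -128 under VPCOMB, so "80h less than 01h" is FALSE for VPCOMUB and TRUE for VPCOMB. The sketch below is a scalar model of VPCOMUB. `pcomub` is a hypothetical name.

```c
#include <stdint.h>

/* Model of VPCOMUB on the 16 bytes of an XMM operand: bytes compare as 0..255. */
static void pcomub(uint8_t dst[16], const uint8_t src1[16], const uint8_t src2[16], unsigned imm8)
{
    for (int i = 0; i < 16; i++) {
        unsigned a = src1[i], b = src2[i];
        int r;
        switch (imm8 & 7u) {
        case 0:  r = a <  b; break;   /* LT */
        case 1:  r = a <= b; break;   /* LE */
        case 2:  r = a >  b; break;   /* GT */
        case 3:  r = a >= b; break;   /* GE */
        case 4:  r = a == b; break;   /* EQ */
        case 5:  r = a != b; break;   /* NEQ */
        case 6:  r = 0;      break;   /* FALSE */
        default: r = 1;      break;   /* TRUE */
        }
        dst[i] = r ? 0xFF : 0x00;     /* all 1s for TRUE, all 0s for FALSE */
    }
}
```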

Instruction Support
Form       Subset   Feature Flag
VPCOMUB    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMUB xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   EC /r ib

Related Instructions
VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD, VPCOMQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — XOP exception


X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
XOP instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding XOP prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.


VPCOMUD

Compare Vector
Unsigned Doublewords

Compares corresponding packed unsigned doublewords in the first and second sources and writes the
result of each comparison to the corresponding doubleword of the destination. The result of each
comparison is a 32-bit value of all 1s (TRUE) or all 0s (FALSE).
There are four operands: VPCOMUD dest, src1, src2, imm8
The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to
the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.
The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.
imm8[2:0]   Comparison              Mnemonic
000         Less Than               VPCOMLTUD
001         Less Than or Equal      VPCOMLEUD
010         Greater Than            VPCOMGTUD
011         Greater Than or Equal   VPCOMGEUD
100         Equal                   VPCOMEQUD
101         Not Equal               VPCOMNEQUD
110         False                   VPCOMFALSEUD
111         True                    VPCOMTRUEUD

Instruction Support
Form       Subset   Feature Flag
VPCOMUD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCOMUD xmm1, xmm2, xmm3/mem128, imm8   8F    RXB.08           0.src1.0.00   EE /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD, VPCOMQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — XOP exception


X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
XOP instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] ! = 11b.
REX, F2, F3, or 66 prefix preceding XOP prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.


VPCOMUQ

Compare Vector
Unsigned Quadwords

Compares corresponding packed unsigned quadwords in the first and second sources and writes the
result of each comparison to the corresponding quadword of the destination. The result of each comparison is a 64-bit value of all 1s (TRUE) or all 0s (FALSE).
There are four operands: VPCOMUQ dest, src1, src2, imm8
The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to
the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.
The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.
imm8[2:0]  Comparison             Mnemonic
000        Less Than              VPCOMLTUQ
001        Less Than or Equal     VPCOMLEUQ
010        Greater Than           VPCOMGTUQ
011        Greater Than or Equal  VPCOMGEUQ
100        Equal                  VPCOMEQUQ
101        Not Equal              VPCOMNEQUQ
110        False                  VPCOMFALSEUQ
111        True                   VPCOMTRUEUQ

Instruction Support
Form      Subset  Feature Flag
VPCOMUQ   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                          Encoding
Mnemonic                                  XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMUQ xmm1, xmm2, xmm3/mem128, imm8     8F   RXB.08          0.src1.0.00  EF /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMB, VPCOMW, VPCOMD, VPCOMQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception


VPCOMUW

Compare Vector
Unsigned Words

Compares corresponding packed unsigned words in the first and second sources and writes the result
of each comparison to the corresponding word of the destination. The result of each comparison is a
16-bit value of all 1s (TRUE) or all 0s (FALSE).
There are four operands: VPCOMUW dest, src1, src2, imm8
The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to
the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.
The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.
imm8[2:0]  Comparison             Mnemonic
000        Less Than              VPCOMLTUW
001        Less Than or Equal     VPCOMLEUW
010        Greater Than           VPCOMGTUW
011        Greater Than or Equal  VPCOMGEUW
100        Equal                  VPCOMEQUW
101        Not Equal              VPCOMNEQUW
110        False                  VPCOMFALSEUW
111        True                   VPCOMTRUEUW

Instruction Support
Form      Subset  Feature Flag
VPCOMUW   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                          Encoding
Mnemonic                                  XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMUW xmm1, xmm2, xmm3/mem128, imm8     8F   RXB.08          0.src1.0.00  ED /r ib

Related Instructions
VPCOMUB, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD, VPCOMQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception


VPCOMW

Compare Vector
Signed Words

Compares corresponding packed signed words in the first and second sources and writes the result of
each comparison to the corresponding word of the destination. The result of each comparison is a 16-bit value of all 1s (TRUE) or all 0s (FALSE).
There are four operands: VPCOMW dest, src1, src2, imm8
The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to
the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source
(src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.
The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has
an alias mnemonic to facilitate coding.
imm8[2:0]  Comparison             Mnemonic
000        Less Than              VPCOMLTW
001        Less Than or Equal     VPCOMLEW
010        Greater Than           VPCOMGTW
011        Greater Than or Equal  VPCOMGEW
100        Equal                  VPCOMEQW
101        Not Equal              VPCOMNEQW
110        False                  VPCOMFALSEW
111        True                   VPCOMTRUEW

Instruction Support
Form      Subset  Feature Flag
VPCOMW    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                          Encoding
Mnemonic                                  XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMW xmm1, xmm2, xmm3/mem128, imm8      8F   RXB.08          0.src1.0.00  CD /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMD, VPCOMQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception


VPERM2F128

Permute Floating-Point
128-bit

Copies 128 bits of floating-point data to each octword of a 256-bit destination. Each destination
octword receives either a selected octword from one of the two 256-bit source operands or zero, as
specified by an immediate byte operand.
The immediate operand is encoded as follows.
Destination  Immediate-Byte  Value of   Source 1     Source 2
             Bit Field       Bit Field  Bits Copied  Bits Copied
[127:0]      [1:0]           00         [127:0]      -
                             01         [255:128]    -
                             10         -            [127:0]
                             11         -            [255:128]
             Setting imm8[3] clears bits [127:0] of the destination; imm8[2] is ignored.
[255:128]    [5:4]           00         [127:0]      -
                             01         [255:128]    -
                             10         -            [127:0]
                             11         -            [255:128]
             Setting imm8[7] clears bits [255:128] of the destination; imm8[6] is ignored.

This is a 256-bit extended-form instruction:
The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
Instruction Support
Form        Subset  Feature Flag
VPERM2F128  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                            Encoding
Mnemonic                                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPERM2F128 ymm1, ymm2, ymm3/mem256, imm8    C4   RXB.03          0.src1.1.01  06 /r ib

Related Instructions
VEXTRACTF128, VINSERTF128, VPERMILPD, VPERMILPS
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    A     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.W = 1.
                                       A     VEX.L = 0.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                       A     Lock prefix (F0h) preceding opcode.
Device not available, #NM              A     CR0.TS = 1.
Stack, #SS                             A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                A     Memory address exceeding data segment limit or non-canonical.
                                       A     Null data segment used to reference memory.
Page fault, #PF                        A     Instruction execution caused a page fault.
Alignment check, #AC                   A     Memory operand not 16-byte aligned when alignment checking enabled.
A = AVX exception.


VPERM2I128

Permute Integer
128-bit

Copies 128 bits of integer data to each octword of a 256-bit destination. Each destination octword
receives either a selected octword from one of the two 256-bit source operands or zero, as specified
by an immediate byte operand.
The immediate operand is encoded as follows.
Destination  Immediate-Byte  Value of   Source 1     Source 2
             Bit Field       Bit Field  Bits Copied  Bits Copied
[127:0]      [1:0]           00         [127:0]      -
                             01         [255:128]    -
                             10         -            [127:0]
                             11         -            [255:128]
             Setting imm8[3] clears bits [127:0] of the destination; imm8[2] is ignored.
[255:128]    [5:4]           00         [127:0]      -
                             01         [255:128]    -
                             10         -            [127:0]
                             11         -            [255:128]
             Setting imm8[7] clears bits [255:128] of the destination; imm8[6] is ignored.

This is a 256-bit extended-form instruction:
The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register. Bits 2 and 6 of the immediate
byte are ignored.
Instruction Support
Form        Subset  Feature Flag
VPERM2I128  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                            Encoding
Mnemonic                                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPERM2I128 ymm1, ymm2, ymm3/mem256, imm8    C4   RXB.03          0.src1.1.01  46 /r ib

Related Instructions
VEXTRACTI128, VEXTRACTF128, VINSERTI128, VINSERTF128, VPERMILPD, VPERMILPS
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    A     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.W = 1.
                                       A     VEX.L = 0.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                       A     Lock prefix (F0h) preceding opcode.
Device not available, #NM              A     CR0.TS = 1.
Stack, #SS                             A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                A     Memory address exceeding data segment limit or non-canonical.
                                       A     Null data segment used to reference memory.
Page fault, #PF                        A     Instruction execution caused a page fault.
Alignment check, #AC                   A     Memory operand not 16-byte aligned when alignment checking enabled.
A = AVX exception.


VPERMD


Packed Permute Doubleword

Copies selected doublewords from a 256-bit value located either in memory or in a YMM register to
specific doublewords of the destination YMM register. For each destination doubleword, the source
doubleword to copy is specified by a selector field in the corresponding doubleword of the first source
YMM register.
There is a single form of this instruction:
VPERMD dest, src1, src2

The first source operand provides eight 3-bit selectors, each selector occupying the least-significant
bits of a doubleword. Each selector specifies the index of the doubleword of the second source operand to be copied to the destination. The doubleword in the destination that each selector controls is
based on its position within the first source operand.
The same index value may appear in multiple selectors, in which case the same source doubleword is
copied to multiple destination doublewords.
There is no 128-bit form of this instruction.
YMM Encoding

The destination is a YMM register. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location.
Instruction Support
Form      Subset  Feature Flag
VPERMD    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

Instruction Encoding
                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPERMD ymm1, ymm2, ymm3/mem256    C4   RXB.02          0.src1.1.01  36 /r

Related Instructions
VPERMQ, VPERMPD, VPERMPS
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR0.EM = 1.
                            CR4.OSFXSR = 0.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.L = 0.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF             Instruction execution caused a page fault.
All exceptions listed are AVX2 exceptions.


VPERMIL2PD

Permute Two-Source
Double-Precision Floating-Point

Copies a selected quadword from one of two source operands to a selected quadword of the destination or clears the selected quadword of the destination. Values in a third source operand and an immediate two-bit operand control the operation.
There are 128-bit and 256-bit versions of this instruction. Both versions have five operands:
VPERMIL2PD dest, src1, src2, src3, m2z.

The first four operands are either 128 bits or 256 bits wide, as determined by VEX.L. When the destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.
The third source operand is a selector that specifies how quadwords are copied or cleared in the destination. The selector contains one selector element for each quadword of the destination register.
Selector for 128-bit Instruction Form
127               64 63                0
|        S1         |        S0         |

The selector for the 128-bit instruction form is an octword composed of two quadword selector elements S0 and S1. S0 (the lower quadword) controls the value written to destination quadword 0 (bits
[63:0]) and S1 (the upper quadword) controls the destination quadword 1 (bits [127:64]).
Selector for 256-bit Instruction Form
255              192 191              128
|        S3         |        S2         |
127               64 63                0
|        S1         |        S0         |

The selector for the 256-bit instruction form is a double octword and adds two more selector elements
S2 and S3. S0 controls the value written to the destination quadword 0 (bits [63:0]), S1 controls the
destination quadword 1 (bits [127:64]), S2 controls the destination quadword 2 (bits [191:128]), and
S3 controls the destination quadword 3 (bits [255:192]).
The layout of each selector element is as follows:
63                         4    3    2    1    0
|      Reserved, IGN        |  M |   Sel   |  - |

Bits     Mnemonic  Description
[63:4]   -         Reserved, IGN
[3]      M         Match
[2:1]    Sel       Select
[0]      -         Reserved, IGN

The fields are defined as follows:
• Sel — Select. Selects the source quadword to copy into the corresponding quadword of the
destination:

Sel Value  Source Selected for Destination   Source Selected for Destination
           Quadwords 0 and 1 (both forms)    Quadwords 2 and 3 (256-bit form)
00b        src1[63:0]                        src1[191:128]
01b        src1[127:64]                      src1[255:192]
10b        src2[63:0]                        src2[191:128]
11b        src2[127:64]                      src2[255:192]

• M — Match bit. The combination of the Match bit in each selector element and the value of the
M2Z field determines if the Select field is overridden. This is described below.

m2z immediate operand

The fifth operand is m2z. The assembler uses this 2-bit value to encode the M2Z field in the instruction. M2Z occupies bits [1:0] of an immediate byte. Bits [7:4] of the same byte are used to select one
of 16 YMM/XMM registers. This dual use of the immediate byte is indicated in the instruction synopsis by the symbol “is5”.
The immediate byte is defined as follows.
7          4   3     2   1     0
|   SRS    |  Reserved |  M2Z  |

Bits    Mnemonic  Description
[7:4]   SRS       Source Register Select
[3:2]   -         Reserved, IGN
[1:0]   M2Z       Match to Zero

Fields are defined as follows:
• SRS — Source Register Select. As with many other extended instructions, bits in the immediate
byte are used to select a source operand register. This field is set by the assembler based on the
operands listed in the instruction. See discussion in “src2 and src3 Operand Addressing” below.
• M2Z — Match to Zero. This field, combined with the M bit of the selector element, controls the
function of the Sel field as follows:
M2Z Field  Selector M Bit  Value Loaded into Destination Quadword
0Xb        X               Source quadword selected by selector element Sel field.
10b        0               Source quadword selected by selector element Sel field.
10b        1               Zero
11b        0               Zero
11b        1               Source quadword selected by selector element Sel field.

src2 and src3 Operand Addressing

In 64-bit mode, VEX.W and bits [7:4] of the immediate byte specify src2 and src3:
• When VEX.W = 0, src2 is either a register or a memory location specified by ModRM.r/m and
src3 is a register specified by bits [7:4] of the immediate byte.
• When VEX.W = 1, src2 is a register specified by bits [7:4] of the immediate byte and src3 is either
a register or a memory location specified by ModRM.r/m.
In non-64-bit mode, bit 7 is ignored.
Instruction Support
Form        Subset  Feature Flag
VPERMIL2PD  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                                 Encoding
Mnemonic                                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPERMIL2PD xmm1, xmm2, xmm3/mem128, xmm4, m2z    C4   RXB.03          0.src1.0.01  49 /r is5
VPERMIL2PD xmm1, xmm2, xmm3, xmm4/mem128, m2z    C4   RXB.03          1.src1.0.01  49 /r is5
VPERMIL2PD ymm1, ymm2, ymm3/mem256, ymm4, m2z    C4   RXB.03          0.src1.1.01  49 /r is5
VPERMIL2PD ymm1, ymm2, ymm3, ymm4/mem256, m2z    C4   RXB.03          1.src1.1.01  49 /r is5

NOTE: VPERMIL2PD is encoded using the VEX prefix even though it is an XOP instruction.
Related Instructions
VPERM2F128, VPERMIL2PS, VPERMILPD, VPERMILPS, VPPERM
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception


VPERMIL2PS

Permute Two-Source
Single-Precision Floating-Point

Copies a selected doubleword from one of two source operands to a selected doubleword of the destination or clears the selected doubleword of the destination. Values in a third source operand and an
immediate two-bit operand control the operation.
There are 128-bit and 256-bit versions of this instruction. Both versions have five operands:
VPERMIL2PS dest, src1, src2, src3, m2z

The first four operands are either 128 bits or 256 bits wide, as determined by VEX.L. When the destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.
The third source operand is a selector that specifies how doublewords are copied or cleared in the destination. The selector contains one selector element for each doubleword of the destination register.
Selector for 128-bit Instruction Form
127      96 95      64 63      32 31       0
|    S3    |    S2    |    S1    |    S0    |

The selector for the 128-bit instruction form is an octword containing four selector elements S0–S3.
S0 controls the value written to the destination doubleword 0 (bits [31:0]), S1 controls the destination
doubleword 1 (bits [63:32]), S2 controls the destination doubleword 2 (bits [95:64]), and S3 controls
the destination doubleword 3 (bits [127:96]).
Selector for 256-bit Instruction Form
255     224 223     192 191     160 159     128
|    S7    |    S6    |    S5    |    S4    |
127      96 95      64 63      32 31       0
|    S3    |    S2    |    S1    |    S0    |

The selector for the 256-bit instruction form is a double octword and adds four more selector elements S4–S7. S4 controls the value written to the destination doubleword 4 (bits [159:128]), S5 controls the destination doubleword 5 (bits [191:160]), S6 controls the destination doubleword 6 (bits
[223:192]), and S7 controls the destination doubleword 7 (bits [255:224]).
The layout of each selector element is as follows.
31                         4    3    2       0
|      Reserved, IGN        |  M |    Sel    |

Bits     Mnemonic  Description
[31:4]   -         Reserved, IGN
[3]      M         Match
[2:0]    Sel       Select

The fields are defined as follows:
• Sel — Select. Selects the source doubleword to copy into the corresponding doubleword of the
destination:

Sel Value  Source Selected for Destination          Source Selected for Destination
           Doublewords 0, 1, 2 and 3 (both forms)   Doublewords 4, 5, 6 and 7 (256-bit form)
000b       src1[31:0]                               src1[159:128]
001b       src1[63:32]                              src1[191:160]
010b       src1[95:64]                              src1[223:192]
011b       src1[127:96]                             src1[255:224]
100b       src2[31:0]                               src2[159:128]
101b       src2[63:32]                              src2[191:160]
110b       src2[95:64]                              src2[223:192]
111b       src2[127:96]                             src2[255:224]

• M — Match. The combination of the M bit in each selector element and the value of the M2Z field
determines if the Sel field is overridden. This is described below.

m2z immediate operand

The fifth operand is m2z. The assembler uses this 2-bit value to encode the M2Z field in the instruction. M2Z occupies bits [1:0] of an immediate byte. Bits [7:4] of the same byte are used to select one
of 16 YMM/XMM registers. This dual use of the immediate byte is indicated in the instruction synopsis by the symbol “is5”.
The immediate byte is defined as follows.
7          4   3     2   1     0
|   SRS    |  Reserved |  M2Z  |

Bits    Mnemonic  Description
[7:4]   SRS       Source Register Select
[3:2]   -         Reserved, IGN
[1:0]   M2Z       Match to Zero

Fields are defined as follows:
• SRS — Source Register Select. As with many other extended instructions, bits in the immediate
byte are used to select a source operand register. This field is set by the assembler based on the
operands listed in the instruction. See discussion in “src2 and src3 Operand Addressing” below.
• M2Z — Match to Zero. This field, combined with the M bit of the selector element, controls the
function of the Sel field as follows:

M2Z Field  Selector M Bit  Value Loaded into Destination Doubleword
0Xb        X               Source doubleword selected by Sel field.
10b        0               Source doubleword selected by Sel field.
10b        1               Zero
11b        0               Zero
11b        1               Source doubleword selected by Sel field.

src2 and src3 Operand Addressing

In 64-bit mode, VEX.W and bits [7:4] of the immediate byte specify src2 and src3:
• When VEX.W = 0, src2 is either a register or a memory location specified by ModRM.r/m and
src3 is a register specified by bits [7:4] of the immediate byte.
• When VEX.W = 1, src2 is a register specified by bits [7:4] of the immediate byte and src3 is either
a register or a memory location specified by ModRM.r/m.
In non-64-bit mode, bit 7 is ignored.
Instruction Support
Form        Subset  Feature Flag
VPERMIL2PS  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                                 Encoding
Mnemonic                                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPERMIL2PS xmm1, xmm2, xmm3/mem128, xmm4, m2z    C4   RXB.03          0.src1.0.01  48 /r is5
VPERMIL2PS xmm1, xmm2, xmm3, xmm4/mem128, m2z    C4   RXB.03          1.src1.0.01  48 /r is5
VPERMIL2PS ymm1, ymm2, ymm3/mem256, ymm4, m2z    C4   RXB.03          0.src1.1.01  48 /r is5
VPERMIL2PS ymm1, ymm2, ymm3, ymm4/mem256, m2z    C4   RXB.03          1.src1.1.01  48 /r is5

NOTE: VPERMIL2PS is encoded using the VEX prefix even though it is an XOP instruction.
Related Instructions
VPERM2F128, VPERMIL2PD, VPERMILPD, VPERMILPS, VPPERM
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X           XOP instructions are only recognized in protected mode.
                                       X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                       X     Lock prefix (F0h) preceding opcode.
Device not available, #NM              X     CR0.TS = 1.
Stack, #SS                             X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                        X     Instruction execution caused a page fault.
Alignment check, #AC                   X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception


VPERMILPD

Permute
Double-Precision

Copies double-precision floating-point values from a source to a destination. Source and destination
can be selected in two ways. There are different encodings for each selection method.
Selection by bits in a source register or memory location:
Each quadword of the selector operand is defined as follows.
63                                 2    1    0
|             ignored               | Sel |   |
Only bit [1] (Sel) is used; bits [63:2] and bit [0] are ignored. Sel chooses which quadword within the
same 128-bit half of the source is copied to the corresponding destination quadword: 0 selects the
lower quadword of that half, and 1 selects the upper quadword.
Selection by bits in an immediate byte:
Each bit corresponds to a destination quadword. Only bits [3:2] and bits [1:0] are used; bits [7:4] are
ignored. Selections are defined as follows.
Destination  Immediate-Byte  Value of   Source 1
Quadword     Bit Field       Bit Field  Bits Copied
Used by 128-bit encoding and 256-bit encoding
[63:0]       [0]             0          [63:0]
                             1          [127:64]
[127:64]     [1]             0          [63:0]
                             1          [127:64]
Used only by 256-bit encoding
[191:128]    [2]             0          [191:128]
                             1          [255:192]
[255:192]    [3]             0          [191:128]
                             1          [255:192]

This extended-form instruction has both 128-bit and 256-bit encodings.
XMM Encoding

There are two encodings, one for each selection method:
• The first source operand is an XMM register. The second source operand is either an XMM
register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of
the YMM register that corresponds to the destination are cleared.
• The first source operand is either an XMM register or a 128-bit memory location. The destination
is an XMM register. There is a third, immediate byte operand. Bits [255:128] of the YMM register
that corresponds to the destination are cleared.
YMM Encoding

There are two encodings, one for each selection method:
• The first source operand is a YMM register. The second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
• The first source operand is either a YMM register or a 256-bit memory location. The destination is
a YMM register. There is a third, immediate byte operand.

Instruction Support
Form        Subset  Feature Flag
VPERMILPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                        Encoding
Mnemonic                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
Selection by source register or memory:
VPERMILPD xmm1, xmm2, xmm3/mem128       C4   RXB.02          0.src1.0.01  0D /r
VPERMILPD ymm1, ymm2, ymm3/mem256       C4   RXB.02          0.src1.1.01  0D /r
Selection by immediate byte operand:
VPERMILPD xmm1, xmm2/mem128, imm8       C4   RXB.03          0.1111.0.01  05 /r ib
VPERMILPD ymm1, ymm2/mem256, imm8       C4   RXB.03          0.1111.1.01  05 /r ib

Related Instructions
VPERM2F128, VPERMIL2PD, VPERMIL2PS, VPERMILPS, VPPERM
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                           Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                    A     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.W = 1.
                                       A     VEX.vvvv != 1111b (for versions with immediate byte operand only).
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                       A     Lock prefix (F0h) preceding opcode.
Device not available, #NM              A     CR0.TS = 1.
Stack, #SS                             A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                A     Memory address exceeding data segment limit or non-canonical.
                                       A     Null data segment used to reference memory.
Page fault, #PF                        A     Instruction execution caused a page fault.
Alignment check, #AC                   A     Unaligned memory reference when alignment checking enabled.
A = AVX exception.

VPERMILPS

Permute
Single-Precision

Copies single-precision floating-point values from a source to a destination. Source and destination
can be selected in two ways. There are different encodings for each selection method.
Selection by bit fields in a source register or memory location:
Each doubleword of the selector operand holds one selector field (Sel) in bits [1:0]; bits [31:2] are
ignored. Each selector field corresponds to a destination doubleword, and its value selects a source
doubleword. The 128-bit encoding uses four two-bit fields; the 256-bit encoding uses eight two-bit
fields. Field encoding is as follows.

Destination   Selector     Source Bits Copied, by Value of Bit Field
Doubleword    Bit Field    00          01          10          11
[31:0]        [1:0]        [31:0]      [63:32]     [95:64]     [127:96]
[63:32]       [33:32]      [31:0]      [63:32]     [95:64]     [127:96]
[95:64]       [65:64]      [31:0]      [63:32]     [95:64]     [127:96]
[127:96]      [97:96]      [31:0]      [63:32]     [95:64]     [127:96]
Upper 128 bits of 256-bit source and destination (used by the 256-bit encoding only):
[159:128]     [129:128]    [159:128]   [191:160]   [223:192]   [255:224]
[191:160]     [161:160]    [159:128]   [191:160]   [223:192]   [255:224]
[223:192]     [193:192]    [159:128]   [191:160]   [223:192]   [255:224]
[255:224]     [225:224]    [159:128]   [191:160]   [223:192]   [255:224]
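The selector-operand tables can be summarized as a scalar C model. This is a sketch, not a reference implementation. The function name vpermilps_var and the array representation of registers are hypothetical, and it assumes the selectors come from the second source operand (xmm3/mem128 or ymm3/mem256) while the data comes from the first. The lanes parameter is 1 for the 128-bit encoding and 2 for the 256-bit encoding. Clearing of YMM bits [255:128] by the 128-bit form is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of VPERMILPS with per-element selectors.
 * data[] holds the doublewords of the first source, sel[] those of the
 * selector source. Doublewords are copied as raw 32-bit patterns. */
static void vpermilps_var(uint32_t dst[8], const uint32_t data[8],
                          const uint32_t sel[8], int lanes)
{
    assert(lanes == 1 || lanes == 2);
    uint32_t tmp[8] = {0};                  /* lets dst alias data or sel */
    for (int lane = 0; lane < lanes; lane++) {
        int base = lane * 4;                /* first doubleword of this half */
        for (int i = 0; i < 4; i++) {
            uint32_t idx = sel[base + i] & 0x3u;  /* bits [1:0]; [31:2] ignored */
            tmp[base + i] = data[base + idx];     /* stays within the half      */
        }
    }
    for (int i = 0; i < 4 * lanes; i++)
        dst[i] = tmp[i];
}
```

Because each index is masked to two bits and offset by the base of its own 128-bit half, a selector can never fetch data from the other half.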

Selection by bit fields in an immediate byte:
Each bit field corresponds to a destination doubleword. For the 256-bit encoding, the fields specify
sources and destinations in both the upper and lower 128 bits of the register. Selections are defined as
follows.
Destination   Immediate    Source Bits Copied, by Value of Bit Field
Doubleword    Bit Field    00          01          10          11
[31:0]        [1:0]        [31:0]      [63:32]     [95:64]     [127:96]
[63:32]       [3:2]        [31:0]      [63:32]     [95:64]     [127:96]
[95:64]       [5:4]        [31:0]      [63:32]     [95:64]     [127:96]
[127:96]      [7:6]        [31:0]      [63:32]     [95:64]     [127:96]
Upper 128 bits of 256-bit source and destination (used by the 256-bit encoding only):
[159:128]     [1:0]        [159:128]   [191:160]   [223:192]   [255:224]
[191:160]     [3:2]        [159:128]   [191:160]   [223:192]   [255:224]
[223:192]     [5:4]        [159:128]   [191:160]   [223:192]   [255:224]
[255:224]     [7:6]        [159:128]   [191:160]   [223:192]   [255:224]
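The immediate form differs only in where the selectors come from: one imm8 supplies four 2-bit fields, and the 256-bit encoding applies the same four fields to both 128-bit halves. The following hypothetical C model illustrates this. The name vpermilps_imm and the array representation of registers are illustrative only, and lanes is 1 for the 128-bit encoding and 2 for the 256-bit encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of VPERMILPS with an immediate selector.
 * Bits [2i+1:2i] of imm8 choose the source doubleword for destination
 * doubleword i of each 128-bit half. */
static void vpermilps_imm(uint32_t dst[8], const uint32_t src[8],
                          uint8_t imm8, int lanes)
{
    assert(lanes == 1 || lanes == 2);
    uint32_t tmp[8] = {0};                  /* lets dst alias src */
    for (int lane = 0; lane < lanes; lane++) {
        int base = lane * 4;
        for (int i = 0; i < 4; i++) {
            unsigned idx = (imm8 >> (2 * i)) & 0x3u;  /* same fields reused per half */
            tmp[base + i] = src[base + idx];
        }
    }
    for (int i = 0; i < 4 * lanes; i++)
        dst[i] = tmp[i];
}
```

For example, imm8 = 1Bh (fields 11b, 10b, 01b, 00b from bits [1:0] upward) reverses the four doublewords within each 128-bit half.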

This extended-form instruction has both 128-bit and 256-bit encodings:
XMM Encoding

There are two encodings, one for each selection method:
• The first source operand is an XMM register. The second source operand is either an XMM
register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of
the YMM register that corresponds to the destination are cleared.
• The first source operand is either an XMM register or a 128-bit memory location. The destination
is an XMM register. There is a third, immediate byte operand. Bits [255:128] of the YMM register
that corresponds to the destination are cleared.
YMM Encoding

There are two encodings, one for each selection method:
• The first source operand is a YMM register. The second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
• The first source operand is either a YMM register or a 256-bit memory location. The destination is
a YMM register. There is a third, immediate byte operand.
Instruction Support
Form         Subset   Feature Flag
VPERMILPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Selection by source register or memory:
VPERMILPS xmm1, xmm2, xmm3/mem128     C4    RXB.02           0.src1.0.01   0C /r
VPERMILPS ymm1, ymm2, ymm3/mem256     C4    RXB.02           0.src1.1.01   0C /r
Selection by immediate byte operand:
VPERMILPS xmm1, xmm2/mem128, imm8     C4    RXB.03           0.1111.0.01   04 /r ib
VPERMILPS ymm1, ymm2/mem256, imm8     C4    RXB.03           0.1111.1.01   04 /r ib
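The RXB.map_select and W.vvvv.L.pp columns map directly onto the second and third bytes of the C4 prefix. The sketch below assembles register-only forms. It assumes the standard three-byte VEX layout: byte 1 holds the inverted R, X and B bits above the 5-bit map_select field, byte 2 holds W, the inverted vvvv field, L and pp, and ModRM.mod = 11b selects register operands. encode_vex3_rr is a hypothetical helper, not an assembler API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for the three-byte VEX (C4) form with register
 * operands only. reg and rm are register numbers 0-15 (ModRM.reg/rm plus
 * VEX.R/VEX.B); vvvv is the register number for VEX.vvvv, or 0 when the
 * field is unused (stored as 1111b). Returns the number of bytes written. */
static size_t encode_vex3_rr(uint8_t out[6], unsigned map_select, unsigned w,
                             unsigned vvvv, unsigned l, unsigned pp,
                             uint8_t opcode, unsigned reg, unsigned rm,
                             int has_imm, uint8_t imm8)
{
    assert(reg < 16 && rm < 16 && vvvv < 16 && map_select < 32);
    unsigned r_inv = ((reg >> 3) & 1u) ^ 1u;  /* VEX.R is stored inverted */
    unsigned x_inv = 1u;                       /* no index register        */
    unsigned b_inv = ((rm >> 3) & 1u) ^ 1u;   /* VEX.B is stored inverted */
    size_t n = 0;

    out[n++] = 0xC4;
    out[n++] = (uint8_t)((r_inv << 7) | (x_inv << 6) | (b_inv << 5) | map_select);
    out[n++] = (uint8_t)(((w & 1u) << 7) | ((~vvvv & 0xFu) << 3) |
                         ((l & 1u) << 2) | (pp & 3u));   /* vvvv inverted */
    out[n++] = opcode;
    out[n++] = (uint8_t)(0xC0u | ((reg & 7u) << 3) | (rm & 7u)); /* mod = 11b */
    if (has_imm)
        out[n++] = imm8;
    return n;
}
```

With these conventions, VPERMILPS xmm1, xmm2, xmm3 (map_select 02h, pp = 01b, opcode 0Ch) encodes as C4 E2 69 0C CB, and VPERMILPS xmm1, xmm2, 1Bh (map_select 03h, vvvv unused, opcode 04h) as C4 E3 79 04 CA 1B.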

Related Instructions
VPERM2F128, VPERMIL2PD, VPERMIL2PS, VPERMILPD, VPPERM
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.W = 1.
                                        A     VEX.vvvv != 1111b (for versions with immediate byte operand only).
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Page fault, #PF                         A     Instruction execution caused a page fault.
Alignment check, #AC                    A     Unaligned memory reference when alignment checking enabled.
A: AVX exception.

VPERMPD

Packed Permute
Double-Precision Floating-Point

Copies selected quadwords from a 256-bit value located either in memory or a YMM register to specific quadwords of the destination. For each quadword of the destination, a 2-bit selector field in an immediate byte specifies which source quadword to copy.
There is a single form of this instruction:
VPERMPD dest, src, imm8

The selection of which quadword of the source operand to copy to each quadword of the destination
is specified by four 2-bit selector fields in the immediate byte. Bits [1:0] specify the index of the
quadword to be copied to the destination quadword 0. Bits [3:2] select the quadword to be copied to
quadword 1, bits [5:4] select the quadword to be copied to quadword 2, and bits [7:6] select the quadword to be copied to quadword 3.
The index value may be the same in multiple selectors. This results in multiple copies of the same
source quadword being copied to the destination.
There is no 128-bit form of this instruction.
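The following scalar C model shows the selection rule. It treats each quadword as a raw 64-bit pattern, consistent with no MXCSR flags being affected, and the function name vpermpd is hypothetical. Unlike the VPERMILPS selectors, each 2-bit field can pick any of the four source quadwords, so data can move between the 128-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of VPERMPD: destination quadword i receives
 * source quadword imm8[2i+1:2i]; quadwords are copied bit-for-bit. */
static void vpermpd(uint64_t dst[4], const uint64_t src[4], uint8_t imm8)
{
    uint64_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 0x3u]; /* any of the four quadwords */
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];                         /* safe when dst == src      */
}
```

For instance, imm8 = 4Eh swaps the two 128-bit halves, and imm8 = FFh copies source quadword 3 to every destination quadword.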
YMM Encoding

The destination is a YMM register. The source operand is a YMM register or a 256-bit memory location.
Instruction Support
Form        Subset   Feature Flag
VPERMPD     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMPD ymm1, ymm2/mem256, imm8       C4    RXB.03           1.1111.1.01   01 /r ib

Related Instructions
VPERMD, VPERMQ, VPERMPS
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
Exception

Mode
Real Virt Prot
A
A
A
A

A
A
A
A

A
A
A
A

A
A
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

A

A
A
A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 0.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

VPERMPS

Packed Permute
Single-Precision Floating-Point

Copies selected doublewords from a 256-bit value located either in memory or a YMM register to
specific doublewords of the destination YMM register. For each doubleword of the destination, selection of which doubleword to copy from the source is specified by a selector field in the corresponding
doubleword of a YMM register.
There is a single form of this instruction:
VPERMPS dest, src1, src2

The first source operand provides eight 3-bit selectors, each selector occupying the least-significant
bits of a doubleword. Each selector specifies the index of the doubleword of the second source operand to be copied to the destination. The doubleword in the destination that each selector controls is
based on its position within the first source operand.
The index value may be the same in multiple selectors. This results in multiple copies of the same
source doubleword being copied to the destination.
There is no 128-bit form of this instruction.
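The selector rule can be modeled in C as shown below. The name vpermps is hypothetical. idx[] stands for the first source (selectors) and data[] for the second source. Only bits [2:0] of each selector doubleword are used as the index, and the sketch assumes the remaining bits are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of VPERMPS: bits [2:0] of idx[i] choose which
 * of the eight data doublewords is copied to destination doubleword i. */
static void vpermps(uint32_t dst[8], const uint32_t idx[8], const uint32_t data[8])
{
    uint32_t tmp[8];
    for (int i = 0; i < 8; i++)
        tmp[i] = data[idx[i] & 0x7u];   /* 3-bit selector; crosses 128-bit halves */
    for (int i = 0; i < 8; i++)
        dst[i] = tmp[i];                /* safe when dst aliases an input        */
}
```

Because the selector has three bits, any source doubleword can reach any destination doubleword, and the same index may appear in several selectors.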
YMM Encoding

The destination is a YMM register. The first source operand is a YMM register and the second source
operand is either a YMM register or a 256-bit memory location.
Instruction Support
Form        Subset   Feature Flag
VPERMPS     AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMPS ymm1, ymm2, ymm3/mem256       C4    RXB.02           0.src1.1.01   16 /r

Related Instructions
VPERMD, VPERMQ, VPERMPD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
Exception
Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

Mode
Real Virt Prot
A
A
A
A

A
A
A
A

A
A
A
A

A
A
A
A

A

A
A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 0.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

VPERMQ

Packed Permute Quadword

Copies selected quadwords from a 256-bit value located either in memory or a YMM register to specific quadwords of the destination. For each quadword of the destination, a 2-bit selector field in an immediate byte specifies which source quadword to copy.
There is a single form of this instruction:
VPERMQ dest, src, imm8

The selection of which quadword of the source operand to copy to each quadword of the destination
is specified by four 2-bit selector fields in the immediate byte. Bits [1:0] specify the index of the
quadword to be copied to the destination quadword 0. Bits [3:2] select the quadword to be copied to
quadword 1, bits [5:4] select the quadword to be copied to quadword 2, and bits [7:6] select the quadword to be copied to quadword 3.
The index value may be the same in multiple selectors. This results in multiple copies of the same
source quadword being copied to the destination.
There is no 128-bit form of this instruction.
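Building the immediate by hand is error-prone, so a small packing helper can be useful. The helper vpermq_imm8 below is hypothetical and plays a role similar to the _MM_SHUFFLE macro found in some compiler headers. vpermq models the selection rule, which matches VPERMPD except that the quadwords are treated as integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs four quadword indices into the VPERMQ
 * immediate (q0 -> bits [1:0], q1 -> [3:2], q2 -> [5:4], q3 -> [7:6]). */
static uint8_t vpermq_imm8(unsigned q0, unsigned q1, unsigned q2, unsigned q3)
{
    assert(q0 < 4 && q1 < 4 && q2 < 4 && q3 < 4);
    return (uint8_t)(q0 | (q1 << 2) | (q2 << 4) | (q3 << 6));
}

/* Hypothetical scalar model: destination quadword i receives source
 * quadword imm8[2i+1:2i]. */
static void vpermq(uint64_t dst[4], const uint64_t src[4], uint8_t imm8)
{
    uint64_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 0x3u];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

Reversing the four quadwords, for example, uses vpermq_imm8(3, 2, 1, 0) = 1Bh, and broadcasting quadword 2 uses vpermq_imm8(2, 2, 2, 2) = AAh.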
YMM Encoding

The destination is a YMM register. The source operand is a YMM register or a 256-bit memory location.
Instruction Support
Form        Subset   Feature Flag
VPERMQ      AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMQ ymm1, ymm2/mem256, imm8        C4    RXB.03           1.1111.1.01   00 /r ib

Related Instructions
VPERMD, VPERMPD, VPERMPS
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
Exception

Mode
Real Virt Prot
A
A
A
A

A
A
A
A

A
A
A
A

A
A
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

A

A
A
A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 0.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

VPGATHERDD

Conditionally Gather Doublewords,
Doubleword Indices

Conditionally loads doubleword values from memory using VSIB addressing with doubleword indices.
The instruction is of the form:
VPGATHERDD dest, mem32[vm32x/y], mask

The loading of each element of the destination register is conditional based on the value of the corresponding element of the mask (second source operand). If the most-significant bit of the ith element
of the mask is set, the ith element of the destination is loaded from memory using the ith address of
the array of effective addresses calculated using VSIB addressing.
The index register is treated as an array of signed 32-bit values. Doubleword elements of the destination whose corresponding mask element has its most-significant bit clear are not affected by the operation. If no exceptions occur, the mask register is set to zero.
Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their
mask elements set to zero. If any traps or faults are pending from elements that have been loaded,
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction
breakpoint is not re-triggered when the instruction execution is resumed.
See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.
There are 128-bit and 256-bit forms of this instruction.
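The conditional load and the final mask clearing can be sketched in C for the fault-free case. Everything here is a model under stated assumptions. vpgatherdd is a hypothetical name, and memory is a byte array reached through base. The effective address is taken as base + sign-extended index * scale + displacement; see the VSIB Addressing section for the authoritative definition. n selects the XMM (4) or YMM (8) form, and the XMM form's clearing of bits [255:128] of the destination and mask is modeled by zeroing elements 4 through 7.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of VPGATHERDD when no exception occurs. */
static void vpgatherdd(uint32_t dst[8], uint32_t mask[8], const uint8_t *base,
                       const int32_t idx[8], int scale, int32_t disp, int n)
{
    assert(n == 4 || n == 8);
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int i = 0; i < n; i++) {
        if (mask[i] & 0x80000000u) {             /* MSB set: load element i   */
            const uint8_t *p = base + (int64_t)idx[i] * scale + disp;
            memcpy(&dst[i], p, sizeof dst[i]);
        }                                         /* MSB clear: dst[i] kept    */
        mask[i] = 0;                              /* mask ends up all zero     */
    }
    if (n == 4) {                                 /* XMM form: upper half zero */
        for (int i = 4; i < 8; i++) {
            dst[i] = 0;
            mask[i] = 0;
        }
    }
}
```

Only the most-significant bit of each mask element matters: a mask element of 00000001h leaves its destination element unchanged.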
XMM Encoding

The destination is an XMM register. The first source operand is up to four 32-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the four doublewords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination
and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are
cleared.
YMM Encoding

The destination is a YMM register. The first source operand is up to eight 32-bit values located in
memory. The second source operand (the mask) is a YMM register. The index vector is the eight doublewords of a YMM register.
Instruction Support
Form          Subset   Feature Flag
VPGATHERDD    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERDD xmm1, vm32x, xmm2          C4    RXB.02           0.src2.0.01   90 /r
VPGATHERDD ymm1, vm32y, ymm2          C4    RXB.02           0.src2.1.01   90 /r

Related Instructions
VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDQ, VPGATHERQD, VPGATHERQQ
rFLAGS Affected
RF
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
A
A

A
A

A
A
A
A
A
A
A
A
A
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

A

A
A

A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
MODRM.mod = 11b.
MODRM.rm != 100b.
YMM/XMM registers specified for destination, mask, and index not unique.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

VPGATHERDQ

Conditionally Gather Quadwords,
Doubleword Indices

Conditionally loads quadword values from memory using VSIB addressing with doubleword indices.
The instruction is of the form:
VPGATHERDQ dest, mem64[vm32x], mask

The loading of each element of the destination register is conditional based on the value of the corresponding element of the mask (second source operand). If the most-significant bit of the ith element
of the mask is set, the ith element of the destination is loaded from memory using the ith address of
the array of effective addresses calculated using VSIB addressing.
The index register is treated as an array of signed 32-bit values. Quadword elements of the destination
whose corresponding mask element has its most-significant bit clear are not affected by the operation.
If no exceptions occur, the mask register is set to zero.
Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their
mask elements set to zero. If any traps or faults are pending from elements that have been loaded,
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction
breakpoint is not re-triggered when the instruction execution is resumed.
See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.
There are 128-bit and 256-bit forms of this instruction.
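With doubleword indices and quadword data, the main arithmetic subtlety is that each 32-bit index is sign-extended before it is scaled. The hypothetical C helper below, vsib_ea_d, computes one element's effective address modulo 2^64. It ignores address-size overrides and segmentation, which the VSIB Addressing section covers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: effective address of one element for a gather with
 * doubleword indices, computed as base + sext(index) * scale + disp. */
static uint64_t vsib_ea_d(uint64_t base, uint32_t index_bits, unsigned scale,
                          int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Sign-extend the 32-bit index without relying on implementation-defined
     * unsigned-to-signed conversion. */
    int64_t index = (index_bits & UINT32_C(0x80000000))
                        ? (int64_t)index_bits - INT64_C(0x100000000)
                        : (int64_t)index_bits;
    /* Unsigned arithmetic wraps modulo 2^64, like 64-bit address math. */
    return base + (uint64_t)index * scale + (uint64_t)(int64_t)disp;
}
```

For the XMM form, only index doublewords 0 and 1 would be passed to such a helper; for the YMM form, doublewords 0 through 3 of the XMM index register.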
XMM Encoding

The destination is an XMM register. The first source operand is up to two 64-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the two
low-order doublewords of an XMM register; the two high-order doublewords of the index register are
not used. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of
the YMM register that corresponds to the second source (mask) operand are cleared.
YMM Encoding

The destination is a YMM register. The first source operand is up to four 64-bit values located in
memory. The second source operand (the mask) is a YMM register. The index vector is the four doublewords of an XMM register.
Instruction Support
Form          Subset   Feature Flag
VPGATHERDQ    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERDQ xmm1, vm32x, xmm2          C4    RXB.02           1.src2.0.01   90 /r
VPGATHERDQ ymm1, vm32x, ymm2          C4    RXB.02           1.src2.1.01   90 /r

Related Instructions
VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERQD, VPGATHERQQ
rFLAGS Affected
RF
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
A
A

A
A

A
A
A
A
A
A
A
A
A
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

A

A
A

A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
MODRM.mod = 11b.
MODRM.rm != 100b.
YMM/XMM registers specified for destination, mask, and index not unique.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

VPGATHERQD

Conditionally Gather Doublewords,
Quadword Indices

Conditionally loads doubleword values from memory using VSIB addressing with quadword indices.
The instruction is of the form:
VPGATHERQD dest, mem32[vm64x/y], mask

The loading of each element of the destination register is conditional based on the value of the corresponding element of the mask (second source operand). If the most-significant bit of the ith element
of the mask is set, the ith element of the destination is loaded from memory using the ith address of
the array of effective addresses calculated using VSIB addressing.
The index register is treated as an array of signed 64-bit values. Doubleword elements of the destination whose corresponding mask element has its most-significant bit clear are not affected by the operation. If no exceptions occur, the mask register is set to zero.
Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their
mask elements set to zero. If any traps or faults are pending from elements that have been loaded,
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction
breakpoint is not re-triggered when the instruction execution is resumed.
See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.
There are 128-bit and 256-bit forms of this instruction.
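The narrowing in the XMM form (two quadword indices, two doubleword results) can be sketched as follows for the fault-free case. vpgatherqd_xmm is a hypothetical name. It zeroes the upper half (bits [127:64]) of the destination and mask, as the XMM Encoding description states, and does not model YMM bits [255:128].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the XMM form of VPGATHERQD (no exceptions):
 * two signed 64-bit indices gather up to two doublewords into dst[0..1]. */
static void vpgatherqd_xmm(uint32_t dst[4], uint32_t mask[4], const uint8_t *base,
                           const int64_t idx[2], int scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int i = 0; i < 2; i++) {
        if (mask[i] & 0x80000000u)                      /* MSB set: load */
            memcpy(&dst[i], base + idx[i] * scale + disp, sizeof dst[i]);
        mask[i] = 0;
    }
    dst[2] = dst[3] = 0;      /* bits [127:64] of the destination cleared */
    mask[2] = mask[3] = 0;    /* bits [127:64] of the mask cleared        */
}
```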
XMM Encoding

The destination is an XMM register. The first source operand is up to two 32-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the two
quadwords of an XMM register. Bits [127:64] of the destination register and of the mask register are
cleared, as are bits [255:128] of the YMM registers that correspond to the destination and the mask.
YMM Encoding

The destination is an XMM register. The first source operand is up to four 32-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the four
quadwords of a YMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the mask register are cleared.
Instruction Support
Form          Subset   Feature Flag
VPGATHERQD    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERQD xmm1, vm64x, xmm2          C4    RXB.02           0.src2.0.01   91 /r
VPGATHERQD xmm1, vm64y, xmm2          C4    RXB.02           0.src2.1.01   91 /r

Related Instructions
VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQQ
rFLAGS Affected
RF
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
A
A

A
A

A
A
A
A
A
A
A
A
A
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

A

A
A

A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
MODRM.mod = 11b.
MODRM.rm != 100b.
YMM/XMM registers specified for destination, mask, and index not unique.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

VPGATHERQQ

Conditionally Gather Quadwords,
Quadword Indices

Conditionally loads quadword values from memory using VSIB addressing with quadword indices.
The instruction is of the form:
VPGATHERQQ dest, mem64[vm64x/y], mask

The loading of each element of the destination register is conditional based on the value of the corresponding element of the mask (second source operand). If the most-significant bit of the ith element
of the mask is set, the ith element of the destination is loaded from memory using the ith address of
the array of effective addresses calculated using VSIB addressing.
The index register is treated as an array of signed 64-bit values. Quadword elements of the destination
whose corresponding mask element has its most-significant bit clear are not affected by the operation.
If no exceptions occur, the mask register is set to zero.
Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the
mask operand may be observed as partially updated. Elements that have been loaded will have their
mask elements set to zero. If any traps or faults are pending from elements that have been loaded,
they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction
breakpoint is not re-triggered when the instruction execution is resumed.
See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.
There are 128-bit and 256-bit forms of this instruction.
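The interrupted-execution rule is what makes gathers restartable. Because completed elements have their mask elements cleared, re-executing the instruction after the fault is handled loads only the remaining elements. The C toy model below illustrates this and is hypothetical throughout: a present[] array stands in for page-table state, table indices stand in for effective addresses, and elements are assumed to be processed from the least-significant upward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of a restartable VPGATHERQQ (YMM form).
 * present[k] == false makes a load from table[k] "fault". Returns -1 on
 * completion, or the number of the faulting element, in which case dst[]
 * and mask[] are left partially updated. */
static int vpgatherqq_model(uint64_t dst[4], uint64_t mask[4],
                            const uint64_t table[], const bool present[],
                            const int64_t idx[4])
{
    for (int i = 0; i < 4; i++) {          /* assumed order: element 0 first  */
        if ((mask[i] >> 63) == 0)
            continue;                      /* MSB clear: element not loaded   */
        if (!present[idx[i]])
            return i;                      /* fault: dst[i], mask[i] unchanged */
        dst[i] = table[idx[i]];
        mask[i] = 0;                       /* loaded: mask element cleared    */
    }
    memset(mask, 0, 4 * sizeof mask[0]);   /* no exception: mask set to zero  */
    return -1;
}
```

In a typical sequence, the first call stops at the missing page with elements 0 and 1 already loaded. A second call, after the "handler" marks the page present, skips those elements because their mask elements are already zero.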
XMM Encoding

The destination is an XMM register. The first source operand is up to two 64-bit values located in
memory. The second source operand (the mask) is an XMM register. The index vector is the two
quadwords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are
cleared.
YMM Encoding

The destination is a YMM register. The first source operand is up to four 64-bit values located in
memory. The second source operand (the mask) is a YMM register. The index vector is the four quadwords of a YMM register.
Instruction Support
Form          Subset   Feature Flag
VPGATHERQQ    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERQQ xmm1, vm64x, xmm2          C4    RXB.02           1.src2.0.01   91 /r
VPGATHERQQ ymm1, vm64y, ymm2          C4    RXB.02           1.src2.1.01   91 /r

Related Instructions
VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD
rFLAGS Affected
RF
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
A
A

A
A

A
A
A
A
A
A
A
A
A
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

A

A
A

A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
MODRM.mod = 11b.
MODRM.rm != 100b.
YMM/XMM registers specified for destination, mask, and index not unique.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

VPHADDBD

Packed Horizontal Add
Signed Byte to Signed Doubleword

Adds each of four groups of four adjacent 8-bit signed integer values of the source and writes each
sign-extended sum to the corresponding doubleword of the destination.
There are two operands: VPHADDBD dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
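The following scalar C model shows the horizontal add. It assumes that destination doubleword i sums source bytes 4i through 4i+3, the pairing implied by the byte-to-doubleword description. The name vphaddbd is hypothetical. The sum of four signed bytes lies between -512 and 508, so no overflow or saturation is possible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of VPHADDBD: destination doubleword i is the
 * sign-extended sum of source bytes 4i .. 4i+3. */
static void vphaddbd(int32_t dst[4], const int8_t src[16])
{
    for (int i = 0; i < 4; i++) {
        int32_t sum = 0;
        for (int j = 0; j < 4; j++)
            sum += src[4 * i + j];   /* int8_t promotes with sign extension */
        dst[i] = sum;                /* range [-512, 508]                  */
    }
}
```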
Instruction Support
Form        Subset   Feature Flag
VPHADDBD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
                                      Encoding
Mnemonic                              XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDBD xmm1, xmm2/mem128            8F    RXB.09           0.1111.0.00   C2 /r

Related Instructions
VPHADDBW, VPHADDBQ, VPHADDWD, VPHADDWQ, VPHADDDQ
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.vvvv != 1111b.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X: XOP exception.


VPHADDBQ

Packed Horizontal Add
Signed Byte to Signed Quadword

Adds each of the two sets of eight adjacent 8-bit signed integer values of the source and writes the
sign-extended sums to the corresponding quadwords of the destination.
There are two operands: VPHADDBQ dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
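Here is a scalar sketch of VPHADDBQ in C. It assumes 16 source bytes with byte 0 in bits [7:0], and `vphaddbq_model` is a hypothetical name. Only the arithmetic is modeled, not register clearing or exceptions.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDBQ: quadword i of the result is the
   sum of the eight sign-extended bytes in quadword i of the source. */
void vphaddbq_model(const int8_t src[16], int64_t dst[2]) {
    for (int i = 0; i < 2; i++) {
        int64_t sum = 0;
        for (int j = 0; j < 8; j++)
            sum += src[8 * i + j];
        dst[i] = sum;
    }
}
```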
Instruction Support
Form        Subset   Feature Flag
VPHADDBQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDBQ xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  C3 /r

Related Instructions
VPHADDBW, VPHADDBD, VPHADDWD, VPHADDWQ, VPHADDDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDBW

Packed Horizontal Add
Signed Byte to Signed Word

Adds each adjacent pair of 8-bit signed integer values of the source and packs the sign-extended 16-bit integer result of each addition into the corresponding word element of the destination.
There are two operands: VPHADDBW dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
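The pairing can be sketched in C as follows. The function name `vphaddbw_model` is hypothetical, and the sketch assumes the source is 16 bytes with byte 0 in bits [7:0]. Only the lane arithmetic is shown.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDBW: word i of the result is the sum
   of sign-extended bytes 2i and 2i+1 of the source. */
void vphaddbw_model(const int8_t src[16], int16_t dst[8]) {
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(src[2 * i] + src[2 * i + 1]); /* range [-256, 254] */
}
```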
Instruction Support
Form        Subset   Feature Flag
VPHADDBW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDBW xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  C1 /r

Related Instructions
VPHADDBD, VPHADDBQ, VPHADDWD, VPHADDWQ, VPHADDDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDDQ

Packed Horizontal Add
Signed Doubleword to Signed Quadword

Adds each adjacent pair of signed doubleword integer values of the source and packs the sign-extended sums into the corresponding quadword of the destination.
There are two operands: VPHADDDQ dest, src
The source is either an XMM register or a 128-bit memory location and the destination is an XMM
register. Bits [255:128] of the corresponding YMM register are cleared.
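Each pair is summed after widening to 64 bits, so the result cannot overflow. The following is a hypothetical scalar model (`vphadddq_model`). It assumes doubleword 0 occupies bits [31:0] and covers only the arithmetic.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDDQ. */
void vphadddq_model(const int32_t src[4], int64_t dst[2]) {
    for (int i = 0; i < 2; i++)
        /* Widen the first operand so the addition is done in 64 bits. */
        dst[i] = (int64_t)src[2 * i] + src[2 * i + 1];
}
```

Adding two copies of INT32_MAX gives 4294967294, which a 32-bit sum could not represent.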
Instruction Support
Form        Subset   Feature Flag
VPHADDDQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDDQ xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  CB /r

Related Instructions
VPHADDBW, VPHADDBD, VPHADDBQ, VPHADDWD, VPHADDWQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDUBD

Packed Horizontal Add
Unsigned Byte to Doubleword

Adds each of the four sets of four adjacent 8-bit unsigned integer values of the source and writes the zero-extended sums to the corresponding doublewords of the destination.
There are two operands: VPHADDUBD dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
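This instruction differs from VPHADDBD only in that each byte is zero-extended. Below is a scalar sketch in C. It assumes byte 0 is bits [7:0], and `vphaddubd_model` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDUBD: bytes are zero-extended. */
void vphaddubd_model(const uint8_t src[16], uint32_t dst[4]) {
    for (int i = 0; i < 4; i++) {
        uint32_t sum = 0;
        for (int j = 0; j < 4; j++)
            sum += src[4 * i + j]; /* at most 4 * 255 = 1020 */
        dst[i] = sum;
    }
}
```

A byte of FFh contributes 255 here, whereas VPHADDBD would treat the same byte as -1.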
Instruction Support
Form        Subset   Feature Flag
VPHADDUBD   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDUBD xmm1, xmm2/mem128   8F   RXB.09          0.1111.0.00  D2 /r

Related Instructions
VPHADDUBW, VPHADDUBQ, VPHADDUWD, VPHADDUWQ, VPHADDUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDUBQ

Packed Horizontal Add
Unsigned Byte to Quadword

Adds each of the two sets of eight adjacent 8-bit unsigned integer values of the source and writes the
zero-extended sums to the corresponding quadwords of the destination.
There are two operands: VPHADDUBQ dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. When the destination XMM register is written, bits [255:128] of the corresponding YMM
register are cleared.
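The same arithmetic can be expressed as a hypothetical scalar C function (`vphaddubq_model`). The byte ordering (byte 0 in bits [7:0]) is an assumption of this sketch, and YMM clearing is not modeled.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDUBQ: quadword i of the result is the
   sum of the eight zero-extended bytes in quadword i of the source. */
void vphaddubq_model(const uint8_t src[16], uint64_t dst[2]) {
    for (int i = 0; i < 2; i++) {
        uint64_t sum = 0;
        for (int j = 0; j < 8; j++)
            sum += src[8 * i + j];
        dst[i] = sum;
    }
}
```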
Instruction Support
Form        Subset   Feature Flag
VPHADDUBQ   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDUBQ xmm1, xmm2/mem128   8F   RXB.09          0.1111.0.00  D3 /r

Related Instructions
VPHADDUBW, VPHADDUBD, VPHADDUWD, VPHADDUWQ, VPHADDUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDUBW

Packed Horizontal Add
Unsigned Byte to Word

Adds each adjacent pair of 8-bit unsigned integer values of the source and writes the 16-bit integer
sums to the corresponding words of the destination.
There are two operands: VPHADDUBW dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
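Below is a minimal C sketch of the per-word operation. It assumes byte 0 is bits [7:0]. The name `vphaddubw_model` is hypothetical, and the model covers only the sums.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDUBW: word i = byte 2i + byte 2i+1. */
void vphaddubw_model(const uint8_t src[16], uint16_t dst[8]) {
    for (int i = 0; i < 8; i++)
        dst[i] = (uint16_t)(src[2 * i] + src[2 * i + 1]); /* at most 510 */
}
```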
Instruction Support
Form        Subset   Feature Flag
VPHADDUBW   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDUBW xmm1, xmm2/mem128   8F   RXB.09          0.1111.0.00  D1 /r

Related Instructions
VPHADDUBD, VPHADDUBQ, VPHADDUWD, VPHADDUWQ, VPHADDUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDUDQ

Packed Horizontal Add
Unsigned Doubleword to Quadword

Adds each of the two adjacent pairs of 32-bit unsigned integer values of the source and writes the
zero-extended 64-bit sums to the corresponding quadwords of the destination.
There are two operands: VPHADDUDQ dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
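Two 32-bit unsigned values can sum to 33 bits, so the quadword destination holds every result exactly. The following is a hypothetical scalar model (`vphaddudq_model`), with doubleword 0 assumed to be in bits [31:0].

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDUDQ. */
void vphaddudq_model(const uint32_t src[4], uint64_t dst[2]) {
    for (int i = 0; i < 2; i++)
        /* Widen before adding so the 33-bit sum is kept intact. */
        dst[i] = (uint64_t)src[2 * i] + src[2 * i + 1];
}
```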
Instruction Support
Form        Subset   Feature Flag
VPHADDUDQ   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDUDQ xmm1, xmm2/mem128   8F   RXB.09          0.1111.0.00  DB /r

Related Instructions
VPHADDUBW, VPHADDUBD, VPHADDUBQ, VPHADDUWD, VPHADDUWQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDUWD

Packed Horizontal Add
Unsigned Word to Doubleword

Adds each of the four adjacent pairs of 16-bit unsigned integer values of the source and writes the
zero-extended sums to the corresponding doublewords of the destination.
There are two operands: VPHADDUWD dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
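Here is a scalar sketch of the pairwise add. It assumes word 0 occupies bits [15:0], and `vphadduwd_model` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDUWD: dword i = word 2i + word 2i+1. */
void vphadduwd_model(const uint16_t src[8], uint32_t dst[4]) {
    for (int i = 0; i < 4; i++)
        dst[i] = (uint32_t)src[2 * i] + src[2 * i + 1];
}
```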
Instruction Support
Form        Subset   Feature Flag
VPHADDUWD   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDUWD xmm1, xmm2/mem128   8F   RXB.09          0.1111.0.00  D6 /r

Related Instructions
VPHADDUBW, VPHADDUBD, VPHADDUBQ, VPHADDUWQ, VPHADDUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDUWQ

Packed Horizontal Add
Unsigned Word to Quadword

Adds each set of four adjacent 16-bit unsigned integer values (two pairs per quadword) of the source and writes the zero-extended sums to the corresponding quadword elements of the destination.
There are two operands: VPHADDUWQ dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
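Each quadword result draws on four words, the two pairs that share that quadword. The following is a hypothetical C model (`vphadduwq_model`). It assumes word 0 occupies bits [15:0] and covers only the sums.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDUWQ: quadword i is the sum of
   words 4i .. 4i+3, each zero-extended. */
void vphadduwq_model(const uint16_t src[8], uint64_t dst[2]) {
    for (int i = 0; i < 2; i++) {
        uint64_t sum = 0;
        for (int j = 0; j < 4; j++)
            sum += src[4 * i + j];
        dst[i] = sum;
    }
}
```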
Instruction Support
Form        Subset   Feature Flag
VPHADDUWQ   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDUWQ xmm1, xmm2/mem128   8F   RXB.09          0.1111.0.00  D7 /r

Related Instructions
VPHADDUBW, VPHADDUBD, VPHADDUBQ, VPHADDUWD, VPHADDUDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDWD

Packed Horizontal Add
Signed Word to Signed Doubleword

Adds each of the four adjacent pairs of 16-bit signed integer values of the source and writes the
sign-extended sums to the corresponding doublewords of the destination.
There are two operands: VPHADDWD dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
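Here is a scalar sketch of the signed pairwise add. The name `vphaddwd_model` is hypothetical, and the sketch assumes word 0 is bits [15:0]. YMM clearing and exceptions are outside the model.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDWD: dword i = word 2i + word 2i+1,
   both sign-extended. */
void vphaddwd_model(const int16_t src[8], int32_t dst[4]) {
    for (int i = 0; i < 4; i++)
        dst[i] = (int32_t)src[2 * i] + src[2 * i + 1];
}
```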
Instruction Support
Form        Subset   Feature Flag
VPHADDWD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDWD xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  C6 /r

Related Instructions
VPHADDBW, VPHADDBD, VPHADDBQ, VPHADDWQ, VPHADDDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHADDWQ

Packed Horizontal Add
Signed Word to Signed Quadword

Adds each of the two sets of four adjacent 16-bit signed integer values of the source and writes the
sign-extended sums to the corresponding quadwords of the destination.
There are two operands: VPHADDWQ dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the corresponding YMM register are cleared.
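The operation can be sketched in C as follows. The sketch assumes word 0 occupies bits [15:0]. The name `vphaddwq_model` is hypothetical, and only the sums are modeled.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHADDWQ: quadword i is the sum of the
   sign-extended words 4i .. 4i+3. */
void vphaddwq_model(const int16_t src[8], int64_t dst[2]) {
    for (int i = 0; i < 2; i++) {
        int64_t sum = 0;
        for (int j = 0; j < 4; j++)
            sum += src[4 * i + j];
        dst[i] = sum;
    }
}
```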
Instruction Support
Form        Subset   Feature Flag
VPHADDWQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDWQ xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  C7 /r

Related Instructions
VPHADDBW, VPHADDBD, VPHADDBQ, VPHADDWD, VPHADDDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHSUBBW

Packed Horizontal Subtract
Signed Byte to Signed Word

Subtracts the most significant signed integer byte from the least significant signed integer byte of
each word element in the source and packs the sign-extended 16-bit integer differences into the destination.
There are two operands: VPHSUBBW dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. When the destination is written, bits [255:128] of the corresponding YMM register are
cleared.
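Note the operand order: the high byte of each word is subtracted from the low byte. The following is a hypothetical scalar model (`vphsubbw_model`), assuming byte 0 occupies bits [7:0].

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHSUBBW: word i = byte 2i - byte 2i+1
   (low byte minus high byte), both sign-extended. */
void vphsubbw_model(const int8_t src[16], int16_t dst[8]) {
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(src[2 * i] - src[2 * i + 1]); /* range [-255, 255] */
}
```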
Instruction Support
Form        Subset   Feature Flag
VPHSUBBW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHSUBBW xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  E1 /r

Related Instructions
VPHSUBWD, VPHSUBDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHSUBDQ

Packed Horizontal Subtract
Signed Doubleword to Signed Quadword

Subtracts the most significant signed integer doubleword from the least significant signed integer
doubleword of each quadword in the source and packs the sign-extended 64-bit integer differences
into the corresponding quadword element of the destination.
There are two operands: VPHSUBDQ dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. When the destination is written, bits [255:128] of the corresponding YMM register are
cleared.
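Here is a scalar C sketch. It assumes doubleword 0 occupies bits [31:0], and `vphsubdq_model` is a hypothetical name. Widening before the subtraction keeps the full 33-bit difference.

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHSUBDQ: quadword i = dword 2i - dword 2i+1. */
void vphsubdq_model(const int32_t src[4], int64_t dst[2]) {
    for (int i = 0; i < 2; i++)
        dst[i] = (int64_t)src[2 * i] - src[2 * i + 1]; /* computed in 64 bits */
}
```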
Instruction Support
Form        Subset   Feature Flag
VPHSUBDQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHSUBDQ xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  E3 /r

Related Instructions
VPHSUBBW, VPHSUBWD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPHSUBWD

Packed Horizontal Subtract
Signed Word to Signed Doubleword

Subtracts the most significant signed integer word from the least significant signed integer word of
each doubleword of the source and packs the sign-extended 32-bit integer differences into the destination.
There are two operands: VPHSUBWD dest, src
The destination is an XMM register and the source is either an XMM register or a 128-bit memory
location. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
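The low-minus-high subtraction can be sketched as follows. The name `vphsubwd_model` is hypothetical, and the sketch assumes word 0 is bits [15:0].

```c
#include <stdint.h>

/* Hypothetical scalar model of VPHSUBWD: dword i = word 2i - word 2i+1. */
void vphsubwd_model(const int16_t src[8], int32_t dst[4]) {
    for (int i = 0; i < 4; i++)
        dst[i] = (int32_t)src[2 * i] - src[2 * i + 1];
}
```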
Instruction Support
Form        Subset   Feature Flag
VPHSUBWD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                      Encoding
                              XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHSUBWD xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  E2 /r

Related Instructions
VPHSUBBW, VPHSUBDQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPMACSDD


Packed Multiply Accumulate
Signed Doubleword to Signed Doubleword

Multiplies each packed 32-bit signed integer value of the first source by the corresponding value of
the second source, adds the corresponding value of the third source to the 64-bit signed integer product, and writes four 32-bit sums to the destination.
No saturation is performed on the sum. Any nonzero bits in the upper 32 bits of the 64-bit product are
ignored, and if the addition overflows, the carry is ignored (neither the overflow nor the carry bit in
rFLAGS is set). In both cases, only the low-order 32 bits of the signed result are written to the destination.
There are four operands: VPMACSDD dest, src1, src2, src3
dest = src1 * src2 + src3
The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written,
bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.
When the third source designates the same XMM register as the destination, the XMM register
behaves as an accumulator.
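The wrap-around behavior described above amounts to computing each lane modulo 2^32. Below is a minimal scalar sketch. It assumes doubleword 0 is bits [31:0], and `vpmacsdd_model` is a hypothetical name. Passing the same array as src3 and dst mimics the accumulator case, since each lane is read before it is written.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of VPMACSDD:
   dst[i] = low 32 bits of (src1[i] * src2[i] + src3[i]). */
void vpmacsdd_model(const int32_t src1[4], const int32_t src2[4],
                    const int32_t src3[4], int32_t dst[4]) {
    for (int i = 0; i < 4; i++) {
        /* The low 32 bits of a product or sum are the same for signed and
           unsigned operands, so unsigned arithmetic yields the wrapped
           result without signed-overflow undefined behavior. */
        uint32_t r = (uint32_t)src1[i] * (uint32_t)src2[i] + (uint32_t)src3[i];
        memcpy(&dst[i], &r, sizeof r); /* reinterpret the bit pattern as signed */
    }
}
```

Multiplying 65536 by 65536 yields 2^32, whose low 32 bits are zero. That lane therefore returns just the addend.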
Instruction Support
Form        Subset   Feature Flag
VPMACSDD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                  Encoding
                                          XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSDD xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.08          0.src1.0.00  9E /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSSDQL,
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPMACSDQH

Packed Multiply Accumulate
Signed High Doubleword to Signed Quadword

Multiplies the second 32-bit signed integer value of the first source by the corresponding value of the
second source, then adds the low-order 64-bit signed integer value of the third source to the 64-bit
signed integer product. Simultaneously, multiplies the fourth 32-bit signed integer value of the first
source by the fourth 32-bit signed integer value of the second source, then adds the high-order 64-bit
signed integer value of the third source to the 64-bit signed integer product. Writes two 64-bit sums to
the destination.
No saturation is performed on the sum. When the result of the add overflows, the carry is ignored
(neither the overflow nor carry bit in rFLAGS is set).
There are four operands: VPMACSDQH dest, src1, src2, src3
dest = src1 * src2 + src3
The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written,
bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field; the second source (src2)
is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the
third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.
When the third source designates the same XMM register as the destination, the XMM register
behaves as an accumulator.
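The following C function is a minimal, illustrative model of the VPMACSDQH data flow described above; it is not AMD reference code. Assumptions: the name `vpmacsdqh_model` is hypothetical, each XMM operand is passed as an array whose element 0 is the least-significant element, and converting an out-of-range unsigned value back to a signed type is assumed to behave as two's complement (as it does on GCC and Clang).

```c
#include <stdint.h>

/* Illustrative model of VPMACSDQH (hypothetical name, not an AMD API).
   Arrays hold register elements; element 0 is the least significant. */

/* 64-bit add that wraps modulo 2^64; no flags are produced.
   Unsigned arithmetic avoids undefined signed overflow in C; the cast back
   to int64_t assumes two's-complement conversion (GCC/Clang behavior). */
static int64_t add_wrap64(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

void vpmacsdqh_model(int64_t dest[2], const int32_t src1[4],
                     const int32_t src2[4], const int64_t src3[2])
{
    /* A signed 32 x 32-bit product always fits in 64 bits, so only the
       accumulate step can overflow. */
    dest[0] = add_wrap64((int64_t)src1[1] * src2[1], src3[0]); /* doubleword 1, bits [63:32] */
    dest[1] = add_wrap64((int64_t)src1[3] * src2[3], src3[1]); /* doubleword 3, bits [127:96] */
}
```

Passing the same array as `dest` and `src3` mimics the accumulator usage, because each destination quadword depends only on the matching quadword of `src3`.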
Instruction Support
Form         Subset   Feature Flag
VPMACSDQH    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSDQH xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   9F /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD,
VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMACSDQL

Packed Multiply Accumulate
Signed Low Doubleword to Signed Quadword

Multiplies the low-order 32-bit signed integer value of the first source (bits [31:0]) by the
corresponding value of the second source, then adds the low-order 64-bit signed integer value of the
third source (bits [63:0]) to the 64-bit signed integer product. Simultaneously, multiplies the third
32-bit signed integer value of the first source (bits [95:64]) by the corresponding value of the
second source, then adds the high-order 64-bit signed integer value of the third source (bits
[127:64]) to the 64-bit signed integer product. Writes the two 64-bit sums to the low-order and
high-order quadwords of the destination register, respectively.
No saturation is performed on the sum. When the result of the add overflows, the carry is ignored
(neither the overflow nor carry bit in rFLAGS is set). Only the low-order 64 bits of each result are
written to the destination.
There are four operands: VPMACSDQL dest, src1, src2, src3
dest = src1 * src2 + src3
The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written,
bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.
When src3 designates the same XMM register as the dest register, the XMM register behaves as an
accumulator.
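As an illustration only, the VPMACSDQL operation can be modeled in plain C. The function name `vpmacsdql_model` is hypothetical, operands are arrays with element 0 as the least-significant element, and the wrapping add assumes two's-complement conversion from unsigned to signed types (as on GCC and Clang). Compared with VPMACSDQH, only the selected doublewords differ (elements 0 and 2 instead of 1 and 3).

```c
#include <stdint.h>

/* Illustrative model of VPMACSDQL (hypothetical name, not an AMD API).
   Arrays hold register elements; element 0 is the least significant. */

/* Wrapping 64-bit add done in unsigned arithmetic to avoid undefined
   signed overflow; the cast back assumes two's-complement conversion. */
static int64_t add_wrap64(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

void vpmacsdql_model(int64_t dest[2], const int32_t src1[4],
                     const int32_t src2[4], const int64_t src3[2])
{
    dest[0] = add_wrap64((int64_t)src1[0] * src2[0], src3[0]); /* doubleword 0, bits [31:0] */
    dest[1] = add_wrap64((int64_t)src1[2] * src2[2], src3[1]); /* doubleword 2, bits [95:64] */
}
```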
Instruction Support
Form         Subset   Feature Flag
VPMACSDQL    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSDQL xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   97 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD,
VPMACSSDQL, VPMACSSDQH, VPMACSDQH, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMACSSDD

Packed Multiply Accumulate with Saturation
Signed Doubleword to Signed Doubleword

Multiplies each packed 32-bit signed integer value of the first source by the corresponding value of
the second source, then adds the corresponding packed 32-bit signed integer value of the third source
to each 64-bit signed integer product. Writes four saturated 32-bit sums to the destination.
Out of range results of the addition are saturated to fit into a signed 32-bit integer. For each packed
value of the destination, when the value is larger than the largest signed 32-bit integer, it is saturated
to 7FFF_FFFFh, and when the value is smaller than the smallest signed 32-bit integer, it is saturated
to 8000_0000h.
There are four operands: VPMACSSDD dest, src1, src2, src3
dest = src1 * src2 + src3
The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written,
bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.
When src3 designates the same XMM register as the dest register, the XMM register behaves as an
accumulator.
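A minimal C sketch of the saturating behavior, for illustration only: `vpmacssdd_model` and `sat32` are hypothetical names, and operands are arrays with element 0 as the least-significant element. A 64-bit product (magnitude at most 2^62) plus a 32-bit addend cannot overflow a 64-bit intermediate, so the sketch forms the exact sum first and then clamps it to the signed 32-bit range.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the signed 32-bit range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX; /* 7FFF_FFFFh */
    if (v < INT32_MIN) return INT32_MIN; /* 8000_0000h */
    return (int32_t)v;
}

/* Illustrative model of VPMACSSDD (hypothetical name, not an AMD API). */
void vpmacssdd_model(int32_t dest[4], const int32_t src1[4],
                     const int32_t src2[4], const int32_t src3[4])
{
    for (int i = 0; i < 4; i++) {
        /* Exact sum: |product| <= 2^62, so adding an int32_t cannot overflow. */
        int64_t sum = (int64_t)src1[i] * src2[i] + src3[i];
        dest[i] = sat32(sum);
    }
}
```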
Instruction Support
Form         Subset   Feature Flag
VPMACSSDD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSSDD xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   8E /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSDD, VPMACSSDQL,
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMACSSDQH

Packed Multiply Accumulate with Saturation
Signed High Doubleword to Signed Quadword

Multiplies the second 32-bit signed integer value of the first source (bits [63:32]) by the
corresponding value of the second source, then adds the low-order 64-bit signed integer value of the
third source (bits [63:0]) to the 64-bit signed integer product. Simultaneously, multiplies the fourth
32-bit signed integer value of the first source (bits [127:96]) by the corresponding value of the
second source, then adds the high-order 64-bit signed integer value of the third source (bits
[127:64]) to the 64-bit signed integer product. Writes the two saturated sums to the low-order and
high-order quadwords of the destination, respectively.
Out of range results of the addition are saturated to fit into a signed 64-bit integer. For each packed
value of the destination, when the value is larger than the largest signed 64-bit integer, it is saturated
to 7FFF_FFFF_FFFF_FFFFh, and when the value is smaller than the smallest signed 64-bit integer, it
is saturated to 8000_0000_0000_0000h.
There are four operands: VPMACSSDQH dest, src1, src2, src3
dest = src1 * src2 + src3
The destination (dest) is an XMM register specified by ModRM.reg. When the destination XMM register is written, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.
When src3 designates the same XMM register as the dest register, the XMM register behaves as an
accumulator.
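Unlike the 32-bit saturating forms, here the sum itself can leave the signed 64-bit range, so a C model cannot simply compute it and clamp afterward. The sketch below, a hypothetical `vpmacssdqh_model` with operands as arrays (element 0 least significant), checks the bounds before adding so that no undefined signed overflow occurs; it illustrates the described behavior and is not AMD reference code.

```c
#include <stdint.h>

/* Saturating 64-bit add: the range checks run before the addition, so
   C signed overflow never happens. */
static int64_t add_sat64(int64_t a, int64_t b)
{
    if (b > 0 && a > INT64_MAX - b) return INT64_MAX; /* 7FFF_FFFF_FFFF_FFFFh */
    if (b < 0 && a < INT64_MIN - b) return INT64_MIN; /* 8000_0000_0000_0000h */
    return a + b;
}

/* Illustrative model of VPMACSSDQH (hypothetical name, not an AMD API). */
void vpmacssdqh_model(int64_t dest[2], const int32_t src1[4],
                      const int32_t src2[4], const int64_t src3[2])
{
    dest[0] = add_sat64((int64_t)src1[1] * src2[1], src3[0]); /* doubleword 1 */
    dest[1] = add_sat64((int64_t)src1[3] * src2[3], src3[1]); /* doubleword 3 */
}
```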
Instruction Support
Form         Subset   Feature Flag
VPMACSSDQH   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSSDQH xmm1, xmm2, xmm3/mem128, xmm4   8F    RXB.08           0.src1.0.00   8F /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD,
VPMACSSDQL, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMACSSDQL

Packed Multiply Accumulate with Saturation
Signed Low Doubleword to Signed Quadword

Multiplies the low-order 32-bit signed integer value of the first source (bits [31:0]) by the
corresponding value of the second source, then adds the low-order 64-bit signed integer value of the
third source (bits [63:0]) to the 64-bit signed integer product. Simultaneously, multiplies the third
32-bit signed integer value of the first source (bits [95:64]) by the corresponding value of the
second source, then adds the high-order 64-bit signed integer value of the third source (bits
[127:64]) to the 64-bit signed integer product. Writes the two saturated sums to the low-order and
high-order quadwords of the destination, respectively.
Out of range results of the addition are saturated to fit into a signed 64-bit integer. For each packed
value of the destination, when the value is larger than the largest signed 64-bit integer, it is saturated
to 7FFF_FFFF_FFFF_FFFFh, and when the value is smaller than the smallest signed 64-bit integer, it
is saturated to 8000_0000_0000_0000h.
There are four operands: VPMACSSDQL dest, src1, src2, src3
dest = src1 * src2 + src3
The destination (dest) register is an XMM register specified by ModRM.reg. When the destination is
written, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.
When src3 designates the same XMM register as the dest register, the XMM register behaves as an
accumulator.
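For illustration, a C model of VPMACSSDQL differs from the high-doubleword form only in selecting doublewords 0 and 2. The name `vpmacssdql_model` is hypothetical and operands are arrays with element 0 least significant. The saturating add checks the 64-bit bounds before adding, because the exact sum may not be representable in any standard C integer type.

```c
#include <stdint.h>

/* Saturating 64-bit add with pre-checks (no signed overflow in C). */
static int64_t add_sat64(int64_t a, int64_t b)
{
    if (b > 0 && a > INT64_MAX - b) return INT64_MAX;
    if (b < 0 && a < INT64_MIN - b) return INT64_MIN;
    return a + b;
}

/* Illustrative model of VPMACSSDQL (hypothetical name, not an AMD API). */
void vpmacssdql_model(int64_t dest[2], const int32_t src1[4],
                      const int32_t src2[4], const int64_t src3[2])
{
    dest[0] = add_sat64((int64_t)src1[0] * src2[0], src3[0]); /* doubleword 0 */
    dest[1] = add_sat64((int64_t)src1[2] * src2[2], src3[1]); /* doubleword 2 */
}
```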
Instruction Support
Form         Subset   Feature Flag
VPMACSSDQL   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSSDQL xmm1, xmm2, xmm3/mem128, xmm4   8F    RXB.08           0.src1.0.00   87 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD,
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMACSSWD

Packed Multiply Accumulate with Saturation
Signed Word to Signed Doubleword

Multiplies the odd-numbered packed 16-bit signed integer values of the first source by the corresponding values of the second source, then adds the corresponding packed 32-bit signed integer values of the third source to the 32-bit signed integer products. Writes four saturated sums to the
destination.
Out of range results of the addition are saturated to fit into a signed 32-bit integer. For each packed
value of the destination, when the value is larger than the largest signed 32-bit integer, it is saturated
to 7FFF_FFFFh, and when the value is smaller than the smallest signed 32-bit integer, it is saturated
to 8000_0000h.
There are four operands: VPMACSSWD dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination XMM register is written, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by the XOP.vvvv field; the second source (src2)
is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the
third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.
When src3 designates the same XMM register as the dest register, the XMM register behaves as an
accumulator.
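A minimal C sketch of VPMACSSWD, for illustration only. It assumes that "odd-numbered" refers to word elements 1, 3, 5, and 7, with element 0 being the least-significant word; that interpretation, the function name `vpmacsswd_model`, and the array representation are assumptions of the sketch. The even-numbered words of the first two sources do not contribute to the result.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the signed 32-bit range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX; /* 7FFF_FFFFh */
    if (v < INT32_MIN) return INT32_MIN; /* 8000_0000h */
    return (int32_t)v;
}

/* Illustrative model of VPMACSSWD (hypothetical name, not an AMD API).
   Assumption: the odd-numbered words are elements 1, 3, 5 and 7. */
void vpmacsswd_model(int32_t dest[4], const int16_t src1[8],
                     const int16_t src2[8], const int32_t src3[4])
{
    for (int i = 0; i < 4; i++) {
        /* A 16 x 16-bit signed product always fits in 32 bits. */
        int32_t product = (int32_t)src1[2 * i + 1] * src2[2 * i + 1];
        /* Form the sum in 64 bits, then saturate to 32 bits. */
        dest[i] = sat32((int64_t)product + src3[i]);
    }
}
```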
Instruction Support
Form         Subset   Feature Flag
VPMACSSWD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSSWD xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   86 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL,
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMACSSWW

Packed Multiply Accumulate with Saturation
Signed Word to Signed Word

Multiplies each packed 16-bit signed integer value of the first source by the corresponding packed
16-bit signed integer value of the second source, then adds the corresponding packed 16-bit signed
integer value of the third source to each 32-bit signed integer product. Writes eight saturated sums
to the destination.
Out of range results of the addition are saturated to fit into a signed 16-bit integer. For each packed
value of the destination, when the value is larger than the largest signed 16-bit integer, it is saturated
to 7FFFh, and when the value is smaller than the smallest signed 16-bit integer, it is saturated to
8000h.
There are four operands: VPMACSSWW dest, src1, src2, src3
dest = src1 * src2 + src3

The destination is an XMM register specified by ModRM.reg. When the destination is written, bits
[255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte.
When src3 and dest designate the same XMM register, this register behaves as an accumulator.
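The word-to-word saturating form can be sketched in C as follows; this is an illustrative model, not AMD reference code. The names `vpmacssww_model` and `sat16` are hypothetical, and operands are arrays with element 0 as the least-significant word. Since a 16 x 16-bit product is at most 2^30 in magnitude, adding a 16-bit value to it fits comfortably in 32 bits before clamping.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX; /* 7FFFh */
    if (v < INT16_MIN) return INT16_MIN; /* 8000h */
    return (int16_t)v;
}

/* Illustrative model of VPMACSSWW (hypothetical name, not an AMD API). */
void vpmacssww_model(int16_t dest[8], const int16_t src1[8],
                     const int16_t src2[8], const int16_t src3[8])
{
    for (int i = 0; i < 8; i++) {
        int32_t sum = (int32_t)src1[i] * src2[i] + src3[i]; /* exact in 32 bits */
        dest[i] = sat16(sum);
    }
}
```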
Instruction Support
Form         Subset   Feature Flag
VPMACSSWW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSSWW xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   85 /r ib

Related Instructions
VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL,
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMACSWD

Packed Multiply Accumulate
Signed Word to Signed Doubleword

Multiplies each odd-numbered packed 16-bit signed integer value of the first source by the corresponding value of the second source, then adds the corresponding packed 32-bit signed integer value
of the third source to the 32-bit signed integer products. Writes four 32-bit results to the destination.
When the result of the add overflows, the carry is ignored (neither the overflow nor carry bit in
rFLAGS is set). Only the low-order 32 bits of the result are written to the destination.
There are four operands: VPMACSWD dest, src1, src2, src3
dest = src1 * src2 + src3
The destination (dest) register is an XMM register specified by ModRM.reg. When the destination
XMM register is written, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.
When src3 designates the same XMM register as the dest register, the XMM register behaves as an
accumulator.
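The non-saturating form keeps only the low-order 32 bits of each sum. The C sketch below models that by adding in unsigned 32-bit arithmetic, which wraps modulo 2^32 without undefined behavior; the final cast back to `int32_t` assumes two's-complement conversion (as on GCC and Clang). As with the saturating variant, it assumes "odd-numbered" means word elements 1, 3, 5, and 7 (element 0 least significant), and `vpmacswd_model` is a hypothetical name.

```c
#include <stdint.h>

/* Illustrative model of VPMACSWD (hypothetical name, not an AMD API).
   Assumption: the odd-numbered words are elements 1, 3, 5 and 7. */
void vpmacswd_model(int32_t dest[4], const int16_t src1[8],
                    const int16_t src2[8], const int32_t src3[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t product = (int32_t)src1[2 * i + 1] * src2[2 * i + 1];
        /* Wrapping add: only the low-order 32 bits are kept; no flags. */
        dest[i] = (int32_t)((uint32_t)product + (uint32_t)src3[i]);
    }
}
```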
Instruction Support
Form         Subset   Feature Flag
VPMACSWD     XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSWD xmm1, xmm2, xmm3/mem128, xmm4     8F    RXB.08           0.src1.0.00   96 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL,
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMACSWW

Packed Multiply Accumulate
Signed Word to Signed Word

Multiplies each packed 16-bit signed integer value of the first source by the corresponding value of
the second source, then adds the corresponding packed 16-bit signed integer value of the third source
to each 32-bit signed integer product. Writes eight 16-bit results to the destination.
No saturation is performed on the sum. When the result of the multiplication causes non-zero values
to be set in the upper 16 bits of the 32-bit result, they are ignored. When the result of the add
overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set). In both cases,
only the low-order 16 bits of the result are written to the destination.
There are four operands: VPMACSWW dest, src1, src2, src3
dest = src1 * src2 + src3
The destination (dest) is an XMM register specified by ModRM.reg. When the destination XMM register is written, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either
an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third
source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.
When src3 designates the same XMM register as the dest register, the XMM register behaves as an
accumulator.
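A minimal C sketch of the truncating word form, for illustration only (`vpmacsww_model` is a hypothetical name; element 0 is the least-significant word). The sum is formed in unsigned 32-bit arithmetic and then truncated to 16 bits, which discards both the upper product bits and any carry out of the add, as described above; converting the 16-bit pattern back to `int16_t` assumes two's complement (GCC/Clang behavior).

```c
#include <stdint.h>

/* Illustrative model of VPMACSWW (hypothetical name, not an AMD API). */
void vpmacsww_model(int16_t dest[8], const int16_t src1[8],
                    const int16_t src2[8], const int16_t src3[8])
{
    for (int i = 0; i < 8; i++) {
        int32_t product = (int32_t)src1[i] * src2[i];
        /* Keep only the low-order 16 bits of product + src3; the upper
           product bits and any carry out of the add are discarded. */
        dest[i] = (int16_t)(uint16_t)((uint32_t)product + (uint32_t)src3[i]);
    }
}
```

In the test values, 256 * 256 + 5 = 10005h, so only the low-order word 0005h survives.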
Instruction Support
Form         Subset   Feature Flag
VPMACSWW     XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSWW xmm1, xmm2, xmm3/mem128, xmm4     8F    RXB.08           0.src1.0.00   95 /r ib

Related Instructions
VPMACSSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL,
VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMADCSSWD

Packed Multiply Add Accumulate with Saturation
Signed Word to Signed Doubleword

Multiplies each packed 16-bit signed integer value of the first source by the corresponding value of
the second source, then adds the 32-bit signed integer products of the even-odd adjacent words. Each
resulting sum is then added to the corresponding packed 32-bit signed integer value of the third
source. Writes four 32-bit signed-integer results to the destination.
Out of range results of the addition are saturated to fit into a signed 32-bit integer. For each packed
value of the destination, when the value is larger than the largest signed 32-bit integer, it is saturated
to 7FFF_FFFFh, and when the value is smaller than the smallest signed 32-bit integer, it is saturated
to 8000_0000h.
There are four operands: VPMADCSSWD dest, src1, src2, src3
dest = src1 * src2 + src3
The destination is an XMM register specified by ModRM.reg. When the destination is written, bits
[255:128] of the corresponding YMM register are cleared.
The first source is an XMM register specified by XOP.vvvv; the second source is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source is an
XMM register specified by bits [7:4] of an immediate byte operand.
When src3 designates the same XMM register as the dest register, the XMM register behaves as an
accumulator.
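The following C sketch illustrates the multiply-add-accumulate pattern: word pairs (2i, 2i+1) are multiplied element-wise, the two products are added, and the third source doubleword is added before saturation. The name `vpmadcsswd_model` is hypothetical and element 0 is the least-significant element. The text above does not say whether the pair sum is saturated or wrapped before the third source is added; this matters only when all four words of a pair are 8000h (2^30 + 2^30 = 2^31). The sketch assumes the full sum is formed exactly in 64-bit arithmetic and saturated once.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the signed 32-bit range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX; /* 7FFF_FFFFh */
    if (v < INT32_MIN) return INT32_MIN; /* 8000_0000h */
    return (int32_t)v;
}

/* Illustrative model of VPMADCSSWD (hypothetical name, not an AMD API). */
void vpmadcsswd_model(int32_t dest[4], const int16_t src1[8],
                      const int16_t src2[8], const int32_t src3[4])
{
    for (int i = 0; i < 4; i++) {
        /* Products of the even (2i) and odd (2i+1) adjacent words. */
        int64_t even = (int64_t)src1[2 * i] * src2[2 * i];
        int64_t odd  = (int64_t)src1[2 * i + 1] * src2[2 * i + 1];
        /* Assumption: exact sum, saturated once at the end. */
        dest[i] = sat32(even + odd + src3[i]);
    }
}
```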
Instruction Support
Form         Subset   Feature Flag
VPMADCSSWD   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMADCSSWD xmm1, xmm2, xmm3/mem128, xmm4   8F    RXB.08           0.src1.0.00   A6 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD,
VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMADCSWD

Packed Multiply Add Accumulate
Signed Word to Signed Doubleword

Multiplies each packed 16-bit signed integer value of the first source by the corresponding value of
the second source, then adds the 32-bit signed integer products of the even-odd adjacent words
together and adds the sums to the corresponding packed 32-bit signed integer values of the third
source. Writes four 32-bit sums to the destination.
No saturation is performed on the sum. When the result of the addition overflows, the carry is ignored
(neither the overflow nor carry bit in rFLAGS is set). Only the low-order 32 bits of the result are
written to the destination.
There are four operands: VPMADCSWD dest, src1, src2, src3
dest = src1 * src2 + src3
The destination is an XMM register specified by ModRM.reg. When the destination is written, bits
[255:128] of the corresponding YMM register are cleared.
The first source is an XMM register specified by XOP.vvvv; the second source is either an XMM
register or a 128-bit memory location specified by the ModRM.r/m field; and the third source is an
XMM register specified by bits [7:4] of an immediate byte operand.
When src3 designates the same XMM register as the dest register, the XMM register behaves as an
accumulator.
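Because this form wraps instead of saturating, a C model can do all additions in unsigned 32-bit arithmetic: addition modulo 2^32 is associative, so the order in which the two products and the third source are added does not affect the low-order 32 bits. The sketch below is illustrative only; `vpmadcswd_model` is a hypothetical name, element 0 is the least-significant element, and the final cast to `int32_t` assumes two's-complement conversion (GCC/Clang behavior).

```c
#include <stdint.h>

/* Illustrative model of VPMADCSWD (hypothetical name, not an AMD API). */
void vpmadcswd_model(int32_t dest[4], const int16_t src1[8],
                     const int16_t src2[8], const int32_t src3[4])
{
    for (int i = 0; i < 4; i++) {
        /* Each 16 x 16-bit product fits in int32_t; reinterpret as unsigned. */
        uint32_t even = (uint32_t)((int32_t)src1[2 * i] * src2[2 * i]);
        uint32_t odd  = (uint32_t)((int32_t)src1[2 * i + 1] * src2[2 * i + 1]);
        /* Unsigned arithmetic wraps modulo 2^32: low-order 32 bits only. */
        dest[i] = (int32_t)(even + odd + (uint32_t)src3[i]);
    }
}
```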
Instruction Support
Form         Subset   Feature Flag
VPMADCSWD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                                   Encoding
                                           XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMADCSWD xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   B6 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD,
VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X = XOP exception

VPMASKMOVD

Masked Move
Packed Doubleword

Moves packed doublewords from a second source operand to a destination, as specified by mask bits
in a first source operand. There are load and store versions of the instruction.
The mask bits are the most-significant bit of each doubleword in the first source operand (mask).
• For loads, when a mask bit = 1, the corresponding doubleword is copied from the source to the
same element of the destination; when a mask bit = 0, the corresponding element of the destination
is cleared.
• For stores, when a mask bit = 1, the corresponding doubleword is copied from the source to the
same element of the destination; when a mask bit = 0, the corresponding element of the destination
is not affected.
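The load and store rules above can be summarized by a small reference model in C, which may be handy when testing code that uses this instruction. This is a sketch under stated assumptions, not an AMD-provided interface: the function names `maskmovd_load` and `maskmovd_store` are hypothetical, a vector is modeled as an array of `n` doublewords (4 for the XMM form, 8 for the YMM form), and the implementation-dependent fault and breakpoint behavior for unselected elements is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of VPMASKMOVD (not an AMD-provided API).
 * n is the element count: 4 for the XMM form, 8 for the YMM form.
 * The mask bit of element i is bit 31 of mask[i]. */

/* Load form: dest[i] = mem[i] if the mask bit is 1, otherwise 0.
 * In this model mem[i] is only read for selected elements. */
void maskmovd_load(uint32_t *dest, const uint32_t *mask,
                   const uint32_t *mem, size_t n)
{
    assert(n == 4 || n == 8);
    for (size_t i = 0; i < n; i++)
        dest[i] = (mask[i] >> 31) ? mem[i] : 0;
}

/* Store form: mem[i] = src[i] if the mask bit is 1; otherwise mem[i]
 * keeps its previous contents. */
void maskmovd_store(uint32_t *mem, const uint32_t *mask,
                    const uint32_t *src, size_t n)
{
    assert(n == 4 || n == 8);
    for (size_t i = 0; i < n; i++)
        if (mask[i] >> 31)
            mem[i] = src[i];
}
```

For the XMM load form, bits [255:128] of the destination YMM register would also be cleared; the array model does not represent the upper half of the register.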
Exception and trap behavior for elements not selected for loading or storing from/to memory is
implementation dependent. For instance, a given implementation may signal a data breakpoint or a
page fault for doublewords that are zero-masked and not actually written.
This instruction provides no non-temporal access hint.
This instruction has both 128-bit and 256-bit forms:
XMM Encoding

There are load and store encodings.
• For loads, the four doublewords that make up the source operand are located in a 128-bit memory
location, the mask operand is an XMM register, and the destination is an XMM register. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
• For stores, the four doublewords that make up the source operand are located in an XMM register,
the mask operand is an XMM register, and the destination is a 128-bit memory location.
YMM Encoding

There are load and store encodings.
• For loads, the eight doublewords that make up the source operand are located in a 256-bit memory
location, the mask operand is a YMM register, and the destination is a YMM register.
• For stores, the eight doublewords that make up the source operand are located in a YMM register,
the mask operand is a YMM register, and the destination is a 256-bit memory location.
Instruction Support
Form        Subset  Feature Flag
VPMASKMOVD  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp   Opcode
Loads:
VPMASKMOVD xmm1, xmm2, mem128   C4   RXB.02          0.src1.0.01   8C /r
VPMASKMOVD ymm1, ymm2, mem256   C4   RXB.02          0.src1.1.01   8C /r
Stores:
VPMASKMOVD mem128, xmm1, xmm2   C4   RXB.02          0.src1.0.01   8E /r
VPMASKMOVD mem256, ymm1, ymm2   C4   RXB.02          0.src1.1.01   8E /r

Related Instructions
VPMASKMOVQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A     Instruction execution caused a page fault.
A — AVX2 exception

VPMASKMOVQ

Masked Move Packed Quadword

Moves packed quadwords from a second source operand to a destination, as specified by mask bits in
a first source operand. There are load and store versions of the instruction.
The mask bits are the most-significant bit of each quadword in the first source operand (mask).
• For loads, when a mask bit = 1, the corresponding quadword is copied from the source to the same
element of the destination; when a mask bit = 0, the corresponding element of the destination is
cleared.
• For stores, when a mask bit = 1, the corresponding quadword is copied from the source to the same
element of the destination; when a mask bit = 0, the corresponding element of the destination is not
affected.
Exception and trap behavior for elements not selected for loading or storing from/to memory is
implementation dependent. For instance, a given implementation may signal a data breakpoint or a
page fault for quadwords that are zero-masked and not actually written.
This instruction provides no non-temporal access hint.
This instruction has both 128-bit and 256-bit forms:
XMM Encoding

There are load and store encodings.
• For loads, the two quadwords that make up the source operand are located in a 128-bit memory
location, the mask operand is an XMM register, and the destination is an XMM register. Bits
[255:128] of the YMM register that corresponds to the destination are cleared.
• For stores, the two quadwords that make up the source operand are located in an XMM register, the
mask operand is an XMM register, and the destination is a 128-bit memory location.
YMM Encoding

There are load and store encodings.
• For loads, the four quadwords that make up the source operand are located in a 256-bit memory
location, the mask operand is a YMM register, and the destination is a YMM register.
• For stores, the four quadwords that make up the source operand are located in a YMM register, the
mask operand is a YMM register, and the destination is a 256-bit memory location.
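A typical use of the store form is writing the final, partial vector of an array without modifying memory past its end. The following sketch models the YMM store in plain C rather than with compiler intrinsics; `tail_mask_q`, `maskmovq_store`, and `copy_q` are hypothetical names, and the mask for each chunk sets bit 63 only in the first `remaining` quadwords. The model cannot show it, but this section notes that an implementation may still signal a page fault or data breakpoint for masked-off quadwords, so a tail that ends just before an unmapped page is not guaranteed to be safe on every implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QWORDS_PER_YMM 4

/* Hypothetical helper: select (bit 63 = 1) the first `remaining` elements. */
void tail_mask_q(uint64_t mask[QWORDS_PER_YMM], size_t remaining)
{
    assert(remaining <= QWORDS_PER_YMM);
    for (size_t i = 0; i < QWORDS_PER_YMM; i++)
        mask[i] = (i < remaining) ? UINT64_C(0x8000000000000000) : 0;
}

/* Model of VPMASKMOVQ mem256, ymm1, ymm2: quadword i of src is written only
 * when bit 63 of mask[i] is 1; unselected memory is left unchanged, and in
 * this model src[i] is not read for unselected elements either. */
void maskmovq_store(uint64_t *mem, const uint64_t mask[QWORDS_PER_YMM],
                    const uint64_t *src)
{
    for (size_t i = 0; i < QWORDS_PER_YMM; i++)
        if (mask[i] >> 63)
            mem[i] = src[i];
}

/* Copy n quadwords in chunks of four; the last chunk uses a partial mask. */
void copy_q(uint64_t *dst, const uint64_t *src, size_t n)
{
    uint64_t mask[QWORDS_PER_YMM];
    for (size_t i = 0; i < n; i += QWORDS_PER_YMM) {
        size_t left = n - i;
        tail_mask_q(mask, left < QWORDS_PER_YMM ? left : QWORDS_PER_YMM);
        maskmovq_store(dst + i, mask, src + i);
    }
}
```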
Instruction Support
Form        Subset  Feature Flag
VPMASKMOVQ  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp   Opcode
Loads:
VPMASKMOVQ xmm1, xmm2, mem128   C4   RXB.02          1.src1.0.01   8C /r
VPMASKMOVQ ymm1, ymm2, mem256   C4   RXB.02          1.src1.1.01   8C /r
Stores:
VPMASKMOVQ mem128, xmm1, xmm2   C4   RXB.02          1.src1.0.01   8E /r
VPMASKMOVQ mem256, ymm1, ymm2   C4   RXB.02          1.src1.1.01   8E /r

Related Instructions
VPMASKMOVD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Alignment checking enabled and:
                                              256-bit memory operand not 32-byte aligned or
                                              128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A     Instruction execution caused a page fault.
A — AVX2 exception

VPPERM

Packed Permute Bytes

Selects 16 of 32 packed bytes from two concatenated sources, applies a logical transformation to each
selected byte, then writes the byte to a specified position in the destination.
There are four operands: VPPERM dest, src1, src2, src3
The second (src2) and first (src1) sources are concatenated to form the 32-byte source.
The src1 operand is an XMM register specified by XOP.vvvv.
The third source (src3) contains 16 control bytes. Each control byte specifies the source byte and the
logical operation to perform on that byte. The order of the bytes in the destination is the same as that
of the control bytes in the src3.
For each byte of the 16-byte result, the corresponding src3 byte is used as follows:
• Bits [7:5] select a logical operation to perform on the selected byte.
  Bit Value  Selected Operation
  000        Source byte (no logical operation)
  001        Invert source byte
  010        Bit reverse of source byte
  011        Bit reverse of inverted source byte
  100        00h (zero-fill)
  101        FFh (ones-fill)
  110        Most significant bit of source byte replicated in all bit positions.
  111        Invert most significant bit of source byte and replicate in all bit positions.
• Bits [4:0] select a source byte to move from src2:src1.
  Bit Value  Source Byte    Bit Value  Source Byte    Bit Value  Source Byte    Bit Value  Source Byte
  00000      src1[7:0]      01000      src1[71:64]    10000      src2[7:0]      11000      src2[71:64]
  00001      src1[15:8]     01001      src1[79:72]    10001      src2[15:8]     11001      src2[79:72]
  00010      src1[23:16]    01010      src1[87:80]    10010      src2[23:16]    11010      src2[87:80]
  00011      src1[31:24]    01011      src1[95:88]    10011      src2[31:24]    11011      src2[95:88]
  00100      src1[39:32]    01100      src1[103:96]   10100      src2[39:32]    11100      src2[103:96]
  00101      src1[47:40]    01101      src1[111:104]  10101      src2[47:40]    11101      src2[111:104]
  00110      src1[55:48]    01110      src1[119:112]  10110      src2[55:48]    11110      src2[119:112]
  00111      src1[63:56]    01111      src1[127:120]  10111      src2[63:56]    11111      src2[127:120]

XOP.W and an immediate byte (imm8) determine register configuration.
• When XOP.W = 0, src2 is either an XMM register or a 128-bit memory location specified by
ModRM.r/m and src3 is an XMM register specified by imm8[7:4].
• When XOP.W = 1, src2 is an XMM register specified by imm8[7:4] and src3 is either an XMM
register or a 128-bit memory location specified by ModRM.r/m.
The destination (dest) is an XMM register specified by ModRM.reg. When the result is written to the
dest XMM register, bits [255:128] of the corresponding YMM register are cleared.
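The per-byte selection and transformation can be written out as a reference model. In the sketch below, `vpperm_model` is a hypothetical name, operands are arrays of 16 bytes with index 0 holding bits [7:0], and the result is built in a temporary so that `dest` may alias any source, as a register destination may in hardware. Which register or memory operand supplies src2 and src3 (the XOP.W choice) does not affect the result and is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the bit order of a byte (bit 0 <-> bit 7, bit 1 <-> bit 6, ...). */
static uint8_t bit_reverse8(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            r |= (uint8_t)(0x80u >> i);
    return r;
}

/* Hypothetical reference model of VPPERM dest, src1, src2, src3.
 * Byte 0 of each array holds bits [7:0] of the 128-bit operand. */
void vpperm_model(uint8_t dest[16], const uint8_t src1[16],
                  const uint8_t src2[16], const uint8_t src3[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++) {
        uint8_t ctl = src3[i];
        unsigned sel = ctl & 0x1Fu;     /* bits [4:0]: byte index into src2:src1 */
        uint8_t b = (sel < 16) ? src1[sel] : src2[sel - 16];
        uint8_t msb_fill = (b & 0x80u) ? 0xFF : 0x00;
        switch (ctl >> 5) {             /* bits [7:5]: logical operation */
        case 0: tmp[i] = b; break;                          /* source byte */
        case 1: tmp[i] = (uint8_t)~b; break;                /* inverted */
        case 2: tmp[i] = bit_reverse8(b); break;            /* bit reversed */
        case 3: tmp[i] = bit_reverse8((uint8_t)~b); break;  /* inverted, then reversed */
        case 4: tmp[i] = 0x00; break;                       /* zero-fill */
        case 5: tmp[i] = 0xFF; break;                       /* ones-fill */
        case 6: tmp[i] = msb_fill; break;                   /* MSB replicated */
        default: tmp[i] = (uint8_t)~msb_fill; break;        /* inverted MSB replicated */
        }
    }
    memcpy(dest, tmp, sizeof tmp);      /* dest may alias any source */
}
```

For example, setting control byte i to 15 - i reverses the byte order of src1, and setting bits [7:5] to 100b or 101b writes a constant regardless of bits [4:0].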
Instruction Support
Form        Subset  Feature Flag
VPPERM      XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                               XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPPERM xmm1, xmm2, xmm3/mem128, xmm4   8F   RXB.08          0.src1.0.00   A3 /r ib
VPPERM xmm1, xmm2, xmm3, xmm4/mem128   8F   RXB.08          1.src1.0.00   A3 /r ib
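As an illustration of the encoding columns, the sketch below assembles the bytes of the register-only XOP.W = 0 form of VPPERM. It assumes the inverted-field convention used by three-byte VEX and XOP prefixes, in which R, X, B, and vvvv are stored in one's complement form; that convention is defined outside this section. It also assumes mod = 11b in ModRM for a register operand and places the full 4-bit src3 register number in imm8[7:4]. `encode_vpperm_w0` is a hypothetical helper, not part of any assembler API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for the register-only XOP.W = 0 form
 * VPPERM dest, src1, src2, src3 (registers xmm0 through xmm15). Writes 6 bytes. */
size_t encode_vpperm_w0(uint8_t out[6], unsigned dest, unsigned src1,
                        unsigned src2, unsigned src3)
{
    assert(dest < 16 && src1 < 16 && src2 < 16 && src3 < 16);
    unsigned r = (dest >> 3) & 1u;  /* extends ModRM.reg (dest) */
    unsigned x = 0;                 /* no index register */
    unsigned b = (src2 >> 3) & 1u;  /* extends ModRM.r/m (src2) */

    out[0] = 0x8F;                                  /* XOP escape byte */
    out[1] = (uint8_t)(((~r & 1u) << 7) | ((~x & 1u) << 6) | ((~b & 1u) << 5)
                       | 0x08u);                    /* inverted RXB, map_select 08h */
    out[2] = (uint8_t)((0u << 7)                    /* W = 0 */
                       | ((~src1 & 0xFu) << 3)      /* vvvv = one's complement of src1 */
                       | (0u << 2)                  /* L = 0 (128-bit) */
                       | 0x0u);                     /* pp = 00b */
    out[3] = 0xA3;                                  /* opcode */
    out[4] = (uint8_t)(0xC0u | ((dest & 7u) << 3) | (src2 & 7u)); /* ModRM, mod = 11b */
    out[5] = (uint8_t)((src3 & 0xFu) << 4);         /* imm8[7:4] = src3 */
    return 6;
}
```

Under these assumptions, VPPERM xmm0, xmm1, xmm2, xmm3 encodes as 8F E8 70 A3 C2 30: E8h is the inverted RXB bits (all zero before inversion) with map_select 08h, and 70h is W = 0, vvvv = ~0001b, L = 0, pp = 00b.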

Related Instructions
VPSHUFHW, VPSHUFD, VPSHUFLW, VPSHUFW, VPERMIL2PS, VPERMIL2PD
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

VPROTB

Packed Rotate Bytes

Rotates each byte of the source as specified by a count operand and writes the result to the corresponding byte of the destination.
There are two versions of the instruction, one for each source of the count byte:
• VPROTB dest, src, fixed-count
• VPROTB dest, src, variable-count
For both versions of the instruction, the destination (dest) operand is an XMM register specified by
ModRM.reg.
The fixed-count version of the instruction rotates each byte of the source (src) by the number of bits specified by the immediate fixed-count byte. All bytes are rotated by the same amount. The source XMM
register or memory location is selected by the ModRM.r/m field.
The variable-count version of the instruction rotates each byte of the source by the amount specified in
the corresponding byte element of the variable-count operand. Both src and variable-count are configured by
XOP.W.
• When XOP.W = 0, variable-count is an XMM register specified by XOP.vvvv and src is either an
XMM register or a 128-bit memory location specified by ModRM.r/m.
• When XOP.W = 1, variable-count is either an XMM register or a 128-bit memory location
specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.
When the count value is positive, bits are rotated to the left (toward the more significant bit positions). The bits rotated out to the left of the most significant bit are rotated back in at the right end (least-significant bit) of the byte.
When the count value is negative, bits are rotated to the right (toward the least significant bit positions). The bits rotated to the right out of the least significant bit are rotated back in at the left end
(most-significant bit) of the byte.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
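The rotate rules can be captured in a few lines of C. In this sketch the names `rotl8`, `vprotb_var`, and `vprotb_imm` are hypothetical, and it assumes that a rotation wraps modulo the 8-bit element width, so that a negative count of -k (a rotate right by k) is the same as a rotate left by `count & 7`, the low three bits of the two's-complement count.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit value left by (count & 7). For a negative two's-complement
 * count -k, (count & 7) equals (8 - k) mod 8, i.e. a rotate right by k. */
static uint8_t rotl8(uint8_t v, int8_t count)
{
    unsigned n = (unsigned)count & 7u;
    return n ? (uint8_t)((v << n) | (v >> (8u - n))) : v;
}

/* Hypothetical model of VPROTB dest, src, variable-count. */
void vprotb_var(uint8_t dest[16], const uint8_t src[16], const int8_t count[16])
{
    for (int i = 0; i < 16; i++)
        dest[i] = rotl8(src[i], count[i]);
}

/* Hypothetical model of VPROTB dest, src, imm8: one count for all bytes. */
void vprotb_imm(uint8_t dest[16], const uint8_t src[16], int8_t imm8)
{
    for (int i = 0; i < 16; i++)
        dest[i] = rotl8(src[i], imm8);
}
```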
Instruction Support
Form        Subset  Feature Flag
VPROTB      XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPROTB xmm1, xmm2/mem128, xmm3    8F   RXB.09          0.count.0.00  90 /r
VPROTB xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    90 /r
VPROTB xmm1, xmm2/mem128, imm8    8F   RXB.08          0.1111.0.00   C0 /r ib

Related Instructions
VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW,
VPSHAD, VPSHAQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

VPROTD

Packed Rotate Doublewords

Rotates each doubleword of the source as specified by a count operand and writes the result to the
corresponding doubleword of the destination.
There are two versions of the instruction, one for each source of the count byte:
• VPROTD dest, src, fixed-count
• VPROTD dest, src, variable-count
For both versions of the instruction, the dest operand is an XMM register specified by ModRM.reg.
The fixed-count version of the instruction rotates each doubleword of the source operand by the number
of bits specified by the immediate fixed-count byte operand. All doublewords are rotated by the same
amount. The src XMM register or memory location is selected by the ModRM.r/m field.
The variable count version of the instruction rotates each doubleword of the source by the amount
specified in the low order byte of the corresponding doubleword of the variable-count operand vector.
Both src and variable-count are configured by XOP.W.
• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by the
ModRM.r/m field and variable-count is an XMM register specified by XOP.vvvv.
• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an
XMM register or a 128-bit memory location specified by the ModRM.r/m field.
When the count value is positive, bits are rotated to the left (toward the more significant bit positions). The bits rotated out to the left of the most significant bit of each source doubleword operand
are rotated back in at the right end (least-significant bit) of the doubleword.
When the count value is negative, bits are rotated to the right (toward the least significant bit positions). The bits rotated to the right out of the least significant bit of each source doubleword operand
are rotated back in at the left end (most-significant bit) of the doubleword.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
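The following sketch models both forms for doublewords. It assumes rotation wraps modulo 32, which lets the low five bits of the count byte serve directly as a left-rotate amount: FFh (-1) becomes 31, and a rotate left by 31 equals a rotate right by 1. Only the low-order byte of each count doubleword is used, as the text above specifies. The names `rot32`, `vprotd_var`, and `vprotd_imm` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by the low five bits of a count byte. Assuming rotation wraps
 * modulo 32, a two's-complement count of -k yields 32 - k, i.e. a rotate
 * right by k. */
static uint32_t rot32(uint32_t v, uint8_t count_byte)
{
    unsigned n = count_byte & 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* Hypothetical model of VPROTD dest, src, variable-count: only the
 * low-order byte of each count doubleword is used. */
void vprotd_var(uint32_t dest[4], const uint32_t src[4], const uint32_t count[4])
{
    for (int i = 0; i < 4; i++)
        dest[i] = rot32(src[i], (uint8_t)(count[i] & 0xFFu));
}

/* Hypothetical model of VPROTD dest, src, imm8. */
void vprotd_imm(uint32_t dest[4], const uint32_t src[4], uint8_t imm8)
{
    for (int i = 0; i < 4; i++)
        dest[i] = rot32(src[i], imm8);
}
```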
Instruction Support
Form        Subset  Feature Flag
VPROTD      XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPROTD xmm1, xmm2/mem128, xmm3    8F   RXB.09          0.count.0.00  92 /r
VPROTD xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    92 /r
VPROTD xmm1, xmm2/mem128, imm8    8F   RXB.08          0.1111.0.00   C2 /r ib

Related Instructions
VPROTB, VPROTW, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW,
VPSHAD, VPSHAQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

VPROTQ

Packed Rotate Quadwords

Rotates each quadword of the source operand as specified by a count operand and writes the result to
the corresponding quadword of the destination.
There are two versions of the instruction, one for each source of the count byte:
• VPROTQ dest, src, fixed-count
• VPROTQ dest, src, variable-count
For both versions of the instruction, the dest operand is an XMM register specified by ModRM.reg.
The fixed-count version of the instruction rotates each quadword in the source by the number of bits
specified by the immediate fixed-count byte operand. All quadword elements of the source are rotated
by the same amount. The src XMM register or memory location is selected by the ModRM.r/m field.
The variable-count version of the instruction rotates each quadword of the source by the amount specified by the low-order byte of the corresponding quadword of the variable-count operand.
Both src and variable-count are configured by XOP.W.
• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by
ModRM.r/m and variable-count is an XMM register specified by XOP.vvvv.
• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an
XMM register or a 128-bit memory location specified by ModRM.r/m.
When the count value is positive, bits are rotated to the left (toward the more significant bit positions)
of the operand element. The bits rotated out to the left of the most significant bit of the quadword element
are rotated back in at the right end (least-significant bit).
When the count value is negative, operand element bits are rotated to the right (toward the least significant bit positions). The bits rotated to the right out of the least significant bit are rotated back in at
the left end (most-significant bit) of the quadword element.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
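For quadwords, a short sketch shows that only the low-order byte of each count quadword matters and that rotating by a count and then by its negation restores the original value. Wrapping modulo the element width (64 here) is an assumption of the sketch, and `rot64` and `vprotq_var` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by the low six bits of a count byte, assuming the rotation
 * wraps modulo 64; a two's-complement count of -k becomes 64 - k, which is
 * a rotate right by k. */
static uint64_t rot64(uint64_t v, uint8_t count_byte)
{
    unsigned n = count_byte & 63u;
    return n ? (v << n) | (v >> (64u - n)) : v;
}

/* Hypothetical model of VPROTQ dest, src, variable-count: the count for each
 * element is the low-order byte of the corresponding count quadword. */
void vprotq_var(uint64_t dest[2], const uint64_t src[2], const uint64_t count[2])
{
    for (int i = 0; i < 2; i++)
        dest[i] = rot64(src[i], (uint8_t)(count[i] & 0xFFu));
}
```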
Instruction Support
Form        Subset  Feature Flag
VPROTQ      XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPROTQ xmm1, xmm2/mem128, xmm3    8F   RXB.09          0.count.0.00  93 /r
VPROTQ xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    93 /r
VPROTQ xmm1, xmm2/mem128, imm8    8F   RXB.08          0.1111.0.00   C3 /r ib

Related Instructions
VPROTB, VPROTW, VPROTD, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW,
VPSHAD, VPSHAQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

VPROTW

Packed Rotate Words

Rotates each word of the source as specified by a count operand and writes the result to the corresponding word of the destination.
There are two versions of the instruction, one for each source of the count byte:
• VPROTW dest, src, fixed-count
• VPROTW dest, src, variable-count
For both versions of the instruction, the dest operand is an XMM register specified by ModRM.reg.
The fixed-count version of the instruction rotates each word of the source by the number of bits specified
by the immediate fixed-count byte operand. All words of the source operand are rotated by the same
amount. The src XMM register or memory location is selected by the ModRM.r/m field.
The variable count version of this instruction rotates each word of the source operand by the amount
specified in the low order byte of the corresponding word of the variable-count operand.
Both src and variable-count are configured by XOP.W.
• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by
ModRM.r/m and variable-count is an XMM register specified by XOP.vvvv.
• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an
XMM register or a 128-bit memory location specified by ModRM.r/m.
When the count value is positive, bits are rotated to the left (toward the more significant bit positions). The bits rotated out to the left of the most significant bit of an element are rotated back in at the
right end (least-significant bit) of the word element.
When the count value is negative, bits are rotated to the right (toward the least significant bit positions) of the element. The bits rotated to the right out of the least significant bit of an element are
rotated back in at the left end (most-significant bit) of the word element.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
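A word-sized reference model follows. It is a sketch with hypothetical names (`rot16`, `vprotw_var`) that assumes rotation wraps modulo 16, so the low four bits of each count's low-order byte give the equivalent left rotation; the upper byte of each count word is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 16-bit word left by the low four bits of a count byte, assuming
 * wrap modulo 16 (a two's-complement count of -k rotates right by k). */
static uint16_t rot16(uint16_t v, uint8_t count_byte)
{
    unsigned n = count_byte & 15u;
    return n ? (uint16_t)((v << n) | (v >> (16u - n))) : v;
}

/* Hypothetical model of VPROTW dest, src, variable-count (8 words). */
void vprotw_var(uint16_t dest[8], const uint16_t src[8], const uint16_t count[8])
{
    for (int i = 0; i < 8; i++)
        dest[i] = rot16(src[i], (uint8_t)(count[i] & 0xFFu));
}
```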
Instruction Support
Form        Subset  Feature Flag
VPROTW      XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPROTW xmm1, xmm2/mem128, xmm3    8F   RXB.09          0.count.0.00  91 /r
VPROTW xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    91 /r
VPROTW xmm1, xmm2/mem128, imm8    8F   RXB.08          0.1111.0.00   C1 /r ib

Related Instructions
VPROTB, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW,
VPSHAD, VPSHAQ
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

VPSHAB

Packed Shift Arithmetic Bytes

Shifts each signed byte of the source as specified by a count byte and writes the result to the corresponding byte of the destination.
The count bytes are 8-bit signed two's-complement values in the corresponding bytes of the count
operand.
When the count value is positive, bits are shifted to the left (toward the more significant bit positions).
Zeros are shifted in at the right end (least-significant bit) of the byte.
When the count value is negative, bits are shifted to the right (toward the least significant bit positions). The most significant bit (sign bit) is replicated and shifted in at the left end (most-significant
bit) of the byte.
There are three operands: VPSHAB dest, src, count
The destination (dest) is an XMM register specified by ModRM.reg.
Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM
register or a 128-bit memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a 128-bit memory location specified by
ModRM.r/m and src is an XMM register specified by XOP.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
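The shift rules can be modeled per byte in C. This sketch, with hypothetical names `sha8` and `vpshab`, only handles counts from -7 through 7; this section does not describe the result for larger magnitudes, so the model rejects them with an assertion. The left shift is done on an unsigned value, and the right shift is written so that it does not depend on the implementation-defined behavior of C's `>>` on negative numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift of a signed byte by a signed count (model assumes
 * -7 <= count <= 7). Conversions back to int8_t assume the usual
 * two's-complement wraparound. */
static int8_t sha8(int8_t v, int8_t count)
{
    assert(count >= -7 && count <= 7);
    if (count >= 0)   /* left shift: zeros enter at bit 0, high bits are lost */
        return (int8_t)(uint8_t)((uint8_t)v << count);
    int k = -count;
    /* right shift replicating the sign bit, without >> on a negative value */
    return (int8_t)(v < 0 ? ~(~v >> k) : v >> k);
}

/* Hypothetical model of VPSHAB dest, src, count (16 bytes). */
void vpshab(int8_t dest[16], const int8_t src[16], const int8_t count[16])
{
    for (int i = 0; i < 16; i++)
        dest[i] = sha8(src[i], count[i]);
}
```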
Instruction Support
Form        Subset  Feature Flag
VPSHAB      XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPSHAB xmm1, xmm2/mem128, xmm3    8F   RXB.09          0.count.0.00  98 /r
VPSHAB xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    98 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAW,
VPSHAD, VPSHAQ
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

VPSHAD

Packed Shift Arithmetic Doublewords

Shifts each signed doubleword of the source operand as specified by a count byte and writes the result
to the corresponding doubleword of the destination.
The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding doubleword of the count operand.
When the count value is positive, bits are shifted to the left (toward the more significant bit positions).
Zeros are shifted in at the right end (least-significant bit) of the doubleword.
When the count value is negative, bits are shifted to the right (toward the least significant bit positions). The most significant bit (sign bit) is replicated and shifted in at the left end (most-significant
bit) of the doubleword.
There are three operands: VPSHAD dest, src, count
The destination (dest) is an XMM register specified by ModRM.reg.
Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM
register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by
ModRM.r/m and src is an XMM register specified by XOP.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
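A doubleword model needs one extra step: extracting the signed count from the low-order byte of each count doubleword. The sketch below uses hypothetical names (`sha32`, `vpshad`) and limits counts to -31 through 31, because larger magnitudes are not described in this section; it also shows that the upper three bytes of a count element have no effect.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift of a signed doubleword by the signed count held in the
 * low byte of the corresponding count doubleword (model assumes |count| <= 31). */
static int32_t sha32(int32_t v, uint32_t count_dword)
{
    int c = (int)(count_dword & 0xFFu);
    if (c > 127)
        c -= 256;                               /* low byte as 8-bit two's complement */
    assert(c >= -31 && c <= 31);
    if (c >= 0)
        return (int32_t)((uint32_t)v << c);     /* zeros enter at bit 0 */
    return v < 0 ? ~(~v >> -c) : v >> -c;       /* sign bit replicated */
}

/* Hypothetical model of VPSHAD dest, src, count (4 doublewords). */
void vpshad(int32_t dest[4], const int32_t src[4], const uint32_t count[4])
{
    for (int i = 0; i < 4; i++)
        dest[i] = sha32(src[i], count[i]);
}
```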
Instruction Support
Form        Subset  Feature Flag
VPSHAD      XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPSHAD xmm1, xmm2/mem128, xmm3    8F   RXB.09          0.count.0.00  9A /r
VPSHAD xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    9A /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB,
VPSHAW, VPSHAQ
rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

VPSHAQ

Packed Shift Arithmetic Quadwords

Shifts each signed quadword of the source as specified by a count byte and writes the result to the corresponding quadword of the destination.
The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding quadword element of the count operand.
When the count value is positive, bits are shifted to the left (toward the more significant bit positions).
Zeros are shifted in at the right end (least-significant bit) of the quadword.
When the count value is negative, bits are shifted to the right (toward the least significant bit positions). The most significant bit (sign bit) is replicated and shifted in at the left end (most-significant bit) of the quadword.
The shift amount is stored in two's-complement form, and the count is interpreted modulo 64.
There are three operands: VPSHAQ dest, src, count
The destination (dest) is an XMM register specified by ModRM.reg.
The roles of src and count are selected by XOP.W:
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv, and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m, and src is an XMM register specified by XOP.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
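The per-element behavior described above can be modeled in a few lines of Python. This is a minimal sketch, not AMD's reference pseudocode. Each 128-bit operand is represented as a list of two unsigned 64-bit lane values, and the helper names `sign_extend` and `vpshaq` are hypothetical. The sketch also assumes that "modulo 64" applies to the magnitude of the signed count.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def vpshaq(src, count):
    """Model VPSHAQ on two quadword lanes (unsigned 64-bit integers)."""
    result = []
    for s, c in zip(src, count):
        n = sign_extend(c, 8)          # only the low byte of each count lane is used
        value = sign_extend(s, 64)     # treat the source lane as signed
        if n >= 0:
            shifted = value << (n % 64)    # zeros enter at bit 0
        else:
            shifted = value >> (-n % 64)   # Python's >> replicates the sign bit
        result.append(shifted & MASK64)
    return result


# A count byte of 0xFF (-1) shifts right by one and copies the sign bit in.
print([hex(x) for x in vpshaq([0x8000_0000_0000_0000, 1], [0xFF, 4])])
# prints ['0xc000000000000000', '0x10']
```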
Instruction Support
Form      Subset  Feature Flag
VPSHAQ    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPSHAQ xmm1, xmm2/mem128, xmm3    8F   RXB.09          0.count.0.00  9B /r
VPSHAQ xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    9B /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPSHAW

Packed Shift Arithmetic Words

Shifts each signed word of the source as specified by a count byte and writes the result to the corresponding word of the destination.
The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding word of the count operand.
When the count value is positive, bits are shifted to the left (toward the more significant bit positions).
Zeros are shifted in at the right end (least-significant bit) of the word.
When the count value is negative, bits are shifted to the right (toward the least significant bit positions). The most significant bit (sign bit) is replicated and shifted in at the left end (most-significant bit) of the word.
The shift amount is stored in two's-complement form, and the count is interpreted modulo 16.
There are three operands: VPSHAW dest, src, count
The destination (dest) is an XMM register specified by ModRM.reg.
The roles of src and count are selected by XOP.W:
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv, and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m, and src is an XMM register specified by XOP.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form      Subset  Feature Flag
VPSHAW    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPSHAW xmm1, xmm2/mem128, xmm3    8F   RXB.09          0.count.0.00  99 /r
VPSHAW xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    99 /r
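The following sketch makes the XOP.W operand-role swap concrete. It assembles the register-only form of VPSHAW from the fields in the encoding table. It assumes the usual XOP conventions, which this page does not spell out: the R, X, and B bits and the vvvv field are stored inverted (one's complement), and ModRM.mod = 11b selects a register operand. The function name `encode_vpshaw` is hypothetical.

```python
def encode_vpshaw(dest, src, count, w):
    """Encode register-only VPSHAW dest, src, count (XMM numbers 0-15).

    w = 0: count goes in XOP.vvvv and src in ModRM.r/m.
    w = 1: src goes in XOP.vvvv and count in ModRM.r/m.
    """
    vvvv_reg, rm_reg = (count, src) if w == 0 else (src, count)
    r = (dest >> 3) & 1                # high bit of the ModRM.reg register
    b = (rm_reg >> 3) & 1              # high bit of the ModRM.r/m register
    # XOP byte 1: inverted R, X, B, then map_select = 09h.
    byte1 = ((r ^ 1) << 7) | (1 << 6) | ((b ^ 1) << 5) | 0x09
    # XOP byte 2: W, inverted vvvv, L = 0 (128-bit only), pp = 00.
    byte2 = (w << 7) | ((~vvvv_reg & 0xF) << 3)
    modrm = 0xC0 | ((dest & 7) << 3) | (rm_reg & 7)
    return bytes([0x8F, byte1, byte2, 0x99, modrm])


# Both encodings of VPSHAW xmm1, xmm2, xmm3 (count in xmm3).
print(encode_vpshaw(1, 2, 3, w=0).hex(" "))
print(encode_vpshaw(1, 2, 3, w=1).hex(" "))
```

Under these assumptions the two byte sequences should perform the same operation. Only the placement of src and count differs, and that is what allows either operand to come from memory.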

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAD, VPSHAQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPSHLB

Packed Shift Logical Bytes

Shifts each packed byte of the source as specified by a count byte and writes the result to the corresponding byte of the destination.
The count bytes are 8-bit signed two's-complement values located in the corresponding byte element of the count operand.
When the count value is positive, bits are shifted to the left (toward the more significant bit positions).
Zeros are shifted in at the right end (least-significant bit) of the byte.
When the count value is negative, bits are shifted to the right (toward the least significant bit positions). Zeros are shifted in at the left end (most-significant bit) of the byte.
There are three operands: VPSHLB dest, src, count
The destination (dest) is an XMM register specified by ModRM.reg.
The roles of src and count are selected by XOP.W:
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv, and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m, and src is an XMM register specified by XOP.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
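Here is a minimal Python model of the byte-wise logical shift, for illustration only. Operands are lists of sixteen unsigned byte values, and `vpshlb` is a hypothetical helper name. This page does not say how VPSHLB handles count magnitudes of 8 or more. The sketch assumes the magnitude is taken modulo 8, by analogy with the modulo rules stated for the wider element sizes.

```python
def vpshlb(src, count):
    """Model VPSHLB: src and count are lists of 16 unsigned bytes."""
    out = []
    for s, c in zip(src, count):
        n = c - 256 if c & 0x80 else c    # signed count byte
        if n >= 0:
            out.append((s << (n % 8)) & 0xFF)   # zeros enter at bit 0
        else:
            out.append(s >> (-n % 8))            # zeros enter at bit 7 (logical)
    return out


# Shifting 0x80 right by one gives 0x40 here; an arithmetic shift (VPSHAB) would give 0xC0.
print([hex(b) for b in vpshlb([0x80, 0x01] + [0] * 14, [0xFF, 0x03] + [0] * 14)[:2]])
# prints ['0x40', '0x8']
```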
Instruction Support
Form      Subset  Feature Flag
VPSHLB    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPSHLB xmm1, xmm2/mem128, xmm3    8F   RXB.09          0.count.0.00  94 /r
VPSHLB xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    94 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPSHLD

Packed Shift Logical Doublewords

Shifts each doubleword of the source operand as specified by a count byte and writes the result to the corresponding doubleword of the destination.
The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding doubleword element of the count operand.
When the count value is positive, bits are shifted to the left (toward the more significant bit positions).
Zeros are shifted in at the right end (least-significant bit) of the doubleword.
When the count value is negative, bits are shifted to the right (toward the least significant bit positions). Zeros are shifted in at the left end (most-significant bit) of the doubleword.
The shift amount is stored in two's-complement form, and the count is interpreted modulo 32.
There are three operands: VPSHLD dest, src, count
The destination (dest) is an XMM register specified by ModRM.reg.
The roles of src and count are selected by XOP.W:
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv, and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m, and src is an XMM register specified by XOP.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
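The sketch below models VPSHLD on four doubleword lanes, each held as an unsigned 32-bit Python integer. Only the low byte of each count lane is consulted, and the count is reduced modulo 32 as stated above. Applying that rule to the magnitude of a negative count is an assumption of this sketch. The helper name `vpshld` is hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def vpshld(src, count):
    """Model VPSHLD on four doubleword lanes (unsigned 32-bit integers)."""
    out = []
    for s, c in zip(src, count):
        b = c & 0xFF                       # count byte = low byte of the lane
        n = b - 256 if b & 0x80 else b     # sign-extend the count byte
        if n >= 0:
            out.append((s << (n % 32)) & MASK32)
        else:
            out.append((s & MASK32) >> (-n % 32))
    return out


# Count 33 behaves like 1; 0xE1 (-31) moves bit 31 down to bit 0;
# only the low byte of 0x1234_05FC (0xFC, i.e. -4) is used.
print([hex(x) for x in vpshld([1, 0x8000_0000, 0xF0, 5], [33, 0xE1, 0x1234_05FC, 0])])
# prints ['0x2', '0x1', '0xf', '0x5']
```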
Instruction Support
Form      Subset  Feature Flag
VPSHLD    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPSHLD xmm1, xmm3/mem128, xmm2    8F   RXB.09          0.count.0.00  96 /r
VPSHLD xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    96 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPSHLQ

Packed Shift Logical Quadwords

Shifts each quadword of the source as specified by a count byte and writes the result to the corresponding quadword of the destination.
The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding quadword element of the count operand.
Bit 6 of the count byte is ignored.
When the count value is positive, bits are shifted to the left (toward the more significant bit positions).
Zeros are shifted in at the right end (least-significant bit) of the quadword.
When the count value is negative, bits are shifted to the right (toward the least significant bit positions). Zeros are shifted in at the left end (most-significant bit) of the quadword.
There are three operands: VPSHLQ dest, src, count
The destination (dest) is an XMM register specified by ModRM.reg.
The roles of src and count are selected by XOP.W:
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv, and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m, and src is an XMM register specified by XOP.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
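Ignoring bit 6 of the count byte is consistent with reducing the magnitude of the signed count modulo 64, which is the rule this manual states for VPSHAQ. The short Python check below uses a hypothetical `effective_shift` helper. It tests all 256 count-byte values and confirms that, under that modulo-64 reading, clearing bit 6 never changes the shift direction or distance.

```python
def effective_shift(count_byte, width=64):
    """Return (direction, distance) for a signed count byte, distance mod width."""
    n = count_byte - 256 if count_byte & 0x80 else count_byte
    if n >= 0:
        return ("left", n % width)
    return ("right", -n % width)


# Exhaustive check: clearing bit 6 (mask 0xBF) leaves the effective shift unchanged.
assert all(effective_shift(b) == effective_shift(b & 0xBF) for b in range(256))
print(effective_shift(0xFF), effective_shift(0xBF))
# prints ('right', 1) ('right', 1)
```

The check holds because clearing bit 6 changes the signed value by exactly 64 and leaves the sign bit (bit 7) untouched.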
Instruction Support
Form      Subset  Feature Flag
VPSHLQ    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPSHLQ xmm1, xmm3/mem128, xmm2    8F   RXB.09          0.count.0.00  97 /r
VPSHLQ xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    97 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHAB, VPSHAW, VPSHAD, VPSHAQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPSHLW

Packed Shift Logical Words

Shifts each word of the source operand as specified by a count byte and writes the result to the corresponding word of the destination.
The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding word element of the count operand.
When the count value is positive, bits are shifted to the left (toward the more significant bit positions).
Zeros are shifted in at the right end (least-significant bit) of the word.
When the count value is negative, bits are shifted to the right (toward the least significant bit positions). Zeros are shifted in at the left end (most-significant bit) of the word.
There are three operands: VPSHLW dest, src, count
The destination (dest) is an XMM register specified by ModRM.reg.
The roles of src and count are selected by XOP.W:
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv, and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m, and src is an XMM register specified by XOP.vvvv.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
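Each count byte sits in the low byte of its word, so the high byte of every count word plays no part. The sketch below makes that explicit by working on whole 128-bit values, split into little-endian words (word 0 in bits 15:0). The helper names are hypothetical. The modulo-16 reduction of the count magnitude is an assumption, because this page does not state how VPSHLW handles counts of 16 or more.

```python
def words(x128):
    """Split a 128-bit integer into eight 16-bit words, word 0 first."""
    return [(x128 >> (16 * i)) & 0xFFFF for i in range(8)]


def vpshlw(src128, count128):
    """Model VPSHLW on 128-bit integers and return the 128-bit result."""
    result = 0
    for i, (s, c) in enumerate(zip(words(src128), words(count128))):
        b = c & 0xFF                       # high byte of the count word is ignored
        n = b - 256 if b & 0x80 else b     # signed two's-complement count
        if n >= 0:
            w = (s << (n % 16)) & 0xFFFF   # zeros enter at bit 0
        else:
            w = s >> (-n % 16)             # zeros enter at bit 15
        result |= w << (16 * i)
    return result


# Word 0 = 0x0001 shifted left by 2; word 1 = 0x8000 shifted right by 1
# (count word 0xABFF: only its low byte 0xFF = -1 counts).
print(hex(vpshlw(0x8000_0001, 0xABFF_0002)))
# prints 0x40000004
```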
Instruction Support
Form      Subset  Feature Flag
VPSHLW    XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  XOP  RXB.map_select  W.vvvv.L.pp   Opcode
VPSHLW xmm1, xmm3/mem128, xmm2    8F   RXB.09          0.count.0.00  95 /r
VPSHLW xmm1, xmm2, xmm3/mem128    8F   RXB.09          1.src.0.00    95 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X    Instruction not supported, as indicated by CPUID feature identifier.
                            X     X          XOP instructions are only recognized in protected mode.
                                        X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X    XOP.L = 1.
                                        X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X    Lock prefix (F0h) preceding opcode.
Device not available, #NM               X    CR0.TS = 1.
Stack, #SS                              X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                         X    Instruction execution caused a page fault.
Alignment check, #AC                    X    Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception


VPSLLVD

Variable Shift Left Logical Doublewords

Left-shifts the bits of each doubleword in the first source operand by a count specified in the corresponding doubleword of a second source operand and writes the shifted values to the destination.
The second source operand is treated as an array of unsigned 32-bit integers. Each integer specifies the shift count of the corresponding doubleword of the first source operand. Each doubleword is shifted independently.
Low-order bits emptied by shifting are cleared. High-order bits shifted out of each doubleword are discarded. When the shift count for any doubleword is greater than 31, that doubleword is cleared in the destination.
This instruction has 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.
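Here is a small Python model of the rule above, for illustration. Lanes are unsigned 32-bit integers held in a list (four for the XMM form, eight for the YMM form), and `vpsllvd` is a hypothetical name. Unlike the XOP shifts, the count here is unsigned and uses the whole doubleword. A count of 0xFFFFFFFF therefore clears the lane; it does not mean -1.

```python
def vpsllvd(src, counts):
    """Model VPSLLVD: each lane of src is shifted by the matching unsigned count."""
    # The condition is evaluated first, so huge counts never reach the << operator.
    return [(s << c) & 0xFFFF_FFFF if c <= 31 else 0
            for s, c in zip(src, counts)]


print([hex(x) for x in vpsllvd([1, 1, 0xFFFF_FFFF, 3], [31, 32, 4, 0xFFFF_FFFF])])
# prints ['0x80000000', '0x0', '0xfffffff0', '0x0']
```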
Instruction Support
Form      Subset  Feature Flag
VPSLLVD   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp   Opcode
VPSLLVD xmm1, xmm2, xmm3/mem128   C4   RXB.02          0.src1.0.01   47 /r
VPSLLVD ymm1, ymm2, ymm3/mem256   C4   RXB.02          0.src1.1.01   47 /r

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Alignment check, #AC                    A    Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A    Instruction execution caused a page fault.
A — AVX2 exception


VPSLLVQ

Variable Shift Left Logical Quadwords

Left-shifts the bits of each quadword in the first source operand by a count specified in the corresponding quadword of a second source operand and writes the shifted values to the destination.
The second source operand is treated as an array of unsigned 64-bit integers. Each integer specifies the shift count of the corresponding quadword of the first source operand. Each quadword is shifted independently.
Low-order bits emptied by shifting are cleared. High-order bits shifted out of each quadword are discarded. When the shift count for any quadword is greater than 63, that quadword is cleared in the destination.
This instruction has 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.
Instruction Support
Form      Subset  Feature Flag
VPSLLVQ   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp   Opcode
VPSLLVQ xmm1, xmm2, xmm3/mem128   C4   RXB.02          1.src1.0.01   47 /r
VPSLLVQ ymm1, ymm2, ymm3/mem256   C4   RXB.02          1.src1.1.01   47 /r
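VPSLLVQ and VPSLLVD share opcode 47h in map 02h and differ only in VEX.W, as their encoding tables show. The sketch below assembles the register-only forms to make that visible. It assumes the standard three-byte VEX layout, in which R, X, B, and vvvv are stored inverted. The function name `encode_vex3` is hypothetical.

```python
def encode_vex3(opcode, dest, src1, src2, w, l, pp=0b01, map_select=0b00010):
    """Encode a register-only three-byte-VEX instruction: opcode dest, src1, src2."""
    r = (dest >> 3) & 1
    b = (src2 >> 3) & 1
    # Byte 1: inverted R, X, B, then map_select (00010b selects the 0F 38 map).
    byte1 = ((r ^ 1) << 7) | (1 << 6) | ((b ^ 1) << 5) | map_select
    # Byte 2: W, inverted vvvv (src1), L (0 = 128-bit, 1 = 256-bit), pp (01b = 66h).
    byte2 = (w << 7) | ((~src1 & 0xF) << 3) | (l << 2) | pp
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)   # mod = 11b: src2 is a register
    return bytes([0xC4, byte1, byte2, opcode, modrm])


print(encode_vex3(0x47, 1, 2, 3, w=0, l=0).hex(" "))  # VPSLLVD xmm1, xmm2, xmm3
print(encode_vex3(0x47, 1, 2, 3, w=1, l=0).hex(" "))  # VPSLLVQ xmm1, xmm2, xmm3
print(encode_vex3(0x47, 1, 2, 3, w=1, l=1).hex(" "))  # VPSLLVQ ymm1, ymm2, ymm3
```

Only bit 7 of the third byte (VEX.W) separates the doubleword and quadword forms. VEX.L selects the register width.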

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSRAVD, VPSRLVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Alignment check, #AC                    A    Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A    Instruction execution caused a page fault.
A — AVX2 exception


VPSRAVD

Variable Shift Right Arithmetic Doublewords

Performs a right arithmetic shift of each signed 32-bit integer in the first source operand by a count specified in the corresponding doubleword of a second source operand and writes the shifted values to the destination.
The second source operand is treated as an array of unsigned 32-bit integers. Each integer specifies the shift count of the corresponding doubleword of the first source operand. Each doubleword is shifted independently.
A copy of the sign bit is shifted into the most-significant bit of the element on each right-shift. Low-order bits shifted out of each element are discarded. If a doubleword contains a non-negative integer and the shift count is greater than 31, that doubleword is cleared in the destination. If a doubleword contains a negative integer and the shift count is greater than 31, that doubleword is set to -1 in the destination.
This instruction has 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.
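The sign-fill rule for large counts follows naturally if the count is clamped to 31. Shifting a 32-bit signed value right by 31 leaves only copies of its sign bit. The Python sketch below models VPSRAVD that way. Lanes and counts are unsigned 32-bit integers, and the helper names are hypothetical.

```python
def to_signed32(x):
    """Reinterpret an unsigned 32-bit lane as a signed integer."""
    x &= 0xFFFF_FFFF
    return x - (1 << 32) if x & 0x8000_0000 else x


def vpsravd(src, counts):
    """Model VPSRAVD: arithmetic right shift of each lane by its unsigned count."""
    # A count above 31 acts like 31: the lane becomes 0 if non-negative, -1 if negative.
    return [(to_signed32(s) >> min(c, 31)) & 0xFFFF_FFFF
            for s, c in zip(src, counts)]


print([hex(x) for x in vpsravd([0x8000_0000, 0x7FFF_FFFF, 0xFFFF_FFF0, 16],
                               [4, 40, 100, 2])])
# prints ['0xf8000000', '0x0', '0xffffffff', '0x4']
```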
Instruction Support
Form      Subset  Feature Flag
VPSRAVD   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp   Opcode
VPSRAVD xmm1, xmm2, xmm3/mem128   C4   RXB.02          0.src1.0.01   46 /r
VPSRAVD ymm1, ymm2, ymm3/mem256   C4   RXB.02          0.src1.1.01   46 /r

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRLVD, VPSRLVQ
rFLAGS Affected
None


MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.W = 1.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Alignment check, #AC                    A    Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A    Instruction execution caused a page fault.
A — AVX2 exception


VPSRLVD

Variable Shift Right Logical Doublewords

Right-shifts each doubleword in the first source operand by a count specified in the corresponding doubleword of a second source operand and writes the shifted values to the destination.
The second source operand is treated as an array of unsigned 32-bit integers. Each integer specifies the shift count of the corresponding doubleword of the first source operand. Each doubleword is shifted independently.
Zero is shifted into the most-significant bit of the element on each right-shift. Low-order bits shifted out of each element are discarded. If the shift count for any doubleword is greater than 31, that doubleword is cleared in the destination.
This instruction has 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.
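For comparison with the arithmetic variant VPSRAVD, here is a minimal model of the logical variant. Zeros, not copies of the sign bit, enter from the left, and any count above 31 clears the lane. Lanes are unsigned 32-bit integers, and `vpsrlvd` is a hypothetical name.

```python
def vpsrlvd(src, counts):
    """Model VPSRLVD: logical right shift of each lane by its unsigned count."""
    return [(s & 0xFFFF_FFFF) >> c if c <= 31 else 0
            for s, c in zip(src, counts)]


# 0x80000000 >> 4 gives 0x08000000 here; an arithmetic shift would give 0xF8000000.
print([hex(x) for x in vpsrlvd([0x8000_0000, 0xFFFF_FFFF, 0x1234, 7], [4, 32, 4, 0])])
```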
Instruction Support
Form      Subset  Feature Flag
VPSRLVD   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp   Opcode
VPSRLVD xmm1, xmm2, xmm3/mem128   C4   RXB.02          0.src1.0.01   45 /r
VPSRLVD ymm1, ymm2, ymm3/mem256   C4   RXB.02          0.src1.1.01   45 /r

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVQ
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Alignment check, #AC                    A    Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A    Instruction execution caused a page fault.
A — AVX2 exception


VPSRLVQ

Variable Shift Right Logical Quadwords

Right-shifts each quadword in the first source operand by a count specified in the corresponding quadword of a second source operand and writes the shifted values to the destination.
The second source operand is treated as an array of unsigned 64-bit integers. Each integer specifies the shift count of the corresponding quadword of the first source operand. Each quadword is shifted independently.
Zero is shifted into the most-significant bit of the element on each right-shift. Low-order bits shifted out of each element are discarded. If the shift count for any quadword is greater than 63, that quadword is cleared in the destination.
This instruction has 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.
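The sketch below models both encodings on a YMM-sized view of four quadword lanes, with lane 0 in bits 63:0. In the XMM form only lanes 0 and 1 are shifted, and bits 255:128 of the destination are cleared, as described above. Lists of unsigned 64-bit integers stand in for registers. The function `vpsrlvq` and its `l` parameter, which mirrors VEX.L, are hypothetical.

```python
MASK64 = (1 << 64) - 1


def vpsrlvq(src, counts, l):
    """Model VPSRLVQ on four quadword lanes; l = 0 selects the XMM form."""
    lanes = 4 if l else 2
    shifted = [(s & MASK64) >> c if c <= 63 else 0
               for s, c in zip(src[:lanes], counts[:lanes])]
    return shifted + [0] * (4 - lanes)   # XMM form: bits 255:128 cleared


src = [0xF0, 1 << 63, 5, 6]
counts = [4, 64, 1, 1]
print(vpsrlvq(src, counts, l=1))   # YMM form
print(vpsrlvq(src, counts, l=0))   # XMM form
```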
Instruction Support
Form      Subset  Feature Flag
VPSRLVQ   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp   Opcode
VPSRLVQ xmm1, xmm2, xmm3/mem128   C4   RXB.02          1.src1.0.01   45 /r
VPSRLVQ ymm1, ymm2, ymm3/mem256   C4   RXB.02          1.src1.1.01   45 /r

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         A     A     A    Instruction not supported, as indicated by CPUID feature identifier.
                            A     A          AVX instructions are only recognized in protected mode.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A    Lock prefix (F0h) preceding opcode.
Device not available, #NM               A    CR0.TS = 1.
Stack, #SS                              A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A    Memory address exceeding data segment limit or non-canonical.
                                        A    Null data segment used to reference memory.
Alignment check, #AC                    A    Alignment checking enabled and:
                                             256-bit memory operand not 32-byte aligned or
                                             128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A    Instruction execution caused a page fault.
A — AVX2 exception


VTESTPD

Packed Bit Test

Performs two different logical operations on the sign bits of the first and second packed floating-point operands and updates the ZF and CF flags based on the results.
First, performs a bitwise AND of the sign bit of each double-precision floating-point element of the first source operand with the sign bit of the corresponding element of the second source operand. Sets rFLAGS.ZF if all of these results are 0; otherwise, clears ZF.
Second, performs a bitwise AND of the complement (NOT) of the sign bit of each double-precision floating-point element of the first source operand with the sign bit of the corresponding element of the second source operand. Sets rFLAGS.CF if all of these results are 0; otherwise, clears CF.
Neither source operand is modified.
This extended-form instruction has both 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location.
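The flag computation can be sketched in Python by extracting the sign bit of each double with the standard `struct` module. This sketch models only the ZF and CF results described above, and the function names are hypothetical. Because only sign bits matter, -0.0 counts as negative.

```python
import struct


def sign_bits(values):
    """Return the sign bit (0 or 1) of each IEEE 754 double."""
    return [struct.unpack("<Q", struct.pack("<d", v))[0] >> 63 for v in values]


def vtestpd(src1, src2):
    """Model the VTESTPD flag results; returns (ZF, CF)."""
    s1, s2 = sign_bits(src1), sign_bits(src2)
    zf = int(all((a & b) == 0 for a, b in zip(s1, s2)))        # src1 AND src2
    cf = int(all(((a ^ 1) & b) == 0 for a, b in zip(s1, s2)))  # (NOT src1) AND src2
    return zf, cf


print(vtestpd([-1.0, 2.0], [-3.0, 4.0]))   # sign bits overlap in lane 0: (0, 1)
print(vtestpd([1.0, 2.0], [-0.0, 4.0]))    # src2 negative where src1 is not: (1, 0)
```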
Instruction Support
Form       Subset   Feature Flag
VTESTPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VTESTPD xmm1, xmm2/mem128    C4    RXB.02           0.1111.0.01   0F /r
VTESTPD ymm1, ymm2/mem256    C4    RXB.02           0.1111.1.01   0F /r

Related Instructions
PTEST, VTESTPS


rFLAGS Affected
Flag   ID    VIP   VIF   AC    VM    RF    NT    IOPL  OF    DF    IF    TF    SF    ZF    AF    PF    CF
Bit    21    20    19    18    17    16    14    13:12 11    10    9     8     7     6     4     2     0
Value                                                  0                       M     M     M     M     M
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. Undefined
flags are U.

MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
X
X
X

X
X
X
X

X
X
X
X

X
X
X
X

S

S

S

X

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX exception


X
X
X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


VTESTPS

Packed Bit Test

Performs two different logical operations on the sign bits of the first and second packed floating-point
operands and updates the ZF and CF flags based on the results.
First, performs a bitwise AND of the sign bit of each single-precision floating-point element of the
first source operand with the sign bit of the corresponding element of the second source operand.
Sets rFLAGS.ZF if all of the resulting bits are 0; otherwise, clears ZF.
Second, performs a bitwise AND of the complement (NOT) of the sign bit of each single-precision
floating-point element of the first source operand with the sign bit of the corresponding element of the second source operand. Sets rFLAGS.CF if all of the resulting bits are 0; otherwise, clears CF.
Neither source operand is modified.
This extended-form instruction has both 128-bit and 256-bit encodings.
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location.
YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a
256-bit memory location.
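A C sketch of the same test on single-precision elements follows. It is an illustrative model under the assumption that the operands are plain `float` arrays; `sign_bit_of` and `vtestps_model` are hypothetical helpers, and the sign bit is extracted with `memcpy` to avoid type-punning problems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns bit 31 (the sign bit) of a single-precision value. */
static uint32_t sign_bit_of(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float's bits */
    return bits >> 31;
}

/* Hypothetical model of VTESTPS. n is 4 (XMM form) or 8 (YMM form). */
static void vtestps_model(const float *src1, const float *src2, size_t n, int *zf, int *cf)
{
    assert(n == 4 || n == 8);
    uint32_t any_and = 0, any_andn = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s1 = sign_bit_of(src1[i]);
        uint32_t s2 = sign_bit_of(src2[i]);
        any_and |= s1 & s2;
        any_andn |= (s1 ^ 1u) & s2;   /* NOT of the first source's sign bit */
    }
    *zf = (any_and == 0);
    *cf = (any_andn == 0);
}
```

With a second operand whose elements all have the sign bit set, CF = 1 means every element of the first source is negative (including -0.0), and ZF = 1 means none is.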
Instruction Support
Form       Subset   Feature Flag
VTESTPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VTESTPS xmm1, xmm2/mem128    C4    RXB.02           0.1111.0.01   0E /r
VTESTPS ymm1, ymm2/mem256    C4    RXB.02           0.1111.1.01   0E /r

Related Instructions
PTEST, VTESTPD


rFLAGS Affected
Flag   ID    VIP   VIF   AC    VM    RF    NT    IOPL  OF    DF    IF    TF    SF    ZF    AF    PF    CF
Bit    21    20    19    18    17    16    14    13:12 11    10    9     8     7     6     4     2     0
Value                                                  0                       M     M     M     M     M
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. Undefined
flags are U.

MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
X
X
X
X

X
X
X
X

X
X
X
X

X
X
X
X

S

S

S

X

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX exception


X
X
X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


VZEROALL

Zero All YMM Registers

Clears all YMM registers.
In 64-bit mode, YMM0–15 are all cleared (set to all zeros). In legacy and compatibility modes, only
YMM0–7 are cleared. The contents of MXCSR are unaffected.
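The mode-dependent register count can be illustrated with a small software model. The register-file layout (`ymm_reg`, `avx_state`) and the function `model_vzeroall` are hypothetical constructs for illustration only; they do not describe how hardware stores YMM state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 256-bit YMM register. */
typedef struct {
    uint8_t bytes[32];   /* bytes 0-15: XMM half; bytes 16-31: upper octword */
} ymm_reg;

/* Hypothetical model of the state VZEROALL interacts with. */
typedef struct {
    ymm_reg ymm[16];
    uint32_t mxcsr;
} avx_state;

/* Models VZEROALL: zeroes YMM0-15 in 64-bit mode and YMM0-7 in legacy or
   compatibility mode. MXCSR is deliberately left untouched. */
static void model_vzeroall(avx_state *s, int in_64bit_mode)
{
    assert(s != NULL);
    int count = in_64bit_mode ? 16 : 8;
    for (int i = 0; i < count; i++)
        memset(s->ymm[i].bytes, 0, sizeof s->ymm[i].bytes);
}
```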
Instruction Support
Form        Subset   Feature Flag
VZEROALL    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VZEROALL    C4    RXB.01           X.1111.1.00   77

Related Instructions
VZEROUPPER
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
A — AVX exception.


A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.


VZEROUPPER

Zero All YMM Registers Upper

Clears the upper octword (bits 255:128) of all YMM registers. The corresponding XMM registers (lower octword of
each YMM register) are not affected.
In 64-bit mode, the instruction operates on registers YMM0–15. In legacy and compatibility modes,
the instruction operates on YMM0–7. The contents of MXCSR are unaffected.
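The difference from VZEROALL is that only bytes 16 through 31 of each register are cleared. The sketch below models this on a hypothetical `ymm_reg` byte image; `model_vzeroupper` is an illustrative name, not an API. In compiled code the instruction is usually reached through the `_mm256_zeroupper()` intrinsic in `<immintrin.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical YMM register image: bytes 0-15 are the XMM (low octword)
   half, bytes 16-31 the upper octword. */
typedef struct {
    uint8_t bytes[32];
} ymm_reg;

/* Models VZEROUPPER on a 16-entry register file: clears bits 255:128 of
   YMM0-15 in 64-bit mode, or YMM0-7 in legacy and compatibility modes.
   The low 128 bits (the XMM registers) are preserved. */
static void model_vzeroupper(ymm_reg *regs, int in_64bit_mode)
{
    assert(regs != NULL);
    int count = in_64bit_mode ? 16 : 8;
    for (int i = 0; i < count; i++)
        memset(regs[i].bytes + 16, 0, 16);
}
```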
Instruction Support
Form          Subset   Feature Flag
VZEROUPPER    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VZEROUPPER    C4    RXB.01           X.1111.0.00   77

Related Instructions
VZEROALL
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
A — AVX exception.

A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.


XGETBV

Get Extended Control Register Value

Copies the content of the extended control register (XCR) specified by the ECX register into the
EDX:EAX register pair. The high-order 32 bits of the XCR are loaded into EDX and the low-order 32
bits are loaded into EAX. The corresponding high-order 32 bits of RAX and RDX are cleared.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used
to manage processor states and provide additional functionality. See the XSAVE instruction description for more information.
Values returned to EDX:EAX in unimplemented bit locations are undefined.
Specifying a reserved or unimplemented XCR in ECX causes a general protection exception.
Currently, only XCR0 (the XFEATURE_ENABLED_MASK register) is supported. If CPUID reports
support for ECX=1 (see the Instruction Support table), XGETBV also accepts an ECX value of 1.
When ECX=1, XGETBV returns the logical AND of XCR0 and the current value of the XINUSE state-component bitmap.
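Software typically combines the EDX:EAX result into one 64-bit value and then tests bits of XCR0, for example the XCR0[2:1] = 11b condition that the AVX exception tables in this chapter depend on. The sketch below models only that interpretation step; obtaining EDX:EAX requires executing XGETBV itself (commonly exposed by compilers as an `_xgetbv` intrinsic). All function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assemble the 64-bit XCR value from the EDX:EAX pair written by XGETBV. */
static uint64_t xcr_from_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* AVX instructions raise #UD unless XCR0[2:1] == 11b, i.e. both SSE
   (bit 1) and YMM (bit 2) state are enabled. */
static bool xcr0_enables_avx_state(uint64_t xcr0)
{
    return ((xcr0 >> 1) & 3u) == 3u;
}

/* Model of the ECX = 1 result (when supported): XCR0 AND XINUSE. */
static uint64_t xgetbv_ecx1_model(uint64_t xcr0, uint64_t xinuse)
{
    return xcr0 & xinuse;
}
```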
Instruction Support
Form      Subset          Feature Flag
XGETBV    XSAVE/XRSTOR    CPUID Fn0000_0001_ECX[XSAVE] (bit 26)
XGETBV    ECX=1 support   CPUID Fn0000_000D_EAX_x1[2] = 1

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic    Opcode      Description
XGETBV      0F 01 D0    Copies content of the XCR specified by ECX into EDX:EAX.

Related Instructions
RDMSR, XSETBV
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X     X     Lock prefix (F0h) preceding opcode.
                           X     X     X     CR4.OSXSAVE = 0.
General protection, #GP    X     X     X     ECX specifies a reserved or unimplemented XCR address.
X — exception generated


XORPD
VXORPD

XOR Packed Double-Precision Floating-Point

Performs a bitwise XOR of each packed double-precision floating-point value in the first source operand with the corresponding value in the second source operand and writes the results into the corresponding elements of the destination.
There are legacy and extended forms of the instruction:
XORPD

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VXORPD

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
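Because the operation acts on raw bits, XORing with a mask whose elements are -0.0 (only the sign bit set) negates every element. The C sketch below models the operation element by element; `xorpd_model` is a hypothetical name, and the model is not an intrinsic-based implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element-wise model of (V)XORPD: XOR the raw 64-bit patterns of each pair
   of doubles. n is 2 for the 128-bit forms and 4 for the 256-bit form.
   dst may alias src1, as in the legacy two-operand form. */
static void xorpd_model(double *dst, const double *src1, const double *src2, size_t n)
{
    assert(n == 2 || n == 4);
    for (size_t i = 0; i < n; i++) {
        uint64_t a, b, r;
        memcpy(&a, &src1[i], sizeof a);   /* reinterpret bits safely */
        memcpy(&b, &src2[i], sizeof b);
        r = a ^ b;
        memcpy(&dst[i], &r, sizeof r);
    }
}
```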
Instruction Support
Form      Subset   Feature Flag
XORPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VXORPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode        Description
XORPD xmm1, xmm2/mem128    66 0F 57 /r   Performs bitwise XOR of two packed double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VXORPD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   57 /r
VXORPD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   57 /r

Related Instructions
(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPS


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


XORPS
VXORPS

XOR Packed Single-Precision Floating-Point

Performs a bitwise XOR of each packed single-precision floating-point value in the first source operand with the corresponding value in the second source operand and writes the results into the corresponding elements of the destination.
There are legacy and extended forms of the instruction:
XORPS

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the
YMM register that corresponds to the destination are not affected.
VXORPS

The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or
a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register
or a 256-bit memory location. The destination is a third YMM register.
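A common use of XORPS is zeroing a register by XORing it with itself, which yields an all-zero bit pattern (+0.0 in every element). The sketch below models that on a `float` array; `xorps_model` is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element-wise model of (V)XORPS on raw 32-bit patterns.
   n is 4 for the 128-bit forms and 8 for the 256-bit form.
   dst, src1 and src2 may all alias, since each element is read
   into locals before the result is stored. */
static void xorps_model(float *dst, const float *src1, const float *src2, size_t n)
{
    assert(n == 4 || n == 8);
    for (size_t i = 0; i < n; i++) {
        uint32_t a, b, r;
        memcpy(&a, &src1[i], sizeof a);
        memcpy(&b, &src2[i], sizeof b);
        r = a ^ b;
        memcpy(&dst[i], &r, sizeof r);
    }
}
```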
Instruction Support
Form      Subset   Feature Flag
XORPS     SSE      CPUID Fn0000_0001_EDX[SSE] (bit 25)
VXORPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                   Opcode     Description
XORPS xmm1, xmm2/mem128    0F 57 /r   Performs bitwise XOR of four packed single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VXORPS xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.00   57 /r
VXORPS ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.00   57 /r

Related Instructions
(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD


rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.


XRSTOR

Restore Extended States

Restores a partial or full processor state from memory.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used
to manage processor states and provide additional functionality. See the descriptions of XSAVE and
XRSTOR instructions for basic operational details.
The XRSTOR instruction may operate on the buffer in standard form or a compact form. The compact form is indicated in the memory buffer with XCOMP_BV[63]=1.
In either form, the instruction creates a Requested Feature Bit Map (RFBM), which is the logical AND
of EDX:EAX and XCR0. Then, for each feature bit:
1. If RFBM = 0, XRSTOR does not update the component.
2. If RFBM = 1 but the corresponding XSTATE_BV bit is 0, the component is set to its reset state
without reading anything from the buffer.
3. If RFBM = 1 and XSTATE_BV = 1, the component state is read from the buffer.
4. XRSTOR loads an internal state value XRSTOR_INFO that can be used to further optimize a subsequent XSAVEOPT or XSAVES. This reflects the current privilege level and virtualization mode
as well as the save area's base address and XCOMP_BV field.
5. If RFBM=1, the corresponding XINUSE bit is set to the state of XSTATE_BV.
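Rules 1, 2, 3, and 5 can be condensed into a few bit operations. The following C sketch is a hypothetical model, not AMD's microarchitectural behavior; the enum and function names are illustrative, and the assumption that components are numbered 0 through 62 is stated in the code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of what XRSTOR does to state component i. */
typedef enum {
    XRSTOR_LEAVE,   /* rule 1: RFBM[i] = 0, component not updated */
    XRSTOR_INIT,    /* rule 2: RFBM[i] = 1, XSTATE_BV[i] = 0, reset state */
    XRSTOR_LOAD     /* rule 3: RFBM[i] = 1, XSTATE_BV[i] = 1, read from buffer */
} xrstor_action;

/* RFBM = EDX:EAX AND XCR0. */
static uint64_t xrstor_rfbm(uint32_t edx, uint32_t eax, uint64_t xcr0)
{
    return ((((uint64_t)edx) << 32) | eax) & xcr0;
}

static xrstor_action xrstor_component_action(uint64_t rfbm, uint64_t xstate_bv, unsigned i)
{
    assert(i < 63);  /* assumption: state components are numbered 0-62 */
    if (((rfbm >> i) & 1) == 0)
        return XRSTOR_LEAVE;
    return ((xstate_bv >> i) & 1) ? XRSTOR_LOAD : XRSTOR_INIT;
}

/* Rule 5: where RFBM = 1, XINUSE takes the XSTATE_BV value;
   bits with RFBM = 0 keep their previous XINUSE value. */
static uint64_t xrstor_new_xinuse(uint64_t xinuse, uint64_t rfbm, uint64_t xstate_bv)
{
    return (xinuse & ~rfbm) | (xstate_bv & rfbm);
}
```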
For the standard form, MXCSR is loaded if RFBM[1]=1 or RFBM[2]=1. It is never initialized.
For the compact form, MXCSR is associated with RFBM[1].
In some generations, the FP error pointers were restored only if a floating-point error was
logged. In newer generations, the FP error pointers are always restored. Support for the newer behavior is indicated by CPUID
Fn8000_0008_EBX[2].
Instruction Support
Form      Subset    Feature Flag
XRSTOR    XRSTOR    CPUID Fn0000_0001_ECX[XSAVE] (bit 26)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic      Opcode      Description
XRSTOR mem    0F AE /5    Restores user-specified processor state from memory.

Related Instructions
XGETBV, XRSTORS, XSAVE, XSAVEC, XSAVES, XSETBV
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X     X     CR4.OSXSAVE = 0.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  X     X     X     CR0.TS = 1.
Stack, #SS                 X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    X     X     X     Memory address exceeding data segment limit or non-canonical.
                           X     X     X     Null data segment used to reference memory.
                           X     X     X     Memory operand not aligned on 64-byte boundary.
                           X     X     X     Any must-be-zero (MBZ) bits in the save area were set.
                           X     X     X     Attempt to set reserved bits in MXCSR.
                           X     X     X     XCOMP_BV[i] = 0 & XSTATE_BV[i] = 1.
                           X     X     X     XCOMP_BV[i] = 1 & XCR0[i] = 0.
                           X     X     X     Bytes 63:16 of header are non-zero.
Page fault, #PF            X     X     X     Instruction execution caused a page fault.
X — exception generated


XRSTORS

Restore Extended States Supervisor

Restores processor state from memory.
XRSTORS is very similar to the XRSTOR instruction in compacted form with the following
differences:
1. XRSTORS must be executed at CPL = 0.
2. The save area must have XCOMP_BV[63] = 1; otherwise, XRSTORS causes a #GP(0) exception.
3. XRSTORS can restore state components enabled in the IA32_XSS MSR.
All other behavior is the same as XRSTOR with the compact form.
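The first two differences amount to a simple pre-check, sketched below in C. The function names are hypothetical. The requested-feature mask for XRSTORS is an assumption made by analogy with the XSAVES description (EDX:EAX AND (XCR0 OR IA32_XSS)); it is not stated explicitly on this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-check for differences 1 and 2: XRSTORS must run at
   CPL 0 and the save area must be in compacted form (XCOMP_BV[63] = 1);
   otherwise #GP(0). */
static bool xrstors_faults(unsigned cpl, uint64_t xcomp_bv)
{
    assert(cpl <= 3);
    bool compacted = ((xcomp_bv >> 63) & 1) != 0;
    return cpl != 0 || !compacted;
}

/* Assumed requested-feature mask: supervisor state enabled in IA32_XSS
   is restorable in addition to state enabled in XCR0. */
static uint64_t xrstors_rfbm(uint32_t edx, uint32_t eax, uint64_t xcr0, uint64_t ia32_xss)
{
    return ((((uint64_t)edx) << 32) | eax) & (xcr0 | ia32_xss);
}
```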
Instruction Support
Form       Subset     Feature Flag
XRSTORS    XRSTORS    CPUID Fn0000_000D_EAX_x1[XSAVES] (bit 3)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding
Mnemonic       Opcode      Description
XRSTORS mem    0F C7 /3    Restores processor state from memory.

Related Instructions
XGETBV, XRSTOR, XSAVE, XSAVEC, XSAVES, XSETBV
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X     X     CR4.OSXSAVE = 0.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  X     X     X     CR0.TS = 1.
Stack, #SS                 X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    X     X     X     Memory address exceeding data segment limit or non-canonical.
                           X     X     X     Null data segment used to reference memory.
                           X     X     X     Memory operand not aligned on 64-byte boundary.
                           X     X     X     Any must-be-zero (MBZ) bits in the save area were set.
                           X     X     X     Attempt to set reserved bits in MXCSR.
                           X     X     X     CPL != 0.
                           X     X     X     (XSTATE_BV[i] & ~IA32_XSS[i]) = 1.
Page fault, #PF            X     X     X     Instruction execution caused a page fault.
X — exception generated


XSAVE

Save Extended States

Saves a user-defined subset of enabled processor state data to a specified memory address.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used
to manage processor states and provide additional functionality.
The XSAVE/XRSTOR save area consists of a header section and individual save areas for each processor state component. A component is saved when the corresponding bits in both the mask operand
(EDX:EAX) and the XFEATURE_ENABLED_MASK (XCR0) register are set. This bit-wise logical
AND of EDX:EAX and XCR0 is known as the Requested Feature Bit Map (RFBM). A component is
not saved when its corresponding RFBM bit is zero.
Software can set any bit in EDX:EAX, regardless of whether the bit position in XCR0 is valid for the
processor. When the mask operand contains all 1's, all processor state components enabled in XCR0
are saved.
For each component saved, XSAVE sets the corresponding bit in the XSTATE_BV field of the save
area header. XSAVE does not clear XSTATE_BV bits or modify individual save areas for components
that are not saved. If a saved component is in the hardware-specified initialized state, XSAVE may
clear the corresponding XSTATE_BV bit instead of setting it. This optimization is implementation-dependent.
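The RFBM calculation and the resulting XSTATE_BV update can be sketched in C. The `init_opt` parameter is a hypothetical switch representing the optional, implementation-dependent behavior described above; using XINUSE = 0 to mean "in the initialized state" is an assumption borrowed from the XSAVEC description. Function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RFBM: components requested in EDX:EAX that are also enabled in XCR0.
   Bits set in EDX:EAX but not in XCR0 are simply ignored. */
static uint64_t xsave_rfbm(uint32_t edx, uint32_t eax, uint64_t xcr0)
{
    return ((((uint64_t)edx) << 32) | eax) & xcr0;
}

/* New XSTATE_BV header value after XSAVE. Bits outside RFBM keep their old
   value. Saved components get their bit set, unless the optional init
   optimization is modeled (init_opt), in which case components whose XINUSE
   bit is 0 have their bit cleared instead. */
static uint64_t xsave_new_xstate_bv(uint64_t old_bv, uint64_t rfbm,
                                    uint64_t xinuse, bool init_opt)
{
    uint64_t saved_bits = init_opt ? (rfbm & xinuse) : rfbm;
    return (old_bv & ~rfbm) | saved_bits;
}
```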
The MXCSR register is saved if either RFBM bit 1 or RFBM bit 2 is set to 1. If no floating-point
error is present, some generations do not write any of the FP error pointers; newer generations write these fields as zeros. Support for the newer behavior is indicated by CPUID Fn8000_0008_EBX[2].
Instruction Support
Form     Subset          Feature Flag
XSAVE    XSAVE/XRSTOR    CPUID Fn0000_0001_ECX[XSAVE] (bit 26)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic     Opcode      Description
XSAVE mem    0F AE /4    Saves user-specified processor state to memory.

Related Instructions
XGETBV, XRSTOR, XSAVEOPT, XSETBV
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X     X     CR4.OSXSAVE = 0.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  X     X     X     CR0.TS = 1.
Stack, #SS                 X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    X     X     X     Memory address exceeding data segment limit or non-canonical.
                           X     X     X     Null data segment used to reference memory.
                           X     X     X     Memory operand not aligned on 64-byte boundary.
                           X     X     X     Attempt to write read-only memory.
Page fault, #PF            X     X     X     Instruction execution caused a page fault.
X — exception generated

XSAVEC

Save Extended States in Compacted Form

Saves a user-defined subset of enabled processor state data to a specified memory address, possibly in
a compacted form.

This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used to
manage processor states and provide compaction functionality for more efficient context switching.
See the XSAVE and XRSTOR instruction descriptions for basic operational details.
XSAVEC is very similar to XSAVE but provides the following alternate functionality:
1. XSAVEC differs from XSAVE by using the init optimization and compaction.
2. XSAVEC saves a component only if its RFBM bit is 1 and its XINUSE bit is 1. XINUSE is the means by which
the processor determines whether a component is in its initial state.
3. XSAVEC never writes bytes 511:464 of the legacy XSAVE data structure.
4. XSAVEC calculates XSTATE_BV as the logical AND of the RFBM and XINUSE
bitmaps and writes it to the XSAVE area.
5. XSAVEC sets XCOMP_BV[63] = 1 and XCOMP_BV[62:0] = RFBM, and writes XCOMP_BV to the XSAVE area.
6. XSAVEC does not modify any other part of the header except as indicated in items 4 and 5.
7. XSAVEC uses the compacted format of the XSAVE extended region while saving state.
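Items 4 and 5 define the two header fields XSAVEC writes. The C sketch below computes them from RFBM and XINUSE; the `xsave_header` struct and `xsavec_header` function are hypothetical, and the code assumes RFBM bit 63 is always 0 (bit 63 of XCOMP_BV is the compacted-form flag, not a state component).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: only the two header fields XSAVEC writes. */
typedef struct {
    uint64_t xstate_bv;
    uint64_t xcomp_bv;
} xsave_header;

static xsave_header xsavec_header(uint64_t rfbm, uint64_t xinuse)
{
    const uint64_t compacted = UINT64_C(1) << 63;
    assert((rfbm & compacted) == 0);   /* assumed: bit 63 is not a component */
    xsave_header h;
    h.xstate_bv = rfbm & xinuse;       /* item 4: RFBM AND XINUSE */
    h.xcomp_bv = compacted | rfbm;     /* item 5: [63] = 1, [62:0] = RFBM */
    return h;
}
```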
Instruction Support
Form      Subset    Feature Flag
XSAVEC    XSAVEC    CPUID Fn0000_000D_EAX_x1[XSAVEC] (bit 1)

For more on using the CPUID instruction to obtain processor feature support information, see
Appendix E of Volume 3.
Instruction Encoding
Mnemonic      Opcode      Description
XSAVEC mem    0F C7 /4    Saves user-specified processor state to memory in compacted form.

Related Instructions
XGETBV, XRSTOR, XRSTORS, XSAVE, XSAVES, XSETBV
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X     X     CR4.OSXSAVE = 0.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  X     X     X     CR0.TS = 1.
Stack, #SS                 X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    X     X     X     Memory address exceeding data segment limit or non-canonical.
                           X     X     X     Null data segment used to reference memory.
                           X     X     X     Memory operand not aligned on 64-byte boundary.
                           X     X     X     Attempt to write read-only memory.
Page fault, #PF            X     X     X     Instruction execution caused a page fault.
X — exception generated


XSAVEOPT

Save Extended States Performance Optimized

Saves a user-defined subset of enabled processor state data to a specified memory address.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used
to manage processor states and provide additional functionality. See the XSAVE and XRSTOR
instruction descriptions for basic operational details.
The XSAVE/XRSTOR save area consists of a header section and individual save areas for each processor state component. A component is saved when the corresponding bits in both the mask operand
(EDX:EAX) and the XFEATURE_ENABLED_MASK (XCR0) register are set. A component is not
saved when either of the corresponding bits in EDX:EAX or XCR0 is cleared.
Software can set any bit in EDX:EAX, regardless of whether the bit position in XCR0 is valid for the
processor. When the mask operand contains all 1's, all processor state components enabled in XCR0
are saved.
For each component saved, XSAVEOPT sets the corresponding bit in the XSTATE_BV field of the
save area header. XSAVEOPT does not clear XSTATE_BV bits or modify individual save areas for
components that are not saved. If a saved component is in the hardware-specified initialized state,
XSAVEOPT may clear the corresponding XSTATE_BV bit instead of setting it. This optimization is
implementation-dependent.
XSAVEOPT may provide other implementation-specific optimizations, such as the modified optimization described for XSAVES.
Instruction Support
Form        Subset      Feature Flag
XSAVEOPT    XSAVEOPT    CPUID Fn0000_000D_EAX_x1[XSAVEOPT] (bit 0)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic        Opcode      Description
XSAVEOPT mem    0F AE /6    Saves user-specified processor state to memory.

Related Instructions
XGETBV, XRSTOR, XSAVE, XSETBV
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X     X     CR4.OSXSAVE = 0.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  X     X     X     CR0.TS = 1.
Stack, #SS                 X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    X     X     X     Memory address exceeding data segment limit or non-canonical.
                           X     X     X     Null data segment used to reference memory.
                           X     X     X     Memory operand not aligned on 64-byte boundary.
                           X     X     X     Attempt to write read-only memory.
Page fault, #PF            X     X     X     Instruction execution caused a page fault.
X — exception generated


XSAVES

Save Extended States Supervisor

Saves a user-defined subset of enabled processor state data to a specified memory address, possibly in
a compacted form.
This instruction and associated data structures extend the XSAVE/XRSTOR memory image used to
manage processor states and provide compaction functionality. See the XSAVE and XRSTOR
instruction descriptions for basic operational details.
XSAVES is very similar to XSAVEC but provides the following alternate functionality:
1. XSAVES must be executed at CPL = 0.
2. XSAVES can save state enabled in the IA32_XSS MSR. The specific state elements saved are
determined by the logical AND of EDX:EAX with the logical OR of XCR0 with the IA32_XSS
MSR.
3. XSAVES can use the modified optimization to not save components, even if RFBM=1 and
XINUSE=1 for the stated component. If the component state has not been modified internally
since the last execution of XRSTOR or XRSTORS and the XRSTOR_INFO state (an execution
environment signature created by the last XRSTOR) matches the current execution state of this
XSAVES, the state save can be skipped.
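Rules 2 and 3 can be modeled as two small functions. The `xrstor_info` struct below is a hypothetical stand-in for the internal XRSTOR_INFO signature, using the fields the XRSTOR description says it reflects (privilege level, virtualization mode, save-area base address, and XCOMP_BV). How the processor tracks "modified since the last restore" is not specified here, so it is passed in as an assumed input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the XRSTOR_INFO execution-environment signature. */
typedef struct {
    unsigned cpl;
    bool guest_mode;          /* virtualization mode */
    uint64_t save_area_base;
    uint64_t xcomp_bv;
} xrstor_info;

/* Rule 2: components considered are EDX:EAX AND (XCR0 OR IA32_XSS). */
static uint64_t xsaves_rfbm(uint32_t edx, uint32_t eax, uint64_t xcr0, uint64_t ia32_xss)
{
    return ((((uint64_t)edx) << 32) | eax) & (xcr0 | ia32_xss);
}

/* Rule 3 (modified optimization): a save may be skipped only if the
   component was not modified since the last XRSTOR/XRSTORS and the
   recorded signature matches the current environment. */
static bool xsaves_may_skip(bool modified_since_restore,
                            const xrstor_info *last, const xrstor_info *now)
{
    assert(last != NULL && now != NULL);
    return !modified_since_restore &&
           last->cpl == now->cpl &&
           last->guest_mode == now->guest_mode &&
           last->save_area_base == now->save_area_base &&
           last->xcomp_bv == now->xcomp_bv;
}
```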
Instruction Support
Form      Subset    Feature Flag
XSAVES    XSAVES    CPUID Fn0000_000D_EAX_x1[XSAVES] (bit 3)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic      Opcode      Description
XSAVES mem    0F C7 /5    Saves user-specified processor state to memory.

Related Instructions
XGETBV, XRSTOR, XRSTORS, XSAVE, XSAVEC, XSETBV
rFLAGS Affected
None
MXCSR Flags Affected
None


Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           X     X     X     CR4.OSXSAVE = 0.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  X     X     X     CR0.TS = 1.
Stack, #SS                 X     X     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    X     X     X     Memory address exceeding data segment limit or non-canonical.
                           X     X     X     Null data segment used to reference memory.
                           X     X     X     Memory operand not aligned on 64-byte boundary.
                           X     X     X     Attempt to write read-only memory.
Page fault, #PF            X     X     X     Instruction execution caused a page fault.
X — exception generated


XSETBV

Set Extended Control Register Value

Writes the content of the EDX:EAX register pair into the extended control register (XCR) specified
by the ECX register. The high-order 32 bits of the XCR are loaded from EDX and the low-order 32
bits are loaded from EAX. The corresponding high-order 32 bits of RAX and RDX are ignored.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used
to manage processor states and provide additional functionality. See the XSAVE instruction description for more information.
Currently, only the XFEATURE_ENABLED_MASK register (XCR0) is supported. Specifying a
reserved or unimplemented XCR in ECX causes a general protection exception (#GP).
Executing XSETBV at a privilege level other than 0 causes a general-protection exception. A general
protection exception also occurs when software attempts to write to reserved bits of an XCR.
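The #GP conditions listed for XSETBV can be collected into a single predicate, as in the hypothetical C sketch below. `supported_xcr0` is an assumed input standing for the set of XCR0 bits the processor implements (software might derive it from CPUID Fn0000_000D); it is not part of the instruction's interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check mirroring the #GP causes listed for XSETBV. */
static bool xsetbv_would_gp(unsigned cpl, uint32_t ecx,
                            uint32_t edx, uint32_t eax, uint64_t supported_xcr0)
{
    assert(cpl <= 3);
    uint64_t value = ((uint64_t)edx << 32) | eax;
    if (cpl != 0)
        return true;                  /* CPL != 0 */
    if (ecx != 0)
        return true;                  /* only XCR0 is implemented */
    if (value & ~supported_xcr0)
        return true;                  /* must-be-zero bits set */
    if (((value >> 1) & 3u) == 2u)
        return true;                  /* XCR0[2:1] = 10b */
    if ((value & 1u) == 0)
        return true;                  /* writing 0 to XCR0[0] */
    return false;
}
```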
Instruction Support
Form      Subset          Feature Flag
XSETBV    XSAVE/XRSTOR    CPUID Fn0000_0001_ECX[XSAVE] (bit 26)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic   Opcode     Description
XSETBV     0F 01 D1   Writes the content of the EDX:EAX register pair to the XCR specified by the ECX register.

Related Instructions
XGETBV, XRSTOR, XSAVE, XSAVEOPT
rFLAGS Affected
None
MXCSR Flags Affected
None
Exceptions
Exception
Invalid opcode, #UD

Mode
Real Virt Prot
X
X
X
X

General protection, #GP

X
X
X
X

X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
CR4.OSXSAVE = 0.
Lock prefix (F0h) preceding opcode.
CPL != 0.
ECX specifies a reserved or unimplemented XCR address.
A must-be-zero (MBZ) bit in the XCR was set.
Setting XCR0[2:1] to 10b.
Writing 0 to XCR0[0].

X — exception generated
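The #GP causes listed above can be modeled as a pure check on the requested write. The following is a C sketch under stated assumptions. `supported_xcr0`, the set of XCR0 bits the processor implements, is an assumed input, and any bit outside it is treated as MBZ. The bit meanings used are XCR0[0] = x87 state, XCR0[1] = SSE state, and XCR0[2] = AVX (YMM) state. The function and type names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { XSETBV_OK, XSETBV_GP } xsetbv_outcome;

/* Models the #GP checks listed for XSETBV. */
xsetbv_outcome check_xsetbv(unsigned cpl, uint32_t ecx, uint64_t value,
                            uint64_t supported_xcr0)
{
    if (cpl != 0)
        return XSETBV_GP;               /* CPL != 0 */
    if (ecx != 0)
        return XSETBV_GP;               /* only XCR0 is implemented */
    if ((value & ~supported_xcr0) != 0)
        return XSETBV_GP;               /* MBZ bit set */
    if ((value & 1u) == 0)
        return XSETBV_GP;               /* writing 0 to XCR0[0] */
    if (((value >> 1) & 3u) == 2u)
        return XSETBV_GP;               /* XCR0[2:1] = 10b: AVX without SSE */
    return XSETBV_OK;
}
```

For example, enabling x87, SSE, and AVX state is the value 7h. The value 5h (AVX without SSE) is rejected.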


3 Exception Summary

This chapter provides a quick reference to instruction exceptions. Table 3-1 lists the instructions
grouped by exception class, together with their extended and legacy instruction types (where
applicable). The exception table for each class follows Table 3-1.
Table 3-1. Instructions By Exception Class
Mnemonic                          Extended Type  Legacy Type
Class 1 — AVX / SSE Vector Aligned (VEX.vvvv != 1111b)
MOVAPD VMOVAPD                    AVX            SSE2
MOVAPS VMOVAPS                    AVX            SSE
MOVDQA VMOVDQA                    AVX            SSE2
MOVNTDQ VMOVNTDQ                  AVX            SSE2
MOVNTPD VMOVNTPD                  AVX            SSE2
MOVNTPS VMOVNTPS                  AVX            SSE
Class 1X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b or VEX.L = 1 && !AVX2)
MOVNTDQA VMOVNTDQA                AVX, AVX2      SSE4.1
Class 2 — AVX / SSE Vector (SIMD 111111)
DIVPD VDIVPD                      AVX            SSE2
DIVPS VDIVPS                      AVX            SSE
Class 2-1 — AVX / SSE Vector (SIMD 111011)
ADDPD VADDPD                      AVX            SSE2
ADDPS VADDPS                      AVX            SSE
ADDSUBPD VADDSUBPD                AVX            SSE3
ADDSUBPS VADDSUBPS                AVX            SSE3
DPPS VDPPS                        AVX            SSE4.1
HADDPD VHADDPD                    AVX            SSE3
HADDPS VHADDPS                    AVX            SSE3
HSUBPD VHSUBPD                    AVX            SSE3
HSUBPS VHSUBPS                    AVX            SSE3
SUBPD VSUBPD                      AVX            SSE2
SUBPS VSUBPS                      AVX            SSE
Class 2-2 — AVX / SSE Vector (SIMD 000011)
CMPPD VCMPPD                      AVX            SSE2
CMPPS VCMPPS                      AVX            SSE
MAXPD VMAXPD                      AVX            SSE2
MAXPS VMAXPS                      AVX            SSE
MINPD VMINPD                      AVX            SSE2
MINPS VMINPS                      AVX            SSE
MULPD VMULPD                      AVX            SSE2
MULPS VMULPS                      AVX            SSE
Class 2-3 — AVX / SSE Vector (SIMD 100001)
(unused)                          —              —


Class 2A — AVX / SSE Vector (SIMD 111111, VEX.L = 1)
(unused)                          —              —
Class 2A-1 — AVX / SSE Vector (SIMD 111011, VEX.L = 1)
DPPD VDPPD                        AVX            SSE4.1
Class 2B — AVX / SSE Vector (SIMD 111111, VEX.vvvv != 1111b)
(unused)                          —              —
Class 2B-1 — AVX / SSE Vector (SIMD 100000, VEX.vvvv != 1111b)
CVTDQ2PS VCVTDQ2PS                AVX            SSE2
Class 2B-2 — AVX / SSE Vector (SIMD 100001, VEX.vvvv != 1111b)
CVTPD2DQ VCVTPD2DQ                AVX            SSE2
CVTPS2DQ VCVTPS2DQ                AVX            SSE2
CVTTPS2DQ VCVTTPS2DQ              AVX            SSE2
CVTTPD2DQ VCVTTPD2DQ              AVX            SSE2
ROUNDPD VROUNDPD                  AVX            SSE4.1
ROUNDPS VROUNDPS                  AVX            SSE4.1
Class 2B-3 — AVX / SSE Vector (SIMD 111011, VEX.vvvv != 1111b)
CVTPD2PS VCVTPD2PS                AVX            SSE2
Class 2B-4 — AVX / SSE Vector (SIMD 100011, VEX.vvvv != 1111b)
SQRTPD VSQRTPD                    AVX            SSE2
SQRTPS VSQRTPS                    AVX            SSE
Class 3 — AVX / SSE Scalar (SIMD 111111)
DIVSD VDIVSD                      AVX            SSE2
DIVSS VDIVSS                      AVX            SSE
Class 3-1 — AVX / SSE Scalar (SIMD 111011)
ADDSD VADDSD                      AVX            SSE2
ADDSS VADDSS                      AVX            SSE
CVTSD2SS VCVTSD2SS                AVX            SSE2
SUBSD VSUBSD                      AVX            SSE2
SUBSS VSUBSS                      AVX            SSE
Class 3-2 — AVX / SSE Scalar (SIMD 000011)
CMPSD VCMPSD                      AVX            SSE2
CMPSS VCMPSS                      AVX            SSE
CVTSS2SD VCVTSS2SD                AVX            SSE2
MAXSD VMAXSD                      AVX            SSE2
MAXSS VMAXSS                      AVX            SSE
MINSD VMINSD                      AVX            SSE2
MINSS VMINSS                      AVX            SSE
MULSD VMULSD                      AVX            SSE2
MULSS VMULSS                      AVX            SSE
UCOMISD VUCOMISD                  AVX            SSE2
UCOMISS VUCOMISS                  AVX            SSE


Class 3-3 — AVX / SSE Scalar (SIMD 100000)
CVTSI2SD VCVTSI2SD                AVX            SSE2
CVTSI2SS VCVTSI2SS                AVX            SSE
Class 3-4 — AVX / SSE Scalar (SIMD 100001)
ROUNDSD VROUNDSD                  AVX            SSE4.1
ROUNDSS VROUNDSS                  AVX            SSE4.1
Class 3-5 — AVX / SSE Scalar (SIMD 100011)
SQRTSD VSQRTSD                    AVX            SSE2
SQRTSS VSQRTSS                    AVX            SSE
Class 3A — AVX / SSE Scalar (SIMD 111111, VEX.vvvv != 1111b)
(unused)                          —              —
Class 3A-1 — AVX / SSE Scalar (SIMD 000011, VEX.vvvv != 1111b)
COMISD VCOMISD                    AVX            SSE2
COMISS VCOMISS                    AVX            SSE
CVTPS2PD VCVTPS2PD                AVX            SSE2
Class 3A-2 — AVX / SSE Scalar (SIMD 100001, VEX.vvvv != 1111b)
CVTSD2SI VCVTSD2SI                AVX            SSE2
CVTSS2SI VCVTSS2SI                AVX            SSE
CVTTSD2SI VCVTTSD2SI              AVX            SSE2
CVTTSS2SI VCVTTSS2SI              AVX            SSE
Class 4 — AVX / SSE Vector
AESDEC VAESDEC                    AVX            AES
AESDECLAST VAESDECLAST            AVX            AES
AESENC VAESENC                    AVX            AES
AESENCLAST VAESENCLAST            AVX            AES
AESIMC VAESIMC                    AVX            AES
AESKEYGENASSIST VAESKEYGENASSIST  AVX            AES
ANDNPD VANDNPD                    AVX            SSE2
ANDNPS VANDNPS                    AVX            SSE
ANDPD VANDPD                      AVX            SSE2
ANDPS VANDPS                      AVX            SSE
BLENDPD VBLENDPD                  AVX            SSE4.1
BLENDPS VBLENDPS                  AVX            SSE4.1
ORPD VORPD                        AVX            SSE2
ORPS VORPS                        AVX            SSE
PCLMULQDQ VPCLMULQDQ              AVX            CLMUL
SHUFPD VSHUFPD                    AVX            SSE2
SHUFPS VSHUFPS                    AVX            SSE
UNPCKHPD VUNPCKHPD                AVX            SSE2
UNPCKHPS VUNPCKHPS                AVX            SSE
UNPCKLPD VUNPCKLPD                AVX            SSE2
UNPCKLPS VUNPCKLPS                AVX            SSE


XORPD VXORPD                      AVX            SSE2
XORPS VXORPS                      AVX            SSE
Class 4A — AVX / SSE Vector (VEX.W = 1)
BLENDVPD VBLENDVPD                AVX            SSE4.1
BLENDVPS VBLENDVPS                AVX            SSE4.1
Class 4B — AVX / SSE Vector (VEX.L = 1)
(unused)                          —              —
Class 4B-X — SSE / AVX / AVX2 (VEX.L = 1 && !AVX2)
MPSADBW VMPSADBW                  AVX, AVX2      SSE4.1
PACKSSDW VPACKSSDW                AVX, AVX2      SSE2
PACKSSWB VPACKSSWB                AVX, AVX2      SSE2
PACKUSDW VPACKUSDW                AVX, AVX2      SSE4.1
PACKUSWB VPACKUSWB                AVX, AVX2      SSE2
PADDB VPADDB                      AVX, AVX2      SSE2
PADDD VPADDD                      AVX, AVX2      SSE2
PADDQ VPADDQ                      AVX, AVX2      SSE2
PADDSB VPADDSB                    AVX, AVX2      SSE2
PADDSW VPADDSW                    AVX, AVX2      SSE2
PADDUSB VPADDUSB                  AVX, AVX2      SSE2
PADDUSW VPADDUSW                  AVX, AVX2      SSE2
PADDW VPADDW                      AVX, AVX2      SSE2
PALIGNR VPALIGNR                  AVX, AVX2      SSSE3
PAND VPAND                        AVX, AVX2      SSE2
PANDN VPANDN                      AVX, AVX2      SSE2
PAVGB VPAVGB                      AVX, AVX2      SSE
PAVGW VPAVGW                      AVX, AVX2      SSE
PBLENDW VPBLENDW                  AVX, AVX2      SSE4.1
PCMPEQB VPCMPEQB                  AVX, AVX2      SSE2
PCMPEQD VPCMPEQD                  AVX, AVX2      SSE2
PCMPEQQ VPCMPEQQ                  AVX, AVX2      SSE4.1
PCMPEQW VPCMPEQW                  AVX, AVX2      SSE2
PCMPGTB VPCMPGTB                  AVX, AVX2      SSE2
PCMPGTD VPCMPGTD                  AVX, AVX2      SSE2
PCMPGTQ VPCMPGTQ                  AVX, AVX2      SSE4.2
PCMPGTW VPCMPGTW                  AVX, AVX2      SSE2
PHADDD VPHADDD                    AVX, AVX2      SSSE3
PHADDSW VPHADDSW                  AVX, AVX2      SSSE3
PHADDW VPHADDW                    AVX, AVX2      SSSE3
PHSUBD VPHSUBD                    AVX, AVX2      SSSE3
PHSUBW VPHSUBW                    AVX, AVX2      SSSE3
PHSUBSW VPHSUBSW                  AVX, AVX2      SSSE3
PMADDUBSW VPMADDUBSW              AVX, AVX2      SSSE3


PMADDWD VPMADDWD                  AVX, AVX2      SSE2
PMAXSB VPMAXSB                    AVX, AVX2      SSE4.1
PMAXSD VPMAXSD                    AVX, AVX2      SSE4.1
PMAXSW VPMAXSW                    AVX, AVX2      SSE
PMAXUB VPMAXUB                    AVX, AVX2      SSE
PMAXUD VPMAXUD                    AVX, AVX2      SSE4.1
PMAXUW VPMAXUW                    AVX, AVX2      SSE4.1
PMINSB VPMINSB                    AVX, AVX2      SSE4.1
PMINSD VPMINSD                    AVX, AVX2      SSE4.1
PMINSW VPMINSW                    AVX, AVX2      SSE
PMINUB VPMINUB                    AVX, AVX2      SSE
PMINUD VPMINUD                    AVX, AVX2      SSE4.1
PMINUW VPMINUW                    AVX, AVX2      SSE4.1
PMULDQ VPMULDQ                    AVX, AVX2      SSE4.1
PMULHRSW VPMULHRSW                AVX, AVX2      SSSE3
PMULHUW VPMULHUW                  AVX, AVX2      SSE2
PMULHW VPMULHW                    AVX, AVX2      SSE2
PMULLD VPMULLD                    AVX, AVX2      SSE4.1
PMULLW VPMULLW                    AVX, AVX2      SSE2
PMULUDQ VPMULUDQ                  AVX, AVX2      SSE2
POR VPOR                          AVX, AVX2      SSE2
PSADBW VPSADBW                    AVX, AVX2      SSE
PSHUFB VPSHUFB                    AVX, AVX2      SSSE3
PSIGNB VPSIGNB                    AVX, AVX2      SSSE3
PSIGND VPSIGND                    AVX, AVX2      SSSE3
PSIGNW VPSIGNW                    AVX, AVX2      SSSE3
PSUBB VPSUBB                      AVX, AVX2      SSE2
PSUBD VPSUBD                      AVX, AVX2      SSE2
PSUBQ VPSUBQ                      AVX, AVX2      SSE2
PSUBSB VPSUBSB                    AVX, AVX2      SSE2
PSUBSW VPSUBSW                    AVX, AVX2      SSE2
PSUBUSB VPSUBUSB                  AVX, AVX2      SSE2
PSUBUSW VPSUBUSW                  AVX, AVX2      SSE2
PSUBW VPSUBW                      AVX, AVX2      SSE2
PUNPCKHBW VPUNPCKHBW              AVX, AVX2      SSE2
PUNPCKHDQ VPUNPCKHDQ              AVX, AVX2      SSE2
PUNPCKHQDQ VPUNPCKHQDQ            AVX, AVX2      SSE2
PUNPCKHWD VPUNPCKHWD              AVX, AVX2      SSE2
PUNPCKLBW VPUNPCKLBW              AVX, AVX2      SSE2
PUNPCKLDQ VPUNPCKLDQ              AVX, AVX2      SSE2
PUNPCKLQDQ VPUNPCKLQDQ            AVX, AVX2      SSE2
PUNPCKLWD VPUNPCKLWD              AVX, AVX2      SSE2


PXOR VPXOR                        AVX, AVX2      SSE2
Class 4C — AVX / SSE Vector (VEX.vvvv != 1111b)
MOVSHDUP VMOVSHDUP                AVX            SSE3
MOVSLDUP VMOVSLDUP                AVX            SSE3
PTEST VPTEST                      AVX            SSE4.1
RCPPS VRCPPS                      AVX            SSE
RSQRTPS VRSQRTPS                  AVX            SSE
Class 4C-1 — AVX / SSE Vector (write to RO memory, VEX.vvvv != 1111b)
LDDQU VLDDQU                      AVX            SSE3
MOVDQU VMOVDQU                    AVX            SSE2
MOVUPD VMOVUPD                    AVX            SSE2
MOVUPS VMOVUPS                    AVX            SSE
Class 4D — AVX / SSE Vector (VEX.vvvv != 1111b, VEX.L = 1)
MASKMOVDQU VMASKMOVDQU            AVX            SSE2
PCMPESTRI VPCMPESTRI              AVX            SSE4.2
PCMPESTRM VPCMPESTRM              AVX            SSE4.2
PCMPISTRI VPCMPISTRI              AVX            SSE4.2
PCMPISTRM VPCMPISTRM              AVX            SSE4.2
PHMINPOSUW VPHMINPOSUW            AVX            SSE4.1
Class 4D-X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))
PABSB VPABSB                      AVX, AVX2      SSSE3
PABSD VPABSD                      AVX, AVX2      SSSE3
PABSW VPABSW                      AVX, AVX2      SSSE3
PSHUFD VPSHUFD                    AVX, AVX2      SSE2
PSHUFHW VPSHUFHW                  AVX, AVX2      SSE2
PSHUFLW VPSHUFLW                  AVX, AVX2      SSE2
Class 4E — AVX / SSE Vector (VEX.W = 1, VEX.L = 1)
(unused)                          —              —
Class 4E-X — SSE / AVX / AVX2 Vector (VEX.W = 1, (VEX.L = 1 && !AVX2))
PBLENDVB VPBLENDVB                AVX            SSE4.1
Class 4F — AVX / SSE (VEX.L = 1)
(unused)                          —              —
Class 4F-X — SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2)
PSLLD VPSLLD                      AVX, AVX2      SSE2
PSLLQ VPSLLQ                      AVX, AVX2      SSE2
PSLLW VPSLLW                      AVX, AVX2      SSE2
PSRAD VPSRAD                      AVX, AVX2      SSE2
PSRAW VPSRAW                      AVX, AVX2      SSE2
PSRLD VPSRLD                      AVX, AVX2      SSE2
PSRLQ VPSRLQ                      AVX, AVX2      SSE2
PSRLW VPSRLW                      AVX, AVX2      SSE2
Class 4G — AVX Vector (VEX.W = 1, VEX.vvvv != 1111b)


VTESTPD                           AVX            —
VTESTPS                           AVX            —
Class 4H — AVX, 256-bit only (VEX.L = 0; No SIMD Exceptions)
VPERMD                            AVX2           —
VPERMPS                           AVX2           —
Class 4H-1 — AVX2, 256-bit only (VEX.L = 0, VEX.vvvv != 1111b)
VPERMPD                           AVX2           —
VPERMQ                            AVX2           —
Class 4J — AVX2 (VEX.W = 1)
VPBLENDD                          AVX2           —
VPSRAVD                           AVX2           —
Class 4K — AVX2
VPMASKMOVD                        AVX2           —
VPMASKMOVQ                        AVX2           —
VPSLLVD                           AVX2           —
VPSLLVQ                           AVX2           —
VPSRLVD                           AVX2           —
VPSRLVQ                           AVX2           —
Class 5 — AVX / SSE Scalar
RCPSS VRCPSS                      AVX            SSE
RSQRTSS VRSQRTSS                  AVX            SSE
Class 5A — AVX / SSE Scalar (VEX.L = 1)
INSERTPS VINSERTPS                AVX            SSE4.1
Class 5B — AVX / SSE Scalar (VEX.vvvv != 1111b)
CVTDQ2PD VCVTDQ2PD                AVX            SSE2
MOVDDUP VMOVDDUP                  AVX            SSE3
Class 5C — AVX / SSE Scalar (VEX.vvvv != 1111b, VEX.L = 1)
PINSRB VPINSRB                    AVX            SSE4.1
PINSRD VPINSRD                    AVX            SSE4.1
PINSRQ VPINSRQ                    AVX            SSE4.1
PINSRW VPINSRW                    AVX            SSE
Class 5C-X — SSE / AVX / AVX2 Scalar (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))
PMOVSXBD VPMOVSXBD                AVX, AVX2      SSE4.1
PMOVSXBQ VPMOVSXBQ                AVX, AVX2      SSE4.1
PMOVSXBW VPMOVSXBW                AVX, AVX2      SSE4.1
PMOVSXDQ VPMOVSXDQ                AVX, AVX2      SSE4.1
PMOVSXWD VPMOVSXWD                AVX, AVX2      SSE4.1
PMOVSXWQ VPMOVSXWQ                AVX, AVX2      SSE4.1
PMOVZXBD VPMOVZXBD                AVX, AVX2      SSE4.1
PMOVZXBQ VPMOVZXBQ                AVX, AVX2      SSE4.1
PMOVZXBW VPMOVZXBW                AVX, AVX2      SSE4.1
PMOVZXDQ VPMOVZXDQ                AVX, AVX2      SSE4.1


PMOVZXWD VPMOVZXWD                AVX, AVX2      SSE4.1
PMOVZXWQ VPMOVZXWQ                AVX, AVX2      SSE4.1
Class 5C-1 — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1)
EXTRACTPS VEXTRACTPS              AVX            SSE4.1
MOVD VMOVD                        AVX            SSE2
MOVQ VMOVQ                        AVX            SSE2
PEXTRB VPEXTRB                    AVX            SSE4.1
PEXTRD VPEXTRD                    AVX            SSE4.1
PEXTRQ VPEXTRQ                    AVX            SSE4.1
PEXTRW VPEXTRW                    AVX            SSE4.1
Class 5D — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b (variant))
MOVSD VMOVSD                      AVX            SSE2
MOVSS VMOVSS                      AVX            SSE
Class 5E — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b (variant), VEX.L = 1)
MOVHPD VMOVHPD                    AVX            SSE2
MOVHPS VMOVHPS                    AVX            SSE
MOVLPD VMOVLPD                    AVX            SSE2
MOVLPS VMOVLPS                    AVX            SSE
Class 6 — AVX Mixed Memory Argument
(unused)                          —              —
Class 6A — AVX Mixed Memory Argument (VEX.W = 1)
(unused)                          —              —
Class 6A-1 — AVX Mixed Memory Argument (write to RO memory, VEX.W = 1)
VMASKMOVPD                        AVX            —
VMASKMOVPS                        AVX            —
Class 6B — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0)
VINSERTF128                       AVX            —
VINSERTI128                       AVX2           —
VPERM2F128                        AVX            —
VPERM2I128                        AVX2           —
Class 6B-1 — AVX Mixed Memory Argument (write to RO memory, VEX.W = 1, VEX.L = 0)
VEXTRACTF128                      AVX            —
Class 6C — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0, VEX.vvvv != 1111b)
VBROADCASTF128                    AVX            —
VBROADCASTI128                    AVX2           —
VEXTRACTI128                      AVX2           —
Class 6C-X — AVX / AVX2 (W=1, vvvv!=1111b, L=0, (reg src op specified && !AVX2))
VBROADCASTSD                      AVX, AVX2      —
Class 6D — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b)
VPBROADCASTB                      AVX2           —
VPBROADCASTD                      AVX2           —
VPBROADCASTQ                      AVX2           —


VPBROADCASTW                      AVX2           —
Class 6D-X — AVX / AVX2 (W = 1, vvvv != 1111b, (ModRM.mod = 11b && !AVX2))
VBROADCASTSS                      AVX, AVX2      —
Class 6E — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b (variant))
VPERMILPD                         AVX            —
VPERMILPS                         AVX            —
Class 6F — AVX2 (VEX.W = 1, VEX.vvvv != 1111b, VEX.L = 0, ModRM.mod = 11b)
VBROADCASTI128                    AVX2           —
Class 7 — AVX / SSE No Memory Argument
(unused)                          —              —
Class 7A — AVX / SSE No Memory Argument (VEX.L = 1)
MOVHLPS VMOVHLPS                  AVX            SSE
MOVLHPS VMOVLHPS                  AVX            SSE
Class 7A-X SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2)
PSLLDQ VPSLLDQ                    AVX, AVX2      SSE2
PSRLDQ VPSRLDQ                    AVX, AVX2      SSE2
Class 7B — AVX / SSE No Memory Argument (VEX.vvvv != 1111b)
MOVMSKPD VMOVMSKPD                AVX            SSE2
MOVMSKPS VMOVMSKPS                AVX            SSE
Class 7C — AVX / SSE No Memory Argument (VEX.vvvv != 1111b, VEX.L = 1)
(unused)                          —              —
Class 7C-X SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))
PMOVMSKB VPMOVMSKB                AVX, AVX2      SSE2
Class 8 — AVX No Memory Argument (VEX.vvvv != 1111b, VEX.W = 1)
VZEROALL                          AVX            —
VZEROUPPER                        AVX            —
Class 9 — AVX 4-byte Argument (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1)
STMXCSR VSTMXCSR                  AVX            SSE
Class 9A — AVX 4-byte Argument (reserved MBZ = 1, VEX.vvvv != 1111b, VEX.L = 1)
LDMXCSR VLDMXCSR                  AVX            SSE


Class 10 — XOP Base
VPCMOV                            XOP            —
VPCOMB                            XOP            —
VPCOMD                            XOP            —
VPCOMQ                            XOP            —
VPCOMUB                           XOP            —
VPCOMUD                           XOP            —
VPCOMUQ                           XOP            —
VPCOMUW                           XOP            —
VPCOMW                            XOP            —
VPERMIL2PS                        XOP            —
VPERMIL2PD                        XOP            —
Class 10A — XOP Base (XOP.L = 1)
VPPERM                            XOP            —
VPSHAB                            XOP            —
VPSHAD                            XOP            —
VPSHAQ                            XOP            —
VPSHAW                            XOP            —
VPSHLB                            XOP            —
VPSHLD                            XOP            —
VPSHLQ                            XOP            —
VPSHLW                            XOP            —
Class 10B — XOP Base (XOP.W = 1, XOP.L = 1)
VPMACSDD                          XOP            —
VPMACSDQH                         XOP            —
VPMACSDQL                         XOP            —
VPMACSSDD                         XOP            —
VPMACSSDQH                        XOP            —
VPMACSSDQL                        XOP            —
VPMACSSWD                         XOP            —
VPMACSSWW                         XOP            —
VPMACSWD                          XOP            —
VPMACSWW                          XOP            —
VPMADCSSWD                        XOP            —
VPMADCSWD                         XOP            —
Class 10C — XOP Base (XOP.W = 1, XOP.vvvv != 1111b, XOP.L = 1)
VPHADDBD                          XOP            —
VPHADDBQ                          XOP            —
VPHADDBW                          XOP            —
VPHADDD                           XOP            —
VPHADDDQ                          XOP            —
VPHADDUBD                         XOP            —


VPHADDUBQ                         XOP            —
VPHADDUBW                         XOP            —
VPHADDUDQ                         XOP            —
VPHADDUWD                         XOP            —
VPHADDUWQ                         XOP            —
VPHADDWD                          XOP            —
VPHADDWQ                          XOP            —
VPHSUBBW                          XOP            —
VPHSUBDQ                          XOP            —
VPHSUBWD                          XOP            —
Class 10D — XOP Base (SIMD 110011, XOP.vvvv != 1111b, XOP.W = 1)
VFRCZPD                           XOP            —
VFRCZPS                           XOP            —
VFRCZSD                           XOP            —
VFRCZSS                           XOP            —
Class 10E — XOP Base (XOP.vvvv != 1111b (variant), XOP.L = 1)
VPROTB                            XOP            —
VPROTD                            XOP            —
VPROTQ                            XOP            —
VPROTW                            XOP            —
Class 11 — F16C Instructions
VCVTPH2PS                         F16C           —
VCVTPS2PH                         F16C           —
Class 12 — AVX2 VSIB (ModRM.mod = 11b, ModRM.rm != 100b)
VGATHERDPD                        AVX2           —
VGATHERDPS                        AVX2           —
VGATHERQPD                        AVX2           —
VGATHERQPS                        AVX2           —
VPGATHERDD                        AVX2           —
VPGATHERDQ                        AVX2           —
VPGATHERQD                        AVX2           —
VPGATHERQQ                        AVX2           —
Class FMA-2 — FMA / FMA4 Vector (SIMD Exceptions PE, UE, OE, DE, IE)
VFMADDPD                          FMA4           —
VFMADDPS                          FMA4           —
VFMADDSUBPD                       FMA4           —
VFMADDSUBPS                       FMA4           —
VFMSUBADDPD                       FMA4           —
VFMSUBADDPS                       FMA4           —
VFMSUBPD                          FMA4           —
VFMSUBPS                          FMA4           —
VFNMADDPD                         FMA4           —


VFNMADDPS                         FMA4           —
VFNMSUBPD                         FMA4           —
VFNMSUBPS                         FMA4           —
Class FMA-3 — FMA / FMA4 Scalar (SIMD Exceptions PE, UE, OE, DE, IE)
VFMADDSD                          FMA4           —
VFMADDSS                          FMA4           —
VFMSUBSD                          FMA4           —
VFMSUBSS                          FMA4           —
VFNMADDSD                         FMA4           —
VFNMADDSS                         FMA4           —
VFNMSUBSD                         FMA4           —
VFNMSUBSS                         FMA4           —
Unique Cases
XGETBV                            —              —
XRSTOR                            —              —
XSAVE/XSAVEOPT                    —              —
XSETBV                            —              —


Class 1 — AVX / SSE Vector Aligned (VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS

General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
S
X
A

Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S

X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not aligned on a 16-byte boundary.
Write to a read-only data segment.
VEX256: Memory operand not 32-byte aligned.
VEX128: Memory operand not 16-byte aligned.
Null data segment used to reference memory.
Instruction execution caused a page fault.
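For Class 1 the alignment requirement depends on the encoding: legacy SSE and VEX128 forms need a 16-byte-aligned memory operand, while VEX256 forms need 32-byte alignment. The following C sketch shows the predicate under that reading of the cause rows above. The enum and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encoding forms of the Class 1 aligned moves (MOVAPS, VMOVAPS, ...). */
typedef enum { FORM_LEGACY_SSE, FORM_VEX128, FORM_VEX256 } enc_form;

/* Required alignment of the memory operand, in bytes. */
unsigned class1_required_alignment(enc_form form)
{
    return form == FORM_VEX256 ? 32u : 16u;
}

/* True if the effective address would raise #GP for misalignment. */
bool class1_misaligned(uintptr_t ea, enc_form form)
{
    return (ea & (uintptr_t)(class1_required_alignment(form) - 1u)) != 0;
}
```

An address such as 1010h is acceptable for a 128-bit access but faults for a 256-bit VMOVAPS.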


Class 1X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b or VEX.L = 1 && !AVX2)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS

X
S
S
A
A
A
A
A
X
X
X
X
S
X

General protection, #GP
A

Page fault, #PF
S
X — AVX, AVX2, and SSE exception
A — AVX, AVX2 exception
S — SSE exception


X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not aligned on a 16-byte boundary.
Write to a read-only data segment.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Null data segment used to reference memory.
Instruction execution caused a page fault.
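The encoding-related #UD causes for Class 1X come from fields of the VEX prefix. The following is a C sketch that decodes the 2-byte (C5h) and 3-byte (C4h) VEX forms and applies the Class 1X checks. It assumes 64-bit mode, where C4h and C5h always begin a VEX prefix. vvvv is kept as encoded (inverted), so "no register operand" reads as 1111b, which matches how the tables state the condition. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields the exception classes test, taken from a VEX prefix. */
typedef struct {
    unsigned w;     /* VEX.W (0 for the 2-byte C5 form) */
    unsigned vvvv;  /* VEX.vvvv as encoded (ones' complement) */
    unsigned l;     /* VEX.L: 0 = 128-bit, 1 = 256-bit */
    unsigned pp;    /* implied SIMD prefix */
} vex_fields;

/* Decodes the prefix at `b`; returns false if it is not C4h/C5h. */
bool decode_vex(const uint8_t *b, vex_fields *out)
{
    uint8_t last;                 /* byte holding vvvv, L and pp */
    if (b[0] == 0xC5) {
        last = b[1];
        out->w = 0;
    } else if (b[0] == 0xC4) {
        last = b[2];              /* b[1] holds R, X, B and mmmmm */
        out->w = (last >> 7) & 1u;
    } else {
        return false;
    }
    out->vvvv = (last >> 3) & 0xFu;
    out->l = (last >> 2) & 1u;
    out->pp = last & 3u;
    return true;
}

/* Class 1X (VMOVNTDQA): #UD if vvvv != 1111b, or if VEX.L = 1
 * on a processor without AVX2. */
bool class1x_vex_ud(const vex_fields *v, bool has_avx2)
{
    return v->vvvv != 0xFu || (v->l == 1u && !has_avx2);
}
```

For example, C4 E2 79 2A encodes the 128-bit VMOVNTDQA with vvvv = 1111b. Changing the third byte to 7D sets VEX.L, which is #UD without AVX2.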


Class 2 — AVX / SSE Vector (SIMD 111111)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S
S
S
S
S
S

S
S
S
S
S
S
S

X
X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.
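The parenthesized SIMD strings in the class names, such as SIMD 111011, appear to record which SIMD floating-point exceptions a class can raise. Comparing each class's SIMD Floating-Point Exceptions list suggests that the leftmost digit is PE and the rightmost is IE, the same order as MXCSR flag bits 5:0. Class 2 (111111) lists all six exceptions, Class 2-1 (111011) omits ZE, and Class 2-2 (000011) lists only IE and DE. The following C sketch decodes a signature under that inferred reading. The function name is hypothetical.

```c
#include <stddef.h>

/* MXCSR exception-flag bit positions (bits 5:0). */
enum { MXCSR_IE = 0, MXCSR_DE = 1, MXCSR_ZE = 2,
       MXCSR_OE = 3, MXCSR_UE = 4, MXCSR_PE = 5 };

/* Converts a class signature such as "111011" to a 6-bit mask in MXCSR
 * flag order: the first character is PE (bit 5), the last is IE (bit 0).
 * Returns -1 unless the string is exactly six '0'/'1' characters. */
int simd_signature_to_mask(const char *sig)
{
    int mask = 0;
    size_t i;
    for (i = 0; sig[i] != '\0'; i++) {
        if (i >= 6 || (sig[i] != '0' && sig[i] != '1'))
            return -1;
        mask = (mask << 1) | (sig[i] - '0');
    }
    return i == 6 ? mask : -1;
}
```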


Class 2-1 — AVX / SSE Vector (SIMD 111011)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S
S
S
S
S

S
S
S
S
S
S

X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.
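The cause rows above route an unmasked SIMD floating-point exception either to #XF, when CR4.OSXMMEXCPT = 1, or to #UD, when CR4.OSXMMEXCPT = 0. The following C sketch models that decision. It assumes the standard MXCSR layout, with flags IE through PE in bits 5:0 and masks IM through PM in bits 12:7. `detected` stands for the exceptions the instruction detected. The names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SIMD_NO_FAULT, SIMD_FAULT_XF, SIMD_FAULT_UD } simd_fault;

/* `detected` uses MXCSR flag order (bit 0 = IE ... bit 5 = PE). Each mask
 * bit sits seven positions above its flag (IM = bit 7 ... PM = bit 12). */
simd_fault simd_fp_fault(uint32_t mxcsr, unsigned detected, bool cr4_osxmmexcpt)
{
    unsigned masks = (mxcsr >> 7) & 0x3Fu;
    unsigned unmasked = detected & 0x3Fu & ~masks;
    if (unmasked == 0)
        return SIMD_NO_FAULT;   /* all detected exceptions are masked */
    return cr4_osxmmexcpt ? SIMD_FAULT_XF : SIMD_FAULT_UD;
}
```

With the reset value MXCSR = 1F80h, every exception is masked, so no fault is taken. Clearing ZM (bit 9) makes a division by zero fault.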


Class 2-2 — AVX / SSE Vector (SIMD 000011)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S
S

S
S
S

X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
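The misaligned-operand rows in the Class 2 tables depend on MXCSR.MM, the misaligned exception mask, and on whether alignment checking is enabled (CR0.AM = 1, rFLAGS.AC = 1, and CPL = 3). The following is a C sketch of those rows as written. It assumes that MM is MXCSR bit 17, its position on AMD processors that support misaligned SSE mode. It also assumes, from the table's marks, that the unconditional #AC row applies to the VEX form. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MXCSR_MM (UINT32_C(1) << 17)  /* misaligned exception mask (assumed bit 17) */

typedef enum { ALIGN_OK, ALIGN_GP, ALIGN_AC } align_fault;

/* Alignment checking is in effect when CR0.AM = 1, rFLAGS.AC = 1, CPL = 3. */
bool alignment_checking_enabled(bool cr0_am, bool rflags_ac, unsigned cpl)
{
    return cr0_am && rflags_ac && cpl == 3;
}

/* Legacy SSE form: misaligned -> #GP if MM = 0, #AC if MM = 1 and checking is on. */
align_fault class2_sse_alignment(uintptr_t ea, uint32_t mxcsr, bool ac_enabled)
{
    if ((ea & 15u) == 0)
        return ALIGN_OK;
    if ((mxcsr & MXCSR_MM) == 0)
        return ALIGN_GP;
    return ac_enabled ? ALIGN_AC : ALIGN_OK;
}

/* VEX form: a misaligned operand faults only under alignment checking. */
align_fault class2_vex_alignment(uintptr_t ea, bool ac_enabled)
{
    return ((ea & 15u) != 0 && ac_enabled) ? ALIGN_AC : ALIGN_OK;
}
```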


Class 2-3 — AVX / SSE Vector (SIMD 100001)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


S
S
S

S
S
S

X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.


Class 2A — AVX / SSE Vector (SIMD 111111, VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S
S

X
X
X
S
X
X

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC

S

X

A
SIMD floating-point, #XF

S

S

S
S
S
S
S
S
S

S
S
S
S
S
S
S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


Class 2A-1 — AVX / SSE Vector (SIMD 111011, VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.


Class 2B — AVX / SSE Vector (SIMD 111111, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S
S
S
S

S
S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.
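One way to read the #UD rows is as a predicate over processor state and instruction encoding. The Python sketch below is a hypothetical model for illustration only, not AMD's decode logic, and `CpuState`, `Encoding`, and `ud_causes` are invented names. The Real/Virt/Prot columns did not survive extraction cleanly, so the sketch makes two assumptions about which form each row applies to. It treats the CR0.EM and CR4.OSFXSR rows as applying only to the legacy SSE form. It treats the OSXSAVE, XFEATURE_ENABLED_MASK, VEX.vvvv, and prefix rows as applying only to the VEX form. It omits the SIMD floating-point row, which depends on the result of the operation.

```python
from dataclasses import dataclass

@dataclass
class CpuState:
    # Hypothetical snapshot of the state consulted by the #UD rows.
    feature_supported: bool  # CPUID feature bit for this instruction
    protected_mode: bool     # False in real or virtual-8086 mode
    cr0_em: bool
    cr4_osfxsr: bool
    cr4_osxsave: bool
    xcr0: int                # XFEATURE_ENABLED_MASK (XCR0)

@dataclass
class Encoding:
    vex: bool                        # True for the AVX (VEX-encoded) form
    vex_vvvv: int = 0b1111           # raw VEX.vvvv field; 1111b means unused
    prefix_before_vex: bool = False  # REX, F2, F3, or 66 ahead of the VEX prefix
    lock: bool = False               # F0h prefix present

def ud_causes(cpu: CpuState, enc: Encoding, check_vvvv: bool = True) -> list:
    """List the #UD causes from the table that apply (empty list: no #UD)."""
    causes = []
    if not cpu.feature_supported:
        causes.append("instruction not supported (CPUID)")
    if enc.vex:
        if not cpu.protected_mode:
            causes.append("AVX only recognized in protected mode")
        if not cpu.cr4_osxsave:
            causes.append("CR4.OSXSAVE = 0")
        if ((cpu.xcr0 >> 1) & 0b11) != 0b11:
            causes.append("XFEATURE_ENABLED_MASK[2:1] != 11b")
        if check_vvvv and enc.vex_vvvv != 0b1111:
            causes.append("VEX.vvvv != 1111b")
        if enc.prefix_before_vex:
            causes.append("REX, F2, F3, or 66 prefix preceding VEX")
    else:
        # Assumption: the CR0.EM and CR4.OSFXSR rows apply only to the SSE form.
        if cpu.cr0_em:
            causes.append("CR0.EM = 1")
        if not cpu.cr4_osfxsr:
            causes.append("CR4.OSFXSR = 0")
    if enc.lock:
        causes.append("LOCK prefix preceding opcode")
    return causes
```

Calling `ud_causes(..., check_vvvv=False)` models classes whose cause list has no VEX.vvvv row, such as Class 3.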

Class 2B-1 — AVX / SSE Vector (SIMD 100000, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X

A result could not be represented exactly in the destination format.

Class 2B-2 — AVX / SSE Vector (SIMD 100001, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.
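The six-digit SIMD field in each class heading appears to be a mask of the SIMD floating-point exceptions the class can report. The headings and their tables line up: 111111 lists all six exceptions, 100000 lists only PE, and 100001 lists IE and PE. So the rightmost digit corresponds to IE and the leftmost to PE, which is the order of the MXCSR exception flags in bits 0 through 5. The helper below is a hypothetical decoder built on that inference.

```python
# MXCSR exception flags in bit order 0..5.
SIMD_FP_EXCEPTIONS = ("IE", "DE", "ZE", "OE", "UE", "PE")

def simd_exceptions(field: str) -> list:
    """Decode a six-digit "SIMD bbbbbb" heading field into exception names.

    The rightmost digit is taken as IE and the leftmost as PE, an ordering
    inferred from the class headings and tables, not stated by them.
    """
    if len(field) != 6 or set(field) - {"0", "1"}:
        raise ValueError(f"expected six binary digits, got {field!r}")
    mask = int(field, 2)
    return [name for bit, name in enumerate(SIMD_FP_EXCEPTIONS) if (mask >> bit) & 1]
```

Applied to the other headings in this section, it yields IE, DE, OE, UE, and PE for 111011 and IE and DE for 000011, matching the corresponding tables.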

Class 2B-3 — AVX / SSE Vector (SIMD 111011, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.
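Several #SS and #GP rows depend on whether an address is canonical. In 64-bit mode with 48-bit virtual addresses, an address is canonical when bits 63 through 47 are all equal. The sketch below checks that condition for an assumed virtual-address width `va_bits`. The default is 48; a value of 57 would model five-level paging. The function name is hypothetical.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits - 1) of a 64-bit address are all equal."""
    if not 0 <= addr < (1 << 64):
        raise ValueError("address must be a 64-bit unsigned value")
    top = addr >> (va_bits - 1)  # the 65 - va_bits high-order bits
    return top == 0 or top == (1 << (65 - va_bits)) - 1
```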

Class 2B-4 — AVX / SSE Vector (SIMD 100011, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S
S

S
S
S
S

X
X
X
S
X

S

S

S

S

A
X

S

S

X

S
S
S
S

S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
A result could not be represented exactly in the destination format.
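The IE and DE rows depend on the source operand itself: IE covers an SNaN source and DE a denormal source. For single precision, the sketch below classifies a raw 32-bit pattern using the IEEE 754 binary32 layout: 1 sign bit, 8 exponent bits, and 23 fraction bits. A NaN is quiet when the most significant fraction bit is set. The function names are hypothetical, and the model ignores MXCSR.DAZ.

```python
import struct

def classify_f32(bits: int) -> str:
    """Classify a raw IEEE 754 binary32 bit pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # Fraction bit 22 set: quiet NaN; clear: signaling NaN (raises IE as a source).
        return "QNaN" if fraction & (1 << 22) else "SNaN"
    if exponent == 0:
        # Zero exponent with a nonzero fraction is a denormal (raises DE as a source).
        return "zero" if fraction == 0 else "denormal"
    return "normal"

def f32_bits(x: float) -> int:
    """Raw bit pattern of x rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```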

Class 3 — AVX / SSE Scalar (SIMD 111111)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S
S
S
S
S

S
S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.
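The two "Unmasked SIMD floating-point exception" rows reduce to a two-step decision. First, a detected exception faults only if its mask bit in MXCSR is clear. Second, CR4.OSXMMEXCPT selects #XF (1) or #UD (0). The sketch below models that decision with the MXCSR layout: flags IE through PE occupy bits 0 through 5, and masks IM through PM occupy bits 7 through 12. `simd_fp_fault` is a hypothetical name, and the caller is assumed to know which exception conditions the operation detected.

```python
from typing import Optional

FLAG_NAMES = ("IE", "DE", "ZE", "OE", "UE", "PE")  # MXCSR flag bits 0..5
MASK_SHIFT = 7                                     # masks IM..PM are MXCSR bits 7..12

def simd_fp_fault(mxcsr: int, detected: set, cr4_osxmmexcpt: bool) -> Optional[str]:
    """Return "#XF", "#UD", or None when every detected exception is masked."""
    unmasked = [
        name
        for bit, name in enumerate(FLAG_NAMES)
        if name in detected and not ((mxcsr >> (MASK_SHIFT + bit)) & 1)
    ]
    if not unmasked:
        return None  # all detected exceptions masked: no fault is taken
    return "#XF" if cr4_osxmmexcpt else "#UD"
```

The reset value MXCSR = 1F80h masks all six exceptions, so the function returns None for any input. Clearing ZM (bit 9) gives 1D80h, for which a detected ZE yields #XF or #UD depending on CR4.OSXMMEXCPT.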

Class 3-1 — AVX / SSE Scalar (SIMD 111011)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S
S
S
S

S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.
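The OE row ("rounded result too large") can be made concrete for single precision. The largest finite binary32 value is 2^128 - 2^104. Under round to nearest, ties to even, any exact result whose magnitude reaches the midpoint 2^128 - 2^103 rounds to 2^128 and overflows. At the midpoint itself the tie goes to 2^128, because the largest finite value has an odd, all-ones significand. The sketch below uses exact rationals so double-precision rounding does not blur the threshold. It covers only the round-to-nearest mode, and the names are hypothetical.

```python
from fractions import Fraction

F32_MAX = Fraction(2**128 - 2**104)          # largest finite binary32 value
F32_OVERFLOW_AT = Fraction(2**128 - 2**103)  # midpoint between F32_MAX and 2**128

def overflows_f32_rne(exact: Fraction) -> bool:
    """True if rounding `exact` to binary32 (nearest, ties to even) overflows."""
    # Inclusive comparison: a tie at the midpoint rounds to the even neighbour 2**128.
    return abs(exact) >= F32_OVERFLOW_AT
```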

Class 3-2 — AVX / SSE Scalar (SIMD 000011)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.

Class 3-3 — AVX / SSE Scalar (SIMD 100000)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X

A result could not be represented exactly in the destination format.

Class 3-4 — AVX / SSE Scalar (SIMD 100001)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.
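The PE row ("a result could not be represented exactly in the destination format") can be tested for single precision by rounding the exact result to binary32 and comparing. The sketch below does this with exact rationals and Python's `struct` module. It assumes the result lies within the finite binary32 range, since `struct.pack` raises `OverflowError` for larger magnitudes. The function name is hypothetical.

```python
import struct
from fractions import Fraction

def is_inexact_f32(exact: Fraction) -> bool:
    """True if `exact` has no exact binary32 representation (PE would be reported)."""
    # Every binary32 value is also exact in binary64, so a representable input
    # survives both conversions unchanged; any other input cannot compare equal.
    rounded = struct.unpack("<f", struct.pack("<f", float(exact)))[0]
    return Fraction(rounded) != exact
```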

Class 3-5 — AVX / SSE Scalar (SIMD 100011)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S
S

S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
A result could not be represented exactly in the destination format.

Class 3A — AVX / SSE Scalar (SIMD 111111, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S
S
S
S
S

S
S
S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

Class 3A-1 — AVX / SSE Scalar (SIMD 000011, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.

Class 3A-2 — AVX / SSE Scalar (SIMD 100001, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S

S

S
S
A
A
A
A
X

S

S

X

S
S
S

S
S
S
S
S

X
X
X
X
X
X

S

S

X

S
S
S

S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
X
X

A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.

Class 4 — AVX / SSE Vector
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
A
S
S

X
A
S
S

X

S
S
S
S
S

S
S
S
S
S

S

S

S

S

A
X

S
S
A
A
A
X
X
X
X
S
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4A — AVX / SSE Vector (VEX.W = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S
S
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4B — AVX / SSE Vector (VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S
S
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4B-X — SSE / AVX / AVX2 (VEX.L = 1 && !AVX2)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
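The #AC row of this class depends on operand width. With alignment checking enabled, a 256-bit operand must be 32-byte aligned and a 128-bit operand must be 16-byte aligned. The sketch below assumes the usual x86 meaning of "alignment checking enabled": CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3. It models only this row, not the MXCSR.MM row of #GP, and the function name is hypothetical.

```python
def ac_fault(addr: int, operand_bits: int, cr0_am: bool, rflags_ac: bool, cpl: int) -> bool:
    """True if the access would raise #AC under this class's alignment rule."""
    alignment_checking = cr0_am and rflags_ac and cpl == 3
    if not alignment_checking:
        return False
    required = {128: 16, 256: 32}[operand_bits]  # required alignment in bytes
    return addr % required != 0
```

Passing any other operand width raises `KeyError`, since the table states a rule only for 128-bit and 256-bit operands.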

Class 4C — AVX / SSE Vector (VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S
S
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
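The "VEX.vvvv != 1111b" row refers to the raw vvvv field of the VEX prefix. This field holds a register number in ones'-complement form and must be 1111b when the instruction does not use it. The sketch below extracts vvvv, L, pp, and (for the three-byte C4 form) W, assuming the standard layout:

- The last prefix byte carries vvvv in bits 6:3, L in bit 2, and pp in bits 1:0.
- The three-byte form also carries W in bit 7 of that byte.
- The two-byte C5 form implies W = 0.

The function and dictionary keys are hypothetical. The sketch does not model the check that distinguishes C4/C5 from LES/LDS outside 64-bit mode.

```python
def vex_fields(prefix: bytes) -> dict:
    """Decode W, vvvv, L, and pp from a two-byte (C5) or three-byte (C4) VEX prefix."""
    if len(prefix) >= 2 and prefix[0] == 0xC5:
        last, w = prefix[1], 0        # two-byte form implies W = 0
    elif len(prefix) >= 3 and prefix[0] == 0xC4:
        last = prefix[2]
        w = (last >> 7) & 1           # W is bit 7 of the third byte
    else:
        raise ValueError("not a VEX prefix")
    raw_vvvv = (last >> 3) & 0xF
    return {
        "W": w,
        "vvvv_raw": raw_vvvv,
        "vvvv_reg": ~raw_vvvv & 0xF,  # field is stored in ones'-complement form
        "vvvv_unused": raw_vvvv == 0b1111,
        "L": (last >> 2) & 1,
        "pp": last & 0b11,
    }
```

For example, `vmovaps xmm0, [rax]` encodes as C5 F8 28 00. Its prefix decodes to vvvv = 1111b and L = 0, so it would not trigger the vvvv row.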

Class 4C-1 — AVX / SSE Vector (write to RO memory, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Alignment check, #AC
S
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
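Class 4C-1 adds one #GP row that the other classes lack: a write to a read-only data segment. Together with the segment-limit and null-segment rows, it can be sketched as a check over a simplified, hypothetical segment model. The sketch makes three simplifications:

- It assumes an expand-up segment.
- It assumes execution outside 64-bit mode. In 64-bit mode, limit checks give way to the canonical-address check.
- It reports stack-segment limit violations as #SS, as the table does.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    # Hypothetical, simplified view of a loaded segment (expand-up only).
    null: bool
    limit: int             # highest valid offset, in bytes
    writable: bool
    is_stack: bool = False

def segment_fault(seg: Segment, offset: int, size: int, is_write: bool) -> Optional[str]:
    """Return "#SS", "#GP", or None for an access made outside 64-bit mode."""
    if seg.null and not seg.is_stack:
        return "#GP"       # null data segment used to reference memory
    if offset + size - 1 > seg.limit:
        return "#SS" if seg.is_stack else "#GP"  # segment limit exceeded
    if is_write and not seg.writable:
        return "#GP"       # write to a read-only data segment
    return None
```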

Class 4D — AVX / SSE Vector (VEX.vvvv != 1111b, VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S
S
A
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4D-X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

Class 4E — AVX / SSE Vector (VEX.W = 1, VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

S

S

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S
S
A
A
A
A
A
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4E-X — SSE / AVX / AVX2 Vector (VEX.W = 1, (VEX.L = 1 && !AVX2))
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
A
X
X
X
X
X
S

Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
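The VEX.W, VEX.L, and VEX.vvvv conditions in the class headings are properties of the encoded prefix, so they can be checked by inspecting the instruction bytes. The sketch below decodes the 2-byte (C5h) and 3-byte (C4h) VEX forms and applies the Class 4E-X encoding rule. It keeps VEX.vvvv in its stored (inverted) form, in which 1111b means that no register is specified; in the 2-byte form VEX.W is implied to be 0. The names are hypothetical, and the CPUID, control-register, and memory rows of the table are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VexFields:
    w: int     # VEX.W
    vvvv: int  # VEX.vvvv as stored (inverted); 1111b means "unused"
    l: int     # VEX.L: 0 = 128-bit, 1 = 256-bit
    pp: int    # implied SIMD prefix: 00b none, 01b 66h, 10b F3h, 11b F2h


def decode_vex(prefix: bytes) -> VexFields:
    """Extract W, vvvv, L, and pp from a 2-byte (C5h) or 3-byte (C4h) VEX prefix."""
    if prefix[0] == 0xC5:
        b = prefix[1]          # R, vvvv, L, pp; W is implied 0
        w = 0
    elif prefix[0] == 0xC4:
        b = prefix[2]          # W, vvvv, L, pp
        w = (b >> 7) & 1
    else:
        raise ValueError("not a VEX prefix")
    return VexFields(w=w, vvvv=(b >> 3) & 0xF, l=(b >> 2) & 1, pp=b & 0b11)


def class_4e_x_encoding_ud(vex: VexFields, avx2_supported: bool) -> bool:
    """Encoding-based #UD causes of Class 4E-X: VEX.W = 1, or VEX.L = 1 without AVX2."""
    return vex.w == 1 or (vex.l == 1 and not avx2_supported)
```

For example, the prefix bytes C4 E2 7D decode to W = 0, vvvv = 1111b, L = 1, and pp = 01b (implied 66h), so under this rule the encoding is rejected only on a processor without AVX2.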

Class 4F — AVX / SSE (VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S

S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S
S
A
A
A
A
X
X
A
A
A
S
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4F-X — SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S

S
S

S

S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

X
S
S
A
A
A
A
X
X
A
A
A
S

Alignment check, #AC
A
Page fault, #PF
X — AVX, AVX2, and SSE exception
A — AVX and AVX2 exception
S — SSE exception

A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.
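The #AC row for Class 4F-X scales the required alignment with the operand width: 16 bytes for a 128-bit operand and 32 bytes for a 256-bit operand. A minimal sketch of that test follows. The `alignment_checking` argument is an assumption standing in for "alignment checking enabled" (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), which the caller must compute; the function name is hypothetical.

```python
def ac_condition(address: int, operand_bits: int, alignment_checking: bool) -> bool:
    """Return True if the Class 4F-X #AC condition holds for this access."""
    required = {128: 16, 256: 32}[operand_bits]  # required alignment in bytes
    return alignment_checking and address % required != 0
```

An address such as 0x1010 is therefore acceptable for a 128-bit operand but misaligned for a 256-bit one.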

Class 4G — AVX Vector (VEX.W = 1, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
X
X
X

X
X
X
X

X
X
X
X

X
X
X
X

S

S

S

X

A
X

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX exception

X
X
X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled
and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4H — AVX, 256-bit only (VEX.L = 0; No SIMD Exceptions)
Exceptions
Exception
Invalid opcode, #UD

Mode
Real Virt Prot
A
A
A
A

A
A
A
A

A
A
A
A

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

A
A
A
A

A
A
A
A

A

A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 0.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4H-1 — AVX2, 256-bit only (VEX.L = 0, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
A
A
A
A

A
A
A
A

A
A
A
A

A
A
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

A

A
A
A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 0.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4J — AVX2 (VEX.W = 1)
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

A
A

A
A
A
A
A
A
A
A
A
A

Alignment check, #AC

A

Page fault, #PF
A — AVX2 exception

A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

Class 4K — AVX2
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

A
A

A
A
A
A
A
A
A
A
A

Alignment check, #AC

A

Page fault, #PF
A — AVX2 exception

A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

Class 5 — AVX / SSE Scalar
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S
S
S

X
S
S
A
A
A
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5A — AVX / SSE Scalar (VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

X
S
S
A
A
A
A
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5B — AVX / SSE Scalar (VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

X
S
S
A
A
A
A
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference with alignment checking enabled.

Class 5C — AVX / SSE Scalar (VEX.vvvv != 1111b, VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5C-X — SSE / AVX / AVX2 Scalar (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S

S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP

Page fault, #PF
S
Alignment check, #AC
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5C-1 — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
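The "Write to a read-only data segment" row is a segment-permission check on the descriptor referenced by the memory operand. A minimal sketch, assuming the caller passes the 4-bit Type field of a code/data (non-system) segment descriptor: bit 3 clear selects a data segment and bit 1 is the W (writable) bit. The function name is hypothetical, and the sketch covers only this one row; it ignores 64-bit mode, null selectors, and writes to code segments.

```python
def write_faults_on_data_segment(type_field: int) -> bool:
    """True if a store through a data segment with this Type field is a
    write to a read-only data segment (#GP in this table)."""
    is_data = (type_field & 0b1000) == 0   # Type bit 3 = 0: data segment
    writable = (type_field & 0b0010) != 0  # Type bit 1: W (writable)
    return is_data and not writable
```

Type values 0000b and 0001b (read-only, with or without the accessed bit) trigger the condition, while 0010b and 0011b (read/write) do not.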

Class 5D — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b (variant))
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

X
S
S
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5E — AVX / SSE Scalar (write to RO, VEX.vvvv != 1111b (variant), VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

S
S
S
S
S

S
S
S
S
S

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception

S
S

X
S
S
A
A
A
A
A
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6 — AVX Mixed Memory Argument
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.

A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6A — AVX Mixed Memory Argument (VEX.W = 1)
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.

A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6A-1 — AVX Mixed Memory Argument (write to RO memory, VEX.W = 1)
Exceptions
Exception

Mode
Real Virt Prot
A
A

A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
S
Page fault, #PF
A — AVX exception.

S

A
A
A
A
A
A
A
A
A
X
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Write to a read-only data segment.
Instruction execution caused a page fault.

Class 6B — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0)
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.

A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.L = 0.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.

Class 6B-1 — AVX Mixed Memory Argument (write to RO, VEX.W = 1, VEX.L = 0)
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.

A
A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.L = 0.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.

Class 6C — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.

A
A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
VEX.L = 0.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6C-X — AVX / AVX2 (W=1, vvvv!=1111b, L=0, (reg src op specified && !AVX2))
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX, AVX2 exception.

A
A
A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
VEX.L = 0.
Register-based source operand specified when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6D — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.

A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6D-X — AVX / AVX2 (W = 1, vvvv != 1111b, (ModRM.mod = 11b && !AVX2))
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX, AVX2 exception.

A
A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
ModRM.mod = 11b when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
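Class 6D-X adds a condition on the ModRM byte: its mod field (bits 7:6) equal to 11b selects a register operand instead of memory, and that register form is only valid when AVX2 is supported. A minimal sketch of the encoding-based #UD rows, assuming VEX.W and the stored (inverted) VEX.vvvv field have already been extracted from the prefix; the function names are hypothetical and the remaining rows of the table are not modeled.

```python
def modrm_mod(modrm: int) -> int:
    """ModRM.mod is bits 7:6; 11b selects a register operand, not memory."""
    return (modrm >> 6) & 0b11


def class_6d_x_encoding_ud(w: int, vvvv: int, modrm: int, avx2_supported: bool) -> bool:
    """Encoding-based #UD causes of Class 6D-X (vvvv as stored in the prefix)."""
    register_form = modrm_mod(modrm) == 0b11
    return w == 1 or vvvv != 0b1111 or (register_form and not avx2_supported)
```

With this rule a ModRM byte of C1h (mod = 11b) is rejected on a processor without AVX2, while 01h (mod = 00b, a memory form) is not.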

Class 6E — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b (variant))
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.

A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b (for versions with immediate byte operand only).
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6F — AVX2 (VEX.W = 1, VEX.vvvv != 1111b, VEX.L = 0, ModRM.mod = 11b)
Exceptions
Exception

Mode
Real Virt Prot
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.

A
A
A
A
A
A
A
A
A
A
A
A
A
A
A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
VEX.L = 0.
Register-based source operand specified (ModRM.mod = 11b).
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 7 — AVX / SSE No Memory Argument
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
A
S
S

X
A
S
S

X
S

X
S

X
S
S
A
A
A
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
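The legend distinguishes rows that apply only to legacy SSE encodings (S), only to VEX encodings (A), or to both (X). Read that way, the control-register rows of Class 7 can be sketched as follows. This is an interpretation of the legend, not a model of the processor. It returns every applicable exception and does not decide which one is delivered first when several apply. It also omits the CPUID, operating-mode, and prefix rows. The function name is hypothetical.

```python
def class_7_faults(vex_encoded: bool, cr0_em: bool, cr4_osfxsr: bool,
                   cr4_osxsave: bool, xcr0: int, cr0_ts: bool) -> set:
    """Return the subset of {"#UD", "#NM"} implied by the Class 7
    control-register rows for one instruction."""
    faults = set()
    if vex_encoded:
        # Rows marked "A": OS must enable XSAVE and SSE/YMM state (XCR0[2:1]).
        if not cr4_osxsave or (xcr0 & 0b110) != 0b110:
            faults.add("#UD")
    else:
        # Rows marked "S": legacy SSE encodings check CR0.EM and CR4.OSFXSR.
        if cr0_em or not cr4_osfxsr:
            faults.add("#UD")
    if cr0_ts:
        # Row marked "X": CR0.TS = 1 raises #NM for both encodings.
        faults.add("#NM")
    return faults
```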

Class 7A — AVX / SSE No Memory Argument (VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

X
Device not available, #NM
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S

Invalid opcode, #UD

X
S
S
A
A
A
A
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

Class 7A-X SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

Invalid opcode, #UD

X
X
Device not available, #NM
S
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

X
S
S
A
A
A
A
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

Class 7B — AVX / SSE No Memory Argument (VEX.vvvv != 1111b)
Exceptions
Exception

Mode
Real Virt Prot
X
A
S
S

X
A
S
S

X
Device not available, #NM
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
S

Invalid opcode, #UD

X
S
S
A
A
A
A
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

Class 7C — AVX / SSE No Memory Argument (VEX.vvvv != 1111b, VEX.L = 1)
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv field != 1111b.
                                        A     VEX.L field = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
X — AVX and SSE exception
A — AVX exception
S — SSE exception

Class 7C-X SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2))
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv field != 1111b.
                                        A     VEX.L field = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
X — SSE, AVX and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

Class 8 — AVX No Memory Argument (VEX.vvvv != 1111b, VEX.W = 1)
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.W = 1.
                                        A     VEX.vvvv != 1111b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
A — AVX exception.

Class 9 — AVX 4-byte Argument (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception


X
A

X
A

S
S

S
S

X
S
S
S
S
S

X
S
S
S
S
S
S

X
A
S
S
A
A
A
A
X
X
X
X
X
S
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
CR0.EM = 1.
CR4.OSFXSR = 0.
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 9A — AVX 4-byte argument (reserved MBZ = 1, VEX.vvvv != 1111b, VEX.L = 1)
Exceptions
Exception

Mode
Real Virt Prot

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception

X
A

X
A

S
S

S
S

X
S
S
S
S
S

X
S
S
S
S
S
S

X
A
S
S
A
A
A
A
X
X
X
X
S
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
CR0.EM = 1.
CR4.OSFXSR = 0.
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Attempt to load non-zero values into reserved MXCSR bits.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 10 — XOP Base
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

Class 10A — XOP Base (XOP.L = 1)
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

Class 10B — XOP Base (XOP.W = 1, XOP.L = 1)
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

Class 10C — XOP Base (XOP.W = 1, XOP.vvvv != 1111b, XOP.L = 1)
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.vvvv != 1111b.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

Class 10D — XOP Base (SIMD 110011, XOP.vvvv != 1111b, XOP.W = 1)
Exceptions
Exception

Mode
Real Virt Prot
X
X

X
X
X
X
X
X

Invalid opcode, #UD

X
Device not available, #NM
Stack, #SS

X
X
X
X
X
X

General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF

S

S

X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
XOP instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
XOP.W = 1.
XOP.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding XOP prefix.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.
See SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Underflow, UE
Precision, PE
X — XOP exception


X
X
X
X
X

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

Class 10E — XOP Base (XOP.vvvv != 1111b (variant), XOP.L = 1)
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.
X — XOP exception

Class 11 — F16C Instructions
Exceptions
Exception

Mode
Real Virt Prot
F
F

F
F

Invalid opcode, #UD

F
F
A
F
F
F

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
SIMD Floating-Point
Exception, #XF

F
F
F
F
F
F
F

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W field = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Unaligned memory reference when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid-operation exception
(IE)

F
F

A source operand was an SNaN value.
Undefined operation.

Denormalized-operand
exception (DE)
Overflow exception (OE)
Underflow exception (UE)
Precision exception (PE)
F — F16C exception.

F

A source operand was a denormal value.

F
F
F

Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

Class 12 — AVX2 VSID (ModRM.mod = 11b, ModRM.rm != 100b)
Exceptions
Exception

Mode
Real Virt Prot
A
A

A
A

A
A
A
A
A
A
A
A
A
A
A

Invalid opcode, #UD

Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception

A

A
A

A

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
ModRM.mod = 11b.
ModRM.rm != 100b.
YMM/XMM registers specified for destination, mask, and index not unique.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and:
256-bit memory operand not 32-byte aligned or
128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

Class FMA-2 — FMA / FMA4 Vector (SIMD Exceptions PE, UE, OE, DE, IE)
Exceptions
Exception

Mode
Real Virt Prot
F
F

Invalid opcode, #UD

F
F
F
F
F
F

Device not available, #NM
Stack, #SS

Page fault, #PF
Alignment check, #AC

F
F
F
F
F
F

SIMD floating-point, #XF

F

General protection, #GP

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
FMA instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
F — FMA, FMA4 exception


F
F
F
F
F
F

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

Class FMA-3 — FMA / FMA4 Scalar (SIMD Exceptions PE, UE, OE, DE, IE)
Exceptions
Exception

Mode
Real Virt Prot
F
F

Invalid opcode, #UD

F
F
F
F
F
F

Device not available, #NM
Stack, #SS

Page fault, #PF
Alignment check, #AC

F
F
F
F
F
F

SIMD floating-point, #XF

F

General protection, #GP

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
FMA instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0,
see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Non-aligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1,
see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
F — FMA, FMA4 exception

F
F
F
F
F
F

A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

XGETBV
Exceptions
                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X     X     Lock prefix (F0h) preceding opcode.
General protection, #GP     X     X     X     ECX specifies a reserved or unimplemented XCR address.
X — exception generated
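Because the tables above list CR4.OSXSAVE = 0 and XFEATURE_ENABLED_MASK[2:1] != 11b as #UD causes, software typically checks both conditions before executing VEX- or XOP-encoded instructions. The following is a minimal sketch, assuming a GCC- or Clang-compatible compiler on x86 (for `<cpuid.h>`, `__get_cpuid`, and inline assembly); the function names are hypothetical. It reads CPUID Fn0000_0001_ECX (bit 27 OSXSAVE, bit 28 AVX) and executes XGETBV with ECX = 0 (XCR0) only after OSXSAVE has been confirmed.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 (XFEATURE_ENABLED_MASK) bit 1 = XMM state, bit 2 = YMM state.
   The tables above raise #UD when XFEATURE_ENABLED_MASK[2:1] != 11b. */
static bool xcr0_enables_avx_state(uint64_t xcr0) {
    return (xcr0 & 0x6) == 0x6;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical helper: true only if CPUID reports AVX and OSXSAVE and the
   OS has enabled both XMM and YMM state in XCR0. */
static bool os_supports_avx(void) {
    unsigned int eax, ebx, ecx, edx;
    const unsigned int osxsave_bit = 1u << 27;   /* CPUID Fn0000_0001_ECX[OSXSAVE] */
    const unsigned int avx_bit = 1u << 28;       /* CPUID Fn0000_0001_ECX[AVX] */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    if ((ecx & osxsave_bit) == 0 || (ecx & avx_bit) == 0)
        return false;                 /* without OSXSAVE, XGETBV itself is unavailable */
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));  /* ECX = 0 selects XCR0 */
    return xcr0_enables_avx_state(((uint64_t)hi << 32) | lo);
}
#endif
```

Keeping the XCR0 test in a separate pure function makes the bit check easy to exercise on any host, while the CPUID/XGETBV path is compiled only for x86 targets.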

XRSTOR
Exceptions
Exception
Invalid opcode, #UD
Device not available, #NM
Stack, #SS

General protection, #GP

Page fault, #PF
X — exception generated

Mode
Real Virt Prot
X
X
X
X
X
X
X
X
X
X
X

X
X
X
X
X
X
X
X
X
X
X

X
X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
CR4.OSFXSR = 0.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not aligned on 64-byte boundary.
Any must be zero (MBZ) bits in the save area were set.
Attempt to set reserved bits in MXCSR.
Instruction execution caused a page fault.

XSAVE/XSAVEOPT
Exceptions
Exception
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
X — exception generated


Mode
Real Virt Prot
X
X
X
X
X
X
X
X
X
X

X
X
X
X
X
X
X
X
X
X

X
X
X
X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
CR4.OSFXSR = 0.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not aligned on 64-byte boundary.
Attempt to write read-only memory.
Instruction execution caused a page fault.
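Since XSAVE/XSAVEOPT (and XRSTOR) raise #GP when the memory operand is not aligned on a 64-byte boundary, the save area is usually allocated with explicit alignment. A minimal C11 sketch with a hypothetical helper name; the size is assumed to be supplied by the caller (for example from CPUID leaf 0Dh, sub-leaf 0, EBX), and the buffer is zeroed so that no must-be-zero header bits are set before a later XRSTOR.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zero-filled, 64-byte-aligned buffer for use as an XSAVE/XRSTOR
   memory operand. 'size' is assumed to come from elsewhere (e.g. CPUID). */
static void *alloc_xsave_area(size_t size) {
    size_t rounded = (size + 63) & ~(size_t)63;   /* aligned_alloc wants a multiple of 64 */
    void *p = aligned_alloc(64, rounded);
    if (p != NULL)
        memset(p, 0, rounded);                    /* zeroed header: no MBZ bits set */
    return p;
}
```

The caller releases the buffer with `free()`.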

XSETBV
Exceptions
Exception
Invalid opcode, #UD

General protection, #GP

Mode
Real Virt Prot
X
X
X
X
X
X
X

X
X
X
X
X
X
X

X
X
X
X
X
X
X

Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
CR4.OSFXSR = 0.
Lock prefix (F0h) preceding opcode.
CPL != 0.
ECX specifies a reserved or unimplemented XCR address.
Any must be zero (MBZ) bits in the save area were set.
Writing 0 to XCR0.

X — exception generated
Note:
In virtual mode, only #UD for Instruction not supported and #GP for CPL != 0 are supported.

Appendix A AES Instructions
This appendix gives background information concerning the use of the AES instruction subset in the
implementation of encryption compliant to the Advanced Encryption Standard (AES).

A.1 AES Overview

This section provides an overview of AMD64 instructions that support AES software implementation.
The U.S. National Institute of Standards and Technology has adopted the Rijndael algorithm, a block
cipher that processes 16-byte data blocks using a shared key of variable length, as the Advanced
Encryption Standard (AES). The standard is defined in Federal Information Processing Standards
Publication 197 (FIPS 197), Specification for the Advanced Encryption Standard (AES). There are
three versions of the algorithm, based on key widths of 16 (AES-128), 24 (AES-192), and 32 (AES-256) bytes.
The following AMD64 instructions support AES implementation:
• AESDEC/VAESDEC and AESDECLAST/VAESDECLAST
  Perform one round of AES decryption
• AESENC/VAESENC and AESENCLAST/VAESENCLAST
  Perform one round of AES encryption
• AESIMC/VAESIMC
  Perform the AES InvMixColumns transformation
• AESKEYGENASSIST/VAESKEYGENASSIST
  Assist AES round key generation
• PCLMULQDQ, VPCLMULQDQ
  Perform carry-less multiplication

See Chapter 2, “Instruction Reference” for detailed descriptions of the instructions.

A.2 Coding Conventions

This overview uses descriptive code that has the following basic characteristics:
• Syntax and notation based on the C language
• Four numerical data types:
  - bool: The numbers 0 and 1, the values of the Boolean constants false and true
  - nat: The infinite set of all natural numbers, including bool as a subtype
  - int: The infinite set of all integers, including nat as a subtype
  - rat: The infinite set of all rational numbers, including int as a subtype
• Standard logical and arithmetic operators
• Enumeration (enum) types, arrays, structures (struct), and union types
• Global and local variable and constant declarations, initializations, and assignments
• Standard control constructs (if, then, else, for, while, switch, break, and continue)
• Function subroutines
• Macro definitions (#define)

A.3 AES Data Structures

The AES instructions operate on 16-byte blocks of text called the state. Each block is represented as a
4 × 4 matrix of bytes which is assigned the Galois field matrix data type (GFMatrix). In the AMD64
implementation, the matrices are formatted as 16-byte vectors in XMM registers or 128-bit memory
locations. This overview represents each matrix as a sequence of 16 bytes in little-endian format (least
significant byte on the right and most significant byte on the left).
Figure A-1 shows a state block in 4 × 4 matrix representation.

GFMatrix =

X3,0 X2,0 X1,0 X0,0
X3,1 X2,1 X1,1 X0,1
X3,2 X2,2 X1,2 X0,2
X3,3 X2,3 X1,3 X0,3

Figure A-1. GFMatrix Representation of 16-byte Block
Figure A-2 shows the AMD64 AES format, with the corresponding mapping of FIPS 197 AES “words” to operand bytes.
XMM Register or 128-bit Memory Operand (byte fields listed from most to least significant)
AES Word 3, bits 127:96:  X3,3 (127:120)  X2,3 (119:112)  X1,3 (111:104)  X0,3 (103:96)
AES Word 2, bits 95:64:   X3,2 (95:88)    X2,2 (87:80)    X1,2 (79:72)    X0,2 (71:64)
AES Word 1, bits 63:32:   X3,1 (63:56)    X2,1 (55:48)    X1,1 (47:40)    X0,1 (39:32)
AES Word 0, bits 31:0:    X3,0 (31:24)    X2,0 (23:16)    X1,0 (15:8)     X0,0 (7:0)

Figure A-2. GFMatrix to Operand Byte Mappings
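Reading Figure A-2 byte by byte, operand byte k (bits 8k+7:8k) holds X(k mod 4),(k div 4), so AES word j occupies bytes 4j through 4j+3. A small C sketch with hypothetical helper names, assuming the operand is kept in a uint8_t[16] in memory order (on little-endian x86, byte k of a 128-bit memory operand sits at the operand address plus k):

```c
#include <stdint.h>

/* X(i,j): i selects the byte within AES word j (see Figure A-2). */
static uint8_t gfmatrix_get(const uint8_t op[16], int i, int j) {
    return op[4 * j + i];
}

static void gfmatrix_set(uint8_t op[16], int i, int j, uint8_t v) {
    op[4 * j + i] = v;
}
```

For example, X0,1 is operand byte 4 (bits 39:32) and X3,3 is operand byte 15 (bits 127:120).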

A.4 Algebraic Preliminaries

AES operations are based on the Galois field GF = $\mathrm{GF}(2^8)$, of order 256, constructed by adjoining a root of the irreducible polynomial
$$p(X) = X^8 + X^4 + X^3 + X + 1$$
to the field of two elements, $\mathbb{Z}_2$. Equivalently, GF is the quotient field $\mathbb{Z}_2[X]/p(X)$ and thus may be viewed as the set of all polynomials of degree less than 8 in $\mathbb{Z}_2[X]$ with the operations of addition and multiplication modulo p(X). These operations may be implemented efficiently by exploiting the mapping from $\mathbb{Z}_2[X]$ to the natural numbers given by
$$a_n X^n + \cdots + a_1 X + a_0 \;\to\; 2^n a_n + \cdots + 2 a_1 + a_0 \;\to\; a_n \ldots a_1 a_0\mathrm{b}$$
For example:
$1 \to$ 01h
$X \to$ 02h
$X^2 \to$ 04h
$X^4 + X^3 + 1 \to$ 19h
$p(X) \to$ 11Bh
Thus, each element of GF is identified with a unique byte. This overview uses the data type GF256 as an alias of nat, to identify variables that are to be thought of as elements of GF.
The operations of addition and multiplication in GF are denoted by ⊕ and ⊗, respectively. Since GF is of characteristic 2, addition is simply the “exclusive or” operation:
x ⊕ y = x ^ y
In particular, every element of GF is its own additive inverse.
Multiplication in GF may be computed as a sequence of additions and multiplications by 2. Note that multiplication by 2 may be viewed as multiplication by X in $\mathbb{Z}_2[X]$ followed by a possible reduction modulo p(X). Since 2 corresponds to the polynomial X and 11Bh corresponds to p(X), for any x ∈ GF,
$$2 \otimes x = \begin{cases} x \ll 1 & \text{if } x < 80\mathrm{h} \\ (x \ll 1) \oplus 11\mathrm{Bh} & \text{if } x \ge 80\mathrm{h} \end{cases}$$
Now, if $y = b_7 \ldots b_1 b_0\mathrm{b}$, then
$$x \otimes y = 2 \otimes (\cdots (2 \otimes (2 \otimes (b_7 \otimes x) \oplus b_6 \otimes x) \oplus b_5 \otimes x) \cdots) \oplus b_0 \otimes x.$$
This computation is performed by the GFMul( ) function.

A.4.1 Multiplication in the Field GF
The GFMul( ) function takes the GF256 elements x and y and returns their product x ⊗ y, which is also a GF256 element.
GF256 GFMul(GF256 x, GF256 y) {
    nat sum = 0;
    for (int i=7; i>=0; i--) {
        // Multiply sum by 2. This amounts to a shift followed
        // by reduction mod 0x11B:
        sum <<= 1;
        if (sum > 0xFF) {sum = sum ^ 0x11B;}
        // Add y[i]*x:
        if (y[i]) {sum = sum ^ x;}
    }
    return sum;
}

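GFMul( ) relies on the descriptive-code notation y[i] for bit i of y. A runnable C sketch of the same shift-and-reduce loop, assuming bytes are held in uint8_t (the names gf_xtime and gf_mul are illustrative, not part of this document's code):

```c
#include <stdint.h>

/* Multiply by 2 (that is, by X) and reduce modulo p(X) = 11Bh. */
static uint8_t gf_xtime(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
}

/* x (*) y, scanning y from bit 7 down to bit 0 as GFMul() does. */
static uint8_t gf_mul(uint8_t x, uint8_t y) {
    uint8_t sum = 0;
    for (int i = 7; i >= 0; i--) {
        sum = gf_xtime(sum);        /* sum = 2 (*) sum */
        if ((y >> i) & 1)
            sum ^= x;               /* add b_i (*) x */
    }
    return sum;
}
```

XORing 1Bh after the 8-bit shift is equivalent to XORing 11Bh into the 9-bit value, because the uint8_t truncation discards bit 8. With these definitions, 57h ⊗ 83h evaluates to C1h, the worked example in FIPS 197.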
Because the multiplicative group GF* is of order 255, the inverse of a nonzero element x of GF may be computed by repeated multiplication as $x^{-1} = x^{254}$. A more efficient computation, however, is performed by the GFInv( ) function as an application of Euclid’s greatest common divisor algorithm. See Section A.11, “Computation of GFInv with Euclidean Greatest Common Divisor” for an analysis of this computation and the GFInv( ) function.
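The simpler $x^{254}$ route can be sketched in C with square-and-multiply. This is the exponentiation method, not the Euclidean GFInv( ) of Section A.11, and the names are illustrative. Because the exponent is even, the sketch returns 0 for x = 0, which matches the AES convention that 0 maps to itself.

```c
#include <stdint.h>

/* GF(2^8) product modulo p(X) = 11Bh (same loop as GFMul). */
static uint8_t gf_mul(uint8_t x, uint8_t y) {
    uint8_t sum = 0;
    for (int i = 7; i >= 0; i--) {
        sum = (uint8_t)((sum << 1) ^ ((sum & 0x80) ? 0x1B : 0x00));
        if ((y >> i) & 1) sum ^= x;
    }
    return sum;
}

/* x^-1 = x^254 by square-and-multiply over the bits of 254. */
static uint8_t gf_inv(uint8_t x) {
    uint8_t result = 1, base = x;
    for (unsigned e = 254; e != 0; e >>= 1) {
        if (e & 1) result = gf_mul(result, base);
        base = gf_mul(base, base);
    }
    return result;
}
```

For example, the inverse of 53h is CAh, since 53h ⊗ CAh = 01h.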
The AES algorithms operate on the vector space $\mathrm{GF}^4$, of dimension 4 over GF, which is represented by the array type GFWord. FIPS 197 refers to an object of this type as a word. This overview uses the term GF word in order to avoid confusion with the AMD64 notion of a 16-bit word.
A GFMatrix is an array of four GF words, which are viewed as the rows of a 4 × 4 matrix over GF. The field operation symbols ⊕ and ⊗ are also used to denote addition and multiplication of matrices over GF. The GFMatrixMul( ) function computes the product A ⊗ B of 4 × 4 matrices.

A.4.2 Multiplication of 4x4 Matrices Over GF

GFMatrix GFMatrixMul(GFMatrix a, GFMatrix b) {
    GFMatrix c;
    for (nat i=0; i<4; i++) {
        for (nat j=0; j<4; j++) {
            c[i][j] = 0;
            for (nat k=0; k<4; k++) {
                c[i][j] = c[i][j] ^ GFMul(a[i][k], b[k][j]);
            }
        }
    }
    return c;
}
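A C rendering of GFMatrixMul( ), assuming uint8_t[4][4] stands in for the GFMatrix type (the function name is illustrative). A convenient check is that the MixColumns matrix of FIPS 197 multiplied by the InvMixColumns matrix gives the identity over GF.

```c
#include <stdint.h>

static uint8_t gf_mul(uint8_t x, uint8_t y) {
    uint8_t sum = 0;
    for (int i = 7; i >= 0; i--) {
        sum = (uint8_t)((sum << 1) ^ ((sum & 0x80) ? 0x1B : 0x00));
        if ((y >> i) & 1) sum ^= x;
    }
    return sum;
}

/* c = a (*) b over GF: products use gf_mul, sums use XOR. */
static void gf_matrix_mul(uint8_t a[4][4], uint8_t b[4][4], uint8_t c[4][4]) {
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t acc = 0;
            for (int k = 0; k < 4; k++)
                acc ^= gf_mul(a[i][k], b[k][j]);
            c[i][j] = acc;
        }
}
```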

A.5 AES Operations

The AES encryption and decryption procedures may be specified as follows, in terms of a set of basic
operations that are defined later in this section. See the alphabetic instruction reference for detailed
descriptions of the instructions that are used to implement the procedures.
The Encrypt and Decrypt procedures pass the same expanded key to the functions
TextBlock Cipher(TextBlock in, ExpandedKey w, nat Nk)
and
TextBlock InvCipher(TextBlock in, ExpandedKey w, nat Nk)
In both cases, the input text is converted by
GFMatrix Text2Matrix(TextBlock A)
to a matrix, which becomes the initial state of the process. This state is transformed through the sequence of Nr + 1 rounds and ultimately converted back to a linear array by
TextBlock Matrix2Text(GFMatrix M).
In each round i, the round key $K_i$ is extracted from the expanded key w and added to the state by
GFMatrix AddRoundKey(GFMatrix state, ExpandedKey w, nat round).
Note that AddRoundKey does not explicitly construct $K_i$, but operates directly on the bytes of w.
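The rounds of Cipher are numbered 0, …, Nr. Let X be the initial state of an execution, i.e., the input in matrix format, let $S_i$ be the state produced by round i, and let $Y = S_{N_r}$ be the final state. Let Σ, R, and C denote the operations performed by SubBytes, ShiftRows, and MixColumns, respectively. Then the initial round is a simple addition:
$$S_0 = X \oplus K_0$$
Each of the next Nr − 1 rounds is a composition of four operations:
$$S_i = C(R(\Sigma(S_{i-1}))) \oplus K_i \quad \text{for } 0 < i < N_r$$
The MixColumns transformation is omitted from the final round:
$$Y = S_{N_r} = R(\Sigma(S_{N_r-1})) \oplus K_{N_r}$$
Composing these expressions yields
$$Y = K_{N_r} \oplus R\Sigma\bigl(K_{N_r-1} \oplus CR\Sigma\bigl(K_{N_r-2} \oplus \cdots \oplus CR\Sigma(K_0 \oplus X)\cdots\bigr)\bigr)$$
Note that the rounds of InvCipher are numbered in reverse order, Nr, …, 0. If X' and Y' are the initial and final states and $S'_i$ is the state following round i, then
$$S'_{N_r} = X' \oplus K_{N_r}$$
$$S'_i = C^{-1}\bigl(\Sigma^{-1}(R^{-1}(S'_{i+1})) \oplus K_i\bigr) \quad \text{for } 0 < i < N_r$$
$$Y' = S'_0 = \Sigma^{-1}(R^{-1}(S'_1)) \oplus K_0$$
Composing these expressions yields
$$Y' = K_0 \oplus \Sigma^{-1}R^{-1}\Bigl(C^{-1}\bigl(K_1 \oplus \Sigma^{-1}R^{-1}(\cdots C^{-1}(K_{N_r-1} \oplus \Sigma^{-1}R^{-1}(X' \oplus K_{N_r}))\cdots)\bigr)\Bigr)$$
In order to show that InvCipher is the inverse of Cipher, it is only necessary to combine these expanded expressions by replacing X' with Y and cancelling inverse operations, which yields Y' = X.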

A.5.1 Sequence of Operations
• Use predefined SBox and InvSBox matrices or initialize the matrices using the ComputeSBox and ComputeInvSBox functions.
• Call the Encrypt or Decrypt procedure.
• For the Encrypt procedure:
  1. Load the input TextBlock and CipherKey.
  2. Expand the cipher key using the KeyExpansion function.
  3. Call the Cipher function to perform the number of rounds determined by the cipher key length.
  4. Perform round entry operations (round 0).
     a. Convert the input text block to a state matrix using the Text2Matrix function.
     b. Combine state and round key bytes by bitwise XOR using the AddRoundKey function.
  5. Perform round iteration operations (rounds 1 through Nr − 1).
     a. Replace each state byte with another by non-linear substitution using the SubBytes function.
     b. Shift each row of the state cyclically using the ShiftRows function.
     c. Combine the four bytes in each column of the state using the MixColumns function.
     d. Perform AddRoundKey.
  6. Perform round exit operations (round Nr).
     a. Perform SubBytes.
     b. Perform ShiftRows.
     c. Perform AddRoundKey.
     d. Convert the state matrix to an output text block using the Matrix2Text function and return TextBlock.
• For the Decrypt procedure:
  1. Load the input TextBlock and CipherKey.
  2. Expand the cipher key using the KeyExpansion function.
  3. Call the InvCipher function to perform the number of rounds determined by the cipher key length.
  4. Perform round entry operations (round Nr).
     a. Convert the input text block to a state matrix using the Text2Matrix function.
     b. Combine state and round key bytes by bitwise XOR using the AddRoundKey function.
  5. Perform round iteration operations (rounds Nr − 1 down to 1).
     a. Shift each row of the state cyclically using the InvShiftRows function.
     b. Replace each state byte with another by non-linear substitution using the InvSubBytes function.
     c. Perform AddRoundKey.
     d. Combine the four bytes in each column of the state using the InvMixColumns function.
  6. Perform round exit operations (round 0).
     a. Perform InvShiftRows.
     b. Perform InvSubBytes (InvSubWord).
     c. Perform AddRoundKey.
     d. Convert the state matrix to an output text block using the Matrix2Text function and return TextBlock.
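The Encrypt steps above can be traced in a compact C sketch of AES-128 (Nk = 4, Nr = 10). It is an illustrative software model under stated assumptions, not an AESENC-based implementation: the state is a uint8_t[16] in FIPS 197 byte order (byte r + 4c is row r of column c), the SBox is computed on first use from σ(x) = ϕ(x⁻¹) ⊕ 63h, the key schedule follows FIPS 197 KeyExpansion for 128-bit keys, and all function names are hypothetical. Decryption would apply the inverse operations in the order listed for the Decrypt procedure.

```c
#include <stdint.h>
#include <string.h>

/* Multiply by 2 (X) in GF(2^8), reducing modulo p(X) = 11Bh. */
static uint8_t xtime(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
}

static uint8_t gf_mul(uint8_t x, uint8_t y) {
    uint8_t sum = 0;
    for (int i = 7; i >= 0; i--) {
        sum = xtime(sum);
        if ((y >> i) & 1) sum ^= x;
    }
    return sum;
}

static uint8_t SBOX[256];

/* sigma(x) = phi(x^-1) xor 63h; XORing four left rotations of b into b applies phi. */
static void init_sbox(void) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;                              /* 0 maps to 0 */
        for (int y = 1; x != 0 && y < 256; y++)
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
        uint8_t b = inv, s = inv;
        for (int k = 0; k < 4; k++) {
            b = (uint8_t)((b << 1) | (b >> 7));       /* rotate left by one bit */
            s ^= b;
        }
        SBOX[x] = (uint8_t)(s ^ 0x63);
    }
}

/* FIPS 197 KeyExpansion for Nk = 4: 44 words = 176 bytes of round keys. */
static void key_expansion(const uint8_t key[16], uint8_t rk[176]) {
    uint8_t rcon = 0x01;
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t t[4];
        memcpy(t, rk + i - 4, 4);
        if (i % 16 == 0) {                            /* RotWord, SubWord, xor Rcon */
            uint8_t t0 = t[0];
            t[0] = (uint8_t)(SBOX[t[1]] ^ rcon);
            t[1] = SBOX[t[2]];
            t[2] = SBOX[t[3]];
            t[3] = SBOX[t0];
            rcon = xtime(rcon);
        }
        for (int j = 0; j < 4; j++) rk[i + j] = (uint8_t)(rk[i - 16 + j] ^ t[j]);
    }
}

static void add_round_key(uint8_t s[16], const uint8_t *k) {
    for (int i = 0; i < 16; i++) s[i] ^= k[i];
}

static void sub_bytes(uint8_t s[16]) {
    for (int i = 0; i < 16; i++) s[i] = SBOX[s[i]];
}

static void shift_rows(uint8_t s[16]) {               /* s[r + 4c] is row r, column c */
    uint8_t t[16];
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];   /* row r rotates left by r */
    memcpy(s, t, 16);
}

static void mix_columns(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *a = s + 4 * c;
        uint8_t b0 = a[0], b1 = a[1], b2 = a[2], b3 = a[3];
        a[0] = (uint8_t)(gf_mul(b0, 2) ^ gf_mul(b1, 3) ^ b2 ^ b3);
        a[1] = (uint8_t)(b0 ^ gf_mul(b1, 2) ^ gf_mul(b2, 3) ^ b3);
        a[2] = (uint8_t)(b0 ^ b1 ^ gf_mul(b2, 2) ^ gf_mul(b3, 3));
        a[3] = (uint8_t)(gf_mul(b0, 3) ^ b1 ^ b2 ^ gf_mul(b3, 2));
    }
}

static void aes128_encrypt(const uint8_t in[16], const uint8_t key[16], uint8_t out[16]) {
    static int sbox_ready = 0;
    uint8_t rk[176], s[16];
    if (!sbox_ready) { init_sbox(); sbox_ready = 1; }
    key_expansion(key, rk);
    memcpy(s, in, 16);                                /* stands in for Text2Matrix */
    add_round_key(s, rk);                             /* round 0: S0 = X xor K0 */
    for (int rnd = 1; rnd < 10; rnd++) {              /* rounds 1 .. Nr - 1 */
        sub_bytes(s);
        shift_rows(s);
        mix_columns(s);
        add_round_key(s, rk + 16 * rnd);
    }
    sub_bytes(s);                                     /* round Nr: no MixColumns */
    shift_rows(s);
    add_round_key(s, rk + 160);
    memcpy(out, s, 16);                               /* stands in for Matrix2Text */
}
```

With the FIPS 197 Appendix C.1 key 000102030405060708090a0b0c0d0e0f and plaintext 00112233445566778899aabbccddeeff, the sketch is expected to produce 69c4e0d86a7b0430d8cdb78070b4c55a.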

A.6 Initializing the SBox and InvSBox Matrices

The AES makes use of a bijective mapping σ : GF → GF, which is encoded, along with its inverse mapping, in the 16 × 16 arrays SBox (for encryption) and InvSBox (for decryption), as follows: for all x ∈ GF,
σ(x) = SBox[x[7:4]][x[3:0]]
and
$\sigma^{-1}(x)$ = InvSBox[x[7:4]][x[3:0]]
While the FIPS 197 standard defines the contents of the SBox[ ] and InvSBox[ ] matrices, the matrices may also be initialized algebraically (and algorithmically) by means of the ComputeSBox( ) and ComputeInvSBox( ) functions, discussed below.
The bijective mappings for encryption and decryption are computed by the SubByte( ) and InvSubByte( ) functions, respectively:
SubByte( ) computation:
GF256 SubByte(GF256 x) {
    return SBox[x[7:4]][x[3:0]];
}

InvSubByte( ) computation:
GF256 InvSubByte(GF256 x) {
    return InvSBox[x[7:4]][x[3:0]];
}

A.6.1 Computation of SBox and InvSBox
Computation of SBox and InvSBox elements has a direct relationship to the cryptographic properties
of the AES, but not to the algorithms that use the tables. Readers who prefer to view σ as a primitive
operation may skip the remainder of this section.
The algorithmic definition of the bijective mapping σ is based on the consideration of GF as an
8-dimensional vector space over the subfield 2. Let ϕ be a linear operator on this vector space and let
M = [aij] be the matrix representation of ϕ with respect to the ordered basis {1, 2, 4, 10, 20, 40, 80}.
Then ϕ may be encoded concisely as an array of bytes A of dimension 8, each entry of which is the
concatenation of the corresponding row of M:
A[i] = ai8 ai7…ai0
This expression may be represented algorithmically by means of the ApplyLinearOp( ) function,
which applies a linear operator to an element of GF. The ApplyLinear Op( ) function is used in the
initialization of both the sBox[] and InvSBox[ ] matrices.
// The following function takes the array A representing a linear operator phi and
// an element x of GF and returns phi(x):
GF256 ApplyLinearOp(GF256 A[8], GF256 x) {
GF256 result = 0;
for (nat i=0; i<8; i++) {
bool sum = 0;
for (nat j=0; j<8; j++) {
sum = sum ^ (A[i][j] & x[j]);
}
result[i] = sum;
}
return result;
}
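A C sketch of ApplyLinearOp( ) may help. It assumes bytes are held in `uint8_t` and that, as described above, bit j of A[i] holds aij; bit i of the result is then the GF(2) dot product of row i of M with x, which is the parity of `A[i] & x`. The names `apply_linear_op` and `PHI` are illustrative, not taken from the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative C version of ApplyLinearOp( ): bit i of the result is the
 * XOR (GF(2) sum) of the bits of x selected by row A[i]. */
uint8_t apply_linear_op(const uint8_t A[8], uint8_t x)
{
    uint8_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t row = (uint8_t)(A[i] & x);
        unsigned sum = 0;
        for (int j = 0; j < 8; j++) {
            sum ^= (row >> j) & 1u;   /* accumulate a_ij * x_j over GF(2) */
        }
        result |= (uint8_t)(sum << i);
    }
    return result;
}

/* Encoding of the operator phi used for SBox[ ] (the array A in the text). */
const uint8_t PHI[8] = {0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8};
```

For example, `apply_linear_op(PHI, 0x05)` evaluates to 0x63, the constant that appears in the definition of σ.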

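The definition of σ involves the linear operator ϕ with matrix (columns indexed j = 0, …, 7 from left to right)

$$
M = \begin{bmatrix}
1&0&0&0&1&1&1&1\\
1&1&0&0&0&1&1&1\\
1&1&1&0&0&0&1&1\\
1&1&1&1&0&0&0&1\\
1&1&1&1&1&0&0&0\\
0&1&1&1&1&1&0&0\\
0&0&1&1&1&1&1&0\\
0&0&0&1&1&1&1&1
\end{bmatrix}
$$

In this case, in hexadecimal,
A = {F1, E3, C7, 8F, 1F, 3E, 7C, F8}.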
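Because row i of M has ones exactly in columns i, i+4, i+5, i+6, and i+7 (mod 8), ϕ can equivalently be written as x ⊕ rotl(x,1) ⊕ rotl(x,2) ⊕ rotl(x,3) ⊕ rotl(x,4), the form in which the AES affine map is often implemented. The following C sketch, with hypothetical helper names, checks that this rotation form agrees with row-by-row evaluation of A on all 256 inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a byte left by n bits (1 <= n <= 7). */
uint8_t rotl8(uint8_t b, int n)
{
    return (uint8_t)((b << n) | (b >> (8 - n)));
}

/* phi evaluated row by row from A = {F1, E3, C7, 8F, 1F, 3E, 7C, F8}. */
uint8_t phi_matrix(uint8_t x)
{
    static const uint8_t A[8] = {0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8};
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t v = (uint8_t)(A[i] & x);
        v ^= (uint8_t)(v >> 4);
        v ^= (uint8_t)(v >> 2);
        v ^= (uint8_t)(v >> 1);          /* bit 0 now holds the parity of A[i] & x */
        r |= (uint8_t)((v & 1u) << i);
    }
    return r;
}

/* The same operator written with byte rotations. */
uint8_t phi_rotate(uint8_t x)
{
    return (uint8_t)(x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4));
}

/* Returns 1 if the two forms agree on every byte, 0 otherwise. */
int phi_forms_agree(void)
{
    for (int x = 0; x < 256; x++) {
        if (phi_matrix((uint8_t)x) != phi_rotate((uint8_t)x))
            return 0;
    }
    return 1;
}
```

The rotation form avoids the inner parity computation, which is why software implementations frequently prefer it.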
Initialization of SBox[ ]
The mapping σ : GF → GF is defined by

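$$\sigma(x) = \phi(x^{-1}) \oplus 63$$
where x⁻¹ denotes the multiplicative inverse of x in GF (with 0⁻¹ taken to be 0, as returned by GFInv( )) and 63 is a hexadecimal constant.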
This computation is performed by ComputeSBox( ).
ComputeSBox( )
GF256[16][16] ComputeSBox() {
GF256 result[16][16];
GF256 A[8] = {0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8};
for (nat i=0; i<16; i++) {
for (nat j=0; j<16; j++) {
GF256 x = (i << 4) | j;
result[i][j] = ApplyLinearOp(A, GFInv(x)) ^ 0x63;
}
}
return result;
}
const GF256 SBox[16][16] = ComputeSBox();
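The following C sketch mirrors ComputeSBox( ). It is an illustration under stated assumptions rather than the manual's code: `gf_mul` multiplies modulo p(X) = 0x11B, and `gf_inv` finds inverses by exhaustive search instead of the Euclidean GFInv( ) described in Section A.11, mapping 0 to 0 in the same way. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplication in GF(2^8) modulo p(X) = 0x11B. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1)
            p ^= a;                 /* add (XOR) the current multiple of a */
        /* multiply a by X, reducing by p(X) when bit 8 would be set */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Exhaustive-search inverse, a stand-in for GFInv( ); 0 maps to 0. */
uint8_t gf_inv(uint8_t x)
{
    if (x == 0)
        return 0;
    for (int y = 1; y < 256; y++) {
        if (gf_mul(x, (uint8_t)y) == 1)
            return (uint8_t)y;
    }
    return 0; /* not reached: every nonzero element is invertible */
}

/* Row-parity evaluation of a linear operator, as in ApplyLinearOp( ). */
uint8_t apply_linear_op(const uint8_t A[8], uint8_t x)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t v = (uint8_t)(A[i] & x);
        v ^= (uint8_t)(v >> 4);
        v ^= (uint8_t)(v >> 2);
        v ^= (uint8_t)(v >> 1);
        r |= (uint8_t)((v & 1u) << i);
    }
    return r;
}

/* C analogue of ComputeSBox( ): entry [i][j] is phi(inverse((i << 4) | j)) ^ 0x63. */
void compute_sbox(uint8_t result[16][16])
{
    static const uint8_t A[8] = {0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8};
    for (int i = 0; i < 16; i++) {
        for (int j = 0; j < 16; j++) {
            uint8_t x = (uint8_t)((i << 4) | j);
            result[i][j] = (uint8_t)(apply_linear_op(A, gf_inv(x)) ^ 0x63);
        }
    }
}
```

After `compute_sbox(s)`, entries such as `s[0x5][0x3]` (0xED) and `s[0xf][0xf]` (0x16) can be compared against Table A-1.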

Table A-1 shows the resulting SBox[ ], as defined in FIPS 197.

Table A-1. SBox Definition
Rows are indexed by S[7:4] and columns by S[3:0]; all values are hexadecimal.

       0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
   0  63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
   1  ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
   2  b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
   3  04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
   4  09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
   5  53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
   6  d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
   7  51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
   8  cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
   9  60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
   a  e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
   b  e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
   c  ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
   d  70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
   e  e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
   f  8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

A.6.2 Initialization of InvSBox[ ]
A straightforward calculation confirms that the matrix M is nonsingular, with inverse

$$
M^{-1} = \begin{bmatrix}
0&0&1&0&0&1&0&1\\
1&0&0&1&0&0&1&0\\
0&1&0&0&1&0&0&1\\
1&0&1&0&0&1&0&0\\
0&1&0&1&0&0&1&0\\
0&0&1&0&1&0&0&1\\
1&0&0&1&0&1&0&0\\
0&1&0&0&1&0&1&0
\end{bmatrix}
$$

Thus, ϕ is invertible, and ϕ⁻¹ is encoded as the array
B = {A4, 49, 92, 25, 4A, 94, 29, 52}.
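The claim that B encodes ϕ⁻¹ can be checked mechanically. This hypothetical C sketch applies the operator encoded by A and then the one encoded by B (and the reverse) to every byte, using the same row-parity evaluation as ApplyLinearOp( ).

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t A_PHI[8]     = {0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8};
static const uint8_t B_PHI_INV[8] = {0xA4, 0x49, 0x92, 0x25, 0x4A, 0x94, 0x29, 0x52};

/* Row-parity evaluation of a linear operator, as in ApplyLinearOp( ). */
uint8_t apply_linear_op(const uint8_t M[8], uint8_t x)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t v = (uint8_t)(M[i] & x);
        v ^= (uint8_t)(v >> 4);
        v ^= (uint8_t)(v >> 2);
        v ^= (uint8_t)(v >> 1);          /* parity of M[i] & x in bit 0 */
        r |= (uint8_t)((v & 1u) << i);
    }
    return r;
}

/* Returns 1 if B undoes A and A undoes B on every byte. */
int b_inverts_a(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t v = (uint8_t)x;
        if (apply_linear_op(B_PHI_INV, apply_linear_op(A_PHI, v)) != v)
            return 0;
        if (apply_linear_op(A_PHI, apply_linear_op(B_PHI_INV, v)) != v)
            return 0;
    }
    return 1;
}
```

In particular, applying B to 63 yields 05, which is the constant that appears in the formula for σ⁻¹.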
If y = σ(x), then

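$$
\begin{aligned}
\left(\phi^{-1}(y) \oplus 05\right)^{-1} &= \left(\phi^{-1}(y \oplus \phi(05))\right)^{-1} \\
&= \left(\phi^{-1}(y \oplus 63)\right)^{-1} \\
&= \left(\phi^{-1}(\phi(x^{-1}) \oplus 63 \oplus 63)\right)^{-1} \\
&= \left(\phi^{-1}(\phi(x^{-1}))\right)^{-1} \\
&= x,
\end{aligned}
$$
where the first step uses the linearity of ϕ⁻¹ and the second uses ϕ(05) = 63. Hence σ is a permutation of GF with
$$\sigma^{-1}(y) = \left(\phi^{-1}(y) \oplus 05\right)^{-1}$$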
This computation is performed by ComputeInvSBox( ).
ComputeInvSBox( )
GF256[16][16] ComputeInvSBox() {
GF256 result[16][16];
GF256 B[8] = {0xA4, 0x49, 0x92, 0x25, 0x4A, 0x94, 0x29, 0x52};
for (nat i=0; i<16; i++) {
for (nat j=0; j<16; j++) {
GF256 y = (i << 4) | j;
result[i][j] = GFInv(ApplyLinearOp(B, y) ^ 0x5);
}
}
return result;
}
const GF256 InvSBox[16][16] = ComputeInvSBox();
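A corresponding C sketch of ComputeInvSBox( ) computes each entry as (ϕ⁻¹(y) ⊕ 05)⁻¹ and confirms that the resulting table undoes σ on all 256 bytes. It assumes a brute-force field inversion in place of GFInv( ), and every function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplication in GF(2^8) modulo p(X) = 0x11B. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Brute-force inverse (stand-in for GFInv( )); 0 maps to 0. */
uint8_t gf_inv(uint8_t x)
{
    if (x == 0)
        return 0;
    for (int y = 1; y < 256; y++)
        if (gf_mul(x, (uint8_t)y) == 1)
            return (uint8_t)y;
    return 0; /* not reached */
}

/* Row-parity evaluation of a linear operator, as in ApplyLinearOp( ). */
uint8_t apply_linear_op(const uint8_t A[8], uint8_t x)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t v = (uint8_t)(A[i] & x);
        v ^= (uint8_t)(v >> 4);
        v ^= (uint8_t)(v >> 2);
        v ^= (uint8_t)(v >> 1);
        r |= (uint8_t)((v & 1u) << i);
    }
    return r;
}

static const uint8_t PHI[8]     = {0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8};
static const uint8_t PHI_INV[8] = {0xA4, 0x49, 0x92, 0x25, 0x4A, 0x94, 0x29, 0x52};

/* sigma(x) = phi(x^-1) ^ 63 */
uint8_t sigma(uint8_t x)
{
    return (uint8_t)(apply_linear_op(PHI, gf_inv(x)) ^ 0x63);
}

/* C analogue of ComputeInvSBox( ): entry y holds (phi^-1(y) ^ 05)^-1. */
void compute_inv_sbox(uint8_t result[16][16])
{
    for (int i = 0; i < 16; i++)
        for (int j = 0; j < 16; j++) {
            uint8_t y = (uint8_t)((i << 4) | j);
            result[i][j] = gf_inv((uint8_t)(apply_linear_op(PHI_INV, y) ^ 0x05));
        }
}

/* Returns 1 if inv[sigma(x)] == x for every byte x. */
int inv_sbox_inverts_sigma(uint8_t inv[16][16])
{
    for (int x = 0; x < 256; x++) {
        uint8_t s = sigma((uint8_t)x);
        if (inv[s >> 4][s & 0x0F] != x)
            return 0;
    }
    return 1;
}
```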

Table A-2 shows the resulting InvSBox[ ], as defined in FIPS 197.

Table A-2. InvSBox Definition
Rows are indexed by S[7:4] and columns by S[3:0]; all values are hexadecimal.

       0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
   0  52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb
   1  7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb
   2  54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e
   3  08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25
   4  72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92
   5  6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84
   6  90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06
   7  d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b
   8  3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73
   9  96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e
   a  47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b
   b  fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4
   c  1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f
   d  60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef
   e  a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61
   f  17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d

A.7 Encryption and Decryption

The AMD64 architecture implements the AES algorithm by means of an iterative function, called a
round, for both encryption and its inverse operation, decryption.
The top-level encryption and decryption procedures Encrypt( ) and Decrypt( ) set up the rounds and
invoke the functions that perform them. Each procedure takes two binary arguments, together with
the key-length parameter Nk:
• input data: a 16-byte block of text stored in a source 128-bit XMM register
• cipher key: a 16-, 24-, or 32-byte cipher key stored in either a second 128-bit XMM register or a
128-bit memory location

A.7.1 The Encrypt( ) and Decrypt( ) Procedures
TextBlock Encrypt(TextBlock in, CipherKey key, nat Nk) {
return Cipher(in, ExpandKey(key, Nk), Nk);
}
TextBlock Decrypt(TextBlock in, CipherKey key, nat Nk) {
return InvCipher(in, ExpandKey(key, Nk), Nk);

}

The array types TextBlock and CipherKey are introduced to accommodate the text and key
parameters. The 16-, 24-, or 32-byte cipher keys correspond to AES-128, AES-192, or AES-256 key
sizes. The cipher key is logically partitioned into Nk = 4, 6, or 8 AES 32-bit words. Nk is passed as a
parameter to determine the AES version to be executed, and the number of rounds to be performed.
Both the Encrypt( ) and Decrypt( ) procedures invoke the ExpandKey( ) function to expand the
cipher key for use in round key generation. When key expansion is complete, the Cipher( ) or
InvCipher( ) function, respectively, is invoked.
The Cipher( ) and InvCipher( ) functions are the key components of the encryption and decryption
process. See Section A.8, “The Cipher Function” and Section A.9, “The InvCipher Function” for
detailed information.

A.7.2 Round Sequences and Key Expansion
Encryption and decryption are performed in a sequence of rounds indexed by 0, …, Nr, where Nr is
determined by the number Nk of GF words in the cipher key (Nr = Nk + 6). A key matrix called a
round key is generated for each round. The number of GF words required to form the Nr + 1 round
keys is 4(Nr + 1). Table A-3 shows the relationship between cipher key length, round sequence
length, and round key length.
Table A-3. Cipher Key, Round Sequence, and Round Key Length

  Nk    Nr    4(Nr + 1)
   4    10       44
   6    12       52
   8    14       60
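The entries of Table A-3 follow from Nr = Nk + 6 (as in ExpandKey( )) and from each round key occupying four words. A trivial C sketch, with hypothetical helper names:

```c
#include <assert.h>

/* Number of rounds for a cipher key of nk 32-bit words (Nr = Nk + 6). */
unsigned aes_rounds(unsigned nk) { return nk + 6; }

/* Expanded-key words needed for Nr + 1 round keys of 4 words each. */
unsigned aes_expanded_key_words(unsigned nk) { return 4 * (aes_rounds(nk) + 1); }

/* Cipher key size in bytes: Nk words of 4 bytes each. */
unsigned aes_key_bytes(unsigned nk) { return 4 * nk; }
```

With Nk = 8, `aes_expanded_key_words` gives 60, the capacity the text assigns to the ExpandedKey type.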

Expanded keys are generated from the cipher key by the ExpandKey( ) function, where the array type
ExpandedKey is defined to accommodate 60 words (the maximum required) corresponding to Nk = 8.
The ExpandKey( ) Function
ExpandedKey ExpandKey(CipherKey key, nat Nk) {
assert((Nk == 4) || (Nk == 6) || (Nk == 8));
nat Nr = Nk + 6;
ExpandedKey w;
// Copy key into first Nk rows of w:
for (nat i=0; i0; round--) {
state = InvShiftRows(state);
state = InvSubBytes(state);

state = AddRoundKey(state, w, round);
state = InvMixColumns(state);
}
state = InvShiftRows(state);
state = InvSubBytes(state);
state = AddRoundKey(state, w, 0);
return Matrix2Text(state);
}

A.9.1 Text to Matrix Conversion
Prior to processing, the input text block must be converted to matrix form. The Text2Matrix( )
function stores a TextBlock in a GFMatrix in column-major order as follows.
GFMatrix Text2Matrix(TextBlock A) {
GFMatrix result;
for (nat j=0; j<4; j++) {
for (nat i=0; i<4; i++) {
result[i][j] = A[4*j+i];
}
}
return result;
}

A.9.2 InvCipher Transformations
The following functions are used in decryption:
InvShiftRows( ) — The inverse of ShiftRows( ).
InvSubBytes( ) — The inverse of SubBytes( ).
InvSubWord( ) — The inverse of SubWord( ).
InvMixColumns( ) — The inverse of MixColumns( ).
AddRoundKey( ) — Is its own inverse.
Decryption is the inverse of encryption and is accomplished by means of the inverses of the
SubBytes( ), SubWord( ), ShiftRows( ), and MixColumns( ) transformations used in encryption.
SubWord( ), SubBytes( ), and ShiftRows( ) are injective, as is MixColumns( ). Since each of these
maps a finite set to itself, each is a bijection and therefore has an inverse.
A simple computation shows that C is invertible, with

$$
C^{-1} = \begin{bmatrix}
E & B & D & 9\\
9 & E & B & D\\
D & 9 & E & B\\
B & D & 9 & E
\end{bmatrix}
$$

InvShiftRows( ) Function
The inverse of ShiftRows( ).
GFMatrix InvShiftRows(GFMatrix M) {
GFMatrix result;
for (nat i=0; i<4; i++) {
result[i] = RotateLeft(M[i], -i);
}
return result;
}
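A C sketch of ShiftRows( ) and InvShiftRows( ) may clarify the RotateLeft(M[i], -i) convention: row i is rotated left by i positions for encryption and right by i positions for decryption. The `State` type (laid out as `state[row][column]`, like GFMatrix) and the function names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* state[row][column], laid out like GFMatrix (assumption of this sketch). */
typedef uint8_t State[4][4];

/* ShiftRows: row i is rotated left by i positions. */
void shift_rows(State s)
{
    State t;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            t[i][j] = s[i][(j + i) % 4];
    memcpy(s, t, sizeof t);
}

/* InvShiftRows: RotateLeft(M[i], -i), i.e. row i rotated right by i. */
void inv_shift_rows(State s)
{
    State t;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            t[i][j] = s[i][(j - i + 4) % 4];
    memcpy(s, t, sizeof t);
}
```

Filling the state with 4i + j and applying `shift_rows` turns row 1 from 4 5 6 7 into 5 6 7 4; `inv_shift_rows` restores it.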

InvSubBytes( ) Function
The inverse of SubBytes( ).
GFMatrix InvSubBytes(GFMatrix M) {
GFMatrix result;
for (nat i=0; i<4; i++) {
result[i] = InvSubWord(M[i]);
}
return result;
}

InvSubWord( ) Function
The inverse of SubWord( ): applies InvSubByte( ) to each element of a GFWord.
GFWord InvSubWord(GFWord x) {
GFWord result;
for (nat i=0; i<4; i++) {
result[i] = InvSubByte(x[i]);
}
return result;
}

InvMixColumns( ) Function
The inverse of the MixColumns( ) function. Multiplies the state by C⁻¹, the inverse of the
predefined fixed matrix C, as discussed previously.
GFMatrix InvMixColumns(GFMatrix M) {
GFMatrix D = {
{0x0e,0x0b,0x0d,0x09},
{0x09,0x0e,0x0b,0x0d},
{0x0d,0x09,0x0e,0x0b},
{0x0b,0x0d,0x09,0x0e}
};
return GFMatrixMul(D, M);
}
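The following C sketch applies C and C⁻¹ to a single column. It assumes, as in FIPS 197, that C is the circulant matrix with first row {02, 03, 01, 01}; the helper names are illustrative. The column {DB, 13, 53, 45} is a commonly used test column whose MixColumns image is {8E, 4D, A1, BC}.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplication in GF(2^8) modulo p(X) = 0x11B. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* out = M * in, where M is circulant with row 0 equal to r and
 * element (i, k) equal to r[(k - i) mod 4]. */
void mul_circulant(const uint8_t r[4], const uint8_t in[4], uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        uint8_t acc = 0;
        for (int k = 0; k < 4; k++)
            acc ^= gf_mul(r[(k - i + 4) % 4], in[k]);
        out[i] = acc;
    }
}

/* MixColumns on one column, assuming C has first row {02, 03, 01, 01}. */
void mix_column(const uint8_t in[4], uint8_t out[4])
{
    static const uint8_t C_ROW0[4] = {0x02, 0x03, 0x01, 0x01};
    mul_circulant(C_ROW0, in, out);
}

/* InvMixColumns on one column: first row {0E, 0B, 0D, 09}, matching D above. */
void inv_mix_column(const uint8_t in[4], uint8_t out[4])
{
    static const uint8_t D_ROW0[4] = {0x0E, 0x0B, 0x0D, 0x09};
    mul_circulant(D_ROW0, in, out);
}
```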

AddRoundKey( ) Function
Extracts the round key from the expanded key and adds it to the state using a bitwise XOR operation.
GFMatrix AddRoundKey(GFMatrix state, ExpandedKey w, nat round) {
GFMatrix result = state;
for (nat i=0; i<4; i++) {
for (nat j=0; j<4; j++) {
result[i][j] = result[i][j] ^ w[4*round+j][i];
}
}
return result;

}
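Because the round key is combined with the state by XOR, applying AddRoundKey( ) twice with the same round restores the state, which is why it is its own inverse. A C sketch of the indexing `w[4*round+j][i]` (expanded-key word 4·round + j supplies column j of the round key), with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* state[i][j] ^= w[4*round + j][i]: word 4*round + j of the expanded key
 * supplies column j of the round key (names are illustrative). */
void add_round_key(uint8_t state[4][4], uint8_t w[][4], unsigned round)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            state[i][j] ^= w[4 * round + j][i];
}
```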

A.9.3 Matrix to Text Conversion
After processing, the output matrix must be converted to a text block. The Matrix2Text( ) function
converts a GFMatrix in column-major order to a TextBlock as follows.
TextBlock Matrix2Text(GFMatrix M) {
TextBlock result;
for (nat j=0; j<4; j++) {
for (nat i=0; i<4; i++) {
result[4*j+i] = M[i][j];
}
}
return result;
}
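In C, the column-major mapping used by Text2Matrix( ) and Matrix2Text( ) might look as follows (names illustrative): byte 4j + i of the text block is stored at row i, column j, so each group of four consecutive bytes fills one column.

```c
#include <assert.h>
#include <stdint.h>

/* Text2Matrix: byte 4*j + i of the block goes to row i, column j. */
void text_to_matrix(const uint8_t in[16], uint8_t m[4][4])
{
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 4; i++)
            m[i][j] = in[4 * j + i];
}

/* Matrix2Text: the inverse mapping, back to a 16-byte block. */
void matrix_to_text(uint8_t m[4][4], uint8_t out[16])
{
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 4; i++)
            out[4 * j + i] = m[i][j];
}
```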

A.10 An Alternative Decryption Procedure

This section outlines an alternative decryption procedure,
TextBlock EqDecrypt(TextBlock in, CipherKey key, nat Nk):
TextBlock EqDecrypt(TextBlock in, CipherKey key, nat Nk) {
return EqInvCipher(in, MixRoundKeys(ExpandKey(key, Nk), Nk), Nk);
}

The procedure is based on a variation of InvCipher,
TextBlock EqInvCipher(TextBlock in, ExpandedKey dw, nat Nk):
TextBlock EqInvCipher(TextBlock in, ExpandedKey dw, nat Nk) {
assert((Nk == 4) || (Nk == 6) || (Nk == 8));
nat Nr = Nk + 6;
GFMatrix state = Text2Matrix(in);
state = AddRoundKey(state, dw, Nr);
for (nat round=Nr-1; round>0; round--) {
state = InvSubBytes(state);
state = InvShiftRows(state);
state = InvMixColumns(state);
state = AddRoundKey(state, dw, round);
}
state = InvSubBytes(state);
state = InvShiftRows(state);
state = AddRoundKey(state, dw, 0);
return Matrix2Text(state);
}

The structure of EqInvCipher more closely resembles that of Cipher. This requires a modification of
the expanded key generated by ExpandKey, which is performed by
ExpandedKey MixRoundKeys(ExpandedKey w, nat Nk):

ExpandedKey MixRoundKeys(ExpandedKey w, nat Nk) {
assert((Nk == 4) || (Nk == 6) || (Nk == 8));
nat Nr = Nk + 6;
ExpandedKey result;
GFMatrix roundKey;
for (nat round=0; round<=Nr; round++) {
for (nat i=0; i<4; i++) {
roundKey[i] = w[4*round+i];
}
if ((round > 0) && (round < Nr)) {
roundKey = InvMixRows(roundKey);
}
for (nat i=0; i<4; i++) {
result[4*round+i] = roundKey[i];
}
}
return result;
}
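The equivalence argument depends on C⁻¹ distributing over XOR, so that InvMixColumns(S ⊕ K) = InvMixColumns(S) ⊕ InvMixColumns(K); this is what allows MixRoundKeys to apply C⁻¹ to the middle round keys in advance. A C sketch that spot-checks the identity on pseudo-random columns (the generator constants and names are arbitrary choices for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Multiplication in GF(2^8) modulo p(X) = 0x11B. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* InvMixColumns on one column: row i of C^-1 is {0E, 0B, 0D, 09} rotated right by i. */
void inv_mix_column(const uint8_t in[4], uint8_t out[4])
{
    static const uint8_t r[4] = {0x0E, 0x0B, 0x0D, 0x09};
    for (int i = 0; i < 4; i++) {
        uint8_t acc = 0;
        for (int k = 0; k < 4; k++)
            acc ^= gf_mul(r[(k - i + 4) % 4], in[k]);
        out[i] = acc;
    }
}

/* Returns 1 if InvMixColumns(s ^ k) == InvMixColumns(s) ^ InvMixColumns(k). */
int inv_mix_is_additive(const uint8_t s[4], const uint8_t k[4])
{
    uint8_t sk[4], a[4], b[4], c[4];
    for (int i = 0; i < 4; i++)
        sk[i] = (uint8_t)(s[i] ^ k[i]);
    inv_mix_column(sk, a);
    inv_mix_column(s, b);
    inv_mix_column(k, c);
    for (int i = 0; i < 4; i++)
        if (a[i] != (uint8_t)(b[i] ^ c[i]))
            return 0;
    return 1;
}
```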

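The transformation MixRoundKeys leaves K0 and KNr unchanged, but for i = 1, …, Nr – 1, it replaces
Wi with the matrix product Wi Q, where Q is the transpose of C⁻¹:

$$
Q = (C^{-1})^{T} = \begin{bmatrix}
E & 9 & D & B\\
B & E & 9 & D\\
D & B & E & 9\\
9 & D & B & E
\end{bmatrix}
$$

The effect of this is to replace Ki with

$$C^{-1} K_i$$

for i = 1, …, Nr – 1.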
The equivalence of EqDecrypt and Decrypt follows from two properties of the basic operations:
C is a linear transformation and therefore, so is C–1;
Σ and R commute, and hence so do Σ⁻¹ and R⁻¹, for if

then

Now let X’’ and Y’’ be the initial and final states of an execution of EqDecrypt, and let S’’i be the
state following round i. Suppose X’’ = X’. Appealing to the definitions of EqDecrypt and EqInvCipher,
we have

and for i = Nr – 1,…,1, by induction,
=
=
=
=
=

Finally,
=
=
=
=

A.11 Computation of GFInv with Euclidean Greatest Common Divisor
Note that the operations performed by GFInv( ) are carried out in the polynomial ring Z2[X] rather
than in the quotient field GF.

The initial values of the variables x1 and x2 are the input x and 11b, the latter representing the
polynomial p(X) = X⁸ + X⁴ + X³ + X + 1. The variables a1 and a2 are initialized to 1 and 0,
respectively.

On each iteration of the loop, a multiple of the lesser of x1 and x2 is added to the other. If x1 ≤ x2, then
the values of x2 and a2 are adjusted as follows:
$$x_2 \to x_2 \oplus 2^{s} x_1, \qquad a_2 \to a_2 \oplus 2^{s} a_1$$

where s is the difference in the exponents (i.e., degrees) of x1 and x2, so that multiplying by 2^s shifts
left by s bits. In the remaining case, x1 and a1 are similarly adjusted. This step is repeated until either
x1 = 0 or x2 = 0.
We make the following observations:
• On each iteration, the value added to xi has the same exponent as xi, and hence the sum has a
smaller exponent. Therefore, termination is guaranteed.
• Since p(X) is irreducible and x is of smaller degree than p(X), the initial values of x1 and x2 have no
non-trivial common factor. This property is clearly preserved by each step.
• Initially,
$$x_1 \oplus a_1 x = x \oplus x = 0 \quad\text{and}\quad x_2 \oplus a_2 x = 11b \oplus 0 = 11b$$
are both divisible by 11b. This property is also invariant, since, for example, the above assignments
result in
$$x_2 \oplus a_2 x \;\to\; (x_2 \oplus 2^{s} x_1) \oplus (a_2 \oplus 2^{s} a_1)\,x = (x_2 \oplus a_2 x) \oplus 2^{s}(x_1 \oplus a_1 x).$$

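Now suppose that the loop terminates with x2 = 0. Since every polynomial divides 0, a common factor
of x1 and x2 is simply a factor of x1; because x1 and x2 have no non-trivial common factor, x1 has no
non-trivial factor and, hence, x1 = 1. Thus, 1 ⊕ a1 x is divisible by 11b. Since the final result y is
derived by reducing a1 modulo 11b, it follows that 1 ⊕ y x is also divisible by 11b and, hence, in the
quotient field GF, 1 + y x = 0. Because GF has characteristic 2, this implies y x = 1.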
The computation of the multiplicative inverse utilizing Euclid’s algorithm is as follows:

// Computation of multiplicative inverse based on Euclid's algorithm:
GF256 GFInv(GF256 x) {
if (x == 0) {
return 0;
}
// Initialization:
nat x1 = x;
nat x2 = 0x11B; // the irreducible polynomial p(X)
nat a1 = 1;
nat a2 = 0;
nat shift; // difference in exponents
while ((x1 != 0) && (x2 != 0)) {
// Termination is guaranteed, since either x1 or x2 decreases on each iteration.
// We have the following loop invariants, viewing natural numbers as elements of
// the polynomial ring Z2[X]:
// (1) x1 and x2 have no common divisor other than 1.
// (2) x1 ^ GFMul(a1, x) and x2 ^ GFMul(a2, x) are both divisible by p(X).
if (x1 <= x2) {
shift = expo(x2) - expo(x1);
x2 = x2 ^ (x1 << shift);
a2 = a2 ^ (a1 << shift);
}
else {
shift = expo(x1) - expo(x2);
x1 = x1 ^ (x2 << shift);
a1 = a1 ^ (a2 << shift);
}
}
nat y;
// Since either x1 or x2 is 0, it follows from (1) above that the other is 1.
if (x1 == 1) { // x2 == 0
y = a1;
}
else if (x2 == 1) { // x1 == 0
y = a2;
}
else {
assert(false);
}
// Now it follows from (2) that GFMul(y, x) ^ 1 is divisible by 0x11b.
// We need only reduce y modulo 0x11b:
nat e = expo(y);
while (e >= 8) {
y = y ^ (0x11B << (e - 8));
e = expo(y);
}
return y;
}
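For comparison, a C transcription of GFInv( ) is sketched below. It assumes `uint32_t` for the intermediate polynomials (under this loop the degrees of a1 and a2 remain at most 8, so this is ample) and implements `expo` as the index of the highest set bit; both are choices of this sketch rather than definitions from the manual. A field multiplication `gf_mul` is included only to check results.

```c
#include <assert.h>
#include <stdint.h>

/* expo(): degree of a polynomial over Z2 stored as bits
 * (index of the highest set bit); returns -1 for 0. */
int expo(uint32_t v)
{
    int e = -1;
    while (v != 0) {
        v >>= 1;
        e++;
    }
    return e;
}

/* Field multiplication modulo p(X) = 0x11B, used here only for checking. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Transcription of GFInv( ): Euclid's algorithm on polynomials over Z2. */
uint8_t gf_inv(uint8_t x)
{
    if (x == 0)
        return 0;
    uint32_t x1 = x, x2 = 0x11B;   /* x2 starts as p(X) */
    uint32_t a1 = 1, a2 = 0;
    while (x1 != 0 && x2 != 0) {
        if (x1 <= x2) {
            int shift = expo(x2) - expo(x1);
            x2 ^= x1 << shift;       /* cancels the leading term of x2 */
            a2 ^= a1 << shift;
        } else {
            int shift = expo(x1) - expo(x2);
            x1 ^= x2 << shift;       /* cancels the leading term of x1 */
            a1 ^= a2 << shift;
        }
    }
    /* One of x1, x2 is now 0 and the other is 1. */
    uint32_t y = (x1 == 1) ? a1 : a2;
    /* Reduce y modulo p(X). */
    for (int e = expo(y); e >= 8; e = expo(y))
        y ^= 0x11Bu << (e - 8);
    return (uint8_t)y;
}
```

For instance, `gf_inv(0x53)` gives 0xCA, the pair of mutual inverses used as an example in FIPS 197.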

Index
Numeric
128-bit media instruction ....................................... xxix
16-bit mode .......................................................... xxix
256-bit media instruction ....................................... xxix
32-bit mode .......................................................... xxix
64-bit media instructions ....................................... xxix
64-bit mode .......................................................... xxix

A
absolute displacement ............................................ xxx
ADDPD .................................................................. 23
ADDPS ................................................................... 25
Address space identifier ......................................... xxx
Address space identifier (ASID).............................. xxx
ADDSD .................................................................. 27
ADDSS ................................................................... 29
ADDSUBPD ........................................................... 31
ADDSUBPS............................................................ 33
Advanced Encryption Standard (AES) .............. xxx, 973
data structures .................................................... 974
decryption ........................................... 976, 984, 992
encryption ................................................... 976, 984
Euclidean common divisor .................................. 994
InvSbox ............................................................. 979
operations .......................................................... 978
Sbox .................................................................. 979
AESDEC ................................................................ 35
AESDECLAST ....................................................... 37
AESENC ................................................................ 39
AESENCLAST ....................................................... 41
AESIMC ................................................................. 43
AESKEYGENASSIST............................................. 45
ANDNPD ............................................................... 47
ANDNPS ................................................................ 49
ANDPD .................................................................. 51
ANDPS ................................................................... 53
ASID .................................................................... xxx
AVX ..................................................................... xxx

B
biased exponent ..................................................... xxx
BLENDPD .............................................................. 55
BLENDPS .............................................................. 57
BLENDVPD ........................................................... 59
BLENDVPS ............................................................ 61
byte ...................................................................... xxx

C
clear ...................................................................... xxx
cleared .................................................................. xxx
CMPPD .................................................................. 63
CMPPS ................................................................... 67
CMPSD .................................................................. 71
CMPSS ................................................................... 75
COMISD ................................................................. 79
COMISS ................................................................. 82
commit .................................................................. xxx
compatibility mode ................................................ xxx
Current privilege level (CPL) .................................. xxx
CVTDQ2PD ............................................................ 84
CVTDQ2PS ............................................................ 86
CVTPD2DQ ............................................................ 88
CVTPD2PS ............................................................. 90
CVTPS2DQ ............................................................ 92
CVTPS2PD ............................................................. 94
CVTSD2SI .............................................................. 96
CVTSD2SS ............................................................. 99
CVTSI2SD ............................................................ 101
CVTSI2SS ............................................................ 104
CVTSS2SD ........................................................... 107
CVTSS2SI ............................................................ 109
CVTTPD2DQ ........................................................ 112
CVTTPS2DQ ........................................................ 115
CVTTSD2SI.......................................................... 117
CVTTSS2SI .......................................................... 120

D
Definitions ........................................................... xxix
direct referencing ................................................... xxx
displacement.......................................................... xxx
DIVPD .................................................................. 123
DIVPS .................................................................. 125
DIVSD .................................................................. 127
DIVSS .................................................................. 129
double quadword .................................................. xxxi
doubleword .......................................................... xxxi
DPPD.................................................................... 131
DPPS .................................................................... 134

E
effective address size ............................................. xxxi
effective operand size ............................................ xxxi
element ................................................................ xxxi
endian order........................................................ xxxix

exception ............................................................. xxxi
exponent ............................................................... xxx
extended SSE ....................................................... xxxi
extended-register prefix ....................................... xxxiv
EXTRQ ................................................................ 139

F
flush .................................................................... xxxi
FMA .................................................................... xxxi
FMA4 .................................................................. xxxi
four-operand instruction ............................................. 6

G
General notation ................................................. xxviii
Global descriptor table (GDT) ............................... xxxi
Global interrupt flag (GIF) ................................... xxxii

H
HADDPD ............................................................. 141
HADDPS .............................................................. 143
HSUBPD .............................................................. 146
HSUBPS ............................................................... 149

I
IGN .................................................................... xxxii
immediate operands ................................................... 4
indirect ............................................................... xxxii
INSERTPS ............................................................ 152
INSERTQ ............................................................. 154
instructions
AES .................................................................. xxx
Interrupt descriptor table (IDT) ............................. xxxii
Interrupt redirection bitmap (IRB) ......................... xxxii
Interrupt stack table (IST) ..................................... xxxii
Interrupt vector table (IVT) .................................. xxxii

L
LDDQU ................................................................ 156
LDMXCSR ........................................................... 158
least significant byte ........................................... xxxiii
least-significant bit.............................................. xxxiii
legacy mode ........................................................ xxxii
legacy x86 ........................................................... xxxii
little endian ........................................................ xxxix
Local descriptor table (LDT) ................................ xxxii
long mode ........................................................... xxxii
LSB ................................................................... xxxiii
lsb ..................................................................... xxxiii

M
main memory ..................................................... xxxiii


mask .................................................................. xxxiii
MASKMOVDQU .................................................. 160
MAXPD ................................................................ 162
MAXPS ................................................................ 165
MAXSD ................................................................ 168
MAXSS ................................................................ 170
memory .............................................................. xxxiii
MINPD ................................................................. 172
MINPS .................................................................. 175
MINSD ................................................................. 178
MINSS .................................................................. 180
modes
32-bit ................................................................ xxix
64-bit ................................................................ xxix
compatibility ...................................................... xxx
legacy .............................................................. xxxii
long ................................................................. xxxii
protected ......................................................... xxxiv
real ................................................................. xxxiv
virtual-8086..................................................... xxxvi
most significant bit .............................................. xxxiii
most significant byte ........................................... xxxiii
MOVAPD.............................................................. 182
MOVAPS .............................................................. 184
MOVD .................................................................. 186
MOVDDUP .......................................................... 188
MOVDQA ............................................................ 190
MOVDQU ............................................................ 192
MOVHLPS ........................................................... 194
MOVHPD ............................................................. 196
MOVHPS .............................................................. 198
MOVLHPS ........................................................... 200
MOVLPD ............................................................. 202
MOVLPS .............................................................. 204
MOVMSKPD ........................................................ 206
MOVMSKPS ........................................................ 208
MOVNTDQ .......................................................... 210
MOVNTDQA ........................................................ 212
MOVNTPD ........................................................... 214
MOVNTPS ........................................................... 216
MOVNTSD ........................................................... 218
MOVNTSS ........................................................... 220
MOVQ .................................................................. 222
MOVSD ................................................................ 224
MOVSHDUP ........................................................ 226
MOVSLDUP ......................................................... 228
MOVSS ................................................................ 230
MOVUPD ............................................................. 232
MOVUPS .............................................................. 234
MPSADBW .......................................................... 236
MSB .................................................................. xxxiii
msb .................................................................... xxxiii


MULPD ................................................................ 241
MULPS ................................................................ 243
MULSD ................................................................ 245
MULSS ................................................................ 247
Must be zero (MBZ) ........................................... xxxiii

N
Notation
conventions ..................................................... xxviii
register ........................................................... xxxvi

O
octword .............................................................. xxxiii
offset ................................................................. xxxiii
operands
immediate .............................................................. 4
ORPD ................................................................... 249
ORPS ................................................................... 251
overflow ............................................................ xxxiii

P
PABSB ................................................................. 253
PABSD ................................................................. 255
PABSW ................................................................ 257
packed ............................................................... xxxiii
PACKSSDW ......................................................... 259
PACKSSWB ......................................................... 261
PACKUSDW ........................................................ 263
PACKUSWB ......................................................... 265
PADDB................................................................. 267
PADDD ................................................................ 269
PADDQ ................................................................ 271
PADDSB............................................................... 273
PADDSW.............................................................. 275
PADDUSB ............................................................ 277
PADDUSW ........................................................... 279
PADDW................................................................ 281
PALIGNR ............................................................. 283
PAND ................................................................... 285
PANDN ................................................................ 287
PAVGB ................................................................. 289
PAVGW ................................................................ 291
PBLENDVB ......................................................... 293
PBLENDW ........................................................... 295
PCLMULQDQ ...................................................... 297
PCMPEQB............................................................ 299
PCMPEQD ........................................................... 301
PCMPEQQ ........................................................... 303
PCMPEQW........................................................... 305
PCMPESTRI ......................................................... 307
PCMPESTRM ....................................................... 310
PCMPGTB............................................................ 313


PCMPGTD ............................................................ 315
PCMPGTQ ............................................................ 317
PCMPGTW ........................................................... 319
PCMPISTRI .......................................................... 321
PCMPISTRM ........................................................ 324
PEXTRB ............................................................... 327
PEXTRD ............................................................... 329
PEXTRQ ............................................................... 331
PEXTRW .............................................................. 333
PHADDD .............................................................. 335
PHADDSW ........................................................... 337
PHADDUBD ......................................................... 768
PHADDW ............................................................. 340
PHMINPOSUW .................................................... 343
PHSUBD .............................................................. 345
PHSUBSW ............................................................ 347
PHSUBW .............................................................. 350
Physical address extension (PAE) ......................... xxxiii
physical memory ................................................. xxxiv
PINSRB ................................................................ 353
PINSRD ................................................................ 356
PINSRQ ................................................................ 358
PINSRW ............................................................... 360
PMADDUBSW ..................................................... 362
PMADDWD .......................................................... 365
PMAXSB .............................................................. 367
PMAXSD .............................................................. 369
PMAXSW ............................................................. 371
PMAXUB ............................................................. 373
PMAXUD ............................................................. 375
PMAXUW ............................................................ 377
PMINSB ............................................................... 379
PMINSD ............................................................... 381
PMINSW .............................................................. 383
PMINUB ............................................................... 385
PMINUD .............................................................. 387
PMINUW .............................................................. 389
PMOVMSKB ........................................................ 391
PMOVSXBD ......................................................... 393
PMOVSXBQ ......................................................... 395
PMOVSXBW ........................................................ 397
PMOVSXDQ ........................................................ 399
PMOVSXWD ........................................................ 401
PMOVSXWQ ........................................................ 403
PMOVZXBD ........................................................ 405
PMOVZXBQ ........................................................ 407
PMOVZXBW ........................................................ 409
PMOVZXDQ ........................................................ 411
PMOVZXWD ....................................................... 413
PMOVZXWQ ....................................................... 415
PMULDQ ............................................................. 417
PMULHRSW ........................................................ 419
PMULHUW .......................................................... 421
PMULHW ............................................................ 423
PMULLD .............................................................. 425
PMULLW ............................................................. 427
PMULUDQ........................................................... 429
POR ..................................................................... 431
probe ................................................................. xxxiv
protected mode ................................................... xxxiv
PSADBW ............................................................. 433
PSHUFB ............................................................... 435
PSHUFD ............................................................... 437
PSHUFHW ........................................................... 440
PSHUFLW ............................................................ 443
PSIGNB ................................................................ 446
PSIGND ............................................................... 448
PSIGNW ............................................................... 450
PSLLD ................................................................. 452
PSLLDQ ............................................................... 455
PSLLQ ................................................................. 457
PSLLW ................................................................. 460
PSRAD ................................................................. 463
PSRAW ................................................................ 466
PSRLD ................................................................. 469
PSRLDQ ............................................................... 472
PSRLQ ................................................................. 474
PSRLW ................................................................. 477
PSUBB ................................................................. 480
PSUBD ................................................................. 482
PSUBQ ................................................................. 484
PSUBSB ............................................................... 486
PSUBSW .............................................................. 488
PSUBUSB ............................................................ 490
PSUBUSW ........................................................... 492
PSUBW ................................................................ 494
PTEST .................................................................. 496
PUNPCKHBW ...................................................... 498
PUNPCKHDQ ...................................................... 501
PUNPCKHQDQ .................................................... 504
PUNPCKHWD...................................................... 507
PUNPCKLBW ...................................................... 510
PUNPCKLDQ ....................................................... 513
PUNPCKLQDQ .................................................... 516
PUNPCKLWD ...................................................... 519
PXOR ................................................................... 522

Q
quadword ........................................................... xxxiv

R
RCPPS .................................................................. 524
RCPSS .................................................................. 526
Read as zero (RAZ) ............................................. xxxiv
real address mode. See real mode
real mode ........................................................... xxxiv
Register extension prefix (REX) ........................... xxxiv
Register notation ................................................. xxxvi
relative ............................................................... xxxiv
Relative instruction pointer (RIP) ......................... xxxiv
reserved ............................................................. xxxiv
revision history ..................................................... xxiii
RIP-relative addressing........................................ xxxiv
ROUNDPD ........................................................... 528
ROUNDPS ........................................................... 531
ROUNDSD ........................................................... 534
ROUNDSS ............................................................ 537
RSQRTPS ............................................................. 540
RSQRTSS ............................................................. 542

S
SBZ ................................................................... xxxiv
scalar .................................................................. xxxv
set ....................................................................... xxxv
SHUFPD ............................................................... 558
SHUFPS ............................................................... 561
Single instruction multiple data (SIMD)................. xxxv
SQRTPD ............................................................... 564
SQRTPS ................................................................ 566
SQRTSD ............................................................... 568
SQRTSS ................................................................ 570
SSE..................................................................... xxxv
SSE instructions
AVX .................................................................. xxx
legacy .............................................................. xxxii
SSE1 ................................................................... xxxv
SSE2 ................................................................... xxxv
SSE3 ................................................................... xxxv
SSE4.1 ................................................................ xxxv
SSE4.2 ................................................................ xxxv
SSE4A ................................................................ xxxv
SSSE3 ................................................................. xxxv
sticky bit ............................................................. xxxv
STMXCSR ............................................................ 572
Streaming SIMD Extensions ................................. xxxv
string compare instructions ....................................... 10
string comparison ..................................................... 10
SUBPD ................................................................. 574
SUBPS .................................................................. 576
SUBSD ................................................................. 578
SUBSS .................................................................. 580

T
Task state segment (TSS)...................................... xxxv
Terminology ......................................................... xxix
three-operand instruction ............................................ 5
two-operand instruction .............................................. 4

U
UCOMISD ............................................................ 582
UCOMISS ............................................................ 584
underflow ........................................................... xxxvi
UNPCKHPD ......................................................... 586
UNPCKHPS.......................................................... 588
UNPCKLPD ......................................................... 590
UNPCKLPS .......................................................... 592

V
VADDPD ................................................................ 23
VADDPS ................................................................ 25
VADDSD ................................................................ 27
VADDSS ................................................................ 29
VADDSUBPD ......................................................... 31
VADDSUBPS ......................................................... 33
VAESDEC .............................................................. 35
VAESDECLAST ..................................................... 37
VAESENC .............................................................. 39
VAESENCLAST ..................................................... 41
VAESIMC ............................................................... 43
VAESKEYGENASSIST .......................................... 45
VANDNPD ............................................................. 47
VANDNPS .............................................................. 49
VANDPD ................................................................ 51
VANDPS ................................................................ 53
VBLENDPD ........................................................... 55
VBLENDPS ............................................................ 57
VBLENDVPD......................................................... 59
VBLENDVPS ......................................................... 61
VBROADCASTF128 ............................................ 594
VBROADCASTI128 ............................................. 596
VBROADCASTSD ............................................... 598
VBROADCASTSS ................................................ 600
VCMPPD................................................................ 63
VCMPPS ................................................................ 67
VCMPSD................................................................ 71
VCMPSS ................................................................ 75
VCOMISD .............................................................. 79
VCOMISS .............................................................. 82
VCVTDQ2PD ......................................................... 84
VCVTDQ2PS.......................................................... 86
VCVTPD2DQ ......................................................... 88
VCVTPD2PS .......................................................... 90
VCVTPH2PS ........................................................ 602
VCVTPS2DQ .......................................................... 92
VCVTPS2PD .......................................................... 94
VCVTPS2PH ........................................................ 605
VCVTSD2SI ........................................................... 96
VCVTSD2SS .......................................................... 99
VCVTSI2SD ......................................................... 101
VCVTSI2SS .......................................................... 104
VCVTSS2SD ........................................................ 107
VCVTSS2SI .......................................................... 109
VCVTTPD2DQ ..................................................... 112
VCVTTPS2DQ...................................................... 115
VCVTTSD2SI ....................................................... 117
VCVTTSS2SI ........................................................ 120
VDIVPD ............................................................... 123
VDIVPS ................................................................ 125
VDIVSD ............................................................... 127
VDIVSS ................................................................ 129
VDPPD ................................................................. 131
VDPPS ................................................................. 134
vector ................................................................. xxxvi
VEX prefix ......................................................... xxxvi
VEXTRACTF128 .................................................... 609
VEXTRACTI128 ................................................... 611
VFMADD132PD ................................................... 613
VFMADD132PS.................................................... 616
VFMADD132SD ................................................... 619
VFMADD132SS.................................................... 622
VFMADD213PD ................................................... 613
VFMADD213PS.................................................... 616
VFMADD213SD ................................................... 619
VFMADD213SS.................................................... 622
VFMADD231PD ................................................... 613
VFMADD231PS.................................................... 616
VFMADD231SD ................................................... 619
VFMADD231SS.................................................... 622
VFMADDPD ........................................................ 613
VFMADDPS ......................................................... 616
VFMADDSD ........................................................ 619
VFMADDSS ......................................................... 622
VFMADDSUB132PD ............................................ 625
VFMADDSUB132PS ............................................ 628
VFMADDSUB213PD ............................................ 625
VFMADDSUB213PS ............................................ 628
VFMADDSUB231PD ............................................ 625
VFMADDSUB231PS ............................................ 628
VFMADDSUBPD ................................................. 625
VFMADDSUBPS .................................................. 628
VFMSUB132PD .................................................... 637
VFMSUB132PS .................................................... 640
VFMSUB132SD .................................................... 643
VFMSUB132SS .................................................... 646
VFMSUB213PD ................................................... 637
VFMSUB213PS .................................................... 640
VFMSUB213SD ................................................... 643
VFMSUB213SS .................................................... 646
VFMSUB231PD ................................................... 637
VFMSUB231PS .................................................... 640
VFMSUB231SD ................................................... 643
VFMSUB231SS .................................................... 646
VFMSUBADD132PD ............................................ 631
VFMSUBADD132PS ............................................ 634
VFMSUBADD213PD ............................................ 631
VFMSUBADD213PS ............................................ 634
VFMSUBADD231PD ............................................ 631
VFMSUBADD231PS ............................................ 634
VFMSUBADDPD ................................................. 631
VFMSUBADDPS .................................................. 634
VFMSUBPD ......................................................... 637
VFMSUBPS .......................................................... 640
VFMSUBSD ......................................................... 643
VFMSUBSS .......................................................... 646
VFNMADD132PD ................................................ 649
VFNMADD132PS ................................................. 652
VFNMADD132SS ................................................. 658
VFNMADD213PD ................................................ 649
VFNMADD213PS ................................................. 652
VFNMADD213SS ................................................. 658
VFNMADD231PD ................................................ 649
VFNMADD231PS ................................................. 652
VFNMADD231SS ................................................. 658
VFNMADDPD ...................................................... 649
VFNMADDPS ...................................................... 652
VFNMADDSD ...................................................... 655
VFNMADDSS ...................................................... 658
VFNMSUB132PD ................................................. 661
VFNMSUB132PS ................................................. 664
VFNMSUB132SD ................................................. 667
VFNMSUB132SS ................................................. 670
VFNMSUB213PD ................................................. 661
VFNMSUB213PS ................................................. 664
VFNMSUB213SD ................................................. 667
VFNMSUB213SS ................................................. 670
VFNMSUB231PD ................................................. 661
VFNMSUB231PS ................................................. 664
VFNMSUB231SD ................................................. 667
VFNMSUB231SS ................................................. 670
VFNMSUBPD ...................................................... 661
VFNMSUBPS ....................................................... 664
VFNMSUBSD ...................................................... 667
VFNMSUBSS ....................................................... 670
VFRCZPD ............................................................ 673
VFRCZPS ............................................................. 675
VFRCZSD ............................................................ 677
VFRCZSS ............................................................. 679
VGATHERDPD..................................................... 681
VGATHERDPS ..................................................... 683
VGATHERQPD..................................................... 685
VGATHERQPS ..................................................... 687
VHADDPD ........................................................... 141
VHADDPS ............................................................ 143
VHSUBPD ............................................................ 146
VHSUBPS ............................................................ 149
VINSERTF128 ...................................................... 689
VINSERTI128 ....................................................... 691
VINSERTPS .......................................................... 152
Virtual machine control block (VMCB) ................ xxxvi
Virtual machine monitor (VMM) .......................... xxxvi
virtual-8086 mode ............................................... xxxvi
VLDDQU ............................................................. 156
VLDMXCSR ......................................................... 158
VMASKMOVDQU ............................................... 160
VMASKMOVPD................................................... 693
VMASKMOVPS ................................................... 695
VMAXPD ............................................................. 162
VMAXPS .............................................................. 165
VMAXSD ............................................................. 168
VMAXSS .............................................................. 170
VMINPD .............................................................. 172
VMINPS ............................................................... 175
VMINSD .............................................................. 178
VMINSS ............................................................... 180
VMOVAPS ........................................................... 184
VMOVD ............................................................... 186
VMOVDDUP ........................................................ 188
VMOVDQA .......................................................... 190
VMOVDQU .......................................................... 192
VMOVHLPS ......................................................... 194
VMOVHPD .......................................................... 196
VMOVHPS ........................................................... 198
VMOVLHPS ......................................................... 200
VMOVLPD ........................................................... 202
VMOVLPS ........................................................... 204
VMOVMSKPD ..................................................... 206
VMOVMSKPS ...................................................... 208
VMOVNTDQ ........................................................ 210
VMOVNTDQA ..................................................... 212
VMOVNTPD ........................................................ 214
VMOVNTPS ......................................................... 216
VMOVQ ............................................................... 222
VMOVSD ............................................................. 224
VMOVSHDUP ...................................................... 226
VMOVSLDUP ...................................................... 228
VMOVSS .............................................................. 230
VMOVUPD .......................................................... 232
VMOVUPS ........................................................... 234
VMPSADBW ........................................................ 236
VMULPD ............................................................. 241
VMULPS .............................................................. 243
VMULSD ............................................................. 245
VMULSS .............................................................. 247
VORPD ................................................................ 249
VORPS ................................................................. 251
VPABSB ............................................................... 253
VPABSD ............................................................... 255
VPABSW .............................................................. 257
VPACKSSDW ...................................................... 259
VPACKSSWB ....................................................... 261
VPACKUSDW ...................................................... 263
VPACKUSWB ...................................................... 265
VPADDD .............................................................. 269
VPADDQ .............................................................. 271
VPADDSB ............................................................ 273
VPADDSW ........................................................... 275
VPADDUSB ......................................................... 277
VPADDUSW ........................................................ 279
VPADDW ............................................................. 281
VPALIGNR ........................................................... 283
VPAND ................................................................ 285
VPANDN .............................................................. 287
VPAVGB .............................................................. 289
VPAVGW ............................................................. 291
VPBLENDD ......................................................... 697
VPBLENDVB ....................................................... 293
VPBLENDW ........................................................ 295
VPBROADCASTB ............................................... 699
VPBROADCASTD ............................................... 701
VPBROADCASTQ ............................................... 703
VPBROADCASTW .............................................. 705
VPCLMULQDQ ................................................... 297
VPCMOV ............................................................. 707
VPCMPEQB ......................................................... 299
VPCMPEQD ......................................................... 301
VPCMPEQQ ......................................................... 303
VPCMPEQW ........................................................ 305
VPCMPESTRI ...................................................... 307
VPCMPESTRM .................................................... 310
VPCMPGTB ......................................................... 313
VPCMPGTD ......................................................... 315
VPCMPGTQ ......................................................... 317
VPCMPGTW ........................................................ 319
VPCMPISTRI ....................................................... 321
VPCMPISTRM ..................................................... 324
VPCOMB ............................................................. 709
VPCOMD ............................................................. 711
VPCOMQ ............................................................. 713
VPCOMUB ........................................................... 715
VPCOMUD ........................................................... 717
VPCOMUQ ........................................................... 719
VPCOMUW .......................................................... 721
VPCOMW ............................................................ 723
VPERM2F128 ....................................................... 725
VPERM2I128 ........................................................ 727
VPERMD .............................................................. 729
VPERMIL2PD ...................................................... 731
VPERMIL2PS ....................................................... 735
VPERMILPD ........................................................ 739
VPERMILPS ......................................................... 742
VPERMPD ............................................................ 746
VPERMPS ............................................................ 748
VPERMQ .............................................................. 750
VPEXTRB ............................................................ 327
VPEXTRD ............................................................ 329
VPEXTRQ ............................................................ 331
VPEXTRW ........................................................... 333
VPGATHERDD ..................................................... 752
VPGATHERDQ ..................................................... 754
VPGATHERQD ..................................................... 756
VPGATHERQQ ..................................................... 758
VPHADDBD ......................................................... 760
VPHADDBQ ......................................................... 762
VPHADDBW ........................................................ 764
VPHADDD ........................................................... 335
VPHADDDQ ........................................................ 766
VPHADDSW ........................................................ 337
VPHADDUBQ ...................................................... 770
VPHADDUBW ..................................................... 772
VPHADDUDQ ...................................................... 774
VPHADDUWD ..................................................... 776
VPHADDUWQ ..................................................... 778
VPHADDW .......................................................... 340
VPHADDWD ........................................................ 780
VPHADDWQ ........................................................ 782
VPHMINPOSUW .................................................. 343
VPHSUBBW ......................................................... 784
VPHSUBD ............................................................ 345
VPHSUBDQ ......................................................... 786
VPHSUBSW ......................................................... 347
VPHSUBW ........................................................... 350
VPHSUBWD ........................................................ 788
VPINSRB ............................................................. 353
VPINSRD ............................................................. 356
VPINSRQ ............................................................. 358
VPINSRW ............................................................. 360
VPMACSDD ......................................................... 790
VPMACSDQH ...................................................... 792
VPMACSDQL ...................................................... 794
VPMACSSDD ...................................................... 796
VPMACSSDQH .................................................... 798
VPMACSSDQL .................................................... 800
VPMACSSWD ...................................................... 802
VPMACSSWW ..................................................... 804
VPMACSWD ........................................................ 806
VPMACSWW ....................................................... 808
VPMADCSSWD ................................................... 810
VPMADCSWD ..................................................... 812
VPMADDUBSW .................................................. 362
VPMADDWD ....................................................... 365
VPMASKMOVD .................................................. 814
VPMASKMOVQ .................................................. 816
VPMAXSB ........................................................... 367
VPMAXSD ........................................................... 369
VPMAXSW .......................................................... 371
VPMAXUB .......................................................... 373
VPMAXUD .......................................................... 375
VPMAXUW ......................................................... 377
VPMINSB ............................................................ 379
VPMINSD ............................................................ 381
VPMINSW ........................................................... 383
VPMINUB ............................................................ 385
VPMINUD ............................................................ 387
VPMINUW ........................................................... 389
VPMOVMSKB ..................................................... 391
VPMOVSXBD ...................................................... 393
VPMOVSXBQ ...................................................... 395
VPMOVSXBW ..................................................... 397
VPMOVSXDQ ...................................................... 399
VPMOVSXWD ..................................................... 401
VPMOVSXWQ ..................................................... 403
VPMOVZXBD ...................................................... 405
VPMOVZXBQ ...................................................... 407
VPMOVZXBW ..................................................... 409
VPMOVZXDQ ..................................................... 411
VPMOVZXWD ..................................................... 413
VPMOVZXWQ ..................................................... 415
VPMULDQ ........................................................... 417
VPMULHRSW ..................................................... 419
VPMULHUW ....................................................... 421
VPMULHW .......................................................... 423
VPMULLD ........................................................... 425
VPMULLW .......................................................... 427
VPMULUDQ ........................................................ 429
VPOR ................................................................... 431
VPPERM .............................................................. 818
VPROTB .............................................................. 820
VPROTD .............................................................. 822
VPROTQ .............................................................. 824

VPROTW ............................................................. 826
VPSADBW ........................................................... 433
VPSHAB .............................................................. 828
VPSHAD .............................................................. 830
VPSHAQ .............................................................. 832
VPSHAW .............................................................. 834
VPSHLB ............................................................... 836
VPSHLD ............................................................... 838
VPSHLQ ............................................................... 840
VPSHLW .............................................................. 842
VPSHUFB ............................................................ 435
VPSHUFD ............................................................ 437
VPSHUFHW ......................................................... 440
VPSHUFLW ......................................................... 443
VPSIGNB ............................................................. 446
VPSIGND ............................................................. 448
VPSIGNW ............................................................ 450
VPSLLD ............................................................... 452
VPSLLDQ ............................................................ 455
VPSLLQ ............................................................... 457
VPSLLVD ............................................................. 844
VPSLLVQ ............................................................. 846
VPSLLW ............................................................... 460
VPSRAD .............................................................. 463
VPSRAVD ............................................................ 848
VPSRAW .............................................................. 466
VPSRLD ............................................................... 469
VPSRLDQ ............................................................ 472
VPSRLQ ............................................................... 474
VPSRLVD ............................................................. 850
VPSRLVQ ............................................................. 852
VPSRLW .............................................................. 477
VPSUBB ............................................................... 480
VPSUBD .............................................................. 482
VPSUBQ .............................................................. 484
VPSUBSB ............................................................. 486
VPSUBSW ............................................................ 488
VPSUBUSB .......................................................... 490
VPSUBUSW ......................................................... 492
VPSUBW .............................................................. 494
VPTEST ............................................................... 496
VPUNPCKHBW ................................................... 498
VPUNPCKHDQ .................................................... 501
VPUNPCKHQDQ ................................................. 504
VPUNPCKHWD ................................................... 507
VPUNPCKLBW .................................................... 510
VPUNPCKLDQ .................................................... 513
VPUNPCKLQDQ .................................................. 516
VPUNPCKLWD .................................................... 519
VPXOR ................................................................ 522
VRCPPS ............................................................... 524

VRCPSS ............................................................... 526
VROUNDPD ........................................................ 528
VROUNDPS ......................................................... 531
VROUNDSD ........................................................ 534
VROUNDSS ......................................................... 537
VRSQRTPS .......................................................... 540
VRSQRTSS .......................................................... 542
VSHUFPD ............................................................ 558
VSHUFPS ............................................................. 561
VSQRTPD ............................................................ 564
VSQRTPS ............................................................. 566
VSQRTSD ............................................................ 568
VSQRTSS ............................................................. 570
VSTMXCSR ......................................................... 572
VSUBPD .............................................................. 574
VSUBPS ............................................................... 576
VSUBSD .............................................................. 578
VSUBSS ............................................................... 580
VTESTPD ............................................................. 854
VTESTPS ............................................................. 856
VUCOMISD ......................................................... 582
VUCOMISS .......................................................... 584
VUNPCKHPD ...................................................... 586
VUNPCKHPS ....................................................... 588
VUNPCKLPD ....................................................... 590
VUNPCKLPS ....................................................... 592
VXORPD .............................................................. 861
VXORPS .............................................................. 863
VZEROALL ......................................................... 858
VZEROUPPER ..................................................... 859

W
word .................................................................. xxxvi

X
x86 .................................................................... xxxvi
XGETBV .............................................................. 860
XOP instructions ................................................ xxxvi
XOP prefix ......................................................... xxxvi
XORPD ................................................................ 861
XORPS ................................................................. 863
XRSTOR .............................................................. 865
XSAVE ................................................................. 869
XSAVEOPT .......................................................... 873
XSETBV .............................................................. 877

Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.7
Linearized                      : No
Author                          : AMD
Copyright                       : © 2002 – 2010 Advanced Micro Devices, Inc. All rights reserved.
Create Date                     : 2018:05:24 13:48:03Z
Keywords                        : AMD64, SIMD, extended media instructions, legacy media instructions
Modify Date                     : 2018:06:13 14:17:39+08:00
Subject                         : AMD64 128-Bit and 256-Bit Media Instructions
Page Mode                       : UseOutlines
Page Count                      : 1047
Has XFA                         : No
XMP Toolkit                     : Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26
Format                          : application/pdf
Description                     : AMD64 128-Bit and 256-Bit Media Instructions
Title                           : AMD64 Architecture Programmer’s Manual, Volume 4: 128-Bit and 256-Bit Media Instructions
Creator                         : AMD
Producer                        : Acrobat Distiller 10.1.16 (Windows)
Creator Tool                    : FrameMaker 9.0
Metadata Date                   : 2018:05:24 16:43:59-05:00
Document ID                     : uuid:5643f610-5f2d-4060-8d1d-3e7be7312e8c
Instance ID                     : uuid:59decf66-93dc-4a16-a0ca-491f3ec34679