AMD64 Architecture Programmer’s Manual, Volume 4: 128-Bit and 256-Bit Media Instructions
Page Count: 1047
AMD64 Technology AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit and 256-Bit Media Instructions Publication No. Revision Date 26568 3.22 May 2018 Advanced Micro Devices © 2013 – 2018 Advanced Micro Devices Inc. All rights reserved. The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. Trademarks AMD, the AMD Arrow logo, and combinations thereof, and 3DNow! are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. MMX is a trademark and Pentium is a registered trademark of Intel Corporation. 26568—Rev. 3.22—May 2018 AMD64 Technology Contents Revision History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii About This Book. . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii Conventions and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxviii Related Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xl 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 1.1 1.2 1.3 1.4 1.5 2 Syntax and Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Extended Instruction Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Immediate Byte Usage Unique to the SSE instructions . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.2 Instruction Format Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 VSIB Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3.1 Effective Address Array Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.3.2 Notational Conventions Related to VSIB Addressing Mode . . . . . . . . . . . . . . . . . . . . . . 8 1.3.3 Memory Ordering and Exception Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Enabling SSE Instruction Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
10 String Compare Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.5.1 Source Data Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.5.2 Comparison Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.5.3 Comparison Summary Bit Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.5.4 Intermediate Result Post-processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 1.5.5 Output Option Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 1.5.6 Effect on Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Instruction Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21 ADDPD VADDPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 ADDPS VADDPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 ADDSD VADDSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 ADDSS VADDSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 ADDSUBPD VADDSUBPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 ADDSUBPS VADDSUBPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 AESDEC VAESDEC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 AESDECLAST VAESDECLAST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 AESENC iii AMD64 Technology 26568—Rev. 3.22—May 2018 VAESENC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 AESENCLAST VAESENCLAST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 AESIMC VAESIMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 AESKEYGENASSIST VAESKEYGENASSIST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 ANDNPD VANDNPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 ANDNPS VANDNPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 ANDPD VANDPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 ANDPS VANDPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 BLENDPD VBLENDPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 BLENDPS VBLENDPS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 BLENDVPD VBLENDVPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 BLENDVPS VBLENDVPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
61 CMPPD VCMPPD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 CMPPS VCMPPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 CMPSD VCMPSD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 CMPSS VCMPSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 COMISD VCOMISD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 COMISS VCOMISS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 CVTDQ2PD VCVTDQ2PD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 CVTDQ2PS VCVTDQ2PS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 CVTPD2DQ VCVTPD2DQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 CVTPD2PS VCVTPD2PS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 CVTPS2DQ VCVTPS2DQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 CVTPS2PD VCVTPS2PD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 iv 26568—Rev. 3.22—May 2018 AMD64 Technology CVTSD2SI VCVTSD2SI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 CVTSD2SS VCVTSD2SS. . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 CVTSI2SD VCVTSI2SD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 CVTSI2SS VCVTSI2SS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 CVTSS2SD VCVTSS2SD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 CVTSS2SI VCVTSS2SI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 CVTTPD2DQ VCVTTPD2DQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 CVTTPS2DQ VCVTTPS2DQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 CVTTSD2SI VCVTTSD2SI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 CVTTSS2SI VCVTTSS2SI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 DIVPD VDIVPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 DIVPS VDIVPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 DIVSD VDIVSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 DIVSS VDIVSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 DPPD VDPPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 DPPS VDPPS. . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 EXTRACTPS VEXTRACTPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 EXTRQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 HADDPD VHADDPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 HADDPS VHADDPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 HSUBPD VHSUBPD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 HSUBPS VHSUBPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 INSERTPS VINSERTPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 INSERTQ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 LDDQU v AMD64 Technology 26568—Rev. 3.22—May 2018 VLDDQU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 LDMXCSR VLDMXCSR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 MASKMOVDQU VMASKMOVDQU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 MAXPD VMAXPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 MAXPS VMAXPS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 165 MAXSD VMAXSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 MAXSS VMAXSS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 MINPD VMINPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 MINPS VMINPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 MINSD VMINSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 MINSS VMINSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 MOVAPD VMOVAPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 MOVAPS VMOVAPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 MOVD VMOVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 MOVDDUP VMOVDDUP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188 MOVDQA VMOVDQA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 MOVDQU VMOVDQU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 MOVHLPS VMOVHLPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 MOVHPD VMOVHPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . 196 MOVHPS VMOVHPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 MOVLHPS VMOVLHPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 MOVLPD VMOVLPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 MOVLPS VMOVLPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 MOVMSKPD VMOVMSKPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 vi 26568—Rev. 3.22—May 2018 AMD64 Technology MOVMSKPS VMOVMSKPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 MOVNTDQ VMOVNTDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 MOVNTDQA VMOVNTDQA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212 MOVNTPD VMOVNTPD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 MOVNTPS VMOVNTPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 MOVNTSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 MOVNTSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 MOVQ VMOVQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222 MOVSD VMOVSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . 224 MOVSHDUP VMOVSHDUP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226 MOVSLDUP VMOVSLDUP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 MOVSS VMOVSS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 MOVUPD VMOVUPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 MOVUPS VMOVUPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 MPSADBW VMPSADBW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236 MULPD VMULPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 MULPS VMULPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 MULSD VMULSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245 MULSS VMULSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 ORPD VORPD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 ORPS VORPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 PABSB VPABSB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 PABSD VPABSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . 255 PABSW VPABSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 PACKSSDW vii AMD64 Technology 26568—Rev. 3.22—May 2018 VPACKSSDW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 PACKSSWB VPACKSSWB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 PACKUSDW VPACKUSDW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263 PACKUSWB VPACKUSWB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 PADDB VPADDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267 PADDD VPADDD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269 PADDQ VPADDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271 PADDSB VPADDSB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 PADDSW VPADDSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275 PADDUSB VPADDUSB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 PADDUSW VPADDUSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 PADDW VPADDW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281 PALIGNR VPALIGNR . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 PAND VPAND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 PANDN VPANDN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287 PAVGB VPAVGB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289 PAVGW VPAVGW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 PBLENDVB VPBLENDVB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 PBLENDW VPBLENDW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295 PCLMULQDQ VPCLMULQDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 PCMPEQB VPCMPEQB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299 PCMPEQD VPCMPEQD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301 PCMPEQQ VPCMPEQQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303 PCMPEQW VPCMPEQW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 viii 26568—Rev. 3.22—May 2018 AMD64 Technology PCMPESTRI VPCMPESTRI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307 PCMPESTRM VPCMPESTRM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 310 PCMPGTB VPCMPGTB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 PCMPGTD VPCMPGTD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 PCMPGTQ VPCMPGTQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 PCMPGTW VPCMPGTW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 PCMPISTRI VPCMPISTRI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 PCMPISTRM VPCMPISTRM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 PEXTRB VPEXTRB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 PEXTRD VPEXTRD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 PEXTRQ VPEXTRQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 PEXTRW VPEXTRW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 PHADDD VPHADDD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335 PHADDSW VPHADDSW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337 PHADDW VPHADDW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340 PHMINPOSUW VPHMINPOSUW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . 343 PHSUBD VPHSUBD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345 PHSUBSW VPHSUBSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 PHSUBW VPHSUBW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350 PINSRB VPINSRB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353 PINSRD VPINSRD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356 PINSRQ VPINSRQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358 PINSRW VPINSRW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360 PMADDUBSW ix AMD64 Technology 26568—Rev. 3.22—May 2018 VPMADDUBSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362 PMADDWD VPMADDWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365 PMAXSB VPMAXSB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367 PMAXSD VPMAXSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369 PMAXSW VPMAXSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371 PMAXUB VPMAXUB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373 PMAXUD VPMAXUD . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375 PMAXUW VPMAXUW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 PMINSB VPMINSB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379 PMINSD VPMINSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381 PMINSW VPMINSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383 PMINUB VPMINUB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385 PMINUD VPMINUD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387 PMINUW VPMINUW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389 PMOVMSKB VPMOVMSKB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391 PMOVSXBD VPMOVSXBD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393 PMOVSXBQ VPMOVSXBQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 PMOVSXBW VPMOVSXBW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 PMOVSXDQ VPMOVSXDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399 PMOVSXWD VPMOVSXWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401 PMOVSXWQ VPMOVSXWQ . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 PMOVZXBD VPMOVZXBD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405 PMOVZXBQ VPMOVZXBQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 PMOVZXBW VPMOVZXBW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409 x 26568—Rev. 3.22—May 2018 AMD64 Technology PMOVZXDQ VPMOVZXDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 PMOVZXWD VPMOVZXWD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413 PMOVZXWQ VPMOVZXWQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 PMULDQ VPMULDQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 PMULHRSW VPMULHRSW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419 PMULHUW VPMULHUW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421 PMULHW VPMULHW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423 PMULLD VPMULLD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 PMULLW VPMULLW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427 PMULUDQ VPMULUDQ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429 POR VPOR. . . . . . . . . . . . . . . . . . . 
PSADBW VPSADBW
PSHUFB VPSHUFB
PSHUFD VPSHUFD
PSHUFHW VPSHUFHW
PSHUFLW VPSHUFLW
PSIGNB VPSIGNB
PSIGND VPSIGND
PSIGNW VPSIGNW
PSLLD VPSLLD
PSLLDQ VPSLLDQ
PSLLQ VPSLLQ
PSLLW VPSLLW
PSRAD VPSRAD
PSRAW VPSRAW
PSRLD VPSRLD
PSRLDQ VPSRLDQ
PSRLQ VPSRLQ
PSRLW VPSRLW
PSUBB VPSUBB
PSUBD VPSUBD
PSUBQ VPSUBQ
PSUBSB VPSUBSB
PSUBSW VPSUBSW
PSUBUSB VPSUBUSB
PSUBUSW VPSUBUSW
PSUBW VPSUBW
PTEST VPTEST
PUNPCKHBW VPUNPCKHBW
PUNPCKHDQ VPUNPCKHDQ
PUNPCKHQDQ VPUNPCKHQDQ
PUNPCKHWD VPUNPCKHWD
PUNPCKLBW VPUNPCKLBW
PUNPCKLDQ VPUNPCKLDQ
PUNPCKLQDQ VPUNPCKLQDQ
PUNPCKLWD VPUNPCKLWD
PXOR VPXOR
RCPPS VRCPPS
RCPSS VRCPSS
ROUNDPD VROUNDPD
ROUNDPS VROUNDPS
ROUNDSD VROUNDSD
ROUNDSS VROUNDSS
RSQRTPS VRSQRTPS
RSQRTSS VRSQRTSS
SHA1RNDS4
SHA1NEXTE
SHA1MSG1
SHA1MSG2
SHA256RNDS2
SHA256MSG1
SHA256MSG2
SHUFPD VSHUFPD
SHUFPS VSHUFPS
SQRTPD VSQRTPD
SQRTPS VSQRTPS
SQRTSD VSQRTSD
SQRTSS VSQRTSS
STMXCSR VSTMXCSR
SUBPD VSUBPD
SUBPS VSUBPS
SUBSD VSUBSD
SUBSS VSUBSS
UCOMISD VUCOMISD
UCOMISS VUCOMISS
UNPCKHPD VUNPCKHPD
UNPCKHPS VUNPCKHPS
UNPCKLPD VUNPCKLPD
UNPCKLPS VUNPCKLPS
VBROADCASTF128
VBROADCASTI128
VBROADCASTSD
VBROADCASTSS
VCVTPH2PS
VCVTPS2PH
VEXTRACTF128
VEXTRACTI128
VFMADDPD VFMADD132PD VFMADD213PD VFMADD231PD
VFMADDPS VFMADD132PS VFMADD213PS VFMADD231PS
VFMADDSD VFMADD132SD VFMADD213SD VFMADD231SD
VFMADDSS VFMADD132SS VFMADD213SS VFMADD231SS
VFMADDSUBPD VFMADDSUB132PD VFMADDSUB213PD VFMADDSUB231PD
VFMADDSUBPS VFMADDSUB132PS VFMADDSUB213PS VFMADDSUB231PS
VFMSUBADDPD VFMSUBADD132PD VFMSUBADD213PD VFMSUBADD231PD
VFMSUBADDPS VFMSUBADD132PS VFMSUBADD213PS VFMSUBADD231PS
VFMSUBPD VFMSUB132PD VFMSUB213PD VFMSUB231PD
VFMSUBPS VFMSUB132PS VFMSUB213PS VFMSUB231PS
VFMSUBSD VFMSUB132SD VFMSUB213SD VFMSUB231SD
VFMSUBSS VFMSUB132SS VFMSUB213SS VFMSUB231SS
VFNMADDPD VFNMADD132PD VFNMADD213PD VFNMADD231PD
VFNMADDPS VFNMADD132PS VFNMADD213PS VFNMADD231PS
VFNMADDSD VFNMADD132SD VFNMADD213SD VFNMADD231SD
VFNMADDSS VFNMADD132SS VFNMADD213SS VFNMADD231SS
VFNMSUBPD VFNMSUB132PD VFNMSUB213PD VFNMSUB231PD
VFNMSUBPS VFNMSUB132PS VFNMSUB213PS VFNMSUB231PS
VFNMSUBSD VFNMSUB132SD VFNMSUB213SD VFNMSUB231SD
VFNMSUBSS VFNMSUB132SS VFNMSUB213SS VFNMSUB231SS
VFRCZPD
VFRCZPS
VFRCZSD
VFRCZSS
VGATHERDPD
VGATHERDPS
VGATHERQPD
VGATHERQPS
VINSERTF128
VINSERTI128
VMASKMOVPD
VMASKMOVPS
VPBLENDD
VPBROADCASTB
VPBROADCASTD
VPBROADCASTQ
VPBROADCASTW
VPCMOV
VPCOMB
VPCOMD
VPCOMQ
VPCOMUB
VPCOMUD
VPCOMUQ
VPCOMUW
VPCOMW
VPERM2F128
VPERM2I128
VPERMD
VPERMIL2PD
VPERMIL2PS
VPERMILPD
VPERMILPS
VPERMPD
VPERMPS
VPERMQ
VPGATHERDD
VPGATHERDQ
VPGATHERQD
VPGATHERQQ
VPHADDBD
VPHADDBQ
VPHADDBW
VPHADDDQ
VPHADDUBD
VPHADDUBQ
VPHADDUBW
VPHADDUDQ
VPHADDUWD
VPHADDUWQ
VPHADDWD
VPHADDWQ
VPHSUBBW
VPHSUBDQ
VPHSUBWD
VPMACSDD
VPMACSDQH
VPMACSDQL
VPMACSSDD
VPMACSSDQH
VPMACSSDQL
VPMACSSWD
VPMACSSWW
VPMACSWD
VPMACSWW
VPMADCSSWD
VPMADCSWD
VPMASKMOVD
VPMASKMOVQ
VPPERM
VPROTB
VPROTD
VPROTQ
VPROTW
VPSHAB
VPSHAD
VPSHAQ
VPSHAW
VPSHLB
VPSHLD
VPSHLQ
VPSHLW
VPSLLVD
VPSLLVQ
VPSRAVD
VPSRLVD
VPSRLVQ
VTESTPD
VTESTPS
VZEROALL
VZEROUPPER
XGETBV
XORPD VXORPD
XORPS VXORPS
XRSTOR
XRSTORS
XSAVE
XSAVEC
XSAVEOPT
XSAVES
XSETBV
3 Exception Summary
Appendix A AES Instructions
A.1 AES Overview
A.2 Coding Conventions
A.3 AES Data Structures
A.4 Algebraic Preliminaries
A.4.1 Multiplication in the Field GF
A.4.2 Multiplication of 4x4 Matrices Over GF
A.5 AES Operations
A.5.1 Sequence of Operations
A.6 Initializing the SBox and InvSBox Matrices
A.6.1 Computation of SBox and InvSBox
A.6.2 Initialization of InvSBox[ ]
A.7 Encryption and Decryption
A.7.1 The Encrypt( ) and Decrypt( ) Procedures
A.7.2 Round Sequences and Key Expansion
A.8 The Cipher Function
A.8.1 Text to Matrix Conversion
A.8.2 Cipher Transformations
A.8.3 Matrix to Text Conversion
A.9 The InvCipher Function
A.9.1 Text to Matrix Conversion
A.9.2 InvCipher Transformations
A.9.3 Matrix to Text Conversion
A.10 An Alternative Decryption Procedure
A.11 Computation of GFInv with Euclidean Greatest Common Divisor
Index
Figures
Figure 1-1. Typical Descriptive Synopsis - Extended SSE Instructions
Figure 1-2. VSIB Byte Format
Figure 1-3. Byte-wide Character String – Memory and Register Image
Figure 2-1. Typical Instruction Description
Figure 2-2. (V)MPSADBW Instruction
Figure A-1. GFMatrix Representation of 16-byte Block
Figure A-2. GFMatrix to Operand Byte Mappings
Tables
Table 1-1. Three-Operand Selection
Table 1-2. Four-Operand Selection
Table 1-3. Source Data Format
Table 1-4. Comparison Type
Table 1-5. Post-processing Options
Table 1-6. Indexed Output Option Selection
Table 1-7. Masked Output Option Selection
Table 1-8. State of Affected Flags After Execution
Table 3-1. Instructions By Exception Class
Table A-1. SBox Definition
Table A-2. InvSBox Definition
Table A-3. Cipher Key, Round Sequence, and Round Key Length

Revision History

May 2018, Revision 3.22
    Updated the packed string compare algorithm.
    Fixed a number of erroneous references to double precision that should be single precision.
    Separated MOVQ from MOVD.
December 2017, Revision 3.21
    Clarifications to XGETBV, XRSTOR, XRSTORS, XSAVE, XSAVEC, XSAVEOPT, XSAVES, and XSETBV instructions.
March 2017, Revision 3.20
    Corrections to ROUNDPD, VROUNDPD, ROUNDPS, VROUNDPS, ROUNDSD, VROUNDSD, ROUNDSS, VROUNDSS, VPERMD, VPERMPD, VPERMPS, VPERMQ, VTESTPD, VTESTPS, XGETBV, XSETBV, XSAVE, and AVX instruction descriptions.
    Added SHA1RNDS4, SHA1NEXTE, SHA1MSG1, SHA1MSG2, SHA256RNDS2, SHA256MSG1, SHA256MSG2, XRSTOR, XRSTORS and XSAVEC instructions.
June 2015, Revision 3.19
    Corrections to the MOVLPD, PHSUBW, PHSUBSW instruction descriptions.
October 2013, Revision 3.18
    Added AVX2 instructions.
    Added “Instruction Support” subsection to each instruction reference page that lists CPUID feature bit information in a table.
May 2013, Revision 3.17
    Removed all references to the CPUID specification, which has been superseded by Volume 3, Appendix E, "Obtaining Processor Information Via the CPUID Instruction."
    Corrected exceptions table for the explicitly-aligned load/store instructions. General protection exception does not depend on state of MXCSR.MM bit.
September 2012, Revision 3.16
    Corrected REX.W bit encoding for the MOVD instruction. (See page 186.)
    Corrected L bit encoding for the VMOVQ (D6h opcode) instruction. (See page 222.)
    Corrected statement about zero extension for third encoding (11h opcode) of MOVSS instruction. (See page 230.)
March 2012, Revision 3.15
    Corrected instruction encoding for VPCOMUB, VPCOMUD, VPCOMUQ, VPCOMUW, and VPHSUBDQ instructions.
    Other minor corrections.
December 2011, Revision 3.14
    Reworked Section 1.5, "String Compare Instructions" on page 10.
    Revised descriptions of the string compare instructions in instruction reference.
    Moved AES overview to Appendix A.
    Clarified trap and exception behavior for elements not selected for writing. See MASKMOVDQU VMASKMOVDQU on page 160.
    Additional minor corrections and clarifications.
September 2011, Revision 3.13
    Moved discussion of extended instruction encoding (VEX and XOP prefixes) to Volume 3.
    Added FMA instructions, described on the corresponding FMA4 reference pages.
    Moved BMI and TBM instructions to Volume 3.
    Added XSAVEOPT instruction.
    Corrected descriptions of VSQRTSD and VSQRTSS.
May 2011, Revision 3.12
    Added F16C, BMI, and TBM instructions.
December 2010, Revision 3.11
    Complete revision and reformat accommodating 128-bit and 256-bit media instructions. Includes revised definitions of legacy SSE, SSE2, SSE3, SSE4.1, SSE4.2, and SSSE3 instructions, as well as new definitions of extended AES, AVX, CLMUL, FMA4, and XOP instructions. Introduction includes supplemental information concerning encoding of extended instructions, enhanced processor state management provided by the XSAVE/XRSTOR instructions, cryptographic capabilities of the AES instructions, and functionality of extended string comparison instructions.
September 2007, Revision 3.10
    Added minor clarifications and corrected typographical and formatting errors.
July 2007, Revision 3.09
    Added the following instructions: EXTRQ, INSERTQ, MOVNTSD, and MOVNTSS.
    Added misaligned exception mask (MXCSR.MM) information.
    Added imm8 values with corresponding mnemonics to (V)CMPPD, (V)CMPPS, (V)CMPSD, and (V)CMPSS.
    Reworded CPUID information in condition tables.
    Added minor clarifications and corrected typographical and formatting errors.
September 2006, Revision 3.08
    Made minor corrections.
December 2005, Revision 3.07
    Made minor editorial and formatting changes.
January 2005, Revision 3.06
    Added documentation on SSE3 instructions. Corrected numerous minor factual errors and typos.
September 2003, Revision 3.05
    Made numerous small factual corrections.
April 2003, Revision 3.04
    Made minor corrections.

Preface

About This Book
This book is part of a multivolume work entitled the AMD64 Architecture Programmer’s Manual. The complete set includes the following volumes.

Title                                                          Order No.
Volume 1: Application Programming                              24592
Volume 2: System Programming                                   24593
Volume 3: General-Purpose and System Instructions              24594
Volume 4: 128-Bit and 256-Bit Media Instructions               26568
Volume 5: 64-Bit Media and x87 Floating-Point Instructions     26569

Audience
This volume is intended for programmers who develop application or system software.

Organization
Volumes 3, 4, and 5 describe the AMD64 instruction set in detail, providing mnemonic syntax, instruction encoding, functions, affected flags, and possible exceptions.
The AMD64 instruction set is divided into five subsets:
• General-purpose instructions
• System instructions
• Streaming SIMD Extensions (includes 128-bit and 256-bit media instructions)
• 64-bit media instructions (MMX™)
• x87 floating-point instructions
Several instructions belong to, and are described identically in, multiple instruction subsets.
This volume describes the Streaming SIMD Extensions (SSE) instruction set, which includes 128-bit and 256-bit media instructions. SSE includes both legacy and extended forms. The index at the end cross-references topics within this volume.
For other topics relating to the AMD64 architecture, and for information on instructions in other subsets, see the tables of contents and indexes of the other volumes.

Conventions and Definitions
The section which follows, Notational Conventions, describes notational conventions used in this volume. The next section, Definitions, lists a number of terms used in this volume along with their technical definitions. Some of these definitions assume knowledge of the legacy x86 architecture. See “Related Documents” on page xl for further information about the legacy x86 architecture. Finally, the Registers section lists the registers which are a part of the system programming model.

Notational Conventions
Section 1.1, “Syntax and Notation” on page 2 describes notation relating specifically to instruction encoding.

#GP(0)
    An instruction exception; in this example, a general-protection exception (#GP) with an error code of 0.
1011b
    A binary value; in this example, a 4-bit value.
F0EA_0B40h
    A hexadecimal value; in this example, a 32-bit value. Underscore characters may be used to improve readability.
128
    Numbers without an alpha suffix are decimal unless the context indicates otherwise.
7:4
    A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first. Commas may be inserted to indicate gaps.
CPUID FnXXXX_XXXX_RRR[FieldName]
    Support for optional features or the value of an implementation-specific parameter of a processor can be discovered by executing the CPUID instruction on that processor. To obtain this value, software must execute the CPUID instruction with the function code XXXX_XXXXh in EAX and then examine the field FieldName returned in register RRR. If the “_RRR” notation is followed by “_xYYY”, register ECX must be set to the value YYYh before executing CPUID.
    When FieldName is not given, the entire contents of register RRR contain the desired value. When determining optional feature support, if the bit identified by FieldName is set to a one, the feature is supported on that processor.
CR0–CR4
    A register range, from register CR0 through CR4, inclusive, with the low-order register first.
CR4[OSXSAVE], CR4.OSXSAVE
    The OSXSAVE bit of the CR4 register.
CR0[PE] = 1, CR0.PE = 1
    The PE bit of the CR0 register has a value of 1.
EFER[LME] = 0, EFER.LME = 0
    The LME field of the EFER register is cleared (contains a value of 0).
DS:rSI
    The content of a memory location whose segment address is in the DS register and whose offset relative to that segment is in the rSI register.
RFLAGS[13:12]
    A field within a register identified by its bit range; in this example, the IOPL field.

Definitions

128-bit media instruction
    Instructions that operate on the various 128-bit vector data types. Supported within both the legacy SSE and extended SSE instruction sets.
256-bit media instruction
    Instructions that operate on the various 256-bit vector data types. Supported within the extended SSE instruction set.
64-bit media instructions
    Instructions that operate on the 64-bit vector data types. These are primarily a combination of MMX and 3DNow!™ instruction sets and their extensions, with some additional instructions from the SSE1 and SSE2 instruction sets.
16-bit mode
    Legacy mode or compatibility mode in which a 16-bit address size is active. See legacy mode and compatibility mode.
32-bit mode
    Legacy mode or compatibility mode in which a 32-bit address size is active. See legacy mode and compatibility mode.
64-bit mode
    A submode of long mode. In 64-bit mode, the default address size is 64 bits and new features, such as register extensions, are supported for system and application software.
absolute
    A displacement that references the base of a code segment rather than an instruction pointer. See relative.
AES
    Advanced Encryption Standard (AES) algorithm acceleration instructions; part of Streaming SIMD Extensions (SSE).
ASID
    Address space identifier.
AVX
    Extension of the SSE instruction set supporting 256-bit vector (packed) operands. See Streaming SIMD Extensions.
biased exponent
    The sum of a floating-point value’s exponent and a constant bias for a particular floating-point data type. The bias makes the range of the biased exponent always positive, which allows reciprocation without overflow.
byte
    Eight bits.
clear, cleared
    To write the value 0 to a bit or a range of bits. See set.
compatibility mode
    A submode of long mode. In compatibility mode, the default address size is 32 bits, and legacy 16-bit and 32-bit applications run without modification.
commit
    To irreversibly write, in program order, an instruction’s result to software-visible storage, such as a register (including flags), the data cache, an internal write buffer, or memory.
CPL
    Current privilege level.
direct
    Referencing a memory address included in the instruction syntax as an immediate operand. The address may be an absolute or relative address. See indirect.
displacement
    A signed value that is added to the base of a segment (absolute addressing) or an instruction pointer (relative addressing). Same as offset.
doubleword
    Two words, or four bytes, or 32 bits.
double quadword
    Eight words, or 16 bytes, or 128 bits. Also called octword.
effective address size
    The address size for the current instruction after accounting for the default address size and any address-size override prefix.
effective operand size
    The operand size for the current instruction after accounting for the default operand size and any operand-size override prefix.
element
    See vector.
exception
    An abnormal condition that occurs as the result of instruction execution. Processor response to an exception depends on the type of exception. For all exceptions except SSE floating-point exceptions and x87 floating-point exceptions, control is transferred to a handler (or service routine) for that exception as defined by the exception’s vector. For floating-point exceptions defined by the IEEE 754 standard, there are both masked and unmasked responses. When unmasked, the exception handler is called, and when masked, a default response is provided instead of calling the handler.
extended SSE instructions
    Enhanced set of SIMD instructions supporting 256-bit vector data types and allowing the specification of up to four operands. A subset of the Streaming SIMD Extensions (SSE). Includes the AVX, FMA, FMA4, and XOP instructions. Compare legacy SSE.
flush
    An often ambiguous term meaning (1) writeback, if modified, and invalidate, as in “flush the cache line,” or (2) invalidate, as in “flush the pipeline,” or (3) change a value, as in “flush to zero.”
FMA4
    Fused Multiply Add, four operand. Part of the extended SSE instruction set.
FMA
    Fused Multiply Add. Part of the extended SSE instruction set.
GDT
    Global descriptor table.
GIF
    Global interrupt flag.
IDT
    Interrupt descriptor table.
IGN
    Ignored. Value written is ignored by hardware. Value returned on a read is indeterminate. See reserved.
indirect
    Referencing a memory location whose address is in a register or other memory location. The address may be an absolute or relative address. See direct.
IRB
    The virtual-8086 mode interrupt-redirection bitmap.
IST
    The long-mode interrupt-stack table.
IVT
    The real-address mode interrupt-vector table.
LDT
    Local descriptor table.
legacy x86
    The legacy x86 architecture.
legacy mode
    An operating mode of the AMD64 architecture in which existing 16-bit and 32-bit applications and operating systems run without modification.
    A processor implementation of the AMD64 architecture can run in either long mode or legacy mode. Legacy mode has three submodes: real mode, protected mode, and virtual-8086 mode.
legacy SSE instructions
    All Streaming SIMD Extensions instructions prior to AVX, XOP, and FMA4. Legacy SSE instructions primarily utilize operands held in XMM registers. The legacy SSE instructions include the original Streaming SIMD Extensions (SSE1) and the subsequent extensions SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, and SSE4.2. See Streaming SIMD Extensions.
long mode
    An operating mode unique to the AMD64 architecture. A processor implementation of the AMD64 architecture can run in either long mode or legacy mode. Long mode has two submodes: 64-bit mode and compatibility mode.
lsb
    Least-significant bit.
LSB
    Least-significant byte.
main memory
    Physical memory, such as RAM and ROM (but not cache memory), that is installed in a particular computer system.
mask
    (1) A control bit that prevents the occurrence of a floating-point exception from invoking an exception-handling routine. (2) A field of bits used for a control purpose.
MBZ
    Must be zero. If software attempts to set an MBZ bit to 1, a general-protection exception (#GP) occurs. See reserved.
memory
    Unless otherwise specified, main memory.
moffset
    A 16, 32, or 64-bit offset that specifies a memory operand directly, without using a ModRM or SIB byte.
msb
    Most-significant bit.
MSB
    Most-significant byte.
octword
    Same as double quadword.
offset
    Same as displacement.
overflow
    The condition in which a floating-point number is larger in magnitude than the largest, finite, positive or negative number that can be represented in the data-type format being used.
packed
    See vector.
PAE
    Physical-address extensions.
physical memory
    Actual memory, consisting of main memory and cache.
probe
    A check for an address in processor caches or internal buffers.
    External probes originate outside the processor, and internal probes originate within the processor.
protected mode
    A submode of legacy mode.
quadword
    Four words, eight bytes, or 64 bits.
RAZ
    Read as zero. Value returned on a read is always zero (0) regardless of what was previously written. See reserved.
real-address mode, real mode
    Real mode is a short name for real-address mode, a submode of legacy mode.
relative
    Referencing with a displacement (offset) from an instruction pointer rather than the base of a code segment. See absolute.
reserved
    Fields marked as reserved may be used at some future time. To preserve compatibility with future processors, reserved fields require special handling when read or written by software. Software must not depend on the state of a reserved field (unless qualified as RAZ), nor upon the ability of such fields to return a previously written state. If a field is marked reserved without qualification, software must not change the state of that field; it must reload that field with the same value returned from a prior read. Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ (see definitions).
REX
    A legacy instruction modifier prefix that specifies 64-bit operand size and provides access to additional registers.
RIP-relative addressing
    Addressing relative to the 64-bit relative instruction pointer.
SBZ
    Should be zero. An attempt by software to set an SBZ bit to 1 results in undefined behavior. See reserved.
scalar
    An atomic value existing independently of any specification of location, direction, etc., as opposed to vectors.
set
    To write the value 1 to a bit or a range of bits. See clear.
SIMD
    Single instruction, multiple data. See vector.
Streaming SIMD Extensions (SSE)
    Instructions that operate on scalar or vector (packed) integer and floating-point numbers. The SSE instruction set comprises the legacy SSE and extended SSE instruction sets.
SSE1
    Original SSE instruction set.
    Includes instructions that operate on vector operands in both the MMX and the XMM registers.
SSE2
    Extensions to the SSE instruction set.
SSE3
    Further extensions to the SSE instruction set.
SSSE3
    Further extensions to the SSE instruction set.
SSE4.1
    Further extensions to the SSE instruction set.
SSE4.2
    Further extensions to the SSE instruction set.
SSE4A
    A minor extension to the SSE instruction set adding the instructions EXTRQ, INSERTQ, MOVNTSS, and MOVNTSD.
sticky bit
    A bit that is set or cleared by hardware and that remains in that state until explicitly changed by software.
TSS
    Task-state segment.
underflow
    The condition in which a floating-point number is smaller in magnitude than the smallest nonzero, positive or negative number that can be represented in the data-type format being used.
vector
    (1) A set of integer or floating-point values, called elements, that are packed into a single operand. Most media instructions use vectors as operands. Also called packed or SIMD operands. (2) An interrupt descriptor table index, used to access exception handlers. See exception.
VEX prefix
    Extended instruction encoding escape prefix. Introduces a two- or three-byte encoding escape sequence used in the encoding of AVX instructions. Opens a new extended instruction encoding space. Fields select the opcode map and allow the specification of operand vector length and an additional operand register. See XOP prefix.
virtual-8086 mode
    A submode of legacy mode.
VMCB
    Virtual machine control block.
VMM
    Virtual machine monitor.
word
    Two bytes, or 16 bits.
x86
    See legacy x86.
XOP instructions
    Part of the extended SSE instruction set using the XOP prefix. See Streaming SIMD Extensions.
XOP prefix
    Extended instruction encoding escape prefix. Introduces a three-byte escape sequence used in the encoding of XOP instructions. Opens a new extended instruction encoding space distinct from the VEX opcode space.
    Fields select the opcode map and allow the specification of operand vector length and an additional operand register. See VEX prefix.

Registers
In the following list of registers, mnemonics refer either to the register itself or to the register content:

AH–DH
    The high 8-bit AH, BH, CH, and DH registers. See AL–DL.
AL–DL
    The low 8-bit AL, BL, CL, and DL registers. See AH–DH.
AL–r15B
    The low 8-bit AL, BL, CL, DL, SIL, DIL, BPL, SPL, and r8B–r15B registers, available in 64-bit mode.
BP
    Base pointer register.
CRn
    Control register number n.
CS
    Code segment register.
eAX–eSP
    The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers or the 32-bit EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP registers. See rAX–rSP.
EFER
    Extended features enable register.
eFLAGS
    16-bit or 32-bit flags register. See rFLAGS.
EFLAGS
    32-bit (extended) flags register.
eIP
    16-bit or 32-bit instruction-pointer register. See rIP.
EIP
    32-bit (extended) instruction-pointer register.
FLAGS
    16-bit flags register.
GDTR
    Global descriptor table register.
GPRs
    General-purpose registers. For the 16-bit data size, these are AX, BX, CX, DX, DI, SI, BP, and SP. For the 32-bit data size, these are EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP. For the 64-bit data size, these include RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, and R8–R15.
IDTR
    Interrupt descriptor table register.
IP
    16-bit instruction-pointer register.
LDTR
    Local descriptor table register.
MSR
    Model-specific register.
r8–r15
    The 8-bit R8B–R15B registers, or the 16-bit R8W–R15W registers, or the 32-bit R8D–R15D registers, or the 64-bit R8–R15 registers.
rAX–rSP
    The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers, or the 32-bit EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP registers, or the 64-bit RAX, RBX, RCX, RDX, RDI, RSI, RBP, and RSP registers. Replace the placeholder r with nothing for 16-bit size, “E” for 32-bit size, or “R” for 64-bit size.
RAX
    64-bit version of the EAX register.
RBP
    64-bit version of the EBP register.
RBX
    64-bit version of the EBX register.
RCX
    64-bit version of the ECX register.
RDI
    64-bit version of the EDI register.
RDX
    64-bit version of the EDX register.
rFLAGS
    16-bit, 32-bit, or 64-bit flags register. See RFLAGS.
RFLAGS
    64-bit flags register. See rFLAGS.
rIP
    16-bit, 32-bit, or 64-bit instruction-pointer register. See RIP.
RIP
    64-bit instruction-pointer register.
RSI
    64-bit version of the ESI register.
RSP
    64-bit version of the ESP register.
SP
    Stack pointer register.
SS
    Stack segment register.
TPR
    Task priority register (CR8).
TR
    Task register.
YMM/XMM
    Set of sixteen (eight accessible in legacy and compatibility modes) 256-bit wide registers that hold scalar and vector operands used by the SSE instructions.

Endian Order
The x86 and AMD64 architectures address memory using little-endian byte-ordering. Multibyte values are stored with the least-significant byte at the lowest byte address, and illustrated with their least significant byte at the right side. Strings are illustrated in reverse order, because the addresses of string bytes increase from right to left.

Related Documents
• Peter Abel, IBM PC Assembly Language and Programming, Prentice-Hall, Englewood Cliffs, NJ, 1995.
• Rakesh Agarwal, 80x86 Architecture & Programming: Volume II, Prentice-Hall, Englewood Cliffs, NJ, 1991.
• AMD, AMD-K6™ MMX™ Enhanced Processor Multimedia Technology, Sunnyvale, CA, 2000.
• AMD, 3DNow!™ Technology Manual, Sunnyvale, CA, 2000.
• AMD, AMD Extensions to the 3DNow!™ and MMX™ Instruction Sets, Sunnyvale, CA, 2000.
• Don Anderson and Tom Shanley, Pentium Processor System Architecture, Addison-Wesley, New York, 1995.
• Nabajyoti Barkakati and Randall Hyde, Microsoft Macro Assembler Bible, Sams, Carmel, Indiana, 1992.
• Barry B. Brey, 8086/8088, 80286, 80386, and 80486 Assembly Language Programming, Macmillan Publishing Co., New York, 1994.
• Barry B. Brey, Programming the 80286, 80386, 80486, and Pentium Based Personal Computer, Prentice-Hall, Englewood Cliffs, NJ, 1995.
• Ralf Brown and Jim Kyle, PC Interrupts, Addison-Wesley, New York, 1994.
• Penn Brumm and Don Brumm, 80386/80486 Assembly Language Programming, Windcrest McGraw-Hill, 1993.
• Geoff Chappell, DOS Internals, Addison-Wesley, New York, 1994.
• Chips and Technologies, Inc., Super386 DX Programmer’s Reference Manual, Chips and Technologies, Inc., San Jose, 1992.
• John Crawford and Patrick Gelsinger, Programming the 80386, Sybex, San Francisco, 1987.
• Cyrix Corporation, 5x86 Processor BIOS Writer's Guide, Cyrix Corporation, Richardson, TX, 1995.
• Cyrix Corporation, M1 Processor Data Book, Cyrix Corporation, Richardson, TX, 1996.
• Cyrix Corporation, MX Processor MMX Extension Opcode Table, Cyrix Corporation, Richardson, TX, 1996.
• Cyrix Corporation, MX Processor Data Book, Cyrix Corporation, Richardson, TX, 1997.
• Ray Duncan, Extending DOS: A Programmer's Guide to Protected-Mode DOS, Addison Wesley, NY, 1991.
• William B. Giles, Assembly Language Programming for the Intel 80xxx Family, Macmillan, New York, 1991.
• Frank van Gilluwe, The Undocumented PC, Addison-Wesley, New York, 1994.
• John L. Hennessy and David A. Patterson, Computer Architecture, Morgan Kaufmann Publishers, San Mateo, CA, 1996.
• Thom Hogan, The Programmer’s PC Sourcebook, Microsoft Press, Redmond, WA, 1991.
• Hal Katircioglu, Inside the 486, Pentium, and Pentium Pro, Peer-to-Peer Communications, Menlo Park, CA, 1997.
• IBM Corporation, 486SLC Microprocessor Data Sheet, IBM Corporation, Essex Junction, VT, 1993.
• IBM Corporation, 486SLC2 Microprocessor Data Sheet, IBM Corporation, Essex Junction, VT, 1993.
• IBM Corporation, 80486DX2 Processor Floating Point Instructions, IBM Corporation, Essex Junction, VT, 1995.
• IBM Corporation, 80486DX2 Processor BIOS Writer's Guide, IBM Corporation, Essex Junction, VT, 1995.
• IBM Corporation, Blue Lightning 486DX2 Data Book, IBM Corporation, Essex Junction, VT, 1994.
• Institute of Electrical and Electronics Engineers, IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985.
• Institute of Electrical and Electronics Engineers, IEEE Standard for Radix-Independent Floating-Point Arithmetic, ANSI/IEEE Std 854-1987.
• Muhammad Ali Mazidi and Janice Gillispie Mazidi, 80X86 IBM PC and Compatible Computers, Prentice-Hall, Englewood Cliffs, NJ, 1997.
• Hans-Peter Messmer, The Indispensable Pentium Book, Addison-Wesley, New York, 1995.
• Karen Miller, An Assembly Language Introduction to Computer Architecture: Using the Intel Pentium, Oxford University Press, New York, 1999.
• Stephen Morse, Eric Isaacson, and Douglas Albert, The 80386/387 Architecture, John Wiley & Sons, New York, 1987.
• NexGen Inc., Nx586 Processor Data Book, NexGen Inc., Milpitas, CA, 1993.
• NexGen Inc., Nx686 Processor Data Book, NexGen Inc., Milpitas, CA, 1994.
• Bipin Patwardhan, Introduction to the Streaming SIMD Extensions in the Pentium III, www.x86.org/articles/sse_pt1/simd1.htm, June, 2000.
• Peter Norton, Peter Aitken, and Richard Wilton, PC Programmer’s Bible, Microsoft Press, Redmond, WA, 1993.
• PharLap 386|ASM Reference Manual, Pharlap, Cambridge MA, 1993.
• PharLap TNT DOS-Extender Reference Manual, Pharlap, Cambridge MA, 1995.
• Sen-Cuo Ro and Sheau-Chuen Her, i386/i486 Advanced Programming, Van Nostrand Reinhold, New York, 1993.
• Jeffrey P. Royer, Introduction to Protected Mode Programming, course materials for an onsite class, 1992.
• Tom Shanley, Protected Mode System Architecture, Addison Wesley, NY, 1996.
• SGS-Thomson Corporation, 80486DX Processor SMM Programming Manual, SGS-Thomson Corporation, 1995.
• Walter A. Triebel, The 80386DX Microprocessor, Prentice-Hall, Englewood Cliffs, NJ, 1992.
• John Wharton, The Complete x86, MicroDesign Resources, Sebastopol, California, 1994.
• Web sites and newsgroups:
  - www.amd.com
  - news.comp.arch
  - news.comp.lang.asm.x86
  - news.intel.microprocessors
  - news.microsoft

1 Introduction
Processors capable of performing the same mathematical operation simultaneously on multiple data streams are classified as single-instruction, multiple-data (SIMD). Instructions that utilize this hardware capability are called SIMD instructions. Software can use SIMD instructions to substantially increase the performance of media applications, which typically employ algorithms that perform the same mathematical operation on a set of values in parallel.
The original SIMD instruction set was called MMX and operated on 64-bit wide vectors of integer and floating-point elements. Subsequently a new SIMD instruction set called the Streaming SIMD Extensions (SSE) was added to the architecture.
The SSE instruction set defines a new programming model with its own array of vector data registers (YMM/XMM registers) and a control and status register (MXCSR). Most SSE instructions pull their operands from one or more YMM/XMM registers and store results in a YMM/XMM register, although some instructions use a GPR as either a source or destination. Most instructions allow one operand to be loaded from memory. The set includes instructions to load a YMM/XMM register from memory (aligned or unaligned) and store the contents of a YMM/XMM register.
An overview of the SSE instruction set is provided in Volume 1, Chapter 4. This volume provides detailed descriptions of each instruction within the SSE instruction set. The SSE instruction set comprises the legacy SSE instructions and the extended SSE instructions.
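As a concrete illustration of the SIMD model introduced at the start of this chapter, the following C sketch models a 128-bit vector as four single-precision elements and applies one addition to every element. It is a scalar model for illustration only: the type `vec4f` and the function `packed_add` are hypothetical names, not part of the architecture, and real code would rely on a packed-add instruction such as ADDPS rather than a loop.

```c
#include <stddef.h>

/* Hypothetical model of a 128-bit vector: four 32-bit floating-point elements. */
typedef struct {
    float lane[4];
} vec4f;

/*
 * Element-wise addition. Result element i depends only on element i of each
 * source, which is what lets hardware compute all four sums at once.
 */
vec4f packed_add(vec4f a, vec4f b)
{
    vec4f r;
    for (size_t i = 0; i < 4; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}
```

Because no element of the result depends on a neighboring element, the loop body can be replaced by a single packed operation without changing the outcome.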
Legacy SSE instructions comprise the following subsets:
• The original Streaming SIMD Extensions (herein referred to as SSE1)
• SSE2
• SSE3
• SSSE3
• SSE4.1
• SSE4.2
• SSE4A
• Advanced Encryption Standard (AES)
Extended SSE instructions comprise the following subsets:
• AVX
• AVX2
• FMA
• FMA4
• XOP
Legacy SSE architecture supports operations involving 128-bit vectors and defines the base programming model including the SSE registers, the Media eXtension Control and Status Register (MXCSR), and the instruction exception behavior.
The Streaming SIMD Extensions (SSE) instruction set is extended to include the AVX, FMA, FMA4, and XOP instruction sets. The AVX instruction set provides an extended form for most legacy SSE instructions and several new instructions. Extensions include providing for the specification of a unique destination register for operations with two or more source operands and support for 256-bit wide vectors. Some AVX instructions also provide enhanced functionality compared to their legacy counterparts.
A significant feature of the extended SSE instruction set architecture is the doubling of the width of the XMM registers. These registers are referred to as the YMM registers. The XMM registers overlay the lower octword (128 bits) of the YMM registers. Registers YMM/XMM0–7 are accessible in legacy and compatibility mode. Registers YMM/XMM8–15 are available in 64-bit mode (a submode of long mode). VEX/XOP instruction prefixes allow instruction encodings to address the additional registers.
The SSE instructions can be used in processor legacy mode or long (64-bit) mode. CPUID Fn8000_0001_EDX[LM] indicates the availability of long mode.
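The CPUID notation can be read as a recipe: load the function number into EAX, execute CPUID, and test the named bit of the named register. The sketch below does this for Fn8000_0001_EDX[LM]. It assumes a GCC- or Clang-compatible compiler that provides `<cpuid.h>` and its `__get_cpuid` helper, and it assumes that LM is bit 29 of EDX, which should be confirmed against Volume 3. The names `edx_has_long_mode` and `cpu_has_long_mode` are illustrative only.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang helper header (assumed available) */
#endif

#define CPUID_FN8000_0001 0x80000001u   /* extended function number for EAX */
#define EDX_LM_BIT        29u           /* assumed position of the LM flag  */

/* Pure decode step: returns 1 if the LM bit is set in an EDX value, else 0. */
int edx_has_long_mode(uint32_t edx)
{
    return (int)((edx >> EDX_LM_BIT) & 1u);
}

/*
 * Queries the running processor.
 * Returns 1 or 0 for the LM bit, or -1 if the function cannot be queried
 * (unsupported function number, or a build target that is not x86).
 */
int cpu_has_long_mode(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(CPUID_FN8000_0001, &eax, &ebx, &ecx, &edx)) {
        return -1;
    }
    return edx_has_long_mode(edx);
#else
    return -1;
#endif
}
```

Keeping the bit test separate from the CPUID query makes the decode step testable with fixed register values on any host.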
Compilation for execution in 64-bit mode offers the following advantages:
• Access to an additional eight YMM/XMM registers for a total of 16
• Access to an additional eight 64-bit general-purpose registers for a total of 16
• Access to the 64-bit virtual address space and the RIP-relative addressing mode

Hardware support for each of the subsets of SSE instructions listed above is indicated by CPUID feature flags. Refer to Volume 3, Appendix D, “Instruction Subsets and CPUID Feature Flags,” for a complete list of instruction-related feature flags. The CPUID feature flags that pertain to each instruction are also given in the instruction descriptions below. For information on using the CPUID instruction, see the instruction description in Volume 3.

Chapter 2, “Instruction Reference” contains detailed descriptions of each instruction, organized in alphabetic order by mnemonic. For those legacy SSE instructions that have an AVX form, the extended form of the instruction is described together with the legacy instruction in one entry. For these instructions, the instruction reference page is located based on the instruction mnemonic of the legacy SSE and not the extended (AVX) form. Those AVX instructions without a legacy form are listed in order by their AVX mnemonic. The mnemonics of all extended SSE instructions, including the FMA and XOP instructions, begin with the letter V.

1.1 Syntax and Notation

The descriptive synopsis of opcode syntax for legacy SSE instructions follows the conventions described in Volume 3: General Purpose and System Instructions. See Chapter 2 and the section entitled “Notation.”

For general information on the programming model and overview descriptions of the SSE instruction set, see:
• “Streaming SIMD Extensions Media and Scientific Programming” in Volume 1.
• “Instruction Encoding” in Volume 3.
• “Summary of Registers and Data Types” in Volume 3.
The syntax of the extended instruction sets requires an expanded synopsis. The expanded synopsis includes a mnemonic summary and a summary of prefix sequence fields. Figure 1-1 shows the descriptive synopsis of a typical XOP instruction. The synopsis of a VEX-encoded instruction has the same format, differing only in the instruction encoding escape prefix, that is, VEX instead of XOP.

Mnemonic                                Encoding
                                        XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPCMOV ymm1, ymm2, ymm3/mem256, ymm4    8F    RXB.08           0.src.1.00    A2 /r ib

Element                                 Meaning
VPCMOV ymm1, ymm2, ymm3/mem256, ymm4    assembly language representation
8F                                      encoding escape prefix
RXB                                     3-bit field representing R, X, B bit values
08                                      5-bit map_select field
0                                       W bit
src                                     vvvv field
1                                       L bit
00                                      pp field
A2                                      opcode
/r                                      register/memory type specifier
ib                                      immediate operand

Figure 1-1. Typical Descriptive Synopsis - Extended SSE Instructions

1.2 Extended Instruction Encoding

The legacy SSE instructions are encoded using the legacy encoding syntax and the extended instructions are encoded using an enhanced encoding syntax which is compatible with the legacy syntax. Both are described in detail in Chapter 1 of Volume 3.

As described in Volume 3, the extended instruction encoding syntax utilizes multi-byte escape sequences both to select alternate opcode maps and to augment the encoding of the instruction. Multi-byte escape sequences are introduced by one of the two VEX prefixes or the XOP prefix.

The AVX and AVX2 instructions utilize either the two-byte (introduced by the VEX C5h prefix) or the three-byte (introduced by the VEX C4h prefix) encoding escape sequence. XOP instructions are encoded using a three-byte encoding escape sequence introduced by the XOP prefix (except for the XOP instructions VPERMIL2PD and VPERMIL2PS which are encoded using the VEX prefix). The XOP prefix is 8Fh. The three-byte encoding escape sequences utilize the map_select field of the second byte to select the opcode map used to interpret the opcode byte.
The two-byte VEX prefix sequence implicitly selects the secondary (“two-byte”) opcode map.

1.2.1 Immediate Byte Usage Unique to the SSE instructions

An immediate is a value, typically an operand, explicitly provided within the instruction encoding. Depending on the opcode and the operating mode, the size of an immediate operand can be 1, 2, 4, or 8 bytes. Legacy and extended media instructions typically use an immediate byte operand (imm8).

A one-byte immediate is generally shown in the instruction synopsis as an “ib” suffix. For extended SSE instructions with four source operands, the suffix “is4” is used to indicate the presence of the immediate byte used to select the fourth source operand.

The VPERMIL2PD and VPERMIL2PS instructions utilize a fifth 2-bit operand which is encoded along with the fourth register select index in an immediate byte. For this special case the immediate byte is shown in the instruction synopsis as “is5”.

1.2.2 Instruction Format Examples

The following sections provide examples of two-, three-, and four-operand extended instructions. These instructions generally perform nondestructive-source operations, meaning that the result of the operation is written to a separately specified destination register rather than overwriting one of the source operands. This preserves the contents of the source registers. Most legacy SSE instructions perform destructive-source operations, in which a single register is both source and destination, so source content is lost.

1.2.2.1 XMM Register Destinations

The following general properties apply to YMM/XMM register destination operands.
• For legacy instructions that use XMM registers as a destination: When a result is written to a destination XMM register, bits [255:128] of the corresponding YMM register are not affected.
• For extended instructions that use XMM registers as a destination: When a result is written to a destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

1.2.2.2 Two Operand Instructions

Two-operand instructions use ModRM-based operand assignment. For most instructions, the first operand is the destination, selected by the ModRM.reg field, and the second operand is either a register or a memory source, selected by the ModRM.r/m field.

VCVTDQ2PD is an example of a two-operand AVX instruction.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTDQ2PD xmm1, xmm2/mem64      C4    RXB.01           0.1111.0.10   E6 /r
VCVTDQ2PD ymm1, xmm2/mem128     C4    RXB.01           0.1111.1.10   E6 /r

The destination register is selected by ModRM.reg. The size of the destination register is determined by VEX.L. The source is either a YMM/XMM register or a memory location specified by ModRM.r/m. Because this instruction converts packed doubleword integers to double-precision floating-point values, the source data size is smaller than the destination data size. VEX.vvvv is not used and must be set to 1111b.

1.2.2.3 Three-Operand Instructions

These extended instructions have two source operands and a destination operand.

VPROTB is an example of a three-operand XOP instruction. There are versions of the instruction for variable-count rotation and for fixed-count rotation.

VPROTB dest, src, variable-count
VPROTB dest, src, fixed-count

Mnemonic                          Encoding
                                  XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPROTB xmm1, xmm2/mem128, xmm3    8F    RXB.09           0.src.0.00    90 /r
VPROTB xmm1, xmm2, xmm3/mem128    8F    RXB.09           1.src.0.00    90 /r
VPROTB xmm1, xmm2/mem128, imm8    8F    RXB.08           0.1111.0.00   90 /r ib

For both versions of the instruction, the destination (dest) operand is an XMM register specified by ModRM.reg.

The variable-count version of the instruction rotates each byte of the source as specified by the corresponding byte element variable-count.
Selection of src and variable-count is controlled by XOP.W.
• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by ModRM.r/m, and variable-count is an XMM register specified by XOP.vvvv.
• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an XMM register or a 128-bit memory location specified by ModRM.r/m.

Table 1-1 summarizes the effect of the XOP.W bit on operand selection.

Table 1-1. Three-Operand Selection

XOP.W   dest        src         variable-count
0       ModRM.reg   ModRM.r/m   XOP.vvvv
1       ModRM.reg   XOP.vvvv    ModRM.r/m

The fixed-count version of the instruction rotates each byte of src as specified by the immediate byte operand fixed-count. For this version, src is either an XMM register or a 128-bit memory location specified by ModRM.r/m. Because XOP.vvvv is not used to specify the source register, it must be set to 1111b or execution of the instruction will cause an Invalid Opcode (#UD) exception.

1.2.2.4 Four-Operand Instructions

Some extended instructions have three source operands and a destination operand. This is accomplished by using the VEX/XOP.vvvv field, the ModRM.reg and ModRM.r/m fields, and bits [7:4] of an immediate byte to select the operands. The opcode suffix “is4” is used to identify the immediate byte, and the selected operands are shown in the synopsis.

VFMSUBPD is an example of a four-operand FMA4 instruction.

VFMSUBPD dest, src1, src2, src3
dest = src1 * src2 - src3

Mnemonic                                  Encoding
                                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VFMSUBPD xmm1, xmm2, xmm3/mem128, xmm4    C4    RXB.03           0.src.0.01    6D /r is4
VFMSUBPD ymm1, ymm2, ymm3/mem256, ymm4    C4    RXB.03           0.src.1.01    6D /r is4
VFMSUBPD xmm1, xmm2, xmm3, xmm4/mem128    C4    RXB.03           1.src.0.01    6D /r is4
VFMSUBPD ymm1, ymm2, ymm3, ymm4/mem256    C4    RXB.03           1.src.1.01    6D /r is4

The first operand, the destination (dest), is an XMM register or a YMM register (as determined by VEX.L) selected by ModRM.reg.
The following three operands (src1, src2, src3) are sources. The src1 operand is an XMM or YMM register specified by VEX.vvvv.

VEX.W determines the configuration of the src2 and src3 operands.
• When VEX.W = 0, src2 is either a register or a memory location specified by ModRM.r/m, and src3 is a register specified by bits [7:4] of the immediate byte.
• When VEX.W = 1, src2 is a register specified by bits [7:4] of the immediate byte and src3 is either a register or a memory location specified by ModRM.r/m.

Table 1-2 summarizes the effect of the VEX.W bit on operand selection.

Table 1-2. Four-Operand Selection

VEX.W   dest        src1       src2        src3
0       ModRM.reg   VEX.vvvv   ModRM.r/m   is4[7:4]
1       ModRM.reg   VEX.vvvv   is4[7:4]    ModRM.r/m

1.3 VSIB Addressing

Specific AVX2 instructions utilize a vectorized form of indexed register-indirect addressing called vector SIB (VSIB) addressing. In contrast to the standard indexed register-indirect address mode, which generates a single effective address to access a single memory operand, VSIB addressing generates an array of effective addresses which is used to access data from multiple memory locations in a single operation.

VSIB addressing is encoded using three or six bytes following the opcode byte, augmented by the X and B bits from the VEX prefix. The first byte is the ModRM byte with the standard mod, reg, and r/m fields (although allowed values for the mod and r/m fields are restricted). The second is the VSIB byte which replaces the SIB byte in the encoding. The VSIB byte specifies a GPR which serves as a base address register and an XMM/YMM register that contains a packed array of index values. The two-bit scale field specifies a common scaling factor to be applied to all of the index values. A constant displacement value is encoded in the one or four bytes that follow the VSIB byte.

Figure 1-2 shows the format of the VSIB byte.
Bits     7:6    5:3      2:0
VSIB     SS     index    base

VEX.X extends the index field to 4 bits. VEX.B extends the base field to 4 bits.

Figure 1-2. VSIB Byte Format

VSIB.SS (Bits [7:6]). The SS field is used to specify the scale factor to be used in the computation of each of the effective addresses. The scale factor scale is equal to 2^SS (two raised to the power of the value of the SS field). Therefore, if SS = 00b, scale = 1; if SS = 01b, scale = 2; if SS = 10b, scale = 4; and if SS = 11b, scale = 8.

VSIB.index (Bits [5:3]). This field is concatenated with the complement of the VEX.X bit ({X, index}) to specify the YMM/XMM register that contains the packed array of index values index[i] to be used in the computation of the array of effective addresses effective address[i].

VSIB.base (Bits [2:0]). This field is concatenated with the complement of the VEX.B bit ({B, base}) to specify the general-purpose register (base GPR) that contains the base address base to be used in the computation of each of the effective addresses.

1.3.1 Effective Address Array Computation

Each element i of the effective address array is computed using the formula:

effective address[i] = scale * index[i] + base + displacement

where index[i] is the ith element of the XMM/YMM register specified by {X, VSIB.index}. An index element is either 32 or 64 bits wide and is treated as a signed integer.

Variants of this mode use either an eight-bit or a 32-bit displacement value. One variant sets the base to zero. The value of the ModRM.mod field specifies the specific variant of VSIB addressing mode, as shown in Table 1. In the table, the notation [XMMn/YMMn] indicates the XMM/YMM register that contains the packed index array and [base GPR] means the contents of the base GPR selected by {B, base}.
Table 1: Vectorized Addressing Modes

Index(1)   ModRM.mod = 00                    ModRM.mod = 01                                 ModRM.mod = 10
0000       scale * [XMM0/YMM0] + Disp32      scale * [XMM0/YMM0] + Disp8 + [base GPR]       scale * [XMM0/YMM0] + Disp32 + [base GPR]
0001       scale * [XMM1/YMM1] + Disp32      scale * [XMM1/YMM1] + Disp8 + [base GPR]       scale * [XMM1/YMM1] + Disp32 + [base GPR]
0010       scale * [XMM2/YMM2] + Disp32      scale * [XMM2/YMM2] + Disp8 + [base GPR]       scale * [XMM2/YMM2] + Disp32 + [base GPR]
0011       scale * [XMM3/YMM3] + Disp32      scale * [XMM3/YMM3] + Disp8 + [base GPR]       scale * [XMM3/YMM3] + Disp32 + [base GPR]
0100       scale * [XMM4/YMM4] + Disp32      scale * [XMM4/YMM4] + Disp8 + [base GPR]       scale * [XMM4/YMM4] + Disp32 + [base GPR]
0101       scale * [XMM5/YMM5] + Disp32      scale * [XMM5/YMM5] + Disp8 + [base GPR]       scale * [XMM5/YMM5] + Disp32 + [base GPR]
0110       scale * [XMM6/YMM6] + Disp32      scale * [XMM6/YMM6] + Disp8 + [base GPR]       scale * [XMM6/YMM6] + Disp32 + [base GPR]
0111       scale * [XMM7/YMM7] + Disp32      scale * [XMM7/YMM7] + Disp8 + [base GPR]       scale * [XMM7/YMM7] + Disp32 + [base GPR]
1000       scale * [XMM8/YMM8] + Disp32      scale * [XMM8/YMM8] + Disp8 + [base GPR]       scale * [XMM8/YMM8] + Disp32 + [base GPR]
1001       scale * [XMM9/YMM9] + Disp32      scale * [XMM9/YMM9] + Disp8 + [base GPR]       scale * [XMM9/YMM9] + Disp32 + [base GPR]
1010       scale * [XMM10/YMM10] + Disp32    scale * [XMM10/YMM10] + Disp8 + [base GPR]     scale * [XMM10/YMM10] + Disp32 + [base GPR]
1011       scale * [XMM11/YMM11] + Disp32    scale * [XMM11/YMM11] + Disp8 + [base GPR]     scale * [XMM11/YMM11] + Disp32 + [base GPR]
1100       scale * [XMM12/YMM12] + Disp32    scale * [XMM12/YMM12] + Disp8 + [base GPR]     scale * [XMM12/YMM12] + Disp32 + [base GPR]
1101       scale * [XMM13/YMM13] + Disp32    scale * [XMM13/YMM13] + Disp8 + [base GPR]     scale * [XMM13/YMM13] + Disp32 + [base GPR]
1110       scale * [XMM14/YMM14] + Disp32    scale * [XMM14/YMM14] + Disp8 + [base GPR]     scale * [XMM14/YMM14] + Disp32 + [base GPR]
1111       scale * [XMM15/YMM15] + Disp32    scale * [XMM15/YMM15] + Disp8 + [base GPR]     scale * [XMM15/YMM15] + Disp32 + [base GPR]

Note 1.
Index = {VEX.X, VSIB.index}. In 32-bit mode, VEX.X = 1.

1.3.2 Notational Conventions Related to VSIB Addressing Mode

In the instruction descriptions that follow, the notation vm32x indicates a packed array of four 32-bit index values contained in the specified XMM index register and vm32y indicates a packed array of eight 32-bit index values contained in the specified YMM index register. Depending on the instruction, these indices can be used to compute the effective address of up to four (vm32x) or eight (vm32y) memory-based operands.

The notation vm64x indicates a packed array of two 64-bit index values contained in the specified XMM index register and vm64y indicates a packed array of four 64-bit index values contained in the specified YMM index register. Depending on the instruction, these indices can be used to compute the effective address of up to two (vm64x) or four (vm64y) memory-based operands.

In the body of the description of the instructions, the notation mem32[vm32x] is used to represent a sparse array of 32-bit memory operands where the packed array of four 32-bit indices used to calculate the effective addresses of the operands is held in an XMM register. The notation mem32[vm32y] refers to a similar array of 32-bit memory operands where the packed array of eight 32-bit indices is held in a YMM register. The notation mem32[vm64x] means a sparse array of 32-bit memory operands where the packed array of two 64-bit indices is held in an XMM register and mem32[vm64y] means a sparse array of 32-bit memory operands where the packed array of four 64-bit indices is held in a YMM register.

The notation mem64[index_array], where index_array is either vm32x, vm64x, or vm64y, specifies a sparse array of 64-bit memory operands addressed via a packed array of 32-bit or 64-bit indices held in an XMM/YMM register.
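The VSIB byte fields and the effective-address formula of Section 1.3.1 can be sketched in Python as follows. This is a minimal model, not a description of any implementation: the function names are hypothetical, the VEX.X and VEX.B arguments are the bit values as they appear in the prefix (their complement supplies the fourth register-select bit), and truncating each result to the address width is an assumption.

```python
def decode_vsib(vsib_byte: int, vex_x: int = 1, vex_b: int = 1):
    """Split a VSIB byte into (scale, index register number, base register number)."""
    ss = (vsib_byte >> 6) & 0b11
    index = (vsib_byte >> 3) & 0b111
    base = vsib_byte & 0b111
    index_reg = (((~vex_x) & 1) << 3) | index  # {X, index}
    base_reg = (((~vex_b) & 1) << 3) | base    # {B, base}
    return 1 << ss, index_reg, base_reg        # scale = 2^SS


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def effective_addresses(scale, indices, index_bits, base=0, displacement=0,
                        address_bits=64):
    """effective address[i] = scale * index[i] + base + displacement."""
    mask = (1 << address_bits) - 1  # assumed wrap to the address width
    return [(scale * sign_extend(i, index_bits) + base + displacement) & mask
            for i in indices]
```

With a vm32x-style index array `[0, 1, 0xFFFFFFFF]`, a scale of 4, a base of 1000h, and a displacement of 8, the sketch produces the addresses 1008h, 100Ch, and 1004h; the last index is treated as the signed value -1.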
If an instruction uses either an XMM or a YMM register, depending on operand size, to hold the index array, the notation vm32x/y or vm64x/y is used to represent the array.

In summary, given a maximum operand size of 256 bits, a sparse array of 32-bit memory-based operands can be addressed using a vm32x, vm32y, vm64x, or vm64y index array. A sparse array of 64-bit memory-based operands can be addressed using a vm32x, vm64x, or vm64y index array. Specific instructions may use fewer than the maximum number of memory operands that can be addressed using the specified index array.

VSIB addressing is only valid in 32-bit or 64-bit effective addressing mode and is only supported for instruction encodings using the VEX prefix. The ModRM.mod value of 11b is not valid in VSIB addressing mode and ModRM.r/m must be set to 100b.

1.3.3 Memory Ordering and Exception Behavior

VSIB addressing has some special considerations relative to memory ordering and the signaling of exceptions.

VSIB addressing specifies an array of addresses that allows an instruction to access multiple memory locations. The order in which data is read from or written to memory is not specified. Memory ordering with respect to other instructions follows the memory-ordering model described in Volume 2.

Data may be accessed by the instruction in any order, but access-triggered exceptions are delivered in right-to-left order. That is, if an exception is triggered by the load or store of an element of an XMM/YMM register and delivered, all elements to the right of that element (all the lower indexed elements) have been or will be completed without causing an exception. Elements to the left of the element causing the exception may or may not be completed. If the load or store of a given element triggers multiple exceptions, they are delivered in the conventional order.
Because data can be accessed in any order, elements to the left of the one that triggered the exception may be read or written before the exception is delivered. Although the ordering of accesses is not specified, it is repeatable in a specific processor implementation. Given the same input values and initial architectural state, the same set of elements to the left of the faulting one will be accessed.

VSIB addressing should not be used to access memory mapped I/O as the ordering of the individual loads is implementation-specific and some implementations may access data larger than the data element size or access elements more than once.

1.4 Enabling SSE Instruction Execution

Application software that utilizes the SSE instructions requires support from operating system software.

To enable and support SSE instruction execution, operating system software must:
• enable hardware for supported SSE subsets
• manage the SSE hardware architectural state, saving and restoring it as required during and after task switches
• provide exception handlers for all unmasked SSE exceptions.

See Volume 2, Chapter 11, for details on enabling SSE execution and managing its execution state.

1.5 String Compare Instructions

The legacy SSE instructions PCMPESTRI, PCMPESTRM, PCMPISTRI, and PCMPISTRM and the extended SSE instructions VPCMPESTRI, VPCMPESTRM, VPCMPISTRI, and VPCMPISTRM provide a versatile means of classifying characters of a string by performing one of several different types of comparison operations using a second string as a prototype.

This section describes the operation of the legacy string compare instructions. This discussion applies equally to the extended versions of the instructions. Any difference between the legacy and the extended version of a given instruction is described in the instruction reference entry for the instruction in the following chapter.
A character string is a vector of data elements that is normally used to represent an ordered arrangement of graphemes which may be stored, processed, displayed, or printed. Ordered strings of graphemes are most often used to convey information in a human-readable manner. The string compare instructions, however, do not restrict the use or interpretation of their operands.

The first source operand provides the prototype string and the second operand is the string to be scanned and characterized (referred to herein as the string under test, or SUT). Four string formats and four types of comparisons are supported. The intermediate result of this processing is a bit vector that summarizes the characterization of each character in the SUT. This bit vector is then post-processed based on options specified in the instruction encoding. Instruction variants determine the final result: either an index or a mask.

Instruction execution affects the arithmetic status flags (ZF, CF, SF, OF, AF, PF), but the significance of many of the flags is redefined to provide information tailored to the result of the comparison performed. See Section 1.5.6, “Affect on Flags.”

The instructions have a defined base function and additional functionality controlled by bit fields in an immediate byte operand (imm8). The base function determines whether the source strings have implicitly (PCMPISTRI and PCMPISTRM) or explicitly (PCMPESTRI and PCMPESTRM) defined lengths, and whether the result is an index (PCMPISTRI and PCMPESTRI) or a mask (PCMPISTRM and PCMPESTRM).

PCMPISTRI and PCMPESTRI return their final result (an integer value) via the ECX register, while PCMPISTRM and PCMPESTRM write a bit or character mask, depending on the option selected, to the XMM0 register.

There are a number of different schemes for encoding a set of graphemes, but the most common ones use either an 8-bit code (ASCII) or a 16-bit code (Unicode).
The string compare instructions support both character sizes.

Bit fields of the immediate operand control the following functions:
• Source data format: character size (byte or word), signed or unsigned values
• Comparison type
• Intermediate result postprocessing
• Output option selection

This overview description covers functions common to all of the string compare instructions and describes some of the differentiated features of specific instructions. Information on instruction encoding and exception behavior is covered in the individual instruction reference pages in the following chapter.

1.5.1 Source Data Format

The character strings that constitute the source operands for the string compare instructions are formatted as either 8-bit or 16-bit integer values packed into a 128-bit data type. The figure below illustrates how a string of byte-wide characters is laid out in memory and how these characters are arranged when loaded into an XMM register.

Memory Image: 128-bit String of Byte-wide Characters in Memory (ASCII Encoded)

Address   Character   Code
112h      [null]      00h     (highest address)
111h      .           2Eh
110h      g           67h
10Fh      n           6Eh
10Eh      i           69h
10Dh      r           72h
10Ch      t           74h
10Bh      s           73h
10Ah      [blank]     20h
109h      t           74h
108h      r           72h
107h      o           6Fh
106h      h           68h
105h      s           73h
104h      [blank]     20h
103h      A           41h     (lowest address; defines address of string)

XMM Register Image

Bits      Byte index: character (code), most-significant byte first
127:64    15: [null] (00h)   14: . (2Eh)   13: g (67h)   12: n (6Eh)   11: i (69h)   10: r (72h)   9: t (74h)   8: s (73h)
63:0      7: [blank] (20h)   6: t (74h)   5: r (72h)   4: o (6Fh)   3: h (68h)   2: s (73h)   1: [blank] (20h)   0: A (41h)

Figure 1-3. Byte-wide Character String – Memory and Register Image

Note from the figure that the longest string that can be packed in a 128-bit data object is either sixteen 8-bit characters (as illustrated) or eight 16-bit characters.
When loaded from memory, the character read from the lowest address in memory is placed in the least-significant position of the register and the character read from the highest address is placed in the most-significant position. In other words, for character i of width w, bits [w−1:0] of the character are placed in bits [iw + (w−1):iw] of the register.

Bits [1:0] of the immediate byte operand specify the source string data format, as shown in Table 1-3.

Table 1-3. Source Data Format

Imm8[1:0]   Character Format   Maximum String Length
00b         unsigned bytes     16
01b         unsigned words     8
10b         signed bytes       16
11b         signed words       8

The string compare instructions are defined with the capability of operating on strings of lengths from 0 to the maximum that can be packed into the 128-bit data type as shown in the table above. Because strings being processed may be shorter than the maximum string length, a means is provided to designate the length of each string. As mentioned above, one pair of string compare instructions relies on an explicit method while the other utilizes an implicit method.

For the explicit method, the length of the first operand (the prototype string) is specified by the absolute value of the signed integer contained in rAX and the length of the second operand (the SUT) is specified by the absolute value of the signed integer contained in rDX. If a specified length is greater than the maximum allowed, the maximum value is used. Using the explicit method of length specification, null characters (characters whose numerical value is 0) can be included within a string.

Using the implicit method, a string shorter than the maximum length is terminated by a null character. If no null character is found in the string, its length is implied to be the maximum. For the example illustrated in Figure 1-3 above, the implicit length of the string is 15 because the final character is null.
However, using the explicit method, a specified length of 16 would include the null character in the string.

In the following discussion, l1 is the length of the first operand string (the prototype string), l2 is the length of the second operand string (the SUT) and m is the maximum string length based on the selected character size.

1.5.2 Comparison Type

Although the string compare instructions can be implemented in many different ways, the instructions are most easily understood as the sequential processing of the SUT using the characters of the prototype string as a template. The template is applied at each character index of the SUT, processing the string from the first character (index 0) to the last character (index l2−1).

The result of each comparison is recorded in successive positions of a summary bit vector CmprSumm. When the sequence of comparisons is complete, this bit vector summarizes the results of the comparison operations that were performed. The length of the CmprSumm bit vector is equal to the maximum input operand string length (m). The rules for the setting of CmprSumm bits beyond the end of the SUT (CmprSumm[m−1:l2]) are dependent on the comparison type (see Table 1-4 below).

Bits [3:2] of the immediate byte operand determine the comparison type, as shown in Table 1-4.

Table 1-4. Comparison Type

Imm8[3:2]   Comparison Type   Description
00b         Subset            Tests each character of the SUT to determine if it is within the subset of characters specified by the prototype string. Each set bit of CmprSumm indicates that the corresponding character of the SUT is within the subset specified by the prototype. Bits [m−1:l2] are cleared.
01b         Ranges            Tests each character of the SUT to determine if it lies within one or more ranges specified by pairs of values within the prototype string. The ranges are inclusive. Each set bit in CmprSumm indicates that the corresponding character of the SUT is within one or more of the inclusive ranges specified. Bits [m−1:l2] are cleared. If the length of the prototype is odd, the last value in the prototype is effectively ignored.
10b         Match             Performs a character-by-character comparison between the SUT and the prototype string. Each set bit of CmprSumm indicates that the corresponding characters in the two strings match. If not, the bit is cleared. Bits [m−1:max(l1, l2)] of CmprSumm are set.
11b         Sub-string        Searches for an exact match between the prototype string and an ordered sequence of characters (a sub-string) in the SUT beginning at the current index i. Bit i of CmprSumm is set for each value of i where the sub-string match is made, otherwise the bit is cleared. See discussion below.

In the Sub-string comparison type, any matching sub-string of the SUT must match the prototype string one-for-one, in order, and without gaps. Null characters in the SUT do not match non-null characters in the prototype. If the prototype and the SUT are equal in length and less than the maximum length, the two strings must be identical for the comparison to be TRUE. In this case, bit 0 of CmprSumm is set to one and the remainder are all 0s. If the length of the SUT is less than the prototype string, no match is possible and CmprSumm is all 0s.

If the prototype string is shorter than the SUT (l1 < l2), a sequential search of the SUT is performed. For each i from 0 to l2−l1, the prototype is compared to characters [i + l1−1:i] of the SUT. If the prototype and the sub-string SUT[i + l1−1:i] match exactly, then CmprSumm[i] is set, otherwise the bit is cleared. When the comparison at i = l2−l1 is complete, no further testing is required because there are not enough characters remaining in the SUT for a match to be possible. The remaining bits l2−l1+1 through m−1 are all set to 0.
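The length rules of Section 1.5.1 and the four comparison types of Table 1-4 can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the processor's algorithm: all function names are hypothetical, characters are treated as unsigned code points (the signed formats are not modeled), and edge cases that the text does not spell out, such as an empty prototype in a Sub-string search, are handled only well enough to avoid an index error.

```python
def implicit_length(chars, m=16):
    """Implicit method: length is the index of the first null, else the maximum m."""
    for i, c in enumerate(chars[:m]):
        if c == "\0":
            return i
    return m


def explicit_length(reg_value, m=16):
    """Explicit method: absolute value of the signed register, capped at m."""
    return min(abs(reg_value), m)


def compare_strings(proto, sut, cmp_type, m=16):
    """Return CmprSumm as a list of m bits; element 0 is the least-significant bit."""
    l1, l2 = min(len(proto), m), min(len(sut), m)
    p = [ord(c) for c in proto[:l1]]
    s = [ord(c) for c in sut[:l2]]
    result = [0] * m
    if cmp_type == "subset":
        for j in range(l2):
            result[j] = int(s[j] in p)
    elif cmp_type == "ranges":
        for j in range(l2):  # prototype pairs (p[i], p[i+1]) are inclusive ranges
            result[j] = int(any(p[i] <= s[j] <= p[i + 1]
                                for i in range(0, l1 - 1, 2)))
    elif cmp_type == "match":
        for i in range(m):
            if i < min(l1, l2):
                result[i] = int(p[i] == s[i])
            elif i >= max(l1, l2):
                result[i] = 1  # beyond the end of both strings
    elif cmp_type == "substring":
        for j in range(min(l2 - l1 + 1, m)):  # empty range when l2 < l1
            result[j] = int(s[j:j + l1] == p)
    else:
        raise ValueError("unknown comparison type")
    return result


def summary_string(bits):
    """Render CmprSumm with the least-significant bit on the left."""
    return "".join(str(b) for b in bits)
```

Rendering the result with the least-significant bit on the left keeps it aligned with the SUT as printed: `summary_string(compare_strings("ZCx", "aCx%xbZreCx", "subset"))` gives `0110101001100000`.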
For the Match comparison type, the character-by-character comparison is performed on all m characters in the 128-bit operand data, which may extend beyond the end of one or both strings. A null character at index i within one string is not considered a match when compared with a character beyond the end of the other string. In this case, CmprSumm[i] is cleared. For index positions beyond the end of both strings, CmprSumm[i] is set.

The following section provides more detail on the generation of the comparison summary bit vector based on the specified comparison type.

1.5.3 Comparison Summary Bit Vector

The following pseudo code provides more detail on the generation of the comparison summary bit vector CmprSumm. The function CompareStrgs defined below returns a bit vector of length m, the maximum length of the operand data strings.

bit vector CompareStrgs(ProtoType, length1, SUT, length2, CmpType, signed, m)
   doubleword vector StrUndTst   // temp vector; holds string under test
   doubleword vector StrProto    // temp vector; holds prototype string
   bit vector[m] Result          // length of vector is m

   StrProto = m{0}               // initialize m elements of StrProto to 0
   StrUndTst = m{0}              // initialize m elements of StrUndTst to 0
   Result = m{0}                 // initialize result bit vector

   FOR i = 0 to length1 - 1
      StrProto[i] = signed ? SignExtend(ProtoType[i]) : ZeroExtend(ProtoType[i])
   FOR i = 0 to length2 - 1
      StrUndTst[i] = signed ? SignExtend(SUT[i]) : ZeroExtend(SUT[i])

   IF CmpType == Subset
      FOR j = 0 to length2 - 1            // j indexes SUT
         FOR i = 0 to length1 - 1         // i indexes prototype
            Result[j] |= (StrProto[i] == StrUndTst[j])

   IF CmpType == Ranges
      FOR j = 0 to length2 - 1            // j indexes SUT
         FOR i = 0 to length1 - 2, BY 2   // i indexes prototype
            Result[j] |= (StrProto[i] <= StrUndTst[j]) && (StrProto[i+1] >= StrUndTst[j])

   IF CmpType == Match
      FOR i = 0 to (min(length1, length2) - 1)
         Result[i] = (StrProto[i] == StrUndTst[i])
      FOR i = min(length1, length2) to (max(length1, length2) - 1)
         Result[i] = 0
      FOR i = max(length1, length2) to (m - 1)
         Result[i] = 1

   IF CmpType == Sub-string
      IF (length2 == 16) && (length1 == 16)
         maxlength = 15
      ELSE
         maxlength = length2 - length1
      IF length2 >= length1
         FOR j = 0 to maxlength           // j indexes result bit vector
            Result[j] = 1
            k = j                         // k scans the SUT
            FOR i = 0 to length1 - 1      // i scans the Prototype
               Result[j] &= (StrProto[i] == StrUndTst[k])   // Result[j] is cleared if any of the comparisons do not match
               k++

   Return Result

Given the above definition of CompareStrgs(), the following pseudo code computes the value of CmprSumm:

   ProtoType = contents of first source operand (xmm1)
   SUT = contents of xmm2 or 128-bit value read from the specified memory location
   length1 = length of first operand string    // specified implicitly or explicitly
   length2 = length of second operand string   // specified implicitly or explicitly
   m = Maximum String Length from Table 1-3 above
   CmpType = Comparison Type from Table 1-4 above
   signed = (imm8[1] == 1) ? TRUE : FALSE
   bit vector [m] CmprSumm                     // CmprSumm is m bits long
   CmprSumm = CompareStrgs(ProtoType, length1, SUT, length2, CmpType, signed, m)

The following examples demonstrate the comparison summary bit vector CmprSumm for each comparison type. For the sake of illustration, the operand strings are represented as ASCII-encoded strings. Each character value is represented by its ASCII grapheme.
Strings are displayed with the lowest indexed character on the left as they would appear when printed or displayed. CmprSumm is shown in reverse order with the least significant bit on the left to agree with the string presentation.

Comparison Type = Subset
Prototype: ZCx
SUT:       aCx%xbZreCx
CmprSumm:  0110101001100000

Comparison Type = Ranges
Prototype: ACax
SUT:       aCx%xbZreCx
CmprSumm:  1110110111100000

Comparison Type = Match
Prototype: ZCx
SUT:       aCx%xbZreCx
CmprSumm:  0110000000011111

Comparison Type = Sub-string
Prototype: ZCx
SUT:       aZCx%xCZreZCxCZ
CmprSumm:  0100000000100000

1.5.4 Intermediate Result Post-processing

Post-processing of the CmprSumm bit vector is controlled by imm8[5:4]. The result of this step is designated pCmprSumm. Bit [4] of the immediate operand determines whether a ones’ complement (bit-wise inversion) is performed on CmprSumm; bit [5] of the immediate operand determines whether the inversion applies to the entire comparison summary bit vector (CmprSumm) or just to those bits that correspond to characters within the SUT. See Table 1-5 below for the encoding of the imm8[5:4] field.

Table 1-5. Post-processing Options
Imm8[5:4]   Post-processing Applied
x0b         pCmprSumm = CmprSumm
01b         pCmprSumm = NOT CmprSumm
11b         pCmprSumm[i] = !CmprSumm[i] for i < l2; pCmprSumm[i] = CmprSumm[i] for l2 ≤ i < m

1.5.5 Output Option Selection

For PCMPESTRI and PCMPISTRI, imm8[6] determines whether the index of the lowest set bit or the highest set bit of pCmprSumm is written to ECX, as shown in Table 1-6.

Table 1-6. Indexed Output Option Selection
Imm8[6]   Description
0b        Return the index of the least significant set bit in pCmprSumm.
1b        Return the index of the most significant set bit in pCmprSumm.

For PCMPESTRM and PCMPISTRM, imm8[6] specifies whether the output from the instruction is a bit mask or an expanded mask. The bit mask is a copy of pCmprSumm zero-extended to 128 bits.
The expanded mask is a packed vector of byte or word elements, as determined by the string operand format (as indicated by imm8[0]). The expanded mask is generated by copying each bit of pCmprSumm to all bits of the element of the same index. Table 1-7 below shows the encoding of imm8[6].

Table 1-7. Masked Output Option Selection
Imm8[6]   Description
0b        Return pCmprSumm as the output with zero extension to 128 bits.
1b        Return expanded pCmprSumm byte or word mask.

The PCMPESTRM and PCMPISTRM instructions return their output in register XMM0. For the extended forms of the instructions, bits [255:128] of YMM0 are cleared.

1.5.6 Effect on Flags

The execution of a string compare instruction updates the state of the CF, PF, AF, ZF, SF, and OF flags within the rFLAGS register. All other flags are unaffected. The PF and AF flags are always cleared. The ZF and SF flags are set or cleared based on attributes of the source strings and the CF and OF flags are set or cleared based on attributes of the summary bit vector after post-processing.

The CF flag is cleared if the summary bit vector, after post-processing, is zero; the flag is set if one or more of the bits in the post-processed bit vector are 1. The OF flag is updated to match the value of the least significant bit of the post-processed summary bit vector.

The ZF flag is set if the length of the second string operand (SUT) is shorter than m, the maximum number of 8-bit or 16-bit characters that can be packed into 128 bits. Similarly, the SF flag is set if the length of the first string operand (prototype) is shorter than m.

This information is summarized in Table 1-8 below.

Table 1-8. State of Affected Flags After Execution
Flag   Determined by               Value
PF     Unconditional               0
AF     Unconditional               0
SF     Source string length        (l1 < m)
ZF     Source string length        (l2 < m)
CF     Post-processed bit vector   pCmprSumm ≠ 0
OF     Post-processed bit vector   pCmprSumm[0]
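The post-processing, output selection, and flag rules of Sections 1.5.4 through 1.5.6 can be modeled as follows. This is a hedged sketch rather than a definitive implementation: the function names are hypothetical, CmprSumm is passed as a list of m bits with index 0 first, imm8[0] = 1 is assumed to select word characters, and the index returned when no bit is set is left as `None` because this section does not state it.

```python
def post_process(cmpr_summ, l2, imm8):
    """Return pCmprSumm for a CmprSumm given as a list of m bits."""
    invert = (imm8 >> 4) & 1      # imm8[4]: 1 = invert
    sut_only = (imm8 >> 5) & 1    # imm8[5]: 1 = invert only bits with i < l2
    if not invert:
        return list(cmpr_summ)    # x0b: unchanged
    return [b ^ 1 if (not sut_only or i < l2) else b
            for i, b in enumerate(cmpr_summ)]

def index_output(p, imm8):
    """PCMPESTRI/PCMPISTRI: index of the lowest or highest set bit."""
    set_bits = [i for i, b in enumerate(p) if b]
    if not set_bits:
        return None  # behavior with no set bit is not covered in this section
    return set_bits[-1] if (imm8 >> 6) & 1 else set_bits[0]

def mask_output(p, imm8):
    """PCMPESTRM/PCMPISTRM: 128-bit result as a Python integer."""
    if (imm8 >> 6) & 1:
        elem_bits = 16 if imm8 & 1 else 8     # assumed: imm8[0] = 1 means words
        ones = (1 << elem_bits) - 1
        # Expanded mask: replicate each bit across its whole element.
        return sum((ones if b else 0) << (i * elem_bits)
                   for i, b in enumerate(p))
    # Bit mask: pCmprSumm zero-extended to 128 bits.
    return sum(b << i for i, b in enumerate(p))

def flags(p, l1, l2, m):
    """Flag state after execution, following Table 1-8."""
    return {"PF": 0, "AF": 0,
            "SF": int(l1 < m), "ZF": int(l2 < m),
            "CF": int(any(p)), "OF": p[0]}
```

For example, applying `post_process` with imm8[5:4] = 11b to the Match vector 0110000000011111 (l2 = 11) inverts only bits 0 through 10 and gives 1001111111111111, whereas 01b inverts all sixteen bits and gives 1001111111100000.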
2 Instruction Reference

Instructions are listed by mnemonic, in alphabetic order. Each entry describes instruction function, syntax, opcodes, affected flags and exceptions related to the instruction. Figure 2-1 shows the conventions used in the descriptions. Items that do not pertain to a particular instruction, such as a synopsis of the 256-bit form, may be omitted.

INST VINST    Instruction Mnemonic Expansion
Brief functional description
INST          Description of legacy version of instruction.
VINST         Description of extended version of instruction.
XMM Encoding  Description of 128-bit extended instruction.
YMM Encoding  Description of 256-bit extended instruction.
Information about CPUID functions related to the instruction set.
Synopsis diagrams for legacy and extended versions of the instruction:

Mnemonic                        Opcode     Description
INST xmm1, xmm2/mem128          FF FF /r   Brief summary of legacy operation.

Mnemonic                        Encoding
                                VEX   RXB.mmmmm   W.vvvv.L.pp   Opcode
VINST xmm1, xmm2/mem128, xmm3   C4    RXB.11      0.src.0.00    FF /r
VINST ymm1, ymm2/mem256, ymm3   C4    RXB.11      0.src.0.00    FF /r

Related Instructions   Instructions that perform similar or related functions.
rFLAGS Affected        Rflags diagram.
MXCSR Flags Affected   MXCSR diagram.
Exceptions             Exception summary table.

Figure 2-1. Typical Instruction Description

Instruction Exceptions

Under various conditions instructions described below can cause exceptions. The conditions that cause these exceptions can differ based on processor mode and instruction subset. This information is summarized at the end of each instruction reference page in an Exception Table. Rows list the applicable exceptions and the different conditions that trigger each exception for the instruction. For each processor mode (real, virtual, and protected) a symbol in the table indicates whether this exception condition applies.
Each AVX instruction has a legacy form that comes from one of the legacy (SSE1, SSE2, ...) subsets. An “X” at the intersection of a processor mode column and an exception cause row indicates that the causing condition and potential exception applies to both the AVX instruction and the legacy SSE instruction. “A” indicates that the causing condition applies only to the AVX instruction and “S” indicates that the condition applies to the SSE legacy instruction. Note that XOP and FMA4 instructions do not have corresponding instructions from the SSE legacy subsets. In the exception tables for these instructions, “X” represents the XOP instruction and “F” represents the FMA4 instruction. 22 Instruction Reference 26568—Rev. 3.22—May 2018 ADDPD VADDPD AMD64 Technology Add Packed Double-Precision Floating-Point Adds each packed double-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the result of each addition into the corresponding quadword of the destination. There are legacy and extended forms of the instruction: ADDPD Adds two pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VADDPD The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Adds two pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding Adds four pairs of values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. 
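The difference between the three ADDPD/VADDPD forms lies mainly in how many element pairs are added and what happens to bits [255:128] of the destination. The sketch below models a YMM register as a list of four Python floats (index 0 holds bits [63:0]) and an XMM operand as a list of two. It is an assumption-laden illustration: the function names are hypothetical, Python's float addition stands in for double-precision addition with round-to-nearest, and MXCSR state and exceptions are not modeled.

```python
def addpd(dest_ymm, src2_xmm):
    """Legacy ADDPD: dest is also the first source; bits [255:128] are kept."""
    return [dest_ymm[0] + src2_xmm[0],
            dest_ymm[1] + src2_xmm[1],
            dest_ymm[2],               # upper half of the YMM register
            dest_ymm[3]]               # is not affected

def vaddpd_xmm(src1_xmm, src2_xmm):
    """VADDPD, XMM encoding: two sums, bits [255:128] of the destination cleared."""
    return [src1_xmm[0] + src2_xmm[0],
            src1_xmm[1] + src2_xmm[1],
            0.0,
            0.0]

def vaddpd_ymm(src1_ymm, src2_ymm):
    """VADDPD, YMM encoding: four pairs of values are added."""
    return [a + b for a, b in zip(src1_ymm, src2_ymm)]

print(addpd([1.0, 2.0, 3.0, 4.0], [0.5, 0.25]))
print(vaddpd_xmm([1.0, 2.0], [0.5, 0.25]))
```

The first call leaves the upper two elements (3.0 and 4.0) in place, while the second returns zeros there, mirroring the "not affected" versus "cleared" wording of the legacy and extended 128-bit forms.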
Instruction Support Form Subset ADDPD SSE2 VADDPD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic ADDPD xmm1, xmm2/mem128 Opcode 66 0F 58 /r Description Adds two packed double-precision floating-point values in xmm1 to corresponding values in xmm2 or mem128. Writes results to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VADDPD xmm1, xmm2, xmm3/mem128 C4 RXB.00001 X.src.0.01 58 /r VADDPD ymm1, ymm2, ymm3/mem256 C4 RXB.00001 X.src.1.01 58 /r Instruction Reference ADDPD, VADDPD 23 AMD64 Technology 26568—Rev. 3.22—May 2018 Related Instructions (V)ADDPS, (V)ADDSD, (V)ADDSS rFLAGS Affected None MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. 
Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 24 S S S S S S S S S S S S X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. ADDPD, VADDPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology ADDPS VADDPS Add Packed Single-Precision Floating-Point Adds each packed single-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the result of each addition into the corresponding elements of the destination. There are legacy and extended forms of the instruction: ADDPS Adds four pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VADDPS The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Adds four pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
YMM Encoding
Adds eight pairs of values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form     Subset   Feature Flag
ADDPS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VADDPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                  Opcode     Description
ADDPS xmm1, xmm2/mem128   0F 58 /r   Adds four packed single-precision floating-point values in xmm1 to corresponding values in xmm2 or mem128. Writes results to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VADDPS xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.00    58 /r
VADDPS ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.00    58 /r

Related Instructions
(V)ADDPD, (V)ADDSD, (V)ADDSS

rFLAGS Affected
None

MXCSR Flags Affected
Flag   MM    FZ    RC      PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
                                                                     M     M     M           M     M
Bit    17    15    14:13   12    11    10    9     8     7     6     5     4     3     2     1     0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 26 S S S S S S S S S S S S X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. ADDPS, VADDPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology ADDSD VADDSD Add Scalar Double-Precision Floating-Point Adds the double-precision floating-point value in the low-order quadword of the first source operand to the corresponding value in the low-order quadword of the second source operand and writes the result into the low-order quadword of the destination. There are legacy and extended forms of the instruction: ADDSD The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination register. Bits [127:64] of the destination and bits [255:128] of the corresponding YMM register are not affected. VADDSD The extended form of the instruction has a 128-bit encoding only. 
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the first source operand are copied to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Instruction Support Form Subset ADDSD SSE2 VADDSD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic ADDSD xmm1, xmm2/mem64 Opcode F2 0F 58 /r Description Adds low-order double-precision floating-point values in xmm1 to corresponding values in xmm2 or mem64. Writes results to xmm1. Mnemonic VADDSD xmm1, xmm2, xmm3/mem64 Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.src.X.11 58 /r Related Instructions (V)ADDPD, (V)ADDPS, (V)ADDSS rFLAGS Affected None Instruction Reference ADDSD, VADDSD 27 AMD64 Technology 26568—Rev. 3.22—May 2018 MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 28 X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. ADDSD, VADDSD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology ADDSS VADDSS Add Scalar Single-Precision Floating-Point Adds the single-precision floating-point value in the low-order doubleword of the first source operand to the corresponding value in the low-order doubleword of the second source operand and writes the result into the low-order doubleword of the destination. There are legacy and extended forms of the instruction: ADDSS The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the destination register and bits [255:128] of the corresponding YMM register are not affected. VADDSS The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. 
The destination is a third XMM register. Bits [127:32] of the first source register are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form     Subset   Feature Flag
ADDSS    SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VADDSS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                 Opcode        Description
ADDSS xmm1, xmm2/mem32   F3 0F 58 /r   Adds a single-precision floating-point value in the low-order doubleword of xmm1 to a corresponding value in xmm2 or mem32. Writes results to xmm1.

Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VADDSS xmm1, xmm2, xmm3/mem32   C4    RXB.00001        X.src.X.10    58 /r

Related Instructions
(V)ADDPD, (V)ADDPS, (V)ADDSD

rFLAGS Affected
None

MXCSR Flags Affected
Flag   MM    FZ    RC      PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
                                                                     M     M     M           M     M
Bit    17    15    14:13   12    11    10    9     8     7     6     5     4     3     2     1     0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format.

ADDSUBPD VADDSUBPD    Alternating Addition and Subtraction Packed Double-Precision Floating-Point

Adds the odd-numbered packed double-precision floating-point values of the first source operand to the corresponding values of the second source operand and writes the sum to the corresponding odd-numbered element of the destination; subtracts the even-numbered packed double-precision floating-point values of the second source operand from the corresponding values of the first source operand and writes the differences to the corresponding even-numbered element of the destination.

There are legacy and extended forms of the instruction:

ADDSUBPD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VADDSUBPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form        Subset   Feature Flag
ADDSUBPD    SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VADDSUBPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                     Opcode        Description
ADDSUBPD xmm1, xmm2/mem128   66 0F D0 /r   Adds the value in the upper 64 bits of xmm2 or mem128 to the corresponding value in xmm1 and writes the result to the upper 64 bits of xmm1; subtracts the value in the lower 64 bits of xmm2 or mem128 from the corresponding value in xmm1 and writes the result to the lower 64 bits of xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VADDSUBPD xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.01    D0 /r
VADDSUBPD ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.01    D0 /r

Related Instructions
(V)ADDSUBPS

rFLAGS Affected
None

MXCSR Flags Affected
Flag   MM    FZ    RC      PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
                                                                     M     M     M           M     M
Bit    17    15    14:13   12    11    10    9     8     7     6     5     4     3     2     1     0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
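As a small illustration of the alternating operation described for (V)ADDSUBPD, the following sketch treats each operand as a list of double-precision elements with index 0 in bits [63:0]. The function name `addsubpd` is hypothetical, and Python's float arithmetic is assumed to stand in for the hardware's double-precision operations under round-to-nearest; MXCSR flags and exceptions are not modeled.

```python
def addsubpd(src1, src2):
    """Even-numbered elements: src1 - src2. Odd-numbered elements: src1 + src2."""
    return [a + b if i % 2 else a - b
            for i, (a, b) in enumerate(zip(src1, src2))]

# 128-bit form: two elements per operand.
print(addsubpd([10.0, 20.0], [1.0, 2.0]))

# 256-bit form: four elements per operand.
print(addsubpd([10.0, 20.0, 30.0, 40.0], [1.0, 2.0, 3.0, 4.0]))
```

In both calls the second source is subtracted from the first in element 0 (and element 2 for the 256-bit form), while the odd-numbered elements hold sums.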
Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 32 S S S S S S S S S S S S X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. ADDSUBPD, VADDSUBPD Instruction Reference 26568—Rev. 
ADDSUBPS VADDSUBPS    Alternating Addition and Subtraction Packed Single-Precision Floating Point

Adds the second and fourth single-precision floating-point values of the first source operand to the corresponding values of the second source operand and writes the sums to the second and fourth elements of the destination. Subtracts the first and third single-precision floating-point values of the second source operand from the corresponding values of the first source operand and writes the differences to the first and third elements of the destination.

There are legacy and extended forms of the instruction:

ADDSUBPS
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VADDSUBPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form        Subset   Feature Flag
ADDSUBPS    SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VADDSUBPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                     Opcode
ADDSUBPS xmm1, xmm2/mem128   F2 0F D0 /r
Description: Adds the second and fourth packed single-precision values in xmm2 or mem128 to the corresponding values in xmm1 and writes results to the corresponding positions of xmm1.
Subtracts the first and third packed single-precision values in xmm2 or mem128 from the corresponding values in xmm1 and writes results to the corresponding positions of xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VADDSUBPS xmm1, xmm2, xmm3/mem128 C4 RXB.00001 X.src.0.11 D0 /r VADDSUBPS ymm1, ymm2, ymm3/mem256 C4 RXB.00001 X.src.1.11 D0 /r Instruction Reference ADDSUBPS, VADDSUBPS 33 AMD64 Technology 26568—Rev. 3.22—May 2018 Related Instructions (V)ADDSUBPD rFLAGS Affected None MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. 
SIMD floating-point, #XF
  S S X  Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE
  S S X  A source operand was an SNaN value.
  S S X  Undefined operation.
Denormalized operand, DE
  S S X  A source operand was a denormal value.
Overflow, OE
  S S X  Rounded result too large to fit into the format of the destination operand.
Underflow, UE
  S S X  Rounded result too small to fit into the format of the destination operand.
Precision, PE
  S S X  A result could not be represented exactly in the destination format.
X = AVX and SSE exception, A = AVX exception, S = SSE exception.

AESDEC VAESDEC
AES Decryption Round

Performs a single round of AES decryption. Transforms a state value specified by the first source operand using a round key value specified by the second source operand, and writes the result to the destination.

See Appendix A on page 973 for more information about the operation of the AES instructions.

Decryption consists of 1, …, Nr – 1 iterations of sequences of operations called rounds, terminated by a unique final round, Nr. The AESDEC and VAESDEC instructions perform all the rounds except the last; the AESDECLAST and VAESDECLAST instructions perform the final round.

The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4 matrix of bytes. The transformed state is written to the destination in column-major order. For both instructions, the destination register is the same as the first source register.

There are legacy and extended forms of the instruction:

AESDEC
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VAESDEC
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form | Subset | Feature Flag
AESDEC | AES | CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESDEC | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic | Opcode | Description
AESDEC xmm1, xmm2/mem128 | 66 0F 38 DE /r | Performs one decryption round on a state value in xmm1 using the key value in xmm2 or mem128. Writes results to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VAESDEC xmm1, xmm2, xmm3/mem128 | C4 | RXB.00010 | X.src.0.01 | DE /r

Related Instructions
(V)AESENC, (V)AESENCLAST, (V)AESIMC, (V)AESKEYGENASSIST

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Mode columns: Real, Virt, Prot. X = AVX and SSE exception, A = AVX exception, S = SSE exception, - = not applicable.
Invalid opcode, #UD
  X X X  Instruction not supported, as indicated by CPUID feature identifier.
  A A -  AVX instructions are only recognized in protected mode.
  S S S  CR0.EM = 1.
  S S S  CR4.OSFXSR = 0.
  - - A  CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
  - - A  XFEATURE_ENABLED_MASK[2:1] != 11b.
  - - A  REX, F2, F3, or 66 prefix preceding VEX prefix.
  S S X  Lock prefix (F0h) preceding opcode.
Device not available, #NM
  S S X  CR0.TS = 1.
Stack, #SS
  S S X  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
  S S X  Memory address exceeding data segment limit or non-canonical.
  S S S  Memory operand not 16-byte aligned and MXCSR.MM = 0.
  - - X  Null data segment used to reference memory.
Alignment check, #AC
  S S S  Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
  - - A  Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
  - S X  Instruction execution caused a page fault.

AESDECLAST VAESDECLAST
AES Last Decryption Round

Performs the final round of AES decryption. Completes transformation of a state value specified by the first source operand using a round key value specified by the second source operand, and writes the result to the destination.

See Appendix A on page 973 for more information about the operation of the AES instructions.

Decryption consists of 1, …, Nr – 1 iterations of sequences of operations called rounds, terminated by a unique final round, Nr. The AESDEC and VAESDEC instructions perform all the rounds before the final round; the AESDECLAST and VAESDECLAST instructions perform the final round.

The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4 matrix of bytes. The transformed state is written to the destination in column-major order. For both instructions, the destination register is the same as the first source register.

There are legacy and extended forms of the instruction:

AESDECLAST
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VAESDECLAST
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
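The division of work between the round instructions described above can be sketched as follows. This is a minimal Python model of the sequencing only, not of the AES transformations themselves. The callables `aesdec` and `aesdeclast` are hypothetical stand-ins for (V)AESDEC and (V)AESDECLAST, `round_keys` is assumed to be a decryption key schedule already in the order in which it is consumed, and the initial XOR with the first key is an assumption taken from the usual AES flow rather than from this section (see Appendix A).

```python
def decrypt_block(state, round_keys, aesdec, aesdeclast):
    """Sequence Nr rounds of AES decryption over a 128-bit state.

    state      -- 128-bit block as a Python int
    round_keys -- Nr + 1 round keys (assumed already ordered for decryption)
    aesdec     -- hypothetical stand-in for (V)AESDEC: f(state, key) -> state
    aesdeclast -- hypothetical stand-in for (V)AESDECLAST: f(state, key) -> state
    """
    nr = len(round_keys) - 1
    if nr < 1:
        raise ValueError("need at least two round keys")
    # Assumed initial key addition before round 1 (not described in this section).
    state ^= round_keys[0]
    # Rounds 1 .. Nr - 1 use the ordinary round instruction.
    for key in round_keys[1:nr]:
        state = aesdec(state, key)
    # The unique final round, Nr, uses the "last" instruction.
    return aesdeclast(state, round_keys[nr])
```

With 11 round keys (Nr = 10, as for a 128-bit cipher key) the loop calls `aesdec` nine times and `aesdeclast` once.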
Instruction Support
Form | Subset | Feature Flag
AESDECLAST | AES | CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESDECLAST | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic | Opcode | Description
AESDECLAST xmm1, xmm2/mem128 | 66 0F 38 DF /r | Performs the last decryption round on a state value in xmm1 using the key value in xmm2 or mem128. Writes results to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VAESDECLAST xmm1, xmm2, xmm3/mem128 | C4 | RXB.00010 | X.src.0.01 | DF /r

Related Instructions
(V)AESENC, (V)AESENCLAST, (V)AESIMC, (V)AESKEYGENASSIST

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Mode columns: Real, Virt, Prot. X = AVX and SSE exception, A = AVX exception, S = SSE exception, - = not applicable.
Invalid opcode, #UD
  X X X  Instruction not supported, as indicated by CPUID feature identifier.
  A A -  AVX instructions are only recognized in protected mode.
  S S S  CR0.EM = 1.
  S S S  CR4.OSFXSR = 0.
  - - A  CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
  - - A  XFEATURE_ENABLED_MASK[2:1] != 11b.
  - - A  REX, F2, F3, or 66 prefix preceding VEX prefix.
  S S X  Lock prefix (F0h) preceding opcode.
Device not available, #NM
  S S X  CR0.TS = 1.
Stack, #SS
  S S X  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
  S S X  Memory address exceeding data segment limit or non-canonical.
  S S S  Memory operand not 16-byte aligned and MXCSR.MM = 0.
  - - X  Null data segment used to reference memory.
Alignment check, #AC
  S S S  Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
  - - A  Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
  - S X  Instruction execution caused a page fault.
AESENC VAESENC
AES Encryption Round

Performs a single round of AES encryption. Transforms a state value specified by the first source operand using a round key value specified by the second source operand, and writes the result to the destination.

See Appendix A on page 973 for more information about the operation of the AES instructions.

Encryption consists of 1, …, Nr – 1 iterations of sequences of operations called rounds, terminated by a unique final round, Nr. The AESENC and VAESENC instructions perform all the rounds before the final round; the AESENCLAST and VAESENCLAST instructions perform the final round.

The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4 matrix of bytes. The transformed state is written to the destination in column-major order. For both instructions, the destination register is the same as the first source register.

There are legacy and extended forms of the instruction:

AESENC
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VAESENC
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form | Subset | Feature Flag
AESENC | AES | CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESENC | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
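As a small illustration of the feature flags listed above, the sketch below decodes the AES and AVX bits from a value of ECX as returned by CPUID Fn0000_0001. Obtaining that value is outside the scope of the sketch (Python cannot execute CPUID directly), so `ecx` is simply passed in. The function name is hypothetical.

```python
AES_BIT = 25  # CPUID Fn0000_0001_ECX[AES]
AVX_BIT = 28  # CPUID Fn0000_0001_ECX[AVX]

def decode_feature_flags(ecx):
    """Return the AES and AVX feature flags found in a CPUID Fn0000_0001 ECX value."""
    return {
        "AES": bool((ecx >> AES_BIT) & 1),
        "AVX": bool((ecx >> AVX_BIT) & 1),
    }
```

A set flag reports processor support only. The Exceptions table for each instruction lists the control-register settings, such as CR4.OSXSAVE, that must also be in place before the extended form can execute without raising #UD.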
Instruction Encoding
Mnemonic | Opcode | Description
AESENC xmm1, xmm2/mem128 | 66 0F 38 DC /r | Performs one encryption round on a state value in xmm1 using the key value in xmm2 or mem128. Writes results to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VAESENC xmm1, xmm2, xmm3/mem128 | C4 | RXB.00010 | X.src.0.01 | DC /r

Related Instructions
(V)AESDEC, (V)AESDECLAST, (V)AESIMC, (V)AESKEYGENASSIST

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Mode columns: Real, Virt, Prot. X = AVX and SSE exception, A = AVX exception, S = SSE exception, - = not applicable.
Invalid opcode, #UD
  X X X  Instruction not supported, as indicated by CPUID feature identifier.
  A A -  AVX instructions are only recognized in protected mode.
  S S S  CR0.EM = 1.
  S S S  CR4.OSFXSR = 0.
  - - A  CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
  - - A  XFEATURE_ENABLED_MASK[2:1] != 11b.
  - - A  REX, F2, F3, or 66 prefix preceding VEX prefix.
  S S X  Lock prefix (F0h) preceding opcode.
Device not available, #NM
  S S X  CR0.TS = 1.
Stack, #SS
  S S X  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
  S S X  Memory address exceeding data segment limit or non-canonical.
  S S S  Memory operand not 16-byte aligned and MXCSR.MM = 0.
  - - X  Null data segment used to reference memory.
Alignment check, #AC
  S S S  Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
  - - A  Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
  - S X  Instruction execution caused a page fault.

AESENCLAST VAESENCLAST
AES Last Encryption Round

Performs the final round of AES encryption. Completes transformation of a state value specified by the first source operand using a round key value specified by the second source operand, and writes the result to the destination.
See Appendix A on page 973 for more information about the operation of the AES instructions.

Encryption consists of 1, …, Nr – 1 iterations of sequences of operations called rounds, terminated by a unique final round, Nr. The AESENC and VAESENC instructions perform all the rounds before the final round; the AESENCLAST and VAESENCLAST instructions perform the final round.

The 128-bit state and round key vectors are interpreted as 16-byte column-major entries in a 4-by-4 matrix of bytes. The transformed state is written to the destination in column-major order. For both instructions, the destination register is the same as the first source register.

There are legacy and extended forms of the instruction:

AESENCLAST
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VAESENCLAST
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form | Subset | Feature Flag
AESENCLAST | AES | CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESENCLAST | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic | Opcode | Description
AESENCLAST xmm1, xmm2/mem128 | 66 0F 38 DD /r | Performs the last encryption round on a state value in xmm1 using the key value in xmm2 or mem128. Writes results to xmm1.
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VAESENCLAST xmm1, xmm2, xmm3/mem128 | C4 | RXB.00010 | X.src.0.01 | DD /r

Related Instructions
(V)AESDEC, (V)AESDECLAST, (V)AESIMC, (V)AESKEYGENASSIST

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Mode columns: Real, Virt, Prot. X = AVX and SSE exception, A = AVX exception, S = SSE exception, - = not applicable.
Invalid opcode, #UD
  X X X  Instruction not supported, as indicated by CPUID feature identifier.
  A A -  AVX instructions are only recognized in protected mode.
  S S S  CR0.EM = 1.
  S S S  CR4.OSFXSR = 0.
  - - A  CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
  - - A  XFEATURE_ENABLED_MASK[2:1] != 11b.
  - - A  REX, F2, F3, or 66 prefix preceding VEX prefix.
  S S X  Lock prefix (F0h) preceding opcode.
Device not available, #NM
  S S X  CR0.TS = 1.
Stack, #SS
  S S X  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
  S S X  Memory address exceeding data segment limit or non-canonical.
  S S S  Memory operand not 16-byte aligned and MXCSR.MM = 0.
  - - X  Null data segment used to reference memory.
Alignment check, #AC
  S S S  Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
  - - A  Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
  - S X  Instruction execution caused a page fault.

AESIMC VAESIMC
AES InvMixColumn Transformation

Applies the AES InvMixColumns() transformation to expanded round keys in preparation for decryption. Transforms an expanded key specified by the second source operand and writes the result to a destination register.

See Appendix A on page 973 for more information about the operation of the AES instructions.
The 128-bit round key vector is interpreted as 16-byte column-major entries in a 4-by-4 matrix of bytes. The transformed result is written to the destination in column-major order. AESIMC and VAESIMC are not used to transform the first and last round key in a decryption sequence.

There are legacy and extended forms of the instruction:

AESIMC
The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VAESIMC
The extended form of the instruction has a 128-bit encoding only. The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form | Subset | Feature Flag
AESIMC | AES | CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESIMC | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic | Opcode | Description
AESIMC xmm1, xmm2/mem128 | 66 0F 38 DB /r | Performs the AES InvMixColumn transformation on a round key in xmm2 or mem128 and stores the result in xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VAESIMC xmm1, xmm2/mem128 | C4 | RXB.00010 | X.src.0.01 | DB /r

Related Instructions
(V)AESDEC, (V)AESDECLAST, (V)AESENC, (V)AESENCLAST, (V)AESKEYGENASSIST

rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions
Mode columns: Real, Virt, Prot. X = AVX and SSE exception, A = AVX exception, S = SSE exception, - = not applicable.
Invalid opcode, #UD
  X X X  Instruction not supported, as indicated by CPUID feature identifier.
  A A -  AVX instructions are only recognized in protected mode.
  S S S  CR0.EM = 1.
  S S S  CR4.OSFXSR = 0.
  - - A  CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
  - - A  XFEATURE_ENABLED_MASK[2:1] != 11b.
  - - A  REX, F2, F3, or 66 prefix preceding VEX prefix.
  S S X  Lock prefix (F0h) preceding opcode.
Device not available, #NM
  S S X  CR0.TS = 1.
Stack, #SS
  S S X  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
  S S X  Memory address exceeding data segment limit or non-canonical.
  S S S  Memory operand not 16-byte aligned and MXCSR.MM = 0.
  - - X  Null data segment used to reference memory.
Alignment check, #AC
  S S S  Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
  - - A  Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
  - S X  Instruction execution caused a page fault.

AESKEYGENASSIST VAESKEYGENASSIST
AES Assist Round Key Generation

Expands a round key for encryption. Transforms a 128-bit round key operand using an 8-bit round constant and writes the result to a destination register.

See Appendix A on page 973 for more information about the operation of the AES instructions.

The round key is provided by the second source operand and the round constant is specified by an immediate operand. The 128-bit round key vector is interpreted as 16-byte column-major entries in a 4-by-4 matrix of bytes. The transformed result is written to the destination in column-major order.

There are legacy and extended forms of the instruction:

AESKEYGENASSIST
The first source operand is an XMM register.
The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VAESKEYGENASSIST
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form | Subset | Feature Flag
AESKEYGENASSIST | AES | CPUID Fn0000_0001_ECX[AES] (bit 25)
VAESKEYGENASSIST | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic | Opcode | Description
AESKEYGENASSIST xmm1, xmm2/mem128, imm8 | 66 0F 3A DF /r ib | Expands a round key in xmm2 or mem128 using an immediate round constant. Writes the result to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VAESKEYGENASSIST xmm1, xmm2/mem128, imm8 | C4 | RXB.00011 | X.src.0.01 | DF /r ib

Related Instructions
(V)AESDEC, (V)AESDECLAST, (V)AESENC, (V)AESENCLAST, (V)AESIMC

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Mode columns: Real, Virt, Prot. X = AVX and SSE exception, A = AVX exception, S = SSE exception, - = not applicable.
Invalid opcode, #UD
  X X X  Instruction not supported, as indicated by CPUID feature identifier.
  A A -  AVX instructions are only recognized in protected mode.
  S S S  CR0.EM = 1.
  S S S  CR4.OSFXSR = 0.
  - - A  CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
  - - A  XFEATURE_ENABLED_MASK[2:1] != 11b.
  - - A  REX, F2, F3, or 66 prefix preceding VEX prefix.
  S S X  Lock prefix (F0h) preceding opcode.
Device not available, #NM
  S S X  CR0.TS = 1.
Stack, #SS
  S S X  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
  S S X  Memory address exceeding data segment limit or non-canonical.
  S S S  Memory operand not 16-byte aligned and MXCSR.MM = 0.
  - - X  Null data segment used to reference memory.
Alignment check, #AC
  S S S  Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
  - - A  Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
  - S X  Instruction execution caused a page fault.

ANDNPD VANDNPD
AND NOT Packed Double-Precision Floating-Point

Performs a bitwise AND of two packed double-precision floating-point values in the second source operand with the ones’-complement of the two corresponding packed double-precision floating-point values in the first source operand and writes the result into the destination.

There are legacy and extended forms of the instruction:

ANDNPD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VANDNPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
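The operand roles and upper-bit behavior described above can be modeled with Python integers. The sketch below is an illustration, not a reference implementation: registers are represented as non-negative integers, and all function names are hypothetical. It shows that the first source is the operand that is complemented, that the legacy form preserves bits [255:128] of the destination, and that the 128-bit extended form clears them.

```python
import struct

MASK128 = (1 << 128) - 1

def pack_pd(low, high):
    """Pack two doubles into a 128-bit integer, element 0 in bits [63:0]."""
    return int.from_bytes(struct.pack("<2d", low, high), "little")

def unpack_pd(value):
    """Unpack the low 128 bits of an integer into two double-precision elements."""
    return struct.unpack("<2d", (value & MASK128).to_bytes(16, "little"))

def andnpd(src1, src2):
    """Bitwise AND of src2 with the ones'-complement of src1, 128 bits wide."""
    return (~src1 & src2) & MASK128

def legacy_andnpd(ymm_dest, src2):
    """Legacy form: the destination is also the first source; bits [255:128] are kept."""
    upper = (ymm_dest >> 128) << 128
    return upper | andnpd(ymm_dest & MASK128, src2)

def vandnpd_xmm(src1, src2):
    """Extended 128-bit form: separate destination; bits [255:128] of the result are zero."""
    return andnpd(src1 & MASK128, src2 & MASK128)
```

For example, using a first source that holds only the two sign bits (`pack_pd(-0.0, -0.0)`) clears the sign of each element of the second source, so `(-1.5, 2.0)` becomes `(1.5, 2.0)`.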
Instruction Support
Form | Subset | Feature Flag
ANDNPD | SSE2 | CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VANDNPD | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic | Opcode | Description
ANDNPD xmm1, xmm2/mem128 | 66 0F 55 /r | Performs bitwise AND of two packed double-precision floating-point values in xmm2 or mem128 with the ones’-complement of two packed double-precision floating-point values in xmm1. Writes the result to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VANDNPD xmm1, xmm2, xmm3/mem128 | C4 | RXB.00001 | X.src.0.01 | 55 /r
VANDNPD ymm1, ymm2, ymm3/mem256 | C4 | RXB.00001 | X.src.1.01 | 55 /r

Related Instructions
(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Mode columns: Real, Virt, Prot. X = AVX and SSE exception, A = AVX exception, S = SSE exception, - = not applicable.
Invalid opcode, #UD
  X X X  Instruction not supported, as indicated by CPUID feature identifier.
  A A -  AVX instructions are only recognized in protected mode.
  S S S  CR0.EM = 1.
  S S S  CR4.OSFXSR = 0.
  - - A  CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
  - - A  XFEATURE_ENABLED_MASK[2:1] != 11b.
  - - A  REX, F2, F3, or 66 prefix preceding VEX prefix.
  S S X  Lock prefix (F0h) preceding opcode.
Device not available, #NM
  S S X  CR0.TS = 1.
Stack, #SS
  S S X  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
  S S X  Memory address exceeding data segment limit or non-canonical.
  S S S  Memory operand not 16-byte aligned and MXCSR.MM = 0.
  - - X  Null data segment used to reference memory.
Alignment check, #AC
  S S S  Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
  - - A  Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
  - S X  Instruction execution caused a page fault.

ANDNPS VANDNPS
AND NOT Packed Single-Precision Floating-Point

Performs a bitwise AND of four packed single-precision floating-point values in the second source operand with the ones’-complement of the four corresponding packed single-precision floating-point values in the first source operand, and writes the result in the destination.

There are legacy and extended forms of the instruction:

ANDNPS
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VANDNPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form | Subset | Feature Flag
ANDNPS | SSE1 | CPUID Fn0000_0001_EDX[SSE] (bit 25)
VANDNPS | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic | Opcode | Description
ANDNPS xmm1, xmm2/mem128 | 0F 55 /r | Performs bitwise AND of four packed single-precision floating-point values in xmm2 or mem128 with the ones’-complement of four packed single-precision floating-point values in xmm1. Writes the result to xmm1.
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VANDNPS xmm1, xmm2, xmm3/mem128 | C4 | RXB.00001 | X.src.0.00 | 55 /r
VANDNPS ymm1, ymm2, ymm3/mem256 | C4 | RXB.00001 | X.src.1.00 | 55 /r

Related Instructions
(V)ANDNPD, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Mode columns: Real, Virt, Prot. X = AVX and SSE exception, A = AVX exception, S = SSE exception, - = not applicable.
Invalid opcode, #UD
  X X X  Instruction not supported, as indicated by CPUID feature identifier.
  A A -  AVX instructions are only recognized in protected mode.
  S S S  CR0.EM = 1.
  S S S  CR4.OSFXSR = 0.
  - - A  CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
  - - A  XFEATURE_ENABLED_MASK[2:1] != 11b.
  - - A  REX, F2, F3, or 66 prefix preceding VEX prefix.
  S S X  Lock prefix (F0h) preceding opcode.
Device not available, #NM
  S S X  CR0.TS = 1.
Stack, #SS
  S S X  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
  S S X  Memory address exceeding data segment limit or non-canonical.
  S S S  Memory operand not 16-byte aligned and MXCSR.MM = 0.
  - - X  Null data segment used to reference memory.
Alignment check, #AC
  S S S  Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
  - - A  Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
  - S X  Instruction execution caused a page fault.

ANDPD VANDPD
AND Packed Double-Precision Floating-Point

Performs bitwise AND of two packed double-precision floating-point values in the first source operand with the corresponding two packed double-precision floating-point values in the second source operand and writes the results into the corresponding elements of the destination.
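The element-by-element correspondence described above can be illustrated with a short Python model. It treats each operand as a list of 64-bit lanes: two lanes for an XMM operand and four for a YMM operand. The helper names and the `ABS_MASK` constant are hypothetical and exist only for this sketch.

```python
import struct

ABS_MASK = 0x7FFF_FFFF_FFFF_FFFF  # every bit set except the sign bit

def lanes_from_doubles(values):
    """Reinterpret a list of doubles as a list of 64-bit integer lanes."""
    n = len(values)
    return list(struct.unpack(f"<{n}Q", struct.pack(f"<{n}d", *values)))

def doubles_from_lanes(lanes):
    """Reinterpret a list of 64-bit integer lanes as a list of doubles."""
    n = len(lanes)
    return list(struct.unpack(f"<{n}d", struct.pack(f"<{n}Q", *lanes)))

def andpd(src1, src2):
    """Bitwise AND of corresponding 64-bit lanes (2 lanes for XMM, 4 for YMM)."""
    if len(src1) != len(src2) or len(src1) not in (2, 4):
        raise ValueError("operands must both have 2 or 4 lanes")
    return [a & b for a, b in zip(src1, src2)]
```

One common use of such an operation is masking: ANDing every lane with `ABS_MASK` clears the sign bit, so `[-1.0, 2.5, -0.0, -8.0]` becomes `[1.0, 2.5, 0.0, 8.0]`.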
There are legacy and extended forms of the instruction:

ANDPD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VANDPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form | Subset | Feature Flag
ANDPD | SSE2 | CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VANDPD | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic | Opcode | Description
ANDPD xmm1, xmm2/mem128 | 66 0F 54 /r | Performs bitwise AND of two packed double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VANDPD xmm1, xmm2, xmm3/mem128 | C4 | RXB.00001 | X.src.0.01 | 54 /r
VANDPD ymm1, ymm2, ymm3/mem256 | C4 | RXB.00001 | X.src.1.01 | 54 /r

Related Instructions
(V)ANDNPD, (V)ANDNPS, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS
rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Mode columns: Real, Virt, Prot. X = AVX and SSE exception, A = AVX exception, S = SSE exception, - = not applicable.
Invalid opcode, #UD
  X X X  Instruction not supported, as indicated by CPUID feature identifier.
  A A -  AVX instructions are only recognized in protected mode.
  S S S  CR0.EM = 1.
  S S S  CR4.OSFXSR = 0.
  - - A  CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
  - - A  XFEATURE_ENABLED_MASK[2:1] != 11b.
  - - A  REX, F2, F3, or 66 prefix preceding VEX prefix.
  S S X  Lock prefix (F0h) preceding opcode.
Device not available, #NM
  S S X  CR0.TS = 1.
Stack, #SS
  S S X  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
  S S X  Memory address exceeding data segment limit or non-canonical.
  S S S  Memory operand not 16-byte aligned and MXCSR.MM = 0.
  - - X  Null data segment used to reference memory.
Alignment check, #AC
  S S S  Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
  - - A  Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF
  - S X  Instruction execution caused a page fault.

ANDPS VANDPS
AND Packed Single-Precision Floating-Point

Performs bitwise AND of the four packed single-precision floating-point values in the first source operand with the corresponding four packed single-precision floating-point values in the second source operand, and writes the result into the corresponding elements of the destination.

There are legacy and extended forms of the instruction:

ANDPS
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VANDPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form      Subset   Feature Flag
ANDPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VANDPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                  Opcode     Description
ANDPS xmm1, xmm2/mem128   0F 54 /r   Performs bitwise AND of four packed single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                         Encoding
                                 VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VANDPS xmm1, xmm2, xmm3/mem128   C4    RXB.00001        X.src.0.00    54 /r
VANDPS ymm1, ymm2, ymm3/mem256   C4    RXB.00001        X.src.1.00    54 /r

Related Instructions
(V)ANDNPD, (V)ANDNPS, (V)ANDPD, (V)ORPD, (V)ORPS, (V)XORPD, (V)XORPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not 16-byte aligned and MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. ANDPS, VANDPS Instruction Reference 26568—Rev. 3.22—May 2018 BLENDPD VBLENDPD AMD64 Technology Blend Packed Double-Precision Floating-Point Copies packed double-precision floating-point values from either of two sources to a destination, as specified by an 8-bit mask operand. Each mask bit specifies a 64-bit element in a source location and a corresponding 64-bit element in the destination register. When a mask bit = 0, the specified element of the first source is copied to the corresponding position in the destination register. When a mask bit = 1, the specified element of the second source is copied to the corresponding position in the destination register. There are legacy and extended forms of the instruction: BLENDPD The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. Only mask bits [1:0] are used. VBLENDPD The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Only mask bits [1:0] are used. 
YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Only mask bits [3:0] are used.

Instruction Support
Form       Subset   Feature Flag
BLENDPD    SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VBLENDPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                          Opcode              Description
BLENDPD xmm1, xmm2/mem128, imm8   66 0F 3A 0D /r ib   Copies values from xmm1 or xmm2/mem128 to xmm1, as specified by imm8.

Mnemonic                                Encoding
                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VBLENDPD xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.00011        X.src.0.01    0D /r ib
VBLENDPD ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.00011        X.src.1.01    0D /r ib

Related Instructions
(V)BLENDPS, (V)BLENDVPD, (V)BLENDVPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not 16-byte aligned and MXCSR.MM = 0. Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault.

BLENDPS VBLENDPS
Blend Packed Single-Precision Floating-Point

Copies packed single-precision floating-point values from either of two sources to a destination, as specified by an 8-bit mask operand. Each mask bit specifies a 32-bit element in a source location and a corresponding 32-bit element in the destination register. When a mask bit = 0, the specified element of the first source is copied to the corresponding position in the destination register. When a mask bit = 1, the specified element of the second source is copied to the corresponding position in the destination register.

There are legacy and extended forms of the instruction:

BLENDPS
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. Only mask bits [3:0] are used.

VBLENDPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Only mask bits [3:0] are used.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. All 8 bits of the mask are used.
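The element selection described above can be illustrated with a small software model. This is only a sketch, not how the hardware implements the instruction: the function name `blend_imm` and the use of Python lists (lowest element first) to stand in for XMM/YMM registers are assumptions made for illustration.

```python
def blend_imm(src1, src2, imm8):
    """Illustrative model of (V)BLENDPS / (V)BLENDPD element selection.

    src1 and src2 are equal-length lists of elements, lowest element first.
    Bit i of imm8 selects element i: 0 copies it from src1, 1 from src2.
    Mask bits at or above the element count are ignored.
    """
    if len(src1) != len(src2):
        raise ValueError("sources must have the same number of elements")
    return [b if (imm8 >> i) & 1 else a
            for i, (a, b) in enumerate(zip(src1, src2))]


# 128-bit BLENDPS: four 32-bit elements, so only mask bits [3:0] matter.
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
result = blend_imm(a, b, 0b0101)  # elements 0 and 2 are taken from b
```

With eight elements per source, the same function models the YMM encoding of VBLENDPS, where all 8 mask bits are used.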
Instruction Support Form Subset BLENDPS SSE4.1 VBLENDPS AVX Feature Flag CPUID Fn0000_0001_ECX[SSE41] (bit 19) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode BLENDPS xmm1, xmm2/mem128, imm8 Description 66 0F 3A 0C /r ib Mnemonic Copies values from xmm1 or xmm2/mem128 to xmm1, as specified by imm8. Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VBLENDPS xmm1, xmm2, xmm3/mem128, imm8 C4 RXB.00011 X.src.0.01 0C /r ib VBLENDPS ymm1, ymm2, ymm3/mem256, imm8 C4 RXB.00011 X.src.1.01 0C /r ib Instruction Reference BLENDPS, VBLENDPS 57 AMD64 Technology 26568—Rev. 3.22—May 2018 Related Instructions (V)BLENDPD, (V)BLENDVPD, (V)BLENDVPS rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception 58 X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not 16-byte aligned and MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. BLENDPS, VBLENDPS Instruction Reference 26568—Rev. 
3.22—May 2018 BLENDVPD VBLENDVPD AMD64 Technology Variable Blend Packed Double-Precision Floating-Point Copies packed double-precision floating-point values from either of two sources to a destination, as specified by a mask operand. Each mask bit specifies a 64-bit element of a source location and a corresponding 64-bit element of the destination. The position of a mask bit corresponds to the position of the most significant bit of a copied value. When a mask bit = 0, the specified element of the first source is copied to the corresponding position in the destination. When a mask bit = 1, the specified element of the second source is copied to the corresponding position in the destination. There are legacy and extended forms of the instruction: BLENDVPD The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. The mask is defined by bits 127 and 63 of the implicit register XMM0. VBLENDVPD The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. The mask is defined by bits 127 and 63 of a fourth XMM register. YMM Encoding The first operand is a YMM register and the second operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. The mask is defined by bits 255, 191, 127, and 63 of a fourth YMM register. 
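As a rough illustration of the variable blend, the sketch below models each register as a list of integer bit patterns, lowest element first, and selects on the most significant bit of each mask element. The names `blendv` and `mask_bit_positions` are hypothetical helpers introduced here, not part of any AMD-provided interface.

```python
def mask_bit_positions(register_bits, element_bits):
    """Register bit positions that act as mask bits (the MSB of each element)."""
    count = register_bits // element_bits
    return [element_bits * (i + 1) - 1 for i in range(count)]


def blendv(src1, src2, mask, element_bits=64):
    """Illustrative model of (V)BLENDVPD-style selection on raw bit patterns.

    For each element, the MSB of the mask element chooses the source:
    0 copies from src1, 1 copies from src2. All other mask bits are ignored.
    """
    msb = 1 << (element_bits - 1)
    return [b if m & msb else a for a, b, m in zip(src1, src2, mask)]


# 128-bit form: two 64-bit elements, mask bits are register bits 63 and 127.
src1 = [0x1111, 0x2222]
src2 = [0xAAAA, 0xBBBB]
mask = [0x8000000000000000, 0x7FFFFFFFFFFFFFFF]  # only the first MSB is set
selected = blendv(src1, src2, mask)
```

The second mask element has every bit set except the most significant one, yet it still selects the first source, which shows that only the MSB of each element participates.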
Instruction Support Form Subset BLENDVPD SSE4.1 VBLENDVPD AVX Feature Flag CPUID Fn0000_0001_ECX[SSE41] (bit 19) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Reference BLENDVPD, VBLENDVPD 59 AMD64 Technology 26568—Rev. 3.22—May 2018 Instruction Encoding Mnemonic Opcode BLENDVPD xmm1, xmm2/mem128 Description 66 0F 38 15 /r Copies values from xmm1 or xmm2/mem128 to xmm1, as specified by the MSB of corresponding elements of xmm0. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VBLENDVPD xmm1, xmm2, xmm3/mem128, xmm4 C4 RXB.00011 X.src.0.01 4B /r VBLENDVPD ymm1, ymm2, ymm3/mem256, ymm4 C4 RXB.00011 X.src.1.01 4B /r Related Instructions (V)BLENDPD, (V)BLENDPS, (V)BLENDVPS rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S A X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception 60 X S S A A A A X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.W = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. BLENDVPD, VBLENDVPD Instruction Reference 26568—Rev. 
3.22—May 2018

BLENDVPS VBLENDVPS
Variable Blend Packed Single-Precision Floating-Point

Copies packed single-precision floating-point values from either of two sources to a destination, as specified by a mask operand. Each mask bit specifies a 32-bit element of a source location and a corresponding 32-bit element of the destination register. The position of a mask bit corresponds to the position of the most significant bit of a copied value. When a mask bit = 0, the specified element of the first source is copied to the corresponding position in the destination. When a mask bit = 1, the specified element of the second source is copied to the corresponding position in the destination.

There are legacy and extended forms of the instruction:

BLENDVPS
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. The mask is defined by bits 127, 95, 63, and 31 of the implicit register XMM0.

VBLENDVPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. The mask is defined by bits 127, 95, 63, and 31 of a fourth XMM register.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. The mask is defined by bits 255, 223, 191, 159, 127, 95, 63, and 31 of a fourth YMM register.
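Because each mask bit sits at the most significant bit of a 32-bit element, it coincides with the sign bit when the mask register holds single-precision values. The sketch below illustrates that consequence with Python floats; `float32_bits` and `blendvps_model` are hypothetical helper names, and the model ignores everything except element selection.

```python
import struct


def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def blendvps_model(src1, src2, mask):
    """Illustrative model: bit 31 of each mask element selects the source."""
    return [b if float32_bits(m) >> 31 else a
            for a, b, m in zip(src1, src2, mask)]


src1 = [1.0, 2.0, 3.0, 4.0]
src2 = [-1.0, -2.0, -3.0, -4.0]
mask = [-5.0, 5.0, -0.0, 0.0]  # negative values, including -0.0, have bit 31 set
selected = blendvps_model(src1, src2, mask)
```

In this model a mask element of -0.0 selects the second source even though it compares equal to +0.0, since only the bit pattern matters.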
Instruction Support Form Subset BLENDVPS SSE4.1 VBLENDVPS AVX Feature Flag CPUID Fn0000_0001_ECX[SSE41] (bit 19) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Reference BLENDVPS, VBLENDVPS 61 AMD64 Technology 26568—Rev. 3.22—May 2018 Instruction Encoding Mnemonic BLENDVPS xmm1, xmm2/mem128 Opcode Description 66 0F 38 14 /r Copies packed single-precision floating-point values from xmm1 or xmm2/mem128 to xmm1, as specified by bits in xmm0. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VBLENDVPS xmm1, xmm2, xmm3/mem128, xmm4 C4 RXB.00011 X.src.0.01 4A /r VBLENDVPS ymm1, ymm2, ymm3/mem256, ymm4 C4 RXB.00011 X.src.1.01 4A /r Related Instructions (V)BLENDPD, (V)BLENDPS, (V)BLENDVPD rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S A X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception 62 X S S A A A A X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.W = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. BLENDVPS, VBLENDVPS Instruction Reference 26568—Rev. 
3.22—May 2018

CMPPD VCMPPD
Compare Packed Double-Precision Floating-Point

Compares each of the two packed double-precision floating-point values of the first source operand to the corresponding values of the second source operand and writes the result of each comparison to the corresponding 64-bit element of the destination. When a comparison is TRUE, all 64 bits of the destination element are set; when a comparison is FALSE, all 64 bits of the destination element are cleared. The type of comparison is specified by an immediate byte operand.

Signed comparisons return TRUE only when both operands are valid numbers and the numbers have the relation specified by the type of comparison operation. Ordered comparison returns TRUE when both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison returns TRUE only when one or both operands are NaN and FALSE otherwise.

QNaN operands generate an Invalid Operation Exception (IE) only if the comparison type is not Equal, Unequal, Ordered, or Unordered. SNaN operands always generate an IE.

There are legacy and extended forms of the instruction:

CMPPD
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. Comparison type is specified by bits [2:0] of an immediate byte operand.

VCMPPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Comparison type is specified by bits [4:0] of an immediate byte operand.
YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination operand is a YMM register. Comparison type is specified by bits [4:0] of an immediate byte operand.

Immediate Operand Encoding
CMPPD uses bits [2:0] of the 8-bit immediate operand and VCMPPD uses bits [4:0] of the 8-bit immediate operand. Although VCMPPD supports 20h encoding values, the comparison types echo those of CMPPD on 4-bit boundaries. The following table shows the immediate operand value for CMPPD and each of the VCMPPD echoes.

Some comparison operations that are not directly supported by immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and executing the appropriate comparison of the swapped values. These additional comparison operations are shown with the directly supported comparison operations.

Immediate Operand Value   Compare Operation                             Result If NaN Operand   QNaN Operand Causes Invalid Operation Exception
00h, 08h, 10h, 18h        Equal                                         FALSE                   No
01h, 09h, 11h, 19h        Less than                                     FALSE                   Yes
                          Greater than (swapped operands)               FALSE                   Yes
02h, 0Ah, 12h, 1Ah        Less than or equal                            FALSE                   Yes
                          Greater than or equal (swapped operands)      FALSE                   Yes
03h, 0Bh, 13h, 1Bh        Unordered                                     TRUE                    No
04h, 0Ch, 14h, 1Ch        Not equal                                     TRUE                    No
05h, 0Dh, 15h, 1Dh        Not less than                                 TRUE                    Yes
                          Not greater than (swapped operands)           TRUE                    Yes
06h, 0Eh, 16h, 1Eh        Not less than or equal                        TRUE                    Yes
                          Not greater than or equal (swapped operands)  TRUE                    Yes
07h, 0Fh, 17h, 1Fh        Ordered                                       FALSE                   No

The following alias mnemonics for (V)CMPPD with appropriate value of imm8 are supported.
Mnemonic Implied Value of imm8 (V)CMPEQPD 00h, 08h, 10h, 18h (V)CMPLTPD 01h, 09h, 11h, 19h (V)CMPLEPD 02h, 0Ah, 12h, 1Ah (V)CMPUNORDPD 03h, 0Bh, 13h, 1Bh (V)CMPNEQPD 04h, 0Ch, 14h, 1Ch (V)CMPNLTPD 05h, 0Dh, 15h, 1Dh (V)CMPNLEPD 06h, 0Eh, 16h, 1Eh (V)CMPORDPD 07h, 0Fh, 17h, 1Fh Instruction Support Form Subset CMPPD SSE2 VCMPPD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 64 CMPPD, VCMPPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic Opcode CMPPD xmm1, xmm2/mem128, imm8 Description 66 0F C2 /r ib Compares two pairs of values in xmm1 to corresponding values in xmm2 or mem128. Comparison type is determined by imm8. Writes comparison results to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VCMPPD xmm1, xmm2, xmm3/mem128, imm8 C4 RXB.00001 X.src.0.01 C2 /r ib VCMPPD ymm1, ymm2, ymm3/mem256, imm8 C4 RXB.00001 X.src.1.01 C2 /r ib Related Instructions (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS rFLAGS Affected None MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM 11 OM 10 ZM 9 DM 8 IM 7 DAZ 6 PE 5 UE 4 OE 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Instruction Reference CMPPD, VCMPPD 65 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception 66 S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. CMPPD, VCMPPD Instruction Reference 26568—Rev. 3.22—May 2018 CMPPS VCMPPS AMD64 Technology Compare Packed Single-Precision Floating-Point Compares each of the four packed single-precision floating-point values of the first source operand to the corresponding values of the second source operand and writes the result of each comparison to the corresponding 32-bit element of the destination. When a comparison is TRUE, all 32 bits of the destination element are set; when a comparison is FALSE, all 32 bits of the destination element are cleared. The type of comparison is specified by an immediate byte operand. Signed comparisons return TRUE only when both operands are valid numbers and the numbers have the relation specified by the type of comparison operation. Ordered comparison returns TRUE when both operands are valid numbers, or FALSE when either operand is a NaN. 
Unordered comparison returns TRUE only when one or both operands are NaN and FALSE otherwise.

QNaN operands generate an Invalid Operation Exception (IE) only if the comparison type is not Equal, Unequal, Ordered, or Unordered. SNaN operands always generate an IE.

There are legacy and extended forms of the instruction:

CMPPS
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. Comparison type is specified by bits [2:0] of an immediate byte operand.

VCMPPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Comparison type is specified by bits [4:0] of an immediate byte operand.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination operand is a YMM register. Comparison type is specified by bits [4:0] of an immediate byte operand.

Immediate Operand Encoding
CMPPS uses bits [2:0] of the 8-bit immediate operand and VCMPPS uses bits [4:0] of the 8-bit immediate operand. Although VCMPPS supports 20h encoding values, the comparison types echo those of CMPPS on 4-bit boundaries. The following table shows the immediate operand value for CMPPS and each of the VCMPPS echoes.

Some comparison operations that are not directly supported by immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and executing the appropriate comparison of the swapped values.
These additional comparison operations are shown with the directly supported comparison operations.

Immediate Operand Value   Compare Operation                             Result If NaN Operand   QNaN Operand Causes Invalid Operation Exception
00h, 08h, 10h, 18h        Equal                                         FALSE                   No
01h, 09h, 11h, 19h        Less than                                     FALSE                   Yes
                          Greater than (swapped operands)               FALSE                   Yes
02h, 0Ah, 12h, 1Ah        Less than or equal                            FALSE                   Yes
                          Greater than or equal (swapped operands)      FALSE                   Yes
03h, 0Bh, 13h, 1Bh        Unordered                                     TRUE                    No
04h, 0Ch, 14h, 1Ch        Not equal                                     TRUE                    No
05h, 0Dh, 15h, 1Dh        Not less than                                 TRUE                    Yes
                          Not greater than (swapped operands)           TRUE                    Yes
06h, 0Eh, 16h, 1Eh        Not less than or equal                        TRUE                    Yes
                          Not greater than or equal (swapped operands)  TRUE                    Yes
07h, 0Fh, 17h, 1Fh        Ordered                                       FALSE                   No

The following alias mnemonics for (V)CMPPS with appropriate value of imm8 are supported.

Mnemonic          Implied Value of imm8
(V)CMPEQPS        00h, 08h, 10h, 18h
(V)CMPLTPS        01h, 09h, 11h, 19h
(V)CMPLEPS        02h, 0Ah, 12h, 1Ah
(V)CMPUNORDPS     03h, 0Bh, 13h, 1Bh
(V)CMPNEQPS       04h, 0Ch, 14h, 1Ch
(V)CMPNLTPS       05h, 0Dh, 15h, 1Dh
(V)CMPNLEPS       06h, 0Eh, 16h, 1Eh
(V)CMPORDPS       07h, 0Fh, 17h, 1Fh

Instruction Support
Form      Subset   Feature Flag
CMPPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCMPPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                        Opcode        Description
CMPPS xmm1, xmm2/mem128, imm8   0F C2 /r ib   Compares four pairs of values in xmm1 to corresponding values in xmm2 or mem128. Comparison type is determined by imm8. Writes comparison results to xmm1.
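The comparison types selected by imm8 can be sketched as a software model. The example below is an illustration only: it follows the immediate operand table in this section (treating values 08h through 1Fh as echoes of 00h through 07h), represents each result element as a 32-bit integer mask, and does not model the IE or DE exception flags. The names `PREDICATES` and `cmpps_model` are hypothetical.

```python
import math

# Predicate index (imm8 bits [2:0]) -> comparison, per the table in this section.
# Python float comparisons return False when either operand is a NaN, so the
# "not" forms return True for NaN operands, matching the "Result If NaN" column.
PREDICATES = {
    0: lambda a, b: a == b,                               # Equal
    1: lambda a, b: a < b,                                # Less than
    2: lambda a, b: a <= b,                               # Less than or equal
    3: lambda a, b: math.isnan(a) or math.isnan(b),       # Unordered
    4: lambda a, b: not (a == b),                         # Not equal
    5: lambda a, b: not (a < b),                          # Not less than
    6: lambda a, b: not (a <= b),                         # Not less than or equal
    7: lambda a, b: not (math.isnan(a) or math.isnan(b)), # Ordered
}


def cmpps_model(src1, src2, imm8):
    """Return one 32-bit mask per element: all ones if TRUE, zero if FALSE."""
    predicate = PREDICATES[imm8 & 0x07]
    return [0xFFFFFFFF if predicate(a, b) else 0 for a, b in zip(src1, src2)]


nan = float("nan")
x = [1.0, 2.0, nan, 4.0]
y = [1.0, 3.0, 1.0, nan]
equal_mask = cmpps_model(x, y, 0x00)     # Equal
not_less_mask = cmpps_model(x, y, 0x05)  # Not less than
```

A "greater than" comparison is obtained in this model the same way the text describes: call the function with the two sources swapped and the "less than" predicate.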
Mnemonic Encoding VCMPPS xmm1, xmm2, xmm3/mem128, imm8 VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.src.0.00 C2 /r ib Related Instructions (V)CMPPD, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS rFLAGS Affected None MXCSR Flags Affected MM FZ 17 15 Note: RC 14 13 PM UM OM ZM DM IM DAZ PE UE OE ZE 12 11 10 9 8 7 6 5 4 3 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Instruction Reference CMPPS, VCMPPS 69 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. 
SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception 70 S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. CMPPS, VCMPPS Instruction Reference 26568—Rev. 3.22—May 2018 CMPSD VCMPSD AMD64 Technology Compare Scalar Double-Precision Floating-Point Compares a double-precision floating-point value in the low-order 64 bits of the first source operand with a double-precision floating-point value in the low-order 64 bits of the second source operand and writes the result to the low-order 64 bits of the destination. When a comparison is TRUE, all 64 bits of the destination element are set; when a comparison is FALSE, all 64 bits of the destination element are cleared. Comparison type is specified by an immediate byte operand. Signed comparisons return TRUE only when both operands are valid numbers and the numbers have the relation specified by the type of comparison operation. Ordered comparison returns TRUE when both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison returns TRUE only when one or both operands are NaN and FALSE otherwise. QNaN operands generate an Invalid Operation Exception (IE) only when the comparison type is not Equal, Unequal, Ordered, or Unordered. SNaN operands always generate an IE. There are legacy and extended forms of the instruction: CMPSD The first source operand is an XMM register. The second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination. Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected. Comparison type is specified by bits [2:0] of an immediate byte operand. 
This CMPSD instruction must not be confused with the same-mnemonic CMPSD (compare strings by doubleword) instruction in the general-purpose instruction set. Assemblers can distinguish the instructions by the number and type of operands. VCMPSD The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the destination are copied from bits [127:64] of the first source. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Comparison type is specified by bits [4:0] of an immediate byte operand. Immediate Operand Encoding CMPSD uses bits [2:0] of the 8-bit immediate operand and VCMPSD uses bits [4:0] of the 8-bit immediate operand. Although VCMPSD supports 20h encoding values, the comparison types echo those of CMPSD on 4-bit boundaries. The following table shows the immediate operand value for CMPSD and each of the VCMPSD echoes. Some comparison operations that are not directly supported by immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and executing the appropriate comparison of the swapped values. These additional comparison operations are shown with the directly supported comparison operations. When operands are swapped, the first source XMM register is overwritten by the result. Instruction Reference CMPSD, VCMPSD 71 AMD64 Technology 26568—Rev. 
Immediate Operand Value   Compare Operation                             Result If NaN Operand   QNaN Operand Causes Invalid Operation Exception
00h, 08h, 10h, 18h        Equal                                         FALSE                   No
01h, 09h, 11h, 19h        Less than                                     FALSE                   Yes
                          Greater than (swapped operands)               FALSE                   Yes
02h, 0Ah, 12h, 1Ah        Less than or equal                            FALSE                   Yes
                          Greater than or equal (swapped operands)      FALSE                   Yes
03h, 0Bh, 13h, 1Bh        Unordered                                     TRUE                    No
04h, 0Ch, 14h, 1Ch        Not equal                                     TRUE                    No
05h, 0Dh, 15h, 1Dh        Not less than                                 TRUE                    Yes
                          Not greater than (swapped operands)           TRUE                    Yes
06h, 0Eh, 16h, 1Eh        Not less than or equal                        TRUE                    Yes
                          Not greater than or equal (swapped operands)  TRUE                    Yes
07h, 0Fh, 17h, 1Fh        Ordered                                       FALSE                   No

The following alias mnemonics for (V)CMPSD with appropriate value of imm8 are supported.

Mnemonic        Implied Value of imm8
(V)CMPEQSD      00h, 08h, 10h, 18h
(V)CMPLTSD      01h, 09h, 11h, 19h
(V)CMPLESD      02h, 0Ah, 12h, 1Ah
(V)CMPUNORDSD   03h, 0Bh, 13h, 1Bh
(V)CMPNEQSD     04h, 0Ch, 14h, 1Ch
(V)CMPNLTSD     05h, 0Dh, 15h, 1Dh
(V)CMPNLESD     06h, 0Eh, 16h, 1Eh
(V)CMPORDSD     07h, 0Fh, 17h, 1Fh

Instruction Support

Form     Subset   Feature Flag
CMPSD    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCMPSD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode           Description
CMPSD xmm1, xmm2/mem64, imm8   F2 0F C2 /r ib   Compares double-precision floating-point values in the low-order 64 bits of xmm1 with corresponding values in xmm2 or mem64. Comparison type is determined by imm8. Writes comparison results to xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.src.X.11 C2 /r ib VCMPSD xmm1, xmm2, xmm3/mem64, imm8 Related Instructions (V)CMPPD, (V)CMPPS, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS rFLAGS Affected None MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM 11 OM 10 ZM 9 DM 8 IM 7 DAZ 6 PE 5 UE 4 OE 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Instruction Reference CMPSD, VCMPSD 73 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception 74 S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. CMPSD, VCMPSD Instruction Reference 26568—Rev. 
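As an informal illustration of the (V)CMPSD comparison types listed in the immediate operand table earlier in this section, the following Python sketch models the 64-bit result element in software. It is not a description of the hardware implementation: the names `cmpsd_mask`, `qnan_raises_invalid`, and `PREDICATE_NAMES` are hypothetical, Python floats stand in for double-precision operands, and SNaN signalling and the upper destination bits are not modeled.

```python
import math

ALL_ONES_64 = 0xFFFF_FFFF_FFFF_FFFF

# Alias suffixes indexed by imm8[2:0], in the order of the alias mnemonic table.
PREDICATE_NAMES = ("EQ", "LT", "LE", "UNORD", "NEQ", "NLT", "NLE", "ORD")


def cmpsd_mask(src1: float, src2: float, imm8: int) -> int:
    """Model the 64-bit result element of a scalar double-precision compare."""
    predicate = imm8 & 0x07  # per the table, values 08h-1Fh echo 00h-07h
    unordered = math.isnan(src1) or math.isnan(src2)
    outcomes = (
        src1 == src2,        # 0: equal (FALSE if either operand is a NaN)
        src1 < src2,         # 1: less than
        src1 <= src2,        # 2: less than or equal
        unordered,           # 3: unordered
        not (src1 == src2),  # 4: not equal (TRUE if either operand is a NaN)
        not (src1 < src2),   # 5: not less than
        not (src1 <= src2),  # 6: not less than or equal
        not unordered,       # 7: ordered
    )
    return ALL_ONES_64 if outcomes[predicate] else 0


def qnan_raises_invalid(imm8: int) -> bool:
    """True when the table marks a QNaN operand as causing IE for this type."""
    return PREDICATE_NAMES[imm8 & 0x07] in ("LT", "LE", "NLT", "NLE")
```

In this model, a comparison that has no direct encoding, such as greater than, is obtained by swapping the arguments: `cmpsd_mask(b, a, 0x01)` is all ones exactly when `a` is greater than `b`.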
CMPSS VCMPSS Compare Scalar Single-Precision Floating-Point

Compares a single-precision floating-point value in the low-order 32 bits of the first source operand with a single-precision floating-point value in the low-order 32 bits of the second source operand and writes the result to the low-order 32 bits of the destination. When a comparison is TRUE, all 32 bits of the destination element are set; when a comparison is FALSE, all 32 bits of the destination element are cleared. Comparison type is specified by an immediate byte operand.

Signed comparisons return TRUE only when both operands are valid numbers and the numbers have the relation specified by the type of comparison operation. Ordered comparison returns TRUE when both operands are valid numbers, or FALSE when either operand is a NaN. Unordered comparison returns TRUE only when one or both operands are NaN and FALSE otherwise.

QNaN operands generate an Invalid Operation Exception (IE) only when the comparison type is not Equal, Unequal, Ordered, or Unordered. SNaN operands always generate an IE.

There are legacy and extended forms of the instruction:

CMPSS
The first source operand is an XMM register. The second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected. Comparison type is specified by bits [2:0] of an immediate byte operand.

VCMPSS
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the destination are copied from bits [127:32] of the first source. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Comparison type is specified by bits [4:0] of an immediate byte operand.

Immediate Operand Encoding

CMPSS uses bits [2:0] of the 8-bit immediate operand and VCMPSS uses bits [4:0] of the 8-bit immediate operand. Although VCMPSS supports 20h encoding values, the comparison types echo those of CMPSS on 4-bit boundaries. The following table shows the immediate operand value for CMPSS and each of the VCMPSS echoes.

Some comparison operations that are not directly supported by immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and executing the appropriate comparison of the swapped values. These additional comparison operations are shown below with the directly supported comparison operations. When operands are swapped, the first source XMM register is overwritten by the result.

Immediate Operand Value   Compare Operation                             Result If NaN Operand   QNaN Operand Causes Invalid Operation Exception
00h, 08h, 10h, 18h        Equal                                         FALSE                   No
01h, 09h, 11h, 19h        Less than                                     FALSE                   Yes
                          Greater than (swapped operands)               FALSE                   Yes
02h, 0Ah, 12h, 1Ah        Less than or equal                            FALSE                   Yes
                          Greater than or equal (swapped operands)      FALSE                   Yes
03h, 0Bh, 13h, 1Bh        Unordered                                     TRUE                    No
04h, 0Ch, 14h, 1Ch        Not equal                                     TRUE                    No
05h, 0Dh, 15h, 1Dh        Not less than                                 TRUE                    Yes
                          Not greater than (swapped operands)           TRUE                    Yes
06h, 0Eh, 16h, 1Eh        Not less than or equal                        TRUE                    Yes
                          Not greater than or equal (swapped operands)  TRUE                    Yes
07h, 0Fh, 17h, 1Fh        Ordered                                       FALSE                   No

The following alias mnemonics for (V)CMPSS with appropriate value of imm8 are supported.
Mnemonic Implied Value of imm8 (V)CMPEQSS 00h, 08h, 10h, 18h (V)CMPLTSS 01h, 09h, 11h, 19h (V)CMPLESS 02h, 0Ah, 12h, 1Ah (V)CMPUNORDSS 03h, 0Bh, 13h, 1Bh (V)CMPNEQSS 04h, 0Ch, 14h, 1Ch (V)CMPNLTSS 05h, 0Dh, 15h, 1Dh (V)CMPNLESS 06h, 0Eh, 16h, 1Eh (V)CMPORDSS 07h, 0Fh, 17h, 1Fh Instruction Support Form Subset CMPSS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) Feature Flag VCMPSS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 76 CMPSS, VCMPSS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic Opcode CMPSS xmm1, xmm2/mem32, imm8 Description F3 0F C2 /r ib Compares single-precision floating-point values in the low-order 32 bits of xmm1 with corresponding values in xmm2 or mem32. Comparison type is determined by imm8. Writes comparison results to xmm1. Mnemonic Encoding VEX RXB.map_select VCMPSS xmm1, xmm2, xmm3/mem32, imm8 C4 RXB.00001 W.vvvv.L.pp Opcode X.src.X.10 C2 /r ib Related Instructions (V)CMPPD, (V)CMPPS, (V)CMPSD, (V)COMISD, (V)COMISS, (V)UCOMISD, (V)UCOMISS rFLAGS Affected None MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM 11 OM 10 ZM 9 DM 8 IM 7 DAZ 6 PE 5 UE 4 OE 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Instruction Reference CMPSS, VCMPSS 77 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. 
REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception 78 S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. CMPSS, VCMPSS Instruction Reference 26568—Rev. 3.22—May 2018 COMISD VCOMISD AMD64 Technology Compare Ordered Scalar Double-Precision Floating-Point Compares a double-precision floating-point value in the low-order 64 bits of the first operand with a double-precision floating-point value in the low-order 64 bits of the second operand and sets rFLAGS.ZF, PF, and CF to show the result of the comparison: Comparison ZF PF CF NaN input 1 1 1 operand 1 > operand 2 0 0 0 operand 1 < operand 2 0 0 1 operand 1 == operand 2 1 0 0 The result is unordered if one or both of the operand values is a NaN. The rFLAGS.OF, AF, and SF bits are cleared. If an #XF SIMD floating-point exception occurs the rFLAGS bits are not updated. There are legacy and extended forms of the instruction: COMISD The first source operand is an XMM register and the second source operand is an XMM register or a 64-bit memory location. VCOMISD The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. 
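The flag settings in the (V)COMISD comparison table above can be summarized in a small software model. The sketch below is only illustrative: `comisd_flags` is a hypothetical name, Python floats stand in for the double-precision operands, and the case in which an unmasked #XF exception leaves rFLAGS unmodified is not modeled.

```python
import math


def comisd_flags(operand1: float, operand2: float) -> dict:
    """Model the rFLAGS bits written by an ordered scalar compare."""
    if math.isnan(operand1) or math.isnan(operand2):
        zf, pf, cf = 1, 1, 1      # unordered: one or both operands are NaN
    elif operand1 > operand2:
        zf, pf, cf = 0, 0, 0
    elif operand1 < operand2:
        zf, pf, cf = 0, 0, 1
    else:
        zf, pf, cf = 1, 0, 0      # operands compare equal
    # OF, AF, and SF are cleared regardless of the comparison result.
    return {"ZF": zf, "PF": pf, "CF": cf, "OF": 0, "AF": 0, "SF": 0}
```

A caller of such a model would test PF first to detect the unordered case, because ZF and CF alone cannot distinguish a NaN input from the equal and less-than results.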
Instruction Support Form Subset COMISD SSE2 VCOMISD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic COMISD xmm1, xmm2/mem64 Opcode Description 66 0F 2F /r Compares double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem64 and sets rFLAGS. Mnemonic VCOMISD xmm1, xmm2 /mem64 Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.src.X.01 2F /r Related Instructions (V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISS, (V)UCOMISD, (V)UCOMISS Instruction Reference COMISD, VCOMISD 79 AMD64 Technology 26568—Rev. 3.22—May 2018 rFLAGS Affected ID VIP VIF AC VM RF NT IOPL OF DF IF TF SF ZF AF PF CF 0 M 0 M M 7 6 4 2 0 DE IE M M 1 0 0 21 Note: 20 19 18 17 16 14 13 12 11 10 9 8 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Bits 31:22, 15, 5, 3, and 1 are reserved. For #XF, rFLAGS bits are not updated. MXCSR Flags Affected MM 17 Note: 80 FZ 15 RC 14 PM 13 12 UM 11 OM 10 ZM 9 DM 8 IM DAZ 7 6 PE 5 UE 4 OE 3 ZE 2 M indicates a flag that may be modified (set or cleared). Unaffected flags are blank. COMISD, VCOMISD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. 
Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      X   A source operand was an SNaN value.
                           X   Undefined operation.
Denormalized operand, DE   X   A source operand was a denormal value.
X — AVX and SSE exception A — AVX exception S — SSE exception

COMISS VCOMISS Compare Ordered Scalar Single-Precision Floating-Point

Compares a single-precision floating-point value in the low-order 32 bits of the first operand with a single-precision floating-point value in the low-order 32 bits of the second operand and sets rFLAGS.ZF, PF, and CF to show the result of the comparison:

Comparison               ZF   PF   CF
NaN input                1    1    1
operand 1 > operand 2    0    0    0
operand 1 < operand 2    0    0    1
operand 1 == operand 2   1    0    0

The result is unordered if one or both of the operand values is a NaN. The rFLAGS.OF, AF, and SF bits are cleared. If an #XF SIMD floating-point exception occurs, the rFLAGS bits are not updated.

There are legacy and extended forms of the instruction:

COMISS
The first source operand is an XMM register and the second source operand is an XMM register or a 32-bit memory location.

VCOMISS
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location.
Instruction Support Form Subset COMISS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) Feature Flag VCOMISS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode COMISS xmm1, xmm2/mem32 0F 2F /r Description Compares single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem32 and sets rFLAGS. Mnemonic VCOMISS xmm1, xmm2 /mem32 Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.src.X.00 2F /r Related Instructions (V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)UCOMISD, (V)UCOMISS 82 COMISS, VCOMISS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology rFLAGS Affected ID VIP VIF AC VM RF NT IOPL OF DF IF TF SF ZF AF PF CF 0 M 0 M M 7 6 4 2 0 DE IE M M 1 0 0 21 Note: 20 19 18 17 16 14 13 12 11 10 9 8 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Bits 31:22, 15, 5, 3, and 1 are reserved. For #XF, rFLAGS bits are not updated. MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM DAZ 7 6 PE 5 UE 4 OE 3 ZE 2 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. COMISS, VCOMISS 83 AMD64 Technology 26568—Rev. 3.22—May 2018 CVTDQ2PD VCVTDQ2PD Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point Converts packed 32-bit signed integer values to packed double-precision floating-point values and writes the converted values to the destination. There are legacy and extended forms of the instruction: CVTDQ2PD Converts two packed 32-bit signed integer values in the low-order 64 bits of an XMM register or in a 64-bit memory location to two packed double-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VCVTDQ2PD The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Converts two packed 32-bit signed integer values in the low-order 64 bits of an XMM register or in a 64-bit memory location to two packed double-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
YMM Encoding Converts four packed 32-bit signed integer values in the low-order 128 bits of a YMM register or a 256-bit memory location to four packed double-precision floating-point values and writes the converted values to a YMM register. Instruction Support Form Subset CVTDQ2PD SSE2 VCVTDQ2PD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic CVTDQ2PD xmm1, xmm2/mem64 Opcode Description F3 0F E6 /r Converts packed doubleword signed integers in xmm2 or mem64 to double-precision floating-point values in xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VCVTDQ2PD xmm1, xmm2/mem64 C4 RXB.00001 X.1111.0.10 E6 /r VCVTDQ2PD ymm1, ymm2/mem256 C4 RXB.00001 X.1111.1.10 E6 /r 84 CVTDQ2PD, VCVTDQ2PD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTPD2DQ, (V)CVTTSD2SI rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S S X S S A A A A X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. 
Instruction execution caused a page fault. Unaligned memory reference with alignment checking enabled.

CVTDQ2PS VCVTDQ2PS Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point

Converts packed 32-bit signed integer values to packed single-precision floating-point values and writes the converted values to the destination. When the result is an inexact value, it is rounded as specified by MXCSR.RC.

There are legacy and extended forms of the instruction:

CVTDQ2PS
Converts four packed 32-bit signed integer values in an XMM register or a 128-bit memory location to four packed single-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VCVTDQ2PS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Converts four packed 32-bit signed integer values in an XMM register or a 128-bit memory location to four packed single-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Converts eight packed 32-bit signed integer values in a YMM register or a 256-bit memory location to eight packed single-precision floating-point values and writes the converted values to a YMM register.

Instruction Support

Form        Subset   Feature Flag
CVTDQ2PS    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTDQ2PS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode     Description
CVTDQ2PS xmm1, xmm2/mem128   0F 5B /r   Converts packed doubleword integer values in xmm2 or mem128 to packed single-precision floating-point values in xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VCVTDQ2PS xmm1, xmm2/mem128 C4 RXB.00001 X.1111.0.00 5B /r VCVTDQ2PS ymm1, ymm2/mem256 C4 RXB.00001 X.1111.1.00 5B /r Related Instructions (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTPS2DQ, (V)CVTTSS2SI 86 CVTDQ2PS, VCVTDQ2PS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology rFLAGS Affected None MXCSR Flags Affected MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE IE 4 3 2 1 0 M 17 Note: 15 14 13 12 11 10 9 8 7 6 5 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. 
SIMD Floating-Point Exceptions
Precision, PE   S S X   A result could not be represented exactly in the destination format.
X — AVX and SSE exception A — AVX exception S — SSE exception

CVTPD2DQ VCVTPD2DQ Convert Packed Double-Precision Floating-Point to Packed Doubleword Integer

Converts packed double-precision floating-point values to packed signed doubleword integers and writes the converted values to the destination.

When the result is an inexact value, it is rounded as specified by MXCSR.RC. When the floating-point value is a NaN or an infinity, or the result of the conversion falls outside the signed doubleword range (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

There are legacy and extended forms of the instruction:

CVTPD2DQ
Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed signed doubleword integers and writes the converted values to the two low-order doublewords of the destination XMM register. Bits [127:64] of the destination are cleared. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VCVTPD2DQ
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two signed doubleword values and writes the converted values to the lower two doubleword elements of the destination XMM register. Bits [127:64] of the destination are cleared. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Converts four packed double-precision floating-point values in a YMM register or a 256-bit memory location to four signed doubleword values and writes the converted values to an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared. Instruction Support Form Subset CVTPD2DQ SSE2 VCVTPD2DQ AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic CVTPD2DQ xmm1, xmm2/mem128 Opcode F2 0F E6 /r Description Converts two packed double-precision floating-point values in xmm2 or mem128 to packed doubleword integers in xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VCVTPD2DQ xmm1, xmm2/mem128 C4 RXB.00001 X.1111.0.11 E6 /r VCVTPD2DQ xmm1, ymm2/mem256 C4 RXB.00001 X.1111.1.11 E6 /r 88 CVTPD2DQ, VCVTPD2DQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)CVTDQ2PD, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTPD2DQ, (V)CVTTSD2SI rFLAGS Affected None MXCSR Flags Affected MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE M 17 Note: 15 14 13 12 11 10 9 8 7 6 5 IE M 4 3 2 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. 
Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S S S S S S X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format. CVTPD2DQ, VCVTPD2DQ 89 AMD64 Technology CVTPD2PS VCVTPD2PS 26568—Rev. 3.22—May 2018 Convert Packed Double-Precision Floating-Point to Packed Single-Precision Floating-Point Converts packed double-precision floating-point values to packed single-precision floating-point values and writes the converted values to the low-order doubleword elements of the destination. When the result is an inexact value, it is rounded as specified by MXCSR.RC. There are legacy and extended forms of the instruction: CVTPD2PS Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed single-precision floating-point values and writes the converted values to an XMM register. Bits [127:64] of the destination are cleared. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VCVTPD2PS The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed single-precision floating-point values and writes the converted values to an XMM register. Bits [127:64] of the destination are cleared. 
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Converts four packed double-precision floating-point values in a YMM register or a 256-bit memory location to four packed single-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form        Subset   Feature Flag
CVTPD2PS    SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPD2PS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode        Description
CVTPD2PS xmm1, xmm2/mem128   66 0F 5A /r   Converts packed double-precision floating-point values in xmm2 or mem128 to packed single-precision floating-point values in xmm1.

Mnemonic                      Encoding
                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VCVTPD2PS xmm1, xmm2/mem128   C4    RXB.00001        X.1111.0.01   5A /r
VCVTPD2PS xmm1, ymm2/mem256   C4    RXB.00001        X.1111.1.01   5A /r

Related Instructions
(V)CVTPS2PD, (V)CVTSD2SS, (V)CVTSS2SD

rFLAGS Affected
None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format.
X: AVX and SSE exception; A: AVX exception; S: SSE exception

CVTPS2DQ
VCVTPS2DQ
Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers

Converts packed single-precision floating-point values to packed signed doubleword integer values and writes the converted values to the destination.

When the result is an inexact value, it is rounded as specified by MXCSR.RC. When the floating-point value is a NaN or infinity, or the result of the conversion falls outside the range of a signed doubleword (–2^31 to +2^31 – 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.
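As a minimal sketch of the rounding and indefinite-value behavior described above, the following C fragment uses the SSE2 intrinsic _mm_cvtps_epi32, which compilers typically lower to CVTPS2DQ (or to VCVTPS2DQ when AVX code generation is enabled). The helper name convert4_ps_to_dq is hypothetical, and the sketch assumes MXCSR holds its default value (MXCSR.RC = round to nearest, all SIMD floating-point exceptions masked).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: converts four floats to four signed doublewords.
 * Assumes the default MXCSR (RC = round to nearest, IE masked). */
void convert4_ps_to_dq(const float in[4], int32_t out[4])
{
    __m128  src = _mm_loadu_ps(in);         /* unaligned 128-bit load */
    __m128i dst = _mm_cvtps_epi32(src);     /* rounds as specified by MXCSR.RC */
    _mm_storeu_si128((__m128i *)out, dst);  /* unaligned 128-bit store */
}
```

Under those assumptions, the inputs 0.5, 1.5, 2.5, and -2.5 would convert to 0, 2, 2, and -2 (ties round to the even integer), and an out-of-range input such as 3.0e9 would produce the indefinite value 8000_0000h.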
There are legacy and extended forms of the instruction:

CVTPS2DQ
Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory location to four packed signed doubleword integer values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VCVTPS2DQ
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory location to four packed signed doubleword integer values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Converts eight packed single-precision floating-point values in a YMM register or a 256-bit memory location to eight packed signed doubleword integer values and writes the converted values to a YMM register.

Instruction Support
Form        Subset  Feature Flag
CVTPS2DQ    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPS2DQ   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                      Opcode       Description
CVTPS2DQ xmm1, xmm2/mem128    66 0F 5B /r  Converts four packed single-precision floating-point values in xmm2 or mem128 to four packed doubleword integers in xmm1.

                              Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTPS2DQ xmm1, xmm2/mem128   C4   RXB.00001       X.1111.0.01  5B /r
VCVTPS2DQ ymm1, ymm2/mem256   C4   RXB.00001       X.1111.1.01  5B /r

Related Instructions
(V)CVTDQ2PS, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTPS2DQ, (V)CVTTSS2SI

rFLAGS Affected
None

MXCSR Flags Affected
Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                               M                   M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Note: M indicates a flag that may be modified (set or cleared).
Blanks indicate flags that are not affected.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE Precision, PE X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format.
X: AVX and SSE exception; A: AVX exception; S: SSE exception

CVTPS2PD
VCVTPS2PD
Convert Packed Single-Precision Floating-Point to Packed Double-Precision Floating-Point

Converts packed single-precision floating-point values to packed double-precision floating-point values and writes the converted values to the destination.
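The widening conversion described above can be sketched in C with the SSE2 intrinsic _mm_cvtps_pd, which compilers typically lower to CVTPS2PD (or VCVTPS2PD under AVX code generation). The helper name widen2_ps_to_pd is hypothetical. Because every single-precision value is exactly representable in double precision, the sketch expects each result to equal an ordinary C cast of the same input.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: widens two floats to two doubles. */
void widen2_ps_to_pd(const float in[2], double out[2])
{
    /* Place the two inputs in the two low-order doubleword elements. */
    __m128  src = _mm_set_ps(0.0f, 0.0f, in[1], in[0]);
    __m128d dst = _mm_cvtps_pd(src);  /* converts elements 0 and 1 only */
    _mm_storeu_pd(out, dst);          /* unaligned 128-bit store */
}
```

Only the two low-order source elements take part in the 128-bit form; the upper two are ignored, which is why the sketch fills them with zero.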
There are legacy and extended forms of the instruction:

CVTPS2PD
Converts two packed single-precision floating-point values in the two low-order doubleword elements of an XMM register or a 64-bit memory location to two double-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VCVTPS2PD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Converts two packed single-precision floating-point values in the two low-order doubleword elements of an XMM register or a 64-bit memory location to two double-precision floating-point values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Converts four packed single-precision floating-point values in a YMM register or a 128-bit memory location to four double-precision floating-point values and writes the converted values to a YMM register.

Instruction Support
Form        Subset  Feature Flag
CVTPS2PD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTPS2PD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                      Opcode    Description
CVTPS2PD xmm1, xmm2/mem64     0F 5A /r  Converts packed single-precision floating-point values in xmm2 or mem64 to packed double-precision floating-point values in xmm1.

                              Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTPS2PD xmm1, xmm2/mem64    C4   RXB.00001       X.1111.0.00  5A /r
VCVTPS2PD ymm1, ymm2/mem128   C4   RXB.00001       X.1111.1.00  5A /r

Related Instructions
(V)CVTPD2PS, (V)CVTSD2SS, (V)CVTSS2SD

rFLAGS Affected
None

MXCSR Flags Affected
Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                                               M   M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE Denormalized operand, DE S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value.
X: AVX and SSE exception; A: AVX exception; S: SSE exception

CVTSD2SI
VCVTSD2SI
Convert Scalar Double-Precision Floating-Point to Signed Doubleword or Quadword Integer

Converts a scalar double-precision floating-point value to a 32-bit or 64-bit signed integer value and writes the converted value to a general-purpose register.

When the result is an inexact value, it is rounded as specified by MXCSR.RC. When the floating-point value is a NaN or infinity, or the result of the conversion falls outside the range of a signed doubleword (–2^31 to +2^31 – 1) or quadword (–2^63 to +2^63 – 1), the instruction returns the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.

There are legacy and extended forms of the instruction:

CVTSD2SI
The legacy form has two encodings:
• When REX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted value to a 32-bit general-purpose register.
• When REX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 64-bit sign-extended integer and writes the converted value to a 64-bit general-purpose register.

VCVTSD2SI
The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted value to a 32-bit general-purpose register.
• When VEX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 64-bit sign-extended integer and writes the converted value to a 64-bit general-purpose register.
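The difference between the 32-bit and 64-bit destination forms above can be sketched in C with the SSE2 intrinsics _mm_cvtsd_si32 and _mm_cvtsd_si64, which compilers typically lower to the W = 0 and W = 1 encodings respectively. The helper names sd_to_i32 and sd_to_i64 are hypothetical. The sketch assumes a 64-bit build (the 64-bit form is not available otherwise) and the default MXCSR (round to nearest, IE masked).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: 32-bit destination form (W = 0). */
int32_t sd_to_i32(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* Hypothetical helper: 64-bit destination form (W = 1); 64-bit mode only. */
int64_t sd_to_i64(double x)
{
    return (int64_t)_mm_cvtsd_si64(_mm_set_sd(x));
}
```

Under those assumptions, 5.0e9 does not fit in a signed doubleword, so sd_to_i32 would return the indefinite value 8000_0000h, while sd_to_i64 would return 5000000000 for the same input.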
Instruction Support
Form        Subset  Feature Flag
CVTSD2SI    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSD2SI   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                      Opcode             Description
CVTSD2SI reg32, xmm1/mem64    F2 (W0) 0F 2D /r   Converts a scalar double-precision floating-point value in xmm1 or mem64 to a doubleword integer in reg32.
CVTSD2SI reg64, xmm1/mem64    F2 (W1) 0F 2D /r   Converts a scalar double-precision floating-point value in xmm1 or mem64 to a quadword integer in reg64.

                              Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSD2SI reg32, xmm2/mem64   C4   RXB.00001       0.1111.X.11  2D /r
VCVTSD2SI reg64, xmm2/mem64   C4   RXB.00001       1.1111.X.11  2D /r

Related Instructions
(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSI2SD, (V)CVTTPD2DQ, (V)CVTTSD2SI

rFLAGS Affected
None

MXCSR Flags Affected
Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                               M                   M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE Precision, PE X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format.
X: AVX and SSE exception; A: AVX exception; S: SSE exception

CVTSD2SS
VCVTSD2SS
Convert Scalar Double-Precision Floating-Point to Scalar Single-Precision Floating-Point

Converts a scalar double-precision floating-point value to a scalar single-precision floating-point value and writes the converted value to the low-order 32 bits of the destination. When the result is an inexact value, it is rounded as specified by MXCSR.RC.

There are legacy and extended forms of the instruction:

CVTSD2SS
Converts a scalar double-precision floating-point value in the low-order 64 bits of the second source XMM register or a 64-bit memory location to a scalar single-precision floating-point value and writes the converted value to the low-order 32 bits of a destination XMM register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VCVTSD2SS
The extended form of the instruction has a 128-bit encoding only.
Converts a scalar double-precision floating-point value in the low-order 64 bits of a source XMM register or a 64-bit memory location to a scalar single-precision floating-point value and writes the converted value to the low-order 32 bits of the destination XMM register. Bits [127:32] of the destination are copied from the first source XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form        Subset  Feature Flag
CVTSD2SS    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSD2SS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                           Opcode       Description
CVTSD2SS xmm1, xmm2/mem64          F2 0F 5A /r  Converts a scalar double-precision floating-point value in xmm2 or mem64 to a scalar single-precision floating-point value in xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSD2SS xmm1, xmm2, xmm3/mem64   C4   RXB.00001       X.src.X.11   5A /r

Related Instructions
(V)CVTPD2PS, (V)CVTPS2PD, (V)CVTSS2SD

rFLAGS Affected
None

MXCSR Flags Affected
Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                               M   M   M       M   M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE S S S S S S S S S S S S X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format.
X: AVX and SSE exception; A: AVX exception; S: SSE exception

CVTSI2SD
VCVTSI2SD
Convert Signed Doubleword or Quadword Integer to Scalar Double-Precision Floating-Point

Converts a signed integer value to a double-precision floating-point value and writes the converted value to a destination register. When the result of the conversion is an inexact value, the value is rounded as specified by MXCSR.RC.

There are legacy and extended forms of the instruction:

CVTSI2SD
The legacy form has two encodings:
• When REX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose register or a 32-bit memory location to a double-precision floating-point value and writes the converted value to the low-order 64 bits of an XMM register.
Bits [127:64] of the destination XMM register and bits [255:128] of the corresponding YMM register are not affected.
• When REX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose register or a 64-bit memory location to a 64-bit double-precision floating-point value and writes the converted value to the low-order 64 bits of an XMM register. Bits [127:64] of the destination XMM register and bits [255:128] of the corresponding YMM register are not affected.

VCVTSI2SD
The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose register or a 32-bit memory location to a double-precision floating-point value and writes the converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the first source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
• When VEX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose register or a 64-bit memory location to a double-precision floating-point value and writes the converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the first source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form        Subset  Feature Flag
CVTSI2SD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSI2SD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Opcode             Description
CVTSI2SD xmm1, reg32/mem32          F2 (W0) 0F 2A /r   Converts a doubleword integer in reg32 or mem32 to a double-precision floating-point value in xmm1.
CVTSI2SD xmm1, reg64/mem64          F2 (W1) 0F 2A /r   Converts a quadword integer in reg64 or mem64 to a double-precision floating-point value in xmm1.

                                    Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSI2SD xmm1, xmm2, reg32/mem32   C4   RXB.00001       0.src.X.11   2A /r
VCVTSI2SD xmm1, xmm2, reg64/mem64   C4   RXB.00001       1.src.X.11   2A /r

Related Instructions
(V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTTPD2DQ, (V)CVTTSD2SI

rFLAGS Affected
None

MXCSR Flags Affected
Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                               M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Precision, PE S S X A result could not be represented exactly in the destination format.
X: AVX and SSE exception; A: AVX exception; S: SSE exception

CVTSI2SS
VCVTSI2SS
Convert Signed Doubleword or Quadword Integer to Scalar Single-Precision Floating-Point

Converts a signed integer value to a single-precision floating-point value and writes the converted value to an XMM register. When the result of the conversion is an inexact value, the value is rounded as specified by MXCSR.RC.

There are legacy and extended forms of the instruction:

CVTSI2SS
The legacy form has two encodings:
• When REX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose register or a 32-bit memory location to a single-precision floating-point value and writes the converted value to the low-order 32 bits of an XMM register. Bits [127:32] of the destination XMM register and bits [255:128] of the corresponding YMM register are not affected.
• When REX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose register or a 64-bit memory location to a single-precision floating-point value and writes the converted value to the low-order 32 bits of an XMM register. Bits [127:32] of the destination XMM register and bits [255:128] of the corresponding YMM register are not affected.

VCVTSI2SS
The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a signed doubleword integer value from a 32-bit source general-purpose register or a 32-bit memory location to a single-precision floating-point value and writes the converted value to the low-order 32 bits of the destination XMM register. Bits [127:32] of the first source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
• When VEX.W = 1, converts a signed quadword integer value from a 64-bit source general-purpose register or a 64-bit memory location to a single-precision floating-point value and writes the converted value to the low-order 32 bits of the destination XMM register. Bits [127:32] of the first source XMM register are copied to the destination XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form        Subset  Feature Flag
CVTSI2SS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCVTSI2SS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Opcode             Description
CVTSI2SS xmm1, reg32/mem32          F3 (W0) 0F 2A /r   Converts a doubleword integer in reg32 or mem32 to a single-precision floating-point value in xmm1.
CVTSI2SS xmm1, reg64/mem64          F3 (W1) 0F 2A /r   Converts a quadword integer in reg64 or mem64 to a single-precision floating-point value in xmm1.

                                    Encoding
Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSI2SS xmm1, xmm2, reg32/mem32   C4   RXB.00001       0.src.X.10   2A /r
VCVTSI2SS xmm1, xmm2, reg64/mem64   C4   RXB.00001       1.src.X.10   2A /r

Related Instructions
(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSS2SI, (V)CVTTPS2DQ, (V)CVTTSS2SI

rFLAGS Affected
None

MXCSR Flags Affected
Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                               M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Precision, PE S S X A result could not be represented exactly in the destination format.
X: AVX and SSE exception; A: AVX exception; S: SSE exception

CVTSS2SD
VCVTSS2SD
Convert Scalar Single-Precision Floating-Point to Scalar Double-Precision Floating-Point

Converts a scalar single-precision floating-point value to a scalar double-precision floating-point value and writes the converted value to the low-order 64 bits of the destination.
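As a rough illustration of the scalar conversion described above, the following C sketch uses the SSE2 intrinsic _mm_cvtss_sd, which compilers typically lower to CVTSS2SD (or VCVTSS2SD under AVX code generation). The helper name ss_to_sd_merge is hypothetical. The intrinsic's first argument supplies the upper 64 bits of the result, so only the low-order 64 bits receive the converted value.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: out[0] = (double)b0, out[1] = a[1]. */
void ss_to_sd_merge(const double a[2], float b0, double out[2])
{
    __m128d upper = _mm_loadu_pd(a);           /* supplies bits [127:64] of the result */
    __m128  src   = _mm_set_ss(b0);            /* scalar source in the low-order 32 bits */
    __m128d dst   = _mm_cvtss_sd(upper, src);  /* converts only the low-order element */
    _mm_storeu_pd(out, dst);
}
```

For example, calling the helper with a = {1.0, 2.0} and b0 = 0.5f would be expected to produce {0.5, 2.0}: the low element is the widened input and the high element passes through unchanged.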
There are legacy and extended forms of the instruction:

CVTSS2SD
Converts a scalar single-precision floating-point value in the low-order 32 bits of a source XMM register or a 32-bit memory location to a scalar double-precision floating-point value and writes the converted value to the low-order 64 bits of a destination XMM register. Bits [127:64] of the destination and bits [255:128] of the corresponding YMM register are not affected.

VCVTSS2SD
The extended form of the instruction has a 128-bit encoding only. Converts a scalar single-precision floating-point value in the low-order 32 bits of the second source XMM register or a 32-bit memory location to a scalar double-precision floating-point value and writes the converted value to the low-order 64 bits of the destination XMM register. Bits [127:64] of the destination are copied from the first source XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form        Subset  Feature Flag
CVTSS2SD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTSS2SD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                           Opcode       Description
CVTSS2SD xmm1, xmm2/mem32          F3 0F 5A /r  Converts a scalar single-precision floating-point value in xmm2 or mem32 to a scalar double-precision floating-point value in xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTSS2SD xmm1, xmm2, xmm3/mem32   C4   RXB.00001       X.src.X.10   5A /r

Related Instructions
(V)CVTPD2PS, (V)CVTPS2PD, (V)CVTSD2SS

MXCSR Flags Affected
Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                                               M   M
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.
Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE Denormalized operand, DE X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value.
X: AVX and SSE exception; A: AVX exception; S: SSE exception

CVTSS2SI
VCVTSS2SI
Convert Scalar Single-Precision Floating-Point to Signed Doubleword or Quadword Integer

Converts a single-precision floating-point value to a signed integer value and writes the converted value to a general-purpose register. When the result of the conversion is an inexact value, the value is rounded as specified by MXCSR.RC.
When the floating-point value is a NaN or infinity, or when the result of the conversion falls outside the range of a signed doubleword (-2^31 to +2^31 - 1) or signed quadword (-2^63 to +2^63 - 1), the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) is returned when the invalid-operation exception (IE) is masked.

There are legacy and extended forms of the instruction:

CVTSS2SI
The legacy form has two encodings:
• When REX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the converted value to a 32-bit general-purpose register.
• When REX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the converted value to a 64-bit general-purpose register.

VCVTSS2SI
The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the converted value to a 32-bit general-purpose register.
• When VEX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the converted value to a 64-bit general-purpose register.

Instruction Support
Form         Subset   Feature Flag
CVTSS2SI     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VCVTSS2SI    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Reference CVTSS2SI, VCVTSS2SI 109 AMD64 Technology 26568—Rev.
3.22—May 2018 Instruction Encoding Mnemonic Opcode Description CVTSS2SI reg32, xmm1/mem32 F3 (W0) 0F 2D /r Converts a single-precision floating-point value in xmm1 or mem32 to a 32-bit integer value in reg32 CVTSS2SI reg64, xmm1//mem64 F3 (W1) 0F 2D /r Converts a single-precision floating-point value in xmm1 or mem64 to a 64-bit integer value in reg64 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VCVTSS2SI reg32, xmm1/mem32 C4 RXB.00001 0.1111.X.10 2D /r VCVTSS2SI reg64, xmm1/mem64 C4 RXB.00001 1.1111.X.10 2D /r Related Instructions (V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTTPS2DQ, (V)CVTTSS2SI MXCSR Flags Affected MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE M 17 Note: 110 15 14 13 12 11 10 9 8 7 6 5 IE M 4 3 2 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. CVTSS2SI, VCVTSS2SI Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE    X    A source operand was an SNaN value.
                         X    Undefined operation.
Precision, PE            X    A result could not be represented exactly in the destination format.
X: AVX and SSE exception
A: AVX exception
S: SSE exception

CVTTPD2DQ
VCVTTPD2DQ
Convert Packed Double-Precision Floating-Point to Packed Doubleword Integer, Truncated

Converts packed double-precision floating-point values to packed signed doubleword integer values and writes the converted values to the destination. When the result is an inexact value, it is truncated (rounded toward zero). When the floating-point value is a NaN or infinity, or when the result of the conversion falls outside the range of a signed doubleword (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

There are legacy and extended forms of the instruction:

CVTTPD2DQ
Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed signed doubleword integers and writes the converted values to the two low-order doublewords of the destination XMM register. Bits [127:64] of the destination are cleared. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VCVTTPD2DQ
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two signed doubleword values and writes the converted values to the lower two doubleword elements of the destination XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding Converts four packed double-precision floating-point values in a YMM register or a 256-bit memory location to four signed doubleword integer values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Instruction Support Form Subset CVTTPD2DQ SSE2 VCVTTPD2DQ AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 112 CVTTPD2DQ, VCVTTPD2DQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic Opcode CVTTPD2DQ xmm1, xmm2/mem128 Description 66 0F E6 /r Converts two packed double-precision floating-point values in xmm2 or mem128 to packed doubleword integers in xmm1. Truncates inexact result. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VCVTTPD2DQ xmm1, xmm2/mem128 C4 RXB.00001 X.1111.0.01 E6 /r VCVTTPD2DQ xmm1, ymm2/mem256 C4 RXB.00001 X.1111.1.01 E6 /r Related Instructions (V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTSD2SI MXCSR Flags Affected MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE M 17 Note: 15 14 13 12 11 10 9 8 7 6 5 IE M 4 3 2 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Instruction Reference CVTTPD2DQ, VCVTTPD2DQ 113 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
XFEATURE_ENABLED_MASK[2:1] ! = 11b VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 114 S S S S S S X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format. CVTTPD2DQ, VCVTTPD2DQ Instruction Reference 26568—Rev. 3.22—May 2018 CVTTPS2DQ VCVTTPS2DQ AMD64 Technology Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers, Truncated Converts packed single-precision floating-point values to packed signed doubleword integer values and writes the converted values to the destination. When the result of the conversion is an inexact value, the value is truncated (rounded toward zero). When the floating-point value is a NaN, infinity, or the result of the conversion is larger than the maximum signed doubleword (–231 to +231 – 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked. 
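The truncation and indefinite-value rules above can be sketched in a few lines of Python. This is a simplified software model rather than a description of the hardware: the function names are hypothetical, inputs are ordinary Python floats assumed to hold values representable in single precision, IE is assumed to be masked, and MXCSR flag updates are not modeled.

```python
import math

INDEFINITE_32 = 0x80000000  # 32-bit indefinite integer value (8000_0000h)


def cvtt_to_int32(value):
    """Truncate one floating-point value to a signed doubleword (raw 32 bits)."""
    if math.isnan(value) or math.isinf(value):
        return INDEFINITE_32
    truncated = math.trunc(value)  # rounds toward zero
    if not -2**31 <= truncated <= 2**31 - 1:
        return INDEFINITE_32  # result not representable as a signed doubleword
    return truncated & 0xFFFFFFFF  # two's-complement bit pattern


def cvttps2dq(values):
    """Apply the truncating conversion to each packed element."""
    return [cvtt_to_int32(v) for v in values]
```

Note that a source value of exactly -2^31 converts to the bit pattern 8000_0000h as a valid result, so software cannot distinguish that case from the indefinite value by inspecting the result alone.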
There are legacy and extended forms of the instruction:

CVTTPS2DQ
Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory location to four packed signed doubleword integer values and writes the converted values to an XMM register. The high-order 128 bits of the corresponding YMM register are not affected.

VCVTTPS2DQ
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory location to four packed signed doubleword integer values and writes the converted values to an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Converts eight packed single-precision floating-point values in a YMM register or a 256-bit memory location to eight packed signed doubleword integer values and writes the converted values to a YMM register.

Instruction Support
Form          Subset   Feature Flag
CVTTPS2DQ     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTTPS2DQ    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                       Opcode        Description
CVTTPS2DQ xmm1, xmm2/mem128    F3 0F 5B /r   Converts four packed single-precision floating-point values in xmm2 or mem128 to four packed doubleword integers in xmm1. Truncates inexact result.

Mnemonic                        VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VCVTTPS2DQ xmm1, xmm2/mem128    C4    RXB.00001        X.1111.0.10    5B /r
VCVTTPS2DQ ymm1, ymm2/mem256    C4    RXB.00001        X.1111.1.10    5B /r

Related Instructions
(V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTSS2SI

MXCSR Flags Affected
Flags: MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE IE
Bits:  17 15 14:13 12 11 10 9 8 7 6 5 4 3 2 1 0
M:     PE, IE
Note: M indicates a flag that may be modified (set or cleared).
Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 116 S S S S S S X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format. CVTTPS2DQ, VCVTTPS2DQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology CVTTSD2SI Convert Scalar Double-Precision Floating-Point VCVTTSD2SI to Signed Double- or Quadword Integer, Truncated Converts a scalar double-precision floating-point value to a signed integer value and writes the converted value to a general-purpose register. 
When the result of the conversion is an inexact value, the value is truncated (rounded toward zero). When the floating-point value is a NaN or infinity, or when the result of the conversion falls outside the range of a signed doubleword (-2^31 to +2^31 - 1) or signed quadword (-2^63 to +2^63 - 1), the instruction returns the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.

There are legacy and extended forms of the instruction:

CVTTSD2SI
The legacy form of the instruction has two encodings:
• When REX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted value to a 32-bit general-purpose register.
• When REX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 64-bit signed integer and writes the converted value to a 64-bit general-purpose register.

VCVTTSD2SI
The extended form of the instruction has two 128-bit encodings:
• When VEX.W = 0, converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 32-bit signed integer and writes the converted value to a 32-bit general-purpose register.
• When VEX.W = 1, converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 64-bit signed integer and writes the converted value to a 64-bit general-purpose register.

Instruction Support
Form          Subset   Feature Flag
CVTTSD2SI     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VCVTTSD2SI    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Reference CVTTSD2SI, VCVTTSD2SI 117 AMD64 Technology 26568—Rev.
3.22—May 2018 Instruction Encoding Mnemonic CVTTSD2SI reg32, xmm1/mem64 CVTTSD2SI reg64, xmm1/mem64 Opcode Description F2 (W0) 0F 2C /r Converts a packed double-precision floating-point value in xmm1 or mem64 to a doubleword integer in reg32. Truncates inexact result. F2 (W1) 0F 2C /r Converts a packed double-precision floating-point value in xmm1 or mem64 to a quadword integer in reg64.Truncates inexact result. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp C4 RXB.00001 0.1111.X.11 C4 RXB.00001 1.1111.X.11 VCVTTSD2SI reg32, xmm2/mem64 VCVTTSD2SI reg64, xmm2/mem64 Opcode 2C /r 2C /r Related Instructions (V)CVTDQ2PD, (V)CVTPD2DQ, (V)CVTPI2PD, (V)CVTSD2SI, (V)CVTSI2SD, (V)CVTTPD2DQ MXCSR Flags Affected MM FZ 17 15 RC PM UM OM ZM DM IM DAZ 12 11 10 9 8 7 6 PE UE OE ZE DE 4 3 2 1 M Note: 118 14 13 5 IE M 0 A flag that may be set or cleared is M (modified). Unaffected flags are blank. CVTTSD2SI, VCVTTSD2SI Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. 
Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format. CVTTSD2SI, VCVTTSD2SI 119 AMD64 Technology 26568—Rev. 3.22—May 2018 CVTTSS2SI Convert Scalar Single-Precision Floating-Point VCVTTSS2SI to Signed Double or Quadword Integer, Truncated Converts a single-precision floating-point value to a signed integer value and writes the converted value to a general-purpose register. When the result of the conversion is an inexact value, the value is truncated (rounded toward zero). When the floating-point value is a NaN, infinity, or the result of the conversion is larger than the maximum signed doubleword (–231 to +231 – 1) or quadword value (–263 to +263 – 1), the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) is returned when the invalid-operation exception (IE) is masked. There are legacy and extended forms of the instruction: CVTTSS2SI The legacy form of the instruction has two encodings: • When REX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the converted value to a 32-bit general-purpose register. Bits [255:128] of the YMM register that corresponds to the source are not affected. • When REX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the converted value to a 64-bit general-purpose register. 
Bits [255:128] of the YMM register that corresponds to the source are not affected. VCVTTSS2SI The extended form of the instruction has two 128-bit encodings: • When VEX.W = 0, converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 32-bit signed integer value and writes the converted value to a 32-bit general-purpose register. Bits [255:128] of the YMM register that corresponds to the source are cleared. • When VEX.W = 1, converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 64-bit signed integer value and writes the converted value to a 64-bit general-purpose register. Bits [255:128] of the YMM register that corresponds to the source are cleared. Instruction Support Form Subset Feature Flag CVTTSS2SI SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VCVTTSS2SI AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 120 CVTTSS2SI, VCVTTSS2SI Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic Opcode Description CVTTSS2SI reg32, xmm1/mem32 F3 (W0) 0F 2C /r Converts a single-precision floating-point value in xmm1 or mem32 to a 32-bit integer value in reg32. Truncates inexact result. CVTTSS2SI reg64, xmm1/mem64 F3 (W1) 0F 2C /r Converts a single-precision floating-point value in xmm1 or mem64 to a 64-bit integer value in reg64. Truncates inexact result. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VCVTTSS2SI reg32, xmm1/mem32 C4 RXB.00001 0.1111.X.10 2C /r VCVTTSS2SI reg64, xmm1/mem64 C4 RXB.00001 1.1111.X.10 2C /r Related Instructions (V)CVTDQ2PS, (V)CVTPS2DQ, (V)CVTSI2SS, (V)CVTSS2SI, (V)CVTTPS2DQ MXCSR Flags Affected MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE M 17 Note: 15 14 13 12 11 10 9 8 7 6 5 IE M 4 3 2 1 0 M indicates a flag that may be modified (set or cleared). 
Blanks indicate flags that are not affected. Instruction Reference CVTTSS2SI, VCVTTSS2SI 121 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 122 X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format. CVTTSS2SI, VCVTTSS2SI Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology DIVPD VDIVPD Divide Packed Double-Precision Floating-Point Divides each of the packed double-precision floating-point values of the first source operand by the corresponding packed double-precision floating-point values of the second source operand and writes the quotients to the destination. 
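As a rough illustration of the element-wise division described above, the following Python sketch divides corresponding elements and reports which of the IE and ZE conditions would arise. It is a simplified model under stated assumptions, not AMD reference code: the function names are hypothetical, all exceptions are assumed masked, NaN operands are treated as quiet, and the DE, OE, UE, and PE conditions are not modeled.

```python
import math


def div_f64(a, b):
    """Return (quotient, flags) for one double-precision element."""
    flags = set()
    if math.isnan(a) or math.isnan(b):
        # NaN operands propagate; SNaN signaling is not modeled here.
        return math.nan, flags
    if (a == 0.0 and b == 0.0) or (math.isinf(a) and math.isinf(b)):
        flags.add("IE")  # 0/0 and infinity/infinity are undefined operations
        return math.nan, flags
    if b == 0.0:
        if not math.isinf(a):
            flags.add("ZE")  # finite dividend divided by a zero-value divisor
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign), flags
    return a / b, flags


def divpd(src1, src2):
    """Divide each element of src1 by the corresponding element of src2."""
    results, flags = [], set()
    for a, b in zip(src1, src2):
        quotient, element_flags = div_f64(a, b)
        results.append(quotient)
        flags |= element_flags
    return results, flags
```

The explicit zero-divisor branch is needed because Python raises `ZeroDivisionError` for float division by zero, whereas the masked hardware response is a signed infinity.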
There are legacy and extended forms of the instruction:

DIVPD
Divides two packed double-precision floating-point values in the first source XMM register by the corresponding packed double-precision floating-point values in either a second source XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VDIVPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Divides two packed double-precision floating-point values in the first source XMM register by the corresponding packed double-precision floating-point values in either a second source XMM register or a 128-bit memory location and writes the two results to a destination XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Divides four packed double-precision floating-point values in the first source YMM register by the corresponding packed double-precision floating-point values in either a second source YMM register or a 256-bit memory location and writes the four results to a destination YMM register.

Instruction Support
Form      Subset   Feature Flag
DIVPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VDIVPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                   Opcode        Description
DIVPD xmm1, xmm2/mem128    66 0F 5E /r   Divides packed double-precision floating-point values in xmm1 by the packed double-precision floating-point values in xmm2 or mem128. Writes quotients to xmm1.

Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDIVPD xmm1, xmm2, xmm3/mem128    C4    RXB.00001        X.src.0.01    5E /r
VDIVPD ymm1, ymm2, ymm3/mem256    C4    RXB.00001        X.src.1.01    5E /r

Instruction Reference DIVPD, VDIVPD 123 AMD64 Technology 26568—Rev.
3.22—May 2018 Related Instructions (V)DIVPS, (V)DIVSD, (V)DIVSS MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE ZE DE IE M M M M M M 5 4 3 2 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Division by zero, ZE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 124 X X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Division of finite dividend by zero-value divisor. 
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

DIVPS
VDIVPS
Divide Packed Single-Precision Floating-Point

Divides each of the packed single-precision floating-point values of the first source operand by the corresponding packed single-precision floating-point values of the second source operand and writes the quotients to the destination.

There are legacy and extended forms of the instruction:

DIVPS
Divides four packed single-precision floating-point values in the first source XMM register by the corresponding packed single-precision floating-point values in either a second source XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VDIVPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Divides four packed single-precision floating-point values in the first source XMM register by the corresponding packed single-precision floating-point values in either a second source XMM register or a 128-bit memory location and writes the four results to a destination XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Divides eight packed single-precision floating-point values in the first source YMM register by the corresponding packed single-precision floating-point values in either a second source YMM register or a 256-bit memory location and writes the eight results to a destination YMM register.
Instruction Support Form Subset Feature Flag DIVPS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VDIVPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode DIVPS xmm1, xmm2/mem128 0F 5E /r Description Divides packed single-precision floating-point values in xmm1 by the corresponding values in xmm2 or mem128. Writes quotients to xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VDIVPS xmm1, xmm2, xmm3/mem128 C4 RXB.00001 X.src.0.00 5E /r VDIVPS ymm1, ymm2, ymm3/mem256 C4 RXB.00001 X.src.1.00 5E /r Instruction Reference DIVPS, VDIVPS 125 AMD64 Technology 26568—Rev. 3.22—May 2018 Related Instructions (V)DIVPD, (V)DIVSD, (V)DIVSS MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE ZE DE IE M M M M M M 5 4 3 2 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. 
Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Division by zero, ZE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 126 X X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Division of finite dividend by zero-value divisor. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. DIVPS, VDIVPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology DIVSD VDIVSD Divide Scalar Double-Precision Floating-Point Divides the double-precision floating-point value in the low-order quadword of the first source operand by the double-precision floating-point value in the low-order quadword of the second source operand and writes the quotient to the low-order quadword of the destination. There are legacy and extended forms of the instruction: DIVSD The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination register. Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VDIVSD The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. 
Bits [127:64] of the first source operand are copied to bits [127:64] of the destination. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset   Feature Flag
DIVSD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VDIVSD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode        Description
DIVSD xmm1, xmm2/mem64    F2 0F 5E /r   Divides the double-precision floating-point value in the low-order 64 bits of xmm1 by the corresponding value in xmm2 or mem64. Writes the quotient to xmm1.

                                 Encoding
Mnemonic                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDIVSD xmm1, xmm2, xmm3/mem64    C4    RXB.00001        X.src.X.11    5E /r

Related Instructions

(V)DIVPD, (V)DIVPS, (V)DIVSS

MXCSR Flags Affected

Flag       MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
Bit        17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0
Affected                                                         M    M    M    M    M    M

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE      X   A source operand was an SNaN value.
                           X   Undefined operation.
Denormalized operand, DE   X   A source operand was a denormal value.
Division by zero, ZE       X   Division of finite dividend by zero-value divisor.
Overflow, OE               X   Rounded result too large to fit into the format of the destination operand.
Underflow, UE              X   Rounded result too small to fit into the format of the destination operand.
Precision, PE              X   A result could not be represented exactly in the destination format.

X: AVX and SSE exception. A: AVX exception. S: SSE exception.

DIVSS VDIVSS Divide Scalar Single-Precision Floating-Point

Divides the single-precision floating-point value in the low-order doubleword of the first source operand by the single-precision floating-point value in the low-order doubleword of the second source operand and writes the quotient to the low-order doubleword of the destination.

There are legacy and extended forms of the instruction:

DIVSS

The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VDIVSS

The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location.
The destination is a third XMM register. Bits [127:32] of the first source operand are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset   Feature Flag
DIVSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VDIVSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode        Description
DIVSS xmm1, xmm2/mem32    F3 0F 5E /r   Divides a single-precision floating-point value in the low-order doubleword of xmm1 by a corresponding value in xmm2 or mem32. Writes the quotient to xmm1.

                                 Encoding
Mnemonic                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDIVSS xmm1, xmm2, xmm3/mem32    C4    RXB.00001        X.src.X.10    5E /r

Related Instructions

(V)DIVPD, (V)DIVPS, (V)DIVSD

MXCSR Flags Affected

Flag       MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
Bit        17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0
Affected                                                         M    M    M    M    M    M

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Division by zero, ZE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 130 X X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Division of finite dividend by zero-value divisor. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. DIVSS, VDIVSS Instruction Reference 26568—Rev. 3.22—May 2018 DPPD VDPPD AMD64 Technology Dot Product Packed Double-Precision Floating-Point Computes the dot-product of the input operands. An immediate operand specifies both the input values and the destination locations to which the products are written. Selectively multiplies packed double-precision values in a source operand by the corresponding values in a second source operand, writes the results to a temporary location, adds the results, writes the sum to a second temporary location and selectively writes the sum to a destination. Mask bits [5:4] of an 8-bit immediate operand perform multiplicative selection. Bit 5 selects bits [127:64] of the source operands; bit 4 selects bits [63:0] of the source operands. When a mask bit = 1, the corresponding packed double-precision floating point values are multiplied and the product is written to the corresponding position of a 128-bit temporary location. 
When a mask bit = 0, the corresponding position of the temporary location is cleared.

After the two 64-bit values in the first temporary location are added and written to the 64-bit second temporary location, mask bits [1:0] of the same 8-bit immediate operand perform write selection. Bit 1 selects bits [127:64] of the destination; bit 0 selects bits [63:0] of the destination. When a mask bit = 1, the 64-bit value of the second temporary location is written to the corresponding position of the destination. When a mask bit = 0, the corresponding position of the destination is cleared.

When the operation produces a NaN, its value is determined as follows.

Source Operands (in either order)                                           NaN Result (1)
QNaN   Any non-NaN floating-point value (or single-operand instruction)    Value of QNaN
SNaN   Any non-NaN floating-point value (or single-operand instruction)    Value of SNaN, converted to a QNaN (2)
QNaN   QNaN                                                                 First operand
QNaN   SNaN                                                                 First operand (converted to QNaN if SNaN)
SNaN   SNaN                                                                 First operand converted to a QNaN (2)

Note:
1. A NaN result produced when the floating-point invalid-operation exception is masked.
2. The conversion is done by changing the most-significant fraction bit to 1.

For each addition occurring in either the second or third step, for the purpose of NaN propagation, the addend of lower bit index is considered to be the first of the two operands. For example, when both multiplications produce NaNs, the one that corresponds to bits [63:0] is written to all indicated fields of the destination, regardless of how those NaNs were generated from the sources. When the high-order multiplication produces a NaN and the low-order multiplication produces infinities of opposite signs, the real indefinite QNaN (produced as the sum of the infinities) is written to the destination. NaNs in source operands or in computational results result in at least one NaN in the destination.
For the 256-bit version, NaNs are propagated within the two independent dot product operations only to their respective 128-bit results. Instruction Reference DPPD, VDPPD 131 AMD64 Technology 26568—Rev. 3.22—May 2018 There are legacy and extended forms of the instruction: DPPD The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VDPPD The extended form of the instruction has a single 128-bit encoding. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Instruction Support Form Subset DPPD SSE4.1 VDPPD AVX Feature Flag CPUID Fn0000_0001_ECX[SSE41] (bit 19) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic DPPD xmm1, xmm2/mem128, imm8 Opcode Description 66 0F 3A 41 /r ib Selectively multiplies packed double-precision floating-point values in xmm2 or mem128 by corresponding values in xmm1, adds interim products, selectively writes results to xmm1. Mnemonic Encoding VDPPD xmm1, xmm2, xmm3/mem128, imm8 VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00011 X.src.0.01 41 /r ib Related Instructions (V)DPPS MXCSR Flags Affected MM 17 Note: 132 FZ 15 RC 14 PM 13 12 UM 11 OM 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions are determined separately for each add-multiply operation. Unmasked exceptions do not affect the destination DPPD, VDPPD Instruction Reference 26568—Rev. 
3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. DPPD, VDPPD 133 AMD64 Technology DPPS VDPPS 26568—Rev. 
Dot Product Packed Single-Precision Floating-Point

Computes the dot-product of the input operands. An immediate operand specifies both the input values and the destination locations to which the products are written.

Selectively multiplies packed single-precision values in a source operand by corresponding values in a second source operand, writes results to a temporary location, adds pairs of results, writes the sums to additional temporary locations, and selectively writes a cumulative sum to a destination.

Mask bits [7:4] of an 8-bit immediate operand perform multiplicative selection. Each bit selects a 32-bit segment of the source operands; bit 7 selects bits [127:96], bit 6 selects bits [95:64], bit 5 selects bits [63:32], and bit 4 selects bits [31:0]. When a mask bit = 1, the corresponding packed single-precision floating point values are multiplied and the product is written to the corresponding position of a 128-bit temporary location. When a mask bit = 0, the corresponding position of the temporary location is cleared.

After multiplication, three pairs of 32-bit values are added and written to temporary locations. Bits [63:32] and [31:0] of temporary location 1 are added and written to 32-bit temporary location 2; bits [127:96] and [95:64] of temporary location 1 are added and written to 32-bit temporary location 3; then the contents of temporary locations 2 and 3 are added and written to 32-bit temporary location 4.

After addition, mask bits [3:0] of the same 8-bit immediate operand perform write selection. Each bit selects a 32-bit segment of the destination; bit 3 selects bits [127:96], bit 2 selects bits [95:64], bit 1 selects bits [63:32], and bit 0 selects bits [31:0] of the destination. When a mask bit = 1, the 32-bit value of the fourth temporary location is written to the corresponding position of the destination. When a mask bit = 0, the corresponding position of the destination is cleared.
For the 256-bit extended encoding, this process is performed on the upper and lower 128 bits of the affected YMM registers.

When the operation produces a NaN, its value is determined as follows.

Source Operands (in either order)                                           NaN Result (1)
QNaN   Any non-NaN floating-point value (or single-operand instruction)    Value of QNaN
SNaN   Any non-NaN floating-point value (or single-operand instruction)    Value of SNaN, converted to a QNaN (2)
QNaN   QNaN                                                                 First operand
QNaN   SNaN                                                                 First operand (converted to QNaN if SNaN)
SNaN   SNaN                                                                 First operand converted to a QNaN (2)

Note:
1. A NaN result produced when the floating-point invalid-operation exception is masked.
2. The conversion is done by changing the most-significant fraction bit to 1.

For each addition occurring in either the second or third step, for the purpose of NaN propagation, the addend of lower bit index is considered to be the first of the two operands. For example, when all four multiplications produce NaNs, the one that corresponds to bits [31:0] is written to all indicated fields of the destination, regardless of how those NaNs were generated from the sources. When the two highest-order multiplications produce NaNs and the two lowest-order multiplications produce infinities of opposite signs, the real indefinite QNaN (produced as the sum of the infinities) is written to the destination. NaNs in source operands or in computational results result in at least one NaN in the destination. For the 256-bit version, NaNs are propagated within the two independent dot product operations only to their respective 128-bit results.

There are legacy and extended forms of the instruction:

DPPS

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
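The multiply-selection, pairwise-add, and write-selection steps described for DPPS can be sketched as a small scalar reference model. This is an illustrative sketch only, not pseudocode from the manual: the function names `f32` and `dpps` are hypothetical, inputs are assumed to be values already representable in single precision, rounding is assumed to be round-to-nearest, and exception signalling and the NaN selection rules are not modeled.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value.

    Assumption: x is NaN, infinite, or within single-precision range;
    struct raises OverflowError for larger finite values.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dpps(src1, src2, imm8):
    """Model one 128-bit DPPS operation; list element 0 holds bits [31:0]."""
    # imm8[7:4] select which products are computed; unselected
    # positions of temporary location 1 are cleared to zero.
    temp1 = [
        f32(src1[i] * src2[i]) if (imm8 >> (4 + i)) & 1 else 0.0
        for i in range(4)
    ]
    temp2 = f32(temp1[0] + temp1[1])  # bits [31:0] + bits [63:32]
    temp3 = f32(temp1[2] + temp1[3])  # bits [95:64] + bits [127:96]
    temp4 = f32(temp2 + temp3)        # cumulative sum
    # imm8[3:0] select which destination elements receive the sum;
    # unselected elements are cleared to zero.
    return [temp4 if (imm8 >> i) & 1 else 0.0 for i in range(4)]
```

For example, with all four products enabled and only the low element written (`imm8 = 0xF1`), `dpps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 0xF1)` returns `[70.0, 0.0, 0.0, 0.0]`.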
VDPPS

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form     Subset   Feature Flag
DPPS     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VDPPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Opcode              Description
DPPS xmm1, xmm2/mem128, imm8    66 0F 3A 40 /r ib   Selectively multiplies packed single-precision floating-point values in xmm2 or mem128 by corresponding values in xmm1, adds interim products, selectively writes results to xmm1.

                                       Encoding
Mnemonic                               VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VDPPS xmm1, xmm2, xmm3/mem128, imm8    C4    RXB.00011        X.src.0.01    40 /r ib
VDPPS ymm1, ymm2, ymm3/mem256, imm8    C4    RXB.00011        X.src.1.01    40 /r ib

Related Instructions

(V)DPPD

MXCSR Flags Affected

Flag       MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
Bit        17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0
Affected                                                         M    M    M         M    M

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions are determined separately for each add-multiply operation.
Unmasked exceptions do not affect the destination Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 136 X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. DPPS, VDPPS Instruction Reference 26568—Rev. 
EXTRACTPS VEXTRACTPS Extract Packed Single-Precision Floating-Point

Copies one of four packed single-precision floating-point values from a source XMM register to a general purpose register or a 32-bit memory location.

Bits [1:0] of an immediate byte operand specify the location of the 32-bit value that is copied. 00b corresponds to the low doubleword of the source register and 11b corresponds to the high doubleword of the source register. Bits [7:2] of the immediate operand are ignored.

There are legacy and extended forms of the instruction:

EXTRACTPS

The source operand is an XMM register. The destination can be a general purpose register or a 32-bit memory location. A 32-bit single-precision value extracted to a general purpose register is zero-extended to 64 bits.

VEXTRACTPS

The extended form of the instruction has a single 128-bit encoding. The source operand is an XMM register. The destination can be a general purpose register or a 32-bit memory location.

Instruction Support

Form          Subset   Feature Flag
EXTRACTPS     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VEXTRACTPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                             Opcode              Description
EXTRACTPS reg32/mem32, xmm1, imm8    66 0F 3A 17 /r ib   Extract the single-precision floating-point element of xmm1 specified by imm8 to reg32/mem32.

                                      Encoding
Mnemonic                              VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VEXTRACTPS reg32/mem32, xmm1, imm8    C4    RXB.00011        X.1111.0.01   17 /r ib

Related Instructions

(V)INSERTPS
3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — AVX and SSE exception A — AVX exception S — SSE exception 138 S S X S S A A A A A X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Write to a read-only data segment. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. EXTRACTPS, VEXTRACTPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology EXTRQ Extract Field From Register Extracts specified bits from the lower 64 bits of the first operand (the destination XMM register). The extracted bits are saved in the least-significant bit positions of the lower quadword of the destination; the remaining bits in the lower quadword of the destination register are cleared to 0. The upper quadword of the destination register is undefined. The portion of the source data being extracted is defined by the bit index and the field length. The bit index defines the least-significant bit of the source operand being extracted. Bits [bit index + length field – 1]:[bit index] are extracted. If the sum of the bit index + length field is greater than 64, the results are undefined. 
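The bit-index and field-length rule for EXTRQ can be illustrated with a small integer model of the low quadword. This is a sketch under stated assumptions rather than a definitive implementation: the function name `extrq` and its argument order are hypothetical, a length field of 0 is treated as 64 (as this instruction's description specifies), and the cases the manual calls undefined are reported by raising `ValueError`, which is a modeling choice and not hardware behavior.

```python
def extrq(src_low64, length, index):
    """Model the defined cases of EXTRQ on the low 64 bits of the destination.

    src_low64: integer holding bits [63:0] of the XMM register.
    length, index: field length and bit index; only bits [5:0] are used.
    Returns the new value of bits [63:0]; the upper quadword is not modeled.
    """
    length &= 0x3F
    index &= 0x3F
    if length == 0:
        length = 64  # a length field of zero encodes a 64-bit field
    if index + length > 64:
        # The manual leaves this case undefined; the model refuses it.
        raise ValueError("undefined: bit index + field length exceeds 64")
    mask = (1 << length) - 1
    # Shift the field down to bit 0, then clear everything above it.
    return (src_low64 >> index) & mask
```
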
For example, if the bit index is 32 (20h) and the field length is 16 (10h), then the result in the destination register will be source [47:32] in bits 15:0, with zeros in bits 63:16.

A value of zero in the field length is defined as a length of 64. If the length field is 0 and the bit index is 0, bits 63:0 of the source are extracted. For any other value of the bit index, the results are undefined.

The bit index and field length can be specified as immediate values (second and first immediate operands, respectively, in the case of the three argument version of the instruction), or they can both be specified by fields in an XMM source operand. In the latter case, bits [5:0] of the XMM register specify the number of bits to extract (the field length) and bits [13:8] of the XMM register specify the index of the first bit in the field to extract. The bit index and field length are each six bits in length; other bits of the field are ignored.

The diagram below illustrates the operation of this instruction.

[Figure: EXTRQ operation. Immediate form: XMM1[63:0] is shifted right by bits [5:0] of the second imm8 and masked to the field length given by bits [5:0] of the first imm8. Register form: XMM1[63:0] is shifted right by XMM2[13:8] and masked to the field length given by XMM2[5:0].]

Instruction Support

Form     Subset   Feature Flag
EXTRQ    SSE4A    CPUID Fn8000_0001_ECX[SSE4A] (bit 6)

Software must check the CPUID bit once per program or library initialization before using the instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode              Description
EXTRQ xmm1, imm8, imm8    66 0F 78 /0 ib ib   Extract field from xmm1, with the least significant bit of the extracted data starting at the bit index specified by [5:0] of the second immediate byte, with the length specified by [5:0] of the first immediate byte.
EXTRQ xmm1, xmm2          66 0F 79 /r         Extract field from xmm1, with the least significant bit of the extracted data starting at the bit index specified by xmm2[13:8], with the length specified by xmm2[5:0].

Related Instructions

INSERTQ, PINSRW, PEXTRW

rFLAGS Affected

None

Exceptions

Exception                    Real   Virtual 8086   Protected   Cause of Exception
Invalid opcode, #UD          X      X              X           SSE4A instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                             X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.

HADDPD VHADDPD Horizontal Add Packed Double-Precision Floating-Point

Adds adjacent pairs of double-precision floating-point values in two source operands and writes the sums to a destination.

There are legacy and extended forms of the instruction:

HADDPD

Adds the packed double-precision values in bits [127:64] and bits [63:0] of the first source XMM register and writes the sum to bits [63:0] of the destination; adds the corresponding quadwords of the second source XMM register or a 128-bit memory location and writes the sum to bits [127:64] of the destination. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VHADDPD

The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding

Adds the packed double-precision values in bits [127:64] and bits [63:0] of the first source XMM register and writes the sum to bits [63:0] of the destination XMM register; adds the corresponding quadwords of the second source XMM register or a 128-bit memory location and writes the sum to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
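The 128-bit horizontal add just described can be sketched as a two-element model. This is a hedged illustration, not the manual's own pseudocode: the function name `haddpd` is hypothetical, each operand is represented as a Python list whose element 0 holds bits [63:0], and Python floats stand in for double-precision values under round-to-nearest with exceptions masked.

```python
def haddpd(src1, src2):
    """Model HADDPD and the 128-bit form of VHADDPD.

    src1, src2: two-element lists; element 0 is bits [63:0],
    element 1 is bits [127:64].
    """
    low = src1[0] + src1[1]   # pair from the first source -> bits [63:0]
    high = src2[0] + src2[1]  # pair from the second source -> bits [127:64]
    return [low, high]
```

For example, `haddpd([1.0, 2.0], [10.0, 20.0])` returns `[3.0, 30.0]`: each result element is the sum of the two elements of one source, not of matching elements across the sources.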
YMM Encoding

Adds the packed double-precision values in bits [127:64] and bits [63:0] of the first source YMM register and writes the sum to bits [63:0] of the destination YMM register; adds the corresponding quadwords of the second source YMM register or a 256-bit memory location and writes the sum to bits [127:64] of the destination. Performs the same process for the upper 128 bits of the sources and destination.

Instruction Support

Form       Subset   Feature Flag
HADDPD     SSE3     CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHADDPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode        Description
HADDPD xmm1, xmm2/mem128    66 0F 7C /r   Adds adjacent pairs of double-precision values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

                                   Encoding
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VHADDPD xmm1, xmm2, xmm3/mem128    C4    RXB.00001        X.src.0.01    7C /r
VHADDPD ymm1, ymm2, ymm3/mem256    C4    RXB.00001        X.src.1.01    7C /r

Related Instructions

(V)HADDPS, (V)HSUBPD, (V)HSUBPS

MXCSR Flags Affected

Flag       MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
Bit        17   15   14:13   12   11   10   9    8    7    6     5    4    3    2    1    0
Affected                                                         M    M    M         M    M

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 142 X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. HADDPD, VHADDPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology HADDPS VHADDPS Horizontal Add Packed Single-Precision Adds adjacent pairs of single-precision floating-point values in two source operands and writes the sums to a destination. There are legacy and extended forms of the instruction: HADDPS Adds the packed single-precision values in bits [63:32] and bits [31:0] of the first source XMM register and writes the sum to bits [31:0] of the destination; adds the packed single-precision values in bits [127:96] and bits [95:64] of the first source register and writes the sum to bits [63:32] of the destination. 
Adds the corresponding values in the second source XMM register or a 128-bit memory location and writes the sum to bits [95:64] and [127:96] of the destination. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VHADDPS The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Adds the packed single-precision values in bits [63:32] and bits [31:0] of the first source XMM register and writes the sum to bits [31:0] of the destination XMM register; adds the packed single-precision values in bits [127:96] and bits [95:64] of the first source register and writes the sum to bits [63:32] of the destination. Adds the corresponding values in the second source XMM register or a 128-bit memory location and writes the sum to bits [95:64] and [127:96] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding Adds the packed single-precision values in bits [63:32] and bits [31:0] of the first source YMM register and writes the sum to bits [31:0] of the destination YMM register; adds the packed single-precision values in bits [127:96] and bits [95:64] of the first source register and writes the sum to bits [63:32] of the destination. Adds the corresponding values in the second source YMM register or a 256-bit memory location and writes the sums to bits [95:64] and [127:96] of the destination. Performs the same process for the upper 128 bits of the sources and destination. Instruction Support Form Subset Feature Flag HADDPS SSE3 CPUID Fn0000_0001_ECX[SSE3] (bit 0) VHADDPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Reference HADDPS, VHADDPS 143 AMD64 Technology 26568—Rev. 
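The element pairing described above for the 128-bit forms can be sketched as a small scalar model. This is only an illustration under stated assumptions: the names `f32` and `haddps` are hypothetical, registers are modeled as Python lists with index 0 holding bits [31:0], and the model ignores MXCSR rounding control, exception flags, and NaN handling.

```python
import struct

def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def haddps(src1, src2):
    """Hypothetical model of the 128-bit horizontal add.

    src1 and src2 are lists of four floats; index 0 is bits [31:0],
    index 3 is bits [127:96]. Returns the four destination elements.
    """
    return [
        f32(src1[0] + src1[1]),  # destination bits [31:0]
        f32(src1[2] + src1[3]),  # destination bits [63:32]
        f32(src2[0] + src2[1]),  # destination bits [95:64]
        f32(src2[2] + src2[3]),  # destination bits [127:96]
    ]

result = haddps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
print(result)
```

The low half of the result comes entirely from the first source and the high half entirely from the second source, which is why the instruction is often used in pairs when reducing a vector to a single sum.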
3.22—May 2018 Instruction Encoding Mnemonic HADDPS xmm1, xmm2/mem128 Opcode F2 0F 7C /r Mnemonic VHADDPS xmm1, xmm2, xmm3/mem128 VHADDPS ymm1, ymm2, ymm3/mem256 Description Adds adjacent pairs of single-precision values in xmm1 and xmm2 or mem128. Writes the sums to xmm1. Encoding VEX RXB.map_select W.vvvv.L.pp Opcode X.src.0.11 7C /r C4 RXB.00001 X.src.1.11 7C /r C4 RXB.00001 Related Instructions (V)HADDPD, (V)HSUBPD, (V)HSUBPS MXCSR Flags Affected MM FZ 17 15 Note: 144 RC 14 13 PM UM OM ZM DM IM DAZ 12 11 10 9 8 7 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. HADDPS, VHADDPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.
X: AVX and SSE exception; A: AVX exception; S: SSE exception

HSUBPD VHSUBPD
Horizontal Subtract Packed Double-Precision

Subtracts adjacent pairs of double-precision floating-point values in two source operands and writes the differences to a destination.
There are legacy and extended forms of the instruction:

HSUBPD
The first source register is also the destination. Subtracts the packed double-precision value in bits [127:64] from the value in bits [63:0] of the first source XMM register and writes the difference to bits [63:0] of the destination; subtracts the corresponding values of the second source XMM register or a 128-bit memory location and writes the difference to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VHSUBPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Subtracts the packed double-precision value in bits [127:64] from the value in bits [63:0] of the first source XMM register and writes the difference to bits [63:0] of the destination XMM register; subtracts the corresponding values of the second source XMM register or a 128-bit memory location and writes the difference to bits [127:64] of the destination. 
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Subtracts the packed double-precision value in bits [127:64] from the value in bits [63:0] of the first source YMM register and writes the difference to bits [63:0] of the destination YMM register; subtracts the corresponding values of the second source YMM register or a 256-bit memory location and writes the difference to bits [127:64] of the destination. Performs the same process for the upper 128 bits of the sources and destination.

Instruction Support
Form      Subset  Feature Flag
HSUBPD    SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VHSUBPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                   Opcode       Description
HSUBPD xmm1, xmm2/mem128   66 0F 7D /r  Subtracts adjacent pairs of double-precision floating-point values in xmm1 and xmm2 or mem128. Writes the differences to xmm1.

Mnemonic                          Encoding
                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VHSUBPD xmm1, xmm2, xmm3/mem128   C4   RXB.00001       X.src.0.01   7D /r
VHSUBPD ymm1, ymm2, ymm3/mem256   C4   RXB.00001       X.src.1.01   7D /r

Related Instructions
(V)HSUBPS, (V)HADDPD, (V)HADDPS

MXCSR Flags Affected
May be modified (set or cleared): PE (bit 5), UE (bit 4), OE (bit 3), DE (bit 1), IE (bit 0).
Not affected: MM (bit 17), FZ (bit 15), RC (bits 14:13), PM (bit 12), UM (bit 11), OM (bit 10), ZM (bit 9), DM (bit 8), IM (bit 7), DAZ (bit 6), ZE (bit 2).
3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 148 S S S S S S S S S S S S X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. HSUBPD, VHSUBPD Instruction Reference 26568—Rev. 
HSUBPS VHSUBPS
Horizontal Subtract Packed Single

Subtracts adjacent pairs of single-precision floating-point values in two source operands and writes the differences to a destination.
There are legacy and extended forms of the instruction:

HSUBPS
Subtracts the packed single-precision value in bits [63:32] from the value in bits [31:0] of the first source XMM register and writes the difference to bits [31:0] of the destination; subtracts the packed single-precision value in bits [127:96] from the value in bits [95:64] of the first source register and writes the difference to bits [63:32] of the destination. Subtracts the corresponding values of the second source XMM register or a 128-bit memory location and writes the differences to bits [95:64] and [127:96] of the destination. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VHSUBPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Subtracts the packed single-precision value in bits [63:32] from the value in bits [31:0] of the first source XMM register and writes the difference to bits [31:0] of the destination XMM register; subtracts the packed single-precision value in bits [127:96] from the value in bits [95:64] of the first source register and writes the difference to bits [63:32] of the destination. Subtracts the corresponding values of the second source XMM register or a 128-bit memory location and writes the differences to bits [95:64] and [127:96] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared. 
YMM Encoding Subtracts the packed single-precision values in bits [63:32] from the value in bits [31:0] of the first source YMM register and writes the difference to bits [31:0] of the destination YMM register; subtracts the packed single-precision values in bits [127:96] from the value in bits [95:64] of the first source register and writes the difference to bits [63:32] of the destination. Subtracts the corresponding values of the second source YMM register or a 256-bit memory location and writes the differences to bits [95:64] and [127:96] of the destination. Performs the same process for the upper 128 bits of the sources and destination. Instruction Support Form Subset Feature Flag HSUBPS SSE3 CPUID Fn0000_0001_ECX[SSE3] (bit 0) VHSUBPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Reference HSUBPS, VHSUBPS 149 AMD64 Technology 26568—Rev. 3.22—May 2018 Instruction Encoding Mnemonic HSUBPS xmm1, xmm2/mem128 Opcode F2 0F 7D /r Mnemonic VHSUBPS xmm1, xmm2, xmm3/mem128 VHSUBPS ymm1, ymm2, ymm3/mem256 Description Subtracts adjacent pairs of values in xmm1 and xmm2 or mem128. Writes differences to xmm1. Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.src.0.11 7D /r C4 RXB.00001 X.src.1.11 7D /r Related Instructions (V)HSUBPD, (V)HADDPD, (V)HADDPS MXCSR Flags Affected MM 17 Note: 150 FZ 15 RC 14 PM 13 12 UM 11 OM 10 ZM 9 DM 8 IM DAZ 7 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. HSUBPS, VHSUBPS Instruction Reference 26568—Rev. 
3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S S S S S S S S S S S S X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. HSUBPS, VHSUBPS 151 AMD64 Technology 26568—Rev. 
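The 256-bit VHSUBPS behavior described in this section (the 128-bit pattern repeated independently for the upper 128 bits, with the higher element of each pair subtracted from the lower one) can be sketched as follows. This is a minimal model under assumptions: `f32`, `hsubps_lane`, and `vhsubps_ymm` are hypothetical names, a YMM register is modeled as a list of eight floats with index 0 holding bits [31:0], and MXCSR state and exceptions are not modeled.

```python
import struct

def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def hsubps_lane(a, b):
    """Horizontal subtract within one 128-bit lane.

    a and b hold four floats each; index 0 is the lowest element.
    Each difference is (lower element) - (higher element) of a pair.
    """
    return [
        f32(a[0] - a[1]),  # lane bits [31:0]
        f32(a[2] - a[3]),  # lane bits [63:32]
        f32(b[0] - b[1]),  # lane bits [95:64]
        f32(b[2] - b[3]),  # lane bits [127:96]
    ]

def vhsubps_ymm(src1, src2):
    """Hypothetical model of the YMM encoding: the lower and upper
    128-bit lanes are processed independently."""
    low = hsubps_lane(src1[0:4], src2[0:4])
    high = hsubps_lane(src1[4:8], src2[4:8])
    return low + high

result = vhsubps_ymm(
    [8.0, 1.0, 6.0, 2.0, 5.0, 5.0, 9.0, 3.0],
    [4.0, 3.0, 2.0, 1.0, 0.0, 1.0, 10.0, 20.0],
)
print(result)
```

Note that no element crosses the 128-bit boundary: the upper four result elements depend only on the upper lanes of the two sources.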
INSERTPS VINSERTPS
Insert Packed Single-Precision Floating-Point

Copies a selected single-precision floating-point value from a source operand to a selected location in a destination register and optionally clears selected elements of the destination. The legacy and extended forms of the instruction treat the remaining elements of the destination in different ways.

Selections are specified by three fields of an immediate 8-bit operand:

Bits   7:6      5:4      3:0
Field  COUNT_S  COUNT_D  ZMASK

COUNT_S: The binary value of the field specifies a 32-bit element of a source register, counting upward from the low-order doubleword. COUNT_S is used only for a register source; when the source is a memory operand, COUNT_S = 0.
COUNT_D: The binary value of the field specifies a 32-bit destination element, counting upward from the low-order doubleword.
ZMASK: Set a bit to clear a 32-bit element of the destination.

There are legacy and extended forms of the instruction:

INSERTPS
The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
When the source operand is a register, the instruction copies the 32-bit element of the source specified by COUNT_S to the location in the destination specified by COUNT_D, and clears destination elements as specified by ZMASK. Elements of the destination that are not cleared are not affected.
When the source operand is a memory location, the instruction copies a 32-bit value from memory to the location in the destination specified by COUNT_D, and clears destination elements as specified by ZMASK. Elements of the destination that are not cleared are not affected.

VINSERTPS
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. 
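The decoding of the immediate byte for the legacy INSERTPS form, as described above, can be sketched like this. It is an illustration only: the function name `insertps` and the `src_is_memory` parameter are hypothetical, registers are modeled as lists of four floats with index 0 as the low-order doubleword, a memory source is modeled as a one-element list, and the sketch assumes that ZMASK bit i clears destination element i after the insertion has taken place.

```python
def insertps(dest, src, imm8, src_is_memory=False):
    """Hypothetical model of legacy INSERTPS.

    dest: four floats (element 0 is bits [31:0]).
    src:  four floats for a register source, or a one-element list
          for a 32-bit memory source.
    """
    # COUNT_S (imm8 bits 7:6) is treated as 0 for a memory source.
    count_s = 0 if src_is_memory else (imm8 >> 6) & 0x3
    count_d = (imm8 >> 4) & 0x3      # imm8 bits 5:4
    zmask = imm8 & 0xF               # imm8 bits 3:0

    result = list(dest)              # untouched elements keep their value
    result[count_d] = src[count_s]   # insert the selected element
    for i in range(4):
        if zmask & (1 << i):         # assumed: bit i clears element i
            result[i] = 0.0
    return result

# imm8 = 10_01_0001b: take source element 2, write destination element 1,
# clear destination element 0.
from_register = insertps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 0x91)
from_memory = insertps([1.0, 2.0, 3.0, 4.0], [9.0], 0xE0, src_is_memory=True)
print(from_register, from_memory)
```

In the memory example the COUNT_S bits of the immediate are set but have no effect, matching the rule that COUNT_S applies only to a register source.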
Bits [255:128] of the YMM register that corresponds to the destination are cleared.
When the second source operand is a register, the instruction copies the 32-bit element of the source specified by COUNT_S to the location in the destination specified by COUNT_D. The other elements of the destination are either copied from the first source operand or cleared as specified by ZMASK.
When the second source operand is a memory location, the instruction copies a 32-bit value from the source to the location in the destination specified by COUNT_D. The other elements of the destination are either copied from the first source operand or cleared as specified by ZMASK.

Instruction Support
Form       Subset  Feature Flag
INSERTPS   SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VINSERTPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                         Opcode             Description
INSERTPS xmm1, xmm2/mem32, imm8  66 0F 3A 21 /r ib  Insert a selected single-precision floating-point value from xmm2 or from mem32 at a selected location in xmm1 and clear selected elements of xmm1. Selections specified by imm8.

Mnemonic                                Encoding
                                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VINSERTPS xmm1, xmm2, xmm3/mem32, imm8  C4   RXB.00011       X.src.0.01   21 /r ib

Related Instructions
(V)EXTRACTPS

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X: AVX and SSE exception; A: AVX exception; S: SSE exception S S X S S A A A A X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. 
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. INSERTPS, VINSERTPS 153 AMD64 Technology 26568—Rev. 3.22—May 2018 INSERTQ Insert Field Inserts bits from the lower 64 bits of the source operand into the lower 64 bits of the destination operand. No other bits in the lower 64 bits of the destination are modified. The upper 64 bits of the destination are undefined. The least-significant l bits of the source operand are inserted into the destination, with the least-significant bit of the source operand inserted at bit position n, where l and n are defined as the field length and bit index, respectively. Bits (field length – 1):0 of the source operand are inserted into bits (bit index + field length – 1):(bit index) of the destination. If the sum of the bit index + length field is greater than 64, the results are undefined. For example, if the bit index is 32 (20h) and the field length is 16 (10h), then the result in the destination register will be source operand[15:0] in bits 47:32. Bits 63:48 and bits 31:0 are not modified. A value of zero in the field length is defined as a length of 64. If the length field is 0 and the bit index is 0, bits 63:0 of the source operand are inserted. For any other value of the bit index, the results are undefined. The bits to insert are located in the XMM2 source operand. The bit index and field length can be specified as immediate values or can be specified in the XMM source operand. 
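The field insertion described above can be sketched on the lower 64 bits as plain integer arithmetic. This is a rough model under assumptions: `insertq` is a hypothetical name, operands are Python integers holding the lower 64 bits of each register, the upper 64 bits of the destination (which the text leaves undefined) are not modeled, and the cases the text calls undefined are reported with `ValueError` as a modeling choice, not as hardware behavior.

```python
MASK64 = (1 << 64) - 1

def insertq(dest_low, src_low, bit_index, field_length):
    """Hypothetical model of INSERTQ on the lower 64 bits.

    Inserts the least-significant field_length bits of src_low into
    dest_low starting at bit_index. Other destination bits are kept.
    """
    if field_length == 0:
        # A zero field length is defined as 64, valid only at bit index 0.
        if bit_index != 0:
            raise ValueError("undefined: zero length with nonzero bit index")
        field_length = 64
    if bit_index + field_length > 64:
        raise ValueError("undefined: bit index + field length exceeds 64")

    field_mask = ((1 << field_length) - 1) << bit_index
    kept = dest_low & ~field_mask & MASK64
    inserted = (src_low << bit_index) & field_mask
    return kept | inserted

# The example from the text: bit index 32 (20h), field length 16 (10h).
result = insertq(0x1111222233334444, 0xAAAABBBBCCCCDDDD, 32, 16)
print(hex(result))
```

With these inputs, source bits [15:0] land in destination bits 47:32 while bits 63:48 and 31:0 keep their original values.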
In the immediate form, the field length is specified by the third operand (the first immediate byte) and the bit index by the fourth operand (the second immediate byte). In the register form, the bit index and field length are specified in bits [77:72] and bits [69:64] of the source XMM register, respectively. The bit index and field length are each six bits in length; other bits in the field are ignored. The diagram below illustrates the operation of this instruction.

[Figure: operation of INSERTQ. Immediate form: bits [5:0] of the first immediate byte select the number of bits to insert and bits [5:0] of the second immediate byte select the bit position for the insert. Register form: XMM2 bits [69:64] select the number of bits to insert and XMM2 bits [77:72] select the bit position for the insert. In both forms the field is taken from bits [63:0] of XMM2 and inserted into bits [63:0] of XMM1.]

Instruction Support
Form     Subset  Feature Flag
INSERTQ  SSE4A   CPUID Fn8000_0001_ECX[SSE4A] (bit 6)
Software must check the CPUID bit once per program or library initialization before using the instruction, or inconsistent behavior may result.
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                        Opcode             Description
INSERTQ xmm1, xmm2, imm8, imm8  F2 0F 78 /r ib ib  Insert field starting at bit 0 of xmm2 with the length specified by [5:0] of the first immediate byte. This field is inserted into xmm1 starting at the bit position specified by [5:0] of the second immediate byte.
INSERTQ xmm1, xmm2              F2 0F 79 /r        Insert field starting at bit 0 of xmm2 with the length specified by xmm2[69:64]. This field is inserted into xmm1 starting at the bit position specified by xmm2[77:72].

Related Instructions
EXTRQ, PINSRW, PEXTRW

rFLAGS Affected
None

Exceptions
Exception                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          SSE4A instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.

LDDQU VLDDQU
Load Unaligned Double Quadword

Loads unaligned double quadwords from a memory location to a destination register.
Like the (V)MOVUPD instructions, (V)LDDQU loads a 128-bit or 256-bit operand from an unaligned memory location. However, to improve performance when the memory operand is actually misaligned, (V)LDDQU may read an aligned 16 or 32 bytes to get the first part of the operand, and an aligned 16 or 32 bytes to get the second part of the operand. This behavior is implementation-specific; an implementation may instead read only the exact 16 or 32 bytes needed for the memory operand. If the memory operand is in a memory range where reading extra bytes can cause performance or functional issues, use (V)MOVUPD instead of (V)LDDQU.
Memory operands that are not aligned on 16-byte or 32-byte boundaries do not cause general-protection exceptions.
There are legacy and extended forms of the instruction:

LDDQU
The source operand is an unaligned 128-bit memory location. The destination operand is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination register are not affected.

VLDDQU
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The source operand is an unaligned 128-bit memory location. The destination operand is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination register are cleared.

YMM Encoding
The source operand is an unaligned 256-bit memory location. The destination operand is a YMM register.

Instruction Support
Form    Subset  Feature Flag
LDDQU   SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VLDDQU  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
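The feature-flag columns of the Instruction Support tables can be read as plain bit tests on the register value returned by CPUID. The sketch below only decodes values that have already been obtained; it does not execute CPUID. The table `FEATURE_FLAGS`, the function `has_feature`, and the dictionary keys are hypothetical names chosen for this illustration, and the sample register values are made up.

```python
# (CPUID function and register, bit position) as listed in the
# Instruction Support tables of the preceding instructions.
FEATURE_FLAGS = {
    "SSE3": ("Fn0000_0001_ECX", 0),
    "SSE41": ("Fn0000_0001_ECX", 19),
    "AVX": ("Fn0000_0001_ECX", 28),
    "SSE4A": ("Fn8000_0001_ECX", 6),
}

def has_feature(cpuid_regs, name):
    """Return True if the named feature bit is set.

    cpuid_regs maps a hypothetical key such as "Fn0000_0001_ECX" to
    the 32-bit register value returned by that CPUID function.
    """
    reg, bit = FEATURE_FLAGS[name]
    return bool((cpuid_regs[reg] >> bit) & 1)

# Made-up register contents: SSE3 and AVX reported, SSE4.1 and SSE4A not.
sample = {
    "Fn0000_0001_ECX": (1 << 28) | (1 << 0),
    "Fn8000_0001_ECX": 0,
}
print({name: has_feature(sample, name) for name in FEATURE_FLAGS})
```

A set AVX bit alone is not sufficient for the extended forms: as the exception tables indicate, they also raise #UD when CR4.OSXSAVE = 0 or XFEATURE_ENABLED_MASK[2:1] is not 11b, so operating-system support has to be checked as well.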
Instruction Encoding Mnemonic LDDQU xmm1, mem128 Opcode Description F2 0F F0 /r Loads a 128-bit value from an unaligned mem128 to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VLDDQU xmm1, mem128 C4 RXB.00001 X.1111.0.11 F0 /r VLDDQU ymm1, mem256 C4 RXB.00001 X.1111.1.11 F0 /r 156 LDDQU, VLDDQU Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)MOVDQU Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Alignment check, #AC S Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Instruction Reference X S S A A A A X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Write to a read-only data segment. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. LDDQU, VLDDQU 157 AMD64 Technology 26568—Rev. 3.22—May 2018 LDMXCSR VLDMXCSR Load MXCSR Control/Status Register Loads the MXCSR register with a 32-bit value from memory. For both legacy LDMXCSR and extended VLDMXCSR forms of the instruction, the source operand is a 32-bit memory location and the destination operand is the MXCSR. If an MXCSR load clears a SIMD floating-point exception mask bit and sets the corresponding exception flag bit, a SIMD floating-point exception is not generated immediately. 
An exception is generated only when the next instruction that operates on an XMM or YMM register operand and causes that particular SIMD floating-point exception to be reported executes. A general protection exception occurs if the instruction attempts to load non-zero values into reserved MXCSR bits. Software can use MXCSR_MASK to determine which bits are reserved. For details, see “128-Bit, 64-Bit, and x87 Programming” in Volume 2. The MXCSR register is described in “Registers” in Volume 1. Instruction Support Form Subset Feature Flag LDMXCSR SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VLDMXCSR AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode LDMXCSR mem32 0F AE /2 Description Loads MXCSR register with 32-bit value from memory. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.1111.0.00 AE /2 VLDMXCSR mem32 Related Instructions (V)STMXCSR MXCSR Flags Affected MM FZ M M M 17 15 14 Note: 158 RC M 13 PM UM OM ZM DM IM DAZ PE UE OE ZE DE IE M M M M M M M M M M M M M 12 11 10 9 8 7 6 5 4 3 2 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. LDMXCSR, VLDMXCSR Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X A X A S S S S X S S S S S X S S S S S S X A S S A A A A X X X X S X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. CR0.EM = 1. CR4.OSFXSR = 0. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1. 
REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Attempt to load non-zero values into reserved MXCSR bits. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled.

MASKMOVDQU VMASKMOVDQU
Masked Move Double Quadword Unaligned

Moves bytes from the first source operand to a memory location specified by the DS:rDI register. Bytes are selected by mask bits in the second source operand. The memory location may be unaligned.
The mask consists of the most significant bit of each byte of the second source register. When a mask bit = 1, the corresponding byte of the first source register is written to the destination; when a mask bit = 0, the corresponding byte is not written.
Exception and trap behavior for elements not selected for storage to memory is implementation dependent. For instance, a given implementation may signal a data breakpoint or a page fault for bytes that are zero-masked and not actually written.
The instruction implicitly uses weakly-ordered, write-combining buffering for the data, as described in “Buffering and Combining Memory Writes” in Volume 2. For data that is shared by multiple processors, this instruction should be used together with a fence instruction in order to ensure data coherency (see “Cache and TLB Management” in Volume 2).
There are legacy and extended forms of the instruction:

MASKMOVDQU
The first source operand is an XMM register and the second source operand is an XMM register. The destination is a 128-bit memory location.

VMASKMOVDQU
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is an XMM register. 
The destination is a 128-bit memory location. Instruction Support Form Subset MASKMOVDQU SSE2 VMASKMOVDQU AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic MASKMOVDQU xmm1, xmm2 Opcode 66 0F F7 /r Description Move bytes selected by a mask value in xmm2 from xmm1 to the memory location specified by DS:rDI. Mnemonic Encoding VMASKMOVDQU xmm1, xmm2 VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.1111.0.01 F7 /r Related Instructions (V)MASKMOVPD, (V)MASKMOVPS 160 MASKMOVDQU, VMASKMOVDQU Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S A X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X S S A A A A A X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. MASKMOVDQU, VMASKMOVDQU 161 AMD64 Technology MAXPD VMAXPD 26568—Rev. 
Maximum Packed Double-Precision Floating-Point

Compares each packed double-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding location of the destination.

If both source operands are equal to zero, the value of the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

There are legacy and extended forms of the instruction:

MAXPD
Compares two pairs of packed double-precision floating-point values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMAXPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Compares two pairs of packed double-precision floating-point values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Compares four pairs of packed double-precision floating-point values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support
Form     Subset  Feature Flag
MAXPD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMAXPD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
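The register-width behavior of the three MAXPD forms described above can be sketched in portable C. This is an illustrative software model only, not AMD-provided code: the type `ymm_model`, the enum `maxpd_form`, and the function `model_maxpd` are hypothetical names, a YMM register is modeled as four `double` lanes, and MXCSR flag updates and exceptions are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a 256-bit YMM register as four double lanes.
 * Lanes 0-1 are bits [127:0] (the XMM half); lanes 2-3 are bits [255:128]. */
typedef struct { double lane[4]; } ymm_model;

typedef enum { FORM_MAXPD_LEGACY, FORM_VMAXPD_XMM, FORM_VMAXPD_YMM } maxpd_form;

/* Returns src2 when either operand is a NaN or when both are zero,
 * because (src1 > src2) is false in those cases. */
static double max_rule(double src1, double src2)
{
    return (src1 > src2) ? src1 : src2;
}

/* dest holds the prior contents of the destination register.
 * For the legacy form the first source is also the destination,
 * so src1 is ignored and dest is used in its place. */
ymm_model model_maxpd(maxpd_form form, ymm_model dest,
                      ymm_model src1, ymm_model src2)
{
    const ymm_model *first = (form == FORM_MAXPD_LEGACY) ? &dest : &src1;
    size_t lanes = (form == FORM_VMAXPD_YMM) ? 4 : 2;
    ymm_model result = dest;            /* legacy: upper lanes not affected */

    for (size_t i = 0; i < lanes; i++)
        result.lane[i] = max_rule(first->lane[i], src2.lane[i]);

    if (form == FORM_VMAXPD_XMM) {      /* VEX 128-bit: bits [255:128] cleared */
        result.lane[2] = 0.0;
        result.lane[3] = 0.0;
    }
    return result;
}

void model_maxpd_example(void)
{
    ymm_model d = {{1.0, 8.0, 7.0, 9.0}};
    ymm_model s = {{4.0, 2.0, 100.0, 100.0}};
    ymm_model r = model_maxpd(FORM_MAXPD_LEGACY, d, d, s);
    assert(r.lane[0] == 4.0 && r.lane[1] == 8.0);
    assert(r.lane[2] == 7.0 && r.lane[3] == 9.0);   /* upper half preserved */
}
```

Passing the destination's prior contents explicitly keeps the difference between the legacy form (upper bits preserved) and the VEX 128-bit form (upper bits cleared) visible in one function.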
3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic MAXPD xmm1, xmm2/mem128 Opcode Description 66 0F 5F /r Compares two pairs of packed double-precision values in xmm1 and xmm2 or mem128 and writes the greater value to the corresponding position in xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VMAXPD xmm1, xmm2, xmm3/mem128 C4 RXB.00001 X.src.0.01 5F /r VMAXPD ymm1, ymm2, ymm3/mem256 C4 RXB.00001 X.src.1.01 5F /r Related Instructions (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS MXCSR Flags Affected MM FZ 17 15 Note: RC 14 13 PM UM OM ZM DM IM DAZ PE UE OE ZE 12 11 10 9 8 7 6 5 4 3 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Instruction Reference MAXPD, VMAXPD 163 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. 
Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception 164 S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. MAXPD, VMAXPD Instruction Reference 26568—Rev. 3.22—May 2018 MAXPS VMAXPS AMD64 Technology Maximum Packed Single-Precision Floating-Point Compares each packed single-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding location of the destination. If both source operands are equal to zero, the value of the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination. There are legacy and extended forms of the instruction: MAXPS Compares four pairs of packed single-precision floating-point values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VMAXPS The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Compares four pairs of packed single-precision floating-point values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding Compares eight pairs of packed single-precision floating-point values. 
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register. Instruction Support Form Subset Feature Flag MAXPS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VMAXPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Reference MAXPS, VMAXPS 165 AMD64 Technology 26568—Rev. 3.22—May 2018 Instruction Encoding Mnemonic Opcode Description MAXPS xmm1, xmm2/mem128 0F 5F /r Compares four pairs of packed single-precision values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VMAXPS xmm1, xmm2, xmm3/mem128 C4 RXB.00001 X.src.0.00 5F /r VMAXPS ymm1, ymm2, ymm3/mem256 C4 RXB.00001 X.src.1.00 5F /r Related Instructions (V)MAXPD, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS MXCSR Flags Affected MM FZ 17 15 Note: 166 RC 14 13 PM UM OM ZM DM IM DAZ PE UE OE ZE 12 11 10 9 8 7 6 5 4 3 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. MAXPS, VMAXPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. MAXPS, VMAXPS 167 AMD64 Technology 26568—Rev. 3.22—May 2018 MAXSD VMAXSD Maximum Scalar Double-Precision Floating-Point Compares the scalar double-precision floating-point value in the low-order 64 bits of the first source operand to a corresponding value in the second source operand and writes the numerically greater value into the low-order 64 bits of the destination. If both source operands are equal to zero, the value of the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination. There are legacy and extended forms of the instruction: MAXSD The first source operand is an XMM register. The second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination. When the second source is a 64-bit memory location, the upper 64 bits of the first source register are copied to the destination. Bits [127:64] of the destination are not affected. 
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMAXSD
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. When the second source is a 64-bit memory location, the upper 64 bits of the first source register are copied to the destination. Bits [127:64] of the destination are copied from bits [127:64] of the first source. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form     Subset  Feature Flag
MAXSD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMAXSD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                 Opcode       Description
MAXSD xmm1, xmm2/mem64   F2 0F 5F /r  Compares a pair of scalar double-precision values in the low-order 64 bits of xmm1 and xmm2 or mem64 and writes the greater value to the low-order 64 bits of xmm1.

                                Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMAXSD xmm1, xmm2, xmm3/mem64   C4   RXB.00001       X.src.X.11   5F /r

Related Instructions
(V)MAXPD, (V)MAXPS, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS

MXCSR Flags Affected
MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                                            M   M
17  15  14 13  12  11  10  9   8   7   6    5   4   3   2   1   0
Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. MAXSD, VMAXSD 169 AMD64 Technology 26568—Rev. 3.22—May 2018 MAXSS VMAXSS Maximum Scalar Single-Precision Floating-Point Compares the scalar single-precision floating-point value in the low-order 32 bits of the first source operand to a corresponding value in the second source operand and writes the numerically greater value into the low-order 32 bits of the destination. If both source operands are equal to zero, the value of the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination. There are legacy and extended forms of the instruction: MAXSS The first source operand is an XMM register. The second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the destination are not affected. 
Bits [255:128] of the YMM register that corresponds to the destination are not affected. VMAXSS The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [127:32] of the destination are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Instruction Support Form Subset Feature Flag MAXSS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VMAXSS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic MAXSS xmm1, xmm2/mem32 Opcode Description F3 0F 5F /r Compares a pair of scalar single-precision values in the low-order 32 bits of xmm1 and xmm2 or mem32 and writes the greater value to the low-order 32 bits of xmm1. Mnemonic VMAXSS xmm1, xmm2, xmm3/mem32 Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.src.X.10 5F /r Related Instructions (V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MINPD, (V)MINPS, (V)MINSD, (V)MINSS 170 MAXSS, VMAXSS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE 5 UE 4 OE 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. 
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. MAXSS, VMAXSS 171 AMD64 Technology MINPD VMINPD 26568—Rev. 3.22—May 2018 Minimum Packed Double-Precision Floating-Point Compares each packed double-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding location of the destination. If both source operands are equal to zero, the value of the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination. There are legacy and extended forms of the instruction: MINPD Compares two pairs of packed double-precision floating-point values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. 
VMINPD The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Compares two pairs of packed double-precision floating-point values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding Compares four pairs of packed double-precision floating-point values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register. Instruction Support Form Subset MINPD SSE2 VMINPD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 172 MINPD, VMINPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic MINPD xmm1, xmm2/mem128 Opcode Description 66 0F 5D /r Compares two pairs of packed double-precision values in xmm1 and xmm2 or mem128 and writes the lesser value to the corresponding position in xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VMINPD xmm1, xmm2, xmm3/mem128 C4 RXB.00001 X.src.0.01 5D /r VMINPD ymm1, ymm2, ymm3/mem256 C4 RXB.00001 X.src.1.01 5D /r Related Instructions (V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPS, (V)MINSD, (V)MINSS MXCSR Flags Affected MM FZ 17 15 Note: RC 14 13 PM UM OM ZM DM IM DAZ PE UE OE ZE 12 11 10 9 8 7 6 5 4 3 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Instruction Reference MINPD, VMINPD 173 AMD64 Technology 26568—Rev. 
3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception 174 S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. MINPD, VMINPD Instruction Reference 26568—Rev. 3.22—May 2018 MINPS VMINPS AMD64 Technology Minimum Packed Single-Precision Floating-Point Compares each packed single-precision floating-point value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding location of the destination. 
If both source operands are equal to zero, the value of the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

There are legacy and extended forms of the instruction:

MINPS
Compares four pairs of packed single-precision floating-point values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMINPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Compares four pairs of packed single-precision floating-point values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Compares eight pairs of packed single-precision floating-point values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support
Form     Subset  Feature Flag
MINPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMINPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                  Opcode    Description
MINPS xmm1, xmm2/mem128   0F 5D /r  Compares four pairs of packed single-precision values in xmm1 and xmm2 or mem128 and writes the lesser values to the corresponding positions in xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VMINPS xmm1, xmm2, xmm3/mem128 C4 RXB.00001 X.src.0.00 5D /r VMINPS ymm1, ymm2, ymm3/mem256 C4 RXB.00001 X.src.1.00 5D /r Related Instructions (V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINSD, (V)MINSS MXCSR Flags Affected MM FZ 17 15 Note: 176 RC 14 13 PM UM OM ZM DM IM DAZ PE UE OE ZE 12 11 10 9 8 7 6 5 4 3 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. MINPS, VMINPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. 
SIMD Floating-Point Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception

MINSD
VMINSD
Minimum Scalar Double-Precision Floating-Point

Compares the scalar double-precision floating-point value in the low-order 64 bits of the first source operand to a corresponding value in the second source operand and writes the numerically lesser value into the low-order 64 bits of the destination.

If both source operands are equal to zero, the value of the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

There are legacy and extended forms of the instruction:

MINSD
The first source operand is an XMM register. The second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination. Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMINSD
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [127:64] of the destination are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form     Subset  Feature Flag
MINSD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMINSD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
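The selection rule stated for the scalar minimum and maximum instructions (second source operand returned when both operands are zero or when either is a NaN) can be illustrated with a short C sketch. It is a software model under stated assumptions, not AMD-provided code: `model_minsd` and `model_maxsd` are hypothetical helper names, only the value written to the low-order 64 bits is modeled, invalid-operation exceptions are assumed to be masked, and MXCSR flag updates are not modeled.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of the value (V)MINSD writes to bits [63:0].
 * (src1 < src2) is false when either operand is a NaN and when both
 * operands are zero (+0.0 and -0.0 compare equal), so the second
 * source operand is returned in exactly those cases. */
double model_minsd(double src1, double src2)
{
    return (src1 < src2) ? src1 : src2;
}

/* Same rule for (V)MAXSD with the comparison reversed. */
double model_maxsd(double src1, double src2)
{
    return (src1 > src2) ? src1 : src2;
}

void model_min_max_example(void)
{
    assert(model_minsd(1.0, 2.0) == 1.0);        /* ordinary comparison    */
    assert(signbit(model_minsd(+0.0, -0.0)));    /* both zero: src2 (-0.0) */
    assert(isnan(model_minsd(1.0, NAN)));        /* NaN in src2: src2      */
    assert(model_maxsd(NAN, 5.0) == 5.0);        /* NaN in src1: src2      */
}
```

One consequence worth noting is that the operation is not commutative: swapping the operands changes the result when a NaN or a pair of differently signed zeros is involved.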
Instruction Encoding Mnemonic MINSD xmm1, xmm2/mem64 Opcode F2 0F 5D /r Description Compares a pair of scalar double-precision values in the low-order 64 bits of xmm1 and xmm2 or mem64 and writes the lesser value to the low-order 64 bits of xmm1. Mnemonic VMINSD xmm1, xmm2, xmm3/mem64 Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.00001 X.src.X.11 5D /r Related Instructions (V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSS 178 MINSD, VMINSD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE 5 UE 4 OE 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. 
SIMD Floating-Point Exceptions
Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
X — AVX and SSE exception
A — AVX exception
S — SSE exception

MINSS
VMINSS
Minimum Scalar Single-Precision Floating-Point

Compares the scalar single-precision floating-point value in the low-order 32 bits of the first source operand to a corresponding value in the second source operand and writes the numerically lesser value into the low-order 32 bits of the destination.

If both source operands are equal to zero, the value of the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

There are legacy and extended forms of the instruction:

MINSS
The first source operand is an XMM register. The second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMINSS
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [127:32] of the destination are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form     Subset  Feature Flag
MINSS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMINSS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
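As an illustration of how the feature flags in the Instruction Support tables might be tested, the C sketch below decodes the three bits named in this section from ECX and EDX values assumed to have been returned by CPUID Fn0000_0001. The struct `media_features` and the function `decode_fn0000_0001` are hypothetical names; the sketch does not execute CPUID itself and is not a complete support check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the Instruction Support tables:
 * EDX bit 25 = SSE, EDX bit 26 = SSE2, ECX bit 28 = AVX. */
#define FN1_EDX_SSE_BIT   25u
#define FN1_EDX_SSE2_BIT  26u
#define FN1_ECX_AVX_BIT   28u

/* Hypothetical result type for this example. */
typedef struct {
    bool sse;   /* legacy single-precision forms, e.g. MINSS, MAXPS */
    bool sse2;  /* legacy double-precision forms, e.g. MINSD, MAXPD */
    bool avx;   /* extended (VEX-encoded) forms, e.g. VMINSS        */
} media_features;

/* ecx and edx are assumed to be the register values returned by
 * CPUID Fn0000_0001. This reports processor support only; the
 * Exceptions tables show that VEX forms still raise #UD when
 * CR4.OSXSAVE = 0 or XFEATURE_ENABLED_MASK[2:1] != 11b. */
media_features decode_fn0000_0001(uint32_t ecx, uint32_t edx)
{
    media_features f;
    f.sse  = ((edx >> FN1_EDX_SSE_BIT)  & 1u) != 0;
    f.sse2 = ((edx >> FN1_EDX_SSE2_BIT) & 1u) != 0;
    f.avx  = ((ecx >> FN1_ECX_AVX_BIT)  & 1u) != 0;
    return f;
}

void decode_fn0000_0001_example(void)
{
    /* Made-up register values with only the SSE and SSE2 bits set. */
    media_features f = decode_fn0000_0001(0u, (1u << 25) | (1u << 26));
    assert(f.sse && f.sse2 && !f.avx);
}
```

Keeping the decoder a pure function of the register values leaves the choice of how to execute CPUID (inline assembly, a compiler builtin, or an operating-system facility) to the caller.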
Instruction Encoding

Mnemonic                       Opcode       Description
MINSS xmm1, xmm2/mem32         F3 0F 5D /r  Compares a pair of scalar single-precision values in the low-order 32 bits of xmm1 and xmm2 or mem32 and writes the lesser value to the low-order 32 bits of xmm1.

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMINSS xmm1, xmm2, xmm3/mem32  C4   RXB.00001       X.src.X.10   5D /r

Related Instructions
(V)MAXPD, (V)MAXPS, (V)MAXSD, (V)MAXSS, (V)MINPD, (V)MINPS, (V)MINSD

MXCSR Flags Affected

Flag  MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Bit   17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
                                                                  M   M

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
                           S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF   S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

MOVAPD VMOVAPD
Move Aligned Packed Double-Precision Floating-Point

Moves packed double-precision floating-point values. Values can be moved from a register or memory location to a register; or from a register to a register or memory location.

A memory operand that is not aligned causes a general-protection exception.

There are legacy and extended forms of the instruction:

MOVAPD
Moves two double-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a 128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVAPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves two double-precision floating-point values. There are encodings for each type of move:
• The source operand is either an XMM register or a 128-bit memory location. The destination operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a 128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Moves four double-precision floating-point values. There are encodings for each type of move:
• The source operand is either a YMM register or a 256-bit memory location. The destination operand is a YMM register.
• The source operand is a YMM register. The destination operand is either a YMM register or a 256-bit memory location.
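Because a misaligned memory operand of an aligned move raises a general-protection exception, software often checks alignment before choosing an aligned form. The following is a minimal sketch in C, assuming the operand sizes given above (16 bytes for the legacy and 128-bit forms, 32 bytes for the 256-bit form); the helper names is_aligned and aligned_move_boundary are hypothetical and are not part of any AMD interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when addr is a multiple of boundary.
 * boundary must be a power of two. */
static bool is_aligned(const void *addr, size_t boundary)
{
    return ((uintptr_t)addr & (boundary - 1)) == 0;
}

/* Alignment, in bytes, assumed for the memory operand of an aligned move:
 * 16 for the legacy and 128-bit extended forms, 32 for the 256-bit form. */
static size_t aligned_move_boundary(bool is_256bit_form)
{
    return is_256bit_form ? 32 : 16;
}
```

A caller that cannot guarantee the boundary returned by aligned_move_boundary would typically fall back to an unaligned move such as (V)MOVUPD.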
Instruction Support

Form       Subset  Feature Flag
MOVAPD     SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVAPD    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode       Description
MOVAPD xmm1, xmm2/mem128       66 0F 28 /r  Moves two packed double-precision floating-point values from xmm2 or mem128 to xmm1.
MOVAPD xmm1/mem128, xmm2       66 0F 29 /r  Moves two packed double-precision floating-point values from xmm2 to xmm1 or mem128.

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVAPD xmm1, xmm2/mem128      C4   RXB.00001       X.1111.0.01  28 /r
VMOVAPD xmm1/mem128, xmm2      C4   RXB.00001       X.1111.0.01  29 /r
VMOVAPD ymm1, ymm2/mem256      C4   RXB.00001       X.1111.1.01  28 /r
VMOVAPD ymm1/mem256, ymm2      C4   RXB.00001       X.1111.1.01  29 /r

Related Instructions
(V)MOVHPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVSD, (V)MOVUPD

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Memory operand not aligned on a 16-byte boundary.
                           S     S     X     Write to a read-only data segment.
                                       A     VEX256: Memory operand not 32-byte aligned. VEX128: Memory operand not 16-byte aligned.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

MOVAPS VMOVAPS
Move Aligned Packed Single-Precision Floating-Point

Moves packed single-precision floating-point values. Values can be moved from a register or memory location to a register; or from a register to a register or memory location.

A memory operand that is not aligned causes a general-protection exception.

There are legacy and extended forms of the instruction:

MOVAPS
Moves four single-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a 128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVAPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves four single-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a 128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Moves eight single-precision floating-point values. There are encodings for each type of move.
• The source operand is either a YMM register or a 256-bit memory location. The destination operand is a YMM register.
• The source operand is a YMM register. The destination operand is either a YMM register or a 256-bit memory location.
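The VEX columns used in the encoding tables of this reference (VEX, RXB.map_select, W.vvvv.L.pp) name the fields of the three-byte C4h prefix. As a rough illustration of how those fields pack into bytes, here is a sketch in C. The struct and function names are hypothetical. The sketch relies on the general VEX definition, which is not restated in this section, that the R, X, B and vvvv fields are stored in inverted (ones' complement) form, so an unused vvvv field appears as 1111b.

```c
#include <stdint.h>

/* Hypothetical container for the fields of a three-byte (C4h) VEX prefix.
 * All values are logical (non-inverted); vex3_encode applies the inversion. */
struct vex3_fields {
    unsigned r, x, b;     /* register-extension bits, 0 or 1 */
    unsigned map_select;  /* 5-bit opcode map; 00001b in the tables here */
    unsigned w;           /* VEX.W */
    unsigned vvvv;        /* extra source register number; 0 when unused */
    unsigned l;           /* 0 = 128-bit encoding, 1 = 256-bit encoding */
    unsigned pp;          /* 00b = none, 01b = 66h, 10b = F3h, 11b = F2h */
};

/* Packs the fields into the three prefix bytes. */
static void vex3_encode(const struct vex3_fields *f, uint8_t out[3])
{
    out[0] = 0xC4;
    /* Byte 1: inverted R, X, B in bits 7:5, map_select in bits 4:0. */
    out[1] = (uint8_t)(((~f->r & 1u) << 7) | ((~f->x & 1u) << 6) |
                       ((~f->b & 1u) << 5) | (f->map_select & 0x1Fu));
    /* Byte 2: W in bit 7, inverted vvvv in bits 6:3, L in bit 2, pp in bits 1:0. */
    out[2] = (uint8_t)(((f->w & 1u) << 7) | ((~f->vvvv & 0xFu) << 3) |
                       ((f->l & 1u) << 2) | (f->pp & 3u));
}
```

For a table entry such as C4 RXB.00001 X.1111.0.00 with no register extensions and W = 0, this packing gives the prefix bytes C4 E1 78. Assemblers may choose the shorter two-byte VEX form when the fields permit; that form is outside this sketch.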
Instruction Support

Form       Subset  Feature Flag
MOVAPS     SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVAPS    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode       Description
MOVAPS xmm1, xmm2/mem128       0F 28 /r     Moves four packed single-precision floating-point values from xmm2 or mem128 to xmm1.
MOVAPS xmm1/mem128, xmm2       0F 29 /r     Moves four packed single-precision floating-point values from xmm2 to xmm1 or mem128.

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVAPS xmm1, xmm2/mem128      C4   RXB.00001       X.1111.0.00  28 /r
VMOVAPS xmm1/mem128, xmm2      C4   RXB.00001       X.1111.0.00  29 /r
VMOVAPS ymm1, ymm2/mem256      C4   RXB.00001       X.1111.1.00  28 /r
VMOVAPS ymm1/mem256, ymm2      C4   RXB.00001       X.1111.1.00  29 /r

Related Instructions
(V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS, (V)MOVUPS

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Memory operand not aligned on a 16-byte boundary.
                           S     S     X     Write to a read-only data segment.
                                       A     VEX256: Memory operand not 32-byte aligned. VEX128: Memory operand not 16-byte aligned.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

MOVD VMOVD
Move Doubleword or Quadword

Moves 32-bit and 64-bit values. A value can be moved from a general-purpose register or memory location to the corresponding low-order bits of an XMM register, with zero-extension to 128 bits; or from the low-order bits of an XMM register to a general-purpose register or memory location.

The quadword form of this instruction is distinct from the differently-encoded (V)MOVQ instruction.

There are legacy and extended forms of the instruction:

MOVD
There are two encodings for 32-bit moves, characterized by REX.W = 0.
• The source operand is either a 32-bit general-purpose register or a 32-bit memory location. The destination is an XMM register. The 32-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either a 32-bit general-purpose register or a 32-bit memory location.
There are two encodings for 64-bit moves, characterized by REX.W = 1.
• The source operand is either a 64-bit general-purpose register or a 64-bit memory location. The destination is an XMM register. The 64-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either a 64-bit general-purpose register or a 64-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVD
The extended form of the instruction has four 128-bit encodings:
There are two encodings for 32-bit moves, characterized by VEX.W = 0.
• The source operand is either a 32-bit general-purpose register or a 32-bit memory location. The destination is an XMM register. The 32-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either a 32-bit general-purpose register or a 32-bit memory location.
There are two encodings for 64-bit moves, characterized by VEX.W = 1.
• The source operand is either a 64-bit general-purpose register or a 64-bit memory location. The destination is an XMM register. The 64-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either a 64-bit general-purpose register or a 64-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form       Subset  Feature Flag
MOVD       SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVD      AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode            Description
MOVD xmm, reg32/mem32          66 (W0) 0F 6E /r  Move a 32-bit value from reg32/mem32 to xmm.
MOVD xmm, reg64/mem64          66 (W1) 0F 6E /r  Move a 64-bit value from reg64/mem64 to xmm.
MOVD reg32/mem32, xmm          66 (W0) 0F 7E /r  Move a 32-bit value from xmm to reg32/mem32.
MOVD reg64/mem64, xmm          66 (W1) 0F 7E /r  Move a 64-bit value from xmm to reg64/mem64.

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVD¹ xmm, reg32/mem32        C4   RXB.00001       0.1111.0.01  6E /r
VMOVQ xmm, reg64/mem64         C4   RXB.00001       1.1111.0.01  6E /r
VMOVD¹ reg32/mem32, xmm        C4   RXB.00001       0.1111.0.01  7E /r
VMOVQ reg64/mem64, xmm         C4   RXB.00001       1.1111.0.01  7E /r

Note: 1. Also known as MOVQ in some developer tools.

Related Instructions
(V)MOVDQA, (V)MOVDQU, (V)MOVQ

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

MOVDDUP VMOVDDUP
Move and Duplicate Double-Precision Floating-Point

Moves and duplicates double-precision floating-point values.

There are legacy and extended forms of the instruction:

MOVDDUP
Moves and duplicates one quadword value.
The source operand is either the low 64 bits of an XMM register or the address of the least-significant byte of 64 bits of data in memory. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVDDUP
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves and duplicates one quadword value.
The source operand is either the low 64 bits of an XMM register or the address of the least-significant byte of 64 bits of data in memory. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Moves and duplicates two even-indexed quadword values.
The source operand is either a YMM register or the address of the least-significant byte of 256 bits of data in memory. The destination is a YMM register. Bits [63:0] of the source are written to bits [127:64] and [63:0] of the destination; bits [191:128] of the source are written to bits [255:192] and [191:128] of the destination.
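The data movement of the 128-bit and 256-bit forms of (V)MOVDDUP can be sketched in C with registers modeled as arrays of 64-bit quadwords, element 0 holding bits [63:0]. The function names are hypothetical, and the sketch models only the quadwords written by each form, not the handling of bits [255:128] by the 128-bit forms.

```c
#include <stdint.h>

/* 128-bit forms: the single source quadword is written to both
 * quadwords of the destination. */
static void movddup128_model(uint64_t dst[2], uint64_t src_low)
{
    dst[0] = src_low;   /* bits [63:0]   */
    dst[1] = src_low;   /* bits [127:64] */
}

/* 256-bit form: the even-indexed source quadwords (bits [63:0] and
 * [191:128]) are each duplicated into the adjacent destination quadword. */
static void vmovddup256_model(uint64_t dst[4], const uint64_t src[4])
{
    uint64_t q0 = src[0];   /* read both inputs first so dst may alias src */
    uint64_t q2 = src[2];
    dst[0] = q0;            /* bits [63:0]    */
    dst[1] = q0;            /* bits [127:64]  */
    dst[2] = q2;            /* bits [191:128] */
    dst[3] = q2;            /* bits [255:192] */
}
```

In the 256-bit model the odd-indexed source quadwords (src[1] and src[3]) never reach the destination, which is what "even-indexed" means in the description above.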
Instruction Support

Form       Subset  Feature Flag
MOVDDUP    SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VMOVDDUP   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode       Description
MOVDDUP xmm1, xmm2/mem64       F2 0F 12 /r  Moves two copies of the low 64 bits of xmm2 or mem64 to xmm1.

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVDDUP xmm1, xmm2/mem64      C4   RXB.00001       X.1111.0.11  12 /r
VMOVDDUP ymm1, ymm2/mem256     C4   RXB.00001       X.1111.1.11  12 /r

Related Instructions
(V)MOVSHDUP, (V)MOVSLDUP

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference with alignment checking enabled.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

MOVDQA VMOVDQA
Move Aligned Double Quadword

Moves aligned packed integer values. Values can be moved from a register or a memory location to a register, or from a register to a register or a memory location.
A memory operand that is not aligned causes a general-protection exception.

There are legacy and extended forms of the instruction:

MOVDQA
Moves two aligned quadwords (128-bit move). There are two encodings.
• The source operand is an XMM register. The destination is either an XMM register or a 128-bit memory location.
• The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVDQA
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves two aligned quadwords (128-bit move). There are two encodings.
• The source operand is an XMM register. The destination is either an XMM register or a 128-bit memory location.
• The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Moves four aligned quadwords (256-bit move). There are two encodings.
• The source operand is a YMM register. The destination is either a YMM register or a 256-bit memory location.
• The source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support

Form       Subset  Feature Flag
MOVDQA     SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVDQA    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode       Description
MOVDQA xmm1, xmm2/mem128       66 0F 6F /r  Moves aligned packed integer values from xmm2 or mem128 to xmm1.
MOVDQA xmm1/mem128, xmm2       66 0F 7F /r  Moves aligned packed integer values from xmm2 to xmm1 or mem128.
Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVDQA xmm1, xmm2/mem128      C4   RXB.00001       X.1111.0.01  6F /r
VMOVDQA xmm1/mem128, xmm2      C4   RXB.00001       X.1111.0.01  7F /r
VMOVDQA ymm1, ymm2/mem256      C4   RXB.00001       X.1111.1.01  6F /r
VMOVDQA ymm1/mem256, ymm2      C4   RXB.00001       X.1111.1.01  7F /r

Related Instructions
(V)MOVD, (V)MOVDQU, (V)MOVQ

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Memory operand not aligned on a 16-byte boundary.
                           S     S     X     Write to a read-only data segment.
                                       A     VEX256: Memory operand not 32-byte aligned. VEX128: Memory operand not 16-byte aligned.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

MOVDQU VMOVDQU
Move Unaligned Double Quadword

Moves unaligned packed integer values. Values can be moved from a register or a memory location to a register, or from a register to a register or a memory location.

There are legacy and extended forms of the instruction:

MOVDQU
Moves two unaligned quadwords (128-bit move). There are two encodings.
• The source operand is an XMM register. The destination is either an XMM register or a 128-bit memory location.
• The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVDQU
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves two unaligned quadwords (128-bit move). There are two encodings:
• The source operand is an XMM register. The destination is either an XMM register or a 128-bit memory location.
• The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Moves four unaligned quadwords (256-bit move). There are two encodings:
• The source operand is a YMM register. The destination is either a YMM register or a 256-bit memory location.
• The source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support

Form       Subset  Feature Flag
MOVDQU     SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVDQU    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode       Description
MOVDQU xmm1, xmm2/mem128       F3 0F 6F /r  Moves unaligned packed integer values from xmm2 or mem128 to xmm1.
MOVDQU xmm1/mem128, xmm2       F3 0F 7F /r  Moves unaligned packed integer values from xmm2 to xmm1 or mem128.
Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVDQU xmm1, xmm2/mem128      C4   RXB.00001       X.1111.0.10  6F /r
VMOVDQU xmm1/mem128, xmm2      C4   RXB.00001       X.1111.0.10  7F /r
VMOVDQU ymm1, ymm2/mem256      C4   RXB.00001       X.1111.1.10  6F /r
VMOVDQU ymm1/mem256, ymm2      C4   RXB.00001       X.1111.1.10  7F /r

Related Instructions
(V)MOVD, (V)MOVDQA, (V)MOVQ

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     X     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

MOVHLPS VMOVHLPS
Move High to Low Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values from the high quadword of an XMM register to the low quadword of an XMM register.

There are legacy and extended forms of the instruction:

MOVHLPS
The source operand is bits [127:64] of an XMM register. The destination is bits [63:0] of an XMM register. Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVHLPS
The extended form of the instruction has a 128-bit encoding only.
The source operands are bits [127:64] of two XMM registers. The destination is a third XMM register. Bits [127:64] of the first source are moved to bits [127:64] of the destination; bits [127:64] of the second source are moved to bits [63:0] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form       Subset  Feature Flag
MOVHLPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVHLPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode       Description
MOVHLPS xmm1, xmm2             0F 12 /r     Moves two packed single-precision floating-point values from xmm2[127:64] to xmm1[63:0].

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVHLPS xmm1, xmm2, xmm3      C4   RXB.00001       X.src.0.00   12 /r

Related Instructions
(V)MOVAPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS, (V)MOVUPS

Exceptions

Exception Mode Real Virt Prot X A S S X A S S X Device not available, #NM S X — AVX and SSE exception A — AVX exception S — SSE exception X S Invalid opcode, #UD X S S A A A A X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1.

MOVHPD VMOVHPD
Move High Packed Double-Precision Floating-Point

Moves a packed double-precision floating-point value.
Values can be moved from a 64-bit memory location to the high-order quadword of an XMM register, or from the high-order quadword of an XMM register to a 64-bit memory location.

There are legacy and extended forms of the instruction:

MOVHPD
There are two encodings.
• The source operand is a 64-bit memory location. The destination is bits [127:64] of an XMM register.
• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVHPD
The extended form of the instruction has two 128-bit encodings:
• There are two source operands. The first source is an XMM register. The second source is a 64-bit memory location. The destination is an XMM register. Bits [63:0] of the source register are written to bits [63:0] of the destination; bits [63:0] of the source memory location are written to bits [127:64] of the destination.
• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form       Subset  Feature Flag
MOVHPD     SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVHPD    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode       Description
MOVHPD xmm1, mem64             66 0F 16 /r  Moves a packed double-precision floating-point value from mem64 to xmm1[127:64].
MOVHPD mem64, xmm1             66 0F 17 /r  Moves a packed double-precision floating-point value from xmm1[127:64] to mem64.
Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVHPD xmm1, xmm2, mem64      C4   RXB.00001       X.src.0.01   16 /r
VMOVHPD mem64, xmm1            C4   RXB.00001       X.1111.0.01  17 /r

Related Instructions
(V)MOVAPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVSD, (V)MOVUPD

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b (for memory destination encoding only).
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

MOVHPS VMOVHPS
Move High Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values. Values can be moved from a 64-bit memory location to the high-order quadword of an XMM register, or from the high-order quadword of an XMM register to a 64-bit memory location.

There are legacy and extended forms of the instruction:

MOVHPS
There are two encodings.
• The source operand is a 64-bit memory location. The destination is bits [127:64] of an XMM register.
• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVHPS
The extended form of the instruction has two 128-bit encodings:
• There are two source operands. The first source is an XMM register. The second source is a 64-bit memory location. The destination is an XMM register. Bits [63:0] of the source register are written to bits [63:0] of the destination; bits [63:0] of the source memory location are written to bits [127:64] of the destination.
• The source operand is bits [127:64] of an XMM register. The destination is a 64-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form       Subset  Feature Flag
MOVHPS     SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVHPS    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode       Description
MOVHPS xmm1, mem64             0F 16 /r     Moves two packed single-precision floating-point values from mem64 to xmm1[127:64].
MOVHPS mem64, xmm1             0F 17 /r     Moves two packed single-precision floating-point values from xmm1[127:64] to mem64.

Mnemonic                       Encoding
                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVHPS xmm1, xmm2, mem64      C4   RXB.00001       X.src.0.00   16 /r
VMOVHPS mem64, xmm1            C4   RXB.00001       X.1111.0.00  17 /r

Related Instructions
(V)MOVAPS, (V)MOVHLPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS, (V)MOVUPS

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b (for memory destination encoding only).
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     X     Write to a read-only data segment.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

MOVLHPS VMOVLHPS
Move Low to High Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values from the low quadword of an XMM register to the high quadword of a second XMM register.

There are legacy and extended forms of the instruction:

MOVLHPS
The source operand is bits [63:0] of an XMM register. The destination is bits [127:64] of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVLHPS
The extended form of the instruction has a 128-bit encoding only.
The source operands are bits [63:0] of two XMM registers. The destination is a third XMM register. Bits [63:0] of the first source are moved to bits [63:0] of the destination; bits [63:0] of the second source are moved to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form       Subset  Feature Flag
MOVLHPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVLHPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
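The feature flags in the Instruction Support tables, together with the #UD conditions in the exception tables, suggest how software might decide whether the extended forms can be used. The sketch below, in C, operates on register values that the caller has already obtained from CPUID Fn0000_0001 and from XFEATURE_ENABLED_MASK; how those values are read is not shown. The macro and function names are hypothetical, and the OSXSAVE bit position (ECX bit 27) is taken from the general CPUID definition rather than from this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the CPUID Fn0000_0001 result registers. */
#define CPUID1_EDX_SSE      (UINT32_C(1) << 25)
#define CPUID1_EDX_SSE2     (UINT32_C(1) << 26)
#define CPUID1_ECX_SSE3     (UINT32_C(1) << 0)
#define CPUID1_ECX_OSXSAVE  (UINT32_C(1) << 27)  /* assumed: not stated in this section */
#define CPUID1_ECX_AVX      (UINT32_C(1) << 28)

/* True when the processor reports the SSE1 subset. */
static bool sse_supported(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & CPUID1_EDX_SSE) != 0;
}

/* True when AVX forms can be expected to execute without #UD:
 * the AVX feature flag is set, the operating system has enabled XSAVE
 * (OSXSAVE), and XFEATURE_ENABLED_MASK[2:1] = 11b. */
static bool avx_usable(uint32_t cpuid1_ecx, uint64_t xfeature_enabled_mask)
{
    bool avx     = (cpuid1_ecx & CPUID1_ECX_AVX) != 0;
    bool osxsave = (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0;
    bool state   = ((xfeature_enabled_mask >> 1) & 0x3u) == 0x3u;
    return avx && osxsave && state;
}
```

Checking the AVX flag alone is not sufficient under these assumptions: the exception tables list CR4.OSXSAVE = 0 and XFEATURE_ENABLED_MASK[2:1] != 11b as separate causes of #UD for the extended forms.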
Instruction Encoding Mnemonic Opcode MOVLHPS xmm1, xmm2 0F 16 /r Description Moves two packed single-precision floating-point values from xmm2[63:0] to xmm1[127:64]. Mnemonic Encoding VEX RXB.map_select VMOVLHPS xmm1, xmm2, xmm3 C4 RXB.00001 W.vvvv.L.pp Opcode X.src.0.00 16 /r Related Instructions (V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS, (V)MOVUPS 200 MOVLHPS, VMOVLHPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S X Device not available, #NM S X — AVX and SSE exception A — AVX exception S — SSE exception X S Invalid opcode, #UD Instruction Reference X S S A A A A X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. MOVLHPS, VMOVLHPS 201 AMD64 Technology 26568—Rev. 3.22—May 2018 MOVLPD VMOVLPD Move Low Packed Double-Precision Floating-Point Moves a packed double-precision floating-point value. Values can be moved from a 64-bit memory location to the low-order quadword of an XMM register, or from the low-order quadword of an XMM register to a 64-bit memory location. There are legacy and extended forms of the instruction: MOVLPD There are two encodings. • The source operand is a 64-bit memory location. The destination is bits [63:0] of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. • The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location. VMOVLPD The extended form of the instruction has two 128-bit encodings. • There are two source operands. The first source is an XMM register. The second source is a 64-bit memory location. 
The destination is an XMM register. Bits [127:64] of the source register are written to bits [127:64] of the destination; bits [63:0] of the source memory location are written to bits [63:0] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared. • The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location. Instruction Support Form Subset MOVLPD SSE2 VMOVLPD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode Description MOVLPD xmm1, mem64 66 0F 12 /r Moves a packed double-precision floating-point value from mem64 to xmm1[63:0]. MOVLPD mem64, xmm1 66 0F 13 /r Moves a packed double-precision floating-point value from xmm1[63:0] to mem64. Mnemonic Encoding VEX RXB.map_select VMOVLPD xmm1, xmm2, mem64 C4 VMOVLPD mem64, xmm1 C4 202 W.vvvv.L.pp Opcode RXB.00001 X.src.0.01 12 /r RXB.00001 X.1111.0.01 13 /r MOVLPD, VMOVLPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)MOVAPD, (V)MOVHPD, (V)MOVMSKPD, (V)MOVSD, (V)MOVUPD Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S S X S S A A A A A X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b (for memory destination encoding only). VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. 
Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Write to a read-only data segment. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. MOVLPD, VMOVLPD 203 AMD64 Technology 26568—Rev. 3.22—May 2018 MOVLPS VMOVLPS Move Low Packed Single-Precision Floating-Point Moves two packed single-precision floating-point values. Values can be moved from a 64-bit memory location to the low-order quadword of an XMM register, or from the low-order quadword of an XMM register to a 64-bit memory location. There are legacy and extended forms of the instruction: MOVLPS There are two encodings. • The source operand is a 64-bit memory location. The destination is bits [63:0] of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. • The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location. VMOVLPS The extended form of the instruction has two 128-bit encodings. • There are two source operands. The first source is an XMM register. The second source is a 64-bit memory location. The destination is an XMM register. Bits [127:64] of the source register are written to bits [127:64] of the destination; bits [63:0] of the source memory location are written to bits [63:0] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared. • The source operand is bits [63:0] of an XMM register. The destination is a 64-bit memory location. Instruction Support Form Subset Feature Flag MOVLPS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VMOVLPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 
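The feature-flag table above reduces to two bit tests on the values returned by CPUID Fn0000_0001. The sketch below assumes the caller has already obtained the EDX and ECX register values by some other means; it does not execute CPUID itself, and the function name `movlps_forms_supported` is a hypothetical helper.

```python
SSE_BIT_EDX = 25  # CPUID Fn0000_0001_EDX[SSE]
AVX_BIT_ECX = 28  # CPUID Fn0000_0001_ECX[AVX]


def movlps_forms_supported(edx: int, ecx: int) -> dict:
    """Report which forms the feature flags advertise.

    edx and ecx are assumed to be the raw 32-bit register values
    returned by CPUID function 0000_0001h.
    """
    return {
        "MOVLPS": bool((edx >> SSE_BIT_EDX) & 1),
        "VMOVLPS": bool((ecx >> AVX_BIT_ECX) & 1),
    }


if __name__ == "__main__":
    print(movlps_forms_supported(edx=1 << 25, ecx=0))
    print(movlps_forms_supported(edx=1 << 25, ecx=1 << 28))
```

The flags only report what the processor implements. The exception tables in this reference list further operating-system enablement conditions (for example CR4.OSXSAVE and XFEATURE_ENABLED_MASK) that this sketch does not check.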
Instruction Encoding
Mnemonic                    Opcode     Description
MOVLPS xmm1, mem64          0F 12 /r   Moves two packed single-precision floating-point values from mem64 to xmm1[63:0].
MOVLPS mem64, xmm1          0F 13 /r   Moves two packed single-precision floating-point values from xmm1[63:0] to mem64.

Mnemonic                    Encoding
                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVLPS xmm1, xmm2, mem64   C4   RXB.00001       X.src.0.00   12 /r
VMOVLPS mem64, xmm1         C4   RXB.00001       X.1111.0.00  13 /r

Related Instructions
(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVMSKPS, (V)MOVSS, (V)MOVUPS

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A     -     AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                            -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                            -     -     A     VEX.vvvv != 1111b (for memory destination encoding only).
                            -     -     A     VEX.L = 1.
                            -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     X     Write to a read-only data segment.
                            -     -     X     Null data segment used to reference memory.
Page fault, #PF             -     S     X     Instruction execution caused a page fault.
Alignment check, #AC        -     S     X     Unaligned memory reference when alignment checking enabled.
X: AVX and SSE exception
A: AVX exception
S: SSE exception

MOVMSKPD VMOVMSKPD Extract Sign Mask Packed Double-Precision Floating-Point

Extracts the sign bits of packed double-precision floating-point values from an XMM register, zero-extends the value, and writes it to the low-order bits of a general-purpose register.
There are legacy and extended forms of the instruction:

MOVMSKPD
Extracts two mask bits. The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit general purpose register. Writes the extracted bits to positions [1:0] of the destination and clears the remaining bits. Bits [255:128] of the YMM register that corresponds to the source are not affected.

VMOVMSKPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Extracts two mask bits. The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit general purpose register. Writes the extracted bits to positions [1:0] of the destination and clears the remaining bits. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Extracts four mask bits. The source operand is a YMM register. The destination can be either a 64-bit or a 32-bit general purpose register. Writes the extracted bits to positions [3:0] of the destination and clears the remaining bits.

Instruction Support
Form       Subset  Feature Flag
MOVMSKPD   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVMSKPD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                    Opcode        Description
MOVMSKPD reg, xmm           66 0F 50 /r   Move zero-extended sign bits of packed double-precision values from xmm to a general-purpose register.

Mnemonic                    Encoding
                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVMSKPD reg, xmm          C4   RXB.00001       X.1111.0.01  50 /r
VMOVMSKPD reg, ymm          C4   RXB.00001       X.1111.1.01  50 /r

Related Instructions
(V)MOVMSKPS, (V)PMOVMSKB

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A     -     AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                            -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                            -     -     A     VEX.vvvv != 1111b.
                            -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
X: AVX and SSE exception
A: AVX exception
S: SSE exception

MOVMSKPS VMOVMSKPS Extract Sign Mask Packed Single-Precision Floating-Point

Extracts the sign bits of packed single-precision floating-point values from an XMM register, zero-extends the value, and writes it to the low-order bits of a general-purpose register.

There are legacy and extended forms of the instruction:

MOVMSKPS
Extracts four mask bits. The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit general purpose register. Writes the extracted bits to positions [3:0] of the destination and clears the remaining bits.

VMOVMSKPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Extracts four mask bits. The source operand is an XMM register. The destination can be either a 64-bit or a 32-bit general purpose register. Writes the extracted bits to positions [3:0] of the destination and clears the remaining bits.

YMM Encoding
Extracts eight mask bits. The source operand is a YMM register. The destination can be either a 64-bit or a 32-bit general purpose register. Writes the extracted bits to positions [7:0] of the destination and clears the remaining bits.
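The sign-mask extraction described above can be modelled in a few lines. This is a sketch under stated assumptions, not the processor's implementation: the packed register is represented as a Python list whose element 0 is the lowest-order single-precision value, and `sign_mask_ps` is a hypothetical name. Passing four values corresponds to the XMM encoding and eight to the YMM encoding.

```python
import struct


def sign_mask_ps(values):
    """Collect the sign bit of each single-precision element.

    Element i contributes bit i of the result; all higher bits are zero,
    matching the zero-extension into the general-purpose register.
    """
    if len(values) not in (4, 8):
        raise ValueError("expected 4 (XMM) or 8 (YMM) elements")
    mask = 0
    for index, value in enumerate(values):
        # Reinterpret the float as its 32-bit IEEE 754 encoding.
        (bits,) = struct.unpack("<I", struct.pack("<f", value))
        mask |= (bits >> 31) << index
    return mask


if __name__ == "__main__":
    print(bin(sign_mask_ps([1.0, -2.0, 3.0, -0.0])))
```

Because the instruction reads the raw sign bit rather than comparing against zero, negative zero sets its mask bit just as any other negative value does.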
Instruction Support Form Subset Feature Flag MOVMSKPS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VMOVMSKPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode MOVMSKPS reg, xmm 0F 50 /r Description Move zero-extended sign bits of packed single-precision values from xmm to a general-purpose register. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VMOVMSKPS reg, xmm C4 RXB.00001 X.1111.0.00 50 /r VMOVMSKPS reg, ymm C4 RXB.00001 X.1111.1.00 50 /r 208 MOVMSKPS, VMOVMSKPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)MOVMSKPD, (V)PMOVMSKB Exceptions Exception Mode Real Virt Prot X A S S X A S S X Device not available, #NM S X — AVX and SSE exception A — AVX exception S — SSE exception X S Invalid opcode, #UD Instruction Reference X S S A A A A X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. MOVMSKPS, VMOVMSKPS 209 AMD64 Technology 26568—Rev. 3.22—May 2018 MOVNTDQ VMOVNTDQ Move Non-Temporal Double Quadword Moves double quadword values from a register to a memory location. Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The method of minimization depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” in Volume 1. The instruction is weakly-ordered with respect to other instructions that operate on memory. 
Software should use an SFENCE or MFENCE instruction to force strong memory ordering of MOVNTDQ with respect to other stores. An attempted store to a non-aligned memory location results in a #GP exception. There are legacy and extended forms of the instruction: MOVNTDQ Moves one 128-bit value. The source operand is an XMM register. The destination is a 128-bit memory location. VMOVNTDQ The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Moves one 128-bit value. The source operand is an XMM register. The destination is a 128-bit memory location. YMM Encoding Moves two 128-bit values. The source operand is a YMM register. The destination is a 256-bit memory location. Instruction Support Form Subset MOVNTDQ SSE2 VMOVNTDQ AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic MOVNTDQ mem128, xmm Opcode 66 0F E7 /r Description Moves a 128-bit value from xmm to mem128, minimizing cache pollution. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VMOVNTDQ mem128, xmm C4 RXB.00001 X.1111.0.01 E7 /r VMOVNTDQ mem256, ymm C4 RXB.00001 X.1111.1.01 E7 /r 210 MOVNTDQ, VMOVNTDQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)MOVNTDQA, (V)MOVNTPD, (V)MOVNTPS Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X S X A Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. 
VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not aligned on a 16-byte boundary. Write to a read-only data segment. VEX256: Memory operand not 32-byte aligned. VEX128: Memory operand not 16-byte aligned. Null data segment used to reference memory. Instruction execution caused a page fault. MOVNTDQ, VMOVNTDQ 211 AMD64 Technology 26568—Rev. 3.22—May 2018 MOVNTDQA VMOVNTDQA Move Non-Temporal Double Quadword Aligned Loads an XMM/YMM register from a naturally-aligned 128-bit or 256-bit memory location. Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the load as a write-combining (WC) memory read, which minimizes cache pollution. The method of minimization depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” in Volume 1. The instruction is weakly-ordered with respect to other instructions that operate on memory. Software should use an MFENCE instruction to force strong memory ordering of MOVNTDQA with respect to other reads. An attempted load from a non-aligned memory location results in a #GP exception. There are legacy and extended forms of the instruction: MOVNTDQA Loads a 128-bit value into the specified XMM register from a 16-byte aligned memory location. VMOVNTDQA The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Loads a 128-bit value into the specified XMM register from a 16-byte aligned memory location. YMM Encoding Loads a 256-bit value into the specified YMM register from a 32-byte aligned memory location. 
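The alignment requirement stated above (16 bytes for a 128-bit operand, 32 bytes for a 256-bit operand) amounts to a simple modulus test on the effective address. The helper below is an illustrative sketch; the name `nt_operand_misaligned` is hypothetical, and a result of `True` corresponds to the case the text says raises #GP.

```python
def nt_operand_misaligned(address: int, operand_bits: int) -> bool:
    """Return True when the address violates natural alignment.

    operand_bits is 128 for the XMM encodings and 256 for the YMM
    encodings; the required alignment equals the operand size in bytes.
    """
    if operand_bits not in (128, 256):
        raise ValueError("operand size must be 128 or 256 bits")
    required_alignment = operand_bits // 8
    return address % required_alignment != 0


if __name__ == "__main__":
    print(nt_operand_misaligned(0x1010, 128))  # 16-byte aligned address
    print(nt_operand_misaligned(0x1010, 256))  # not 32-byte aligned
```

A buffer that is safe for the 128-bit form is therefore not automatically safe for the 256-bit form; code that switches between the encodings has to allocate for the stricter of the two.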
Instruction Support Form MOVNTDQA Subset Feature Flag SSE4.1 CPUID Fn0000_0001_ECX[SSE41] (bit 19) VMOVNTDQA 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VMOVNTDQA 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic MOVNTDQA xmm, mem128 Opcode Description 66 0F 38 2A /r Loads xmm from an aligned memory location, minimizing cache pollution. Encoding Mnemonic VEX RXB.map_select W.vvvv.L.pp Opcode VMOVNTDQA xmm, mem128 C4 RXB.02 X.1111.0.01 2A /r VMOVNTDQA ymm, mem256 C4 RXB.02 X.1111.1.01 2A /r Related Instructions (V)MOVNTDQ, (V)MOVNTPD, (V)MOVNTPS 212 MOVNTDQA, VMOVNTDQA Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS X S S A A A A A X X X X S X General protection, #GP A Page fault, #PF S X — AVX, AVX2, and SSE exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not aligned on a 16-byte boundary. Write to a read-only data segment. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Null data segment used to reference memory. Instruction execution caused a page fault. 
MOVNTPD VMOVNTPD Move Non-Temporal Packed Double-Precision Floating-Point

Moves packed double-precision floating-point values from a register to a memory location. Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The method of minimization depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” in Volume 1.

The instruction is weakly-ordered with respect to other instructions that operate on memory. Software should use an SFENCE or MFENCE instruction to force strong memory ordering of MOVNTPD with respect to other stores. An attempted store to a non-aligned memory location results in a #GP exception.

There are legacy and extended forms of the instruction:

MOVNTPD
Moves two values. The source operand is an XMM register. The destination is a 128-bit memory location.

VMOVNTPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves two values. The source operand is an XMM register. The destination is a 128-bit memory location.

YMM Encoding
Moves four values. The source operand is a YMM register. The destination is a 256-bit memory location.

Instruction Support
Form      Subset  Feature Flag
MOVNTPD   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVNTPD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                    Opcode        Description
MOVNTPD mem128, xmm         66 0F 2B /r   Moves two packed double-precision floating-point values from xmm to mem128, minimizing cache pollution.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VMOVNTPD mem128, xmm C4 RXB.00001 X.1111.0.01 2B /r VMOVNTPD mem256, ymm C4 RXB.00001 X.1111.1.01 2B /r 214 MOVNTPD, VMOVNTPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions MOVNTDQ, MOVNTI, MOVNTPS, MOVNTQ Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X S X A Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not aligned on a 16-byte boundary. Write to a read-only data segment. VEX256: Memory operand not 32-byte aligned. VEX128: Memory operand not 16-byte aligned. Null data segment used to reference memory. Instruction execution caused a page fault. MOVNTPD, VMOVNTPD 215 AMD64 Technology MOVNTPS VMOVNTPS 26568—Rev. 3.22—May 2018 Move Non-Temporal Packed Single-Precision Floating-Point Moves packed single-precision floating-point values from a register to a memory location. Indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The method of minimization depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” in Volume 1. 
The instruction is weakly-ordered with respect to other instructions that operate on memory. Software should use an SFENCE or MFENCE instruction to force strong memory ordering of MOVNTPS with respect to other stores. An attempted store to a non-aligned memory location results in a #GP exception.

There are legacy and extended forms of the instruction:

MOVNTPS
Moves four values. The source operand is an XMM register. The destination is a 128-bit memory location.

VMOVNTPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves four values. The source operand is an XMM register. The destination is a 128-bit memory location.

YMM Encoding
Moves eight values. The source operand is a YMM register. The destination is a 256-bit memory location.

Instruction Support
Form      Subset  Feature Flag
MOVNTPS   SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVNTPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                    Opcode     Description
MOVNTPS mem128, xmm         0F 2B /r   Moves four packed single-precision floating-point values from xmm to mem128, minimizing cache pollution.

Mnemonic                    Encoding
                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVNTPS mem128, xmm        C4   RXB.00001       X.1111.0.00  2B /r
VMOVNTPS mem256, ymm        C4   RXB.00001       X.1111.1.00  2B /r

Related Instructions
(V)MOVNTDQ, (V)MOVNTDQA, (V)MOVNTPD, (V)MOVNTQ

Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X S X A Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1.
CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not aligned on a 16-byte boundary. Write to a read-only data segment. VEX256: Memory operand not 32-byte aligned. VEX128: Memory operand not 16-byte aligned. Null data segment used to reference memory. Instruction execution caused a page fault. MOVNTPS, VMOVNTPS 217 AMD64 Technology 26568—Rev. 3.22—May 2018 MOVNTSD Move Non-Temporal Scalar Double-Precision Floating-Point Stores one double-precision floating-point value from an XMM register to a 64-bit memory location. This instruction indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the store as a write-combining memory write, which minimizes cache pollution. The diagram below illustrates the operation of this instruction: mem64 XMM register 63 0 127 64 63 0 copy Instruction Support Form Subset MOVNTSD SSE4A Feature Flag CPUID Fn8000_0001_ECX[SSE4A] (bit 6) Software must check the CPUID bit once per program or library initialization before using the instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic MOVNTSD Opcode mem64, xmm F2 0F 2B /r Description Stores one double-precision floating-point XMM register value into a 64 bit memory location. Treat as a non-temporal store. Related Instructions MOVNTDQ, MOVNTI, MOVNTPD, MOVNTPS, MOVNTQ, MOVNTSS rFLAGS Affected None 218 MOVNTSD Instruction Reference 26568—Rev. 
3.22—May 2018 AMD64 Technology Exceptions Exception Real Virtual 8086 Protected Cause of Exception X X X The SSE4A instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0. X X X The emulate bit (CR0.EM) was set to 1. X X X The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0. Device not available, #NM X X X The task-switch bit (CR0.TS) was set to 1. Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical. X X X A memory address exceeded a data segment limit or was non-canonical. X A null data segment was used to reference memory. X The destination operand was in a non-writable segment. Invalid opcode, #UD General protection, #GP Page fault, #PF X X A page fault resulted from executing the instruction. Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled. Instruction Reference MOVNTSD 219 AMD64 Technology 26568—Rev. 3.22—May 2018 MOVNTSS Move Non-Temporal Scalar Single-Precision Floating-Point Stores one single-precision floating-point value from an XMM register to a 32-bit memory location. This instruction indicates to the processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the store as a write-combining memory write, which minimizes cache pollution. The diagram below illustrates the operation of this instruction: mem32 XMM register 31 0 127 31 0 copy Instruction Support Form Subset MOVNTSS SSE4A Feature Flag CPUID Fn8000_0001_ECX[SSE4A] (bit 6) Software must check the CPUID bit once per program or library initialization before using the instruction, or inconsistent behavior may result. For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic MOVNTSS Opcode mem32, xmm F3 0F 2B /r Description Stores one single-precision floating-point XMM register value into a 32-bit memory location. 
Treat as a non-temporal store.

Related Instructions
MOVNTDQ, MOVNTI, MOVNTPD, MOVNTPS, MOVNTQ, MOVNTSD

rFLAGS Affected
None

Exceptions
Exception                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD         X     X             X          The SSE4A instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[SSE4A] = 0.
                            X     X             X          The emulate bit (CR0.EM) was set to 1.
                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0.
Device not available, #NM   X     X             X          The task-switch bit (CR0.TS) was set to 1.
Stack, #SS                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                            -     -             X          A null data segment was used to reference memory.
                            -     -             X          The destination operand was in a non-writable segment.
Page fault, #PF             -     X             X          A page fault resulted from executing the instruction.
Alignment check, #AC        -     X             X          An unaligned memory reference was performed while alignment checking was enabled.

MOVQ VMOVQ Move Quadword

Moves 64-bit values. The source is either the low-order quadword of an XMM register or a 64-bit memory location. The destination is either the low-order quadword of an XMM register or a 64-bit memory location. When the destination is a register, the 64-bit value is zero-extended to 128 bits.

There are legacy and extended forms of the instruction:

MOVQ
There are two encodings:
• The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. The 64-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either an XMM register or a 64-bit memory location. When the destination is a register, the 64-bit value is zero-extended to 128 bits.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
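The register-destination behavior of the legacy form can be sketched as a bit-level model. This is an illustration only: the YMM register is represented as a 256-bit Python integer, and `movq_to_register` is a hypothetical helper name, not an API of any library.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def movq_to_register(dest_ymm: int, source: int) -> int:
    """Model of legacy MOVQ with a register destination.

    The low quadword of the source is zero-extended to 128 bits and
    replaces dest[127:0]; dest[255:128] is left unchanged.
    """
    zero_extended = source & MASK64          # bits [127:64] become zero
    upper_lane = dest_ymm & ~MASK128         # preserve bits [255:128]
    return upper_lane | zero_extended


if __name__ == "__main__":
    ymm = (0xBEEF << 128) | (0x1234 << 64) | 0x5678
    print(hex(movq_to_register(ymm, (0x9999 << 64) | 0xABCD)))
```

Note the contrast with the merging moves elsewhere in this reference: here bits [127:64] of the destination are overwritten with zeros rather than preserved.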
VMOVQ The extended form of the instruction has three 128-bit encodings: • The source operand is an XMM register. The destination is an XMM register. The 64-bit value is zero-extended to 128 bits. • The source operand is a 64-bit memory location. The destination is an XMM register. The 64-bit value is zero-extended to 128 bits. • The source operand is an XMM register. The destination is either an XMM register or a 64-bit memory location. When the destination is a register, the 64-bit value is zero-extended to 128 bits. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Instruction Support Form Subset MOVQ SSE2 VMOVQ AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 222 MOVQ, VMOVQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Opcode Description MOVQ xmm1, xmm2/mem64 Mnemonic F3 0F 7E /r Move a zero-extended 64-bit value from xmm2 or mem64 to xmm1. MOVQ xmm1/mem64, xmm2 66 0F D6 /r Move a 64-bit value from xmm2 to xmm1 or mem64. Zero-extends for register destination. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VMOVQ xmm1, xmm2 C4 RXB.00001 X.1111.0.10 7E /r VMOVQ xmm1, mem64 C4 RXB.00001 X.1111.0.10 7E /r VMOVQ xmm1/mem64, xmm2 C4 RXB.00001 X.1111.0.01 D6 /r Related Instructions (V)MOVD, (V)MOVDQA, (V)MOVDQU Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S S X S S A A A A A X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. 
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Write to a read-only data segment. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled.

MOVSD VMOVSD Move Scalar Double-Precision Floating-Point

Moves scalar double-precision floating-point values. The source is either a low-order quadword of an XMM register or a 64-bit memory location. The destination is either a low-order quadword of an XMM register or a 64-bit memory location.

There are legacy and extended forms of the instruction:

MOVSD
There are two encodings.
• The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. If the source operand is a register, bits [127:64] of the destination are not affected. If the source operand is a 64-bit memory location, the upper 64 bits of the destination are cleared.
• The source operand is an XMM register. The destination is either an XMM register or a 64-bit memory location. When the destination is a register, bits [127:64] of the destination are not affected.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVSD
The extended form of the instruction has four 128-bit encodings. Two of the encodings are functionally equivalent.
• The source operand is a 64-bit memory location. The destination is an XMM register. The 64-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is a 64-bit memory location.
• Two functionally-equivalent encodings: There are two source XMM registers.
The destination is an XMM register. Bits [127:64] of the first source register are copied to bits [127:64] of the destination; the 64-bit value in bits [63:0] of the second source register is written to bits [63:0] of the destination.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

This instruction must not be confused with the MOVSD (move string doubleword) instruction of the general-purpose instruction set. Assemblers can distinguish the instructions by the number and type of operands.

Instruction Support
Form    Subset  Feature Flag
MOVSD   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVSD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                Opcode       Description
MOVSD xmm1, xmm2/mem64  F2 0F 10 /r  Moves a 64-bit value from xmm2 or mem64 to xmm1. Zero extends to 128 bits when source operand is memory.
MOVSD xmm1/mem64, xmm2  F2 0F 11 /r  Moves a 64-bit value from xmm2 to xmm1 or mem64.

Mnemonic                          Encoding (Note 1): VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVSD xmm1, mem64                C4                      RXB.00001       X.1111.X.11  10 /r
VMOVSD mem64, xmm1                C4                      RXB.00001       X.1111.X.11  11 /r
VMOVSD xmm1, xmm2, xmm3 (Note 2)  C4                      RXB.00001       X.src.X.11   10 /r
VMOVSD xmm1, xmm2, xmm3 (Note 2)  C4                      RXB.00001       X.src.X.11   11 /r
Note 1: The addressing mode differentiates between the two operand form (where one operand is a memory location) and the three operand form (where all operands are held in registers).
Note 2: These two encodings are functionally equivalent.
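The difference between the merging and zero-extending forms described above is easy to get wrong, so a scalar model may help. The following is a minimal sketch that treats a 256-bit YMM register as a Python integer; all four function names are hypothetical and exist only for illustration.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def movsd_reg(dest: int, src: int) -> int:
    """Legacy MOVSD xmm1, xmm2: replace bits [63:0], keep bits [255:64]."""
    return (dest & ~MASK64) | (src & MASK64)

def movsd_load(dest: int, mem64: int) -> int:
    """Legacy MOVSD xmm1, mem64: zero bits [127:64], keep bits [255:128]."""
    return (dest & ~MASK128) | (mem64 & MASK64)

def vmovsd_reg(src1: int, src2: int) -> int:
    """VMOVSD xmm1, xmm2, xmm3: bits [127:64] from src1, bits [63:0] from src2,
    bits [255:128] cleared."""
    return (src1 & MASK128 & ~MASK64) | (src2 & MASK64)

def vmovsd_load(mem64: int) -> int:
    """VMOVSD xmm1, mem64: zero-extend the quadword to the full register."""
    return mem64 & MASK64

dest = (7 << 128) | (5 << 64) | 3
src = (9 << 64) | 4
merged = movsd_reg(dest, src)     # high quadword and upper YMM half survive
loaded = movsd_load(dest, 4)      # high quadword cleared, upper YMM half survives
vex = vmovsd_reg(dest, src)       # upper YMM half cleared
```

Memory-destination forms are not modeled, since they simply store bits [63:0] of the source.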
Related Instructions
(V)MOVAPD, (V)MOVHPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVUPD

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X: AVX and SSE exception A: AVX exception S: SSE exception S S X S S A A A A X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b (for memory destination encoding only). REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Write to a read-only data segment. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled.

MOVSHDUP VMOVSHDUP Move High and Duplicate Single-Precision

Moves and duplicates odd-indexed single-precision floating-point values.

There are legacy and extended forms of the instruction:

MOVSHDUP
Moves and duplicates two odd-indexed single-precision floating-point values. The source operand is an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [127:96] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [63:32] of the source are duplicated and written to bits [63:32] and [31:0] of the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VMOVSHDUP
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves and duplicates two odd-indexed single-precision floating-point values. The source operand is an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [127:96] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [63:32] of the source are duplicated and written to bits [63:32] and [31:0] of the destination.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Moves and duplicates four odd-indexed single-precision floating-point values. The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. Bits [255:224] of the source are duplicated and written to bits [255:224] and [223:192] of the destination. Bits [191:160] of the source are duplicated and written to bits [191:160] and [159:128] of the destination. Bits [127:96] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [63:32] of the source are duplicated and written to bits [63:32] and [31:0] of the destination.

Instruction Support
Form       Subset  Feature Flag
MOVSHDUP   SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VMOVSHDUP  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                    Opcode       Description
MOVSHDUP xmm1, xmm2/mem128  F3 0F 16 /r  Moves and duplicates two odd-indexed single-precision floating-point values in xmm2 or mem128. Writes to xmm1.
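The bit ranges in the descriptions above reduce to a simple lane pattern: every odd-indexed 32-bit lane is copied to itself and to the even lane just below it. A minimal sketch, assuming the register is modeled as a Python list of lanes with index 0 holding bits [31:0] (the function name `movshdup` and the list representation are illustrative choices only):

```python
def movshdup(src):
    """Model (V)MOVSHDUP on a list of 32-bit lanes.

    src has 4 lanes for the XMM encoding or 8 lanes for the YMM encoding.
    Lane 2k+1 of the source is written to lanes 2k+1 and 2k of the result.
    """
    if len(src) not in (4, 8):
        raise ValueError("expected 4 (XMM) or 8 (YMM) lanes")
    out = []
    for i in range(0, len(src), 2):
        out += [src[i + 1], src[i + 1]]
    return out

xmm_result = movshdup([0, 1, 2, 3])
ymm_result = movshdup([10, 11, 12, 13, 14, 15, 16, 17])
```

The model covers lane selection only; the treatment of bits [255:128] for the 128-bit encodings follows the legacy and XMM rules stated above.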
Mnemonic                     Encoding: VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVSHDUP xmm1, xmm2/mem128  C4             RXB.00001       X.1111.0.10  16 /r
VMOVSHDUP ymm1, ymm2/mem256  C4             RXB.00001       X.1111.1.10  16 /r

Related Instructions
(V)MOVDDUP, (V)MOVSLDUP

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S A X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X: AVX and SSE exception A: AVX exception S: SSE exception X S S A A A A X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault.

MOVSLDUP VMOVSLDUP Move Low and Duplicate Single-Precision

Moves and duplicates even-indexed single-precision floating-point values.

There are legacy and extended forms of the instruction:

MOVSLDUP
Moves and duplicates two even-indexed single-precision floating-point values. The source operand is an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [95:64] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [31:0] of the source are duplicated and written to bits [63:32] and [31:0] of the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVSLDUP
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves and duplicates two even-indexed single-precision floating-point values. The source operand is an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [95:64] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [31:0] of the source are duplicated and written to bits [63:32] and [31:0] of the destination.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Moves and duplicates four even-indexed single-precision floating-point values. The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. Bits [223:192] of the source are duplicated and written to bits [255:224] and [223:192] of the destination. Bits [159:128] of the source are duplicated and written to bits [191:160] and [159:128] of the destination. Bits [95:64] of the source are duplicated and written to bits [127:96] and [95:64] of the destination. Bits [31:0] of the source are duplicated and written to bits [63:32] and [31:0] of the destination.

Instruction Support
Form       Subset  Feature Flag
MOVSLDUP   SSE3    CPUID Fn0000_0001_ECX[SSE3] (bit 0)
VMOVSLDUP  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                    Opcode       Description
MOVSLDUP xmm1, xmm2/mem128  F3 0F 12 /r  Moves and duplicates two even-indexed single-precision floating-point values in xmm2 or mem128. Writes to xmm1.
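As a lane-level reading of the bit ranges above: every even-indexed 32-bit lane is copied to itself and to the odd lane just above it. The sketch below assumes a register modeled as a Python list of lanes, index 0 holding bits [31:0]; the name `movsldup` and this representation are illustrative assumptions.

```python
def movsldup(src):
    """Model (V)MOVSLDUP on a list of 32-bit lanes.

    src has 4 lanes for the XMM encoding or 8 lanes for the YMM encoding.
    Lane 2k of the source is written to lanes 2k and 2k+1 of the result.
    """
    if len(src) not in (4, 8):
        raise ValueError("expected 4 (XMM) or 8 (YMM) lanes")
    out = []
    for i in range(0, len(src), 2):
        out += [src[i], src[i]]
    return out

xmm_result = movsldup([0, 1, 2, 3])
ymm_result = movsldup([10, 11, 12, 13, 14, 15, 16, 17])
```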
Mnemonic                     Encoding: VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVSLDUP xmm1, xmm2/mem128  C4             RXB.00001       X.1111.0.10  12 /r
VMOVSLDUP ymm1, ymm2/mem256  C4             RXB.00001       X.1111.1.10  12 /r

Related Instructions
(V)MOVDDUP, (V)MOVSHDUP

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S A X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X: AVX and SSE exception A: AVX exception S: SSE exception X S S A A A A X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault.

MOVSS VMOVSS Move Scalar Single-Precision Floating-Point

Moves scalar single-precision floating-point values. The source is either a low-order doubleword of an XMM register or a 32-bit memory location. The destination is either a low-order doubleword of an XMM register or a 32-bit memory location.

There are legacy and extended forms of the instruction:

MOVSS
There are three encodings.
• The source operand is an XMM register. The destination is an XMM register. Bits [127:32] of the destination are not affected.
• The source operand is a 32-bit memory location. The destination is an XMM register. The 32-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is either an XMM register or a 32-bit memory location. When the destination is a register, bits [127:32] of the destination are not affected.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVSS
The extended form of the instruction has four 128-bit encodings. Two of the encodings are functionally equivalent.
• The source operand is a 32-bit memory location. The destination is an XMM register. The 32-bit value is zero-extended to 128 bits.
• The source operand is an XMM register. The destination is a 32-bit memory location.
• Two functionally-equivalent encodings: There are two source XMM registers. The destination is an XMM register. Bits [127:32] of the first source register are copied to bits [127:32] of the destination; the 32-bit value in bits [31:0] of the second source register is written to bits [31:0] of the destination.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form    Subset  Feature Flag
MOVSS   SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVSS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                Opcode       Description
MOVSS xmm1, xmm2        F3 0F 10 /r  Moves a 32-bit value from xmm2 to xmm1.
MOVSS xmm1, mem32       F3 0F 10 /r  Moves a zero-extended 32-bit value from mem32 to xmm1.
MOVSS xmm2/mem32, xmm1  F3 0F 11 /r  Moves a 32-bit value from xmm1 to xmm2 or mem32.
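The register and load forms differ in which destination bits survive, which the following scalar model tries to make concrete. It is a sketch only: a 256-bit YMM register is represented as a Python integer, the function names are hypothetical, and the three-operand form is modeled as taking bits [127:32] from the first source, which is what merging only the low doubleword implies.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1

def movss_reg(dest: int, src: int) -> int:
    """Legacy MOVSS xmm1, xmm2: replace bits [31:0], keep bits [255:32]."""
    return (dest & ~MASK32) | (src & MASK32)

def movss_load(dest: int, mem32: int) -> int:
    """Legacy MOVSS xmm1, mem32: zero bits [127:32], keep bits [255:128]."""
    return (dest & ~MASK128) | (mem32 & MASK32)

def vmovss_reg(src1: int, src2: int) -> int:
    """VMOVSS xmm1, xmm2, xmm3: bits [127:32] from src1, bits [31:0] from src2,
    bits [255:128] cleared."""
    return (src1 & MASK128 & ~MASK32) | (src2 & MASK32)

def vmovss_load(mem32: int) -> int:
    """VMOVSS xmm1, mem32: zero-extend the doubleword to the full register."""
    return mem32 & MASK32

dest = (7 << 128) | (6 << 96) | (5 << 64) | (4 << 32) | 3
src = (1 << 64) | 9
merged = movss_reg(dest, src)
loaded = movss_load(dest, 9)
vex = vmovss_reg(dest, src)
```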
Mnemonic                          Encoding (Note 1): VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVSS xmm1, mem32                C4                      RXB.00001       X.1111.X.10  10 /r
VMOVSS mem32, xmm1                C4                      RXB.00001       X.1111.X.10  11 /r
VMOVSS xmm1, xmm2, xmm3 (Note 2)  C4                      RXB.00001       X.src.X.10   10 /r
VMOVSS xmm1, xmm2, xmm3 (Note 2)  C4                      RXB.00001       X.src.X.10   11 /r
Note 1: The addressing mode differentiates between the two operand form (where one operand is a memory location) and the three operand form (where all operands are held in registers).
Note 2: These two encodings are functionally equivalent.

Related Instructions
(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVUPS

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X: AVX and SSE exception A: AVX exception S: SSE exception S S X S S A A A A X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b (for memory destination encoding only). REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Write to a read-only data segment. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled.

MOVUPD VMOVUPD Move Unaligned Packed Double-Precision Floating-Point

Moves packed double-precision floating-point values. Values can be moved from a register or memory location to a register; or from a register to a register or memory location.
A memory operand that is not aligned does not cause a general-protection exception.

There are legacy and extended forms of the instruction:

MOVUPD
Moves two double-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a 128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVUPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves two double-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a 128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Moves four double-precision floating-point values. There are encodings for each type of move.
• The source operand is either a YMM register or a 256-bit memory location. The destination operand is a YMM register.
• The source operand is a YMM register. The destination operand is either a YMM register or a 256-bit memory location.

Instruction Support
Form     Subset  Feature Flag
MOVUPD   SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMOVUPD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                  Opcode       Description
MOVUPD xmm1, xmm2/mem128  66 0F 10 /r  Moves two packed double-precision floating-point values from xmm2 or mem128 to xmm1.
MOVUPD xmm1/mem128, xmm2  66 0F 11 /r  Moves two packed double-precision floating-point values from xmm2 to xmm1 or mem128.

Mnemonic                   Encoding: VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVUPD xmm1, xmm2/mem128  C4             RXB.00001       X.1111.0.01  10 /r
VMOVUPD xmm1/mem128, xmm2  C4             RXB.00001       X.1111.0.01  11 /r
VMOVUPD ymm1, ymm2/mem256  C4             RXB.00001       X.1111.1.01  10 /r
VMOVUPD ymm1/mem256, ymm2  C4             RXB.00001       X.1111.1.01  11 /r

Related Instructions
(V)MOVAPD, (V)MOVHPD, (V)MOVLPD, (V)MOVMSKPD, (V)MOVSD

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Alignment check, #AC S Page fault, #PF X: AVX and SSE exception A: AVX exception S: SSE exception S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Write to a read-only data segment. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault.

MOVUPS VMOVUPS Move Unaligned Packed Single-Precision Floating-Point

Moves packed single-precision floating-point values. Values can be moved from a register or memory location to a register; or from a register to a register or memory location. A memory operand that is not aligned does not cause a general-protection exception.
There are legacy and extended forms of the instruction:

MOVUPS
Moves four single-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a 128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMOVUPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Moves four single-precision floating-point values. There are encodings for each type of move.
• The source operand is either an XMM register or a 128-bit memory location. The destination operand is an XMM register.
• The source operand is an XMM register. The destination operand is either an XMM register or a 128-bit memory location.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Moves eight single-precision floating-point values. There are encodings for each type of move.
• The source operand is either a YMM register or a 256-bit memory location. The destination operand is a YMM register.
• The source operand is a YMM register. The destination operand is either a YMM register or a 256-bit memory location.

Instruction Support
Form     Subset  Feature Flag
MOVUPS   SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMOVUPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                  Opcode    Description
MOVUPS xmm1, xmm2/mem128  0F 10 /r  Moves four packed single-precision floating-point values from xmm2 or unaligned mem128 to xmm1.
MOVUPS xmm1/mem128, xmm2  0F 11 /r  Moves four packed single-precision floating-point values from xmm2 to xmm1 or unaligned mem128.

Mnemonic                   Encoding: VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VMOVUPS xmm1, xmm2/mem128  C4             RXB.00001       X.1111.0.00  10 /r
VMOVUPS xmm1/mem128, xmm2  C4             RXB.00001       X.1111.0.00  11 /r
VMOVUPS ymm1, ymm2/mem256  C4             RXB.00001       X.1111.1.00  10 /r
VMOVUPS ymm1/mem256, ymm2  C4             RXB.00001       X.1111.1.00  11 /r

Related Instructions
(V)MOVAPS, (V)MOVHLPS, (V)MOVHPS, (V)MOVLHPS, (V)MOVLPS, (V)MOVMSKPS, (V)MOVSS

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Alignment check, #AC S Page fault, #PF X: AVX and SSE exception A: AVX exception S: SSE exception S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Write to a read-only data segment. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault.

MPSADBW VMPSADBW
Multiple Sum of Absolute Differences

Calculates 8 or 16 sums of absolute differences of sequentially selected groups of four contiguous unsigned byte integers in the first source operand and a selected group of four contiguous unsigned byte integers in a second source operand and writes the eight or sixteen 16-bit unsigned integer sums to sequential words of the destination register. The 256-bit form of the instruction additionally performs a similar but independent calculation using the upper 128 bits of the source operands.

Figure 2-2 on page 238 provides a graphical representation of the operation of the instruction. The following description accompanies it.

The computation uses as inputs 11 bytes from the first source operand and 4 bytes in the second source operand. Bit fields in the imm8 operand specify the index of the right-most byte of each group.

Bits [1:0] of the immediate operand determine the index of the right-most byte of four contiguous bytes within the second source operand used in the operation that produces the result (or, in the case of the 256-bit form of the instruction, the lower 128 bits of the result). Bit 2 of the immediate operand determines the right-most index of the 11 contiguous bytes in the first source operand used in the same calculation. In the 128-bit form of the instruction, bits [7:3] of the immediate operand are ignored.

Bits [4:3] of the immediate operand determine the index of the right-most byte of four contiguous bytes within the second source operand used in the operation that produces the upper 128 bits of the result in the 256-bit form of the instruction. Bit 5 of the immediate operand determines the right-most index of the 11 contiguous bytes within the upper half of the first 256-bit source operand used in the same calculation. In the 256-bit form of the instruction, bits [7:6] of the immediate operand are ignored.
Each word of the destination register receives the result of a separate computation of the sum of absolute differences function applied to a specific pair of four-element vectors derived from the source operands. The sum of absolute differences function SumAbsDiff(A, B) takes as input two 4-element unsigned 8-bit integer vectors and produces a single unsigned 16-bit integer result. The function is defined as:

SumAbsDiff(A, B) = |A[0]-B[0]| + |A[1]-B[1]| + |A[2]-B[2]| + |A[3]-B[3]|

The sum of absolute differences function produces a quantitative measure of the difference between two 4-element vectors. Each of the calculations that generates a result uses this metric to assess the difference between the selected 4-byte vector from operand 2 (B in the above equation) and each of eight overlapping 4-byte vectors (A in the equation) selected sequentially from the first source operand.

The right-most word (Word 0) of the destination receives the result of the comparison of the right-most 4 bytes of the selected group of 11 from operand 1 (src1[i1+3:i1], as shown in the figure) to the selected 4 bytes from operand 2 (src2[j1+3:j1], in the figure). Word 1 of the destination receives the result of the comparison of the four bytes starting at an offset of 1 from the right-most byte of the group of 11 (src1[i1+4:i1+1] in the figure) to the selected 4 bytes from operand 2. Word 2 of the destination receives the result of the comparison of the four bytes starting at an offset of 2 from the right-most byte of the group of 11 (src1[i1+5:i1+2], in the figure) to the selected 4 bytes from operand 2. This continues in like manner until the left-most four bytes of the 11 are compared to the 4 bytes from operand 2 with the result being written to Word 7. This completes the generation of the lower 128 bits of the result.
The generation of the upper 128 bits of the result for the 256-bit form of the instruction is performed in like manner using separately selected groups of bytes from the upper half of the 256-bit operands, as described above.

The following is a more formal description of the operation of the (V)MPSADBW instruction:

For both the 128-bit and 256-bit form of the instruction, the following set of operations is performed:

src1 and src2 are byte vectors that overlay the first and second source operand respectively.
dest is a word vector that overlays the destination register.
tmp1[ ] is an array of 4-element vectors derived from the first source operand.
tmp2 and tmp3 are 4-element vectors derived from the second source operand.

i1 = imm8[2] * 4
j1 = imm8[1:0] * 4
tmp1[0] = {src1[i1+3], src1[i1+2], src1[i1+1], src1[i1]}
tmp1[1] = {src1[i1+4], src1[i1+3], src1[i1+2], src1[i1+1]}
tmp1[2] = {src1[i1+5], src1[i1+4], src1[i1+3], src1[i1+2]}
tmp1[3] = {src1[i1+6], src1[i1+5], src1[i1+4], src1[i1+3]}
tmp1[4] = {src1[i1+7], src1[i1+6], src1[i1+5], src1[i1+4]}
tmp1[5] = {src1[i1+8], src1[i1+7], src1[i1+6], src1[i1+5]}
tmp1[6] = {src1[i1+9], src1[i1+8], src1[i1+7], src1[i1+6]}
tmp1[7] = {src1[i1+10], src1[i1+9], src1[i1+8], src1[i1+7]}
tmp2 = {src2[j1+3], src2[j1+2], src2[j1+1], src2[j1]}
dest[0] = SumAbsDiff(tmp1[0], tmp2)
dest[1] = SumAbsDiff(tmp1[1], tmp2)
dest[2] = SumAbsDiff(tmp1[2], tmp2)
dest[3] = SumAbsDiff(tmp1[3], tmp2)
dest[4] = SumAbsDiff(tmp1[4], tmp2)
dest[5] = SumAbsDiff(tmp1[5], tmp2)
dest[6] = SumAbsDiff(tmp1[6], tmp2)
dest[7] = SumAbsDiff(tmp1[7], tmp2)

Additionally, for the 256-bit form of the instruction, the following set of operations is performed:

i2 = imm8[5] * 4 + 16
j2 = imm8[4:3] * 4 + 16
tmp1[8] = {src1[i2+3], src1[i2+2], src1[i2+1], src1[i2]}
tmp1[9] = {src1[i2+4], src1[i2+3], src1[i2+2], src1[i2+1]}
tmp1[10] = {src1[i2+5], src1[i2+4], src1[i2+3], src1[i2+2]}
tmp1[11] = {src1[i2+6], src1[i2+5], src1[i2+4], src1[i2+3]}
tmp1[12] = {src1[i2+7], src1[i2+6], src1[i2+5], src1[i2+4]}
tmp1[13] = {src1[i2+8], src1[i2+7], src1[i2+6], src1[i2+5]}
tmp1[14] = {src1[i2+9], src1[i2+8], src1[i2+7], src1[i2+6]}
tmp1[15] = {src1[i2+10], src1[i2+9], src1[i2+8], src1[i2+7]}
tmp3 = {src2[j2+3], src2[j2+2], src2[j2+1], src2[j2]}
dest[8] = SumAbsDiff(tmp1[8], tmp3)
dest[9] = SumAbsDiff(tmp1[9], tmp3)
dest[10] = SumAbsDiff(tmp1[10], tmp3)
dest[11] = SumAbsDiff(tmp1[11], tmp3)
dest[12] = SumAbsDiff(tmp1[12], tmp3)
dest[13] = SumAbsDiff(tmp1[13], tmp3)
dest[14] = SumAbsDiff(tmp1[14], tmp3)
dest[15] = SumAbsDiff(tmp1[15], tmp3)

[Figure: for the lower half, the eight overlapping 4-byte groups src1[i1+3:i1] through src1[i1+10:i1+7] (tmp1[0] through tmp1[7]) are each combined with src2[j1+3:j1] (tmp2) by a Σ |Δ| unit to produce word 0 through word 7 of the destination XMM register (lower half of the YMM register). For the upper half, src1[i2+3:i2] through src1[i2+10:i2+7] (tmp1[8] through tmp1[15]) are each combined with src2[j2+3:j2] (tmp3) to produce word 8 through word 15 of the destination YMM register.]

Notes:
• i1 is a byte offset into source operand 1 (i1 = imm8[2] * 4).
• j1 is a byte offset into source operand 2 (j1 = imm8[1:0] * 4).
• i2 is a second byte offset into source operand 1 (i2 = imm8[5] * 4 + 16).
• j2 is a second byte offset into source operand 2 (j2 = imm8[4:3] * 4 + 16).
• Σ |Δ| represents the sum of absolute differences function which operates on two 4-element unsigned packed byte values and produces an unsigned 16-bit integer.
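The formal description above can be turned into a small executable reference model, which is convenient for checking hand-computed results or validating an intrinsic-based implementation. The sketch below is an illustration under stated assumptions: operands are Python sequences of byte values with index 0 as the right-most byte, the function names are hypothetical, and the upper-half offsets are expressed relative to the upper 16 bytes rather than with the "+ 16" term.

```python
def sum_abs_diff(a, b):
    """SumAbsDiff over two 4-element byte vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mpsadbw_half(src1, src2, i, j):
    """Eight sums for one 128-bit half; i is 0 or 4, j is 0, 4, 8 or 12."""
    ref = src2[j:j + 4]                        # tmp2 (or tmp3)
    return [sum_abs_diff(src1[i + k:i + k + 4], ref) for k in range(8)]

def mpsadbw(src1, src2, imm8):
    """Model (V)MPSADBW on 16-byte (XMM) or 32-byte (YMM) operands.

    Returns 8 or 16 word results, index 0 being word 0.
    """
    n = len(src1)
    if n not in (16, 32) or len(src2) != n:
        raise ValueError("operands must both be 16 or 32 bytes")
    i1 = ((imm8 >> 2) & 1) * 4                 # imm8[2] * 4
    j1 = (imm8 & 3) * 4                        # imm8[1:0] * 4
    dest = mpsadbw_half(src1[:16], src2[:16], i1, j1)
    if n == 32:
        i2 = ((imm8 >> 5) & 1) * 4             # imm8[5] * 4, relative to byte 16
        j2 = ((imm8 >> 3) & 3) * 4             # imm8[4:3] * 4, relative to byte 16
        dest += mpsadbw_half(src1[16:], src2[16:], i2, j2)
    return dest

ramp = list(range(16))
words = mpsadbw(ramp, ramp, 0b001)             # compare against src2 bytes 4..7
```

With a byte ramp in both operands and j1 = 4, each 4-byte window of the first operand differs from the reference by a constant per byte, so the sums fall to zero at window 4 and rise again on either side.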
Figure 2-2. (V)MPSADBW Instruction

There are legacy and extended forms of the instruction:

MPSADBW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMPSADBW
The extended form of the instruction has 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Bits [127:0] of the destination receive the results of the first 8 sums of absolute differences calculation using the selected bytes of the lower halves of the two source operands. Bits [255:128] of the destination receive the results of the second 8 sums of absolute differences calculation using selected bytes of the upper halves of the two source operands.

Instruction Support
Form              Subset  Feature Flag
MPSADBW           SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VMPSADBW 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VMPSADBW 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                         Opcode             Description
MPSADBW xmm1, xmm2/mem128, imm8  66 0F 3A 42 /r ib  Sums absolute differences of groups of four 8-bit integers in xmm1 and xmm2 or mem128. Writes results to xmm1.
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VMPSADBW xmm1, xmm2, xmm3/mem128, imm8 | C4 | RXB.03 | X.src1.0.01 | 42 /r ib
VMPSADBW ymm1, ymm2, ymm3/mem256, imm8 | C4 | RXB.03 | X.src1.1.01 | 42 /r ib

Related Instructions
(V)PSADBW, (V)PABSB, (V)PABSD, (V)PABSW

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A |  | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | S | S | S | CR0.EM = 1.
Invalid opcode, #UD | S | S | S | CR4.OSFXSR = 0.
Invalid opcode, #UD |  |  | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD |  |  | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD |  |  | A | VEX.L = 1 when AVX2 not supported.
Invalid opcode, #UD |  |  | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | S | S | X | Lock prefix (F0h) preceding opcode.
Device not available, #NM | S | S | X | CR0.TS = 1.
Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical.
General protection, #GP |  |  | X | Null data segment used to reference memory.
Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC |  |  | A | Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF |  | S | X | Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

MULPD VMULPD
Multiply Packed Double-Precision Floating-Point

Multiplies each packed double-precision floating-point value of the first source operand by the corresponding packed double-precision floating-point value of the second source operand and writes the product of each multiplication into the corresponding quadword of the destination.
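As a rough illustration of the element-wise operation described above, the following C sketch multiplies corresponding quadword elements. The name mul_packed_f64 and the array representation of the operands are assumptions made for this example. The sketch relies on the host's default floating-point behavior and does not model MXCSR rounding control, flags, or exceptions.

```c
#include <stddef.h>

/* Scalar model of a packed double-precision multiply (hypothetical helper).
 * count is 2 for a 128-bit operand and 4 for a 256-bit operand.
 * dest may alias src1, mirroring the legacy form in which the first
 * source register is also the destination. */
void mul_packed_f64(double *dest, const double *src1,
                    const double *src2, size_t count)
{
    for (size_t k = 0; k < count; k++)
        dest[k] = src1[k] * src2[k];   /* one product per quadword */
}
```

Each element is independent of the others, so the same loop covers both operand widths.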
There are legacy and extended forms of the instruction:

MULPD
Multiplies two double-precision floating-point values in the first source XMM register by the corresponding double-precision floating-point values in either a second XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMULPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Multiplies two double-precision floating-point values in the first source XMM register by the corresponding double-precision floating-point values in either a second source XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Multiplies four double-precision floating-point values in the first source YMM register by the corresponding double-precision floating-point values in either a second source YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form | Subset | Feature Flag
MULPD | SSE2 | CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMULPD | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | Opcode | Description
MULPD xmm1, xmm2/mem128 | 66 0F 59 /r | Multiplies two packed double-precision floating-point values in xmm1 by corresponding values in xmm2 or mem128. Writes results to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VMULPD xmm1, xmm2, xmm3/mem128 | C4 | RXB.01 | X.src.0.01 | 59 /r
VMULPD ymm1, ymm2, ymm3/mem256 | C4 | RXB.01 | X.src.1.01 | 59 /r
Related Instructions
(V)MULPS, (V)MULSD, (V)MULSS

MXCSR Flags Affected

Flag | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Modified |  |  |  |  |  |  |  |  |  |  | M | M | M |  | M | M
Bit | 17 | 15 | 14, 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A |  | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | S | S | S | CR0.EM = 1.
Invalid opcode, #UD | S | S | S | CR4.OSFXSR = 0.
Invalid opcode, #UD |  |  | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD |  |  | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD |  |  | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | S | S | X | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | S | S | X | CR0.TS = 1.
Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | S | S | S | Non-aligned memory operand while MXCSR.MM = 0.
General protection, #GP |  |  | X | Null data segment used to reference memory.
Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC |  |  | A | Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF |  | S | X | Instruction execution caused a page fault.
SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | S | S | X | A source operand was an SNaN value.
Invalid operation, IE | S | S | X | Undefined operation.
Denormalized operand, DE | S | S | X | A source operand was a denormal value.

X: AVX and SSE exception
A: AVX exception
S: SSE exception
MULPS VMULPS
Multiply Packed Single-Precision Floating-Point

Multiplies each packed single-precision floating-point value of the first source operand by the corresponding packed single-precision floating-point value of the second source operand and writes the product of each multiplication into the corresponding elements of the destination.

There are legacy and extended forms of the instruction:

MULPS
Multiplies four single-precision floating-point values in the first source XMM register by the corresponding single-precision floating-point values of either a second source XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VMULPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Multiplies four single-precision floating-point values in the first source XMM register by the corresponding single-precision floating-point values of either a second source XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Multiplies eight single-precision floating-point values in the first source YMM register by the corresponding single-precision floating-point values of either a second source YMM register or a 256-bit memory location. Writes the results to a third YMM register.

Instruction Support

Form | Subset | Feature Flag
MULPS | SSE1 | CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMULPS | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | Opcode | Description
MULPS xmm1, xmm2/mem128 | 0F 59 /r | Multiplies four packed single-precision floating-point values in xmm1 by corresponding values in xmm2 or mem128. Writes the products to xmm1.
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VMULPS xmm1, xmm2, xmm3/mem128 | C4 | RXB.01 | X.src1.0.00 | 59 /r
VMULPS ymm1, ymm2, ymm3/mem256 | C4 | RXB.01 | X.src1.1.00 | 59 /r

Related Instructions
(V)MULPD, (V)MULSD, (V)MULSS

MXCSR Flags Affected

Flag | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Modified |  |  |  |  |  |  |  |  |  |  | M | M | M |  | M | M
Bit | 17 | 15 | 14, 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A |  | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | S | S | S | CR0.EM = 1.
Invalid opcode, #UD | S | S | S | CR4.OSFXSR = 0.
Invalid opcode, #UD |  |  | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD |  |  | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD |  |  | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | S | S | X | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | S | S | X | CR0.TS = 1.
Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | S | S | S | Non-aligned memory operand while MXCSR.MM = 0.
General protection, #GP |  |  | X | Null data segment used to reference memory.
Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC |  |  | A | Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF |  | S | X | Instruction execution caused a page fault.
SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | S | S | X | A source operand was an SNaN value.
Invalid operation, IE | S | S | X | Undefined operation.
Denormalized operand, DE | S | S | X | A source operand was a denormal value.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

MULSD VMULSD
Multiply Scalar Double-Precision Floating-Point

Multiplies the double-precision floating-point value in the low-order quadword of the first source operand by the double-precision floating-point value in the low-order quadword of the second source operand and writes the product into the low-order quadword of the destination.

There are legacy and extended forms of the instruction:

MULSD
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination register. Bits [127:64] of the destination and bits [255:128] of the corresponding YMM register are not affected.

VMULSD
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the first source operand are copied to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form | Subset | Feature Flag
MULSD | SSE2 | CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VMULSD | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | Opcode | Description
MULSD xmm1, xmm2/mem64 | F2 0F 59 /r | Multiplies the low-order double-precision floating-point value in xmm1 by the corresponding value in xmm2 or mem64. Writes the product to xmm1.
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VMULSD xmm1, xmm2, xmm3/mem64 | C4 | RXB.01 | X.src1.X.11 | 59 /r

Related Instructions
(V)MULPD, (V)MULPS, (V)MULSS

MXCSR Flags Affected

Flag | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Modified |  |  |  |  |  |  |  |  |  |  | M | M | M |  | M | M
Bit | 17 | 15 | 14, 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A |  | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | S | S | S | CR0.EM = 1.
Invalid opcode, #UD | S | S | S | CR4.OSFXSR = 0.
Invalid opcode, #UD |  |  | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD |  |  | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD |  |  | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | S | S | X | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | S | S | X | CR0.TS = 1.
Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical.
General protection, #GP |  |  | X | Null data segment used to reference memory.
Page fault, #PF |  | S | X | Instruction execution caused a page fault.
Alignment check, #AC |  | S | X | Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | S | S | X | A source operand was an SNaN value.
Invalid operation, IE | S | S | X | Undefined operation.
Denormalized operand, DE | S | S | X | A source operand was a denormal value.

X: AVX and SSE exception
A: AVX exception
S: SSE exception
MULSS VMULSS
Multiply Scalar Single-Precision Floating-Point

Multiplies the single-precision floating-point value in the low-order doubleword of the first source operand by the single-precision floating-point value in the low-order doubleword of the second source operand and writes the product into the low-order doubleword of the destination.

There are legacy and extended forms of the instruction:

MULSS
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination. Bits [127:32] of the destination register and bits [255:128] of the corresponding YMM register are not affected.

VMULSS
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the first source register are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form | Subset | Feature Flag
MULSS | SSE1 | CPUID Fn0000_0001_EDX[SSE] (bit 25)
VMULSS | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | Opcode | Description
MULSS xmm1, xmm2/mem32 | F3 0F 59 /r | Multiplies the single-precision floating-point value in the low-order doubleword of xmm1 by the corresponding value in xmm2 or mem32. Writes the product to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VMULSS xmm1, xmm2, xmm3/mem32 | C4 | RXB.01 | X.src1.X.10 | 59 /r

Related Instructions
(V)MULPD, (V)MULPS, (V)MULSD
MXCSR Flags Affected

Flag | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Modified |  |  |  |  |  |  |  |  |  |  | M | M | M |  | M | M
Bit | 17 | 15 | 14, 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0

Note: M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected.

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A |  | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | S | S | S | CR0.EM = 1.
Invalid opcode, #UD | S | S | S | CR4.OSFXSR = 0.
Invalid opcode, #UD |  |  | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD |  |  | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD |  |  | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | S | S | X | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | S | S | X | CR0.TS = 1.
Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical.
General protection, #GP |  |  | X | Null data segment used to reference memory.
Page fault, #PF |  | S | X | Instruction execution caused a page fault.
Alignment check, #AC |  | S | X | Unaligned memory reference when alignment checking enabled.
SIMD floating-point, #XF | S | S | X | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | S | S | X | A source operand was an SNaN value.
Invalid operation, IE | S | S | X | Undefined operation.
Denormalized operand, DE | S | S | X | A source operand was a denormal value.

X: AVX and SSE exception
A: AVX exception
S: SSE exception
ORPD VORPD
OR Packed Double-Precision Floating-Point

Performs bitwise OR of two packed double-precision floating-point values in the first source operand with the corresponding two packed double-precision floating-point values in the second source operand and writes the results into the corresponding elements of the destination.

There are legacy and extended forms of the instruction:

ORPD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VORPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form | Subset | Feature Flag
ORPD | SSE2 | CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VORPD | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | Opcode | Description
ORPD xmm1, xmm2/mem128 | 66 0F 56 /r | Performs bitwise OR of two packed double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.
Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VORPD xmm1, xmm2, xmm3/mem128 | C4 | RXB.01 | X.src1.0.01 | 56 /r
VORPD ymm1, ymm2, ymm3/mem256 | C4 | RXB.01 | X.src1.1.01 | 56 /r

Related Instructions
(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPS, (V)XORPD, (V)XORPS

MXCSR Flags Affected
None

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A |  | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | S | S | S | CR0.EM = 1.
Invalid opcode, #UD | S | S | S | CR4.OSFXSR = 0.
Invalid opcode, #UD |  |  | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD |  |  | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD |  |  | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | S | S | X | Lock prefix (F0h) preceding opcode.
Device not available, #NM | S | S | X | CR0.TS = 1.
Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | S | S | S | Memory operand not 16-byte aligned and MXCSR.MM = 0.
General protection, #GP |  |  | X | Null data segment used to reference memory.
Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC |  |  | A | Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF |  | S | X | Instruction execution caused a page fault.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

ORPS VORPS
OR Packed Single-Precision Floating-Point

Performs bitwise OR of the four packed single-precision floating-point values in the first source operand with the corresponding four packed single-precision floating-point values in the second source operand, and writes the result into the corresponding elements of the destination.

There are legacy and extended forms of the instruction:

ORPS
The first source operand is an XMM register.
The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VORPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form | Subset | Feature Flag
ORPS | SSE1 | CPUID Fn0000_0001_EDX[SSE] (bit 25)
VORPS | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | Opcode | Description
ORPS xmm1, xmm2/mem128 | 0F 56 /r | Performs bitwise OR of four packed single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VORPS xmm1, xmm2, xmm3/mem128 | C4 | RXB.01 | X.src1.0.00 | 56 /r
VORPS ymm1, ymm2, ymm3/mem256 | C4 | RXB.01 | X.src1.1.00 | 56 /r

Related Instructions
(V)ANDNPD, (V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)XORPD, (V)XORPS

MXCSR Flags Affected
None

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A |  | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | S | S | S | CR0.EM = 1.
Invalid opcode, #UD | S | S | S | CR4.OSFXSR = 0.
Invalid opcode, #UD |  |  | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD |  |  | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD |  |  | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | S | S | X | Lock prefix (F0h) preceding opcode.
Device not available, #NM | S | S | X | CR0.TS = 1.
Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | S | S | S | Memory operand not 16-byte aligned and MXCSR.MM = 0.
General protection, #GP |  |  | X | Null data segment used to reference memory.
Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC |  |  | A | Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF |  | S | X | Instruction execution caused a page fault.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

PABSB VPABSB
Packed Absolute Value Signed Byte

Computes the absolute value of 16 or 32 packed 8-bit signed integers in the source operand. Each byte of the destination receives an unsigned 8-bit integer that is the absolute value of the signed 8-bit integer in the corresponding byte of the source operand.

There are legacy and extended forms of the instruction:

PABSB
The source operand is an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPABSB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. All 32 bytes of the destination are written.
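The per-byte operation described above can be sketched as a scalar loop. This is a minimal C model for illustration: the name pabsb_model and the array representation of the operands are assumptions made for this example, and no exception or register-state behavior is modeled.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of packed absolute value on signed bytes (hypothetical
 * helper). count is 16 for a 128-bit operand and 32 for a 256-bit operand. */
void pabsb_model(uint8_t *dest, const int8_t *src, size_t count)
{
    for (size_t k = 0; k < count; k++) {
        int v = src[k];                       /* widen before negating */
        dest[k] = (uint8_t)(v < 0 ? -v : v);  /* result is unsigned */
    }
}
```

Because the result is stored as an unsigned byte, the most negative input (-128, 80h) yields 128 (80h) in this model rather than overflowing.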
Instruction Support

Form | Subset | Feature Flag
PABSB | SSSE3 | CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPABSB 128-bit | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPABSB 256-bit | AVX2 | CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | Opcode | Description
PABSB xmm1, xmm2/mem128 | 0F 38 1C /r | Computes the absolute value of each packed 8-bit signed integer value in xmm2/mem128 and writes the 8-bit unsigned results to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VPABSB xmm1, xmm2/mem128 | C4 | RXB.02 | X.1111.0.01 | 1C /r
VPABSB ymm1, ymm2/mem256 | C4 | RXB.02 | X.1111.1.01 | 1C /r

Related Instructions
(V)PABSW, (V)PABSD

MXCSR Flags Affected
None

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A |  | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | S | S | S | CR0.EM = 1.
Invalid opcode, #UD | S | S | S | CR4.OSFXSR = 0.
Invalid opcode, #UD |  |  | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD |  |  | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD |  |  | A | VEX.vvvv != 1111b.
Invalid opcode, #UD |  |  | A | VEX.L = 1 when AVX2 not supported.
Invalid opcode, #UD |  |  | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | S | S | X | Lock prefix (F0h) preceding opcode.
Device not available, #NM | S | S | X | CR0.TS = 1.
Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical.
General protection, #GP |  |  | X | Null data segment used to reference memory.
Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC |  |  | A | Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF |  | S | X | Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
PABSD VPABSD
Packed Absolute Value Signed Doubleword

Computes the absolute value of four or eight packed 32-bit signed integers in the source operand. Each doubleword of the destination receives an unsigned 32-bit integer that is the absolute value of the signed 32-bit integer in the corresponding doubleword of the source operand.

There are legacy and extended forms of the instruction:

PABSD
The source operand is an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPABSD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. All eight doublewords of the destination are written.

Instruction Support

Form | Subset | Feature Flag
PABSD | SSSE3 | CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPABSD 128-bit | AVX | CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPABSD 256-bit | AVX2 | CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | Opcode | Description
PABSD xmm1, xmm2/mem128 | 0F 38 1E /r | Computes the absolute value of each packed 32-bit signed integer value in xmm2/mem128 and writes the 32-bit unsigned results to xmm1.

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VPABSD xmm1, xmm2/mem128 | C4 | RXB.02 | X.1111.0.01 | 1E /r
VPABSD ymm1, ymm2/mem256 | C4 | RXB.02 | X.1111.1.01 | 1E /r

Related Instructions
(V)PABSB, (V)PABSW
MXCSR Flags Affected
None

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A |  | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | S | S | S | CR0.EM = 1.
Invalid opcode, #UD | S | S | S | CR4.OSFXSR = 0.
Invalid opcode, #UD |  |  | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD |  |  | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD |  |  | A | VEX.vvvv != 1111b.
Invalid opcode, #UD |  |  | A | VEX.L = 1 when AVX2 not supported.
Invalid opcode, #UD |  |  | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | S | S | X | Lock prefix (F0h) preceding opcode.
Device not available, #NM | S | S | X | CR0.TS = 1.
Stack, #SS | S | S | X | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | S | S | X | Memory address exceeding data segment limit or non-canonical.
General protection, #GP |  |  | X | Null data segment used to reference memory.
Alignment check, #AC | S | S | S | Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment check, #AC |  |  | A | Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF |  | S | X | Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PABSW VPABSW
Packed Absolute Value Signed Word

Computes the absolute value of eight or sixteen packed 16-bit signed integers in the source operand. Each word of the destination receives an unsigned 16-bit integer that is the absolute value of the signed 16-bit integer in the corresponding word of the source operand.

There are legacy and extended forms of the instruction:

PABSW
The source operand is an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPABSW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is an XMM register or a 128-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The source operand is a YMM register or a 256-bit memory location. The destination is a YMM register. All 16 words of the destination are written. Instruction Support Form Subset Feature Flag PABSW SSSE3 VPABSW 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPABSW 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) CPUID Fn0000_0001_ECX[SSSE3] (bit 9) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic PABSW xmm1, xmm2/mem128 Opcode Description 0F 38 1D /r Computes the absolute value of each packed 16-bit signed integer value in xmm2/mem128 and writes the 16-bit unsigned results to xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPABSW xmm1, xmm2/mem128 C4 RXB.02 X.1111.0.01 1D /r VPABSW ymm1, ymm2/mem256 C4 RXB.02 X.1111.1.01 1D /r Related Instructions (V)PABSB, (V)PABSD Instruction Reference PABSW, VPABSW 257 AMD64 Technology 26568—Rev. 3.22—May 2018 MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception 258 X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. 
Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PACKSSDW VPACKSSDW
Pack with Signed Saturation Doubleword to Word

Converts four or eight 32-bit signed integers from each of the first and second source operands into eight or sixteen 16-bit signed integers and packs the results into the destination. Positive source values greater than 7FFFh are saturated to 7FFFh; negative source values less than 8000h are saturated to 8000h. Converted values from the first source operand are packed into the low-order words of the destination; converted values from the second source operand are packed into the high-order words of the destination.

There are legacy and extended forms of the instruction:

PACKSSDW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPACKSSDW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
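The signed saturation rule described for PACKSSDW can be illustrated with a small scalar reference model. This is only a sketch in Python, not vendor code: the names `saturate_signed` and `packssdw_128` are hypothetical, elements are passed as Python integers with element 0 as the lowest-order element, and only the 128-bit form is modeled.

```python
def saturate_signed(value, bits):
    """Clamp a Python int to the range of a signed integer of `bits` bits."""
    lo = -(1 << (bits - 1))      # -32768 (8000h) when bits == 16
    hi = (1 << (bits - 1)) - 1   #  32767 (7FFFh) when bits == 16
    return max(lo, min(hi, value))


def packssdw_128(src1, src2):
    """Hypothetical model of the 128-bit pack: four signed doublewords from
    each source become eight signed words; src1 fills the low-order words."""
    if len(src1) != 4 or len(src2) != 4:
        raise ValueError("each source must hold four 32-bit elements")
    return [saturate_signed(v, 16) for v in list(src1) + list(src2)]


# 70000 and 32768 exceed 7FFFh; -70000 and -32769 fall below 8000h.
print(packssdw_128([1, -1, 70000, -70000], [0, 32767, 32768, -32769]))
```

In-range values pass through unchanged, so only the four out-of-range inputs in the call above are clamped.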
Instruction Support

Form                 Subset   Feature Flag
PACKSSDW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPACKSSDW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKSSDW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode        Description
PACKSSDW xmm1, xmm2/mem128    66 0F 6B /r   Converts 32-bit signed integers in xmm1 and xmm2 or mem128 into 16-bit signed integers with saturation. Writes packed results to xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPACKSSDW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   6B /r
VPACKSSDW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   6B /r

Related Instructions
(V)PACKSSWB, (V)PACKUSDW, (V)PACKUSWB

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S
X: SSE, AVX, and AVX2 exception. A: AVX, AVX2 exception. S: SSE exception.
X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

PACKSSWB VPACKSSWB
Pack with Signed Saturation Word to Byte

Converts eight or sixteen 16-bit signed integers from the first source operand and the second source operand into sixteen or thirty-two 8-bit signed integers and packs the results into the destination. Positive source values greater than 7Fh are saturated to 7Fh; negative source values less than 80h are saturated to 80h. Converted values from the first source operand are packed into the low-order bytes of the destination; converted values from the second source operand are packed into the high-order bytes of the destination.

There are legacy and extended forms of the instruction:

PACKSSWB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPACKSSWB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                 Subset   Feature Flag
PACKSSWB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPACKSSWB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKSSWB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
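The feature flags listed for PACKSSWB and VPACKSSWB can be tested with plain bit masks once the CPUID register values are in hand. The sketch below is illustrative only: Python cannot execute CPUID itself, so the register values are passed in as arguments, and the function name `packsswb_forms` and its parameters are hypothetical. The OSXSAVE bit position (ECX bit 27) is not given in this section and is an assumption taken from the general CPUID definition. The XFEATURE_ENABLED_MASK[2:1] = 11b condition mirrors the #UD causes listed for these instructions.

```python
# Bit positions as listed in the Instruction Support table.
SSE2_BIT = 26     # CPUID Fn0000_0001_EDX[SSE2]
AVX_BIT = 28      # CPUID Fn0000_0001_ECX[AVX]
AVX2_BIT = 5      # CPUID Fn0000_0007_EBX[AVX2]_x0
# Assumption (not stated in this section): OSXSAVE is reported in ECX bit 27.
OSXSAVE_BIT = 27  # CPUID Fn0000_0001_ECX[OSXSAVE]


def bit(value, position):
    """Return True when the given bit of value is set."""
    return ((value >> position) & 1) == 1


def packsswb_forms(fn1_ecx, fn1_edx, fn7_ebx_x0, xfeature_enabled_mask):
    """Report which forms look usable from raw CPUID/XCR0 values (hypothetical helper)."""
    # The extended forms need OS support for XMM and YMM state: mask bits [2:1] == 11b.
    os_avx = bit(fn1_ecx, OSXSAVE_BIT) and ((xfeature_enabled_mask >> 1) & 0b11) == 0b11
    avx = bit(fn1_ecx, AVX_BIT) and os_avx
    return {
        "PACKSSWB": bit(fn1_edx, SSE2_BIT),
        "VPACKSSWB 128-bit": avx,
        "VPACKSSWB 256-bit": avx and bit(fn7_ebx_x0, AVX2_BIT),
    }


# Example register values: SSE2, AVX and OSXSAVE reported, AVX2 not reported.
print(packsswb_forms((1 << 28) | (1 << 27), 1 << 26, 0, 0b111))
```

The model checks only what CPUID and the feature mask report; it does not cover CR0 or CR4 state such as CR4.OSFXSR.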
Instruction Encoding

Mnemonic                      Opcode        Description
PACKSSWB xmm1, xmm2/mem128    66 0F 63 /r   Converts 16-bit signed integers in xmm1 and xmm2 or mem128 into 8-bit signed integers with saturation. Writes packed results to xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPACKSSWB xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   63 /r
VPACKSSWB ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   63 /r

Related Instructions
(V)PACKSSDW, (V)PACKUSDW, (V)PACKUSWB

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S
X: SSE, AVX, and AVX2 exception. A: AVX, AVX2 exception. S: SSE exception.
X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.
PACKUSDW VPACKUSDW
Pack with Unsigned Saturation Doubleword to Word

Converts four or eight 32-bit signed integers from the first source operand and the second source operand into eight or sixteen 16-bit unsigned integers and packs the results into the destination. Source values greater than FFFFh are saturated to FFFFh; source values less than 0000h are saturated to 0000h. Packs converted values from the first source operand into the low-order words of the destination; packs converted values from the second source operand into the high-order words of the destination.

There are legacy and extended forms of the instruction:

PACKUSDW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPACKUSDW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                 Subset   Feature Flag
PACKUSDW             SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPACKUSDW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKUSDW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode           Description
PACKUSDW xmm1, xmm2/mem128    66 0F 38 2B /r   Converts 32-bit signed integers in xmm1 and xmm2 or mem128 into 16-bit unsigned integers with saturation.
Writes packed results to xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPACKUSDW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   2B /r
VPACKUSDW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   2B /r

Related Instructions
(V)PACKSSDW, (V)PACKSSWB, (V)PACKUSWB

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S
X: SSE, AVX, and AVX2 exception. A: AVX, AVX2 exception. S: SSE exception.
X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PACKUSWB VPACKUSWB
Pack with Unsigned Saturation Word to Byte

Converts eight or sixteen 16-bit signed integers from the first source operand and the second source operand into sixteen or thirty-two 8-bit unsigned integers and packs the results into the destination. When a source value is greater than FFh, it is saturated to FFh; when a source value is less than 00h, it is saturated to 00h.
Packs converted values from the first source operand into the low-order bytes of the destination; packs converted values from the second source operand into the high-order bytes of the destination.

There are legacy and extended forms of the instruction:

PACKUSWB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPACKUSWB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                 Subset   Feature Flag
PACKUSWB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPACKUSWB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPACKUSWB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode        Description
PACKUSWB xmm1, xmm2/mem128    66 0F 67 /r   Converts 16-bit signed integers in xmm1 and xmm2 or mem128 into 8-bit unsigned integers with saturation. Writes packed results to xmm1.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPACKUSWB xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   67 /r
VPACKUSWB ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   67 /r
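The VEX encoding columns (C4, RXB.map_select, W.vvvv.L.pp) can be read as a recipe for three prefix bytes. The sketch below assembles them in Python for illustration only. The helper name `vex3_prefix` and its parameters are hypothetical. It relies on the general VEX definition, which is not restated in this section: the R, X, B and vvvv fields are stored in inverted (ones' complement) form. The W bit, shown as X (don't care) in the table, is assumed to be 0 here.

```python
def vex3_prefix(map_select, vvvv_reg=None, vex_l=0, pp=0b01, w=0, r=0, x=0, b=0):
    """Build the three VEX prefix bytes: C4, RXB.map_select, W.vvvv.L.pp.

    r, x, b are the logical register-extension bits (0 for registers 0-7).
    vvvv_reg is the first source register number, or None when vvvv is unused.
    """
    # R, X and B are stored inverted in bits 7:5; map_select occupies bits 4:0.
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (map_select & 0x1F)
    # vvvv is stored inverted; an unused field is encoded as 1111b.
    vvvv = 0b1111 if vvvv_reg is None else (~vvvv_reg & 0xF)
    byte2 = ((w & 1) << 7) | (vvvv << 3) | ((vex_l & 1) << 2) | (pp & 0b11)
    return bytes([0xC4, byte1, byte2])


# VPACKUSWB xmm1, xmm2, xmm3: prefix, opcode 67h, then ModRM (mod=11b, reg=1, r/m=3).
modrm = 0xC0 | (1 << 3) | 3
encoding = vex3_prefix(0b00001, vvvv_reg=2, vex_l=0) + bytes([0x67, modrm])
print(encoding.hex())
```

Setting `vex_l=1` selects the 256-bit (YMM) encoding. An assembler may choose the shorter two-byte VEX form for the same instruction when the fields allow it.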
Related Instructions
(V)PACKSSDW, (V)PACKSSWB, (V)PACKUSDW

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S
X: SSE, AVX, and AVX2 exception. A: AVX, AVX2 exception. S: SSE exception.
X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PADDB VPADDB
Packed Add Bytes

Adds 16 or 32 packed 8-bit integer values in the first source operand to corresponding values in the second source operand and writes the integer sums to the corresponding bytes of the destination. This instruction operates on both signed and unsigned integers. When a result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each result are written to the destination.

There are legacy and extended forms of the instruction:

PADDB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location.
The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPADDB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset   Feature Flag
PADDB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                   Opcode        Description
PADDB xmm1, xmm2/mem128    66 0F FC /r   Adds packed byte integer values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPADDB xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   FC /r
VPADDB ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   FC /r

Related Instructions
(V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S
X: SSE, AVX, and AVX2 exception. A: AVX, AVX2 exception. S: SSE exception.
X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1.
CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PADDD VPADDD
Packed Add Doublewords

Adds 4 or 8 packed 32-bit integer values in the first source operand to corresponding values in the second source operand and writes the integer sums to the corresponding doublewords of the destination. This instruction operates on both signed and unsigned integers. When a result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each result are written to the destination.

There are legacy and extended forms of the instruction:

PADDD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPADDD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register.
The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset   Feature Flag
PADDD             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                   Opcode        Description
PADDD xmm1, xmm2/mem128    66 0F FE /r   Adds packed doubleword integer values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPADDD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   FE /r
VPADDD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   FE /r

Related Instructions
(V)PADDB, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S
X: SSE, AVX, and AVX2 exception. A: AVX, AVX2 exception. S: SSE exception.
X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PADDQ VPADDQ
Packed Add Quadwords

Adds 2 or 4 packed 64-bit integer values in the first source operand to corresponding values in the second source operand and writes the integer sums to the corresponding quadwords of the destination. This instruction operates on both signed and unsigned integers. When a result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each result are written to the destination.

There are legacy and extended forms of the instruction:

PADDQ
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPADDQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset   Feature Flag
PADDQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
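The wraparound behavior described for PADDQ (carry ignored, only the low-order 64 bits kept) amounts to addition modulo 2^64 in each element. The following Python sketch models that arithmetic. The names `padd` and `to_signed` are hypothetical, and elements are given as unsigned bit patterns with element 0 lowest.

```python
def padd(src1, src2, element_bits):
    """Element-wise add that keeps only the low-order element_bits of each sum."""
    mask = (1 << element_bits) - 1
    return [(a + b) & mask for a, b in zip(src1, src2)]


def to_signed(value, element_bits):
    """Reinterpret an unsigned bit pattern as a two's-complement signed value."""
    sign_bit = 1 << (element_bits - 1)
    return value - (1 << element_bits) if value & sign_bit else value


# Two quadword lanes: the first sum carries out of bit 63 and wraps to 0.
sums = padd([0xFFFFFFFFFFFFFFFF, 0x7FFFFFFFFFFFFFFF], [1, 1], 64)
print([hex(v) for v in sums])
print([to_signed(v, 64) for v in sums])
```

The same bit patterns serve both interpretations, which is why one instruction covers signed and unsigned operands. The second lane shows the largest positive signed value wrapping to the most negative one.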
Instruction Encoding

Mnemonic                   Opcode        Description
PADDQ xmm1, xmm2/mem128    66 0F D4 /r   Adds packed quadword integer values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                          Encoding
                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPADDQ xmm1, xmm2, xmm3/mem128    C4    RXB.00001        X.src1.0.01   D4 /r
VPADDQ ymm1, ymm2, ymm3/mem256    C4    RXB.00001        X.src1.1.01   D4 /r

Related Instructions
(V)PADDB, (V)PADDD, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S
X: SSE, AVX, and AVX2 exception. A: AVX, AVX2 exception. S: SSE exception.
X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.
PADDSB VPADDSB
Packed Add with Signed Saturation Bytes

Adds 16 or 32 packed 8-bit signed integer values in the first source operand to the corresponding values in the second source operand and writes the signed integer sums to corresponding bytes of the destination. Positive sums greater than 7Fh are saturated to 7Fh; negative sums less than 80h are saturated to 80h.

There are legacy and extended forms of the instruction:

PADDSB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPADDSB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset   Feature Flag
PADDSB             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDSB 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDSB 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode        Description
PADDSB xmm1, xmm2/mem128    66 0F EC /r   Adds packed signed 8-bit integer values in xmm1 and xmm2 or mem128 with signed saturation. Writes the sums to xmm1.
Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPADDSB xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   EC /r
VPADDSB ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   EC /r

Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSW, (V)PADDUSB, (V)PADDUSW, (V)PADDW

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S
X: SSE, AVX, and AVX2 exception. A: AVX, AVX2 exception. S: SSE exception.
X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PADDSW VPADDSW
Packed Add with Signed Saturation Words

Adds 8 or 16 packed 16-bit signed integer values in the first source operand to the corresponding values in the second source operand and writes the signed integer sums to the corresponding words of the destination. Positive sums greater than 7FFFh are saturated to 7FFFh; negative sums less than 8000h are saturated to 8000h.
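As a rough illustration of the signed saturation described for PADDSW, the sketch below adds signed 16-bit elements and clamps each sum to the range 8000h to 7FFFh (-32768 to 32767). It is a scalar model in Python under stated assumptions, not a description of the hardware: the name `paddsw` is hypothetical and elements are passed as signed Python integers.

```python
WORD_MIN = -0x8000   # 8000h interpreted as a signed word
WORD_MAX = 0x7FFF    # 7FFFh


def paddsw(src1, src2):
    """Add 8 (XMM) or 16 (YMM) signed words element-wise with signed saturation."""
    if len(src1) != len(src2) or len(src1) not in (8, 16):
        raise ValueError("operands must both hold 8 or 16 word elements")
    return [max(WORD_MIN, min(WORD_MAX, a + b)) for a, b in zip(src1, src2)]


# First two lanes overflow in opposite directions and are clamped.
print(paddsw([32767, -32768, 100, -5, 0, 0, 0, 0],
             [1, -1, 23, 5, 0, 0, 0, 0]))
```

Unlike the wraparound adds (PADDW and related), an overflowing lane here stops at the nearest representable value instead of changing sign.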
There are legacy and extended forms of the instruction:

PADDSW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPADDSW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset   Feature Flag
PADDSW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode        Description
PADDSW xmm1, xmm2/mem128    66 0F ED /r   Adds packed signed 16-bit integer values in xmm1 and xmm2 or mem128 with signed saturation. Writes the sums to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPADDSW xmm1, xmm2, xmm3/mem128    C4    RXB.00001        X.src1.0.01   ED /r
VPADDSW ymm1, ymm2, ymm3/mem256    C4    RXB.00001        X.src1.1.01   ED /r
Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDUSB, (V)PADDUSW, (V)PADDW

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S
X: SSE, AVX, and AVX2 exception. A: AVX, AVX2 exception. S: SSE exception.
X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PADDUSB VPADDUSB
Packed Add with Unsigned Saturation Bytes

Adds 16 or 32 packed 8-bit unsigned integer values in the first source operand to the corresponding values in the second source operand and writes the unsigned integer sums to the corresponding bytes of the destination. Sums greater than FFh are saturated to FFh.

There are legacy and extended forms of the instruction:

PADDUSB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPADDUSB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset  Feature Flag
PADDUSB            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDUSB 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDUSB 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode       Description
PADDUSB xmm1, xmm2/mem128   66 0F DC /r  Adds packed unsigned 8-bit integer values in xmm1 and xmm2 or mem128 with unsigned saturation. Writes the sums to xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDUSB xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  DC /r
VPADDUSB ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  DC /r

Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSW, (V)PADDW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PADDUSW
VPADDUSW
Packed Add with Unsigned Saturation Words

Adds 8 or 16 packed 16-bit unsigned integer values in the first source operand to the corresponding values in the second source operand and writes the unsigned integer sums to the corresponding words of the destination. Sums greater than FFFFh are saturated to FFFFh.

There are legacy and extended forms of the instruction:

PADDUSW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPADDUSW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
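
The saturating word addition described for (V)PADDUSW can be illustrated with a scalar model in portable C. This is only a sketch of the per-word arithmetic, not the manual's definition and not a use of the instruction itself; the names paddusw_lane and paddusw_model are hypothetical and chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one 16-bit lane of (V)PADDUSW.
 * The sum is formed in 32 bits so that a carry out of bit 15 stays visible,
 * then clamped to FFFFh when it exceeds the unsigned 16-bit range. */
static uint16_t paddusw_lane(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (sum > 0xFFFFu) ? (uint16_t)0xFFFFu : (uint16_t)sum;
}

/* Hypothetical helper: applies the lane operation to n words.
 * n = 8 models XMM operands, n = 16 models YMM operands. */
static void paddusw_model(uint16_t *dst, const uint16_t *src1,
                          const uint16_t *src2, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = paddusw_lane(src1[i], src2[i]);
}
```

Passing the same array as dst and src1 models the legacy form, in which the first source register is also the destination. The model says nothing about bits [255:128] of the destination YMM register, which the text above treats separately for each form.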
Instruction Support

Form               Subset  Feature Flag
PADDUSW            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDUSW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDUSW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode       Description
PADDUSW xmm1, xmm2/mem128   66 0F DD /r  Adds packed unsigned 16-bit integer values in xmm1 and xmm2 or mem128 with unsigned saturation. Writes the sums to xmm1.

Mnemonic                           Encoding
                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDUSW xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  DD /r
VPADDUSW ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  DD /r

Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PADDW
VPADDW
Packed Add Words

Adds 8 or 16 packed 16-bit integer values in the first source operand to the corresponding values in the second source operand and writes the integer sums to the corresponding words of the destination.

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of each result are written to the destination.

There are legacy and extended forms of the instruction:

PADDW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPADDW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form             Subset  Feature Flag
PADDW            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPADDW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPADDW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
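
The wraparound behavior described for (V)PADDW, where only the low-order 16 bits of each sum are kept, can be sketched as a scalar model in C. The sketch also illustrates why one instruction serves both signed and unsigned data: the stored bit pattern is the same under either interpretation. The names paddw_lane and as_signed are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Hypothetical helper: one 16-bit lane of (V)PADDW.
 * The sum is truncated to its low-order 16 bits; the carry is discarded
 * and no flag is produced. */
static uint16_t paddw_lane(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a + (uint32_t)b);
}

/* Hypothetical helper: reads a 16-bit pattern as a two's-complement value,
 * to show the signed interpretation of the same result bits. */
static int16_t as_signed(uint16_t v)
{
    return (v < 0x8000u) ? (int16_t)v : (int16_t)((int32_t)v - 0x10000);
}
```

For example, adding FFFFh and 0001h yields 0000h, which is the truncated unsigned sum and also the signed result of -1 + 1. Adding 7FFFh and 0001h yields 8000h, which is correct as an unsigned sum but wraps to -32768 when read as signed.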
Instruction Encoding

Mnemonic                  Opcode       Description
PADDW xmm1, xmm2/mem128   66 0F FD /r  Adds packed 16-bit integer values in xmm1 and xmm2 or mem128. Writes the sums to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPADDW xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  FD /r
VPADDW ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  FD /r

Related Instructions
(V)PADDB, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW, (V)PADDUSB, (V)PADDUSW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PALIGNR
VPALIGNR
Packed Align Right

Concatenates one or two pairs of 16-byte values from the first and second source operands and right-shifts the concatenated values the number of bytes specified by the unsigned immediate operand. Writes the least-significant 16 bytes of the shifted result to the destination or writes the least-significant 16 bytes of the two shifted results to the upper and lower halves of the destination.

For the 128-bit form of the instruction, the first and second 128-bit source operands are concatenated to form a temporary 256-bit value with the first source operand occupying the most-significant half of the temporary value. After the right-shift operation, the lower 128 bits of the result are written to the destination.

For the 256-bit form of the instruction, the lower 16 bytes of the first and second source operands are concatenated to form a first temporary 256-bit value with the bytes from the first source operand occupying the most-significant half of the temporary value. The upper 16 bytes of the first and second source operands are concatenated to form a second temporary 256-bit value with the bytes from the first source operand occupying the most-significant half of the second temporary value. Both temporary values are right-shifted the number of bytes specified by the immediate operand. After the right-shift operation, the lower 16 bytes of the first temporary value are written to the lower 128 bits of the destination and the lower 16 bytes of the second temporary value are written to the upper 128 bits of the destination.

The binary value of the immediate operand determines the byte shift value. On each shift the most-significant byte is set to zero. When the byte shift value is greater than 31, the destination is zeroed.

There are two forms of the instruction.

PALIGNR
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location.
The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPALIGNR
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset  Feature Flag
PALIGNR            SSSE3   CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPALIGNR 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPALIGNR 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                          Opcode             Description
PALIGNR xmm1, xmm2/mem128, imm8   66 0F 3A 0F /r ib  Right-shifts xmm1:xmm2/mem128 imm8 bytes. Writes shifted result to xmm1.

Mnemonic                                 Encoding
                                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPALIGNR xmm1, xmm2, xmm3/mem128, imm8   C4   RXB.03          X.src1.0.01  0F /r ib
VPALIGNR ymm1, ymm2, ymm3/mem256, imm8   C4   RXB.03          X.src1.1.01  0F /r ib

Related Instructions
None

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PAND
VPAND
Packed AND

Performs a bitwise AND of the packed values in the first and second source operands and writes the result to the destination.

There are legacy and extended forms of the instruction:

PAND
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPAND
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
Instruction Support

Form            Subset  Feature Flag
PAND            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPAND 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPAND 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                 Opcode       Description
PAND xmm1, xmm2/mem128   66 0F DB /r  Performs bitwise AND of values in xmm1 and xmm2 or mem128. Writes the result to xmm1.

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPAND xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  DB /r
VPAND ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  DB /r

Related Instructions
(V)PANDN, (V)POR, (V)PXOR

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception
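
The feature flags in the Instruction Support table and the #UD causes in the Exceptions table suggest a software check to run before selecting a form of the instruction. The following C sketch works on register values the caller has already read with CPUID and XGETBV; how they are read is outside the sketch. All function and macro names are hypothetical. The OSXSAVE bit position (bit 27 of ECX) is not given in this section and is an assumption taken from the CPUID definition. The conditions CR0.EM = 0 and CR4.OSFXSR = 1 are operating-system state that this check does not cover.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the Instruction Support table. */
#define CPUID_FN1_EDX_SSE2    (UINT32_C(1) << 26)
#define CPUID_FN1_ECX_AVX     (UINT32_C(1) << 28)
#define CPUID_FN7_EBX_AVX2    (UINT32_C(1) << 5)
/* Assumption: OSXSAVE bit number is not stated in this section. */
#define CPUID_FN1_ECX_OSXSAVE (UINT32_C(1) << 27)

/* Legacy PAND: requires the SSE2 feature flag. */
static bool legacy_pand_supported(uint32_t fn1_edx)
{
    return (fn1_edx & CPUID_FN1_EDX_SSE2) != 0;
}

/* OS support for extended forms: OSXSAVE set and
 * XFEATURE_ENABLED_MASK[2:1] equal to 11b. */
static bool os_enabled_ymm_state(uint32_t fn1_ecx, uint64_t xfeature_mask)
{
    return (fn1_ecx & CPUID_FN1_ECX_OSXSAVE) != 0 &&
           ((xfeature_mask >> 1) & 0x3u) == 0x3u;
}

/* VPAND, 128-bit encoding: AVX flag plus OS support. */
static bool vpand128_supported(uint32_t fn1_ecx, uint64_t xfeature_mask)
{
    return (fn1_ecx & CPUID_FN1_ECX_AVX) != 0 &&
           os_enabled_ymm_state(fn1_ecx, xfeature_mask);
}

/* VPAND, 256-bit encoding: AVX2 flag plus OS support. */
static bool vpand256_supported(uint32_t fn1_ecx, uint32_t fn7_ebx,
                               uint64_t xfeature_mask)
{
    return (fn7_ebx & CPUID_FN7_EBX_AVX2) != 0 &&
           os_enabled_ymm_state(fn1_ecx, xfeature_mask);
}
```

Taking raw register values as parameters keeps the sketch independent of any compiler-specific CPUID helper. The same pattern applies to the other instructions in this chapter, with the legacy feature flag replaced as listed in each Instruction Support table.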
PANDN
VPANDN
Packed AND NOT

Generates the ones’ complement of the value in the first source operand and performs a bitwise AND of the complement and the value in the second source operand. Writes the result to the destination.

There are legacy and extended forms of the instruction:

PANDN
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPANDN
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form             Subset  Feature Flag
PANDN            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPANDN 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPANDN 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode       Description
PANDN xmm1, xmm2/mem128   66 0F DF /r  Generates ones’ complement of xmm1, then performs bitwise AND with value in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPANDN xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src.0.01   DF /r
VPANDN ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src.1.01   DF /r

Related Instructions
(V)PAND, (V)POR, (V)PXOR

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PAVGB
VPAVGB
Packed Average Unsigned Bytes

Computes the rounded averages of 16 or 32 packed unsigned 8-bit integer values in the first source operand and the corresponding values of the second source operand. Writes each average to the corresponding byte of the destination.

An average is computed by adding pairs of 8-bit integer values in corresponding positions in the two operands, adding 1 to a 9-bit temporary sum, and right-shifting the temporary sum by one bit position.

There are legacy and extended forms of the instruction:

PAVGB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPAVGB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form             Subset  Feature Flag
PAVGB            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPAVGB 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPAVGB 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode       Description
PAVGB xmm1, xmm2/mem128   66 0F E0 /r  Averages pairs of packed 8-bit unsigned integer values in xmm1 and xmm2 or mem128. Writes the averages to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPAVGB xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  E0 /r
VPAVGB ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  E0 /r

Related Instructions
(V)PAVGW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PAVGW
VPAVGW
Packed Average Unsigned Words

Computes the rounded average of packed unsigned 16-bit integer values in the first source operand and the corresponding values of the second source operand. Writes each average to the corresponding word of the destination.

An average is computed by adding pairs of 16-bit integer values in corresponding positions in the two operands, adding 1 to a 17-bit temporary sum, and right-shifting the temporary sum by one bit position.

There are legacy and extended forms of the instruction:

PAVGW
The first source operand is an XMM register and the second source operand is an XMM register or 128-bit memory location. The destination is the same XMM register as the first source operand; the upper 128 bits of the corresponding YMM register are not affected.

VPAVGW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register.
The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form             Subset  Feature Flag
PAVGW            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPAVGW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPAVGW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode       Description
PAVGW xmm1, xmm2/mem128   66 0F E3 /r  Averages pairs of packed 16-bit unsigned integer values in xmm1 and xmm2 or mem128. Writes the averages to xmm1.

Mnemonic                         Encoding
                                 VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPAVGW xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  E3 /r
VPAVGW ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  E3 /r

Related Instructions
(V)PAVGB

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PBLENDVB
VPBLENDVB
Variable Blend Packed Bytes

Copies packed bytes from either of two sources to a destination, as specified by a mask operand.

The mask is defined by the most significant bit of each byte of the mask operand. The position of a mask bit corresponds to the position of the most significant bit of a copied value.
• When a mask bit = 0, the specified element of the first source is copied to the corresponding position in the destination.
• When a mask bit = 1, the specified element of the second source is copied to the corresponding position in the destination.

There are legacy and extended forms of the instruction:

PBLENDVB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. The mask operand is the implicit register XMM0.

VPBLENDVB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. The mask operand is a fourth XMM register selected by bits [7:4] of an immediate byte.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. The mask operand is a fourth YMM register selected by bits [7:4] of an immediate byte.
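
The byte selection rule described for (V)PBLENDVB can be expressed as a short scalar model in C. This is a sketch of the selection logic only: the function name pblendvb_model is hypothetical, and the mask is passed as an ordinary array rather than as XMM0 or a register selected by the immediate byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of (V)PBLENDVB over n bytes
 * (n = 16 models XMM operands, n = 32 models YMM operands).
 * Only the most significant bit of each mask byte is examined:
 * 0 selects the byte from src1, 1 selects the byte from src2. */
static void pblendvb_model(uint8_t *dst, const uint8_t *src1,
                           const uint8_t *src2, const uint8_t *mask,
                           size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (mask[i] & 0x80u) ? src2[i] : src1[i];
}
```

Because each output byte depends only on the inputs at the same index, dst may be the same array as src1, which models the legacy form where the first source register is also the destination. The lower seven bits of each mask byte have no effect on the result.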
Instruction Support

Form                Subset  Feature Flag
PBLENDVB            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPBLENDVB 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPBLENDVB 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode          Description
PBLENDVB xmm1, xmm2/mem128   66 0F 38 10 /r  Selects byte values from xmm1 or xmm2/mem128, depending on the value of corresponding mask bits in XMM0. Writes the selected values to xmm1.

Mnemonic                                  Encoding
                                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBLENDVB xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  4C /r is4
VPBLENDVB ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  4C /r is4

Related Instructions
(V)BLENDVPD, (V)BLENDVPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.W = 1.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PBLENDW
VPBLENDW
Blend Packed Words

Copies packed words from either of two sources to a destination, as specified by an immediate 8-bit mask operand.

For the 256-bit form, the same 8-bit mask is applied twice; once to select words to be written to the lower 128 bits of the destination and again to select words to be written to the upper 128 bits of the destination.

Each bit of the mask selects a word from one of the source operands based on the position of the word within the operand. Bit 0 of the mask selects the least-significant word (word 0) to be copied, bit 1 selects the next-most significant word (word 1), and so forth. Bit 7 selects word 7 (the most-significant word for 128-bit operands). For the 256-bit operands, the mask is reused to select words in the upper 128 bits of the source operands to be copied. Bit 0 of the mask selects word 8, bit 1 selects word 9, and so forth. Finally, bit 7 of the mask selects the word from position 15.
• When a mask bit = 0, the specified element of the first source is copied to the corresponding position in the destination.
• When a mask bit = 1, the specified element of the second source is copied to the corresponding position in the destination.

There are legacy and extended forms of the instruction:

PBLENDW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPBLENDW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register.
The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset   Feature Flag
PBLENDW            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPBLENDW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPBLENDW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                          Opcode              Description
PBLENDW xmm1, xmm2/mem128, imm8   66 0F 3A 0E /r ib   Selects word values from xmm1 or xmm2/mem128, as specified by imm8. Writes the selected values to xmm1.

Mnemonic                                 Encoding
                                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPBLENDW xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.03           X.src1.0.01   0E /r ib
VPBLENDW ymm1, ymm2, ymm3/mem256, imm8   C4    RXB.03           X.src1.1.01   0E /r ib

Related Instructions
(V)BLENDPD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PCLMULQDQ VPCLMULQDQ
Carry-less Multiply Quadwords

Performs a carry-less multiplication of a selected quadword element of the first source operand by a selected quadword element of the second source operand and writes the product to the destination.

Carry-less multiplication, also known as binary polynomial multiplication, is the mathematical operation of computing the product of two operands without generating or propagating carries. It is an essential component of cryptographic processing, and typically requires a large number of cycles when performed in software. The instruction provides an efficient means of performing the operation and is particularly useful in implementing the Galois counter mode used in the Advanced Encryption Standard (AES). See Appendix A for additional information.

Bits 4 and 0 of an 8-bit immediate byte operand specify which quadword of each source operand to multiply, as follows.

Mnemonic          Imm[0]  Imm[4]  Quadword Operands Selected
(V)PCLMULLQLQDQ   0       0       SRC1[63:0], SRC2[63:0]
(V)PCLMULHQLQDQ   1       0       SRC1[127:64], SRC2[63:0]
(V)PCLMULLQHQDQ   0       1       SRC1[63:0], SRC2[127:64]
(V)PCLMULHQHQDQ   1       1       SRC1[127:64], SRC2[127:64]

Alias mnemonics are provided for the various immediate byte combinations.

There are legacy and extended forms of the instruction:

PCLMULQDQ
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPCLMULQDQ
The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form         Subset             Feature Flag
PCLMULQDQ    PCLMULQDQ          CPUID Fn0000_0001_ECX[PCLMULQDQ] (bit 1)
VPCLMULQDQ   AVX or PCLMULQDQ   CPUID Fn0000_0001_ECX[PCLMULQDQ] (bit 1) or CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                            Opcode              Description
PCLMULQDQ xmm1, xmm2/mem128, imm8   66 0F 3A 44 /r ib   Performs carry-less multiplication of a selected quadword element of xmm1 by a selected quadword element of xmm2 or mem128. Elements are selected by bits 4 and 0 of imm8. Writes the product to xmm1.

Mnemonic                                   Encoding
                                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPCLMULQDQ xmm1, xmm2, xmm3/mem128, imm8   C4    RXB.00011        X.src.0.01    44 /r ib

Related Instructions
(V)PMULDQ, (V)PMULUDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: AVX and SSE exception
A: AVX exception
S: SSE exception

PCMPEQB VPCMPEQB
Packed Compare Equal Bytes

Compares packed byte values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding byte of the destination. When values are equal, the result is FFh; when values are not equal, the result is 00h.

There are legacy and extended forms of the instruction:

PCMPEQB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPCMPEQB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
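The byte-wise comparison described above for the 128-bit form can be sketched as a scalar reference model in C. This is only an illustration of the documented result values (FFh for equal, 00h for not equal), not an implementation of the instruction; the function name pcmpeqb_model and its array-based signature are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 128-bit byte compare.
 * Each destination byte becomes FFh when the corresponding source bytes
 * are equal and 00h when they differ. dst may alias src1, which mirrors
 * the legacy form where the first source is also the destination. */
void pcmpeqb_model(const uint8_t src1[16], const uint8_t src2[16],
                   uint8_t dst[16])
{
    for (size_t i = 0; i < 16; i++) {
        dst[i] = (src1[i] == src2[i]) ? 0xFF : 0x00;
    }
}
```

Because every result byte is either all ones or all zeros, the output can be used directly as a byte mask in later logical operations.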
Instruction Support

Form               Subset   Feature Flag
PCMPEQB            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPEQB 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQB 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode        Description
PCMPEQB xmm1, xmm2/mem128   66 0F 74 /r   Compares packed bytes in xmm1 to packed bytes in xmm2 or mem128. Writes results to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPCMPEQB xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   74 /r
VPCMPEQB ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   74 /r

Related Instructions
(V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PCMPEQD VPCMPEQD
Packed Compare Equal Doublewords

Compares packed doubleword values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding doubleword of the destination. When values are equal, the result is FFFFFFFFh; when values are not equal, the result is 00000000h.

There are legacy and extended forms of the instruction:

PCMPEQD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPCMPEQD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset   Feature Flag
PCMPEQD            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPEQD 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQD 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode        Description
PCMPEQD xmm1, xmm2/mem128   66 0F 76 /r   Compares packed doublewords in xmm1 to packed doublewords in xmm2 or mem128. Writes results to xmm1.
Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPCMPEQD xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   76 /r
VPCMPEQD ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   76 /r

Related Instructions
(V)PCMPEQB, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PCMPEQQ VPCMPEQQ
Packed Compare Equal Quadwords

Compares packed quadword values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding quadword of the destination. When values are equal, the result is FFFFFFFFFFFFFFFFh; when values are not equal, the result is 0000000000000000h.
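As a rough illustration of the quadword result values, the following C sketch models the 128-bit form with two 64-bit elements. It is a scalar model under stated assumptions, not the instruction itself; pcmpeqq_model and its array layout (element 0 holds bits [63:0]) are names and conventions chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 128-bit quadword compare.
 * A quadword result is all ones only when all 64 bits match; a partial
 * match (for example, equal low doublewords only) yields all zeros. */
void pcmpeqq_model(const uint64_t src1[2], const uint64_t src2[2],
                   uint64_t dst[2])
{
    for (size_t i = 0; i < 2; i++) {
        dst[i] = (src1[i] == src2[i]) ? UINT64_MAX : 0;
    }
}
```

The all-or-nothing result per element is what distinguishes this form from running the doubleword compare on the same data.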
There are legacy and extended forms of the instruction:

PCMPEQQ
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPCMPEQQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset   Feature Flag
PCMPEQQ            SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPCMPEQQ 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQQ 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode           Description
PCMPEQQ xmm1, xmm2/mem128   66 0F 38 29 /r   Compares packed quadwords in xmm1 to packed quadwords in xmm2 or mem128. Writes results to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPCMPEQQ xmm1, xmm2, xmm3/mem128   C4    RXB.02           X.src1.0.01   29 /r
VPCMPEQQ ymm1, ymm2, ymm3/mem256   C4    RXB.02           X.src1.1.01   29 /r

Related Instructions
(V)PCMPEQB, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PCMPEQW VPCMPEQW
Packed Compare Equal Words

Compares packed word values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding word of the destination. When values are equal, the result is FFFFh; when values are not equal, the result is 0000h.

There are legacy and extended forms of the instruction:

PCMPEQW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPCMPEQW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset   Feature Flag
PCMPEQW            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPEQW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPEQW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode        Description
PCMPEQW xmm1, xmm2/mem128   66 0F 75 /r   Compares packed words in xmm1 to packed words in xmm2 or mem128. Writes results to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPCMPEQW xmm1, xmm2, xmm3/mem128   C4    RXB.01           X.src1.0.01   75 /r
VPCMPEQW ymm1, ymm2, ymm3/mem256   C4    RXB.01           X.src1.1.01   75 /r

Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPGTB, (V)PCMPGTD, (V)PCMPGTW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PCMPESTRI VPCMPESTRI
Packed Compare Explicit Length Strings Return Index

Compares character string data in the first and second source operands. Comparison operations are carried out as specified by values encoded in the immediate operand. Writes an index to the ECX register.

Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits. Characters may be treated as either signed or unsigned values.

Each operand has associated with it a separate integer value specifying the length of the string. The absolute value of the data in the EAX/RAX register represents the length of the character string in the first source operand; the absolute value of the data in the EDX/RDX register represents the length of the character string in the second source operand. If the absolute value of the data in either register is greater than the maximum string length that fits in 128 bits, the length is set to the maximum: 8 for 16-bit characters, or 16 for 8-bit characters.

The comparison operations between the two operand strings are summarized in an intermediate result: a comparison summary bit vector that is post-processed to produce the final output.
Data fields within the immediate byte specify the source data format, comparison type, comparison summary bit vector post-processing, and output option selection.

The index of either the most significant or least significant set bit of the post-processed comparison summary bit vector is returned in ECX. If no bits are set in the post-processed comparison summary bit vector, ECX is set to 16 for source operand strings composed of 8-bit characters or 8 for 16-bit character strings.

See Section 1.5, “String Compare Instructions” for information about source string data format, comparison operations, comparison summary bit vector generation, post-processing, and output selection options.

The rFLAGS are set to indicate the following conditions:

Flag   Condition
CF     Cleared if the comparison summary bit vector is zero; otherwise set.
PF     Cleared.
AF     Cleared.
ZF     Set if the specified length of the second string is less than the maximum; otherwise cleared.
SF     Set if the specified length of the first string is less than the maximum; otherwise cleared.
OF     Equal to the value of the lsb of the post-processed comparison summary bit vector.

There are legacy and extended forms of the instruction:

PCMPESTRI
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. A result index is written to the ECX register.

VPCMPESTRI
The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. A result index is written to the ECX register.

Instruction Support

Form         Subset   Feature Flag
PCMPESTRI    SSE4.2   CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPESTRI   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
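Two rules from the description above lend themselves to a short C sketch: how an explicit length register value is turned into an effective string length, and which index is reported for a given post-processed summary bit vector. This is a minimal model assuming the summary vector is supplied as a 16-bit value with no bits set at or above the element count; effective_length and result_index are hypothetical helper names, and the comparison and post-processing steps covered in Section 1.5 are not modeled.

```c
#include <stdint.h>

/* Hypothetical helper: effective string length from a length register.
 * The absolute value of the register is used and saturated to the number
 * of elements that fit in 128 bits (16 for bytes, 8 for words). */
unsigned effective_length(int64_t reg, unsigned max_elems)
{
    /* Unsigned negation avoids overflow for the most negative value. */
    uint64_t mag = (reg < 0) ? (uint64_t)0 - (uint64_t)reg : (uint64_t)reg;
    return (mag > max_elems) ? max_elems : (unsigned)mag;
}

/* Hypothetical helper: index reported for a post-processed summary
 * bit vector. Returns the element count when no bit is set; otherwise
 * the position of the most or least significant set bit. */
unsigned result_index(uint16_t summary, unsigned max_elems,
                      int most_significant)
{
    unsigned i;

    if (summary == 0) {
        return max_elems;
    }
    if (most_significant) {
        for (i = 15; ((summary >> i) & 1u) == 0; i--) {
            /* scan downward from bit 15 */
        }
        return i;
    }
    for (i = 0; ((summary >> i) & 1u) == 0; i++) {
        /* scan upward from bit 0 */
    }
    return i;
}
```

For example, a length register holding -3 with 8-bit characters gives an effective length of 3, and an all-zero summary vector gives 16, which is one past the last valid byte index.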
Instruction Encoding

Mnemonic                            Opcode              Description
PCMPESTRI xmm1, xmm2/mem128, imm8   66 0F 3A 61 /r ib   Compares packed string data in xmm1 and xmm2 or mem128. Writes a result index to the ECX register.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPCMPESTRI xmm1, xmm2/mem128, imm8   C4    RXB.00011        X.1111.0.01   61 /r ib

Related Instructions
(V)PCMPESTRM, (V)PCMPISTRI, (V)PCMPISTRM

rFLAGS Affected

Flag    ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
Value                                             M                   M    M    0    0    M
Bit     21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank. Undefined flags are U.

MXCSR Flags Affected
None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: AVX and SSE exception
A: AVX exception
S: SSE exception

PCMPESTRM VPCMPESTRM
Packed Compare Explicit Length Strings Return Mask

Compares character string data in the first and second source operands. Comparison operations are carried out as specified by values encoded in the immediate operand. Writes a mask value to the YMM0/XMM0 register.

Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits. Characters may be treated as either signed or unsigned values.

Each operand has associated with it a separate integer value specifying the length of the string. The absolute value of the data in the EAX/RAX register represents the length of the character string in the first source operand; the absolute value of the data in the EDX/RDX register represents the length of the character string in the second source operand. If the absolute value of the data in either register is greater than the maximum string length that fits in 128 bits, the length is set to the maximum: 8 for 16-bit characters, or 16 for 8-bit characters.

The comparison operations between the two operand strings are summarized in an intermediate result: a comparison summary bit vector that is post-processed to produce the final output. Data fields within the immediate byte specify the source data format, comparison type, comparison summary bit vector post-processing, and output option selection.

Depending on the output option selected, the post-processed comparison summary bit vector is either zero-extended to 128 bits or expanded into a byte/word mask and then written to XMM0.

See Section 1.5, “String Compare Instructions” for information about source string data format, comparison operations, comparison summary bit vector generation, post-processing, and output selection options.

The rFLAGS are set to indicate the following conditions:

Flag   Condition
CF     Cleared if the comparison summary bit vector is zero; otherwise set.
PF     Cleared.
AF     Cleared.
ZF     Set if the specified length of the second string is less than the maximum; otherwise cleared.
SF     Set if the specified length of the first string is less than the maximum; otherwise cleared.
OF     Equal to the value of the lsb of the post-processed summary bit vector.

There are legacy and extended forms of the instruction:

PCMPESTRM
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The mask result is written to the XMM0 register.

VPCMPESTRM
The extended form of the instruction has a 128-bit encoding only.
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The mask result is written to the XMM0 register. Bits [255:128] of the YMM0 register are cleared.

Instruction Support

Form         Subset   Feature Flag
PCMPESTRM    SSE4.2   CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPESTRM   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                            Opcode              Description
PCMPESTRM xmm1, xmm2/mem128, imm8   66 0F 3A 60 /r ib   Compares packed string data in xmm1 and xmm2 or mem128. Writes a mask value to the XMM0 register.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPCMPESTRM xmm1, xmm2/mem128, imm8   C4    RXB.00011        X.1111.0.01   60 /r ib

Related Instructions
(V)PCMPESTRI, (V)PCMPISTRI, (V)PCMPISTRM

rFLAGS Affected

Flag    ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
Value                                             M                   M    M    0    0    M
Bit     21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank. Undefined flags are U.

MXCSR Flags Affected
None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                       A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
X: AVX and SSE exception
A: AVX exception
S: SSE exception

PCMPGTB VPCMPGTB
Packed Compare Greater Than Signed Bytes

Compares packed signed byte values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding byte of the destination. When a value in the first operand is greater than a value in the second operand, the result is FFh; when a value in the first operand is less than or equal to a value in the second operand, the result is 00h.

There are legacy and extended forms of the instruction:

PCMPGTB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
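The signed greater-than rule for the 128-bit form can be sketched as a scalar C model. This is an illustration only; pcmpgtb_model and its array-based signature are assumptions for this example. The point it makes is that bytes are interpreted as signed, so the bit pattern 80h (-128) is not greater than 01h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 128-bit signed byte compare.
 * Each destination byte becomes FFh when the signed byte in src1 is
 * greater than the signed byte in src2, and 00h when it is less than
 * or equal. */
void pcmpgtb_model(const int8_t src1[16], const int8_t src2[16],
                   uint8_t dst[16])
{
    for (size_t i = 0; i < 16; i++) {
        dst[i] = (src1[i] > src2[i]) ? 0xFF : 0x00;
    }
}
```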
VPCMPGTB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset  Feature Flag
PCMPGTB            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPGTB 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTB 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode       Description
PCMPGTB xmm1, xmm2/mem128          66 0F 64 /r  Compares packed bytes in xmm1 to packed bytes in xmm2 or mem128. Writes results to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPGTB xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  64 /r
VPCMPGTB ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  64 /r

Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTD, (V)PCMPGTW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PCMPGTD VPCMPGTD
Packed Compare Greater Than Signed Doublewords

Compares packed signed doubleword values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding doubleword of the destination. When a value in the first operand is greater than a value in the second operand, the result is FFFFFFFFh; when a value in the first operand is less than or equal to a value in the second operand, the result is 00000000h.

There are legacy and extended forms of the instruction:

PCMPGTD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPCMPGTD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
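The per-element behavior of the packed signed greater-than compares can be illustrated with a scalar reference model. The sketch below is not how hardware implements the instruction; it treats a vector as a Python integer, and the names pcmpgt and pack are hypothetical helpers introduced only for this illustration. The element width is a parameter, so the same model covers the byte, word, doubleword, and quadword variants.

```python
def pack(values, elem_bits):
    """Pack a list of Python ints (element 0 first) into one integer."""
    mask = (1 << elem_bits) - 1
    out = 0
    for i, v in enumerate(values):
        out |= (v & mask) << (i * elem_bits)
    return out


def pcmpgt(src1, src2, elem_bits, vec_bits=128):
    """Scalar model of a packed signed greater-than compare.

    Each element of the result is all ones when the signed element of
    src1 is greater than the signed element of src2, otherwise all zeros.
    """
    mask = (1 << elem_bits) - 1
    sign = 1 << (elem_bits - 1)

    def to_signed(x):
        # Reinterpret an unsigned field as a two's-complement value.
        return x - (1 << elem_bits) if x & sign else x

    result = 0
    for i in range(vec_bits // elem_bits):
        a = to_signed((src1 >> (i * elem_bits)) & mask)
        b = to_signed((src2 >> (i * elem_bits)) & mask)
        if a > b:
            result |= mask << (i * elem_bits)
    return result


# Doubleword example: only element 0 satisfies src1 > src2.
a = pack([5, -1, 0, 7], 32)
b = pack([3, 0, 0, 7], 32)
print(hex(pcmpgt(a, b, 32)))
```

Because the comparison is signed, an element holding FFFFFFFFh is treated as -1 and is therefore not greater than 0.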
Instruction Support

Form               Subset  Feature Flag
PCMPGTD            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPGTD 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTD 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode       Description
PCMPGTD xmm1, xmm2/mem128          66 0F 66 /r  Compares packed doublewords in xmm1 to packed doublewords in xmm2 or mem128. Writes results to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPGTD xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  66 /r
VPCMPGTD ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  66 /r

Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PCMPGTQ VPCMPGTQ
Packed Compare Greater Than Signed Quadwords

Compares packed signed quadword values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding quadword of the destination. When a value in the first operand is greater than a value in the second operand, the result is FFFFFFFFFFFFFFFFh; when a value in the first operand is less than or equal to a value in the second operand, the result is 0000000000000000h.

There are legacy and extended forms of the instruction:

PCMPGTQ
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPCMPGTQ
The extended form of the instruction has 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset  Feature Flag
PCMPGTQ            SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPGTQ 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTQ 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode          Description
PCMPGTQ xmm1, xmm2/mem128          66 0F 38 37 /r  Compares packed quadwords in xmm1 to packed quadwords in xmm2 or mem128. Writes results to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPGTQ xmm1, xmm2, xmm3/mem128   C4   RXB.02          X.src1.0.01  37 /r
VPCMPGTQ ymm1, ymm2, ymm3/mem256   C4   RXB.02          X.src1.1.01  37 /r

Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PCMPGTW VPCMPGTW
Packed Compare Greater Than Signed Words

Compares packed signed word values in the first source operand to corresponding values in the second source operand and writes a comparison result to the corresponding word of the destination.
When a value in the first operand is greater than a value in the second operand, the result is FFFFh; when a value in the first operand is less than or equal to a value in the second operand, the result is 0000h.

There are legacy and extended forms of the instruction:

PCMPGTW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPCMPGTW
The extended form of the instruction has 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset  Feature Flag
PCMPGTW            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPCMPGTW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPCMPGTW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode       Description
PCMPGTW xmm1, xmm2/mem128          66 0F 65 /r  Compares packed words in xmm1 to packed words in xmm2 or mem128. Writes results to xmm1.

                                   Encoding
Mnemonic                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPGTW xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  65 /r
VPCMPGTW ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  65 /r

Related Instructions
(V)PCMPEQB, (V)PCMPEQD, (V)PCMPEQW, (V)PCMPGTB, (V)PCMPGTD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.L = 1 when AVX2 not supported.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PCMPISTRI VPCMPISTRI
Packed Compare Implicit Length Strings Return Index

Compares character string data in the first and second source operands. Comparison operations are carried out as specified by values encoded in the immediate operand. Writes an index to the ECX register.

Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits. Characters may be treated as either signed or unsigned values.

Source operand strings shorter than the maximum that can be packed into a 128-bit value are terminated by a null character (value of 0). The characters prior to the null character constitute the string.
If the first (lowest indexed) character is null, the string length is 0.

The comparison operations between the two operand strings are summarized in an intermediate result: a comparison summary bit vector that is post-processed to produce the final output. Data fields within the immediate byte specify the source data format, comparison type, comparison summary bit vector post-processing, and output option selection.

The index of either the most significant or least significant set bit of the post-processed comparison summary bit vector is returned in ECX. If no bits are set in the post-processed comparison summary bit vector, ECX is set to 16 for source operand strings composed of 8-bit characters or 8 for 16-bit character strings.

See Section 1.5, “String Compare Instructions” for information about source string data format, comparison operations, comparison summary bit vector generation, post-processing, and output selection options.

The rFLAGS are set to indicate the following conditions:

Flag  Condition
CF    Cleared if the comparison summary bit vector is zero; otherwise set.
PF    Cleared.
AF    Cleared.
ZF    Set if any byte (word) in the second operand is null; otherwise cleared.
SF    Set if any byte (word) in the first operand is null; otherwise cleared.
OF    Equal to the value of the lsb of the post-processed summary bit vector.

There are legacy and extended forms of the instruction:

PCMPISTRI
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. A result index is written to the ECX register.

VPCMPISTRI
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. A result index is written to the ECX register.
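The implicit-length rule and the output stage described above can be sketched as a scalar model. This is an illustration only: it does not implement the comparison types or post-processing selected by the immediate byte (those are defined in Section 1.5), so the post-processed summary bit vector is supplied by the caller. The function names and the msb_index parameter are hypothetical, and the sketch assumes that CF reflects the post-processed vector.

```python
def implicit_length(chars):
    """Number of characters before the first null; len(chars) if none is null."""
    for i, c in enumerate(chars):
        if c == 0:
            return i
    return len(chars)


def pcmpistri_outputs(first, second, post_processed, msb_index=False):
    """Derive the ECX index and flags from a post-processed summary vector.

    first, second:  lists of 16 (8-bit) or 8 (16-bit) character values.
    post_processed: int with one bit per character position (caller-supplied).
    msb_index:      True to return the most significant set bit, else the least.
    """
    n = len(first)
    assert n in (8, 16) and len(second) == n
    vec = post_processed & ((1 << n) - 1)
    if vec == 0:
        ecx = n                                  # 16 for bytes, 8 for words
    elif msb_index:
        ecx = vec.bit_length() - 1
    else:
        ecx = (vec & -vec).bit_length() - 1      # isolate lowest set bit
    flags = {
        "CF": int(vec != 0),       # assumption: tested on the post-processed vector
        "ZF": int(0 in second),    # a null character exists in the second operand
        "SF": int(0 in first),     # a null character exists in the first operand
        "OF": vec & 1,             # lsb of the post-processed vector
        "AF": 0,
        "PF": 0,
    }
    return ecx, flags


first = list(b"abc") + [0] * 13
second = list(b"xbz") + [0] * 13
print(implicit_length(first))
print(pcmpistri_outputs(first, second, 0b010))
```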

Instruction Support

Form         Subset  Feature Flag
PCMPISTRI    SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPISTRI   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                             Opcode             Description
PCMPISTRI xmm1, xmm2/mem128, imm8    66 0F 3A 63 /r ib  Compares packed string data in xmm1 and xmm2 or mem128.

                                     Encoding
Mnemonic                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPISTRI xmm1, xmm2/mem128, imm8   C4   RXB.03          X.1111.0.01  63 /r ib

Related Instructions
(V)PCMPESTRI, (V)PCMPESTRM, (V)PCMPISTRM

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   M    M    0    0    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank. Undefined flags are U.

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

PCMPISTRM VPCMPISTRM
Packed Compare Implicit Length Strings Return Mask

Compares character string data in the first and second source operands. Comparison operations are carried out as specified by values encoded in the immediate operand. Writes a mask value to the YMM0/XMM0 register.

Source operands are formatted as packed characters in one of two supported widths: 8 or 16 bits. Characters may be treated as either signed or unsigned values.

Source operand strings shorter than the maximum that can be packed into a 128-bit value are terminated by a null character (value of 0). The characters prior to the null character constitute the string. If the first (lowest indexed) character is null, the string length is 0.

The comparison operations between the two operand strings are summarized in an intermediate result: a comparison summary bit vector that is post-processed to produce the final output. Data fields within the immediate byte specify the source data format, comparison type, comparison summary bit vector post-processing, and output option selection.

Depending on the output option selected, the post-processed comparison summary bit vector is either zero-extended to 128 bits or expanded into a byte/word-mask and then written to XMM0.

See Section 1.5, “String Compare Instructions” for information about source string data format, comparison operations, comparison summary bit vector generation, post-processing, and output selection options.

The rFLAGS are set to indicate the following conditions:

Flag  Condition
CF    Cleared if the comparison summary bit vector is zero; otherwise set.
PF    Cleared.
AF    Cleared.
ZF    Set if any byte (word) in the second operand is null; otherwise cleared.
SF    Set if any byte (word) in the first operand is null; otherwise cleared.
OF    Equal to the value of the lsb of the post-processed summary bit vector.

There are legacy and extended forms of the instruction:

PCMPISTRM
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The mask result is written to the XMM0 register.

VPCMPISTRM
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The mask result is written to the XMM0 register. Bits [255:128] of the YMM0 register are cleared.

Instruction Support

Form         Subset  Feature Flag
PCMPISTRM    SSE4.2  CPUID Fn0000_0001_ECX[SSE42] (bit 20)
VPCMPISTRM   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                             Opcode             Description
PCMPISTRM xmm1, xmm2/mem128, imm8    66 0F 3A 62 /r ib  Compares packed string data in xmm1 and xmm2 or mem128. Writes a result or mask to the XMM0 register.

                                     Encoding
Mnemonic                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMPISTRM xmm1, xmm2/mem128, imm8   C4   RXB.03          X.1111.0.01  62 /r ib

Related Instructions
(V)PCMPESTRI, (V)PCMPESTRM, (V)PCMPISTRI

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   M    M    0    0    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag that is set or cleared is M (modified). Unaffected flags are blank. Undefined flags are U.

MXCSR Flags Affected
None
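The two (V)PCMPISTRM output options, zero-extension of the summary vector or expansion into a byte/word mask, can be sketched as follows. This is a scalar illustration under stated assumptions: the post-processed summary bit vector is supplied by the caller, and the function name and the expand parameter are hypothetical stand-ins for the output selection encoded in the immediate byte.

```python
def pcmpistrm_output(post_processed, char_bits, expand):
    """Model the 128-bit value written to XMM0.

    post_processed: int with one bit per character position.
    char_bits:      8 or 16, the character width.
    expand:         False -> summary vector zero-extended to 128 bits;
                    True  -> each set bit becomes an all-ones byte or word.
    """
    assert char_bits in (8, 16)
    n = 128 // char_bits                   # 16 byte or 8 word positions
    vec = post_processed & ((1 << n) - 1)
    if not expand:
        return vec                         # upper bits of the result are zero
    elem_mask = (1 << char_bits) - 1
    out = 0
    for i in range(n):
        if (vec >> i) & 1:
            out |= elem_mask << (i * char_bits)
    return out


print(hex(pcmpistrm_output(0b101, 8, expand=False)))
print(hex(pcmpistrm_output(0b101, 8, expand=True)))
```

The expanded form is convenient when the mask is to be combined with the source data by a subsequent packed logical instruction.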

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                   S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

PEXTRB VPEXTRB
Extract Packed Byte

Extracts a byte from a source register and writes it to an 8-bit memory location or to the low-order byte of a general-purpose register, with zero-extension to 32 or 64 bits. Bits [3:0] of an immediate byte operand select the byte to be extracted:

Value of imm8 [3:0]   Source Bits Extracted
0000                  [7:0]
0001                  [15:8]
0010                  [23:16]
0011                  [31:24]
0100                  [39:32]
0101                  [47:40]
0110                  [55:48]
0111                  [63:56]
1000                  [71:64]
1001                  [79:72]
1010                  [87:80]
1011                  [95:88]
1100                  [103:96]
1101                  [111:104]
1110                  [119:112]
1111                  [127:120]

There are legacy and extended forms of the instruction:

PEXTRB
The source operand is an XMM register and the destination is either an 8-bit memory location or the low-order byte of a general-purpose register.
When the destination is a general-purpose register, the extracted byte is zero-extended to 32 or 64 bits.

VPEXTRB
The extended form of the instruction has a 128-bit encoding only. The source operand is an XMM register and the destination is either an 8-bit memory location or the low-order byte of a general-purpose register. When the destination is a general-purpose register, the extracted byte is zero-extended to 32 or 64 bits.

Instruction Support

Form      Subset  Feature Flag
PEXTRB    SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRB   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode             Description
PEXTRB reg/m8, xmm, imm8      66 0F 3A 14 /r ib  Extracts an 8-bit value specified by imm8 from xmm and writes it to m8 or the low-order byte of a general-purpose register, with zero-extension.

                              Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPEXTRB reg/mem8, xmm, imm8   C4   RXB.03          X.1111.0.01  14 /r ib

Related Instructions
(V)PEXTRD, (V)PEXTRW, (V)PEXTRQ, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     X     Write to a read-only data segment.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

PEXTRD VPEXTRD
Extract Packed Doubleword

Extracts a doubleword from a source register and writes it to a 32-bit memory location or a 32-bit general-purpose register. Bits [1:0] of an immediate byte operand select the doubleword to be extracted:

Value of imm8 [1:0]   Source Bits Extracted
00                    [31:0]
01                    [63:32]
10                    [95:64]
11                    [127:96]

There are legacy and extended forms of the instruction:

PEXTRD
The encoding is the same as PEXTRQ, with REX.W = 0. The source operand is an XMM register and the destination is either a 32-bit memory location or a 32-bit general-purpose register.

VPEXTRD
The extended form of the instruction has a 128-bit encoding only. The encoding is the same as VPEXTRQ, with VEX.W = 0. The source operand is an XMM register and the destination is either a 32-bit memory location or a 32-bit general-purpose register.

Instruction Support

Form      Subset  Feature Flag
PEXTRD    SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                          Opcode                  Description
PEXTRD reg32/mem32, xmm, imm8     66 (W0) 0F 3A 16 /r ib  Extracts a 32-bit value specified by imm8 from xmm and writes it to mem32 or reg32.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPEXTRD reg32/mem32, xmm, imm8    C4   RXB.03          0.1111.0.01  16 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRW, (V)PEXTRQ, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     X     Write to a read-only data segment.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

PEXTRQ VPEXTRQ
Extract Packed Quadword

Extracts a quadword from a source register and writes it to a 64-bit memory location or to a 64-bit general-purpose register. Bit [0] of an immediate byte operand selects the quadword to be extracted:

Value of imm8 [0]   Source Bits Extracted
0                   [63:0]
1                   [127:64]

There are legacy and extended forms of the instruction:

PEXTRQ
The encoding is the same as PEXTRD, with REX.W = 1. The source operand is an XMM register and the destination is either a 64-bit memory location or a 64-bit general-purpose register.

VPEXTRQ
The extended form of the instruction has a 128-bit encoding only. The encoding is the same as VPEXTRD, with VEX.W = 1.
The source operand is an XMM register and the destination is either a 64-bit memory location or a 64-bit general-purpose register.

Instruction Support

Form      Subset  Feature Flag
PEXTRQ    SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRQ   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                          Opcode                  Description
PEXTRQ reg64/mem64, xmm, imm8     66 (W1) 0F 3A 16 /r ib  Extracts a 64-bit value specified by imm8 from xmm and writes it to mem64 or reg64.

                                  Encoding
Mnemonic                          VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPEXTRQ reg64/mem64, xmm, imm8    C4   RXB.03          1.1111.0.01  16 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRW, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     X     Write to a read-only data segment.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

PEXTRW VPEXTRW
Extract Packed Word

Extracts a word from a source register and writes it to a 16-bit memory location or to the low-order word of a general-purpose register, with zero-extension to 32 or 64 bits. Bits [2:0] of an immediate byte operand select the word to be extracted:

Value of imm8 [2:0]   Source Bits Extracted
000                   [15:0]
001                   [31:16]
010                   [47:32]
011                   [63:48]
100                   [79:64]
101                   [95:80]
110                   [111:96]
111                   [127:112]

There are legacy and extended forms of the instruction:

PEXTRW
The legacy form of the instruction has SSE2 and SSE4.1 encodings.
- The source operand is an XMM register and the destination is the low-order word of a general-purpose register. The extracted word is zero-extended to 32 or 64 bits.
- The source operand is an XMM register and the destination is either a 16-bit memory location or the low-order word of a general-purpose register. When the destination is a general-purpose register, the extracted word is zero-extended to 32 or 64 bits.

VPEXTRW
The extended form of the instruction has two 128-bit encodings that correspond to the two legacy encodings.
- The source operand is an XMM register and the destination is the low-order word of a general-purpose register. The extracted word is zero-extended to 32 or 64 bits.
- The source operand is an XMM register and the destination is either a 16-bit memory location or the low-order word of a general-purpose register. When the destination is a general-purpose register, the extracted word is zero-extended to 32 or 64 bits.

Instruction Support

Form               Subset  Feature Flag
PEXTRW reg         SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
PEXTRW reg/mem16   SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPEXTRW            AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
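The element selection performed by the extract instructions can be sketched with a scalar model that treats the 128-bit source as a Python integer. The function name pextr and its elem_bits parameter are hypothetical and exist only for this illustration; with elem_bits set to 8, 16, 32, or 64 the model corresponds to the byte, word, doubleword, and quadword selection tables. A non-negative Python integer stands in for the zero-extended destination value.

```python
def pextr(xmm, imm8, elem_bits):
    """Scalar model of element extraction from a 128-bit source.

    xmm:       source value as a non-negative int below 2**128.
    imm8:      immediate byte; only the low selector bits are used.
    elem_bits: 8, 16, 32 or 64.
    """
    assert elem_bits in (8, 16, 32, 64)
    lanes = 128 // elem_bits
    # Selector width follows the element count:
    # byte imm8[3:0], word imm8[2:0], doubleword imm8[1:0], quadword imm8[0].
    index = imm8 & (lanes - 1)
    return (xmm >> (index * elem_bits)) & ((1 << elem_bits) - 1)


# Source whose byte n holds the value n, so results are easy to read.
src = 0x0F0E0D0C0B0A09080706050403020100
print(hex(pextr(src, 5, 8)))     # byte 5
print(hex(pextr(src, 2, 16)))    # word 2, bits [47:32]
print(hex(pextr(src, 3, 32)))    # doubleword 3, bits [127:96]
print(hex(pextr(src, 1, 64)))    # quadword 1, bits [127:64]
```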

Instruction Encoding

Mnemonic                       Opcode             Description
PEXTRW reg, xmm, imm8          66 0F C5 /r ib     Extracts a 16-bit value specified by imm8 from xmm and writes it to the low-order word of a general-purpose register, with zero-extension.
PEXTRW reg/m16, xmm, imm8      66 0F 3A 15 /r ib  Extracts a 16-bit value specified by imm8 from xmm and writes it to m16 or the low-order word of a general-purpose register, with zero-extension.

                               Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPEXTRW reg, xmm, imm8         C4   RXB.01          X.1111.0.01  C5 /r ib
VPEXTRW reg/mem16, xmm, imm8   C4   RXB.03          X.1111.0.01  15 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PINSRB, (V)PINSRD, (V)PINSRW, (V)PINSRQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 1.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            S     S     X     Write to a read-only data segment.
                                        X     Null data segment used to reference memory.
Page fault, #PF                   S     X     Instruction execution caused a page fault.
Alignment check, #AC              S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

PHADDD VPHADDD
Packed Horizontal Add Doubleword

Adds adjacent 32-bit signed integers in each of two source operands and packs the sums into the destination.
If a sum overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set) and only the low-order 32 bits of the sum are written in the destination.

Adds the 32-bit signed integer values in bits [63:32] and bits [31:0] of the first source operand and packs the sum into bits [31:0] of the destination; adds the 32-bit signed integer values in bits [127:96] and bits [95:64] of the first source operand and packs the sum into bits [63:32] of the destination. Adds the corresponding values in the second source operand and packs the sums into bits [95:64] and [127:96] of the destination.

Additionally, for the 256-bit form, adds the 32-bit signed integer values in bits [191:160] and bits [159:128] of the first source operand and packs the sum into bits [159:128] of the destination; adds the 32-bit signed integer values in bits [255:224] and bits [223:192] of the first source operand and packs the sum into bits [191:160] of the destination. Adds the corresponding values in the second source operand and packs the sums into bits [223:192] and [255:224] of the destination.

There are legacy and extended forms of the instruction:

PHADDD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPHADDD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
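The element placement described above can be easier to follow as a short Python reference model. This is a sketch under stated assumptions: the name `phaddd` and the list representation (element 0 holds bits [31:0]) are illustrative choices, not part of the manual.

```python
MASK32 = 0xFFFFFFFF


def phaddd(src1, src2):
    """Model of PHADDD/VPHADDD on lists of 32-bit elements (hypothetical helper).

    Pass 4 elements per operand for the 128-bit form and 8 for the
    256-bit form. Element 0 is bits [31:0] (representation is an assumption).
    """
    if len(src1) != len(src2) or len(src1) not in (4, 8):
        raise ValueError("operands must both have 4 or both have 8 elements")
    dest = []
    # Each 128-bit lane is processed on its own: two sums from src1,
    # then two sums from src2.
    for lane in range(0, len(src1), 4):
        for src in (src1, src2):
            dest.append((src[lane] + src[lane + 1]) & MASK32)
            dest.append((src[lane + 2] + src[lane + 3]) & MASK32)
    return dest


print(phaddd([1, 2, 3, 4], [10, 20, 30, 40]))        # [3, 7, 30, 70]
# Overflow: only the low-order 32 bits of the sum are kept.
print(phaddd([0xFFFFFFFF, 1, 0, 0], [0, 0, 0, 0]))   # [0, 0, 0, 0]
```

Note that the 256-bit form does not add across the 128-bit boundary: the sums from the upper half of each source stay in the upper half of the destination.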
Instruction Support

Form                Subset    Feature Flag
PHADDD              SSSE3     CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHADDD 128-bit     AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHADDD 256-bit     AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode           Description
PHADDD xmm1, xmm2/mem128           66 0F 38 02 /r   Adds adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed sums to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDD xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   02 /r
VPHADDD ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   02 /r

Related Instructions
(V)PHADDW, (V)PHADDSW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real Virt Prot  Cause of Exception
Invalid opcode, #UD           X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                              A    A         AVX instructions are only recognized in protected mode.
                              S    S    S    CR0.EM = 1.
                              S    S    S    CR4.OSFXSR = 0.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.L = 1 when AVX2 not supported.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S    S    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S    S    X    CR0.TS = 1.
Stack, #SS                    S    S    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S    S    X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Alignment check, #AC          S    S    S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S    X    Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
PHADDSW VPHADDSW
Packed Horizontal Add with Saturation Word

Adds adjacent 16-bit signed integers in each of two source operands, with saturation, and packs the 16-bit signed sums into the destination. Positive sums greater than 7FFFh are saturated to 7FFFh; negative sums less than 8000h are saturated to 8000h.

For the 128-bit form of the instruction, the following operations are performed:

dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand.
src2 is the second source operand.
Ssum() is a function that returns the saturated 16-bit signed sum of its arguments.

dest[15:0] = Ssum(src1[31:16], src1[15:0])
dest[31:16] = Ssum(src1[63:48], src1[47:32])
dest[47:32] = Ssum(src1[95:80], src1[79:64])
dest[63:48] = Ssum(src1[127:112], src1[111:96])
dest[79:64] = Ssum(src2[31:16], src2[15:0])
dest[95:80] = Ssum(src2[63:48], src2[47:32])
dest[111:96] = Ssum(src2[95:80], src2[79:64])
dest[127:112] = Ssum(src2[127:112], src2[111:96])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = Ssum(src1[159:144], src1[143:128])
dest[159:144] = Ssum(src1[191:176], src1[175:160])
dest[175:160] = Ssum(src1[223:208], src1[207:192])
dest[191:176] = Ssum(src1[255:240], src1[239:224])
dest[207:192] = Ssum(src2[159:144], src2[143:128])
dest[223:208] = Ssum(src2[191:176], src2[175:160])
dest[239:224] = Ssum(src2[223:208], src2[207:192])
dest[255:240] = Ssum(src2[255:240], src2[239:224])

There are legacy and extended forms of the instruction:

PHADDSW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPHADDSW
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                 Subset    Feature Flag
PHADDSW              SSSE3     CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHADDSW 128-bit     AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHADDSW 256-bit     AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                            Opcode           Description
PHADDSW xmm1, xmm2/mem128           66 0F 38 03 /r   Adds adjacent pairs of signed integers in xmm1 and xmm2 or mem128, with saturation. Writes packed sums to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDSW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   03 /r
VPHADDSW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   03 /r

Related Instructions
(V)PHADDD, (V)PHADDW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real Virt Prot  Cause of Exception
Invalid opcode, #UD           X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                              A    A         AVX instructions are only recognized in protected mode.
                              S    S    S    CR0.EM = 1.
                              S    S    S    CR4.OSFXSR = 0.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.L = 1 when AVX2 not supported.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S    S    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S    S    X    CR0.TS = 1.
Stack, #SS                    S    S    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S    S    X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Alignment check, #AC          S    S    S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S    X    Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PHADDW VPHADDW
Packed Horizontal Add Word

Adds adjacent 16-bit signed integers in each of two source operands and packs the 16-bit sums into the destination. If a sum overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set).

For the 128-bit form of the instruction, the following operations are performed:

dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand.
src2 is the second source operand.
dest[15:0] = src1[31:16] + src1[15:0]
dest[31:16] = src1[63:48] + src1[47:32]
dest[47:32] = src1[95:80] + src1[79:64]
dest[63:48] = src1[127:112] + src1[111:96]
dest[79:64] = src2[31:16] + src2[15:0]
dest[95:80] = src2[63:48] + src2[47:32]
dest[111:96] = src2[95:80] + src2[79:64]
dest[127:112] = src2[127:112] + src2[111:96]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = src1[159:144] + src1[143:128]
dest[159:144] = src1[191:176] + src1[175:160]
dest[175:160] = src1[223:208] + src1[207:192]
dest[191:176] = src1[255:240] + src1[239:224]
dest[207:192] = src2[159:144] + src2[143:128]
dest[223:208] = src2[191:176] + src2[175:160]
dest[239:224] = src2[223:208] + src2[207:192]
dest[255:240] = src2[255:240] + src2[239:224]

There are legacy and extended forms of the instruction:

PHADDW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPHADDW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
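The bit-range equations for PHADDW can be expressed as a loop over 128-bit lanes. The sketch below is illustrative only; it assumes each register is held in a Python integer, and the names `bits` and `phaddw` are hypothetical helpers rather than anything defined by the manual.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def phaddw(src1: int, src2: int, width: int = 128) -> int:
    """Model of PHADDW/VPHADDW; width is 128 or 256 (assumed parameter)."""
    if width not in (128, 256):
        raise ValueError("width must be 128 or 256")
    dest = 0
    for lane in range(0, width, 128):              # one pass per 128-bit lane
        for half, src in enumerate((src1, src2)):  # src1 fills the low 64 bits
            for i in range(4):
                lo = lane + 32 * i                 # base of the word pair
                total = bits(src, lo + 31, lo + 16) + bits(src, lo + 15, lo)
                total &= 0xFFFF                    # carry out is discarded
                dest |= total << (lane + 64 * half + 16 * i)
    return dest


src1 = 0x0008_0007_0006_0005_0004_0003_0002_0001   # words 1..8, low to high
src2 = (1 << 128) - 1                               # every word is FFFFh
print(hex(phaddw(src1, src2)))
# 0xfffefffefffefffe000f000b00070003
```

In the usage example, FFFFh + FFFFh wraps to FFFEh in each src2-derived word, which is the "carry is ignored" behavior the description states.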
Instruction Support

Form                Subset    Feature Flag
PHADDW              SSSE3     CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHADDW 128-bit     AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHADDW 256-bit     AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode           Description
PHADDW xmm1, xmm2/mem128           66 0F 38 01 /r   Adds adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed sums to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHADDW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   01 /r
VPHADDW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   01 /r

Related Instructions
(V)PHADDD, (V)PHADDSW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real Virt Prot  Cause of Exception
Invalid opcode, #UD           X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                              A    A         AVX instructions are only recognized in protected mode.
                              S    S    S    CR0.EM = 1.
                              S    S    S    CR4.OSFXSR = 0.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.L = 1 when AVX2 not supported.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S    S    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S    S    X    CR0.TS = 1.
Stack, #SS                    S    S    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S    S    X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Alignment check, #AC          S    S    S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S    X    Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PHMINPOSUW VPHMINPOSUW
Horizontal Minimum and Position

Finds the minimum unsigned 16-bit value in the source operand and copies it to the low-order word element of the destination. Writes the source position index of the value to bits [18:16] of the destination and clears bits [127:19] of the destination.

There are legacy and extended forms of the instruction:

PHMINPOSUW
The source operand is an XMM register or 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPHMINPOSUW
The extended form of the instruction has a 128-bit encoding only. The source operand is an XMM register or 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form            Subset    Feature Flag
PHMINPOSUW      SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPHMINPOSUW     AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode           Description
PHMINPOSUW xmm1, xmm2/mem128       66 0F 38 41 /r   Finds the minimum unsigned word element in xmm2 or mem128, copies it to xmm1[15:0]; writes its position index to xmm1[18:16], and clears xmm1[127:19].

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHMINPOSUW xmm1, xmm2/mem128      C4    RXB.02           X.1111.0.01   41 /r

Related Instructions
(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUD, (V)PMINUW

rFLAGS Affected
None

MXCSR Flags Affected
None
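The result layout for PHMINPOSUW (value in bits [15:0], index in bits [18:16], everything else cleared) can be sketched in Python as follows. The function name and list representation are assumptions for illustration. The description above does not say which index is reported when the minimum occurs more than once; this sketch assumes the lowest index is returned, and labels that as an assumption in the code.

```python
def phminposuw(src_words):
    """Model of PHMINPOSUW (hypothetical helper).

    src_words: eight unsigned 16-bit values, element 0 = bits [15:0].
    Returns the 128-bit destination as a Python int.
    """
    if len(src_words) != 8 or any(not 0 <= w <= 0xFFFF for w in src_words):
        raise ValueError("expected eight unsigned 16-bit values")
    minimum = min(src_words)
    # ASSUMPTION: on ties, the lowest-numbered position is reported.
    index = src_words.index(minimum)
    # Bits [15:0] = minimum, bits [18:16] = index, bits [127:19] = 0.
    return minimum | (index << 16)


result = phminposuw([9, 4, 7, 4, 8, 6, 5, 10])
print(hex(result))                             # 0x10004
print(result & 0xFFFF, (result >> 16) & 0x7)   # 4 1
```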
Exceptions

Exception                    Real Virt Prot  Cause of Exception
Invalid opcode, #UD           X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                              A    A         AVX instructions are only recognized in protected mode.
                              S    S    S    CR0.EM = 1.
                              S    S    S    CR4.OSFXSR = 0.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.vvvv != 1111b.
                                        A    VEX.L = 1.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S    S    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S    S    X    CR0.TS = 1.
Stack, #SS                    S    S    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S    S    X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Alignment check, #AC          S    S    S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A    Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S    X    Instruction execution caused a page fault.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

PHSUBD VPHSUBD
Packed Horizontal Subtract Doubleword

Subtracts adjacent 32-bit signed integers in each of two source operands and packs the differences into the destination. The higher-order doubleword of each pair is subtracted from the lower-order doubleword.

Subtracts the 32-bit signed integer value in bits [63:32] of the first source operand from the 32-bit signed integer value in bits [31:0] of the first source operand and packs the difference into bits [31:0] of the destination; subtracts the 32-bit signed integer value in bits [127:96] of the first source operand from the 32-bit signed integer value in bits [95:64] of the first source operand and packs the difference into bits [63:32] of the destination.
Performs the corresponding operations on pairs of 32-bit signed integer values in the second source operand and packs the differences into bits [95:64] and [127:96] of the destination.

Additionally, for the 256-bit form, subtracts the 32-bit signed integer value in bits [191:160] of the first source operand from the 32-bit signed integer value in bits [159:128] of the first source operand and packs the difference into bits [159:128] of the destination; subtracts the 32-bit signed integer value in bits [255:224] of the first source operand from the 32-bit signed integer value in bits [223:192] of the first source operand and packs the difference into bits [191:160] of the destination. Performs the corresponding operations on pairs of 32-bit signed integer values in the second source operand and packs the differences into bits [223:192] and [255:224] of the destination.

There are legacy and extended forms of the instruction:

PHSUBD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPHSUBD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
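The operand order of the subtraction (lower-order element minus higher-order element) is easy to get backwards, so a short Python model may help. This is a sketch, not a definitive implementation: the names are hypothetical, elements are signed Python integers with element 0 at bits [31:0], and results are assumed to wrap to 32 bits because the instruction is the non-saturating form.

```python
def wrap32(x: int) -> int:
    """Reduce x to a signed 32-bit value (two's-complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def phsubd(src1, src2):
    """Model of PHSUBD/VPHSUBD: 4 elements (128-bit) or 8 (256-bit)."""
    if len(src1) != len(src2) or len(src1) not in (4, 8):
        raise ValueError("operands must both have 4 or both have 8 elements")
    dest = []
    for lane in range(0, len(src1), 4):
        for src in (src1, src2):
            # Lower-order element minus higher-order element of each pair.
            dest.append(wrap32(src[lane] - src[lane + 1]))
            dest.append(wrap32(src[lane + 2] - src[lane + 3]))
    return dest


print(phsubd([10, 3, 5, 8], [100, 1, -2147483648, 1]))
# [7, -3, 99, 2147483647]
```

The last element shows the assumed wraparound: -2147483648 minus 1 does not fit in 32 bits and wraps to 2147483647.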
Instruction Support

Form                Subset    Feature Flag
PHSUBD              SSSE3     CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHSUBD 128-bit     AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHSUBD 256-bit     AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode           Description
PHSUBD xmm1, xmm2/mem128           66 0F 38 06 /r   Subtracts adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed differences to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHSUBD xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   06 /r
VPHSUBD ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   06 /r

Related Instructions
(V)PHSUBW, (V)PHSUBSW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real Virt Prot  Cause of Exception
Invalid opcode, #UD           X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                              A    A         AVX instructions are only recognized in protected mode.
                              S    S    S    CR0.EM = 1.
                              S    S    S    CR4.OSFXSR = 0.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.L = 1 when AVX2 not supported.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S    S    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S    S    X    CR0.TS = 1.
Stack, #SS                    S    S    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S    S    X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Alignment check, #AC          S    S    S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S    X    Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
PHSUBSW VPHSUBSW
Packed Horizontal Subtract with Saturation Word

Subtracts adjacent 16-bit signed integers in each of two source operands, with saturation, and packs the differences into the destination. The higher-order word of each pair is subtracted from the lower-order word. Positive differences greater than 7FFFh are saturated to 7FFFh; negative differences less than 8000h are saturated to 8000h.

For the 128-bit form of the instruction, the following operations are performed:

dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand.
src2 is the second source operand.
Sdiff(A,B) is a function that returns the saturated 16-bit signed difference A − B.

dest[15:0] = Sdiff(src1[15:0], src1[31:16])
dest[31:16] = Sdiff(src1[47:32], src1[63:48])
dest[47:32] = Sdiff(src1[79:64], src1[95:80])
dest[63:48] = Sdiff(src1[111:96], src1[127:112])
dest[79:64] = Sdiff(src2[15:0], src2[31:16])
dest[95:80] = Sdiff(src2[47:32], src2[63:48])
dest[111:96] = Sdiff(src2[79:64], src2[95:80])
dest[127:112] = Sdiff(src2[111:96], src2[127:112])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = Sdiff(src1[143:128], src1[159:144])
dest[159:144] = Sdiff(src1[175:160], src1[191:176])
dest[175:160] = Sdiff(src1[207:192], src1[223:208])
dest[191:176] = Sdiff(src1[239:224], src1[255:240])
dest[207:192] = Sdiff(src2[143:128], src2[159:144])
dest[223:208] = Sdiff(src2[175:160], src2[191:176])
dest[239:224] = Sdiff(src2[207:192], src2[223:208])
dest[255:240] = Sdiff(src2[239:224], src2[255:240])

There are legacy and extended forms of the instruction:

PHSUBSW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPHSUBSW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                 Subset    Feature Flag
PHSUBSW              SSSE3     CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHSUBSW 128-bit     AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHSUBSW 256-bit     AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                            Opcode           Description
PHSUBSW xmm1, xmm2/mem128           66 0F 38 07 /r   Subtracts adjacent pairs of signed integers in xmm1 and xmm2 or mem128, with saturation. Writes packed differences to xmm1.

Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHSUBSW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   07 /r
VPHSUBSW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   07 /r

Related Instructions
(V)PHSUBD, (V)PHSUBW

rFLAGS Affected
None

MXCSR Flags Affected
None
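The Sdiff() function and the word placement for PHSUBSW can be sketched in Python. The names `sdiff` and `phsubsw`, and the list representation (signed integers, element 0 at bits [15:0], 8 elements for the 128-bit form and 16 for the 256-bit form), are assumptions made for illustration.

```python
def sdiff(a: int, b: int) -> int:
    """Saturated 16-bit signed difference a - b (clamped to 8000h..7FFFh)."""
    return max(-0x8000, min(0x7FFF, a - b))


def phsubsw(src1, src2):
    """Model of PHSUBSW/VPHSUBSW on lists of signed 16-bit elements."""
    if len(src1) != len(src2) or len(src1) not in (8, 16):
        raise ValueError("operands must both have 8 or both have 16 elements")
    dest = []
    for lane in range(0, len(src1), 8):       # one pass per 128-bit lane
        for src in (src1, src2):              # src1 results first, then src2
            for i in range(0, 8, 2):
                # Lower-order word minus higher-order word of each pair.
                dest.append(sdiff(src[lane + i], src[lane + i + 1]))
    return dest


src1 = [32767, -1, -32768, 1, 5, 3, 0, 0]
print(phsubsw(src1, [0] * 8))
# [32767, -32768, 2, 0, 0, 0, 0, 0]
```

The first two results are clamped: 32767 − (−1) would be 32768 and −32768 − 1 would be −32769, neither of which fits in a signed 16-bit word.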
Exceptions

Exception                    Real Virt Prot  Cause of Exception
Invalid opcode, #UD           X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                              A    A         AVX instructions are only recognized in protected mode.
                              S    S    S    CR0.EM = 1.
                              S    S    S    CR4.OSFXSR = 0.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.L = 1 when AVX2 not supported.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S    S    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S    S    X    CR0.TS = 1.
Stack, #SS                    S    S    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S    S    X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Alignment check, #AC          S    S    S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S    X    Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PHSUBW VPHSUBW
Packed Horizontal Subtract Word

Subtracts adjacent 16-bit signed integers in each of two source operands and packs the differences into the destination. The higher-order word of each pair is subtracted from the lower-order word.

For the 128-bit form of the instruction, the following operations are performed:

dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand.
src2 is the second source operand.
dest[15:0] = src1[15:0] − src1[31:16]
dest[31:16] = src1[47:32] − src1[63:48]
dest[47:32] = src1[79:64] − src1[95:80]
dest[63:48] = src1[111:96] − src1[127:112]
dest[79:64] = src2[15:0] − src2[31:16]
dest[95:80] = src2[47:32] − src2[63:48]
dest[111:96] = src2[79:64] − src2[95:80]
dest[127:112] = src2[111:96] − src2[127:112]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = src1[143:128] − src1[159:144]
dest[159:144] = src1[175:160] − src1[191:176]
dest[175:160] = src1[207:192] − src1[223:208]
dest[191:176] = src1[239:224] − src1[255:240]
dest[207:192] = src2[143:128] − src2[159:144]
dest[223:208] = src2[175:160] − src2[191:176]
dest[239:224] = src2[207:192] − src2[223:208]
dest[255:240] = src2[239:224] − src2[255:240]

There are legacy and extended forms of the instruction:

PHSUBW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPHSUBW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
Instruction Support

Form                Subset    Feature Flag
PHSUBW              SSSE3     CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPHSUBW 128-bit     AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPHSUBW 256-bit     AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode           Description
PHSUBW xmm1, xmm2/mem128           66 0F 38 05 /r   Subtracts adjacent pairs of signed integers in xmm1 and xmm2 or mem128. Writes packed differences to xmm1.

Mnemonic                           Encoding
                                   VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPHSUBW xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   05 /r
VPHSUBW ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   05 /r

Related Instructions
(V)PHSUBD, (V)PHSUBSW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real Virt Prot  Cause of Exception
Invalid opcode, #UD           X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                              A    A         AVX instructions are only recognized in protected mode.
                              S    S    S    CR0.EM = 1.
                              S    S    S    CR4.OSFXSR = 0.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.L = 1 when AVX2 not supported.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S    S    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S    S    X    CR0.TS = 1.
Stack, #SS                    S    S    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S    S    X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Alignment check, #AC          S    S    S    Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                        A    Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S    X    Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PINSRB VPINSRB
Packed Insert Byte

Inserts a byte from an 8-bit memory location or the low-order byte of a 32-bit general-purpose register into a destination register. Bits [3:0] of an immediate byte operand select the location where the byte is to be inserted:

Value of imm8 [3:0]    Insertion Location
0000                   [7:0]
0001                   [15:8]
0010                   [23:16]
0011                   [31:24]
0100                   [39:32]
0101                   [47:40]
0110                   [55:48]
0111                   [63:56]
1000                   [71:64]
1001                   [79:72]
1010                   [87:80]
1011                   [95:88]
1100                   [103:96]
1101                   [111:104]
1110                   [119:112]
1111                   [127:120]

There are legacy and extended forms of the instruction:

PINSRB
The source operand is either an 8-bit memory location or the low-order byte of a 32-bit general-purpose register and the destination is an XMM register. The other bytes of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPINSRB
The extended form of the instruction has a 128-bit encoding only. There are two source operands. The first source operand is either an 8-bit memory location or the low-order byte of a 32-bit general-purpose register and the second source operand is an XMM register. The destination is a second XMM register. All the bytes of the second source other than the byte that corresponds to the location of the inserted byte are copied to the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form        Subset    Feature Flag
PINSRB      SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPINSRB     AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
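The byte insertion can be sketched as a mask-and-merge on a 128-bit value. The Python model below is illustrative only: the name `pinsrb` and the use of Python integers for the XMM register and the general-purpose register are assumptions. It models the legacy form, in which the other bytes of the destination are preserved.

```python
def pinsrb(dest_xmm: int, src: int, imm8: int) -> int:
    """Model of legacy PINSRB (hypothetical helper).

    dest_xmm : current 128-bit destination value (non-negative int).
    src      : source register value; only its low-order byte is used.
    imm8     : immediate byte; bits [3:0] select the byte position.
    """
    shift = 8 * (imm8 & 0xF)            # imm8[3:0] picks one of 16 bytes
    cleared = dest_xmm & ~(0xFF << shift)
    return cleared | ((src & 0xFF) << shift)


# Only the low-order byte (ABh) of the source is inserted, at bits [127:120].
print(hex(pinsrb(0, 0x1AB, 15)))        # 0xab followed by 30 zero digits
# Byte 1 is replaced; byte 0 is unchanged.
print(hex(pinsrb(0xFFFF, 0x00, 1)))     # 0xff
```

For the extended form, the same merge would apply with the second source operand taking the place of `dest_xmm`, and the result written to a separate destination.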
Instruction Encoding

Mnemonic                             Opcode              Description
PINSRB xmm, reg32/mem8, imm8         66 0F 3A 20 /r ib   Inserts an 8-bit value selected by imm8 from the low-order byte of reg32 or from mem8 into xmm.

Mnemonic                             Encoding
                                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPINSRB xmm, reg/mem8, xmm, imm8     C4    RXB.03           X.1111.0.01   20 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRD, (V)PINSRQ, (V)PINSRW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real Virt Prot  Cause of Exception
Invalid opcode, #UD           X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                              A    A         AVX instructions are only recognized in protected mode.
                              S    S    S    CR0.EM = 1.
                              S    S    S    CR4.OSFXSR = 0.
                                        A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A    VEX.vvvv != 1111b.
                                        A    VEX.L = 1.
                                        A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S    S    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM     S    S    X    CR0.TS = 1.
Stack, #SS                    S    S    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S    S    X    Memory address exceeding data segment limit or non-canonical.
                                        X    Null data segment used to reference memory.
Page fault, #PF                    S    X    Instruction execution caused a page fault.
Alignment check, #AC               S    X    Unaligned memory reference when alignment checking enabled.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

PINSRD VPINSRD
Packed Insert Doubleword

Inserts a doubleword from a 32-bit memory location or a 32-bit general-purpose register into a destination register.
Bits [1:0] of an immediate byte operand select the location where the doubleword is to be inserted:

Value of imm8[1:0]    Insertion Location
00                    [31:0]
01                    [63:32]
10                    [95:64]
11                    [127:96]

There are legacy and extended forms of the instruction:

PINSRD
The encoding is the same as PINSRQ, with REX.W = 0. The source operand is either a 32-bit memory location or a 32-bit general-purpose register and the destination is an XMM register. The other doublewords of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPINSRD
The extended form of the instruction has a 128-bit encoding only. The encoding is the same as VPINSRQ, with VEX.W = 0. There are two source operands. The first source operand is either a 32-bit memory location or a 32-bit general-purpose register and the second source operand is an XMM register. The destination is a second XMM register. All the doublewords of the second source other than the doubleword that corresponds to the location of the inserted doubleword are copied to the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form       Subset    Feature Flag
PINSRD     SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPINSRD    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                         Opcode                    Description
PINSRD xmm, reg32/mem32, imm8    66 (W0) 0F 3A 22 /r ib    Inserts a 32-bit value from reg32 or mem32 into xmm at the doubleword position selected by imm8.
                                                    Encoding
Mnemonic                               VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPINSRD xmm, reg32/mem32, xmm, imm8    C4     RXB.03            0.1111.0.01    22 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRB, (V)PINSRQ, (V)PINSRW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A     -     AVX instructions are only recognized in protected mode.
                              S     S     S     CR0.EM = 1.
                              S     S     S     CR4.OSFXSR = 0.
                              -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                              -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                              -     -     A     VEX.vvvv != 1111b.
                              -     -     A     VEX.L = 1.
                              -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X     CR0.TS = 1.
Stack, #SS                    S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X     Memory address exceeding data segment limit or non-canonical.
                              -     -     X     Null data segment used to reference memory.
Page fault, #PF               -     S     X     Instruction execution caused a page fault.
Alignment check, #AC          -     S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

PINSRQ
VPINSRQ
Packed Insert Quadword

Inserts a quadword from a 64-bit memory location or a 64-bit general-purpose register into a destination register.
Bit [0] of an immediate byte operand selects the location where the quadword is to be inserted:

Value of imm8[0]    Insertion Location
0                   [63:0]
1                   [127:64]

There are legacy and extended forms of the instruction:

PINSRQ
The encoding is the same as PINSRD, with REX.W = 1. The source operand is either a 64-bit memory location or a 64-bit general-purpose register and the destination is an XMM register. The other quadwords of the destination are not affected.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPINSRQ
The extended form of the instruction has a 128-bit encoding only. The encoding is the same as VPINSRD, with VEX.W = 1. There are two source operands. The first source operand is either a 64-bit memory location or a 64-bit general-purpose register and the second source operand is an XMM register. The destination is a second XMM register. All the quadwords of the second source other than the quadword that corresponds to the location of the inserted quadword are copied to the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form       Subset    Feature Flag
PINSRQ     SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPINSRQ    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                         Opcode                    Description
PINSRQ xmm, reg64/mem64, imm8    66 (W1) 0F 3A 22 /r ib    Inserts a 64-bit value from reg64 or mem64 into xmm at the quadword position selected by imm8.

                                                    Encoding
Mnemonic                               VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPINSRQ xmm, reg64/mem64, xmm, imm8    C4     RXB.03            1.1111.0.01    22 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRB, (V)PINSRD, (V)PINSRW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A     -     AVX instructions are only recognized in protected mode.
                              S     S     S     CR0.EM = 1.
                              S     S     S     CR4.OSFXSR = 0.
                              -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                              -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                              -     -     A     VEX.vvvv != 1111b.
                              -     -     A     VEX.L = 1.
                              -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X     CR0.TS = 1.
Stack, #SS                    S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X     Memory address exceeding data segment limit or non-canonical.
                              -     -     X     Null data segment used to reference memory.
Page fault, #PF               -     S     X     Instruction execution caused a page fault.
Alignment check, #AC          -     S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

PINSRW
VPINSRW
Packed Insert Word

Inserts a word from a 16-bit memory location or the low-order word of a 32-bit general-purpose register into a destination register.
Bits [2:0] of an immediate byte operand select the location where the word is to be inserted:

Value of imm8[2:0]    Insertion Location
000                   [15:0]
001                   [31:16]
010                   [47:32]
011                   [63:48]
100                   [79:64]
101                   [95:80]
110                   [111:96]
111                   [127:112]

There are legacy and extended forms of the instruction:

PINSRW
The source operand is either a 16-bit memory location or the low-order word of a 32-bit general-purpose register and the destination is an XMM register. The other words of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPINSRW
The extended form of the instruction has a 128-bit encoding only. There are two source operands. The first source operand is either a 16-bit memory location or the low-order word of a 32-bit general-purpose register and the second source operand is an XMM register. The destination is an XMM register. All the words of the second source other than the word that corresponds to the location of the inserted word are copied to the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
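The difference between the legacy and extended insert forms is easiest to see on the full 256-bit YMM register. The sketch below is a hedged model only: registers are Python integers, and the name `insert_element` and its parameters are hypothetical. It covers the byte, word, doubleword, and quadword inserts by taking the element width as an argument, and it assumes the index is taken from the low-order imm8 bits listed in the selection tables.

```python
MASK128 = (1 << 128) - 1

def insert_element(ymm_dest, xmm_src2, value, imm8, width, vex):
    """Return the 256-bit destination register after a (V)PINSRx-style insert.

    ymm_dest: current 256-bit value of the destination register.
    xmm_src2: second source (XMM) for the extended form; ignored for the legacy form.
    value:    register or memory source; only its low `width` bits are used.
    width:    element size in bits (8, 16, 32, or 64).
    vex:      False models the legacy form, True models the extended (VEX) form.
    """
    lanes = 128 // width
    index = imm8 & (lanes - 1)                   # imm8[3:0], [2:0], [1:0], or [0]
    shift = index * width
    elem_mask = ((1 << width) - 1) << shift
    # Legacy form: the destination is also the source of the untouched elements.
    base = (xmm_src2 if vex else ymm_dest) & MASK128
    low = (base & ~elem_mask) | ((value << shift) & elem_mask)
    # Legacy form preserves bits [255:128]; extended form clears them.
    upper = 0 if vex else (ymm_dest & ~MASK128)
    return upper | low

old = (0xDEADBEEF << 128) | 0x00010002000300040005000600070008
legacy = insert_element(old, 0, 0xFFFF1234, 2, 16, vex=False)    # PINSRW-like
extended = insert_element(old, 0x11111111111111112222222222222222,
                          0xFFFF1234, 2, 16, vex=True)           # VPINSRW-like
print(f"{legacy:064x}")
print(f"{extended:064x}")
```

In the legacy call the upper half (DEADBEEFh in this test value) survives, while the extended call builds the result from the second source and leaves bits [255:128] zero.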
Instruction Support

Form       Subset    Feature Flag
PINSRW     SSE1      CPUID Fn0000_0001_EDX[SSE] (bit 25)
VPINSRW    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                         Opcode            Description
PINSRW xmm, reg32/mem16, imm8    66 0F C4 /r ib    Inserts a 16-bit value from the low-order word of reg32 or from mem16 into xmm at the word position selected by imm8.

                                                    Encoding
Mnemonic                               VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPINSRW xmm, reg32/mem16, xmm, imm8    C4     RXB.01            X.1111.0.01    C4 /r ib

Related Instructions
(V)PEXTRB, (V)PEXTRD, (V)PEXTRQ, (V)PEXTRW, (V)PINSRB, (V)PINSRD, (V)PINSRQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A     -     AVX instructions are only recognized in protected mode.
                              S     S     S     CR0.EM = 1.
                              S     S     S     CR4.OSFXSR = 0.
                              -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                              -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                              -     -     A     VEX.vvvv != 1111b.
                              -     -     A     VEX.L = 1.
                              -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X     CR0.TS = 1.
Stack, #SS                    S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X     Memory address exceeding data segment limit or non-canonical.
                              -     -     X     Null data segment used to reference memory.
Page fault, #PF               -     S     X     Instruction execution caused a page fault.
Alignment check, #AC          -     S     X     Unaligned memory reference when alignment checking enabled.

X — AVX and SSE exception
A — AVX exception
S — SSE exception

PMADDUBSW
VPMADDUBSW
Packed Multiply and Add Unsigned Byte to Signed Word

Multiplies and adds sets of two packed 8-bit unsigned values from the first source operand and two packed 8-bit signed values from the second source operand, with signed saturation; writes eight 16-bit sums to the destination.

For the 128-bit form of the instruction, the following operations are performed:

dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand.
src2 is the second source operand.
Ssum() is a function that returns the saturated 16-bit signed sum of its arguments.

dest[15:0] = Ssum(src1[7:0] * src2[7:0], src1[15:8] * src2[15:8])
dest[31:16] = Ssum(src1[23:16] * src2[23:16], src1[31:24] * src2[31:24])
dest[47:32] = Ssum(src1[39:32] * src2[39:32], src1[47:40] * src2[47:40])
dest[63:48] = Ssum(src1[55:48] * src2[55:48], src1[63:56] * src2[63:56])
dest[79:64] = Ssum(src1[71:64] * src2[71:64], src1[79:72] * src2[79:72])
dest[95:80] = Ssum(src1[87:80] * src2[87:80], src1[95:88] * src2[95:88])
dest[111:96] = Ssum(src1[103:96] * src2[103:96], src1[111:104] * src2[111:104])
dest[127:112] = Ssum(src1[119:112] * src2[119:112], src1[127:120] * src2[127:120])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[143:128] = Ssum(src1[135:128] * src2[135:128], src1[143:136] * src2[143:136])
dest[159:144] = Ssum(src1[151:144] * src2[151:144], src1[159:152] * src2[159:152])
dest[175:160] = Ssum(src1[167:160] * src2[167:160], src1[175:168] * src2[175:168])
dest[191:176] = Ssum(src1[183:176] * src2[183:176], src1[191:184] * src2[191:184])
dest[207:192] = Ssum(src1[199:192] * src2[199:192], src1[207:200] * src2[207:200])
dest[223:208] = Ssum(src1[215:208] * src2[215:208], src1[223:216] * src2[223:216])
dest[239:224] = Ssum(src1[231:224] * src2[231:224], src1[239:232] * src2[239:232])
dest[255:240] = Ssum(src1[247:240] * src2[247:240], src1[255:248] * src2[255:248])

There are legacy and
extended forms of the instruction:

PMADDUBSW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMADDUBSW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                  Subset    Feature Flag
PMADDUBSW             SSSE3     CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPMADDUBSW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMADDUBSW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                       Opcode            Description
PMADDUBSW xmm1, xmm2/mem128    66 0F 38 04 /r    Multiplies packed 8-bit unsigned values in xmm1 and packed 8-bit signed values in xmm2 or mem128, adds the products, and writes saturated sums to xmm1.

                                                 Encoding
Mnemonic                              VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMADDUBSW xmm1, xmm2, xmm3/mem128    C4     RXB.02            X.src1.0.01    04 /r
VPMADDUBSW ymm1, ymm2, ymm3/mem256    C4     RXB.02            X.src1.1.01    04 /r

Related Instructions
(V)PMADDWD

rFLAGS Affected
None

MXCSR Flags Affected
None
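The operation list for (V)PMADDUBSW can be checked against a small software model. The sketch below is a minimal illustration under stated assumptions: operands are passed as Python `bytes` objects with element 0 holding bits [7:0], and the names `ssum`, `to_signed8`, and `pmaddubsw` are hypothetical helpers, with `ssum` standing in for the Ssum() function used in the operation list.

```python
def ssum(a, b):
    """Saturated 16-bit signed sum, modeling Ssum() from the operation list."""
    return max(-32768, min(32767, a + b))

def to_signed8(x):
    """Reinterpret an 8-bit pattern as a signed value."""
    return x - 256 if x & 0x80 else x

def pmaddubsw(src1, src2):
    """Model (V)PMADDUBSW on 16-byte (128-bit) or 32-byte (256-bit) operands.

    src1 bytes are treated as unsigned, src2 bytes as signed.
    Returns the destination words as a list of signed 16-bit integers.
    """
    assert len(src1) == len(src2) and len(src1) in (16, 32)
    words = []
    for i in range(0, len(src1), 2):
        low = src1[i] * to_signed8(src2[i])
        high = src1[i + 1] * to_signed8(src2[i + 1])
        words.append(ssum(low, high))
    return words

src1 = bytes([255, 255, 1, 2, 255, 255] + [0] * 10)
src2 = bytes([127, 127, 0xFF, 3, 0x80, 0x80] + [0] * 10)
print(pmaddubsw(src1, src2)[:3])
```

The first word saturates high (255 * 127 * 2 exceeds 32767), the second is an ordinary sum (1 * -1 + 2 * 3), and the third saturates low (255 * -128 * 2 is below -32768).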
Exceptions

                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A     -     AVX instructions are only recognized in protected mode.
                              S     S     S     CR0.EM = 1.
                              S     S     S     CR4.OSFXSR = 0.
                              -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                              -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                              -     -     A     VEX.L = 1 when AVX2 not supported.
                              -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X     CR0.TS = 1.
Stack, #SS                    S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X     Memory address exceeding data segment limit or non-canonical.
                              -     -     X     Null data segment used to reference memory.
Alignment check, #AC          S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                              -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF               -     S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PMADDWD
VPMADDWD
Packed Multiply and Add Word to Doubleword

Multiplies and adds sets of four packed 16-bit signed values from two source registers; writes four 32-bit sums to the destination.

For the 128-bit form of the instruction, the following operations are performed:

dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand.
src2 is the second source operand.

dest[31:0] = (src1[15:0] * src2[15:0]) + (src1[31:16] * src2[31:16])
dest[63:32] = (src1[47:32] * src2[47:32]) + (src1[63:48] * src2[63:48])
dest[95:64] = (src1[79:64] * src2[79:64]) + (src1[95:80] * src2[95:80])
dest[127:96] = (src1[111:96] * src2[111:96]) + (src1[127:112] * src2[127:112])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[159:128] = (src1[143:128] * src2[143:128]) + (src1[159:144] * src2[159:144])
dest[191:160] = (src1[175:160] * src2[175:160]) + (src1[191:176] * src2[191:176])
dest[223:192] = (src1[207:192] * src2[207:192]) + (src1[223:208] * src2[223:208])
dest[255:224] = (src1[239:224] * src2[239:224]) + (src1[255:240] * src2[255:240])

When all four of the signed 16-bit source operands in a set have the value 8000h, the 32-bit overflow wraps around to 8000_0000h. There are no other overflow cases.

There are legacy and extended forms of the instruction:

PMADDWD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMADDWD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
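The single overflow case mentioned above follows directly from the arithmetic: (-32768 * -32768) + (-32768 * -32768) = 2^31, which does not fit in a signed 32-bit value and wraps to 8000_0000h. A minimal sketch of the operation, assuming operands are given as lists of 16-bit patterns with element 0 holding bits [15:0] (the names `to_signed16` and `pmaddwd` are illustrative, not part of the manual):

```python
def to_signed16(x):
    """Reinterpret a 16-bit pattern as a signed value."""
    return x - 0x10000 if x & 0x8000 else x

def pmaddwd(src1, src2):
    """Model (V)PMADDWD on lists of 8 (128-bit) or 16 (256-bit) word patterns.

    Returns the destination doublewords as unsigned 32-bit patterns.
    """
    assert len(src1) == len(src2) and len(src1) in (8, 16)
    dwords = []
    for i in range(0, len(src1), 2):
        total = (to_signed16(src1[i]) * to_signed16(src2[i])
                 + to_signed16(src1[i + 1]) * to_signed16(src2[i + 1]))
        dwords.append(total & 0xFFFFFFFF)   # keep the low 32 bits (wraps on overflow)
    return dwords

a = [0x8000, 0x8000, 1, 2, 0, 0, 0, 0]
b = [0x8000, 0x8000, 3, 0xFFFF, 0, 0, 0, 0]
print([f"{d:08x}" for d in pmaddwd(a, b)])
```

The first doubleword shows the wraparound case; the second is an ordinary signed sum, 1 * 3 + 2 * (-1) = 1.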
Instruction Support

Form                Subset    Feature Flag
PMADDWD             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMADDWD 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMADDWD 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode         Description
PMADDWD xmm1, xmm2/mem128    66 0F F5 /r    Multiplies packed 16-bit signed values in xmm1 and xmm2 or mem128, adds the products, and writes the sums to xmm1.

                                               Encoding
Mnemonic                            VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMADDWD xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    F5 /r
VPMADDWD ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    F5 /r

Related Instructions
(V)PMADDUBSW, (V)PMULHUW, (V)PMULHW, (V)PMULLW, (V)PMULUDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A     -     AVX instructions are only recognized in protected mode.
                              S     S     S     CR0.EM = 1.
                              S     S     S     CR4.OSFXSR = 0.
                              -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                              -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                              -     -     A     VEX.L = 1 when AVX2 not supported.
                              -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X     CR0.TS = 1.
Stack, #SS                    S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X     Memory address exceeding data segment limit or non-canonical.
                              -     -     X     Null data segment used to reference memory.
Alignment check, #AC          S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                              -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF               -     S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PMAXSB
VPMAXSB
Packed Maximum Signed Bytes

Compares each packed 8-bit signed integer value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding byte of the destination.
The 128-bit form of the instruction compares 16 pairs of 8-bit signed integer values; the 256-bit form compares 32 pairs.

There are legacy and extended forms of the instruction:

PMAXSB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMAXSB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset    Feature Flag
PMAXSB             SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXSB 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXSB 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
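The feature flags named in the Instruction Support tables of this section reduce to single-bit tests on CPUID output registers. The sketch below only decodes register values that the caller has already obtained; it does not execute CPUID itself. The names `FEATURE_BITS` and `decode_features` and the parameter names are hypothetical, and the bit positions are the ones quoted in the tables here.

```python
# (register, bit) for each feature flag quoted in the Instruction Support tables.
FEATURE_BITS = {
    "SSE":   ("fn1_edx", 25),      # CPUID Fn0000_0001_EDX[SSE]
    "SSE2":  ("fn1_edx", 26),      # CPUID Fn0000_0001_EDX[SSE2]
    "SSSE3": ("fn1_ecx", 9),       # CPUID Fn0000_0001_ECX[SSSE3]
    "SSE41": ("fn1_ecx", 19),      # CPUID Fn0000_0001_ECX[SSE41]
    "AVX":   ("fn1_ecx", 28),      # CPUID Fn0000_0001_ECX[AVX]
    "AVX2":  ("fn7_ebx_x0", 5),    # CPUID Fn0000_0007_EBX[AVX2]_x0
}

def decode_features(fn1_ecx, fn1_edx, fn7_ebx_x0):
    """Map raw CPUID register values to a dict of feature-name -> bool."""
    regs = {"fn1_ecx": fn1_ecx, "fn1_edx": fn1_edx, "fn7_ebx_x0": fn7_ebx_x0}
    return {name: bool((regs[reg] >> bit) & 1)
            for name, (reg, bit) in FEATURE_BITS.items()}

# Example register values chosen for illustration only.
features = decode_features(fn1_ecx=(1 << 28) | (1 << 19) | (1 << 9),
                           fn1_edx=(1 << 26) | (1 << 25),
                           fn7_ebx_x0=0)
print(features)
```

A set feature bit is not sufficient on its own for the extended forms: as the exception tables indicate, AVX instructions also raise #UD when CR4.OSXSAVE = 0 or XFEATURE_ENABLED_MASK[2:1] != 11b, so operating-system support has to be confirmed separately.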
Instruction Encoding

Mnemonic                    Opcode            Description
PMAXSB xmm1, xmm2/mem128    66 0F 38 3C /r    Compares 16 pairs of packed 8-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

                                              Encoding
Mnemonic                           VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMAXSB xmm1, xmm2, xmm3/mem128    C4     RXB.02            X.src1.0.01    3C /r
VPMAXSB ymm1, ymm2, ymm3/mem256    C4     RXB.02            X.src1.1.01    3C /r

Related Instructions
(V)PMAXSD, (V)PMAXSW, (V)PMAXUB, (V)PMAXUD, (V)PMAXUW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A     -     AVX instructions are only recognized in protected mode.
                              S     S     S     CR0.EM = 1.
                              S     S     S     CR4.OSFXSR = 0.
                              -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                              -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                              -     -     A     VEX.L = 1 when AVX2 not supported.
                              -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X     CR0.TS = 1.
Stack, #SS                    S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X     Memory address exceeding data segment limit or non-canonical.
                              -     -     X     Null data segment used to reference memory.
Alignment check, #AC          S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                              -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF               -     S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PMAXSD
VPMAXSD
Packed Maximum Signed Doublewords

Compares each packed 32-bit signed integer value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding doubleword of the destination.
The 128-bit form of the instruction compares four pairs of 32-bit signed integer values; the 256-bit form compares eight.

There are legacy and extended forms of the instruction:

PMAXSD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMAXSD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset    Feature Flag
PMAXSD             SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXSD 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXSD 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode            Description
PMAXSD xmm1, xmm2/mem128    66 0F 38 3D /r    Compares four pairs of packed 32-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.
                                              Encoding
Mnemonic                           VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMAXSD xmm1, xmm2, xmm3/mem128    C4     RXB.02            X.src1.0.01    3D /r
VPMAXSD ymm1, ymm2, ymm3/mem256    C4     RXB.02            X.src1.1.01    3D /r

Related Instructions
(V)PMAXSB, (V)PMAXSW, (V)PMAXUB, (V)PMAXUD, (V)PMAXUW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A     -     AVX instructions are only recognized in protected mode.
                              S     S     S     CR0.EM = 1.
                              S     S     S     CR4.OSFXSR = 0.
                              -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                              -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                              -     -     A     VEX.L = 1 when AVX2 not supported.
                              -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X     CR0.TS = 1.
Stack, #SS                    S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X     Memory address exceeding data segment limit or non-canonical.
                              -     -     X     Null data segment used to reference memory.
Alignment check, #AC          S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                              -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF               -     S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PMAXSW
VPMAXSW
Packed Maximum Signed Words

Compares each packed 16-bit signed integer value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding word of the destination.
The 128-bit form of the instruction compares eight pairs of 16-bit signed integer values; the 256-bit form compares 16 pairs.

There are legacy and extended forms of the instruction:

PMAXSW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMAXSW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset    Feature Flag
PMAXSW             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMAXSW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXSW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode         Description
PMAXSW xmm1, xmm2/mem128    66 0F EE /r    Compares eight pairs of packed 16-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

                                              Encoding
Mnemonic                           VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMAXSW xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    EE /r
VPMAXSW ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    EE /r

Related Instructions
(V)PMAXSB, (V)PMAXSD, (V)PMAXUB, (V)PMAXUD, (V)PMAXUW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A     -     AVX instructions are only recognized in protected mode.
                              S     S     S     CR0.EM = 1.
                              S     S     S     CR4.OSFXSR = 0.
                              -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                              -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                              -     -     A     VEX.L = 1 when AVX2 not supported.
                              -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X     CR0.TS = 1.
Stack, #SS                    S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X     Memory address exceeding data segment limit or non-canonical.
                              -     -     X     Null data segment used to reference memory.
Alignment check, #AC          S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                              -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF               -     S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PMAXUB
VPMAXUB
Packed Maximum Unsigned Bytes

Compares each packed 8-bit unsigned integer value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding byte of the destination.
The 128-bit form of the instruction compares 16 pairs of 8-bit unsigned integer values; the 256-bit form compares 32 pairs.

There are legacy and extended forms of the instruction:

PMAXUB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMAXUB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset    Feature Flag
PMAXUB             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMAXUB 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXUB 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode         Description
PMAXUB xmm1, xmm2/mem128    66 0F DE /r    Compares 16 pairs of packed unsigned 8-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

                                              Encoding
Mnemonic                           VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMAXUB xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    DE /r
VPMAXUB ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    DE /r

Related Instructions
(V)PMAXSB, (V)PMAXSD, (V)PMAXSW, (V)PMAXUD, (V)PMAXUW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                              Mode
Exception                     Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD           X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                              A     A     -     AVX instructions are only recognized in protected mode.
                              S     S     S     CR0.EM = 1.
                              S     S     S     CR4.OSFXSR = 0.
                              -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                              -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                              -     -     A     VEX.L = 1 when AVX2 not supported.
                              -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                              S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM     S     S     X     CR0.TS = 1.
Stack, #SS                    S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP       S     S     X     Memory address exceeding data segment limit or non-canonical.
                              -     -     X     Null data segment used to reference memory.
Alignment check, #AC          S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                              -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF               -     S     X     Instruction execution caused a page fault.

X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception

PMAXUD
VPMAXUD
Packed Maximum Unsigned Doublewords

Compares each packed 32-bit unsigned integer value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding doubleword of the destination.
The 128-bit form of the instruction compares four pairs of 32-bit unsigned integer values; the 256-bit form compares eight.

There are legacy and extended forms of the instruction:

PMAXUD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMAXUD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location.
The destination is a third YMM register.

Instruction Support

Form              Subset  Feature Flag
PMAXUD            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXUD 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXUD 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode          Description
PMAXUD xmm1, xmm2/mem128  66 0F 38 3F /r  Compares four pairs of packed unsigned 32-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMAXUD xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  3F /r
VPMAXUD ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  3F /r

Related Instructions

(V)PMAXSB, (V)PMAXSD, (V)PMAXSW, (V)PMAXUB, (V)PMAXUW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMAXUW, VPMAXUW
Packed Maximum Unsigned Words

Compares each packed 16-bit unsigned integer value of the first source operand to the corresponding value of the second source operand and writes the numerically greater value into the corresponding word of the destination.

The 128-bit form of the instruction compares eight pairs of 16-bit unsigned integer values; the 256-bit form compares 16 pairs.

There are legacy and extended forms of the instruction:

PMAXUW

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMAXUW

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset  Feature Flag
PMAXUW            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMAXUW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMAXUW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
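The feature flags listed for PMAXUW and VPMAXUW translate directly into bit tests on the register values returned by CPUID. The following is a minimal sketch in C, assuming the caller has already executed CPUID and stored the results; the cpuid_regs structure, its field names, and the helper functions are hypothetical and are not defined by this manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for CPUID results gathered elsewhere. */
struct cpuid_regs {
    uint32_t fn1_ecx;  /* ECX returned by CPUID Fn0000_0001 */
    uint32_t fn7_ebx;  /* EBX returned by CPUID Fn0000_0007, subfunction 0 */
};

/* Returns true when the given bit position is set in value. */
static bool bit_set(uint32_t value, unsigned bit)
{
    return ((value >> bit) & 1u) != 0;
}

/* Legacy PMAXUW: SSE4.1, CPUID Fn0000_0001_ECX[SSE41] (bit 19). */
bool has_pmaxuw(const struct cpuid_regs *r)
{
    return bit_set(r->fn1_ecx, 19);
}

/* VPMAXUW 128-bit: AVX, CPUID Fn0000_0001_ECX[AVX] (bit 28). */
bool has_vpmaxuw_128(const struct cpuid_regs *r)
{
    return bit_set(r->fn1_ecx, 28);
}

/* VPMAXUW 256-bit: AVX2, CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5). */
bool has_vpmaxuw_256(const struct cpuid_regs *r)
{
    return bit_set(r->fn7_ebx, 5);
}
```

A set feature bit does not by itself guarantee that the VEX-encoded forms execute. The exception tables in this section also list CR4.OSXSAVE = 0 and XFEATURE_ENABLED_MASK[2:1] != 11b as causes of #UD for the AVX forms, so operating system support has to be confirmed separately.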
Instruction Encoding

Mnemonic                  Opcode          Description
PMAXUW xmm1, xmm2/mem128  66 0F 38 3E /r  Compares eight pairs of packed unsigned 16-bit values in xmm1 and xmm2 or mem128 and writes the greater values to the corresponding positions in xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMAXUW xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  3E /r
VPMAXUW ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  3E /r

Related Instructions

(V)PMAXSB, (V)PMAXSD, (V)PMAXSW, (V)PMAXUB, (V)PMAXUD

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
PMINSB, VPMINSB
Packed Minimum Signed Bytes

Compares each packed 8-bit signed integer value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding byte of the destination.

The 128-bit form of the instruction compares 16 pairs of 8-bit signed integer values; the 256-bit form compares 32 pairs.

There are legacy and extended forms of the instruction:

PMINSB

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMINSB

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset  Feature Flag
PMINSB            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINSB 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINSB 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
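The element-wise behavior described for PMINSB and VPMINSB can be modeled with a scalar loop. The sketch below is a reference model in portable C, not an implementation of the instruction; the function name pminsb_model and its lanes parameter are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model of the lane-wise signed byte minimum.
 * lanes is 16 for the 128-bit form and 32 for the 256-bit form.
 * dst may alias src1, which mirrors the legacy form where the first
 * source operand is also the destination.
 */
void pminsb_model(int8_t *dst, const int8_t *src1, const int8_t *src2,
                  size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        /* Each lane is compared independently as a signed 8-bit value. */
        dst[i] = (src1[i] < src2[i]) ? src1[i] : src2[i];
    }
}
```

Passing the same array as dst and src1 models the two-operand legacy form; passing a separate dst models the three-operand VEX form.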
Instruction Encoding

Mnemonic                  Opcode          Description
PMINSB xmm1, xmm2/mem128  66 0F 38 38 /r  Compares 16 pairs of packed signed 8-bit values in xmm1 and xmm2 or mem128 and writes the lesser values to the corresponding positions in xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINSB xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  38 /r
VPMINSB ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  38 /r

Related Instructions

(V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUD, (V)PMINUW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
PMINSD, VPMINSD
Packed Minimum Signed Doublewords

Compares each packed 32-bit signed integer value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding doubleword of the destination.

The 128-bit form of the instruction compares four pairs of 32-bit signed integer values; the 256-bit form compares eight.

There are legacy and extended forms of the instruction:

PMINSD

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMINSD

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset  Feature Flag
PMINSD            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINSD 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINSD 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode          Description
PMINSD xmm1, xmm2/mem128  66 0F 38 39 /r  Compares four pairs of packed signed 32-bit values in xmm1 and xmm2 or mem128 and writes the lesser values to the corresponding positions in xmm1.
Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINSD xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  39 /r
VPMINSD ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  39 /r

Related Instructions

(V)PMINSB, (V)PMINSW, (V)PMINUB, (V)PMINUD, (V)PMINUW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMINSW, VPMINSW
Packed Minimum Signed Words

Compares each packed 16-bit signed integer value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding word of the destination.

The 128-bit form of the instruction compares eight pairs of 16-bit signed integer values; the 256-bit form compares 16 pairs.
There are legacy and extended forms of the instruction:

PMINSW

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMINSW

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset  Feature Flag
PMINSW            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMINSW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINSW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode       Description
PMINSW xmm1, xmm2/mem128  66 0F EA /r  Compares eight pairs of packed signed 16-bit values in xmm1 and xmm2 or mem128 and writes the lesser values to the corresponding positions in xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINSW xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  EA /r
VPMINSW ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  EA /r
Related Instructions

(V)PMINSB, (V)PMINSD, (V)PMINUB, (V)PMINUD, (V)PMINUW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMINUB, VPMINUB
Packed Minimum Unsigned Bytes

Compares each packed 8-bit unsigned integer value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding byte of the destination.

The 128-bit form of the instruction compares 16 pairs of 8-bit unsigned integer values; the 256-bit form compares 32 pairs.

There are legacy and extended forms of the instruction:

PMINUB

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMINUB

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset  Feature Flag
PMINUB            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMINUB 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINUB 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode       Description
PMINUB xmm1, xmm2/mem128  66 0F DA /r  Compares 16 pairs of packed unsigned 8-bit values in xmm1 and xmm2 or mem128 and writes the lesser values to the corresponding positions in xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINUB xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  DA /r
VPMINUB ymm1, ymm2, ymm3/mem256  C4   RXB.01          X.src1.1.01  DA /r

Related Instructions

(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUD, (V)PMINUW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMINUD, VPMINUD
Packed Minimum Unsigned Doublewords

Compares each packed 32-bit unsigned integer value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding doubleword of the destination.

The 128-bit form of the instruction compares four pairs of 32-bit unsigned integer values; the 256-bit form compares eight.

There are legacy and extended forms of the instruction:

PMINUD

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMINUD

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register.
The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset  Feature Flag
PMINUD            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINUD 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINUD 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode          Description
PMINUD xmm1, xmm2/mem128  66 0F 38 3B /r  Compares four pairs of packed unsigned 32-bit values in xmm1 and xmm2 or mem128 and writes the lesser values to the corresponding positions in xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINUD xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  3B /r
VPMINUD ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  3B /r

Related Instructions

(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMINUW, VPMINUW
Packed Minimum Unsigned Words

Compares each packed 16-bit unsigned integer value of the first source operand to the corresponding value of the second source operand and writes the numerically lesser value into the corresponding word of the destination.

The 128-bit form of the instruction compares eight pairs of 16-bit unsigned integer values; the 256-bit form compares 16 pairs.

There are legacy and extended forms of the instruction:

PMINUW

The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMINUW

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset  Feature Flag
PMINUW            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMINUW 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMINUW 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
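PMINUW and PMINSW operate on the same 16-bit lanes and differ only in whether each lane is interpreted as unsigned or signed. The following scalar sketch in portable C illustrates that difference on identical bit patterns; the function names and the as_signed16 helper are hypothetical, and the code is a reference model rather than the instruction itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Interprets a 16-bit pattern as a two's-complement signed value. */
static int32_t as_signed16(uint16_t v)
{
    return (int32_t)(v ^ 0x8000u) - 0x8000;
}

/* Lane-wise unsigned minimum, as described for PMINUW. */
void pminuw_model(uint16_t *dst, const uint16_t *src1, const uint16_t *src2,
                  size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        dst[i] = (src1[i] < src2[i]) ? src1[i] : src2[i];
    }
}

/* Lane-wise signed minimum on the same bit patterns, as described for PMINSW. */
void pminsw_model(uint16_t *dst, const uint16_t *src1, const uint16_t *src2,
                  size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        dst[i] = (as_signed16(src1[i]) < as_signed16(src2[i])) ? src1[i]
                                                               : src2[i];
    }
}
```

For the lane pair 8000h and 0001h, the unsigned model selects 0001h, while the signed model selects 8000h because that pattern represents -32768.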
Instruction Encoding

Mnemonic                  Opcode          Description
PMINUW xmm1, xmm2/mem128  66 0F 38 3A /r  Compares eight pairs of packed unsigned 16-bit values in xmm1 and xmm2 or mem128 and writes the lesser values to the corresponding positions in xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMINUW xmm1, xmm2, xmm3/mem128  C4   RXB.02          X.src1.0.01  3A /r
VPMINUW ymm1, ymm2, ymm3/mem256  C4   RXB.02          X.src1.1.01  3A /r

Related Instructions

(V)PMINSB, (V)PMINSD, (V)PMINSW, (V)PMINUB, (V)PMINUD

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Alignment check, #AC       S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                           -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
PMOVMSKB, VPMOVMSKB
Packed Move Mask Byte

Copies the value of the most-significant bit of each byte element of the source operand to create a 16-bit or 32-bit mask value, zero-extends the value, and writes it to the destination.

There are legacy and extended forms of the instruction:

PMOVMSKB

The source operand is an XMM register. The destination is a 32-bit general-purpose register. The mask is zero-extended to fill the destination register; the mask occupies bits [15:0].

VPMOVMSKB

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The source operand is an XMM register. The destination is a 64-bit general-purpose register. The mask is zero-extended to fill the destination register; the mask occupies bits [15:0].

YMM Encoding

The source operand is a YMM register. The destination is a 64-bit general-purpose register. The mask is zero-extended to fill the destination register; the mask occupies bits [31:0].

Instruction Support

Form                Subset  Feature Flag
PMOVMSKB            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMOVMSKB 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVMSKB 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic              Opcode       Description
PMOVMSKB reg32, xmm1  66 0F D7 /r  Moves a zero-extended mask consisting of the most-significant bit of each byte in xmm1 to a 32-bit general-purpose register.

Mnemonic               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVMSKB reg64, xmm1  C4   RXB.01          X.1111.0.01  D7 /r
VPMOVMSKB reg64, ymm1  C4   RXB.01          X.1111.1.01  D7 /r

Related Instructions

(V)MOVMSKPD, (V)MOVMSKPS
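The mask construction described for PMOVMSKB and VPMOVMSKB can be expressed as a short scalar loop. The sketch below is a reference model in portable C under the assumption that the source register is presented as a byte array with element 0 holding bits [7:0]; the function name pmovmskb_model is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model of the byte mask. bytes is 16 for an XMM source and 32 for
 * a YMM source. Bit i of the result is the most-significant bit of byte i;
 * all higher bits of the 64-bit result are zero, which models the
 * zero-extension into the destination register.
 */
uint64_t pmovmskb_model(const uint8_t *src, size_t bytes)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < bytes && i < 32; i++) {
        mask |= (uint64_t)(src[i] >> 7) << i;
    }
    return mask;
}
```

A common use of the resulting mask is to test which byte lanes of a preceding packed comparison matched, since each lane contributes exactly one bit.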
rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv field != 1111b.
                           -     -     A     VEX.L field = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           X     X     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVSXBD, VPMOVSXBD
Packed Move with Sign-Extension Byte to Doubleword

Sign-extends four or eight packed 8-bit signed integers in the source operand to 32 bits and writes the packed doubleword signed integers to the destination. If the source operand is a register, the 8-bit signed integers are taken from the least-significant bytes of the register.

There are legacy and extended forms of the instruction:

PMOVSXBD

The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVSXBD

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The source operand is either an XMM register or a 64-bit memory location. The destination is a YMM register.
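The sign extension performed by PMOVSXBD and VPMOVSXBD can be modeled lane by lane. The following is a scalar sketch in portable C, not the instruction itself; the function name pmovsxbd_model and the lanes parameter are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model of byte-to-doubleword sign extension.
 * lanes is 4 for the 128-bit form (32 bits of source data) and 8 for the
 * 256-bit form (64 bits of source data). src[0] is the least-significant
 * source byte.
 */
void pmovsxbd_model(int32_t *dst, const int8_t *src, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        /* Converting int8_t to int32_t replicates the sign bit upward. */
        dst[i] = (int32_t)src[i];
    }
}
```

The source byte 80h (-128) becomes the doubleword FFFFFF80h, while 7Fh (127) becomes 0000007Fh.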
Instruction Support

Form                Subset  Feature Flag
PMOVSXBD            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXBD 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXBD 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                   Opcode          Description
PMOVSXBD xmm1, xmm2/mem32  66 0F 38 21 /r  Sign-extends four packed signed 8-bit integers in the four low bytes of xmm2 or mem32 and writes four packed signed 32-bit integers to xmm1.

Mnemonic                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXBD xmm1, xmm2/mem32  C4   RXB.02          X.1111.0.01  21 /r
VPMOVSXBD ymm1, xmm2/mem64  C4   RXB.02          X.1111.1.01  21 /r

Related Instructions

(V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWD, (V)PMOVSXWQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A     -     AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                           -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                           -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                           -     -     A     VEX.vvvv != 1111b.
                           -     -     A     VEX.L = 1 when AVX2 not supported.
                           -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                           -     -     X     Null data segment used to reference memory.
Page fault, #PF            -     S     X     Instruction execution caused a page fault.
Alignment check, #AC       -     S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
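The VEX columns in the VPMOVSXBD encoding table (RXB.map_select and W.vvvv.L.pp) correspond to the second and third bytes of a three-byte C4h prefix. The sketch below packs those fields in C, assuming the standard three-byte VEX layout, which is defined elsewhere in the manual set rather than in this section. The function name vex3_pack is hypothetical, and all field values are passed in their encoded form, exactly as they would appear in the instruction bytes.

```c
#include <stdint.h>

/*
 * Packs a three-byte VEX prefix from already-encoded field values.
 *   rxb        : 3-bit R, X, B field as stored in the prefix
 *   map_select : 5-bit opcode map selector (02h for VPMOVSXBD)
 *   w          : 1-bit W field (shown as X, don't care, in the table)
 *   vvvv       : 4-bit field as stored (1111b when no source register)
 *   l          : 0 for the 128-bit form, 1 for the 256-bit form
 *   pp         : 2-bit implied-prefix field (01b for VPMOVSXBD)
 */
void vex3_pack(uint8_t out[3], unsigned rxb, unsigned map_select,
               unsigned w, unsigned vvvv, unsigned l, unsigned pp)
{
    out[0] = 0xC4;
    out[1] = (uint8_t)(((rxb & 0x7u) << 5) | (map_select & 0x1Fu));
    out[2] = (uint8_t)(((w & 0x1u) << 7) | ((vvvv & 0xFu) << 3) |
                       ((l & 0x1u) << 2) | (pp & 0x3u));
}
```

With rxb = 111b, map_select = 02h, W = 0, vvvv = 1111b, and pp = 01b, the 128-bit form yields the prefix bytes C4 E2 79 and the 256-bit form yields C4 E2 7D; the opcode byte 21h and the ModRM byte follow.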
PMOVSXBQ, VPMOVSXBQ
Packed Move with Sign Extension Byte to Quadword

Sign-extends two or four packed 8-bit signed integers in the source operand to 64 bits and writes the packed quadword signed integers to the destination. If the source operand is a register, the 8-bit signed integers are taken from the least-significant bytes of the register.

There are legacy and extended forms of the instruction:

PMOVSXBQ

The source operand is either an XMM register or a 16-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVSXBQ

The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding

The source operand is either an XMM register or a 16-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

The source operand is either an XMM register or a 32-bit memory location. The destination is a YMM register.

Instruction Support

Form                Subset  Feature Flag
PMOVSXBQ            SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXBQ 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXBQ 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                   Opcode          Description
PMOVSXBQ xmm1, xmm2/mem16  66 0F 38 22 /r  Sign-extends two packed signed 8-bit integers in the two low bytes of xmm2 or mem16 and writes two packed signed 64-bit integers to xmm1.

Mnemonic                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXBQ xmm1, xmm2/mem16  C4   RXB.02          X.1111.0.01  22 /r
VPMOVSXBQ ymm1, xmm2/mem32  C4   RXB.02          X.1111.1.01  22 /r
3.22—May 2018 Related Instructions (V)PMOVSXBD, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWD, (V)PMOVSXW rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF S Alignment check, #AC S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception 396 X S S A A A A A X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. PMOVSXBQ, VPMOVSXBQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology PMOVSXBW VPMOVSXBW Packed Move with Sign Extension Byte to Word Sign-extends eight or sixteen packed 8-bit signed integers in the source operand to 16 bits and writes the packed word signed integers to the destination. If the source operand is a register, the eight 8-bit signed integers are taken from the lower half of the register. There are legacy and extended forms of the instruction: PMOVSXBW The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPMOVSXBW The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The source operand is either an XMM register or a 64-bit memory location. 
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 128-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVSXBW             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXBW 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXBW 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode
PMOVSXBW xmm1, xmm2/mem64    66 0F 38 20 /r
Description: Sign-extends eight packed signed 8-bit integers in the eight low bytes of xmm2 or mem64 and writes eight packed signed 16-bit integers to xmm1.

                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXBW xmm1, xmm2/mem64   C4   RXB.02          X.1111.0.01  20 /r
VPMOVSXBW ymm1, xmm2/mem128  C4   RXB.02          X.1111.1.01  20 /r

Related Instructions
(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXDQ, (V)PMOVSXWD, (V)PMOVSXWQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVSXDQ
VPMOVSXDQ
Packed Move with Sign-Extension Doubleword to Quadword

Sign-extends two or four packed 32-bit signed integers in the source operand to 64 bits and writes the packed quadword signed integers to the destination.
If the source operand is a register, the 128-bit forms take the two 32-bit signed integers from the lower half of the register; the 256-bit form takes all four from the full XMM register.

There are legacy and extended forms of the instruction:

PMOVSXDQ
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVSXDQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 128-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVSXDQ             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXDQ 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXDQ 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode
PMOVSXDQ xmm1, xmm2/mem64    66 0F 38 25 /r
Description: Sign-extends two packed signed 32-bit integers in the two low doublewords of xmm2 or mem64 and writes two packed signed 64-bit integers to xmm1.
                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXDQ xmm1, xmm2/mem64   C4   RXB.02          X.1111.0.01  25 /r
VPMOVSXDQ ymm1, xmm2/mem128  C4   RXB.02          X.1111.1.01  25 /r

Related Instructions
(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXWD, (V)PMOVSXWQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVSXWD
VPMOVSXWD
Packed Move with Sign-Extension Word to Doubleword

Sign-extends four or eight packed 16-bit signed integers in the source operand to 32 bits and writes the packed doubleword signed integers to the destination.
If the source operand is a register, the 128-bit forms take the four 16-bit signed integers from the lower half of the register; the 256-bit form takes all eight from the full XMM register.

There are legacy and extended forms of the instruction:

PMOVSXWD
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register.
Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVSXWD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 128-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVSXWD             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXWD 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXWD 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode
PMOVSXWD xmm1, xmm2/mem64    66 0F 38 23 /r
Description: Sign-extends four packed signed 16-bit integers in the four low words of xmm2 or mem64 and writes four packed signed 32-bit integers to xmm1.

                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXWD xmm1, xmm2/mem64   C4   RXB.02          X.1111.0.01  23 /r
VPMOVSXWD ymm1, xmm2/mem128  C4   RXB.02          X.1111.1.01  23 /r

Related Instructions
(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVSXWQ
VPMOVSXWQ
Packed Move with Sign-Extension Word to Quadword

Sign-extends two or four packed 16-bit signed integers in the source operand to 64 bits and writes the packed quadword signed integers to the destination.
If the source operand is a register, the 16-bit signed integers are taken from the least-significant words of the register.

There are legacy and extended forms of the instruction:

PMOVSXWQ
The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVSXWQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 64-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVSXWQ             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVSXWQ 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVSXWQ 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
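The feature flags in the Instruction Support table can be decoded from raw CPUID register values as sketched below. This is only an illustration under stated assumptions: the function name supported_forms and the dictionary keys are hypothetical, the caller is assumed to have already obtained ECX from CPUID Fn0000_0001 and EBX from CPUID Fn0000_0007 (subfunction 0), and the operating-system enablement conditions listed under Exceptions (CR4.OSXSAVE, XFEATURE_ENABLED_MASK[2:1]) are not checked here.

```python
SSE41_BIT = 19  # CPUID Fn0000_0001_ECX[SSE41]
AVX_BIT = 28    # CPUID Fn0000_0001_ECX[AVX]
AVX2_BIT = 5    # CPUID Fn0000_0007_EBX[AVX2]_x0


def supported_forms(fn1_ecx, fn7_ebx):
    """Hypothetical helper: map raw CPUID register values to instruction forms."""
    return {
        "legacy (SSE4.1)": bool((fn1_ecx >> SSE41_BIT) & 1),
        "VEX 128-bit (AVX)": bool((fn1_ecx >> AVX_BIT) & 1),
        "VEX 256-bit (AVX2)": bool((fn7_ebx >> AVX2_BIT) & 1),
    }


# Made-up register values: SSE4.1 and AVX reported, AVX2 not reported.
example_ecx = (1 << SSE41_BIT) | (1 << AVX_BIT)
example_ebx = 0
print(supported_forms(example_ecx, example_ebx))
```

With these example values the legacy and 128-bit VEX forms are reported as present and the 256-bit form as absent, which corresponds to the "VEX.L = 1 when AVX2 not supported" #UD condition.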
Instruction Encoding

Mnemonic                     Opcode
PMOVSXWQ xmm1, xmm2/mem32    66 0F 38 24 /r
Description: Sign-extends two packed signed 16-bit integers in the two low words of xmm2 or mem32 and writes two packed signed 64-bit integers to xmm1.

                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVSXWQ xmm1, xmm2/mem32   C4   RXB.02          X.1111.0.01  24 /r
VPMOVSXWQ ymm1, xmm2/mem64   C4   RXB.02          X.1111.1.01  24 /r

Related Instructions
(V)PMOVSXBD, (V)PMOVSXBQ, (V)PMOVSXBW, (V)PMOVSXDQ, (V)PMOVSXWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVZXBD
VPMOVZXBD
Packed Move with Zero-Extension Byte to Doubleword

Zero-extends four or eight packed 8-bit unsigned integers in the source operand to 32 bits and writes the packed doubleword positive-signed integers to the destination.
If the source operand is a register, the 8-bit unsigned integers are taken from the least-significant bytes of the register.

There are legacy and extended forms of the instruction:

PMOVZXBD
The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVZXBD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 64-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVZXBD             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXBD 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXBD 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode
PMOVZXBD xmm1, xmm2/mem32    66 0F 38 31 /r
Description: Zero-extends four packed unsigned 8-bit integers in the four low bytes of xmm2 or mem32 and writes four packed positive-signed 32-bit integers to xmm1.

                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVZXBD xmm1, xmm2/mem32   C4   RXB.02          X.1111.0.01  31 /r
VPMOVZXBD ymm1, xmm2/mem64   C4   RXB.02          X.1111.1.01  31 /r

Related Instructions
(V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWD, (V)PMOVZXWQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVZXBQ
VPMOVZXBQ
Packed Move Byte to Quadword with Zero-Extension

Zero-extends two or four packed 8-bit unsigned integers in the source operand to 64 bits and writes the packed quadword positive-signed integers to the destination.
If the source operand is a register, the 8-bit unsigned integers are taken from the least-significant bytes of the register.

There are legacy and extended forms of the instruction:

PMOVZXBQ
The source operand is either an XMM register or a 16-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVZXBQ
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding
The source operand is either an XMM register or a 16-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 32-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVZXBQ             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXBQ 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXBQ 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode
PMOVZXBQ xmm1, xmm2/mem16    66 0F 38 32 /r
Description: Zero-extends two packed unsigned 8-bit integers in the two low bytes of xmm2 or mem16 and writes two packed positive-signed 64-bit integers to xmm1.

                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVZXBQ xmm1, xmm2/mem16   C4   RXB.02          X.1111.0.01  32 /r
VPMOVZXBQ ymm1, xmm2/mem32   C4   RXB.02          X.1111.1.01  32 /r

Related Instructions
(V)PMOVZXBD, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWD, (V)PMOVZXWQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVZXBW
VPMOVZXBW
Packed Move Byte to Word with Zero-Extension

Zero-extends eight or sixteen packed 8-bit unsigned integers in the source operand to 16 bits and writes the packed word positive-signed integers to the destination.
If the source operand is a register, the 128-bit forms take the eight 8-bit unsigned integers from the lower half of the register; the 256-bit form takes all sixteen from the full XMM register.

There are legacy and extended forms of the instruction:

PMOVZXBW
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVZXBW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 128-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVZXBW             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXBW 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXBW 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding

Mnemonic                     Opcode
PMOVZXBW xmm1, xmm2/mem64    66 0F 38 30 /r
Description: Zero-extends eight packed unsigned 8-bit integers in the eight low bytes of xmm2 or mem64 and writes eight packed positive-signed 16-bit integers to xmm1.

                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVZXBW xmm1, xmm2/mem64   C4   RXB.02          X.1111.0.01  30 /r
VPMOVZXBW ymm1, xmm2/mem128  C4   RXB.02          X.1111.1.01  30 /r

Related Instructions
(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXDQ, (V)PMOVZXWD, (V)PMOVZXWQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVZXDQ
VPMOVZXDQ
Packed Move with Zero-Extension Doubleword to Quadword

Zero-extends two or four packed 32-bit unsigned integers in the source operand to 64 bits and writes the packed quadword positive-signed integers to the destination.
If the source operand is a register, the 128-bit forms take the two 32-bit unsigned integers from the lower half of the register; the 256-bit form takes all four from the full XMM register.

There are legacy and extended forms of the instruction:

PMOVZXDQ
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVZXDQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 128-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVZXDQ             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXDQ 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXDQ 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode
PMOVZXDQ xmm1, xmm2/mem64    66 0F 38 35 /r
Description: Zero-extends two packed unsigned 32-bit integers in the two low doublewords of xmm2 or mem64 and writes two packed positive-signed 64-bit integers to xmm1.

                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVZXDQ xmm1, xmm2/mem64   C4   RXB.02          X.1111.0.01  35 /r
VPMOVZXDQ ymm1, xmm2/mem128  C4   RXB.02          X.1111.1.01  35 /r

Related Instructions
(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXWD, (V)PMOVZXWQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVZXWD
VPMOVZXWD
Packed Move Word to Doubleword with Zero-Extension

Zero-extends four or eight packed 16-bit unsigned integers in the source operand to 32 bits and writes the packed doubleword positive-signed integers to the destination.
If the source operand is a register, the 128-bit forms take the four 16-bit unsigned integers from the lower half of the register; the 256-bit form takes all eight from the full XMM register.

There are legacy and extended forms of the instruction:

PMOVZXWD
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVZXWD
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding
The source operand is either an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 128-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVZXWD             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXWD 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXWD 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode
PMOVZXWD xmm1, xmm2/mem64    66 0F 38 33 /r
Description: Zero-extends four packed unsigned 16-bit integers in the four low words of xmm2 or mem64 and writes four packed positive-signed 32-bit integers to xmm1.

                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVZXWD xmm1, xmm2/mem64   C4   RXB.02          X.1111.0.01  33 /r
VPMOVZXWD ymm1, xmm2/mem128  C4   RXB.02          X.1111.1.01  33 /r

Related Instructions
(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMOVZXWQ
VPMOVZXWQ
Packed Move with Zero-Extension Word to Quadword

Zero-extends two or four packed 16-bit unsigned integers in the source operand to 64 bits and writes the packed quadword positive-signed integers to the destination.
If the source operand is a register, the 16-bit unsigned integers are taken from the least-significant words of the register.

There are legacy and extended forms of the instruction:

PMOVZXWQ
The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMOVZXWQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either an XMM register or a 64-bit memory location. The destination is a YMM register.

Instruction Support

Form                 Subset  Feature Flag
PMOVZXWQ             SSE4.1  CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMOVZXWQ 128-bit    AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMOVZXWQ 256-bit    AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode
PMOVZXWQ xmm1, xmm2/mem32    66 0F 38 34 /r
Description: Zero-extends two packed unsigned 16-bit integers in the two low words of xmm2 or mem32 and writes two packed positive-signed 64-bit integers to xmm1.
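How zero extension differs from sign extension for the same source words can be sketched with a scalar model. The function names pmovzxwq and pmovsxwq are hypothetical helpers, the source is modelled as a Python list of 16-bit values with element 0 as the least-significant word, and destination bits [255:128] are not modelled.

```python
def pmovzxwq(src_words, ymm=False):
    """Hypothetical model of (V)PMOVZXWQ: zero-extend the low 2 (XMM) or 4 (YMM) words."""
    count = 4 if ymm else 2
    return [w & 0xFFFF for w in src_words[:count]]


def pmovsxwq(src_words, ymm=False):
    """Hypothetical model of (V)PMOVSXWQ: sign-extend the same words, for comparison."""
    count = 4 if ymm else 2
    return [(w & 0xFFFF) - 0x10000 if w & 0x8000 else w & 0xFFFF
            for w in src_words[:count]]


words = [0x8000, 0x0001]
print(pmovzxwq(words))  # [32768, 1]
print(pmovsxwq(words))  # [-32768, 1]
print([f"{v & 0xFFFFFFFFFFFFFFFF:016X}" for v in pmovzxwq(words)])
print([f"{v & 0xFFFFFFFFFFFFFFFF:016X}" for v in pmovsxwq(words)])
```

The zero-extended result is never negative, which is why the text calls the destination elements positive-signed integers; the sign-extended form replicates bit 15 of each source word into the upper 48 bits.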
                                  Encoding
Mnemonic                     VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPMOVZXWQ xmm1, xmm2/mem32   C4   RXB.02          X.1111.0.01  34 /r
VPMOVZXWQ ymm1, xmm2/mem64   C4   RXB.02          X.1111.1.01  34 /r

Related Instructions
(V)PMOVZXBD, (V)PMOVZXBQ, (V)PMOVZXBW, (V)PMOVZXDQ, (V)PMOVZXWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                  Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD        X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                           A     A           AVX instructions are only recognized in protected mode.
                           S     S     S     CR0.EM = 1.
                           S     S     S     CR4.OSFXSR = 0.
                                       A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                       A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                       A     VEX.vvvv != 1111b.
                                       A     VEX.L = 1 when AVX2 not supported.
                                       A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                           S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM  S     S     X     CR0.TS = 1.
Stack, #SS                 S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP    S     S     X     Memory address exceeding data segment limit or non-canonical.
                                       X     Null data segment used to reference memory.
Page fault, #PF                  S     X     Instruction execution caused a page fault.
Alignment check, #AC             S     X     Unaligned memory reference when alignment checking enabled.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PMULDQ
VPMULDQ
Packed Multiply Signed Doubleword to Quadword

Multiplies two or four pairs of 32-bit signed integers in the first and second source operands and writes two or four packed quadword signed integer products to the destination.

For the 128-bit form of the instruction, the following operations are performed, where:
dest is the destination register, either an XMM register or the corresponding YMM register.
src1 is the first source operand.
src2 is the second source operand.
dest[63:0] = (src1[31:0] * src2[31:0])
dest[127:64] = (src1[95:64] * src2[95:64])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[191:128] = (src1[159:128] * src2[159:128])
dest[255:192] = (src1[223:192] * src2[223:192])

There are legacy and extended forms of the instruction:

PMULDQ
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMULDQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset    Feature Flag
PMULDQ             SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMULDQ 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULDQ 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode            Description
PMULDQ xmm1, xmm2/mem128      66 0F 38 28 /r    Multiplies two packed 32-bit signed integers in xmm1[31:0] and xmm1[95:64] by the corresponding values in xmm2 or mem128. Writes packed 64-bit signed integer products to xmm1[63:0] and xmm1[127:64].
                                              Encoding
Mnemonic                            VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMULDQ xmm1, xmm2, xmm3/mem128     C4     RXB.02            X.src1.0.01    28 /r
VPMULDQ ymm1, ymm2, ymm3/mem256     C4     RXB.02            X.src1.1.01    28 /r

Related Instructions
(V)PMULLD, (V)PMULHW, (V)PMULHUW, (V)PMULUDQ, (V)PMULLW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.L = 1 when AVX2 not supported.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A      Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PMULHRSW
VPMULHRSW
Packed Multiply High with Round and Scale Words

Multiplies each packed 16-bit signed value in the first source operand by the corresponding value in the second source operand, truncates the 32-bit product to the 18 most significant bits by right-shifting, then rounds the truncated value by adding 1 to its least-significant bit. Writes bits [16:1] of the sum to the corresponding word of the destination.
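The truncate, round, and select steps described above can be sketched as a scalar model of a single word element. This is an illustrative Python sketch, not AMD reference code; the names to_signed16, pmulhrsw_word, and pmulhrsw are hypothetical, and Python's arbitrary-precision integers stand in for the 32-bit intermediate product.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pmulhrsw_word(a, b):
    """Model one element: a and b are signed 16-bit integers (hypothetical helper)."""
    product = a * b               # exact signed 32-bit product
    truncated = product >> 14     # arithmetic shift keeps the 18 most significant bits
    rounded = truncated + 1       # add 1 to the least-significant bit
    return to_signed16(rounded >> 1)   # bits [16:1] of the sum


def pmulhrsw(src1, src2):
    """Apply the element model to two equal-length sequences of words."""
    return [pmulhrsw_word(a, b) for a, b in zip(src1, src2)]
```

If the words are read as fixed-point fractions with 15 fractional bits (an interpretation assumed here, not stated in the text above), pmulhrsw_word(0x4000, 0x4000) multiplies 0.5 by 0.5 and returns 0x2000, which is 0.25. The single case in which both inputs are -32768 wraps to -32768 in this model.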
There are legacy and extended forms of the instruction:

PMULHRSW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMULHRSW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                 Subset    Feature Flag
PMULHRSW             SSSE3     CPUID Fn0000_0001_ECX[SSSE3] (bit 9)
VPMULHRSW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULHRSW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode            Description
PMULHRSW xmm1, xmm2/mem128    66 0F 38 0B /r    Multiplies each packed 16-bit signed value in xmm1 by the corresponding value in xmm2 or mem128, truncates product to 18 bits, rounds by adding 1. Writes bits [16:1] of the sum to xmm1.

                                               Encoding
Mnemonic                             VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMULHRSW xmm1, xmm2, xmm3/mem128    C4     RXB.02            X.src1.0.01    0B /r
VPMULHRSW ymm1, ymm2, ymm3/mem256    C4     RXB.02            X.src1.1.01    0B /r

Related Instructions
None

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.L = 1 when AVX2 not supported.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A      Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PMULHUW
VPMULHUW
Packed Multiply High Unsigned Word

Multiplies each packed 16-bit unsigned value in the first source operand by the corresponding value in the second source operand; writes the high-order 16 bits of each 32-bit product to the corresponding word of the destination.

There are legacy and extended forms of the instruction:

PMULHUW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMULHUW
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                Subset    Feature Flag
PMULHUW             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULHUW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULHUW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode         Description
PMULHUW xmm1, xmm2/mem128     66 0F E4 /r    Multiplies packed 16-bit unsigned values in xmm1 by the corresponding values in xmm2 or mem128. Writes bits [31:16] of each product to xmm1.

                                              Encoding
Mnemonic                            VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMULHUW xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    E4 /r
VPMULHUW ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    E4 /r

Related Instructions
(V)PMULDQ, (V)PMULHW, (V)PMULLD, (V)PMULLW, (V)PMULUDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.L = 1 when AVX2 not supported.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A      Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PMULHW
VPMULHW
Packed Multiply High Signed Word

Multiplies each packed 16-bit signed value in the first source operand by the corresponding value in the second source operand; writes the high-order 16 bits of each 32-bit product to the corresponding word of the destination.

There are legacy and extended forms of the instruction:

PMULHW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMULHW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
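The high-half selection used by PMULHW, and by its unsigned counterpart PMULHUW, can be sketched per word element as follows. This is an illustrative Python model under the assumption that inputs are already in range for their signedness; the function names pmulhw_word and pmulhuw_word are hypothetical.

```python
def pmulhw_word(a, b):
    """Signed model: a and b in [-32768, 32767]; returns bits [31:16] of a*b as a signed word."""
    high = ((a * b) >> 16) & 0xFFFF          # arithmetic shift, then keep 16 bits
    return high - 0x10000 if high & 0x8000 else high


def pmulhuw_word(a, b):
    """Unsigned model: a and b in [0, 65535]; returns bits [31:16] of a*b."""
    return ((a * b) >> 16) & 0xFFFF
```

The same 16-bit pattern gives different high halves depending on signedness: 0xFFFF is -1 for pmulhw_word, so pmulhw_word(-1, -1) is 0, while pmulhuw_word(0xFFFF, 0xFFFF) is 0xFFFE.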
Instruction Support

Form               Subset    Feature Flag
PMULHW             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULHW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULHW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode         Description
PMULHW xmm1, xmm2/mem128     66 0F E5 /r    Multiplies packed 16-bit signed values in xmm1 by the corresponding values in xmm2 or mem128. Writes bits [31:16] of each product to xmm1.

                                             Encoding
Mnemonic                           VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMULHW xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    E5 /r
VPMULHW ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    E5 /r

Related Instructions
(V)PMULDQ, (V)PMULHUW, (V)PMULLD, (V)PMULLW, (V)PMULUDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.L = 1 when AVX2 not supported.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A      Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PMULLD
VPMULLD
Packed Multiply and Store Low Signed Doubleword

Multiplies four packed 32-bit signed integers in the first source operand by the corresponding values in the second source operand and writes bits [31:0] of each 64-bit product to the corresponding 32-bit element of the destination.

There are legacy and extended forms of the instruction:

PMULLD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMULLD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset    Feature Flag
PMULLD             SSE4.1    CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPMULLD 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULLD 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode            Description
PMULLD xmm1, xmm2/mem128     66 0F 38 40 /r    Multiplies four packed 32-bit signed integers in xmm1 by corresponding values in xmm2 or mem128. Writes bits [31:0] of each 64-bit product to the corresponding 32-bit element of xmm1.
                                             Encoding
Mnemonic                           VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMULLD xmm1, xmm2, xmm3/mem128    C4     RXB.02            X.src1.0.01    40 /r
VPMULLD ymm1, ymm2, ymm3/mem256    C4     RXB.02            X.src1.1.01    40 /r

Related Instructions
(V)PMULDQ, (V)PMULHUW, (V)PMULHW, (V)PMULLW, (V)PMULUDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.L = 1 when AVX2 not supported.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A      Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PMULLW
VPMULLW
Packed Multiply Low Signed Word

Multiplies eight packed 16-bit signed integers in the first source operand by the corresponding values in the second source operand and writes bits [15:0] of each 32-bit product to the corresponding 16-bit element of the destination.

There are legacy and extended forms of the instruction:

PMULLW
The first source operand is an XMM register.
The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMULLW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset    Feature Flag
PMULLW             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULLW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULLW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode         Description
PMULLW xmm1, xmm2/mem128     66 0F D5 /r    Multiplies eight packed 16-bit signed integers in xmm1 by corresponding values in xmm2 or mem128. Writes bits [15:0] of each 32-bit product to the corresponding 16-bit element of xmm1.

                                             Encoding
Mnemonic                           VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMULLW xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    D5 /r
VPMULLW ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    D5 /r

Related Instructions
(V)PMULDQ, (V)PMULHUW, (V)PMULHW, (V)PMULLD, (V)PMULUDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.L = 1 when AVX2 not supported.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A      Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PMULUDQ
VPMULUDQ
Packed Multiply Unsigned Doubleword to Quadword

Multiplies two or four pairs of 32-bit unsigned integers in the first and second source operands and writes two or four packed quadword unsigned integer products to the destination.
For the 128-bit form of the instruction, the following operations are performed:
dest is the destination register – either an XMM register or the corresponding YMM register.
src1 is the first source operand.
src2 is the second source operand.
dest[63:0] = (src1[31:0] * src2[31:0])
dest[127:64] = (src1[95:64] * src2[95:64])

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[191:128] = (src1[159:128] * src2[159:128])
dest[255:192] = (src1[223:192] * src2[223:192])

There are legacy and extended forms of the instruction:

PMULUDQ
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPMULUDQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form                Subset    Feature Flag
PMULUDQ             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPMULUDQ 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPMULUDQ 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode         Description
PMULUDQ xmm1, xmm2/mem128     66 0F F4 /r    Multiplies two packed 32-bit unsigned integers in xmm1[31:0] and xmm1[95:64] by the corresponding values in xmm2 or mem128. Writes packed 64-bit unsigned integer products to xmm1[63:0] and xmm1[127:64].
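The element selection in the PMULUDQ operations listed above (only the low doubleword of each quadword takes part) can be sketched as a scalar model. In this illustrative Python sketch a register is represented as a single non-negative integer; the function name pmuludq and its bits parameter are hypothetical.

```python
def pmuludq(src1, src2, bits=128):
    """Model the unsigned doubleword-to-quadword multiply.

    src1 and src2 hold whole register values as non-negative integers;
    bits is 128 for the XMM form or 256 for the YMM form.
    """
    dest = 0
    for low in range(0, bits, 64):
        a = (src1 >> low) & 0xFFFFFFFF    # src1[low+31:low]
        b = (src2 >> low) & 0xFFFFFFFF    # src2[low+31:low]
        dest |= (a * b) << low            # dest[low+63:low]; the product fits in 64 bits
    return dest
```

Source bits [63:32] and [127:96] (and their counterparts in the upper 128 bits) never influence the result, which is why software that needs all four 32-bit products usually issues the instruction twice with shifted or shuffled inputs.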
                                              Encoding
Mnemonic                            VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPMULUDQ xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    F4 /r
VPMULUDQ ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    F4 /r

Related Instructions
(V)PMULDQ, (V)PMULHUW, (V)PMULHW, (V)PMULLD, (V)PMULLW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.L = 1 when AVX2 not supported.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A      Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

POR
VPOR
Packed OR

Performs a bitwise OR of the first and second source operands and writes the result to the destination.
When one or both of a pair of corresponding bits in the first and second operands are set, the corresponding bit of the destination is set; when neither source bit is set, the destination bit is cleared.
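As a small illustration of the bit rule above, the operation can be modeled on register values held as integers. This is a hedged Python sketch; the name por and the bits parameter are hypothetical, and the mask only confines the model to the operand width.

```python
def por(src1, src2, bits=128):
    """Model a packed OR on register values held as non-negative integers."""
    mask = (1 << bits) - 1            # 128-bit (XMM) or 256-bit (YMM) operand width
    return (src1 | src2) & mask       # a result bit is set if either source bit is set
```

Because the operation is purely bitwise, it ignores element boundaries: the same model covers byte, word, doubleword, and quadword data.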
There are legacy and extended forms of the instruction:

POR
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPOR
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form            Subset    Feature Flag
POR             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPOR 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPOR 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode         Description
POR xmm1, xmm2/mem128     66 0F EB /r    Performs bitwise OR of values in xmm1 and xmm2 or mem128. Writes results to xmm1.

                                          Encoding
Mnemonic                        VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPOR xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    EB /r
VPOR ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    EB /r

Related Instructions
(V)PAND, (V)PANDN, (V)PXOR

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.L = 1 when AVX2 not supported.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A      Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PSADBW
VPSADBW
Packed Sum of Absolute Differences Bytes to Words

Subtracts the 16 or 32 packed 8-bit unsigned integers in the second source operand from the corresponding values in the first source operand and computes the absolute value of the differences. Computes two or four unsigned 16-bit integer sums of groups of eight absolute differences and writes the sums to specific words of the destination.

For the 128-bit form of the instruction:
• The unsigned 16-bit integer sum of absolute differences of the eight bytes [7:0] of the source operands is written to bits [15:0] of the destination; bits [63:16] are cleared.
• The unsigned 16-bit integer sum of absolute differences of the eight bytes [15:8] of the source operands is written to bits [79:64] of the destination; bits [127:80] are cleared.

Additionally, for the 256-bit form of the instruction:
• The unsigned 16-bit integer sum of absolute differences of the eight bytes [23:16] of the source operands is written to bits [143:128] of the destination; bits [191:144] are cleared.
• The unsigned 16-bit integer sum of absolute differences of the eight bytes [31:24] of the source operands is written to bits [207:192] of the destination; bits [255:208] are cleared.

There are legacy and extended forms of the instruction:

PSADBW
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSADBW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset    Feature Flag
PSADBW             SSE2      CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSADBW 128-bit    AVX       CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSADBW 256-bit    AVX2      CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode         Description
PSADBW xmm1, xmm2/mem128     66 0F F6 /r    Computes the sum of the absolute differences of two sets of packed 8-bit unsigned integer values in xmm1 and xmm2 or mem128. Writes 16-bit unsigned integer sums to xmm1.

                                             Encoding
Mnemonic                           VEX    RXB.map_select    W.vvvv.L.pp    Opcode
VPSADBW xmm1, xmm2, xmm3/mem128    C4     RXB.01            X.src1.0.01    F6 /r
VPSADBW ymm1, ymm2, ymm3/mem256    C4     RXB.01            X.src1.1.01    F6 /r

Related Instructions
(V)MPSADBW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                 Mode
Exception                    Real  Virt  Prot   Cause of Exception
Invalid opcode, #UD          X     X     X      Instruction not supported, as indicated by CPUID feature identifier.
                             A     A            AVX instructions are only recognized in protected mode.
                             S     S     S      CR0.EM = 1.
                             S     S     S      CR4.OSFXSR = 0.
                                         A      CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A      XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A      VEX.L = 1 when AVX2 not supported.
                                         A      REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X      Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X      CR0.TS = 1.
Stack, #SS                   S     S     X      Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X      Memory address exceeding data segment limit or non-canonical.
                                         X      Null data segment used to reference memory.
Alignment check, #AC         S     S     S      Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A      Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S     X      Instruction execution caused a page fault.
X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

PSHUFB
VPSHUFB
Packed Shuffle Byte

Copies bytes from the first source operand to the destination or clears bytes in the destination, as specified by control bytes in the second source operand.
The control bytes occupy positions in the source operand that correspond to positions in the destination. Each control byte has the following fields. 7 FRZ 6 4 Reserved Bits [7] 3 0 SRC_Index Description Set the bit to clear the corresponding byte of the destination. Clear the bit to copy the selected source byte to the corresponding byte of the destination. [6:4] Reserved [3:0] Binary value selects the source byte. For the 256-bit form of the instruction, the SRC_Index fields in the upper 16 bytes of the second source operand select bytes in the upper 16 bytes of the first source operand to be copied. There are legacy and extended forms of the instruction: PSHUFB The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSHUFB The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset PSHUFB SSSE3 Feature Flag VPSHUFB 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPSHUFB 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) CPUID Fn0000_0001_ECX[SSSE3] (bit 9) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Reference PSHUFB, VPSHUFB 435 AMD64 Technology 26568—Rev. 
Instruction Encoding

Mnemonic                    Opcode           Description
PSHUFB xmm1, xmm2/mem128    66 0F 38 00 /r   Moves bytes in xmm1 as specified by control bytes in xmm2 or mem128.

                                   Encoding
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSHUFB xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   00 /r
VPSHUFB ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   00 /r

Related Instructions
(V)PSHUFD, (V)PSHUFW, (V)PSHUFHW, (V)PSHUFLW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

PSHUFD VPSHUFD Packed Shuffle Doublewords

Copies packed doubleword values from a source to a doubleword in the destination, as specified by bit fields of an immediate byte operand. A source doubleword can be copied more than once. Source doublewords are selected by two-bit fields in the immediate-byte operand.
Each field corresponds to a destination doubleword, as shown:

Destination Doubleword   Immediate-Byte Bit Field   Value of Bit Field   Source Doubleword
[31:0]                   [1:0]                      00                   [31:0]
                                                    01                   [63:32]
                                                    10                   [95:64]
                                                    11                   [127:96]
[63:32]                  [3:2]                      00                   [31:0]
                                                    01                   [63:32]
                                                    10                   [95:64]
                                                    11                   [127:96]
[95:64]                  [5:4]                      00                   [31:0]
                                                    01                   [63:32]
                                                    10                   [95:64]
                                                    11                   [127:96]
[127:96]                 [7:6]                      00                   [31:0]
                                                    01                   [63:32]
                                                    10                   [95:64]
                                                    11                   [127:96]

For the 256-bit form of the instruction, the same immediate byte selects doublewords in the upper 128 bits of the source operand to be copied to the destination.

Destination Doubleword   Immediate-Byte Bit Field   Value of Bit Field   Source Doubleword
[159:128]                [1:0]                      00                   [159:128]
                                                    01                   [191:160]
                                                    10                   [223:192]
                                                    11                   [255:224]
[191:160]                [3:2]                      00                   [159:128]
                                                    01                   [191:160]
                                                    10                   [223:192]
                                                    11                   [255:224]
[223:192]                [5:4]                      00                   [159:128]
                                                    01                   [191:160]
                                                    10                   [223:192]
                                                    11                   [255:224]
[255:224]                [7:6]                      00                   [159:128]
                                                    01                   [191:160]
                                                    10                   [223:192]
                                                    11                   [255:224]

There are legacy and extended forms of the instruction:

PSHUFD
The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSHUFD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register.
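The immediate-byte selection for PSHUFD and VPSHUFD can be sketched as a scalar Python model. The name pshufd_model and the list representation (element 0 holding bits [31:0]) are assumptions of this sketch, not part of the instruction definition.

```python
from typing import List


def pshufd_model(src: List[int], imm8: int) -> List[int]:
    """Scalar model of (V)PSHUFD; src holds 4 (XMM) or 8 (YMM) doublewords.

    Element 0 is bits [31:0] (an assumption of this sketch).
    """
    if len(src) not in (4, 8):
        raise ValueError("src must hold 4 or 8 doublewords")
    out = []
    for i in range(len(src)):
        lane_base = (i // 4) * 4             # first doubleword of the 128-bit half
        field = (imm8 >> (2 * (i % 4))) & 3  # two-bit field for this destination doubleword
        out.append(src[lane_base + field])
    return out


# imm8 = 0x1B (00 01 10 11b) reverses the doublewords within each 128-bit half.
swapped = pshufd_model([10, 11, 12, 13], 0x1B)
# imm8 = 0x00 copies source doubleword [31:0] to every destination doubleword.
broadcast = pshufd_model([10, 11, 12, 13], 0x00)
```

Because the same immediate byte is applied to both halves of a 256-bit operand, the model reuses the two-bit fields for elements 4 through 7 and only changes the base index.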
Instruction Support Form Subset Feature Flag PSHUFD SSE2 VPSHUFD 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPSHUFD 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 438 PSHUFD, VPSHUFD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic Opcode PSHUFD xmm1, xmm2/mem128, imm8 66 0F 70 /r ib Description Copies packed 32-bit values from xmm2 or mem128 to xmm1, as specified by imm8. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSHUFD xmm1, xmm2/mem128, imm8 C4 RXB.01 X.1111.0.01 70 /r ib VPSHUFD ymm1, ymm2/mem256, imm8 C4 RXB.01 X.1111.1.01 70 /r ib Related Instructions (V)PSHUFHW, (V)PSHUFLW, (V)PSHUFW rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. 
Instruction execution caused a page fault. PSHUFD, VPSHUFD 439 AMD64 Technology 26568—Rev. 3.22—May 2018 PSHUFHW VPSHUFHW Packed Shuffle High Words Copies packed word values from the high quadword of the source operand or the upper quadwords of two halves of the source operand to a word in the high quadword of the destination or the upper quadwords of two halves of the destination, as specified by bit fields of an immediate byte operand. A source word can be copied more than once. Source words are selected by two-bit fields in the immediate-byte operand. Each field corresponds to a destination word, as shown: Destination Word Immediate-Byte Bit Field Value of Bit Field Source Word [79:64] [1:0] 00 [79:64] 01 [95:80] 10 [111:96] 11 [127:112] 00 [79:64] 01 [95:80] 10 [111:96] 11 [127:112] 00 [79:64] 01 [95:80] 10 [111:96] 11 [127:112] 00 [79:64] 01 [95:80] 10 [111:96] 11 [127:112] [95:80] [111:96] [127:112] [3:2] [5:4] [7:6] The least-significant quadword of the source is copied to the corresponding quadword of the destination. For the 256-bit form of the instruction, the same immediate byte selects words in the most-significant quadword of the source operand to be copied to the destination: 440 Destination Word Immediate-Byte Bit Field Value of Bit Field Source Word [207:192] [1:0] 00 [207:192] 01 [223:208] 10 [239:224] 11 [255:240] PSHUFHW, VPSHUFHW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Destination Word Immediate-Byte Bit Field Value of Bit Field Source Word [223:208] [3:2] 00 [207:192] 01 [223:208] 10 [239:224] 11 [255:240] 00 [207:192] 01 [223:208] 10 [239:224] 11 [255:240] 00 [207:192] 01 [223:208] 10 [239:224] 11 [255:240] [239:224] [255:240] [5:4] [7:6] The least-significant quadword of the upper 128 bits of the source is copied to the corresponding quadword of the destination. There are legacy and extended forms of the instruction: PSHUFHW The source operand is either an XMM register or a 128-bit memory location. 
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSHUFHW The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register. Instruction Support Form Subset PSHUFHW SSE2 Feature Flag VPSHUFHW 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPSHUFHW 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) CPUID Fn0000_0001_EDX[SSE2] (bit 26) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Reference PSHUFHW, VPSHUFHW 441 AMD64 Technology 26568—Rev. 3.22—May 2018 Instruction Encoding Mnemonic PSHUFHW xmm1, xmm2/mem128, imm8 Opcode Description F3 0F 70 /r ib Copies packed 16-bit values from the high-order quadword of xmm2 or mem128 to the high-order quadword of xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSHUFHW xmm1, xmm2/mem128, imm8 C4 RXB.01 X.1111.0.10 70 /r ib VPSHUFHW ymm1, ymm2/mem256, imm8 C4 RXB.01 X.1111.1.10 70 /r ib Related Instructions (V)PSHUFD, (V)PSHUFLW, (V)PSHUFW rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception 442 X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. PSHUFHW, VPSHUFHW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology PSHUFLW VPSHUFLW Packed Shuffle Low Words Copies packed word values from the low quadword of the source operand or the lower quadwords of two halves of the source operand to a word in the low quadword of the destination or the lower quadwords of two halves of the destination, as specified by bit fields of an immediate byte operand. A source word can be copied more than once. Source words are selected by two-bit fields in the immediate-byte operand. Each bit field corresponds to a destination word, as shown: Destination Word Immediate-Byte Bit Field Value of Bit Field Source Word [15:0] [1:0] 00 [15:0] 01 [31:16] 10 [47:32] 11 [63:48] 00 [15:0] 01 [31:16] 10 [47:32] 11 [63:48] 00 [15:0] 01 [31:16] 10 [47:32] 11 [63:48] 00 [15:0] 01 [31:16] 10 [47:32] 11 [63:48] [31:16] [47:32] [63:48] [3:2] [5:4] [7:6] The most-significant quadword of the source is copied to the corresponding quadword of the destination. For the 256-bit form of the instruction, the same immediate byte selects words in the lower quadword of the upper 128 bits of the source operand to be copied to the destination: Destination Word Immediate-Byte Bit Field Value of Bit Field Source Word [143:128] [1:0] 00 [143:128] 01 [159:144] 10 [175:160] 11 [191:176] Instruction Reference PSHUFLW, VPSHUFLW 443 AMD64 Technology 26568—Rev. 
Destination Word   Immediate-Byte Bit Field   Value of Bit Field   Source Word
[159:144]          [3:2]                      00                   [143:128]
                                              01                   [159:144]
                                              10                   [175:160]
                                              11                   [191:176]
[175:160]          [5:4]                      00                   [143:128]
                                              01                   [159:144]
                                              10                   [175:160]
                                              11                   [191:176]
[191:176]          [7:6]                      00                   [143:128]
                                              01                   [159:144]
                                              10                   [175:160]
                                              11                   [191:176]

The most-significant quadword of the upper 128 bits of the source is copied to the corresponding quadword of the destination.

There are legacy and extended forms of the instruction:

PSHUFLW
The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSHUFLW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support

Form               Subset   Feature Flag
PSHUFLW            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSHUFLW 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSHUFLW 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode           Description
PSHUFLW xmm1, xmm2/mem128, imm8    F2 0F 70 /r ib   Copies packed 16-bit values from the low-order quadword of xmm2 or mem128 to the low-order quadword of xmm1.
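The PSHUFLW word selection described above can be sketched in scalar Python. The name pshuflw_model and the list layout (element 0 holding bits [15:0]) are assumptions of this sketch rather than anything defined by the instruction.

```python
from typing import List


def pshuflw_model(src: List[int], imm8: int) -> List[int]:
    """Scalar model of (V)PSHUFLW; src holds 8 (XMM) or 16 (YMM) words.

    Element 0 is bits [15:0] (an assumption of this sketch).
    """
    if len(src) not in (8, 16):
        raise ValueError("src must hold 8 or 16 words")
    out = list(src)  # the high quadword of each 128-bit half is copied unchanged
    for lane_base in range(0, len(src), 8):
        for i in range(4):
            field = (imm8 >> (2 * i)) & 3  # two-bit field for destination word i
            out[lane_base + i] = src[lane_base + field]
    return out


# imm8 = 0x1B reverses the four low words and leaves the four high words alone.
shuffled = pshuflw_model([0, 1, 2, 3, 4, 5, 6, 7], 0x1B)
```

A model of PSHUFHW would follow the same pattern with the roles of the two quadwords exchanged: the selection applies to words 4 through 7 of each half and the low quadword is copied unchanged.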
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSHUFLW xmm1, xmm2/mem128, imm8 C4 RXB.01 X.1111.0.11 70 /r ib VPSHUFLW ymm1, ymm2/mem256, imm8 C4 RXB.01 X.1111.1.11 70 /r ib Related Instructions (V)PSHUFD, (V)PSHUFHW, (V)PSHUFW rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. PSHUFLW, VPSHUFLW 445 AMD64 Technology 26568—Rev. 3.22—May 2018 PSIGNB VPSIGNB Packed Sign Byte For each packed signed byte in the first source operand, evaluate the corresponding byte of the second source operand and perform one of the following operations. • When a byte of the second source is negative, write the two’s-complement of the corresponding byte of the first source to the destination. • When a byte of the second source is positive, copy the corresponding byte of the first source to the destination. 
• When a byte of the second source is zero, clear the corresponding byte of the destination. There are legacy and extended forms of the instruction: PSIGNB The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSIGNB The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset Feature Flag PSIGNB SSSE3 VPSIGNB 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPSIGNB 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) CPUID Fn0000_0001_ECX[SSSE3] (bit 9) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 446 PSIGNB, VPSIGNB Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic PSIGNB xmm1, xmm2/mem128 Opcode Description 66 0F 38 08 /r Perform operation based on evaluation of each packed 8-bit signed integer value in xmm2 or mem128. Write 8-bit signed results to xmm1. 
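The three cases listed for PSIGNB (negative, positive, zero) can be sketched as a scalar Python model. The name psign_model, the bits parameter, and the use of unsigned bit patterns for the elements are assumptions of this sketch.

```python
from typing import List


def psign_model(src1: List[int], src2: List[int], bits: int = 8) -> List[int]:
    """Scalar model of (V)PSIGNB (bits=8); elements are unsigned bit patterns."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of elements")
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    out = []
    for a, b in zip(src1, src2):
        if b & sign_bit:
            out.append((-a) & mask)  # negative: two's complement of the first-source element
        elif (b & mask) == 0:
            out.append(0)            # zero: clear the destination element
        else:
            out.append(a & mask)     # positive: copy the first-source element
    return out


# Second-source bytes are negative, zero, positive, negative.
result = psign_model([0x01, 0x02, 0x03, 0x80], [0xFF, 0x00, 0x01, 0xFF])
```

In this model the two's complement of 0x80 is again 0x80, since the most negative 8-bit value has no positive counterpart. Passing bits=16 or bits=32 gives the corresponding sketch for the word and doubleword variants.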
                                   Encoding
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSIGNB xmm1, xmm2, xmm3/mem128    C4    RXB.02           X.src1.0.01   08 /r
VPSIGNB ymm1, ymm2, ymm3/mem256    C4    RXB.02           X.src1.1.01   08 /r

Related Instructions
(V)PSIGNW, (V)PSIGND

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X
X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

PSIGND VPSIGND Packed Sign Doubleword

For each packed signed doubleword in the first source operand, evaluate the corresponding doubleword of the second source operand and perform one of the following operations.
• When a doubleword of the second source is negative, write the two’s-complement of the corresponding doubleword of the first source to the destination.
• When a doubleword of the second source is positive, copy the corresponding doubleword of the first source to the destination.
• When a doubleword of the second source is zero, clear the corresponding doubleword of the destination. There are legacy and extended forms of the instruction: PSIGND The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSIGND The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset Feature Flag PSIGND SSSE3 VPSIGND 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_ECX[SSSE3] (bit 9) VPSIGND 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 448 PSIGND, VPSIGND Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic PSIGND xmm1, xmm2/mem128 Opcode Description 66 0F 38 0A /r Perform operation based on evaluation of each packed 32-bit signed integer value in xmm2 or mem128. Write 32-bit signed results to xmm1. 
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSIGND xmm1, xmm2, xmm3/mem128 C4 RXB.02 X.src1.0.01 0A /r VPSIGND ymm1, ymm2, ymm3/mem256 C4 RXB.02 X.src1.1.01 0A /r Related Instructions (V)PSIGNB, (V)PSIGNW rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. PSIGND, VPSIGND 449 AMD64 Technology 26568—Rev. 3.22—May 2018 PSIGNW VPSIGNW Packed Sign Word For each packed signed word in the first source operand, evaluate the corresponding word of the second source operand and perform one of the following operations. • When a word of the second source is negative, write the two’s-complement of the corresponding word of the first source to the destination. • When a word of the second source is positive, copy the corresponding word of the first source to the destination. 
• When a word of the second source is zero, clear the corresponding word of the destination. There are legacy and extended forms of the instruction: PSIGNW The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSIGNW The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset Feature Flag PSIGNW SSSE3 VPSIGNW 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPSIGNW 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) CPUID Fn0000_0001_ECX[SSSE3] (bit 9) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 450 PSIGNW, VPSIGNW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic PSIGNW xmm1, xmm2/mem128 Opcode Description 66 0F 38 09 /r Perform operation based on evaluation of each packed 16-bit signed integer value in xmm2 or mem128. Write 16-bit signed results to xmm1. 
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSIGNW xmm1, xmm2, xmm3/mem128 C4 RXB.02 X.src1.0.01 09 /r VPSIGNW ymm1, ymm2, ymm3/mem256 C4 RXB.02 X.src1.1.01 09 /r Related Instructions (V)PSIGNB, (V)PSIGND rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. PSIGNW, VPSIGNW 451 AMD64 Technology 26568—Rev. 3.22—May 2018 PSLLD VPSLLD Packed Shift Left Logical Doublewords Left-shifts each packed 32-bit value in the source operand as specified by a shift-count operand and writes the shifted values to the destination. The shift-count operand can be an immediate byte, a second register, or a memory location. The shift count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered. Low-order bits emptied by shifting are cleared. 
When the shift count is greater than 31, the destination is cleared.

There are legacy and extended forms of the instruction:

PSLLD
There are two forms of the instruction, based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSLLD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
There are two 128-bit encodings. These differ based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
There are two 256-bit encodings. These differ based on the type of count operand. The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.

Instruction Support

Form             Subset   Feature Flag
PSLLD            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLD 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLD 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic PSLLD xmm1, xmm2/mem128 PSLLD xmm, imm8 Opcode Description 66 0F F2 /r Left-shifts packed doublewords in xmm1 as specified by xmm2[63:0] or mem128[63:0]. 66 0F 72 /6 ib Left-shifts packed doublewords in xmm as specified by imm8. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSLLD xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 F2 /r VPSLLD xmm1, xmm2, imm8 C4 RXB.01 X.dest.0.01 72 /6 ib VPSLLD ymm1, ymm2, xmm3/mem128 C4 RXB.01 X.src1.1.01 F2 /r VPSLLD ymm1, ymm2, imm8 C4 RXB.01 X.dest.1.01 72 /6 ib Related Instructions (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ rFLAGS Affected None MXCSR Flags Affected None Instruction Reference PSLLD, VPSLLD 453 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X A A A S Alignment check, #AC A Page fault, #PF X — AVX, AVX2, and SSE exception A — AVX and AVX2 exception S — SSE exception 454 A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. When alignment checking enabled: • 128-bit memory operand not 16-byte aligned. • 256-bit memory operand not 32-byte aligned. Instruction execution caused a page fault. 
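The PSLLD shift rules (unsigned count, zero fill, and a cleared destination for counts above 31) can be sketched as a scalar Python model. The name psll_model and the bits parameter are assumptions of this sketch; elements are treated as unsigned values.

```python
from typing import List


def psll_model(src: List[int], count: int, bits: int = 32) -> List[int]:
    """Scalar model of the (V)PSLLD shift (bits=32); elements are unsigned."""
    count &= (1 << 64) - 1     # a register or memory count contributes only bits [63:0]
    if count > bits - 1:
        return [0] * len(src)  # count above 31 (for doublewords): clear the destination
    mask = (1 << bits) - 1
    return [(x << count) & mask for x in src]  # bits shifted past the element are dropped


shifted = psll_model([1, 0x80000000, 0xFFFFFFFF, 3], 1)
cleared = psll_model([1, 2, 3, 4], 32)
```

Unlike the general-purpose shift instructions, the count here is not reduced modulo the element width, which is why a count of 32 clears every element in the model instead of leaving it unchanged.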
PSLLDQ VPSLLDQ Packed Shift Left Logical Double Quadword

Left-shifts the double quadword value (128-bit form) or each of the two double quadword values (256-bit form) in the source operand by the number of bytes specified by an immediate byte operand and writes the shifted values to the destination. The immediate byte operand supplies an unsigned shift count. Low-order bytes emptied by shifting are cleared. When the shift value is greater than 15, the destination is cleared. For the 256-bit form of the instruction, the shift count is applied to both the upper and the lower double quadword. Bytes shifted out of the lower 128 bits are not shifted into the upper 128 bits.

There are legacy and extended forms of the instruction:

PSLLDQ
The source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSLLDQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The source operand is an XMM register. The destination is an XMM register specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The source operand is a YMM register. The destination is a YMM register specified by VEX.vvvv.

Instruction Support

Form              Subset   Feature Flag
PSLLDQ            SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSLLDQ 128-bit   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSLLDQ 256-bit   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic            Opcode           Description
PSLLDQ xmm, imm8    66 0F 73 /7 ib   Left-shifts double quadword value in xmm as specified by imm8.
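The byte-granular shift performed by PSLLDQ and VPSLLDQ can be sketched in scalar Python. The name pslldq_model and the convention that index 0 is the least-significant byte are assumptions of this sketch.

```python
def pslldq_model(src: bytes, imm8: int) -> bytes:
    """Scalar model of (V)PSLLDQ for 16-byte (XMM) or 32-byte (YMM) operands.

    Index 0 is the least-significant byte (an assumption of this sketch).
    """
    if len(src) not in (16, 32):
        raise ValueError("src must be 16 or 32 bytes long")
    count = imm8 & 0xFF  # unsigned immediate byte
    out = bytearray()
    for base in range(0, len(src), 16):
        lane = src[base:base + 16]
        if count > 15:
            out += bytes(16)                         # shift value above 15: clear the lane
        else:
            out += bytes(count) + lane[:16 - count]  # zero-fill low bytes, drop high bytes
    return bytes(out)


# Shift a 128-bit value left by three bytes.
shifted = pslldq_model(bytes(range(1, 17)), 3)
```

Each 16-byte lane is processed on its own, so in the 32-byte case nothing leaving the lower lane appears in the upper lane.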
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSLLDQ xmm1, xmm2, imm8 C4 RXB.01 0.dest.0.01 73 /7 ib VPSLLDQ ymm1, ymm2, imm8 C4 RXB.01 0.dest.1.01 73 /7 ib Related Instructions (V)PSLLD, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ Instruction Reference PSLLDQ, VPSLLDQ 455 AMD64 Technology 26568—Rev. 3.22—May 2018 rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S Invalid opcode, #UD X X Device not available, #NM S S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception 456 X S S A A A A X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. PSLLDQ, VPSLLDQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology PSLLQ VPSLLQ Packed Shift Left Logical Quadwords Left-shifts each packed 64-bit value in the source operand as specified by a shift-count operand and writes the shifted values to the destination. The shift-count operand can be an immediate byte, a second register, or a memory location. The shift count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered. Low-order bits emptied by shifting are cleared. When the shift value is greater than 63, the destination is cleared. There are legacy and extended forms of the instruction: PSLLQ There are two forms of the instruction, based on the type of count operand. The first source operand is an XMM register. 
The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSLLQ The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding There are two 128-bit encodings. These differ based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding There are two 256-bit encodings. These differ based on the type of count operand. The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Instruction Support Form Subset Feature Flag PSLLQ SSE2 VPSLLQ 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPSLLQ 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Reference PSLLQ, VPSLLQ 457 AMD64 Technology 26568—Rev. 3.22—May 2018 Instruction Encoding Mnemonic PSLLQ xmm1, xmm2/mem128 PSLLQ xmm, imm8 Opcode Description 66 0F F3 /r Left-shifts packed quadwords in xmm1 as specified by xmm2[63:0] or mem128[63:0]. 66 0F 73 /6 ib Left-shifts packed quadwords in xmm as specified by imm8. 
                                Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSLLQ xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  F3 /r
VPSLLQ xmm1, xmm2, imm8         C4   RXB.01          X.dest.0.01  73 /6 ib
VPSLLQ ymm1, ymm2, xmm3/mem128  C4   RXB.01          X.src1.1.01  F3 /r
VPSLLQ ymm1, ymm2, imm8         C4   RXB.01          X.dest.1.01  73 /6 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception Mode Real Virt Prot X A S S X A S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X A A A S Alignment check, #AC A Page fault, #PF X — AVX, AVX2, and SSE exception A — AVX and AVX2 exception S — SSE exception A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. When alignment checking enabled: • 128-bit memory operand not 16-byte aligned. • 256-bit memory operand not 32-byte aligned. Instruction execution caused a page fault.

PSLLW VPSLLW
Packed Shift Left Logical Words

Left-shifts each packed 16-bit value in the source operand as specified by a shift-count operand and writes the shifted values to the destination.
The shift-count operand can be an immediate byte, a second register, or a memory location. The shift count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered. Low-order bits emptied by shifting are cleared. When the shift count is greater than 15, the destination is cleared. There are legacy and extended forms of the instruction: PSLLW There are two forms of the instruction, based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSLLW The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding There are two 128-bit encodings. These differ based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding There are two 256-bit encodings. These differ based on the type of count operand. The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. 
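The word-shift behavior described above can be illustrated with a small scalar model. This is a sketch for reasoning about results, not the hardware implementation: the function name `psllw_model` and the representation of a register as a list of unsigned 16-bit integers are assumptions made for this example.

```python
MASK16 = 0xFFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def psllw_model(words, count_operand):
    """Hypothetical scalar model of a packed left logical shift on 16-bit lanes.

    words         -- list of unsigned 16-bit lane values (8 for XMM, 16 for YMM)
    count_operand -- shift-count operand; only bits [63:0] are considered
    """
    count = count_operand & MASK64      # bits above 63 of the operand are ignored
    if count > 15:                      # a count larger than 15 clears every lane
        return [0] * len(words)
    # Bits shifted past bit 15 are dropped; low-order bits are filled with zeros.
    return [(w << count) & MASK16 for w in words]


lanes = [0x0001, 0x8001, 0xFFFF, 0x1234]
print([hex(w) for w in psllw_model(lanes, 4)])
```

Each lane is shifted independently, so a bit shifted out of one word never enters its neighbor.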
Instruction Support Form Subset Feature Flag PSLLW SSE2 VPSLLW 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPSLLW 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 460 PSLLW, VPSLLW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic PSLLW xmm1, xmm2/mem128 PSLLW xmm, imm8 Opcode 66 0F F1 /r 66 0F 71 /6 ib Description Left-shifts packed words in xmm1 as specified by xmm2[63:0] or mem128[63:0]. Left-shifts packed words in xmm as specified by imm8. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSLLW xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 F1 /r VPSLLW xmm1, xmm2, imm8 C4 RXB.01 X.dest.0.01 71 /6 ib VPSLLW ymm1, ymm2, xmm3/mem128 C4 RXB.01 X.src1.1.01 F1 /r VPSLLW ymm1, ymm2, imm8 C4 RXB.01 X.dest.1.01 71 /6 ib Related Instructions (V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ rFLAGS Affected None MXCSR Flags Affected None Instruction Reference PSLLW, VPSLLW 461 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X A A A S Alignment check, #AC A Page fault, #PF X — AVX, AVX2, and SSE exception A — AVX and AVX2 exception S — SSE exception 462 A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. 
Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. When alignment checking enabled: • 128-bit memory operand not 16-byte aligned. • 256-bit memory operand not 32-byte aligned. Instruction execution caused a page fault. PSLLW, VPSLLW Instruction Reference 26568—Rev. 3.22—May 2018 PSRAD VPSRAD AMD64 Technology Packed Shift Right Arithmetic Doublewords Right-shifts each packed 32-bit value in the source operand as specified by a shift-count operand and writes the shifted values to the destination. The shift-count operand can be an immediate byte, a second register, or a memory location. The shift count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered. High-order bits emptied by shifting are filled with the sign bit of the initial value. When the shift value is greater than 31, each doubleword of the destination is filled with the sign bit of its initial value. There are legacy and extended forms of the instruction: PSRAD There are two forms of the instruction, based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSRAD The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding There are two 128-bit encodings. These differ based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM register. 
For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
There are two 256-bit encodings. These differ based on the type of count operand. The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv.

Instruction Support

Form            Subset  Feature Flag
PSRAD           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRAD 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRAD 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                 Opcode          Description
PSRAD xmm1, xmm2/mem128  66 0F E2 /r     Right-shifts packed doublewords in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSRAD xmm, imm8          66 0F 72 /4 ib  Right-shifts packed doublewords in xmm as specified by imm8.

                                Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSRAD xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  E2 /r
VPSRAD xmm1, xmm2, imm8         C4   RXB.01          X.dest.0.01  72 /4 ib
VPSRAD ymm1, ymm2, xmm3/mem128  C4   RXB.01          X.src1.1.01  E2 /r
VPSRAD ymm1, ymm2, imm8         C4   RXB.01          X.dest.1.01  72 /4 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception Mode Real Virt Prot X A S S X A S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X A A A S Alignment check, #AC A Page fault, #PF X — AVX, AVX2, and SSE exception A — AVX and AVX2 exception S — SSE exception A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. When alignment checking enabled: • 128-bit memory operand not 16-byte aligned. • 256-bit memory operand not 32-byte aligned. Instruction execution caused a page fault.

PSRAW VPSRAW
Packed Shift Right Arithmetic Words

Right-shifts each packed 16-bit value in the source operand as specified by a shift-count operand and writes the shifted values to the destination. The shift-count operand can be an immediate byte, a second register, or a memory location. The shift count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered. High-order bits emptied by shifting are filled with the sign bit of the initial value. When the shift value is greater than 15, each word of the destination is filled with the sign bit of its initial value.
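The sign-fill behavior of the arithmetic word shift can be checked with a short scalar model. This is a minimal sketch, not the processor's implementation; the name `psraw_model` and the list-of-integers register representation are assumptions for illustration. Because an arithmetic shift of a 16-bit value by 15 already leaves only copies of the sign bit, the sketch clamps larger counts to 15.

```python
MASK16 = 0xFFFF


def psraw_model(words, count):
    """Hypothetical scalar model of a packed arithmetic right shift on 16-bit lanes.

    words -- list of unsigned 16-bit lane values
    count -- unsigned shift count
    """
    effective = min(count, 15)          # larger counts give a pure sign fill
    result = []
    for w in words:
        # Reinterpret the lane as a signed 16-bit integer.
        signed = w - 0x10000 if w & 0x8000 else w
        # Python's >> on a negative int is an arithmetic shift (sign-extending).
        result.append((signed >> effective) & MASK16)
    return result


print([hex(w) for w in psraw_model([0x8000, 0x7FFF, 0xFFF0, 0x0010], 4)])
```

Negative lanes end up as 0xFFFF and non-negative lanes as 0x0000 once the count reaches the lane width.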
There are legacy and extended forms of the instruction: PSRAW There are two forms of the instruction, based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSRAW The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding There are two 128-bit encodings. These differ based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding There are two 256-bit encodings. These differ based on the type of count operand. The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Instruction Support Form Subset Feature Flag PSRAW SSE2 VPSRAW 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPSRAW 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 466 PSRAW, VPSRAW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic PSRAW xmm1, xmm2/mem128 PSRAW xmm, imm8 Opcode Description 66 0F E1 /r Right-shifts packed words in xmm1 as specified by xmm2[63:0] or mem128[63:0]. 
66 0F 71 /4 ib Right-shifts packed words in xmm as specified by imm8. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSRAW xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 E1 /r VPSRAW xmm1, xmm2, imm8 C4 RXB.01 X.dest.0.01 71 /4 ib VPSRAW ymm1, ymm2, xmm3/mem128 C4 RXB.01 X.src1.1.01 E1 /r VPSRAW ymm1, ymm2, imm8 C4 RXB.01 X.dest.1.01 71 /4 ib Related Instructions (V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ rFLAGS Affected None MXCSR Flags Affected None Instruction Reference PSRAW, VPSRAW 467 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X A A A S Alignment check, #AC A Page fault, #PF X — AVX, AVX2, and SSE exception A — AVX and AVX2 exception S — SSE exception 468 A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. When alignment checking enabled: • 128-bit memory operand not 16-byte aligned. • 256-bit memory operand not 32-byte aligned. Instruction execution caused a page fault. PSRAW, VPSRAW Instruction Reference 26568—Rev. 
3.22—May 2018 AMD64 Technology PSRLD VPSRLD Packed Shift Right Logical Doublewords Right-shifts each packed 32-bit value in the source operand as specified by a shift-count operand and writes the shifted values to the destination. The shift-count operand can be an immediate byte, a second register, or a memory location. The shift count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered. High-order bits emptied by shifting are cleared. When the shift value is greater than 31, the destination is cleared. There are legacy and extended forms of the instruction: PSRLD There are two forms of the instruction, based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSRLD The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding There are two 128-bit encodings. These differ based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding There are two 256-bit encodings. These differ based on the type of count operand. The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. 
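The difference between the legacy, XMM-encoded, and YMM-encoded forms lies mainly in what happens to bits [255:128] of the destination. The sketch below models a YMM register as a list of eight 32-bit lanes (index 0 holds bits [31:0]); the function names and this representation are assumptions for illustration, not part of the architecture definition.

```python
MASK32 = 0xFFFFFFFF


def srl_dwords(lanes, count):
    """Logical right shift of each 32-bit lane; a count above 31 clears the lanes."""
    if count > 31:
        return [0] * len(lanes)
    return [(x & MASK32) >> count for x in lanes]


def psrld_legacy(ymm, count):
    """Legacy form: the source XMM register is also the destination and
    bits [255:128] of the corresponding YMM register are not affected."""
    return srl_dwords(ymm[:4], count) + list(ymm[4:])


def vpsrld_xmm(ymm_src, count):
    """VEX 128-bit form: bits [255:128] of the destination are cleared."""
    return srl_dwords(ymm_src[:4], count) + [0, 0, 0, 0]


def vpsrld_ymm(ymm_src, count):
    """VEX 256-bit form: all eight doublewords are shifted."""
    return srl_dwords(ymm_src, count)


reg = [0x80000000, 0xFFFFFFFF, 0x10, 0x1,
       0xAAAAAAAA, 0xBBBBBBBB, 0xCCCCCCCC, 0xDDDDDDDD]
for form in (psrld_legacy, vpsrld_xmm, vpsrld_ymm):
    print(form.__name__, [hex(x) for x in form(reg, 4)])
```

Unlike the arithmetic shifts, the vacated high-order bits are always zero, so 0x80000000 shifted right by 4 becomes 0x08000000.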
Instruction Support

Form            Subset  Feature Flag
PSRLD           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLD 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLD 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                 Opcode          Description
PSRLD xmm1, xmm2/mem128  66 0F D2 /r     Right-shifts packed doublewords in xmm1 as specified by xmm2[63:0] or mem128[63:0].
PSRLD xmm, imm8          66 0F 72 /2 ib  Right-shifts packed doublewords in xmm as specified by imm8.

                                Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSRLD xmm1, xmm2, xmm3/mem128  C4   RXB.01          X.src1.0.01  D2 /r
VPSRLD xmm1, xmm2, imm8         C4   RXB.01          X.dest.0.01  72 /2 ib
VPSRLD ymm1, ymm2, xmm3/mem128  C4   RXB.01          X.src1.1.01  D2 /r
VPSRLD ymm1, ymm2, imm8         C4   RXB.01          X.dest.1.01  72 /2 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception Mode Real Virt Prot X A S S X A S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X A A A S Alignment check, #AC A Page fault, #PF X — AVX, AVX2, and SSE exception A — AVX and AVX2 exception S — SSE exception A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. When alignment checking enabled: • 128-bit memory operand not 16-byte aligned. • 256-bit memory operand not 32-byte aligned. Instruction execution caused a page fault. PSRLD, VPSRLD 471 AMD64 Technology 26568—Rev. 3.22—May 2018 PSRLDQ VPSRLDQ Packed Shift Right Logical Double Quadword Right-shifts one or each of two double quadword values in the source operand the number of bytes specified by an immediate byte operand and writes the shifted values to the destination. The immediate byte operand supplies an unsigned shift count. High-order bytes emptied by shifting are cleared. When the shift value is greater than 15, the destination is cleared. For the 256-bit form of the instruction, the shift count is applied to both the upper and the lower double quadword. Bytes shifted out of the upper 128 bits are not shifted into the lower. There are legacy and extended forms of the instruction: PSRLDQ The source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSRLDQ The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The source operand is an XMM register. The destination is an XMM register specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The source operand is a YMM register. The destination is a YMM register specified by VEX.vvvv. 
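The per-lane byte shift described above, including the rule that bytes leaving the upper 128 bits do not enter the lower 128 bits, can be sketched as follows. The model treats a register as a little-endian `bytes` object (index 0 is the least-significant byte); the function names are hypothetical and the code is an illustration, not the hardware definition.

```python
def srl_bytes_128(lane, count):
    """Right-shift one 16-byte double quadword by `count` bytes.

    High-order bytes emptied by the shift are cleared; a count above 15
    clears the whole lane.
    """
    if len(lane) != 16:
        raise ValueError("a double quadword lane is 16 bytes")
    count = min(count, 16)
    # Dropping the first `count` bytes moves every remaining byte toward
    # index 0 (the low-order end); zero bytes fill the high-order end.
    return lane[count:] + bytes(count)


def vpsrldq_ymm_model(src, count):
    """Hypothetical model of the 256-bit form: each 128-bit half is shifted
    independently with the same count."""
    if len(src) != 32:
        raise ValueError("a YMM value is 32 bytes")
    return srl_bytes_128(src[:16], count) + srl_bytes_128(src[16:], count)


src = bytes(range(32))
out = vpsrldq_ymm_model(src, 4)
print(out[:16].hex())   # lower double quadword
print(out[16:].hex())   # upper double quadword
```

In the output, the top four bytes of the lower half are zero rather than bytes 16 to 19 of the source, which is the lane-isolation property the text calls out.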
Instruction Support

Form             Subset  Feature Flag
PSRLDQ           SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSRLDQ 128-bit  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSRLDQ 256-bit  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic          Opcode          Description
PSRLDQ xmm, imm8  66 0F 73 /3 ib  Right-shifts the double quadword value in xmm as specified by imm8.

                          Encoding
Mnemonic                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPSRLDQ xmm1, xmm2, imm8  C4   RXB.01          X.dest.0.01  73 /3 ib
VPSRLDQ ymm1, ymm2, imm8  C4   RXB.01          X.dest.1.01  73 /3 ib

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception Mode Real Virt Prot X A S S X A S S Invalid opcode, #UD X X Device not available, #NM S S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception X S S A A A A X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1.

PSRLQ VPSRLQ
Packed Shift Right Logical Quadwords

Right-shifts each packed 64-bit value in the source operand as specified by a shift-count operand and writes the shifted values to the destination. The shift-count operand can be an immediate byte, a second register, or a memory location. The shift count is treated as an unsigned integer.
When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered. High-order bits emptied by shifting are cleared. When the shift value is greater than 63, the destination is cleared. There are legacy and extended forms of the instruction: PSRLQ There are two forms of the instruction, based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSRLQ The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding There are two 128-bit encodings. These differ based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding There are two 256-bit encodings. These differ based on the type of count operand. The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Instruction Support Form Subset Feature Flag PSRLQ SSE2 VPSRLQ 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPSRLQ 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 474 PSRLQ, VPSRLQ Instruction Reference 26568—Rev. 
3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic PSRLQ xmm1, xmm2/mem128 PSRLQ xmm, imm8 Opcode 66 0F D3 /r 66 0F 73 /2 ib Description Right-shifts packed quadwords in xmm1 as specified by xmm2[63:0] or mem128[63:0]. Right-shifts packed quadwords in xmm as specified by imm8. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSRLQ xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 D3 /r VPSRLQ xmm1, xmm2, imm8 C4 RXB.01 X.dest.0.01 73 /2 ib VPSRLQ ymm1, ymm2, xmm3/mem128 C4 RXB.01 X.src1.1.01 D3 /r VPSRLQ ymm1, ymm2, imm8 C4 RXB.01 X.dest.1.01 73 /2 ib Related Instructions (V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ rFLAGS Affected None MXCSR Flags Affected None Instruction Reference PSRLQ, VPSRLQ 475 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X A A A S Alignment check, #AC A Page fault, #PF X — AVX, AVX2, and SSE exception A — AVX and AVX2 exception S — SSE exception 476 A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. When alignment checking enabled: • 128-bit memory operand not 16-byte aligned. • 256-bit memory operand not 32-byte aligned. Instruction execution caused a page fault. 
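The feature flags and the OSXSAVE and XFEATURE_ENABLED_MASK conditions listed for these instructions can be combined into one support check. The sketch below only decodes register values that the caller has already obtained; it does not execute CPUID or XGETBV itself. The function name, the parameter names, and the returned dictionary are assumptions for illustration. The bit position for OSXSAVE (ECX bit 27) is not given in the text above and is supplied here from the CPUID definition. The CR4.OSFXSR condition for the legacy form is not visible through CPUID and is left out.

```python
SSE2_BIT = 26      # CPUID Fn0000_0001_EDX[SSE2]
OSXSAVE_BIT = 27   # CPUID Fn0000_0001_ECX[OSXSAVE]
AVX_BIT = 28       # CPUID Fn0000_0001_ECX[AVX]
AVX2_BIT = 5       # CPUID Fn0000_0007_EBX[AVX2]_x0


def bit_set(value, index):
    """Return True when bit `index` of `value` is 1."""
    return ((value >> index) & 1) == 1


def supported_forms(fn1_ecx, fn1_edx, fn7_ebx, xfeature_enabled_mask):
    """Hypothetical helper: decide which instruction forms are usable.

    fn1_ecx, fn1_edx      -- ECX and EDX returned by CPUID Fn0000_0001
    fn7_ebx               -- EBX returned by CPUID Fn0000_0007, subfunction 0
    xfeature_enabled_mask -- XFEATURE_ENABLED_MASK value (pass 0 when OSXSAVE
                             is clear, since it cannot be read in that case)
    """
    # VEX-encoded forms raise #UD unless the OS has enabled extended state:
    # OSXSAVE set and XFEATURE_ENABLED_MASK[2:1] equal to 11b.
    os_enabled = bit_set(fn1_ecx, OSXSAVE_BIT) and \
        ((xfeature_enabled_mask >> 1) & 0b11) == 0b11
    return {
        "legacy": bit_set(fn1_edx, SSE2_BIT),
        "vex128": os_enabled and bit_set(fn1_ecx, AVX_BIT),
        "vex256": os_enabled and bit_set(fn7_ebx, AVX2_BIT),
    }


print(supported_forms((1 << 27) | (1 << 28), 1 << 26, 1 << 5, 0b111))
```

A dispatcher would typically call such a check once at start-up and select the widest usable form.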
PSRLQ, VPSRLQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology PSRLW VPSRLW Packed Shift Right Logical Words Right-shifts each packed 16-bit value in the source operand as specified by a shift-count operand and writes the shifted values to the destination. The shift-count operand can be an immediate byte, a second register, or a memory location. The shift count is treated as an unsigned integer. When the shift count is provided by a register or memory location, only bits [63:0] of the value are considered. High-order bits emptied by shifting are cleared. When the shift value is greater than 15, the destination is cleared. There are legacy and extended forms of the instruction: PSRLW There are two forms of the instruction, based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSRLW The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding There are two 128-bit encodings. These differ based on the type of count operand. The first source operand is an XMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is an XMM register. For the immediate operand encoding, the destination is specified by VEX.vvvv. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding There are two 256-bit encodings. These differ based on the type of count operand. The first source operand is a YMM register. The shift count is specified by either a second XMM register or a 128-bit memory location, or by an immediate 8-bit operand. The destination is a YMM register. 
For the immediate operand encoding, the destination is specified by VEX.vvvv. Instruction Support Form Subset Feature Flag PSRLW SSE2 VPSRLW 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPSRLW 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Reference PSRLW, VPSRLW 477 AMD64 Technology 26568—Rev. 3.22—May 2018 Instruction Encoding Mnemonic PSRLW xmm1, xmm2/mem128 PSRLW xmm, imm8 Opcode Description 66 0F D1 /r Right-shifts packed words in xmm1 as specified by xmm2[63:0] or mem128[63:0]. 66 0F 71 /2 ib Right-shifts packed words in xmm as specified by imm8. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSRLW xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 D1 /r VPSRLW xmm1, xmm2, imm8 C4 RXB.01 X.dest.0.01 71 /2 ib VPSRLW ymm1, ymm2, xmm3/mem128 C4 RXB.01 X.src1.1.01 D1 /r VPSRLW ymm1, ymm2, imm8 C4 RXB.01 X.dest.1.01 71 /2 ib Related Instructions (V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ rFLAGS Affected None MXCSR Flags Affected None 478 PSRLW, VPSRLW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X A A A S Alignment check, #AC A Page fault, #PF X — AVX, AVX2, and SSE exception A — AVX and AVX2 exception S — SSE exception Instruction Reference A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. 
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.

PSUBB VPSUBB Packed Subtract Bytes

Subtracts 16 or 32 packed 8-bit integer values in the second source operand from the corresponding values in the first source operand and writes the integer differences to the corresponding bytes of the destination.

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each result are written to the destination.

There are legacy and extended forms of the instruction:

PSUBB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSUBB
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
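The wraparound rule described above (carry ignored, only the low-order bits of each difference kept) can be illustrated with a small software model. This is a minimal sketch, not AMD code: the function names `psub_wrap` and `as_signed` are hypothetical, and operands are assumed to be Python lists of unsigned element values with index 0 holding the least-significant element.

```python
def psub_wrap(src1, src2, width_bits):
    """Element-wise src1 - src2, keeping only the low width_bits of each result."""
    if len(src1) != len(src2):
        raise ValueError("operands must have the same number of elements")
    mask = (1 << width_bits) - 1
    # Masking discards the borrow, which models "the carry is ignored".
    return [(a - b) & mask for a, b in zip(src1, src2)]


def as_signed(value, width_bits):
    """Reinterpret an unsigned element as a two's-complement signed value."""
    sign_bit = 1 << (width_bits - 1)
    return value - (1 << width_bits) if value & sign_bit else value


# Sixteen byte elements, as in the 128-bit form.
a = [0x00, 0x10, 0x80, 0xFF] + [0x05] * 12
b = [0x01, 0x20, 0x01, 0xFF] + [0x03] * 12
diff = psub_wrap(a, b, 8)
print([hex(x) for x in diff[:4]])   # ['0xff', '0xf0', '0x7f', '0x0']
print(as_signed(diff[0], 8))        # -1
```

The same bit pattern serves both interpretations: 0xFF is 255 when read as unsigned and -1 when read as signed, which is why a single instruction covers signed and unsigned operands.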
Instruction Support Form Subset Feature Flag PSUBB SSE2 VPSUBB 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPSUBB 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic PSUBB xmm1, xmm2/mem128 Opcode 66 0F F8 /r Description Subtracts 8-bit signed integer values in xmm2 or mem128 from corresponding values in xmm1. Writes differences to xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSUBB xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 F8 /r VPSUBB ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 F8 /r 480 PSUBB, VPSUBB Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. 
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

PSUBD VPSUBD Packed Subtract Doublewords

Subtracts four or eight packed 32-bit integer values in the second source operand from the corresponding values in the first source operand and writes the integer differences to the corresponding doubleword of the destination.

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each result are written to the destination.

There are legacy and extended forms of the instruction:

PSUBD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSUBD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset   Feature Flag
PSUBD             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBD 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBD 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
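As an illustration of how the feature flags listed above map to instruction forms, the following sketch decodes register values that are assumed to have already been read with the CPUID instruction. It does not execute CPUID itself, and the function and parameter names are hypothetical. It also ignores the operating-system enablement conditions (such as CR4.OSXSAVE) that appear among the exception causes, so a set bit here is necessary but not sufficient for the AVX forms.

```python
SSE2_BIT = 26  # CPUID Fn0000_0001_EDX[SSE2]
AVX_BIT = 28   # CPUID Fn0000_0001_ECX[AVX]
AVX2_BIT = 5   # CPUID Fn0000_0007_EBX[AVX2]_x0


def bit_set(value, bit):
    """Return True when the given bit of a 32-bit register value is 1."""
    return bool((value >> bit) & 1)


def supported_forms(fn1_ecx, fn1_edx, fn7_ebx_x0, mnemonic="PSUBD"):
    """List the forms of `mnemonic` whose feature flag is set.

    The three arguments are the raw register values returned by
    CPUID Fn0000_0001 (ECX, EDX) and CPUID Fn0000_0007 subfunction 0 (EBX).
    """
    forms = []
    if bit_set(fn1_edx, SSE2_BIT):
        forms.append(mnemonic)
    if bit_set(fn1_ecx, AVX_BIT):
        forms.append("V" + mnemonic + " 128-bit")
    if bit_set(fn7_ebx_x0, AVX2_BIT):
        forms.append("V" + mnemonic + " 256-bit")
    return forms


# Made-up register values for a processor reporting SSE2 and AVX but not AVX2.
print(supported_forms(fn1_ecx=1 << 28, fn1_edx=1 << 26, fn7_ebx_x0=0))
```

The same bit positions apply to the other packed-subtract instructions in this section, so the `mnemonic` argument only changes the labels.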
Instruction Encoding

Mnemonic                   Opcode         Description
PSUBD xmm1, xmm2/mem128    66 0F FA /r    Subtracts packed 32-bit integer values in xmm2 or mem128 from corresponding values in xmm1. Writes the differences to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   FA /r
VPSUBD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   FA /r

Related Instructions
(V)PSUBB, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A     -     AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                            -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                            -     -     A     VEX.L = 1 when AVX2 not supported.
                            -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            -     -     X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                            -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF             -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PSUBQ VPSUBQ Packed Subtract Quadword

Subtracts two or four packed 64-bit integer values in the second source operand from the corresponding values in the first source operand and writes the differences to the corresponding quadword of the destination.

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each result are written to the destination.

There are legacy and extended forms of the instruction:

PSUBQ
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSUBQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset   Feature Flag
PSUBQ             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBQ 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBQ 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                   Opcode         Description
PSUBQ xmm1, xmm2/mem128    66 0F FB /r    Subtracts packed 64-bit integer values in xmm2 or mem128 from corresponding values in xmm1. Writes the differences to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBQ xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   FB /r
VPSUBQ ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   FB /r

Related Instructions
(V)PSUBB, (V)PSUBD, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A     -     AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                            -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                            -     -     A     VEX.L = 1 when AVX2 not supported.
                            -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            -     -     X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                            -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF             -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PSUBSB VPSUBSB Packed Subtract Signed With Saturation Bytes

Subtracts 16 or 32 packed 8-bit signed integer values in the second source operand from the corresponding values in the first source operand and writes the signed integer differences to the corresponding byte of the destination.
For each packed value in the destination, if the value is larger than the largest signed 8-bit integer, it is saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 80h. There are legacy and extended forms of the instruction: PSUBSB The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSUBSB The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset Feature Flag PSUBSB SSE2 VPSUBSB 128-bit AVX CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) VPSUBSB 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic PSUBSB xmm1, xmm2/mem128 Opcode Description 66 0F E8 /r Subtracts packed 8-bit signed integer values in xmm2 or mem128 from corresponding values in xmm1. Writes the differences to xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSUBSB xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 E8 /r VPSUBSB ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 E8 /r 486 PSUBSB, VPSUBSB Instruction Reference 26568—Rev. 
3.22—May 2018 AMD64 Technology Related Instructions (V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. PSUBSB, VPSUBSB 487 AMD64 Technology 26568—Rev. 3.22—May 2018 PSUBSW VPSUBSW Packed Subtract Signed With Saturation Words Subtracts eight or sixteen packed 16-bit signed integer values in the second source operand from the corresponding values in the first source operand and writes the signed integer differences to the corresponding word of the destination. Positive differences greater than 7FFFh are saturated to 7FFFh; negative differences less than 8000h are saturated to 8000h. There are legacy and extended forms of the instruction: PSUBSW The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. 
The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSUBSW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form               Subset   Feature Flag
PSUBSW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBSW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBSW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                    Opcode         Description
PSUBSW xmm1, xmm2/mem128    66 0F E9 /r    Subtracts packed 16-bit signed integer values in xmm2 or mem128 from corresponding values in xmm1. Writes the differences to xmm1.

                                   Encoding
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBSW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   E9 /r
VPSUBSW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   E9 /r

Related Instructions
(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBUSB, (V)PSUBUSW, (V)PSUBW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A     -     AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                            -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                            -     -     A     VEX.L = 1 when AVX2 not supported.
                            -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            -     -     X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                            -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF             -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PSUBUSB VPSUBUSB Packed Subtract Unsigned With Saturation Bytes

Subtracts 16 or 32 packed 8-bit unsigned integer values in the second source operand from the corresponding values in the first source operand and writes the unsigned integer differences to the corresponding byte of the destination. Differences less than 00h are saturated to 00h.

There are legacy and extended forms of the instruction:

PSUBUSB
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination.
Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPSUBUSB The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset Feature Flag PSUBUSB SSE2 VPSUBUSB 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPSUBUSB 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) CPUID Fn0000_0001_EDX[SSE2] (bit 26) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic PSUBUSB xmm1, xmm2/mem128 Opcode Description 66 0F D8 /r Subtracts packed byte unsigned integer values in xmm2 or mem128 from corresponding values in xmm1. Writes the differences to xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSUBUSB xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 D8 /r VPSUBUSB ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 D8 /r 490 PSUBUSB, VPSUBUSB Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSW, (V)PSUBW rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. 
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

PSUBUSW VPSUBUSW Packed Subtract Unsigned With Saturation Words

Subtracts eight or sixteen packed 16-bit unsigned integer values in the second source operand from the corresponding values in the first source operand and writes the unsigned integer differences to the corresponding word of the destination. Differences less than 0000h are saturated to 0000h.

There are legacy and extended forms of the instruction:

PSUBUSW
The first source operand is an XMM register and the second source operand is an XMM register or 128-bit memory location. The first source operand is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSUBUSW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location.
The destination is a third YMM register. Instruction Support Form Subset Feature Flag PSUBUSW SSE2 VPSUBUSW 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPSUBUSW 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic PSUBUSW xmm1, xmm2/mem128 Opcode 66 0F D9 /r Description Subtracts packed 16-bit unsigned integer values in xmm2 or mem128 from corresponding values in xmm1. Writes the differences to xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSUBUSW xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 D9 /r VPSUBUSW ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 D9 /r 492 PSUBUSW, VPSUBUSW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBW rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. 
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

PSUBW VPSUBW Packed Subtract Words

Subtracts eight or sixteen packed 16-bit integer values in the second source operand from the corresponding values in the first source operand and writes the integer differences to the corresponding word of the destination.

This instruction operates on both signed and unsigned integers. When a result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of each result are written to the destination.

There are legacy and extended forms of the instruction:

PSUBW
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPSUBW
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form              Subset   Feature Flag
PSUBW             SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPSUBW 128-bit    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPSUBW 256-bit    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
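To make the difference between the wraparound subtraction of (V)PSUBW and the saturating word forms (V)PSUBSW and (V)PSUBUSW described earlier in this section concrete, the sketch below applies all three rules to a single 16-bit element. It is a software model under stated assumptions, not a description of the hardware implementation; the function names are hypothetical, and each argument is a 16-bit pattern given as an unsigned Python integer.

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement signed value."""
    return x - 0x10000 if x & 0x8000 else x


def sub_wrap16(a, b):
    """PSUBW-style element: keep only the low-order 16 bits of a - b."""
    return (a - b) & 0xFFFF


def sub_signed_sat16(a, b):
    """PSUBSW-style element: clamp the signed difference to [8000h, 7FFFh]."""
    d = to_signed16(a) - to_signed16(b)
    d = max(-0x8000, min(0x7FFF, d))
    return d & 0xFFFF


def sub_unsigned_sat16(a, b):
    """PSUBUSW-style element: differences below 0000h become 0000h."""
    return max(0, a - b)


for a, b in [(0x8000, 0x0001), (0x0001, 0x0002), (0x7FFF, 0xFFFF)]:
    print(hex(a), hex(b),
          hex(sub_wrap16(a, b)),
          hex(sub_signed_sat16(a, b)),
          hex(sub_unsigned_sat16(a, b)))
```

For the pair (8000h, 0001h) the wrapped result is 7FFFh, while the signed saturating result stays at 8000h because the true difference is below the smallest signed word.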
Instruction Encoding

Mnemonic                   Opcode         Description
PSUBW xmm1, xmm2/mem128    66 0F F9 /r    Subtracts packed 16-bit integer values in xmm2 or mem128 from corresponding values in xmm1. Writes the differences to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSUBW xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   F9 /r
VPSUBW ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   F9 /r

Related Instructions
(V)PSUBB, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PSUBUSB, (V)PSUBUSW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A     -     AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                            -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                            -     -     A     VEX.L = 1 when AVX2 not supported.
                            -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            -     -     X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                            -     -     A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF             -     S     X     Instruction execution caused a page fault.

X: SSE, AVX, and AVX2 exception
A: AVX, AVX2 exception
S: SSE exception

PTEST VPTEST Packed Bit Test

First, performs a bitwise AND of the first source operand with the second source operand. Sets rFLAGS.ZF when every bit of the result is 0; else, clears ZF.

Second, performs a bitwise AND of the second source operand with the logical complement (NOT) of the first source operand. Sets rFLAGS.CF when every bit of the result is 0; else, clears CF.

Neither source operand is modified.

There are legacy and extended forms of the instruction:

PTEST
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location.

VPTEST
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location.

YMM Encoding
The first source operand is a YMM register. The second source operand is a YMM register or 256-bit memory location.

Instruction Support

Form      Subset   Feature Flag
PTEST     SSE4.1   CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VPTEST    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                   Opcode            Description
PTEST xmm1, xmm2/mem128    66 0F 38 17 /r    Set ZF if bitwise AND of xmm2/m128 with xmm1 = 0; else, clear ZF. Set CF if bitwise AND of xmm2/m128 with NOT xmm1 = 0; else, clear CF.

                            Encoding
Mnemonic                    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPTEST xmm1, xmm2/mem128    C4    RXB.00010        X.1111.0.01   17 /r
VPTEST ymm1, ymm2/mem256    C4    RXB.00010        X.1111.1.01   17 /r

Related Instructions
VTESTPD, VTESTPS

rFLAGS Affected

Flag    ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
Value                                             0                   0    M    0    0    M
Bit     21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3 and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. Undefined flags are U.
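The two-step flag computation described for PTEST can be sketched as follows. This is an illustrative model only: operands are assumed to be Python integers holding the full 128-bit or 256-bit value, and the function name `ptest` is hypothetical.

```python
def ptest(src1, src2, width_bits=128):
    """Return (ZF, CF) as PTEST/VPTEST would set them for the given operands."""
    mask = (1 << width_bits) - 1
    src1 &= mask
    src2 &= mask
    zf = int((src1 & src2) == 0)            # step 1: src1 AND src2
    cf = int((~src1 & src2 & mask) == 0)    # step 2: (NOT src1) AND src2
    return zf, cf


print(ptest(0x00, 0x00))   # (1, 1)
print(ptest(0xF0, 0x0F))   # (1, 0): no common bits, src2 has bits outside src1
print(ptest(0xFF, 0x0F))   # (0, 1): common bits, every src2 bit is also in src1
print(ptest(0x0F, 0xF1))   # (0, 0)
```

Two consequences follow directly from the definitions: testing a register against itself sets ZF exactly when the register is all zeros, and CF reports whether every bit set in the second operand is also set in the first.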
MXCSR Flags Affected
None

Exceptions

                            Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD         X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A     -     AVX instructions are only recognized in protected mode.
                            S     S     S     CR0.EM = 1.
                            S     S     S     CR4.OSFXSR = 0.
                            -     -     A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            -     -     A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                            -     -     A     VEX.vvvv != 1111b.
                            -     -     A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                            S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM   S     S     X     CR0.TS = 1.
Stack, #SS                  S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     S     S     X     Memory address exceeding data segment limit or non-canonical.
                            -     -     X     Null data segment used to reference memory.
Alignment check, #AC        S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                            -     -     A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF             -     S     X     Instruction execution caused a page fault.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

PUNPCKHBW VPUNPCKHBW Unpack and Interleave High Bytes

Unpacks the 8 high-order bytes of each octword of the first and second source operands and interleaves the bytes as they are copied to the destination. The low-order bytes of each octword of the source operands are ignored.

Bytes are interleaved in ascending order from the least-significant byte of the upper 8 bytes of each octword of the source operands, with bytes from the first source operand occupying the lower byte of each pair copied to the destination.
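The interleave order described above can be modeled in a few lines. This is a minimal sketch under stated assumptions: each operand is a Python `bytes` object of length 16 (XMM) or 32 (YMM) in little-endian order, so index 0 corresponds to bits [7:0], and the function name `punpckhbw` is hypothetical.

```python
def punpckhbw(src1: bytes, src2: bytes) -> bytes:
    """Interleave the upper 8 bytes of each 16-byte octword of src1 and src2."""
    if len(src1) != len(src2) or len(src1) not in (16, 32):
        raise ValueError("operands must both be 16 or 32 bytes long")
    dest = bytearray()
    for lane in range(0, len(src1), 16):      # one pass per octword
        for i in range(lane + 8, lane + 16):  # upper 8 bytes of the octword
            dest.append(src1[i])              # src1 byte takes the lower slot
            dest.append(src2[i])              # src2 byte takes the upper slot
    return bytes(dest)


a = bytes(range(16))          # bytes 0..15
b = bytes(range(100, 116))    # bytes 100..115
print(list(punpckhbw(a, b)))
# [8, 108, 9, 109, 10, 110, 11, 111, 12, 112, 13, 113, 14, 114, 15, 115]
```

Passing an all-zero second operand leaves each upper source byte followed by a zero byte, which is the byte-to-word zero extension this instruction is commonly used for.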
For the 128-bit form of the instruction, the following operations are performed:

dest[7:0] = src1[71:64]          dest[15:8] = src2[71:64]
dest[23:16] = src1[79:72]        dest[31:24] = src2[79:72]
dest[39:32] = src1[87:80]        dest[47:40] = src2[87:80]
dest[55:48] = src1[95:88]        dest[63:56] = src2[95:88]
dest[71:64] = src1[103:96]       dest[79:72] = src2[103:96]
dest[87:80] = src1[111:104]      dest[95:88] = src2[111:104]
dest[103:96] = src1[119:112]     dest[111:104] = src2[119:112]
dest[119:112] = src1[127:120]    dest[127:120] = src2[127:120]

Additionally, for the 256-bit form of the instruction, the following operations are performed:

dest[135:128] = src1[199:192]    dest[143:136] = src2[199:192]
dest[151:144] = src1[207:200]    dest[159:152] = src2[207:200]
dest[167:160] = src1[215:208]    dest[175:168] = src2[215:208]
dest[183:176] = src1[223:216]    dest[191:184] = src2[223:216]
dest[199:192] = src1[231:224]    dest[207:200] = src2[231:224]
dest[215:208] = src1[239:232]    dest[223:216] = src2[239:232]
dest[231:224] = src1[247:240]    dest[239:232] = src2[247:240]
dest[247:240] = src1[255:248]    dest[255:248] = src2[255:248]

When the second source operand is all 0s, the destination effectively contains the 8 high-order bytes from the first source operand or the 8 high-order bytes from both octwords of the first source operand, each zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned 16-bit operands for subsequent processing that requires higher precision.

There are legacy and extended forms of the instruction:

PUNPCKHBW
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPUNPCKHBW
The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset Feature Flag PUNPCKHBW SSE2 VPUNPCKHBW 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPUNPCKHBW 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) CPUID Fn0000_0001_EDX[SSE2] (bit 26) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode PUNPCKHBW xmm1, xmm2/mem128 66 0F 68 /r Description Unpacks and interleaves the high-order bytes of xmm1 and xmm2 or mem128. Writes the bytes to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPUNPCKHBW xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 68 /r VPUNPCKHBW ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 68 /r Related Instructions (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, (V)PUNPCKLQDQ, (V)PUNPCKLWD rFLAGS Affected None Instruction Reference PUNPCKHBW, VPUNPCKHBW 499 AMD64 Technology 26568—Rev. 3.22—May 2018 MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception 500 X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. 
VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. PUNPCKHBW, VPUNPCKHBW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology PUNPCKHDQ VPUNPCKHDQ Unpack and Interleave High Doublewords Unpacks the two high-order doublewords of each octword of the first and second source operands and interleaves the doublewords as they are copied to the destination. The low-order doublewords of each octword of the source operands are ignored. Doublewords are interleaved in ascending order from the least-significant doubleword of the high quadword of each octword with doublewords from the first source operand occupying the lower doubleword of each pair copied to the destination. For the 128-bit form of the instruction, the following operations are performed: dest[31:0] = src1[95:64] dest[63:32] = src2[95:64] dest[95:64] = src1[127:96] dest[127:96] = src2[127:96] Additionally, for the 256-bit form of the instruction, the following operations are performed: dest[159:128] = src1[223:192] dest[191:160] = src2[223:192] dest[223:192] = src1[255:224] dest[255:224] = src2[255:224] When the second source operand is all 0s, the destination effectively receives the 2 high-order doublewords from the first source operand or the 2 high-order doublewords from both octwords of the first source operand zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit values to unsigned 64-bit operands for subsequent processing that requires higher precision. 
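The 256-bit operation list above repeats the 128-bit pattern once per octword, so no doubleword crosses the 128-bit boundary. The following Python sketch mirrors the manual's inclusive bit-range notation on arbitrary-precision integers; the helper names `bits`, `vpunpckhdq_256`, and `pack_dwords` are hypothetical and exist only for this illustration.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return value[hi:lo] using the manual's inclusive bit-range notation."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def vpunpckhdq_256(src1: int, src2: int) -> int:
    """Model of the 256-bit VPUNPCKHDQ; operands are 256-bit integers."""
    dest = 0
    for lane in (0, 128):       # each octword is processed independently
        dest |= bits(src1, lane + 95, lane + 64) << lane
        dest |= bits(src2, lane + 95, lane + 64) << (lane + 32)
        dest |= bits(src1, lane + 127, lane + 96) << (lane + 64)
        dest |= bits(src2, lane + 127, lane + 96) << (lane + 96)
    return dest


def pack_dwords(dwords) -> int:
    """Build an integer whose doubleword i (bits [32i+31:32i]) is dwords[i]."""
    return sum(d << (32 * i) for i, d in enumerate(dwords))


if __name__ == "__main__":
    s1 = pack_dwords([0xA0 + i for i in range(8)])
    s2 = pack_dwords([0xB0 + i for i in range(8)])
    result = vpunpckhdq_256(s1, s2)
    print([hex(bits(result, 32 * i + 31, 32 * i)) for i in range(8)])
```

With `lane = 128` the four statements reduce to the four 256-bit assignments listed above, for example dest[159:128] = src1[223:192].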
There are legacy and extended forms of the instruction: PUNPCKHDQ The first source operand is an XMM register and the second source operand is an XMM register or 128-bit memory location. The first source operand is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPUNPCKHDQ The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Reference PUNPCKHDQ, VPUNPCKHDQ 501 AMD64 Technology 26568—Rev. 3.22—May 2018 Instruction Support Form Subset Feature Flag PUNPCKHDQ SSE2 VPUNPCKHDQ 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPUNPCKHDQ 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic PUNPCKHDQ xmm1, xmm2/mem128 Opcode Description 66 0F 6A /r Unpacks and interleaves the high-order doublewords of xmm1 and xmm2 or mem128. Writes the doublewords to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPUNPCKHDQ xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 6A /r VPUNPCKHDQ ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 6A /r Related Instructions (V)PUNPCKHBW, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, (V)PUNPCKLQDQ, (V)PUNPCKLWD rFLAGS Affected None MXCSR Flags Affected None 502 PUNPCKHDQ, VPUNPCKHDQ Instruction Reference 26568—Rev. 
Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PUNPCKHQDQ VPUNPCKHQDQ
Unpack and Interleave High Quadwords

Unpacks the high-order quadword of each octword of the first and second source operands and interleaves the quadwords as they are copied to the destination. The low-order quadword of each octword of the source operands is ignored. Quadwords are interleaved in ascending order, with the high-order quadword of the first source operand (or of each octword of the first source operand, for the 256-bit form) occupying the lower quadword of the corresponding octword of the destination.
For the 128-bit form of the instruction, the following operations are performed:
dest[63:0] = src1[127:64]
dest[127:64] = src2[127:64]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[191:128] = src1[255:192]
dest[255:192] = src2[255:192]

When the second source operand is all 0s, the destination effectively receives the quadword from the upper half of the first source operand, or the high-order quadword from each octword of the first source operand, zero-extended to 128 bits. This operation is useful for expanding unsigned 64-bit values to unsigned 128-bit operands for subsequent processing that requires higher precision.

There are legacy and extended forms of the instruction:

PUNPCKHQDQ
The first source operand is an XMM register and the second source operand is an XMM register or 128-bit memory location. The first source operand is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPUNPCKHQDQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form                  Subset  Feature Flag
PUNPCKHQDQ            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKHQDQ 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKHQDQ 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
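As a rough illustration of how the feature flags in the Instruction Support table map to instruction forms, the sketch below tests the three bit positions on raw register values. It does not execute CPUID itself: the arguments `fn1_edx`, `fn1_ecx`, and `fn7_ebx` are hypothetical inputs assumed to hold the registers returned by CPUID Fn0000_0001 and Fn0000_0007 (subfunction 0), and the function name is invented for the example.

```python
# Bit positions taken from the Instruction Support table.
SSE2_BIT = 26   # CPUID Fn0000_0001_EDX[SSE2]
AVX_BIT = 28    # CPUID Fn0000_0001_ECX[AVX]
AVX2_BIT = 5    # CPUID Fn0000_0007_EBX[AVX2]_x0


def supported_forms(fn1_edx: int, fn1_ecx: int, fn7_ebx: int) -> dict:
    """Map each instruction form to True/False from raw CPUID register values.

    The three arguments are assumed to have been captured by the caller;
    this function only decodes the feature bits.
    """
    return {
        "legacy SSE2 form": bool((fn1_edx >> SSE2_BIT) & 1),
        "VEX 128-bit form": bool((fn1_ecx >> AVX_BIT) & 1),
        "VEX 256-bit form": bool((fn7_ebx >> AVX2_BIT) & 1),
    }


if __name__ == "__main__":
    # Example register values with only the SSE2 and AVX bits set.
    print(supported_forms(1 << 26, 1 << 28, 0))
```

The feature bits alone are not sufficient for the VEX-encoded forms: the exception tables in this section also list CR4.OSXSAVE = 0 and XFEATURE_ENABLED_MASK[2:1] != 11b as causes of #UD, and this sketch does not check those conditions.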
Instruction Encoding
Mnemonic                       Opcode       Description
PUNPCKHQDQ xmm1, xmm2/mem128   66 0F 6D /r  Unpacks and interleaves the high-order quadwords of xmm1 and xmm2 or mem128. Writes the quadwords to xmm1.

                                      Encoding
Mnemonic                              VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPUNPCKHQDQ xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  6D /r
VPUNPCKHQDQ ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  6D /r

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, (V)PUNPCKLQDQ, (V)PUNPCKLWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.
3.22—May 2018 AMD64 Technology PUNPCKHWD VPUNPCKHWD Unpack and Interleave High Words Unpacks the 4 high-order words of each octword of the first and second source operands and interleaves the words as they are copied to the destination. The low-order words of each octword of the source operands are ignored. Words are interleaved in ascending order from the least-significant word of the high quadword of each octword with words from the first source operand occupying the lower word of each pair copied to the destination. For the 128-bit form of the instruction, the following operations are performed: dest[15:0] = src1[79:64] dest[31:16] = src2[79:64] dest[47:32] = src1[95:80] dest[63:48] = src2[95:80] dest[79:64] = src1[111:96] dest[95:80] = src2[111:96] dest[111:96] = src1[127:112] dest[127:112] = src2[127:112] Additionally, for the 256-bit form of the instruction, the following operations are performed: dest[143:128] = src1[207:192] dest[159:144] = src2[207:192] dest[175:160] = src1[223:208] dest[191:176] = src2[223:208] dest[207:192] = src1[239:224] dest[223:208] = src2[239:224] dest[239:224] = src1[255:240] dest[255:240] = src2[255:240] When the second source operand is all 0s, the destination effectively receives the 4 high-order words from the first source operand or the 4 high-order words from both octwords of the first source operand zero-extended to 32 bits. This operation is useful for expanding unsigned 16-bit values to unsigned 32-bit operands for subsequent processing that requires higher precision. There are legacy and extended forms of the instruction: PUNPCKHWD The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPUNPCKHWD The extended form of the instruction has 128-bit and 256-bit encodings. 
XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Instruction Reference PUNPCKHWD, VPUNPCKHWD 507 AMD64 Technology 26568—Rev. 3.22—May 2018 YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset PUNPCKHWD SSE2 Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPUNPCKHWD 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPUNPCKHWD 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode PUNPCKHWD xmm1, xmm2/mem128 66 0F 69 /r Description Unpacks and interleaves the high-order words of xmm1 and xmm2 or mem128. Writes the words to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPUNPCKHWD xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 69 /r VPUNPCKHWD ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 69 /r Related Instructions (V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKLBW, (V)PUNPCKLDQ, (V)PUNPCKLQDQ, (V)PUNPCKLWD rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S Invalid opcode, #UD 508 X S S A A A A X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. PUNPCKHWD, VPUNPCKHWD Instruction Reference 26568—Rev. 
Exceptions
Exception Device not available, #NM Stack, #SS General protection, #GP Mode Real Virt Prot S S S S S S X X X X S S S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception X Cause of Exception CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PUNPCKLBW VPUNPCKLBW
Unpack and Interleave Low Bytes

Unpacks the 8 low-order bytes of each octword of the first and second source operands and interleaves the bytes as they are copied to the destination. The high-order bytes of each octword of the source operands are ignored. Bytes are interleaved in ascending order from the least-significant byte of the source operands, with bytes from the first source operand occupying the lower byte of each pair copied to the destination.
For the 128-bit form of the instruction, the following operations are performed: dest[7:0] = src1[7:0] dest[15:8] = src2[7:0] dest[23:16] = src1[15:8] dest[31:24] = src2[15:8] dest[39:32] = src1[23:16] dest[47:40] = src2[23:16] dest[55:48] = src1[31:24] dest[63:56] = src2[31:24] dest[71:64] = src1[39:32] dest[79:72] = src2[39:32] dest[87:80] = src1[47:40] dest[95:88] = src2[47:40] dest[103:96] = src1[55:48] dest[111:104] = src2[55:48] dest[119:112] = src1[63:56] dest[127:120] = src2[63:56] Additionally, for the 256-bit form of the instruction, the following operations are performed: dest[135:128] = src1[135:128] dest[143:136] = src2[135:128] dest[151:144] = src1[143:136] dest[159:152] = src2[143:136] dest[167:160] = src1[151:144] dest[175:168] = src2[151:144] dest[183:176] = src1[159:152] dest[191:184] = src2[159:152] dest[199:192] = src1[167:160] dest[207:200] = src2[167:160] dest[215:208] = src1[175:168] dest[223:216] = src2[175:168] dest[231:224] = src1[183:176] dest[239:232] = src2[183:176] dest[247:240] = src1[191:184] dest[255:248] = src2[191:184] When the second source operand is all 0s, the destination effectively receives the eight low-order bytes from the first source operand or the eight low-order bytes from both octwords of the first source operand zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned 16-bit operands for subsequent processing that requires higher precision. 510 PUNPCKLBW, VPUNPCKLBW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology There are legacy and extended forms of the instruction: PUNPCKLBW The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPUNPCKLBW The extended form of the instruction has 128-bit and 256-bit encodings. 
XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset Feature Flag PUNPCKLBW SSE2 VPUNPCKLBW 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPUNPCKLBW 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) CPUID Fn0000_0001_EDX[SSE2] (bit 26) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode PUNPCKLBW xmm1, xmm2/mem128 Description 66 0F 60 /r Unpacks and interleaves the low-order bytes of xmm1 and xmm2 or mem128. Writes the bytes to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPUNPCKLBW xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 60 /r VPUNPCKLBW ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 60 /r Related Instructions (V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLDQ, (V)PUNPCKLQDQ, (V)PUNPCKLWD rFLAGS Affected None MXCSR Flags Affected None Instruction Reference PUNPCKLBW, VPUNPCKLBW 511 AMD64 Technology 26568—Rev. 3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception 512 X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. 
VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.

PUNPCKLDQ VPUNPCKLDQ
Unpack and Interleave Low Doublewords

Unpacks the two low-order doublewords of each octword of the first and second source operands and interleaves the doublewords as they are copied to the destination. The high-order doublewords of each octword of the source operands are ignored. Doublewords are interleaved in ascending order from the least-significant doubleword of the sources, with doublewords from the first source operand occupying the lower doubleword of each pair copied to the destination.

For the 128-bit form of the instruction, the following operations are performed:
dest[31:0] = src1[31:0]
dest[63:32] = src2[31:0]
dest[95:64] = src1[63:32]
dest[127:96] = src2[63:32]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[159:128] = src1[159:128]
dest[191:160] = src2[159:128]
dest[223:192] = src1[191:160]
dest[255:224] = src2[191:160]

When the second source operand is all 0s, the destination effectively receives the two low-order doublewords from the first source operand, or the two low-order doublewords from both octwords of the first source operand, zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit values to unsigned 64-bit operands for subsequent processing that requires higher precision.
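The zero-extension use described above (second source operand all 0s) can be checked with a small model. This is a sketch under stated assumptions rather than the instruction itself: operands are represented as 16-byte little-endian `bytes` objects, and the function names `punpckldq_128` and `zero_extend_low_dwords` are hypothetical.

```python
import struct


def punpckldq_128(src1: bytes, src2: bytes) -> bytes:
    """Model of 128-bit PUNPCKLDQ on 16-byte little-endian operands."""
    a = struct.unpack("<4I", src1)      # four doublewords, lowest first
    b = struct.unpack("<4I", src2)
    return struct.pack("<4I", a[0], b[0], a[1], b[1])


def zero_extend_low_dwords(src: bytes) -> tuple:
    """Widen the two low doublewords of src to unsigned 64-bit values."""
    interleaved = punpckldq_128(src, bytes(16))    # second source is all 0s
    return struct.unpack("<2Q", interleaved)       # reinterpret as two quadwords


if __name__ == "__main__":
    src = struct.pack("<4I", 0xFFFFFFFF, 0x80000000, 1, 2)
    print([hex(q) for q in zero_extend_low_dwords(src)])
```

Because the zero doubleword from the second source lands in the upper half of each 64-bit pair, reading the result as quadwords yields the original 32-bit values unchanged, including values with bit 31 set.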
There are legacy and extended forms of the instruction:

PUNPCKLDQ
The first source operand is an XMM register and the second source operand is an XMM register or 128-bit memory location. The first source operand is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPUNPCKLDQ
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form                 Subset  Feature Flag
PUNPCKLDQ            SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VPUNPCKLDQ 128-bit   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
VPUNPCKLDQ 256-bit   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                      Opcode       Description
PUNPCKLDQ xmm1, xmm2/mem128   66 0F 62 /r  Unpacks and interleaves the low-order doublewords of xmm1 and xmm2 or mem128. Writes the doublewords to xmm1.

                                     Encoding
Mnemonic                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPUNPCKLDQ xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  62 /r
VPUNPCKLDQ ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  62 /r

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLQDQ, (V)PUNPCKLWD

rFLAGS Affected
None

MXCSR Flags Affected
None
3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. PUNPCKLDQ, VPUNPCKLDQ 515 AMD64 Technology 26568—Rev. 3.22—May 2018 PUNPCKLQDQ VPUNPCKLQDQ Unpack and Interleave Low Quadwords Unpacks the low-order quadword of each octword of the first and second source operands and interleaves the quadwords as they are copied to the destination. The high-order quadword of each octword of the source operands is ignored. Quadwords are interleaved in ascending order from the least-significant quadword of the sources with quadwords from the first source operand occupying the lower quadword of each pair copied to the destination. 
For the 128-bit form of the instruction, the following operations are performed: dest[63:0] = src1[63:0] dest[127:64] = src2[63:0] Additionally, for the 256-bit form of the instruction, the following operations are performed: dest[191:128] = src1[191:128] dest[255:192] = src2[191:128] When the second source operand is all 0s, the destination effectively receives the low-order quadword from the first source operand or the low-order quadword of both octwords of the first source operand zero-extended to 128 bits. This operation is useful for expanding unsigned 64-bit values to unsigned 128-bit operands for subsequent processing that requires higher precision. There are legacy and extended forms of the instruction: PUNPCKLQDQ The first source operand is an XMM register and the second source operand is an XMM register or 128-bit memory location. The first source operand is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VPUNPCKLQDQ The extended form of the instruction has 128-bit and 256-bit encodings. XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset PUNPCKLQDQ SSE2 VPUNPCKLQDQ 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPUNPCKLQDQ 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) 516 Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) PUNPCKLQDQ, VPUNPCKLQDQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 
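The legacy and XMM-encoded forms described above perform the same interleave but treat bits [255:128] of the destination YMM register differently. The sketch below contrasts the two, modeling registers as Python integers; the function names and the idea of passing the full 256-bit destination value to the legacy model are assumptions made for this illustration.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def unpack_low_qwords(src1: int, src2: int) -> int:
    """dest[63:0] = src1[63:0]; dest[127:64] = src2[63:0]."""
    return (src1 & MASK64) | ((src2 & MASK64) << 64)


def punpcklqdq_legacy(ymm_dest: int, xmm_src2: int) -> int:
    """Legacy form: the destination is also the first source.

    Returns the new 256-bit register value; bits [255:128] are not affected.
    """
    low = unpack_low_qwords(ymm_dest & MASK128, xmm_src2)
    return (ymm_dest & ~MASK128) | low


def vpunpcklqdq_xmm(xmm_src1: int, xmm_src2: int) -> int:
    """VEX 128-bit form: separate destination.

    Returns the new 256-bit register value; bits [255:128] are cleared.
    """
    return unpack_low_qwords(xmm_src1, xmm_src2)


if __name__ == "__main__":
    ymm = (0xDEAD << 128) | (0x1111 << 64) | 0x2222
    src2 = (0x3333 << 64) | 0x4444
    print(hex(punpcklqdq_legacy(ymm, src2)))
    print(hex(vpunpcklqdq_xmm(ymm & MASK128, src2)))
```

In the legacy call the upper 128 bits of the destination survive, while the VEX-encoded call returns a value whose upper half is zero.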
Instruction Encoding
Mnemonic                       Opcode       Description
PUNPCKLQDQ xmm1, xmm2/mem128   66 0F 6C /r  Unpacks and interleaves the low-order quadwords of xmm1 and xmm2 or mem128. Writes the quadwords to xmm1.

                                      Encoding
Mnemonic                              VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPUNPCKLQDQ xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.01  6C /r
VPUNPCKLQDQ ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.01  6C /r

Related Instructions
(V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, (V)PUNPCKLWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.
PUNPCKLWD VPUNPCKLWD
Unpack and Interleave Low Words

Unpacks the four low-order words of each octword of the first and second source operands and interleaves the words as they are copied to the destination. The high-order words of each octword of the source operands are ignored. Words are interleaved in ascending order from the least-significant word of the source operands, with words from the first source operand occupying the lower word of each pair copied to the destination.

For the 128-bit form of the instruction, the following operations are performed:
dest[15:0] = src1[15:0]
dest[31:16] = src2[15:0]
dest[47:32] = src1[31:16]
dest[63:48] = src2[31:16]
dest[79:64] = src1[47:32]
dest[95:80] = src2[47:32]
dest[111:96] = src1[63:48]
dest[127:112] = src2[63:48]

Additionally, for the 256-bit form of the instruction, the following operations are performed:
dest[143:128] = src1[143:128]
dest[159:144] = src2[143:128]
dest[175:160] = src1[159:144]
dest[191:176] = src2[159:144]
dest[207:192] = src1[175:160]
dest[223:208] = src2[175:160]
dest[239:224] = src1[191:176]
dest[255:240] = src2[191:176]

When the second source operand is all 0s, the destination effectively receives the 4 low-order words from the first source operand, or the 4 low-order words of each octword of the first source operand, zero-extended to 32 bits. This operation is useful for expanding unsigned 16-bit values to unsigned 32-bit operands for subsequent processing that requires higher precision.

There are legacy and extended forms of the instruction:

PUNPCKLWD
The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source operand is also the destination register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VPUNPCKLWD
The extended form of the instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register.
The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Instruction Reference PUNPCKLWD, VPUNPCKLWD 519 AMD64 Technology 26568—Rev. 3.22—May 2018 YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset PUNPCKLWD SSE2 Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) VPUNPCKLWD 128-bit AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VPUNPCKLWD 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic PUNPCKLWD xmm1, xmm2/mem128 Opcode Description 66 0F 61 /r Unpacks and interleaves the low-order words of xmm1 and xmm2 or mem128. Writes the words to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPUNPCKLWD xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 61 /r VPUNPCKLWD ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 61 /r Related Instructions (V)PUNPCKHBW, (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKHWD, (V)PUNPCKLBW, (V)PUNPCKLDQ, (V)PUNPCKLQDQ rFLAGS Affected None MXCSR Flags Affected None 520 PUNPCKLWD, VPUNPCKLWD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. 
XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault.
PXOR VPXOR Packed Exclusive OR
Performs a bitwise XOR of the first and second source operands and writes the result to the destination. When exactly one of a pair of corresponding bits in the first and second source operands is set, the corresponding bit of the destination is set; when both source bits are set or both are clear, the destination bit is cleared.
There are legacy and extended forms of the instruction:
PXOR The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source XMM register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VPXOR The extended form of the instruction has 128-bit and 256-bit encodings.
XMM Encoding The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
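As an illustration of the three forms described above, the following Python sketch models them on plain integers. The function names (pxor, vpxor_xmm, vpxor_ymm) and the representation of a YMM register as a 256-bit Python integer are assumptions made for this example; they are not part of the architecture definition.

```python
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1

def pxor(dest_ymm, src):
    """Legacy PXOR model: XOR the low 128 bits in place.

    Bits [255:128] of the destination YMM register are not affected.
    """
    upper = dest_ymm & MASK256 & ~MASK128   # preserved upper octword
    lower = (dest_ymm ^ src) & MASK128      # XOR of the low octwords
    return upper | lower

def vpxor_xmm(src1, src2):
    """VPXOR XMM-encoding model: bits [255:128] of the result are cleared."""
    return (src1 ^ src2) & MASK128

def vpxor_ymm(src1, src2):
    """VPXOR YMM-encoding model: XOR across all 256 bits."""
    return (src1 ^ src2) & MASK256
```

In this model, XORing a register with itself yields zero in every form, which is the property behind the common register-clearing idiom.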
Instruction Support Form Subset Feature Flag PXOR SSE2 VPXOR 128-bit AVX CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) VPXOR 256-bit AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic PXOR xmm1, xmm2/mem128 Opcode Description 66 0F EF /r Performs bitwise XOR of values in xmm1 and xmm2 or mem128. Writes the result to xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPXOR xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 EF /r VPXOR ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 EF /r Related Instructions (V)PAND, (V)PANDN, (V)POR 522 PXOR, VPXOR Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X X S Alignment check, #AC A Page fault, #PF S X — SSE, AVX, and AVX2 exception A — AVX, AVX2 exception S — SSE exception Instruction Reference X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. 
RCPPS VRCPPS Reciprocal Packed Single-Precision Floating-Point
Computes the approximate reciprocal of each packed single-precision floating-point value in the source operand and writes the results to the corresponding doubleword of the destination. MXCSR.RC has no effect on the result. The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal. A source value that is ±zero or denormal returns an infinity of the source value's sign. Results that underflow are changed to signed zero. For both SNaN and QNaN source operands, a QNaN is returned.
There are legacy and extended forms of the instruction:
RCPPS Computes four reciprocals. The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VRCPPS The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding Computes four reciprocals. The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding Computes eight reciprocals. The source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register.
Instruction Support
Form Subset Feature Flag
RCPPS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRCPPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding Mnemonic Opcode Description RCPPS xmm1, xmm2/mem128 0F 53 /r Computes reciprocals of packed single-precision floating-point values in xmm2 or mem128.
Writes result to xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode
VRCPPS xmm1, xmm2/mem128 C4 RXB.01 X.1111.0.00 53 /r
VRCPPS ymm1, ymm2/mem256 C4 RXB.01 X.1111.1.00 53 /r
Related Instructions (V)RCPSS, (V)RSQRTPS, (V)RSQRTSS
rFLAGS Affected None
MXCSR Flags Affected None
Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S A X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception X S S A A A A X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault.
RCPSS VRCPSS Reciprocal Scalar Single-Precision Floating-Point
Computes the approximate reciprocal of the scalar single-precision floating-point value in a source operand and writes the result to the low-order doubleword of the destination. MXCSR.RC has no effect on the result. The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal. A source value that is ±zero or denormal returns an infinity of the source value's sign. Results that underflow are changed to signed zero.
For both SNaN and QNaN source operands, a QNaN is returned.
There are legacy and extended forms of the instruction:
RCPSS The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VRCPSS The extended form of the instruction has a 128-bit encoding only. The first source operand and the destination are XMM registers. The second source operand is either an XMM register or a 32-bit memory location. Bits [31:0] of the destination contain the reciprocal; bits [127:32] of the destination are copied from the first source register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form Subset Feature Flag
RCPSS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRCPSS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic Opcode Description
RCPSS xmm1, xmm2/mem32 F3 0F 53 /r Computes reciprocal of scalar single-precision floating-point value in xmm2 or mem32. Writes the result to xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode
VRCPSS xmm1, xmm2, xmm3/mem32 C4 RXB.01 X.src1.X.10 53 /r
Related Instructions (V)RCPPS, (V)RSQRTPS, (V)RSQRTSS
rFLAGS Affected None
MXCSR Flags Affected None
Exceptions Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — AVX and SSE exception A — AVX exception S — SSE exception X A S S X A S S S S S S S S S S S S X S S A A A X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled.
ROUNDPD VROUNDPD Round Packed Double-Precision Floating-Point
Rounds two or four double-precision floating-point values as specified by an immediate byte operand. Source values are rounded to integral values and written to the destination as double-precision floating-point values. SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before rounding. The immediate byte operand is defined as follows.
Bits Mnemonic Description
[7:4] — Reserved
[3] P Precision Exception
[2] O Rounding Control Source
[1:0] RC Rounding Control
Precision exception definitions:
Value Description
0 Normal PE exception
1 PE field is not updated. No precision exception is taken when unmasked.
Rounding control source definitions:
Value Description
0 Use RC from immediate operand
1 Use RC from MXCSR
Rounding control definition:
Value Description
00 Nearest
01 Downward (toward negative infinity)
10 Upward (toward positive infinity)
11 Truncated
There are legacy and extended forms of the instruction:
ROUNDPD Rounds two source values. The source operand is either an XMM register or a 128-bit memory location. The third operand is an 8-bit immediate. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VROUNDPD The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding Rounds two source values. The source operand is either an XMM register or a 128-bit memory location. The third operand is an 8-bit immediate. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding Rounds four source values. The source operand is either a YMM register or a 256-bit memory location. The third operand is an 8-bit immediate. The destination is a YMM register.
Instruction Support
Form Subset Feature Flag
ROUNDPD SSE4.1 CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDPD AVX CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic Opcode Description
ROUNDPD xmm1, xmm2/mem128, imm8 66 0F 3A 09 /r ib Rounds double-precision floating-point values in xmm2 or mem128. Writes rounded double-precision values to xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode
VROUNDPD xmm1, xmm2/mem128, imm8 C4 RXB.03 X.1111.0.01 09 /r ib
VROUNDPD ymm1, ymm2/mem256, imm8 C4 RXB.03 X.1111.1.01 09 /r ib
Related Instructions (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS
rFLAGS Affected None
MXCSR Flags Affected
Flag: MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE IE
Bit: 17 15 14:13 12 11 10 9 8 7 6 5 4 3 2 1 0
PE (bit 5) and IE (bit 0) are marked M.
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
Instruction Reference ROUNDPD, VROUNDPD 529 AMD64 Technology 26568—Rev.
3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 530 X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format. ROUNDPD, VROUNDPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology ROUNDPS VROUNDPS Round Packed Single-Precision Floating-Point Rounds four or eight single-precision floating-point values as specified by an immediate byte operand. Source values are rounded to integral values and written to the destination as single-precision floating-point values. 
SNaN source values are converted to QNaN. When DAZ = 1, denormals are converted to zero before rounding. The immediate byte operand is defined as follows.
Bits Mnemonic Description
[7:4] — Reserved
[3] P Precision Exception
[2] O Rounding Control Source
[1:0] RC Rounding Control
Precision exception definitions:
Value Description
0 Normal PE exception
1 PE field is not updated. No precision exception is taken when unmasked.
Rounding control source definitions:
Value Description
0 Use RC from immediate operand
1 Use RC from MXCSR
Rounding control definition:
Value Description
00 Nearest
01 Downward (toward negative infinity)
10 Upward (toward positive infinity)
11 Truncated
There are legacy and extended forms of the instruction:
ROUNDPS Rounds four source values. The source operand is either an XMM register or a 128-bit memory location. The third operand is an 8-bit immediate. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VROUNDPS The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding Rounds four source values. The source operand is either an XMM register or a 128-bit memory location. The third operand is an 8-bit immediate. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding Rounds eight source values. The source operand is either a YMM register or a 256-bit memory location. The third operand is an 8-bit immediate. The destination is a YMM register.
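The immediate-byte fields and rounding-control values described above can be sketched as a small reference model. This is a minimal illustration under stated assumptions, not a definitive implementation: the names decode_imm8, round_to_integral, and roundps are hypothetical; Python floats (double precision) stand in for single-precision lanes; "Nearest" is assumed to break ties to even; and SNaN quieting, DAZ handling, and MXCSR.PE updates are not modeled.

```python
import math

def decode_imm8(imm8):
    """Split the immediate byte into its P, O, and RC fields."""
    return {
        "P": (imm8 >> 3) & 1,   # 1 = PE field is not updated
        "O": (imm8 >> 2) & 1,   # 1 = use RC from MXCSR
        "RC": imm8 & 0b11,      # rounding control from the immediate
    }

def round_to_integral(x, rc):
    """Round one value to an integral value under rounding control rc."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x                      # pass through unchanged in this model
    if rc == 0b00:
        r = round(x)                  # Python rounds ties to even
    elif rc == 0b01:
        r = math.floor(x)             # toward negative infinity
    elif rc == 0b10:
        r = math.ceil(x)              # toward positive infinity
    else:
        r = math.trunc(x)             # toward zero
    # copysign keeps the sign of the source when the result is zero
    return math.copysign(float(r), x)

def roundps(src, imm8, mxcsr_rc=0b00):
    """Model of (V)ROUNDPS over a list of lanes (4 for XMM, 8 for YMM)."""
    fields = decode_imm8(imm8)
    rc = mxcsr_rc if fields["O"] else fields["RC"]
    return [round_to_integral(x, rc) for x in src]
```

For example, roundps([1.5, 2.5, -1.5, 0.4], 0b0011) truncates every lane, while passing 0b0100 as the immediate makes the model take the rounding mode from the mxcsr_rc argument instead.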
Instruction Support
Form Subset Feature Flag
ROUNDPS SSE4.1 CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic Opcode Description
ROUNDPS xmm1, xmm2/mem128, imm8 66 0F 3A 08 /r ib Rounds single-precision floating-point values in xmm2 or mem128. Writes rounded single-precision values to xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode
VROUNDPS xmm1, xmm2/mem128, imm8 C4 RXB.03 X.1111.0.01 08 /r ib
VROUNDPS ymm1, ymm2/mem256, imm8 C4 RXB.03 X.1111.1.01 08 /r ib
Related Instructions (V)ROUNDPD, (V)ROUNDSD, (V)ROUNDSS
rFLAGS Affected None
MXCSR Flags Affected
Flag: MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE IE
Bit: 17 15 14:13 12 11 10 9 8 7 6 5 4 3 2 1 0
PE (bit 5) and IE (bit 0) are marked M.
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format. ROUNDPS, VROUNDPS 533 AMD64 Technology 26568—Rev. 3.22—May 2018 ROUNDSD VROUNDSD Round Scalar Double-Precision Rounds a scalar double-precision floating-point value as specified by an immediate byte operand. Source values are rounded to integral values and written to the destination as double-precision floating-point values. SNaN source values are converted to QNaN. When DAZ =1, denormals are converted to zero before rounding. The immediate byte operand is defined as follows. 7 4 3 2 Reserved P O 1 0 RC Bits Mnemonic Description [7:4] — Reserved [3] P Precision Exception [2] O Rounding Control Source [1:0] RC Rounding Control Precision exception definitions: Value Description 0 Normal PE exception 1 PE field is not updated. No precision exception is taken when unmasked. Rounding control source definitions: Value Description 0 Use RC from immediate operand 1 Use RC from MXCSR Rounding control definition: Value Description 00 Nearest 01 Downward (toward negative infinity) 10 Upward (toward positive infinity) 11 Truncated There are legacy and extended forms of the instruction: ROUNDSD The source operand is either an XMM register or a 64-bit memory location. When the source is an XMM register, the value to be rounded must be in the low quadword. The destination is an XMM register. There is a third 8-bit immediate operand. 
Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to destination XMM register are not affected. 534 ROUNDSD, VROUNDSD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology VROUNDSD The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 64-bit memory location. The destination is a third XMM register. There is a fourth 8-bit immediate operand. Bits [127:64] of the destination are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared. Instruction Support Form Subset ROUNDSD SSE4.1 VROUNDSD AVX Feature Flag CPUID Fn0000_0001_ECX[SSE41] (bit 19) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic ROUNDSD xmm1, xmm2/mem64, imm8 Opcode Description 66 0F 3A 0B /r ib Rounds a double-precision floating-point value in xmm2[63:0] or mem64. Writes a rounded double-precision value to xmm1. Mnemonic Encoding VROUNDSD xmm1, xmm2, xmm3/mem64, imm8 VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.03 X.src1.X.01 0B /r ib Related Instructions (V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSS rFLAGS Affected None MXCSR Flags Affected MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE M 17 15 14 13 12 11 10 9 8 7 6 5 IE M 4 3 2 1 0 Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank. Instruction Reference ROUNDSD, VROUNDSD 535 AMD64 Technology 26568—Rev. 
3.22—May 2018 Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 536 X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format. ROUNDSD, VROUNDSD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology ROUNDSS VROUNDSS Round Scalar Single-Precision Rounds a scalar single-precision floating-point value as specified by an immediate byte operand. Source values are rounded to integral values and written to the destination as single-precision floating-point values. SNaN source values are converted to QNaN. When DAZ =1, denormals are converted to zero before rounding. The immediate byte operand is defined as follows. 
Bits Mnemonic Description
[7:4] — Reserved
[3] P Precision Exception
[2] O Rounding Control Source
[1:0] RC Rounding Control
Precision exception definitions:
Value Description
0 Normal PE exception
1 PE field is not updated. No precision exception is taken when unmasked.
Rounding control source definitions:
Value Description
0 Use RC from immediate operand
1 Use RC from MXCSR
Rounding control definition:
Value Description
00 Nearest
01 Downward (toward negative infinity)
10 Upward (toward positive infinity)
11 Truncated
There are legacy and extended forms of the instruction:
ROUNDSS The source operand is either an XMM register or a 32-bit memory location. When the source is an XMM register, the value to be rounded must be in the low doubleword. The destination is an XMM register. There is a third 8-bit immediate operand. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination XMM register are not affected.
VROUNDSS The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register. The second source operand is either an XMM register or a 32-bit memory location. The destination is a third XMM register. There is a fourth 8-bit immediate operand. Bits [127:32] of the destination are copied from the first source operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
Instruction Support
Form Subset Feature Flag
ROUNDSS SSE4.1 CPUID Fn0000_0001_ECX[SSE41] (bit 19)
VROUNDSS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding Mnemonic Opcode Description ROUNDSS xmm1, xmm2/mem32, imm8 66 0F 3A 0A /r ib Rounds a single-precision floating-point value in xmm2[31:0] or mem32.
Writes a rounded single-precision value to xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode
VROUNDSS xmm1, xmm2, xmm3/mem32, imm8 C4 RXB.03 X.src1.X.01 0A /r ib
Related Instructions (V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD
rFLAGS Affected None
MXCSR Flags Affected
Flag: MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE IE
Bit: 17 15 14:13 12 11 10 9 8 7 6 5 4 3 2 1 0
PE (bit 5) and IE (bit 0) are marked M.
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format.
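The difference between the legacy and extended scalar forms lies in how the untouched lanes of the destination are filled. The sketch below models that merge behavior for ROUNDSS and VROUNDSS. It is an illustration under stated assumptions: a YMM register is represented as a list of eight doubleword lanes, the helper names (_round_lane, roundss, vroundss) are hypothetical, Python floats stand in for single-precision values, and exception reporting is not modeled.

```python
import math

# Rounding-control value -> rounding function ("Nearest" assumed ties-to-even)
_ROUND = {0b00: round, 0b01: math.floor, 0b10: math.ceil, 0b11: math.trunc}

def _round_lane(x, imm8, mxcsr_rc=0b00):
    """Round one lane; the O bit (imm8[2]) selects MXCSR.RC over imm8[1:0]."""
    rc = mxcsr_rc if (imm8 >> 2) & 1 else imm8 & 0b11
    if not math.isfinite(x) or x == 0.0:
        return x
    return math.copysign(float(_ROUND[rc](x)), x)

def roundss(dest, src, imm8):
    """Legacy form: only lane 0 is written; lanes 1..7 are not affected."""
    return [_round_lane(src[0], imm8)] + list(dest[1:])

def vroundss(src1, src2, imm8):
    """Extended form: lane 0 from src2, lanes 1..3 from src1, lanes 4..7 cleared."""
    return [_round_lane(src2[0], imm8)] + list(src1[1:4]) + [0.0] * 4
```

With a destination whose lanes hold 0.0 through 7.0 and a source whose low lane holds 2.7, the legacy model changes only lane 0, whereas the extended model also zeroes lanes 4 through 7.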
RSQRTPS VRSQRTPS Reciprocal Square Root Packed Single-Precision Floating-Point
Computes the approximate reciprocal of the square root of each packed single-precision floating-point value in the source operand and writes the results to the corresponding doublewords of the destination. MXCSR.RC has no effect on the result. The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal square root. A source value that is ±zero or denormal returns an infinity of the source value's sign. Negative source values other than –zero and –denormal return a QNaN floating-point indefinite value. For both SNaN and QNaN source operands, a QNaN is returned.
There are legacy and extended forms of the instruction:
RSQRTPS Computes four values. The source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are not affected.
VRSQRTPS The extended form of the instruction has both 128-bit and 256-bit encodings:
XMM Encoding Computes four values. The destination is an XMM register. The source operand is either an XMM register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding Computes eight values. The destination is a YMM register. The source operand is either a YMM register or a 256-bit memory location.
Instruction Support
Form Subset Feature Flag
RSQRTPS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRSQRTPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding Mnemonic Opcode Description RSQRTPS xmm1, xmm2/mem128 0F 52 /r Computes reciprocals of square roots of packed single-precision floating-point values in xmm2 or mem128.
Writes result to xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode
VRSQRTPS xmm1, xmm2/mem128 C4 RXB.01 X.1111.0.00 52 /r
VRSQRTPS ymm1, ymm2/mem256 C4 RXB.01 X.1111.1.00 52 /r
Related Instructions (V)RSQRTSS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSD, (V)SQRTSS
rFLAGS Affected None
MXCSR Flags Affected None
Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S A X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception X S S A A A A X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. VEX.vvvv != 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault.
RSQRTSS VRSQRTSS Reciprocal Square Root Scalar Single-Precision Floating-Point
Computes the approximate reciprocal of the square root of the scalar single-precision floating-point value in a source operand and writes the result to the low-order doubleword of the destination. MXCSR.RC has no effect on the result. The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal square root.
A source value that is ±zero or denormal returns an infinity with the sign of the source value. Negative source values other than -zero and -denormal return a QNaN floating-point indefinite value. For both SNaN and QNaN source operands, a QNaN is returned.

There are legacy and extended forms of the instruction:

RSQRTSS
The source operand is either an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VRSQRTSS
The extended form of the instruction has a 128-bit encoding only. The first source operand and the destination are XMM registers. The second source operand is either an XMM register or a 32-bit memory location. Bits [31:0] of the destination contain the reciprocal square root of the single-precision floating-point value held in bits [31:0] of the second source operand; bits [127:32] of the destination are copied from the first source register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form        Subset   Feature Flag
RSQRTSS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VRSQRTSS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                      Opcode         Description
RSQRTSS xmm1, xmm2/mem32      F3 0F 52 /r    Computes reciprocal of square root of a scalar single-precision floating-point value in xmm2 or mem32. Writes result to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VRSQRTSS xmm1, xmm2, xmm3/mem32   C4    RXB.01           X.src1.X.10   52 /r

Related Instructions
(V)RSQRTPS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSD, (V)SQRTSS
rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                                         X     Null data segment used to reference memory.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
Alignment check, #AC               S     X     Unaligned memory reference when alignment checking enabled.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

SHA1RNDS4
Four Rounds of SHA1

Executes four rounds of a SHA1 operation using the four doublewords (A, B, C, D) from the first source operand and value E from the second operand. The lower two bits of the immediate specify the function and constant appropriate for the current round of processing. The resulting (A, B, C, D) is placed in the destination register, which is the same as the first source register.
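The four-round operation described above can be modeled in portable C. This is a sketch under stated assumptions, not AMD's implementation: the type sha1_abcd and the function sha1rnds4_model are hypothetical names, and the round functions and constants are the standard SHA-1 ones from FIPS 180-4, which this section does not list. The array w holds the second source operand from high doubleword to low, so w[0] is W0E (the first message word with E already added).

```c
#include <stdint.h>

/* Working state: a = SRC1[127:96], b = SRC1[95:64], c = SRC1[63:32], d = SRC1[31:0]. */
typedef struct { uint32_t a, b, c, d; } sha1_abcd;

/* Rotate left by n bits, 1 <= n <= 31. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Standard SHA-1 round function selected by imm8[1:0] (assumed from FIPS 180-4). */
static uint32_t sha1_f(unsigned fn, uint32_t b, uint32_t c, uint32_t d)
{
    switch (fn) {
    case 0:  return (b & c) ^ (~b & d);          /* Ch */
    case 2:  return (b & c) ^ (b & d) ^ (c & d); /* Maj */
    default: return b ^ c ^ d;                   /* Parity, fn = 1 or 3 */
    }
}

/* Models four SHA-1 rounds. w[0] = W0E, w[1..3] = W1..W3. Only imm8[1:0] is used. */
sha1_abcd sha1rnds4_model(sha1_abcd s, const uint32_t w[4], unsigned imm8)
{
    static const uint32_t k_table[4] = {
        0x5A827999u, 0x6ED9EBA1u, 0x8F1BBCDCu, 0xCA62C1D6u
    };
    unsigned fn = imm8 & 3u;
    uint32_t k = k_table[fn];
    uint32_t e = 0; /* E is already folded into w[0] for the first round */

    for (int j = 0; j < 4; j++) {
        uint32_t t = sha1_f(fn, s.b, s.c, s.d) + rol32(s.a, 5) + w[j] + e + k;
        e = s.d;
        s.d = s.c;
        s.c = rol32(s.b, 30);
        s.b = s.a;
        s.a = t;
    }
    return s; /* (A_4, B_4, C_4, D_4) */
}
```

With an all-zero state and all-zero message words, the first round yields A_1 = K_i, and after four rounds the D field holds A_1 rotated left by 30, which gives a quick sanity check of the data flow.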
The following function is performed:

    A <- SRC1[127:96]; B <- SRC1[95:64]; C <- SRC1[63:32]; D <- SRC1[31:0];
    W0E <- SRC2[127:96]; W1 <- SRC2[95:64]; W2 <- SRC2[63:32]; W3 <- SRC2[31:0];
    i = imm[1:0], which determines f_i and K_i

    First round operation:
    A_1 <- f_i(B, C, D) + (A Rotate Left 5) + W0E + K_i;
    B_1 <- A;
    C_1 <- B Rotate Left 30;
    D_1 <- C;
    E_1 <- D;

    FOR j = 1 to 3
    {
        A_(j+1) <- f_i(B_j, C_j, D_j) + (A_j Rotate Left 5) + Wj + E_j + K_i;
        B_(j+1) <- A_j;
        C_(j+1) <- B_j Rotate Left 30;
        D_(j+1) <- C_j;
        E_(j+1) <- D_j;
    }

    DEST[127:96] <- A_4;
    DEST[95:64] <- B_4;
    DEST[63:32] <- C_4;
    DEST[31:0] <- D_4;

Mnemonic                            Opcode            Description
SHA1RNDS4 xmm1, xmm2/m128, imm8     0F 3A CC /r ib    Executes 4 rounds of SHA1.

Related Instructions
SHA1NEXTE, SHA1MSG1, SHA1MSG2

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X     X             X          Instruction not supported by CPUID.
                             A     A                        AVX instructions are only recognized in protected mode.
                             S     S             S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                                 A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                 A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                 A          VEX.L = 1 when AVX2 not supported.
                                                 A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S             X          Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S             X          CR0.TS = 1.
Stack, #SS                   S     S             X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S             X          Memory address exceeding data segment limit or non-canonical.
                                                 X          Null data segment used to reference memory.
Alignment check, #AC         S     S             S          Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                                 A          Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S             X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHA1NEXTE
Calculate Next E SHA1

Calculates the next E register value after four rounds of a SHA1 operation, using the four doublewords from the second source operand and value A from the first operand. The resulting E is placed in the destination register, which is the same as the first source register.

    DEST[127:96] <- SRC2[127:96] + (SRC1[127:96] rotated left 30);
    DEST[95:0] <- SRC2[95:0];

Mnemonic                      Opcode         Description
SHA1NEXTE xmm1, xmm2/m128     0F 38 C8 /r    Calculate Next E of SHA1.

Related Instructions
SHA1RNDS4, SHA1MSG1, SHA1MSG2

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X     X             X          Instruction not supported by CPUID.
                             A     A                        AVX instructions are only recognized in protected mode.
                             S     S             S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                                 A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                 A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                 A          VEX.L = 1 when AVX2 not supported.
                                                 A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S             X          Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S             X          CR0.TS = 1.
Stack, #SS                   S     S             X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S             X          Memory address exceeding data segment limit or non-canonical.
                                                 X          Null data segment used to reference memory.
Alignment check, #AC         S     S             S          Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                                 A          Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S             X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHA1MSG1
Message Intermediate 1

Performs the first of two intermediate calculations necessary before doing the next four rounds of the SHA1 message.

    DEST[127:96] <- SRC1[63:32] XOR SRC1[127:96]
    DEST[95:64] <- SRC1[31:0] XOR SRC1[95:64]
    DEST[63:32] <- SRC2[127:96] XOR SRC1[63:32]
    DEST[31:0] <- SRC2[95:64] XOR SRC1[31:0]

Mnemonic                      Opcode         Description
SHA1MSG1 xmm1, xmm2/m128      0F 38 C9 /r    Calculate Message Intermediate 1.

Related Instructions
SHA1RNDS4, SHA1NEXTE, SHA1MSG2

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X     X             X          Instruction not supported by CPUID.
                             A     A                        AVX instructions are only recognized in protected mode.
                             S     S             S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                                 A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                 A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                 A          VEX.L = 1 when AVX2 not supported.
                                                 A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S             X          Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S             X          CR0.TS = 1.
Stack, #SS                   S     S             X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S             X          Memory address exceeding data segment limit or non-canonical.
                                                 X          Null data segment used to reference memory.
Alignment check, #AC         S     S             S          Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                                 A          Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S             X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHA1MSG2
Message Calculation 2

Performs the second of two intermediate calculations necessary before doing the next four rounds of the SHA1 message.
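As a rough illustration of this second intermediate calculation, the sketch below models it on plain 32-bit lanes in C. The function name sha1msg2_model and the lane convention (index 3 holds bits [127:96], index 0 holds bits [31:0]) are assumptions made for this example; the XOR and rotate-left-by-1 steps follow the standard SHA-1 message schedule.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by one bit. */
static uint32_t rol1(uint32_t x)
{
    return (x << 1) | (x >> 31);
}

/* Lane convention (an assumption of this sketch):
 * v[3] = bits [127:96], v[2] = [95:64], v[1] = [63:32], v[0] = [31:0].
 * All results are computed before any store, so dest may alias src1,
 * as it does in the two-operand instruction. */
void sha1msg2_model(uint32_t dest[4], const uint32_t src1[4], const uint32_t src2[4])
{
    uint32_t temp = rol1(src1[3] ^ src2[2]); /* becomes DEST[127:96] */
    uint32_t d2   = rol1(src1[2] ^ src2[1]); /* DEST[95:64] */
    uint32_t d1   = rol1(src1[1] ^ src2[0]); /* DEST[63:32] */
    uint32_t d0   = rol1(src1[0] ^ temp);    /* DEST[31:0] depends on temp */

    dest[3] = temp;
    dest[2] = d2;
    dest[1] = d1;
    dest[0] = d0;
}
```

The lowest lane is the only one that depends on another output lane, which is why the high-lane result is computed first and kept in a temporary.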
    Temp[31:0] <- (SRC1[127:96] XOR SRC2[95:64]) Rotate Left 1
    DEST[127:96] <- Temp[31:0]
    DEST[95:64] <- (SRC1[95:64] XOR SRC2[63:32]) Rotate Left 1
    DEST[63:32] <- (SRC1[63:32] XOR SRC2[31:0]) Rotate Left 1
    DEST[31:0] <- (SRC1[31:0] XOR Temp[31:0]) Rotate Left 1

Mnemonic                      Opcode         Description
SHA1MSG2 xmm1, xmm2/m128      0F 38 CA /r    Calculate Message Intermediate 2.

Related Instructions
SHA1RNDS4, SHA1NEXTE, SHA1MSG1

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X     X             X          Instruction not supported by CPUID.
                             A     A                        AVX instructions are only recognized in protected mode.
                             S     S             S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                                 A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                 A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                 A          VEX.L = 1 when AVX2 not supported.
                                                 A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S             X          Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S             X          CR0.TS = 1.
Stack, #SS                   S     S             X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S             X          Memory address exceeding data segment limit or non-canonical.
                                                 X          Null data segment used to reference memory.
Alignment check, #AC         S     S             S          Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                                 A          Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S             X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception
SHA256RNDS2
Two Rounds of SHA256

Performs two rounds of SHA256 operation, with the first operand holding the initial SHA256 state (C, D, G, H), the second operand holding the initial SHA256 state (A, B, E, F), and the implicit operand xmm0 holding a pre-computed sum of the next two round message doublewords and the corresponding round constants. The resulting SHA256 state (A, B, E, F) is placed in the destination register.

    A_0 <- SRC2[127:96]; B_0 <- SRC2[95:64];
    C_0 <- SRC1[127:96]; D_0 <- SRC1[95:64];
    E_0 <- SRC2[63:32]; F_0 <- SRC2[31:0];
    G_0 <- SRC1[63:32]; H_0 <- SRC1[31:0];
    K_0 <- XMM0[31:0];
    K_1 <- XMM0[63:32];

    FOR i = 0 to 1
    {
        A_(i+1) <- Ch(E_i, F_i, G_i) + Perm1(E_i) + K_i + H_i + Ma(A_i, B_i, C_i) + Perm0(A_i);
        B_(i+1) <- A_i;
        C_(i+1) <- B_i;
        D_(i+1) <- C_i;
        E_(i+1) <- Ch(E_i, F_i, G_i) + Perm1(E_i) + K_i + H_i + D_i;
        F_(i+1) <- E_i;
        G_(i+1) <- F_i;
        H_(i+1) <- G_i;
    }

    DEST[127:96] <- A_2;
    DEST[95:64] <- B_2;
    DEST[63:32] <- E_2;
    DEST[31:0] <- F_2;

Mnemonic                              Opcode         Description
SHA256RNDS2 xmm1, xmm2/m128, xmm0     0F 38 CB /r    Execute 2 rounds of SHA256.

Related Instructions
SHA256MSG1, SHA256MSG2

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X     X             X          Instruction not supported by CPUID.
                             A     A                        AVX instructions are only recognized in protected mode.
                             S     S             S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                                 A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                 A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                 A          VEX.L = 1 when AVX2 not supported.
                                                 A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S             X          Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S             X          CR0.TS = 1.
Stack, #SS                   S     S             X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S             X          Memory address exceeding data segment limit or non-canonical.
                                                 X          Null data segment used to reference memory.
Alignment check, #AC         S     S             S          Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                                 A          Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S             X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHA256MSG1
Message Intermediate 1

Performs the first of two intermediate calculations necessary for the next four SHA256 message doublewords.

    DEST[127:96] <- SRC1[127:96] + Perm2(SRC2[31:0])
    DEST[95:64] <- SRC1[95:64] + Perm2(SRC1[127:96])
    DEST[63:32] <- SRC1[63:32] + Perm2(SRC1[95:64])
    DEST[31:0] <- SRC1[31:0] + Perm2(SRC1[63:32])

Mnemonic                        Opcode         Description
SHA256MSG1 xmm1, xmm2/m128      0F 38 CC /r    Calculate Message Intermediate 1.

Related Instructions
SHA256RNDS2, SHA256MSG2

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X     X             X          Instruction not supported by CPUID.
                             A     A                        AVX instructions are only recognized in protected mode.
                             S     S             S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                                 A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                 A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                 A          VEX.L = 1 when AVX2 not supported.
                                                 A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S             X          Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S             X          CR0.TS = 1.
Stack, #SS                   S     S             X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S             X          Memory address exceeding data segment limit or non-canonical.
                                                 X          Null data segment used to reference memory.
Alignment check, #AC         S     S             S          Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                                 A          Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S             X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHA256MSG2
Message Intermediate 2

Performs the second of two intermediate calculations necessary for the next four SHA256 message doublewords.

    Temp0 <- SRC1[31:0] + Perm3(SRC2[95:64])
    Temp1 <- SRC1[63:32] + Perm3(SRC2[127:96])
    DEST[127:96] <- SRC1[127:96] + Perm3(Temp1)
    DEST[95:64] <- SRC1[95:64] + Perm3(Temp0)
    DEST[63:32] <- SRC1[63:32] + Perm3(SRC2[127:96])
    DEST[31:0] <- SRC1[31:0] + Perm3(SRC2[95:64])

Mnemonic                        Opcode         Description
SHA256MSG2 xmm1, xmm2/m128      0F 38 CD /r    Calculate Message Intermediate 2.

Related Instructions
SHA256RNDS2, SHA256MSG1

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X     X             X          Instruction not supported by CPUID.
                             A     A                        AVX instructions are only recognized in protected mode.
                             S     S             S          CR0.EM = 1 OR CR4.OSFXSR = 0.
                                                 A          CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                                 A          XFEATURE_ENABLED_MASK[2:1] != 11b.
                                                 A          VEX.L = 1 when AVX2 not supported.
                                                 A          REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S             X          Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S             X          CR0.TS = 1.
Stack, #SS                   S     S             X          Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S             X          Memory address exceeding data segment limit or non-canonical.
                                                 X          Null data segment used to reference memory.
Alignment check, #AC         S     S             S          Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                                 A          Alignment checking enabled and 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                    S             X          A page fault resulted from the execution of the instruction.

X - SSE, AVX, and AVX2 exception
A - AVX, AVX2 exception
S - SSE exception

SHUFPD, VSHUFPD
Shuffle Packed Double-Precision Floating-Point

Copies packed double-precision floating-point values from either of two sources to quadwords in the destination, as specified by bit fields of an immediate byte operand.

Each bit corresponds to a quadword destination. The 128-bit legacy and extended versions of the instruction use bits [1:0]; the 256-bit extended version uses bits [3:0], as shown.

Destination   Immediate-Byte   Value of     Source 1       Source 2
Quadword      Bit Field        Bit Field    Bits Copied    Bits Copied
Used by 128-bit encoding and 256-bit encoding
[63:0]        [0]              0            [63:0]         -
                               1            [127:64]       -
[127:64]      [1]              0            -              [63:0]
                               1            -              [127:64]
Used only by 256-bit encoding
[191:128]     [2]              0            [191:128]      -
                               1            [255:192]      -
[255:192]     [3]              0            -              [191:128]
                               1            -              [255:192]

There are legacy and extended forms of the instruction:

SHUFPD
Shuffles four source values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. There is a third 8-bit immediate operand. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VSHUFPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Shuffles four source values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. There is a fourth 8-bit immediate operand.
Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Shuffles eight source values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. There is a fourth 8-bit immediate operand.

Instruction Support

Form       Subset   Feature Flag
SHUFPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSHUFPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode            Description
SHUFPD xmm1, xmm2/mem128, imm8     66 0F C6 /r ib    Shuffles packed double-precision floating-point values in xmm1 and xmm2 or mem128. Writes the result to xmm1.

                                          Encoding
Mnemonic                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSHUFPD xmm1, xmm2, xmm3/mem128, imm8     C4    RXB.01           X.src1.0.01   C6 /r
VSHUFPD ymm1, ymm2, ymm3/mem256, imm8     C4    RXB.01           X.src1.1.01   C6 /r

Related Instructions
(V)SHUFPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

SHUFPS, VSHUFPS
Shuffle Packed Single-Precision Floating-Point

Copies packed single-precision floating-point values from either of two sources to doublewords in the destination, as specified by bit fields of an immediate byte operand.

Each bit field corresponds to a doubleword destination. The 128-bit legacy and extended versions of the instruction use a single 128-bit destination; the 256-bit extended version performs duplicate operations on bits [127:0] and bits [255:128] of the source and destination.

Destination   Immediate-Byte   Value of     Source 1       Source 2
Doubleword    Bit Field        Bit Field    Bits Copied    Bits Copied
[31:0]        [1:0]            00           [31:0]         -
                               01           [63:32]        -
                               10           [95:64]        -
                               11           [127:96]       -
[63:32]       [3:2]            00           [31:0]         -
                               01           [63:32]        -
                               10           [95:64]        -
                               11           [127:96]       -
[95:64]       [5:4]            00           -              [31:0]
                               01           -              [63:32]
                               10           -              [95:64]
                               11           -              [127:96]
[127:96]      [7:6]            00           -              [31:0]
                               01           -              [63:32]
                               10           -              [95:64]
                               11           -              [127:96]
Upper 128 bits of 256-bit source and destination used by 256-bit encoding
[159:128]     [1:0]            00           [159:128]      -
                               01           [191:160]      -
                               10           [223:192]      -
                               11           [255:224]      -
[191:160]     [3:2]            00           [159:128]      -
                               01           [191:160]      -
                               10           [223:192]      -
                               11           [255:224]      -
[223:192]     [5:4]            00           -              [159:128]
                               01           -              [191:160]
                               10           -              [223:192]
                               11           -              [255:224]
[255:224]     [7:6]            00           -              [159:128]
                               01           -              [191:160]
                               10           -              [223:192]
                               11           -              [255:224]

There are legacy and extended forms of the instruction:

SHUFPS
Shuffles eight source values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location.
There is a third 8-bit immediate operand. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VSHUFPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Shuffles eight source values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. There is a fourth 8-bit immediate operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Shuffles 16 source values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. There is a fourth 8-bit immediate operand.

Instruction Support

Form       Subset   Feature Flag
SHUFPS     SSE1     CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSHUFPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                           Opcode         Description
SHUFPS xmm1, xmm2/mem128, imm8     0F C6 /r ib    Shuffles packed single-precision floating-point values in xmm1 and xmm2 or mem128. Writes the result to xmm1.

                                          Encoding
Mnemonic                                  VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSHUFPS xmm1, xmm2, xmm3/mem128, imm8     C4    RXB.01           X.src1.0.00   C6 /r
VSHUFPS ymm1, ymm2, ymm3/mem256, imm8     C4    RXB.01           X.src1.1.00   C6 /r

Related Instructions
(V)SHUFPD

rFLAGS Affected
None

MXCSR Flags Affected
None
Exceptions

Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     S     Memory operand not 16-byte aligned and MXCSR.MM = 0.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X     Instruction execution caused a page fault.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

SQRTPD, VSQRTPD
Square Root Packed Double-Precision Floating-Point

Computes the square root of each packed double-precision floating-point value in a source operand and writes the result to the corresponding quadword of the destination. Performing the square root of +infinity returns +infinity.

There are legacy and extended forms of the instruction:

SQRTPD
Computes two values. The destination is an XMM register. The source operand is either an XMM register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VSQRTPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Computes two values. The source operand is either an XMM register or a 128-bit memory location.
The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Computes four values. The source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support

Form       Subset   Feature Flag
SQRTPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSQRTPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                     Opcode         Description
SQRTPD xmm1, xmm2/mem128     66 0F 51 /r    Computes square roots of packed double-precision floating-point values in xmm2 or mem128. Writes the results to xmm1.

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VSQRTPD xmm1, xmm2/mem128         C4    RXB.01           X.1111.0.01   51 /r
VSQRTPD ymm1, ymm2/mem256         C4    RXB.01           X.1111.1.01   51 /r

Related Instructions
(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPS, (V)SQRTSD, (V)SQRTSS

rFLAGS Affected
None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M                   M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception                    Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD          X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                             A     A           AVX instructions are only recognized in protected mode.
                             S     S     S     CR0.EM = 1.
                             S     S     S     CR4.OSFXSR = 0.
                                         A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A     VEX.vvvv != 1111b.
                                         A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                             S     S     X     Lock prefix (F0h) preceding opcode.
                             S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM    S     S     X     CR0.TS = 1.
Stack, #SS                   S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      S     S     X     Memory address exceeding data segment limit or non-canonical.
                             S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                         X     Null data segment used to reference memory.
Alignment check, #AC         S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                         A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                    S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF     S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE        S     S     X     A source operand was an SNaN value.
                             S     S     X     Undefined operation.
Denormalized operand, DE     S     S     X     A source operand was a denormal value.
Precision, PE                S     S     X     A result could not be represented exactly in the destination format.

X - AVX and SSE exception
A - AVX exception
S - SSE exception

SQRTPS, VSQRTPS
Square Root Packed Single-Precision Floating-Point

Computes the square root of each packed single-precision floating-point value in a source operand and writes the result to the corresponding doubleword of the destination. Performing the square root of +infinity returns +infinity.

There are legacy and extended forms of the instruction:

SQRTPS
Computes four values. The destination is an XMM register. The source operand is either an XMM register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VSQRTPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Computes four values. The destination is an XMM register. The source operand is either an XMM register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding Computes eight values. The destination is a YMM register. The source operand is either a YMM register or a 256-bit memory location. Instruction Support Form Subset Feature Flag SQRTPS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VSQRTPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode SQRTPS xmm1, xmm2/mem128 0F 51 /r Description Computes square roots of packed single-precision floating-point values in xmm1 or mem128. Writes the results to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VSQRTPS xmm1, xmm2/mem128 C4 RXB.01 X.1111.0.00 51 /r VSQRTPS ymm1, ymm2/mem256 C4 RXB.01 X.1111.1.00 51 /r Related Instructions (V)RSQRTPS, (V)RSQRTSS, (V)SQRTPD, (V)SQRTSD, (V)SQRTSS 566 SQRTPS, VSQRTPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology rFLAGS Affected None MXCSR Flags Affected MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE M 17 Note: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 DE IE M M 1 0 A flag that may be set or cleared is M (modified). Unaffected flags are blank. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. 
Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

SQRTSD VSQRTSD   Square Root Scalar Double-Precision Floating-Point

Computes the square root of a double-precision floating-point value and writes the result to the low quadword of the destination. The three-operand form of the instruction also writes a copy of the upper quadword of the first source operand to the upper quadword of the destination. Performing the square root of +infinity returns +infinity.

There are legacy and extended forms of the instruction:

SQRTSD
The source operand is either an XMM register or a 64-bit memory location. When the source is an XMM register, the source value must be in the low quadword. The destination is an XMM register. Bits [127:64] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination XMM register are not affected.

VSQRTSD
The extended form of the instruction has a single 128-bit encoding that requires three operands:
VSQRTSD xmm1, xmm2, xmm3/mem64
The first source operand is an XMM register. The second source operand is either an XMM register or a 64-bit memory location.
When the second source is an XMM register, the source value must be in the low quadword. The destination is a third XMM register. The square root of the second source operand is written to bits [63:0] of the destination register. Bits [127:64] of the destination are copied from the corresponding bits of the first source operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset  Feature Flag
SQRTSD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSQRTSD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode        Description
SQRTSD xmm1, xmm2/mem64   F2 0F 51 /r   Computes the square root of a double-precision floating-point value in xmm2 or mem64. Writes the result to xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSQRTSD xmm1, xmm2, xmm3/mem64   C4   RXB.01          X.src1.X.11  51 /r

Related Instructions
(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSS

rFLAGS Affected
None

MXCSR Flags Affected

Flag   MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Bit    17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
State                                                              M                       M     M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

SQRTSS VSQRTSS   Square Root Scalar Single-Precision Floating-Point

Computes the square root of a single-precision floating-point value and writes the result to the low doubleword of the destination. The three-operand form of the instruction also writes a copy of the three most significant doublewords of the first source operand to the upper 96 bits of the destination. Performing the square root of +infinity returns +infinity.

There are legacy and extended forms of the instruction:

SQRTSS
The source operand is either an XMM register or a 32-bit memory location. When the source is an XMM register, the source value must be in the low doubleword. The destination is an XMM register. Bits [127:32] of the destination are not affected. Bits [255:128] of the YMM register that corresponds to the destination XMM register are not affected.

VSQRTSS
The extended form has a single 128-bit encoding that requires three operands:
VSQRTSS xmm1, xmm2, xmm3/mem32
The first source operand is an XMM register.
The second source operand is either an XMM register or a 32-bit memory location. When the second source is an XMM register, the source value must be in the low doubleword. The destination is a third XMM register. The square root of the second source operand is written to bits [31:0] of the destination register. Bits [127:32] of the destination are copied from the corresponding bits of the first source operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset  Feature Flag
SQRTSS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSQRTSS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode        Description
SQRTSS xmm1, xmm2/mem32   F3 0F 51 /r   Computes the square root of a single-precision floating-point value in xmm2 or mem32. Writes the result to xmm1.

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VSQRTSS xmm1, xmm2, xmm3/mem32   C4   RXB.01          X.src1.X.10  51 /r

Related Instructions
(V)RSQRTPS, (V)RSQRTSS, (V)SQRTPD, (V)SQRTPS, (V)SQRTSD

rFLAGS Affected
None

MXCSR Flags Affected

Flag   MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Bit    17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0
State                                                              M                       M     M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference S S S S S S S S X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. A result could not be represented exactly in the destination format. SQRTSS, VSQRTSS 571 AMD64 Technology 26568—Rev. 3.22—May 2018 STMXCSR VSTMXCSR Store MXCSR Saves the content of the MXCSR extended control/status register to a 32-bit memory location. Reserved bits are stored as zeroes. The MXCSR is described in “Registers” in Volume 1. For both legacy STMXCSR and extended VSTMXCSR forms of the instruction, the source operand is the MXCSR and the destination is a 32-bit memory location. There is one encoding for each instruction form. Instruction Support Form Subset Feature Flag STMXCSR SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VSTMXCSR AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Opcode STMXCSR mem32 0F AE /3 Description Stores content of MXCSR in mem32. 
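A stored MXCSR image can be taken apart with plain bit operations. The following is a minimal sketch in C, assuming the bit positions shown in the MXCSR Flags Affected tables of this reference (exception flags in bits 5:0, DAZ in bit 6, exception masks in bits 12:7, RC in bits 14:13, FZ in bit 15, MM in bit 17). The structure and function names are hypothetical and are not part of any AMD-defined interface.

```c
#include <stdint.h>

/* Hypothetical decoded view of a 32-bit MXCSR image, such as the
 * value that STMXCSR writes to its mem32 destination. */
struct mxcsr_view {
    unsigned exception_flags; /* bits 5:0   PE UE OE ZE DE IE */
    unsigned daz;             /* bit 6 */
    unsigned exception_masks; /* bits 12:7  PM UM OM ZM DM IM */
    unsigned rounding;        /* bits 14:13 RC */
    unsigned fz;              /* bit 15 */
    unsigned mm;              /* bit 17 */
};

/* Split an MXCSR image into its fields using shifts and masks. */
static struct mxcsr_view decode_mxcsr(uint32_t image)
{
    struct mxcsr_view v;
    v.exception_flags = image & 0x3Fu;
    v.daz             = (image >> 6) & 1u;
    v.exception_masks = (image >> 7) & 0x3Fu;
    v.rounding        = (image >> 13) & 3u;
    v.fz              = (image >> 15) & 1u;
    v.mm              = (image >> 17) & 1u;
    return v;
}
```

For instance, decoding the image 1F80h gives all six exception masks set, no exception flags, and RC = 00b. On compilers that provide the `_mm_getcsr()` intrinsic in `<xmmintrin.h>`, its return value could serve as the input image.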
Mnemonic Encoding VEX RXB.map_select VSTMXCSR mem32 C4 RXB.01 W.vvvv.L.pp Opcode X.1111.0.00 AE /3 Related Instructions (V)LDMXCSR rFLAGS Affected None MXCSR Flags Affected MM FZ M M M 17 15 14 Note: 572 RC M 13 PM UM OM ZM DM IM DAZ PE UE OE ZE DE IE M M M M M M M M M M M M M 12 11 10 9 8 7 6 5 4 3 2 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. STMXCSR, VSTMXCSR Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X A X A S S S S X S S S S S X S S S S S S X A S S A A A A X X X X X S X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. CR0.EM = 1. CR4.OSFXSR = 0. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Write to a read-only data segment. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. STMXCSR, VSTMXCSR 573 AMD64 Technology 26568—Rev. 3.22—May 2018 SUBPD VSUBPD Subtract Packed Double-Precision Floating-Point Subtracts each packed double-precision floating-point value of the second source operand from the corresponding value of the first source operand and writes the difference to the corresponding quadword of the destination. There are legacy and extended forms of the instruction: SUBPD Subtracts two pairs of values. The first source operand is an XMM register. 
The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VSUBPD The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Subtracts two pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding Subtracts four pairs of values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset SUBPD SSE2 VSUBPD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic SUBPD xmm1, xmm2/mem128 Opcode Description 66 0F 5C /r Subtracts packed double-precision floating-point values in xmm2 or mem128 from corresponding values of xmm1. Writes differences to xmm1. Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VSUBPD xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 5C /r VSUBPD ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 5C /r Related Instructions (V)SUBPS, (V)SUBSD, (V)SUBSS 574 SUBPD, VSUBPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology rFLAGS Affected None MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 A flag that may be set or cleared is M (modified). Unaffected flags are blank. 
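The difference between the three forms lies mainly in how many lanes are computed and what happens to bits [255:128] of the destination. The sketch below models a YMM register as four double-precision lanes in portable C. It is an illustration only: the type and function names are hypothetical, and rounding control, denormal handling, and exception signaling are not modeled.

```c
/* A YMM register modeled as four double-precision lanes.
 * Lanes 0-1 are the XMM half (bits 127:0); lanes 2-3 are bits 255:128. */
typedef struct { double lane[4]; } ymm_model;

/* Legacy SUBPD xmm1, xmm2/mem128: xmm1 is both first source and
 * destination; bits 255:128 of the destination are not affected. */
static void model_subpd(ymm_model *dst, const ymm_model *src2)
{
    dst->lane[0] -= src2->lane[0];
    dst->lane[1] -= src2->lane[1];
}

/* VSUBPD xmm1, xmm2, xmm3/mem128: separate destination;
 * bits 255:128 of the destination are cleared. */
static void model_vsubpd_xmm(ymm_model *dst, const ymm_model *src1,
                             const ymm_model *src2)
{
    ymm_model r;
    r.lane[0] = src1->lane[0] - src2->lane[0];
    r.lane[1] = src1->lane[1] - src2->lane[1];
    r.lane[2] = 0.0;
    r.lane[3] = 0.0;
    *dst = r; /* written last so dst may alias a source */
}

/* VSUBPD ymm1, ymm2, ymm3/mem256: four pairs of values. */
static void model_vsubpd_ymm(ymm_model *dst, const ymm_model *src1,
                             const ymm_model *src2)
{
    ymm_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = src1->lane[i] - src2->lane[i];
    *dst = r;
}
```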
Exceptions

Exception                       Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD             X     X     X     Instruction not supported, as indicated by CPUID feature identifier.
                                A     A           AVX instructions are only recognized in protected mode.
                                S     S     S     CR0.EM = 1.
                                S     S     S     CR4.OSFXSR = 0.
                                            A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                            A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                            A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                S     S     X     Lock prefix (F0h) preceding opcode.
                                S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM       S     S     X     CR0.TS = 1.
Stack, #SS                      S     S     X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP         S     S     X     Memory address exceeding data segment limit or non-canonical.
                                S     S     S     Non-aligned memory operand while MXCSR.MM = 0.
                                            X     Null data segment used to reference memory.
Alignment check, #AC            S     S     S     Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
                                            A     Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF                       S     X     Instruction execution caused a page fault.
SIMD floating-point, #XF        S     S     X     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception                       Real  Virt  Prot  Cause of Exception
Invalid operation, IE           S     S     X     A source operand was an SNaN value.
                                S     S     X     Undefined operation.
Denormalized operand, DE        S     S     X     A source operand was a denormal value.
Overflow, OE                    S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                   S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE                   S     S     X     A result could not be represented exactly in the destination format.

X: AVX and SSE exception
A: AVX exception
S: SSE exception
SUBPS VSUBPS   Subtract Packed Single-Precision Floating-Point

Subtracts each packed single-precision floating-point value of the second source operand from the corresponding value of the first source operand and writes the difference to the corresponding doubleword of the destination.

There are legacy and extended forms of the instruction:

SUBPS
Subtracts four pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VSUBPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Subtracts four pairs of values. The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Subtracts eight pairs of values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form     Subset  Feature Flag
SUBPS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSUBPS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                  Opcode     Description
SUBPS xmm1, xmm2/mem128   0F 5C /r   Subtracts packed single-precision floating-point values in xmm2 or mem128 from corresponding values of xmm1. Writes differences to xmm1.
Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VSUBPS xmm1, xmm2, xmm3/mem128 C4 RXB.00001 X.src.0.00 5C /r VSUBPS ymm1, ymm2, ymm3/mem256 C4 RXB.00001 X.src.1.00 5C /r Related Instructions (V)SUBPD, (V)SUBSD, (V)SUBSS 576 SUBPS, VSUBPS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology rFLAGS Affected None MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. 
SIMD Floating-Point Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

SUBSD VSUBSD   Subtract Scalar Double-Precision Floating-Point

Subtracts the double-precision floating-point value in the low-order quadword of the second source operand from the corresponding value in the first source operand and writes the result to the low-order quadword of the destination.

There are legacy and extended forms of the instruction:

SUBSD
The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The first source register is also the destination register. Bits [127:64] of the destination and bits [255:128] of the corresponding YMM register are not affected.

VSUBSD
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 64-bit memory location. The destination is a third XMM register. Bits [127:64] of the first source operand are copied to bits [127:64] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form     Subset  Feature Flag
SUBSD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VSUBSD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
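The feature flags in the Instruction Support tables reduce to single-bit tests on the EDX and ECX values returned by CPUID Fn0000_0001. The sketch below is a minimal illustration in C that takes those register values as arguments, so it does not execute CPUID itself; the function names are hypothetical. The last helper reflects the XFEATURE_ENABLED_MASK[2:1] = 11b condition listed among the #UD causes for the extended forms. The CR4.OSXSAVE check that the same tables mention is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn0000_0001_EDX[SSE] is bit 25. */
static bool has_sse(uint32_t edx)  { return ((edx >> 25) & 1u) != 0; }

/* CPUID Fn0000_0001_EDX[SSE2] is bit 26. */
static bool has_sse2(uint32_t edx) { return ((edx >> 26) & 1u) != 0; }

/* CPUID Fn0000_0001_ECX[AVX] is bit 28. */
static bool has_avx(uint32_t ecx)  { return ((ecx >> 28) & 1u) != 0; }

/* Extended (VEX) forms also raise #UD unless
 * XFEATURE_ENABLED_MASK[2:1] == 11b. */
static bool xfeature_mask_allows_avx(uint64_t xfeature_enabled_mask)
{
    return ((xfeature_enabled_mask >> 1) & 3u) == 3u;
}
```

With GCC or Clang, the register values could be obtained through `__get_cpuid()` from `<cpuid.h>`; other toolchains provide their own equivalents.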
Instruction Encoding Mnemonic SUBSD xmm1, xmm2/mem64 Opcode Description F2 0F 5C /r Subtracts low-order double-precision floating-point value in xmm2 or mem64 from the corresponding value of xmm1. Writes the difference to xmm1. Mnemonic VSUBSD xmm1, xmm2, xmm3/mem64 Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.01 X.src1.X.11 5C /r Related Instructions (V)SUBPD, (V)SUBPS, (V)SUBSS rFLAGS Affected None 578 SUBSD, VSUBSD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. 
SIMD Floating-Point Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

SUBSS VSUBSS   Subtract Scalar Single-Precision Floating-Point

Subtracts the single-precision floating-point value in the low-order doubleword of the second source operand from the corresponding value in the first source operand and writes the result to the low-order doubleword of the destination.

There are legacy and extended forms of the instruction:

SUBSS
The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The first source register is also the destination register. Bits [127:32] of the destination and bits [255:128] of the corresponding YMM register are not affected.

VSUBSS
The extended form of the instruction has a 128-bit encoding only. The first source operand is an XMM register and the second source operand is either an XMM register or a 32-bit memory location. The destination is a third XMM register. Bits [127:32] of the first source operand are copied to bits [127:32] of the destination. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form     Subset  Feature Flag
SUBSS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VSUBSS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
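For the scalar forms, only the low doubleword is computed; the rest of the destination is either left alone (legacy form) or filled from the first source and cleared (extended form). The sketch below models this in portable C with a YMM register represented as eight single-precision lanes. The names are hypothetical, and rounding control and exception signaling are not modeled.

```c
/* A YMM register modeled as eight single-precision lanes.
 * Lane 0 is bits 31:0; lanes 1-3 are bits 127:32; lanes 4-7 are bits 255:128. */
typedef struct { float lane[8]; } ymm_f32_model;

/* Legacy SUBSS xmm1, xmm2/mem32: only the low doubleword of the
 * destination changes; bits 127:32 and 255:128 are not affected. */
static void model_subss(ymm_f32_model *dst, float src2_low)
{
    dst->lane[0] -= src2_low;
}

/* VSUBSS xmm1, xmm2, xmm3/mem32: low doubleword is the difference,
 * bits 127:32 are copied from the first source, bits 255:128 are cleared. */
static void model_vsubss(ymm_f32_model *dst, const ymm_f32_model *src1,
                         float src2_low)
{
    ymm_f32_model r;
    r.lane[0] = src1->lane[0] - src2_low;
    for (int i = 1; i < 4; i++)
        r.lane[i] = src1->lane[i];
    for (int i = 4; i < 8; i++)
        r.lane[i] = 0.0f;
    *dst = r; /* written last so dst may alias src1 */
}
```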
Instruction Encoding Mnemonic SUBSS xmm1, xmm2/mem32 Opcode Description F3 0F 5C /r Subtracts a low-order single-precision floating-point value in xmm2 or mem32 from the corresponding value of xmm1. Writes the difference to xmm1. Mnemonic VSUBSS xmm1, xmm2, xmm3/mem32 Encoding VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.01 X.src1.X.10 5C /r Related Instructions (V)SUBPD, (V)SUBPS, (V)SUBSD rFLAGS Affected None 580 SUBSS, VSUBSS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE UE OE M M M 5 4 3 ZE 2 DE IE M M 1 0 M indicates a flag that may be modified (set or cleared). Blanks indicate flags that are not affected. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. 
SIMD Floating-Point Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.
Overflow, OE               S     S     X     Rounded result too large to fit into the format of the destination operand.
Underflow, UE              S     S     X     Rounded result too small to fit into the format of the destination operand.
Precision, PE              S     S     X     A result could not be represented exactly in the destination format.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

UCOMISD VUCOMISD   Unordered Compare Scalar Double-Precision Floating-Point

Performs an unordered comparison of a double-precision floating-point value in the low-order 64 bits of an XMM register with a double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location.

The ZF, PF, and CF bits in the rFLAGS register reflect the result of the compare as follows.

Result of Compare   ZF  PF  CF
Unordered           1   1   1
Greater Than        0   0   0
Less Than           0   0   1
Equal               1   0   0

The OF, AF, and SF bits in rFLAGS are cleared. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

The result is unordered when one or both of the operand values is a NaN. UCOMISD signals a SIMD floating-point invalid operation exception (IE) only when a source operand is an SNaN.

The legacy and extended forms of the instruction operate in the same way.

Instruction Support

Form       Subset  Feature Flag
UCOMISD    SSE2    CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VUCOMISD   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                   Opcode        Description
UCOMISD xmm1, xmm2/mem64   66 0F 2E /r   Compares scalar double-precision floating-point values in xmm1 and xmm2 or mem64. Sets rFLAGS.
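The compare-result table can be expressed as a small function. The sketch below is a portable C model, not the instruction itself: it takes the first operand (xmm1) as the left-hand side of the comparison, which is an assumption about operand order that the table does not state, and it does not model exception signaling or the clearing of OF, AF, and SF. The structure and function names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* The three rFLAGS bits that report the compare result. */
struct compare_flags { bool zf, pf, cf; };

/* Model of the ZF/PF/CF encoding for an unordered scalar compare.
 * 'a' stands for the first operand, 'b' for the second. */
static struct compare_flags model_ucomisd(double a, double b)
{
    struct compare_flags f = { false, false, false };
    if (isnan(a) || isnan(b)) {
        f.zf = f.pf = f.cf = true;   /* Unordered: 1 1 1 */
    } else if (a < b) {
        f.cf = true;                 /* Less Than: 0 0 1 */
    } else if (a == b) {
        f.zf = true;                 /* Equal:     1 0 0 */
    }
    /* Greater Than leaves all three clear: 0 0 0 */
    return f;
}
```

Because PF is set only in the unordered case, code that branches on the flags would typically test PF before interpreting ZF or CF.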
Mnemonic Encoding VEX RXB.map_select VUCOMISD xmm1, xmm2/mem64 C4 RXB.00001 W.vvvv.L.pp Opcode X.1111.X.01 2E /r Related Instructions (V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISS 582 UCOMISD, VUCOMISD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology rFLAGS Affected ID VIP VIF AC VM RF NT IOPL OF DF IF TF SF ZF AF PF CF 0 M 0 M M 7 6 4 2 0 0 21 Note: Note: 20 19 18 17 16 14 13:12 11 10 9 8 Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated. MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE 5 UE 4 OE 3 ZE 2 DE IE M M 1 0 A flag that may be set or cleared is M (modified). Unaffected flags are blank. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. 
SIMD Floating-Point Exceptions

Exception                  Real  Virt  Prot  Cause of Exception
Invalid operation, IE      S     S     X     A source operand was an SNaN value.
                           S     S     X     Undefined operation.
Denormalized operand, DE   S     S     X     A source operand was a denormal value.

X: AVX and SSE exception
A: AVX exception
S: SSE exception

UCOMISS VUCOMISS   Unordered Compare Scalar Single-Precision Floating-Point

Performs an unordered comparison of a single-precision floating-point value in the low-order 32 bits of an XMM register with a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location.

The ZF, PF, and CF bits in the rFLAGS register reflect the result of the compare as follows.

Result of Compare   ZF  PF  CF
Unordered           1   1   1
Greater Than        0   0   0
Less Than           0   0   1
Equal               1   0   0

The OF, AF, and SF bits in rFLAGS are cleared. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

The result is unordered when one or both of the operand values is a NaN. UCOMISS signals a SIMD floating-point invalid operation exception (IE) only when a source operand is an SNaN.

The legacy and extended forms of the instruction operate in the same way.

Instruction Support

Form       Subset  Feature Flag
UCOMISS    SSE1    CPUID Fn0000_0001_EDX[SSE] (bit 25)
VUCOMISS   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                   Opcode     Description
UCOMISS xmm1, xmm2/mem32   0F 2E /r   Compares scalar single-precision floating-point values in xmm1 and xmm2 or mem32. Sets rFLAGS.

Mnemonic                    VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VUCOMISS xmm1, xmm2/mem32   C4   RXB.01          X.1111.X.00  2E /r

Related Instructions
(V)CMPPD, (V)CMPPS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)UCOMISD
3.22—May 2018 AMD64 Technology rFLAGS Affected ID VIP VIF AC VM RF NT IOPL OF DF IF TF SF ZF AF PF CF 0 M 0 M M 7 6 4 2 0 0 21 Note: Note: 20 19 18 17 16 14 13:12 11 10 9 8 Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated. MXCSR Flags Affected MM 17 Note: FZ 15 RC 14 PM 13 12 UM OM 11 10 ZM 9 DM 8 IM 7 DAZ 6 PE 5 UE 4 OE 3 ZE 2 DE IE M M 1 0 A flag that may be set or cleared is M (modified). Unaffected flags are blank. Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. 
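The compare-result table for UCOMISS can be sketched as a scalar reference model in portable C. This is only an illustration under stated assumptions: the type ucomiss_flags and the functions ucomiss_model and ucomiss_model_selfcheck are hypothetical names introduced here, not part of the manual, and the sketch does not model exception signalling (SNaN, denormal) or the clearing of OF, AF, and SF.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper type (not from the manual): the three rFLAGS bits
 * that the compare-result table defines. */
typedef struct {
    bool zf, pf, cf;
} ucomiss_flags;

/* Scalar model of the compare-result table. `a` stands for the low-order
 * 32 bits of the first operand, `b` for the second operand. */
static ucomiss_flags ucomiss_model(float a, float b)
{
    ucomiss_flags f = { false, false, false };  /* Greater Than: 0 0 0 */

    if (isnan(a) || isnan(b)) {                 /* Unordered:    1 1 1 */
        f.zf = true;
        f.pf = true;
        f.cf = true;
    } else if (a < b) {                         /* Less Than:    0 0 1 */
        f.cf = true;
    } else if (a == b) {                        /* Equal:        1 0 0 */
        f.zf = true;
    }
    return f;
}

/* Usage: one check per row of the table. */
static void ucomiss_model_selfcheck(void)
{
    ucomiss_flags u = ucomiss_model(NAN, 1.0f);
    assert(u.zf && u.pf && u.cf);

    ucomiss_flags g = ucomiss_model(2.0f, 1.0f);
    assert(!g.zf && !g.pf && !g.cf);

    ucomiss_flags l = ucomiss_model(1.0f, 2.0f);
    assert(!l.zf && !l.pf && l.cf);

    ucomiss_flags e = ucomiss_model(1.0f, 1.0f);
    assert(e.zf && !e.pf && !e.cf);
}
```

Because the unordered case sets all three flags, code that branches on CF or ZF alone cannot distinguish a NaN operand from "less than" or "equal"; testing PF first is the usual way to separate the unordered case.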
UCOMISS, VUCOMISS 585 AMD64 Technology 26568—Rev. 3.22—May 2018 UNPCKHPD VUNPCKHPD Unpack High Double-Precision Floating-Point Unpacks the high-order double-precision floating-point values of the first and second source operands and interleaves the values into the destination. Bits [63:0] of the source operands are ignored. Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [127:64] of the first source are written to bits [63:0] of the destination; bits [127:64] of the second source are written to bits [127:64] of the destination. For the 256-bit encoding, the process is repeated for bits [255:192] of the sources and bits [255:128] of the destination. There are legacy and extended forms of the instruction: UNPCKHPD Interleaves one pair of values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VUNPCKHPD The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Interleaves one pair of values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding Interleaves two pairs of values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset UNPCKHPD SSE2 VUNPCKHPD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 
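The interleave described for UNPCKHPD and VUNPCKHPD can be sketched as a scalar model in portable C. This is a sketch under assumptions, not the hardware implementation: a register is modeled as an array of doubles with element 0 holding bits [63:0], and the names unpckhpd_model, unpckhpd_model_selfcheck, and lanes (1 for the XMM encoding, 2 for the YMM encoding) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Model: within each 128-bit lane, the high double of src1 goes to the low
 * half of the destination lane and the high double of src2 goes to the high
 * half. The low doubles of both sources are ignored. */
static void unpckhpd_model(double *dst, const double *src1,
                           const double *src2, size_t lanes)
{
    assert(lanes == 1 || lanes == 2);
    for (size_t i = 0; i < lanes; i++) {
        /* Read both inputs before writing, so dst may alias src1
         * (the legacy form uses the first source as the destination). */
        double hi1 = src1[2 * i + 1];
        double hi2 = src2[2 * i + 1];
        dst[2 * i]     = hi1;
        dst[2 * i + 1] = hi2;
    }
}

/* Usage: 256-bit form, then the legacy form with dst == src1. */
static void unpckhpd_model_selfcheck(void)
{
    double a[4] = { 1.0, 2.0, 3.0, 4.0 };
    double b[4] = { 5.0, 6.0, 7.0, 8.0 };
    double d[4];

    unpckhpd_model(d, a, b, 2);
    assert(d[0] == 2.0 && d[1] == 6.0 && d[2] == 4.0 && d[3] == 8.0);

    unpckhpd_model(a, a, b, 1);
    assert(a[0] == 2.0 && a[1] == 6.0);
}
```

The model does not cover the treatment of bits [255:128] of the destination, which differs between the legacy form (not affected) and the VEX-encoded XMM form (cleared).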
Instruction Encoding Mnemonic UNPCKHPD xmm1, xmm2/mem128 Opcode 66 0F 15 /r Description Unpacks the high-order double-precision floatingpoint values in xmm1 and xmm2 or mem128 and interleaves them into xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VUNPCKHPD xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 15 /r VUNPCKHPD ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 15 /r 586 UNPCKHPD, VUNPCKHPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)UNPCKHPS, (V)UNPCKLPD, (V)UNPCKLPS rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not 16-byte aligned and MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. UNPCKHPD, VUNPCKHPD 587 AMD64 Technology 26568—Rev. 3.22—May 2018 UNPCKHPS VUNPCKHPS Unpack High Single-Precision Floating-Point Unpacks the high-order single-precision floating-point values of the first and second source operands and interleaves the values into the destination. 
Bits [63:0] of the source operands are ignored. Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [95:64] of the first source are written to bits [31:0] of the destination; bits [95:64] of the second source are written to bits [63:32] of the destination and so on, ending with bits [127:96] of the second source in bits [127:96] of the destination. For the 256-bit encoding, the process continues for bits [255:192] of the sources and bits [255:128] of the destination. There are legacy and extended forms of the instruction: UNPCKHPS Interleaves two pairs of values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VUNPCKHPS The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Interleaves two pairs of values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding Interleaves four pairs of values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset Feature Flag UNPCKHPS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VUNPCKHPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 588 UNPCKHPS, VUNPCKHPS Instruction Reference 26568—Rev. 
3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic Opcode Description UNPCKHPS xmm1, xmm2/mem128 0F 15 /r Unpacks the high-order single-precision floating-point values in xmm1 and xmm2 or mem128 and interleaves them into xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VUNPCKHPS xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.00 15 /r VUNPCKHPS ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.00 15 /r Related Instructions (V)UNPCKHPD, (V)UNPCKLPD, (V)UNPCKLPS rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not 16-byte aligned and MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. UNPCKHPS, VUNPCKHPS 589 AMD64 Technology 26568—Rev. 3.22—May 2018 UNPCKLPD VUNPCKLPD Unpack Low Double-Precision Floating-Point Unpacks the low-order double-precision floating-point values of the first and second source operands and interleaves the values into the destination. Bits [127:64] of the source operands are ignored. 
Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [63:0] of the first source are written to bits [63:0] of the destination; bits [63:0] of the second source are written to bits [127:64] of the destination. For the 256-bit encoding, the process is repeated for bits [191:128] of the sources and bits [255:128] of the destination. There are legacy and extended forms of the instruction: UNPCKLPD Interleaves one pair of values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VUNPCKLPD The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Interleaves one pair of values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding Interleaves two pairs of values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset UNPCKLPD SSE2 VUNPCKLPD AVX Feature Flag CPUID Fn0000_0001_EDX[SSE2] (bit 26) CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 
Instruction Encoding Mnemonic UNPCKLPD xmm1, xmm2/mem128 Opcode Description 66 0F 14 /r Unpacks the low-order double-precision floating-point values in xmm1 and xmm2 or mem128 and interleaves them into xmm1 Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VUNPCKLPD xmm1, xmm2, xmm3/mem128 C4 RXB.01 X.src1.0.01 14 /r VUNPCKLPD ymm1, ymm2, ymm3/mem256 C4 RXB.01 X.src1.1.01 14 /r 590 UNPCKLPD, VUNPCKLPD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Related Instructions (V)UNPCKHPD, (V)UNPCKHPS, (V)UNPCKLPS rFLAGS Affected None MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception Instruction Reference X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not 16-byte aligned and MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. UNPCKLPD, VUNPCKLPD 591 AMD64 Technology 26568—Rev. 3.22—May 2018 UNPCKLPS VUNPCKLPS Unpack Low Single-Precision Floating-Point Unpacks the low-order single-precision floating-point values of the first and second source operands and interleaves the values into the destination. 
Bits [127:64] of the source operands are ignored. Values are interleaved in ascending order from the lsb of the sources and the destination. Bits [31:0] of the first source are written to bits [31:0] of the destination; bits [31:0] of the second source are written to bits [63:32] of the destination and so on, ending with bits [63:32] of the second source in bits [127:96] of the destination. For the 256-bit encoding, the process continues for bits [191:128] of the sources and bits [255:128] of the destination. There are legacy and extended forms of the instruction: UNPCKLPS Interleaves two pairs of values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected. VUNPCKLPS The extended form of the instruction has both 128-bit and 256-bit encodings: XMM Encoding Interleaves two pairs of values. The first source operand is an XMM register and the second source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared. YMM Encoding Interleaves four pairs of values. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Instruction Support Form Subset Feature Flag UNPCKLPS SSE1 CPUID Fn0000_0001_EDX[SSE] (bit 25) VUNPCKLPS AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 592 UNPCKLPS, VUNPCKLPS Instruction Reference 26568—Rev. 
3.22—May 2018 AMD64 Technology

Instruction Encoding
Mnemonic                     Opcode     Description
UNPCKLPS xmm1, xmm2/mem128   0F 14 /r   Unpacks the low-order single-precision floating-point values in xmm1 and xmm2 or mem128 and interleaves them into xmm1.

Mnemonic                            VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VUNPCKLPS xmm1, xmm2, xmm3/mem128   C4   RXB.01          X.src1.0.00  14 /r
VUNPCKLPS ymm1, ymm2, ymm3/mem256   C4   RXB.01          X.src1.1.00  14 /r

Related Instructions
(V)UNPCKHPD, (V)UNPCKHPS, (V)UNPCKLPD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF
X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
X = AVX and SSE exception; A = AVX exception; S = SSE exception

VBROADCASTF128
Load With Broadcast From 128-bit Memory Location

Loads double-precision floating-point data from a 128-bit memory location and writes it to the two 128-bit elements of a YMM register.
This extended-form instruction has a single 256-bit encoding. The source operand is a 128-bit memory location.
The destination is a YMM register.

Instruction Support
Form             Subset  Feature Flag
VBROADCASTF128   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VBROADCASTF128 ymm1, mem128   C4   RXB.02          0.1111.1.01  1A /r

Related Instructions
VBROADCASTSD, VBROADCASTSS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.W = 1.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 0.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Page fault, #PF                         A     Instruction execution caused a page fault.
Alignment check, #AC                    A     Unaligned memory reference when alignment checking enabled.
A = AVX exception.

VBROADCASTI128
Load With Broadcast Integer From 128-bit Memory Location

Loads data from a 128-bit memory location and writes it to the two 128-bit elements of a YMM register.
There is a single form of this instruction:
VBROADCASTI128 dest, mem128
There is a single VEX.L = 1 encoding of this instruction. The source operand is a 128-bit memory location. The destination is a YMM register.
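The 128-bit broadcast performed by VBROADCASTF128 and VBROADCASTI128 can be sketched as a byte-level model in portable C. The names vbroadcast128_model and vbroadcast128_model_selfcheck are hypothetical, and a YMM register is assumed to be modeled as 32 bytes with byte 0 holding bits [7:0]; the sketch says nothing about faults or alignment checking.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model: the same 128-bit memory operand is written to both 128-bit
 * elements of the 256-bit destination. */
static void vbroadcast128_model(uint8_t ymm[32], const uint8_t mem128[16])
{
    memcpy(ymm, mem128, 16);       /* bits [127:0]   */
    memcpy(ymm + 16, mem128, 16);  /* bits [255:128] */
}

/* Usage: both halves of the destination equal the source. */
static void vbroadcast128_model_selfcheck(void)
{
    const uint8_t m[16] = { 1, 2, 3, 4, 5, 6, 7, 8,
                            9, 10, 11, 12, 13, 14, 15, 16 };
    uint8_t y[32] = { 0 };

    vbroadcast128_model(y, m);
    assert(memcmp(y, m, 16) == 0);
    assert(memcmp(y + 16, m, 16) == 0);
}
```

At this level the floating-point and integer variants are indistinguishable, since neither interprets the data; the model treats the operand as opaque bytes for that reason.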
Instruction Support
Form             Subset  Feature Flag
VBROADCASTI128   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

Instruction Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VBROADCASTI128 ymm1, mem128   C4   RXB.02          0.1111.1.01  5A /r

Related Instructions
VBROADCASTF128, VEXTRACTF128, VEXTRACTI128, VINSERTF128, VINSERTI128

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     VEX.W = 1.
                                        A     VEX.vvvv != 1111b.
                                        A     VEX.L = 0.
                                        A     Register-based source operand specified (MODRM.mod = 11b).
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Page fault, #PF                         A     Instruction execution caused a page fault.
Alignment check, #AC                    A     Unaligned memory reference when alignment checking enabled.
A = AVX exception.

VBROADCASTSD
Load With Broadcast Scalar Double

Loads a double-precision floating-point value from a register or memory and writes it to the four 64-bit elements of a YMM register.
This extended-form instruction has a single 256-bit encoding. The source operand is the lower half of an XMM register or a 64-bit memory location. The destination is a YMM register.
Instruction Support Form Subset Feature Flag VBROADCASTSD ymm1, mem64 AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VBROADCASTSD ymm1, xmm AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Encoding VBROADCASTSD ymm1, xmm2/mem64 VEX RXB.map_select W.vvvv.L.pp Opcode C4 RXB.02 0.1111.1.01 19 /r Related Instructions VBROADCASTF128, VBROADCASTSS rFLAGS Affected None MXCSR Flags Affected None 598 VBROADCASTSD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot A A Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC A — AVX, AVX2 exception. Instruction Reference A A A A A A A A A A A A A A A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.W = 1. VEX.vvvv ! = 1111b. VEX.L = 0. Register-based source operand specified when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. VBROADCASTSD 599 AMD64 Technology 26568—Rev. 3.22—May 2018 VBROADCASTSS Load With Broadcast Scalar Single Loads a single-precision floating-point value from a register or memory and writes it to all 4 or 8 doublewords of an XMM or YMM register. This extended-form instruction has both 128-bit and 256-bit encodings: XMM Encoding Copies the source operand to all four 32-bit elements of the destination. 
The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location. The destination is an XMM register. YMM Encoding Copies the source operand to all eight 32-bit elements of the destination. The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location. The destination is a YMM register. Instruction Support Form Subset Feature Flag VBROADCASTSS mem32 AVX CPUID Fn0000_0001_ECX[AVX] (bit 28) VBROADCASTSS xmm AVX2 CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VBROADCASTSS xmm1, xmm2/mem32 C4 RXB.02 0.1111.0.01 18 /r VBROADCASTSS ymm1, xmm2/mem32 C4 RXB.02 0.1111.1.01 18 /r Related Instructions VBROADCASTF128, VBROADCASTSD rFLAGS Affected None MXCSR Flags Affected None 600 VBROADCASTSS Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot A A Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC A — AVX, AVX2 exception. Instruction Reference A A A A A A A A A A A A A A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.W = 1. VEX.vvvv ! = 1111b. MODRM.mod = 11b when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. VBROADCASTSS 601 AMD64 Technology 26568—Rev. 
3.22—May 2018

VCVTPH2PS
Convert Packed 16-Bit Floating-Point to Single-Precision Floating-Point

Converts packed 16-bit floating-point values to single-precision floating-point values.
A denormal source operand is converted to a normal result in the destination register. MXCSR.DAZ is ignored and no MXCSR denormal exception is reported.
Because the full range of 16-bit floating-point encodings, including denormal encodings, can be represented exactly in single-precision format, rounding, inexact results, and denormalized results are not applicable.
The operation of this instruction is illustrated in the following diagram.

[Figure: VCVTPH2PS 128-Bit. Each of the four 16-bit values in bits [63:0] of src (xmm2/mem64) is converted to a 32-bit value in bits [127:0] of dest (xmm1); bits [255:128] are written with 0s.]
[Figure: VCVTPH2PS 256-Bit. Each of the eight 16-bit values in bits [127:0] of src (xmm2/mem128) is converted to a 32-bit value in bits [255:0] of dest (ymm1).]

This extended-form instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Converts four packed 16-bit floating-point values in the low-order 64 bits of an XMM register or in a 64-bit memory location to four packed single-precision floating-point values and writes the converted values to an XMM destination register. When the result operand is written to the destination register, the upper 128 bits of the corresponding YMM register are zeroed.

YMM Encoding
Converts eight packed 16-bit floating-point values in the low-order 128 bits of a YMM register or in a 128-bit memory location to eight packed single-precision floating-point values and writes the converted values to a YMM destination register.

Instruction Support
Form        Subset  Feature Flag
VCVTPH2PS   F16C    CPUID Fn0000_0001_ECX[F16C] (bit 29)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
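The statement that every 16-bit floating-point encoding, including denormals, maps exactly onto a normal single-precision value can be illustrated with a scalar model in portable C. This is a sketch under assumptions: it takes the 16-bit format to be the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits), the names half_to_float_model, vcvtph2ps_model, and vcvtph2ps_model_selfcheck are hypothetical, and NaN payloads are passed through without modeling SNaN signalling or quieting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts one 16-bit floating-point encoding to a float, bit by bit. */
static float half_to_float_model(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t frac = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {                 /* infinity or NaN */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (exp != 0) {              /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (frac << 13);
    } else if (frac == 0) {             /* signed zero */
        bits = sign;
    } else {                            /* denormal -> normalized single */
        uint32_t e = 113u;              /* biased exponent of 2^-14 */
        while ((frac & 0x400u) == 0) {  /* shift until the leading 1 appears */
            frac <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((frac & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);        /* reinterpret the bit pattern */
    return f;
}

/* Packed form: count is 4 for the XMM encoding, 8 for the YMM encoding. */
static void vcvtph2ps_model(float *dst, const uint16_t *src, size_t count)
{
    assert(count == 4 || count == 8);
    for (size_t i = 0; i < count; i++)
        dst[i] = half_to_float_model(src[i]);
}

/* Usage: 1.0, -2.0, the smallest denormal (2^-24), the largest finite value. */
static void vcvtph2ps_model_selfcheck(void)
{
    const uint16_t src[4] = { 0x3C00u, 0xC000u, 0x0001u, 0x7BFFu };
    float dst[4];

    vcvtph2ps_model(dst, src, 4);
    assert(dst[0] == 1.0f);
    assert(dst[1] == -2.0f);
    assert(dst[2] == 0x1p-24f);
    assert(dst[3] == 65504.0f);
}
```

The denormal branch is the part that corresponds to "a denormal source operand is converted to a normal result": the 10-bit fraction is shifted left until its leading one becomes the implicit bit, and the single-precision exponent is lowered by the same number of shifts.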
Instruction Encoding Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VCVTPH2PS xmm1, xmm2/mem64 C4 RXB.02 0.1111.0.01 13 /r VCVTPH2PS ymm1, xmm2/mem128 C4 RXB.02 0.1111.1.01 13 /r Related Instructions VCVTPS2PH rFLAGS Affected None Instruction Reference VCVTPH2PS 603 AMD64 Technology 26568—Rev. 3.22—May 2018 MXCSR Flags Affected MM FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE IE M 17 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank. Exception Mode F F F Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. F Invalid opcode, #UD Cause of Exception Real Virt Prot CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. F XFEATURE_ENABLED_MASK[2:1] ! = 11b. F VEX.W field = 1. A VEX.vvvv ! = 1111b. F REX, F2, F3, or 66 prefix preceding VEX prefix. F Lock prefix (F0h) preceding opcode. F Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. Device not available, #NM F CR0.TS = 1. Stack, #SS F Memory address exceeding stack segment limit or non-canonical. F Memory address exceeding data segment limit or non-canonical. General protection, #GP F Null data segment used to reference memory. Alignment check, #AC F Unaligned memory reference when alignment checking enabled. Page fault, #PF F Instruction execution caused a page fault. SIMD Floating-Point Exception, #XF F Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid-operation exception (IE) F A source operand was an SNaN value. F Undefined operation. Denormalized-operand exception (DE) F A source operand was a denormal value. Overflow exception (OE) F Rounded result too large to fit into the format of the destination operand. 
Underflow exception (UE)    F   Rounded result too small to fit into the format of the destination operand.
Precision exception (PE)    F   A result could not be represented exactly in the destination format.
F = F16C exception.

VCVTPS2PH
Convert Packed Single-Precision Floating-Point to 16-Bit Floating-Point

Converts packed single-precision floating-point values to packed 16-bit floating-point values and writes the converted values to the destination register or to memory. An 8-bit immediate operand provides dynamic control of rounding.
The operation of this instruction is illustrated in the following diagram.

[Figure: VCVTPS2PH 128-Bit. Each of the four 32-bit values in src (xmm2) is converted, with rounding selected by imm8, to a 16-bit value in bits [63:0] of dest (xmm1/mem64); bits [127:64] and bits [255:128] of a register destination are written with 0s.]
[Figure: VCVTPS2PH 256-Bit. Each of the eight 32-bit values in src (ymm2) is converted, with rounding selected by imm8, to a 16-bit value in bits [127:0] of dest (xmm1/mem128); bits [255:128] of a register destination are written with 0s.]

The handling of rounding is controlled by fields in the immediate byte, as shown in the following table.

Rounding Control with Immediate Byte Operand
Rounding Source (RS)   Rounding Control (RC)   Description                  Notes
Bit 2                  Bits 1:0
0                      0 0                     Nearest                      Ignore MXCSR.RC.
0                      0 1                     Down                         Ignore MXCSR.RC.
0                      1 0                     Up                           Ignore MXCSR.RC.
0                      1 1                     Truncate                     Ignore MXCSR.RC.
1                      X X                     Use MXCSR.RC for rounding.

MXCSR[FTZ] has no effect on this instruction. Values within the half-precision denormal range are unconditionally converted to denormals.

This extended-form instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Converts four packed single-precision floating-point values in an XMM register to four packed 16-bit floating-point values and writes the converted values to the low-order 64 bits of the destination XMM register or to a 64-bit memory location.
When the result is written to the destination XMM register, the high-order 64 bits in the destination XMM register and the upper 128 bits of the corresponding YMM register are cleared to 0s.

YMM Encoding
Converts eight packed single-precision floating-point values in a YMM register to eight packed 16-bit floating-point values and writes the converted values to the low-order 128 bits of a YMM register or to a 128-bit memory location. When the result is written to the destination YMM register, the high-order 128 bits in the register are cleared to 0s.

Instruction Support
Form        Subset  Feature Flag
VCVTPS2PH   F16C    CPUID Fn0000_0001_ECX[F16C] (bit 29)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                             VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VCVTPS2PH xmm1/mem64, xmm2, imm8     C4   RXB.03          0.1111.0.01  1D /r /imm8
VCVTPS2PH xmm1/mem128, ymm2, imm8    C4   RXB.03          0.1111.1.01  1D /r /imm8

Related Instructions
VCVTPH2PS

rFLAGS Affected
None

MXCSR Flags Affected
Flag    MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Bit     17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0
State                                               M   M   M       M   M
Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions
Exception                   Mode (Real Virt Prot)   Cause of Exception
Invalid opcode, #UD         F F F                   Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode.
                            F                       CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            F                       XFEATURE_ENABLED_MASK[2:1] != 11b.
                            F                       VEX.W field = 1.
                            A                       VEX.vvvv != 1111b.
                            F                       REX, F2, F3, or 66 prefix preceding VEX prefix.
                            F                       Lock prefix (F0h) preceding opcode.
                            F                       Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   F                       CR0.TS = 1.
Stack, #SS F Memory address exceeding stack segment limit or non-canonical. F Memory address exceeding data segment limit or non-canonical. F Null data segment used to reference memory. Alignment check, #AC F Unaligned memory reference when alignment checking enabled. Page fault, #PF F Instruction execution caused a page fault. SIMD Floating-Point Exception, #XF F Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. Invalid opcode, #UD General protection, #GP SIMD Floating-Point Exceptions Invalid-operation exception (IE) F A source operand was an SNaN value. F Undefined operation. Denormalized-operand exception (DE) F A source operand was a denormal value. Overflow exception (OE) F Rounded result too large to fit into the format of the destination operand. Underflow exception (UE) F Rounded result too small to fit into the format of the destination operand. Precision exception (PE) F A result could not be represented exactly in the destination format. F — F16C exception. 608 VCVTPS2PH Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology VEXTRACTF128 Extract Packed Floating-Point Values Extracts 128 bits of packed data from a YMM register as specified by an immediate byte operand, and writes it to either an XMM register or a 128-bit memory location. Only bit [0] of the immediate operand is used. Operation is as follows. • When imm8[0] = 0, copy bits [127:0] of the source to the destination. • When imm8[0] = 1, copy bits [255:128] of the source to the destination. This extended-form instruction has a single 256-bit encoding. The source operand is a YMM register and the destination is either an XMM register or a 128-bit memory location. There is a third immediate byte operand. Instruction Support Form Subset VEXTRACTF128 AVX Feature Flag CPUID Fn0000_0001_ECX[AVX] (bit 28) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 
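As a rough illustration of the feature-flag checks named in the Instruction Support tables, the following Python sketch decodes the relevant bits from a raw ECX value as returned by CPUID Fn0000_0001. The function name and the sample register value are hypothetical. The bit positions for AVX (28) and F16C (29) come from the tables above; bit 27 is the architectural position of OSXSAVE. Obtaining the register value requires executing CPUID, which is outside the scope of this sketch.

```python
# Bit positions within ECX returned by CPUID Fn0000_0001.
OSXSAVE_BIT = 27
AVX_BIT = 28
F16C_BIT = 29


def decode_fn0000_0001_ecx(ecx: int) -> dict:
    """Return the OSXSAVE, AVX and F16C flags found in a raw ECX value."""
    return {
        "OSXSAVE": bool((ecx >> OSXSAVE_BIT) & 1),
        "AVX": bool((ecx >> AVX_BIT) & 1),
        "F16C": bool((ecx >> F16C_BIT) & 1),
    }


# Hypothetical ECX value: OSXSAVE and AVX set, F16C clear.
sample_ecx = (1 << OSXSAVE_BIT) | (1 << AVX_BIT)
flags = decode_fn0000_0001_ecx(sample_ecx)
print(flags)
```

A complete check would also confirm that XFEATURE_ENABLED_MASK[2:1] = 11b, since the exception tables list that condition as a cause of #UD.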
Instruction Encoding

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VEXTRACTF128 xmm/mem128, ymm, imm8 | C4 | RXB.03 | 0.1111.1.01 | 19 /r ib

Related Instructions
VBROADCASTF128, VINSERTF128

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | A | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A | | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | A | VEX.W = 1.
Invalid opcode, #UD | | | A | VEX.L = 0.
Invalid opcode, #UD | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | A | Lock prefix (F0h) preceding opcode.
Device not available, #NM | | | A | CR0.TS = 1.
Stack, #SS | | | A | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | A | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | A | Write to a read-only data segment.
General protection, #GP | | | A | Null data segment used to reference memory.
Page fault, #PF | | | A | Instruction execution caused a page fault.
Alignment check, #AC | | | A | Memory operand not 16-byte aligned when alignment checking enabled.

A: AVX exception.

VEXTRACTI128
Extract 128-bit Integer

Writes a selected 128-bit half of a YMM register to an XMM register or a 128-bit memory location based on the value of bit 0 of an immediate byte.

There is a single form of this instruction:

    VEXTRACTI128 dest, src, imm8

If imm8[0] = 0, the lower half of the source YMM register is selected; if imm8[0] = 1, the upper half of the source register is selected.

There is a single VEX.L = 1 encoding of this instruction. The source operand is a YMM register. The destination is either an XMM register or a 128-bit memory location. When the destination is a register, bits [255:128] of the YMM register that corresponds to the destination are cleared.
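The half-selection rule shared by VEXTRACTF128 and VEXTRACTI128 can be sketched in a few lines of Python. This is only a software model under stated assumptions: a YMM register is represented as a 256-bit Python integer, and the function name and sample values are hypothetical.

```python
MASK128 = (1 << 128) - 1


def vextract128(src_ymm: int, imm8: int) -> int:
    """Model the 128-bit extract: imm8[0] selects the low (0) or high (1) half.

    The returned value fits in 128 bits, so when it is viewed as the image of
    a destination YMM register, bits [255:128] are zero.
    """
    select = imm8 & 1                      # only bit 0 of the immediate is used
    return (src_ymm >> (128 * select)) & MASK128


# Hypothetical register contents.
low_half = 0x0123456789ABCDEF_0011223344556677
high_half = 0xFEDCBA9876543210_8899AABBCCDDEEFF
ymm = (high_half << 128) | low_half

print(hex(vextract128(ymm, 0)))   # low half
print(hex(vextract128(ymm, 1)))   # high half
```

Because only bit 0 is examined, an immediate such as 0xFF selects the same half as 1 in this model.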
Instruction Support

Form | Subset | Feature Flag
VEXTRACTI128 | AVX2 | CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

Instruction Encoding

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VEXTRACTI128 xmm1/mem128, ymm2, imm8 | C4 | RXB.03 | 0.1111.1.01 | 39 /r ib

Related Instructions
VBROADCASTF128, VBROADCASTI128, VEXTRACTF128, VINSERTF128, VINSERTI128

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | A | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | A | A | | AVX instructions are only recognized in protected mode.
Invalid opcode, #UD | | | A | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | A | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | A | VEX.W = 1.
Invalid opcode, #UD | | | A | VEX.vvvv != 1111b.
Invalid opcode, #UD | | | A | VEX.L = 0.
Invalid opcode, #UD | | | A | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | A | Lock prefix (F0h) preceding opcode.
Device not available, #NM | | | A | CR0.TS = 1.
Stack, #SS | | | A | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | A | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | A | Null data segment used to reference memory.
Page fault, #PF | | | A | Instruction execution caused a page fault.
Alignment check, #AC | | | A | Unaligned memory reference when alignment checking enabled.

A: AVX exception.

VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD
Multiply and Add Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors and adds the unrounded product to a third double-precision floating-point vector, producing a precise result that is then rounded to double precision according to the mode specified by the MXCSR[RC] field. The rounded sum is written to the destination register. The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.

There are two four-operand forms:

    VFMADDPD dest, src1, src2/mem, src3    // dest = (src1 * src2/mem) + src3
    VFMADDPD dest, src1, src2, src3/mem    // dest = (src1 * src2) + src3/mem

and three three-operand forms:

    VFMADD132PD src1, src2, src3/mem       // src1 = (src1 * src3/mem) + src2
    VFMADD213PD src1, src2, src3/mem       // src1 = (src2 * src1) + src3/mem
    VFMADD231PD src1, src2, src3/mem       // src1 = (src2 * src3/mem) + src1

When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register-based source operands are held in XMM registers.

When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register-based source operands are held in YMM registers.

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form | Subset | Feature Flag
VFMADDPD | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnPD | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
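The operand roles of the three-operand forms, and the effect of rounding only once, can be modeled in Python as a minimal sketch. The assumptions: vectors are plain lists of Python floats (doubles), only round-to-nearest is modeled, inputs are finite and results stay in range, and the sign of a zero result is not tracked. The helper name fma and the vector function names are illustrative and not part of any library.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """a*b + c computed exactly, then rounded once to double precision."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)   # unrounded product and sum
    return float(exact)                               # single rounding step


def vfmadd132pd(src1, src2, src3):
    # src1 = (src1 * src3) + src2
    return [fma(a, c, b) for a, b, c in zip(src1, src2, src3)]


def vfmadd213pd(src1, src2, src3):
    # src1 = (src2 * src1) + src3
    return [fma(b, a, c) for a, b, c in zip(src1, src2, src3)]


def vfmadd231pd(src1, src2, src3):
    # src1 = (src2 * src3) + src1
    return [fma(b, c, a) for a, b, c in zip(src1, src2, src3)]


# Two double-precision elements per vector, as in the VEX.L = 0 case.
s1, s2, s3 = [2.0, 3.0], [5.0, 7.0], [11.0, 13.0]
print(vfmadd132pd(s1, s2, s3))
print(vfmadd213pd(s1, s2, s3))
print(vfmadd231pd(s1, s2, s3))

# The unrounded product matters: here the separately rounded product is 1.0.
a, b = 1 + 2.0**-27, 1 - 2.0**-27
fused = fma(a, b, -1.0)
unfused = (a * b) + -1.0
print(fused, unfused)
```

In the last two lines the exact product is 1 - 2^-54, which a separate multiply rounds to 1.0 before the add, while the fused form keeps the low-order bits until the final rounding.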
Instruction Encoding

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFMADDPD xmm1, xmm2, xmm3/mem128, xmm4 | C4 | RXB.03 | 0.src1.0.01 | 69 /r /is4
VFMADDPD ymm1, ymm2, ymm3/mem256, ymm4 | C4 | RXB.03 | 0.src1.1.01 | 69 /r /is4
VFMADDPD xmm1, xmm2, xmm3, xmm4/mem128 | C4 | RXB.03 | 1.src1.0.01 | 69 /r /is4
VFMADDPD ymm1, ymm2, ymm3, ymm4/mem256 | C4 | RXB.03 | 1.src1.1.01 | 69 /r /is4
VFMADD132PD xmm0, xmm1, xmm2/m128 | C4 | RXB.02 | 1.src2.0.01 | 98 /r
VFMADD132PD ymm0, ymm1, ymm2/m256 | C4 | RXB.02 | 1.src2.1.01 | 98 /r
VFMADD213PD xmm0, xmm1, xmm2/m128 | C4 | RXB.02 | 1.src2.0.01 | A8 /r
VFMADD213PD ymm0, ymm1, ymm2/m256 | C4 | RXB.02 | 1.src2.1.01 | A8 /r
VFMADD231PD xmm0, xmm1, xmm2/m128 | C4 | RXB.02 | 1.src2.0.01 | B8 /r
VFMADD231PD ymm0, ymm1, ymm2/m256 | C4 | RXB.02 | 1.src2.1.01 | B8 /r

Related Instructions
VFMADDPS, VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSD, VFMADD132SD, VFMADD213SD, VFMADD231SD, VFMADDSS, VFMADD132SS, VFMADD213SS, VFMADD231SS

rFLAGS Affected
None

MXCSR Flags Affected

Flag | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Affected | | | | | | | | | | | M | M | M | | M | M

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception.

VFMADDPS, VFMADD132PS, VFMADD213PS, VFMADD231PS
Multiply and Add Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors and adds the unrounded product to a third single-precision floating-point vector, producing a precise result that is then rounded to single precision according to the mode specified by the MXCSR[RC] field. The rounded sum is written to the destination register. The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.

There are two four-operand forms:

    VFMADDPS dest, src1, src2/mem, src3    // dest = (src1 * src2/mem) + src3
    VFMADDPS dest, src1, src2, src3/mem    // dest = (src1 * src2) + src3/mem

and three three-operand forms:

    VFMADD132PS src1, src2, src3/mem       // src1 = (src1 * src3/mem) + src2
    VFMADD213PS src1, src2, src3/mem       // src1 = (src2 * src1) + src3/mem
    VFMADD231PS src1, src2, src3/mem       // src1 = (src2 * src3/mem) + src1

When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.

When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form | Subset | Feature Flag
VFMADDPS | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnPS | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
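To make the role of MXCSR[RC] in the final rounding step concrete, the sketch below rounds an exact multiply-add result to single precision under each of the four rounding modes. It is a software model only, with these assumptions: RC values are encoded as 0 = nearest, 1 = down, 2 = up, 3 = truncate; inputs are finite values representable in single precision and held as Python floats; overflow to infinity and the sign of zero are not modeled. The function names are hypothetical.

```python
from fractions import Fraction
import math


def round_to_binary32(x: Fraction, rc: int = 0) -> float:
    """Round an exact rational onto the single-precision grid under mode rc."""
    if x == 0:
        return 0.0
    mag = abs(x)
    # Estimate floor(log2(mag)) from bit lengths, then correct the estimate.
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1                        # now 2**e <= mag < 2**(e + 1)
    e = max(e, -126)                  # denormals share the minimum exponent
    ulp = Fraction(2) ** (e - 23)     # spacing of single-precision values near x
    q = x / ulp                       # signed value measured in units of ulp
    if rc == 0:
        n = round(q)                  # Fraction rounds ties to even
    elif rc == 1:
        n = math.floor(q)             # toward negative infinity
    elif rc == 2:
        n = math.ceil(q)              # toward positive infinity
    else:
        n = math.trunc(q)             # toward zero
    return float(n * ulp)


def fma_single(a: float, b: float, c: float, rc: int = 0) -> float:
    """(a * b) + c with an unrounded product and one final rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return round_to_binary32(exact, rc)


third = Fraction(1, 3)
for rc in range(4):
    print(rc, round_to_binary32(third, rc), round_to_binary32(-third, rc))

print(fma_single(1 + 2.0**-13, 1 - 2.0**-13, -1.0))
```

For the last call the exact product is 1 - 2^-26, so the fused result is -2^-26. Rounding the product to single precision first would give 1.0 and a final result of 0.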
Instruction Encoding

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFMADDPS xmm1, xmm2, xmm3/mem128, xmm4 | C4 | RXB.03 | 0.src1.0.01 | 68 /r /is4
VFMADDPS ymm1, ymm2, ymm3/mem256, ymm4 | C4 | RXB.03 | 0.src1.1.01 | 68 /r /is4
VFMADDPS xmm1, xmm2, xmm3, xmm4/mem128 | C4 | RXB.03 | 1.src1.0.01 | 68 /r /is4
VFMADDPS ymm1, ymm2, ymm3, ymm4/mem256 | C4 | RXB.03 | 1.src1.1.01 | 68 /r /is4
VFMADD132PS xmm0, xmm1, xmm2/m128 | C4 | RXB.02 | 0.src2.0.01 | 98 /r
VFMADD132PS ymm0, ymm1, ymm2/m256 | C4 | RXB.02 | 0.src2.1.01 | 98 /r
VFMADD213PS xmm0, xmm1, xmm2/m128 | C4 | RXB.02 | 0.src2.0.01 | A8 /r
VFMADD213PS ymm0, ymm1, ymm2/m256 | C4 | RXB.02 | 0.src2.1.01 | A8 /r
VFMADD231PS xmm0, xmm1, xmm2/m128 | C4 | RXB.02 | 0.src2.0.01 | B8 /r
VFMADD231PS ymm0, ymm1, ymm2/m256 | C4 | RXB.02 | 0.src2.1.01 | B8 /r

Related Instructions
VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADDSD, VFMADD132SD, VFMADD213SD, VFMADD231SD, VFMADDSS, VFMADD132SS, VFMADD213SS, VFMADD231SS

rFLAGS Affected
None

MXCSR Flags Affected

Flag | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Affected | | | | | | | | | | | M | M | M | | M | M

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception.

VFMADDSD, VFMADD132SD, VFMADD213SD, VFMADD231SD
Multiply and Add Scalar Double-Precision Floating-Point

Multiplies together two double-precision floating-point values and adds the unrounded product to a third double-precision floating-point value, producing a precise result that is then rounded to double precision according to the mode specified by the MXCSR[RC] field. The rounded sum is written to the destination register. The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right.

There are two four-operand forms:

    VFMADDSD dest, src1, src2/mem64, src3    // dest = (src1 * src2/mem64) + src3
    VFMADDSD dest, src1, src2, src3/mem64    // dest = (src1 * src2) + src3/mem64

and three three-operand forms:

    VFMADD132SD src1, src2, src3/mem64       // src1 = (src1 * src3/mem64) + src2
    VFMADD213SD src1, src2, src3/mem64       // src1 = (src2 * src1) + src3/mem64
    VFMADD231SD src1, src2, src3/mem64       // src1 = (src2 * src3/mem64) + src1

All 64-bit double-precision floating-point register-based operands are held in the lower quadword of XMM registers. The result is written to the lower quadword of the destination register. For those instructions that use a memory-based operand, one of the source operands is a 64-bit value read from memory.

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 64-bit memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 64-bit memory location.

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a 64-bit memory location.

The destination is an XMM register. When the result is written to the destination XMM register, bits [127:64] of the destination and bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form | Subset | Feature Flag
VFMADDSD | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnSD | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
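The scalar behavior described above, where only the low quadword takes part and the rest of the destination is cleared, can be sketched as follows. This models only the four-operand form and follows the text above literally. A register is represented as a hypothetical list of four doubles (index 0 holds bits [63:0], index 3 holds bits [255:192]), and the helper names are illustrative.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """a*b + c with an unrounded product and one rounding (nearest); finite inputs."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def vfmaddsd(src1, src2, src3):
    """Model of VFMADDSD dest, src1, src2, src3 on 4-element register images."""
    low = fma(src1[0], src2[0], src3[0])   # only the low quadwords are used
    return [low, 0.0, 0.0, 0.0]            # bits [127:64] and [255:128] cleared


dest = vfmaddsd([2.0, 9.0, 9.0, 9.0], [3.0, 8.0, 8.0, 8.0], [4.0, 7.0, 7.0, 7.0])
print(dest)
```

The upper elements of the sources are deliberately non-zero in the example to show that they do not reach the destination in this model.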
Instruction Encoding

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFMADDSD xmm1, xmm2, xmm3/mem64, xmm4 | C4 | RXB.03 | 0.src1.X.01 | 6B /r /is4
VFMADDSD xmm1, xmm2, xmm3, xmm4/mem64 | C4 | RXB.03 | 1.src1.X.01 | 6B /r /is4
VFMADD132SD xmm0, xmm1, xmm2/m64 | C4 | RXB.02 | 1.src2.X.01 | 99 /r
VFMADD213SD xmm0, xmm1, xmm2/m64 | C4 | RXB.02 | 1.src2.X.01 | A9 /r
VFMADD231SD xmm0, xmm1, xmm2/m64 | C4 | RXB.02 | 1.src2.X.01 | B9 /r

Related Instructions
VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADDPS, VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSS, VFMADD132SS, VFMADD213SS, VFMADD231SS

rFLAGS Affected
None

MXCSR Flags Affected

Flag | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Affected | | | | | | | | | | | M | M | M | | M | M

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception.

VFMADDSS, VFMADD132SS, VFMADD213SS, VFMADD231SS
Multiply and Add Scalar Single-Precision Floating-Point

Multiplies together two single-precision floating-point values and adds the unrounded product to a third single-precision floating-point value, producing a precise result that is then rounded to single precision according to the mode specified by the MXCSR[RC] field. The rounded sum is written to the destination register. The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right.

There are two four-operand forms:

    VFMADDSS dest, src1, src2/mem32, src3    // dest = (src1 * src2/mem32) + src3
    VFMADDSS dest, src1, src2, src3/mem32    // dest = (src1 * src2) + src3/mem32

and three three-operand forms:

    VFMADD132SS src1, src2, src3/mem32       // src1 = (src1 * src3/mem32) + src2
    VFMADD213SS src1, src2, src3/mem32       // src1 = (src2 * src1) + src3/mem32
    VFMADD231SS src1, src2, src3/mem32       // src1 = (src2 * src3/mem32) + src1

All 32-bit single-precision floating-point register-based operands are held in the lower doubleword of XMM registers. The result is written to the low doubleword of the destination register. For those instructions that use a memory-based operand, one of the source operands is a 32-bit value read from memory.

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 32-bit memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 32-bit memory location.

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third operand is either a register or a 32-bit memory location.

The destination is an XMM register. When the result is written to the destination XMM register, bits [127:32] of the destination and bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form | Subset | Feature Flag
VFMADDSS | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDnnnSS | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFMADDSS xmm1, xmm2, xmm3/mem32, xmm4 | C4 | RXB.03 | 0.src1.X.01 | 6A /r /is4
VFMADDSS xmm1, xmm2, xmm3, xmm4/mem32 | C4 | RXB.03 | 1.src1.X.01 | 6A /r /is4
VFMADD132SS xmm1, xmm2, xmm3/mem32 | C4 | RXB.02 | 0.src2.X.01 | 99 /r
VFMADD213SS xmm1, xmm2, xmm3/mem32 | C4 | RXB.02 | 0.src2.X.01 | A9 /r
VFMADD231SS xmm1, xmm2, xmm3/mem32 | C4 | RXB.02 | 0.src2.X.01 | B9 /r

Related Instructions
VFMADDPD, VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADDPS, VFMADD132PS, VFMADD213PS, VFMADD231PS, VFMADDSD, VFMADD132SD, VFMADD213SD, VFMADD231SD

rFLAGS Affected
None

MXCSR Flags Affected

Flag | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Affected | | | | | | | | | | | M | M | M | | M | M

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception.
VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD
Multiply with Alternating Add/Subtract Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors, adds odd elements of the unrounded product to odd elements of a third double-precision floating-point vector, and subtracts even elements of the third floating-point vector from even elements of the unrounded product. The precise result of each addition or subtraction is then rounded to double precision according to the mode specified by the MXCSR[RC] field and written to the corresponding element of the destination. The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right.

There are two four-operand forms:

    VFMADDSUBPD dest, src1, src2/mem, src3
        // dest_odd  = (src1_odd  * src2_odd/mem_odd)   + src3_odd
        // dest_even = (src1_even * src2_even/mem_even) - src3_even
    VFMADDSUBPD dest, src1, src2, src3/mem
        // dest_odd  = (src1_odd  * src2_odd)  + src3_odd/mem_odd
        // dest_even = (src1_even * src2_even) - src3_even/mem_even

and three three-operand forms:

    VFMADDSUB132PD src1, src2, src3/mem
        // src1_odd  = (src1_odd  * src3_odd/mem_odd)   + src2_odd
        // src1_even = (src1_even * src3_even/mem_even) - src2_even
    VFMADDSUB213PD src1, src2, src3/mem
        // src1_odd  = (src2_odd  * src1_odd)  + src3_odd/mem_odd
        // src1_even = (src2_even * src1_even) - src3_even/mem_even
    VFMADDSUB231PD src1, src2, src3/mem
        // src1_odd  = (src2_odd  * src3_odd/mem_odd)   + src1_odd
        // src1_even = (src2_even * src3_even/mem_even) - src1_even

When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register-based source operands are held in XMM registers.

When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register-based source operands are held in YMM registers.

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form | Subset | Feature Flag
VFMADDSUBPD | FMA4 | CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDSUBnnnPD | FMA | CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
VFMADDSUBPD xmm1, xmm2, xmm3/mem128, xmm4 | C4 | RXB.03 | 0.src1.0.01 | 5D /r /is4
VFMADDSUBPD ymm1, ymm2, ymm3/mem256, ymm4 | C4 | RXB.03 | 0.src1.1.01 | 5D /r /is4
VFMADDSUBPD xmm1, xmm2, xmm3, xmm4/mem128 | C4 | RXB.03 | 1.src1.0.01 | 5D /r /is4
VFMADDSUBPD ymm1, ymm2, ymm3, ymm4/mem256 | C4 | RXB.03 | 1.src1.1.01 | 5D /r /is4
VFMADDSUB132PD xmm1, xmm2, xmm3/mem128 | C4 | RXB.02 | 1.src2.0.01 | 96 /r
VFMADDSUB132PD ymm1, ymm2, ymm3/mem256 | C4 | RXB.02 | 1.src2.1.01 | 96 /r
VFMADDSUB213PD xmm1, xmm2, xmm3/mem128 | C4 | RXB.02 | 1.src2.0.01 | A6 /r
VFMADDSUB213PD ymm1, ymm2, ymm3/mem256 | C4 | RXB.02 | 1.src2.1.01 | A6 /r
VFMADDSUB231PD xmm1, xmm2, xmm3/mem128 | C4 | RXB.02 | 1.src2.0.01 | B6 /r
VFMADDSUB231PD ymm1, ymm2, ymm3/mem256 | C4 | RXB.02 | 1.src2.1.01 | B6 /r

Related Instructions
VFMSUBADDPD, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD, VFMADDSUBPS, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS, VFMSUBADDPS, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS

rFLAGS Affected
None

MXCSR Flags Affected

Flag | MM | FZ | RC | PM | UM | OM | ZM | DM | IM | DAZ | PE | UE | OE | ZE | DE | IE
Bit | 17 | 15 | 14:13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Affected | | | | | | | | | | | M | M | M | | M | M

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid opcode, #UD | | | F | Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD | F | F | | FMA instructions are only recognized in protected mode.
Invalid opcode, #UD | | | F | CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD | | | F | XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD | | | F | REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD | | | F | Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM | | | F | CR0.TS = 1.
Stack, #SS | | | F | Memory address exceeding stack segment limit or non-canonical.
General protection, #GP | | | F | Memory address exceeding data segment limit or non-canonical.
General protection, #GP | | | F | Null data segment used to reference memory.
Page fault, #PF | | | F | Instruction execution caused a page fault.
Alignment check, #AC | | | F | Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF | | | F | Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Exception | Real | Virt | Prot | Cause of Exception
Invalid operation, IE | | | F | A source operand was an SNaN value.
Invalid operation, IE | | | F | Undefined operation.
Denormalized operand, DE | | | F | A source operand was a denormal value.
Overflow, OE | | | F | Rounded result too large to fit into the format of the destination operand.
Underflow, UE | | | F | Rounded result too small to fit into the format of the destination operand.
Precision, PE | | | F | A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception.
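The alternating add/subtract pattern of VFMADDSUBPD can be sketched as a small Python model. The assumptions: elements are numbered from 0 at the least-significant position, so element 0 is even; vectors are plain lists of doubles; only round-to-nearest with finite, in-range values is modeled. The function names are hypothetical and the four-operand operand order (dest = src1 * src2 ± src3) is used.

```python
from fractions import Fraction


def fused(a: float, b: float, c: float) -> float:
    """a*b + c with an unrounded product and a single rounding to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def vfmaddsubpd(src1, src2, src3):
    """Odd elements: product + src3. Even elements: product - src3."""
    dest = []
    for i, (a, b, c) in enumerate(zip(src1, src2, src3)):
        if i % 2 == 1:
            dest.append(fused(a, b, c))     # odd element: add
        else:
            dest.append(fused(a, b, -c))    # even element: subtract
    return dest


# Four double-precision elements, as in the VEX.L = 1 case.
result = vfmaddsubpd([1.0, 2.0, 3.0, 4.0],
                     [10.0, 10.0, 10.0, 10.0],
                     [1.0, 1.0, 1.0, 1.0])
print(result)
```

With these inputs the products are 10, 20, 30 and 40, so the even positions come out one lower and the odd positions one higher.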
3.22—May 2018 VFMADDSUBPS VFMADDSUB132PS VFMADDSUB213PS VFMADDSUB231PS Multiply with Alternating Add/Subtract Packed Single-Precision Floating-Point Multiplies together two single-precision floating-point vectors, adds odd elements of the unrounded product to odd elements of a third single-precision floating-point vector, and subtracts even elements of the third floating point vector from even elements of unrounded product. The precise result of each addition or subtraction is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the corresponding element of the destination. The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right. There are two four-operand forms: VFMADDSUBPS dest, src1, src2/mem, src3 VFMADDSUBPS dest, src1, src2, src3/mem // destodd = (src1odd* src2odd/memodd) + src3odd // desteven = (src1even * src2even /memeven ) − src3even // destodd = (src1odd* src2odd) + src3odd/memodd // desteven = (src1even* src2even) − src3even/memeven and three three-operand forms: VFMADDSUB132PS scr1, src2, src3/mem VFMADDSUB213PS scr1, src2, src3/mem VFMADDSUB231PS scr1, src2, src3/mem // src1odd = (src1odd * src3odd /memodd ) + src2odd // src1even = (src1even* src3even/memeven) − src2even // src1odd = (src2odd * src1odd ) + src3odd /memodd // src1even = (src2even* src1even) − src3even/memeven // src1odd = (src2odd * src3odd /memodd ) + src1odd // src1even = (src2even* src3even/memeven) − src1even When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and registerbased source operands are held in XMM registers. When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and registerbased source operands are held in YMM registers. For the four-operand forms, VEX.W determines operand configuration. 
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form             Subset   Feature Flag
VFMADDSUBPS      FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMADDSUBnnnPS   FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                    Encoding
                                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VFMADDSUBPS xmm1, xmm2, xmm3/mem128, xmm4   C4    RXB.03           0.src1.0.01   5C /r /is4
VFMADDSUBPS ymm1, ymm2, ymm3/mem256, ymm4   C4    RXB.03           0.src1.1.01   5C /r /is4
VFMADDSUBPS xmm1, xmm2, xmm3, xmm4/mem128   C4    RXB.03           1.src1.0.01   5C /r /is4
VFMADDSUBPS ymm1, ymm2, ymm3, ymm4/mem256   C4    RXB.03           1.src1.1.01   5C /r /is4
VFMADDSUB132PS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src2.0.01   96 /r
VFMADDSUB132PS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src2.1.01   96 /r
VFMADDSUB213PS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src2.0.01   A6 /r
VFMADDSUB213PS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src2.1.01   A6 /r
VFMADDSUB231PS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src2.0.01   B6 /r
VFMADDSUB231PS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src2.1.01   B6 /r

Related Instructions

VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD, VFMSUBADDPD, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD, VFMSUBADDPS, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception
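The alternating add/subtract behavior of VFMADDSUBPS can be illustrated with a scalar reference model. The Rust sketch below is an assumption-laden illustration rather than a definition of the instruction: the function name `fmaddsub_ps` is hypothetical, element 0 is taken to be the least-significant element (and therefore even), and only the default round-to-nearest behavior of `f32::mul_add` is modeled. MXCSR rounding control, exception flags, and NaN handling are not modeled.

```rust
/// Scalar model of the alternating operation:
/// odd elements  -> (a * b) + c
/// even elements -> (a * b) - c
/// `mul_add` rounds once, after the addition, like the unrounded product
/// described for this instruction.
pub fn fmaddsub_ps(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    assert!(a.len() == b.len() && b.len() == c.len());
    (0..a.len())
        .map(|i| {
            // Subtraction is expressed as adding the negated third operand.
            let addend = if i % 2 == 1 { c[i] } else { -c[i] };
            a[i].mul_add(b[i], addend)
        })
        .collect()
}
```

For example, with a = (1, 2, 3, 4), b = (5, 6, 7, 8), and c = (1, 1, 1, 1), listed from element 0 upward, the model returns (4, 13, 20, 33): elements 0 and 2 are product minus c, elements 1 and 3 are product plus c.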
VFMSUBADDPD
VFMSUBADD132PD
VFMSUBADD213PD
VFMSUBADD231PD

Multiply with Alternating Subtract/Add Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors, adds even elements of the unrounded product to even elements of a third double-precision floating-point vector, and subtracts odd elements of the third floating-point vector from odd elements of the unrounded product. The precise result of each addition or subtraction is then rounded to double-precision based on the mode specified by the MXCSR[RC] field and written to the corresponding element of the destination.

The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right.

There are two four-operand forms:

VFMSUBADDPD dest, src1, src2/mem, src3    // dest_odd  = (src1_odd * src2_odd/mem_odd) − src3_odd
                                          // dest_even = (src1_even * src2_even/mem_even) + src3_even
VFMSUBADDPD dest, src1, src2, src3/mem    // dest_odd  = (src1_odd * src2_odd) − src3_odd/mem_odd
                                          // dest_even = (src1_even * src2_even) + src3_even/mem_even

and three three-operand forms:

VFMSUBADD132PD src1, src2, src3/mem       // src1_odd  = (src1_odd * src3_odd/mem_odd) − src2_odd
                                          // src1_even = (src1_even * src3_even/mem_even) + src2_even
VFMSUBADD213PD src1, src2, src3/mem       // src1_odd  = (src2_odd * src1_odd) − src3_odd/mem_odd
                                          // src1_even = (src2_even * src1_even) + src3_even/mem_even
VFMSUBADD231PD src1, src2, src3/mem       // src1_odd  = (src2_odd * src3_odd/mem_odd) − src1_odd
                                          // src1_even = (src2_even * src3_even/mem_even) + src1_even

For VEX.L = 0, vector size is 128 bits and register-based operands are held in XMM registers.

For VEX.L = 1, vector size is 256 bits and register-based operands are held in YMM registers.

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form             Subset   Feature Flag
VFMSUBADDPD      FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBADDnnnPD   FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                    Encoding
                                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VFMSUBADDPD xmm1, xmm2, xmm3/mem128, xmm4   C4    RXB.03           0.src1.0.01   5F /r /is4
VFMSUBADDPD ymm1, ymm2, ymm3/mem256, ymm4   C4    RXB.03           0.src1.1.01   5F /r /is4
VFMSUBADDPD xmm1, xmm2, xmm3, xmm4/mem128   C4    RXB.03           1.src1.0.01   5F /r /is4
VFMSUBADDPD ymm1, ymm2, ymm3, ymm4/mem256   C4    RXB.03           1.src1.1.01   5F /r /is4
VFMSUBADD132PD xmm1, xmm2, xmm3/mem128      C4    RXB.02           1.src2.0.01   97 /r
VFMSUBADD132PD ymm1, ymm2, ymm3/mem256      C4    RXB.02           1.src2.1.01   97 /r
VFMSUBADD213PD xmm1, xmm2, xmm3/mem128      C4    RXB.02           1.src2.0.01   A7 /r
VFMSUBADD213PD ymm1, ymm2, ymm3/mem256      C4    RXB.02           1.src2.1.01   A7 /r
VFMSUBADD231PD xmm1, xmm2, xmm3/mem128      C4    RXB.02           1.src2.0.01   B7 /r
VFMSUBADD231PD ymm1, ymm2, ymm3/mem256      C4    RXB.02           1.src2.1.01   B7 /r

Related Instructions

VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD, VFMADDSUBPS, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS, VFMSUBADDPS, VFMSUBADD132PS, VFMSUBADD213PS, VFMSUBADD231PS

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception
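The digits in the three-operand mnemonics (132, 213, 231) select which operands are multiplied and which is added or subtracted. The Rust sketch below restates the VFMSUBADDnnnPD equations as a scalar model so that the three orderings can be compared side by side. The names `Form` and `fmsubadd_pd` are hypothetical, element 0 is assumed to be the least-significant (even) element, and only round-to-nearest is modeled.

```rust
/// The three operand orderings of the three-operand forms.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Form {
    F132,
    F213,
    F231,
}

/// Returns the value written back to src1 (the destination).
pub fn fmsubadd_pd(form: Form, src1: &[f64], src2: &[f64], src3: &[f64]) -> Vec<f64> {
    assert!(src1.len() == src2.len() && src2.len() == src3.len());
    (0..src1.len())
        .map(|i| {
            // (multiplicand, multiplier, third operand), in mnemonic-digit order.
            let (x, y, z) = match form {
                Form::F132 => (src1[i], src3[i], src2[i]),
                Form::F213 => (src2[i], src1[i], src3[i]),
                Form::F231 => (src2[i], src3[i], src1[i]),
            };
            // Even elements add the third operand, odd elements subtract it.
            let addend = if i % 2 == 0 { z } else { -z };
            x.mul_add(y, addend)
        })
        .collect()
}
```

With src1 = (2, 2), src2 = (3, 3), and src3 = (10, 10), the 132 form yields (23, 17), the 213 form yields (16, −4), and the 231 form yields (32, 28).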
VFMSUBADDPS
VFMSUBADD132PS
VFMSUBADD213PS
VFMSUBADD231PS

Multiply with Alternating Subtract/Add Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors, adds even elements of the unrounded product to even elements of a third single-precision floating-point vector, and subtracts odd elements of the third floating-point vector from odd elements of the unrounded product. The precise result of each addition or subtraction is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the corresponding element of the destination.

The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right.

There are two four-operand forms:

VFMSUBADDPS dest, src1, src2/mem, src3    // dest_odd  = (src1_odd * src2_odd/mem_odd) − src3_odd
                                          // dest_even = (src1_even * src2_even/mem_even) + src3_even
VFMSUBADDPS dest, src1, src2, src3/mem    // dest_odd  = (src1_odd * src2_odd) − src3_odd/mem_odd
                                          // dest_even = (src1_even * src2_even) + src3_even/mem_even

and three three-operand forms:

VFMSUBADD132PS src1, src2, src3/mem       // src1_odd  = (src1_odd * src3_odd/mem_odd) − src2_odd
                                          // src1_even = (src1_even * src3_even/mem_even) + src2_even
VFMSUBADD213PS src1, src2, src3/mem       // src1_odd  = (src2_odd * src1_odd) − src3_odd/mem_odd
                                          // src1_even = (src2_even * src1_even) + src3_even/mem_even
VFMSUBADD231PS src1, src2, src3/mem       // src1_odd  = (src2_odd * src3_odd/mem_odd) − src1_odd
                                          // src1_even = (src2_even * src3_even/mem_even) + src1_even

When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.

When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form             Subset   Feature Flag
VFMSUBADDPS      FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBADDnnnPS   FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                    Encoding
                                            VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VFMSUBADDPS xmm1, xmm2, xmm3/mem128, xmm4   C4    RXB.03           0.src1.0.01   5E /r /is4
VFMSUBADDPS ymm1, ymm2, ymm3/mem256, ymm4   C4    RXB.03           0.src1.1.01   5E /r /is4
VFMSUBADDPS xmm1, xmm2, xmm3, xmm4/mem128   C4    RXB.03           1.src1.0.01   5E /r /is4
VFMSUBADDPS ymm1, ymm2, ymm3, ymm4/mem256   C4    RXB.03           1.src1.1.01   5E /r /is4
VFMSUBADD132PS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src2.0.01   97 /r
VFMSUBADD132PS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src2.1.01   97 /r
VFMSUBADD213PS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src2.0.01   A7 /r
VFMSUBADD213PS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src2.1.01   A7 /r
VFMSUBADD231PS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src2.0.01   B7 /r
VFMSUBADD231PS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src2.1.01   B7 /r

Related Instructions

VFMADDSUBPD, VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD, VFMADDSUBPS, VFMADDSUB132PS, VFMADDSUB213PS, VFMADDSUB231PS, VFMSUBADDPD, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception
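The encoding columns (VEX, RXB.map_select, W.vvvv.L.pp) describe the fields of a three-byte VEX prefix. The Rust sketch below shows how those fields could be packed into bytes, using VFMSUBADD132PS with XMM register operands as the worked case. It rests on stated assumptions: the helper names are hypothetical; the R, X, B, and vvvv fields are assumed to be stored in inverted (ones' complement) form; and ModRM.reg is assumed to hold the first operand and ModRM.r/m the third, following the usual meaning of /r. Only registers 0 through 7 are handled, so no extension bits are needed.

```rust
/// Packs a three-byte (C4h) VEX prefix.
/// `r`, `x`, `b` are logical extension bits and `vvvv_reg` is a register number;
/// all four are stored inverted in the prefix (assumption, see lead-in).
#[allow(clippy::too_many_arguments)]
pub fn vex3(r: bool, x: bool, b: bool, map_select: u8, w: bool, vvvv_reg: u8, l: bool, pp: u8) -> [u8; 3] {
    assert!(map_select < 32 && vvvv_reg < 16 && pp < 4);
    let byte1 = (((!r) as u8) << 7) | (((!x) as u8) << 6) | (((!b) as u8) << 5) | map_select;
    let byte2 = ((w as u8) << 7) | ((!vvvv_reg & 0x0F) << 3) | ((l as u8) << 2) | pp;
    [0xC4, byte1, byte2]
}

/// Encodes VFMSUBADD132PS xmm<dest>, xmm<src2>, xmm<src3> (register form only).
/// Fields: map_select = 02h, W = 0, vvvv = src2, L = 0, pp = 01b, opcode 97h.
pub fn encode_vfmsubadd132ps_xmm(dest: u8, src2: u8, src3: u8) -> [u8; 5] {
    assert!(dest < 8 && src2 < 8 && src3 < 8);
    let [v0, v1, v2] = vex3(false, false, false, 0x02, false, src2, false, 0b01);
    // ModRM: mod = 11b (register operand), reg = dest, r/m = src3.
    let modrm = 0xC0 | (dest << 3) | src3;
    [v0, v1, v2, 0x97, modrm]
}
```

Under these assumptions, VFMSUBADD132PS xmm0, xmm1, xmm2 packs to the bytes C4 E2 71 97 C2. A map_select value written as 02 and one written as 00010 denote the same five-bit field.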
VFMSUBPD
VFMSUB132PD
VFMSUB213PD
VFMSUB231PD

Multiply and Subtract Packed Double-Precision Floating-Point

Multiplies together two double-precision floating-point vectors and subtracts a third double-precision floating-point vector from the unrounded product to produce a precise intermediate result. The intermediate result is then rounded to double-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.

The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.

There are two four-operand forms:

VFMSUBPD dest, src1, src2/mem, src3    // dest = (src1 * src2/mem) − src3
VFMSUBPD dest, src1, src2, src3/mem    // dest = (src1 * src2) − src3/mem

and three three-operand forms:

VFMSUB132PD src1, src2, src3/mem       // src1 = (src1 * src3/mem) − src2
VFMSUB213PD src1, src2, src3/mem       // src1 = (src2 * src1) − src3/mem
VFMSUB231PD src1, src2, src3/mem       // src1 = (src2 * src3/mem) − src1

For VEX.L = 0, vector size is 128 bits and register-based operands are held in XMM registers.

For VEX.L = 1, vector size is 256 bits and register-based operands are held in YMM registers.

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.
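Because the product is not rounded before the subtraction, the result can differ from that of a separate multiply followed by a subtract. The Rust sketch below contrasts the two on a per-element basis. It is a scalar illustration under assumptions: the function names are hypothetical, `f64::mul_add` stands in for the single-rounding behavior, and only round-to-nearest is modeled.

```rust
/// Fused model: (a * b) - c with a single rounding at the end.
pub fn fmsub_pd(a: &[f64], b: &[f64], c: &[f64]) -> Vec<f64> {
    assert!(a.len() == b.len() && b.len() == c.len());
    (0..a.len()).map(|i| a[i].mul_add(b[i], -c[i])).collect()
}

/// Unfused comparison: the product is rounded to double-precision
/// before the subtraction, so low-order product bits can be lost.
pub fn mul_then_sub_pd(a: &[f64], b: &[f64], c: &[f64]) -> Vec<f64> {
    assert!(a.len() == b.len() && b.len() == c.len());
    (0..a.len()).map(|i| a[i] * b[i] - c[i]).collect()
}
```

As an example, let e = 2^-30, a = 1 + e, b = 1 − e, and c = 1. The exact product is 1 − 2^-60, which rounds to 1.0 in double-precision, so the unfused sequence returns 0. The fused model keeps the exact product through the subtraction and returns −2^-60.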
Instruction Support

Form          Subset   Feature Flag
VFMSUBPD      FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnPD   FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                 Encoding
                                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VFMSUBPD xmm1, xmm2, xmm3/mem128, xmm4   C4    RXB.03           0.src1.0.01   6D /r /is4
VFMSUBPD ymm1, ymm2, ymm3/mem256, ymm4   C4    RXB.03           0.src1.1.01   6D /r /is4
VFMSUBPD xmm1, xmm2, xmm3, xmm4/mem128   C4    RXB.03           1.src1.0.01   6D /r /is4
VFMSUBPD ymm1, ymm2, ymm3, ymm4/mem256   C4    RXB.03           1.src1.1.01   6D /r /is4
VFMSUB132PD xmm1, xmm2, xmm3/mem128      C4    RXB.02           1.src2.0.01   9A /r
VFMSUB132PD ymm1, ymm2, ymm3/mem256      C4    RXB.02           1.src2.1.01   9A /r
VFMSUB213PD xmm1, xmm2, xmm3/mem128      C4    RXB.02           1.src2.0.01   AA /r
VFMSUB213PD ymm1, ymm2, ymm3/mem256      C4    RXB.02           1.src2.1.01   AA /r
VFMSUB231PD xmm1, xmm2, xmm3/mem128      C4    RXB.02           1.src2.0.01   BA /r
VFMSUB231PD ymm1, ymm2, ymm3/mem256      C4    RXB.02           1.src2.1.01   BA /r

Related Instructions

VFMSUBPS, VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFMSUBSD, VFMSUB132SD, VFMSUB213SD, VFMSUB231SD, VFMSUBSS, VFMSUB132SS, VFMSUB213SS, VFMSUB231SS

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception

VFMSUBPS
VFMSUB132PS
VFMSUB213PS
VFMSUB231PS

Multiply and Subtract Packed Single-Precision Floating-Point

Multiplies together two single-precision floating-point vectors and subtracts a third single-precision floating-point vector from the unrounded product to produce a precise intermediate result. The intermediate result is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.

The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:

VFMSUBPS dest, src1, src2/mem, src3    // dest = (src1 * src2/mem) − src3
VFMSUBPS dest, src1, src2, src3/mem    // dest = (src1 * src2) − src3/mem

and three three-operand forms:

VFMSUB132PS src1, src2, src3/mem       // src1 = (src1 * src3/mem) − src2
VFMSUB213PS src1, src2, src3/mem       // src1 = (src2 * src1) − src3/mem
VFMSUB231PS src1, src2, src3/mem       // src1 = (src2 * src3/mem) − src1

When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.

When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form          Subset   Feature Flag
VFMSUBPS      FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnPS   FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
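The effect of VEX.L on the number of elements processed, and the clearing of bits [255:128] when the destination is an XMM register, can be pictured with a lane-based model. The Rust sketch below is an illustration under assumptions: the type `Ymm` and the function `vfmsubps_model` are hypothetical, a YMM register is represented as eight single-precision lanes with lane 0 at bits [31:0], and only round-to-nearest is modeled.

```rust
/// A 256-bit register viewed as eight single-precision lanes (lane 0 = bits [31:0]).
pub type Ymm = [f32; 8];

/// Models dest = (src1 * src2) - src3 for VEX.L = 0 (four lanes) or VEX.L = 1 (eight lanes).
pub fn vfmsubps_model(vex_l: bool, src1: &Ymm, src2: &Ymm, src3: &Ymm) -> Ymm {
    let active_lanes = if vex_l { 8 } else { 4 };
    // Lanes that are not written below remain zero, which models the
    // clearing of bits [255:128] when the destination is an XMM register.
    let mut dest: Ymm = [0.0; 8];
    for i in 0..active_lanes {
        dest[i] = src1[i].mul_add(src2[i], -src3[i]);
    }
    dest
}
```

With every lane of src1, src2, and src3 set to 2, 3, and 1 respectively, the model returns 5 in lanes 0 through 3 and 0 in lanes 4 through 7 when VEX.L = 0, and 5 in all eight lanes when VEX.L = 1.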
Instruction Encoding

Mnemonic                                 Encoding
                                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VFMSUBPS xmm1, xmm2, xmm3/mem128, xmm4   C4    RXB.03           0.src1.0.01   6C /r /is4
VFMSUBPS ymm1, ymm2, ymm3/mem256, ymm4   C4    RXB.03           0.src1.1.01   6C /r /is4
VFMSUBPS xmm1, xmm2, xmm3, xmm4/mem128   C4    RXB.03           1.src1.0.01   6C /r /is4
VFMSUBPS ymm1, ymm2, ymm3, ymm4/mem256   C4    RXB.03           1.src1.1.01   6C /r /is4
VFMSUB132PS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src2.0.01   9A /r
VFMSUB132PS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src2.1.01   9A /r
VFMSUB213PS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src2.0.01   AA /r
VFMSUB213PS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src2.1.01   AA /r
VFMSUB231PS xmm1, xmm2, xmm3/mem128      C4    RXB.02           0.src2.0.01   BA /r
VFMSUB231PS ymm1, ymm2, ymm3/mem256      C4    RXB.02           0.src2.1.01   BA /r

Related Instructions

VFMSUBPD, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUBSD, VFMSUB132SD, VFMSUB213SD, VFMSUB231SD, VFMSUBSS, VFMSUB132SS, VFMSUB213SS, VFMSUB231SS

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception

VFMSUBSD
VFMSUB132SD
VFMSUB213SD
VFMSUB231SD

Multiply and Subtract Scalar Double-Precision Floating-Point

Multiplies together two double-precision floating-point values and subtracts a third double-precision floating-point value from the unrounded product to produce a precise intermediate result. The intermediate result is then rounded to double-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.

The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.

There are two four-operand forms:

VFMSUBSD dest, src1, src2/mem, src3    // dest = (src1 * src2/mem) − src3
VFMSUBSD dest, src1, src2, src3/mem    // dest = (src1 * src2) − src3/mem

and three three-operand forms:

VFMSUB132SD src1, src2, src3/mem       // src1 = (src1 * src3/mem) − src2
VFMSUB213SD src1, src2, src3/mem       // src1 = (src2 * src1) − src3/mem
VFMSUB231SD src1, src2, src3/mem       // src1 = (src2 * src3/mem) − src1

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or 64-bit memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is a register or 64-bit memory location.

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is an XMM register. When the result is written to the destination XMM register, bits [127:64] of the destination and bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form          Subset   Feature Flag
VFMSUBSD      FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnSD   FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                Encoding
                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VFMSUBSD xmm1, xmm2, xmm3/mem64, xmm4   C4    RXB.03           0.src1.X.01   6F /r /is4
VFMSUBSD xmm1, xmm2, xmm3, xmm4/mem64   C4    RXB.03           1.src1.X.01   6F /r /is4
VFMSUB132SD xmm1, xmm2, xmm3/mem64      C4    RXB.02           1.src2.X.01   9B /r
VFMSUB213SD xmm1, xmm2, xmm3/mem64      C4    RXB.02           1.src2.X.01   AB /r
VFMSUB231SD xmm1, xmm2, xmm3/mem64      C4    RXB.02           1.src2.X.01   BB /r

Related Instructions

VFMSUBPD, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUBPS, VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFMSUBSS, VFMSUB132SS, VFMSUB213SS, VFMSUB231SS

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
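The MXCSR layout shown for these instructions (rounding control in bits 14:13, mask bits in 12:7, status flags in 5:0) can be decoded with a few shifts and masks. The Rust sketch below is illustrative only. The names are hypothetical, and the mapping of the two RC bits to rounding modes (00b nearest, 01b down, 10b up, 11b toward zero) is stated here as an assumption, since the layout itself gives only the field position. The sketch takes an MXCSR value as an integer and does not read the register.

```rust
/// Rounding modes selected by MXCSR[RC] (bits 14:13); encoding assumed as noted above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Rounding {
    Nearest,
    Down,
    Up,
    TowardZero,
}

pub fn rounding_control(mxcsr: u32) -> Rounding {
    match (mxcsr >> 13) & 0b11 {
        0 => Rounding::Nearest,
        1 => Rounding::Down,
        2 => Rounding::Up,
        _ => Rounding::TowardZero,
    }
}

/// Names of status flags (bits 5:0) that are set while the corresponding
/// mask bit (bits 12:7, in the same order) is clear.
pub fn unmasked_exceptions(mxcsr: u32) -> Vec<&'static str> {
    const NAMES: [&str; 6] = ["IE", "DE", "ZE", "OE", "UE", "PE"];
    (0u32..6)
        .filter(|&bit| (mxcsr & (1 << bit)) != 0 && (mxcsr & (1 << (bit + 7))) == 0)
        .map(|bit| NAMES[bit as usize])
        .collect()
}
```

For instance, the value 1F80h (all six mask bits set, RC = 00b) decodes as round-to-nearest with no unmasked exceptions, while 0FA0h (PM clear, PE set) reports PE as unmasked.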
Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     F     Instruction not supported, as indicated by CPUID feature identifier.
                            F     F           FMA instructions are only recognized in protected mode.
                                        F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        F     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        F     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        F     Lock prefix (F0h) preceding opcode.
                                        F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM               F     CR0.TS = 1.
Stack, #SS                              F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 F     Memory address exceeding data segment limit or non-canonical.
                                        F     Null data segment used to reference memory.
Page fault, #PF                         F     Instruction execution caused a page fault.
Alignment check, #AC                    F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF                F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions
Invalid operation, IE                   F     A source operand was an SNaN value.
                                        F     Undefined operation.
Denormalized operand, DE                F     A source operand was a denormal value.
Overflow, OE                            F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                           F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                           F     A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception

VFMSUBSS
VFMSUB132SS
VFMSUB213SS
VFMSUB231SS

Multiply and Subtract Scalar Single-Precision Floating-Point

Multiplies together two single-precision floating-point values and subtracts a third single-precision floating-point value from the unrounded product to produce a precise intermediate result.
The intermediate result is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.

The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.

There are two four-operand forms:

VFMSUBSS dest, src1, src2/mem, src3    // dest = (src1 * src2/mem) − src3
VFMSUBSS dest, src1, src2, src3/mem    // dest = (src1 * src2) − src3/mem

and three three-operand forms:

VFMSUB132SS src1, src2, src3/mem       // src1 = (src1 * src3/mem) − src2
VFMSUB213SS src1, src2, src3/mem       // src1 = (src2 * src1) − src3/mem
VFMSUB231SS src1, src2, src3/mem       // src1 = (src2 * src3/mem) − src1

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or 32-bit memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is a register or 32-bit memory location.

For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third operand is either a register or a memory location.

The destination is an XMM register. When the result is written to the destination XMM register, bits [127:32] of the XMM register and bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form          Subset   Feature Flag
VFMSUBSS      FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFMSUBnnnSS   FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFMSUBSS xmm1, xmm2, xmm3/mem32, xmm4     C4   RXB.03          0.src1.X.01  6E /r /is4
VFMSUBSS xmm1, xmm2, xmm3, xmm4/mem32     C4   RXB.03          1.src1.X.01  6E /r /is4
VFMSUB132SS xmm1, xmm2, xmm3/mem32        C4   RXB.02          0.src2.X.01  9B /r
VFMSUB213SS xmm1, xmm2, xmm3/mem32        C4   RXB.02          0.src2.X.01  AB /r
VFMSUB231SS xmm1, xmm2, xmm3/mem32        C4   RXB.02          0.src2.X.01  BB /r

Related Instructions
VFMSUBPD, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUBPS, VFMSUB132PS, VFMSUB213PS, VFMSUB231PS, VFMSUBSD, VFMSUB132SD, VFMSUB213SD, VFMSUB231SD

rFLAGS Affected
None

MXCSR Flags Affected
Flag  Bit    Affected
MM    17
FZ    15
RC    14:13
PM    12
UM    11
OM    10
ZM    9
DM    8
IM    7
DAZ   6
PE    5      M
UE    4      M
OE    3      M
ZE    2
DE    1      M
IE    0      M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                   F     Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD       F     F           FMA instructions are only recognized in protected mode.
Invalid opcode, #UD                   F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD                   F     XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD                   F     REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD                   F     Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM             F     CR0.TS = 1.
Stack, #SS                            F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP               F     Memory address exceeding data segment limit or non-canonical.
General protection, #GP               F     Null data segment used to reference memory.
Page fault, #PF                       F     Instruction execution caused a page fault.
Alignment check, #AC                  F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF              F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE                 F     A source operand was an SNaN value.
Invalid operation, IE                 F     Undefined operation.
Denormalized operand, DE              F     A source operand was a denormal value.
Overflow, OE                          F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                         F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                         F     A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception

VFNMADDPD
VFNMADD132PD
VFNMADD213PD
VFNMADD231PD
Negative Multiply and Add Packed Double-Precision Floating-Point
Multiplies together two double-precision floating-point vectors, negates the unrounded product, and adds it to a third double-precision floating-point vector. The precise result is then rounded to double-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.
The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:
VFNMADDPD dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) + src3
VFNMADDPD dest, src1, src2, src3/mem    // dest = −(src1 * src2) + src3/mem
and three three-operand forms:
VFNMADD132PD src1, src2, src3/mem       // src1 = −(src1 * src3/mem) + src2
VFNMADD213PD src1, src2, src3/mem       // src1 = −(src2 * src1) + src3/mem
VFNMADD231PD src1, src2, src3/mem       // src1 = −(src2 * src3/mem) + src1
When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support
Form           Subset  Feature Flag
VFNMADDPD      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnPD   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDPD xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  79 /r /is4
VFNMADDPD ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  79 /r /is4
VFNMADDPD xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  79 /r /is4
VFNMADDPD ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  79 /r /is4
VFNMADD132PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  9C /r
VFNMADD132PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  9C /r
VFNMADD213PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  AC /r
VFNMADD213PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  AC /r
VFNMADD231PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  BC /r
VFNMADD231PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  BC /r

Related Instructions
VFNMADDPS, VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMADDSD, VFNMADD132SD, VFNMADD213SD, VFNMADD231SD, VFNMADDSS, VFNMADD132SS, VFNMADD213SS, VFNMADD231SS

rFLAGS Affected
None

MXCSR Flags Affected
Flag  Bit    Affected
MM    17
FZ    15
RC    14:13
PM    12
UM    11
OM    10
ZM    9
DM    8
IM    7
DAZ   6
PE    5      M
UE    4      M
OE    3      M
ZE    2
DE    1      M
IE    0      M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                   F     Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD       F     F           FMA instructions are only recognized in protected mode.
Invalid opcode, #UD                   F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD                   F     XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD                   F     REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD                   F     Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM             F     CR0.TS = 1.
Stack, #SS                            F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP               F     Memory address exceeding data segment limit or non-canonical.
General protection, #GP               F     Null data segment used to reference memory.
Page fault, #PF                       F     Instruction execution caused a page fault.
Alignment check, #AC                  F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF              F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE                 F     A source operand was an SNaN value.
Invalid operation, IE                 F     Undefined operation.
Denormalized operand, DE              F     A source operand was a denormal value.
Overflow, OE                          F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                         F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                         F     A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception
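
The encoding table for VFNMADDPD lists the VEX fields symbolically (RXB.map_select, W.vvvv.L.pp). The following C sketch shows one way those fields could be packed into bytes for a register-only VFNMADD231PD. It assumes the general three-byte VEX layout, in which R, X, B, and vvvv are stored in inverted form; that layout is defined elsewhere in the manual and is not restated in this section. The helper names vex3, modrm_rr, and encode_vfnmadd231pd_xmm are hypothetical, and only registers 0 through 7 are handled.

```c
#include <stdint.h>

/* Build a three-byte VEX prefix (C4h form).
 * r, x, b : logical extension bits (0 or 1); stored inverted (assumption
 *           taken from the general VEX definition, not from this section).
 * map     : map_select, e.g. 02h for the FMA forms, 03h for the FMA4 forms.
 * w       : VEX.W.
 * vvvv    : logical register number 0-15; stored in ones'-complement form.
 * l       : VEX.L (0 = 128-bit, 1 = 256-bit).
 * pp      : implied legacy prefix field (01b in the tables above). */
static void vex3(uint8_t out[3], unsigned r, unsigned x, unsigned b,
                 unsigned map, unsigned w, unsigned vvvv,
                 unsigned l, unsigned pp)
{
    out[0] = 0xC4;
    out[1] = (uint8_t)(((~r & 1u) << 7) | ((~x & 1u) << 6) |
                       ((~b & 1u) << 5) | (map & 0x1Fu));
    out[2] = (uint8_t)(((w & 1u) << 7) | ((~vvvv & 0xFu) << 3) |
                       ((l & 1u) << 2) | (pp & 3u));
}

/* Register-to-register ModRM byte (mod = 11b). */
static uint8_t modrm_rr(unsigned reg, unsigned rm)
{
    return (uint8_t)(0xC0u | ((reg & 7u) << 3) | (rm & 7u));
}

/* Hypothetical helper: VFNMADD231PD xmm<dst>, xmm<src2>, xmm<src3>,
 * following the row "C4 RXB.02 1.src2.0.01 BC /r". */
static void encode_vfnmadd231pd_xmm(uint8_t out[5], unsigned dst,
                                    unsigned src2, unsigned src3)
{
    vex3(out, 0, 0, 0, 0x02, 1, src2, 0, 1);
    out[3] = 0xBC;                    /* opcode byte */
    out[4] = modrm_rr(dst, src3);     /* /r: reg = dst, rm = src3 */
}
```

Under these assumptions, calling encode_vfnmadd231pd_xmm(buf, 1, 2, 3) fills buf with C4 E2 E9 BC CB.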
VFNMADDPS
VFNMADD132PS
VFNMADD213PS
VFNMADD231PS
Negative Multiply and Add Packed Single-Precision Floating-Point
Multiplies together two single-precision floating-point vectors, negates the unrounded product, and adds it to a third single-precision floating-point vector. The precise result is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.
The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:
VFNMADDPS dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) + src3
VFNMADDPS dest, src1, src2, src3/mem    // dest = −(src1 * src2) + src3/mem
and three three-operand forms:
VFNMADD132PS src1, src2, src3/mem       // src1 = −(src1 * src3/mem) + src2
VFNMADD213PS src1, src2, src3/mem       // src1 = −(src2 * src1) + src3/mem
VFNMADD231PS src1, src2, src3/mem       // src1 = −(src2 * src3/mem) + src1
When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support
Form           Subset  Feature Flag
VFNMADDPS      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnPS   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDPS xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  78 /r /is4
VFNMADDPS ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  78 /r /is4
VFNMADDPS xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  78 /r /is4
VFNMADDPS ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  78 /r /is4
VFNMADD132PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  9C /r
VFNMADD132PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  9C /r
VFNMADD213PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  AC /r
VFNMADD213PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  AC /r
VFNMADD231PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  BC /r
VFNMADD231PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  BC /r

Related Instructions
VFNMADDPD, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADDSD, VFNMADD132SD, VFNMADD213SD, VFNMADD231SD, VFNMADDSS, VFNMADD132SS, VFNMADD213SS, VFNMADD231SS

rFLAGS Affected
None

MXCSR Flags Affected
Flag  Bit    Affected
MM    17
FZ    15
RC    14:13
PM    12
UM    11
OM    10
ZM    9
DM    8
IM    7
DAZ   6
PE    5      M
UE    4      M
OE    3      M
ZE    2
DE    1      M
IE    0      M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                   F     Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD       F     F           FMA instructions are only recognized in protected mode.
Invalid opcode, #UD                   F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD                   F     XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD                   F     REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD                   F     Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM             F     CR0.TS = 1.
Stack, #SS                            F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP               F     Memory address exceeding data segment limit or non-canonical.
General protection, #GP               F     Null data segment used to reference memory.
Page fault, #PF                       F     Instruction execution caused a page fault.
Alignment check, #AC                  F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF              F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE                 F     A source operand was an SNaN value.
Invalid operation, IE                 F     Undefined operation.
Denormalized operand, DE              F     A source operand was a denormal value.
Overflow, OE                          F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                         F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                         F     A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception

VFNMADDSD
VFNMADD132SD
VFNMADD213SD
VFNMADD231SD
Negative Multiply and Add Scalar Double-Precision Floating-Point
Multiplies together two double-precision floating-point values, negates the unrounded product, and adds it to a third double-precision floating-point value. The precise result is then rounded to double-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.
The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right.
There are two four-operand forms:
VFNMADDSD dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) + src3
VFNMADDSD dest, src1, src2, src3/mem    // dest = −(src1 * src2) + src3/mem
and three three-operand forms:
VFNMADD132SD src1, src2, src3/mem       // src1 = −(src1 * src3/mem) + src2
VFNMADD213SD src1, src2, src3/mem       // src1 = −(src2 * src1) + src3/mem
VFNMADD231SD src1, src2, src3/mem       // src1 = −(src2 * src3/mem) + src1
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or 64-bit memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is a register or 64-bit memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a 64-bit memory location.
The destination is an XMM register. When the result is written to the destination, bits [127:64] of the XMM register and bits [255:128] of the corresponding YMM register are cleared.

Instruction Support
Form           Subset  Feature Flag
VFNMADDSD      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnSD   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDSD xmm1, xmm2, xmm3/mem64, xmm4    C4   RXB.03          0.src1.X.01  7B /r /is4
VFNMADDSD xmm1, xmm2, xmm3, xmm4/mem64    C4   RXB.03          1.src1.X.01  7B /r /is4
VFNMADD132SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  9D /r
VFNMADD213SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  AD /r
VFNMADD231SD xmm1, xmm2, xmm3/mem64       C4   RXB.02          1.src2.X.01  BD /r

Related Instructions
VFNMADDPD, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADDPS, VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMADDSS, VFNMADD132SS, VFNMADD213SS, VFNMADD231SS

rFLAGS Affected
None

MXCSR Flags Affected
Flag  Bit    Affected
MM    17
FZ    15
RC    14:13
PM    12
UM    11
OM    10
ZM    9
DM    8
IM    7
DAZ   6
PE    5      M
UE    4      M
OE    3      M
ZE    2
DE    1      M
IE    0      M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                   F     Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD       F     F           FMA instructions are only recognized in protected mode.
Invalid opcode, #UD                   F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD                   F     XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD                   F     REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD                   F     Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM             F     CR0.TS = 1.
Stack, #SS                            F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP               F     Memory address exceeding data segment limit or non-canonical.
General protection, #GP               F     Null data segment used to reference memory.
Page fault, #PF                       F     Instruction execution caused a page fault.
Alignment check, #AC                  F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF              F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE                 F     A source operand was an SNaN value.
Invalid operation, IE                 F     Undefined operation.
Denormalized operand, DE              F     A source operand was a denormal value.
Overflow, OE                          F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                         F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                         F     A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception

VFNMADDSS
VFNMADD132SS
VFNMADD213SS
VFNMADD231SS
Negative Multiply and Add Scalar Single-Precision Floating-Point
Multiplies together two single-precision floating-point values, negates the unrounded product, and adds it to a third single-precision floating-point value. The precise result is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.
The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right.
There are two four-operand forms:
VFNMADDSS dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) + src3
VFNMADDSS dest, src1, src2, src3/mem    // dest = −(src1 * src2) + src3/mem
and three three-operand forms:
VFNMADD132SS src1, src2, src3/mem       // src1 = −(src1 * src3/mem) + src2
VFNMADD213SS src1, src2, src3/mem       // src1 = −(src2 * src1) + src3/mem
VFNMADD231SS src1, src2, src3/mem       // src1 = −(src2 * src3/mem) + src1
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or 32-bit memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is a register or 32-bit memory location.
For the three-operand forms, VEX.W is 0.
The first and second operands are registers and the third operand is either a register or a 32-bit memory location.
The destination is an XMM register. When the result is written to the destination, bits [127:32] of the XMM register and bits [255:128] of the corresponding YMM register are cleared.

Instruction Support
Form           Subset  Feature Flag
VFNMADDSS      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMADDnnnSS   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMADDSS xmm1, xmm2, xmm3/mem32, xmm4    C4   RXB.03          0.src1.X.01  7A /r /is4
VFNMADDSS xmm1, xmm2, xmm3, xmm4/mem32    C4   RXB.03          1.src1.X.01  7A /r /is4
VFNMADD132SS xmm1, xmm2, xmm3/mem32       C4   RXB.02          0.src2.X.01  9D /r
VFNMADD213SS xmm1, xmm2, xmm3/mem32       C4   RXB.02          0.src2.X.01  AD /r
VFNMADD231SS xmm1, xmm2, xmm3/mem32       C4   RXB.02          0.src2.X.01  BD /r

Related Instructions
VFNMADDPD, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADDPS, VFNMADD132PS, VFNMADD213PS, VFNMADD231PS, VFNMADDSD, VFNMADD132SD, VFNMADD213SD, VFNMADD231SD

rFLAGS Affected
None

MXCSR Flags Affected
Flag  Bit    Affected
MM    17
FZ    15
RC    14:13
PM    12
UM    11
OM    10
ZM    9
DM    8
IM    7
DAZ   6
PE    5      M
UE    4      M
OE    3      M
ZE    2
DE    1      M
IE    0      M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                   F     Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD       F     F           FMA instructions are only recognized in protected mode.
Invalid opcode, #UD                   F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD                   F     XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD                   F     REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD                   F     Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM             F     CR0.TS = 1.
Stack, #SS                            F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP               F     Memory address exceeding data segment limit or non-canonical.
General protection, #GP               F     Null data segment used to reference memory.
Page fault, #PF                       F     Instruction execution caused a page fault.
Alignment check, #AC                  F     Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF              F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE                 F     A source operand was an SNaN value.
Invalid operation, IE                 F     Undefined operation.
Denormalized operand, DE              F     A source operand was a denormal value.
Overflow, OE                          F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                         F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                         F     A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception

VFNMSUBPD
VFNMSUB132PD
VFNMSUB213PD
VFNMSUB231PD
Negative Multiply and Subtract Packed Double-Precision Floating-Point
Multiplies together two double-precision floating-point vectors, negates the unrounded product, and subtracts a third double-precision floating-point vector from it. The precise result is then rounded to double-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.
The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:
VFNMSUBPD dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) − src3
VFNMSUBPD dest, src1, src2, src3/mem    // dest = −(src1 * src2) − src3/mem
and three three-operand forms:
VFNMSUB132PD src1, src2, src3/mem       // src1 = −(src1 * src3/mem) − src2
VFNMSUB213PD src1, src2, src3/mem       // src1 = −(src2 * src1) − src3/mem
VFNMSUB231PD src1, src2, src3/mem       // src1 = −(src2 * src3/mem) − src1
When VEX.L = 0, the vector size is 128 bits (two double-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (four double-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.
For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support
Form           Subset  Feature Flag
VFNMSUBPD      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnPD   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
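
The feature flags above, together with the operating-system conditions listed among the #UD causes (CR4.OSXSAVE and XFEATURE_ENABLED_MASK[2:1]), suggest a run-time check along the following lines. This is a sketch under stated assumptions: the raw register values are passed in by the caller (obtaining them with CPUID and XGETBV is not shown), the function names are hypothetical, and the position of OSXSAVE at bit 27 of CPUID Fn0000_0001_ECX comes from the general CPUID definition rather than from this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_FMA      (1u << 12)  /* Fn0000_0001_ECX[FMA]            */
#define CPUID_1_ECX_OSXSAVE  (1u << 27)  /* Fn0000_0001_ECX[OSXSAVE], assumed bit position */
#define CPUID_8_1_ECX_FMA4   (1u << 16)  /* Fn8000_0001_ECX[FMA4]           */

/* True when XFEATURE_ENABLED_MASK[2:1] = 11b (SSE and YMM state enabled). */
static bool ymm_state_enabled(uint64_t xfeature_enabled_mask)
{
    return ((xfeature_enabled_mask >> 1) & 0x3u) == 0x3u;
}

/* Three-operand FMA forms (VFNMSUBnnnPD and related). */
static bool fma_usable(uint32_t cpuid_1_ecx, uint64_t xfeature_enabled_mask)
{
    return (cpuid_1_ecx & CPUID_1_ECX_FMA) &&
           (cpuid_1_ecx & CPUID_1_ECX_OSXSAVE) &&
           ymm_state_enabled(xfeature_enabled_mask);
}

/* Four-operand FMA4 forms (VFNMSUBPD and related). */
static bool fma4_usable(uint32_t cpuid_1_ecx, uint32_t cpuid_80000001_ecx,
                        uint64_t xfeature_enabled_mask)
{
    return (cpuid_80000001_ecx & CPUID_8_1_ECX_FMA4) &&
           (cpuid_1_ecx & CPUID_1_ECX_OSXSAVE) &&
           ymm_state_enabled(xfeature_enabled_mask);
}
```

Keeping the inputs as plain integers makes the decision logic testable without executing CPUID; a caller would read XFEATURE_ENABLED_MASK only after confirming OSXSAVE is set.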

Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMSUBPD xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  7D /r /is4
VFNMSUBPD ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  7D /r /is4
VFNMSUBPD xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  7D /r /is4
VFNMSUBPD ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  7D /r /is4
VFNMSUB132PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  9E /r
VFNMSUB132PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  9E /r
VFNMSUB213PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  AE /r
VFNMSUB213PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  AE /r
VFNMSUB231PD xmm1, xmm2, xmm3/mem128      C4   RXB.02          1.src2.0.01  BE /r
VFNMSUB231PD ymm1, ymm2, ymm3/mem256      C4   RXB.02          1.src2.1.01  BE /r

Related Instructions
VFNMSUBPS, VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, VFNMSUBSD, VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD, VFNMSUBSS, VFNMSUB132SS, VFNMSUB213SS, VFNMSUB231SS

rFLAGS Affected
None

MXCSR Flags Affected
Flag  Bit    Affected
MM    17
FZ    15
RC    14:13
PM    12
UM    11
OM    10
ZM    9
DM    8
IM    7
DAZ   6
PE    5      M
UE    4      M
OE    3      M
ZE    2
DE    1      M
IE    0      M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                   F     Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD       F     F           FMA instructions are only recognized in protected mode.
Invalid opcode, #UD                   F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD                   F     XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD                   F     REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD                   F     Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM             F     CR0.TS = 1.
Stack, #SS                            F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP               F     Memory address exceeding data segment limit or non-canonical.
General protection, #GP               F     Null data segment used to reference memory.
Page fault, #PF                       F     Instruction execution caused a page fault.
Alignment check, #AC                  F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF              F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE                 F     A source operand was an SNaN value.
Invalid operation, IE                 F     Undefined operation.
Denormalized operand, DE              F     A source operand was a denormal value.
Overflow, OE                          F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                         F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                         F     A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception

VFNMSUBPS
VFNMSUB132PS
VFNMSUB213PS
VFNMSUB231PS
Negative Multiply and Subtract Packed Single-Precision Floating-Point
Multiplies together two single-precision floating-point vectors, negates the unrounded product, and subtracts a third single-precision floating-point vector from it. The precise result is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.
The role of each of the source operands specified by the assembly language prototypes given below is reflected in the vector equation in the comment on the right.
There are two four-operand forms:
VFNMSUBPS dest, src1, src2/mem, src3    // dest = −(src1 * src2/mem) − src3
VFNMSUBPS dest, src1, src2, src3/mem    // dest = −(src1 * src2) − src3/mem
and three three-operand forms:
VFNMSUB132PS src1, src2, src3/mem       // src1 = −(src1 * src3/mem) − src2
VFNMSUB213PS src1, src2, src3/mem       // src1 = −(src2 * src1) − src3/mem
VFNMSUB231PS src1, src2, src3/mem       // src1 = −(src2 * src3/mem) − src1
When VEX.L = 0, the vector size is 128 bits (four single-precision elements per vector) and register-based source operands are held in XMM registers.
When VEX.L = 1, the vector size is 256 bits (eight single-precision elements per vector) and register-based source operands are held in YMM registers.
For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a memory location.
For the three-operand forms, VEX.W is 0. The first and second operands are registers and the third operand is either a register or a memory location.
The destination is either an XMM register or a YMM register, as determined by VEX.L. When the destination is an XMM register (L = 0), bits [255:128] of the corresponding YMM register are cleared.

Instruction Support
Form           Subset  Feature Flag
VFNMSUBPS      FMA4    CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnPS   FMA     CPUID Fn0000_0001_ECX[FMA] (bit 12)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
                                          Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMSUBPS xmm1, xmm2, xmm3/mem128, xmm4   C4   RXB.03          0.src1.0.01  7C /r /is4
VFNMSUBPS ymm1, ymm2, ymm3/mem256, ymm4   C4   RXB.03          0.src1.1.01  7C /r /is4
VFNMSUBPS xmm1, xmm2, xmm3, xmm4/mem128   C4   RXB.03          1.src1.0.01  7C /r /is4
VFNMSUBPS ymm1, ymm2, ymm3, ymm4/mem256   C4   RXB.03          1.src1.1.01  7C /r /is4
VFNMSUB132PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  9E /r
VFNMSUB132PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  9E /r
VFNMSUB213PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  AE /r
VFNMSUB213PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  AE /r
VFNMSUB231PS xmm1, xmm2, xmm3/mem128      C4   RXB.02          0.src2.0.01  BE /r
VFNMSUB231PS ymm1, ymm2, ymm3/mem256      C4   RXB.02          0.src2.1.01  BE /r

Related Instructions
VFNMSUBPD, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUBSD, VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD, VFNMSUBSS, VFNMSUB132SS, VFNMSUB213SS, VFNMSUB231SS

rFLAGS Affected
None

MXCSR Flags Affected
Flag  Bit    Affected
MM    17
FZ    15
RC    14:13
PM    12
UM    11
OM    10
ZM    9
DM    8
IM    7
DAZ   6
PE    5      M
UE    4      M
OE    3      M
ZE    2
DE    1      M
IE    0      M
Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions
Exception                 Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                   F     Instruction not supported, as indicated by CPUID feature identifier.
Invalid opcode, #UD       F     F           FMA instructions are only recognized in protected mode.
Invalid opcode, #UD                   F     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
Invalid opcode, #UD                   F     XFEATURE_ENABLED_MASK[2:1] != 11b.
Invalid opcode, #UD                   F     REX, F2, F3, or 66 prefix preceding VEX prefix.
Invalid opcode, #UD                   F     Lock prefix (F0h) preceding opcode.
Invalid opcode, #UD                   F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM             F     CR0.TS = 1.
Stack, #SS                            F     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP               F     Memory address exceeding data segment limit or non-canonical.
General protection, #GP               F     Null data segment used to reference memory.
Page fault, #PF                       F     Instruction execution caused a page fault.
Alignment check, #AC                  F     Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF              F     Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE                 F     A source operand was an SNaN value.
Invalid operation, IE                 F     Undefined operation.
Denormalized operand, DE              F     A source operand was a denormal value.
Overflow, OE                          F     Rounded result too large to fit into the format of the destination operand.
Underflow, UE                         F     Rounded result too small to fit into the format of the destination operand.
Precision, PE                         F     A result could not be represented exactly in the destination format.
F — FMA, FMA4 exception

VFNMSUBSD
VFNMSUB132SD
VFNMSUB213SD
VFNMSUB231SD
Negative Multiply and Subtract Scalar Double-Precision Floating-Point
Multiplies together two double-precision floating-point values, negates the unrounded product, and subtracts a third double-precision floating-point value from it. The precise result is then rounded to double-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.
The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right.
There are two four-operand forms:

VFNMSUBSD dest, src1, src2/mem, src3     // dest = −(src1 * src2/mem) − src3
VFNMSUBSD dest, src1, src2, src3/mem     // dest = −(src1 * src2) − src3/mem

and three three-operand forms:

VFNMSUB132SD src1, src2, src3/mem        // src1 = −(src1 * src3/mem) − src2
VFNMSUB213SD src1, src2, src3/mem        // src1 = −(src2 * src1) − src3/mem
VFNMSUB231SD src1, src2, src3/mem        // src1 = −(src2 * src3/mem) − src1

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 64-bit memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 64-bit memory location.

For the three-operand forms, VEX.W is 1. The first and second operands are registers and the third operand is either a register or a 64-bit memory location.

The destination is an XMM register. Bits [127:64] of the destination XMM register and bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form            Subset   Feature Flag
VFNMSUBSD       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnSD    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                   Encoding
                                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMSUBSD xmm1, xmm2, xmm3/mem64, xmm4     C4   RXB.03          0.src1.X.01  7F /r /is4
VFNMSUBSD xmm1, xmm2, xmm3, xmm4/mem64     C4   RXB.03          1.src1.X.01  7F /r /is4
VFNMSUB132SD xmm1, xmm2, xmm3/mem64        C4   RXB.02          1.src2.X.01  9F /r
VFNMSUB213SD xmm1, xmm2, xmm3/mem64        C4   RXB.02          1.src2.X.01  AF /r
VFNMSUB231SD xmm1, xmm2, xmm3/mem64        C4   RXB.02          1.src2.X.01  BF /r

Related Instructions

VFNMSUBPD, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUBPS, VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, VFNMSUBSS, VFNMSUB132SS, VFNMSUB213SS, VFNMSUB231SS

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            FMA instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions

Invalid operation, IE       A source operand was an SNaN value.
                            Undefined operation.
Denormalized operand, DE    A source operand was a denormal value.
Overflow, OE                Rounded result too large to fit into the format of the destination operand.
Underflow, UE               Rounded result too small to fit into the format of the destination operand.
Precision, PE               A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception

VFNMSUBSS VFNMSUB132SS VFNMSUB213SS VFNMSUB231SS

Negative Multiply and Subtract Scalar Single-Precision Floating-Point

Multiplies together two single-precision floating-point values, negates the unrounded product, and subtracts a third single-precision floating-point value from it. The precise result is then rounded to single-precision based on the mode specified by the MXCSR[RC] field and written to the destination register.

The role of each of the source operands specified by the assembly language prototypes given below is reflected in the equation in the comment on the right.

There are two four-operand forms:

VFNMSUBSS dest, src1, src2/mem, src3     // dest = −(src1 * src2/mem) − src3
VFNMSUBSS dest, src1, src2, src3/mem     // dest = −(src1 * src2) − src3/mem

and three three-operand forms:

VFNMSUB132SS src1, src2, src3/mem        // src1 = −(src1 * src3/mem) − src2
VFNMSUB213SS src1, src2, src3/mem        // src1 = −(src2 * src1) − src3/mem
VFNMSUB231SS src1, src2, src3/mem        // src1 = −(src2 * src3/mem) − src1

For the four-operand forms, VEX.W determines operand configuration.
• When VEX.W = 0, the second source is either a register or a 32-bit memory location and the third source is a register.
• When VEX.W = 1, the second source is a register and the third source is either a register or a 32-bit memory location.

For the three-operand forms, VEX.W is 0.
The first and second operands are registers and the third operand is either a register or a 32-bit memory location.

The destination is an XMM register. Bits [127:32] of the destination XMM register and bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form            Subset   Feature Flag
VFNMSUBSS       FMA4     CPUID Fn8000_0001_ECX[FMA4] (bit 16)
VFNMSUBnnnSS    FMA      CPUID Fn0000_0001_ECX[FMA] (bit 12)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                   Encoding
                                           VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VFNMSUBSS xmm1, xmm2, xmm3/mem32, xmm4     C4   RXB.03          0.src1.X.01  7E /r /is4
VFNMSUBSS xmm1, xmm2, xmm3, xmm4/mem32     C4   RXB.03          1.src1.X.01  7E /r /is4
VFNMSUB132SS xmm1, xmm2, xmm3/mem32        C4   RXB.02          0.src2.X.01  9F /r
VFNMSUB213SS xmm1, xmm2, xmm3/mem32        C4   RXB.02          0.src2.X.01  AF /r
VFNMSUB231SS xmm1, xmm2, xmm3/mem32        C4   RXB.02          0.src2.X.01  BF /r

Related Instructions

VFNMSUBPD, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUBPS, VFNMSUB132PS, VFNMSUB213PS, VFNMSUB231PS, VFNMSUBSD, VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M    M         M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            FMA instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Non-aligned memory reference when alignment checking enabled.
SIMD floating-point, #XF    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE       A source operand was an SNaN value.
                            Undefined operation.
Denormalized operand, DE    A source operand was a denormal value.
Overflow, OE                Rounded result too large to fit into the format of the destination operand.
Underflow, UE               Rounded result too small to fit into the format of the destination operand.
Precision, PE               A result could not be represented exactly in the destination format.

F: FMA, FMA4 exception

VFRCZPD

Extract Fraction Packed Double-Precision Floating-Point

Extracts the fractional portion of each double-precision floating-point value of either a source register or a memory location and writes the resulting values to the corresponding elements of the destination. The fractional results are precise.
• When XOP.L = 0, the source is either an XMM register or a 128-bit memory location.
• When XOP.L = 1, the source is a YMM register or 256-bit memory location.

When the destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.

Exception conditions are the same as for other arithmetic instructions, except with respect to the sign of a zero result. A zero is returned in the following cases:
• When the operand is a zero.
• When the operand is a normal integer.
• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ.
• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ.

In the first three cases, when MXCSR.RC = 01b (round toward −∞) the sign of the zero result is negative, and is otherwise positive. In the fourth case, the operand is its own fractional part, which results in underflow, and the result is forced to zero by MXCSR.FZ; the result has the same sign as the operand.

Instruction Support

Form       Subset   Feature Flag
VFRCZPD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Encoding
                                XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZPD xmm1, xmm2/mem128       8F   RXB.09          0.1111.0.00  81 /r
VFRCZPD ymm1, ymm2/mem256       8F   RXB.09          0.1111.1.00  81 /r

Related Instructions

(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPS, VFRCZSS, VFRCZSD

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M              M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            XOP instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            XOP.W = 1.
                            XOP.vvvv != 1111b.
                            REX, F2, F3, or 66 prefix preceding XOP prefix.
                            Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions below for details.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE       A source operand was an SNaN value.
                            Undefined operation.
Denormalized operand, DE    A source operand was a denormal value.
Underflow, UE               Rounded result too small to fit into the format of the destination operand.
Precision, PE               A result could not be represented exactly in the destination format.

X: XOP exception

VFRCZPS

Extract Fraction Packed Single-Precision Floating-Point

Extracts the fractional portion of each single-precision floating-point value of either a source register or a memory location and writes the resulting values to the corresponding elements of the destination. The fractional results are exact.
• When XOP.L = 0, the source is either an XMM register or a 128-bit memory location.
• When XOP.L = 1, the source is a YMM register or 256-bit memory location.

When the destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.

Exception conditions are the same as for other arithmetic instructions, except with respect to the sign of a zero result. A zero is returned in the following cases:
• When the operand is a zero.
• When the operand is a normal integer.
• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ.
• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ.

In the first three cases, when MXCSR.RC = 01b (round toward −∞) the sign of the zero result is negative, and is otherwise positive.
In the fourth case, the operand is its own fractional part, which results in underflow, and the result is forced to zero by MXCSR.FZ; the result has the same sign as the operand.

Instruction Support

Form       Subset   Feature Flag
VFRCZPS    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Encoding
                                XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZPS xmm1, xmm2/mem128       8F   RXB.09          0.1111.0.00  80 /r
VFRCZPS ymm1, ymm2/mem256       8F   RXB.09          0.1111.1.00  80 /r

Related Instructions

(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPD, VFRCZSS, VFRCZSD

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M              M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            XOP instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            XOP.W = 1.
                            XOP.vvvv != 1111b.
                            REX, F2, F3, or 66 prefix preceding XOP prefix.
                            Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions below for details.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE       A source operand was an SNaN value.
                            Undefined operation.
Denormalized operand, DE    A source operand was a denormal value.
Underflow, UE               Rounded result too small to fit into the format of the destination operand.
Precision, PE               A result could not be represented exactly in the destination format.

X: XOP exception

VFRCZSD

Extract Fraction Scalar Double-Precision Floating-Point

Extracts the fractional portion of the double-precision floating-point value of either the low-order quadword of an XMM register or a 64-bit memory location and writes the result to the low-order quadword of the destination XMM register. The fractional results are precise.

When the result is written to the destination XMM register, bits [127:64] of the destination and bits [255:128] of the corresponding YMM register are cleared.

Exception conditions are the same as for other arithmetic instructions, except with respect to the sign of a zero result. A zero is returned in the following cases:
• When the operand is a zero.
• When the operand is a normal integer.
• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ.
• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ.

In the first three cases, when MXCSR.RC = 01b (round toward −∞) the sign of the zero result is negative, and is otherwise positive. In the fourth case, the operand is its own fractional part, which results in underflow, and the result is forced to zero by MXCSR.FZ; the result has the same sign as the operand.

Instruction Support

Form       Subset   Feature Flag
VFRCZSD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
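As a rough illustration of how the feature flags quoted in the Instruction Support tables might be tested, the following Python sketch checks one bit in a register value that has already been read with CPUID. The names `FEATURE_BITS` and `has_feature`, the dictionary keys, and the sample register values are hypothetical and chosen only for this example; Python cannot execute CPUID itself, so obtaining the raw values is assumed to happen elsewhere.

```python
# Hypothetical helper: the caller supplies CPUID register values captured
# elsewhere (for example, by native code). Nothing here executes CPUID.

# (CPUID function/register, bit) pairs as listed in the Instruction Support tables.
FEATURE_BITS = {
    "FMA": ("Fn0000_0001_ECX", 12),
    "FMA4": ("Fn8000_0001_ECX", 16),
    "XOP": ("Fn8000_0001_ECX", 11),
    "AVX2": ("Fn0000_0007_EBX_x0", 5),
}


def has_feature(name, cpuid_regs):
    """Return True if the feature bit for `name` is set in `cpuid_regs`."""
    reg, bit = FEATURE_BITS[name]
    return bool((cpuid_regs[reg] >> bit) & 1)


# Made-up register snapshot in which only FMA and XOP are reported.
sample = {
    "Fn0000_0001_ECX": 1 << 12,
    "Fn8000_0001_ECX": 1 << 11,
    "Fn0000_0007_EBX_x0": 0,
}
print(has_feature("XOP", sample), has_feature("FMA4", sample))
```

A set feature bit is only part of the picture: the Exceptions tables also list CR4.OSXSAVE = 0 and XFEATURE_ENABLED_MASK[2:1] != 11b as causes of #UD, so a complete check would need to cover operating-system support as well.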
Instruction Encoding

Mnemonic                        Encoding
                                XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZSD xmm1, xmm2/mem64        8F   RXB.09          0.1111.0.00  83 /r

Related Instructions

(V)ROUNDPD, (V)ROUNDPS, (V)ROUNDSD, (V)ROUNDSS, VFRCZPS, VFRCZPD, VFRCZSS

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M              M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            XOP instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            XOP.W = 1.
                            XOP.vvvv != 1111b.
                            REX, F2, F3, or 66 prefix preceding XOP prefix.
                            Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions below for details.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE       A source operand was an SNaN value.
                            Undefined operation.
Denormalized operand, DE    A source operand was a denormal value.
Underflow, UE               Rounded result too small to fit into the format of the destination operand.
Precision, PE               A result could not be represented exactly in the destination format.

X: XOP exception
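The zero-result rules given in the VFRCZSD description can be summarized in a small reference model. The Python sketch below is illustrative only: `vfrczsd_model`, `is_denormal`, and `RC_DOWN` are hypothetical names, the fraction is assumed to be the operand minus its value truncated toward zero, and a denormal operand is assumed to pass through unchanged when neither MXCSR.DAZ nor MXCSR.FZ is set. NaN and infinity inputs and the MXCSR status flags are not modelled.

```python
import math
import sys

RC_DOWN = 0b01  # MXCSR.RC encoding for round toward -infinity, per the description


def is_denormal(x):
    """True for nonzero doubles below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < sys.float_info.min


def vfrczsd_model(x, rc=0b00, daz=False, fz=False):
    """Hypothetical scalar double-precision model; finite inputs only."""
    if not math.isfinite(x):
        raise ValueError("NaN and infinity are outside this sketch")
    if is_denormal(x) and not daz:
        # Fourth case: the operand is its own fraction; with FZ set the
        # result is forced to a zero carrying the operand's sign.
        return math.copysign(0.0, x) if fz else x  # pass-through is an assumption
    # First three cases: zero, integer, or denormal coerced to zero by DAZ.
    if x == 0.0 or is_denormal(x) or x == math.trunc(x):
        return -0.0 if rc == RC_DOWN else 0.0
    # Assumed definition of the fractional portion: x minus trunc(x).
    return x - math.trunc(x)
```

For example, `vfrczsd_model(2.75)` yields `0.75`, while `vfrczsd_model(3.0, rc=RC_DOWN)` yields a negative zero because the operand is an integer and rounding is toward −∞.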
VFRCZSS

Extract Fraction Scalar Single-Precision Floating-Point

Extracts the fractional portion of the single-precision floating-point value of the low-order doubleword of an XMM register or 32-bit memory location and writes the result in the low-order doubleword of the destination XMM register. The fractional results are precise.

When the result is written to the destination XMM register, bits [127:32] of the destination and bits [255:128] of the corresponding YMM register are cleared.

Exception conditions are the same as for other arithmetic instructions, except with respect to the sign of a zero result. A zero is returned in the following cases:
• When the operand is a zero.
• When the operand is a normal integer.
• When the operand is a denormal value and is coerced to zero by MXCSR.DAZ.
• When the operand is a denormal value that is not coerced to zero by MXCSR.DAZ.

In the first three cases, when MXCSR.RC = 01b (round toward −∞) the sign of the zero result is negative, and is otherwise positive. In the fourth case, the operand is its own fractional part, which results in underflow, and the result is forced to zero by MXCSR.FZ; the result has the same sign as the operand.

Instruction Support

Form       Subset   Feature Flag
VFRCZSS    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Encoding
                                XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VFRCZSS xmm1, xmm2/mem32        8F   RXB.09          0.1111.0.00  82 /r

Related Instructions

ROUNDPD, ROUNDPS, ROUNDSD, ROUNDSS, VFRCZPS, VFRCZPD, VFRCZSD

rFLAGS Affected

None

MXCSR Flags Affected

MM   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ   PE   UE   OE   ZE   DE   IE
                                                      M    M              M    M
17   15   14 13   12   11   10   9    8    7    6     5    4    3    2    1    0

Note: A flag that may be set or cleared is M (modified). Unaffected flags are blank.
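To make the bit layout in the MXCSR Flags Affected table concrete, the following Python sketch decodes a raw MXCSR value into named fields. `MXCSR_BITS`, `VFRCZ_MODIFIED`, `decode_mxcsr`, and `raised_flags` are hypothetical names for this example; the bit positions are taken from the table, and the sample value builds on 1F80h, which is assumed here to be the usual reset value with all exception masks set.

```python
# Single-bit fields and their positions, copied from the MXCSR Flags Affected table.
MXCSR_BITS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5, "DAZ": 6,
    "IM": 7, "DM": 8, "ZM": 9, "OM": 10, "UM": 11, "PM": 12,
    "FZ": 15, "MM": 17,
}

# Flags marked M (modified) for the VFRCZ instructions.
VFRCZ_MODIFIED = ("PE", "UE", "DE", "IE")


def decode_mxcsr(value):
    """Split a raw MXCSR value into a dict of named fields."""
    fields = {name: (value >> bit) & 1 for name, bit in MXCSR_BITS.items()}
    fields["RC"] = (value >> 13) & 0b11  # two-bit rounding control, bits 14:13
    return fields


def raised_flags(value, candidates=VFRCZ_MODIFIED):
    """List which of the candidate status flags are set in `value`."""
    fields = decode_mxcsr(value)
    return [name for name in candidates if fields[name]]


# Sample value: masks set (1F80h), RC = 01b, and the PE status flag raised.
sample_mxcsr = 0x1F80 | (0b01 << 13) | (1 << 5)
print(decode_mxcsr(sample_mxcsr)["RC"], raised_flags(sample_mxcsr))
```

The same decoder applies to the FMA entries by passing `("PE", "UE", "OE", "DE", "IE")` as the candidate list, since those tables additionally mark OE as modified.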
Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            XOP instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            XOP.W = 1.
                            XOP.vvvv != 1111b.
                            REX, F2, F3, or 66 prefix preceding XOP prefix.
                            Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions below for details.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
SIMD floating-point, #XF    Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.

SIMD Floating-Point Exceptions

Invalid operation, IE       A source operand was an SNaN value.
                            Undefined operation.
Denormalized operand, DE    A source operand was a denormal value.
Underflow, UE               Rounded result too small to fit into the format of the destination operand.
Precision, PE               A result could not be represented exactly in the destination format.

X: XOP exception

VGATHERDPD

Conditionally Gather Double-Precision Floating-Point Values, Doubleword Indices

Conditionally loads double-precision (64-bit) values from memory using VSIB addressing with doubleword indices. The instruction is of the form:

VGATHERDPD dest, mem64[vm32x], mask

Loading of each element of the destination register is conditional based on the value of the corresponding element of the mask operand.
If the most-significant bit of the ith element of the mask is set, the ith element of the destination is loaded from memory using the ith address of the array of effective addresses calculated using VSIB addressing.

The index register is treated as an array of signed 32-bit values. Quadword elements of the destination for which the corresponding mask element is zero are not affected by the operation. If no exceptions occur, the mask register is set to zero.

Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the mask operand may be observed as partially updated. Elements that have been loaded will have their mask elements set to zero. If any traps or faults are pending from elements that have been loaded, they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction breakpoint is not re-triggered when the instruction execution is resumed.

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding

The destination is an XMM register. The first source operand is up to two 64-bit values located in memory. The second source operand (the mask) is an XMM register. The index vector is the two low-order doublewords of an XMM register; the two high-order doublewords of the index register are not used. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are cleared.

YMM Encoding

The destination is a YMM register. The first source operand is up to four 64-bit values located in memory. The second source operand (the mask) is a YMM register. The index vector is the four doublewords of an XMM register.
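The masked-load behavior described above for VGATHERDPD can be sketched as a short Python model. This is an illustration under stated assumptions rather than a definition: `vgatherdpd_model` and `sign_extend32` are hypothetical names, memory is represented by a `bytes` object, and each effective address is assumed to be base + index × scale + displacement (the VSIB details are in Section 1.3 and are not reproduced here). Faults and the partially updated state they can expose are not modelled.

```python
import struct


def sign_extend32(value):
    """Interpret the low 32 bits of `value` as a signed doubleword."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vgatherdpd_model(dest, mask, indices, memory, base, scale, disp=0):
    """Hypothetical fault-free model; returns (new_dest, new_mask).

    dest    -- list of floats (2 for the XMM form, 4 for the YMM form)
    mask    -- list of 64-bit integers, one per destination element
    indices -- list of 32-bit index values, treated as signed
    memory  -- bytes object standing in for the addressed memory
    """
    out = list(dest)
    for i, mask_element in enumerate(mask):
        if (mask_element >> 63) & 1:  # only the most-significant bit selects
            address = base + sign_extend32(indices[i]) * scale + disp
            (out[i],) = struct.unpack_from("<d", memory, address)
    # With no exceptions, the whole mask register ends up zero.
    return out, [0] * len(mask)


# Four doubles in "memory"; gather element 0 only, leaving element 1 untouched.
memory = struct.pack("<4d", 10.0, 11.0, 12.0, 13.0)
result, new_mask = vgatherdpd_model(
    dest=[-1.0, -2.0], mask=[1 << 63, 0], indices=[2, 0xFFFFFFFF],
    memory=memory, base=8, scale=8,
)
print(result, new_mask)
```

In the example, index 2 with base 8 and scale 8 addresses byte offset 24, so element 0 receives 13.0, while element 1 keeps its previous value because its mask element has a clear most-significant bit. The index 0xFFFFFFFF would be interpreted as −1 if its mask bit were set.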
Instruction Support

Form          Subset   Feature Flag
VGATHERDPD    AVX2     CPUID Fn0000_0007_EBX_x0[AVX2] (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERDPD xmm1, vm32x, xmm2    C4   RXB.02          1.src2.0.01  92 /r
VGATHERDPD ymm1, vm32x, ymm2    C4   RXB.02          1.src2.1.01  92 /r

Related Instructions

VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ

rFLAGS Affected

RF

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            MODRM.mod = 11b
                            MODRM.rm != 100b
                            YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF             Instruction execution caused a page fault.

A: AVX2 exception

VGATHERDPS

Conditionally Gather Single-Precision Floating-Point Values, Doubleword Indices

Conditionally loads single-precision (32-bit) values from memory using VSIB addressing with doubleword indices.
The instruction is of the form:

VGATHERDPS dest, mem32[vm32x/y], mask

Loading of each element of the destination register is conditional based on the value of the corresponding element of the mask operand. If the most-significant bit of the ith element of the mask is set, the ith element of the destination is loaded from memory using the ith address of the array of effective addresses calculated using VSIB addressing.

The index register is treated as an array of signed 32-bit values. Doubleword elements of the destination for which the corresponding mask element is zero are not affected by the operation. If no exceptions occur, the mask register is set to zero.

Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the mask operand may be observed as partially updated. Elements that have been loaded will have their mask elements set to zero. If any traps or faults are pending from elements that have been loaded, they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction breakpoint is not re-triggered when the instruction execution is resumed.

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding

The destination is an XMM register. The first source operand is up to four 32-bit values located in memory. The second source operand (the mask) is an XMM register. The index vector is the four doublewords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are cleared.

YMM Encoding

The destination is a YMM register. The first source operand is up to eight 32-bit values located in memory. The second source operand (the mask) is a YMM register.
The index vector is the eight doublewords of a YMM register.

Instruction Support

Form          Subset   Feature Flag
VGATHERDPS    AVX2     CPUID Fn0000_0007_EBX_x0[AVX2] (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                        Encoding
                                VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERDPS xmm1, vm32x, xmm2    C4   RXB.02          0.src2.0.01  92 /r
VGATHERDPS ymm1, vm32y, ymm2    C4   RXB.02          0.src2.1.01  92 /r

Related Instructions

VGATHERDPD, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ

rFLAGS Affected

RF

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            MODRM.mod = 11b
                            MODRM.rm != 100b
                            YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF             Instruction execution caused a page fault.

A: AVX2 exception

VGATHERQPD

Conditionally Gather Double-Precision Floating-Point Values, Quadword Indices

Conditionally loads double-precision (64-bit) values from memory using VSIB addressing with quadword indices.
The instruction is of the form:

VGATHERQPD dest, mem64[vm64x/y], mask

Loading of each element of the destination register is conditional based on the value of the corresponding element of the mask operand. If the most-significant bit of the ith element of the mask is set, the ith element of the destination is loaded from memory using the ith address of the array of effective addresses calculated using VSIB addressing.

The index register is treated as an array of signed 64-bit values. Quadword elements of the destination for which the corresponding mask element is zero are not affected by the operation. If no exceptions occur, the mask register is set to zero.

Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the mask operand may be observed as partially updated. Elements that have been loaded will have their mask elements set to zero. If any traps or faults are pending from elements that have been loaded, they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction breakpoint is not re-triggered when the instruction execution is resumed.

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding

The destination is an XMM register. The first source operand is up to two 64-bit values located in memory. The second source operand (the mask) is an XMM register. The index vector is the two quadwords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are cleared.

YMM Encoding

The destination is a YMM register. The first source operand is up to four 64-bit values located in memory. The second source operand (the mask) is a YMM register.
The index vector is the four quadwords of a YMM register.

Instruction Support
Form        Subset  Feature Flag
VGATHERQPD  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERQPD xmm1, vm64x, xmm2  C4   RXB.02          1.src2.0.01  93 /r
VGATHERQPD ymm1, vm64y, ymm2  C4   RXB.02          1.src2.1.01  93 /r

Related Instructions
VGATHERDPD, VGATHERDPS, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ

rFLAGS Affected
RF

MXCSR Flags Affected
None

Exceptions
A: AVX2 exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
• MODRM.mod = 11b
• MODRM.rm != 100b
• YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Alignment check, #AC
• Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF
• Instruction execution caused a page fault.

VGATHERQPS
Conditionally Gather Single-Precision Floating-Point Values, Quadword Indices
Conditionally loads single-precision (32-bit) values from memory using VSIB addressing with quadword indices.
The instruction is of the form:
VGATHERQPS dest, mem32[vm64x/y], mask

Loading of each element of the destination register is conditional based on the value of the corresponding element of the mask operand. If the most-significant bit of the ith element of the mask is set, the ith element of the destination is loaded from memory using the ith address of the array of effective addresses calculated using VSIB addressing. The index register is treated as an array of signed 64-bit values. Doubleword elements of the destination for which the most-significant bit of the corresponding mask element is clear are not affected by the operation. The upper half of the destination is zeroed. If no exceptions occur, the mask register is set to zero.

Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the mask operand may be observed as partially updated. Elements that have been loaded will have their mask elements set to zero. If any traps or faults are pending from elements that have been loaded, they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction breakpoint is not re-triggered when the instruction execution is resumed.

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding
The destination is an XMM register. The first source operand is up to two 32-bit values located in memory. The second source operand (the mask) is an XMM register. Only the lower half of the mask is used. The index vector is the two quadwords of an XMM register. Bits [255:64] of the YMM register that corresponds to the destination and bits [255:64] of the YMM register that corresponds to the second source (mask) operand are cleared.

YMM Encoding
The destination is an XMM register.
The first source operand is up to four 32-bit values located in memory. The second source operand (the mask) is an XMM register. The index vector is the four quadwords of a YMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are cleared.

Instruction Support
Form        Subset  Feature Flag
VGATHERQPS  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VGATHERQPS xmm1, vm64x, xmm2  C4   RXB.02          0.src2.0.01  93 /r
VGATHERQPS xmm1, vm64y, xmm2  C4   RXB.02          0.src2.1.01  93 /r

Related Instructions
VGATHERDPD, VGATHERDPS, VGATHERQPD, VPGATHERDD, VPGATHERDQ, VPGATHERQD, VPGATHERQQ

rFLAGS Affected
RF

MXCSR Flags Affected
None

Exceptions
A: AVX2 exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
• MODRM.mod = 11b
• MODRM.rm != 100b
• YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Alignment check, #AC
• Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF
• Instruction execution caused a page fault.
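The conditional loading performed by the VGATHER instructions can be illustrated with a scalar model. The Python below is a minimal sketch, not AMD reference code: the function name gather_model, the dictionary standing in for memory, and the base and scale parameters (a simplified stand-in for VSIB address calculation) are all assumptions made for illustration. It models only fault-free execution; the partial-update behavior on exceptions and the zeroing of the upper half of the destination in VGATHERQPS are not modeled.

```python
def gather_model(dest, mask, indices, memory, base=0, scale=1, elem_bits=32):
    """Scalar sketch of one fault-free gather.

    dest, mask, indices: equal-length lists, one entry per element.
    memory: dict mapping byte address -> value (hypothetical memory model).
    Returns (new_dest, new_mask).
    """
    msb = 1 << (elem_bits - 1)          # most-significant bit of a mask element
    new_dest = list(dest)
    for i, (m, idx) in enumerate(zip(mask, indices)):
        if m & msb:                     # element selected for loading
            address = base + scale * idx    # idx is a signed index
            new_dest[i] = memory[address]
        # otherwise the destination element is left unchanged
    # With no exceptions, the whole mask register ends up zero.
    return new_dest, [0] * len(mask)
```

Only addresses belonging to selected elements are looked up in this model, so an unselected element may carry an index that points nowhere.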
VINSERTF128
Insert Packed Floating-Point Values 128-bit
Combines 128 bits of data from a YMM register with 128-bit packed-value data from an XMM register or a 128-bit memory location, as specified by an immediate byte operand, and writes the combined data to the destination. Only bit [0] of the immediate operand is used. Operation is as follows.
• When imm8[0] = 0, copy bits [255:128] of the first source to bits [255:128] of the destination and copy bits [127:0] of the second source to bits [127:0] of the destination.
• When imm8[0] = 1, copy bits [127:0] of the first source to bits [127:0] of the destination and copy bits [127:0] of the second source to bits [255:128] of the destination.

This extended-form instruction has a single 256-bit encoding. The first source operand is a YMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a YMM register. There is a third immediate byte operand.

Instruction Support
Form         Subset  Feature Flag
VINSERTF128  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VINSERTF128 ymm1, ymm2, xmm3/mem128, imm8  C4   RXB.03          0.src.1.01   18 /r ib

Related Instructions
VBROADCASTF128, VBROADCASTI128, VEXTRACTF128, VEXTRACTI128, VINSERTI128

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.W = 1.
• VEX.L = 0.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Memory operand not 16-byte aligned when alignment checking enabled.

VINSERTI128
Insert Packed Integer Values 128-bit
Combines 128 bits of data from a YMM register with 128-bit packed-value data from an XMM register or a 128-bit memory location, as specified by an immediate byte operand, and writes the combined data to the destination. Bit [0] of the immediate operand controls how the 128-bit values from the source operands are merged into the destination. The operation is as follows.
• When imm8[0] = 0, copy bits [255:128] of the first source to bits [255:128] of the destination and copy bits [127:0] of the second source to bits [127:0] of the destination.
• When imm8[0] = 1, copy bits [127:0] of the first source to bits [127:0] of the destination and copy bits [127:0] of the second source to bits [255:128] of the destination.

This instruction has a single 256-bit encoding. The first source operand is a YMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a YMM register. The immediate byte is encoded in the instruction.

Instruction Support
Form         Subset  Feature Flag
VINSERTI128  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
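The imm8[0] selection shared by VINSERTF128 and VINSERTI128 can be sketched on plain integers. This is an illustrative model only: the function name vinsert128_model and the representation of registers as Python integers (bit 0 being the least-significant bit) are assumptions, not part of the instruction definition.

```python
LOW128 = (1 << 128) - 1      # mask for bits [127:0]
ALL256 = (1 << 256) - 1      # mask for bits [255:0]

def vinsert128_model(src1, src2, imm8):
    """Sketch of VINSERTF128/VINSERTI128 on integers.

    src1: 256-bit first source, src2: 128-bit second source.
    Only bit 0 of imm8 is examined.
    """
    src1 &= ALL256
    src2 &= LOW128
    if imm8 & 1:
        # imm8[0] = 1: keep low half of src1, place src2 in bits [255:128]
        return (src2 << 128) | (src1 & LOW128)
    # imm8[0] = 0: keep high half of src1, place src2 in bits [127:0]
    return (src1 & ~LOW128 & ALL256) | src2
```

Because only bit 0 is examined, an immediate of 2 behaves like 0 in this model.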
Instruction Encoding
Mnemonic                                  VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VINSERTI128 ymm1, ymm2, xmm3/mem128, imm8  C4   RXB.03          0.src1.1.01  38 /r ib

Related Instructions
VBROADCASTF128, VBROADCASTI128, VEXTRACTF128, VEXTRACTI128, VINSERTF128

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.W = 1.
• VEX.L = 0.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Memory operand not 16-byte aligned when alignment checking enabled.

VMASKMOVPD
Masked Move Packed Double-Precision
Moves packed double-precision data elements from a source element to a destination element, as specified by mask bits in a source operand. There are load and store versions of the instruction. For loads, the data elements are in a source memory location; for stores the data elements are in a source register. Each mask bit is the most-significant bit of the corresponding data element of a source register.
• For loads, when a mask bit = 1, the corresponding data element is copied from the source to the same element of the destination; when a mask bit = 0, the corresponding element of the destination is cleared.
• For stores, when a mask bit = 1, the corresponding data element is copied from the source to the same element of the destination; when a mask bit = 0, the corresponding element of the destination is not affected.

Exception and trap behavior for elements not selected for loading or storing from/to memory is implementation dependent. For instance, a given implementation may signal a data breakpoint or a page fault for quadwords that are zero-masked and not actually written.

XMM Encoding
There are load and store encodings.
• For loads, there are two 64-bit source data elements in a 128-bit memory location, the mask operand is an XMM register, and the destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
• For stores, there are two 64-bit source data elements in an XMM register, the mask operand is an XMM register, and the destination is a 128-bit memory location.

YMM Encoding
There are load and store encodings.
• For loads, there are four 64-bit source data elements in a 256-bit memory location, the mask operand is a YMM register, and the destination is a YMM register.
• For stores, there are four 64-bit source data elements in a YMM register, the mask operand is a YMM register, and the destination is a 256-bit memory location.

Instruction Support
Form        Subset  Feature Flag
VMASKMOVPD  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
Loads:
VMASKMOVPD xmm1, xmm2, mem128  C4   RXB.02          0.src1.0.01  2D /r
VMASKMOVPD ymm1, ymm2, mem256  C4   RXB.02          0.src1.1.01  2D /r
Stores:
VMASKMOVPD mem128, xmm1, xmm2  C4   RXB.02          0.src1.0.01  2F /r
VMASKMOVPD mem256, ymm1, ymm2  C4   RXB.02          0.src1.1.01  2F /r

Related Instructions
VMASKMOVPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.W = 1.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
• Write to a read-only data segment.
Page fault, #PF
• Instruction execution caused a page fault.

VMASKMOVPS
Masked Move Packed Single-Precision
Moves packed single-precision data elements from a source element to a destination element, as specified by mask bits in a source operand. There are load and store versions of the instruction. For loads, the data elements are in a source memory location; for stores the data elements are in a source register. Each mask bit is the most-significant bit of the corresponding data element of a source register.
• For loads, when a mask bit = 1, the corresponding data element is copied from the source to the same element of the destination; when a mask bit = 0, the corresponding element of the destination is cleared.
• For stores, when a mask bit = 1, the corresponding data element is copied from the source to the same element of the destination; when a mask bit = 0, the corresponding element of the destination is not affected.

Exception and trap behavior for elements not selected for loading or storing from/to memory is implementation dependent. For instance, a given implementation may signal a data breakpoint or a page fault for doublewords that are zero-masked and not actually written.

XMM Encoding
There are load and store encodings.
• For loads, there are four 32-bit source data elements in a 128-bit memory location, the mask operand is an XMM register, and the destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
• For stores, there are four 32-bit source data elements in an XMM register, the mask operand is an XMM register, and the destination is a 128-bit memory location.

YMM Encoding
There are load and store encodings.
• For loads, there are eight 32-bit source data elements in a 256-bit memory location, the mask operand is a YMM register, and the destination is a YMM register.
• For stores, there are eight 32-bit source data elements in a YMM register, the mask operand is a YMM register, and the destination is a 256-bit memory location.

Instruction Support
Form        Subset  Feature Flag
VMASKMOVPS  AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
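The difference between the load and store forms of VMASKMOVPD and VMASKMOVPS (unselected elements are cleared on a load but left untouched on a store) can be shown with a small scalar model. This is a sketch under stated assumptions: the function names and the list-of-elements representation are hypothetical, and the implementation-dependent exception behavior for unselected elements is not modeled.

```python
def maskmov_load_model(memory_elems, mask, elem_bits=32):
    """Masked load sketch: unselected destination elements become 0."""
    msb = 1 << (elem_bits - 1)
    return [value if m & msb else 0
            for value, m in zip(memory_elems, mask)]

def maskmov_store_model(memory_elems, src, mask, elem_bits=32):
    """Masked store sketch: unselected memory elements keep their old value."""
    msb = 1 << (elem_bits - 1)
    return [new if m & msb else old
            for old, new, m in zip(memory_elems, src, mask)]
```

Passing elem_bits=64 gives the double-precision (quadword) variant in this model.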
Instruction Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
Loads:
VMASKMOVPS xmm1, xmm2, mem128  C4   RXB.02          0.src1.0.01  2C /r
VMASKMOVPS ymm1, ymm2, mem256  C4   RXB.02          0.src1.1.01  2C /r
Stores:
VMASKMOVPS mem128, xmm1, xmm2  C4   RXB.02          0.src1.0.01  2E /r
VMASKMOVPS mem256, ymm1, ymm2  C4   RXB.02          0.src1.1.01  2E /r

Related Instructions
VMASKMOVPD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.W = 1.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
• Write to a read-only data segment.
Page fault, #PF
• Instruction execution caused a page fault.

VPBLENDD
Blend Packed Doublewords
Copies packed doublewords from either of two sources to a destination, as specified by an immediate 8-bit mask operand. Each bit of the mask selects a doubleword from one of the source operands to be copied to the destination. The least-significant bit controls the selection of the doubleword to be copied to the lowest doubleword of the destination. For each doubleword i of the destination:
• When mask bit [i] = 0, doubleword i of the first source operand is copied to the corresponding doubleword of the destination.
• When mask bit [i] = 1, doubleword i of the second source operand is copied to the corresponding doubleword of the destination.
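The per-doubleword selection rule above can be sketched as follows. The function name vpblendd_model and the use of Python lists (element 0 being the lowest doubleword) are illustrative assumptions.

```python
def vpblendd_model(src1, src2, imm8):
    """Sketch of VPBLENDD: bit i of imm8 picks doubleword i.

    src1, src2: lists of 4 (XMM) or 8 (YMM) doublewords.
    Bit i = 0 selects src1[i]; bit i = 1 selects src2[i].
    """
    return [b if (imm8 >> i) & 1 else a
            for i, (a, b) in enumerate(zip(src1, src2))]
```

With four-element inputs only the low four bits of imm8 are consulted in this model.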
The instruction has 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form      Subset  Feature Flag
VPBLENDD  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                               VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBLENDD xmm1, xmm2, xmm3/mem128, imm8  C4   RXB.03          0.src1.0.01  02 /r ib
VPBLENDD ymm1, ymm2, ymm3/mem256, imm8  C4   RXB.03          0.src1.1.01  02 /r ib

Related Instructions
VPBLENDW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX2 exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.W = 1.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Alignment check, #AC
• Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF
• Instruction execution caused a page fault.
VPBROADCASTB
Broadcast Packed Byte
Loads a byte from a register or memory and writes it to all 16 or 32 bytes of an XMM or YMM register.

This instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Copies the source operand to all 16 bytes of the destination. The source operand is the least-significant 8 bits of an XMM register or an 8-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Copies the source operand to all 32 bytes of the destination. The source operand is the least-significant 8 bits of an XMM register or an 8-bit memory location. The destination is a YMM register.

Instruction Support
Form          Subset  Feature Flag
VPBROADCASTB  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                      VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBROADCASTB xmm1, xmm2/mem8  C4   RXB.02          0.1111.0.01  78 /r
VPBROADCASTB ymm1, xmm2/mem8  C4   RXB.02          0.1111.1.01  78 /r

Related Instructions
VPBROADCASTD, VPBROADCASTQ, VPBROADCASTW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.W = 1.
• VEX.vvvv != 1111b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Unaligned memory reference when alignment checking enabled.

VPBROADCASTD
Broadcast Packed Doubleword
Loads a doubleword from a register or memory and writes it to all 4 or 8 doublewords of an XMM or YMM register.

This instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Copies the source operand to all 4 doublewords of the destination. The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Copies the source operand to all 8 doublewords of the destination. The source operand is the least-significant 32 bits of an XMM register or a 32-bit memory location. The destination is a YMM register.

Instruction Support
Form          Subset  Feature Flag
VPBROADCASTD  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBROADCASTD xmm1, xmm2/mem32  C4   RXB.02          0.1111.0.01  58 /r
VPBROADCASTD ymm1, xmm2/mem32  C4   RXB.02          0.1111.1.01  58 /r

Related Instructions
VPBROADCASTB, VPBROADCASTQ, VPBROADCASTW

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.W = 1.
• VEX.vvvv != 1111b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Unaligned memory reference when alignment checking enabled.

VPBROADCASTQ
Broadcast Packed Quadword
Loads a quadword from a register or memory and writes it to all 2 or 4 quadwords of an XMM or YMM register.

This instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Copies the source operand to both quadwords of the destination. The source operand is the least-significant 64 bits of an XMM register or a 64-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Copies the source operand to all 4 quadwords of the destination. The source operand is the least-significant 64 bits of an XMM register or a 64-bit memory location. The destination is a YMM register.

Instruction Support
Form          Subset  Feature Flag
VPBROADCASTQ  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBROADCASTQ xmm1, xmm2/mem64  C4   RXB.02          0.1111.0.01  59 /r
VPBROADCASTQ ymm1, xmm2/mem64  C4   RXB.02          0.1111.1.01  59 /r

Related Instructions
VPBROADCASTB, VPBROADCASTD, VPBROADCASTW

rFLAGS Affected
None

MXCSR Flags Affected
None
Exceptions
A: AVX exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.W = 1.
• VEX.vvvv != 1111b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Unaligned memory reference when alignment checking enabled.

VPBROADCASTW
Broadcast Packed Word
Loads a word from a register or memory and writes it to all 8 or 16 words of an XMM or YMM register.

This instruction has both 128-bit and 256-bit encodings:

XMM Encoding
Copies the source operand to all 8 words of the destination. The source operand is the least-significant 16 bits of an XMM register or a 16-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
Copies the source operand to all 16 words of the destination. The source operand is the least-significant 16 bits of an XMM register or a 16-bit memory location. The destination is a YMM register.

Instruction Support
Form          Subset  Feature Flag
VPBROADCASTW  AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
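The four VPBROADCAST instructions differ only in element width, which a single parameterized model makes visible. The sketch below is illustrative: the function name vpbroadcast_model and the integer representation of registers are assumptions, with elem_bits set to 8, 16, 32, or 64 for the B, W, D, and Q forms and vector_bits set to 128 or 256 for the XMM and YMM encodings.

```python
def vpbroadcast_model(src, elem_bits, vector_bits):
    """Sketch of VPBROADCASTB/W/D/Q on integers.

    src: source register value; only its low elem_bits bits are used.
    Returns the destination with that element replicated in every slot.
    """
    element = src & ((1 << elem_bits) - 1)   # least-significant element
    result = 0
    for slot in range(vector_bits // elem_bits):
        result |= element << (slot * elem_bits)
    return result
```

For example, a byte broadcast into a 128-bit destination fills 16 slots, while a word broadcast into a 256-bit destination also fills 16 slots.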
Instruction Encoding
Mnemonic                       VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPBROADCASTW xmm1, xmm2/mem16  C4   RXB.02          0.1111.0.01  79 /r
VPBROADCASTW ymm1, xmm2/mem16  C4   RXB.02          0.1111.1.01  79 /r

Related Instructions
VPBROADCASTB, VPBROADCASTD, VPBROADCASTQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX exception
Invalid opcode, #UD
• Instruction not supported, as indicated by CPUID feature identifier.
• AVX instructions are only recognized in protected mode.
• CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
• XFEATURE_ENABLED_MASK[2:1] != 11b.
• VEX.W = 1.
• VEX.vvvv != 1111b.
• REX, F2, F3, or 66 prefix preceding VEX prefix.
• Lock prefix (F0h) preceding opcode.
Device not available, #NM
• CR0.TS = 1.
Stack, #SS
• Memory address exceeding stack segment limit or non-canonical.
General protection, #GP
• Memory address exceeding data segment limit or non-canonical.
• Null data segment used to reference memory.
Page fault, #PF
• Instruction execution caused a page fault.
Alignment check, #AC
• Unaligned memory reference when alignment checking enabled.

VPCMOV
Vector Conditional Move
Moves bits of either the first source or the second source to the corresponding positions in the destination, depending on the value of the corresponding bit of a third source. When a bit of the third source = 1, the corresponding bit of the first source is moved to the destination; when a bit of the third source = 0, the corresponding bit of the second source is moved to the destination.

This instruction directly implements the C-language ternary “?” operation on each source bit. Arbitrary bit-granular predicates can be constructed by any number of methods, or loaded as constants from memory.
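The bitwise selection performed by VPCMOV is equivalent to the expression (src1 AND src3) OR (src2 AND NOT src3). A minimal sketch on Python integers follows; the function name vpcmov_model and the width parameter are assumptions for illustration.

```python
def vpcmov_model(src1, src2, src3, width=128):
    """Sketch of VPCMOV: per-bit select, dest = src3 ? src1 : src2.

    width: operand size in bits (128 for XMM, 256 for YMM).
    """
    full = (1 << width) - 1
    selector = src3 & full
    # Take src1 where the selector bit is 1, src2 where it is 0.
    return ((src1 & selector) | (src2 & ~selector)) & full
```

An all-ones selector returns src1 and an all-zeros selector returns src2, which matches the ternary reading of the operation.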
This instruction may use the results of any SSE instructions as the predicate in the selector. VPCMPEQB (VPCMPGTB), VPCMPEQW (VPCMPGTW), VPCMPEQD (VPCMPGTD) and VPCMPEQQ (VPCMPGTQ) compare integer bytes, words, doublewords, and quadwords, respectively, and set the predicate in the destination to masks of 1s and 0s accordingly. VCMPPS (VCMPSS) and VCMPPD (VCMPSD) compare single-precision and double-precision floating-point source values, respectively, and provide the predicate for the floating-point instructions.

There are four operands: VPCMOV dest, src1, src2, src3.

The first source (src1) is an XMM or YMM register specified by XOP.vvvv. XOP.W and bits [7:4] of an immediate byte (imm8) configure src2 and src3:
• When XOP.W = 0, src2 is either a register or a memory location specified by ModRM.r/m and src3 is a register specified by imm8[7:4].
• When XOP.W = 1, src2 is a register specified by imm8[7:4] and src3 is either a register or a memory location specified by ModRM.r/m.

The destination (dest) is either an XMM or a YMM register, as determined by XOP.L. When the destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.

Instruction Support
Form    Subset  Feature Flag
VPCMOV  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)
For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                             XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCMOV xmm1, xmm2, xmm3/mem128, xmm4  8F   RXB.08          0.src1.0.00  A2 /r ib
VPCMOV ymm1, ymm2, ymm3/mem256, ymm4  8F   RXB.08          0.src1.1.00  A2 /r ib
VPCMOV xmm1, xmm2, xmm3, xmm4/mem128  8F   RXB.08          1.src1.0.00  A2 /r ib
VPCMOV ymm1, ymm2, ymm3, ymm4/mem256  8F   RXB.08          1.src1.1.00  A2 /r ib

Related Instructions
VPCOMUB, VPCOMUD, VPCOMUQ, VPCOMUW, VCMPPD, VCMPPS
rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPCOMB
Compare Vector Signed Bytes

Compares corresponding packed signed bytes in the first and second sources and writes the result of each comparison to the corresponding byte of the destination. The result of each comparison is an 8-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMB dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the comparison results are written to the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of the immediate-byte operand (imm8). Each type has an alias mnemonic to facilitate coding.
imm8[2:0]  Comparison              Mnemonic
000        Less Than               VPCOMLTB
001        Less Than or Equal      VPCOMLEB
010        Greater Than            VPCOMGTB
011        Greater Than or Equal   VPCOMGEB
100        Equal                   VPCOMEQB
101        Not Equal               VPCOMNEQB
110        False                   VPCOMFALSEB
111        True                    VPCOMTRUEB

Instruction Support

Form     Subset  Feature Flag
VPCOMB   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                               XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMB xmm1, xmm2, xmm3/mem128, imm8   8F   RXB.08          0.src1.0.00  CC /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMW, VPCOMD, VPCOMQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPCOMD
Compare Vector Signed Doublewords

Compares corresponding packed signed doublewords in the first and second sources and writes the result of each comparison to the corresponding doubleword of the destination. The result of each comparison is a 32-bit value of all 1s (TRUE) or all 0s (FALSE).
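The per-element behavior of the signed compare instructions can be illustrated with a short reference model. This is a sketch under stated assumptions, not a definitive description: the names `to_signed`, `PREDICATES` and `vpcom_signed` are invented for the example, 128-bit operands are modeled as Python integers, and masking imm8 to its low three bits is a choice made here (the text only says that bits [2:0] select the comparison).

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


# Indexed by imm8[2:0], in the order used by the comparison tables.
PREDICATES = (
    lambda a, b: a < b,     # 000 Less Than
    lambda a, b: a <= b,    # 001 Less Than or Equal
    lambda a, b: a > b,     # 010 Greater Than
    lambda a, b: a >= b,    # 011 Greater Than or Equal
    lambda a, b: a == b,    # 100 Equal
    lambda a, b: a != b,    # 101 Not Equal
    lambda a, b: False,     # 110 False
    lambda a, b: True,      # 111 True
)


def vpcom_signed(src1: int, src2: int, imm8: int, elem_bits: int = 32) -> int:
    """Compare packed signed elements of two 128-bit values."""
    lane_mask = (1 << elem_bits) - 1
    predicate = PREDICATES[imm8 & 0b111]         # assumption: upper bits masked off
    dest = 0
    for shift in range(0, 128, elem_bits):
        a = to_signed((src1 >> shift) & lane_mask, elem_bits)
        b = to_signed((src2 >> shift) & lane_mask, elem_bits)
        if predicate(a, b):
            dest |= lane_mask << shift           # all 1s for TRUE, all 0s otherwise
    return dest
```

Passing `elem_bits=8`, `16` or `64` models the byte, word and quadword variants in the same way.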
There are four operands: VPCOMD dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results of the comparisons are written to the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has an alias mnemonic to facilitate coding.

imm8[2:0]  Comparison              Mnemonic
000        Less Than               VPCOMLTD
001        Less Than or Equal      VPCOMLED
010        Greater Than            VPCOMGTD
011        Greater Than or Equal   VPCOMGED
100        Equal                   VPCOMEQD
101        Not Equal               VPCOMNEQD
110        False                   VPCOMFALSED
111        True                    VPCOMTRUED

Instruction Support

Form     Subset  Feature Flag
VPCOMD   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                               XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMD xmm1, xmm2, xmm3/mem128, imm8   8F   RXB.08          0.src1.0.00  CE /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPCOMQ
Compare Vector Signed Quadwords

Compares corresponding packed signed quadwords in the first and second sources and writes the result of each comparison to the corresponding quadword of the destination. The result of each comparison is a 64-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMQ dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the result is written to the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has an alias mnemonic to facilitate coding.

imm8[2:0]  Comparison              Mnemonic
000        Less Than               VPCOMLTQ
001        Less Than or Equal      VPCOMLEQ
010        Greater Than            VPCOMGTQ
011        Greater Than or Equal   VPCOMGEQ
100        Equal                   VPCOMEQQ
101        Not Equal               VPCOMNEQQ
110        False                   VPCOMFALSEQ
111        True                    VPCOMTRUEQ

Instruction Support

Form     Subset  Feature Flag
VPCOMQ   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                               XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMQ xmm1, xmm2, xmm3/mem128, imm8   8F   RXB.08          0.src1.0.00  CF /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD

rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPCOMUB
Compare Vector Unsigned Bytes

Compares corresponding packed unsigned bytes in the first and second sources and writes the result of each comparison to the corresponding byte of the destination. The result of each comparison is an 8-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMUB dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the result is written to the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has an alias mnemonic to facilitate coding.
imm8[2:0]  Comparison              Mnemonic
000        Less Than               VPCOMLTUB
001        Less Than or Equal      VPCOMLEUB
010        Greater Than            VPCOMGTUB
011        Greater Than or Equal   VPCOMGEUB
100        Equal                   VPCOMEQUB
101        Not Equal               VPCOMNEQUB
110        False                   VPCOMFALSEUB
111        True                    VPCOMTRUEUB

Instruction Support

Form      Subset  Feature Flag
VPCOMUB   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMUB xmm1, xmm2, xmm3/mem128, imm8   8F   RXB.08          0.src1.0.00  EC /r ib

Related Instructions
VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD, VPCOMQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPCOMUD
Compare Vector Unsigned Doublewords

Compares corresponding packed unsigned doublewords in the first and second sources and writes the result of each comparison to the corresponding doubleword of the destination. The result of each comparison is a 32-bit value of all 1s (TRUE) or all 0s (FALSE).
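For the unsigned forms, each element is compared as a raw bit pattern with no sign interpretation. The sketch below models that for illustration only; `vpcom_unsigned` and `COMPARE` are hypothetical names, operands are modeled as Python integers, and masking imm8 to bits [2:0] is an assumption of the example.

```python
import operator

# Comparison functions for imm8[2:0] values 000 through 101.
COMPARE = (operator.lt, operator.le, operator.gt,
           operator.ge, operator.eq, operator.ne)


def vpcom_unsigned(src1: int, src2: int, imm8: int, elem_bits: int = 32) -> int:
    """Compare packed unsigned elements of two 128-bit values."""
    lane_mask = (1 << elem_bits) - 1
    kind = imm8 & 0b111                          # assumption: upper bits masked off
    dest = 0
    for shift in range(0, 128, elem_bits):
        a = (src1 >> shift) & lane_mask          # raw bit pattern, never negative
        b = (src2 >> shift) & lane_mask
        if kind == 0b110:
            result = False                       # False: always all 0s
        elif kind == 0b111:
            result = True                        # True: always all 1s
        else:
            result = COMPARE[kind](a, b)
        if result:
            dest |= lane_mask << shift
    return dest


# 0x80000000 is greater than 1 when treated as unsigned; a signed doubleword
# compare would see a negative value instead.
print(hex(vpcom_unsigned(0x80000000, 0x00000001, 0b010)))  # 0xffffffff
```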
There are four operands: VPCOMUD dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has an alias mnemonic to facilitate coding.

imm8[2:0]  Comparison              Mnemonic
000        Less Than               VPCOMLTUD
001        Less Than or Equal      VPCOMLEUD
010        Greater Than            VPCOMGTUD
011        Greater Than or Equal   VPCOMGEUD
100        Equal                   VPCOMEQUD
101        Not Equal               VPCOMNEQUD
110        False                   VPCOMFALSEUD
111        True                    VPCOMTRUEUD

Instruction Support

Form      Subset  Feature Flag
VPCOMUD   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMUD xmm1, xmm2, xmm3/mem128, imm8   8F   RXB.08          0.src1.0.00  EE /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD, VPCOMQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPCOMUQ
Compare Vector Unsigned Quadwords

Compares corresponding packed unsigned quadwords in the first and second sources and writes the result of each comparison to the corresponding quadword of the destination. The result of each comparison is a 64-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMUQ dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has an alias mnemonic to facilitate coding.

imm8[2:0]  Comparison              Mnemonic
000        Less Than               VPCOMLTUQ
001        Less Than or Equal      VPCOMLEUQ
010        Greater Than            VPCOMGTUQ
011        Greater Than or Equal   VPCOMGEUQ
100        Equal                   VPCOMEQUQ
101        Not Equal               VPCOMNEQUQ
110        False                   VPCOMFALSEUQ
111        True                    VPCOMTRUEUQ

Instruction Support

Form      Subset  Feature Flag
VPCOMUQ   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMUQ xmm1, xmm2, xmm3/mem128, imm8   8F   RXB.08          0.src1.0.00  EF /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMB, VPCOMW, VPCOMD, VPCOMQ

rFLAGS Affected
None
MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPCOMUW
Compare Vector Unsigned Words

Compares corresponding packed unsigned words in the first and second sources and writes the result of each comparison to the corresponding word of the destination. The result of each comparison is a 16-bit value of all 1s (TRUE) or all 0s (FALSE).

There are four operands: VPCOMUW dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has an alias mnemonic to facilitate coding.
imm8[2:0]  Comparison              Mnemonic
000        Less Than               VPCOMLTUW
001        Less Than or Equal      VPCOMLEUW
010        Greater Than            VPCOMGTUW
011        Greater Than or Equal   VPCOMGEUW
100        Equal                   VPCOMEQUW
101        Not Equal               VPCOMNEQUW
110        False                   VPCOMFALSEUW
111        True                    VPCOMTRUEUW

Instruction Support

Form      Subset  Feature Flag
VPCOMUW   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMUW xmm1, xmm2, xmm3/mem128, imm8   8F   RXB.08          0.src1.0.00  ED /r ib

Related Instructions
VPCOMUB, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMW, VPCOMD, VPCOMQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPCOMW
Compare Vector Signed Words

Compares corresponding packed signed words in the first and second sources and writes the result of each comparison to the corresponding word of the destination. The result of each comparison is a 16-bit value of all 1s (TRUE) or all 0s (FALSE).
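The alias mnemonics in the comparison tables follow one pattern: the prefix VPCOM, a condition name, then the element suffix of the base instruction (B, W, D, Q, optionally preceded by U). As an illustration, a hypothetical helper (`expand_alias`, not part of any assembler) could recover the base mnemonic and the imm8[2:0] value from an alias like this:

```python
import re

# Condition names in imm8[2:0] order, as listed in the comparison tables.
CONDITIONS = ("LT", "LE", "GT", "GE", "EQ", "NEQ", "FALSE", "TRUE")

# NEQ is listed before EQ so the regular expression tries the longer name first.
ALIAS_RE = re.compile(r"VPCOM(LT|LE|GT|GE|NEQ|EQ|FALSE|TRUE)(U?[BWDQ])")


def expand_alias(alias):
    """Return (base mnemonic, imm8[2:0] value) for a VPCOM alias mnemonic."""
    match = ALIAS_RE.fullmatch(alias.upper())
    if match is None:
        raise ValueError(f"not a VPCOM alias mnemonic: {alias!r}")
    condition, suffix = match.groups()
    return "VPCOM" + suffix, CONDITIONS.index(condition)


print(expand_alias("VPCOMGEUW"))  # ('VPCOMUW', 3)
```

Under this reading, `VPCOMGEUW xmm1, xmm2, xmm3` is the same operation as `VPCOMUW xmm1, xmm2, xmm3, 3`.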
There are four operands: VPCOMW dest, src1, src2, imm8

The destination (dest) is an XMM register specified by ModRM.reg. When the results are written to the destination XMM register, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field and the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

The comparison type is specified by bits [2:0] of an immediate-byte operand (imm8). Each type has an alias mnemonic to facilitate coding.

imm8[2:0]  Comparison              Mnemonic
000        Less Than               VPCOMLTW
001        Less Than or Equal      VPCOMLEW
010        Greater Than            VPCOMGTW
011        Greater Than or Equal   VPCOMGEW
100        Equal                   VPCOMEQW
101        Not Equal               VPCOMNEQW
110        False                   VPCOMFALSEW
111        True                    VPCOMTRUEW

Instruction Support

Form     Subset  Feature Flag
VPCOMW   XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                               XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPCOMW xmm1, xmm2, xmm3/mem128, imm8   8F   RXB.08          0.src1.0.00  CD /r ib

Related Instructions
VPCOMUB, VPCOMUW, VPCOMUD, VPCOMUQ, VPCOMB, VPCOMD, VPCOMQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPERM2F128
Permute Floating-Point 128-bit

Copies 128 bits of floating-point data from a selected octword of two 256-bit source operands or zero to each octword of a 256-bit destination, as specified by an immediate byte operand.

The immediate operand is encoded as follows.

Destination  Immediate-Byte  Value of   Source 1     Source 2
             Bit Field       Bit Field  Bits Copied  Bits Copied
[127:0]      [1:0]           00         [127:0]      —
                             01         [255:128]    —
                             10         —            [127:0]
                             11         —            [255:128]
             Setting imm8[3] clears bits [127:0] of the destination; imm8[2] is ignored.
[255:128]    [5:4]           00         [127:0]      —
                             01         [255:128]    —
                             10         —            [127:0]
                             11         —            [255:128]
             Setting imm8[7] clears bits [255:128] of the destination; imm8[6] is ignored.

This is a 256-bit extended-form instruction: The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support

Form         Subset  Feature Flag
VPERM2F128   AVX     CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

Mnemonic                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPERM2F128 ymm1, ymm2, ymm3/mem256, imm8   C4   RXB.03          0.src1.1.01  06 /r ib

Related Instructions
VEXTRACTF128, VINSERTF128, VPERMILPD, VPERMILPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      A    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.W = 1.
                                         A    VEX.L = 0.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A    Lock prefix (F0h) preceding opcode.
Device not available, #NM                A    CR0.TS = 1.
Stack, #SS                               A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A    Memory address exceeding data segment limit or non-canonical.
                                         A    Null data segment used to reference memory.
Page fault, #PF                          A    Instruction execution caused a page fault.
Alignment check, #AC                     A    Memory operand not 16-byte aligned when alignment checking enabled.

A — AVX exception.

VPERM2I128
Permute Integer 128-bit

Copies 128 bits of integer data from a selected octword of two 256-bit source operands or zero to each octword of a 256-bit destination, as specified by an immediate byte operand.

The immediate operand is encoded as follows.

Destination  Immediate-Byte  Value of   Source 1     Source 2
             Bit Field       Bit Field  Bits Copied  Bits Copied
[127:0]      [1:0]           00         [127:0]      —
                             01         [255:128]    —
                             10         —            [127:0]
                             11         —            [255:128]
             Setting imm8[3] clears bits [127:0] of the destination; imm8[2] is ignored.
[255:128]    [5:4]           00         [127:0]      —
                             01         [255:128]    —
                             10         —            [127:0]
                             11         —            [255:128]
             Setting imm8[7] clears bits [255:128] of the destination; imm8[6] is ignored.

This is a 256-bit extended-form instruction: The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register. Bits 2 and 6 of the immediate byte are ignored.

Instruction Support

Form         Subset  Feature Flag
VPERM2I128   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
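The immediate-byte encoding shared by VPERM2F128 and VPERM2I128 can be sketched as a reference model. This is an illustrative sketch only: the name `vperm2x128` is made up for the example, and the 256-bit operands are modeled as Python integers.

```python
LANE_MASK = (1 << 128) - 1


def vperm2x128(src1: int, src2: int, imm8: int) -> int:
    """Build a 256-bit result from 128-bit halves chosen by the immediate byte."""
    # Selector values 00, 01, 10, 11 index this list, matching the table order.
    halves = [
        src1 & LANE_MASK,            # 00: source 1 bits [127:0]
        (src1 >> 128) & LANE_MASK,   # 01: source 1 bits [255:128]
        src2 & LANE_MASK,            # 10: source 2 bits [127:0]
        (src2 >> 128) & LANE_MASK,   # 11: source 2 bits [255:128]
    ]
    # imm8[3] and imm8[7] force a zero half; imm8[2] and imm8[6] are never read.
    low = 0 if imm8 & 0x08 else halves[imm8 & 0b11]
    high = 0 if imm8 & 0x80 else halves[(imm8 >> 4) & 0b11]
    return (high << 128) | low


a = (0xA1 << 128) | 0xA0     # source 1: upper half 0xA1, lower half 0xA0
b = (0xB1 << 128) | 0xB0     # source 2: upper half 0xB1, lower half 0xB0

swapped = vperm2x128(a, b, 0x01)     # swap the two halves of source 1
assert swapped == (0xA0 << 128) | 0xA1
```

For instance, an immediate of 0x20 yields the lower half of source 1 in bits [127:0] and the lower half of source 2 in bits [255:128].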
Instruction Encoding

Mnemonic                                   VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPERM2I128 ymm1, ymm2, ymm3/mem256, imm8   C4   RXB.03          0.src1.1.01  46 /r ib

Related Instructions
VEXTRACTI128, VEXTRACTF128, VINSERTI128, VINSERTF128, VPERMILPD, VPERMILPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      A    Instruction not supported, as indicated by CPUID feature identifier.
                             A     A          AVX instructions are only recognized in protected mode.
                                         A    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         A    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         A    VEX.W = 1.
                                         A    VEX.L = 0.
                                         A    REX, F2, F3, or 66 prefix preceding VEX prefix.
                                         A    Lock prefix (F0h) preceding opcode.
Device not available, #NM                A    CR0.TS = 1.
Stack, #SS                               A    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  A    Memory address exceeding data segment limit or non-canonical.
                                         A    Null data segment used to reference memory.
Page fault, #PF                          A    Instruction execution caused a page fault.
Alignment check, #AC                     A    Memory operand not 16-byte aligned when alignment checking enabled.

A — AVX exception.

VPERMD
Packed Permute Doubleword

Copies selected doublewords from a 256-bit value located either in memory or a YMM register to specific doublewords of the destination YMM register. For each doubleword of the destination, selection of which doubleword to copy from the source is specified by a selector field in the corresponding doubleword of a YMM register.

There is a single form of this instruction:

VPERMD dest, src1, src2

The first source operand provides eight 3-bit selectors, each selector occupying the least-significant bits of a doubleword. Each selector specifies the index of the doubleword of the second source operand to be copied to the destination.
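The selector mechanism can be illustrated with a small reference model. It is a sketch under stated assumptions rather than a definitive description: `vpermd` and `pack` are names invented for the example, the 256-bit operands are modeled as Python integers, and the upper 29 bits of each selector doubleword are simply masked off here.

```python
def pack(dwords):
    """Pack a list of eight 32-bit values into one integer, element 0 lowest."""
    return sum((d & 0xFFFFFFFF) << (32 * i) for i, d in enumerate(dwords))


def vpermd(selectors: int, data: int) -> int:
    """Destination doubleword i receives data doubleword selectors[i] & 7."""
    dest = 0
    for i in range(8):
        index = (selectors >> (32 * i)) & 0b111      # low 3 bits of selector i
        dword = (data >> (32 * index)) & 0xFFFFFFFF
        dest |= dword << (32 * i)
    return dest


data = pack([10, 11, 12, 13, 14, 15, 16, 17])

# Reversing the element order.
assert vpermd(pack([7, 6, 5, 4, 3, 2, 1, 0]), data) == pack([17, 16, 15, 14, 13, 12, 11, 10])

# Repeating one index broadcasts that source doubleword.
assert vpermd(pack([3] * 8), data) == pack([13] * 8)
```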
The doubleword in the destination that each selector controls is based on its position within the first source operand.

The index value may be the same in multiple selectors. This results in multiple copies of the same source doubleword being copied to the destination.

There is no 128-bit form of this instruction.

YMM Encoding
The destination is a YMM register. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location.

Instruction Support

Form     Subset  Feature Flag
VPERMD   AVX2    CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

Instruction Encoding

Mnemonic                         VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPERMD ymm1, ymm2, ymm3/mem256   C4   RXB.02          0.src1.1.01  36 /r

Related Instructions
VPERMQ, VPERMPD, VPERMPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

Exception Mode  Real Virt Prot
A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR0.EM = 1.
                            CR4.OSFXSR = 0.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.L = 0.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF             Instruction execution caused a page fault.

A — AVX2 exception
VPERMIL2PD
Permute Two-Source Double-Precision Floating-Point

Copies a selected quadword from one of two source operands to a selected quadword of the destination or clears the selected quadword of the destination. Values in a third source operand and an immediate two-bit operand control the operation.

There are 128-bit and 256-bit versions of this instruction. Both versions have five operands:

VPERMIL2PD dest, src1, src2, src3, m2z

The first four operands are either 128 bits or 256 bits wide, as determined by VEX.L. When the destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.

The third source operand is a selector that specifies how quadwords are copied or cleared in the destination. The selector contains one selector element for each quadword of the destination register.

Selector for 128-bit Instruction Form

Bits      [127:64]  [63:0]
Element   S1        S0

The selector for the 128-bit instruction form is an octword composed of two quadword selector elements S0 and S1. S0 (the lower quadword) controls the value written to destination quadword 0 (bits [63:0]) and S1 (the upper quadword) controls the destination quadword 1 (bits [127:64]).

Selector for 256-bit Instruction Form

Bits      [255:192]  [191:128]  [127:64]  [63:0]
Element   S3         S2         S1        S0

The selector for the 256-bit instruction form is a double octword and adds two more selector elements S2 and S3. S0 controls the value written to the destination quadword 0 (bits [63:0]), S1 controls the destination quadword 1 (bits [127:64]), S2 controls the destination quadword 2 (bits [191:128]), and S3 controls the destination quadword 3 (bits [255:192]).

The layout of each selector element is as follows:

Bits     Mnemonic  Description
[63:4]   —         Reserved, IGN
[3]      M         Match
[2:1]    Sel       Select
[0]      —         Reserved, IGN

The fields are defined as follows:

• Sel — Select. Selects the source quadword to copy into the corresponding quadword of the destination:

Sel Value  Source Selected for Destination   Source Selected for Destination
           Quadwords 0 and 1 (both forms)    Quadwords 2 and 3 (256-bit form)
00b        src1[63:0]                        src1[191:128]
01b        src1[127:64]                      src1[255:192]
10b        src2[63:0]                        src2[191:128]
11b        src2[127:64]                      src2[255:192]

• M — Match bit. The combination of the Match bit in each selector element and the value of the M2Z field determines if the Select field is overridden. This is described below.

m2z immediate operand

The fifth operand is m2z. The assembler uses this 2-bit value to encode the M2Z field in the instruction. M2Z occupies bits [1:0] of an immediate byte. Bits [7:4] of the same byte are used to select one of 16 YMM/XMM registers. This dual use of the immediate byte is indicated in the instruction synopsis by the symbol “is5”.

The immediate byte is defined as follows.

Bits    Mnemonic  Description
[7:4]   SRS       Source Register Select
[3:2]   —         Reserved, IGN
[1:0]   M2Z       Match to Zero

Fields are defined as follows:

• SRS — Source Register Select. As with many other extended instructions, bits in the immediate byte are used to select a source operand register. This field is set by the assembler based on the operands listed in the instruction. See discussion in “src2 and src3 Operand Addressing” below.
• M2Z — Match to Zero. This field, combined with the M bit of the selector element, controls the function of the Sel field as follows:

M2Z Field  Selector M Bit  Value Loaded into Destination Quadword
0Xb        X               Source quadword selected by selector element Sel field.
10b        0               Source quadword selected by selector element Sel field.
10b        1               Zero
11b        0               Zero
11b        1               Source quadword selected by selector element Sel field.

src2 and src3 Operand Addressing

In 64-bit mode, VEX.W and bits [7:4] of the immediate byte specify src2 and src3:
• When VEX.W = 0, src2 is either a register or a memory location specified by ModRM.r/m and src3 is a register specified by bits [7:4] of the immediate byte.
• When VEX.W = 1, src2 is a register specified by bits [7:4] of the immediate byte and src3 is either a register or a memory location specified by ModRM.r/m.

In non-64-bit mode, bit 7 is ignored.

Instruction Support

Form         Subset   Feature Flag
VPERMIL2PD   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                                      Encoding
Mnemonic                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMIL2PD xmm1, xmm2, xmm3/mem128, xmm4, m2z   C4    RXB.03           0.src1.0.01   49 /r is5
VPERMIL2PD xmm1, xmm2, xmm3, xmm4/mem128, m2z   C4    RXB.03           1.src1.0.01   49 /r is5
VPERMIL2PD ymm1, ymm2, ymm3/mem256, ymm4, m2z   C4    RXB.03           0.src1.1.01   49 /r is5
VPERMIL2PD ymm1, ymm2, ymm3, ymm4/mem256, m2z   C4    RXB.03           1.src1.1.01   49 /r is5

NOTE: VPERMIL2PD is encoded using the VEX prefix even though it is an XOP instruction.

Related Instructions

VPERM2F128, VPERMIL2PS, VPERMILPD, VPERMILPS, VPPERM

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            XOP instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding XOP prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.

Mode columns: Real, Virt, Prot. X — XOP exception.

VPERMIL2PS    Permute Two-Source Single-Precision Floating-Point

Copies a selected doubleword from one of two source operands to a selected doubleword of the destination or clears the selected doubleword of the destination. Values in a third source operand and an immediate two-bit operand control the operation.

There are 128-bit and 256-bit versions of this instruction. Both versions have five operands:

VPERMIL2PS dest, src1, src2, src3, m2z

The first four operands are either 128 bits or 256 bits wide, as determined by VEX.L. When the destination is an XMM register, bits [255:128] of the corresponding YMM register are cleared.

The third source operand is a selector that specifies how doublewords are copied or cleared in the destination. The selector contains one selector element for each doubleword of the destination register.

Selector for 128-bit Instruction Form

Bits       Selector Element
[127:96]   S3
[95:64]    S2
[63:32]    S1
[31:0]     S0

The selector for the 128-bit instruction form is an octword containing four selector elements S0–S3. S0 controls the value written to the destination doubleword 0 (bits [31:0]), S1 controls the destination doubleword 1 (bits [63:32]), S2 controls the destination doubleword 2 (bits [95:64]), and S3 controls the destination doubleword 3 (bits [127:96]).

Selector for 256-bit Instruction Form

Bits        Selector Element
[255:224]   S7
[223:192]   S6
[191:160]   S5
[159:128]   S4
[127:96]    S3
[95:64]     S2
[63:32]     S1
[31:0]      S0

The selector for the 256-bit instruction form is a double octword and adds four more selector elements S4–S7. S4 controls the value written to the destination doubleword 4 (bits [159:128]), S5 controls the destination doubleword 5 (bits [191:160]), S6 controls the destination doubleword 6 (bits [223:192]), and S7 controls the destination doubleword 7 (bits [255:224]).

The layout of each selector element is as follows.
Bits     Mnemonic   Description
[31:4]   —          Reserved, IGN
[3]      M          Match
[2:0]    Sel        Select

The fields are defined as follows:

• Sel — Select. Selects the source doubleword to copy into the corresponding doubleword of the destination:

Sel Value   Source Selected for Destination      Source Selected for Destination
            Doublewords 0, 1, 2 and 3            Doublewords 4, 5, 6 and 7
            (both forms)                         (256-bit form)
000b        src1[31:0]                           src1[159:128]
001b        src1[63:32]                          src1[191:160]
010b        src1[95:64]                          src1[223:192]
011b        src1[127:96]                         src1[255:224]
100b        src2[31:0]                           src2[159:128]
101b        src2[63:32]                          src2[191:160]
110b        src2[95:64]                          src2[223:192]
111b        src2[127:96]                         src2[255:224]

• M — Match. The combination of the M bit in each selector element and the value of the M2Z field determines if the Sel field is overridden. This is described below.

m2z immediate operand

The fifth operand is m2z. The assembler uses this 2-bit value to encode the M2Z field in the instruction. M2Z occupies bits [1:0] of an immediate byte. Bits [7:4] of the same byte are used to select one of 16 YMM/XMM registers. This dual use of the immediate byte is indicated in the instruction synopsis by the symbol “is5”.

The immediate byte is defined as follows.

Bits    Mnemonic   Description
[7:4]   SRS        Source Register Select
[3:2]   —          Reserved, IGN
[1:0]   M2Z        Match to Zero

Fields are defined as follows:

• SRS — Source Register Select. As with many other extended instructions, bits in the immediate byte are used to select a source operand register. This field is set by the assembler based on the operands listed in the instruction. See discussion in “src2 and src3 Operand Addressing” below.
• M2Z — Match to Zero. This field, combined with the M bit of the selector element, controls the function of the Sel field as follows:

M2Z Field   Selector M Bit   Value Loaded into Destination Doubleword
0Xb         X                Source doubleword selected by Sel field.
10b         0                Source doubleword selected by Sel field.
10b         1                Zero
11b         0                Zero
11b         1                Source doubleword selected by Sel field.

src2 and src3 Operand Addressing

In 64-bit mode, VEX.W and bits [7:4] of the immediate byte specify src2 and src3:

• When VEX.W = 0, src2 is either a register or a memory location specified by ModRM.r/m and src3 is a register specified by bits [7:4] of the immediate byte.
• When VEX.W = 1, src2 is a register specified by bits [7:4] of the immediate byte and src3 is either a register or a memory location specified by ModRM.r/m.

In non-64-bit mode, bit 7 is ignored.

Instruction Support

Form         Subset   Feature Flag
VPERMIL2PS   XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                                      Encoding
Mnemonic                                        VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMIL2PS xmm1, xmm2, xmm3/mem128, xmm4, m2z   C4    RXB.03           0.src1.0.01   48 /r is5
VPERMIL2PS xmm1, xmm2, xmm3, xmm4/mem128, m2z   C4    RXB.03           1.src1.0.01   48 /r is5
VPERMIL2PS ymm1, ymm2, ymm3/mem256, ymm4, m2z   C4    RXB.03           0.src1.1.01   48 /r is5
VPERMIL2PS ymm1, ymm2, ymm3, ymm4/mem256, m2z   C4    RXB.03           1.src1.1.01   48 /r is5

NOTE: VPERMIL2PS is encoded using the VEX prefix even though it is an XOP instruction.

Related Instructions

VPERM2F128, VPERMIL2PD, VPERMILPD, VPERMILPS, VPPERM

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            XOP instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding XOP prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.

Mode columns: Real, Virt, Prot. X — XOP exception.

VPERMILPD    Permute Double-Precision

Copies double-precision floating-point values from a source to a destination. Source and destination can be selected in two ways. There are different encodings for each selection method.

Selection by bits in a source register or memory location:

Each quadword of the operand is defined as follows.

Bits     Field
[63:2]   Ignored
[1]      Sel
[0]      Ignored

One bit in each quadword of the operand selects the source quadword for the corresponding destination quadword. Only bit [1] is used; bits [63:2] and bit [0] are ignored. When the bit is 0, the lower quadword of the corresponding 128-bit half of the first source operand is copied; when the bit is 1, the upper quadword of that half is copied.

Selection by bits in an immediate byte:

Each bit corresponds to a destination quadword. Only bits [3:2] and bits [1:0] are used; bits [7:4] are ignored. Selections are defined as follows.

Destination   Immediate-Byte   Value of    Source 1
Quadword      Bit Field        Bit Field   Bits Copied
Used by 128-bit encoding and 256-bit encoding
[63:0]        [0]              0           [63:0]
                               1           [127:64]
[127:64]      [1]              0           [63:0]
                               1           [127:64]
Used only by 256-bit encoding
[191:128]     [2]              0           [191:128]
                               1           [255:192]
[255:192]     [3]              0           [191:128]
                               1           [255:192]

This extended-form instruction has both 128-bit and 256-bit encodings.

XMM Encoding

There are two encodings, one for each selection method:

• The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
• The first source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. There is a third, immediate byte operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

There are two encodings, one for each selection method:

• The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
• The first source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register. There is a third, immediate byte operand.

Instruction Support

Form        Subset   Feature Flag
VPERMILPD   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                           Encoding
Mnemonic                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Selection by source register or memory:
VPERMILPD xmm1, xmm2, xmm3/mem128    C4    RXB.02           0.src1.0.01   0D /r
VPERMILPD ymm1, ymm2, ymm3/mem256    C4    RXB.02           0.src1.1.01   0D /r
Selection by immediate byte operand:
VPERMILPD xmm1, xmm2/mem128, imm8    C4    RXB.03           0.1111.0.01   05 /r ib
VPERMILPD ymm1, ymm2/mem256, imm8    C4    RXB.03           0.1111.1.01   05 /r ib

Related Instructions

VPERM2F128, VPERMIL2PD, VPERMIL2PS, VPERMILPS, VPPERM

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.W = 1.
                            VEX.vvvv != 1111b (for versions with immediate byte operand only).
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Unaligned memory reference when alignment checking enabled.

Mode columns: Real, Virt, Prot. A — AVX exception.

VPERMILPS    Permute Single-Precision

Copies single-precision floating-point values from a source to a destination. Source and destination can be selected in two ways. There are different encodings for each selection method.

Selection by bit fields in a source register or memory location:

Each doubleword of the operand is defined as follows.

Bits     Field
[31:2]   Ignored
[1:0]    Sel

Each bit field corresponds to a destination doubleword. Bit values select a source doubleword. Only bits [1:0] of each doubleword are used; bits [31:2] are ignored. The 128-bit encoding uses four two-bit fields; the 256-bit version uses eight two-bit fields. Field encoding is as follows.

Destination   Selector Operand   Source bits copied for bit-field value
Doubleword    Bit Field          00          01          10          11
[31:0]        [1:0]              [31:0]      [63:32]     [95:64]     [127:96]
[63:32]       [33:32]            [31:0]      [63:32]     [95:64]     [127:96]
[95:64]       [65:64]            [31:0]      [63:32]     [95:64]     [127:96]
[127:96]      [97:96]            [31:0]      [63:32]     [95:64]     [127:96]
Upper 128 bits of 256-bit source and destination used by 256-bit encoding

Destination   Selector Operand   Source bits copied for bit-field value
Doubleword    Bit Field          00          01          10          11
[159:128]     [129:128]          [159:128]   [191:160]   [223:192]   [255:224]
[191:160]     [161:160]          [159:128]   [191:160]   [223:192]   [255:224]
[223:192]     [193:192]          [159:128]   [191:160]   [223:192]   [255:224]
[255:224]     [225:224]          [159:128]   [191:160]   [223:192]   [255:224]

Selection by bit fields in an immediate byte:

Each bit field corresponds to a destination doubleword. For the 256-bit encoding, the fields specify sources and destinations in both the upper and lower 128 bits of the register. Selections are defined as follows.

Destination   Immediate-Byte   Source bits copied for bit-field value
Doubleword    Bit Field        00          01          10          11
[31:0]        [1:0]            [31:0]      [63:32]     [95:64]     [127:96]
[63:32]       [3:2]            [31:0]      [63:32]     [95:64]     [127:96]
[95:64]       [5:4]            [31:0]      [63:32]     [95:64]     [127:96]
[127:96]      [7:6]            [31:0]      [63:32]     [95:64]     [127:96]
Upper 128 bits of 256-bit source and destination used by 256-bit encoding
[159:128]     [1:0]            [159:128]   [191:160]   [223:192]   [255:224]
[191:160]     [3:2]            [159:128]   [191:160]   [223:192]   [255:224]
[223:192]     [5:4]            [159:128]   [191:160]   [223:192]   [255:224]
[255:224]     [7:6]            [159:128]   [191:160]   [223:192]   [255:224]

This extended-form instruction has both 128-bit and 256-bit encodings:

XMM Encoding

There are two encodings, one for each selection method:

• The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
• The first source operand is either an XMM register or a 128-bit memory location. The destination is an XMM register. There is a third, immediate byte operand. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding

There are two encodings, one for each selection method:

• The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.
• The first source operand is either a YMM register or a 256-bit memory location. The destination is a YMM register. There is a third, immediate byte operand.

Instruction Support

Form        Subset   Feature Flag
VPERMILPS   AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                           Encoding
Mnemonic                             VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Selection by source register or memory:
VPERMILPS xmm1, xmm2, xmm3/mem128    C4    RXB.02           0.src1.0.01   0C /r
VPERMILPS ymm1, ymm2, ymm3/mem256    C4    RXB.02           0.src1.1.01   0C /r
Selection by immediate byte operand:
VPERMILPS xmm1, xmm2/mem128, imm8    C4    RXB.03           0.1111.0.01   04 /r ib
VPERMILPS ymm1, ymm2/mem256, imm8    C4    RXB.03           0.1111.1.01   04 /r ib

Related Instructions

VPERM2F128, VPERMIL2PD, VPERMIL2PS, VPERMILPD, VPPERM

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.W = 1.
                            VEX.vvvv != 1111b (for versions with immediate byte operand only).
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Page fault, #PF             Instruction execution caused a page fault.
Alignment check, #AC        Unaligned memory reference when alignment checking enabled.

Mode columns: Real, Virt, Prot. A — AVX exception.

VPERMPD    Packed Permute Double-Precision Floating-Point

Copies selected quadwords from a 256-bit value located either in memory or a YMM register to specific quadwords of the destination. For each quadword of the destination, selection of which quadword to copy from the source is specified by a 2-bit selector field in an immediate byte.

There is a single form of this instruction:

VPERMPD dest, src, imm8

The selection of which quadword of the source operand to copy to each quadword of the destination is specified by four 2-bit selector fields in the immediate byte. Bits [1:0] specify the index of the quadword to be copied to the destination quadword 0. Bits [3:2] select the quadword to be copied to quadword 1, bits [5:4] select the quadword to be copied to quadword 2, and bits [7:6] select the quadword to be copied to quadword 3.

The index value may be the same in multiple selectors. This results in multiple copies of the same source quadword being copied to the destination.

There is no 128-bit form of this instruction.

YMM Encoding

The destination is a YMM register. The source operand is a YMM register or a 256-bit memory location.

Instruction Support

Form      Subset   Feature Flag
VPERMPD   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
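The immediate-byte selection described for VPERMPD can be modelled in a few lines. The Python sketch below is illustrative only: the function name permute_qwords_imm8 is hypothetical, and the 256-bit source is represented as a list of four quadword values with quadword 0 first. It models the data movement, not the encoding or exception behavior.

```python
def permute_qwords_imm8(src, imm8):
    """Model of the imm8 selection: destination quadword i receives the source
    quadword whose index is held in bits [2i+1:2i] of the immediate byte."""
    if len(src) != 4:
        raise ValueError("src must hold four quadwords (a 256-bit value)")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    return [src[(imm8 >> (2 * i)) & 0b11] for i in range(4)]


values = [10.0, 11.0, 12.0, 13.0]

# Selectors 3, 2, 1, 0 for destination quadwords 0..3: imm8 = 00_01_10_11b = 0x1B.
print(permute_qwords_imm8(values, 0x1B))  # [13.0, 12.0, 11.0, 10.0]

# The same index in every selector copies one source quadword four times:
# imm8 = 10_10_10_10b = 0xAA selects quadword 2 everywhere.
print(permute_qwords_imm8(values, 0xAA))  # [12.0, 12.0, 12.0, 12.0]
```

Because the model only moves 64-bit elements, it applies equally to any instruction that uses the same four 2-bit selector layout over the quadwords of a 256-bit source.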
Instruction Encoding

                                         Encoding
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMPD ymm1, ymm2/mem256, imm8    C4    RXB.03           1.1111.1.01   01 /r ib

Related Instructions

VPERMD, VPERMQ, VPERMPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR0.EM = 1.
                            CR4.OSFXSR = 0.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.L = 0.
                            VEX.vvvv != 1111b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF             Instruction execution caused a page fault.

Mode columns: Real, Virt, Prot. A — AVX2 exception.

VPERMPS    Packed Permute Single-Precision Floating-Point

Copies selected doublewords from a 256-bit value located either in memory or a YMM register to specific doublewords of the destination YMM register. For each doubleword of the destination, selection of which doubleword to copy from the source is specified by a selector field in the corresponding doubleword of a YMM register.

There is a single form of this instruction:

VPERMPS dest, src1, src2

The first source operand provides eight 3-bit selectors, each selector occupying the least-significant bits of a doubleword. Each selector specifies the index of the doubleword of the second source operand to be copied to the destination.
The doubleword in the destination that each selector controls is based on its position within the first source operand.

The index value may be the same in multiple selectors. This results in multiple copies of the same source doubleword being copied to the destination.

There is no 128-bit form of this instruction.

YMM Encoding

The destination is a YMM register. The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location.

Instruction Support

Form      Subset   Feature Flag
VPERMPS   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                       Encoding
Mnemonic                         VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMPS ymm1, ymm2, ymm3/mem256  C4    RXB.02           0.src1.1.01   16 /r

Related Instructions

VPERMD, VPERMQ, VPERMPD

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR0.EM = 1.
                            CR4.OSFXSR = 0.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.L = 0.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF             Instruction execution caused a page fault.

Mode columns: Real, Virt, Prot. A — AVX2 exception.
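The VPERMPS selection rule, in which each destination doubleword takes the source doubleword named by the selector in the same position, can be sketched as follows. This is a hedged Python model, not an implementation: the function name permute_dwords_by_selector is hypothetical, each 256-bit operand is a list of eight doubleword values with doubleword 0 first, and the model assumes that selector bits above the low three are ignored.

```python
def permute_dwords_by_selector(selectors, data):
    """Model of selector-driven doubleword selection.

    selectors: eight doubleword values from the first source operand.
    data:      eight doubleword values from the second source operand.
    Destination doubleword i receives data[selectors[i] & 7].
    """
    if len(selectors) != 8 or len(data) != 8:
        raise ValueError("both operands must hold eight doublewords")
    # Assumption: only the three least-significant selector bits are examined.
    return [data[s & 0b111] for s in selectors]


data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# Reverse all eight doublewords, crossing the 128-bit halves.
print(permute_dwords_by_selector([7, 6, 5, 4, 3, 2, 1, 0], data))

# Repeating an index yields multiple copies of one source doubleword.
print(permute_dwords_by_selector([5] * 8, data))
```

Unlike the VPERMILPS forms, a selector here can name any of the eight source doublewords, so elements can move between the lower and upper 128 bits.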
VPERMQ    Packed Permute Quadword

Copies selected quadwords from a 256-bit value located either in memory or a YMM register to specific quadwords of the destination. For each quadword of the destination, selection of which quadword to copy from the source is specified by a 2-bit selector field in an immediate byte.

There is a single form of this instruction:

VPERMQ dest, src, imm8

The selection of which quadword of the source operand to copy to each quadword of the destination is specified by four 2-bit selector fields in the immediate byte. Bits [1:0] specify the index of the quadword to be copied to the destination quadword 0. Bits [3:2] select the quadword to be copied to quadword 1, bits [5:4] select the quadword to be copied to quadword 2, and bits [7:6] select the quadword to be copied to quadword 3.

The index value may be the same in multiple selectors. This results in multiple copies of the same source quadword being copied to the destination.

There is no 128-bit form of this instruction.

YMM Encoding

The destination is a YMM register. The source operand is a YMM register or a 256-bit memory location.

Instruction Support

Form     Subset   Feature Flag
VPERMQ   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                        Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPERMQ ymm1, ymm2/mem256, imm8    C4    RXB.03           1.1111.1.01   00 /r ib

Related Instructions

VPERMD, VPERMPD, VPERMPS

rFLAGS Affected

None

MXCSR Flags Affected

None
Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR0.EM = 1.
                            CR4.OSFXSR = 0.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            VEX.L = 0.
                            VEX.vvvv != 1111b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Memory operand not 16-byte aligned when alignment checking enabled.
Page fault, #PF             Instruction execution caused a page fault.

Mode columns: Real, Virt, Prot. A — AVX2 exception.

VPGATHERDD    Conditionally Gather Doublewords, Doubleword Indices

Conditionally loads doubleword values from memory using VSIB addressing with doubleword indices.

The instruction is of the form:

VPGATHERDD dest, mem32[vm32x/y], mask

The loading of each element of the destination register is conditional based on the value of the corresponding element of the mask (second source operand). If the most-significant bit of the ith element of the mask is set, the ith element of the destination is loaded from memory using the ith address of the array of effective addresses calculated using VSIB addressing.

The index register is treated as an array of signed 32-bit values. Doubleword elements of the destination for which the corresponding mask element is zero are not affected by the operation. If no exceptions occur, the mask register is set to zero.
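The masked-load rule above, for the case where no exception occurs, can be modelled roughly as follows. This Python sketch makes several assumptions that are not part of the text: the function name gather_dwords is hypothetical, memory is a flat little-endian byte buffer, each effective address is computed as base + index × scale with any displacement folded into base, and an out-of-range address raises IndexError as a crude stand-in for a fault.

```python
import struct


def gather_dwords(memory, base, indices, scale, dest, mask):
    """Model of a fault-free doubleword gather with doubleword indices.

    indices, dest and mask are equal-length lists of 32-bit element values
    (four for the XMM form, eight for the YMM form).
    Returns (new_dest, new_mask).
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    if not (len(indices) == len(dest) == len(mask)):
        raise ValueError("indices, dest and mask must have the same length")
    out = list(dest)
    for i, (idx, m) in enumerate(zip(indices, mask)):
        if not (m & 0x80000000):      # only the most-significant mask bit is tested
            continue                  # element i of the destination is left unchanged
        idx &= 0xFFFFFFFF
        if idx & 0x80000000:          # indices are signed 32-bit values
            idx -= 1 << 32
        addr = base + idx * scale
        if not 0 <= addr <= len(memory) - 4:
            raise IndexError("address outside the modelled memory")
        out[i] = struct.unpack_from("<I", memory, addr)[0]
    return out, [0] * len(mask)       # the mask is zero after successful completion


# Doublewords 100..107 stored at byte offsets 0, 4, ..., 28.
memory = struct.pack("<8I", *range(100, 108))
dest, mask = gather_dwords(
    memory, base=8, indices=[0, 0xFFFFFFFF, 3, 1], scale=4,
    dest=[1, 2, 3, 4],
    mask=[0x80000000, 0x80000000, 0x7FFFFFFF, 0xFFFFFFFF],
)
print(dest)  # [102, 101, 3, 103]
print(mask)  # [0, 0, 0, 0]
```

In the example, index 0xFFFFFFFF is interpreted as −1 and reads the doubleword just below the base, while element 2 is skipped because the most-significant bit of its mask element is clear even though the element is nonzero.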
Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the mask operand may be observed as partially updated. Elements that have been loaded will have their mask elements set to zero. If any traps or faults are pending from elements that have been loaded, they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction breakpoint is not re-triggered when the instruction execution is resumed.

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding

The destination is an XMM register. The first source operand is up to four 32-bit values located in memory. The second source operand (the mask) is an XMM register. The index vector is the four doublewords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are cleared.

YMM Encoding

The destination is a YMM register. The first source operand is up to eight 32-bit values located in memory. The second source operand (the mask) is a YMM register. The index vector is the eight doublewords of a YMM register.

Instruction Support

Form         Subset   Feature Flag
VPGATHERDD   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding

                                     Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERDD xmm1, vm32x, xmm2   C4    RXB.02           0.src2.0.01   90 /r
VPGATHERDD ymm1, vm32y, ymm2   C4    RXB.02           0.src2.1.01   90 /r

Related Instructions

VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDQ, VPGATHERQD, VPGATHERQQ

rFLAGS Affected

RF

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            MODRM.mod = 11b.
                            MODRM.rm != 100b.
                            YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF             Instruction execution caused a page fault.

Mode columns: Real, Virt, Prot. A — AVX2 exception.

VPGATHERDQ    Conditionally Gather Quadwords, Doubleword Indices

Conditionally loads quadword values from memory using VSIB addressing with doubleword indices.

The instruction is of the form:

VPGATHERDQ dest, mem64[vm32x], mask

The loading of each element of the destination register is conditional based on the value of the corresponding element of the mask (second source operand).
If the most-significant bit of the ith element of the mask is set, the ith element of the destination is loaded from memory using the ith address of the array of effective addresses calculated using VSIB addressing.

The index register is treated as an array of signed 32-bit values. Quadword elements of the destination for which the corresponding mask element is zero are not affected by the operation. If no exceptions occur, the mask register is set to zero.

Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the mask operand may be observed as partially updated. Elements that have been loaded will have their mask elements set to zero. If any traps or faults are pending from elements that have been loaded, they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction breakpoint is not re-triggered when the instruction execution is resumed.

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding

The destination is an XMM register. The first source operand is up to two 64-bit values located in memory. The second source operand (the mask) is an XMM register. The index vector is the two low-order doublewords of an XMM register; the two high-order doublewords of the index register are not used. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are cleared.

YMM Encoding

The destination is a YMM register. The first source operand is up to four 64-bit values located in memory. The second source operand (the mask) is a YMM register. The index vector is the four doublewords of an XMM register.
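The two VPGATHERDQ encodings differ in how much of the XMM index register they consume, which is easy to misread. The following Python sketch illustrates that point only, under assumptions that are not taken from the text: the function name gather_qwords_dword_indices is hypothetical, memory is a flat little-endian byte buffer, each effective address is base + index × scale, and no exception occurs.

```python
import struct


def gather_qwords_dword_indices(memory, base, index_dwords, scale, dest, mask):
    """Model of a fault-free quadword gather driven by doubleword indices.

    index_dwords: the four doublewords of the XMM index register.
    dest, mask:   two quadwords (XMM form) or four quadwords (YMM form).
    Returns (new_dest, new_mask).
    """
    n = len(dest)
    if n not in (2, 4) or len(mask) != n or len(index_dwords) != 4:
        raise ValueError("expected 2 or 4 quadword elements and 4 index doublewords")
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    out = list(dest)
    for i in range(n):                # the XMM form reads only index doublewords 0 and 1
        if not (mask[i] >> 63) & 1:   # most-significant bit of the 64-bit mask element
            continue
        idx = index_dwords[i] & 0xFFFFFFFF
        if idx & 0x80000000:          # signed 32-bit index
            idx -= 1 << 32
        addr = base + idx * scale
        if not 0 <= addr <= len(memory) - 8:
            raise IndexError("address outside the modelled memory")
        out[i] = struct.unpack_from("<Q", memory, addr)[0]
    return out, [0] * n


memory = struct.pack("<4Q", 1000, 1001, 1002, 1003)
msb = 1 << 63

# XMM form: two elements; index doublewords 2 and 3 are never examined.
print(gather_qwords_dword_indices(memory, 0, [3, 2, 99, 99], 8, [7, 8], [msb, 0]))

# YMM form: four elements, still indexed from a single XMM register.
print(gather_qwords_dword_indices(memory, 0, [3, 2, 1, 0], 8, [0] * 4, [msb] * 4))
```

In the first call the out-of-range values in the two high-order index doublewords have no effect, matching the statement that they are not used by the 128-bit form.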
Instruction Support

Form         Subset   Feature Flag
VPGATHERDQ   AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                     Encoding
Mnemonic                       VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPGATHERDQ xmm1, vm32x, xmm2   C4    RXB.02           1.src2.0.01   90 /r
VPGATHERDQ ymm1, vm32x, ymm2   C4    RXB.02           1.src2.1.01   90 /r

Related Instructions

VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERQD, VPGATHERQQ

rFLAGS Affected

RF

MXCSR Flags Affected

None

Exceptions

Exception                   Cause of Exception
Invalid opcode, #UD         Instruction not supported, as indicated by CPUID feature identifier.
                            AVX instructions are only recognized in protected mode.
                            CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                            XFEATURE_ENABLED_MASK[2:1] != 11b.
                            REX, F2, F3, or 66 prefix preceding VEX prefix.
                            Lock prefix (F0h) preceding opcode.
                            MODRM.mod = 11b.
                            MODRM.rm != 100b.
                            YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM   CR0.TS = 1.
Stack, #SS                  Memory address exceeding stack segment limit or non-canonical.
General protection, #GP     Memory address exceeding data segment limit or non-canonical.
                            Null data segment used to reference memory.
Alignment check, #AC        Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF             Instruction execution caused a page fault.

Mode columns: Real, Virt, Prot. A — AVX2 exception.

VPGATHERQD    Conditionally Gather Doublewords, Quadword Indices

Conditionally loads doubleword values from memory using VSIB addressing with quadword indices.
The instruction is of the form: VPGATHERQD dest, mem32[vm64x/y], mask The loading of each element of the destination register is conditional based on the value of the corresponding element of the mask (second source operand). If the most-significant bit of the ith element of the mask is set, the ith element of the destination is loaded from memory using the ith address of the array of effective addresses calculated using VSIB addressing. The index register is treated as an array of signed 64-bit values. Doubleword elements of the destination for which the corresponding mask element is zero are not affected by the operation. If no exceptions occur, the mask register is set to zero. Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the mask operand may be observed as partially updated. Elements that have been loaded will have their mask elements set to zero. If any traps or faults are pending from elements that have been loaded, they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction breakpoint is not re-triggered when the instruction execution is resumed. See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode. There are 128-bit and 256-bit forms of this instruction. XMM Encoding The destination is an XMM register. The first source operand is up to two 32-bit values located in memory. The second source operand (the mask) is an XMM register. The index vector is the two quadwords of an XMM register. The upper half of the destination register and the mask register are cleared. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the mask register are cleared. YMM Encoding The destination is an XMM register. 
The first source operand is up to four 32-bit values located in memory. The second source operand (the mask) is an XMM register. The index vector is the four quadwords of a YMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the mask register are cleared. Instruction Support Form Subset VPGATHERQD AVX2 Feature Flag Fn0000_00007_EBX[AVX2]_x0 (bit 5) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 756 VPGATHERQD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Instruction Encoding Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPGATHERQD xmm1, vm64x, xmm2 C4 RXB.02 0.src2.0.01 91 /r VPGATHERQD xmm1, vm64y, xmm2 C4 RXB.02 0.src2.1.01 91 /r Related Instructions VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQQ rFLAGS Affected RF MXCSR Flags Affected None Exceptions Exception Mode Real Virt Prot A A A A A A A A A A A A A A A Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF A — AVX2 exception Instruction Reference A A A A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. MODRM.mod = 11b MODRM.rm ! = 100b YMM/XMM registers specified for destination, mask, and index not unique. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. 
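The 128-bit (XMM) form of VPGATHERQD described above gathers only two doublewords and clears the upper half of both the destination and the mask. The following scalar C model is a sketch of that result under stated assumptions: the function name vpgatherqd_xmm_ref is hypothetical, the displacement is assumed to be folded into base, and exception suspend and resume is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of VPGATHERQD xmm1, vm64x, xmm2 (illustrative name).
 * dest and mask are images of the full XMM registers (four doublewords);
 * only elements 0 and 1 take part in the gather. */
static void vpgatherqd_xmm_ref(int32_t dest[4], int32_t mask[4],
                               const void *base, const int64_t index[2],
                               int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int i = 0; i < 2; i++) {
        if (mask[i] < 0) {   /* most-significant bit of mask element set */
            const unsigned char *addr =
                (const unsigned char *)base + index[i] * scale;
            memcpy(&dest[i], addr, sizeof dest[i]);
        }
        mask[i] = 0;
    }
    /* The upper half of the destination and of the mask is cleared. */
    dest[2] = dest[3] = 0;
    mask[2] = mask[3] = 0;
}
```

Two quadword indices fill a whole XMM register, but two doubleword results fill only half of one, which is why the upper half of the destination is zeroed rather than preserved.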
VPGATHERQQ
Conditionally Gather Quadwords, Quadword Indices

Conditionally loads quadword values from memory using VSIB addressing with quadword indices.

The instruction is of the form:

VPGATHERQQ dest, mem64[vm64x/y], mask

The loading of each element of the destination register is conditional based on the value of the corresponding element of the mask (second source operand). If the most-significant bit of the ith element of the mask is set, the ith element of the destination is loaded from memory using the ith address of the array of effective addresses calculated using VSIB addressing. The index register is treated as an array of signed 64-bit values. Quadword elements of the destination for which the corresponding mask element is zero are not affected by the operation. If no exceptions occur, the mask register is set to zero.

Execution of the instruction can be suspended by an exception if the exception is triggered by an element other than the rightmost element loaded. When this happens, the destination register and the mask operand may be observed as partially updated. Elements that have been loaded will have their mask elements set to zero. If any traps or faults are pending from elements that have been loaded, they will be delivered in lieu of the exception; in this case, the RF flag is set so that an instruction breakpoint is not re-triggered when the instruction execution is resumed.

See Section 1.3, “VSIB Addressing,” on page 6 for a discussion of the VSIB addressing mode.

There are 128-bit and 256-bit forms of this instruction.

XMM Encoding
The destination is an XMM register. The first source operand is up to two 64-bit values located in memory. The second source operand (the mask) is an XMM register. The index vector is the two quadwords of an XMM register. Bits [255:128] of the YMM register that corresponds to the destination and bits [255:128] of the YMM register that corresponds to the second source (mask) operand are cleared.

YMM Encoding
The destination is a YMM register. The first source operand is up to four 64-bit values located in memory. The second source operand (the mask) is a YMM register. The index vector is the four quadwords of a YMM register.

Instruction Support

Form          Subset    Feature Flag
VPGATHERQQ    AVX2      Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                     Encoding
Mnemonic                        VEX  RXB.map_select  W.vvvv.L.pp  Opcode
VPGATHERQQ xmm1, vm64x, xmm2    C4   RXB.02          1.src2.0.01  91 /r
VPGATHERQQ ymm1, vm64y, ymm2    C4   RXB.02          1.src2.1.01  91 /r

Related Instructions
VGATHERDPD, VGATHERDPS, VGATHERQPD, VGATHERQPS, VPGATHERDD, VPGATHERDQ, VPGATHERQD

rFLAGS Affected
RF

MXCSR Flags Affected
None

Exceptions (A: AVX2 exception)

Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             AVX instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             REX, F2, F3, or 66 prefix preceding VEX prefix.
                             Lock prefix (F0h) preceding opcode.
                             MODRM.mod = 11b
                             MODRM.rm != 100b
                             YMM/XMM registers specified for destination, mask, and index not unique.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Alignment check, #AC         Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF              Instruction execution caused a page fault.

VPHADDBD
Packed Horizontal Add Signed Byte to Signed Doubleword

Adds four sets of four 8-bit signed integer values of the source and packs the sign-extended sums into the corresponding doubleword of the destination.

There are two operands:

VPHADDBD dest, src

The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form        Subset    Feature Flag
VPHADDBD    XOP       CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                   Encoding
Mnemonic                      XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDBD xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  C2 /r

Related Instructions
VPHADDBW, VPHADDBQ, VPHADDWD, VPHADDWQ, VPHADDDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions (X: XOP exception)

Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.W = 1.
                             XOP.vvvv != 1111b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPHADDBQ
Packed Horizontal Add Signed Byte to Signed Quadword

Adds two sets of eight 8-bit signed integer values of the source and packs the sign-extended sums into the corresponding quadword of the destination.

There are two operands:

VPHADDBQ dest, src

The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form        Subset    Feature Flag
VPHADDBQ    XOP       CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                   Encoding
Mnemonic                      XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDBQ xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  C3 /r

Related Instructions
VPHADDBW, VPHADDBD, VPHADDWD, VPHADDWQ, VPHADDDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions (X: XOP exception)

Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.W = 1.
                             XOP.vvvv != 1111b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPHADDBW
Packed Horizontal Add Signed Byte to Signed Word

Adds each adjacent pair of 8-bit signed integer values of the source and packs the sign-extended 16-bit integer result of each addition into the corresponding word element of the destination.

There are two operands:

VPHADDBW dest, src

The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form        Subset    Feature Flag
VPHADDBW    XOP       CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                   Encoding
Mnemonic                      XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDBW xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  C1 /r

Related Instructions
VPHADDBD, VPHADDBQ, VPHADDWD, VPHADDWQ, VPHADDDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions (X: XOP exception)

Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.W = 1.
                             XOP.vvvv != 1111b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPHADDDQ
Packed Horizontal Add Signed Doubleword to Signed Quadword

Adds each adjacent pair of signed doubleword integer values of the source and packs the sign-extended sums into the corresponding quadword of the destination.

There are two operands:

VPHADDDQ dest, src

The source is either an XMM register or a 128-bit memory location and the destination is an XMM register. Bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form        Subset    Feature Flag
VPHADDDQ    XOP       CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                   Encoding
Mnemonic                      XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHADDDQ xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  CB /r

Related Instructions
VPHADDBW, VPHADDBD, VPHADDBQ, VPHADDWD, VPHADDWQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions (X: XOP exception)

Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.W = 1.
                             XOP.vvvv != 1111b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.
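The signed horizontal adds described in the preceding entries (VPHADDBW, VPHADDBD, VPHADDBQ, and VPHADDDQ) differ only in how many adjacent source elements feed each destination element. The scalar C functions below are a sketch of the results, with array index 0 standing for the least-significant element of the register. The function names are hypothetical, and the clearing of bits [255:128] of the YMM register is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* VPHADDBW: each word is the sum of one adjacent pair of signed bytes. */
static void vphaddbw_ref(int16_t dest[8], const int8_t src[16])
{
    assert(dest && src);
    for (int i = 0; i < 8; i++)
        dest[i] = (int16_t)(src[2 * i] + src[2 * i + 1]);
}

/* VPHADDBD: each doubleword is the sum of four adjacent signed bytes. */
static void vphaddbd_ref(int32_t dest[4], const int8_t src[16])
{
    assert(dest && src);
    for (int i = 0; i < 4; i++) {
        int32_t sum = 0;
        for (int j = 0; j < 4; j++)
            sum += src[4 * i + j];
        dest[i] = sum;
    }
}

/* VPHADDBQ: each quadword is the sum of eight adjacent signed bytes. */
static void vphaddbq_ref(int64_t dest[2], const int8_t src[16])
{
    assert(dest && src);
    for (int i = 0; i < 2; i++) {
        int64_t sum = 0;
        for (int j = 0; j < 8; j++)
            sum += src[8 * i + j];
        dest[i] = sum;
    }
}

/* VPHADDDQ: each quadword is the sum of one adjacent pair of signed
 * doublewords, widened to 64 bits before adding. */
static void vphadddq_ref(int64_t dest[2], const int32_t src[4])
{
    assert(dest && src);
    for (int i = 0; i < 2; i++)
        dest[i] = (int64_t)src[2 * i] + (int64_t)src[2 * i + 1];
}
```

Each destination element is wider than its source elements, so in this model none of the sums can overflow: for example, adding two doublewords that both hold the largest positive value still fits in a quadword.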
VPHADDDQ 767 AMD64 Technology 26568—Rev. 3.22—May 2018 VPHADDUBD Packed Horizontal Add Unsigned Byte to Doubleword Adds four sets of four 8-bit unsigned integer values of the source and packs the sums into the corresponding doublewords of the destination. There are two operands: VPHADDUBD dest, src The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared. Instruction Support Form Subset VPHADDUBD XOP Feature Flag CPUID Fn8000_0001_ECX[XOP] (bit 11) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic VPHADDUBD xmm1, xmm2/mem128 Encoding XOP RXB.map_select W.vvvv.L.pp Opcode 8F RXB.09 0.1111.0.00 D2 /r Related Instructions VPHADDUBW, VPHADDUBQ, VPHADDUWD, VPHADDUWQ, VPHADDUDQ rFLAGS Affected None MXCSR Flags Affected None 768 VPHADDUBD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception Instruction Reference X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. VPHADDUBD 769 AMD64 Technology 26568—Rev. 
3.22—May 2018 VPHADDUBQ Packed Horizontal Add Unsigned Byte to Quadword Adds two sets of eight 8-bit unsigned integer values from the second source and packs the sums into the corresponding quadword of the destination. There are two operands: VPHADDUBQ dest, src The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. When the destination XMM register is written, bits [255:128] of the corresponding YMM register are cleared. Instruction Support Form Subset VPHADDUBQ XOP Feature Flag CPUID Fn8000_0001_ECX[XOP] (bit 11) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic VPHADDUBQ xmm1, xmm2/mem128 Encoding XOP RXB.map_select W.vvvv.L.pp Opcode 8F RXB.09 0.1111.0.00 D3 /r Related Instructions VPHADDUBW, VPHADDUBD, VPHADDUWD, VPHADDUWQ, VPHADDUDQ rFLAGS Affected None MXCSR Flags Affected None 770 VPHADDUBQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception Instruction Reference X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. VPHADDUBQ 771 AMD64 Technology 26568—Rev. 
3.22—May 2018 VPHADDUBW Packed Horizontal Add Unsigned Byte to Word Adds each adjacent pair of 8-bit unsigned integer values of the source and packs the 16-bit integer sums to the corresponding word of the destination. There are two operands: VPHADDUBW dest, src The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared. Instruction Support Form Subset VPHADDUBW XOP Feature Flag CPUID Fn8000_0001_ECX[XOP] (bit 11) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic VPHADDUBW xmm1, xmm2/mem128 Encoding XOP RXB.map_select W.vvvv.L.pp Opcode 8F RXB.09 0.1111.0.00 D1 /r Related Instructions VPHADDUBD, VPHADDUBQ, VPHADDUWD, VPHADDUWQ, VPHADDUDQ rFLAGS Affected None MXCSR Flags Affected None 772 VPHADDUBW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception Instruction Reference X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. VPHADDUBW 773 AMD64 Technology 26568—Rev. 
3.22—May 2018 VPHADDUDQ Packed Horizontal Add Unsigned Doubleword to Quadword Adds two adjacent pairs of 32-bit unsigned integer values of the source and packs the sums into the corresponding quadword of the destination. There are two operands: VPHADDUDQ dest, src The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared. Instruction Support Form Subset VPHADDUDQ XOP Feature Flag CPUID Fn8000_0001_ECX[XOP] (bit 11) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic VPHADDUDQ xmm1, xmm2/mem128 Encoding XOP RXB.map_select W.vvvv.L.pp Opcode 8F RXB.09 0.1111.0.00 DB /r Related Instructions VPHADDUBW, VPHADDUBD, VPHADDUBQ, VPHADDUWD, VPHADDUWQ rFLAGS Affected None MXCSR Flags Affected None 774 VPHADDUDQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception Instruction Reference X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. VPHADDUDQ 775 AMD64 Technology 26568—Rev. 
3.22—May 2018 VPHADDUWD Packed Horizontal Add Unsigned Word to Doubleword Adds four adjacent pairs of 16-bit unsigned integer values of the source and packs the sums into the corresponding doubleword of the destination. There are two operands: VPHADDUWD dest, src The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared. Instruction Support Form Subset VPHADDUWD XOP Feature Flag CPUID Fn8000_0001_ECX[XOP] (bit 11) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic VPHADDUWD xmm1, xmm2/mem128 Encoding XOP RXB.map_select W.vvvv.L.pp Opcode 8F RXB.09 0.1111.0.00 D6 /r Related Instructions VPHADDUBW, VPHADDUBD, VPHADDUBQ, VPHADDUWQ, VPHADDUDQ rFLAGS Affected None MXCSR Flags Affected None 776 VPHADDUWD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception Instruction Reference X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. VPHADDUWD 777 AMD64 Technology 26568—Rev. 
3.22—May 2018 VPHADDUWQ Packed Horizontal Add Unsigned Word to Quadword Adds two pairs of 16-bit unsigned integer values of the source and packs the sums into the corresponding quadword element of the destination. There are two operands: VPHADDUWQ dest, src The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared. Instruction Support Form Subset VPHADDUWQ XOP Feature Flag CPUID Fn8000_0001_ECX[XOP] (bit 11) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic VPHADDUWQ xmm1, xmm2/mem128 Encoding XOP RXB.map_select W.vvvv.L.pp Opcode 8F RXB.09 0.1111.0.00 D7 /r Related Instructions VPHADDUBW, VPHADDUBD, VPHADDUBQ, VPHADDUWD, VPHADDUDQ rFLAGS Affected None MXCSR Flags Affected None 778 VPHADDUWQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception Instruction Reference X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. VPHADDUWQ 779 AMD64 Technology 26568—Rev. 
3.22—May 2018 VPHADDWD Packed Horizontal Add Signed Word to Signed Doubleword Adds four adjacent pairs of 16-bit signed integer values of the source and packs the sign-extended sums to the corresponding doubleword of the destination. There are two operands: VPHADDWD dest, src The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared. Instruction Support Form Subset VPHADDWD XOP Feature Flag CPUID Fn8000_0001_ECX[XOP] (bit 11) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic VPHADDWD xmm1, xmm2/mem128 Encoding XOP RXB.map_select W.vvvv.L.pp Opcode 8F RXB.09 0.1111.0.00 C6 /r Related Instructions VPHADDBW, VPHADDBD, VPHADDBQ, VPHADDWQ, VPHADDDQ rFLAGS Affected None MXCSR Flags Affected None 780 VPHADDWD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception Instruction Reference X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. VPHADDWD 781 AMD64 Technology 26568—Rev. 
3.22—May 2018 VPHADDWQ Packed Horizontal Add Signed Word to Signed Quadword Adds four successive pairs of 16-bit signed integer values of the source and packs the sign-extended sums to the corresponding quadword of the destination. There are two operands: VPHADDWQ dest, src The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the corresponding YMM register are cleared. Instruction Support Form Subset VPHADDWQ XOP Feature Flag CPUID Fn8000_0001_ECX[XOP] (bit 11) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic VPHADDWQ xmm1, xmm2/mem128 Encoding XOP RXB.map_select W.vvvv.L.pp Opcode 8F RXB.09 0.1111.0.00 C7 /r Related Instructions VPHADDBW, VPHADDBD, VPHADDBQ, VPHADDWD, VPHADDDQ rFLAGS Affected None MXCSR Flags Affected None 782 VPHADDWQ Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception Instruction Reference X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. VPHADDWQ 783 AMD64 Technology 26568—Rev. 
3.22—May 2018 VPHSUBBW Packed Horizontal Subtract Signed Byte to Signed Word Subtracts the most significant signed integer byte from the least significant signed integer byte of each word element in the source and packs the sign-extended 16-bit integer differences into the destination. There are two operands: VPHSUBBW dest, src The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. When the destination is written, bits [255:128] of the corresponding YMM register are cleared. Instruction Support Form Subset VPHSUBBW XOP Feature Flag CPUID Fn8000_0001_ECX[XOP] (bit 11) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. Instruction Encoding Mnemonic VPHSUBBW xmm1, xmm2/mem128 Encoding XOP RXB.map_select W.vvvv.L.pp Opcode 8F RXB.09 0.1111.0.00 E1 /r Related Instructions VPHSUBWD, VPHSUBDQ rFLAGS Affected None MXCSR Flags Affected None 784 VPHSUBBW Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception Instruction Reference X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. VPHSUBBW 785 AMD64 Technology 26568—Rev. 
VPHSUBDQ
Packed Horizontal Subtract Signed Doubleword to Signed Quadword

Subtracts the most significant signed integer doubleword from the least significant signed integer doubleword of each quadword in the source and packs the sign-extended 64-bit integer differences into the corresponding quadword element of the destination.

There are two operands: VPHSUBDQ dest, src

The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. When the destination is written, bits [255:128] of the corresponding YMM register are cleared.

Instruction Support

Form      Subset  Feature Flag
VPHSUBDQ  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                              Encoding
Mnemonic                      XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHSUBDQ xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  E3 /r

Related Instructions
VPHSUBBW, VPHSUBWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception
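As an illustration of how an encoding row such as "8F RXB.09 0.1111.0.00 E3 /r" maps onto instruction bytes, the following Python sketch assembles the register-to-register form of a two-operand XOP instruction. It relies on the XOP prefix layout described in Volume 3 rather than in this entry: the R, X, and B bits are stored inverted in the second prefix byte, and the third byte holds W, vvvv, L, and pp. The function name `encode_xop_rr` and its parameters are hypothetical, and memory operands are deliberately not handled.

```python
def encode_xop_rr(opcode, dest, src, map_select=0x09):
    """Encode a two-operand XOP instruction in register-register form.

    dest and src are XMM register numbers 0..15 (hypothetical helper).
    Fixed fields follow the encoding row: W = 0, vvvv = 1111b, L = 0, pp = 00b.
    """
    if not (0 <= dest <= 15 and 0 <= src <= 15):
        raise ValueError("register number out of range")

    r = (dest >> 3) & 1   # high bit of the ModRM.reg register number
    x = 0                 # no index register in the register-register form
    b = (src >> 3) & 1    # high bit of the ModRM.r/m register number

    # Assumption taken from the XOP prefix description in Volume 3:
    # R, X and B are stored in inverted form in bits [7:5].
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | (map_select & 0x1F)

    w, vvvv, l, pp = 0, 0b1111, 0, 0b00
    byte2 = (w << 7) | (vvvv << 3) | (l << 2) | pp

    # ModRM with mod = 11b selects a register operand in r/m.
    modrm = (0b11 << 6) | ((dest & 7) << 3) | (src & 7)
    return bytes([0x8F, byte1, byte2, opcode, modrm])


# VPHSUBDQ xmm1, xmm2 (opcode E3)
print(encode_xop_rr(0xE3, 1, 2).hex(" "))
```

Passing a different opcode byte (for example C7 for VPHADDWQ) reuses the same prefix bytes, since the two-operand forms in this group share map_select 09h and the 0.1111.0.00 field values.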
VPHSUBWD
Packed Horizontal Subtract Signed Word to Signed Doubleword

Subtracts the most significant signed integer word from the least significant signed integer word of each doubleword of the source and packs the sign-extended 32-bit integer differences into the destination.

There are two operands: VPHSUBWD dest, src

The destination is an XMM register and the source is either an XMM register or a 128-bit memory location. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset  Feature Flag
VPHSUBWD  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                              Encoding
Mnemonic                      XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPHSUBWD xmm1, xmm2/mem128    8F   RXB.09          0.1111.0.00  E2 /r

Related Instructions
VPHSUBBW, VPHSUBDQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         A    XOP.vvvv != 1111b.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception
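The three horizontal-subtract entries (VPHSUBBW, VPHSUBWD, VPHSUBDQ) differ only in element width, so their arithmetic can be sketched with one scalar reference model. The Python below is an illustrative model, not AMD's definition: the 128-bit source is represented as a plain integer, and the names `to_signed` and `phsub` are hypothetical. Each result is the low element minus the high element of a pair, sign-extended to twice the element width.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def phsub(src, elem_bits):
    """Model VPHSUBBW (elem_bits=8), VPHSUBWD (16) or VPHSUBDQ (32).

    src is the 128-bit source as a Python int; the return value is the
    128-bit destination as a Python int.
    """
    out_bits = 2 * elem_bits
    mask = (1 << out_bits) - 1
    result = 0
    for i in range(128 // out_bits):
        pair = (src >> (i * out_bits)) & mask
        low = to_signed(pair, elem_bits)                 # least significant element
        high = to_signed(pair >> elem_bits, elem_bits)   # most significant element
        # The difference always fits in out_bits, so masking only re-encodes
        # a negative value in two's complement.
        result |= ((low - high) & mask) << (i * out_bits)
    return result


# Low word 0x8000 (-32768) minus high word 0x7FFF (32767) gives -65535.
print(hex(phsub(0x7FFF8000, 16)))
```

Because the destination element is twice as wide as the source elements, no difference can overflow, which is consistent with the entries describing neither saturation nor wraparound.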
VPMACSDD
Packed Multiply Accumulate Signed Doubleword to Signed Doubleword

Multiplies each packed 32-bit signed integer value of the first source by the corresponding value of the second source, adds the corresponding value of the third source to the 64-bit signed integer product, and writes four 32-bit sums to the destination.

No saturation is performed on the sum. When the result of the multiplication causes non-zero values to be set in the upper 32 bits of the 64-bit product, they are ignored. When the result of the add overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set). In both cases, only the signed low-order 32 bits of the result are written to the destination.

There are four operands: VPMACSDD dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When the third source designates the same XMM register as the destination, the XMM register behaves as an accumulator.

Instruction Support

Form      Subset  Feature Flag
VPMACSDD  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
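The non-saturating behavior of VPMACSDD described above (upper product bits and the carry are both discarded) can be illustrated with a small scalar model. This is a sketch under the assumption that each operand is supplied as a list of four signed integers, element 0 being the least significant doubleword; the function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vpmacsdd(src1, src2, src3):
    """Per-lane (src1 * src2 + src3), keeping only the low-order 32 bits."""
    dest = []
    for a, b, c in zip(src1, src2, src3):
        total = a * b + c                  # exact product plus addend
        dest.append(to_signed(total, 32))  # truncate, then reinterpret as signed
    return dest


# Lane 0 wraps to -1; lane 3 loses the bit that landed above bit 31.
print(vpmacsdd([0x7FFFFFFF, 2, -3, 0x10000],
               [2, 3, 4, 0x10000],
               [1, 4, 5, 7]))
```

Passing the previous result as `src3` on each call models the accumulator usage mentioned in the operand description.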
Instruction Encoding

                                            Encoding
Mnemonic                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSDD xmm1, xmm2, xmm3/mem128, xmm4      8F   RXB.08          0.src1.0.00  9E /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPMACSDQH
Packed Multiply Accumulate Signed High Doubleword to Signed Quadword

Multiplies the second 32-bit signed integer value of the first source by the corresponding value of the second source, then adds the low-order 64-bit signed integer value of the third source to the 64-bit signed integer product. Simultaneously, multiplies the fourth 32-bit signed integer value of the first source by the fourth 32-bit signed integer value of the second source, then adds the high-order 64-bit signed integer value of the third source to the 64-bit signed integer product. Writes two 64-bit sums to the destination.

No saturation is performed on the sum. When the result of the add overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set).

There are four operands: VPMACSDQH dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When the third source designates the same XMM register as the destination, the XMM register behaves as an accumulator.

Instruction Support

Form       Subset  Feature Flag
VPMACSDQH  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSDQH xmm1, xmm2, xmm3/mem128, xmm4     8F   RXB.08          0.src1.0.00  9F /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPMACSDQL
Packed Multiply Accumulate Signed Low Doubleword to Signed Quadword

Multiplies the low-order 32-bit signed integer value of the first source by the corresponding value of the second source, then adds the low-order 64-bit signed integer value of the third source to the 64-bit signed integer product. Simultaneously, multiplies the third 32-bit signed integer value of the first source by the corresponding value of the second source, then adds the high-order 64-bit signed integer value of the third source to the 64-bit signed integer product. Writes two 64-bit sums to the destination register.

No saturation is performed on the sum. When the result of the add overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set). Only the low-order 64 bits of each result are written to the destination.

There are four operands: VPMACSDQL dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an accumulator.
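VPMACSDQL and VPMACSDQH differ only in which doublewords feed the multiplier, which the following scalar sketch tries to make explicit. It assumes sources are given as lists (four signed 32-bit values for src1 and src2, two signed 64-bit values for src3) with element 0 as the least significant; the function name `vpmacsdq` and the `high` flag are hypothetical conveniences, not part of the instruction set.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vpmacsdq(src1, src2, src3, high=False):
    """Model VPMACSDQL (high=False) or VPMACSDQH (high=True).

    The low form uses doublewords 0 and 2, the high form doublewords 1 and 3.
    Each 64-bit sum wraps; no saturation is applied.
    """
    first = 1 if high else 0
    dest = []
    for lane in range(2):
        i = first + 2 * lane
        total = src1[i] * src2[i] + src3[lane]
        dest.append(to_signed(total, 64))  # keep only the low-order 64 bits
    return dest


print(vpmacsdq([2, 3, 4, 5], [10, 10, 10, 10], [1, 2]))             # low doublewords
print(vpmacsdq([2, 3, 4, 5], [10, 10, 10, 10], [1, 2], high=True))  # high doublewords
```

A 32-bit by 32-bit signed product always fits in 64 bits, so in this model only the final addition can wrap.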
Instruction Support

Form       Subset  Feature Flag
VPMACSDQL  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSDQL xmm1, xmm2, xmm3/mem128, xmm4     8F   RXB.08          0.src1.0.00  97 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPMACSSDD
Packed Multiply Accumulate with Saturation Signed Doubleword to Signed Doubleword

Multiplies each packed 32-bit signed integer value of the first source by the corresponding value of the second source, then adds the corresponding packed 32-bit signed integer value of the third source to each 64-bit signed integer product. Writes four saturated 32-bit sums to the destination.
Out of range results of the addition are saturated to fit into a signed 32-bit integer. For each packed value of the destination, when the value is larger than the largest signed 32-bit integer, it is saturated to 7FFF_FFFFh, and when the value is smaller than the smallest signed 32-bit integer, it is saturated to 8000_0000h.

There are four operands: VPMACSSDD dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an accumulator.

Instruction Support

Form       Subset  Feature Flag
VPMACSSDD  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSDD xmm1, xmm2, xmm3/mem128, xmm4     8F   RXB.08          X.src1.0.00  8E /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPMACSSDQH
Packed Multiply Accumulate with Saturation Signed High Doubleword to Signed Quadword

Multiplies the second 32-bit signed integer value of the first source by the corresponding value of the second source, then adds the low-order 64-bit signed integer value of the third source to the 64-bit signed integer product. Simultaneously, multiplies the fourth 32-bit signed integer value of the first source by the corresponding value of the second source, then adds the high-order 64-bit signed integer value of the third source to the 64-bit signed integer product. Writes two saturated sums to the destination.

Out of range results of the addition are saturated to fit into a signed 64-bit integer. For each packed value of the destination, when the value is larger than the largest signed 64-bit integer, it is saturated to 7FFF_FFFF_FFFF_FFFFh, and when the value is smaller than the smallest signed 64-bit integer, it is saturated to 8000_0000_0000_0000h.

There are four operands: VPMACSSDQH dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination XMM register is written, bits [255:128] of the corresponding YMM register are cleared.
The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an accumulator.

Instruction Support

Form        Subset  Feature Flag
VPMACSSDQH  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSDQH xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.08          0.src1.0.00  8F /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPMACSSDQL
Packed Multiply Accumulate with Saturation Signed Low Doubleword to Signed Quadword

Multiplies the low-order 32-bit signed integer value of the first source by the corresponding value of the second source, then adds the low-order 64-bit signed integer value of the third source to the 64-bit signed integer product. Simultaneously, multiplies the third 32-bit signed integer value of the first source by the third 32-bit signed integer value of the second source, then adds the high-order 64-bit signed integer value of the third source to the 64-bit signed integer product. Writes two saturated sums to the destination.

Out of range results of the addition are saturated to fit into a signed 64-bit integer. For each packed value of the destination, when the value is larger than the largest signed 64-bit integer, it is saturated to 7FFF_FFFF_FFFF_FFFFh, and when the value is smaller than the smallest signed 64-bit integer, it is saturated to 8000_0000_0000_0000h.

There are four operands: VPMACSSDQL dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) register is an XMM register specified by ModRM.reg. When the destination is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an accumulator.

Instruction Support

Form        Subset  Feature Flag
VPMACSSDQL  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
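The saturating quadword forms replace wraparound with clamping to the limits quoted above (7FFF_FFFF_FFFF_FFFFh and 8000_0000_0000_0000h). A minimal scalar sketch of that rule follows, assuming list-based operands with element 0 as the least significant; `saturate` and `vpmacssdq` are hypothetical names.

```python
def saturate(value, bits):
    """Clamp value to the range of a signed integer of the given width."""
    largest = (1 << (bits - 1)) - 1
    smallest = -(1 << (bits - 1))
    return max(smallest, min(largest, value))


def vpmacssdq(src1, src2, src3, high=False):
    """Model VPMACSSDQL (high=False) or VPMACSSDQH (high=True).

    src1 and src2 hold four signed 32-bit values, src3 two signed 64-bit
    values. Each sum is saturated to the signed 64-bit range.
    """
    first = 1 if high else 0
    dest = []
    for lane in range(2):
        i = first + 2 * lane
        dest.append(saturate(src1[i] * src2[i] + src3[lane], 64))
    return dest


# Lane 0 overflows upward and clamps to 7FFF_FFFF_FFFF_FFFFh;
# lane 1 overflows downward and clamps to 8000_0000_0000_0000h.
result = vpmacssdq([0x7FFFFFFF, 0, -0x80000000, 0],
                   [0x7FFFFFFF, 0, 0x7FFFFFFF, 0],
                   [2**63 - 1, -(2**63)])
print([hex(v & (2**64 - 1)) for v in result])
```

Python integers are unbounded, so the exact sum is available before clamping; a fixed-width implementation would need a wider intermediate or an explicit overflow test.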
Instruction Encoding

                                            Encoding
Mnemonic                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSDQL xmm1, xmm2, xmm3/mem128, xmm4    8F   RXB.08          0.src1.0.00  87 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPMACSSWD
Packed Multiply Accumulate with Saturation Signed Word to Signed Doubleword

Multiplies the odd-numbered packed 16-bit signed integer values of the first source by the corresponding values of the second source, then adds the corresponding packed 32-bit signed integer values of the third source to the 32-bit signed integer products. Writes four saturated sums to the destination.

Out of range results of the addition are saturated to fit into a signed 32-bit integer.
For each packed value of the destination, when the value is larger than the largest signed 32-bit integer, it is saturated to 7FFF_FFFFh, and when the value is smaller than the smallest signed 32-bit integer, it is saturated to 8000_0000h.

There are four operands: VPMACSSWD dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination XMM register is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by the XOP.vvvv field; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an accumulator.

Instruction Support

Form       Subset  Feature Flag
VPMACSSWD  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSWD xmm1, xmm2, xmm3/mem128, xmm4     8F   RXB.08          0.src1.0.00  86 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPMACSSWW
Packed Multiply Accumulate with Saturation Signed Word to Signed Word

Multiplies each packed 16-bit signed integer value of the first source by the corresponding packed 16-bit signed integer value of the second source, then adds the corresponding packed 16-bit signed integer value of the third source to the 32-bit signed integer products. Writes eight saturated sums to the destination.

Out of range results of the addition are saturated to fit into a signed 16-bit integer. For each packed value of the destination, when the value is larger than the largest signed 16-bit integer, it is saturated to 7FFFh, and when the value is smaller than the smallest signed 16-bit integer, it is saturated to 8000h.

There are four operands: VPMACSSWW dest, src1, src2, src3
dest = src1 * src2 + src3

The destination is an XMM register specified by ModRM.reg. When the destination is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte.

When src3 and dest designate the same XMM register, this register behaves as an accumulator.

Instruction Support

Form       Subset  Feature Flag
VPMACSSWW  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
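To show how the eight word lanes of VPMACSSWW sit inside a 128-bit register, the sketch below works directly on packed values represented as Python integers, with lane 0 taken to be bits [15:0]. The helper names `unpack_words`, `pack_words`, and `vpmacssww` are hypothetical; the clamping limits are the 7FFFh and 8000h values given above.

```python
def unpack_words(value):
    """Split a 128-bit integer into eight signed 16-bit lanes (lane 0 = bits [15:0])."""
    lanes = []
    for i in range(8):
        word = (value >> (16 * i)) & 0xFFFF
        lanes.append(word - 0x10000 if word & 0x8000 else word)
    return lanes


def pack_words(lanes):
    """Pack eight signed 16-bit lanes back into a 128-bit integer."""
    value = 0
    for i, word in enumerate(lanes):
        value |= (word & 0xFFFF) << (16 * i)
    return value


def vpmacssww(src1, src2, src3):
    """Per-lane src1 * src2 + src3, saturated to the signed 16-bit range."""
    dest = []
    for a, b, c in zip(unpack_words(src1), unpack_words(src2), unpack_words(src3)):
        total = a * b + c
        dest.append(max(-0x8000, min(0x7FFF, total)))
    return pack_words(dest)


# Lane 0: 32767 * 2 saturates to 7FFFh. Lane 1: -32768 * 2 saturates to 8000h.
# Lane 2: 3 * 4 + 5 = 17 (0011h) is in range and passes through unchanged.
print(hex(vpmacssww(0x000380007FFF, 0x000400020002, 0x000500000000)))
```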
Instruction Encoding

                                            Encoding
Mnemonic                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSSWW xmm1, xmm2, xmm3/mem128, xmm4     8F   RXB.08          X.src1.0.00  85 /r ib

Related Instructions
VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPMACSWD
Packed Multiply Accumulate Signed Word to Signed Doubleword

Multiplies each odd-numbered packed 16-bit signed integer value of the first source by the corresponding value of the second source, then adds the corresponding packed 32-bit signed integer value of the third source to the 32-bit signed integer products. Writes four 32-bit results to the destination.

When the result of the add overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set). Only the low-order 32 bits of the result are written to the destination.
There are four operands: VPMACSWD dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) register is an XMM register specified by ModRM.reg. When the destination XMM register is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an accumulator.

Instruction Support

Form      Subset  Feature Flag
VPMACSWD  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                                    XOP  RXB.map_select  W.vvvv.L.pp  Opcode
VPMACSWD xmm1, xmm2, xmm3/mem128, xmm4      8F   RXB.08          0.src1.0.00  96 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions

                                  Mode
Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                      X    Instruction not supported, as indicated by CPUID feature identifier.
                             X     X          XOP instructions are only recognized in protected mode.
                                         X    CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                         X    XFEATURE_ENABLED_MASK[2:1] != 11b.
                                         X    XOP.W = 1.
                                         X    XOP.L = 1.
                                         X    REX, F2, F3, or 66 prefix preceding XOP prefix.
                                         X    Lock prefix (F0h) preceding opcode.
Device not available, #NM                X    CR0.TS = 1.
Stack, #SS                               X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                  X    Memory address exceeding data segment limit or non-canonical.
                                         X    Null data segment used to reference memory.
Page fault, #PF                          X    Instruction execution caused a page fault.
Alignment check, #AC                     X    Memory operand not 16-byte aligned when alignment checking enabled.

X — XOP exception

VPMACSWW
Packed Multiply Accumulate Signed Word to Signed Word

Multiplies each packed 16-bit signed integer value of the first source by the corresponding value of the second source, then adds the corresponding packed 16-bit signed integer value of the third source to each 32-bit signed integer product. Writes eight 16-bit results to the destination.

No saturation is performed on the sum. When the result of the multiplication causes non-zero values to be set in the upper 16 bits of the 32-bit result, they are ignored. When the result of the add overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set). In both cases, only the signed low-order 16 bits of the result are written to the destination.

There are four operands: VPMACSWW dest, src1, src2, src3
dest = src1 * src2 + src3

The destination (dest) is an XMM register specified by ModRM.reg. When the destination XMM register is written, bits [255:128] of the corresponding YMM register are cleared.

The first source (src1) is an XMM register specified by XOP.vvvv; the second source (src2) is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source (src3) is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an accumulator.

Instruction Support

Form      Subset  Feature Flag
VPMACSWW  XOP     CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
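The Instruction Support flag and the first #UD causes listed in the Exceptions tables suggest three conditions that software would check before issuing any XOP instruction. The Python sketch below only decodes register values that have already been read by other means; it does not execute CPUID or XGETBV itself. The function and parameter names are hypothetical, and the OSXSAVE bit position (bit 27 of CPUID Fn0000_0001_ECX) is taken from the general CPUID documentation rather than from this section.

```python
XOP_BIT = 11       # CPUID Fn8000_0001_ECX[XOP], as stated in Instruction Support
OSXSAVE_BIT = 27   # CPUID Fn0000_0001_ECX[OSXSAVE]; assumption from CPUID docs


def xop_usable(ecx_8000_0001, ecx_0000_0001, xfeature_enabled_mask):
    """Return True when none of the support-related #UD causes applies.

    ecx_8000_0001: ECX returned by CPUID function 8000_0001h
    ecx_0000_0001: ECX returned by CPUID function 0000_0001h
    xfeature_enabled_mask: value of XFEATURE_ENABLED_MASK (XCR0)
    """
    has_xop = (ecx_8000_0001 >> XOP_BIT) & 1
    os_xsave = (ecx_0000_0001 >> OSXSAVE_BIT) & 1   # mirrors CR4.OSXSAVE
    # Bits [2:1] must both be set (11b) for XMM and YMM state to be enabled.
    state_enabled = ((xfeature_enabled_mask >> 1) & 0b11) == 0b11
    return bool(has_xop and os_xsave and state_enabled)


print(xop_usable(1 << 11, 1 << 27, 0b111))  # all three conditions met
print(xop_usable(1 << 11, 1 << 27, 0b011))  # bit 2 of the mask is clear
```

The remaining #UD causes in the tables (XOP.W, XOP.L, and prefix restrictions) concern how an individual instruction is encoded and are outside the scope of this check.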
Instruction Encoding

                                          Encoding
Mnemonic                                  XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMACSWW xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   95 /r ib

Related Instructions
VPMACSSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
X: XOP exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

VPMADCSSWD Packed Multiply Add Accumulate with Saturation Signed Word to Signed Doubleword

Multiplies each packed 16-bit signed integer value of the first source by the corresponding value of the second source, then adds the 32-bit signed integer products of the even-odd adjacent words. Each resulting sum is then added to the corresponding packed 32-bit signed integer value of the third source. Writes four 32-bit signed-integer results to the destination.

Out of range results of the addition are saturated to fit into a signed 32-bit integer.
For each packed value of the destination, when the value is larger than the largest signed 32-bit integer, it is saturated to 7FFF_FFFFh, and when the value is smaller than the smallest signed 32-bit integer, it is saturated to 8000_0000h.

There are four operands: VPMADCSSWD dest, src1, src2, src3

dest = src1 * src2 + src3

The destination is an XMM register specified by ModRM.reg. When the destination is written, bits [255:128] of the corresponding YMM register are cleared.

The first source is an XMM register specified by XOP.vvvv; the second source is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an accumulator.

Instruction Support

Form          Subset   Feature Flag
VPMADCSSWD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                            Encoding
Mnemonic                                    XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMADCSSWD xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   A6 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
X: XOP exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

VPMADCSWD Packed Multiply Add Accumulate Signed Word to Signed Doubleword

Multiplies each packed 16-bit signed integer value of the first source by the corresponding value of the second source, then adds the 32-bit signed integer products of the even-odd adjacent words together and adds the sums to the corresponding packed 32-bit signed integer values of the third source. Writes four 32-bit sums to the destination.

No saturation is performed on the sum. When the result of the addition overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set). Only the signed low-order 32 bits of the result are written to the destination.

There are four operands: VPMADCSWD dest, src1, src2, src3

dest = src1 * src2 + src3

The destination is an XMM register specified by ModRM.reg. When the destination is written, bits [255:128] of the corresponding YMM register are cleared.

The first source is an XMM register specified by XOP.vvvv; the second source is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field; and the third source is an XMM register specified by bits [7:4] of an immediate byte operand.

When src3 designates the same XMM register as the dest register, the XMM register behaves as an accumulator.

Instruction Support

Form         Subset   Feature Flag
VPMADCSWD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
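The difference between the wrapping form (VPMADCSWD) and the saturating form (VPMADCSSWD) can be sketched in Python as follows. The function name `vpmadcswd`, the `saturate` flag, and the list representation are assumptions made for this illustration. The sketch computes each lane at full precision and then wraps or saturates once at the end; how hardware treats an out-of-range intermediate pairwise sum in the saturating form is not spelled out in the text above, so that corner case is an assumption here.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def wrap32(value):
    """Keep the low 32 bits of value and reinterpret them as signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpmadcswd(src1, src2, src3, saturate=False):
    """Model VPMADCSWD (saturate=False) or VPMADCSSWD (saturate=True).

    src1 and src2 hold eight signed words; src3 holds four signed doublewords.
    """
    result = []
    for i in range(4):
        # Products of the even and odd words of each pair, plus the accumulator.
        total = (src1[2 * i] * src2[2 * i]
                 + src1[2 * i + 1] * src2[2 * i + 1]
                 + src3[i])
        if saturate:
            total = max(INT32_MIN, min(INT32_MAX, total))
        else:
            total = wrap32(total)
        result.append(total)
    return result


words = [32767, 32767, 0, 0, 0, 0, 0, 0]
acc = [INT32_MAX, 0, 0, 0]
wrapping = vpmadcswd(words, words, acc)
saturating = vpmadcswd(words, words, acc, saturate=True)
```

In the first lane the exact sum exceeds 7FFF_FFFFh, so the two forms return different values for it while agreeing on the other lanes.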
Instruction Encoding

                                           Encoding
Mnemonic                                   XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPMADCSWD xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   B6 /r ib

Related Instructions
VPMACSSWW, VPMACSWW, VPMACSSWD, VPMACSWD, VPMACSSDD, VPMACSDD, VPMACSSDQL, VPMACSSDQH, VPMACSDQL, VPMACSDQH, VPMADCSSWD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
X: XOP exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.W = 1.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

VPMASKMOVD Masked Move Packed Doubleword

Moves packed doublewords from a second source operand to a destination, as specified by mask bits in a first source operand. There are load and store versions of the instruction.

The mask bits are the most-significant bit of each doubleword in the first source operand (mask).
• For loads, when a mask bit = 1, the corresponding doubleword is copied from the source to the same element of the destination; when a mask bit = 0, the corresponding element of the destination is cleared.
• For stores, when a mask bit = 1, the corresponding doubleword is copied from the source to the same element of the destination; when a mask bit = 0, the corresponding element of the destination is not affected.

Exception and trap behavior for elements not selected for loading or storing from/to memory is implementation dependent. For instance, a given implementation may signal a data breakpoint or a page fault for doublewords that are zero-masked and not actually written.

This instruction provides no non-temporal access hint.

This instruction has both 128-bit and 256-bit forms:

XMM Encoding
There are load and store encodings.
• For loads, the four doublewords that make up the source operand are located in a 128-bit memory location, the mask operand is an XMM register, and the destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
• For stores, the four doublewords that make up the source operand are located in an XMM register, the mask operand is an XMM register, and the destination is a 128-bit memory location.

YMM Encoding
There are load and store encodings.
• For loads, the eight doublewords that make up the source operand are located in a 256-bit memory location, the mask operand is a YMM register, and the destination is a YMM register.
• For stores, the eight doublewords that make up the source operand are located in a YMM register, the mask operand is a YMM register, and the destination is a 256-bit memory location.

Instruction Support

Form          Subset   Feature Flag
VPMASKMOVD    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
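The load and store rules for VPMASKMOVD described above differ only in what happens to unselected elements: loads clear them, stores leave memory untouched. A minimal Python sketch of that distinction follows; `mask_bit`, `masked_load`, and `masked_store` are hypothetical helper names, and memory is modeled as a plain list of unsigned element values rather than real addresses, so faults and breakpoints are out of scope.

```python
def mask_bit(element, bits):
    """Return True when the most-significant bit of a mask element is set."""
    return bool((element >> (bits - 1)) & 1)


def masked_load(memory, mask, bits=32):
    """Load form: selected elements are copied, unselected elements are cleared."""
    return [m if mask_bit(k, bits) else 0 for m, k in zip(memory, mask)]


def masked_store(memory, source, mask, bits=32):
    """Store form: selected elements are written, unselected elements keep their old value."""
    return [s if mask_bit(k, bits) else m
            for m, s, k in zip(memory, source, mask)]


mask = [0x80000000, 0x00000000, 0xFFFFFFFF, 0x7FFFFFFF]
loaded = masked_load([1, 2, 3, 4], mask)
stored = masked_store([1, 2, 3, 4], [10, 20, 30, 40], mask)
```

Only bit 31 of each mask doubleword matters, which is why 7FFF_FFFFh selects nothing. Passing `bits=64` with two or four elements gives the same sketch for the quadword form.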
Instruction Encoding

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Loads:
VPMASKMOVD xmm1, xmm2, mem128     C4    RXB.02           0.src1.0.01   8C /r
VPMASKMOVD ymm1, ymm2, mem256     C4    RXB.02           0.src1.1.01   8C /r
Stores:
VPMASKMOVD mem128, xmm1, xmm2     C4    RXB.02           0.src1.0.01   8E /r
VPMASKMOVD mem256, ymm1, ymm2     C4    RXB.02           0.src1.1.01   8E /r

Related Instructions
VPMASKMOVQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX2 exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A     Instruction execution caused a page fault.

VPMASKMOVQ Masked Move Packed Quadword

Moves packed quadwords from a second source operand to a destination, as specified by mask bits in a first source operand. There are load and store versions of the instruction.

The mask bits are the most-significant bit of each quadword in the first source operand (mask).
• For loads, when a mask bit = 1, the corresponding quadword is copied from the source to the same element of the destination; when a mask bit = 0, the corresponding element of the destination is cleared.
• For stores, when a mask bit = 1, the corresponding quadword is copied from the source to the same element of the destination; when a mask bit = 0, the corresponding element of the destination is not affected.

Exception and trap behavior for elements not selected for loading or storing from/to memory is implementation dependent. For instance, a given implementation may signal a data breakpoint or a page fault for quadwords that are zero-masked and not actually written.

This instruction provides no non-temporal access hint.

This instruction has both 128-bit and 256-bit forms:

XMM Encoding
There are load and store encodings.
• For loads, the two quadwords that make up the source operand are located in a 128-bit memory location, the mask operand is an XMM register, and the destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
• For stores, the two quadwords that make up the source operand are located in an XMM register, the mask operand is an XMM register, and the destination is a 128-bit memory location.

YMM Encoding
There are load and store encodings.
• For loads, the four quadwords that make up the source operand are located in a 256-bit memory location, the mask operand is a YMM register, and the destination is a YMM register.
• For stores, the four quadwords that make up the source operand are located in a YMM register, the mask operand is a YMM register, and the destination is a 256-bit memory location.

Instruction Support

Form          Subset   Feature Flag
VPMASKMOVQ    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
Instruction Encoding

                                  Encoding
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
Loads:
VPMASKMOVQ xmm1, xmm2, mem128     C4    RXB.02           1.src1.0.01   8C /r
VPMASKMOVQ ymm1, ymm2, mem256     C4    RXB.02           1.src1.1.01   8C /r
Stores:
VPMASKMOVQ mem128, xmm1, xmm2     C4    RXB.02           1.src1.0.01   8E /r
VPMASKMOVQ mem256, ymm1, ymm2     C4    RXB.02           1.src1.1.01   8E /r

Related Instructions
VPMASKMOVD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
A: AVX2 exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     A     Instruction not supported, as indicated by CPUID feature identifier.
                            A     A           AVX instructions are only recognized in protected mode.
                                        A     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        A     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        A     REX, F2, F3, or 66 prefix preceding VEX prefix.
                                        A     Lock prefix (F0h) preceding opcode.
Device not available, #NM               A     CR0.TS = 1.
Stack, #SS                              A     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 A     Memory address exceeding data segment limit or non-canonical.
                                        A     Null data segment used to reference memory.
Alignment check, #AC                    A     Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF                         A     Instruction execution caused a page fault.

VPPERM Packed Permute Bytes

Selects 16 of 32 packed bytes from two concatenated sources, applies a logical transformation to each selected byte, then writes the byte to a specified position in the destination.

There are four operands: VPPERM dest, src1, src2, src3

The second (src2) and first (src1) sources are concatenated to form the 32-byte source. The src1 operand is an XMM register specified by XOP.vvvv.

The third source (src3) contains 16 control bytes. Each control byte specifies the source byte and the logical operation to perform on that byte.
The order of the bytes in the destination is the same as that of the control bytes in the src3.

For each byte of the 16-byte result, the corresponding src3 byte is used as follows:

• Bits [7:5] select a logical operation to perform on the selected byte.

Bit Value  Selected Operation
000        Source byte (no logical operation)
001        Invert source byte
010        Bit reverse of source byte
011        Bit reverse of inverted source byte
100        00h (zero-fill)
101        FFh (ones-fill)
110        Most significant bit of source byte replicated in all bit positions.
111        Invert most significant bit of source byte and replicate in all bit positions.

• Bits [4:0] select a source byte to move from src2:src1.

Bit Value  Source Byte    Bit Value  Source Byte     Bit Value  Source Byte    Bit Value  Source Byte
00000      src1[7:0]      01000      src1[71:64]     10000      src2[7:0]      11000      src2[71:64]
00001      src1[15:8]     01001      src1[79:72]     10001      src2[15:8]     11001      src2[79:72]
00010      src1[23:16]    01010      src1[87:80]     10010      src2[23:16]    11010      src2[87:80]
00011      src1[31:24]    01011      src1[95:88]     10011      src2[31:24]    11011      src2[95:88]
00100      src1[39:32]    01100      src1[103:96]    10100      src2[39:32]    11100      src2[103:96]
00101      src1[47:40]    01101      src1[111:104]   10101      src2[47:40]    11101      src2[111:104]
00110      src1[55:48]    01110      src1[119:112]   10110      src2[55:48]    11110      src2[119:112]
00111      src1[63:56]    01111      src1[127:120]   10111      src2[63:56]    11111      src2[127:120]

XOP.W and an immediate byte (imm8) determine register configuration.
• When XOP.W = 0, src2 is either an XMM register or a 128-bit memory location specified by ModRM.r/m and src3 is an XMM register specified by imm8[7:4].
• When XOP.W = 1, src2 is an XMM register specified by imm8[7:4] and src3 is either an XMM register or a 128-bit memory location specified by ModRM.r/m.

The destination (dest) is an XMM register specified by ModRM.reg. When the result is written to the dest XMM register, bits [255:128] of the corresponding YMM register are cleared.
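The control-byte decoding for VPPERM described above can be sketched in Python as follows. The function names `bit_reverse` and `vpperm` and the list-of-bytes representation (index 0 standing for bits [7:0]) are assumptions made for this illustration; the sketch models only the data transformation.

```python
def bit_reverse(byte):
    """Reverse the order of the eight bits in a byte."""
    return int(f"{byte:08b}"[::-1], 2)


def vpperm(src1, src2, src3):
    """Model VPPERM on three lists of 16 unsigned bytes."""
    pool = list(src1) + list(src2)      # indexes 0-15 are src1, 16-31 are src2
    result = []
    for control in src3:
        byte = pool[control & 0x1F]     # bits [4:0]: source byte selector
        operation = control >> 5        # bits [7:5]: logical operation
        msb = byte >> 7
        if operation == 0:
            out = byte                          # source byte unchanged
        elif operation == 1:
            out = ~byte & 0xFF                  # inverted
        elif operation == 2:
            out = bit_reverse(byte)             # bit reversed
        elif operation == 3:
            out = bit_reverse(~byte & 0xFF)     # bit reverse of inverted byte
        elif operation == 4:
            out = 0x00                          # zero-fill
        elif operation == 5:
            out = 0xFF                          # ones-fill
        elif operation == 6:
            out = 0xFF if msb else 0x00         # replicate MSB
        else:
            out = 0x00 if msb else 0xFF         # replicate inverted MSB
        result.append(out)
    return result


src1 = list(range(16))          # bytes 00h through 0Fh
src2 = list(range(16, 32))      # bytes 10h through 1Fh
controls = [0x00, 0x1F, 0x21, 0x42, 0x63, 0x80, 0xA0, 0xC0, 0xE0] + [0x00] * 7
permuted = vpperm(src1, src2, controls)
```

For example, control byte 42h selects src1[23:16] (value 02h) and bit-reverses it to 40h.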
Instruction Support

Form      Subset   Feature Flag
VPPERM    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                        Encoding
Mnemonic                                XOP   RXB.map_select   W.vvvv.L.pp   Opcode
VPPERM xmm1, xmm2, xmm3/mem128, xmm4    8F    RXB.08           0.src1.0.00   A3 /r ib
VPPERM xmm1, xmm2, xmm3, xmm4/mem128    8F    RXB.08           1.src1.0.00   A3 /r ib

Related Instructions
VPSHUFHW, VPSHUFD, VPSHUFLW, VPSHUFW, VPERMIL2PS, VPERMIL2PD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
X: XOP exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.L = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

VPROTB Packed Rotate Bytes

Rotates each byte of the source as specified by a count operand and writes the result to the corresponding byte of the destination.

There are two versions of the instruction, one for each source of the count byte:
• VPROTB dest, src, fixed-count
• VPROTB dest, src, variable-count

For both versions of the instruction, the destination (dest) operand is an XMM register specified by ModRM.reg.
The fixed-count version of the instruction rotates each byte of the source (src) the number of bits specified by the immediate fixed-count byte. All bytes are rotated the same amount. The source XMM register or memory location is selected by the ModRM.r/m field.

The variable-count version of the instruction rotates each byte of the source the amount specified in the corresponding byte element of the variable-count. Both src and variable-count are configured by XOP.W.
• When XOP.W = 0, variable-count is an XMM register specified by XOP.vvvv and src is either an XMM register or a 128-bit memory location specified by ModRM.r/m.
• When XOP.W = 1, variable-count is either an XMM register or a 128-bit memory location specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

When the count value is positive, bits are rotated to the left (toward the more significant bit positions). The bits rotated out to the left of the most significant bit are rotated back in at the right end (least-significant bit) of the byte.

When the count value is negative, bits are rotated to the right (toward the least significant bit positions). The bits rotated to the right out of the least significant bit are rotated back in at the left end (most-significant bit) of the byte.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset   Feature Flag
VPROTB    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                  Encoding
Mnemonic                          XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPROTB xmm1, xmm2/mem128, xmm3    8F    RXB.09           0.count.0.00   90 /r
VPROTB xmm1, xmm2, xmm3/mem128    8F    RXB.09           1.src.0.00     90 /r
VPROTB xmm1, xmm2/mem128, imm8    8F    RXB.08           0.1111.0.00    C0 /r ib
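The signed-count rotation described for VPROTB (positive counts rotate left, negative counts rotate right) can be sketched in Python as shown below. The names `rotate_byte` and `vprotb` are hypothetical. The sketch reduces the count modulo 8, which follows from the nature of rotation; treating the count byte as an 8-bit two's-complement value is an assumption based on the positive/negative wording above.

```python
def rotate_byte(value, count):
    """Rotate an 8-bit value left by count; a negative count rotates right."""
    count %= 8                  # Python's % is non-negative, so -1 becomes 7
    value &= 0xFF
    return ((value << count) | (value >> (8 - count))) & 0xFF


def vprotb(src, counts):
    """Model the variable-count form: one signed count byte per source byte."""
    result = []
    for value, count in zip(src, counts):
        count &= 0xFF
        if count & 0x80:        # reinterpret the count byte as signed
            count -= 0x100
        result.append(rotate_byte(value, count))
    return result


left = rotate_byte(0x81, 1)                 # 1000_0001b -> 0000_0011b
right = rotate_byte(0x81, -1)               # 1000_0001b -> 1100_0000b
mixed = vprotb([0x81, 0x81, 0x0F], [0x01, 0xFF, 0x04])
```

A rotate right by n is the same as a rotate left by 8 - n, which is why a single left-rotate helper is enough.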
Related Instructions
VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
X: XOP exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

VPROTD Packed Rotate Doublewords

Rotates each doubleword of the source as specified by a count operand and writes the result to the corresponding doubleword of the destination.

There are two versions of the instruction, one for each source of the count byte:
• VPROTD dest, src, fixed-count
• VPROTD dest, src, variable-count

For both versions of the instruction, the dest operand is an XMM register specified by ModRM.reg.

The fixed count version of the instruction rotates each doubleword of the source operand the number of bits specified by the immediate fixed-count byte operand. All doublewords are rotated the same amount. The src XMM register or memory location is selected by the ModRM.r/m field.
The variable count version of the instruction rotates each doubleword of the source by the amount specified in the low order byte of the corresponding doubleword of the variable-count operand vector. Both src and variable-count are configured by XOP.W.
• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field and variable-count is an XMM register specified by XOP.vvvv.
• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an XMM register or a 128-bit memory location specified by the ModRM.r/m field.

When the count value is positive, bits are rotated to the left (toward the more significant bit positions). The bits rotated out to the left of the most significant bit of each source doubleword operand are rotated back in at the right end (least-significant bit) of the doubleword.

When the count value is negative, bits are rotated to the right (toward the least significant bit positions). The bits rotated to the right out of the least significant bit of each source doubleword operand are rotated back in at the left end (most-significant bit) of the doubleword.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset   Feature Flag
VPROTD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                  Encoding
Mnemonic                          XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPROTD xmm1, xmm2/mem128, xmm3    8F    RXB.09           0.count.0.00   92 /r
VPROTD xmm1, xmm2, xmm3/mem128    8F    RXB.09           1.src.0.00     92 /r
VPROTD xmm1, xmm2/mem128, imm8    8F    RXB.08           0.1111.0.00    C2 /r ib
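For the variable-count form of VPROTD, only the low-order byte of each count doubleword is used. The Python sketch below illustrates this on 128-bit values held in ordinary integers; `rotl32`, `signed_byte`, and `vprotd` are hypothetical names, and interpreting the count byte as an 8-bit two's-complement value is an assumption drawn from the positive/negative wording above.

```python
def rotl32(value, count):
    """Rotate a 32-bit value left by count (a negative count rotates right)."""
    count %= 32
    value &= 0xFFFFFFFF
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def signed_byte(value):
    """Take the low-order byte of value and reinterpret it as signed."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def vprotd(src, count):
    """Model variable-count VPROTD on two 128-bit integers."""
    result = 0
    for lane in range(4):
        shift = 32 * lane
        element = (src >> shift) & 0xFFFFFFFF
        amount = signed_byte(count >> shift)   # upper count bytes are ignored
        result |= rotl32(element, amount) << shift
    return result


src = 0x00000001_80000000_12345678_80000001
count = 0x000000FF_FFFFFF04_000000FC_00000001
rotated = vprotd(src, count)
```

Reading the lanes from least to most significant: a left rotate by 1, a right rotate by 4 (count FCh), a left rotate by 4 (the upper count bytes FFFFFFh are ignored), and a right rotate by 1.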
Related Instructions
VPROTB, VPROTW, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
X: XOP exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

VPROTQ Packed Rotate Quadwords

Rotates each quadword of the source operand as specified by a count operand and writes the result to the corresponding quadword of the destination.

There are two versions of the instruction, one for each source of the count byte:
• VPROTQ dest, src, fixed-count
• VPROTQ dest, src, variable-count

For both versions of the instruction, the dest operand is an XMM register specified by ModRM.reg.

The fixed count version of the instruction rotates each quadword in the source the number of bits specified by the immediate fixed-count byte operand. All quadword elements of the source are rotated the same amount. The src XMM register or memory location is selected by the ModRM.r/m field.
The variable count version of the instruction rotates each quadword of the source the amount specified by the low order byte of the corresponding quadword of the variable-count operand. Both src and variable-count are configured by XOP.W.
• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by ModRM.r/m and variable-count is an XMM register specified by XOP.vvvv.
• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an XMM register or a 128-bit memory location specified by ModRM.r/m.

When the count value is positive, bits are rotated to the left (toward the more significant bit positions) of the operand element. The bits rotated out to the left of the most significant bit of the quadword element are rotated back in at the right end (least-significant bit).

When the count value is negative, operand element bits are rotated to the right (toward the least significant bit positions). The bits rotated to the right out of the least significant bit are rotated back in at the left end (most-significant bit) of the quadword element.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset   Feature Flag
VPROTQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                  Encoding
Mnemonic                          XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPROTQ xmm1, xmm2/mem128, xmm3    8F    RXB.09           0.count.0.00   93 /r
VPROTQ xmm1, xmm2, xmm3/mem128    8F    RXB.09           1.src.0.00     93 /r
VPROTQ xmm1, xmm2/mem128, imm8    8F    RXB.08           0.1111.0.00    C3 /r ib
Related Instructions
VPROTB, VPROTW, VPROTD, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
X: XOP exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

VPROTW Packed Rotate Words

Rotates each word of the source as specified by a count operand and writes the result to the corresponding word of the destination.

There are two versions of the instruction, one for each source of the count byte:
• VPROTW dest, src, fixed-count
• VPROTW dest, src, variable-count

For both versions of the instruction, the dest operand is an XMM register specified by ModRM.reg.

The fixed count version of the instruction rotates each word of the source the number of bits specified by the immediate fixed-count byte operand. All words of the source operand are rotated the same amount. The src XMM register or memory location is selected by the ModRM.r/m field.
The variable count version of this instruction rotates each word of the source operand by the amount specified in the low order byte of the corresponding word of the variable-count operand. Both src and variable-count are configured by XOP.W.
• When XOP.W = 0, src is either an XMM register or a 128-bit memory location specified by ModRM.r/m and variable-count is an XMM register specified by XOP.vvvv.
• When XOP.W = 1, src is an XMM register specified by XOP.vvvv and variable-count is either an XMM register or a 128-bit memory location specified by ModRM.r/m.

When the count value is positive, bits are rotated to the left (toward the more significant bit positions). The bits rotated out to the left of the most significant bit of an element are rotated back in at the right end (least-significant bit) of the word element.

When the count value is negative, bits are rotated to the right (toward the least significant bit positions) of the element. The bits rotated to the right out of the least significant bit of an element are rotated back in at the left end (most-significant bit) of the word element.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support

Form      Subset   Feature Flag
VPROTW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding

                                  Encoding
Mnemonic                          XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPROTW xmm1, xmm2/mem128, xmm3    8F    RXB.09           0.count.0.00   91 /r
VPROTW xmm1, xmm2, xmm3/mem128    8F    RXB.09           1.src.0.00     91 /r
VPROTW xmm1, xmm2/mem128, imm8    8F    RXB.08           0.1111.0.00    C1 /r ib
Related Instructions
VPROTB, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
X: XOP exception

Exception                   Real  Virt  Prot  Cause of Exception
Invalid opcode, #UD                     X     Instruction not supported, as indicated by CPUID feature identifier.
                            X     X           XOP instructions are only recognized in protected mode.
                                        X     CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                                        X     XFEATURE_ENABLED_MASK[2:1] != 11b.
                                        X     XOP.vvvv != 1111b (for immediate operand variant only).
                                        X     XOP.L field = 1.
                                        X     REX, F2, F3, or 66 prefix preceding XOP prefix.
                                        X     Lock prefix (F0h) preceding opcode.
Device not available, #NM               X     CR0.TS = 1.
Stack, #SS                              X     Memory address exceeding stack segment limit or non-canonical.
General protection, #GP                 X     Memory address exceeding data segment limit or non-canonical.
                                        X     Null data segment used to reference memory.
Page fault, #PF                         X     Instruction execution caused a page fault.
Alignment check, #AC                    X     Memory operand not 16-byte aligned when alignment checking enabled.

VPSHAB Packed Shift Arithmetic Bytes

Shifts each signed byte of the source as specified by a count byte and writes the result to the corresponding byte of the destination.

The count bytes are 8-bit signed two's-complement values in the corresponding bytes of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). Zeros are shifted in at the right end (least-significant bit) of the byte.

When the count value is negative, bits are shifted to the right (toward the least significant bit positions). The most significant bit (sign bit) is replicated and shifted in at the left end (most-significant bit) of the byte.

There are three operands: VPSHAB dest, src, count

The destination (dest) is an XMM register specified by ModRM.reg.
Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM register or a 128-bit memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a 128-bit memory location specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form      Subset   Feature Flag
VPSHAB    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHAB xmm1, xmm2/mem128, xmm3      8F    RXB.09           0.count.0.00   98 /r
VPSHAB xmm1, xmm2, xmm3/mem128      8F    RXB.09           1.src.0.00     98 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAW, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPSHAD    Packed Shift Arithmetic Doublewords

Shifts each signed doubleword of the source operand as specified by a count byte and writes the result to the corresponding doubleword of the destination. The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding doubleword of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). Zeros are shifted in at the right end (least-significant bit) of the doubleword.

When the count value is negative, bits are shifted to the right (toward the least significant bit positions). The most significant bit (sign bit) is replicated and shifted in at the left end (most-significant bit) of the doubleword.

There are three operands: VPSHAD dest, src, count

The destination (dest) is an XMM register specified by ModRM.reg. Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form      Subset   Feature Flag
VPSHAD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHAD xmm1, xmm2/mem128, xmm3      8F    RXB.09           0.count.0.00   9A /r
VPSHAD xmm1, xmm2, xmm3/mem128      8F    RXB.09           1.src.0.00     9A /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPSHAQ    Packed Shift Arithmetic Quadwords

Shifts each signed quadword of the source as specified by a count byte and writes the result to the corresponding quadword of the destination. The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding quadword element of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). Zeros are shifted in at the right end (least-significant bit) of the quadword.

When the count value is negative, bits are shifted to the right (toward the least significant bit positions). The most significant bit is replicated and shifted in at the left end (most-significant bit) of the quadword.

The shift amount is stored in two’s-complement form. The count is modulo 64.

There are three operands: VPSHAQ dest, src, count

The destination (dest) is an XMM register specified by ModRM.reg. Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form      Subset   Feature Flag
VPSHAQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHAQ xmm1, xmm2/mem128, xmm3      8F    RXB.09           0.count.0.00   9B /r
VPSHAQ xmm1, xmm2, xmm3/mem128      8F    RXB.09           1.src.0.00     9B /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPSHAW    Packed Shift Arithmetic Words

Shifts each signed word of the source as specified by a count byte and writes the result to the corresponding word of the destination. The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding word of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). Zeros are shifted in at the right end (least-significant bit) of the word.

When the count value is negative, bits are shifted to the right (toward the least significant bit positions). The most significant bit (sign bit) is replicated and shifted in at the left end (most-significant bit) of the word.

The shift amount is stored in two’s-complement form. The count is modulo 16.

There are three operands: VPSHAW dest, src, count

The destination (dest) is an XMM register specified by ModRM.reg. Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form      Subset   Feature Flag
VPSHAW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHAW xmm1, xmm2/mem128, xmm3      8F    RXB.09           0.count.0.00   99 /r
VPSHAW xmm1, xmm2, xmm3/mem128      8F    RXB.09           1.src.0.00     99 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPSHLB    Packed Shift Logical Bytes

Shifts each packed byte of the source as specified by a count byte and writes the result to the corresponding byte of the destination. The count bytes are 8-bit signed two's-complement values located in the corresponding byte element of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). Zeros are shifted in at the right end (least-significant bit) of the byte.

When the count value is negative, bits are shifted to the right (toward the least significant bit positions). Zeros are shifted in at the left end (most-significant bit) of the byte.

There are three operands: VPSHLB dest, src, count

The destination (dest) is an XMM register specified by ModRM.reg. Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form      Subset   Feature Flag
VPSHLB    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHLB xmm1, xmm2/mem128, xmm3      8F    RXB.09           0.count.0.00   94 /r
VPSHLB xmm1, xmm2, xmm3/mem128      8F    RXB.09           1.src.0.00     94 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLW, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPSHLD    Packed Shift Logical Doublewords

Shifts each doubleword of the source operand as specified by a count byte and writes the result to the corresponding doubleword of the destination.
The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding doubleword element of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). Zeros are shifted in at the right end (least-significant bit) of the doubleword.

When the count value is negative, bits are shifted to the right (toward the least significant bit positions). Zeros are shifted in at the left end (most-significant bit) of the doubleword.

The shift amount is stored in two’s-complement form. The count is modulo 32.

There are three operands: VPSHLD dest, src, count

The destination (dest) is an XMM register specified by ModRM.reg. Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form      Subset   Feature Flag
VPSHLD    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHLD xmm1, xmm3/mem128, xmm2      8F    RXB.09           0.count.0.00   96 /r
VPSHLD xmm1, xmm2, xmm3/mem128      8F    RXB.09           1.src.0.00     96 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPSHLQ    Packed Shift Logical Quadwords

Shifts each quadword of the source as specified by a count byte and writes the result to the corresponding quadword of the destination. The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding quadword element of the count operand. Bit 6 of the count byte is ignored.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). Zeros are shifted in at the right end (least-significant bit) of the quadword.

When the count value is negative, bits are shifted to the right (toward the least significant bit positions). Zeros are shifted in at the left end (most-significant bit) of the quadword.

There are three operands: VPSHLQ dest, src, count

The destination (dest) is an XMM register specified by ModRM.reg. Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form      Subset   Feature Flag
VPSHLQ    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHLQ xmm1, xmm3/mem128, xmm2      8F    RXB.09           0.count.0.00   97 /r
VPSHLQ xmm1, xmm2, xmm3/mem128      8F    RXB.09           1.src.0.00     97 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLW, VPSHLD, VPSHAB, VPSHAW, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPSHLW    Packed Shift Logical Words

Shifts each word of the source operand as specified by a count byte and writes the result to the corresponding word of the destination. The count bytes are 8-bit signed two's-complement values located in the low-order byte of the corresponding word element of the count operand.

When the count value is positive, bits are shifted to the left (toward the more significant bit positions). Zeros are shifted in at the right end (least-significant bit) of the word.

When the count value is negative, bits are shifted to the right (toward the least significant bit positions). Zeros are shifted in at the left end (most-significant bit) of the word.

There are three operands: VPSHLW dest, src, count

The destination (dest) is an XMM register specified by ModRM.reg. Both src and count are configured by XOP.W.
• When XOP.W = 0, count is an XMM register specified by XOP.vvvv and src is either an XMM register or a memory location specified by ModRM.r/m.
• When XOP.W = 1, count is either an XMM register or a memory location specified by ModRM.r/m and src is an XMM register specified by XOP.vvvv.

Bits [255:128] of the YMM register that corresponds to the destination are cleared.

Instruction Support
Form      Subset   Feature Flag
VPSHLW    XOP      CPUID Fn8000_0001_ECX[XOP] (bit 11)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    XOP   RXB.map_select   W.vvvv.L.pp    Opcode
VPSHLW xmm1, xmm3/mem128, xmm2      8F    RXB.09           0.count.0.00   95 /r
VPSHLW xmm1, xmm2, xmm3/mem128      8F    RXB.09           1.src.0.00     95 /r

Related Instructions
VPROTB, VPROTW, VPROTD, VPROTQ, VPSHLB, VPSHLD, VPSHLQ, VPSHAB, VPSHAW, VPSHAD, VPSHAQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             XOP instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             XOP.L = 1.
                             REX, F2, F3, or 66 prefix preceding XOP prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Page fault, #PF              Instruction execution caused a page fault.
Alignment check, #AC         Memory operand not 16-byte aligned when alignment checking enabled.

VPSLLVD    Variable Shift Left Logical Doublewords

Left-shifts the bits of each doubleword in the first source operand by a count specified in the corresponding doubleword of a second source operand and writes the shifted values to the destination.

The second source operand is treated as an array of unsigned 32-bit integers. Each integer specifies the shift count of the corresponding doubleword of the first source operand. Each doubleword is shifted independently.

Low-order bits emptied by shifting are cleared. High-order bits shifted out of each doubleword are discarded. When the shift count for any doubleword is greater than 31, that doubleword is cleared in the destination.

This instruction has 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
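The per-element rule for VPSLLVD can be sketched in C as follows. This is an illustrative software model under stated assumptions, not code from the manual: the names sllvd_lane and vpsllvd_model are hypothetical, the lanes parameter (4 for the 128-bit encoding, 8 for the 256-bit encoding) is an invention of this sketch, and the clearing of bits [255:128] in the 128-bit form is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one VPSLLVD doubleword lane.
 * The count is treated as an unsigned 32-bit integer; counts above 31
 * clear the lane, which also avoids an undefined C shift. */
uint32_t sllvd_lane(uint32_t value, uint32_t count)
{
    return (count > 31u) ? 0u : (value << count);
}

/* Applies the lane model to every doubleword.
 * Assumption of this sketch: lanes is 4 (XMM form) or 8 (YMM form). */
void vpsllvd_model(uint32_t *dest, const uint32_t *src1,
                   const uint32_t *src2, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (int i = 0; i < lanes; ++i) {
        dest[i] = sllvd_lane(src1[i], src2[i]);
    }
}
```

The explicit range check matters in C because shifting a 32-bit value by 32 or more is undefined behavior, whereas the instruction defines the result as zero.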
YMM Encoding
The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support
Form       Subset   Feature Flag
VPSLLVD    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VPSLLVD xmm1, xmm2, xmm3/mem128     C4    RXB.02           0.src1.0.01    47 /r
VPSLLVD ymm1, ymm2, ymm3/mem256     C4    RXB.02           0.src1.1.01    47 /r

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVQ, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             AVX instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             REX, F2, F3, or 66 prefix preceding VEX prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Alignment check, #AC         Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF              Instruction execution caused a page fault.

VPSLLVQ    Variable Shift Left Logical Quadwords

Left-shifts the bits of each quadword in the first source operand by a count specified in the corresponding quadword of a second source operand and writes the shifted values to the destination.

The second source operand is treated as an array of unsigned 64-bit integers. Each integer specifies the shift count of the corresponding quadword of the first source operand. Each quadword is shifted independently.

Low-order bits emptied by shifting are cleared. High-order bits shifted out of each quadword are discarded. When the shift count for any quadword is greater than 63, that quadword is cleared in the destination.

This instruction has 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support
Form       Subset   Feature Flag
VPSLLVQ    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VPSLLVQ xmm1, xmm2, xmm3/mem128     C4    RXB.02           1.src1.0.01    47 /r
VPSLLVQ ymm1, ymm2, ymm3/mem256     C4    RXB.02           1.src1.1.01    47 /r

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSRAVD, VPSRLVD, VPSRLVQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             AVX instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             REX, F2, F3, or 66 prefix preceding VEX prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Alignment check, #AC         Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF              Instruction execution caused a page fault.

VPSRAVD    Variable Shift Right Arithmetic Doublewords

Performs a right arithmetic shift of each signed 32-bit integer in the first source operand by a count specified in the corresponding doubleword of a second source operand and writes the shifted values to the destination.

The second source operand is treated as an array of unsigned 32-bit integers. Each integer specifies the shift count of the corresponding doubleword of the first source operand. Each doubleword is shifted independently.

A copy of the sign bit is shifted into the most-significant bit of the element on each right-shift. Low-order bits shifted out of each element are discarded. If a doubleword contains a positive integer and the shift count is greater than 31, that doubleword is cleared in the destination. If a doubleword contains a negative integer and the shift count is greater than 31, that doubleword is set to -1 in the destination.
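The saturating behavior for large counts is easy to get wrong in a scalar fallback, so a small C sketch of one VPSRAVD lane may help. It is a software model only, not the manual's pseudocode; the name sravd_lane is hypothetical. The negative case is written with complements so that the code does not rely on the implementation-defined result of right-shifting a negative value in C.

```c
#include <stdint.h>

/* Hypothetical model of one VPSRAVD doubleword lane.
 * count is read as an unsigned 32-bit integer, as described above. */
int32_t sravd_lane(int32_t value, uint32_t count)
{
    if (count > 31u) {
        /* Every bit becomes a copy of the sign bit: -1 or 0. */
        return (value < 0) ? -1 : 0;
    }
    if (value >= 0) {
        return value >> count;          /* zeros (the sign bit) enter at the top */
    }
    /* ~value is non-negative, so its right shift is well defined;
     * complementing again fills the vacated high bits with ones. */
    return ~(~value >> count);
}
```

For example, a lane holding -8 with a count of 1 yields -4, and the same lane with a count of 32 or more yields -1.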
This instruction has 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support
Form       Subset   Feature Flag
VPSRAVD    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                            Encoding
                                    VEX   RXB.map_select   W.vvvv.L.pp    Opcode
VPSRAVD xmm1, xmm2, xmm3/mem128     C4    RXB.02           0.src1.0.01    46 /r
VPSRAVD ymm1, ymm2, ymm3/mem256     C4    RXB.02           0.src1.1.01    46 /r

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRLVD, VPSRLVQ

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception                    Cause of Exception
Invalid opcode, #UD          Instruction not supported, as indicated by CPUID feature identifier.
                             AVX instructions are only recognized in protected mode.
                             CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
                             XFEATURE_ENABLED_MASK[2:1] != 11b.
                             VEX.W = 1.
                             REX, F2, F3, or 66 prefix preceding VEX prefix.
                             Lock prefix (F0h) preceding opcode.
Device not available, #NM    CR0.TS = 1.
Stack, #SS                   Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      Memory address exceeding data segment limit or non-canonical.
                             Null data segment used to reference memory.
Alignment check, #AC         Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Page fault, #PF              Instruction execution caused a page fault.

VPSRLVD    Variable Shift Right Logical Doublewords

Right-shifts each doubleword in the first source operand by a count specified in the corresponding doubleword of a second source operand and writes the shifted values to the destination.

The second source operand is treated as an array of unsigned 32-bit integers. Each integer specifies the shift count of the corresponding doubleword of the first source operand. Each doubleword is shifted independently.

Zero is shifted into the most-significant bit of the element on each right-shift. Low-order bits shifted out of each element are discarded. If the shift count for any doubleword is greater than 31, that doubleword is cleared in the destination.

This instruction has 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support
Form       Subset   Feature Flag
VPSRLVD    AVX2     CPUID Fn0000_0007_EBX[AVX2]_x0 (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.
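As a minimal sketch of how software might interpret the feature flag listed above, the C helpers below test bit 5 of the EBX value returned by CPUID function 0000_0007h, subfunction 0, and bits [2:1] of XFEATURE_ENABLED_MASK, which the exception tables in this reference require to be 11b. The function names are hypothetical, and the sketch assumes the caller has already obtained both register values by some platform-specific means; it does not execute CPUID or XGETBV itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of the AVX2 flag in CPUID Fn0000_0007_EBX (subfunction 0). */
#define FN7_EBX_AVX2_BIT 5u

/* Hypothetical helper: true when the supplied EBX value reports AVX2. */
bool ebx_reports_avx2(uint32_t fn7_ebx)
{
    return ((fn7_ebx >> FN7_EBX_AVX2_BIT) & 1u) != 0u;
}

/* Hypothetical helper: true when XFEATURE_ENABLED_MASK[2:1] = 11b,
 * that is, when both SSE and YMM state are enabled by the operating system. */
bool ymm_state_enabled(uint64_t xfeature_enabled_mask)
{
    return ((xfeature_enabled_mask >> 1) & 0x3u) == 0x3u;
}
```

A caller would typically require both helpers to return true before dispatching to a code path that uses VPSRLVD.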
Instruction Encoding Mnemonic Encoding VEX RXB.map_select W.vvvv.L.pp Opcode VPSRLVD xmm1, xmm2, xmm3/mem128 C4 RXB.02 0.src1.0.01 45 /r VPSRLVD ymm1, ymm2, ymm3/mem256 C4 RXB.02 0.src1.1.01 45 /r Related Instructions (V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVQ rFLAGS Affected None MXCSR Flags Affected None 850 VPSRLVD Instruction Reference 26568—Rev. 3.22—May 2018 AMD64 Technology Exceptions Exception Mode Real Virt Prot A A Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP A A A A A A A A A A A Alignment check, #AC A Page fault, #PF A — AVX2 exception A Instruction Reference Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. VPSRLVD 851 AMD64 Technology 26568—Rev. 3.22—May 2018 VPSRLVQ Variable Shift Right Logical Quadwords Right-shifts each quadword in the first source operand by a count specified in the corresponding quadword of a second source operand and writes the shifted values to the destination. The second source operand is treated as an array of unsigned 64-bit integers. Each integer specifies the shift count of the corresponding quadword of the first source operand. Each quadword is shifted independently. Zero is shifted into the most-significant bit of the element on each right-shift. 
Low-order bits shifted out of each element are discarded. If the shift count for any quadword is greater than 63, that quadword is cleared in the destination.
This instruction has 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The shift count array is specified by either a second XMM register or a 128-bit memory location. The destination is an XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register. The shift count array is specified by either a second YMM register or a 256-bit memory location. The destination is a YMM register.

Instruction Support
Form       Subset   Feature Flag
VPSRLVQ    AVX2     CPUID Fn0000_0007_EBX_x0[AVX2] (bit 5)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                           VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VPSRLVQ xmm1, xmm2, xmm3/mem128    C4    RXB.02           1.src1.0.01   45 /r
VPSRLVQ ymm1, ymm2, ymm3/mem256    C4    RXB.02           1.src1.1.01   45 /r

Related Instructions
(V)PSLLD, (V)PSLLDQ, (V)PSLLQ, (V)PSLLW, (V)PSRAD, (V)PSRAW, (V)PSRLD, (V)PSRLDQ, (V)PSRLQ, (V)PSRLW, VPSLLVD, VPSLLVQ, VPSRAVD, VPSRLVD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot A A Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP A A A A A A A A A A A Alignment check, #AC A Page fault, #PF A — AVX2 exception A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

VTESTPD
Packed Bit Test

Performs two different logical operations on the sign bits of the first and second packed floating-point operands and updates the ZF and CF flags based on the results.
First, performs a bitwise AND of the sign bits of each double-precision floating-point element of the first source operand with the sign bits of the corresponding elements of the second source operand. Sets rFLAGS.ZF when all bit operations = 0; else, clears ZF.
Second, performs a bitwise AND of the complements (NOT) of the sign bits of each double-precision floating-point element of the first source with the sign bits of the corresponding elements of the second source operand. Sets rFLAGS.CF when all bit operations = 0; else, clears CF.
Neither source operand is modified.
This extended-form instruction has both 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location.

Instruction Support
Form       Subset   Feature Flag
VTESTPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VTESTPD xmm1, xmm2/mem128    C4    RXB.02           0.1111.0.01   0F /r
VTESTPD ymm1, ymm2/mem256    C4    RXB.02           0.1111.1.01   0F /r

Related Instructions
PTEST, VTESTPS
rFLAGS Affected
Flag (bit): ID (21), VIP (20), VIF (19), AC (18), VM (17), RF (16), NT (14), IOPL (13:12), OF (11), DF (10), IF (9), TF (8), SF (7), ZF (6), AF (4), PF (2), CF (0)
OF: 0
SF, ZF, AF, PF, CF: M
Note: Bits 31:22, 15, 5, 3 and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. Undefined flags are U.

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X X X X X X X X X X X X X X X X S S S X A X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX exception X X X X X X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

VTESTPS
Packed Bit Test

Performs two different logical operations on the sign bits of the first and second packed floating-point operands and updates the ZF and CF flags based on the results.
First, performs a bitwise AND of the sign bits of each single-precision floating-point element of the first source operand with the sign bits of the corresponding elements of the second source operand. Sets rFLAGS.ZF when all bit operations = 0; else, clears ZF.
Second, performs a bitwise AND of the complements (NOT) of the sign bits of each single-precision floating-point element of the first source with the sign bits of the corresponding elements of the second source operand. Sets rFLAGS.CF when all bit operations = 0; else, clears CF.
Neither source operand is modified.
This extended-form instruction has both 128-bit and 256-bit encodings.

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location.

YMM Encoding
The first source operand is a YMM register. The second source operand is either a YMM register or a 256-bit memory location.

Instruction Support
Form       Subset   Feature Flag
VTESTPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                     VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VTESTPS xmm1, xmm2/mem128    C4    RXB.02           0.1111.0.01   0E /r
VTESTPS ymm1, ymm2/mem256    C4    RXB.02           0.1111.1.01   0E /r

Related Instructions
PTEST, VTESTPD

rFLAGS Affected
Flag (bit): ID (21), VIP (20), VIF (19), AC (18), VM (17), RF (16), NT (14), IOPL (13:12), OF (11), DF (10), IF (9), TF (8), SF (7), ZF (6), AF (4), PF (2), CF (0)
OF: 0
SF, ZF, AF, PF, CF: M
Note: Bits 31:22, 15, 5, 3 and 1 are reserved. A flag set or cleared is M (modified). Unaffected flags are blank. Undefined flags are U.

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot X X X X X X X X X X X X X X X X S S S X A X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX exception X X X X X X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] !
= 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

VZEROALL
Zero All YMM Registers

Clears all YMM registers. In 64-bit mode, YMM0–15 are all cleared (set to all zeros). In legacy and compatibility modes, only YMM0–7 are cleared. The contents of MXCSR are unaffected.

Instruction Support
Form        Subset   Feature Flag
VZEROALL    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic    VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VZEROALL    C4    RXB.01           X.1111.1.00   77

Related Instructions
VZEROUPPER

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot A A Invalid opcode, #UD Device not available, #NM A — AVX exception. A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

VZEROUPPER
Zero All YMM Registers Upper

Clears the upper octword of all YMM registers. The corresponding XMM registers (lower octword of each YMM register) are not affected. In 64-bit mode, the instruction operates on registers YMM0–15.
In legacy and compatibility mode, the instruction operates on YMM0–7. The contents of MXCSR are unaffected.

Instruction Support
Form          Subset   Feature Flag
VZEROUPPER    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic      VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VZEROUPPER    C4    RXB.01           X.1111.0.00   77

Related Instructions
VZEROALL

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot A A Invalid opcode, #UD Device not available, #NM A — AVX exception. A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

XGETBV
Get Extended Control Register Value

Copies the content of the extended control register (XCR) specified by the ECX register into the EDX:EAX register pair. The high-order 32 bits of the XCR are loaded into EDX and the low-order 32 bits are loaded into EAX. The corresponding high-order 32 bits of RAX and RDX are cleared.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used to manage processor states and provide additional functionality. See the XSAVE instruction description for more information.
Values returned to EDX:EAX in unimplemented bit locations are undefined.
Specifying a reserved or unimplemented XCR in ECX causes a general protection exception.
Currently, only XCR0 (the XFEATURE_ENABLED_MASK register) is supported. If CPUID reports support for ECX=1 (see table below), then the XGETBV instruction supports an ECX value of 1.
When ECX=1, XGETBV returns the logical AND of XCR0 and the current value of the XINUSE state-component bitmap.

Instruction Support
Form      Subset           Feature Flag
XGETBV    XSAVE/XRSTOR     CPUID Fn0000_0001_ECX[XSAVE] (bit 26)
XGETBV    ECX=1 support    CPUID Fn0000_000D_EAX_x1[2] = 1

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic   Opcode      Description
XGETBV     0F 01 D0    Copies content of the XCR specified by ECX into EDX:EAX.

Related Instructions
RDMSR, XSETBV

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Invalid opcode, #UD General protection, #GP X — exception generated Mode Real Virt Prot X X X X X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
Lock prefix (F0h) preceding opcode.
CR4.OSXSAVE = 0.
ECX specifies a reserved or unimplemented XCR address.

XORPD
VXORPD
XOR Packed Double-Precision Floating-Point

Performs bitwise XOR of two packed double-precision floating-point values in the first source operand with the corresponding values of the second source operand and writes the results into the corresponding elements of the destination.
There are legacy and extended forms of the instruction:

XORPD
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VXORPD
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.
YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form      Subset   Feature Flag
XORPD     SSE2     CPUID Fn0000_0001_EDX[SSE2] (bit 26)
VXORPD    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                   Opcode         Description
XORPD xmm1, xmm2/mem128    66 0F 57 /r    Performs bitwise XOR of two packed double-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.

Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VXORPD xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.01   57 /r
VXORPD ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.01   57 /r

Related Instructions
(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPS

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

XORPS
VXORPS
XOR Packed Single-Precision Floating-Point

Performs bitwise XOR of four packed single-precision floating-point values in the first source operand with the corresponding values of the second source operand and writes the results into the corresponding elements of the destination.
There are legacy and extended forms of the instruction:

XORPS
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The first source register is also the destination. Bits [255:128] of the YMM register that corresponds to the destination are not affected.

VXORPS
The extended form of the instruction has both 128-bit and 256-bit encodings:

XMM Encoding
The first source operand is an XMM register. The second source operand is either an XMM register or a 128-bit memory location. The destination is a third XMM register. Bits [255:128] of the YMM register that corresponds to the destination are cleared.

YMM Encoding
The first source operand is a YMM register and the second source operand is either a YMM register or a 256-bit memory location. The destination is a third YMM register.

Instruction Support
Form      Subset   Feature Flag
XORPS     SSE      CPUID Fn0000_0001_EDX[SSE] (bit 25)
VXORPS    AVX      CPUID Fn0000_0001_ECX[AVX] (bit 28)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic                   Opcode      Description
XORPS xmm1, xmm2/mem128    0F 57 /r    Performs bitwise XOR of four packed single-precision floating-point values in xmm1 with corresponding values in xmm2 or mem128. Writes the result to xmm1.
Mnemonic                          VEX   RXB.map_select   W.vvvv.L.pp   Opcode
VXORPS xmm1, xmm2, xmm3/mem128    C4    RXB.01           X.src1.0.00   57 /r
VXORPS ymm1, ymm2, ymm3/mem256    C4    RXB.01           X.src1.1.00   57 /r

Related Instructions
(V)ANDNPS, (V)ANDPD, (V)ANDPS, (V)ORPD, (V)ORPS, (V)XORPD

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Mode Real Virt Prot Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

XRSTOR
Restore Extended States

Restores a partial or full processor state from memory.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used to manage processor states and provide additional functionality. See the descriptions of XSAVE and XRSTOR instructions for basic operational details.
The XRSTOR instruction may operate on the buffer in standard form or a compact form. The compact form is indicated in the memory buffer with XCOMP_BV[63]=1.
In either form, the instruction creates a Requested Feature Bit Map (RFBM) which is the logical AND of EDX:EAX and XCR0. Then for each feature bit:
1. If RFBM = 0, XRSTOR does not update the component.
2. If RFBM = 1 but the corresponding XSTATE_BV bit is 0, the component is set to its reset state without reading anything out of the buffer.
3. If RFBM = 1 and XSTATE_BV = 1, the component state is read from the buffer.
4. XRSTOR loads an internal state value XRSTOR_INFO that can be used to further optimize a subsequent XSAVEOPT or XSAVES. This reflects the current privilege level and virtualization mode as well as the save area's base address and XCOMP_BV field.
5. If RFBM = 1, the corresponding XINUSE bit is set to the state of XSTATE_BV.
For standard mode, MXCSR is loaded if RFBM[1]=1 or RFBM[2]=1. It is never initialized. For compact mode, MXCSR is associated with RFBM[1].
In some generations, the FP error pointers were only restored if there was a floating-point error logged. In newer generations, the FP error pointers are always restored. This is indicated by CPUID Fn8000_0008_EBX[2].

Instruction Support
Form      Subset    Feature Flag
XRSTOR    XRSTOR    CPUID Fn0000_0001_ECX[XSAVE] (bit 26)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic      Opcode      Description
XRSTOR mem    0F AE /5    Restores user-specified processor state from memory.

Related Instructions
XGETBV, XRSTORS, XSAVE, XSAVEC, XSAVES, XSETBV

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF X — exception generated Mode Real Virt Prot X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
CR4.OSXSAVE = 0.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not aligned on 64-byte boundary.
Any must be zero (MBZ) bits in the save area were set.
Attempt to set reserved bits in MXCSR.
XCOMP_BV[i] = 0 & XSTATE_BV[i] = 1.
XCOMP_BV[i] = 1 & XCR0[i] = 0.
Bytes 63:16 of header are non-zero.
Instruction execution caused a page fault.

XRSTORS
Restore Extended States Supervisor

Restores processor state from memory.
XRSTORS is very similar to the XRSTOR instruction in compacted form, with the following differences:
1. XRSTORS must be executed at CPL=0.
2. XRSTORS must read XCOMP_BV[63]=1; otherwise it causes a #GP(0) exception.
3. XRSTORS is able to restore state enabled from the IA32_XSS MSR.
All other behavior is the same as XRSTOR with the compact form.

Instruction Support
Form       Subset     Feature Flag
XRSTORS    XRSTORS    CPUID Fn0000_000D_EAX_x1[XSAVES] (bit 3)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic       Opcode      Description
XRSTORS mem    0F C7 /3    Restores processor state from memory.

Related Instructions
XGETBV, XRSTOR, XSAVE, XSAVEC, XSAVES, XSETBV

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF X — exception generated Mode Real Virt Prot X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
CR4.OSXSAVE = 0.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not aligned on 64-byte boundary.
Any must be zero (MBZ) bits in the save area were set.
Attempt to set reserved bits in MXCSR.
CPL != 0.
(XSTATE_BV[i] & ~IA32_XSS[i]) = 1.
Instruction execution caused a page fault.

XSAVE
Save Extended States

Saves a user-defined subset of enabled processor state data to a specified memory address.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used to manage processor states and provide additional functionality.
The XSAVE/XRSTOR save area consists of a header section, and individual save areas for each processor state component. A component is saved when both the corresponding bits in the mask operand (EDX:EAX) and the XFEATURE_ENABLED_MASK (XCR0) register are set. This bit-wise logical AND of EDX:EAX and XCR0 is known as the Requested Feature Bit Map (RFBM). A component is not saved when its corresponding RFBM bit is zero.
Software can set any bit in EDX:EAX, regardless of whether the bit position in XCR0 is valid for the processor. When the mask operand contains all 1's, all processor state components enabled in XCR0 are saved.
For each component saved, XSAVE sets the corresponding bit in the XSTATE_BV field of the save area header. XSAVE does not clear XSTATE_BV bits or modify individual save areas for components that are not saved. If a saved component is in the hardware-specified initialized state, XSAVE may clear the corresponding XSTATE_BV bit instead of setting it. This optimization is implementation-dependent.
The MXCSR register is saved if either of RFBM bits 0 or 1 are set to 1.
If there is no floating point error present, some generations would not write out any of the FP error pointers. On newer generations, these fields are written to zeros. This is indicated by CPUID Fn8000_0008_EBX[2].
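The RFBM computation and the XSTATE_BV update described above for XSAVE can be sketched as follows. This is a minimal model under stated assumptions, not an implementation from this manual: the function names are hypothetical, register values are passed as plain integers, and the implementation-dependent init optimization (clearing XSTATE_BV bits for components in their initialized state) is deliberately left out.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def requested_feature_bitmap(edx, eax, xcr0):
    """RFBM = (EDX:EAX) AND XCR0 (hypothetical helper name)."""
    instruction_mask = ((edx & MASK32) << 32) | (eax & MASK32)
    return instruction_mask & xcr0 & MASK64


def xsave_header_update(old_xstate_bv, edx, eax, xcr0):
    """Return (indices of components saved, new XSTATE_BV).

    Assumption: no init optimization, so every component whose RFBM bit
    is 1 is saved and has its XSTATE_BV bit set. Bits for components that
    are not saved are left exactly as they were.
    """
    rfbm = requested_feature_bitmap(edx, eax, xcr0)
    saved = [i for i in range(64) if (rfbm >> i) & 1]
    return saved, (old_xstate_bv | rfbm) & MASK64


# All-ones mask with XCR0 = 0b111: every enabled component is saved.
print(xsave_header_update(0, 0xFFFFFFFF, 0xFFFFFFFF, 0b111))
# Mask requests bits 1, 2 and 3; XCR0 only enables 0..2, so bit 3 is dropped.
print(xsave_header_update(0b001, 0, 0b1110, 0b111))
```

Setting a mask bit that is not enabled in XCR0 is harmless in this model, which mirrors the statement that software can set any bit in EDX:EAX.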
Instruction Support
Form     Subset          Feature Flag
XSAVE    XSAVE/XRSTOR    CPUID Fn0000_0001_ECX[XSAVE] (bit 26)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic     Opcode      Description
XSAVE mem    0F AE /4    Saves user-specified processor state to memory.

Related Instructions
XGETBV, XRSTOR, XSAVEOPT, XSETBV

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF X — exception generated Mode Real Virt Prot X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
CR4.OSXSAVE = 0.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not aligned on 64-byte boundary.
Attempt to write read-only memory.
Instruction execution caused a page fault.

XSAVEC
Save Extended States in Compacted Form

Saves a user-defined subset of enabled processor state data to a specified memory address, possibly in a compacted form.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used to manage processor states and provide compaction functionality for more efficient context switching. See the XSAVE and XRSTOR instruction descriptions for basic operational details.
XSAVEC is very similar to XSAVE but provides the following alternate functionality:
1. XSAVEC differs from XSAVE by using the init optimization and compaction.
2. XSAVEC differs by only saving a component if its RFBM=1 and its XINUSE=1. XINUSE is a means by which the processor determines whether the feature is in its initial state.
3.
XSAVEC never writes bytes 511:464 of the legacy XSAVE data structure.
4. XSAVEC calculates XSTATE_BV by performing the logical AND of the RFBM and XINUSE bitmaps and writes it to the XSAVE area.
5. XSAVEC calculates XCOMP_BV as [63]=1 and 62:0 = RFBM, and writes it to the XSAVE area.
6. XSAVEC does not modify any other parts of the header except as indicated in 4 and 5.
7. XSAVEC uses the compacted format of the XSAVE extended region while saving state.

Instruction Support
Form      Subset    Feature Flag
XSAVEC    XSAVEC    CPUID Fn0000_000D_EAX_x1[XSAVEC] (bit 1)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic      Opcode      Description
XSAVEC mem    0F C7 /4    Saves user-specified processor state to memory.

Related Instructions
XGETBV, XRSTOR, XRSTORS, XSAVE, XSAVES, XSETBV

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF X — exception generated Mode Real Virt Prot X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
CR4.OSXSAVE = 0.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not aligned on 64-byte boundary.
Attempt to write read-only memory.
Instruction execution caused a page fault.

XSAVEOPT
Save Extended States Performance Optimized

Saves a user-defined subset of enabled processor state data to a specified memory address.
This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used to manage processor states and provide additional functionality.
See the XSAVE and XRSTOR instruction descriptions for basic operational details.
The XSAVE/XRSTOR save area consists of a header section, and individual save areas for each processor state component. A component is saved when both the corresponding bits in the mask operand (EDX:EAX) and the XFEATURE_ENABLED_MASK (XCR0) register are set. A component is not saved when either of the corresponding bits in EDX:EAX or XCR0 is cleared.
Software can set any bit in EDX:EAX, regardless of whether the bit position in XCR0 is valid for the processor. When the mask operand contains all 1's, all processor state components enabled in XCR0 are saved.
For each component saved, XSAVEOPT sets the corresponding bit in the XSTATE_BV field of the save area header. XSAVEOPT does not clear XSTATE_BV bits or modify individual save areas for components that are not saved. If a saved component is in the hardware-specified initialized state, XSAVEOPT may clear the corresponding XSTATE_BV bit instead of setting it. This optimization is implementation-dependent.
XSAVEOPT may provide other implementation-specific optimizations, such as the modified optimization described for XSAVES.

Instruction Support
Form        Subset      Feature Flag
XSAVEOPT    XSAVEOPT    CPUID Fn0000_000D_EAX_x1[XSAVEOPT] (bit 0)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic        Opcode      Description
XSAVEOPT mem    0F AE /6    Saves user-specified processor state to memory.

Related Instructions
XGETBV, XRSTOR, XSAVE, XSETBV

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF X — exception generated Mode Real Virt Prot X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
CR4.OSXSAVE = 0.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not aligned on 64-byte boundary.
Attempt to write read-only memory.
Instruction execution caused a page fault.

XSAVES
Save Extended States Supervisor

Saves a user-defined subset of enabled processor state data to a specified memory address, possibly in a compacted form.
This instruction and associated data structures extend the XSAVE/XRSTOR memory image used to manage processor states and provide compaction functionality. See the XSAVE and XRSTOR instruction descriptions for basic operational details.
XSAVES is very similar to XSAVEC but provides the following alternate functionality:
1. XSAVES must be executed at CPL=0.
2. XSAVES can save state enabled in the IA32_XSS MSR. The specific state elements saved are determined by the logical AND of EDX:EAX with the logical OR of XCR0 with the IA32_XSS MSR.
3. XSAVES can use the modified optimization to not save components, even if RFBM=1 and XINUSE=1 for the stated component. If the component state has not been modified internally since the last execution of XRSTOR or XRSTORS and the XRSTOR_INFO state (an execution environment signature created by the last XRSTOR) matches the current execution state of this XSAVES, the state save can be skipped.

Instruction Support
Form      Subset    Feature Flag
XSAVES    XSAVES    CPUID Fn0000_000D_EAX_x1[XSAVES] (bit 3)

For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3.

Instruction Encoding
Mnemonic      Opcode      Description
XSAVES mem    0F C7 /5    Saves user-specified processor state to memory.

Related Instructions
XGETBV, XRSTOR, XRSTORS, XSAVE, XSAVEC, XSETBV

rFLAGS Affected
None

MXCSR Flags Affected
None
3.22—May 2018 Exceptions Exception Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF X — exception generated 876 Mode Real Virt Prot X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. CR4.OSXSAVE = 0. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Memory operand not aligned on 64-byte boundary. Attempt to write read-only memory. Instruction execution caused a page fault. 26568—Rev. 3.22—May 2018 AMD64 Technology XSETBV Set Extended Control Register Value Writes the content of the EDX:EAX register pair into the extended control register (XCR) specified by the ECX register. The high-order 32 bits of the XCR are loaded from EDX and the low-order 32 bits are loaded from EAX. The corresponding high-order 32 bits of RAX and RDX are ignored. This instruction and associated data structures extend the FXSAVE/FXRSTOR memory image used to manage processor states and provide additional functionality. See the XSAVE instruction description for more information. Currently, only the XFEATURE_ENABLED_MASK register (XCR0) is supported. Specifying a reserved or unimplemented XCR in ECX causes a general protection exception (#GP). Executing XSETBV at a privilege level other than 0 causes a general-protection exception. A general protection exception also occurs when software attempts to write to reserved bits of an XCR. Instruction Support Form XSETBV Subset Feature Flag XSAVE/XRSTOR CPUID Fn0000_0001_ECX[XSAVE] (bit 26) For more on using the CPUID instruction to obtain processor feature support information, see Appendix E of Volume 3. 
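The XSETBV operand handling described above can be sketched as a small Python model. This is an illustration only, not a hardware-accurate emulator. The function names, the `implemented_bits` parameter, and the order in which the checks run are assumptions made for the example. The #GP conditions are the ones listed in the exceptions table for this instruction; the #UD conditions (for example CR4.OSXSAVE = 0) are left out. The last helper, also hypothetical, restates the save rule from the XSAVEOPT and XSAVES descriptions: a component is selected only when its bit is set in EDX:EAX and in XCR0 (or, for XSAVES, in XCR0 or IA32_XSS).

```python
"""Illustrative model of XSETBV operand handling (a sketch, not an emulator)."""

MASK32 = 0xFFFF_FFFF

# XCR0 bits for x87, SSE, and YMM state (constant names chosen for this sketch).
XCR0_X87 = 1 << 0
XCR0_SSE = 1 << 1
XCR0_YMM = 1 << 2


def compose_xcr_value(rax: int, rdx: int) -> int:
    """Build the 64-bit XCR value from EDX:EAX; upper halves of RAX/RDX are ignored."""
    return ((rdx & MASK32) << 32) | (rax & MASK32)


def model_xsetbv(ecx: int, rax: int, rdx: int, cpl: int, implemented_bits: int):
    """Return (value, fault); fault is None on success, else a '#GP: ...' string.

    implemented_bits is a hypothetical parameter: the XCR0 bits this model
    treats as implemented. Every other bit is treated as must-be-zero.
    The order of the checks below is an assumption of this sketch.
    """
    value = compose_xcr_value(rax, rdx)
    if cpl != 0:
        return value, "#GP: CPL != 0"
    if (ecx & MASK32) != 0:  # only XCR0 is modeled as implemented
        return value, "#GP: reserved or unimplemented XCR"
    if value & ~implemented_bits:
        return value, "#GP: must-be-zero bit set"
    if not value & XCR0_X87:
        return value, "#GP: XCR0[0] written as 0"
    if (value & (XCR0_SSE | XCR0_YMM)) == XCR0_YMM:
        return value, "#GP: XCR0[2:1] = 10b"
    return value, None


def components_saved(mask_edx_eax: int, xcr0: int, xss: int = 0) -> int:
    """Bitmap of components selected for saving: EDX:EAX AND (XCR0 OR IA32_XSS)."""
    return mask_edx_eax & (xcr0 | xss)


if __name__ == "__main__":
    # Enable x87, SSE, and YMM state, then request a save of everything.
    value, fault = model_xsetbv(ecx=0, rax=0x7, rdx=0, cpl=0, implemented_bits=0x7)
    print(hex(value), fault)
    print(hex(components_saved(0xFFFF_FFFF_FFFF_FFFF, value)))
```

Returning the fault as a string rather than raising keeps the model easy to probe from a test or an interactive session; a caller that prefers exceptions could wrap `model_xsetbv` accordingly.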
Instruction Encoding

Mnemonic    Opcode      Description
XSETBV      0F 01 D1    Writes the content of the EDX:EAX register pair to the XCR specified by the ECX register.

Related Instructions
XGETBV, XRSTOR, XSAVE, XSAVEOPT

rFLAGS Affected
None

MXCSR Flags Affected
None

Exceptions
Exception Invalid opcode, #UD Mode Real Virt Prot X X X X General protection, #GP X X X X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. CR4.OSXSAVE = 0. Lock prefix (F0h) preceding opcode. CPL != 0. ECX specifies a reserved or unimplemented XCR address. Any must be zero (MBZ) bits in the XCR were set. Setting XCR0[2:1] to 10b. Writing 0 to XCR[0]. X — exception generated

3 Exception Summary

This chapter provides a ready reference to instruction exceptions. Table 3-1 shows instructions grouped by exception class, with the extended and legacy instruction type (if applicable). Hyperlinks in the table point to the exception tables which follow.

Table 3-1.
Instructions By Exception Class

Mnemonic                            Extended Type   Legacy Type

Class 1 — AVX / SSE Vector Aligned (VEX.vvvv != 1111)
MOVAPD VMOVAPD                      AVX             SSE2
MOVAPS VMOVAPS                      AVX             SSE
MOVDQA VMOVDQA                      AVX             SSE2
MOVNTDQ VMOVNTDQ                    AVX             SSE2
MOVNTPD VMOVNTPD                    AVX             SSE2
MOVNTPS VMOVNTPS                    AVX             SSE

Class 1X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b or VEX.L=1 && !AVX2)
MOVNTDQA VMOVNTDQA                  AVX, AVX2       SSE4.1

Class 2 — AVX / SSE Vector (SIMD 111111)
DIVPD VDIVPD                        AVX             SSE2
DIVPS VDIVPS                        AVX             SSE

Class 2-1 — AVX / SSE Vector (SIMD 111011)
ADDPD VADDPD                        AVX             SSE2
ADDPS VADDPS                        AVX             SSE
ADDSUBPD VADDSUBPD                  AVX             SSE2
ADDSUBPS VADDSUBPS                  AVX             SSE
DPPS VDPPS                          AVX             SSE4.1
HADDPD VHADDPD                      AVX             SSE3
HADDPS VHADDPS                      AVX             SSE3
HSUBPD VHSUBPD                      AVX             SSE3
HSUBPS VHSUBPS                      AVX             SSE3
SUBPD VSUBPD                        AVX             SSE2
SUBPS VSUBPS                        AVX             SSE

Class 2-2 — AVX / SSE Vector (SIMD 000011)
CMPPD VCMPPD                        AVX             SSE2
CMPPS VCMPPS                        AVX             SSE
MAXPD VMAXPD                        AVX             SSE2
MAXPS VMAXPS                        AVX             SSE
MINPD VMINPD                        AVX             SSE2
MINPS VMINPS                        AVX             SSE
MULPD VMULPD                        AVX             SSE2
MULPS VMULPS                        AVX             SSE

Class 2-3 — AVX / SSE Vector (SIMD 100001)
(unused)                            —               —

Table 3-1.
Instructions By Exception Class (continued)

Mnemonic                            Extended Type   Legacy Type

Class 2A — AVX / SSE Vector (SIMD 111111, VEX.L = 1)
(unused)                            —               —

Class 2A-1 — AVX / SSE Vector (SIMD 111011, VEX.L = 1)
DPPD VDPPD                          AVX             SSE4.1

Class 2B — AVX / SSE Vector (SIMD 111111, VEX.vvvv != 1111b)
(unused)                            —               —

Class 2B-1 — AVX / SSE Vector (SIMD 100000, VEX.vvvv != 1111b)
CVTDQ2PS VCVTDQ2PS                  AVX             SSE2

Class 2B-2 — AVX / SSE Vector (SIMD 100001, VEX.vvvv != 1111b)
CVTPD2DQ VCVTPD2DQ                  AVX             SSE2
CVTPS2DQ VCVTPS2DQ                  AVX             SSE2
CVTTPS2DQ VCVTTPS2DQ                AVX             SSE2
CVTTPD2DQ VCVTTPD2DQ                AVX             SSE2
ROUNDPD, VROUNDPD                   AVX             SSE4.1
ROUNDPS, VROUNDPS                   AVX             SSE4.1

Class 2B-3 — AVX / SSE Vector (SIMD 111011, VEX.vvvv != 1111b)
CVTPD2PS VCVTPD2PS                  AVX             SSE2

Class 2B-4 — AVX / SSE Vector (SIMD 100011, VEX.vvvv != 1111b)
SQRTPD VSQRTPD                      AVX             SSE2
SQRTPS VSQRTPS                      AVX             SSE

Class 3 — AVX / SSE Scalar (SIMD 111111)
DIVSD VDIVSD                        AVX             SSE2
DIVSS VDIVSS                        AVX             SSE

Class 3-1 — AVX / SSE Scalar (SIMD 111011)
ADDSD VADDSD                        AVX             SSE2
ADDSS VADDSS                        AVX             SSE
CVTSD2SS VCVTSD2SS                  AVX             SSE2
SUBSD VSUBSD                        AVX             SSE2
SUBSS VSUBSS                        AVX             SSE

Class 3-2 — AVX / SSE Scalar (SIMD 000011)
CMPSD VCMPSD                        AVX             SSE2
CMPSS VCMPSS                        AVX             SSE
CVTSS2SD VCVTSS2SD                  AVX             SSE2
MAXSD VMAXSD                        AVX             SSE2
MAXSS VMAXSS                        AVX             SSE
MINSD VMINSD                        AVX             SSE2
MINSS VMINSS                        AVX             SSE
MULSD VMULSD                        AVX             SSE2
MULSS VMULSS                        AVX             SSE
UCOMISD VUCOMISD                    AVX             SSE2
UCOMISS VUCOMISS                    AVX             SSE

Table 3-1.
Instructions By Exception Class (continued)

Mnemonic                            Extended Type   Legacy Type

Class 3-3 — AVX / SSE Scalar (SIMD 100000)
CVTSI2SD VCVTSI2SD                  AVX             SSE2
CVTSI2SS VCVTSI2SS                  AVX             SSE

Class 3-4 — AVX / SSE Scalar (SIMD 100001)
ROUNDSD, VROUNDSD                   AVX             SSE4.1
ROUNDSS, VROUNDSS                   AVX             SSE4.1

Class 3-5 — AVX / SSE Scalar (SIMD 100011)
SQRTSD VSQRTSD                      AVX             SSE2
SQRTSS VSQRTSS                      AVX             SSE

Class 3A — AVX / SSE Scalar (SIMD 111111, VEX.vvvv != 1111b)
(unused)                            —               —

Class 3A-1 — AVX / SSE Scalar (SIMD 000011, VEX.vvvv != 1111b)
COMISD VCOMISD                      AVX             SSE2
COMISS VCOMISS                      AVX             SSE
CVTPS2PD VCVTPS2PD                  AVX             SSE2

Class 3A-2 — AVX / SSE Scalar (SIMD 100001, VEX.vvvv != 1111b)
CVTSD2SI VCVTSD2SI                  AVX             SSE2
CVTSS2SI VCVTSS2SI                  AVX             SSE
CVTTSD2SI VCVTTSD2SI                AVX             SSE2
CVTTSS2SI VCVTTSS2SI                AVX             SSE

Class 4 — AVX / SSE Vector
AESDEC VAESDEC                      AVX             AES
AESDECLAST VAESDECLAST              AVX             AES
AESENC VAESENC                      AVX             AES
AESENCLAST VAESENCLAST              AVX             AES
AESIMC VAESIMC                      AVX             AES
AESKEYGENASSIST VAESKEYGENASSIST    AVX             AES
ANDNPD VANDNPD                      AVX             SSE2
ANDNPS VANDNPS                      AVX             SSE
ANDPD VANDPD                        AVX             SSE2
ANDPS VANDPS                        AVX             SSE
BLENDPD VBLENDPD                    AVX             SSE4.1
BLENDPS VBLENDPS                    AVX             SSE4.1
ORPD VORPD                          AVX             SSE2
ORPS VORPS                          AVX             SSE
PCLMULQDQ VPCLMULQDQ                AVX             CLMUL
SHUFPD VSHUFPD                      AVX             SSE2
SHUFPS VSHUFPS                      AVX             SSE2
UNPCKHPD VUNPCKHPD                  AVX             SSE2
UNPCKHPS VUNPCKHPS                  AVX             SSE
UNPCKLPD VUNPCKLPD                  AVX             SSE2
UNPCKLPS VUNPCKLPS                  AVX             SSE

Table 3-1.
Instructions By Exception Class (continued) Mnemonic XORPD VXORPD XORPS VXORPS Class 4A — AVX / SSE Vector (VEX.W = 1) BLENDVPD VBLENDVPD BLENDVPS VBLENDVPS Class 4B — AVX / SSE Vector (VEX.L = 1) (unused) Class 4B-X — SSE / AVX / AVX2 (VEX.L = 1 && !AVX2) MPSADBW VMPSADBW PACKSSDW VPACKSSDW PACKSSWB VPACKSSWB PACKUSDW VPACKUSDW PACKUSWB VPACKUSWB PADDB VPADDB PADDD VPADDD PADDQ VPADDQ PADDSB VPADDSB PADDSW VPADDSW PADDUSB VPADDUSB PADDUSW VPADDUSW PADDW VPADDW PALIGNR VPALIGNR PAND VPAND PANDN VPANDN PAVGB VPAVGB PAVGW VPAVGW PBLENDW VPBLENDW PCMPEQB VPCMPEQB PCMPEQD VPCMPEQD PCMPEQQ VPCMPEQQ PCMPEQW VPCMPEQW PCMPGTB VPCMPGTB PCMPGTD VPCMPGTD PCMPGTQ VPCMPGTQ PCMPGTW VPCMPGTW PHADDD VPHADDD PHADDSW VPHADDSW PHADDW VPHADDW PHSUBD VPHSUBD PHSUBW VPHSUBW PHSUBSW VPHSUBSW PMADDUBSW VPMADDUBSW 882 Extended Type Legacy Type AVX SSE2 AVX SSE AVX SSE4.1 AVX SSE4.1 — — AVX, AVX2 SSE4.1 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE4.1 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSSE3 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE AVX, AVX2 SSE AVX, AVX2 SSE4.1 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE4.1 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE4.2 AVX, AVX2 SSE2 AVX, AVX2 SSSE3 AVX, AVX2 SSSE3 AVX, AVX2 SSSE3 AVX, AVX2 SSSE3 AVX, AVX2 SSSE3 AVX, AVX2 SSSE3 AVX, AVX2 SSSE3 26568—Rev. 3.22—May 2018 AMD64 Technology Table 3-1. 
Instructions By Exception Class (continued) Mnemonic PMADDWD VPMADDWD PMAXSB VPMAXSB PMAXSD VPMAXSD PMAXSW VPMAXSW PMAXUB VPMAXUB PMAXUD VPMAXUD PMAXUW VPMAXUW PMINSB VPMINSB PMINSD VPMINSD PMINSW VPMINSW PMINUB VPMINUB PMINUD VPMINUD PMINUW VPMINUW PMULDQ VPMULDQ PMULHRSW VPMULHRSW PMULHUW VPMULHUW PMULHW VPMULHW PMULLD VPMULLD PMULLW VPMULLW PMULUDQ VPMULUDQ POR VPOR PSADBW VPSADBW PSHUFB VPSHUFB PSIGNB VPSIGNB PSIGND VPSIGND PSIGNW VPSIGNW PSUBB VPSUBB PSUBD VPSUBD PSUBQ VPSUBQ PSUBSB VPSUBSB PSUBSW VPSUBSW PSUBUSB VPSUBUSB PSUBUSW VPSUBUSW PSUBW VPSUBW PUNPCKHBW VPUNPCKHBW PUNPCKHDQ VPUNPCKHDQ PUNPCKHQDQ VPUNPCKHQDQ PUNPCKHWD VPUNPCKHWD PUNPCKLBW VPUNPCKLBW PUNPCKLDQ VPUNPCKLDQ PUNPCKLQDQ VPUNPCKLQDQ PUNPCKLWD VPUNPCKLWD Extended Type Legacy Type AVX, AVX2 SSE2 AVX, AVX2 SSE4.1 AVX, AVX2 SSE4.1 AVX, AVX2 SSE AVX, AVX2 SSE AVX, AVX2 SSE4.1 AVX, AVX2 SSE4.1 AVX, AVX2 SSE4.1 AVX, AVX2 SSE4.1 AVX, AVX2 SSE AVX, AVX2 SSE AVX, AVX2 SSE4.1 AVX, AVX2 SSE4.1 AVX, AVX2 SSE4.1 AVX, AVX2 SSSE3 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE4.1 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE AVX, AVX2 SSSE3 AVX, AVX2 SSSE3 AVX, AVX2 SSSE3 AVX, AVX2 SSSE3 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 AVX, AVX2 SSE2 883 AMD64 Technology 26568—Rev. 3.22—May 2018 Table 3-1. 
Instructions By Exception Class (continued) Mnemonic Extended Type Legacy Type AVX, AVX2 SSE2 PXOR VPXOR Class 4C — AVX / SSE Vector (VEX.vvvv != 1111b) AVX SSE3 MOVSHDUP VMOVSHDUP AVX SSE3 MOVSLDUP VMOVSLDUP AVX SSE4.1 PTEST VPTEST AVX SSE RCPPS VRCPPS AVX SSE RSQRTPS VRSQRTPS Class 4C-1 — AVX / SSE Vector (write to RO memory, VEX.vvvv != 1111b) AVX SSE3 LDDQU VLDDQU AVX SSE2 MOVDQU VMOVDQU AVX SSE2 MOVUPD VMOVUPD AVX SSE MOVUPS VMOVUPS Class 4D — AVX / SSE Vector (VEX.vvvv != 1111b, VEX.L = 1) AVX SSE2 MASKMOVDQU VMASKMOVDQU AVX SSE4.2 PCMPESTRI VPCMPESTRI AVX SSE4.2 PCMPESTRM VPCMPESTRM AVX SSE4.2 PCMPISTRI VPCMPISTRI AVX SSE4.2 PCMPISTRM VPCMPISTRM AVX SSE4.1 PHMINPOSUW VPHMINPOSUW Class 4D-X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2)) AVX, AVX2 SSSE3 PABSB VPABSB AVX, AVX2 SSSE3 PABSD VPABSD AVX, AVX2 SSSE3 PABSW VPABSW AVX, AVX2 SSE2 PSHUFD VPSHUFD AVX, AVX2 SSE2 PSHUFHW VPSHUFHW AVX, AVX2 SSE2 PSHUFLW VPSHUFLW Class 4E — AVX / SSE Vector (VEX.W = 1, VEX.L = 1) — — (unused) Class 4E-X — SSE / AVX / AVX2 Vector (VEX.W = 1, (VEX.L = 1 && !AVX2)) AVX SSE4.1 PBLENDVB VPBLENDVB Class 4F — AVX / SSE (VEX.L = 1) — — (unused) Class 4F-X — SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2) AVX, AVX2 SSE2 PSLLD VPSLLD AVX, AVX2 SSE2 PSLLQ VPSLLQ AVX, AVX2 SSE2 PSLLW VPSLLW AVX, AVX2 SSE2 PSRAD VPSRAD AVX, AVX2 SSE2 PSRAW VPSRAW AVX, AVX2 SSE2 PSRLD VPSRLD AVX, AVX2 SSE2 PSRLQ VPSRLQ AVX, AVX2 SSE2 PSRLW VPSRLW Class 4G — AVX Vector (VEX.W = 1, VEX.vvvv != 1111b) 884 26568—Rev. 3.22—May 2018 AMD64 Technology Table 3-1. 
Instructions By Exception Class (continued) Mnemonic Extended Type VTESTPD VTESTPS Class 4H — AVX, 256-bit only (VEX.L = 0; No SIMD Exceptions) VPERMD VPERMPS Class 4H-1 — AVX2, 256-bit only (VEX.L = 0, VEX.vvvv != 1111b) VPERMPD VPERMQ Class 4J — AVX2 (VEX.W = 1) VPBLENDD VPSRAVD Legacy Type AVX — AVX — AVX2 — AVX2 — AVX2 — AVX2 — AVX2 AVX2 — — AVX2 AVX2 AVX2 AVX2 AVX2 AVX2 — — — — — — Class 4K — AVX2 VPMASKMOVD VPMASKMOVQ VPSLLVD VPSLLVQ VPSRLVD VPSRLVQ Class 5 — AVX / SSE Scalar AVX RCPSS VRCPSS AVX RSQRTSS VRSQRTSS Class 5A — AVX / SSE Scalar (VEX.L = 1) AVX INSERTPS VINSERTPS Class 5B — AVX / SSE Scalar (VEX.vvvv != 1111b) AVX CVTDQ2PD VCVTDQ2PD AVX MOVDDUP VMOVDDUP Class 5C — AVX /SSE Scalar (VEX.vvvv != 1111b, VEX.L = 1) AVX PINSRB VPINSRB AVX PINSRD VPINSRD AVX PINSRQ VPINSRQ AVX PINSRW VPINSRW Class 5C-X — SSE / AVX / AVX2 Scalar (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2)) AVX, AVX2 PMOVSXBD VPMOVSXBD AVX, AVX2 PMOVSXBQ VPMOVSXBQ AVX, AVX2 PMOVSXBW VPMOVSXBW AVX, AVX2 PMOVSXDQ VPMOVSXDQ AVX, AVX2 PMOVSXWD VPMOVSXWD AVX, AVX2 PMOVSXWQ VPMOVSXWQ AVX, AVX2 PMOVZXBD VPMOVZXBD AVX, AVX2 PMOVZXBQ VPMOVZXBQ AVX, AVX2 PMOVZXBW VPMOVZXBW AVX, AVX2 PMOVZXDQ VPMOVZXDQ SSE SSE SSE4.1 SSE2 SSE3 SSE4.1 SSE4.1 SSE4.1 SSE SSE4.1 SSE4.1 SSE4.1 SSE4.1 SSE4.1 SSE4.1 SSE4.1 SSE4.1 SSE4.1 SSE4.1 885 AMD64 Technology 26568—Rev. 3.22—May 2018 Table 3-1. 
Instructions By Exception Class (continued) Mnemonic Extended Type Legacy Type AVX, AVX2 SSE4.1 PMOVZXWD VPMOVZXWD AVX, AVX2 SSE4.1 PMOVZXWQ VPMOVZXWQ Class 5C-1 — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1) AVX SSE4.1 EXTRACTPS VEXTRACTPS AVX SSE2 MOVD VMOVD AVX SSE2 MOVQ VMOVQ AVX SSE4.1 PEXTRB VPEXTRB AVX SSE4.1 PEXTRD VPEXTRD AVX SSE4.1 PEXTRQ VPEXTRQ AVX SSE4.1 PEXTRW VPEXTRW Class 5D — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b (variant)) AVX SSE2 MOVSD VMOVSD AVX SSE MOVSS VMOVSS Class 5E — AVX / SSE Scalar (write to RO, VEX.vvvv != 1111b (variant), VEX.L = 1) AVX SSE2 MOVHPD VMOVHPD AVX SSE MOVHPS VMOVHPS AVX SSE2 MOVLPD VMOVLPD AVX SSE MOVLPS VMOVLPS Class 6 — AVX Mixed Memory Argument — — (unused) Class 6A — AVX Mixed Memory Argument (VEX.W = 1) — — (unused) Class 6A-1 — AVX Mixed Memory Argument (write to RO memory, VEX.W = 1) AVX — VMASKMOVPD AVX — VMASKMOVPS Class 6B — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0) AVX — VINSERTF128 AVX2 — VINSERTI128 AVX — VPERM2F128 AVX2 — VPERM2I128 Class 6B-1 — AVX Mixed Memory Argument (write to RO, VEX.W = 1, VEX.L = 0) AVX — VEXTRACTF128 Class 6C — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0, VEX.vvvv != 1111b) AVX — VBROADCASTF128 AVX2 — VBROADCASTI128 AVX2 — VEXTRACTI128 Class 6C-X — AVX / AVX2 (W=1, vvvv!=1111b, L=0, (reg src op specified && !AVX2)) AVX, AVX2 — VBROADCASTSD Class 6D — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b) AVX2 — VPBROADCASTB AVX2 — VPBROADCASTD AVX2 — VPBROADCASTQ 886 26568—Rev. 3.22—May 2018 AMD64 Technology Table 3-1. 
Instructions By Exception Class (continued) Mnemonic Extended Type Legacy Type AVX2 — VPBROADCASTW Class 6D-X — AVX / AVX2 (W = 1, vvvv != 1111b, (ModRM.mod = 11b && !AVX2)) AVX, AVX2 — VBROADCASTSS Class 6E — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b (variant)) AVX — VPERMILPD AVX — VPERMILPS Class 6F — AVX2 (VEX.W = 1, VEX.vvvv != 1111b, VEX.L = 0, ModRM.mod = 11b) AVX2 — VBROADCASTI128 Class 7 — AVX / SSE No Memory Argument — — (unused) Class 7A — AVX /SSE No Memory Argument (VEX.L = 1) AVX SSE MOVHLPS VMOVHLPS AVX SSE MOVLHPS VMOVLHPS Class 7A-X SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2) AVX, AVX2 SSE2 PSLLDQ VPSLLDQ AVX, AVX2 SSE2 PSRLDQ VPSRLDQ Class 7B — AVX /SSE No Memory Argument (VEX.vvvv != 1111b) AVX SSE2 MOVMSKPD VMOVMSKPD AVX SSE MOVMSKPS VMOVMSKPS Class 7C — AVX / SSE No Memory Argument (VEX.vvvv != 1111b, VEX.L = 1) — — (not used) Class 7C-X SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2)) AVX, AVX2 SSE2 PMOVMSKB VPMOVMSKB Class 8 — AVX No Memory Argument (VEX.vvvv != 1111b, VEX.W = 1) AVX — VZEROALL AVX — VZEROUPPER Class 9 — AVX 4-byte Argument (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1) AVX SSE STMXCSR VSTMXCSR Class 9A — AVX 4-byte argument (reserved MBZ = 1, VEX.vvvv != 1111b, VEX.L = 1) AVX SSE LDMXCSR VLDMXCSR 887 AMD64 Technology 26568—Rev. 3.22—May 2018 Table 3-1. 
Instructions By Exception Class (continued) Mnemonic Extended Type Class 10 — XOP Base XOP VPCMOV XOP VPCOMB XOP VPCOMD XOP VPCOMQ XOP VPCOMUB XOP VPCOMUD XOP VPCOMUQ XOP VPCOMUW XOP VPCOMW XOP VPERMIL2PS XOP VPERMIL2PD Class 10A — XOP Base (XOP.L = 1) XOP VPPERM XOP VPSHAB XOP VPSHAD XOP VPSHAQ XOP VPSHAW XOP VPSHLB XOP VPSHLD XOP VPSHLQ XOP VPSHLW Class 10B — XOP Base (XOP.W = 1, XOP.L = 1) XOP VPMACSDD XOP VPMACSDQH XOP VPMACSDQL XOP VPMACSSDD XOP VPMACSSDQH XOP VPMACSSDQL XOP VPMACSSWD XOP VPMACSSWW XOP VPMACSWD XOP VPMACSWW XOP VPMADCSSWD XOP VPMADCSWD Class 10C — XOP Base (XOP.W = 1, XOP.vvvv != 1111b, XOP.L = 1) XOP VPHADDBD XOP VPHADDBQ XOP VPHADDBW XOP VPHADDD XOP VPHADDDQ XOP VPHADDUBD 888 Legacy Type — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — 26568—Rev. 3.22—May 2018 AMD64 Technology Table 3-1. Instructions By Exception Class (continued) Mnemonic Extended Type XOP VPHADDUBQ XOP VPHADDUBW XOP VPHADDUDQ XOP VPHADDUWD XOP VPHADDUWQ XOP VPHADDWD XOP VPHADDWQ XOP VPHSUBBW XOP VPHSUBDQ XOP VPHSUBWD Class 10D — XOP Base (SIMD 110011, XOP.vvvv != 1111b, XOP.W = 1) XOP VFRCZPD XOP VFRCZPS XOP VFRCZSD XOP VFRCZSS Class 10E — XOP Base (XOP.vvvv != 1111b (variant), XOP.L = 1) XOP VPROTB XOP VPROTD XOP VPROTQ XOP VPROTW Class 11 — F16C Instructions F16C VCVTPH2PS F16C VCVTPS2PH Class 12 — AVX2 VSID (ModRM.mod = 11b, ModRM.rm != 100b) VGATHERDPD VGATHERDPS VGATHERQPD VGATHERQPS VPGATHERDD VPGATHERDQ VPGATHERQD VPGATHERQQ AVX2 AVX2 AVX2 AVX2 AVX2 AVX2 AVX2 AVX2 Class FMA-2 — FMA / FMA4 Vector (SIMD Exceptions PE, UE, OE, DE, IE) FMA4 VFMADDPD FMA4 VFMADDPS FMA4 VFMADDSUBPD FMA4 VFMADDSUBPS FMA4 VFMSUBADDPD FMA4 VFMSUBADDPS FMA4 VFMSUBPD FMA4 VFMSUBPS FMA4 VFNMADDPD Legacy Type — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — 889 AMD64 Technology 26568—Rev. 3.22—May 2018 Table 3-1. 
Instructions By Exception Class (continued) Mnemonic Extended Type FMA4 VFNMADDPS FMA4 VFNMSUBPD FMA4 VFNMSUBPS Class FMA-3 — FMA / FMA4 Scalar (SIMD Exceptions PE, UE, OE, DE, IE) FMA4 VFMADDSD FMA4 VFMADDSS FMA4 VFMSUBSD FMA4 VFMSUBSS FMA4 VFNMADDSD FMA4 VFNMADDSS FMA4 VFNMSUBSD FMA4 VFNMSUBSS Unique Cases — XGETBV — XRSTOR — XSAVE/XSAVEOPT — XSETBV 890 Legacy Type — — — — — — — — — — — — — — — 26568—Rev. 3.22—May 2018 AMD64 Technology Class 1 — AVX / SSE Vector Aligned (VEX.vvvv != 1111) Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP X S S A A A A X X X X S X A Page fault, #PF X — AVX and SSE exception A — AVX exception S — SSE exception S X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not aligned on a 16-byte boundary. Write to a read-only data segment. VEX256: Memory operand not 32-byte aligned. VEX128: Memory operand not 16-byte aligned. Null data segment used to reference memory. Instruction execution caused a page fault. 891 AMD64 Technology 26568—Rev. 
3.22—May 2018 Class 1X — SSE / AXV / AVX2 Vector (VEX.vvvv != 1111b or VEX.L=1 && !AVX2) Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS X S S A A A A A X X X X S X General protection, #GP A Page fault, #PF S X — AVX, AVX2, and SSE exception A — AVX, AVX2 exception S — SSE exception 892 X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.vvvv ! = 1111b. VEX.L = 1 when AVX2 not supported. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Memory operand not aligned on a 16-byte boundary. Write to a read-only data segment. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Null data segment used to reference memory. Instruction execution caused a page fault. 26568—Rev. 3.22—May 2018 AMD64 Technology Class 2 — AVX / SSE Vector (SIMD 111111) Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Division by zero, ZE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception S S S S S S S S S S S S S S X X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Division of finite dividend by zero-value divisor. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. 893 AMD64 Technology 26568—Rev. 3.22—May 2018 Class 2-1 — AVX / SSE Vector (SIMD 111011) Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 894 S S S S S S S S S S S S X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. 26568—Rev. 3.22—May 2018 AMD64 Technology Class 2-2 — AVX / SSE Vector (SIMD 000011) Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE X — AVX and SSE exception A — AVX exception S — SSE exception S S S S S S X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. 895 AMD64 Technology 26568—Rev. 3.22—May 2018 Class 2-3 — AVX / SSE Vector (SIMD 100001) Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X S X S S S S A X S X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. 
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 896 S S S S S S X X X A source operand was an SNaN value. Undefined operation. A result could not be represented exactly in the destination format. 26568—Rev. 3.22—May 2018 AMD64 Technology Class 2A — AVX / SSE Vector (SIMD 111111, VEX.L = 1) Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S S X X X S X X S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC S X A SIMD floating-point, #XF S S S S S S S S S S S S S S S S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Division by zero, ZE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception X X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Division of finite dividend by zero-value divisor. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. 897 AMD64 Technology 26568—Rev. 3.22—May 2018 Class 2A-1 — AVX / SSE Vector (SIMD 111011, VEX.L = 1) Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.L = 1. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. 
Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE X — AVX and SSE exception A — AVX exception S — SSE exception 898 X X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. 26568—Rev. 3.22—May 2018 AMD64 Technology Class 2B — AVX / SSE Vector (SIMD 111111, VEX.vvvv != 1111b) Exceptions Exception Mode Real Virt Prot X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S S S Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD floating-point, #XF X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR0.EM = 1. CR4.OSFXSR = 0. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Non-aligned memory operand while MXCSR.MM = 0. Null data segment used to reference memory. Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1. Memory operand not 16-byte aligned when alignment checking enabled. Instruction execution caused a page fault. 
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X X X X X
A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

Class 2B-1 — AVX / SSE Vector (SIMD 100000, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X
Precision, PE
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception
S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
X
A result could not be represented exactly in the destination format.

Class 2B-2 — AVX / SSE Vector (SIMD 100001, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X
A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.

Class 2B-3 — AVX / SSE Vector (SIMD 111011, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X X X X
A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

Class 2B-4 — AVX / SSE Vector (SIMD 100011, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X S X S S S S A X S S X S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Non-aligned memory operand while MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X X
A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
A result could not be represented exactly in the destination format.
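
The exception classes in this section repeat the same enablement-related #UD causes for AVX encodings: the instruction is not reported by CPUID, CR4.OSXSAVE = 0 (as indicated by CPUID Fn0000_0001_ECX[OSXSAVE]), or XFEATURE_ENABLED_MASK[2:1] != 11b. The following is a minimal sketch of that check in C. The function name `avx_enabled` and the decision to pass the register values in as arguments are assumptions made for illustration; a real program would read them with CPUID and XGETBV.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID Fn0000_0001_ECX. */
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)

/* XFEATURE_ENABLED_MASK (XCR0) bits [2:1]: SSE state and YMM state. */
#define XCR0_SSE_YMM_MASK  0x6ull

/* Hypothetical helper (not part of the manual): returns true when none of
 * the enablement-related #UD causes listed in the tables applies to an
 * AVX instruction. */
bool avx_enabled(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_AVX))      /* instruction not supported */
        return false;
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))  /* CR4.OSXSAVE = 0 */
        return false;
    /* XFEATURE_ENABLED_MASK[2:1] must equal 11b. */
    return (xcr0 & XCR0_SSE_YMM_MASK) == XCR0_SSE_YMM_MASK;
}
```

Taking the raw register values as parameters keeps the sketch portable; the other listed causes (prefix combinations, VEX field values, CR0.EM) depend on the encoding and on privileged state, so they are not modelled here.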

Class 3 — AVX / SSE Scalar (SIMD 111111) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X X X X X
A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.
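
The six SIMD floating-point conditions listed above (IE, DE, ZE, OE, UE, PE) are reported in the low six bits of MXCSR, and each has a corresponding mask bit seven positions higher. As a rough sketch, a #XF handler could identify which flagged conditions are unmasked as follows. The function and constant names are illustrative assumptions, and the MXCSR value is passed in as a plain integer rather than read with STMXCSR.

```c
#include <stdint.h>

/* MXCSR exception flag bits, in the order used by the tables. */
enum {
    MXCSR_IE = 1u << 0,  /* invalid operation */
    MXCSR_DE = 1u << 1,  /* denormalized operand */
    MXCSR_ZE = 1u << 2,  /* division by zero */
    MXCSR_OE = 1u << 3,  /* overflow */
    MXCSR_UE = 1u << 4,  /* underflow */
    MXCSR_PE = 1u << 5   /* precision */
};

#define MXCSR_FLAG_BITS  0x3Fu  /* bits 5:0 hold the exception flags */
#define MXCSR_MASK_SHIFT 7      /* bits 12:7 hold the matching masks */

/* Hypothetical helper: returns the set of exception flags that are set
 * while their mask bit is clear. */
uint32_t unmasked_simd_flags(uint32_t mxcsr)
{
    uint32_t flags = mxcsr & MXCSR_FLAG_BITS;
    uint32_t masks = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_FLAG_BITS;
    return flags & ~masks;
}
```

With the reset value 1F80h every exception is masked, so the helper returns 0 whatever flags are set; clearing a mask bit such as ZM (bit 9) lets the corresponding flag show through.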

Class 3-1 — AVX / SSE Scalar (SIMD 111011) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X X X X
A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

Class 3-2 — AVX / SSE Scalar (SIMD 000011) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X
A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.

Class 3-3 — AVX / SSE Scalar (SIMD 100000) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X
Precision, PE
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception
S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
X
A result could not be represented exactly in the destination format.

Class 3-4 — AVX / SSE Scalar (SIMD 100001) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X
A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.

Class 3-5 — AVX / SSE Scalar (SIMD 100011) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X X
A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
A result could not be represented exactly in the destination format.

Class 3A — AVX / SSE Scalar (SIMD 111111, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
Division by zero, ZE
Overflow, OE
Underflow, UE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X X X X X
A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.
Division of finite dividend by zero-value divisor.
Rounded result too large to fit into the format of the destination operand.
Rounded result too small to fit into the format of the destination operand.
A result could not be represented exactly in the destination format.

Class 3A-1 — AVX / SSE Scalar (SIMD 000011, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Denormalized operand, DE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X
A source operand was an SNaN value.
Undefined operation.
A source operand was a denormal value.

Class 3A-2 — AVX / SSE Scalar (SIMD 100001, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S A A A A X S S X S S S S S S S S X X X X X X S S X S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
SIMD floating-point, #XF
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details.
SIMD Floating-Point Exceptions
Invalid operation, IE
Precision, PE
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X X X
A source operand was an SNaN value.
Undefined operation.
A result could not be represented exactly in the destination format.

Class 4 — AVX / SSE Vector Exceptions
Exception Mode Real Virt Prot
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X A S S X A S S X S S S S S S S S S S S S S S A X S S A A A X X X X S X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Memory operand not 16-byte aligned and MXCSR.MM = 0.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4A — AVX / SSE Vector (VEX.W = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S S S A X
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X S S A A A A X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4B — AVX / SSE Vector (VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S S S A X
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X S S A A A A X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4B-X — SSE / AVX / AVX2 (VEX.L = 1 && !AVX2) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
X S S A A A A X X X X X S
Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.
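
The #AC cause in Class 4B-X depends on operand width: with alignment checking enabled, a 128-bit memory operand must be 16-byte aligned and a 256-bit operand must be 32-byte aligned. A small sketch of that address test in C is shown below. The function name and parameters are hypothetical, and the code deliberately ignores whether alignment checking is actually enabled, since that depends on privileged state not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when an operand of the given width
 * (128 or 256 bits) starts at an address that is not naturally aligned,
 * i.e. not a multiple of 16 or 32 bytes respectively. */
bool operand_misaligned(uintptr_t addr, unsigned operand_bits)
{
    uintptr_t bytes = operand_bits / 8u;   /* 16 or 32 */
    return (addr & (bytes - 1u)) != 0;     /* valid for power-of-two sizes */
}
```

An address such as 1010h satisfies the 16-byte requirement but not the 32-byte one, so the same pointer can be acceptable for a 128-bit access and fault for a 256-bit access.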

Class 4C — AVX / SSE Vector (VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S S S A X
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X S S A A A A X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4C-1 — AVX / SSE Vector (write to RO memory, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S
Alignment check, #AC
S
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception
S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
X S S A A A A X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4D — AVX / SSE Vector (VEX.vvvv != 1111b, VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S S S A X
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X S S A A A A A X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4D-X — SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2)) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
X S S A A A A A X X X X X S
Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

Class 4E — AVX / SSE Vector (VEX.W = 1, VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S S S A X
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X S S A A A A A X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.
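
Several of the classes above are distinguished only by which VEX fields produce #UD (VEX.W = 1, VEX.L = 1, VEX.vvvv != 1111b). As an illustration, the sketch below pulls those fields out of a two-byte (C5h) or three-byte (C4h) VEX prefix and applies the Class 4E conditions. The struct and function names are assumptions made for this example, and the code compares the raw encoded vvvv field, which is the form the tables use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the VEX fields the exception tables refer to. */
struct vex_fields {
    unsigned w;     /* VEX.W (0 for the two-byte form) */
    unsigned vvvv;  /* raw encoded VEX.vvvv; 1111b means "unused" */
    unsigned l;     /* VEX.L: 0 = 128-bit, 1 = 256-bit */
};

/* Decode the fields from a prefix starting at p[0] (C4h or C5h).
 * Returns false if p does not point at a VEX escape byte. */
bool vex_decode(const uint8_t *p, struct vex_fields *out)
{
    uint8_t last;
    if (p[0] == 0xC4) {          /* three-byte form: W.vvvv.L.pp is byte 2 */
        last = p[2];
        out->w = (last >> 7) & 1u;
    } else if (p[0] == 0xC5) {   /* two-byte form: R.vvvv.L.pp is byte 1 */
        last = p[1];
        out->w = 0;
    } else {
        return false;
    }
    out->vvvv = (last >> 3) & 0xFu;
    out->l = (last >> 2) & 1u;
    return true;
}

/* Class 4E field conditions: #UD if VEX.W = 1 or VEX.L = 1. */
bool class_4e_field_ud(const struct vex_fields *f)
{
    return f->w == 1u || f->l == 1u;
}
```

The other classes would swap in their own predicate, for example `f->vvvv != 0xF` for the classes qualified by VEX.vvvv != 1111b.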
Class 4E-X — SSE / AVX / AVX2 Vector (VEX.W = 1, (VEX.L = 1 && !AVX2)) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
X S S A A A A A X X X X X S
Alignment check, #AC
A
Page fault, #PF
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception
X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

Class 4F — AVX / SSE (VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X S S A A A A X X A A A S A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4F-X — SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
X S S A A A A X X A A A S
Alignment check, #AC
A
Page fault, #PF
X — AVX, AVX2, and SSE exception
A — AVX and AVX2 exception
S — SSE exception
A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
When alignment checking enabled:
• 128-bit memory operand not 16-byte aligned.
• 256-bit memory operand not 32-byte aligned.
Instruction execution caused a page fault.
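The operand-size rule in the alignment entries above (a 128-bit memory operand must be 16-byte aligned, a 256-bit memory operand 32-byte aligned) can be illustrated with a small address test. This is a sketch under stated assumptions: the function name is hypothetical, the operand width is assumed to be 128 or 256 bits, and the test only evaluates the address; whether a misaligned access actually faults still depends on alignment checking being enabled and on the other conditions listed for the class.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when addr is not naturally aligned for an operand of
 * operand_bits bits. Assumes operand_bits is 128 or 256, so the
 * required alignment (16 or 32 bytes) is a power of two.
 */
static bool operand_misaligned(uint64_t addr, unsigned operand_bits)
{
    uint64_t required = operand_bits / 8; /* 128 -> 16, 256 -> 32 */
    return (addr & (required - 1)) != 0;
}
```

For example, address 1010h is aligned for a 128-bit operand but misaligned for a 256-bit operand.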
Class 4G — AVX Vector (VEX.W = 1, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X X X X X X X X X X X X X X X X S S S X A X
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
X — AVX exception
X X X X X X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled and MXCSR.MM = 1.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4H — AVX, 256-bit only (VEX.L = 0; No SIMD Exceptions) Exceptions
Exception Mode Real Virt Prot
A A A A A A A A A A A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception
A A A A A A A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 0.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4H-1 — AVX2, 256-bit only (VEX.L = 0, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
A A A A A A A A A A A A A A A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Alignment check, #AC
Page fault, #PF
A — AVX2 exception
A A A A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 0.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Memory operand not 16-byte aligned when alignment checking enabled.
Instruction execution caused a page fault.

Class 4J — AVX2 (VEX.W = 1) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
A A A A A A A A A A A A
Alignment check, #AC
A
Page fault, #PF
A — AVX2 exception
A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

Class 4K — AVX2 Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
A A A A A A A A A A A
Alignment check, #AC
A
Page fault, #PF
A — AVX2 exception
A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned.
Instruction execution caused a page fault.

Class 5 — AVX / SSE Scalar Exceptions
Exception Mode Real Virt Prot
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X A S S X A S S S S S S S S S S S S X S S A A A X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5A — AVX / SSE Scalar (VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception
S S X S S A A A A X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5B — AVX / SSE Scalar (VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception
S S X S S A A A A X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference with alignment checking enabled.

Class 5C — AVX / SSE Scalar (VEX.vvvv != 1111b, VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception
S S X S S A A A A A X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5C-X — SSE / AVX / AVX2 Scalar (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2)) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
S
Alignment check, #AC
S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception
X S S A A A A A X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5C-1 — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception
S S X S S A A A A A X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.
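The encoding conditions named in the class headings above (VEX.vvvv != 1111b, VEX.L = 1, and in other classes VEX.W = 1) refer to fields of the VEX prefix. As a rough illustration, the sketch below extracts those fields from the prefix bytes. It assumes the standard VEX layout (three-byte form introduced by C4h, two-byte form by C5h, with W, vvvv, L and pp packed into the final prefix byte) and ignores the legacy-mode reuse of C4h and C5h for other instructions. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the VEX fields named in the class headings. */
typedef struct {
    bool    w;    /* VEX.W (implied 0 in the two-byte form)           */
    uint8_t vvvv; /* VEX.vvvv as encoded; 1111b means "no operand"    */
    bool    l;    /* VEX.L: 0 selects 128-bit, 1 selects 256-bit      */
} vex_fields;

/*
 * Decodes W, vvvv and L from a VEX prefix starting at p.
 * Returns false if p[0] is neither C4h nor C5h.
 */
static bool vex_decode(const uint8_t *p, vex_fields *out)
{
    uint8_t last;

    if (p[0] == 0xC4) {        /* three-byte form: C4, RXB.mmmmm, W.vvvv.L.pp */
        last = p[2];
        out->w = (last >> 7) & 1;
    } else if (p[0] == 0xC5) { /* two-byte form: C5, R.vvvv.L.pp */
        last = p[1];
        out->w = false;
    } else {
        return false;
    }

    out->vvvv = (last >> 3) & 0xF;
    out->l = (last >> 2) & 1;
    return true;
}

/*
 * Encoding test for a class that lists both VEX.vvvv != 1111b and
 * VEX.L = 1 as causes (for example Class 5C-1 above).
 */
static bool vex_violates_vvvv_and_l(const vex_fields *f)
{
    return f->vvvv != 0xF || f->l;
}
```

A decoder would apply a different predicate per exception class, since each class lists its own combination of W, vvvv and L conditions.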
Class 5D — AVX / SSE Scalar (write to RO memory, VEX.vvvv != 1111b (variant)) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception
S S X S S A A A A X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 5E — AVX / SSE Scalar (write to RO, VEX.vvvv != 1111b (variant), VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S S S S S S S S S S S
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception
S S X S S A A A A A X X X X X X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b (for memory destination encoding only).
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6 — AVX Mixed Memory Argument Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.
A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6A — AVX Mixed Memory Argument (VEX.W = 1) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.
A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6A-1 — AVX Mixed Memory Argument (write to RO memory, VEX.W = 1) Exceptions
Exception Mode Real Virt Prot
A A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
S
Page fault, #PF
A — AVX exception.
S A A A A A A A A A X A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Write to a read-only data segment.
Instruction execution caused a page fault.

Class 6B — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.
A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.L = 0.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.

Class 6B-1 — AVX Mixed Memory Argument (write to RO, VEX.W = 1, VEX.L = 0) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.
A A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.L = 0.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Memory operand not 16-byte aligned when alignment checking enabled.

Class 6C — AVX Mixed Memory Argument (VEX.W = 1, VEX.L = 0, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.
A A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
VEX.L = 0.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6C-X — AVX / AVX2 (W=1, vvvv!=1111b, L=0, (reg src op specified && !AVX2)) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX, AVX2 exception.
A A A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
VEX.L = 0.
Register-based source operand specified when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6D — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.
A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6D-X — AVX / AVX2 (W = 1, vvvv != 1111b, (ModRM.mod = 11b && !AVX2)) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX, AVX2 exception.
A A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
MODRM.mod = 11b when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6E — AVX Mixed Memory Argument (VEX.W = 1, VEX.vvvv != 1111b (variant)) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.
A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b (for versions with immediate byte operand only).
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 6F — AVX2 (VEX.W = 1, VEX.vvvv != 1111b, VEX.L = 0, ModRM.mod = 11b) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
A — AVX exception.
A A A A A A A A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
VEX.L = 0.
Register-based source operand specified (MODRM.mod = 11b).
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 7 — AVX / SSE No Memory Argument Exceptions
Exception Mode Real Virt Prot
Invalid opcode, #UD
Device not available, #NM
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X A S S X A S S X S X S X S S A A A X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
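Several classes above (6C-X, 6D-X, 6F) list a register-based source operand, written as MODRM.mod = 11b, among their causes. The following sketch shows how that field can be read from a ModRM byte. It assumes the usual ModRM layout, in which mod occupies the two most significant bits; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the 2-bit mod field (bits 7:6) of a ModRM byte. */
static unsigned modrm_mod(uint8_t modrm)
{
    return modrm >> 6;
}

/*
 * Returns true when the ModRM byte selects a register operand
 * (mod = 11b) rather than a memory operand.
 */
static bool modrm_is_register_operand(uint8_t modrm)
{
    return modrm_mod(modrm) == 3;
}
```

With mod = 11b no memory access is made through the ModRM operand, so the memory-related causes in a class cannot arise for that encoding.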
Class 7A — AVX / SSE No Memory Argument (VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S X
Device not available, #NM
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X S
Invalid opcode, #UD
X S S A A A A X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

Class 7A-X SSE / AVX / AVX2 Vector (VEX.L = 1 && !AVX2) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S
Invalid opcode, #UD
X X
Device not available, #NM
S S
X — SSE, AVX, and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception
X S S A A A A X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.L = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

Class 7B — AVX / SSE No Memory Argument (VEX.vvvv != 1111b) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S X
Device not available, #NM
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X S
Invalid opcode, #UD
X S S A A A A X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

Class 7C — AVX / SSE No Memory Argument (VEX.vvvv != 1111b, VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S X
Device not available, #NM
S
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X S
Invalid opcode, #UD
X S S A A A A A X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv field != 1111b.
VEX.L field = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

Class 7C-X SSE / AVX / AVX2 Vector (VEX.vvvv != 1111b, (VEX.L = 1 && !AVX2)) Exceptions
Exception Mode Real Virt Prot
X A S S X A S S
Invalid opcode, #UD
X X
Device not available, #NM
S S
X — SSE, AVX and AVX2 exception
A — AVX, AVX2 exception
S — SSE exception
X S S A A A A A X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR0.EM = 1.
CR4.OSFXSR = 0.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv field != 1111b.
VEX.L field = 1 when AVX2 not supported.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

Class 8 — AVX No Memory Argument (VEX.vvvv != 1111b, VEX.W = 1) Exceptions
Exception Mode Real Virt Prot
A A
Invalid opcode, #UD
Device not available, #NM
A — AVX exception.
A A A A A A A A
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.W = 1.
VEX.vvvv != 1111b.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.

Class 9 — AVX 4-byte Argument (write to RO memory, VEX.vvvv != 1111b, VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X A X A S S S S X S S S S S X S S S S S S X A S S A A A A X X X X X S X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
CR0.EM = 1.
CR4.OSFXSR = 0.
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix.
Lock prefix (F0h) preceding opcode.
CR0.TS = 1.
Memory address exceeding stack segment limit or non-canonical.
Memory address exceeding data segment limit or non-canonical.
Write to a read-only data segment.
Null data segment used to reference memory.
Instruction execution caused a page fault.
Unaligned memory reference when alignment checking enabled.

Class 9A — AVX 4-byte argument (reserved MBZ = 1, VEX.vvvv != 1111b, VEX.L = 1) Exceptions
Exception Mode Real Virt Prot
Invalid opcode, #UD
Device not available, #NM
Stack, #SS
General protection, #GP
Page fault, #PF
Alignment check, #AC
X — AVX and SSE exception
A — AVX exception
S — SSE exception
X A X A S S S S X S S S S S X S S S S S S X A S S A A A A X X X X S X X X
Cause of Exception
Instruction not supported, as indicated by CPUID feature identifier.
AVX instructions are only recognized in protected mode.
CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE].
CR0.EM = 1.
CR4.OSFXSR = 0.
XFEATURE_ENABLED_MASK[2:1] != 11b.
VEX.vvvv != 1111b.
VEX.L = 1.
REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Attempt to load non-zero values into reserved MXCSR bits Instruction execution caused a page fault. Unaligned memory reference when alignment checking enabled. 957 AMD64 Technology 26568—Rev. 3.22—May 2018 Class 10 — XOP Base Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception 958 X X X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. 26568—Rev. 3.22—May 2018 AMD64 Technology Class 10A — XOP Base (XOP.L = 1) Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception X X X X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. 
Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. 959 AMD64 Technology 26568—Rev. 3.22—May 2018 Class 10B — XOP Base (XOP.W = 1, XOP.L = 1) Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception 960 X X X X X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. 26568—Rev. 3.22—May 2018 AMD64 Technology Class 10C — XOP Base (XOP.W = 1, XOP.vvvv != 1111b, XOP.L = 1) Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception X X X X A X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. XOP.L = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. 
Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. 961 AMD64 Technology 26568—Rev. 3.22—May 2018 Class 10D — XOP Base (SIMD 110011, XOP.vvvv != 1111b, XOP.W = 1) Exceptions Exception Mode Real Virt Prot X X X X X X X X Invalid opcode, #UD X Device not available, #NM Stack, #SS X X X X X X General protection, #GP Page fault, #PF Alignment check, #AC SIMD floating-point, #XF S S X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.W = 1. XOP.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding XOP prefix. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Underflow, UE Precision, PE X — XOP exception 962 X X X X X A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. 26568—Rev. 
3.22—May 2018 AMD64 Technology Class 10E — XOP Base (XOP.vvvv != 1111b (variant), XOP.L = 1) Exceptions Exception Mode Real Virt Prot X X Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Page fault, #PF Alignment check, #AC X — XOP exception X X X X X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. XOP instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. XOP.vvvv ! = 1111b (for immediate operand variant only) XOP.L field = 1. REX, F2, F3, or 66 prefix preceding XOP prefix. Lock prefix (F0h) preceding opcode. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. 963 AMD64 Technology 26568—Rev. 3.22—May 2018 Class 11 — F16C Instructions Exceptions Exception Mode Real Virt Prot F F F F Invalid opcode, #UD F F A F F F Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF SIMD Floating-Point Exception, #XF F F F F F F F Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. VEX.W field = 1. VEX.vvvv ! = 1111b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. 
Unaligned memory reference when alignment checking enabled. Instruction execution caused a page fault. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid-operation exception (IE) F F A source operand was an SNaN value. Undefined operation. Denormalized-operand exception (DE) Overflow exception (OE) Underflow exception (UE) Precision exception (PE) F — F16C exception. F A source operand was a denormal value. F F F Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. 964 26568—Rev. 3.22—May 2018 AMD64 Technology Class 12 — AVX2 VSID (ModRM.mod = 11b, ModRM.rm != 100b) Exceptions Exception Mode Real Virt Prot A A A A A A A A A A A A A A A Invalid opcode, #UD Device not available, #NM Stack, #SS General protection, #GP Alignment check, #AC Page fault, #PF A — AVX2 exception A A A A Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. AVX instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. MODRM.mod = 11b MODRM.rm ! = 100b YMM/XMM registers specified for destination, mask, and index not unique. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Alignment checking enabled and: 256-bit memory operand not 32-byte aligned or 128-bit memory operand not 16-byte aligned. Instruction execution caused a page fault. 965 AMD64 Technology 26568—Rev. 
3.22—May 2018 Class FMA-2 — FMA / FMA4 Vector (SIMD Exceptions PE, UE, OE, DE, IE) Exceptions Exception Mode Real Virt Prot F F Invalid opcode, #UD F F F F F F Device not available, #NM Stack, #SS Page fault, #PF Alignment check, #AC F F F F F F SIMD floating-point, #XF F General protection, #GP Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. FMA instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Memory operand not 16-byte aligned when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE F — FMA, FMA4 exception 966 F F F F F F A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. 26568—Rev. 
3.22—May 2018 AMD64 Technology Class FMA-3 — FMA / FMA4 Scalar (SIMD Exceptions PE, UE, OE, DE, IE) Exceptions Exception Mode Real Virt Prot F F Invalid opcode, #UD F F F F F F Device not available, #NM Stack, #SS Page fault, #PF Alignment check, #AC F F F F F F SIMD floating-point, #XF F General protection, #GP Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. FMA instructions are only recognized in protected mode. CR4.OSXSAVE = 0, indicated by CPUID Fn0000_0001_ECX[OSXSAVE]. XFEATURE_ENABLED_MASK[2:1] ! = 11b. REX, F2, F3, or 66 prefix preceding VEX prefix. Lock prefix (F0h) preceding opcode. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0, see SIMD Floating-Point Exceptions below for details. CR0.TS = 1. Memory address exceeding stack segment limit or non-canonical. Memory address exceeding data segment limit or non-canonical. Null data segment used to reference memory. Instruction execution caused a page fault. Non-aligned memory reference when alignment checking enabled. Unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1, see SIMD Floating-Point Exceptions below for details. SIMD Floating-Point Exceptions Invalid operation, IE Denormalized operand, DE Overflow, OE Underflow, UE Precision, PE F — FMA, FMA4 exception F F F F F F A source operand was an SNaN value. Undefined operation. A source operand was a denormal value. Rounded result too large to fit into the format of the destination operand. Rounded result too small to fit into the format of the destination operand. A result could not be represented exactly in the destination format. 967 AMD64 Technology 26568—Rev. 3.22—May 2018 XGETBV Exceptions Exception Invalid opcode, #UD General protection, #GP X — exception generated 968 Mode Real Virt Prot X X X X X X X X X Cause of Exception Instruction not supported, as indicated by CPUID feature identifier. Lock prefix (F0h) preceding opcode. ECX specifies a reserved or unimplemented XCR address. 
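Many of the exception tables above list CR4.OSXSAVE = 0 and XFEATURE_ENABLED_MASK[2:1] != 11b as causes of #UD. The following is a minimal sketch of how software might test those two conditions, assuming it has already obtained the ECX value returned by CPUID Fn0000_0001 and the XCR0 value returned by XGETBV with ECX = 0. The function name and macro names are hypothetical; the bit positions (OSXSAVE at ECX bit 27, SSE and YMM state at XCR0 bits 1 and 2) are taken from the CPUID and XCR0 definitions rather than from this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; names are illustrative only. */
#define CPUID_FN1_ECX_OSXSAVE (1u << 27)  /* mirrors CR4.OSXSAVE        */
#define XCR0_SSE_STATE        (1ull << 1) /* XFEATURE_ENABLED_MASK[1]   */
#define XCR0_YMM_STATE        (1ull << 2) /* XFEATURE_ENABLED_MASK[2]   */

/* Returns true when neither of the two OS-enablement #UD causes applies:
   OSXSAVE is set and XFEATURE_ENABLED_MASK[2:1] == 11b. */
bool ymm_state_enabled(uint32_t cpuid_fn1_ecx, uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE_STATE | XCR0_YMM_STATE;

    if ((cpuid_fn1_ecx & CPUID_FN1_ECX_OSXSAVE) == 0)
        return false;               /* XGETBV itself would raise #UD */
    return (xcr0 & need) == need;
}
```

The check is written as a pure function so that it can be exercised without executing CPUID or XGETBV; a real caller would also confirm the instruction's own CPUID feature bit.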
XRSTOR Exceptions

Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD          X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                             X    X    X    CR4.OSFXSR = 0.
                             X    X    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    X    X    X    CR0.TS = 1.
Stack, #SS                   X    X    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      X    X    X    Memory address exceeding data segment limit or non-canonical.
                             X    X    X    Null data segment used to reference memory.
                             X    X    X    Memory operand not aligned on 64-byte boundary.
                             X    X    X    Any must be zero (MBZ) bits in the save area were set.
                             X    X    X    Attempt to set reserved bits in MXCSR.
Page fault, #PF              X    X    X    Instruction execution caused a page fault.
X — exception generated

XSAVE/XSAVEOPT Exceptions

Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD          X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                             X    X    X    CR4.OSFXSR = 0.
                             X    X    X    Lock prefix (F0h) preceding opcode.
Device not available, #NM    X    X    X    CR0.TS = 1.
Stack, #SS                   X    X    X    Memory address exceeding stack segment limit or non-canonical.
General protection, #GP      X    X    X    Memory address exceeding data segment limit or non-canonical.
                             X    X    X    Null data segment used to reference memory.
                             X    X    X    Memory operand not aligned on 64-byte boundary.
                             X    X    X    Attempt to write read-only memory.
Page fault, #PF              X    X    X    Instruction execution caused a page fault.
X — exception generated

XSETBV Exceptions

Exception                   Real Virt Prot  Cause of Exception
Invalid opcode, #UD          X    X    X    Instruction not supported, as indicated by CPUID feature identifier.
                             X    X    X    CR4.OSFXSR = 0.
                             X    X    X    Lock prefix (F0h) preceding opcode.
General protection, #GP      X    X    X    CPL != 0.
                             X    X    X    ECX specifies a reserved or unimplemented XCR address.
                             X    X    X    Any must be zero (MBZ) bits in the save area were set.
                             X    X    X    Writing 0 to XCR0.
X — exception generated
Note: In virtual mode, only #UD for Instruction not supported and #GP for CPL != 0 are supported.

Appendix A AES Instructions

This appendix gives background information concerning the use of the AES instruction subset in the implementation of encryption compliant with the Advanced Encryption Standard (AES).

A.1 AES Overview

This section provides an overview of AMD64 instructions that support AES software implementation.

The U.S. National Institute of Standards and Technology has adopted the Rijndael algorithm, a block cipher that processes 16-byte data blocks using a shared key of variable length, as the Advanced Encryption Standard (AES). The standard is defined in Federal Information Processing Standards Publication 197 (FIPS 197), Specification for the Advanced Encryption Standard (AES). There are three versions of the algorithm, based on key widths of 16 (AES-128), 24 (AES-192), and 32 (AES-256) bytes.

The following AMD64 instructions support AES implementation:
• AESDEC/VAESDEC and AESDECLAST/VAESDECLAST
  Perform one round of AES decryption
• AESENC/VAESENC and AESENCLAST/VAESENCLAST
  Perform one round of AES encryption
• AESIMC/VAESIMC
  Perform the AES InvMixColumn transformation
• AESKEYGENASSIST/VAESKEYGENASSIST
  Assist AES round key generation
• PCLMULQDQ, VPCLMULQDQ
  Perform carry-less multiplication

See Chapter 2, “Instruction Reference” for detailed descriptions of the instructions.

A.2 Coding Conventions

This overview uses descriptive code that has the following basic characteristics.
• Syntax and notation based on the C language
• Four numerical data types:
  - bool: The numbers 0 and 1, the values of the Boolean constants false and true
  - nat: The infinite set of all natural numbers, including bool as a subtype
  - int: The infinite set of all integers, including nat as a subtype
  - rat: The infinite set of all rational numbers, including int as a subtype
• Standard logical and arithmetic operators
• Enumeration (enum) types, arrays, structures (struct), and union types
• Global and local variable and constant declarations, initializations, and assignments
• Standard control constructs (if, then, else, for, while, switch, break, and continue)
• Function subroutines
• Macro definitions (#define)

A.3 AES Data Structures

The AES instructions operate on 16-byte blocks of text called the state. Each block is represented as a 4 × 4 matrix of bytes which is assigned the Galois field matrix data type (GFMatrix). In the AMD64 implementation, the matrices are formatted as 16-byte vectors in XMM registers or 128-bit memory locations. This overview represents each matrix as a sequence of 16 bytes in little-endian format (least significant byte on the right and most significant byte on the left).

Figure A-1 shows a state block in 4 × 4 matrix representation.

             X3,0  X2,0  X1,0  X0,0
GFMatrix  =  X3,1  X2,1  X1,1  X0,1
             X3,2  X2,2  X1,2  X0,2
             X3,3  X2,3  X1,3  X0,3

Figure A-1. GFMatrix Representation of 16-byte Block

Figure A-2 shows the AMD64 AES format, with the corresponding mapping of FIPS 197 AES “words” to operand bytes.

XMM Register or 128-bit Memory Operand

Bits      Byte   AES Word
127:120   X3,3   AES Word 3
119:112   X2,3   AES Word 3
111:104   X1,3   AES Word 3
103:96    X0,3   AES Word 3
95:88     X3,2   AES Word 2
87:80     X2,2   AES Word 2
79:72     X1,2   AES Word 2
71:64     X0,2   AES Word 2
63:56     X3,1   AES Word 1
55:48     X2,1   AES Word 1
47:40     X1,1   AES Word 1
39:32     X0,1   AES Word 1
31:24     X3,0   AES Word 0
23:16     X2,0   AES Word 0
15:8      X1,0   AES Word 0
7:0       X0,0   AES Word 0

Figure A-2.
GFMatrix to Operand Byte Mappings

A.4 Algebraic Preliminaries

AES operations are based on the Galois field $GF = GF(2^8)$, of order 256, constructed by adjoining a root of the irreducible polynomial

$p(X) = X^8 + X^4 + X^3 + X + 1$

to the field of two elements, $\mathbb{F}_2$. Equivalently, GF is the quotient field $\mathbb{F}_2[X]/p(X)$ and thus may be viewed as the set of all polynomials of degree less than 8 in $\mathbb{F}_2[X]$ with the operations of addition and multiplication modulo p(X). These operations may be implemented efficiently by exploiting the mapping from $\mathbb{F}_2[X]$ to the natural numbers given by

$a_n X^n + \dots + a_1 X + a_0 \;\rightarrow\; 2^n a_n + \dots + 2 a_1 + a_0 \;\rightarrow\; a_n \dots a_1 a_0\,\mathrm{b}$

For example:
1 → 01h
X → 02h
X^2 → 04h
X^4 + X^3 + 1 → 19h
p(X) → 11Bh

Thus, each element of GF is identified with a unique byte. This overview uses the data type GF256 as an alias of nat, to identify variables that are to be thought of as elements of GF.

The operations of addition and multiplication in GF are denoted by ⊕ and ⊙, respectively. Since $\mathbb{F}_2$ is of characteristic 2, addition is simply the “exclusive or” operation:

$x \oplus y = x \;\hat{}\; y$

In particular, every element of GF is its own additive inverse.

Multiplication in GF may be computed as a sequence of additions and multiplications by 2. Note that this operation may be viewed as multiplication in $\mathbb{F}_2[X]$ followed by a possible reduction modulo p(X). Since 2 corresponds to the polynomial X and 11Bh corresponds to p(X), for any x ∈ GF,

$2 \odot x = \begin{cases} x \ll 1 & \text{if } x < \mathrm{80h} \\ (x \ll 1) \oplus \mathrm{11Bh} & \text{if } x \ge \mathrm{80h} \end{cases}$

Now, if $y = b_7 \dots b_1 b_0\,\mathrm{b}$, then

$x \odot y = 2 \odot ( \dots (2 \odot (2 \odot (b_7 \odot x) \oplus b_6 \odot x) \oplus b_5 \odot x) \dots ) \oplus b_0 \odot x$

This computation is performed by the GFMul( ) function.

A.4.1 Multiplication in the Field GF

The GFMul( ) function operates on GF256 elements in SRC1 and SRC2 and returns a GF256 element in the destination.

    GF256 GFMul(GF256 x, GF256 y) {
        nat sum = 0;
        for (int i=7; i>=0; i--) {
            // Multiply sum by 2. This amounts to a shift followed
            // by reduction mod 0x11B:
            sum <<= 1;
            if (sum > 0xFF) {sum = sum ^ 0x11B;}
            // Add y[i]*x:
            if (y[i]) {sum = sum ^ x;}
        }
        return sum;
    }

Because the multiplicative group GF* is of order 255, the inverse of an element x of GF may be computed by repeated multiplication as $x^{-1} = x^{254}$. A more efficient computation, however, is performed by the GFInv( ) function as an application of Euclid’s greatest common divisor algorithm. See Section A.11, “Computation of GFInv with Euclidean Greatest Common Divisor” for an analysis of this computation and the GFInv( ) function.

The AES algorithms operate on the vector space $GF^4$, of dimension 4 over GF, which is represented by the array type GFWord. FIPS 197 refers to an object of this type as a word. This overview uses the term GF word in order to avoid confusion with the AMD64 notion of a 16-bit word.

A GFMatrix is an array of four GF words, which are viewed as the rows of a 4 × 4 matrix over GF. The field operation symbols ⊕ and ⊙ are used to denote addition and multiplication of matrices over GF as well. The GFMatrixMul( ) function computes the product A ⊙ B of 4 × 4 matrices.

A.4.2 Multiplication of 4x4 Matrices Over GF

    GFMatrix GFMatrixMul(GFMatrix a, GFMatrix b) {
        GFMatrix c;
        for (nat i=0; i<4; i++) {
            for (nat j=0; j<4; j++) {
                c[i][j] = 0;
                for (nat k=0; k<4; k++) {
                    c[i][j] = c[i][j] ^ GFMul(a[i][k], b[k][j]);
                }
            }
        }
        return c;
    }

A.5 AES Operations

The AES encryption and decryption procedures may be specified as follows, in terms of a set of basic operations that are defined later in this section. See the alphabetic instruction reference for detailed descriptions of the instructions that are used to implement the procedures.

The Encrypt and Decrypt procedures pass the same expanded key to the functions

    TextBlock Cipher(TextBlock in, ExpandedKey w, nat Nk)

and

    TextBlock InvCipher(TextBlock in, ExpandedKey w, nat Nk)

In both cases, the input text is converted by

    GFMatrix Text2Matrix(TextBlock A)

to a matrix, which becomes the initial state of the process. This state is transformed through the sequence of Nr + 1 rounds and ultimately converted back to a linear array by

    TextBlock Matrix2Text(GFMatrix M)

In each round i, the round key $K_i$ is extracted from the expanded key w and added to the state by

    GFMatrix AddRoundKey(GFMatrix state, ExpandedKey w, nat round)

Note that AddRoundKey does not explicitly construct $K_i$, but operates directly on the bytes of w.

The rounds of Cipher are numbered 0, …, Nr. Let X be the initial state of an execution, i.e., the input in matrix format, let $S_i$ be the state produced by round i, and let $Y = S_{Nr}$ be the final state. Let Σ, R, and C denote the operations performed by SubBytes, ShiftRows, and MixColumns, respectively. Then:

The initial round is a simple addition:

$S_0 = X \oplus K_0$

Each of the next Nr − 1 rounds is a composition of four operations:

$S_i = C(R(\Sigma(S_{i-1}))) \oplus K_i$ for $i = 1, \dots, Nr - 1$

The MixColumns transformation is omitted from the final round:

$S_{Nr} = R(\Sigma(S_{Nr-1})) \oplus K_{Nr}$

Composing these expressions yields

$Y = R(\Sigma(C(R(\Sigma(\dots C(R(\Sigma(X \oplus K_0))) \oplus K_1 \dots))) \oplus K_{Nr-1})) \oplus K_{Nr}$

Note that the rounds of InvCipher are numbered in reverse order, Nr, …, 0. If X’ and Y’ are the initial and final states and $S'_i$ is the state following round i, then

$S'_{Nr} = X' \oplus K_{Nr}$

$S'_i = C^{-1}(\Sigma^{-1}(R^{-1}(S'_{i+1})) \oplus K_i)$ for $i = Nr - 1, \dots, 1$

$S'_0 = \Sigma^{-1}(R^{-1}(S'_1)) \oplus K_0$

Composing these expressions yields

$Y' = \Sigma^{-1}(R^{-1}(C^{-1}(\Sigma^{-1}(R^{-1}(\dots C^{-1}(\Sigma^{-1}(R^{-1}(X' \oplus K_{Nr})) \oplus K_{Nr-1}) \dots)) \oplus K_1))) \oplus K_0$

In order to show that InvCipher is the inverse of Cipher, it is only necessary to combine these expanded expressions by replacing X’ with Y and cancel inverse operations to yield Y’ = X.

A.5.1 Sequence of Operations

• Use predefined SBox and InvSBox matrices or initialize the matrices using the ComputeSBox and ComputeInvSBox functions.
• Call the Encrypt or Decrypt procedure.
• For the Encrypt procedure:
1. Load the input TextBlock and CipherKey.
2. Expand the cipher key using the KeyExpansion function.
3. Call the Cipher function to perform the number of rounds determined by the cipher key length.
4. Perform round entry operations.
   a. Convert input text block to state matrix using the Text2Matrix function.
   b. Combine state and round key bytes by bitwise XOR using the AddRoundKey function.
5. Perform round iteration operations.
   a. Replace each state byte with another by non-linear substitution using the SubBytes function.
   b. Shift each row of the state cyclically using the ShiftRows function.
   c. Combine the four bytes in each column of the state using the MixColumns function.
   d. Perform AddRoundKey.
6. Perform round exit operations.
   a. Perform SubBytes.
   b. Perform ShiftRows.
   c. Perform AddRoundKey.
   d. Convert state matrix to output text block using the Matrix2Text function and return TextBlock.
• For the Decrypt procedure:
1. Load the input TextBlock and CipherKey.
2. Expand the cipher key using the KeyExpansion function.
3. Call the InvCipher function to perform the number of rounds determined by the cipher key length.
4. Perform round entry operations.
   a. Convert input text block to state matrix using the Text2Matrix function.
   b. Combine state and round key bytes by bitwise XOR using the AddRoundKey function.
5. Perform round iteration operations.
   a. Shift each row of the state cyclically using the InvShiftRows function.
   b. Replace each state byte with another by non-linear substitution using the InvSubBytes function.
   c. Perform AddRoundKey.
   d. Combine the four bytes in each column of the state using the InvMixColumns function.
6. Perform round exit operations.
   a. Perform InvShiftRows.
   b. Perform InvSubBytes (InvSubWord).
   c. Perform AddRoundKey.
   d. Convert state matrix to output text block using the Matrix2Text function and return TextBlock.
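The Encrypt steps above can be sketched in portable C as follows. This is a minimal illustration under stated assumptions, not the manual's reference code: it handles AES-128 only (Nk = 4, Nr = 10), stores the state as 16 bytes with byte 4*col + row holding X[row][col], computes the inverse in GF as x^254 instead of calling GFInv( ), and fuses SubBytes with ShiftRows. All function names (gf_mul, init_sbox, sub_shift, mix_columns, add_round_key, aes128_encrypt_block) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint8_t sbox[256];

/* GF(2^8) multiply with reduction by p(X) = 0x11B (same scheme as GFMul). */
static uint8_t gf_mul(uint8_t x, uint8_t y) {
    unsigned sum = 0;
    for (int i = 7; i >= 0; i--) {
        sum <<= 1;
        if (sum > 0xFF) sum ^= 0x11B;
        if ((y >> i) & 1) sum ^= x;
    }
    return (uint8_t)sum;
}

/* Build the S-box as phi(x^-1) ^ 0x63, with x^-1 computed as x^254. */
static void init_sbox(void) {
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 1, r;
        for (int k = 0; k < 254; k++) inv = gf_mul(inv, (uint8_t)x);
        r = inv;
        for (int s = 1; s <= 4; s++)   /* phi written as four byte rotations */
            r ^= (uint8_t)((inv << s) | (inv >> (8 - s)));
        sbox[x] = r ^ 0x63;
    }
}

/* SubBytes followed by ShiftRows: row r rotates left by r columns. */
static void sub_shift(uint8_t s[16]) {
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[4 * c + r] = sbox[s[4 * ((c + r) % 4) + r]];
    memcpy(s, t, 16);
}

/* MixColumns: each column is multiplied by the circulant (2, 3, 1, 1). */
static void mix_columns(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *a = s + 4 * c, b[4];
        for (int r = 0; r < 4; r++)
            b[r] = (uint8_t)(gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                             ^ a[(r + 2) % 4] ^ a[(r + 3) % 4]);
        memcpy(a, b, 4);
    }
}

static void add_round_key(uint8_t s[16], const uint8_t *rk) {
    for (int i = 0; i < 16; i++) s[i] ^= rk[i];
}

/* AES-128 only: eleven 16-byte round keys are derived from the cipher key. */
void aes128_encrypt_block(const uint8_t in[16], const uint8_t key[16],
                          uint8_t out[16]) {
    uint8_t rk[176], s[16], rcon = 1;
    init_sbox();
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {          /* key expansion */
        uint8_t t[4];
        memcpy(t, rk + i - 4, 4);
        if (i % 16 == 0) {                       /* RotWord, SubWord, Rcon */
            uint8_t t0 = t[0];
            t[0] = sbox[t[1]] ^ rcon;
            t[1] = sbox[t[2]];
            t[2] = sbox[t[3]];
            t[3] = sbox[t0];
            rcon = gf_mul(rcon, 2);
        }
        for (int j = 0; j < 4; j++) rk[i + j] = rk[i - 16 + j] ^ t[j];
    }
    memcpy(s, in, 16);
    add_round_key(s, rk);                        /* round entry */
    for (int round = 1; round < 10; round++) {   /* round iteration */
        sub_shift(s);
        mix_columns(s);
        add_round_key(s, rk + 16 * round);
    }
    sub_shift(s);                                /* round exit: no MixColumns */
    add_round_key(s, rk + 160);
    memcpy(out, s, 16);
}
```

The sketch favors readability over speed and makes no attempt at constant-time behavior. It can be checked against the AES-128 example vector in FIPS 197 Appendix C.1 (key 00 01 … 0f, plaintext 00 11 … ff).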
A.6 Initializing the SBox and InvSBox Matrices

The AES makes use of a bijective mapping σ : GF → GF, which is encoded, along with its inverse mapping, in the 16 × 16 arrays SBox (for encryption) and InvSBox (for decryption), as follows: for all x ∈ GF,

$\sigma(x) = \mathrm{SBox}[x[7{:}4], x[3{:}0]]$

and

$\sigma^{-1}(x) = \mathrm{InvSBox}[x[7{:}4], x[3{:}0]]$

While the FIPS 197 standard defines the contents of the SBox[ ] and InvSBox[ ] matrices, the matrices may also be initialized algebraically (and algorithmically) by means of the ComputeSBox( ) and ComputeInvSBox( ) functions, discussed below.

The bijective mappings for encryption and decryption are computed by the SubByte( ) and InvSubByte( ) functions, respectively:

SubByte( ) computation:

    GF256 SubByte(GF256 x) {
        return SBox[x[7:4]][x[3:0]];
    }

InvSubByte( ) computation:

    GF256 InvSubByte(GF256 x) {
        return InvSBox[x[7:4]][x[3:0]];
    }

A.6.1 Computation of SBox and InvSBox

Computation of SBox and InvSBox elements has a direct relationship to the cryptographic properties of the AES, but not to the algorithms that use the tables. Readers who prefer to view σ as a primitive operation may skip the remainder of this section.

The algorithmic definition of the bijective mapping σ is based on the consideration of GF as an 8-dimensional vector space over the subfield $\mathbb{F}_2$. Let ϕ be a linear operator on this vector space and let $M = [a_{ij}]$ be the matrix representation of ϕ with respect to the ordered basis {1, 2, 4, 8, 10, 20, 40, 80} (hexadecimal). Then ϕ may be encoded concisely as an array of bytes A of dimension 8, each entry of which is the concatenation of the corresponding row of M:

$A[i] = a_{i7}\, a_{i6} \dots a_{i0}$

This expression may be represented algorithmically by means of the ApplyLinearOp( ) function, which applies a linear operator to an element of GF. The ApplyLinearOp( ) function is used in the initialization of both the SBox[ ] and InvSBox[ ] matrices.
    // The following function takes the array A representing a linear operator phi and
    // an element x of GF and returns phi(x):
    GF256 ApplyLinearOp(GF256 A[8], GF256 x) {
        GF256 result = 0;
        for (nat i=0; i<8; i++) {
            bool sum = 0;
            for (nat j=0; j<8; j++) {
                sum = sum ^ (A[i][j] & x[j]);
            }
            result[i] = sum;
        }
        return result;
    }

The definition of σ involves the linear operator ϕ with matrix (rows i = 0, …, 7 from top to bottom, columns j = 0, …, 7 from left to right)

$M = \begin{bmatrix} 1&0&0&0&1&1&1&1 \\ 1&1&0&0&0&1&1&1 \\ 1&1&1&0&0&0&1&1 \\ 1&1&1&1&0&0&0&1 \\ 1&1&1&1&1&0&0&0 \\ 0&1&1&1&1&1&0&0 \\ 0&0&1&1&1&1&1&0 \\ 0&0&0&1&1&1&1&1 \end{bmatrix}$

In this case, A = {F1, E3, C7, 8F, 1F, 3E, 7C, F8}.

Initialization of SBox[ ]

The mapping σ : GF → GF is defined by

$\sigma(x) = \phi(x^{-1}) \oplus 63$

This computation is performed by ComputeSBox( ).

ComputeSBox( )

    GF256[16][16] ComputeSBox() {
        GF256 result[16][16];
        GF256 A[8] = {0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8};
        for (nat i=0; i<16; i++) {
            for (nat j=0; j<16; j++) {
                GF256 x = (i << 4) | j;
                result[i][j] = ApplyLinearOp(A, GFInv(x)) ^ 0x63;
            }
        }
        return result;
    }

    const GF256 SBox[16][16] = ComputeSBox();

Table A-1 shows the resulting SBox[ ], as defined in FIPS 197.

Table A-1.
SBox Definition

                                  S[3:0]
S[7:4]   0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
  0     63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
  1     ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
  2     b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
  3     04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
  4     09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
  5     53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
  6     d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
  7     51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
  8     cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
  9     60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
  a     e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
  b     e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
  c     ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
  d     70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
  e     e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
  f     8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

A.6.2 Initialization of InvSBox[ ]

A straightforward calculation confirms that the matrix M is nonsingular with inverse (rows i = 0, …, 7 from top to bottom, columns j = 0, …, 7 from left to right)

$M^{-1} = \begin{bmatrix} 0&0&1&0&0&1&0&1 \\ 1&0&0&1&0&0&1&0 \\ 0&1&0&0&1&0&0&1 \\ 1&0&1&0&0&1&0&0 \\ 0&1&0&1&0&0&1&0 \\ 0&0&1&0&1&0&0&1 \\ 1&0&0&1&0&1&0&0 \\ 0&1&0&0&1&0&1&0 \end{bmatrix}$

Thus, ϕ is invertible and $\phi^{-1}$ is encoded as the array

B = {A4, 49, 92, 25, 4A, 94, 29, 52}.

If y = σ(x), then

$(\phi^{-1}(y) \oplus 5)^{-1} = (\phi^{-1}(y \oplus \phi(5)))^{-1} = (\phi^{-1}(y \oplus 63))^{-1} = (\phi^{-1}(\phi(x^{-1}) \oplus 63 \oplus 63))^{-1} = (\phi^{-1}(\phi(x^{-1})))^{-1} = x$,

and σ is a permutation of GF with

$\sigma^{-1}(y) = (\phi^{-1}(y) \oplus 5)^{-1}$

This computation is performed by ComputeInvSBox( ).

ComputeInvSBox( )

    GF256[16][16] ComputeInvSBox() {
        GF256 result[16][16];
        GF256 B[8] = {0xA4, 0x49, 0x92, 0x25, 0x4A, 0x94, 0x29, 0x52};
        for (nat i=0; i<16; i++) {
            for (nat j=0; j<16; j++) {
                GF256 y = (i << 4) | j;
                result[i][j] = GFInv(ApplyLinearOp(B, y) ^ 0x5);
            }
        }
        return result;
    }

    const GF256 InvSBox[16][16] = ComputeInvSBox();

Table A-2 shows the resulting InvSBox[ ], as defined in FIPS 197.

Table A-2.
InvSBox Definition

            S[3:0]
    S[7:4]  0   1   2   3   4   5   6   7   8   9   a   b   c   d   e   f
      0     52  09  6a  d5  30  36  a5  38  bf  40  a3  9e  81  f3  d7  fb
      1     7c  e3  39  82  9b  2f  ff  87  34  8e  43  44  c4  de  e9  cb
      2     54  7b  94  32  a6  c2  23  3d  ee  4c  95  0b  42  fa  c3  4e
      3     08  2e  a1  66  28  d9  24  b2  76  5b  a2  49  6d  8b  d1  25
      4     72  f8  f6  64  86  68  98  16  d4  a4  5c  cc  5d  65  b6  92
      5     6c  70  48  50  fd  ed  b9  da  5e  15  46  57  a7  8d  9d  84
      6     90  d8  ab  00  8c  bc  d3  0a  f7  e4  58  05  b8  b3  45  06
      7     d0  2c  1e  8f  ca  3f  0f  02  c1  af  bd  03  01  13  8a  6b
      8     3a  91  11  41  4f  67  dc  ea  97  f2  cf  ce  f0  b4  e6  73
      9     96  ac  74  22  e7  ad  35  85  e2  f9  37  e8  1c  75  df  6e
      a     47  f1  1a  71  1d  29  c5  89  6f  b7  62  0e  aa  18  be  1b
      b     fc  56  3e  4b  c6  d2  79  20  9a  db  c0  fe  78  cd  5a  f4
      c     1f  dd  a8  33  88  07  c7  31  b1  12  10  59  27  80  ec  5f
      d     60  51  7f  a9  19  b5  4a  0d  2d  e5  7a  9f  93  c9  9c  ef
      e     a0  e0  3b  4d  ae  2a  f5  b0  c8  eb  bb  3c  83  53  99  61
      f     17  2b  04  7e  ba  77  d6  26  e1  69  14  63  55  21  0c  7d

A.7 Encryption and Decryption

The AMD64 architecture implements the AES algorithm by means of an iterative function called a round for both encryption and the inverse operation, decryption.

The top-level encryption and decryption procedures Encrypt( ) and Decrypt( ) set up the rounds and invoke the functions that perform them. Each of the procedures takes two 128-bit binary arguments:

• input data: a 16-byte block of text stored in a source 128-bit XMM register
• cipher key: a 16-, 24-, or 32-byte cipher key stored in either a second 128-bit XMM register or a 128-bit memory location

A.7.1 The Encrypt( ) and Decrypt( ) Procedures

    TextBlock Encrypt(TextBlock in, CipherKey key, nat Nk) {
      return Cipher(in, ExpandKey(key, Nk), Nk);
    }

    TextBlock Decrypt(TextBlock in, CipherKey key, nat Nk) {
      return InvCipher(in, ExpandKey(key, Nk), Nk);
    }

The array types TextBlock and CipherKey are introduced to accommodate the text and key parameters. The 16-, 24-, or 32-byte cipher keys correspond to AES-128, AES-192, or AES-256 key sizes.
The cipher key is logically partitioned into Nk = 4, 6, or 8 AES 32-bit words. Nk is passed as a parameter to determine the AES version to be executed and the number of rounds to be performed.

Both the Encrypt( ) and Decrypt( ) procedures invoke the ExpandKey( ) function to expand the cipher key for use in round key generation. When key expansion is complete, either the Cipher( ) or the InvCipher( ) function is invoked.

The Cipher( ) and InvCipher( ) functions are the key components of the encryption and decryption process. See Section A.8, “The Cipher Function” and Section A.9, “The InvCipher Function” for detailed information.

A.7.2 Round Sequences and Key Expansion

Encryption and decryption are performed in a sequence of rounds indexed by 0, …, Nr, where Nr is determined by the number Nk of GF words in the cipher key. A key matrix called a round key is generated for each round. The number of GF words required to form Nr + 1 round keys is equal to 4(Nr + 1). Table A-3 shows the relationship between cipher key length, round sequence length, and round key length.

Table A-3. Cipher Key, Round Sequence, and Round Key Length

    Nk    Nr    4(Nr + 1)
    4     10    44
    6     12    52
    8     14    60

Expanded keys are generated from the cipher key by the ExpandKey( ) function, where the array type ExpandedKey is defined to accommodate 60 words (the maximum required) corresponding to Nk = 8.

The ExpandKey( ) Function

    ExpandedKey ExpandKey(CipherKey key, nat Nk) {
      assert((Nk == 4) || (Nk == 6) || (Nk == 8));
      nat Nr = Nk + 6;
      ExpandedKey w;
      // Copy key into first Nk rows of w:
      for (nat i=0; i< […]

    […] >0; round--) {
        state = InvShiftRows(state);
        state = InvSubBytes(state);
        state = AddRoundKey(state, w, round);
        state = InvMixColumns(state);
      }
      state = InvShiftRows(state);
      state = InvSubBytes(state);
      state = AddRoundKey(state, w, 0);
      return Matrix2Text(state);
    }

A.9.1 Text to Matrix Conversion

Prior to processing, the input text block must be converted to matrix form.
The Text2Matrix( ) function stores a TextBlock in a GFMatrix in column-major order as follows.

    GFMatrix Text2Matrix(TextBlock A) {
      GFMatrix result;
      for (nat j=0; j<4; j++) {
        for (nat i=0; i<4; i++) {
          result[i][j] = A[4*j+i];
        }
      }
      return result;
    }

A.9.2 InvCipher Transformations

The following functions are used in decryption:

• InvShiftRows( ): the inverse of ShiftRows( ).
• InvSubBytes( ): the inverse of SubBytes( ).
• InvSubWord( ): the inverse of SubWord( ).
• InvMixColumns( ): the inverse of MixColumns( ).
• AddRoundKey( ): is its own inverse.

Decryption is the inverse of encryption and is accomplished by means of the inverses of the SubBytes( ), SubWord( ), ShiftRows( ), and MixColumns( ) transformations used in encryption.

SubWord( ), SubBytes( ), and ShiftRows( ) are injective. This is also the case with MixColumns( ). A simple computation shows that C is invertible with

$$
C^{-1} = \begin{pmatrix}
E & B & D & 9 \\
9 & E & B & D \\
D & 9 & E & B \\
B & D & 9 & E
\end{pmatrix}
$$

InvShiftRows( ) Function

The inverse of ShiftRows( ).

    GFMatrix InvShiftRows(GFMatrix M) {
      GFMatrix result;
      for (nat i=0; i<4; i++) {
        result[i] = RotateLeft(M[i], -i);
      }
      return result;
    }

InvSubBytes( ) Function

The inverse of SubBytes( ).

    GFMatrix InvSubBytes(GFMatrix M) {
      GFMatrix result;
      for (nat i=0; i<4; i++) {
        result[i] = InvSubWord(M[i]);
      }
      return result;
    }

InvSubWord( ) Function

The inverse of SubWord( ): InvSubByte( ) applied to each element of a vector.

    GFWord InvSubWord(GFWord x) {
      GFWord result;
      for (nat i=0; i<4; i++) {
        result[i] = InvSubByte(x[i]);
      }
      return result;
    }

InvMixColumns( ) Function

The inverse of the MixColumns( ) function. Multiplies by C⁻¹, the inverse of the predefined fixed matrix C, as discussed previously.
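As a sanity check on the matrix used by InvMixColumns( ), the following C sketch multiplies the MixColumns( ) matrix C by the matrix C⁻¹ given above and tests whether the product is the identity over GF(2⁸). It is an illustration only, not part of the manual's reference code: the helper names gf_mul and mixcolumns_inverse_ok are hypothetical, and the entries of C are taken from FIPS 197, since the definition of C is not reproduced in this part of the text.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two elements of GF(2^8), reducing modulo the AES polynomial
   x^8 + x^4 + x^3 + x + 1 (0x11B). Hypothetical helper for illustration. */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1) {
            p ^= a;                     /* add a * X^k for each set bit of b */
        }
        /* multiply a by X, folding the overflow bit back in with 0x1B */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Returns 1 if C * D is the 4x4 identity matrix over GF(2^8), else 0. */
int mixcolumns_inverse_ok(void) {
    /* MixColumns matrix C, assumed here from FIPS 197 */
    static const uint8_t C[4][4] = {
        {0x02, 0x03, 0x01, 0x01},
        {0x01, 0x02, 0x03, 0x01},
        {0x01, 0x01, 0x02, 0x03},
        {0x03, 0x01, 0x01, 0x02}
    };
    /* Candidate inverse, as used by InvMixColumns( ) */
    static const uint8_t D[4][4] = {
        {0x0e, 0x0b, 0x0d, 0x09},
        {0x09, 0x0e, 0x0b, 0x0d},
        {0x0d, 0x09, 0x0e, 0x0b},
        {0x0b, 0x0d, 0x09, 0x0e}
    };
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            uint8_t acc = 0;
            for (int k = 0; k < 4; k++) {
                acc ^= gf_mul(C[i][k], D[k][j]);  /* field addition is XOR */
            }
            if (acc != (i == j ? 1 : 0)) {
                return 0;
            }
        }
    }
    return 1;
}
```

Calling mixcolumns_inverse_ok( ) from a test harness and asserting that it returns 1 confirms that the matrix D in the InvMixColumns( ) pseudocode that follows undoes MixColumns( ).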
    GFMatrix InvMixColumns(GFMatrix M) {
      GFMatrix D = {
        {0x0e,0x0b,0x0d,0x09},
        {0x09,0x0e,0x0b,0x0d},
        {0x0d,0x09,0x0e,0x0b},
        {0x0b,0x0d,0x09,0x0e}
      };
      return GFMatrixMul(D, M);
    }

AddRoundKey( ) Function

Extracts the round key from the expanded key and adds it to the state using a bitwise XOR operation.

    GFMatrix AddRoundKey(GFMatrix state, ExpandedKey w, nat round) {
      GFMatrix result = state;
      for (nat i=0; i<4; i++) {
        for (nat j=0; j<4; j++) {
          result[i][j] = result[i][j] ^ w[4*round+j][i];
        }
      }
      return result;
    }

A.9.3 Matrix to Text Conversion

After processing, the output matrix must be converted to a text block. The Matrix2Text( ) function converts a GFMatrix in column-major order to a TextBlock as follows.

    TextBlock Matrix2Text(GFMatrix M) {
      TextBlock result;
      for (nat j=0; j<4; j++) {
        for (nat i=0; i<4; i++) {
          result[4*j+i] = M[i][j];
        }
      }
      return result;
    }

A.10 An Alternative Decryption Procedure

This section outlines an alternative decryption procedure, TextBlock EqDecrypt(TextBlock in, CipherKey key, nat Nk):

    TextBlock EqDecrypt(TextBlock in, CipherKey key, nat Nk) {
      return EqInvCipher(in, MixRoundKeys(ExpandKey(key, Nk), Nk), Nk);
    }

The procedure is based on a variation of InvCipher, TextBlock EqInvCipher(TextBlock in, ExpandedKey w, nat Nk):

    TextBlock EqInvCipher(TextBlock in, ExpandedKey dw, nat Nk) {
      assert((Nk == 4) || (Nk == 6) || (Nk == 8));
      nat Nr = Nk + 6;
      GFMatrix state = Text2Matrix(in);
      state = AddRoundKey(state, dw, Nr);
      for (nat round=Nr-1; round>0; round--) {
        state = InvSubBytes(state);
        state = InvShiftRows(state);
        state = InvMixColumns(state);
        state = AddRoundKey(state, dw, round);
      }
      state = InvSubBytes(state);
      state = InvShiftRows(state);
      state = AddRoundKey(state, dw, 0);
      return Matrix2Text(state);
    }

The variant structure more closely resembles that of Cipher. This requires a modification of the expanded key generated by ExpandKey, ExpandedKey MixRoundKeys(ExpandedKey w, nat Nk):

    ExpandedKey MixRoundKeys(ExpandedKey w, nat Nk) {
      assert((Nk == 4) || (Nk == 6) || (Nk == 8));
      nat Nr = Nk + 6;
      ExpandedKey result;
      GFMatrix roundKey;
      for (nat round=0; round< […] > 0) && (round < Nr)) {
          roundKey = InvMixRows(roundKey);
        }
        for (nat i=0; i<4; i++) {
          result[4*round+i] = roundKey[i];
        }
      }
      return result;
    }

The transformation MixRoundKeys leaves K0 and KNr unchanged, but for i = 1,…,Nr – 1, it replaces Wi with the matrix product Wi Q, where

[equation missing from source]

The effect of this is to replace Ki with

[equation missing from source]

for i = 1,…,Nr – 1.

The equivalence of EqDecrypt and Decrypt follows from two properties of the basic operations:

• C is a linear transformation and therefore so is C⁻¹;
• Σ and R commute, and hence so do Σ⁻¹ and R⁻¹, for if

[equation missing from source]

then

[equation missing from source]

Now let X'' and Y'' be the initial and final states of an execution of EqDecrypt and let S''i be the state following round i. Suppose X'' = X'. Appealing to the definitions of EqDecrypt and EqInvCipher, we have

[equation missing from source]

and for i = Nr – 1,…,1, by induction,

[chain of equalities missing from source]

Finally,

[chain of equalities missing from source]

A.11 Computation of GFInv with Euclidean Greatest Common Divisor

Note that the operations performed by GFInv( ) are in the ring Z2[X] rather than the quotient field GF. The initial values of the variables x1 and x2 are the inputs x and 11b, the latter representing the polynomial p(X). The variables a1 and a2 are initialized to 1 and 0.

On each iteration of the loop, a multiple of the lesser of x1 and x2 is added to the other. If x1 ≤ x2, then the values of x2 and a2 are adjusted as follows:

$$x_2 \to x_2 \oplus 2^s x_1$$

$$a_2 \to a_2 \oplus 2^s a_1$$

where s is the difference in the exponents (i.e., degrees) of x1 and x2. In the remaining case, x1 and a1 are similarly adjusted. This step is repeated until either x1 = 0 or x2 = 0.

We make the following observations:

• On each iteration, the value added to xi has the same exponent as xi, and hence the sum has lesser exponent.
  Therefore, termination is guaranteed.
• Since p(X) is irreducible and x is of smaller degree than p(X), the initial values of x1 and x2 have no non-trivial common factor. This property is clearly preserved by each step.
• Initially, x1 ⊕ a1 x = x ⊕ x = 0 and x2 ⊕ a2 x = 11b ⊕ 0 = 11b are both divisible by 11b. This property is also invariant, since, for example, the above assignments result in

$$x_2 \oplus a_2 x \;\to\; (x_2 \oplus 2^s x_1) \oplus (a_2 \oplus 2^s a_1)\,x = (x_2 \oplus a_2 x) \oplus 2^s (x_1 \oplus a_1 x).$$

Now suppose that the loop terminates with x2 = 0. Then x1 has no non-trivial factor and, hence, x1 = 1. Thus, 1 ⊕ a1 x is divisible by 11b. Since the final result y is derived by reducing a1 modulo 11b, it follows that 1 ⊕ y x is also divisible by 11b and, hence, in the quotient field GF, 1 + y x = 0, which implies y x = 1.

The computation of the multiplicative inverse utilizing Euclid’s algorithm is as follows:

    // Computation of multiplicative inverse based on Euclid's algorithm:
    GF256 GFInv(GF256 x) {
      if (x == 0) {
        return 0;
      }
      // Initialization:
      nat x1 = x;
      nat x2 = 0x11B; // the irreducible polynomial p(X)
      nat a1 = 1;
      nat a2 = 0;
      nat shift; // difference in exponents
      while ((x1 != 0) && (x2 != 0)) {
        // Termination is guaranteed, since either x1 or x2 decreases on each iteration.
        // We have the following loop invariants, viewing natural numbers as elements of
        // the polynomial ring Z2[X]:
        // (1) x1 and x2 have no common divisor other than 1.
        // (2) x1 ^ GFMul(a1, x) and x2 ^ GFMul(a2, x) are both divisible by p(X).
        if (x1 <= x2) {
          shift = expo(x2) - expo(x1);
          x2 = x2 ^ (x1 << shift);
          a2 = a2 ^ (a1 << shift);
        } else {
          shift = expo(x1) - expo(x2);
          x1 = x1 ^ (x2 << shift);
          a1 = a1 ^ (a2 << shift);
        }
      }
      nat y;
      // Since either x1 or x2 is 0, it follows from (1) above that the other is 1.
      if (x1 == 1) { // x2 == 0
        y = a1;
      } else if (x2 == 1) { // x1 == 0
        y = a2;
      } else {
        assert(false);
      }
      // Now it follows from (2) that GFMul(y, x) ^ 1 is divisible by 0x11b.
      // We need only reduce y modulo 0x11b:
      nat e = expo(y);
      while (e >= 8) {
        y = y ^ (0x11B << (e - 8));
        e = expo(y);
      }
      return y;
    }
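The Euclid-based inverse described in Section A.11 can be tried out with the following C sketch. It is a minimal translation under stated assumptions, not the manual's reference code: expo( ) is assumed to return the degree of its argument (the index of the highest set bit), and the names gf_inv, gf_mul, apply_linear_op, sbox_entry, and gf_inv_self_check are hypothetical. The sketch also recomputes individual SBox[ ] entries as ϕ(x⁻¹) ⊕ 63, using the array A = {F1, E3, C7, 8F, 1F, 3E, 7C, F8}.

```c
#include <assert.h>
#include <stdint.h>

/* Degree of n viewed as a polynomial over Z2 (index of the highest set bit).
   Assumed meaning of the manual's expo(); intended for n != 0. */
unsigned expo(unsigned n) {
    unsigned e = 0;
    while (n >>= 1) {
        e++;
    }
    return e;
}

/* Multiplicative inverse in GF(2^8) modulo 0x11B, following the loop
   structure of GFInv( ); returns 0 for input 0. */
uint8_t gf_inv(uint8_t x) {
    if (x == 0) {
        return 0;
    }
    unsigned x1 = x, x2 = 0x11B, a1 = 1, a2 = 0;
    while (x1 != 0 && x2 != 0) {
        if (x1 <= x2) {
            unsigned s = expo(x2) - expo(x1);
            x2 ^= x1 << s;
            a2 ^= a1 << s;
        } else {
            unsigned s = expo(x1) - expo(x2);
            x1 ^= x2 << s;
            a1 ^= a2 << s;
        }
    }
    assert(x1 == 1 || x2 == 1);        /* the surviving value is the gcd, 1 */
    unsigned y = (x1 == 1) ? a1 : a2;
    for (unsigned e = expo(y); e >= 8; e = expo(y)) {
        y ^= 0x11Bu << (e - 8);        /* reduce y modulo p(X) */
    }
    return (uint8_t)y;
}

/* Standard shift-and-add multiplication in GF(2^8), used only for checking. */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b != 0) {
        if (b & 1) {
            p ^= a;
        }
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

/* Bit i of the result is the parity of (A[i] AND x), as in ApplyLinearOp( ). */
uint8_t apply_linear_op(const uint8_t A[8], uint8_t x) {
    uint8_t result = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint8_t t = A[i] & x, sum = 0;
        for (unsigned j = 0; j < 8; j++) {
            sum ^= (uint8_t)((t >> j) & 1);
        }
        result |= (uint8_t)(sum << i);
    }
    return result;
}

/* One SBox[ ] entry: phi(x^-1) XOR 0x63. */
uint8_t sbox_entry(uint8_t x) {
    static const uint8_t A[8] = {0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8};
    return (uint8_t)(apply_linear_op(A, gf_inv(x)) ^ 0x63);
}

/* Returns 1 if x * gf_inv(x) == 1 for every nonzero x, else 0. */
int gf_inv_self_check(void) {
    for (unsigned x = 1; x < 256; x++) {
        if (gf_mul((uint8_t)x, gf_inv((uint8_t)x)) != 1) {
            return 0;
        }
    }
    return 1;
}
```

A caller can run gf_inv_self_check( ) once and compare a few sbox_entry( ) values against Table A-1. The sketch keeps the working variables in unsigned integers wider than a byte because, as in GFInv( ), the intermediate values x2 and a1, a2 can have degree 8 before the final reduction.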
568 SQRTSS ................................................................ 570 SSE..................................................................... xxxv SSE Instructions legacy .............................................................. xxxii SSE instructions AVX .................................................................. xxx SSE1 ................................................................... xxxv SSE2 ................................................................... xxxv SSE3 ................................................................... xxxv SSE4.1 ................................................................ xxxv SSE4.2 ................................................................ xxxv SSE4A ................................................................ xxxv SSSE3 ................................................................. xxxv sticky bit ............................................................. xxxv STMXCSR ............................................................ 572 Streaming SIMD Extensions ................................. xxxv string compare instructions ....................................... 10 string comparison ..................................................... 10 SUBPD ................................................................. 574 SUBPS .................................................................. 576 SUBSD ................................................................. 578 SUBSS .................................................................. 580 26568—Rev. 3.22—May 2018 T Task state segment (TSS)...................................... xxxv Terminology ......................................................... xxix three-operand instruction ............................................ 5 two-operand instruction .............................................. 4 U UCOMISD ............................................................ 
582 UCOMISS ............................................................ 584 underflow ........................................................... xxxvi UNPCKHPD ......................................................... 586 UNPCKHPS.......................................................... 588 UNPCKLPD ......................................................... 590 UNPCKLPS .......................................................... 592 V VADDPD ................................................................ 23 VADDPS ................................................................ 25 VADDSD ................................................................ 27 VADDSUBPD ......................................................... 31 VADDSUBPS ......................................................... 33 VADSS ................................................................... 29 VAESDEC .............................................................. 35 VAESDECLAST ..................................................... 37 VAESENC .............................................................. 39 VAESENCLAST ..................................................... 41 VAESIMC ............................................................... 43 VAESKEYGENASSIST .......................................... 45 VANDNPD ............................................................. 47 VANDNPS .............................................................. 49 VANDPD ................................................................ 51 VANDPS ................................................................ 53 VBLENDPD ........................................................... 55 VBLENDPS ............................................................ 57 VBLENDVPD......................................................... 59 VBLENDVPS ......................................................... 61 VBROADCASTF128 ............................................ 
594 VBROADCASTI128 ............................................. 596 VBROADCASTSD ............................................... 598 VBROADCASTSS ................................................ 600 VCMPPD................................................................ 63 VCMPPS ................................................................ 67 VCMPSD................................................................ 71 VCMPSS ................................................................ 75 VCOMISD .............................................................. 79 VCOMISS .............................................................. 82 VCVTDQ2PD ......................................................... 84 VCVTDQ2PS.......................................................... 86 VCVTPD2DQ ......................................................... 88 VCVTPD2PS .......................................................... 90 VCVTPH2PS ........................................................ 602 AMD64 Technology VCVTPS2DQ .......................................................... 92 VCVTPS2PD .......................................................... 94 VCVTPS2PH ........................................................ 605 VCVTSD2SI ........................................................... 96 VCVTSD2SS .......................................................... 99 VCVTSI2SD ......................................................... 101 VCVTSI2SS .......................................................... 104 VCVTSS2SD ........................................................ 107 VCVTSS2SI .......................................................... 109 VCVTTPD2DQ ..................................................... 112 VCVTTPS2DQ...................................................... 115 VCVTTSD2SI ....................................................... 117 VCVTTSS2SI ........................................................ 
120 VDIVPD ............................................................... 123 VDIVPS ................................................................ 125 VDIVSD ............................................................... 127 VDIVSS ................................................................ 129 VDPPD ................................................................. 131 VDPPS ................................................................. 134 vector ................................................................. xxxvi VEX prefix ......................................................... xxxvi VEXTRACT128 .................................................... 609 VEXTRACTI128 ................................................... 611 VFMADD132PD ................................................... 613 VFMADD132PS.................................................... 616 VFMADD132SD ................................................... 619 VFMADD132SS.................................................... 622 VFMADD213PD ................................................... 613 VFMADD213PS.................................................... 616 VFMADD213SD ................................................... 619 VFMADD213SS.................................................... 622 VFMADD231PD ................................................... 613 VFMADD231PS.................................................... 616 VFMADD231SD ................................................... 619 VFMADD231SS.................................................... 622 VFMADDPD ........................................................ 613 VFMADDPS ......................................................... 616 VFMADDSD ........................................................ 619 VFMADDSS ......................................................... 622 VFMADDSUB132PD ............................................ 625 VFMADDSUB132PS ............................................ 
628 VFMADDSUB213PD ............................................ 625 VFMADDSUB213PS ............................................ 628 VFMADDSUB231PD ............................................ 625 VFMADDSUB231PS ............................................ 628 VFMADDSUBPD ................................................. 625 VFMADDSUBPS .................................................. 628 VFMSUB132PD .................................................... 637 VFMSUB132PS .................................................... 640 VFMSUB132SD .................................................... 643 VFMSUB132SS .................................................... 646 1001 AMD64 Technology VFMSUB213PD ................................................... VFMSUB213PS .................................................... VFMSUB213SD ................................................... VFMSUB213SS .................................................... VFMSUB231PD ................................................... VFMSUB231PS .................................................... VFMSUB231SD ................................................... VFMSUB231SS .................................................... VFMSUBADD132PD ............................................ VFMSUBADD132PS ............................................ VFMSUBADD213PD ............................................ VFMSUBADD213PS ............................................ VFMSUBADD231PD ............................................ VFMSUBADD231PS ............................................ VFMSUBADDPD ................................................. VFMSUBADDPS .................................................. VFMSUBPD ......................................................... VFMSUBPS.......................................................... VFMSUBSD ......................................................... VFMSUBSS.......................................................... 
VFNMADD132PD ................................................ VFNMADD132PS ................................................. VFNMADD132SS ................................................. VFNMADD213PD ................................................ VFNMADD213PS ................................................. VFNMADD213SS ................................................. VFNMADD231PD ................................................ VFNMADD231PS ................................................. VFNMADD231SS ................................................. VFNMADDPD...................................................... VFNMADDPS ...................................................... VFNMADDSD...................................................... VFNMADDSS ...................................................... VFNMSUB132PD ................................................. VFNMSUB132PS ................................................. VFNMSUB132SD ................................................. VFNMSUB132SS ................................................. VFNMSUB213PD ................................................. VFNMSUB213PS ................................................. VFNMSUB213SD ................................................. VFNMSUB213SS ................................................. VFNMSUB231PD ................................................. VFNMSUB231PS ................................................. VFNMSUB231SD ................................................. VFNMSUB231SS ................................................. VFNMSUBPD ...................................................... VFNMSUBPS ....................................................... VFNMSUBSD ...................................................... VFNMSUBSS ....................................................... VFRCZPD ............................................................ VFRCZPS ............................................................. 1002 26568—Rev. 
3.22—May 2018 637 640 643 646 637 640 643 646 631 634 631 634 631 634 631 634 637 640 643 646 649 652 658 649 652 658 649 652 658 649 652 655 658 661 664 667 670 661 664 667 670 661 664 667 670 661 664 667 670 673 675 VFRCZSD ............................................................ 677 VFRCZSS ............................................................. 679 VGATHERDPD..................................................... 681 VGATHERDPS ..................................................... 683 VGATHERQPD..................................................... 685 VGATHERQPS ..................................................... 687 VHADDPD ........................................................... 141 VHADDPS ............................................................ 143 VHSUBPD ............................................................ 146 VHSUBPS ............................................................ 149 VINSERTF128 ...................................................... 689 VINSERTI128 ....................................................... 691 VINSERTPS .......................................................... 152 Virtual machine control block (VMCB) ................ xxxvi Virtual machine monitor (VMM) .......................... xxxvi virtual-8086 mode ............................................... xxxvi VLDDQU ............................................................. 156 VLDMXCSR ......................................................... 158 VMASKMOVDQU ............................................... 160 VMASKMOVPD................................................... 693 VMASKMOVPS ................................................... 695 VMAXPD ............................................................. 162 VMAXPS .............................................................. 165 VMAXSD ............................................................. 168 VMAXSS .............................................................. 
170 VMINPD .............................................................. 172 VMINPS ............................................................... 175 VMINSD .............................................................. 178 VMINSS ............................................................... 180 VMOVAPS ........................................................... 184 VMOVD ............................................................... 186 VMOVDDUP ........................................................ 188 VMOVDQA .......................................................... 190 VMOVDQU .......................................................... 192 VMOVHLPS ......................................................... 194 VMOVHPD .......................................................... 196 VMOVHPS ........................................................... 198 VMOVLHPS ......................................................... 200 VMOVLPD ........................................................... 202 VMOVLPS ........................................................... 204 VMOVMSKPD ..................................................... 206 VMOVMSKPS ...................................................... 208 VMOVNTDQ ........................................................ 210 VMOVNTDQA ..................................................... 212 VMOVNTPD ........................................................ 214 VMOVNTPS ......................................................... 216 VMOVQ ............................................................... 222 VMOVSD ............................................................. 224 VMOVSHDUP ...................................................... 226 VMOVSLDUP ...................................................... 228 VMOVSS .............................................................. 230 26568—Rev. 3.22—May 2018 VMOVUPD .......................................................... 
VMOVUPS ........................................................... VMPSADBW........................................................ VMULPD ............................................................. VMULPS .............................................................. VMULSD ............................................................. VMULSS .............................................................. VORPD ................................................................ VORPS ................................................................. VPABSB ............................................................... VPABSD............................................................... VPABSW .............................................................. VPACKSSDW ...................................................... VPACKSSWB ....................................................... VPACKUSDW ...................................................... VPACKUSWB ...................................................... VPADDD .............................................................. VPADDQ .............................................................. VPADDSB ............................................................ VPADDSW ........................................................... VPADDUSB ......................................................... VPADDUSW ........................................................ VPADDW ............................................................. VPALIGNR........................................................... VPAND ................................................................ VPANDN .............................................................. VPAVGB .............................................................. VPAVGW ............................................................. VPBLENDD ......................................................... VPBLENDVB ....................................................... 
VPBLENDW ........................................................ VPBROADCASTB ............................................... VPBROADCASTD ............................................... VPBROADCASTQ ............................................... VPBROADCASTW .............................................. VPCLMULQDQ ................................................... VPCMOV ............................................................. VPCMPEQB ......................................................... VPCMPEQD ......................................................... VPCMPEQQ ......................................................... VPCMPEQW ........................................................ VPCMPESTRI ...................................................... VPCMPESTRM .................................................... VPCMPGTB ......................................................... VPCMPGTD ......................................................... VPCMPGTQ ......................................................... VPCMPGTW ........................................................ VPCMPISTRI ....................................................... VPCMPISTRM ..................................................... VPCOMB ............................................................. VPCOMD ............................................................. AMD64 Technology 232 234 236 241 243 245 247 249 251 253 255 257 259 261 263 265 269 271 273 275 277 279 281 283 285 287 289 291 697 293 295 699 701 703 705 297 707 299 301 303 305 307 310 313 315 317 319 321 324 709 711 VPCOMQ ............................................................. VPCOMUB ........................................................... VPCOMUD ........................................................... VPCOMUQ ........................................................... VPCOMUW .......................................................... 
VPCOMW ............................................................ VPERM2F128 ....................................................... VPERM2I128 ........................................................ VPERMD .............................................................. VPERMIL2PD ...................................................... VPERMIL2PS ....................................................... VPERMILPD ........................................................ VPERMILPS ......................................................... VPERMPD ............................................................ VPERMPS ............................................................ VPERMQ .............................................................. VPEXTRB ............................................................ VPEXTRD ............................................................ VPEXTRQ ............................................................ VPEXTRW ........................................................... VPGATHERDD..................................................... VPGATHERDQ..................................................... VPGATHERQD..................................................... VPGATHERQQ..................................................... VPHADDBD ......................................................... VPHADDBQ ......................................................... VPHADDBW ........................................................ VPHADDD ........................................................... VPHADDDQ ........................................................ VPHADDSW ........................................................ VPHADDUBQ ...................................................... VPHADDUBW ..................................................... VPHADDUDQ ...................................................... VPHADDUWD ..................................................... 
VPHADDUWQ ..................................................... VPHADDW .......................................................... VPHADDWD ........................................................ VPHADDWQ ........................................................ VPHMINPOSUW .................................................. VPHSUBBW ......................................................... VPHSUBD ............................................................ VPHSUBDQ ......................................................... VPHSUBSW ......................................................... VPHSUBW ........................................................... VPHSUBWD ........................................................ VPINSRB ............................................................. VPINSRD ............................................................. VPINSRQ ............................................................. VPINSRW ............................................................. VPMACSDD ......................................................... VPMACSDQH ...................................................... 713 715 717 719 721 723 725 727 729 731 735 739 742 746 748 750 327 329 331 333 752 754 756 758 760 762 764 335 766 337 770 772 774 776 778 340 780 782 343 784 345 786 347 350 788 353 356 358 360 790 792 1003 AMD64 Technology VPMACSDQL ...................................................... VPMACSSDD ...................................................... VPMACSSDQL .................................................... VPMACSSQH ...................................................... VPMACSSWD...................................................... VPMACSSWW ..................................................... VPMACSWD........................................................ VPMACSWW ....................................................... VPMADCSSWD ................................................... 
VPMADCSWD ..................................................... VPMADDUBSW .................................................. VPMADDWD ....................................................... VPMASKMOVD .................................................. VPMASKMOVQ .................................................. VPMAXSB ........................................................... VPMAXSD ........................................................... VPMAXSW .......................................................... VPMAXUB .......................................................... VPMAXUD .......................................................... VPMAXUW ......................................................... VPMINSB ............................................................ VPMINSD ............................................................ VPMINSW ........................................................... VPMINUB ............................................................ VPMINUD............................................................ VPMINUW ........................................................... VPMOVMSKB ..................................................... VPMOVSXBD ...................................................... VPMOVSXBQ ...................................................... VPMOVSXBW ..................................................... VPMOVSXDQ...................................................... VPMOVSXWD ..................................................... VPMOVSXWQ ..................................................... VPMOVZXBD...................................................... VPMOVZXBQ...................................................... VPMOVZXBW ..................................................... VPMOVZXDQ ..................................................... VPMOVZXWD..................................................... VPMOVZXWQ..................................................... 
VPMULDQ .......... 417
VPMULHRSW .......... 419
VPMULHUW .......... 421
VPMULHW .......... 423
VPMULLD .......... 425
VPMULLW .......... 427
VPMULUDQ .......... 429
VPOR .......... 431
VPPERM .......... 818
VPROTB .......... 820
VPROTD .......... 822
VPROTQ .......... 824
VPROTW .......... 826
VPSADBW .......... 433
VPSHAB .......... 828
VPSHAD .......... 830
VPSHAQ .......... 832
VPSHAW .......... 834
VPSHLB .......... 836
VPSHLD .......... 838
VPSHLQ .......... 840
VPSHLW .......... 842
VPSHUFB .......... 435
VPSHUFD .......... 437
VPSHUFHW .......... 440
VPSHUFLW .......... 443
VPSIGNB .......... 446
VPSIGND .......... 448
VPSIGNW .......... 450
VPSLLD .......... 452
VPSLLDQ .......... 455
VPSLLQ .......... 457
VPSLLVD .......... 844
VPSLLVQ .......... 846
VPSLLW .......... 460
VPSRAD .......... 463
VPSRAVD .......... 848
VPSRAW .......... 466
VPSRLD .......... 469
VPSRLDQ .......... 472
VPSRLQ .......... 474
VPSRLVD .......... 850
VPSRLVQ .......... 852
VPSRLW .......... 477
VPSUBB .......... 480
VPSUBD .......... 482
VPSUBQ .......... 484
VPSUBSB .......... 486
VPSUBSW .......... 488
VPSUBUSB .......... 490
VPSUBUSW .......... 492
VPSUBW .......... 494
VPTEST .......... 496
VPUNPCKHBW .......... 498
VPUNPCKHDQ .......... 501
VPUNPCKHQDQ .......... 504
VPUNPCKHWD .......... 507
VPUNPCKLBW .......... 510
VPUNPCKLDQ .......... 513
VPUNPCKLQDQ .......... 516
VPUNPCKLWD .......... 519
VPXOR .......... 522
VRCPPS .......... 524
VRCPSS .......... 526
VROUNDPD .......... 528
VROUNDPS .......... 531
VROUNDSD .......... 534
VROUNDSS .......... 537
VRSQRTPS .......... 540
VRSQRTSS .......... 542
VSHUFPD .......... 558
VSHUFPS .......... 561
VSQRTPD .......... 564
VSQRTPS .......... 566
VSQRTSD .......... 568
VSQRTSS .......... 570
VSTMXCSR .......... 572
VSUBPD .......... 574
VSUBPS .......... 576
VSUBSD .......... 578
VSUBSS .......... 580
VTESTPD .......... 854
VTESTPS .......... 856
VUCOMISD .......... 582
VUCOMISS .......... 584
VUNPCKHPD .......... 586
VUNPCKHPS .......... 588
VUNPCKLPD .......... 590
VUNPCKLPS .......... 592
VXORPD .......... 861
VXORPS .......... 863
VZEROALL .......... 858
VZEROUPPER .......... 859

W
word .......... xxxvi

X
x86 .......... xxxvi
XGETBV .......... 860
XOP instructions .......... xxxvi
XOP prefix .......... xxxvi
XORPD .......... 861
XORPS .......... 863
XRSTOR .......... 865
XSAVE .......... 869
XSAVEOPT .......... 873
XSETBV .......... 877
Source Exif Data:
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.7
Linearized : No
Author : AMD
Copyright : © 2002 – 2010 Advanced Micro Devices, Inc. All rights reserved.
Create Date : 2018:05:24 13:48:03Z
Keywords : AMD64, SIMD, extended media instructions, legacy media instructions
Modify Date : 2018:06:13 14:17:39+08:00
Subject : AMD64 128-Bit and 256-Bit Media Instructions
Page Mode : UseOutlines
Page Count : 1047
Has XFA : No
XMP Toolkit : Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26
Format : application/pdf
Description : AMD64 128-Bit and 256-Bit Media Instructions
Title : AMD64 Architecture Programmer’s Manual, Volume 4: 128-Bit and 256-Bit Media Instructions
Creator : AMD
Producer : Acrobat Distiller 10.1.16 (Windows)
Creator Tool : FrameMaker 9.0
Metadata Date : 2018:05:24 16:43:59-05:00
Document ID : uuid:5643f610-5f2d-4060-8d1d-3e7be7312e8c
Instance ID : uuid:59decf66-93dc-4a16-a0ca-491f3ec34679