Intel Instructions



INSTRUCTION SET REFERENCE, A-L

3-18 Vol. 2A

AAA—ASCII Adjust After Addition

Instruction Operand Encoding

Description

Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied

source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD

instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The

AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.

If the addition produces a decimal carry, the AH register increments by 1, and the CF and AF flags are set. If there

was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4

through 7 of the AL register are set to 0.

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.

Operation

IF 64-Bit Mode
    THEN
        #UD;
    ELSE
        IF ((AL AND 0FH) > 9) or (AF = 1)
            THEN
                AX ← AX + 106H;
                AF ← 1;
                CF ← 1;
            ELSE
                AF ← 0;
                CF ← 0;
        FI;
        AL ← AL AND 0FH;
FI;

Flags Affected

The AF and CF flags are set to 1 if the adjustment results in a decimal carry; otherwise they are set to 0. The OF,

SF, ZF, and PF flags are undefined.
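The adjustment and its effect on AF and CF can be traced with a small model. The following is a minimal sketch in Python, not processor-accurate code: the function name `aaa` and the convention of passing AX and AF as integers are assumptions made for illustration, and the undefined flags (OF, SF, ZF, PF) are not modeled.

```python
def aaa(ax, af):
    """Hypothetical model of AAA outside 64-bit mode. Returns (ax, af, cf)."""
    if (ax & 0x0F) > 9 or af == 1:
        # Adding 106H adds 6 to AL and 1 to AH in one 16-bit addition.
        ax = (ax + 0x106) & 0xFFFF
        af = cf = 1
    else:
        af = cf = 0
    # AL <- AL AND 0FH; AH keeps the value computed above.
    return ax & 0xFF0F, af, cf


# 9 + 8: ADD leaves AL = 11H and sets AF (carry out of bit 3).
ax, af, cf = aaa(0x0011, 1)
print(f"{ax:04X}", af, cf)  # 0107 1 1
```

Here AH = 1 and AL = 7 are the unpacked BCD digits of 17.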

Protected Mode Exceptions

#UD If the LOCK prefix is used.

Real-Address Mode Exceptions

Same exceptions as protected mode.

Virtual-8086 Mode Exceptions

Same exceptions as protected mode.

Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
37      AAA          NP     Invalid      Valid            ASCII adjust AL after addition.

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Compatibility Mode Exceptions

Same exceptions as protected mode.

64-Bit Mode Exceptions

#UD If in 64-bit mode.


AAD—ASCII Adjust AX Before Division

Instruction Operand Encoding

Description

Adjusts two unpacked BCD digits (the least-significant digit in the AL register and the most-significant digit in the

AH register) so that a division operation performed on the result will yield a correct unpacked BCD value. The AAD

instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the

AX register by an unpacked BCD value.

The AAD instruction sets the value in the AL register to (AL + (10 * AH)), and then clears the AH register to 00H.

The value in the AX register is then equal to the binary equivalent of the original unpacked two-digit (base 10)

number in registers AH and AL.

The generalized version of this instruction allows adjustment of two unpacked digits of any number base (see the

“Operation” section below), by setting the imm8 byte to the selected number base (for example, 08H for octal, 0AH

for decimal, or 0CH for base 12 numbers). The AAD mnemonic is interpreted by all assemblers to mean adjust

ASCII (base 10) values. To adjust values in another number base, the instruction must be hand coded in machine

code (D5 imm8).

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.

Operation

IF 64-Bit Mode
    THEN
        #UD;
    ELSE
        tempAL ← AL;
        tempAH ← AH;
        AL ← (tempAL + (tempAH ∗ imm8)) AND FFH;
        (* imm8 is set to 0AH for the AAD mnemonic.*)
        AH ← 0;
FI;

The immediate value (imm8) is taken from the second byte of the instruction.
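As an illustration of the operation above, here is a minimal Python sketch. The function name `aad` and its `imm8` keyword are hypothetical; the flag results (SF, ZF, PF) are not modeled.

```python
def aad(ax, imm8=0x0A):
    """Hypothetical model of AAD outside 64-bit mode. Returns the new AX."""
    al = ax & 0xFF
    ah = (ax >> 8) & 0xFF
    # AL <- (AL + AH * imm8) AND FFH; AH <- 0, so AX equals the new AL.
    return (al + ah * imm8) & 0xFF


print(aad(0x0702))        # 72  (unpacked digits 7 and 2, base 10)
print(aad(0x0702, 0x08))  # 58  (same digits read as octal 72)
```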

Flags Affected

The SF, ZF, and PF flags are set according to the resulting binary value in the AL register; the OF, AF, and CF flags

are undefined.

Protected Mode Exceptions

#UD If the LOCK prefix is used.

Real-Address Mode Exceptions

Same exceptions as protected mode.

Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
D5 0A   AAD          NP     Invalid      Valid            ASCII adjust AX before division.
D5 ib   AAD imm8     NP     Invalid      Valid            Adjust AX before division to number base imm8.

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Virtual-8086 Mode Exceptions

Same exceptions as protected mode.

Compatibility Mode Exceptions

Same exceptions as protected mode.

64-Bit Mode Exceptions

#UD If in 64-bit mode.


AAM—ASCII Adjust AX After Multiply

Instruction Operand Encoding

Description

Adjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked (base 10) BCD

values. The AX register is the implied source and destination operand for this instruction. The AAM instruction is

only useful when it follows a MUL instruction that multiplies (binary multiplication) two unpacked BCD values and

stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register to contain

the correct 2-digit unpacked (base 10) BCD result.

The generalized version of this instruction allows adjustment of the contents of the AX to create two unpacked

digits of any number base (see the “Operation” section below). Here, the imm8 byte is set to the selected number

base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAM mnemonic is interpreted

by all assemblers to mean adjust to ASCII (base 10) values. To adjust to values in another number base, the

instruction must be hand coded in machine code (D4 imm8).

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.

Operation

IF 64-Bit Mode
    THEN
        #UD;
    ELSE
        tempAL ← AL;
        AH ← tempAL / imm8; (* imm8 is set to 0AH for the AAM mnemonic *)
        AL ← tempAL MOD imm8;
FI;

The immediate value (imm8) is taken from the second byte of the instruction.
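The division and remainder steps can be sketched as follows. This is an illustrative Python model only: the name `aam` is hypothetical, and the #DE exception for an immediate of 0 is represented here by raising Python's `ZeroDivisionError`, which is an assumption of the sketch rather than anything the processor does.

```python
def aam(ax, imm8=0x0A):
    """Hypothetical model of AAM outside 64-bit mode. Returns the new AX."""
    if imm8 == 0:
        # Stands in for #DE (immediate value of 0).
        raise ZeroDivisionError("#DE: AAM with an immediate value of 0")
    # AH <- AL / imm8; AL <- AL MOD imm8. The old AH is ignored.
    ah, al = divmod(ax & 0xFF, imm8)
    return (ah << 8) | al


# 9 * 8 = 72 = 48H in AL after MUL.
print(f"{aam(0x0048):04X}")  # 0702
```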

Flags Affected

The SF, ZF, and PF flags are set according to the resulting binary value in the AL register. The OF, AF, and CF flags

are undefined.

Protected Mode Exceptions

#DE If an immediate value of 0 is used.

#UD If the LOCK prefix is used.

Real-Address Mode Exceptions

Same exceptions as protected mode.

Virtual-8086 Mode Exceptions

Same exceptions as protected mode.

Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
D4 0A   AAM          NP     Invalid      Valid            ASCII adjust AX after multiply.
D4 ib   AAM imm8     NP     Invalid      Valid            Adjust AX after multiply to number base imm8.

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Compatibility Mode Exceptions

Same exceptions as protected mode.

64-Bit Mode Exceptions

#UD If in 64-bit mode.


AAS—ASCII Adjust AL After Subtraction

Instruction Operand Encoding

Description

Adjusts the result of the subtraction of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte result in the AL register. The AAS instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.

If the subtraction produced a decimal borrow, the AH register decrements by 1, and the CF and AF flags are set. If no decimal borrow occurred, the CF and AF flags are cleared, and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are set to 0.

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.

Operation

IF 64-bit mode
    THEN
        #UD;
    ELSE
        IF ((AL AND 0FH) > 9) or (AF = 1)
            THEN
                AX ← AX - 6;
                AH ← AH - 1;
                AF ← 1;
                CF ← 1;
                AL ← AL AND 0FH;
            ELSE
                CF ← 0;
                AF ← 0;
                AL ← AL AND 0FH;
        FI;
FI;

Flags Affected

The AF and CF flags are set to 1 if there is a decimal borrow; otherwise, they are cleared to 0. The OF, SF, ZF, and

PF flags are undefined.
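A minimal Python sketch of the adjustment, following the Operation pseudocode step by step. The name `aas` and the integer encoding of AX and AF are assumptions for illustration; the undefined flags are not modeled.

```python
def aas(ax, af):
    """Hypothetical model of AAS outside 64-bit mode. Returns (ax, af, cf)."""
    if (ax & 0x0F) > 9 or af == 1:
        ax = (ax - 6) & 0xFFFF                # AX <- AX - 6
        ah = ((ax >> 8) - 1) & 0xFF           # AH <- AH - 1
        ax = (ah << 8) | (ax & 0xFF)
        af = cf = 1
    else:
        af = cf = 0
    # AL <- AL AND 0FH in both branches.
    return ax & 0xFF0F, af, cf


# 12 - 5 with AH = 1, AL = 2: SUB AL, 5 leaves AL = FDH and sets AF.
ax, af, cf = aas(0x01FD, 1)
print(f"{ax:04X}", af, cf)  # 0007 1 1
```

The borrow consumed the tens digit in AH, leaving the single digit 7.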

Protected Mode Exceptions

#UD If the LOCK prefix is used.

Real-Address Mode Exceptions

Same exceptions as protected mode.

Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
3F      AAS          NP     Invalid      Valid            ASCII adjust AL after subtraction.

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Virtual-8086 Mode Exceptions

Same exceptions as protected mode.

Compatibility Mode Exceptions

Same exceptions as protected mode.

64-Bit Mode Exceptions

#UD If in 64-bit mode.


ADC—Add with Carry

Instruction Operand Encoding

Description

Adds the destination operand (first operand), the source operand (second operand), and the carry (CF) flag and

stores the result in the destination operand. The destination operand can be a register or a memory location; the

source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be

used in one instruction.) The state of the CF flag represents a carry from a previous addition. When an immediate

value is used as an operand, it is sign-extended to the length of the destination operand format.

Opcode            Instruction       Op/En  64-bit Mode  Compat/Leg Mode  Description
14 ib             ADC AL, imm8      I      Valid        Valid            Add with carry imm8 to AL.
15 iw             ADC AX, imm16     I      Valid        Valid            Add with carry imm16 to AX.
15 id             ADC EAX, imm32    I      Valid        Valid            Add with carry imm32 to EAX.
REX.W + 15 id     ADC RAX, imm32    I      Valid        N.E.             Add with carry imm32 sign extended to 64-bits to RAX.
80 /2 ib          ADC r/m8, imm8    MI     Valid        Valid            Add with carry imm8 to r/m8.
REX + 80 /2 ib    ADC r/m8*, imm8   MI     Valid        N.E.             Add with carry imm8 to r/m8.
81 /2 iw          ADC r/m16, imm16  MI     Valid        Valid            Add with carry imm16 to r/m16.
81 /2 id          ADC r/m32, imm32  MI     Valid        Valid            Add with CF imm32 to r/m32.
REX.W + 81 /2 id  ADC r/m64, imm32  MI     Valid        N.E.             Add with CF imm32 sign extended to 64-bits to r/m64.
83 /2 ib          ADC r/m16, imm8   MI     Valid        Valid            Add with CF sign-extended imm8 to r/m16.
83 /2 ib          ADC r/m32, imm8   MI     Valid        Valid            Add with CF sign-extended imm8 into r/m32.
REX.W + 83 /2 ib  ADC r/m64, imm8   MI     Valid        N.E.             Add with CF sign-extended imm8 into r/m64.
10 /r             ADC r/m8, r8      MR     Valid        Valid            Add with carry byte register to r/m8.
REX + 10 /r       ADC r/m8*, r8*    MR     Valid        N.E.             Add with carry byte register to r/m8.
11 /r             ADC r/m16, r16    MR     Valid        Valid            Add with carry r16 to r/m16.
11 /r             ADC r/m32, r32    MR     Valid        Valid            Add with CF r32 to r/m32.
REX.W + 11 /r     ADC r/m64, r64    MR     Valid        N.E.             Add with CF r64 to r/m64.
12 /r             ADC r8, r/m8      RM     Valid        Valid            Add with carry r/m8 to byte register.
REX + 12 /r       ADC r8*, r/m8*    RM     Valid        N.E.             Add with carry r/m8 to byte register.
13 /r             ADC r16, r/m16    RM     Valid        Valid            Add with carry r/m16 to r16.
13 /r             ADC r32, r/m32    RM     Valid        Valid            Add with CF r/m32 to r32.
REX.W + 13 /r     ADC r64, r/m64    RM     Valid        N.E.             Add with CF r/m64 to r64.

NOTES:
*In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.

Op/En  Operand 1         Operand 2      Operand 3  Operand 4
RM     ModRM:reg (r, w)  ModRM:r/m (r)  NA         NA
MR     ModRM:r/m (r, w)  ModRM:reg (r)  NA         NA
MI     ModRM:r/m (r, w)  imm8/16/32     NA         NA
I      AL/AX/EAX/RAX     imm8/16/32     NA         NA


The ADC instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates

the result for both data types and sets the OF and CF flags to indicate a carry in the signed or unsigned result,

respectively. The SF flag indicates the sign of the signed result.

The ADC instruction is usually executed as part of a multibyte or multiword addition in which an ADD instruction is

followed by an ADC instruction.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits

access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See

the summary chart at the beginning of this section for encoding data and limits.

Operation

DEST ← DEST + SRC + CF;
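The multiword use described above (an ADD for the lowest word, then ADC for each higher word) can be sketched as follows. This is an illustrative Python model only: `adc32` and `add_multiword` are hypothetical names, words are assumed to be 32 bits wide and stored least-significant first, and only CF is modeled.

```python
MASK32 = 0xFFFFFFFF


def adc32(dest, src, cf):
    """DEST <- DEST + SRC + CF for 32-bit operands. Returns (dest, cf)."""
    total = dest + src + cf
    return total & MASK32, total >> 32


def add_multiword(a, b):
    """Add two equal-length lists of 32-bit words, least significant first."""
    out, cf = [], 0  # CF = 0 makes the first step behave like a plain ADD
    for x, y in zip(a, b):
        word, cf = adc32(x, y, cf)
        out.append(word)
    return out, cf


# 0x1_FFFFFFFF + 0x0_00000001 = 0x2_00000000
print(add_multiword([0xFFFFFFFF, 0x00000001], [0x00000001, 0x00000000]))  # ([0, 2], 0)
```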

Intel C/C++ Compiler Intrinsic Equivalent

ADC: extern unsigned char _addcarry_u8(unsigned char c_in, unsigned char src1, unsigned char src2, unsigned char *sum_out);

ADC: extern unsigned char _addcarry_u16(unsigned char c_in, unsigned short src1, unsigned short src2, unsigned short *sum_out);

ADC: extern unsigned char _addcarry_u32(unsigned char c_in, unsigned int src1, unsigned int src2, unsigned int *sum_out);

ADC: extern unsigned char _addcarry_u64(unsigned char c_in, unsigned __int64 src1, unsigned __int64 src2, unsigned __int64 *sum_out);

Flags Affected

The OF, SF, ZF, AF, CF, and PF flags are set according to the result.

Protected Mode Exceptions

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment

selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.

Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


ADCX — Unsigned Integer Addition of Two Operands with Carry Flag

Instruction Operand Encoding

Description

Performs an unsigned addition of the destination operand (first operand), the source operand (second operand) and the carry-flag (CF) and stores the result in the destination operand. The destination operand is a general-purpose register, whereas the source operand can be a general-purpose register or memory location. The state of CF can represent a carry from a previous addition. The instruction sets the CF flag with the carry generated by the unsigned addition of the operands.

The ADCX instruction is executed in the context of multi-precision addition, where we add a series of operands with a carry-chain. At the beginning of a chain of additions, we need to make sure the CF is in a desired initial state. Often, this initial state needs to be 0, which can be achieved with an instruction to zero the CF (e.g. XOR).

This instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode.

In 64-bit mode, the default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits.

ADCX executes normally either inside or outside a transaction region.

Note: ADCX defines the OF flag differently than the ADD/ADC instructions as defined in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A.

Operation

IF OperandSize is 64-bit

THEN CF:DEST[63:0] ← DEST[63:0] + SRC[63:0] + CF;

ELSE CF:DEST[31:0] ← DEST[31:0] + SRC[31:0] + CF;

FI;
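The CF:DEST notation above means the carry out of the top bit goes to CF while the low bits go to DEST. A minimal Python sketch, assuming a hypothetical `adcx` helper and a plain dictionary standing in for the flags register, shows that only CF changes:

```python
def adcx(dest, src, flags, operand_size=32):
    """Hypothetical model of ADCX. Returns (dest, new_flags)."""
    mask = (1 << operand_size) - 1
    total = (dest & mask) + (src & mask) + flags["CF"]
    # Only CF is written; OF, SF, ZF, AF and PF are copied unchanged.
    new_flags = dict(flags, CF=total >> operand_size)
    return total & mask, new_flags


flags = {"CF": 1, "OF": 1, "SF": 0, "ZF": 0, "AF": 0, "PF": 0}
result, flags = adcx(0xFFFFFFFF, 0, flags)
print(result, flags["CF"], flags["OF"])  # 0 1 1
```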

Flags Affected

CF is updated based on result. OF, SF, ZF, AF and PF flags are unmodified.

Intel C/C++ Compiler Intrinsic Equivalent

unsigned char _addcarryx_u32 (unsigned char c_in, unsigned int src1, unsigned int src2, unsigned int *sum_out);

unsigned char _addcarryx_u64 (unsigned char c_in, unsigned __int64 src1, unsigned __int64 src2, unsigned __int64 *sum_out);

SIMD Floating-Point Exceptions

None

Opcode                Instruction      Op/En  64/32-bit Mode Support  CPUID Feature Flag  Description
66 0F 38 F6 /r        ADCX r32, r/m32  RM     V/V                     ADX                 Unsigned addition of r32 with CF, r/m32 to r32, writes CF.
66 REX.w 0F 38 F6 /r  ADCX r64, r/m64  RM     V/NE                    ADX                 Unsigned addition of r64 with CF, r/m64 to r64, writes CF.

Op/En  Operand 1         Operand 2      Operand 3  Operand 4
RM     ModRM:reg (r, w)  ModRM:r/m (r)  NA         NA

Protected Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.

Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.


ADD—Add

Instruction Operand Encoding

Description

Adds the destination operand (first operand) and the source operand (second operand) and then stores the result

in the destination operand. The destination operand can be a register or a memory location; the source operand

can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one

instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination

operand format.

The ADD instruction performs integer addition. It evaluates the result for both signed and unsigned integer operands and sets the OF and CF flags to indicate a carry (overflow) in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.

Opcode            Instruction       Op/En  64-bit Mode  Compat/Leg Mode  Description
04 ib             ADD AL, imm8      I      Valid        Valid            Add imm8 to AL.
05 iw             ADD AX, imm16     I      Valid        Valid            Add imm16 to AX.
05 id             ADD EAX, imm32    I      Valid        Valid            Add imm32 to EAX.
REX.W + 05 id     ADD RAX, imm32    I      Valid        N.E.             Add imm32 sign-extended to 64-bits to RAX.
80 /0 ib          ADD r/m8, imm8    MI     Valid        Valid            Add imm8 to r/m8.
REX + 80 /0 ib    ADD r/m8*, imm8   MI     Valid        N.E.             Add imm8 to r/m8.
81 /0 iw          ADD r/m16, imm16  MI     Valid        Valid            Add imm16 to r/m16.
81 /0 id          ADD r/m32, imm32  MI     Valid        Valid            Add imm32 to r/m32.
REX.W + 81 /0 id  ADD r/m64, imm32  MI     Valid        N.E.             Add imm32 sign-extended to 64-bits to r/m64.
83 /0 ib          ADD r/m16, imm8   MI     Valid        Valid            Add sign-extended imm8 to r/m16.
83 /0 ib          ADD r/m32, imm8   MI     Valid        Valid            Add sign-extended imm8 to r/m32.
REX.W + 83 /0 ib  ADD r/m64, imm8   MI     Valid        N.E.             Add sign-extended imm8 to r/m64.
00 /r             ADD r/m8, r8      MR     Valid        Valid            Add r8 to r/m8.
REX + 00 /r       ADD r/m8*, r8*    MR     Valid        N.E.             Add r8 to r/m8.
01 /r             ADD r/m16, r16    MR     Valid        Valid            Add r16 to r/m16.
01 /r             ADD r/m32, r32    MR     Valid        Valid            Add r32 to r/m32.
REX.W + 01 /r     ADD r/m64, r64    MR     Valid        N.E.             Add r64 to r/m64.
02 /r             ADD r8, r/m8      RM     Valid        Valid            Add r/m8 to r8.
REX + 02 /r       ADD r8*, r/m8*    RM     Valid        N.E.             Add r/m8 to r8.
03 /r             ADD r16, r/m16    RM     Valid        Valid            Add r/m16 to r16.
03 /r             ADD r32, r/m32    RM     Valid        Valid            Add r/m32 to r32.
REX.W + 03 /r     ADD r64, r/m64    RM     Valid        N.E.             Add r/m64 to r64.

NOTES:
*In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.

Op/En  Operand 1         Operand 2      Operand 3  Operand 4
RM     ModRM:reg (r, w)  ModRM:r/m (r)  NA         NA
MR     ModRM:r/m (r, w)  ModRM:reg (r)  NA         NA
MI     ModRM:r/m (r, w)  imm8/16/32     NA         NA
I      AL/AX/EAX/RAX     imm8/16/32     NA         NA


This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits

access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See

the summary chart at the beginning of this section for encoding data and limits.

Operation

DEST ← DEST + SRC;

Flags Affected

The OF, SF, ZF, AF, CF, and PF flags are set according to the result.
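To make "set according to the result" concrete, the sketch below derives all six flags for one addition. It is an illustrative Python model, not processor code: the function name `add` and the `bits` parameter are hypothetical, and the flag formulas are the usual definitions (CF is carry out of the top bit, OF is signed overflow, AF is carry out of bit 3, PF is even parity of the low byte).

```python
def add(dest, src, bits=8):
    """Hypothetical model of ADD. Returns (result, flags)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dest &= mask
    src &= mask
    total = dest + src
    result = total & mask
    flags = {
        "CF": int(total > mask),                                    # unsigned carry
        "OF": int(((dest ^ result) & (src ^ result) & sign) != 0),  # signed overflow
        "SF": int((result & sign) != 0),
        "ZF": int(result == 0),
        "AF": int(((dest ^ src ^ result) & 0x10) != 0),             # carry out of bit 3
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),          # even parity, low byte
    }
    return result, flags


# 7FH + 01H overflows as signed (127 + 1) but not as unsigned.
result, flags = add(0x7F, 0x01)
print(hex(result), flags["OF"], flags["CF"])  # 0x80 1 0
```

The same bit pattern read the other way, FFH + 01H, gives CF = 1 and OF = 0: an unsigned carry but no signed overflow.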

Protected Mode Exceptions

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment

selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.

Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


ADDPD—Add Packed Double-Precision Floating-Point Values

Instruction Operand Encoding

Description

Adds two, four or eight packed double-precision floating-point values from the first source operand to the second source operand, and stores the packed double-precision floating-point results in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

Opcode                       Instruction                                       Op/En  64/32-bit Mode Support  CPUID Feature Flag  Description
66 0F 58 /r                  ADDPD xmm1, xmm2/m128                             RM     V/V                     SSE2                Add packed double-precision floating-point values from xmm2/mem to xmm1 and store result in xmm1.
VEX.NDS.128.66.0F.WIG 58 /r  VADDPD xmm1, xmm2, xmm3/m128                      RVM    V/V                     AVX                 Add packed double-precision floating-point values from xmm3/mem to xmm2 and store result in xmm1.
VEX.NDS.256.66.0F.WIG 58 /r  VADDPD ymm1, ymm2, ymm3/m256                      RVM    V/V                     AVX                 Add packed double-precision floating-point values from ymm3/mem to ymm2 and store result in ymm1.
EVEX.NDS.128.66.0F.W1 58 /r  VADDPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst      FV     V/V                     AVX512VL AVX512F    Add packed double-precision floating-point values from xmm3/m128/m64bcst to xmm2 and store result in xmm1 with writemask k1.
EVEX.NDS.256.66.0F.W1 58 /r  VADDPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst      FV     V/V                     AVX512VL AVX512F    Add packed double-precision floating-point values from ymm3/m256/m64bcst to ymm2 and store result in ymm1 with writemask k1.
EVEX.NDS.512.66.0F.W1 58 /r  VADDPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er}  FV     V/V                     AVX512F             Add packed double-precision floating-point values from zmm3/m512/m64bcst to zmm2 and store result in zmm1 with writemask k1.

Op/En   Operand 1         Operand 2      Operand 3      Operand 4
RM      ModRM:reg (r, w)  ModRM:r/m (r)  NA             NA
RVM     ModRM:reg (w)     VEX.vvvv       ModRM:r/m (r)  NA
FV-RVM  ModRM:reg (w)     EVEX.vvvv      ModRM:r/m (r)  NA

Operation

VADDPD (EVEX encoded versions) when src2 operand is a vector register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← SRC1[i+63:i] + SRC2[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VADDPD (EVEX encoded versions) when src2 operand is a memory source

(KL, VL) = (2, 128), (4, 256), (8, 512)

FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ← SRC1[i+63:i] + SRC2[63:0]
                ELSE
                    DEST[i+63:i] ← SRC1[i+63:i] + SRC2[i+63:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0
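The per-element masking and broadcast logic of the EVEX forms can be traced with a short model. The following Python sketch is illustrative only: `vaddpd_evex` and its parameters are hypothetical names, vectors are represented as lists of Python floats (IEEE double precision), and rounding control, exceptions and the zeroing of bits above VL are not modeled.

```python
def vaddpd_evex(dest, src1, src2, k1=None, zeroing=False, broadcast=False):
    """Hypothetical model of EVEX VADDPD over KL = len(src1) elements.

    k1 is an integer bit mask (None means no writemask). With broadcast=True,
    src2[0] plays the role of the single 64-bit memory element (EVEX.b = 1).
    """
    out = []
    for j in range(len(src1)):
        if k1 is None or (k1 >> j) & 1:
            b = src2[0] if broadcast else src2[j]
            out.append(src1[j] + b)
        elif zeroing:
            out.append(0.0)        # zeroing-masking
        else:
            out.append(dest[j])    # merging-masking: element unchanged
    return out


old = [9.0, 9.0, 9.0, 9.0]
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
print(vaddpd_evex(old, a, b, k1=0b0101))                # [11.0, 9.0, 33.0, 9.0]
print(vaddpd_evex(old, a, b, k1=0b0101, zeroing=True))  # [11.0, 0.0, 33.0, 0.0]
print(vaddpd_evex(old, a, [0.5], broadcast=True))       # [1.5, 2.5, 3.5, 4.5]
```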

VADDPD (VEX.256 encoded version)

DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
DEST[127:64] ← SRC1[127:64] + SRC2[127:64]
DEST[191:128] ← SRC1[191:128] + SRC2[191:128]
DEST[255:192] ← SRC1[255:192] + SRC2[255:192]
DEST[MAX_VL-1:256] ← 0


VADDPD (VEX.128 encoded version)

DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
DEST[127:64] ← SRC1[127:64] + SRC2[127:64]
DEST[MAX_VL-1:128] ← 0

ADDPD (128-bit Legacy SSE version)
DEST[63:0] ← DEST[63:0] + SRC[63:0]
DEST[127:64] ← DEST[127:64] + SRC[127:64]
DEST[MAX_VL-1:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

VADDPD __m512d _mm512_add_pd (__m512d a, __m512d b);

VADDPD __m512d _mm512_mask_add_pd (__m512d s, __mmask8 k, __m512d a, __m512d b);

VADDPD __m512d _mm512_maskz_add_pd (__mmask8 k, __m512d a, __m512d b);

VADDPD __m256d _mm256_mask_add_pd (__m256d s, __mmask8 k, __m256d a, __m256d b);

VADDPD __m256d _mm256_maskz_add_pd (__mmask8 k, __m256d a, __m256d b);

VADDPD __m128d _mm_mask_add_pd (__m128d s, __mmask8 k, __m128d a, __m128d b);

VADDPD __m128d _mm_maskz_add_pd (__mmask8 k, __m128d a, __m128d b);

VADDPD __m512d _mm512_add_round_pd (__m512d a, __m512d b, int);

VADDPD __m512d _mm512_mask_add_round_pd (__m512d s, __mmask8 k, __m512d a, __m512d b, int);

VADDPD __m512d _mm512_maskz_add_round_pd (__mmask8 k, __m512d a, __m512d b, int);

VADDPD __m256d _mm256_add_pd (__m256d a, __m256d b);

ADDPD __m128d _mm_add_pd (__m128d a, __m128d b);

SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal

Other Exceptions

VEX-encoded instruction, see Exceptions Type 2.

EVEX-encoded instruction, see Exceptions Type E2.


ADDPS—Add Packed Single-Precision Floating-Point Values

Instruction Operand Encoding

Description

Adds four, eight or sixteen packed single-precision floating-point values from the first source operand to the second source operand, and stores the packed single-precision floating-point results in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

Opcode                    Instruction                                        Op/En  64/32-bit Mode Support  CPUID Feature Flag  Description
0F 58 /r                  ADDPS xmm1, xmm2/m128                              RM     V/V                     SSE                 Add packed single-precision floating-point values from xmm2/m128 to xmm1 and store result in xmm1.
VEX.NDS.128.0F.WIG 58 /r  VADDPS xmm1, xmm2, xmm3/m128                       RVM    V/V                     AVX                 Add packed single-precision floating-point values from xmm3/m128 to xmm2 and store result in xmm1.
VEX.NDS.256.0F.WIG 58 /r  VADDPS ymm1, ymm2, ymm3/m256                       RVM    V/V                     AVX                 Add packed single-precision floating-point values from ymm3/m256 to ymm2 and store result in ymm1.
EVEX.NDS.128.0F.W0 58 /r  VADDPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst       FV     V/V                     AVX512VL AVX512F    Add packed single-precision floating-point values from xmm3/m128/m32bcst to xmm2 and store result in xmm1 with writemask k1.
EVEX.NDS.256.0F.W0 58 /r  VADDPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst       FV     V/V                     AVX512VL AVX512F    Add packed single-precision floating-point values from ymm3/m256/m32bcst to ymm2 and store result in ymm1 with writemask k1.
EVEX.NDS.512.0F.W0 58 /r  VADDPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst {er}  FV     V/V                     AVX512F             Add packed single-precision floating-point values from zmm3/m512/m32bcst to zmm2 and store result in zmm1 with writemask k1.

Op/En   Operand 1         Operand 2      Operand 3      Operand 4
RM      ModRM:reg (r, w)  ModRM:r/m (r)  NA             NA
RVM     ModRM:reg (w)     VEX.vvvv       ModRM:r/m (r)  NA
FV-RVM  ModRM:reg (w)     EVEX.vvvv      ModRM:r/m (r)  NA

Operation

VADDPS (EVEX encoded versions) when src2 operand is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC1[i+31:i] + SRC2[i+31:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR;
DEST[MAX_VL-1:VL] ← 0

VADDPS (EVEX encoded versions) when src2 operand is a memory source
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← SRC1[i+31:i] + SRC2[31:0]
                ELSE
                    DEST[i+31:i] ← SRC1[i+31:i] + SRC2[i+31:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR;
DEST[MAX_VL-1:VL] ← 0
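
As a rough illustration of the EVEX pseudocode, the following C sketch models the per-lane writemask and broadcast behavior with ordinary scalar code. The function name `vaddps_evex_model` and its parameters are hypothetical and exist only for this example. Rounding control (SET_RM) and the zeroing of bits above VL are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of EVEX-encoded VADDPS (not an Intel API).
 * kl:        number of 32-bit lanes (4, 8 or 16)
 * k1:        writemask, bit j controls lane j
 * zeroing:   nonzero selects zeroing-masking, zero selects merging-masking
 * broadcast: nonzero models EVEX.b = 1 with a memory source, so src2[0]
 *            is reused for every lane
 */
static void vaddps_evex_model(float *dest, const float *src1, const float *src2,
                              int kl, uint16_t k1, int zeroing, int broadcast)
{
    assert(kl == 4 || kl == 8 || kl == 16);
    for (int j = 0; j < kl; j++) {
        if ((k1 >> j) & 1u) {
            dest[j] = src1[j] + (broadcast ? src2[0] : src2[j]);
        } else if (zeroing) {
            dest[j] = 0.0f;
        }
        /* merging-masking: dest[j] is left unchanged */
    }
}
```

With a mask of 0x5 and merging-masking, only lanes 0 and 2 receive sums and the other lanes keep their previous contents. With zeroing-masking the unselected lanes become 0 instead.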


VADDPS (VEX.256 encoded version)
DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] + SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[159:128] ← SRC1[159:128] + SRC2[159:128]
DEST[191:160] ← SRC1[191:160] + SRC2[191:160]
DEST[223:192] ← SRC1[223:192] + SRC2[223:192]
DEST[255:224] ← SRC1[255:224] + SRC2[255:224]
DEST[MAX_VL-1:256] ← 0

VADDPS (VEX.128 encoded version)
DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] + SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[MAX_VL-1:128] ← 0

ADDPS (128-bit Legacy SSE version)
DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] + SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[MAX_VL-1:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

VADDPS __m512 _mm512_add_ps (__m512 a, __m512 b);

VADDPS __m512 _mm512_mask_add_ps (__m512 s, __mmask16 k, __m512 a, __m512 b);

VADDPS __m512 _mm512_maskz_add_ps (__mmask16 k, __m512 a, __m512 b);

VADDPS __m256 _mm256_mask_add_ps (__m256 s, __mmask8 k, __m256 a, __m256 b);

VADDPS __m256 _mm256_maskz_add_ps (__mmask8 k, __m256 a, __m256 b);

VADDPS __m128 _mm_mask_add_ps (__m128 s, __mmask8 k, __m128 a, __m128 b);

VADDPS __m128 _mm_maskz_add_ps (__mmask8 k, __m128 a, __m128 b);

VADDPS __m512 _mm512_add_round_ps (__m512 a, __m512 b, int);

VADDPS __m512 _mm512_mask_add_round_ps (__m512 s, __mmask16 k, __m512 a, __m512 b, int);

VADDPS __m512 _mm512_maskz_add_round_ps (__mmask16 k, __m512 a, __m512 b, int);

VADDPS __m256 _mm256_add_ps (__m256 a, __m256 b);

ADDPS __m128 _mm_add_ps (__m128 a, __m128 b);

SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal

Other Exceptions

VEX-encoded instruction, see Exceptions Type 2.

EVEX-encoded instruction, see Exceptions Type E2.


ADDSD—Add Scalar Double-Precision Floating-Point Values

Instruction Operand Encoding

Description

Adds the low double-precision floating-point values from the second source operand and the first source operand

and stores the double-precision floating-point result in the destination operand.

The second source operand can be an XMM register or a 64-bit memory location. The first source and destination

operands are XMM registers.

128-bit Legacy SSE version: The first source and destination operands are the same. Bits (MAX_VL-1:64) of the

corresponding destination register remain unchanged.

EVEX and VEX.128 encoded version: The first source operand is encoded by EVEX.vvvv/VEX.vvvv. Bits (127:64) of

the XMM register destination are copied from corresponding bits in the first source operand. Bits (MAX_VL-1:128)

of the destination register are zeroed.

EVEX version: The low quadword element of the destination is updated according to the writemask.

Software should ensure VADDSD is encoded with VEX.L=0. Encoding VADDSD with VEX.L=1 may encounter

unpredictable behavior across different processor generations.

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F2 0F 58 /r ADDSD xmm1, xmm2/m64 | RM | V/V | SSE2 | Add the low double-precision floating-point value from xmm2/mem to xmm1 and store the result in xmm1.
VEX.NDS.128.F2.0F.WIG 58 /r VADDSD xmm1, xmm2, xmm3/m64 | RVM | V/V | AVX | Add the low double-precision floating-point value from xmm3/mem to xmm2 and store the result in xmm1.
EVEX.NDS.LIG.F2.0F.W1 58 /r VADDSD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | T1S | V/V | AVX512F | Add the low double-precision floating-point value from xmm3/m64 to xmm2 and store the result in xmm1 with writemask k1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
T1S-RVM | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA

Operation

VADDSD (EVEX encoded version)
IF (EVEX.b = 1) AND SRC2 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

VADDSD (VEX.128 encoded version)
DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

ADDSD (128-bit Legacy SSE version)
DEST[63:0] ← DEST[63:0] + SRC[63:0]
DEST[MAX_VL-1:64] (Unmodified)
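
The difference between the legacy and VEX forms lies in what happens to the bits above the low element. The C sketch below models a register as four 64-bit lanes, which assumes MAX_VL = 256 purely for illustration. The names `addsd_legacy_model` and `vaddsd_vex_model` are hypothetical.

```c
#include <assert.h>

#define LANES 4  /* assumption for this sketch: MAX_VL = 256 bits, four doubles */

/* Legacy SSE ADDSD: only lane 0 changes; lanes 1..3 are left unmodified. */
static void addsd_legacy_model(double dest[LANES], const double src[LANES])
{
    assert(dest && src);
    dest[0] = dest[0] + src[0];
}

/* VEX.128 VADDSD: lane 0 is the sum, lane 1 is copied from src1,
 * and the lanes above bit 127 are zeroed. */
static void vaddsd_vex_model(double dest[LANES], const double src1[LANES],
                             const double src2[LANES])
{
    assert(dest && src1 && src2);
    double lo = src1[0] + src2[0];
    double hi = src1[1];          /* read first: dest may alias src1 */
    dest[0] = lo;
    dest[1] = hi;
    for (int i = 2; i < LANES; i++)
        dest[i] = 0.0;
}
```

Applied to the same inputs, the legacy model leaves lanes 1 to 3 of the destination untouched, while the VEX model overwrites all of them.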

Intel C/C++ Compiler Intrinsic Equivalent

VADDSD __m128d _mm_mask_add_sd (__m128d s, __mmask8 k, __m128d a, __m128d b);

VADDSD __m128d _mm_maskz_add_sd (__mmask8 k, __m128d a, __m128d b);

VADDSD __m128d _mm_add_round_sd (__m128d a, __m128d b, int);

VADDSD __m128d _mm_mask_add_round_sd (__m128d s, __mmask8 k, __m128d a, __m128d b, int);

VADDSD __m128d _mm_maskz_add_round_sd (__mmask8 k, __m128d a, __m128d b, int);

ADDSD __m128d _mm_add_sd (__m128d a, __m128d b);

SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal

Other Exceptions

VEX-encoded instruction, see Exceptions Type 3.

EVEX-encoded instruction, see Exceptions Type E3.


ADDSS—Add Scalar Single-Precision Floating-Point Values

Instruction Operand Encoding

Description

Adds the low single-precision floating-point values from the second source operand and the first source operand, and stores the single-precision floating-point result in the destination operand.

The second source operand can be an XMM register or a 32-bit memory location. The first source and destination operands are XMM registers.

128-bit Legacy SSE version: The first source and destination operands are the same. Bits (MAX_VL-1:32) of the corresponding destination register remain unchanged.

EVEX and VEX.128 encoded version: The first source operand is encoded by EVEX.vvvv/VEX.vvvv. Bits (127:32) of the XMM register destination are copied from corresponding bits in the first source operand. Bits (MAX_VL-1:128) of the destination register are zeroed.

EVEX version: The low doubleword element of the destination is updated according to the writemask.

Software should ensure VADDSS is encoded with VEX.L=0. Encoding VADDSS with VEX.L=1 may encounter unpredictable behavior across different processor generations.

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F3 0F 58 /r ADDSS xmm1, xmm2/m32 | RM | V/V | SSE | Add the low single-precision floating-point value from xmm2/mem to xmm1 and store the result in xmm1.
VEX.NDS.128.F3.0F.WIG 58 /r VADDSS xmm1, xmm2, xmm3/m32 | RVM | V/V | AVX | Add the low single-precision floating-point value from xmm3/mem to xmm2 and store the result in xmm1.
EVEX.NDS.LIG.F3.0F.W0 58 /r VADDSS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | T1S | V/V | AVX512F | Add the low single-precision floating-point value from xmm3/m32 to xmm2 and store the result in xmm1 with writemask k1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
T1S | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA

Operation

VADDSS (EVEX encoded versions)
IF (EVEX.b = 1) AND SRC2 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

VADDSS DEST, SRC1, SRC2 (VEX.128 encoded version)
DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

ADDSS DEST, SRC (128-bit Legacy SSE version)
DEST[31:0] ← DEST[31:0] + SRC[31:0]
DEST[MAX_VL-1:32] (Unmodified)
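
For the EVEX form, only bit 0 of the writemask matters. A minimal C sketch of the low 128 bits follows; the name `vaddss_evex_model` and its parameters are hypothetical, and rounding control and the zeroing of bits above 127 are left out.

```c
#include <assert.h>

/* Hypothetical model of EVEX VADDSS on the low 128 bits (four floats).
 * k1:      writemask; only bit 0 is consulted
 * zeroing: nonzero selects zeroing-masking, zero selects merging-masking
 */
static void vaddss_evex_model(float dest[4], const float src1[4],
                              const float src2[4], unsigned k1, int zeroing)
{
    assert(dest && src1 && src2);
    float low;
    if (k1 & 1u)
        low = src1[0] + src2[0];
    else if (zeroing)
        low = 0.0f;
    else
        low = dest[0];            /* merging-masking keeps the old low element */

    /* Bits 127:32 always come from the first source operand. */
    float e1 = src1[1], e2 = src1[2], e3 = src1[3];
    dest[0] = low;
    dest[1] = e1;
    dest[2] = e2;
    dest[3] = e3;
}
```

Note that the upper three elements are copied from the first source even when the mask bit is clear; masking affects the low element only.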

Intel C/C++ Compiler Intrinsic Equivalent

VADDSS __m128 _mm_mask_add_ss (__m128 s, __mmask8 k, __m128 a, __m128 b);

VADDSS __m128 _mm_maskz_add_ss (__mmask8 k, __m128 a, __m128 b);

VADDSS __m128 _mm_add_round_ss (__m128 a, __m128 b, int);

VADDSS __m128 _mm_mask_add_round_ss (__m128 s, __mmask8 k, __m128 a, __m128 b, int);

VADDSS __m128 _mm_maskz_add_round_ss (__mmask8 k, __m128 a, __m128 b, int);

ADDSS __m128 _mm_add_ss (__m128 a, __m128 b);

SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal

Other Exceptions

VEX-encoded instruction, see Exceptions Type 3.

EVEX-encoded instruction, see Exceptions Type E3.


ADDSUBPD—Packed Double-FP Add/Subtract

Instruction Operand Encoding

Description

Adds odd-numbered double-precision floating-point values of the first source operand (second operand) with the corresponding double-precision floating-point values from the second source operand (third operand); stores the result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered double-precision floating-point values from the second source operand from the corresponding double-precision floating-point values in the first source operand; stores the result into the even-numbered values of the destination operand. Elements are numbered from zero, so element 0 occupies bits 63:0.

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Figure 3-3.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F D0 /r ADDSUBPD xmm1, xmm2/m128 | RM | V/V | SSE3 | Add/subtract double-precision floating-point values from xmm2/m128 to xmm1.
VEX.NDS.128.66.0F.WIG D0 /r VADDSUBPD xmm1, xmm2, xmm3/m128 | RVM | V/V | AVX | Add/subtract packed double-precision floating-point values from xmm3/mem to xmm2 and stores result in xmm1.
VEX.NDS.256.66.0F.WIG D0 /r VADDSUBPD ymm1, ymm2, ymm3/m256 | RVM | V/V | AVX | Add/subtract packed double-precision floating-point values from ymm3/mem to ymm2 and stores result in ymm1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA

Operation

ADDSUBPD (128-bit Legacy SSE version)
DEST[63:0] ← DEST[63:0] - SRC[63:0]
DEST[127:64] ← DEST[127:64] + SRC[127:64]
DEST[VLMAX-1:128] (Unmodified)

VADDSUBPD (VEX.128 encoded version)
DEST[63:0] ← SRC1[63:0] - SRC2[63:0]
DEST[127:64] ← SRC1[127:64] + SRC2[127:64]
DEST[VLMAX-1:128] ← 0

VADDSUBPD (VEX.256 encoded version)
DEST[63:0] ← SRC1[63:0] - SRC2[63:0]
DEST[127:64] ← SRC1[127:64] + SRC2[127:64]
DEST[191:128] ← SRC1[191:128] - SRC2[191:128]
DEST[255:192] ← SRC1[255:192] + SRC2[255:192]
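
The alternating pattern in these listings can be sketched in scalar C as a loop over element indices, subtracting in even positions and adding in odd ones. The function name `addsubpd_model` is hypothetical, and the handling of upper register bits is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of (V)ADDSUBPD.
 * n is the number of double-precision elements:
 * 2 for the 128-bit forms, 4 for the VEX.256 form. */
static void addsubpd_model(double *dest, const double *src1,
                           const double *src2, size_t n)
{
    assert(n == 2 || n == 4);
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0)
            dest[i] = src1[i] - src2[i];   /* even element: subtract */
        else
            dest[i] = src1[i] + src2[i];   /* odd element: add */
    }
}
```

For the legacy SSE form, the same function can be called with `dest` and `src1` pointing at the same array, since each element is read before it is written.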

Intel C/C++ Compiler Intrinsic Equivalent

ADDSUBPD: __m128d _mm_addsub_pd(__m128d a, __m128d b)

VADDSUBPD: __m256d _mm256_addsub_pd (__m256d a, __m256d b)

Exceptions

When the source operand is a memory operand, it must be aligned on a 16-byte boundary or a general-protection

exception (#GP) will be generated.

SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal.

Other Exceptions

See Exceptions Type 2.

Figure 3-3. ADDSUBPD—Packed Double-FP Add/Subtract
[Figure: ADDSUBPD xmm1, xmm2/m128. RESULT (xmm1): [127:64] = xmm1[127:64] + xmm2/m128[127:64]; [63:0] = xmm1[63:0] - xmm2/m128[63:0].]


ADDSUBPS—Packed Single-FP Add/Subtract

Instruction Operand Encoding

Description

Adds odd-numbered single-precision floating-point values of the first source operand (second operand) with the corresponding single-precision floating-point values from the second source operand (third operand); stores the result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered single-precision floating-point values from the second source operand from the corresponding single-precision floating-point values in the first source operand; stores the result into the even-numbered values of the destination operand. Elements are numbered from zero, so element 0 occupies bits 31:0.

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified. See Figure 3-4.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
F2 0F D0 /r ADDSUBPS xmm1, xmm2/m128 | RM | V/V | SSE3 | Add/subtract single-precision floating-point values from xmm2/m128 to xmm1.
VEX.NDS.128.F2.0F.WIG D0 /r VADDSUBPS xmm1, xmm2, xmm3/m128 | RVM | V/V | AVX | Add/subtract single-precision floating-point values from xmm3/mem to xmm2 and stores result in xmm1.
VEX.NDS.256.F2.0F.WIG D0 /r VADDSUBPS ymm1, ymm2, ymm3/m256 | RVM | V/V | AVX | Add/subtract single-precision floating-point values from ymm3/mem to ymm2 and stores result in ymm1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA

Operation

ADDSUBPS (128-bit Legacy SSE version)
DEST[31:0] ← DEST[31:0] - SRC[31:0]
DEST[63:32] ← DEST[63:32] + SRC[63:32]
DEST[95:64] ← DEST[95:64] - SRC[95:64]
DEST[127:96] ← DEST[127:96] + SRC[127:96]
DEST[VLMAX-1:128] (Unmodified)

VADDSUBPS (VEX.128 encoded version)
DEST[31:0] ← SRC1[31:0] - SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] - SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[VLMAX-1:128] ← 0

VADDSUBPS (VEX.256 encoded version)
DEST[31:0] ← SRC1[31:0] - SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] - SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[159:128] ← SRC1[159:128] - SRC2[159:128]
DEST[191:160] ← SRC1[191:160] + SRC2[191:160]
DEST[223:192] ← SRC1[223:192] - SRC2[223:192]
DEST[255:224] ← SRC1[255:224] + SRC2[255:224]

Intel C/C++ Compiler Intrinsic Equivalent

ADDSUBPS: __m128 _mm_addsub_ps(__m128 a, __m128 b)

VADDSUBPS: __m256 _mm256_addsub_ps (__m256 a, __m256 b)

Exceptions

When the source operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-

protection exception (#GP) will be generated.

Figure 3-4. ADDSUBPS—Packed Single-FP Add/Subtract
[Figure: ADDSUBPS xmm1, xmm2/m128. RESULT (xmm1): [127:96] = xmm1[127:96] + xmm2/m128[127:96]; [95:64] = xmm1[95:64] - xmm2/m128[95:64]; [63:32] = xmm1[63:32] + xmm2/m128[63:32]; [31:0] = xmm1[31:0] - xmm2/m128[31:0].]

SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal.

Other Exceptions

See Exceptions Type 2.


ADOX — Unsigned Integer Addition of Two Operands with Overflow Flag

Instruction Operand Encoding

Description

Performs an unsigned addition of the destination operand (first operand), the source operand (second operand)

and the overflow-flag (OF) and stores the result in the destination operand. The destination operand is a general-

purpose register, whereas the source operand can be a general-purpose register or memory location. The state of

OF represents a carry from a previous addition. The instruction sets the OF flag with the carry generated by the

unsigned addition of the operands.

The ADOX instruction is intended for multi-precision addition, where a series of operands is added with a carry chain. At the beginning of a chain of additions, an instruction that zeroes OF (e.g. XOR) must be executed.

This instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit

mode.

In 64-bit mode, the default operation size is 32 bits. Using a REX Prefix in the form of REX.R permits access to additional registers (R8-15). Using REX Prefix in the form of REX.W promotes operation to 64-bits.

ADOX executes normally either inside or outside a transaction region.

Note: ADOX defines the CF and OF flags differently than the ADD/ADC instructions as defined in Intel® 64 and

IA-32 Architectures Software Developer’s Manual, Volume 2A.

Operation

IF OperandSize is 64-bit

THEN OF:DEST[63:0] ← DEST[63:0] + SRC[63:0] + OF;

ELSE OF:DEST[31:0] ← DEST[31:0] + SRC[31:0] + OF;

FI;

Flags Affected

OF is updated based on result. CF, SF, ZF, AF and PF flags are unmodified.
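
To make the carry-chain usage concrete, here is a small C sketch that models one 64-bit ADOX step and uses it to add two multi-limb integers. The names `adox64_model` and `add_limbs` are hypothetical, and OF is represented as a plain variable rather than a processor flag. The sketch models only the arithmetic, not the fact that ADOX leaves CF untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit ADOX step:
 * returns dest + src + OF and writes the carry-out back to *of. */
static uint64_t adox64_model(uint64_t dest, uint64_t src, unsigned *of)
{
    uint64_t partial = dest + src;
    unsigned carry = partial < dest;              /* carry from dest + src */
    uint64_t sum = partial + (uint64_t)(*of & 1u);
    carry |= (unsigned)(sum < partial);           /* carry from adding OF */
    *of = carry;
    return sum;
}

/* Multi-precision a += b over n little-endian 64-bit limbs.
 * Returns the carry out of the most significant limb. */
static unsigned add_limbs(uint64_t *a, const uint64_t *b, size_t n)
{
    assert(a && b);
    unsigned of = 0;   /* OF is zeroed at the start of the chain */
    for (size_t i = 0; i < n; i++)
        a[i] = adox64_model(a[i], b[i], &of);
    return of;
}
```

The two carry checks cannot both fire in the same step, so a single bit is enough to hold the carry-out, which matches the OF:DEST notation in the Operation section.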

Intel C/C++ Compiler Intrinsic Equivalent

unsigned char _addcarryx_u32 (unsigned char c_in, unsigned int src1, unsigned int src2, unsigned int *sum_out);

unsigned char _addcarryx_u64 (unsigned char c_in, unsigned __int64 src1, unsigned __int64 src2, unsigned __int64 *sum_out);

SIMD Floating-Point Exceptions

None

Opcode/Instruction | Op/En | 64/32bit Mode Support | CPUID Feature Flag | Description
F3 0F 38 F6 /r ADOX r32, r/m32 | RM | V/V | ADX | Unsigned addition of r32 with OF, r/m32 to r32, writes OF.
F3 REX.w 0F 38 F6 /r ADOX r64, r/m64 | RM | V/NE | ADX | Unsigned addition of r64 with OF, r/m64 to r64, writes OF.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA

Protected Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment

selector.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

Real-Address Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.

Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.


AESDEC—Perform One Round of an AES Decryption Flow

Instruction Operand Encoding

Description

This instruction performs a single round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on 128-bit data (state) from the first source operand, and stores the result in the destination operand.

Use the AESDEC instruction for all but the last decryption round. For the last decryption round, use the AESDECLAST instruction.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.

Operation

AESDEC

STATE ← SRC1;

RoundKey ← SRC2;

STATE ← InvShiftRows( STATE );

STATE ← InvSubBytes( STATE );

STATE ← InvMixColumns( STATE );

DEST[127:0] ← STATE XOR RoundKey;

DEST[VLMAX-1:128] (Unmodified)

VAESDEC

STATE ← SRC1;

RoundKey ← SRC2;

STATE ← InvShiftRows( STATE );

STATE ← InvSubBytes( STATE );

STATE ← InvMixColumns( STATE );

DEST[127:0] ← STATE XOR RoundKey;

DEST[VLMAX-1:128] ← 0

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 DE /r AESDEC xmm1, xmm2/m128 | RM | V/V | AES | Perform one round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.
VEX.NDS.128.66.0F38.WIG DE /r VAESDEC xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform one round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESDEC: __m128i _mm_aesdec_si128 (__m128i, __m128i)

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4.


AESDECLAST—Perform Last Round of an AES Decryption Flow

Instruction Operand Encoding

Description

This instruction performs the last round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on 128-bit data (state) from the first source operand, and stores the result in the destination operand.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.

Operation

AESDECLAST

STATE ← SRC1;

RoundKey ← SRC2;

STATE ← InvShiftRows( STATE );

STATE ← InvSubBytes( STATE );

DEST[127:0] ← STATE XOR RoundKey;

DEST[VLMAX-1:128] (Unmodified)

VAESDECLAST

STATE ← SRC1;

RoundKey ← SRC2;

STATE ← InvShiftRows( STATE );

STATE ← InvSubBytes( STATE );

DEST[127:0] ← STATE XOR RoundKey;

DEST[VLMAX-1:128] ← 0

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESDECLAST: __m128i _mm_aesdeclast_si128 (__m128i, __m128i)

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 DF /r AESDECLAST xmm1, xmm2/m128 | RM | V/V | AES | Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.
VEX.NDS.128.66.0F38.WIG DF /r VAESDECLAST xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4.


AESENC—Perform One Round of an AES Encryption Flow

Instruction Operand Encoding

Description

This instruction performs a single round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and stores the result in the destination operand.

Use the AESENC instruction for all but the last encryption round. For the last encryption round, use the AESENCLAST instruction.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.

Operation

AESENC

STATE ← SRC1;

RoundKey ← SRC2;

STATE ← ShiftRows( STATE );

STATE ← SubBytes( STATE );

STATE ← MixColumns( STATE );

DEST[127:0] ← STATE XOR RoundKey;

DEST[VLMAX-1:128] (Unmodified)

VAESENC
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← ShiftRows( STATE );
STATE ← SubBytes( STATE );
STATE ← MixColumns( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] ← 0
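
The round steps named in the Operation listings can be sketched in portable C as follows. This is an illustrative model only, with hypothetical names (`gf_mul`, `sub_byte`, `aesenc_model`). It assumes the FIPS 197 column-major state layout, with the byte at row r and column c stored at index r + 4*c, and it computes the S-box from its definition instead of using a lookup table, which is slow but short.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (the AES field). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) p ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;
        b >>= 1;
    }
    return p;
}

/* SubBytes for one byte: field inverse (0 maps to 0), then the affine map. */
static uint8_t sub_byte(uint8_t x)
{
    uint8_t inv = 0;
    for (int c = 1; c < 256 && x != 0; c++) {
        if (gf_mul(x, (uint8_t)c) == 1) { inv = (uint8_t)c; break; }
    }
    uint8_t s = inv;
    for (int r = 1; r <= 4; r++)
        s ^= (uint8_t)((inv << r) | (inv >> (8 - r)));   /* rotate left by r */
    return (uint8_t)(s ^ 0x63);
}

/* Model of one AESENC round: ShiftRows, SubBytes, MixColumns, XOR round key.
 * state is updated in place. */
static void aesenc_model(uint8_t state[16], const uint8_t round_key[16])
{
    assert(state && round_key);
    uint8_t t[16];
    /* ShiftRows: row r is rotated left by r columns. */
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = state[r + 4 * ((c + r) % 4)];
    /* SubBytes */
    for (int i = 0; i < 16; i++)
        t[i] = sub_byte(t[i]);
    /* MixColumns on each column, then XOR with the round key. */
    for (int c = 0; c < 4; c++) {
        const uint8_t *a = &t[4 * c];
        uint8_t *d = &state[4 * c];
        d[0] = (uint8_t)(gf_mul(a[0], 2) ^ gf_mul(a[1], 3) ^ a[2] ^ a[3]);
        d[1] = (uint8_t)(a[0] ^ gf_mul(a[1], 2) ^ gf_mul(a[2], 3) ^ a[3]);
        d[2] = (uint8_t)(a[0] ^ a[1] ^ gf_mul(a[2], 2) ^ gf_mul(a[3], 3));
        d[3] = (uint8_t)(gf_mul(a[0], 3) ^ a[1] ^ a[2] ^ gf_mul(a[3], 2));
        for (int r = 0; r < 4; r++)
            d[r] ^= round_key[4 * c + r];
    }
}
```

ShiftRows and SubBytes commute, so applying them in the order shown in the listing gives the same result as the order used in FIPS 197. The model can be checked against the round 1 values of the FIPS 197 Appendix B example.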

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 DC /r AESENC xmm1, xmm2/m128 | RM | V/V | AES | Perform one round of an AES encryption flow, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.
VEX.NDS.128.66.0F38.WIG DC /r VAESENC xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform one round of an AES encryption flow, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from the xmm3/m128; store the result in xmm1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESENC: __m128i _mm_aesenc_si128 (__m128i, __m128i)

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4.


AESENCLAST—Perform Last Round of an AES Encryption Flow

Instruction Operand Encoding

Description

This instruction performs the last round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and stores the result in the destination operand.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.

Operation

AESENCLAST

STATE ← SRC1;

RoundKey ← SRC2;

STATE ← ShiftRows( STATE );

STATE ← SubBytes( STATE );

DEST[127:0] ← STATE XOR RoundKey;

DEST[VLMAX-1:128] (Unmodified)

VAESENCLAST
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← ShiftRows( STATE );
STATE ← SubBytes( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] ← 0

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESENCLAST: __m128i _mm_aesenclast_si128 (__m128i, __m128i)

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 DD /r AESENCLAST xmm1, xmm2/m128 | RM | V/V | AES | Perform the last round of an AES encryption flow, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.
VEX.NDS.128.66.0F38.WIG DD /r VAESENCLAST xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform the last round of an AES encryption flow, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4.


AESIMC—Perform the AES InvMixColumn Transformation

Instruction Operand Encoding

Description

Perform the InvMixColumns transformation on the source operand and store the result in the destination operand.

The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory location.

Note: the AESIMC instruction should be applied to the expanded AES round keys (except for the first and last round

key) in order to prepare them for decryption using the “Equivalent Inverse Cipher” (defined in FIPS 197).

128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain

unchanged.

VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.

Operation

AESIMC

DEST[127:0] ← InvMixColumns( SRC );

DEST[VLMAX-1:128] (Unmodified)

VAESIMC

DEST[127:0] ← InvMixColumns( SRC );

DEST[VLMAX-1:128] ← 0;

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESIMC: __m128i _mm_aesimc_si128 (__m128i)
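As a rough illustration of the InvMixColumns step named in the Operation section, the following sketch applies the FIPS 197 inverse matrix (coefficients 0E, 0B, 0D, 09) to each 4-byte column. The function names are hypothetical, and the 16-byte input is assumed to hold one column per consecutive group of four bytes.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def inv_mix_column(col):
    """Apply the inverse MixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return bytes([
        gf_mul(a0, 0x0E) ^ gf_mul(a1, 0x0B) ^ gf_mul(a2, 0x0D) ^ gf_mul(a3, 0x09),
        gf_mul(a0, 0x09) ^ gf_mul(a1, 0x0E) ^ gf_mul(a2, 0x0B) ^ gf_mul(a3, 0x0D),
        gf_mul(a0, 0x0D) ^ gf_mul(a1, 0x09) ^ gf_mul(a2, 0x0E) ^ gf_mul(a3, 0x0B),
        gf_mul(a0, 0x0B) ^ gf_mul(a1, 0x0D) ^ gf_mul(a2, 0x09) ^ gf_mul(a3, 0x0E),
    ])


def aesimc(src):
    """Model of DEST[127:0] for (V)AESIMC on a 16-byte round key."""
    return b"".join(inv_mix_column(src[4 * c:4 * c + 4]) for c in range(4))
```

In a key schedule prepared for the Equivalent Inverse Cipher, a routine like `aesimc` would be applied to every expanded round key except the first and the last, as the note in the Description says.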

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4; additionally

#UD If VEX.vvvv ≠ 1111B.

Opcode/Instruction Op/En 64/32-bit Mode CPUID Feature Flag Description

66 0F 38 DB /r AESIMC xmm1, xmm2/m128 RM V/V AES Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1.

VEX.128.66.0F38.WIG DB /r VAESIMC xmm1, xmm2/m128 RM V/V Both AES and AVX flags Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1.

Op/En Operand 1 Operand 2 Operand 3 Operand 4

RM ModRM:reg (w) ModRM:r/m (r) NA NA

AESKEYGENASSIST—AES Round Key Generation Assist


Instruction Operand Encoding

Description

Assists in expanding the AES cipher key by computing steps towards generating a round key for encryption, using 128-bit data specified in the source operand and an 8-bit round constant specified as an immediate, and stores the result in the destination operand.

The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory location.

128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain

unchanged.

VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.

Operation

AESKEYGENASSIST

X3[31:0] ← SRC [127: 96];

X2[31:0] ← SRC [95: 64];

X1[31:0] ← SRC [63: 32];

X0[31:0] ← SRC [31: 0];

RCON[31:0] ← ZeroExtend(Imm8[7:0]);

DEST[31:0] ← SubWord(X1);

DEST[63:32 ] ← RotWord( SubWord(X1) ) XOR RCON;

DEST[95:64] ← SubWord(X3);

DEST[127:96] ← RotWord( SubWord(X3) ) XOR RCON;

DEST[VLMAX-1:128] (Unmodified)
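The pseudocode above leaves SubWord and RotWord undefined. The sketch below models the instruction under two stated assumptions: SubWord applies the FIPS 197 S-box to each byte of a doubleword, and RotWord rotates the doubleword right by 8 bits (which matches the FIPS 197 byte rotation once the key is loaded in little-endian order). All function names are hypothetical, and the 128-bit operand is represented as a Python integer.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """Derive the AES S-box: field inverse followed by the affine map."""
    box = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        box.append(s ^ 0x63)
    return box


SBOX = build_sbox()


def sub_word(w):
    # Assumption: S-box applied independently to each of the four bytes.
    return sum(SBOX[(w >> (8 * i)) & 0xFF] << (8 * i) for i in range(4))


def rot_word(w):
    # Assumption: rotate the 32-bit value right by one byte.
    return ((w >> 8) | (w << 24)) & 0xFFFFFFFF


def aeskeygenassist(src, imm8):
    """Model of DEST[127:0]; src is a 128-bit integer, imm8 the round constant."""
    x1 = (src >> 32) & 0xFFFFFFFF
    x3 = (src >> 96) & 0xFFFFFFFF
    rcon = imm8 & 0xFF  # ZeroExtend(Imm8[7:0])
    d0 = sub_word(x1)
    d1 = rot_word(sub_word(x1)) ^ rcon
    d2 = sub_word(x3)
    d3 = rot_word(sub_word(x3)) ^ rcon
    return d0 | (d1 << 32) | (d2 << 64) | (d3 << 96)
```

For AES-128 key expansion only the top doubleword of the result is normally consumed; X0 and X2 do not contribute to the output at all.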

Opcode/Instruction Op/En 64/32-bit Mode CPUID Feature Flag Description

66 0F 3A DF /r ib AESKEYGENASSIST xmm1, xmm2/m128, imm8 RMI V/V AES Assist in AES round key generation using an 8-bit Round Constant (RCON) specified in the immediate byte, operating on 128 bits of data specified in xmm2/m128, and store the result in xmm1.

VEX.128.66.0F3A.WIG DF /r ib VAESKEYGENASSIST xmm1, xmm2/m128, imm8 RMI V/V Both AES and AVX flags Assist in AES round key generation using an 8-bit Round Constant (RCON) specified in the immediate byte, operating on 128 bits of data specified in xmm2/m128, and store the result in xmm1.

Op/En Operand 1 Operand 2 Operand 3 Operand 4

RMI ModRM:reg (w) ModRM:r/m (r) imm8 NA


VAESKEYGENASSIST

X3[31:0] ← SRC [127: 96];

X2[31:0] ← SRC [95: 64];

X1[31:0] ← SRC [63: 32];

X0[31:0] ← SRC [31: 0];

RCON[31:0] ← ZeroExtend(Imm8[7:0]);

DEST[31:0] ← SubWord(X1);

DEST[63:32 ] ← RotWord( SubWord(X1) ) XOR RCON;

DEST[95:64] ← SubWord(X3);

DEST[127:96] ← RotWord( SubWord(X3) ) XOR RCON;

DEST[VLMAX-1:128] ← 0;

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESKEYGENASSIST: __m128i _mm_aeskeygenassist_si128 (__m128i, const int)

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4; additionally

#UD If VEX.vvvv ≠ 1111B.

AND—Logical AND


Instruction Operand Encoding

Description

Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in

the destination operand location. The source operand can be an immediate, a register, or a memory location; the

destination operand can be a register or a memory location. (However, two memory operands cannot be used in

one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1;

otherwise, it is set to 0.

This instruction can be used with a LOCK prefix to allow it to be executed atomically.

In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits

access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See

the summary chart at the beginning of this section for encoding data and limits.

Opcode Instruction Op/En 64-bit Mode Compat/Leg Mode Description

24 ib AND AL, imm8 I Valid Valid AL AND imm8.

25 iw AND AX, imm16 I Valid Valid AX AND imm16.

25 id AND EAX, imm32 I Valid Valid EAX AND imm32.

REX.W + 25 id AND RAX, imm32 I Valid N.E. RAX AND imm32 sign-extended to 64 bits.

80 /4 ib AND r/m8, imm8 MI Valid Valid r/m8 AND imm8.

REX + 80 /4 ib AND r/m8*, imm8 MI Valid N.E. r/m8 AND imm8.

81 /4 iw AND r/m16, imm16 MI Valid Valid r/m16 AND imm16.

81 /4 id AND r/m32, imm32 MI Valid Valid r/m32 AND imm32.

REX.W + 81 /4 id AND r/m64, imm32 MI Valid N.E. r/m64 AND imm32 sign-extended to 64 bits.

83 /4 ib AND r/m16, imm8 MI Valid Valid r/m16 AND imm8 (sign-extended).

83 /4 ib AND r/m32, imm8 MI Valid Valid r/m32 AND imm8 (sign-extended).

REX.W + 83 /4 ib AND r/m64, imm8 MI Valid N.E. r/m64 AND imm8 (sign-extended).

20 /r AND r/m8, r8 MR Valid Valid r/m8 AND r8.

REX + 20 /r AND r/m8*, r8* MR Valid N.E. r/m8 AND r8.

21 /r AND r/m16, r16 MR Valid Valid r/m16 AND r16.

21 /r AND r/m32, r32 MR Valid Valid r/m32 AND r32.

REX.W + 21 /r AND r/m64, r64 MR Valid N.E. r/m64 AND r64.

22 /r AND r8, r/m8 RM Valid Valid r8 AND r/m8.

REX + 22 /r AND r8*, r/m8* RM Valid N.E. r8 AND r/m8.

23 /r AND r16, r/m16 RM Valid Valid r16 AND r/m16.

23 /r AND r32, r/m32 RM Valid Valid r32 AND r/m32.

REX.W + 23 /r AND r64, r/m64 RM Valid N.E. r64 AND r/m64.

NOTES:

*In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
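Several encodings in the table take an immediate that is narrower than the destination and sign-extend it before the AND. A minimal sketch of that widening, using hypothetical helper names (`sign_extend`, `and_imm`) and plain Python integers in place of registers:

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide value to to_bits, returned as unsigned."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def and_imm(dest, imm, imm_bits, operand_bits):
    """Model of AND r/m, imm when the immediate is narrower than the operand."""
    return dest & sign_extend(imm, imm_bits, operand_bits)


# 83 /4 ib with REX.W: an imm8 of F0H acts as the 64-bit mask FFFF...F0H.
low_nibble_cleared = and_imm(0x123456789ABCDEF7, 0xF0, 8, 64)

# REX.W + 81 /4 id: an imm32 with bit 31 set keeps the upper 32 bits intact.
upper_kept = and_imm(0xFFFFFFFF00000000, 0x80000000, 32, 64)
```

One consequence is that a 64-bit AND cannot clear the upper half of a register with an immediate whose bit 31 is set; the mask would have to be loaded into a register first.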

Op/En Operand 1 Operand 2 Operand 3 Operand 4

RM ModRM:reg (r, w) ModRM:r/m (r) NA NA

MR ModRM:r/m (r, w) ModRM:reg (r) NA NA

MI ModRM:r/m (r, w) imm8/16/32 NA NA

I AL/AX/EAX/RAX imm8/16/32 NA NA


Operation

DEST ← DEST AND SRC;

Flags Affected

The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is

undefined.
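The flag behaviour described above can be modelled in a few lines. This is a sketch with a hypothetical function name; it reports only the flags the text defines and omits AF because its state is undefined. PF is computed from the low byte of the result, which is how the parity flag is defined on x86.

```python
def and_with_flags(dest, src, bits):
    """Return (result, flags) for a bits-wide AND of dest and src."""
    mask = (1 << bits) - 1
    result = dest & src & mask
    flags = {
        "CF": 0,                                   # always cleared
        "OF": 0,                                   # always cleared
        "SF": (result >> (bits - 1)) & 1,          # most significant bit
        "ZF": int(result == 0),
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity
    }
    return result, flags
```

For example, `and_with_flags(0xF0, 0x0F, 8)` yields a zero result, so ZF and PF are set and SF is clear.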

Protected Mode Exceptions

#GP(0) If the destination operand points to a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.

Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.

ANDN — Logical AND NOT


Instruction Operand Encoding

Description

Performs a bitwise logical AND of inverted second operand (the first source operand) with the third operand (the

second source operand). The result is stored in the first operand (destination operand).

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in

64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An

attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation

DEST ← (NOT SRC1) bitwiseAND SRC2;

SF ← DEST[OperandSize -1];

ZF ← (DEST = 0);

Flags Affected

SF and ZF are updated based on result. OF and CF flags are cleared. AF and PF flags are undefined.
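A small model of the operation and the defined flags may help make the operand roles concrete: the first source (the VEX.vvvv operand) is the one that is inverted. The function name and the dictionary layout are illustrative choices, and AF and PF are left out because they are undefined.

```python
def andn(src1, src2, bits=32):
    """Return (dest, flags) for ANDN: dest = (NOT src1) AND src2."""
    mask = (1 << bits) - 1
    dest = (~src1 & src2) & mask
    flags = {
        "SF": (dest >> (bits - 1)) & 1,
        "ZF": int(dest == 0),
        "CF": 0,
        "OF": 0,
    }
    return dest, flags


# Typical use: clear in src2 every bit that is set in src1.
cleared, _ = andn(0x0000FFFF, 0x12345678)
```

With the legacy instruction set the same result takes a NOT followed by an AND, and the NOT overwrites its operand; ANDN leaves both sources unchanged.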

Intel C/C++ Compiler Intrinsic Equivalent

Auto-generated from high-level language.

SIMD Floating-Point Exceptions

None

Other Exceptions

See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally

#UD If VEX.W = 1.

Opcode/Instruction Op/En 64/32-bit Mode CPUID Feature Flag Description

VEX.NDS.LZ.0F38.W0 F2 /r ANDN r32a, r32b, r/m32 RVM V/V BMI1 Bitwise AND of inverted r32b with r/m32, store result in r32a.

VEX.NDS.LZ.0F38.W1 F2 /r ANDN r64a, r64b, r/m64 RVM V/NE BMI1 Bitwise AND of inverted r64b with r/m64, store result in r64a.

Op/En Operand 1 Operand 2 Operand 3 Operand 4

RVM ModRM:reg (w) VEX.vvvv (r) ModRM:r/m (r) NA

ANDPD—Bitwise Logical AND of Packed Double Precision Floating-Point Values


Instruction Operand Encoding

Description

Performs a bitwise logical AND of the two, four or eight packed double-precision floating-point values from the first

source operand and the second source operand, and stores the result in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be

a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a

64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with

writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register

or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the

corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

Opcode/Instruction Op/En 64/32 bit Mode Support CPUID Feature Flag Description

66 0F 54 /r ANDPD xmm1, xmm2/m128 RM V/V SSE2 Return the bitwise logical AND of packed double-precision floating-point values in xmm1 and xmm2/mem.

VEX.NDS.128.66.0F 54 /r VANDPD xmm1, xmm2, xmm3/m128 RVM V/V AVX Return the bitwise logical AND of packed double-precision floating-point values in xmm2 and xmm3/mem.

VEX.NDS.256.66.0F 54 /r VANDPD ymm1, ymm2, ymm3/m256 RVM V/V AVX Return the bitwise logical AND of packed double-precision floating-point values in ymm2 and ymm3/mem.

EVEX.NDS.128.66.0F.W1 54 /r VANDPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst FV V/V AVX512VL AVX512DQ Return the bitwise logical AND of packed double-precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1.

EVEX.NDS.256.66.0F.W1 54 /r VANDPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst FV V/V AVX512VL AVX512DQ Return the bitwise logical AND of packed double-precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1.

EVEX.NDS.512.66.0F.W1 54 /r VANDPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst FV V/V AVX512DQ Return the bitwise logical AND of packed double-precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1.

Op/En Operand 1 Operand 2 Operand 3 Operand 4

RM ModRM:reg (r, w) ModRM:r/m (r) NA NA

RVM ModRM:reg (w) VEX.vvvv ModRM:r/m (r) NA

FV ModRM:reg (w) EVEX.vvvv ModRM:r/m (r) NA


Operation

VANDPD (EVEX encoded versions)

(KL, VL) = (2, 128), (4, 256), (8, 512)

FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+63:i] ← SRC1[i+63:i] BITWISE AND SRC2[63:0]
                ELSE
                    DEST[i+63:i] ← SRC1[i+63:i] BITWISE AND SRC2[i+63:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VANDPD (VEX.256 encoded version)

DEST[63:0] ← SRC1[63:0] BITWISE AND SRC2[63:0]
DEST[127:64] ← SRC1[127:64] BITWISE AND SRC2[127:64]
DEST[191:128] ← SRC1[191:128] BITWISE AND SRC2[191:128]
DEST[255:192] ← SRC1[255:192] BITWISE AND SRC2[255:192]
DEST[MAX_VL-1:256] ← 0

VANDPD (VEX.128 encoded version)

DEST[63:0] ← SRC1[63:0] BITWISE AND SRC2[63:0]
DEST[127:64] ← SRC1[127:64] BITWISE AND SRC2[127:64]
DEST[MAX_VL-1:128] ← 0

ANDPD (128-bit Legacy SSE version)

DEST[63:0] ← DEST[63:0] BITWISE AND SRC[63:0]
DEST[127:64] ← DEST[127:64] BITWISE AND SRC[127:64]
DEST[MAX_VL-1:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

VANDPD __m512d _mm512_and_pd (__m512d a, __m512d b);

VANDPD __m512d _mm512_mask_and_pd (__m512d s, __mmask8 k, __m512d a, __m512d b);

VANDPD __m512d _mm512_maskz_and_pd (__mmask8 k, __m512d a, __m512d b);

VANDPD __m256d _mm256_mask_and_pd (__m256d s, __mmask8 k, __m256d a, __m256d b);

VANDPD __m256d _mm256_maskz_and_pd (__mmask8 k, __m256d a, __m256d b);

VANDPD __m128d _mm_mask_and_pd (__m128d s, __mmask8 k, __m128d a, __m128d b);

VANDPD __m128d _mm_maskz_and_pd (__mmask8 k, __m128d a, __m128d b);

VANDPD __m256d _mm256_and_pd (__m256d a, __m256d b);

ANDPD __m128d _mm_and_pd (__m128d a, __m128d b);
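The EVEX form combines three independent features: per-lane writemasking, a choice between merging and zeroing for masked-off lanes, and an optional broadcast of a single 64-bit memory element. The sketch below models them on lists of 64-bit integers. The function name and parameters are hypothetical, and zeroing of the bits above VL is not modelled.

```python
def vandpd_evex(src1, src2, k1=None, zeroing=False, dest=None, broadcast=False):
    """Lane-wise model of EVEX VANDPD.

    src1, src2, dest: lists of 64-bit integers, one per lane.
    k1: integer writemask, or None when no writemask is used.
    broadcast: when True, src2[0] is used for every lane (EVEX.b with memory).
    """
    out = []
    for j in range(len(src1)):
        if k1 is None or (k1 >> j) & 1:
            b = src2[0] if broadcast else src2[j]
            out.append(src1[j] & b)
        elif zeroing:
            out.append(0)            # zeroing-masking
        else:
            out.append(dest[j])      # merging-masking keeps the old lane
    return out
```

The three masked intrinsic families listed above map onto this model roughly as follows: `_mask_` variants correspond to merging with the extra `s` argument as `dest`, and `_maskz_` variants correspond to `zeroing=True`.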

SIMD Floating-Point Exceptions

None


Other Exceptions

VEX-encoded instruction, see Exceptions Type 4.

EVEX-encoded instruction, see Exceptions Type E4.

ANDPS—Bitwise Logical AND of Packed Single Precision Floating-Point Values


Instruction Operand Encoding

Description

Performs a bitwise logical AND of the four, eight or sixteen packed single-precision floating-point values from the

first source operand and the second source operand, and stores the result in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be

a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a

32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with

writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register

or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the

corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

Opcode/Instruction Op/En 64/32 bit Mode Support CPUID Feature Flag Description

0F 54 /r ANDPS xmm1, xmm2/m128 RM V/V SSE Return the bitwise logical AND of packed single-precision floating-point values in xmm1 and xmm2/mem.

VEX.NDS.128.0F 54 /r VANDPS xmm1, xmm2, xmm3/m128 RVM V/V AVX Return the bitwise logical AND of packed single-precision floating-point values in xmm2 and xmm3/mem.

VEX.NDS.256.0F 54 /r VANDPS ymm1, ymm2, ymm3/m256 RVM V/V AVX Return the bitwise logical AND of packed single-precision floating-point values in ymm2 and ymm3/mem.

EVEX.NDS.128.0F.W0 54 /r VANDPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst FV V/V AVX512VL AVX512DQ Return the bitwise logical AND of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1.

EVEX.NDS.256.0F.W0 54 /r VANDPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst FV V/V AVX512VL AVX512DQ Return the bitwise logical AND of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1.

EVEX.NDS.512.0F.W0 54 /r VANDPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst FV V/V AVX512DQ Return the bitwise logical AND of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1.

Op/En Operand 1 Operand 2 Operand 3 Operand 4

RM ModRM:reg (r, w) ModRM:r/m (r) NA NA

RVM ModRM:reg (w) VEX.vvvv ModRM:r/m (r) NA

FV ModRM:reg (w) EVEX.vvvv ModRM:r/m (r) NA


Operation

VANDPS (EVEX encoded versions)

(KL, VL) = (4, 128), (8, 256), (16, 512)

FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+31:i] ← SRC1[i+31:i] BITWISE AND SRC2[31:0]
                ELSE
                    DEST[i+31:i] ← SRC1[i+31:i] BITWISE AND SRC2[i+31:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0;

VANDPS (VEX.256 encoded version)

DEST[31:0] ← SRC1[31:0] BITWISE AND SRC2[31:0]
DEST[63:32] ← SRC1[63:32] BITWISE AND SRC2[63:32]
DEST[95:64] ← SRC1[95:64] BITWISE AND SRC2[95:64]
DEST[127:96] ← SRC1[127:96] BITWISE AND SRC2[127:96]
DEST[159:128] ← SRC1[159:128] BITWISE AND SRC2[159:128]
DEST[191:160] ← SRC1[191:160] BITWISE AND SRC2[191:160]
DEST[223:192] ← SRC1[223:192] BITWISE AND SRC2[223:192]
DEST[255:224] ← SRC1[255:224] BITWISE AND SRC2[255:224]
DEST[MAX_VL-1:256] ← 0;

VANDPS (VEX.128 encoded version)

DEST[31:0] ← SRC1[31:0] BITWISE AND SRC2[31:0]
DEST[63:32] ← SRC1[63:32] BITWISE AND SRC2[63:32]
DEST[95:64] ← SRC1[95:64] BITWISE AND SRC2[95:64]
DEST[127:96] ← SRC1[127:96] BITWISE AND SRC2[127:96]
DEST[MAX_VL-1:128] ← 0;

ANDPS (128-bit Legacy SSE version)

DEST[31:0] ← DEST[31:0] BITWISE AND SRC[31:0]
DEST[63:32] ← DEST[63:32] BITWISE AND SRC[63:32]
DEST[95:64] ← DEST[95:64] BITWISE AND SRC[95:64]
DEST[127:96] ← DEST[127:96] BITWISE AND SRC[127:96]
DEST[MAX_VL-1:128] (Unmodified)


Intel C/C++ Compiler Intrinsic Equivalent

VANDPS __m512 _mm512_and_ps (__m512 a, __m512 b);

VANDPS __m512 _mm512_mask_and_ps (__m512 s, __mmask16 k, __m512 a, __m512 b);

VANDPS __m512 _mm512_maskz_and_ps (__mmask16 k, __m512 a, __m512 b);

VANDPS __m256 _mm256_mask_and_ps (__m256 s, __mmask8 k, __m256 a, __m256 b);

VANDPS __m256 _mm256_maskz_and_ps (__mmask8 k, __m256 a, __m256 b);

VANDPS __m128 _mm_mask_and_ps (__m128 s, __mmask8 k, __m128 a, __m128 b);

VANDPS __m128 _mm_maskz_and_ps (__mmask8 k, __m128 a, __m128 b);

VANDPS __m256 _mm256_and_ps (__m256 a, __m256 b);

ANDPS __m128 _mm_and_ps (__m128 a, __m128 b);
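Because the instruction works on raw bit patterns rather than on numeric values, a common use is masking fields of the IEEE 754 encoding. The sketch below clears the sign bit of each single-precision lane to compute an absolute value. The helper names and the mask constant `ABS_MASK` are illustrative, and lanes are modelled as 32-bit integers produced with Python's `struct` module.

```python
import struct

ABS_MASK = 0x7FFFFFFF  # every bit except the IEEE 754 sign bit


def f32_bits(x):
    """Bit pattern of x encoded as a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Single-precision value encoded by the 32-bit pattern b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def andps(a_bits, b_bits):
    """Lane-wise bitwise AND of two lists of 32-bit patterns."""
    return [x & y for x, y in zip(a_bits, b_bits)]


def abs_ps(values):
    """Absolute value of each lane via AND with the sign-cleared mask."""
    lanes = [f32_bits(v) for v in values]
    return [bits_f32(b) for b in andps(lanes, [ABS_MASK] * len(lanes))]
```

The mask is kept as an integer here because its bit pattern is a NaN when read as a float, and passing a NaN through a float conversion is not guaranteed to preserve the payload.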

SIMD Floating-Point Exceptions

None

Other Exceptions

VEX-encoded instruction, see Exceptions Type 4.

EVEX-encoded instruction, see Exceptions Type E4.

ANDNPD—Bitwise Logical AND NOT of Packed Double Precision Floating-Point Values


Instruction Operand Encoding

Description

Performs a bitwise logical AND NOT of the two, four or eight packed double-precision floating-point values from the

first source operand and the second source operand, and stores the result in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be

a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a

64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with

writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register

or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the

corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

Opcode/Instruction Op/En 64/32 bit Mode Support CPUID Feature Flag Description

66 0F 55 /r ANDNPD xmm1, xmm2/m128 RM V/V SSE2 Return the bitwise logical AND NOT of packed double-precision floating-point values in xmm1 and xmm2/mem.

VEX.NDS.128.66.0F 55 /r VANDNPD xmm1, xmm2, xmm3/m128 RVM V/V AVX Return the bitwise logical AND NOT of packed double-precision floating-point values in xmm2 and xmm3/mem.

VEX.NDS.256.66.0F 55 /r VANDNPD ymm1, ymm2, ymm3/m256 RVM V/V AVX Return the bitwise logical AND NOT of packed double-precision floating-point values in ymm2 and ymm3/mem.

EVEX.NDS.128.66.0F.W1 55 /r VANDNPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst FV V/V AVX512VL AVX512DQ Return the bitwise logical AND NOT of packed double-precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1.

EVEX.NDS.256.66.0F.W1 55 /r VANDNPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst FV V/V AVX512VL AVX512DQ Return the bitwise logical AND NOT of packed double-precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1.

EVEX.NDS.512.66.0F.W1 55 /r VANDNPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst FV V/V AVX512DQ Return the bitwise logical AND NOT of packed double-precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1.

Op/En Operand 1 Operand 2 Operand 3 Operand 4

RM ModRM:reg (r, w) ModRM:r/m (r) NA NA

RVM ModRM:reg (w) VEX.vvvv ModRM:r/m (r) NA

FV ModRM:reg (w) EVEX.vvvv ModRM:r/m (r) NA


Operation

VANDNPD (EVEX encoded versions)

(KL, VL) = (2, 128), (4, 256), (8, 512)

FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+63:i] ← (NOT(SRC1[i+63:i])) BITWISE AND SRC2[63:0]
                ELSE
                    DEST[i+63:i] ← (NOT(SRC1[i+63:i])) BITWISE AND SRC2[i+63:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VANDNPD (VEX.256 encoded version)

DEST[63:0] ← (NOT(SRC1[63:0])) BITWISE AND SRC2[63:0]
DEST[127:64] ← (NOT(SRC1[127:64])) BITWISE AND SRC2[127:64]
DEST[191:128] ← (NOT(SRC1[191:128])) BITWISE AND SRC2[191:128]
DEST[255:192] ← (NOT(SRC1[255:192])) BITWISE AND SRC2[255:192]
DEST[MAX_VL-1:256] ← 0

VANDNPD (VEX.128 encoded version)

DEST[63:0] ← (NOT(SRC1[63:0])) BITWISE AND SRC2[63:0]
DEST[127:64] ← (NOT(SRC1[127:64])) BITWISE AND SRC2[127:64]
DEST[MAX_VL-1:128] ← 0

ANDNPD (128-bit Legacy SSE version)

DEST[63:0] ← (NOT(DEST[63:0])) BITWISE AND SRC[63:0]
DEST[127:64] ← (NOT(DEST[127:64])) BITWISE AND SRC[127:64]
DEST[MAX_VL-1:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

VANDNPD __m512d _mm512_andnot_pd (__m512d a, __m512d b);

VANDNPD __m512d _mm512_mask_andnot_pd (__m512d s, __mmask8 k, __m512d a, __m512d b);

VANDNPD __m512d _mm512_maskz_andnot_pd (__mmask8 k, __m512d a, __m512d b);

VANDNPD __m256d _mm256_mask_andnot_pd (__m256d s, __mmask8 k, __m256d a, __m256d b);

VANDNPD __m256d _mm256_maskz_andnot_pd (__mmask8 k, __m256d a, __m256d b);

VANDNPD __m128d _mm_mask_andnot_pd (__m128d s, __mmask8 k, __m128d a, __m128d b);

VANDNPD __m128d _mm_maskz_andnot_pd (__mmask8 k, __m128d a, __m128d b);

VANDNPD __m256d _mm256_andnot_pd (__m256d a, __m256d b);

ANDNPD __m128d _mm_andnot_pd (__m128d a, __m128d b);
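Unlike a plain AND, this operation is not symmetric in its operands: only the first one (DEST in the legacy form, SRC1 in the VEX and EVEX forms) is inverted. The sketch below illustrates the asymmetry on double-precision bit patterns. The function names and the `SIGN` constant are illustrative assumptions.

```python
import struct

SIGN = 0x8000000000000000      # IEEE 754 double-precision sign bit
LANE = 0xFFFFFFFFFFFFFFFF      # 64-bit lane mask


def f64_bits(x):
    """Bit pattern of x encoded as a double-precision float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_f64(b):
    """Double-precision value encoded by the 64-bit pattern b."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def andnpd(first, second):
    """Lane-wise (NOT first) AND second on lists of 64-bit patterns."""
    return [(~a & LANE) & b for a, b in zip(first, second)]


# Sign mask as the first (inverted) operand: clears the sign of each value.
magnitudes = andnpd([SIGN, SIGN], [f64_bits(-2.5), f64_bits(4.0)])

# Operands swapped: inverts the value and keeps only its sign position.
swapped = andnpd([f64_bits(-2.5)], [SIGN])
```

When using the intrinsics, the same ordering applies: the first argument of `_mm_andnot_pd` is the one that is complemented.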

SIMD Floating-Point Exceptions

None


Other Exceptions

VEX-encoded instruction, see Exceptions Type 4.

EVEX-encoded instruction, see Exceptions Type E4.

ANDNPS—Bitwise Logical AND NOT of Packed Single Precision Floating-Point Values


Instruction Operand Encoding

Description

Performs a bitwise logical AND NOT of the four, eight or sixteen packed single-precision floating-point values from

the first source operand and the second source operand, and stores the result in the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be

a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a

32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with

writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register

or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the

corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

Opcode/Instruction Op/En 64/32 bit Mode Support CPUID Feature Flag Description

0F 55 /r ANDNPS xmm1, xmm2/m128 RM V/V SSE Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm1 and xmm2/mem.

VEX.NDS.128.0F 55 /r VANDNPS xmm1, xmm2, xmm3/m128 RVM V/V AVX Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm2 and xmm3/mem.

VEX.NDS.256.0F 55 /r VANDNPS ymm1, ymm2, ymm3/m256 RVM V/V AVX Return the bitwise logical AND NOT of packed single-precision floating-point values in ymm2 and ymm3/mem.

EVEX.NDS.128.0F.W0 55 /r VANDNPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst FV V/V AVX512VL AVX512DQ Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1.

EVEX.NDS.256.0F.W0 55 /r VANDNPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst FV V/V AVX512VL AVX512DQ Return the bitwise logical AND NOT of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1.

EVEX.NDS.512.0F.W0 55 /r VANDNPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst FV V/V AVX512DQ Return the bitwise logical AND NOT of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1.

Op/En Operand 1 Operand 2 Operand 3 Operand 4

RM ModRM:reg (r, w) ModRM:r/m (r) NA NA

RVM ModRM:reg (w) VEX.vvvv ModRM:r/m (r) NA

FV ModRM:reg (w) EVEX.vvvv ModRM:r/m (r) NA


Operation

VANDNPS (EVEX encoded versions)

(KL, VL) = (4, 128), (8, 256), (16, 512)

FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+31:i] ← (NOT(SRC1[i+31:i])) BITWISE AND SRC2[31:0]
                ELSE
                    DEST[i+31:i] ← (NOT(SRC1[i+31:i])) BITWISE AND SRC2[i+31:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VANDNPS (VEX.256 encoded version)

DEST[31:0] ← (NOT(SRC1[31:0])) BITWISE AND SRC2[31:0]
DEST[63:32] ← (NOT(SRC1[63:32])) BITWISE AND SRC2[63:32]
DEST[95:64] ← (NOT(SRC1[95:64])) BITWISE AND SRC2[95:64]
DEST[127:96] ← (NOT(SRC1[127:96])) BITWISE AND SRC2[127:96]
DEST[159:128] ← (NOT(SRC1[159:128])) BITWISE AND SRC2[159:128]
DEST[191:160] ← (NOT(SRC1[191:160])) BITWISE AND SRC2[191:160]
DEST[223:192] ← (NOT(SRC1[223:192])) BITWISE AND SRC2[223:192]
DEST[255:224] ← (NOT(SRC1[255:224])) BITWISE AND SRC2[255:224]
DEST[MAX_VL-1:256] ← 0

VANDNPS (VEX.128 encoded version)

DEST[31:0] ← (NOT(SRC1[31:0])) BITWISE AND SRC2[31:0]
DEST[63:32] ← (NOT(SRC1[63:32])) BITWISE AND SRC2[63:32]
DEST[95:64] ← (NOT(SRC1[95:64])) BITWISE AND SRC2[95:64]
DEST[127:96] ← (NOT(SRC1[127:96])) BITWISE AND SRC2[127:96]
DEST[MAX_VL-1:128] ← 0

ANDNPS (128-bit Legacy SSE version)

DEST[31:0] ← (NOT(DEST[31:0])) BITWISE AND SRC[31:0]
DEST[63:32] ← (NOT(DEST[63:32])) BITWISE AND SRC[63:32]
DEST[95:64] ← (NOT(DEST[95:64])) BITWISE AND SRC[95:64]
DEST[127:96] ← (NOT(DEST[127:96])) BITWISE AND SRC[127:96]
DEST[MAX_VL-1:128] (Unmodified)


Intel C/C++ Compiler Intrinsic Equivalent

VANDNPS __m512 _mm512_andnot_ps (__m512 a, __m512 b);

VANDNPS __m512 _mm512_mask_andnot_ps (__m512 s, __mmask16 k, __m512 a, __m512 b);

VANDNPS __m512 _mm512_maskz_andnot_ps (__mmask16 k, __m512 a, __m512 b);

VANDNPS __m256 _mm256_mask_andnot_ps (__m256 s, __mmask8 k, __m256 a, __m256 b);

VANDNPS __m256 _mm256_maskz_andnot_ps (__mmask8 k, __m256 a, __m256 b);

VANDNPS __m128 _mm_mask_andnot_ps (__m128 s, __mmask8 k, __m128 a, __m128 b);

VANDNPS __m128 _mm_maskz_andnot_ps (__mmask8 k, __m128 a, __m128 b);

VANDNPS __m256 _mm256_andnot_ps (__m256 a, __m256 b);

ANDNPS __m128 _mm_andnot_ps (__m128 a, __m128 b);
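One familiar way to combine AND and AND NOT is a branch-free per-lane select: lanes where a mask is all ones come from one input, the remaining lanes from another. The sketch below models that idiom on 32-bit lane patterns. The function names are hypothetical, and the final OR step stands in for a separate bitwise OR instruction that this entry does not describe.

```python
LANE_MASK = 0xFFFFFFFF  # 32-bit lane width


def andps(a, b):
    """Lane-wise a AND b."""
    return [x & y for x, y in zip(a, b)]


def andnps(a, b):
    """Lane-wise (NOT a) AND b; the first operand is the inverted one."""
    return [(~x & LANE_MASK) & y for x, y in zip(a, b)]


def select(mask, if_set, if_clear):
    """Pick if_set where mask bits are 1 and if_clear where they are 0."""
    kept = andps(mask, if_set)
    other = andnps(mask, if_clear)
    return [x | y for x, y in zip(kept, other)]  # assumed separate OR step
```

A comparison instruction that produces all-ones or all-zeros lanes would typically supply the mask; on AVX-512 the same effect is usually obtained directly with a writemask instead.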

SIMD Floating-Point Exceptions

None

Other Exceptions

VEX-encoded instruction, see Exceptions Type 4.

EVEX-encoded instruction, see Exceptions Type E4.


ARPL—Adjust RPL Field of Segment Selector

Instruction Operand Encoding

Description

Compares the RPL fields of two segment selectors. The first operand (the destination operand) contains one

segment selector and the second operand (source operand) contains the other. (The RPL field is located in bits 0

and 1 of each operand.) If the RPL field of the destination operand is less than the RPL field of the source operand,

the ZF flag is set and the RPL field of the destination operand is increased to match that of the source operand.

Otherwise, the ZF flag is cleared and no change is made to the destination operand. (The destination operand can

be a word register or a memory location; the source operand must be a word register.)

The ARPL instruction is provided for use by operating-system procedures (however, it can also be used by applications). It is generally used to adjust the RPL of a segment selector that has been passed to the operating system by an application program to match the privilege level of the application program. Here the segment selector passed to the operating system is placed in the destination operand and the segment selector for the application program’s code segment is placed in the source operand. (The RPL field in the source operand represents the privilege level of the application program.) Execution of the ARPL instruction then ensures that the RPL of the segment selector received by the operating system is no lower (does not have a higher privilege) than the privilege level of the application program (the segment selector for the application program’s code segment can be read from the stack following a procedure call).

This instruction executes as described in compatibility mode and legacy mode. It is not encodable in 64-bit mode.

See “Checking Caller Access Privileges” in Chapter 3, “Protected-Mode Memory Management,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information about the use of this instruction.

Operation

IF 64-BIT MODE
    THEN
        See MOVSXD;
    ELSE
        IF DEST[RPL] < SRC[RPL]
            THEN
                ZF ← 1;
                DEST[RPL] ← SRC[RPL];
            ELSE
                ZF ← 0;
        FI;
FI;
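The non-64-bit branch of the operation can be sketched in C as a small software model. This is only an illustration under stated assumptions: the function name `arpl` and its return convention (the value ZF would take) are hypothetical, and the model ignores segmentation, faults, and the 64-bit mode case.

```c
#include <stdint.h>
#include <stdbool.h>

/* Software model of ARPL outside 64-bit mode. The RPL field is
   bits 1:0 of a 16-bit segment selector. Returns the value ZF
   would take and updates *dest in place when the RPL is raised. */
bool arpl(uint16_t *dest, uint16_t src)
{
    uint16_t dest_rpl = (uint16_t)(*dest & 0x3u);
    uint16_t src_rpl  = (uint16_t)(src & 0x3u);

    if (dest_rpl < src_rpl) {
        /* Keep index and TI bits, replace only the RPL field. */
        *dest = (uint16_t)((*dest & ~0x3u) | src_rpl);
        return true;   /* ZF = 1 */
    }
    return false;      /* ZF = 0, destination unchanged */
}
```

For example, a selector with RPL 0 checked against a caller selector with RPL 3 comes back with RPL 3, while a selector that already has RPL 3 is left untouched.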

Flags Affected

The ZF flag is set to 1 if the RPL field of the destination operand is less than that of the source operand; otherwise,

it is set to 0.

Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description
63 /r | ARPL r/m16, r16 | NP | N. E. | Valid | Adjust RPL of r/m16 to not less than RPL of r16.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
NP | ModRM:r/m (w) | ModRM:reg (r) | NA | NA


Protected Mode Exceptions

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment

selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the

current privilege level is 3.

#UD If the LOCK prefix is used.

Real-Address Mode Exceptions

#UD The ARPL instruction is not recognized in real-address mode.

If the LOCK prefix is used.

Virtual-8086 Mode Exceptions

#UD The ARPL instruction is not recognized in virtual-8086 mode.

If the LOCK prefix is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

Not applicable.


BLENDPD — Blend Packed Double Precision Floating-Point Values

Instruction Operand Encoding

Description

Double-precision floating-point values from the second source operand (third operand) are conditionally merged with values from the first source operand (second operand) and written to the destination operand (first operand). The immediate bits [3:0] determine whether the corresponding double-precision floating-point value in the destination is copied from the second source or the first source. If a bit in the mask, corresponding to a double-precision element, is “1”, then the double-precision floating-point value in the second source operand is copied; otherwise the value in the first source operand is copied.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.

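VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.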

Operation

BLENDPD (128-bit Legacy SSE version)
IF (IMM8[0] = 0) THEN DEST[63:0] ← DEST[63:0]
    ELSE DEST[63:0] ← SRC[63:0] FI
IF (IMM8[1] = 0) THEN DEST[127:64] ← DEST[127:64]
    ELSE DEST[127:64] ← SRC[127:64] FI
DEST[VLMAX-1:128] (Unmodified)

VBLENDPD (VEX.128 encoded version)
IF (IMM8[0] = 0) THEN DEST[63:0] ← SRC1[63:0]
    ELSE DEST[63:0] ← SRC2[63:0] FI
IF (IMM8[1] = 0) THEN DEST[127:64] ← SRC1[127:64]
    ELSE DEST[127:64] ← SRC2[127:64] FI
DEST[VLMAX-1:128] ← 0

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 3A 0D /r ib BLENDPD xmm1, xmm2/m128, imm8 | RMI | V/V | SSE4_1 | Select packed DP-FP values from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1.
VEX.NDS.128.66.0F3A.WIG 0D /r ib VBLENDPD xmm1, xmm2, xmm3/m128, imm8 | RVMI | V/V | AVX | Select packed double-precision floating-point values from xmm2 and xmm3/m128 from mask in imm8 and store the values in xmm1.
VEX.NDS.256.66.0F3A.WIG 0D /r ib VBLENDPD ymm1, ymm2, ymm3/m256, imm8 | RVMI | V/V | AVX | Select packed double-precision floating-point values from ymm2 and ymm3/m256 from mask in imm8 and store the values in ymm1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMI | ModRM:reg (r, w) | ModRM:r/m (r) | imm8 | NA
RVMI | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8[3:0]


VBLENDPD (VEX.256 encoded version)
IF (IMM8[0] = 0) THEN DEST[63:0] ← SRC1[63:0]
    ELSE DEST[63:0] ← SRC2[63:0] FI
IF (IMM8[1] = 0) THEN DEST[127:64] ← SRC1[127:64]
    ELSE DEST[127:64] ← SRC2[127:64] FI
IF (IMM8[2] = 0) THEN DEST[191:128] ← SRC1[191:128]
    ELSE DEST[191:128] ← SRC2[191:128] FI
IF (IMM8[3] = 0) THEN DEST[255:192] ← SRC1[255:192]
    ELSE DEST[255:192] ← SRC2[255:192] FI
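The immediate-controlled selection in the operation listings can be sketched in portable C. This is a minimal model, not the intrinsic itself: the function name `blend_pd` and the `lanes` parameter are hypothetical, with `lanes` assumed to be 2 for the 128-bit forms and 4 for the VEX.256 form. The zeroing or preservation of upper register bits is not modeled.

```c
#include <stdint.h>
#include <stddef.h>

/* Lane i of the result comes from src2 when bit i of imm8 is 1,
   otherwise from src1. dest may alias src1, which mirrors the
   legacy SSE form where the destination is also the first source. */
void blend_pd(double *dest, const double *src1, const double *src2,
              size_t lanes, uint8_t imm8)
{
    for (size_t i = 0; i < lanes; i++) {
        dest[i] = ((imm8 >> i) & 1u) ? src2[i] : src1[i];
    }
}
```

With `imm8 = 0x5` and four lanes, lanes 0 and 2 are taken from the second source and lanes 1 and 3 from the first.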

Intel C/C++ Compiler Intrinsic Equivalent

BLENDPD: __m128d _mm_blend_pd (__m128d v1, __m128d v2, const int mask);

VBLENDPD: __m256d _mm256_blend_pd (__m256d a, __m256d b, const int mask);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4.


BEXTR — Bit Field Extract

Instruction Operand Encoding

Description

Extracts contiguous bits from the first source operand (the second operand) using an index value and length value specified in the second source operand (the third operand). Bits 7:0 of the second source operand specify the starting bit position of bit extraction. A START value exceeding the operand size will not extract any bits from the first source operand. Bits 15:8 of the second source operand specify the maximum number of bits (LENGTH) beginning at the START position to extract. Only bit positions up to (OperandSize - 1) of the first source operand are extracted. The extracted bits are written to the destination register, starting from the least significant bit. All higher-order bits in the destination operand (starting at bit position LENGTH) are zeroed. The destination register is cleared if no bits are extracted.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in

64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An

attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation

START ← SRC2[7:0];

LEN ← SRC2[15:8];

TEMP ← ZERO_EXTEND_TO_512 (SRC1 );

DEST ← ZERO_EXTEND(TEMP[START+LEN -1: START]);

ZF ← (DEST = 0);
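A portable C sketch of the 32-bit form may make the START and LEN handling easier to follow. The function name `bextr32` is hypothetical, and the explicit range checks stand in for the zero-extension to 512 bits in the pseudocode, because C shifts by 32 or more are undefined for a 32-bit type. Flags are not modeled; ZF would simply be set when the returned value is 0.

```c
#include <stdint.h>

/* Model of 32-bit BEXTR. control[7:0] is START, control[15:8] is LEN. */
uint32_t bextr32(uint32_t src, uint32_t control)
{
    uint32_t start = control & 0xFFu;
    uint32_t len   = (control >> 8) & 0xFFu;
    uint32_t shifted;

    if (start >= 32u || len == 0u) {
        return 0u;                   /* no bits extracted */
    }
    shifted = src >> start;          /* start < 32, so the shift is defined */
    if (len >= 32u) {
        return shifted;              /* field runs past bit 31 */
    }
    return shifted & (((uint32_t)1 << len) - 1u);
}
```

For instance, a control value of `(8 << 8) | 4` extracts the 8-bit field that starts at bit 4.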

Flags Affected

ZF is updated based on the result. AF, SF, and PF are undefined. All other flags are cleared.

Intel C/C++ Compiler Intrinsic Equivalent

BEXTR: unsigned __int32 _bextr_u32(unsigned __int32 src, unsigned __int32 start, unsigned __int32 len);

BEXTR: unsigned __int64 _bextr_u64(unsigned __int64 src, unsigned __int32 start, unsigned __int32 len);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally

#UD If VEX.W = 1.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDS.LZ.0F38.W0 F7 /r BEXTR r32a, r/m32, r32b | RMV | V/V | BMI1 | Contiguous bitwise extract from r/m32 using r32b as control; store result in r32a.
VEX.NDS.LZ.0F38.W1 F7 /r BEXTR r64a, r/m64, r64b | RMV | V/N.E. | BMI1 | Contiguous bitwise extract from r/m64 using r64b as control; store result in r64a.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMV | ModRM:reg (w) | ModRM:r/m (r) | VEX.vvvv (r) | NA


BLENDPS — Blend Packed Single Precision Floating-Point Values

Instruction Operand Encoding

Description

Packed single-precision floating-point values from the second source operand (third operand) are conditionally merged with values from the first source operand (second operand) and written to the destination operand (first operand). The immediate bits [7:0] determine whether the corresponding single-precision floating-point value in the destination is copied from the second source or the first source. If a bit in the mask, corresponding to a single-precision element, is “1”, then the single-precision floating-point value in the second source operand is copied; otherwise the value in the first source operand is copied.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.

Operation

BLENDPS (128-bit Legacy SSE version)
IF (IMM8[0] = 0) THEN DEST[31:0] ← DEST[31:0]
    ELSE DEST[31:0] ← SRC[31:0] FI
IF (IMM8[1] = 0) THEN DEST[63:32] ← DEST[63:32]
    ELSE DEST[63:32] ← SRC[63:32] FI
IF (IMM8[2] = 0) THEN DEST[95:64] ← DEST[95:64]
    ELSE DEST[95:64] ← SRC[95:64] FI
IF (IMM8[3] = 0) THEN DEST[127:96] ← DEST[127:96]
    ELSE DEST[127:96] ← SRC[127:96] FI
DEST[VLMAX-1:128] (Unmodified)

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 3A 0C /r ib BLENDPS xmm1, xmm2/m128, imm8 | RMI | V/V | SSE4_1 | Select packed single precision floating-point values from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1.
VEX.NDS.128.66.0F3A.WIG 0C /r ib VBLENDPS xmm1, xmm2, xmm3/m128, imm8 | RVMI | V/V | AVX | Select packed single-precision floating-point values from xmm2 and xmm3/m128 from mask in imm8 and store the values in xmm1.
VEX.NDS.256.66.0F3A.WIG 0C /r ib VBLENDPS ymm1, ymm2, ymm3/m256, imm8 | RVMI | V/V | AVX | Select packed single-precision floating-point values from ymm2 and ymm3/m256 from mask in imm8 and store the values in ymm1.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMI | ModRM:reg (r, w) | ModRM:r/m (r) | imm8 | NA
RVMI | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8


VBLENDPS (VEX.128 encoded version)
IF (IMM8[0] = 0) THEN DEST[31:0] ← SRC1[31:0]
    ELSE DEST[31:0] ← SRC2[31:0] FI
IF (IMM8[1] = 0) THEN DEST[63:32] ← SRC1[63:32]
    ELSE DEST[63:32] ← SRC2[63:32] FI
IF (IMM8[2] = 0) THEN DEST[95:64] ← SRC1[95:64]
    ELSE DEST[95:64] ← SRC2[95:64] FI
IF (IMM8[3] = 0) THEN DEST[127:96] ← SRC1[127:96]
    ELSE DEST[127:96] ← SRC2[127:96] FI
DEST[VLMAX-1:128] ← 0

VBLENDPS (VEX.256 encoded version)
IF (IMM8[0] = 0) THEN DEST[31:0] ← SRC1[31:0]
    ELSE DEST[31:0] ← SRC2[31:0] FI
IF (IMM8[1] = 0) THEN DEST[63:32] ← SRC1[63:32]
    ELSE DEST[63:32] ← SRC2[63:32] FI
IF (IMM8[2] = 0) THEN DEST[95:64] ← SRC1[95:64]
    ELSE DEST[95:64] ← SRC2[95:64] FI
IF (IMM8[3] = 0) THEN DEST[127:96] ← SRC1[127:96]
    ELSE DEST[127:96] ← SRC2[127:96] FI
IF (IMM8[4] = 0) THEN DEST[159:128] ← SRC1[159:128]
    ELSE DEST[159:128] ← SRC2[159:128] FI
IF (IMM8[5] = 0) THEN DEST[191:160] ← SRC1[191:160]
    ELSE DEST[191:160] ← SRC2[191:160] FI
IF (IMM8[6] = 0) THEN DEST[223:192] ← SRC1[223:192]
    ELSE DEST[223:192] ← SRC2[223:192] FI
IF (IMM8[7] = 0) THEN DEST[255:224] ← SRC1[255:224]
    ELSE DEST[255:224] ← SRC2[255:224] FI

Intel C/C++ Compiler Intrinsic Equivalent

BLENDPS: __m128 _mm_blend_ps (__m128 v1, __m128 v2, const int mask);

VBLENDPS: __m256 _mm256_blend_ps (__m256 a, __m256 b, const int mask);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4.


BLENDVPD — Variable Blend Packed Double Precision Floating-Point Values

Instruction Operand Encoding

Description

Conditionally copies each quadword data element of double-precision floating-point values from the second source operand or the first source operand, depending on mask bits defined in the mask register operand. The mask bits are the most significant bit in each quadword element of the mask register.

Each quadword element of the destination operand is copied from:

• the corresponding quadword element in the second source operand, if a mask bit is “1”; or

• the corresponding quadword element in the first source operand, if a mask bit is “0”.

The register assignment of the implicit mask operand for BLENDVPD is defined to be the architectural register

XMM0.

128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:128)

of the corresponding YMM destination register remain unchanged. The mask register operand is implicitly defined

to be the architectural register XMM0. An attempt to execute BLENDVPD with a VEX prefix will cause #UD.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second

source operand is an XMM register or 128-bit memory location. The mask operand is the third source register, and

encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is

ignored. The upper bits (VLMAX-1:128) of the corresponding YMM register (destination register) are zeroed.

VEX.W must be 0, otherwise, the instruction will #UD.

VEX.256 encoded version: The first source operand and destination operand are YMM registers. The second source

operand can be a YMM register or a 256-bit memory location. The mask operand is the third source register, and

encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is

ignored. VEX.W must be 0, otherwise, the instruction will #UD.

VBLENDVPD permits the mask to be any XMM or YMM register. In contrast, BLENDVPD treats XMM0 implicitly as the mask and does not support non-destructive destination operation.
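The VEX-encoded forms carry the mask register number in the trailing immediate byte. A small decoding sketch, based only on the bit positions stated in the description, might look like the following. The helper name `is4_mask_register` and the `mode64` flag are hypothetical, and the sketch covers only the register-number extraction, not instruction decoding as a whole.

```c
#include <stdint.h>
#include <stdbool.h>

/* Recover the mask register number from the immediate byte.
   Bits 3:0 are ignored. Outside 64-bit mode bit 7 is ignored as
   well, so only registers 0 through 7 can be selected. */
unsigned is4_mask_register(uint8_t imm8, bool mode64)
{
    unsigned reg = (imm8 >> 4) & 0xFu;

    if (!mode64) {
        reg &= 0x7u;   /* drop imm8[7] */
    }
    return reg;
}
```

An immediate of `0xC5` therefore names register 12 in 64-bit mode and register 4 otherwise.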

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 15 /r BLENDVPD xmm1, xmm2/m128, <XMM0> | RM0 | V/V | SSE4_1 | Select packed DP FP values from xmm1 and xmm2 from mask specified in XMM0 and store the values in xmm1.
VEX.NDS.128.66.0F3A.W0 4B /r /is4 VBLENDVPD xmm1, xmm2, xmm3/m128, xmm4 | RVMR | V/V | AVX | Conditionally copy double-precision floating-point values from xmm2 or xmm3/m128 to xmm1, based on mask bits in the mask operand, xmm4.
VEX.NDS.256.66.0F3A.W0 4B /r /is4 VBLENDVPD ymm1, ymm2, ymm3/m256, ymm4 | RVMR | V/V | AVX | Conditionally copy double-precision floating-point values from ymm2 or ymm3/m256 to ymm1, based on mask bits in the mask operand, ymm4.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM0 | ModRM:reg (r, w) | ModRM:r/m (r) | implicit XMM0 | NA
RVMR | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8[7:4]


Operation

BLENDVPD (128-bit Legacy SSE version)
MASK ← XMM0
IF (MASK[63] = 0) THEN DEST[63:0] ← DEST[63:0]
    ELSE DEST[63:0] ← SRC[63:0] FI
IF (MASK[127] = 0) THEN DEST[127:64] ← DEST[127:64]
    ELSE DEST[127:64] ← SRC[127:64] FI
DEST[VLMAX-1:128] (Unmodified)

VBLENDVPD (VEX.128 encoded version)
MASK ← SRC3
IF (MASK[63] = 0) THEN DEST[63:0] ← SRC1[63:0]
    ELSE DEST[63:0] ← SRC2[63:0] FI
IF (MASK[127] = 0) THEN DEST[127:64] ← SRC1[127:64]
    ELSE DEST[127:64] ← SRC2[127:64] FI
DEST[VLMAX-1:128] ← 0

VBLENDVPD (VEX.256 encoded version)
MASK ← SRC3
IF (MASK[63] = 0) THEN DEST[63:0] ← SRC1[63:0]
    ELSE DEST[63:0] ← SRC2[63:0] FI
IF (MASK[127] = 0) THEN DEST[127:64] ← SRC1[127:64]
    ELSE DEST[127:64] ← SRC2[127:64] FI
IF (MASK[191] = 0) THEN DEST[191:128] ← SRC1[191:128]
    ELSE DEST[191:128] ← SRC2[191:128] FI
IF (MASK[255] = 0) THEN DEST[255:192] ← SRC1[255:192]
    ELSE DEST[255:192] ← SRC2[255:192] FI
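The point that only the most significant bit of each mask element matters can be shown with a short C model that works on raw 64-bit lane values. The function name `blendv_pd_bits` and the `lanes` parameter are hypothetical (2 for the 128-bit forms, 4 for VEX.256), and upper-bit handling of the destination register is left out.

```c
#include <stdint.h>
#include <stddef.h>

/* Each lane holds the raw 64-bit pattern of one double-precision
   element. Bit 63 of the mask lane selects src2; every other mask
   bit is ignored. */
void blendv_pd_bits(uint64_t *dest, const uint64_t *src1,
                    const uint64_t *src2, const uint64_t *mask,
                    size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        dest[i] = (mask[i] >> 63) ? src2[i] : src1[i];
    }
}
```

A mask lane of `0x7FFFFFFFFFFFFFFF` still selects the first source, because bit 63 is clear even though all other bits are set.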

Intel C/C++ Compiler Intrinsic Equivalent

BLENDVPD: __m128d _mm_blendv_pd(__m128d v1, __m128d v2, __m128d v3);

VBLENDVPD: __m128d _mm_blendv_pd (__m128d a, __m128d b, __m128d mask);

VBLENDVPD: __m256d _mm256_blendv_pd (__m256d a, __m256d b, __m256d mask);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4; additionally

#UD If VEX.W = 1.


BLENDVPS — Variable Blend Packed Single Precision Floating-Point Values

Instruction Operand Encoding

Description

Conditionally copies each dword data element of single-precision floating-point values from the second source operand or the first source operand, depending on mask bits defined in the mask register operand. The mask bits are the most significant bit in each dword element of the mask register.

Each dword element of the destination operand is copied from:

• the corresponding dword element in the second source operand, if a mask bit is “1”; or

• the corresponding dword element in the first source operand, if a mask bit is “0”.

The register assignment of the implicit mask operand for BLENDVPS is defined to be the architectural register

XMM0.

128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:128)

of the corresponding YMM destination register remain unchanged. The mask register operand is implicitly defined

to be the architectural register XMM0. An attempt to execute BLENDVPS with a VEX prefix will cause #UD.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second

source operand is an XMM register or 128-bit memory location. The mask operand is the third source register, and

encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is

ignored. The upper bits (VLMAX-1:128) of the corresponding YMM register (destination register) are zeroed.

VEX.W must be 0, otherwise, the instruction will #UD.

VEX.256 encoded version: The first source operand and destination operand are YMM registers. The second source

operand can be a YMM register or a 256-bit memory location. The mask operand is the third source register, and

encoded in bits[7:4] of the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is

ignored. VEX.W must be 0, otherwise, the instruction will #UD.

VBLENDVPS permits the mask to be any XMM or YMM register. In contrast, BLENDVPS treats XMM0 implicitly as the mask and does not support non-destructive destination operation.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 14 /r BLENDVPS xmm1, xmm2/m128, <XMM0> | RM0 | V/V | SSE4_1 | Select packed single precision floating-point values from xmm1 and xmm2/m128 from mask specified in XMM0 and store the values into xmm1.
VEX.NDS.128.66.0F3A.W0 4A /r /is4 VBLENDVPS xmm1, xmm2, xmm3/m128, xmm4 | RVMR | V/V | AVX | Conditionally copy single-precision floating-point values from xmm2 or xmm3/m128 to xmm1, based on mask bits in the specified mask operand, xmm4.
VEX.NDS.256.66.0F3A.W0 4A /r /is4 VBLENDVPS ymm1, ymm2, ymm3/m256, ymm4 | RVMR | V/V | AVX | Conditionally copy single-precision floating-point values from ymm2 or ymm3/m256 to ymm1, based on mask bits in the specified mask register, ymm4.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM0 | ModRM:reg (r, w) | ModRM:r/m (r) | implicit XMM0 | NA
RVMR | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8[7:4]


Operation

BLENDVPS (128-bit Legacy SSE version)
MASK ← XMM0
IF (MASK[31] = 0) THEN DEST[31:0] ← DEST[31:0]
    ELSE DEST[31:0] ← SRC[31:0] FI
IF (MASK[63] = 0) THEN DEST[63:32] ← DEST[63:32]
    ELSE DEST[63:32] ← SRC[63:32] FI
IF (MASK[95] = 0) THEN DEST[95:64] ← DEST[95:64]
    ELSE DEST[95:64] ← SRC[95:64] FI
IF (MASK[127] = 0) THEN DEST[127:96] ← DEST[127:96]
    ELSE DEST[127:96] ← SRC[127:96] FI
DEST[VLMAX-1:128] (Unmodified)

VBLENDVPS (VEX.128 encoded version)
MASK ← SRC3
IF (MASK[31] = 0) THEN DEST[31:0] ← SRC1[31:0]
    ELSE DEST[31:0] ← SRC2[31:0] FI
IF (MASK[63] = 0) THEN DEST[63:32] ← SRC1[63:32]
    ELSE DEST[63:32] ← SRC2[63:32] FI
IF (MASK[95] = 0) THEN DEST[95:64] ← SRC1[95:64]
    ELSE DEST[95:64] ← SRC2[95:64] FI
IF (MASK[127] = 0) THEN DEST[127:96] ← SRC1[127:96]
    ELSE DEST[127:96] ← SRC2[127:96] FI
DEST[VLMAX-1:128] ← 0

VBLENDVPS (VEX.256 encoded version)
MASK ← SRC3
IF (MASK[31] = 0) THEN DEST[31:0] ← SRC1[31:0]
    ELSE DEST[31:0] ← SRC2[31:0] FI
IF (MASK[63] = 0) THEN DEST[63:32] ← SRC1[63:32]
    ELSE DEST[63:32] ← SRC2[63:32] FI
IF (MASK[95] = 0) THEN DEST[95:64] ← SRC1[95:64]
    ELSE DEST[95:64] ← SRC2[95:64] FI
IF (MASK[127] = 0) THEN DEST[127:96] ← SRC1[127:96]
    ELSE DEST[127:96] ← SRC2[127:96] FI
IF (MASK[159] = 0) THEN DEST[159:128] ← SRC1[159:128]
    ELSE DEST[159:128] ← SRC2[159:128] FI
IF (MASK[191] = 0) THEN DEST[191:160] ← SRC1[191:160]
    ELSE DEST[191:160] ← SRC2[191:160] FI
IF (MASK[223] = 0) THEN DEST[223:192] ← SRC1[223:192]
    ELSE DEST[223:192] ← SRC2[223:192] FI
IF (MASK[255] = 0) THEN DEST[255:224] ← SRC1[255:224]
    ELSE DEST[255:224] ← SRC2[255:224] FI
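When the mask lanes are viewed as floating-point values, the tested bit is the sign bit. The sketch below, a hypothetical helper named `blendv_ps`, models that view with the standard `signbit` macro. It assumes an IEEE 754 `float`, where the sign occupies bit 31; `lanes` would be 4 for the 128-bit forms and 8 for VEX.256.

```c
#include <math.h>
#include <stddef.h>

/* A lane is taken from src2 when the sign bit of the matching mask
   lane is set. This includes -0.0f, whose sign bit is set even
   though it compares equal to 0.0f. */
void blendv_ps(float *dest, const float *src1, const float *src2,
               const float *mask, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        dest[i] = signbit(mask[i]) ? src2[i] : src1[i];
    }
}
```

The negative-zero case is worth keeping in mind when a mask is produced by arithmetic rather than by a comparison instruction.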

Intel C/C++ Compiler Intrinsic Equivalent

BLENDVPS: __m128 _mm_blendv_ps(__m128 v1, __m128 v2, __m128 v3);

VBLENDVPS: __m128 _mm_blendv_ps (__m128 a, __m128 b, __m128 mask);

VBLENDVPS: __m256 _mm256_blendv_ps (__m256 a, __m256 b, __m256 mask);

SIMD Floating-Point Exceptions

None


Other Exceptions

See Exceptions Type 4; additionally

#UD If VEX.W = 1.


BLSI — Extract Lowest Set Isolated Bit

Instruction Operand Encoding

Description

Extracts the lowest set bit from the source operand and sets the corresponding bit in the destination register. All other bits in the destination operand are zeroed. If no bits are set in the source operand, BLSI sets all the bits in the destination to 0, sets ZF, and clears CF.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in

64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An

attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation

temp ← (-SRC) bitwiseAND (SRC);
SF ← temp[OperandSize -1];
ZF ← (temp = 0);
IF SRC = 0
    THEN CF ← 0;
    ELSE CF ← 1;
FI;
DEST ← temp;
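The operation above maps directly onto unsigned C arithmetic. The following sketch models the 32-bit form together with the flags the pseudocode assigns; the names `blsi_result` and `blsi32` are hypothetical, and the undefined AF and PF flags are not represented.

```c
#include <stdint.h>
#include <stdbool.h>

struct blsi_result {
    uint32_t dest;
    bool cf, zf, sf;
};

/* Model of 32-bit BLSI: isolate the lowest set bit of src. */
struct blsi_result blsi32(uint32_t src)
{
    struct blsi_result r;

    /* Unsigned 0 - src is the two's-complement negation of src. */
    r.dest = (uint32_t)(0u - src) & src;
    r.sf = (r.dest >> 31) != 0;
    r.zf = (r.dest == 0);
    r.cf = (src != 0);       /* CF is cleared only for a zero source */
    return r;
}
```

For a source of `0x58` (binary 0101 1000) the result is `0x08`, the value of bit 3 alone.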

Flags Affected

ZF and SF are updated based on the result. CF is set if the source is not zero. The OF flag is cleared. AF and PF flags are undefined.

Intel C/C++ Compiler Intrinsic Equivalent

BLSI: unsigned __int32 _blsi_u32(unsigned __int32 src);

BLSI: unsigned __int64 _blsi_u64(unsigned __int64 src);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally

#UD If VEX.W = 1.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDD.LZ.0F38.W0 F3 /3 BLSI r32, r/m32 | VM | V/V | BMI1 | Extract lowest set bit from r/m32 and set that bit in r32.
VEX.NDD.LZ.0F38.W1 F3 /3 BLSI r64, r/m64 | VM | V/N.E. | BMI1 | Extract lowest set bit from r/m64, and set that bit in r64.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
VM | VEX.vvvv (w) | ModRM:r/m (r) | NA | NA


BLSMSK — Get Mask Up to Lowest Set Bit

Instruction Operand Encoding

Description

Sets all the lower bits of the destination operand to “1” up to and including the lowest set bit (=1) in the source operand. If the source operand is zero, BLSMSK sets all bits of the destination operand to 1 and also sets CF to 1.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in

64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An

attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation

temp ← (SRC-1) XOR (SRC);
SF ← temp[OperandSize -1];
ZF ← 0;
IF SRC = 0
    THEN CF ← 1;
    ELSE CF ← 0;
FI;
DEST ← temp;
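A minimal C model of the 32-bit form follows. The function name `blsmsk32` and the `cf` out-parameter are hypothetical; the subtraction relies on the wraparound behavior that C defines for unsigned types, which is what makes a zero source produce an all-ones result.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of 32-bit BLSMSK: mask of all bits up to and including the
   lowest set bit of src. *cf receives the value CF would take. */
uint32_t blsmsk32(uint32_t src, bool *cf)
{
    *cf = (src == 0);
    /* For src == 0, src - 1 wraps to 0xFFFFFFFF. */
    return (uint32_t)(src - 1u) ^ src;
}
```

A source of `0x58` yields `0x0F`: bits 0 through 3, where bit 3 is the lowest set bit of the source.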

Flags Affected

SF is updated based on the result. CF is set if the source is zero. ZF and OF flags are cleared. AF and PF flags are undefined.

Intel C/C++ Compiler Intrinsic Equivalent

BLSMSK: unsigned __int32 _blsmsk_u32(unsigned __int32 src);

BLSMSK: unsigned __int64 _blsmsk_u64(unsigned __int64 src);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally

#UD If VEX.W = 1.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDD.LZ.0F38.W0 F3 /2 BLSMSK r32, r/m32 | VM | V/V | BMI1 | Set all lower bits in r32 to “1” starting from bit 0 to lowest set bit in r/m32.
VEX.NDD.LZ.0F38.W1 F3 /2 BLSMSK r64, r/m64 | VM | V/N.E. | BMI1 | Set all lower bits in r64 to “1” starting from bit 0 to lowest set bit in r/m64.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
VM | VEX.vvvv (w) | ModRM:r/m (r) | NA | NA


BLSR — Reset Lowest Set Bit

Instruction Operand Encoding

Description

Copies all bits from the source operand to the destination operand and resets (=0) the bit position in the destination operand that corresponds to the lowest set bit of the source operand. If the source operand is zero, BLSR sets CF.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in

64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An

attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation

temp ← (SRC-1) bitwiseAND (SRC);
SF ← temp[OperandSize -1];
ZF ← (temp = 0);
IF SRC = 0
    THEN CF ← 1;
    ELSE CF ← 0;
FI;
DEST ← temp;
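The data result of the operation is a single expression in C. The sketch below pairs it with a loop that clears one set bit per iteration, a well-known way to count set bits; that use is an illustration added here and is not described in the reference text. The names `blsr32` and `count_set_bits` are hypothetical, and flags are not modeled.

```c
#include <stdint.h>

/* Model of the 32-bit BLSR data result: clear the lowest set bit. */
uint32_t blsr32(uint32_t src)
{
    return (uint32_t)(src - 1u) & src;
}

/* Each call to blsr32 removes exactly one set bit, so the number
   of iterations equals the number of set bits in x. */
unsigned count_set_bits(uint32_t x)
{
    unsigned n = 0;

    while (x != 0u) {
        x = blsr32(x);
        n++;
    }
    return n;
}
```

Starting from `0x58`, successive results are `0x50`, `0x40`, and `0`, so the loop runs three times.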

Flags Affected

ZF and SF flags are updated based on the result. CF is set if the source is zero. OF flag is cleared. AF and PF flags

are undefined.

Intel C/C++ Compiler Intrinsic Equivalent

BLSR: unsigned __int32 _blsr_u32(unsigned __int32 src);

BLSR: unsigned __int64 _blsr_u64(unsigned __int64 src);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally

#UD If VEX.W = 1.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDD.LZ.0F38.W0 F3 /1 BLSR r32, r/m32 | VM | V/V | BMI1 | Reset lowest set bit of r/m32, keep all other bits of r/m32 and write result to r32.
VEX.NDD.LZ.0F38.W1 F3 /1 BLSR r64, r/m64 | VM | V/N.E. | BMI1 | Reset lowest set bit of r/m64, keep all other bits of r/m64 and write result to r64.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
VM | VEX.vvvv (w) | ModRM:r/m (r) | NA | NA


BNDCL—Check Lower Bound

Instruction Operand Encoding

Description

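Compare the address in the second operand with the lower bound in bnd. The second operand can be either a register or a memory operand. If the address is lower than the lower bound in bnd.LB, the instruction sets BNDSTATUS to 01H and signals a #BR exception.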

This instruction does not cause any memory access, and does not read or write any flags.

Operation

BNDCL BND, reg
IF reg < BND.LB THEN
    BNDSTATUS ← 01H;
    #BR;
FI;

BNDCL BND, mem
TEMP ← LEA(mem);
IF TEMP < BND.LB THEN
    BNDSTATUS ← 01H;
    #BR;
FI;
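The lower-bound check can be sketched as a plain comparison in C. Everything named here is hypothetical: the `bnd_reg` structure, the function `bndcl_check`, and the use of a `bndstatus` out-parameter in place of the real status register. The sketch also assumes that addresses are compared as unsigned 64-bit values and reports the #BR condition as a return value instead of raising an exception.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of one bounds register. */
struct bnd_reg {
    uint64_t lb;   /* lower bound */
    uint64_t ub;   /* upper bound (unused by the lower-bound check) */
};

/* Returns true when the check fails, i.e. when #BR would be
   signaled; in that case *bndstatus is set to 01H. For the memory
   form, addr stands for the effective address that LEA would give;
   no memory is accessed. */
bool bndcl_check(const struct bnd_reg *bnd, uint64_t addr,
                 uint64_t *bndstatus)
{
    if (addr < bnd->lb) {
        *bndstatus = 0x01;
        return true;
    }
    return false;
}
```

An address equal to the lower bound passes the check, since the comparison is strict.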

Intel C/C++ Compiler Intrinsic Equivalent

BNDCL void _bnd_chk_ptr_lbounds(const void *q)

Flags Affected

None

Protected Mode Exceptions

#BR If lower bound check fails.

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F3 0F 1A /r BNDCL bnd, r/m32 | RM | NE/V | MPX | Generate a #BR if the address in r/m32 is lower than the lower bound in bnd.LB.
F3 0F 1A /r BNDCL bnd, r/m64 | RM | V/NE | MPX | Generate a #BR if the address in r/m64 is lower than the lower bound in bnd.LB.

Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | ModRM:r/m (r) | NA


Real-Address Mode Exceptions

#BR If lower bound check fails.

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

Virtual-8086 Mode Exceptions

#BR If lower bound check fails.

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#UD If ModRM.r/m and REX encodes BND4-BND15 when Intel MPX is enabled.

Same exceptions as in protected mode.


BNDCU/BNDCN—Check Upper Bound

Instruction Operand Encoding

Description

Compare the address in the second operand with the upper bound in bnd. The second operand can be either a register or a memory operand. If the address is higher than the upper bound in bnd.UB, the instruction sets BNDSTATUS to 01H and signals a #BR exception.

BNDCU performs a 1’s complement operation on the upper bound of bnd first before proceeding with address comparison. BNDCN performs address comparison directly using the upper bound in bnd that is already reverted out of 1’s complement form.

This instruction does not cause any memory access, and does not read or write any flags.

Effective address computation of m32/64 has identical behavior to LEA.

Operation

BNDCU BND, reg
IF reg > NOT(BND.UB) THEN
    BNDSTATUS ← 01H;
    #BR;
FI;

BNDCU BND, mem
TEMP ← LEA(mem);
IF TEMP > NOT(BND.UB) THEN
    BNDSTATUS ← 01H;
    #BR;
FI;

BNDCN BND, reg
IF reg > BND.UB THEN
    BNDSTATUS ← 01H;
    #BR;
FI;

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F2 0F 1A /r BNDCU bnd, r/m32 | RM | NE/V | MPX | Generate a #BR if the address in r/m32 is higher than the upper bound in bnd.UB (bnd.UB in 1's complement form).
F2 0F 1A /r BNDCU bnd, r/m64 | RM | V/NE | MPX | Generate a #BR if the address in r/m64 is higher than the upper bound in bnd.UB (bnd.UB in 1's complement form).
F2 0F 1B /r BNDCN bnd, r/m32 | RM | NE/V | MPX | Generate a #BR if the address in r/m32 is higher than the upper bound in bnd.UB (bnd.UB not in 1's complement form).
F2 0F 1B /r BNDCN bnd, r/m64 | RM | V/NE | MPX | Generate a #BR if the address in r/m64 is higher than the upper bound in bnd.UB (bnd.UB not in 1's complement form).

Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | ModRM:r/m (r) | NA

BNDCN BND, mem
TEMP ← LEA(mem);
IF TEMP > BND.UB Then
    BNDSTATUS ← 01H;
    #BR;
FI;
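The upper-bound checks can be modeled roughly in Python. This is only an illustrative sketch, not Intel's implementation: the `BoundRangeExceeded` exception, the function names, and the 64-bit mask are assumptions introduced here to stand in for the #BR fault and the width of the bound register fields.

```python
MASK64 = (1 << 64) - 1  # assumed width of the LB/UB fields


class BoundRangeExceeded(Exception):
    """Hypothetical stand-in for #BR; records the BNDSTATUS value."""

    def __init__(self, bndstatus):
        super().__init__(f"#BR, BNDSTATUS={bndstatus:#04x}")
        self.bndstatus = bndstatus


def bndcn(address, ub):
    """BNDCN: compare against an upper bound that is NOT in 1's complement form."""
    if address > ub:
        raise BoundRangeExceeded(0x01)


def bndcu(address, ub_stored):
    """BNDCU: the stored upper bound is in 1's complement form, so invert it first."""
    bndcn(address, ~ub_stored & MASK64)
```

For example, with a stored upper bound of `~0x1FFF & MASK64`, `bndcu(0x1FFF, ...)` returns normally, while `bndcu(0x2000, ...)` raises the stand-in exception with `bndstatus == 0x01`.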

Intel C/C++ Compiler Intrinsic Equivalent

BNDCU: void _bnd_chk_ptr_ubounds(const void *q)

Flags Affected

None

Protected Mode Exceptions

#BR If upper bound check fails.

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

Real-Address Mode Exceptions

#BR If upper bound check fails.

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

Virtual-8086 Mode Exceptions

#BR If upper bound check fails.

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#UD If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

Same exceptions as in protected mode.

BNDLDX—Load Extended Bounds Using Address Translation

Instruction Operand Encoding

Description

BNDLDX uses the linear address constructed from the base register and displacement of the SIB-addressing form

of the memory operand (mib) to perform address translation to access a bound table entry and conditionally load

the bounds in the BTE to the destination. The destination register is updated with the bounds in the BTE, if the

content of the index register of mib matches the pointer value stored in the BTE.

If the pointer value comparison fails, the destination is updated with INIT bounds (lb = 0x0, ub = 0x0). As noted earlier, the upper bound is represented in 1's complement form, so an upper bound value of 0x0 allows access to the full memory range.

This instruction does not cause memory access to the linear address of mib nor the effective address referenced by

the base, and does not read or write any flags.

Segment overrides apply to the linear address computation with the base of mib, and are used during address

translation to generate the address of the bound table entry. By default, the address of the BTE is assumed to be a linear address. No segmentation checks are performed on the base of mib.

The base of mib will not be checked for canonical address violation as it does not access memory.

Any encoding of this instruction that does not specify base or index register will treat those registers as zero

(constant). The reg-reg form of this instruction will remain a NOP.

The scale field of the SIB byte has no effect on these instructions and is ignored.

The bound register may be partially updated on memory faults. The order in which memory operands are loaded is

implementation specific.
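The conditional load described above can be sketched as follows. This is a simplified model that assumes the bound table entry has already been fetched; the tuple layout `(lb, ub, ptr)` and the name `bndldx_select` are hypothetical and exist only for illustration.

```python
# INIT bounds: lb = 0 and ub = 0. Because UB is kept in 1's complement
# form, ub = 0 permits access to the full address range.
INIT_BOUNDS = (0, 0)


def bndldx_select(bte, ptr_value):
    """Return the (lb, ub) pair that would be written to the destination.

    bte is an assumed (lb, ub, ptr) tuple read from the bound table;
    ptr_value is the content of the index register of mib.
    """
    lb, ub, stored_ptr = bte
    if stored_ptr == ptr_value:
        return (lb, ub)
    return INIT_BOUNDS
```

A matching pointer yields the stored bounds; any mismatch yields `(0, 0)`.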

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
0F 1A /r BNDLDX bnd, mib | RM | V/V | MPX | Load the bounds stored in a bound table entry (BTE) into bnd with address translation using the base of mib and conditional on the index of mib matching the pointer value in the BTE.

Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | SIB.base (r): Address of pointer; SIB.index (r) | NA

Operation

base ← mib.SIB.base ? mib.SIB.base + Disp : 0;
ptr_value ← mib.SIB.index ? mib.SIB.index : 0;

Outside 64-bit mode
A_BDE[31:0] ← Zero_extend32(base[31:12] « 2) + (BNDCFG[31:12] « 12);
A_BT[31:0] ← LoadFrom(A_BDE);
IF A_BT[0] equal 0 Then
    BNDSTATUS ← A_BDE | 02H;
    #BR;
FI;
A_BTE[31:0] ← Zero_extend32(base[11:2] « 4) + (A_BT[31:2] « 2);
Temp_lb[31:0] ← LoadFrom(A_BTE);
Temp_ub[31:0] ← LoadFrom(A_BTE + 4);
Temp_ptr[31:0] ← LoadFrom(A_BTE + 8);
IF Temp_ptr equal ptr_value Then
    BND.LB ← Temp_lb;
    BND.UB ← Temp_ub;
ELSE
    BND.LB ← 0;
    BND.UB ← 0;
FI;

In 64-bit mode
A_BDE[63:0] ← Zero_extend64(base[47+MAWA:20] « 3) + (BNDCFG[63:20] « 12); (see footnote 1)
A_BT[63:0] ← LoadFrom(A_BDE);
IF A_BT[0] equal 0 Then
    BNDSTATUS ← A_BDE | 02H;
    #BR;
FI;
A_BTE[63:0] ← Zero_extend64(base[19:3] « 5) + (A_BT[63:3] « 3);
Temp_lb[63:0] ← LoadFrom(A_BTE);
Temp_ub[63:0] ← LoadFrom(A_BTE + 8);
Temp_ptr[63:0] ← LoadFrom(A_BTE + 16);
IF Temp_ptr equal ptr_value Then
    BND.LB ← Temp_lb;
    BND.UB ← Temp_ub;
ELSE
    BND.LB ← 0;
    BND.UB ← 0;
FI;
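The 64-bit address arithmetic in the pseudocode can be followed with a small Python sketch. It is a model under stated assumptions, not a description of the hardware: the helper names are hypothetical, and `bd_base` is assumed to be the already-extracted, page-aligned bound directory base that the pseudocode derives from BNDCFG, so the sketch does not model the BNDCFG bit selection itself.

```python
MASK64 = (1 << 64) - 1


def bits(value, hi, lo):
    """Return value[hi:lo], an inclusive bit range as written in the pseudocode."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def bde_address_64(base, bd_base, mawa=0):
    """Address of the bound directory entry (8-byte entries, hence « 3)."""
    return (bd_base + (bits(base, 47 + mawa, 20) << 3)) & MASK64


def bte_address_64(base, a_bt):
    """Address of the bound table entry (32-byte entries, hence « 5)."""
    return ((bits(base, 19, 3) << 5) + (bits(a_bt, 63, 3) << 3)) & MASK64


def bde_is_valid(a_bt):
    """Bit 0 of the loaded directory entry is the valid bit tested before use."""
    return (a_bt & 1) == 1
```

With `base = 0x00007FFF12345678` and `bd_base = 0x10000000`, the directory entry address works out to `0x4FFF8918`; with a directory entry value of `0x20000001`, the table entry address is `0x201159E0`.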

Intel C/C++ Compiler Intrinsic Equivalent

BNDLDX: Generated by compiler as needed.

Flags Affected

None

Protected Mode Exceptions

#BR If the bound directory entry is invalid.

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

#GP(0) If a destination effective address of the Bound Table entry is outside the DS segment limit.

If DS register contains a NULL segment selector.

#PF(fault code) If a page fault occurs.

Real-Address Mode Exceptions

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

#GP(0) If a destination effective address of the Bound Table entry is outside the DS segment limit.

1. If CPL < 3, the supervisor MAWA (MAWAS) is used; this value is 0. If CPL = 3, the user MAWA (MAWAU) is used; this value is enumerated in CPUID.(EAX=07H,ECX=0H):ECX.MAWAU[bits 21:17]. See Section 17.3.1 of Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1.

Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

#GP(0) If a destination effective address of the Bound Table entry is outside the DS segment limit.

#PF(fault code) If a page fault occurs.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#BR If the bound directory entry is invalid.

#UD If ModRM is RIP relative.

If the LOCK prefix is used.

If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

#GP(0) If the memory address (A_BDE or A_BTE) is in a non-canonical form.

#PF(fault code) If a page fault occurs.

BNDMK—Make Bounds

Instruction Operand Encoding

Description

Makes bounds from the second operand and stores the lower and upper bounds in the bound register bnd. The second operand must be a memory operand. The content of the base register from the memory operand is stored in the lower bound bnd.LB. The 1's complement of the effective address of m32/m64 is stored in the upper bound bnd.UB. Computation of m32/m64 has identical behavior to LEA.

This instruction does not cause any memory access, and does not read or write any flags.

If the instruction does not specify a base register, the lower bound will be zero. The reg-reg form of this instruction retains legacy behavior (NOP).

A RIP-relative encoding of this instruction in 64-bit mode causes #UD.

Operation

BND.LB ← SRCMEM.base;
IF 64-bit mode Then
    BND.UB ← NOT(LEA.64_bits(SRCMEM));
ELSE
    BND.UB ← Zero_Extend.64_bits(NOT(LEA.32_bits(SRCMEM)));
FI;
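As a rough illustration of the operation above, the following Python sketch computes the pair of bounds that BNDMK would store. The function name and parameters are hypothetical; `effective_address` is assumed to be the value LEA would produce for the memory operand, and `base` the content of its base register.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def bndmk(base, effective_address, mode64=True):
    """Return (LB, UB); UB is kept in 1's complement form."""
    if mode64:
        ub = ~effective_address & MASK64
    else:
        # NOT is applied to the 32-bit address; the result is then zero-extended.
        ub = ~effective_address & MASK32
    return base, ub
```

For a 256-byte object at `0x1000`, the memory operand would address the last byte, `0x10FF`, so `bndmk(0x1000, 0x10FF)` gives `LB = 0x1000` and `UB = 0xFFFFFFFFFFFFEF00`.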

Intel C/C++ Compiler Intrinsic Equivalent

BNDMK: void * _bnd_set_ptr_bounds(const void * q, size_t size);

Flags Affected

None

Protected Mode Exceptions

#UD If ModRM is RIP relative.

If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

Real-Address Mode Exceptions

#UD If ModRM is RIP relative.

If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F3 0F 1B /r BNDMK bnd, m32 | RM | NE/V | MPX | Make lower and upper bounds from m32 and store them in bnd.
F3 0F 1B /r BNDMK bnd, m64 | RM | V/NE | MPX | Make lower and upper bounds from m64 and store them in bnd.

Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | ModRM:r/m (r) | NA

Virtual-8086 Mode Exceptions

#UD If ModRM is RIP relative.

If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#UD If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

#SS(0) If the memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

Same exceptions as in protected mode.

BNDMOV—Move Bounds

Instruction Operand Encoding

Description

BNDMOV moves a pair of lower and upper bound values from the source operand (the second operand) to the destination (the first operand). Each operation is a 128-bit move. The exceptions are the same as for the MOV instruction. The memory format for loading and storing bounds in 64-bit mode is shown in Figure 3-5.

This instruction does not change flags.

Operation

BNDMOV register to register
DEST.LB ← SRC.LB;
DEST.UB ← SRC.UB;

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F 1A /r BNDMOV bnd1, bnd2/m64 | RM | NE/V | MPX | Move lower and upper bound from bnd2/m64 to bound register bnd1.
66 0F 1A /r BNDMOV bnd1, bnd2/m128 | RM | V/NE | MPX | Move lower and upper bound from bnd2/m128 to bound register bnd1.
66 0F 1B /r BNDMOV bnd1/m64, bnd2 | MR | NE/V | MPX | Move lower and upper bound from bnd2 to bnd1/m64.
66 0F 1B /r BNDMOV bnd1/m128, bnd2 | MR | V/NE | MPX | Move lower and upper bound from bnd2 to bound register bnd1/m128.

Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | ModRM:r/m (r) | NA
MR | ModRM:r/m (w) | ModRM:reg (r) | NA

Figure 3-5. Memory Layout of BNDMOV to/from Memory
[Figure: two panels, "BNDMOV to memory in 64-bit mode" and "BNDMOV to memory in 32-bit mode", each showing the Upper Bound (UB) and Lower Bound (LB) fields along a byte-offset axis.]

BNDMOV from memory
IF 64-bit mode THEN
    DEST.LB ← LOAD_QWORD(SRC);
    DEST.UB ← LOAD_QWORD(SRC+8);
ELSE
    DEST.LB ← LOAD_DWORD_ZERO_EXT(SRC);
    DEST.UB ← LOAD_DWORD_ZERO_EXT(SRC+4);
FI;

BNDMOV to memory
IF 64-bit mode THEN
    DEST[63:0] ← SRC.LB;
    DEST[127:64] ← SRC.UB;
ELSE
    DEST[31:0] ← SRC.LB;
    DEST[63:32] ← SRC.UB;
FI;
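The memory layout implied by the load and store pseudocode can be mimicked with Python's `struct` module. This is only a sketch of the byte layout: the function names are hypothetical, and little-endian packing is assumed because the pseudocode places LB in the low-order bits of the destination.

```python
import struct

MASK32 = (1 << 32) - 1


def bndmov_store(lb, ub, mode64=True):
    """Pack LB then UB: two qwords in 64-bit mode, two dwords otherwise."""
    if mode64:
        return struct.pack("<QQ", lb, ub)
    # Outside 64-bit mode only the low 32 bits of each bound are stored.
    return struct.pack("<II", lb & MASK32, ub & MASK32)


def bndmov_load(data, mode64=True):
    """Unpack (LB, UB); 32-bit values come back zero-extended."""
    fmt = "<QQ" if mode64 else "<II"
    return struct.unpack(fmt, data)
```

A 64-bit store produces 16 bytes with LB at offset 0 and UB at offset 8; a 32-bit store produces 8 bytes with LB at offset 0 and UB at offset 4.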

Intel C/C++ Compiler Intrinsic Equivalent

BNDMOV: void * _bnd_copy_ptr_bounds(const void *q, const void *r)

Flags Affected

None

Protected Mode Exceptions

#UD If the LOCK prefix is used but the destination is not a memory operand.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

#SS(0) If the memory operand effective address is outside the SS segment limit.

#GP(0) If the memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the destination operand points to a non-writable segment.

If the DS, ES, FS, or GS segment register contains a NULL segment selector.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while CPL is 3.

#PF(fault code) If a page fault occurs.

Real-Address Mode Exceptions

#UD If the LOCK prefix is used but the destination is not a memory operand.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

#GP(0) If the memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If the memory operand effective address is outside the SS segment limit.

Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used but the destination is not a memory operand.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

#GP(0) If the memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If the memory operand effective address is outside the SS segment limit.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while CPL is 3.

#PF(fault code) If a page fault occurs.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#UD If the LOCK prefix is used but the destination is not a memory operand.

If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

#SS(0) If the memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while CPL is 3.

#PF(fault code) If a page fault occurs.

BNDSTX—Store Extended Bounds Using Address Translation

Instruction Operand Encoding

Description

BNDSTX uses the linear address constructed from the displacement and base register of the SIB-addressing form

of the memory operand (mib) to perform address translation to store to a bound table entry. The bounds in the

source operand bnd are written to the lower and upper bounds in the BTE. The content of the index register of mib

is written to the pointer value field in the BTE.