Intel® 64 and IA-32 Architectures
Software Developer’s Manual
Volume 2C:
Instruction Set Reference, V-Z
NOTE: The Intel® 64 and IA-32 Architectures Software Developer's Manual consists of ten volumes:
Basic Architecture, Order Number 253665; Instruction Set Reference A-L, Order Number 253666;
Instruction Set Reference M-U, Order Number 253667; Instruction Set Reference V-Z, Order Number
326018; Instruction Set Reference, Order Number 334569; System Programming Guide, Part 1, Order
Number 253668; System Programming Guide, Part 2, Order Number 253669; System Programming
Guide, Part 3, Order Number 326019; System Programming Guide, Part 4, Order Number 332831;
Model-Specific Registers, Order Number 335592. Refer to all ten volumes when evaluating your design
needs.
Order Number: 326018-068US
November 2018
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn
more at intel.com, or from the OEM or retailer.
No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting
from such losses.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products
described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject
matter disclosed herein.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting http://www.intel.com/design/literature.htm.
Intel, the Intel logo, Intel Atom, Intel Core, Intel SpeedStep, MMX, Pentium, VTune, and Xeon are trademarks of Intel Corporation in the U.S.
and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 1997-2018, Intel Corporation. All Rights Reserved.
CHAPTER 5
INSTRUCTION SET REFERENCE, V-Z
5.1 TERNARY BIT VECTOR LOGIC TABLE
VPTERNLOGD/VPTERNLOGQ instructions operate on dword/qword elements and take three bit vectors of the respective input data elements to form a set of 32/64 indices, where each 3-bit value provides an index into an 8-bit lookup table represented by the imm8 byte of the instruction. The 256 possible values of the imm8 byte are organized as a 16x16 boolean logic table. The 16 rows of the table use the lower 4 bits of imm8 as the row index. The 16 columns are referenced by imm8[7:4]. The 16 columns of the table are presented in two halves, with the 8 columns for column index values 0:7 shown in Table 5-1, followed by Table 5-2 showing the 8 columns corresponding to column index values 8:15. This section presents the two halves of the 256-entry table using a shorthand notation representing simple or compound boolean logic expressions with three input bit source data.
The three input bit source data are denoted by the capital letters A, B, and C, where A represents a bit from the first source operand (also the destination operand), and B and C represent a bit from the 2nd and 3rd source operands.
Each map entry takes the form of a logic expression consisting of one or more component expressions. Each component expression consists of either a unary or binary boolean operator and associated operands. Each binary boolean operator is expressed in lowercase letters, with the operands concatenated after the logic operator. The unary operator 'not' is expressed using '!'. Additionally, the conditional expression "A?B:C" expresses a result returning B if A is set, returning C otherwise.
A binary boolean operator is followed by two operands, e.g. andAB. For a compound expression whose commutative components share the same logic operator, the 2nd logic operator is omitted and the three operands are concatenated in sequence, e.g. andABC. When the 2nd operand of the first binary boolean expression comes from the result of another boolean expression, the 2nd boolean expression is concatenated after the uppercase operand of the first logic expression, e.g. norBnandAC. When the result is independent of an operand, that operand is omitted from the logic expression, e.g. zeros or norCB.
The 3-input expression “majorABC” returns 0 if two or more input bits are 0, returns 1 if two or more input bits are
1. The 3-input expression “minorABC” returns 1 if two or more input bits are 0, returns 0 if two or more input bits
are 1.
The building-block bit logic functions used in Table 5-1 and Table 5-2 include:
Constants: TRUE (1), FALSE (0);
Unary function: Not (!);
Binary functions: and, nand, or, nor, xor, xnor;
Conditional function: Select (?:);
Ternary functions: major, minor.
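As a non-normative illustration of the shorthand, the building blocks map directly to bitwise C expressions applied to every bit position of A, B, and C at once (a sketch, not part of the manual's text):

#include <stdint.h>

/* 1 in each bit position where two or more of the input bits are 1. */
static inline uint64_t majorABC(uint64_t A, uint64_t B, uint64_t C) {
    return (A & B) | (A & C) | (B & C);
}

/* 1 in each bit position where two or more of the input bits are 0.
   For three inputs this is exactly the complement of majorABC. */
static inline uint64_t minorABC(uint64_t A, uint64_t B, uint64_t C) {
    return ~majorABC(A, B, C);
}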
Table 5-1 shows the half of the 256-entry map corresponding to column index values 0:7; Table 5-2 shows the half corresponding to column index values 8:15.
Table 5-1. Low 8 columns of the 16x16 Map of VPTERNLOG Boolean Logic Operations
(rows indexed by imm8[3:0]; columns indexed by imm8[7:4])
[3:0]  0H  1H  2H  3H  4H  5H  6H  7H
00H  FALSE  andAnorBC  norBnandAC  andA!B  norCnandBA  andA!C  andAxorBC  andAnandBC
01H  norABC  norCB  norBxorAC  A?!B:norBC  norCxorBA  A?!C:norBC  A?xorBC:norBC  A?nandBC:norBC
02H  andCnorBA  norBxnorAC  andC!B  norBnorAC  C?norBA:andBA  C?norBA:A  C?!B:andBA  C?!B:A
03H  norBA  norBandAC  C?!B:norBA  !B  C?norBA:xnorBA  A?!C:!B  A?xorBC:!B  A?nandBC:!B
04H  andBnorAC  norCxnorBA  B?norAC:andAC  B?norAC:A  andB!C  norCnorBA  B?!C:andAC  B?!C:A
05H  norCA  norCandBA  B?norAC:xnorAC  A?!B:!C  B?!C:norAC  !C  A?xorBC:!C  A?nandBC:!C
06H  norAxnorBC  A?norBC:xorBC  B?norAC:C  xorBorAC  C?norBA:B  xorCorBA  xorCB  B?!C:orAC
07H  norAandBC  minorABC  C?!B:!A  nandBorAC  B?!C:!A  nandCorBA  A?xorBC:nandBC  nandCB
08H  norAnandBC  A?norBC:andBC  andCxorBA  A?!B:andBC  andBxorAC  A?!C:andBC  A?xorBC:andBC  xorAandBC
09H  norAxorBC  A?norBC:xnorBC  C?xorBA:norBA  A?!B:xnorBC  B?xorAC:norAC  A?!C:xnorBC  xnorABC  A?nandBC:xnorBC
0AH  andC!A  A?norBC:C  andCnandBA  A?!B:C  C?!A:andBA  xorCA  xorCandBA  A?nandBC:C
0BH  C?!A:norBA  C?!A:!B  C?nandBA:norBA  C?nandBA:!B  B?xorAC:!A  B?xorAC:nandAC  C?nandBA:xnorBA  nandBxnorAC
0CH  andB!A  A?norBC:B  B?!A:andAC  xorBA  andBnandAC  A?!C:B  xorBandAC  A?nandBC:B
0DH  B?!A:norAC  B?!A:!C  B?!A:xnorAC  C?xorBA:nandBA  B?nandAC:norAC  B?nandAC:!C  B?nandAC:xnorAC  nandCxnorBA
0EH  norAnorBC  xorAorBC  B?!A:C  A?!B:orBC  C?!A:B  A?!C:orBC  B?nandAC:C  A?nandBC:orBC
0FH  !A  nandAorBC  C?nandBA:!A  nandBA  B?nandAC:!A  nandCA  nandAxnorBC  nandABC
Table 5-1 and Table 5-2 translate each possible value of the imm8 byte to a Boolean expression. These tables can also be used by software to translate Boolean expressions to the numerical constant that forms the imm8 value needed to construct the VPTERNLOG syntax. There is a unique set of three byte constants (F0H, CCH, AAH) that can be used for this purpose as input operands in conjunction with the Boolean expressions defined in those tables.
The reverse mapping can be expressed as:
Result_imm8 = Table_Lookup_Entry( 0F0H, 0CCH, 0AAH)
Table_Lookup_Entry is the Boolean expression defined in Table 5-1 and Table 5-2.
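As a non-normative illustration of this reverse mapping, evaluating the desired Boolean expression in C over the constants F0H, CCH, AAH yields the imm8 value directly, which can then be used with the VPTERNLOGD intrinsic (sketch assumes AVX512F support):

#include <immintrin.h>

/* imm8 for the bitwise select A?B:C: substitute A=0xF0, B=0xCC, C=0xAA.
   (0xF0 & 0xCC) | (~0xF0 & 0xAA) = 0xC0 | 0x0A = 0xCA. */
__m512i bitwise_select(__m512i a, __m512i b, __m512i c) {
    /* Each destination bit is the b bit where the a bit is 1, else the c bit.
       The imm8 operand must be a compile-time constant. */
    return _mm512_ternarylogic_epi32(a, b, c, 0xCA);
}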
Table 5-2. Upper 8 columns of the 16x16 Map of VPTERNLOG Boolean Logic Operations
(rows indexed by imm8[3:0]; columns indexed by imm8[7:4])
[3:0]  08H  09H  0AH  0BH  0CH  0DH  0EH  0FH
00H  andABC  andAxnorBC  andCA  B?andAC:A  andBA  C?andBA:A  andAorBC  A
01H  A?andBC:norBC  B?andAC:!C  A?C:norBC  C?A:!B  A?B:norBC  B?A:!C  xnorAorBC  orAnorBC
02H  andCxnorBA  B?andAC:xorAC  B?andAC:C  B?andAC:orAC  C?xnorBA:andBA  B?A:xorAC  B?A:C  B?A:orAC
03H  A?andBC:!B  xnorBandAC  A?C:!B  nandBnandAC  xnorBA  B?A:nandAC  A?orBC:!B  orA!B
04H  andBxnorAC  C?andBA:xorBA  B?xnorAC:andAC  B?xnorAC:A  C?andBA:B  C?andBA:orBA  C?A:B  C?A:orBA
05H  A?andBC:!C  xnorCandBA  xnorCA  C?A:nandBA  A?B:!C  nandCnandBA  A?orBC:!C  orA!C
06H  A?andBC:xorBC  xorABC  A?C:xorBC  B?xnorAC:orAC  A?B:xorBC  C?xnorBA:orBA  A?orBC:xorBC  orAxorBC
07H  xnorAandBC  A?xnorBC:nandBC  A?C:nandBC  nandBxorAC  A?B:nandBC  nandCxorBA  A?orBC:nandBC  orAnandBC
08H  andCB  A?xnorBC:andBC  andCorAB  B?C:A  andBorAC  C?B:A  majorABC  orAandBC
09H  B?C:norAC  xnorCB  xnorCorBA  C?orBA:!B  xnorBorAC  B?orAC:!C  A?orBC:xnorBC  orAxnorBC
0AH  A?andBC:C  A?xnorBC:C  C  B?C:orAC  A?B:C  B?orAC:xorAC  orCandBA  orCA
0BH  B?C:!A  B?C:nandAC  orCnorBA  orC!B  B?orAC:!A  B?orAC:nandAC  orCxnorBA  nandBnorAC
0CH  A?andBC:B  A?xnorBC:B  A?C:B  C?orBA:xorBA  B  C?B:orBA  orBandAC  orBA
0DH  C?B:!A  C?B:nandBA  C?orBA:!A  C?orBA:nandBA  orBnorAC  orB!C  orBxnorAC  nandCnorBA
0EH  A?andBC:orBC  A?xnorBC:orBC  A?C:orBC  orCxorBA  A?B:orBC  orBxorAC  orCB  orABC
0FH  nandAnandBC  nandAxorBC  orC!A  orCnandBA  orB!A  orBnandAC  nandAnorBC  TRUE
5.2 INSTRUCTIONS (V-Z)
Chapter 5 continues an alphabetical discussion of Intel® 64 and IA-32 instructions (V-Z). See also: Chapter 3, “Instruction Set Reference, A-L,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A, and Chapter 4, “Instruction Set Reference, M-U,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B.
VALIGND/VALIGNQ—Align Doubleword/Quadword Vectors
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W0 03 /r ib VALIGND xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst, imm8 | A | V/V | AVX512VL AVX512F | Shift right and merge vectors xmm2 and xmm3/m128/m32bcst with doubleword granularity using imm8 as number of elements to shift, and store the final result in xmm1, under writemask.
EVEX.128.66.0F3A.W1 03 /r ib VALIGNQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst, imm8 | A | V/V | AVX512VL AVX512F | Shift right and merge vectors xmm2 and xmm3/m128/m64bcst with quadword granularity using imm8 as number of elements to shift, and store the final result in xmm1, under writemask.
EVEX.256.66.0F3A.W0 03 /r ib VALIGND ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst, imm8 | A | V/V | AVX512VL AVX512F | Shift right and merge vectors ymm2 and ymm3/m256/m32bcst with doubleword granularity using imm8 as number of elements to shift, and store the final result in ymm1, under writemask.
EVEX.256.66.0F3A.W1 03 /r ib VALIGNQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst, imm8 | A | V/V | AVX512VL AVX512F | Shift right and merge vectors ymm2 and ymm3/m256/m64bcst with quadword granularity using imm8 as number of elements to shift, and store the final result in ymm1, under writemask.
EVEX.512.66.0F3A.W0 03 /r ib VALIGND zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst, imm8 | A | V/V | AVX512F | Shift right and merge vectors zmm2 and zmm3/m512/m32bcst with doubleword granularity using imm8 as number of elements to shift, and store the final result in zmm1, under writemask.
EVEX.512.66.0F3A.W1 03 /r ib VALIGNQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst, imm8 | A | V/V | AVX512F | Shift right and merge vectors zmm2 and zmm3/m512/m64bcst with quadword granularity using imm8 as number of elements to shift, and store the final result in zmm1, under writemask.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA
Description
Concatenates and shifts right doubleword/quadword elements of the first source operand (the second operand) and the second source operand (the third operand) into a 1024/512/256-bit intermediate vector. The low 512/256/128 bits of the intermediate vector are written to the destination operand (the first operand) using the writemask k1. The destination and first source operands are ZMM/YMM/XMM registers. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32/64-bit memory location.
This instruction is writemasked, so only those elements with the corresponding bit set in vector mask register k1 are computed and stored into zmm1. Elements in zmm1 with the corresponding bit clear in k1 retain their previous values (merging-masking) or are set to 0 (zeroing-masking).
Operation
VALIGND (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (SRC2 *is memory*) AND (EVEX.b = 1)
    THEN
        FOR j ← 0 TO KL-1
            i ← j * 32
            src[i+31:i] ← SRC2[31:0]
        ENDFOR;
    ELSE src ← SRC2
FI
; Concatenate sources
tmp[VL-1:0] ← src[VL-1:0]
tmp[2VL-1:VL] ← SRC1[VL-1:0]
; Shift right doubleword elements
IF VL = 128
    THEN SHIFT = imm8[1:0]
    ELSE
        IF VL = 256
            THEN SHIFT = imm8[2:0]
            ELSE SHIFT = imm8[3:0]
        FI
FI;
tmp[2VL-1:0] ← tmp[2VL-1:0] >> (32*SHIFT)
; Apply writemask
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← tmp[i+31:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
VALIGNQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (SRC2 *is memory*) AND (EVEX.b = 1)
    THEN
        FOR j ← 0 TO KL-1
            i ← j * 64
            src[i+63:i] ← SRC2[63:0]
        ENDFOR;
    ELSE src ← SRC2
FI
; Concatenate sources
tmp[VL-1:0] ← src[VL-1:0]
tmp[2VL-1:VL] ← SRC1[VL-1:0]
; Shift right quadword elements
IF VL = 128
    THEN SHIFT = imm8[0]
    ELSE
        IF VL = 256
            THEN SHIFT = imm8[1:0]
            ELSE SHIFT = imm8[2:0]
        FI
FI;
tmp[2VL-1:0] ← tmp[2VL-1:0] >> (64*SHIFT)
; Apply writemask
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← tmp[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VALIGND __m512i _mm512_alignr_epi32( __m512i a, __m512i b, int cnt);
VALIGND __m512i _mm512_mask_alignr_epi32(__m512i s, __mmask16 k, __m512i a, __m512i b, int cnt);
VALIGND __m512i _mm512_maskz_alignr_epi32( __mmask16 k, __m512i a, __m512i b, int cnt);
VALIGND __m256i _mm256_mask_alignr_epi32(__m256i s, __mmask8 k, __m256i a, __m256i b, int cnt);
VALIGND __m256i _mm256_maskz_alignr_epi32( __mmask8 k, __m256i a, __m256i b, int cnt);
VALIGND __m128i _mm_mask_alignr_epi32(__m128i s, __mmask8 k, __m128i a, __m128i b, int cnt);
VALIGND __m128i _mm_maskz_alignr_epi32( __mmask8 k, __m128i a, __m128i b, int cnt);
VALIGNQ __m512i _mm512_alignr_epi64( __m512i a, __m512i b, int cnt);
VALIGNQ __m512i _mm512_mask_alignr_epi64(__m512i s, __mmask8 k, __m512i a, __m512i b, int cnt);
VALIGNQ __m512i _mm512_maskz_alignr_epi64( __mmask8 k, __m512i a, __m512i b, int cnt);
VALIGNQ __m256i _mm256_mask_alignr_epi64(__m256i s, __mmask8 k, __m256i a, __m256i b, int cnt);
VALIGNQ __m256i _mm256_maskz_alignr_epi64( __mmask8 k, __m256i a, __m256i b, int cnt);
VALIGNQ __m128i _mm_mask_alignr_epi64(__m128i s, __mmask8 k, __m128i a, __m128i b, int cnt);
VALIGNQ __m128i _mm_maskz_alignr_epi64( __mmask8 k, __m128i a, __m128i b, int cnt);
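As a non-normative usage sketch (assumes AVX512F), the unmasked intrinsic implements the concatenate-and-shift semantics described above; element j of the result comes from the second source while j + cnt is below the element count, and from the first source after that:

#include <immintrin.h>

/* Sliding dword window across two consecutive 512-bit vectors:
   result = { lo[3..15], hi[0..2] }.  The shift count must be a
   compile-time constant. */
__m512i window3(__m512i hi, __m512i lo) {
    return _mm512_alignr_epi32(hi, lo, 3);
}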
Exceptions
See Exceptions Type E4NF.
VBLENDMPD/VBLENDMPS—Blend Float64/Float32 Vectors Using an OpMask Control
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 65 /r VBLENDMPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | A | V/V | AVX512VL AVX512F | Blend double-precision vector xmm2 and double-precision vector xmm3/m128/m64bcst and store the result in xmm1, under control mask.
EVEX.256.66.0F38.W1 65 /r VBLENDMPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | A | V/V | AVX512VL AVX512F | Blend double-precision vector ymm2 and double-precision vector ymm3/m256/m64bcst and store the result in ymm1, under control mask.
EVEX.512.66.0F38.W1 65 /r VBLENDMPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | A | V/V | AVX512F | Blend double-precision vector zmm2 and double-precision vector zmm3/m512/m64bcst and store the result in zmm1, under control mask.
EVEX.128.66.0F38.W0 65 /r VBLENDMPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | A | V/V | AVX512VL AVX512F | Blend single-precision vector xmm2 and single-precision vector xmm3/m128/m32bcst and store the result in xmm1, under control mask.
EVEX.256.66.0F38.W0 65 /r VBLENDMPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | A | V/V | AVX512VL AVX512F | Blend single-precision vector ymm2 and single-precision vector ymm3/m256/m32bcst and store the result in ymm1, under control mask.
EVEX.512.66.0F38.W0 65 /r VBLENDMPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | A | V/V | AVX512F | Blend single-precision vector zmm2 and single-precision vector zmm3/m512/m32bcst using k1 as select control and store the result in zmm1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA
Description
Performs an element-by-element blending of float64/float32 elements in the first source operand (the second operand) with the elements in the second source operand (the third operand), using an opmask register as select control. The blended result is written to the destination register.
The destination and first source operands are ZMM/YMM/XMM registers. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32/64-bit memory location.
The opmask register is not used as a writemask for this instruction. Instead, the mask is used as an element selector: every element of the destination is conditionally selected between the first source or the second source using the value of the related mask bit (0 for the first source operand, 1 for the second source operand).
If EVEX.z is set, the elements with corresponding mask bit value of 0 in the destination operand are zeroed.
Operation
VBLENDMPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no controlmask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+63:i] ← SRC2[63:0]
                ELSE
                    DEST[i+63:i] ← SRC2[i+63:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN DEST[i+63:i] ← SRC1[i+63:i]
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VBLENDMPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no controlmask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+31:i] ← SRC2[31:0]
                ELSE
                    DEST[i+31:i] ← SRC2[i+31:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN DEST[i+31:i] ← SRC1[i+31:i]
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VBLENDMPD __m512d _mm512_mask_blend_pd(__mmask8 k, __m512d a, __m512d b);
VBLENDMPD __m256d _mm256_mask_blend_pd(__mmask8 k, __m256d a, __m256d b);
VBLENDMPD __m128d _mm_mask_blend_pd(__mmask8 k, __m128d a, __m128d b);
VBLENDMPS __m512 _mm512_mask_blend_ps(__mmask16 k, __m512 a, __m512 b);
VBLENDMPS __m256 _mm256_mask_blend_ps(__mmask8 k, __m256 a, __m256 b);
VBLENDMPS __m128 _mm_mask_blend_ps(__mmask8 k, __m128 a, __m128 b);
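As a non-normative usage sketch (assumes AVX512F), the mask acts purely as a per-element selector, as described above; a mask bit of 0 picks the element from the first source, a bit of 1 from the second:

#include <immintrin.h>

/* Per-element select: 0xAA has the odd bits set, so the odd elements of
   the result come from b and the even elements from a. */
__m512d blend_odd_from_b(__m512d a, __m512d b) {
    return _mm512_mask_blend_pd((__mmask8)0xAA, a, b);
}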
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
VBROADCAST—Load with Broadcast Floating-Point Data
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 18 /r VBROADCASTSS xmm1, m32 | A | V/V | AVX | Broadcast single-precision floating-point element in mem to four locations in xmm1.
VEX.256.66.0F38.W0 18 /r VBROADCASTSS ymm1, m32 | A | V/V | AVX | Broadcast single-precision floating-point element in mem to eight locations in ymm1.
VEX.256.66.0F38.W0 19 /r VBROADCASTSD ymm1, m64 | A | V/V | AVX | Broadcast double-precision floating-point element in mem to four locations in ymm1.
VEX.256.66.0F38.W0 1A /r VBROADCASTF128 ymm1, m128 | A | V/V | AVX | Broadcast 128 bits of floating-point data in mem to low and high 128 bits in ymm1.
VEX.128.66.0F38.W0 18 /r VBROADCASTSS xmm1, xmm2 | A | V/V | AVX2 | Broadcast the low single-precision floating-point element in the source operand to four locations in xmm1.
VEX.256.66.0F38.W0 18 /r VBROADCASTSS ymm1, xmm2 | A | V/V | AVX2 | Broadcast low single-precision floating-point element in the source operand to eight locations in ymm1.
VEX.256.66.0F38.W0 19 /r VBROADCASTSD ymm1, xmm2 | A | V/V | AVX2 | Broadcast low double-precision floating-point element in the source operand to four locations in ymm1.
EVEX.256.66.0F38.W1 19 /r VBROADCASTSD ymm1 {k1}{z}, xmm2/m64 | B | V/V | AVX512VL AVX512F | Broadcast low double-precision floating-point element in xmm2/m64 to four locations in ymm1 using writemask k1.
EVEX.512.66.0F38.W1 19 /r VBROADCASTSD zmm1 {k1}{z}, xmm2/m64 | B | V/V | AVX512F | Broadcast low double-precision floating-point element in xmm2/m64 to eight locations in zmm1 using writemask k1.
EVEX.256.66.0F38.W0 19 /r VBROADCASTF32X2 ymm1 {k1}{z}, xmm2/m64 | C | V/V | AVX512VL AVX512DQ | Broadcast two single-precision floating-point elements in xmm2/m64 to locations in ymm1 using writemask k1.
EVEX.512.66.0F38.W0 19 /r VBROADCASTF32X2 zmm1 {k1}{z}, xmm2/m64 | C | V/V | AVX512DQ | Broadcast two single-precision floating-point elements in xmm2/m64 to locations in zmm1 using writemask k1.
EVEX.128.66.0F38.W0 18 /r VBROADCASTSS xmm1 {k1}{z}, xmm2/m32 | B | V/V | AVX512VL AVX512F | Broadcast low single-precision floating-point element in xmm2/m32 to all locations in xmm1 using writemask k1.
EVEX.256.66.0F38.W0 18 /r VBROADCASTSS ymm1 {k1}{z}, xmm2/m32 | B | V/V | AVX512VL AVX512F | Broadcast low single-precision floating-point element in xmm2/m32 to all locations in ymm1 using writemask k1.
EVEX.512.66.0F38.W0 18 /r VBROADCASTSS zmm1 {k1}{z}, xmm2/m32 | B | V/V | AVX512F | Broadcast low single-precision floating-point element in xmm2/m32 to all locations in zmm1 using writemask k1.
EVEX.256.66.0F38.W0 1A /r VBROADCASTF32X4 ymm1 {k1}{z}, m128 | D | V/V | AVX512VL AVX512F | Broadcast 128 bits of 4 single-precision floating-point data in mem to locations in ymm1 using writemask k1.
EVEX.512.66.0F38.W0 1A /r VBROADCASTF32X4 zmm1 {k1}{z}, m128 | D | V/V | AVX512F | Broadcast 128 bits of 4 single-precision floating-point data in mem to locations in zmm1 using writemask k1.
EVEX.256.66.0F38.W1 1A /r VBROADCASTF64X2 ymm1 {k1}{z}, m128 | C | V/V | AVX512VL AVX512DQ | Broadcast 128 bits of 2 double-precision floating-point data in mem to locations in ymm1 using writemask k1.
EVEX.512.66.0F38.W1 1A /r VBROADCASTF64X2 zmm1 {k1}{z}, m128 | C | V/V | AVX512DQ | Broadcast 128 bits of 2 double-precision floating-point data in mem to locations in zmm1 using writemask k1.
EVEX.512.66.0F38.W0 1B /r VBROADCASTF32X8 zmm1 {k1}{z}, m256 | E | V/V | AVX512DQ | Broadcast 256 bits of 8 single-precision floating-point data in mem to locations in zmm1 using writemask k1.
EVEX.512.66.0F38.W1 1B /r VBROADCASTF64X4 zmm1 {k1}{z}, m256 | D | V/V | AVX512F | Broadcast 256 bits of 4 double-precision floating-point data in mem to locations in zmm1 using writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
B | Tuple1 Scalar | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
C | Tuple2 | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
D | Tuple4 | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
E | Tuple8 | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
VBROADCASTSD/VBROADCASTSS/VBROADCASTF128 load floating-point values as one tuple from the source operand (second operand) in memory and broadcast to all elements of the destination operand (first operand).
VEX256-encoded versions: The destination operand is a YMM register. The source operand is either a 32-bit, 64-bit, or 128-bit memory location. Register source encodings are reserved and will #UD. Bits (MAXVL-1:256) of the destination register are zeroed.
EVEX-encoded versions: The destination operand is a ZMM/YMM/XMM register updated according to the writemask k1. The source operand is either a 32-bit or 64-bit memory location or the low doubleword/quadword element of an XMM register.
VBROADCASTF32X2/VBROADCASTF32X4/VBROADCASTF64X2/VBROADCASTF32X8/VBROADCASTF64X4 load floating-point values as tuples from the source operand (the second operand) in memory or register and broadcast to all elements of the destination operand (the first operand). The destination operand is a YMM/ZMM register updated according to the writemask k1. The source operand is either a register or a 64-bit/128-bit/256-bit memory location.
VBROADCASTSD, VBROADCASTF128, F32x4 and F64x2 are only supported as 256-bit and 512-bit wide versions. VBROADCASTSS is supported in 128-bit, 256-bit and 512-bit wide versions. F32x8 and F64x4 are only supported as 512-bit wide versions.
VBROADCASTF32X2/VBROADCASTF32X4/VBROADCASTF32X8 have 32-bit granularity. VBROADCASTF64X2 and VBROADCASTF64X4 have 64-bit granularity.
Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD.
An attempt to execute VBROADCASTSD or VBROADCASTF128 encoded with VEX.L = 0 will cause an #UD exception.
Figure 5-1. VBROADCASTSS Operation (VEX.256 encoded version)
Figure 5-2. VBROADCASTSS Operation (VEX.128-bit version)
Figure 5-3. VBROADCASTSD Operation (VEX.256-bit version)
Figure 5-4. VBROADCASTF128 Operation (VEX.256-bit version)
Operation
VBROADCASTSS (128 bit version VEX and legacy)
temp ← SRC[31:0]
DEST[31:0] ← temp
DEST[63:32] ← temp
DEST[95:64] ← temp
DEST[127:96] ← temp
DEST[MAXVL-1:128] ← 0

VBROADCASTSS (VEX.256 encoded version)
temp ← SRC[31:0]
DEST[31:0] ← temp
DEST[63:32] ← temp
DEST[95:64] ← temp
DEST[127:96] ← temp
DEST[159:128] ← temp
DEST[191:160] ← temp
DEST[223:192] ← temp
DEST[255:224] ← temp
DEST[MAXVL-1:256] ← 0

VBROADCASTSS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC[31:0]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Figure 5-5. VBROADCASTF64X4 Operation (512-bit version with writemask all 1s)
VBROADCASTSD (VEX.256 encoded version)
temp ← SRC[63:0]
DEST[63:0] ← temp
DEST[127:64] ← temp
DEST[191:128] ← temp
DEST[255:192] ← temp
DEST[MAXVL-1:256] ← 0

VBROADCASTSD (EVEX encoded versions)
(KL, VL) = (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← SRC[63:0]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VBROADCASTF32x2 (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    n ← (j mod 2) * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC[n+31:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VBROADCASTF128 (VEX.256 encoded version)
temp ← SRC[127:0]
DEST[127:0] ← temp
DEST[255:128] ← temp
DEST[MAXVL-1:256] ← 0
VBROADCASTF32X4 (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    n ← (j modulo 4) * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC[n+31:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VBROADCASTF64X2 (EVEX encoded versions)
(KL, VL) = (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    n ← (j modulo 2) * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← SRC[n+63:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR;

VBROADCASTF32X8 (EVEX.U1.512 encoded version)
FOR j ← 0 TO 15
    i ← j * 32
    n ← (j modulo 8) * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC[n+31:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VBROADCASTF64X4 (EVEX.512 encoded version)
FOR j ← 0 TO 7
    i ← j * 64
    n ← (j modulo 4) * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← SRC[n+63:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VBROADCASTF32x2 __m512 _mm512_broadcast_f32x2( __m128 a);
VBROADCASTF32x2 __m512 _mm512_mask_broadcast_f32x2(__m512 s, __mmask16 k, __m128 a);
VBROADCASTF32x2 __m512 _mm512_maskz_broadcast_f32x2( __mmask16 k, __m128 a);
VBROADCASTF32x2 __m256 _mm256_broadcast_f32x2( __m128 a);
VBROADCASTF32x2 __m256 _mm256_mask_broadcast_f32x2(__m256 s, __mmask8 k, __m128 a);
VBROADCASTF32x2 __m256 _mm256_maskz_broadcast_f32x2( __mmask8 k, __m128 a);
VBROADCASTF32x4 __m512 _mm512_broadcast_f32x4( __m128 a);
VBROADCASTF32x4 __m512 _mm512_mask_broadcast_f32x4(__m512 s, __mmask16 k, __m128 a);
VBROADCASTF32x4 __m512 _mm512_maskz_broadcast_f32x4( __mmask16 k, __m128 a);
VBROADCASTF32x4 __m256 _mm256_broadcast_f32x4( __m128 a);
VBROADCASTF32x4 __m256 _mm256_mask_broadcast_f32x4(__m256 s, __mmask8 k, __m128 a);
VBROADCASTF32x4 __m256 _mm256_maskz_broadcast_f32x4( __mmask8 k, __m128 a);
VBROADCASTF32x8 __m512 _mm512_broadcast_f32x8( __m256 a);
VBROADCASTF32x8 __m512 _mm512_mask_broadcast_f32x8(__m512 s, __mmask16 k, __m256 a);
VBROADCASTF32x8 __m512 _mm512_maskz_broadcast_f32x8( __mmask16 k, __m256 a);
VBROADCASTF64x2 __m512d _mm512_broadcast_f64x2( __m128d a);
VBROADCASTF64x2 __m512d _mm512_mask_broadcast_f64x2(__m512d s, __mmask8 k, __m128d a);
VBROADCASTF64x2 __m512d _mm512_maskz_broadcast_f64x2( __mmask8 k, __m128d a);
VBROADCASTF64x2 __m256d _mm256_broadcast_f64x2( __m128d a);
VBROADCASTF64x2 __m256d _mm256_mask_broadcast_f64x2(__m256d s, __mmask8 k, __m128d a);
VBROADCASTF64x2 __m256d _mm256_maskz_broadcast_f64x2( __mmask8 k, __m128d a);
VBROADCASTF64x4 __m512d _mm512_broadcast_f64x4( __m256d a);
VBROADCASTF64x4 __m512d _mm512_mask_broadcast_f64x4(__m512d s, __mmask8 k, __m256d a);
VBROADCASTF64x4 __m512d _mm512_maskz_broadcast_f64x4( __mmask8 k, __m256d a);
VBROADCASTSD __m512d _mm512_broadcastsd_pd( __m128d a);
VBROADCASTSD __m512d _mm512_mask_broadcastsd_pd(__m512d s, __mmask8 k, __m128d a);
VBROADCASTSD __m512d _mm512_maskz_broadcastsd_pd(__mmask8 k, __m128d a);
VBROADCASTSD __m256d _mm256_broadcastsd_pd(__m128d a);
VBROADCASTSD __m256d _mm256_mask_broadcastsd_pd(__m256d s, __mmask8 k, __m128d a);
VBROADCASTSD __m256d _mm256_maskz_broadcastsd_pd( __mmask8 k, __m128d a);
VBROADCASTSD __m256d _mm256_broadcast_sd(double *a);
VBROADCASTSS __m512 _mm512_broadcastss_ps( __m128 a);
VBROADCASTSS __m512 _mm512_mask_broadcastss_ps(__m512 s, __mmask16 k, __m128 a);
VBROADCASTSS __m512 _mm512_maskz_broadcastss_ps( __mmask16 k, __m128 a);
VBROADCASTSS __m256 _mm256_broadcastss_ps(__m128 a);
VBROADCASTSS __m256 _mm256_mask_broadcastss_ps(__m256 s, __mmask8 k, __m128 a);
VBROADCASTSS __m256 _mm256_maskz_broadcastss_ps( __mmask8 k, __m128 a);
VBROADCASTSS __m128 _mm_broadcastss_ps(__m128 a);
VBROADCASTSS __m128 _mm_mask_broadcastss_ps(__m128 s, __mmask8 k, __m128 a);
VBROADCASTSS __m128 _mm_maskz_broadcastss_ps( __mmask8 k, __m128 a);
VBROADCASTSS __m128 _mm_broadcast_ss(float *a);
VBROADCASTSS __m256 _mm256_broadcast_ss(float *a);
VBROADCASTF128 __m256 _mm256_broadcast_ps(__m128 * a);
VBROADCASTF128 __m256d _mm256_broadcast_pd(__m128d * a);
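As a non-normative usage sketch (assumes AVX512F), the scalar and tuple broadcast forms replicate their source across the destination as described above:

#include <immintrin.h>

/* Broadcast the low double of x to all eight lanes of a ZMM register. */
__m512d splat1(__m128d x) { return _mm512_broadcastsd_pd(x); }

/* Broadcast a 4-double tuple to both 256-bit halves of the result. */
__m512d splat4(__m256d x) { return _mm512_broadcast_f64x4(x); }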
Exceptions
VEX-encoded instructions, see Exceptions Type 6;
EVEX-encoded instructions, see Exceptions Type E6.
#UD If VEX.L = 0 for VBROADCASTSD or VBROADCASTF128.
If EVEX.L’L = 0 for VBROADCASTSD/VBROADCASTF32X2/VBROADCASTF32X4/VBROADCASTF64X2.
If EVEX.L’L < 10b for VBROADCASTF32X8/VBROADCASTF64X4.
VCOMPRESSPD—Store Sparse Packed Double-Precision Floating-Point Values into Dense Memory
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 8A /r VCOMPRESSPD xmm1/m128 {k1}{z}, xmm2 | A | V/V | AVX512VL AVX512F | Compress packed double-precision floating-point values from xmm2 to xmm1/m128 using writemask k1.
EVEX.256.66.0F38.W1 8A /r VCOMPRESSPD ymm1/m256 {k1}{z}, ymm2 | A | V/V | AVX512VL AVX512F | Compress packed double-precision floating-point values from ymm2 to ymm1/m256 using writemask k1.
EVEX.512.66.0F38.W1 8A /r VCOMPRESSPD zmm1/m512 {k1}{z}, zmm2 | A | V/V | AVX512F | Compress packed double-precision floating-point values from zmm2 using control mask k1 to zmm1/m512.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:r/m (w) | ModRM:reg (r) | NA | NA
Description
Compress (store) up to 8 double-precision floating-point values from the source operand (the second operand) as a contiguous vector to the destination operand (the first operand). The source operand is a ZMM/YMM/XMM register; the destination operand can be a ZMM/YMM/XMM register or a 512/256/128-bit memory location.
The opmask register k1 selects the active elements (a partial vector, possibly non-contiguous if there are fewer than 8 active elements) from the source operand to compress into a contiguous vector. The contiguous vector is written to the destination starting from the low element of the destination operand.
Memory destination version: Only the contiguous vector is written to the destination memory location. EVEX.z must be zero.
Register destination version: If the vector length of the contiguous vector is less than that of the input vector in the source operand, the upper bits of the destination register are unmodified if EVEX.z is not set, otherwise the upper bits are zeroed.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Note that the compressed displacement assumes a pre-scaling (N) corresponding to the size of one single element instead of the size of the full vector.
Operation
VCOMPRESSPD (EVEX encoded versions) store form
(KL, VL) = (2, 128), (4, 256), (8, 512)
SIZE ← 64
k ← 0
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            DEST[k+SIZE-1:k] ← SRC[i+63:i]
            k ← k + SIZE
    FI;
ENDFOR
VCOMPRESSPD (EVEX encoded versions) reg-reg form
(KL, VL) = (2, 128), (4, 256), (8, 512)
SIZE ← 64
k ← 0
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            DEST[k+SIZE-1:k] ← SRC[i+63:i]
            k ← k + SIZE
    FI;
ENDFOR
IF *merging-masking*
    THEN *DEST[VL-1:k] remains unchanged*
    ELSE DEST[VL-1:k] ← 0
FI
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCOMPRESSPD __m512d _mm512_mask_compress_pd( __m512d s, __mmask8 k, __m512d a);
VCOMPRESSPD __m512d _mm512_maskz_compress_pd( __mmask8 k, __m512d a);
VCOMPRESSPD void _mm512_mask_compressstoreu_pd( void * d, __mmask8 k, __m512d a);
VCOMPRESSPD __m256d _mm256_mask_compress_pd( __m256d s, __mmask8 k, __m256d a);
VCOMPRESSPD __m256d _mm256_maskz_compress_pd( __mmask8 k, __m256d a);
VCOMPRESSPD void _mm256_mask_compressstoreu_pd( void * d, __mmask8 k, __m256d a);
VCOMPRESSPD __m128d _mm_mask_compress_pd( __m128d s, __mmask8 k, __m128d a);
VCOMPRESSPD __m128d _mm_maskz_compress_pd( __mmask8 k, __m128d a);
VCOMPRESSPD void _mm_mask_compressstoreu_pd( void * d, __mmask8 k, __m128d a);
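As a non-normative usage sketch (assumes AVX512F and POPCNT), the memory form writes only the selected elements, contiguously and in source order:

#include <immintrin.h>

/* Store the non-negative elements of v contiguously at dst; returns how
   many doubles were written (the popcount of the mask). */
int compress_nonnegative(double *dst, __m512d v) {
    __mmask8 k = _mm512_cmp_pd_mask(v, _mm512_setzero_pd(), _CMP_GE_OQ);
    _mm512_mask_compressstoreu_pd(dst, k, v);
    return _mm_popcnt_u32((unsigned)k);
}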
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E4.nb.
#UD If EVEX.vvvv != 1111B.
VCOMPRESSPS—Store Sparse Packed Single-Precision Floating-Point Values into Dense Memory
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 8A /r VCOMPRESSPS xmm1/m128 {k1}{z}, xmm2 | A | V/V | AVX512VL AVX512F | Compress packed single-precision floating-point values from xmm2 to xmm1/m128 using writemask k1.
EVEX.256.66.0F38.W0 8A /r VCOMPRESSPS ymm1/m256 {k1}{z}, ymm2 | A | V/V | AVX512VL AVX512F | Compress packed single-precision floating-point values from ymm2 to ymm1/m256 using writemask k1.
EVEX.512.66.0F38.W0 8A /r VCOMPRESSPS zmm1/m512 {k1}{z}, zmm2 | A | V/V | AVX512F | Compress packed single-precision floating-point values from zmm2 using control mask k1 to zmm1/m512.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:r/m (w) | ModRM:reg (r) | NA | NA
Description
Compress (store) up to 16 single-precision floating-point values from the source operand (the second operand) to the destination operand (the first operand). The source operand is a ZMM/YMM/XMM register; the destination operand can be a ZMM/YMM/XMM register or a 512/256/128-bit memory location.
The opmask register k1 selects the active elements (a partial vector, possibly non-contiguous if there are fewer than 16 active elements) from the source operand to compress into a contiguous vector. The contiguous vector is written to the destination starting from the low element of the destination operand.
Memory destination version: Only the contiguous vector is written to the destination memory location. EVEX.z must be zero.
Register destination version: If the vector length of the contiguous vector is less than that of the input vector in the source operand, the upper bits of the destination register are unmodified if EVEX.z is not set, otherwise the upper bits are zeroed.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Note that the compressed displacement assumes a pre-scaling (N) corresponding to the size of one single element instead of the size of the full vector.
Operation
VCOMPRESSPS (EVEX encoded versions) store form
(KL, VL) = (4, 128), (8, 256), (16, 512)
SIZE ← 32
k ← 0
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            DEST[k+SIZE-1:k] ← SRC[i+31:i]
            k ← k + SIZE
    FI;
ENDFOR
VCOMPRESSPS (EVEX encoded versions) reg-reg form
(KL, VL) = (4, 128), (8, 256), (16, 512)
SIZE ← 32
k ← 0
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            DEST[k+SIZE-1:k] ← SRC[i+31:i]
            k ← k + SIZE
    FI;
ENDFOR
IF *merging-masking*
    THEN *DEST[VL-1:k] remains unchanged*
    ELSE DEST[VL-1:k] ← 0
FI
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCOMPRESSPS __m512 _mm512_mask_compress_ps( __m512 s, __mmask16 k, __m512 a);
VCOMPRESSPS __m512 _mm512_maskz_compress_ps( __mmask16 k, __m512 a);
VCOMPRESSPS void _mm512_mask_compressstoreu_ps( void * d, __mmask16 k, __m512 a);
VCOMPRESSPS __m256 _mm256_mask_compress_ps( __m256 s, __mmask8 k, __m256 a);
VCOMPRESSPS __m256 _mm256_maskz_compress_ps( __mmask8 k, __m256 a);
VCOMPRESSPS void _mm256_mask_compressstoreu_ps( void * d, __mmask8 k, __m256 a);
VCOMPRESSPS __m128 _mm_mask_compress_ps( __m128 s, __mmask8 k, __m128 a);
VCOMPRESSPS __m128 _mm_maskz_compress_ps( __mmask8 k, __m128 a);
VCOMPRESSPS void _mm_mask_compressstoreu_ps( void * d, __mmask8 k, __m128 a);
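As a non-normative usage sketch (assumes AVX512F), the register form packs the selected elements into the low lanes of the result:

#include <immintrin.h>

/* Pack the k-selected floats of a into the low lanes; with the maskz
   variant the remaining upper lanes are zeroed. */
__m512 pack_selected(__mmask16 k, __m512 a) {
    return _mm512_maskz_compress_ps(k, a);
}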
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E4.nb.
#UD If EVEX.vvvv != 1111B.
VCVTPD2QQ—Convert Packed Double-Precision Floating-Point Values to Packed Quadword Integers
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F.W1 7B /r VCVTPD2QQ xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed double-precision floating-point values from xmm2/m128/m64bcst to two packed quadword integers in xmm1 with writemask k1.
EVEX.256.66.0F.W1 7B /r VCVTPD2QQ ymm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed double-precision floating-point values from ymm2/m256/m64bcst to four packed quadword integers in ymm1 with writemask k1.
EVEX.512.66.0F.W1 7B /r VCVTPD2QQ zmm1 {k1}{z}, zmm2/m512/m64bcst{er} | A | V/V | AVX512DQ | Convert eight packed double-precision floating-point values from zmm2/m512/m64bcst to eight packed quadword integers in zmm1 with writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts packed double-precision floating-point values in the source operand (second operand) to packed quadword integers in the destination operand (first operand).
EVEX encoded versions: The source operand is a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (2^(w-1), where w represents the number of bits in the destination format) is returned.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTPD2QQ (EVEX encoded version) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL == 512) AND (EVEX.b == 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_QuadInteger(SRC[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VCVTPD2QQ (EVEX encoded version) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN
                    DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_QuadInteger(SRC[63:0])
                ELSE
                    DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_QuadInteger(SRC[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTPD2QQ __m512i _mm512_cvtpd_epi64( __m512d a);
VCVTPD2QQ __m512i _mm512_mask_cvtpd_epi64( __m512i s, __mmask8 k, __m512d a);
VCVTPD2QQ __m512i _mm512_maskz_cvtpd_epi64( __mmask8 k, __m512d a);
VCVTPD2QQ __m512i _mm512_cvt_roundpd_epi64( __m512d a, int r);
VCVTPD2QQ __m512i _mm512_mask_cvt_roundpd_epi64( __m512i s, __mmask8 k, __m512d a, int r);
VCVTPD2QQ __m512i _mm512_maskz_cvt_roundpd_epi64( __mmask8 k, __m512d a, int r);
VCVTPD2QQ __m256i _mm256_mask_cvtpd_epi64( __m256i s, __mmask8 k, __m256d a);
VCVTPD2QQ __m256i _mm256_maskz_cvtpd_epi64( __mmask8 k, __m256d a);
VCVTPD2QQ __m128i _mm_mask_cvtpd_epi64( __m128i s, __mmask8 k, __m128d a);
VCVTPD2QQ __m128i _mm_maskz_cvtpd_epi64( __mmask8 k, __m128d a);
VCVTPD2QQ __m256i _mm256_cvtpd_epi64 (__m256d src);
VCVTPD2QQ __m128i _mm_cvtpd_epi64 (__m128d src);
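As a non-normative usage sketch (assumes AVX512DQ), the {er} form lets the conversion override the MXCSR rounding mode with an embedded rounding control:

#include <immintrin.h>

/* Convert eight doubles to signed 64-bit integers, rounding to nearest
   even and suppressing exceptions via the embedded rounding override. */
__m512i dbl_to_i64_rne(__m512d a) {
    return _mm512_cvt_roundpd_epi64(a,
        _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
}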
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTPD2UDQ—Convert Packed Double-Precision Floating-Point Values to Packed Unsigned Doubleword Integers
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.0F.W1 79 /r VCVTPD2UDQ xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512F | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two unsigned doubleword integers in xmm1 subject to writemask k1.
EVEX.256.0F.W1 79 /r VCVTPD2UDQ xmm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512F | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four unsigned doubleword integers in xmm1 subject to writemask k1.
EVEX.512.0F.W1 79 /r VCVTPD2UDQ ymm1 {k1}{z}, zmm2/m512/m64bcst{er} | A | V/V | AVX512F | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight unsigned doubleword integers in ymm1 subject to writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts packed double-precision floating-point values in the source operand (the second operand) to packed unsigned doubleword integers in the destination operand (the first operand).
When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w - 1 is returned, where w represents the number of bits in the destination format.
The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1. The upper bits (MAXVL-1:256) of the corresponding destination are zeroed.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTPD2UDQ (EVEX encoded versions) when src2 operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_UInteger(SRC[k+63:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0

VCVTPD2UDQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_UInteger(SRC[63:0])
                ELSE
                    DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_UInteger(SRC[k+63:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTPD2UDQ __m256i _mm512_cvtpd_epu32( __m512d a);
VCVTPD2UDQ __m256i _mm512_mask_cvtpd_epu32( __m256i s, __mmask8 k, __m512d a);
VCVTPD2UDQ __m256i _mm512_maskz_cvtpd_epu32( __mmask8 k, __m512d a);
VCVTPD2UDQ __m256i _mm512_cvt_roundpd_epu32( __m512d a, int r);
VCVTPD2UDQ __m256i _mm512_mask_cvt_roundpd_epu32( __m256i s, __mmask8 k, __m512d a, int r);
VCVTPD2UDQ __m256i _mm512_maskz_cvt_roundpd_epu32( __mmask8 k, __m512d a, int r);
VCVTPD2UDQ __m128i _mm256_mask_cvtpd_epu32( __m128i s, __mmask8 k, __m256d a);
VCVTPD2UDQ __m128i _mm256_maskz_cvtpd_epu32( __mmask8 k, __m256d a);
VCVTPD2UDQ __m128i _mm_mask_cvtpd_epu32( __m128i s, __mmask8 k, __m128d a);
VCVTPD2UDQ __m128i _mm_maskz_cvtpd_epu32( __mmask8 k, __m128d a);
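As a non-normative usage sketch (assumes AVX512F), the converted result is half the width of the source, as described above:

#include <immintrin.h>

/* Eight doubles narrow to eight unsigned 32-bit integers; the result
   occupies a 256-bit register. */
__m256i dbl_to_u32(__m512d a) {
    return _mm512_cvtpd_epu32(a);
}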
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTPD2UQQ—Convert Packed Double-Precision Floating-Point Values to Packed Unsigned Quadword Integers
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F.W1 79 /r VCVTPD2UQQ xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed double-precision floating-point values from xmm2/mem to two packed unsigned quadword integers in xmm1 with writemask k1.
EVEX.256.66.0F.W1 79 /r VCVTPD2UQQ ymm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed double-precision floating-point values from ymm2/mem to four packed unsigned quadword integers in ymm1 with writemask k1.
EVEX.512.66.0F.W1 79 /r VCVTPD2UQQ zmm1 {k1}{z}, zmm2/m512/m64bcst{er} | A | V/V | AVX512DQ | Convert eight packed double-precision floating-point values from zmm2/mem to eight packed unsigned quadword integers in zmm1 with writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts packed double-precision floating-point values in the source operand (second operand) to packed unsigned quadword integers in the destination operand (first operand).
When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w - 1 is returned, where w represents the number of bits in the destination format.
The source operand is a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTPD2UQQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL == 512) AND (EVEX.b == 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_UQuadInteger(SRC[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VCVTPD2UQQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN
                    DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_UQuadInteger(SRC[63:0])
                ELSE
                    DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_UQuadInteger(SRC[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTPD2UQQ __m512i _mm512_cvtpd_epu64( __m512d a);
VCVTPD2UQQ __m512i _mm512_mask_cvtpd_epu64( __m512i s, __mmask8 k, __m512d a);
VCVTPD2UQQ __m512i _mm512_maskz_cvtpd_epu64( __mmask8 k, __m512d a);
VCVTPD2UQQ __m512i _mm512_cvt_roundpd_epu64( __m512d a, int r);
VCVTPD2UQQ __m512i _mm512_mask_cvt_roundpd_epu64( __m512i s, __mmask8 k, __m512d a, int r);
VCVTPD2UQQ __m512i _mm512_maskz_cvt_roundpd_epu64( __mmask8 k, __m512d a, int r);
VCVTPD2UQQ __m256i _mm256_mask_cvtpd_epu64( __m256i s, __mmask8 k, __m256d a);
VCVTPD2UQQ __m256i _mm256_maskz_cvtpd_epu64( __mmask8 k, __m256d a);
VCVTPD2UQQ __m128i _mm_mask_cvtpd_epu64( __m128i s, __mmask8 k, __m128d a);
VCVTPD2UQQ __m128i _mm_maskz_cvtpd_epu64( __mmask8 k, __m128d a);
VCVTPD2UQQ __m256i _mm256_cvtpd_epu64 (__m256d src);
VCVTPD2UQQ __m128i _mm_cvtpd_epu64 (__m128d src);
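As a non-normative usage sketch (assumes AVX512DQ), the masked form preserves unselected lanes from the pass-through source, matching the merging-masking behavior described above:

#include <immintrin.h>

/* Lanes whose mask bit is 0 keep the value from src (merging form). */
__m512i dbl_to_u64_masked(__m512i src, __mmask8 k, __m512d a) {
    return _mm512_mask_cvtpd_epu64(src, k, a);
}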
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTPH2PS—Convert 16-bit FP values to Single-Precision FP values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 13 /r VCVTPH2PS xmm1, xmm2/m64 | A | V/V | F16C | Convert four packed half precision (16-bit) floating-point values in xmm2/m64 to packed single-precision floating-point values in xmm1.
VEX.256.66.0F38.W0 13 /r VCVTPH2PS ymm1, xmm2/m128 | A | V/V | F16C | Convert eight packed half precision (16-bit) floating-point values in xmm2/m128 to packed single-precision floating-point values in ymm1.
EVEX.128.66.0F38.W0 13 /r VCVTPH2PS xmm1 {k1}{z}, xmm2/m64 | B | V/V | AVX512VL AVX512F | Convert four packed half precision (16-bit) floating-point values in xmm2/m64 to packed single-precision floating-point values in xmm1.
EVEX.256.66.0F38.W0 13 /r VCVTPH2PS ymm1 {k1}{z}, xmm2/m128 | B | V/V | AVX512VL AVX512F | Convert eight packed half precision (16-bit) floating-point values in xmm2/m128 to packed single-precision floating-point values in ymm1.
EVEX.512.66.0F38.W0 13 /r VCVTPH2PS zmm1 {k1}{z}, ymm2/m256 {sae} | B | V/V | AVX512F | Convert sixteen packed half precision (16-bit) floating-point values in ymm2/m256 to packed single-precision floating-point values in zmm1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
B | Half Mem | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts packed half precision (16-bit) floating-point values in the low-order bits of the source operand (the second operand) to packed single-precision floating-point values and writes the converted values into the destination operand (the first operand).
In case of a denormal operand, the correct normal result is returned. MXCSR.DAZ is ignored and is treated as if it were 0. No denormal exception is reported on MXCSR.
VEX.128 version: The source operand is an XMM register or 64-bit memory location. The destination operand is an XMM register. The upper bits (MAXVL-1:128) of the corresponding destination register are zeroed.
VEX.256 version: The source operand is an XMM register or 128-bit memory location. The destination operand is a YMM register. Bits (MAXVL-1:256) of the corresponding destination register are zeroed.
EVEX encoded versions: The source operand is a YMM/XMM/XMM (low 64 bits) register or a 256/128/64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
The diagram below (Figure 5-6) illustrates how data is converted from four packed half precision (in 64 bits) to four single-precision (in 128 bits) FP values.
Note: VEX.vvvv and EVEX.vvvv are reserved (must be 1111b).
Operation
vCvt_h2s(SRC1[15:0])
{
RETURN Cvt_Half_Precision_To_Single_Precision(SRC1[15:0]);
}
VCVTPH2PS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← vCvt_h2s(SRC[k+15:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VCVTPH2PS (VEX.256 encoded version)
DEST[31:0] ← vCvt_h2s(SRC1[15:0]);
DEST[63:32] ← vCvt_h2s(SRC1[31:16]);
DEST[95:64] ← vCvt_h2s(SRC1[47:32]);
DEST[127:96] ← vCvt_h2s(SRC1[63:48]);
DEST[159:128] ← vCvt_h2s(SRC1[79:64]);
DEST[191:160] ← vCvt_h2s(SRC1[95:80]);
DEST[223:192] ← vCvt_h2s(SRC1[111:96]);
DEST[255:224] ← vCvt_h2s(SRC1[127:112]);
DEST[MAXVL-1:256] ← 0
Figure 5-6. VCVTPH2PS (128-bit Version). [Figure: the four half-precision values VH3..VH0 in the low 64 bits of xmm2/mem64 are each converted to the single-precision values VS3..VS0 filling xmm1.]
VCVTPH2PS (VEX.128 encoded version)
DEST[31:0] ← vCvt_h2s(SRC1[15:0]);
DEST[63:32] ← vCvt_h2s(SRC1[31:16]);
DEST[95:64] ← vCvt_h2s(SRC1[47:32]);
DEST[127:96] ← vCvt_h2s(SRC1[63:48]);
DEST[MAXVL-1:128] ← 0
Flags Affected
None
Intel C/C++ Compiler Intrinsic Equivalent
VCVTPH2PS __m512 _mm512_cvtph_ps( __m256i a);
VCVTPH2PS __m512 _mm512_mask_cvtph_ps(__m512 s, __mmask16 k, __m256i a);
VCVTPH2PS __m512 _mm512_maskz_cvtph_ps(__mmask16 k, __m256i a);
VCVTPH2PS __m512 _mm512_cvt_roundph_ps( __m256i a, int sae);
VCVTPH2PS __m512 _mm512_mask_cvt_roundph_ps(__m512 s, __mmask16 k, __m256i a, int sae);
VCVTPH2PS __m512 _mm512_maskz_cvt_roundph_ps( __mmask16 k, __m256i a, int sae);
VCVTPH2PS __m256 _mm256_mask_cvtph_ps(__m256 s, __mmask8 k, __m128i a);
VCVTPH2PS __m256 _mm256_maskz_cvtph_ps(__mmask8 k, __m128i a);
VCVTPH2PS __m128 _mm_mask_cvtph_ps(__m128 s, __mmask8 k, __m128i a);
VCVTPH2PS __m128 _mm_maskz_cvtph_ps(__mmask8 k, __m128i a);
VCVTPH2PS __m128 _mm_cvtph_ps ( __m128i m1);
VCVTPH2PS __m256 _mm256_cvtph_ps ( __m128i m1);
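For illustration, a minimal C sketch of the F16C form above (not part of this manual; assumes <immintrin.h> and a compiler with F16C enabled, e.g. -mf16c; the helper name is illustrative):

#include <immintrin.h>
#include <stdint.h>

/* Widen four half-precision values stored as 64 bits in memory to four
   single-precision values; maps to VCVTPH2PS xmm, xmm/m64. */
static inline __m128 half4_to_float4(const uint16_t src[4])
{
    __m128i h = _mm_loadl_epi64((const __m128i *)src); /* low 64 bits = 4 halves */
    return _mm_cvtph_ps(h);
}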
SIMD Floating-Point Exceptions
Invalid
Other Exceptions
VEX-encoded instructions, see Exceptions Type 11 (do not report #AC);
EVEX-encoded instructions, see Exceptions Type E11.
#UD If VEX.W=1.
#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
VCVTPS2PH—Convert Single-Precision FP value to 16-bit FP value
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F3A.W0 1D /r ib VCVTPS2PH xmm1/m64, xmm2, imm8 | A | V/V | F16C | Convert four packed single-precision floating-point values in xmm2 to packed half-precision (16-bit) floating-point values in xmm1/m64. Imm8 provides rounding controls.
VEX.256.66.0F3A.W0 1D /r ib VCVTPS2PH xmm1/m128, ymm2, imm8 | A | V/V | F16C | Convert eight packed single-precision floating-point values in ymm2 to packed half-precision (16-bit) floating-point values in xmm1/m128. Imm8 provides rounding controls.
EVEX.128.66.0F3A.W0 1D /r ib VCVTPS2PH xmm1/m64 {k1}{z}, xmm2, imm8 | B | V/V | AVX512VL AVX512F | Convert four packed single-precision floating-point values in xmm2 to packed half-precision (16-bit) floating-point values in xmm1/m64. Imm8 provides rounding controls.
EVEX.256.66.0F3A.W0 1D /r ib VCVTPS2PH xmm1/m128 {k1}{z}, ymm2, imm8 | B | V/V | AVX512VL AVX512F | Convert eight packed single-precision floating-point values in ymm2 to packed half-precision (16-bit) floating-point values in xmm1/m128. Imm8 provides rounding controls.
EVEX.512.66.0F3A.W0 1D /r ib VCVTPS2PH ymm1/m256 {k1}{z}, zmm2{sae}, imm8 | B | V/V | AVX512F | Convert sixteen packed single-precision floating-point values in zmm2 to packed half-precision (16-bit) floating-point values in ymm1/m256. Imm8 provides rounding controls.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:r/m (w) | ModRM:reg (r) | Imm8 | NA
B | Half Mem | ModRM:r/m (w) | ModRM:reg (r) | Imm8 | NA
Description
Convert packed single-precision floating values in the source operand to half-precision (16-bit) floating-point
values and store to the destination operand. The rounding mode is specified using the immediate field (imm8).
Underflow results (i.e., tiny results) are converted to denormals. MXCSR.FTZ is ignored. If a source element is
denormal relative to the input format with DM masked and at least one of PM or UM unmasked; a SIMD exception
will be raised with DE, UE and PE set.
The immediate byte defines several bit fields that control rounding operation. The effect and encoding of the RC
field are listed in Table 5-3.
Figure 5-7. VCVTPS2PH (128-bit Version). [Figure: the four single-precision values VS3..VS0 in xmm2 are each converted to the half-precision values VH3..VH0 in the low 64 bits of xmm1/mem64.]
VEX.128 version: The source operand is an XMM register. The destination operand is an XMM register or 64-bit memory location. If the destination operand is a register, the upper bits (MAXVL-1:64) of the corresponding register are zeroed.
VEX.256 version: The source operand is a YMM register. The destination operand is an XMM register or 128-bit memory location. If the destination operand is a register, the upper bits (MAXVL-1:128) of the corresponding destination register are zeroed.
Note: VEX.vvvv and EVEX.vvvv are reserved (must be 1111b).
EVEX encoded versions: The source operand is a ZMM/YMM/XMM register. The destination operand is a YMM/XMM/XMM (low 64 bits) register or a 256/128/64-bit memory location, conditionally updated with writemask k1. Bits (MAXVL-1:256/128/64) of the corresponding destination register are zeroed.
Operation
vCvt_s2h(SRC1[31:0])
{
IF Imm[2] = 0
THEN ; using Imm[1:0] for rounding control, see Table 5-3
RETURN Cvt_Single_Precision_To_Half_Precision_FP_Imm(SRC1[31:0]);
ELSE ; using MXCSR.RC for rounding control
RETURN Cvt_Single_Precision_To_Half_Precision_FP_Mxcsr(SRC1[31:0]);
FI;
}
VCVTPS2PH (EVEX encoded versions) when dest is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← vCvt_s2h(SRC[k+31:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0
Table 5-3. Immediate Byte Encoding for 16-bit Floating-Point Conversion Instructions
Bits | Field Name/Value | Description | Comment
Imm[1:0] | RC=00B | Round to nearest even | If Imm[2] = 0
Imm[1:0] | RC=01B | Round down | If Imm[2] = 0
Imm[1:0] | RC=10B | Round up | If Imm[2] = 0
Imm[1:0] | RC=11B | Truncate | If Imm[2] = 0
Imm[2] | MS1=0 | Use imm[1:0] for rounding | Ignore MXCSR.RC
Imm[2] | MS1=1 | Use MXCSR.RC for rounding |
Imm[7:3] | Ignored | Ignored by processor |
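Read as plain numbers, Table 5-3 gives the following imm8 values (a worked example; the macro names are illustrative, only the numeric values follow from the table):

/* imm8 encodings implied by Table 5-3. */
#define PH_IMM_RN     0x00  /* Imm[2]=0, RC=00B: round to nearest even */
#define PH_IMM_RD     0x01  /* Imm[2]=0, RC=01B: round down */
#define PH_IMM_RU     0x02  /* Imm[2]=0, RC=10B: round up */
#define PH_IMM_RZ     0x03  /* Imm[2]=0, RC=11B: truncate */
#define PH_IMM_MXCSR  0x04  /* Imm[2]=1: use MXCSR.RC; Imm[1:0] ignored */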
VCVTPS2PH (EVEX encoded versions) when dest is memory
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← vCvt_s2h(SRC[k+31:k])
        ELSE *DEST[i+15:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
VCVTPS2PH (VEX.256 encoded version)
DEST[15:0] ← vCvt_s2h(SRC1[31:0]);
DEST[31:16] ← vCvt_s2h(SRC1[63:32]);
DEST[47:32] ← vCvt_s2h(SRC1[95:64]);
DEST[63:48] ← vCvt_s2h(SRC1[127:96]);
DEST[79:64] ← vCvt_s2h(SRC1[159:128]);
DEST[95:80] ← vCvt_s2h(SRC1[191:160]);
DEST[111:96] ← vCvt_s2h(SRC1[223:192]);
DEST[127:112] ← vCvt_s2h(SRC1[255:224]);
DEST[MAXVL-1:128] ← 0
VCVTPS2PH (VEX.128 encoded version)
DEST[15:0] ← vCvt_s2h(SRC1[31:0]);
DEST[31:16] ← vCvt_s2h(SRC1[63:32]);
DEST[47:32] ← vCvt_s2h(SRC1[95:64]);
DEST[63:48] ← vCvt_s2h(SRC1[127:96]);
DEST[MAXVL-1:64] ← 0
Flags Affected
None
Intel C/C++ Compiler Intrinsic Equivalent
VCVTPS2PH __m256i _mm512_cvtps_ph(__m512 a);
VCVTPS2PH __m256i _mm512_mask_cvtps_ph(__m256i s, __mmask16 k,__m512 a);
VCVTPS2PH __m256i _mm512_maskz_cvtps_ph(__mmask16 k,__m512 a);
VCVTPS2PH __m256i _mm512_cvt_roundps_ph(__m512 a, const int imm);
VCVTPS2PH __m256i _mm512_mask_cvt_roundps_ph(__m256i s, __mmask16 k,__m512 a, const int imm);
VCVTPS2PH __m256i _mm512_maskz_cvt_roundps_ph(__mmask16 k,__m512 a, const int imm);
VCVTPS2PH __m128i _mm256_mask_cvtps_ph(__m128i s, __mmask8 k,__m256 a);
VCVTPS2PH __m128i _mm256_maskz_cvtps_ph(__mmask8 k,__m256 a);
VCVTPS2PH __m128i _mm_mask_cvtps_ph(__m128i s, __mmask8 k,__m128 a);
VCVTPS2PH __m128i _mm_maskz_cvtps_ph(__mmask8 k,__m128 a);
VCVTPS2PH __m128i _mm_cvtps_ph ( __m128 m1, const int imm);
VCVTPS2PH __m128i _mm256_cvtps_ph(__m256 m1, const int imm);
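A minimal C sketch of the narrowing direction (not part of this manual; assumes F16C and <immintrin.h>): the _MM_FROUND_* constants form an imm8 whose low three bits match Table 5-3.

#include <immintrin.h>
#include <stdint.h>

/* Narrow four single-precision values to half precision for storage;
   maps to VCVTPS2PH xmm/m64, xmm, imm8. */
static inline void float4_to_half4(const float in[4], uint16_t out[4])
{
    __m128  f = _mm_loadu_ps(in);
    __m128i h = _mm_cvtps_ph(f, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storel_epi64((__m128i *)out, h); /* only the low 64 bits are valid */
}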
SIMD Floating-Point Exceptions
Invalid, Underflow, Overflow, Precision, Denormal (if MXCSR.DAZ=0).
Other Exceptions
VEX-encoded instructions, see Exceptions Type 11 (do not report #AC);
EVEX-encoded instructions, see Exceptions Type E11.
#UD If VEX.W=1.
#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
VCVTPS2UDQ—Convert Packed Single-Precision Floating-Point Values to Packed Unsigned
Doubleword Integer Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.0F.W0 79 /r VCVTPS2UDQ xmm1 {k1}{z}, xmm2/m128/m32bcst | A | V/V | AVX512VL AVX512F | Convert four packed single-precision floating-point values from xmm2/m128/m32bcst to four packed unsigned doubleword values in xmm1 subject to writemask k1.
EVEX.256.0F.W0 79 /r VCVTPS2UDQ ymm1 {k1}{z}, ymm2/m256/m32bcst | A | V/V | AVX512VL AVX512F | Convert eight packed single-precision floating-point values from ymm2/m256/m32bcst to eight packed unsigned doubleword values in ymm1 subject to writemask k1.
EVEX.512.0F.W0 79 /r VCVTPS2UDQ zmm1 {k1}{z}, zmm2/m512/m32bcst{er} | A | V/V | AVX512F | Convert sixteen packed single-precision floating-point values from zmm2/m512/m32bcst to sixteen packed unsigned doubleword values in zmm1 subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts up to sixteen packed single-precision floating-point values in the source operand to unsigned doubleword integers in the destination operand.
When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.
The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
Note: EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
VCVTPS2UDQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← Convert_Single_Precision_Floating_Point_To_UInteger(SRC[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VCVTPS2UDQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN DEST[i+31:i] ← Convert_Single_Precision_Floating_Point_To_UInteger(SRC[31:0])
                ELSE DEST[i+31:i] ← Convert_Single_Precision_Floating_Point_To_UInteger(SRC[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTPS2UDQ __m512i _mm512_cvtps_epu32( __m512 a);
VCVTPS2UDQ __m512i _mm512_mask_cvtps_epu32( __m512i s, __mmask16 k, __m512 a);
VCVTPS2UDQ __m512i _mm512_maskz_cvtps_epu32( __mmask16 k, __m512 a);
VCVTPS2UDQ __m512i _mm512_cvt_roundps_epu32( __m512 a, int r);
VCVTPS2UDQ __m512i _mm512_mask_cvt_roundps_epu32( __m512i s, __mmask16 k, __m512 a, int r);
VCVTPS2UDQ __m512i _mm512_maskz_cvt_roundps_epu32( __mmask16 k, __m512 a, int r);
VCVTPS2UDQ __m256i _mm256_cvtps_epu32( __m256 a);
VCVTPS2UDQ __m256i _mm256_mask_cvtps_epu32( __m256i s, __mmask8 k, __m256 a);
VCVTPS2UDQ __m256i _mm256_maskz_cvtps_epu32( __mmask8 k, __m256 a);
VCVTPS2UDQ __m128i _mm_cvtps_epu32( __m128 a);
VCVTPS2UDQ __m128i _mm_mask_cvtps_epu32( __m128i s, __mmask8 k, __m128 a);
VCVTPS2UDQ __m128i _mm_maskz_cvtps_epu32( __mmask8 k, __m128 a);
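A short sketch of the writemask behavior described above (not part of this manual; assumes AVX-512F): unselected elements are either preserved from the old destination (merging) or zeroed (zeroing-masking).

#include <immintrin.h>

/* Convert only the even-numbered elements; odd elements come from 'old'
   in the merging form, or become zero in the zeroing form. */
void cvt_even_lanes(__m512i old, __m512 a, __m512i *merged, __m512i *zeroed)
{
    __mmask16 k = 0x5555; /* bits 0,2,4,... select even elements */
    *merged = _mm512_mask_cvtps_epu32(old, k, a);
    *zeroed = _mm512_maskz_cvtps_epu32(k, a);
}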
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTPS2QQ—Convert Packed Single Precision Floating-Point Values to Packed Signed Quadword Integer Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F.W0 7B /r VCVTPS2QQ xmm1 {k1}{z}, xmm2/m64/m32bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed single-precision floating-point values from xmm2/m64/m32bcst to two packed signed quadword values in xmm1 subject to writemask k1.
EVEX.256.66.0F.W0 7B /r VCVTPS2QQ ymm1 {k1}{z}, xmm2/m128/m32bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed single-precision floating-point values from xmm2/m128/m32bcst to four packed signed quadword values in ymm1 subject to writemask k1.
EVEX.512.66.0F.W0 7B /r VCVTPS2QQ zmm1 {k1}{z}, ymm2/m256/m32bcst{er} | A | V/V | AVX512DQ | Convert eight packed single-precision floating-point values from ymm2/m256/m32bcst to eight packed signed quadword values in zmm1 subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Half | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts eight packed single-precision floating-point values in the source operand to eight signed quadword inte-
gers in the destination operand.
When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR
register or the embedded rounding control bits. If a converted result cannot be represented in the destination
format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value
(2w-1, where w represents the number of bits in the destination format) is returned.
The source operand is a YMM/XMM/XMM (low 64- bits) register or a 256/128/64-bit memory location. The destina-
tion operation is a ZMM/YMM/XMM register conditionally updated with writemask k1.
Note: EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
VCVTPS2QQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL == 512) AND (EVEX.b == 1)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_Single_Precision_To_QuadInteger(SRC[k+31:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VCVTPS2QQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN DEST[i+63:i] ← Convert_Single_Precision_To_QuadInteger(SRC[31:0])
                ELSE DEST[i+63:i] ← Convert_Single_Precision_To_QuadInteger(SRC[k+31:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTPS2QQ __m512i _mm512_cvtps_epi64( __m256 a);
VCVTPS2QQ __m512i _mm512_mask_cvtps_epi64( __m512i s, __mmask8 k, __m256 a);
VCVTPS2QQ __m512i _mm512_maskz_cvtps_epi64( __mmask8 k, __m256 a);
VCVTPS2QQ __m512i _mm512_cvt_roundps_epi64( __m256 a, int r);
VCVTPS2QQ __m512i _mm512_mask_cvt_roundps_epi64( __m512i s, __mmask8 k, __m256 a, int r);
VCVTPS2QQ __m512i _mm512_maskz_cvt_roundps_epi64( __mmask8 k, __m256 a, int r);
VCVTPS2QQ __m256i _mm256_cvtps_epi64( __m128 a);
VCVTPS2QQ __m256i _mm256_mask_cvtps_epi64( __m256i s, __mmask8 k, __m128 a);
VCVTPS2QQ __m256i _mm256_maskz_cvtps_epi64( __mmask8 k, __m128 a);
VCVTPS2QQ __m128i _mm_cvtps_epi64( __m128 a);
VCVTPS2QQ __m128i _mm_mask_cvtps_epi64( __m128i s, __mmask8 k, __m128 a);
VCVTPS2QQ __m128i _mm_maskz_cvtps_epi64( __mmask8 k, __m128 a);
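A sketch using the signatures above (not part of this manual; assumes AVX-512DQ): the 512-bit form takes its eight source floats in a __m256, and the _round variant supplies the embedded rounding control ({er}) instead of MXCSR.RC.

#include <immintrin.h>

/* Eight floats -> eight signed quadwords, rounding to nearest even
   regardless of MXCSR.RC; {er} also implies suppress-all-exceptions. */
__m512i floats_to_i64(__m256 a)
{
    return _mm512_cvt_roundps_epi64(a, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
}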
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E3.
#UD If EVEX.vvvv != 1111B.
VCVTPS2UQQ—Convert Packed Single Precision Floating-Point Values to Packed Unsigned
Quadword Integer Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F.W0 79 /r VCVTPS2UQQ xmm1 {k1}{z}, xmm2/m64/m32bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed single-precision floating-point values from xmm2/m64/m32bcst to two packed unsigned quadword values in xmm1 subject to writemask k1.
EVEX.256.66.0F.W0 79 /r VCVTPS2UQQ ymm1 {k1}{z}, xmm2/m128/m32bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed single-precision floating-point values from xmm2/m128/m32bcst to four packed unsigned quadword values in ymm1 subject to writemask k1.
EVEX.512.66.0F.W0 79 /r VCVTPS2UQQ zmm1 {k1}{z}, ymm2/m256/m32bcst{er} | A | V/V | AVX512DQ | Convert eight packed single-precision floating-point values from ymm2/m256/m32bcst to eight packed unsigned quadword values in zmm1 subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Half | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts up to eight packed single-precision floating-point values in the source operand to unsigned quadword integers in the destination operand.
When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.
The source operand is a YMM/XMM/XMM (low 64 bits) register or a 256/128/64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
VCVTPS2UQQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL == 512) AND (EVEX.b == 1)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_Single_Precision_To_UQuadInteger(SRC[k+31:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VCVTPS2UQQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN DEST[i+63:i] ← Convert_Single_Precision_To_UQuadInteger(SRC[31:0])
                ELSE DEST[i+63:i] ← Convert_Single_Precision_To_UQuadInteger(SRC[k+31:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTPS2UQQ __m512i _mm512_cvtps_epu64( __m256 a);
VCVTPS2UQQ __m512i _mm512_mask_cvtps_epu64( __m512i s, __mmask8 k, __m256 a);
VCVTPS2UQQ __m512i _mm512_maskz_cvtps_epu64( __mmask8 k, __m256 a);
VCVTPS2UQQ __m512i _mm512_cvt_roundps_epu64( __m256 a, int r);
VCVTPS2UQQ __m512i _mm512_mask_cvt_roundps_epu64( __m512i s, __mmask8 k, __m256 a, int r);
VCVTPS2UQQ __m512i _mm512_maskz_cvt_roundps_epu64( __mmask8 k, __m256 a, int r);
VCVTPS2UQQ __m256i _mm256_cvtps_epu64( __m128 a);
VCVTPS2UQQ __m256i _mm256_mask_cvtps_epu64( __m256i s, __mmask8 k, __m128 a);
VCVTPS2UQQ __m256i _mm256_maskz_cvtps_epu64( __mmask8 k, __m128 a);
VCVTPS2UQQ __m128i _mm_cvtps_epu64( __m128 a);
VCVTPS2UQQ __m128i _mm_mask_cvtps_epu64( __m128i s, __mmask8 k, __m128 a);
VCVTPS2UQQ __m128i _mm_maskz_cvtps_epu64( __mmask8 k, __m128 a);
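To illustrate the masked-invalid result above, a sketch (not part of this manual; assumes AVX-512DQ+VL and the invalid exception masked): a source value not representable as an unsigned quadword returns 2^w - 1.

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128 v = _mm_set_ps(0.0f, 0.0f, 3.5f, -1.0f);
    __m128i q = _mm_cvtps_epu64(v); /* converts the two low floats */
    unsigned long long out[2];
    _mm_storeu_si128((__m128i *)out, q);
    /* out[0] = 0xFFFFFFFFFFFFFFFF (2^64-1): -1.0 is not representable.
       out[1] = 4: 3.5 rounds per MXCSR.RC (nearest even by default). */
    printf("%llx %llu\n", out[0], out[1]);
    return 0;
}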
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E3.
#UD If EVEX.vvvv != 1111B.
VCVTQQ2PD—Convert Packed Quadword Integers to Packed Double-Precision Floating-Point
Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F.W1 E6 /r VCVTQQ2PD xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed quadword integers from xmm2/m128/m64bcst to packed double-precision floating-point values in xmm1 with writemask k1.
EVEX.256.F3.0F.W1 E6 /r VCVTQQ2PD ymm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed quadword integers from ymm2/m256/m64bcst to packed double-precision floating-point values in ymm1 with writemask k1.
EVEX.512.F3.0F.W1 E6 /r VCVTQQ2PD zmm1 {k1}{z}, zmm2/m512/m64bcst{er} | A | V/V | AVX512DQ | Convert eight packed quadword integers from zmm2/m512/m64bcst to eight packed double-precision floating-point values in zmm1 with writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts packed quadword integers in the source operand (second operand) to packed double-precision floating-
point values in the destination operand (first operand).
The source operand is a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
VCVTQQ2PD (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL == 512) AND (EVEX.b == 1)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_QuadInteger_To_Double_Precision_Floating_Point(SRC[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VCVTQQ2PD (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN DEST[i+63:i] ← Convert_QuadInteger_To_Double_Precision_Floating_Point(SRC[63:0])
                ELSE DEST[i+63:i] ← Convert_QuadInteger_To_Double_Precision_Floating_Point(SRC[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTQQ2PD __m512d _mm512_cvtepi64_pd( __m512i a);
VCVTQQ2PD __m512d _mm512_mask_cvtepi64_pd( __m512d s, __mmask8 k, __m512i a);
VCVTQQ2PD __m512d _mm512_maskz_cvtepi64_pd( __mmask8 k, __m512i a);
VCVTQQ2PD __m512d _mm512_cvt_roundepi64_pd( __m512i a, int r);
VCVTQQ2PD __m512d _mm512_mask_cvt_roundepi64_pd( __m512d s, __mmask8 k, __m512i a, int r);
VCVTQQ2PD __m512d _mm512_maskz_cvt_roundepi64_pd( __mmask8 k, __m512i a, int r);
VCVTQQ2PD __m256d _mm256_mask_cvtepi64_pd( __m256d s, __mmask8 k, __m256i a);
VCVTQQ2PD __m256d _mm256_maskz_cvtepi64_pd( __mmask8 k, __m256i a);
VCVTQQ2PD __m128d _mm_mask_cvtepi64_pd( __m128d s, __mmask8 k, __m128i a);
VCVTQQ2PD __m128d _mm_maskz_cvtepi64_pd( __mmask8 k, __m128i a);
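A sketch of why only the Precision exception applies here (not part of this manual; assumes AVX-512DQ): every quadword is in range for a double, but integers of magnitude above 2^53 are not exactly representable and are rounded per MXCSR.RC.

#include <immintrin.h>

__m512d i64_to_double(__m512i q)
{
    return _mm512_cvtepi64_pd(q);
}
/* e.g., i64_to_double(_mm512_set1_epi64((1LL << 53) + 1)) rounds each
   element and sets the Precision flag. */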
SIMD Floating-Point Exceptions
Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTQQ2PS—Convert Packed Quadword Integers to Packed Single-Precision Floating-Point
Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.0F.W1 5B /r VCVTQQ2PS xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed quadword integers from xmm2/mem to packed single-precision floating-point values in xmm1 with writemask k1.
EVEX.256.0F.W1 5B /r VCVTQQ2PS xmm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed quadword integers from ymm2/mem to packed single-precision floating-point values in xmm1 with writemask k1.
EVEX.512.0F.W1 5B /r VCVTQQ2PS ymm1 {k1}{z}, zmm2/m512/m64bcst{er} | A | V/V | AVX512DQ | Convert eight packed quadword integers from zmm2/mem to eight packed single-precision floating-point values in ymm1 with writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts packed quadword integers in the source operand (second operand) to packed single-precision floating-
point values in the destination operand (first operand).
The source operand is a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination operand is a YMM/XMM/XMM (lower 64 bits) register conditionally updated with writemask k1.
EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
VCVTQQ2PS (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[k+31:k] ← Convert_QuadInteger_To_Single_Precision_Floating_Point(SRC[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[k+31:k] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[k+31:k] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0
VCVTQQ2PS (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN DEST[k+31:k] ← Convert_QuadInteger_To_Single_Precision_Floating_Point(SRC[63:0])
                ELSE DEST[k+31:k] ← Convert_QuadInteger_To_Single_Precision_Floating_Point(SRC[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[k+31:k] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[k+31:k] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTQQ2PS __m256 _mm512_cvtepi64_ps( __m512i a);
VCVTQQ2PS __m256 _mm512_mask_cvtepi64_ps( __m256 s, __mmask8 k, __m512i a);
VCVTQQ2PS __m256 _mm512_maskz_cvtepi64_ps( __mmask8 k, __m512i a);
VCVTQQ2PS __m256 _mm512_cvt_roundepi64_ps( __m512i a, int r);
VCVTQQ2PS __m256 _mm512_mask_cvt_roundepi64_ps( __m256 s, __mmask8 k, __m512i a, int r);
VCVTQQ2PS __m256 _mm512_maskz_cvt_roundepi64_ps( __mmask8 k, __m512i a, int r);
VCVTQQ2PS __m128 _mm256_cvtepi64_ps( __m256i a);
VCVTQQ2PS __m128 _mm256_mask_cvtepi64_ps( __m128 s, __mmask8 k, __m256i a);
VCVTQQ2PS __m128 _mm256_maskz_cvtepi64_ps( __mmask8 k, __m256i a);
VCVTQQ2PS __m128 _mm_cvtepi64_ps( __m128i a);
VCVTQQ2PS __m128 _mm_mask_cvtepi64_ps( __m128 s, __mmask8 k, __m128i a);
VCVTQQ2PS __m128 _mm_maskz_cvtepi64_ps( __mmask8 k, __m128i a);
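A sketch of the narrowing layout (not part of this manual; assumes AVX-512DQ): eight quadwords yield eight singles, so the result occupies only half the source width (a YMM result for a ZMM source), matching DEST[MAXVL-1:VL/2] ← 0 above.

#include <immintrin.h>

__m256 i64_to_float(__m512i q)
{
    /* Each 64-bit lane narrows to a 32-bit lane. */
    return _mm512_cvtepi64_ps(q);
}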
SIMD Floating-Point Exceptions
Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTSD2USI—Convert Scalar Double-Precision Floating-Point Value to Unsigned Doubleword
Integer
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.F2.0F.W0 79 /r VCVTSD2USI r32, xmm1/m64{er} | A | V/V | AVX512F | Convert one double-precision floating-point value from xmm1/m64 to one unsigned doubleword integer r32.
EVEX.LIG.F2.0F.W1 79 /r VCVTSD2USI r64, xmm1/m64{er} | A | V/N.E.(1) | AVX512F | Convert one double-precision floating-point value from xmm1/m64 to one unsigned quadword integer zero-extended into r64.
NOTES: 1. EVEX.W1 in non-64-bit mode is ignored; the instruction behaves as if the W0 version is used.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Fixed | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts a double-precision floating-point value in the source operand (the second operand) to an unsigned
doubleword integer in the destination operand (the first operand). The source operand can be an XMM register or
a 64-bit memory location. The destination operand is a general-purpose register. When the source operand is an
XMM register, the double-precision floating-point value is contained in the low quadword of the register.
When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR
register or the embedded rounding control bits. If a converted result cannot be represented in the destination
format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.
Operation
VCVTSD2USI (EVEX encoded version)
IF (SRC *is register*) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
IF 64-Bit Mode and OperandSize = 64
    THEN DEST[63:0] ← Convert_Double_Precision_Floating_Point_To_UInteger(SRC[63:0]);
    ELSE DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_UInteger(SRC[63:0]);
FI
Intel C/C++ Compiler Intrinsic Equivalent
VCVTSD2USI unsigned int _mm_cvtsd_u32(__m128d);
VCVTSD2USI unsigned int _mm_cvt_roundsd_u32(__m128d, int r);
VCVTSD2USI unsigned __int64 _mm_cvtsd_u64(__m128d);
VCVTSD2USI unsigned __int64 _mm_cvt_roundsd_u64(__m128d, int r);
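A sketch of the embedded-rounding ({er}) form (not part of this manual; assumes AVX-512F): an explicit rounding mode overrides MXCSR.RC for this one conversion and must be paired with _MM_FROUND_NO_EXC.

#include <immintrin.h>

unsigned int dbl_to_u32_floor(double x)
{
    __m128d v = _mm_set_sd(x);
    /* Round toward -inf just for this conversion, leaving MXCSR alone. */
    return _mm_cvt_roundsd_u32(v, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
}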
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E3NF.
VCVTSS2USI—Convert Scalar Single-Precision Floating-Point Value to Unsigned Doubleword
Integer
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.F3.0F.W0 79 /r VCVTSS2USI r32, xmm1/m32{er} | A | V/V | AVX512F | Convert one single-precision floating-point value from xmm1/m32 to one unsigned doubleword integer in r32.
EVEX.LIG.F3.0F.W1 79 /r VCVTSS2USI r64, xmm1/m32{er} | A | V/N.E.(1) | AVX512F | Convert one single-precision floating-point value from xmm1/m32 to one unsigned quadword integer in r64.
NOTES: 1. EVEX.W1 in non-64-bit mode is ignored; the instruction behaves as if the W0 version is used.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Fixed | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts a single-precision floating-point value in the source operand (the second operand) to an unsigned double-
word integer (or unsigned quadword integer if operand size is 64 bits) in the destination operand (the first
operand). The source operand can be an XMM register or a memory location. The destination operand is a general-
purpose register. When the source operand is an XMM register, the single-precision floating-point value is contained
in the low doubleword of the register.
When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.
The EVEX.W1 version promotes the instruction to produce 64-bit data in 64-bit mode.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTSS2USI (EVEX encoded version)
IF (SRC *is register*) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
IF 64-bit Mode and OperandSize = 64
    THEN DEST[63:0] ← Convert_Single_Precision_Floating_Point_To_UInteger(SRC[31:0]);
    ELSE DEST[31:0] ← Convert_Single_Precision_Floating_Point_To_UInteger(SRC[31:0]);
FI;
Intel C/C++ Compiler Intrinsic Equivalent
VCVTSS2USI unsigned _mm_cvtss_u32( __m128 a);
VCVTSS2USI unsigned _mm_cvt_roundss_u32( __m128 a, int r);
VCVTSS2USI unsigned __int64 _mm_cvtss_u64( __m128 a);
VCVTSS2USI unsigned __int64 _mm_cvt_roundss_u64( __m128 a, int r);
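A sketch of the EVEX.W1 form (not part of this manual; assumes AVX-512F in 64-bit mode): the unsigned conversion reaches values such as 2^63 that are out of range for the signed CVTSS2SI.

#include <immintrin.h>

unsigned long long flt_to_u64(float x)
{
    return _mm_cvtss_u64(_mm_set_ss(x)); /* rounds per MXCSR.RC */
}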
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E3NF.
VCVTTPD2QQ—Convert with Truncation Packed Double-Precision Floating-Point Values to
Packed Quadword Integers
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F.W1 7A /r VCVTTPD2QQ xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed double-precision floating-point values from xmm2/m128/m64bcst to two packed quadword integers in xmm1 using truncation with writemask k1.
EVEX.256.66.0F.W1 7A /r VCVTTPD2QQ ymm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed double-precision floating-point values from ymm2/m256/m64bcst to four packed quadword integers in ymm1 using truncation with writemask k1.
EVEX.512.66.0F.W1 7A /r VCVTTPD2QQ zmm1 {k1}{z}, zmm2/m512/m64bcst{sae} | A | V/V | AVX512DQ | Convert eight packed double-precision floating-point values from zmm2/m512/m64bcst to eight packed quadword integers in zmm1 using truncation with writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts with truncation packed double-precision floating-point values in the source operand (second operand) to
packed quadword integers in the destination operand (first operand).
EVEX encoded versions: The source operand is a ZMM/YMM/XMM register or a 512/256/128-bit memory location.
The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (2^(w-1), where w represents the number of bits in the destination format) is returned.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTTPD2QQ (EVEX encoded version) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_QuadInteger_Truncate(SRC[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VCVTTPD2QQ (EVEX encoded version) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_QuadInteger_Truncate(SRC[63:0])
                ELSE DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_QuadInteger_Truncate(SRC[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTTPD2QQ __m512i _mm512_cvttpd_epi64( __m512d a);
VCVTTPD2QQ __m512i _mm512_mask_cvttpd_epi64( __m512i s, __mmask8 k, __m512d a);
VCVTTPD2QQ __m512i _mm512_maskz_cvttpd_epi64( __mmask8 k, __m512d a);
VCVTTPD2QQ __m512i _mm512_cvtt_roundpd_epi64( __m512d a, int sae);
VCVTTPD2QQ __m512i _mm512_mask_cvtt_roundpd_epi64( __m512i s, __mmask8 k, __m512d a, int sae);
VCVTTPD2QQ __m512i _mm512_maskz_cvtt_roundpd_epi64( __mmask8 k, __m512d a, int sae);
VCVTTPD2QQ __m256i _mm256_mask_cvttpd_epi64( __m256i s, __mmask8 k, __m256d a);
VCVTTPD2QQ __m256i _mm256_maskz_cvttpd_epi64( __mmask8 k, __m256d a);
VCVTTPD2QQ __m128i _mm_mask_cvttpd_epi64( __m128i s, __mmask8 k, __m128d a);
VCVTTPD2QQ __m128i _mm_maskz_cvttpd_epi64( __mmask8 k, __m128d a);
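A sketch contrasting the truncating and rounding forms (not part of this manual; assumes AVX-512DQ): only the VCVTT form ignores the MXCSR rounding mode.

#include <immintrin.h>

void trunc_vs_round(__m512d a, __m512i *t, __m512i *r)
{
    *t = _mm512_cvttpd_epi64(a); /* round toward zero: 2.9 -> 2, -2.9 -> -2 */
    *r = _mm512_cvtpd_epi64(a);  /* per MXCSR.RC (nearest even by default): 2.9 -> 3 */
}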
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTTPD2UDQ—Convert with Truncation Packed Double-Precision Floating-Point Values to
Packed Unsigned Doubleword Integers
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.0F.W1 78 /r VCVTTPD2UDQ xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512F | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two unsigned doubleword integers in xmm1 using truncation subject to writemask k1.
EVEX.256.0F.W1 78 /r VCVTTPD2UDQ xmm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512F | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four unsigned doubleword integers in xmm1 using truncation subject to writemask k1.
EVEX.512.0F.W1 78 /r VCVTTPD2UDQ ymm1 {k1}{z}, zmm2/m512/m64bcst{sae} | A | V/V | AVX512F | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight unsigned doubleword integers in ymm1 using truncation subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts with truncation packed double-precision floating-point values in the source operand (the second operand)
to packed unsigned doubleword integers in the destination operand (the first operand).
When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.
The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector
broadcasted from a 64-bit memory location. The destination operand is a YMM/XMM/XMM (low 64 bits) register
conditionally updated with writemask k1. The upper bits (MAXVL-1:256) of the corresponding destination are
zeroed.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTTPD2UDQ (EVEX encoded versions) when src2 operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_UInteger_Truncate(SRC[k+63:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0
VCVTTPD2UDQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_UInteger_Truncate(SRC[63:0])
                ELSE DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_UInteger_Truncate(SRC[k+63:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTTPD2UDQ __m256i _mm512_cvttpd_epu32( __m512d a);
VCVTTPD2UDQ __m256i _mm512_mask_cvttpd_epu32( __m256i s, __mmask8 k, __m512d a);
VCVTTPD2UDQ __m256i _mm512_maskz_cvttpd_epu32( __mmask8 k, __m512d a);
VCVTTPD2UDQ __m256i _mm512_cvtt_roundpd_epu32( __m512d a, int sae);
VCVTTPD2UDQ __m256i _mm512_mask_cvtt_roundpd_epu32( __m256i s, __mmask8 k, __m512d a, int sae);
VCVTTPD2UDQ __m256i _mm512_maskz_cvtt_roundpd_epu32( __mmask8 k, __m512d a, int sae);
VCVTTPD2UDQ __m128i _mm256_mask_cvttpd_epu32( __m128i s, __mmask8 k, __m256d a);
VCVTTPD2UDQ __m128i _mm256_maskz_cvttpd_epu32( __mmask8 k, __m256d a);
VCVTTPD2UDQ __m128i _mm_mask_cvttpd_epu32( __m128i s, __mmask8 k, __m128d a);
VCVTTPD2UDQ __m128i _mm_maskz_cvttpd_epu32( __mmask8 k, __m128d a);
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTTPD2UQQ—Convert with Truncation Packed Double-Precision Floating-Point Values to
Packed Unsigned Quadword Integers
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F.W1 78 /r VCVTTPD2UQQ xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed double-precision floating-point values from xmm2/m128/m64bcst to two packed unsigned quadword integers in xmm1 using truncation with writemask k1.
EVEX.256.66.0F.W1 78 /r VCVTTPD2UQQ ymm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed double-precision floating-point values from ymm2/m256/m64bcst to four packed unsigned quadword integers in ymm1 using truncation with writemask k1.
EVEX.512.66.0F.W1 78 /r VCVTTPD2UQQ zmm1 {k1}{z}, zmm2/m512/m64bcst{sae} | A | V/V | AVX512DQ | Convert eight packed double-precision floating-point values from zmm2/m512/m64bcst to eight packed unsigned quadword integers in zmm1 using truncation with writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts with truncation packed double-precision floating-point values in the source operand (second operand) to
packed unsigned quadword integers in the destination operand (first operand).
When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.
EVEX encoded versions: The source operand is a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTTPD2UQQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_UQuadInteger_Truncate(SRC[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VCVTTPD2UQQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_UQuadInteger_Truncate(SRC[63:0])
                ELSE DEST[i+63:i] ← Convert_Double_Precision_Floating_Point_To_UQuadInteger_Truncate(SRC[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTTPD2UQQ _mm<size>[_mask[z]]_cvtt[_round]pd_epu64
VCVTTPD2UQQ __m512i _mm512_cvttpd_epu64( __m512d a);
VCVTTPD2UQQ __m512i _mm512_mask_cvttpd_epu64( __m512i s, __mmask8 k, __m512d a);
VCVTTPD2UQQ __m512i _mm512_maskz_cvttpd_epu64( __mmask8 k, __m512d a);
VCVTTPD2UQQ __m512i _mm512_cvtt_roundpd_epu64( __m512d a, int sae);
VCVTTPD2UQQ __m512i _mm512_mask_cvtt_roundpd_epu64( __m512i s, __mmask8 k, __m512d a, int sae);
VCVTTPD2UQQ __m512i _mm512_maskz_cvtt_roundpd_epu64( __mmask8 k, __m512d a, int sae);
VCVTTPD2UQQ __m256i _mm256_mask_cvttpd_epu64( __m256i s, __mmask8 k, __m256d a);
VCVTTPD2UQQ __m256i _mm256_maskz_cvttpd_epu64( __mmask8 k, __m256d a);
VCVTTPD2UQQ __m128i _mm_mask_cvttpd_epu64( __m128i s, __mmask8 k, __m128d a);
VCVTTPD2UQQ __m128i _mm_maskz_cvttpd_epu64( __mmask8 k, __m128d a);
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTTPS2UDQ—Convert with Truncation Packed Single-Precision Floating-Point Values to
Packed Unsigned Doubleword Integer Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.0F.W0 78 /r VCVTTPS2UDQ xmm1 {k1}{z}, xmm2/m128/m32bcst | A | V/V | AVX512VL AVX512F | Convert four packed single-precision floating-point values from xmm2/m128/m32bcst to four packed unsigned doubleword values in xmm1 using truncation subject to writemask k1.
EVEX.256.0F.W0 78 /r VCVTTPS2UDQ ymm1 {k1}{z}, ymm2/m256/m32bcst | A | V/V | AVX512VL AVX512F | Convert eight packed single-precision floating-point values from ymm2/m256/m32bcst to eight packed unsigned doubleword values in ymm1 using truncation subject to writemask k1.
EVEX.512.0F.W0 78 /r VCVTTPS2UDQ zmm1 {k1}{z}, zmm2/m512/m32bcst{sae} | A | V/V | AVX512F | Convert sixteen packed single-precision floating-point values from zmm2/m512/m32bcst to sixteen packed unsigned doubleword values in zmm1 using truncation subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts with truncation packed single-precision floating-point values in the source operand to unsigned doubleword integers in the destination operand.
When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.
EVEX encoded versions: The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or
a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a
ZMM/YMM/XMM register conditionally updated with writemask k1.
Note: EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
VCVTTPS2UDQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← Convert_Single_Precision_Floating_Point_To_UInteger_Truncate(SRC[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VCVTTPS2UDQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN DEST[i+31:i] ← Convert_Single_Precision_Floating_Point_To_UInteger_Truncate(SRC[31:0])
                ELSE DEST[i+31:i] ← Convert_Single_Precision_Floating_Point_To_UInteger_Truncate(SRC[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTTPS2UDQ __m512i _mm512_cvttps_epu32( __m512 a);
VCVTTPS2UDQ __m512i _mm512_mask_cvttps_epu32( __m512i s, __mmask16 k, __m512 a);
VCVTTPS2UDQ __m512i _mm512_maskz_cvttps_epu32( __mmask16 k, __m512 a);
VCVTTPS2UDQ __m512i _mm512_cvtt_roundps_epu32( __m512 a, int sae);
VCVTTPS2UDQ __m512i _mm512_mask_cvtt_roundps_epu32( __m512i s, __mmask16 k, __m512 a, int sae);
VCVTTPS2UDQ __m512i _mm512_maskz_cvtt_roundps_epu32( __mmask16 k, __m512 a, int sae);
VCVTTPS2UDQ __m256i _mm256_mask_cvttps_epu32( __m256i s, __mmask8 k, __m256 a);
VCVTTPS2UDQ __m256i _mm256_maskz_cvttps_epu32( __mmask8 k, __m256 a);
VCVTTPS2UDQ __m128i _mm_mask_cvttps_epu32( __m128i s, __mmask8 k, __m128 a);
VCVTTPS2UDQ __m128i _mm_maskz_cvttps_epu32( __mmask8 k, __m128 a);
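To make the masked-invalid rule concrete, a sketch (not part of this manual; assumes AVX-512F with VL and the invalid exception masked): any float not representable as an unsigned doubleword, including every negative value, produces 2^32 - 1.

#include <immintrin.h>
#include <stdint.h>

int main(void)
{
    __m128 v = _mm_set_ps(1e20f, 2.9f, -1.0f, 7.0f);
    __m128i d = _mm_cvttps_epu32(v);
    uint32_t out[4];
    _mm_storeu_si128((__m128i *)out, d);
    /* out = {7, 0xFFFFFFFF, 2, 0xFFFFFFFF}: -1.0 and 1e20 are out of
       range; 2.9 truncates to 2. */
    return 0;
}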
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTTPS2QQ—Convert with Truncation Packed Single Precision Floating-Point Values to Packed Signed Quadword Integer Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F.W0 7A /r VCVTTPS2QQ xmm1 {k1}{z}, xmm2/m64/m32bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed single-precision floating-point values from xmm2/m64/m32bcst to two packed signed quadword values in xmm1 using truncation subject to writemask k1.
EVEX.256.66.0F.W0 7A /r VCVTTPS2QQ ymm1 {k1}{z}, xmm2/m128/m32bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed single-precision floating-point values from xmm2/m128/m32bcst to four packed signed quadword values in ymm1 using truncation subject to writemask k1.
EVEX.512.66.0F.W0 7A /r VCVTTPS2QQ zmm1 {k1}{z}, ymm2/m256/m32bcst{sae} | A | V/V | AVX512DQ | Convert eight packed single-precision floating-point values from ymm2/m256/m32bcst to eight packed signed quadword values in zmm1 using truncation subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Half | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts with truncation up to eight packed single-precision floating-point values in the source operand to signed quadword integers in the destination operand.
When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (2^(w-1), where w represents the number of bits in the destination format) is returned.
EVEX encoded versions: The source operand is a YMM/XMM/XMM (low 64 bits) register or a 256/128/64-bit memory location. The destination operand is a vector register conditionally updated with writemask k1.
Note: EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
VCVTTPS2QQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_Single_Precision_To_QuadInteger_Truncate(SRC[k+31:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VCVTTPS2QQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN DEST[i+63:i] ← Convert_Single_Precision_To_QuadInteger_Truncate(SRC[31:0])
                ELSE DEST[i+63:i] ← Convert_Single_Precision_To_QuadInteger_Truncate(SRC[k+31:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTTPS2QQ __m512i _mm512_cvttps_epi64( __m256 a);
VCVTTPS2QQ __m512i _mm512_mask_cvttps_epi64( __m512i s, __mmask8 k, __m256 a);
VCVTTPS2QQ __m512i _mm512_maskz_cvttps_epi64( __mmask8 k, __m256 a);
VCVTTPS2QQ __m512i _mm512_cvtt_roundps_epi64( __m256 a, int sae);
VCVTTPS2QQ __m512i _mm512_mask_cvtt_roundps_epi64( __m512i s, __mmask8 k, __m256 a, int sae);
VCVTTPS2QQ __m512i _mm512_maskz_cvtt_roundps_epi64( __mmask8 k, __m256 a, int sae);
VCVTTPS2QQ __m256i _mm256_mask_cvttps_epi64( __m256i s, __mmask8 k, __m128 a);
VCVTTPS2QQ __m256i _mm256_maskz_cvttps_epi64( __mmask8 k, __m128 a);
VCVTTPS2QQ __m128i _mm_mask_cvttps_epi64( __m128i s, __mmask8 k, __m128 a);
VCVTTPS2QQ __m128i _mm_maskz_cvttps_epi64( __mmask8 k, __m128 a);
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E3.
#UD If EVEX.vvvv != 1111B.
VCVTTPS2UQQ—Convert with Truncation Packed Single Precision Floating-Point Values to
Packed Unsigned Quadword Integer Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F.W0 78 /r VCVTTPS2UQQ xmm1 {k1}{z}, xmm2/m64/m32bcst | A | V/V | AVX512VL AVX512DQ | Convert two packed single-precision floating-point values from xmm2/m64/m32bcst to two packed unsigned quadword values in xmm1 using truncation subject to writemask k1.
EVEX.256.66.0F.W0 78 /r VCVTTPS2UQQ ymm1 {k1}{z}, xmm2/m128/m32bcst | A | V/V | AVX512VL AVX512DQ | Convert four packed single-precision floating-point values from xmm2/m128/m32bcst to four packed unsigned quadword values in ymm1 using truncation subject to writemask k1.
EVEX.512.66.0F.W0 78 /r VCVTTPS2UQQ zmm1 {k1}{z}, ymm2/m256/m32bcst{sae} | A | V/V | AVX512DQ | Convert eight packed single-precision floating-point values from ymm2/m256/m32bcst to eight packed unsigned quadword values in zmm1 using truncation subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Half | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Converts with truncation up to eight packed single-precision floating-point values in the source operand to unsigned quadword integers in the destination operand.
When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w – 1 is returned, where w represents the number of bits in the destination format.
EVEX encoded versions: The source operand is a YMM/XMM/XMM (low 64 bits) register or a 256/128/64-bit memory location. The destination operand is a vector register conditionally updated with writemask k1.
Note: EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
VCVTTPS2UQQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_Single_Precision_To_UQuadInteger_Truncate(SRC[k+31:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Opcode/Instruction: EVEX.128.66.0F.W0 78 /r VCVTTPS2UQQ xmm1 {k1}{z}, xmm2/m64/m32bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512DQ
Description: Convert two packed single precision floating-point values from xmm2/m64/m32bcst to two packed unsigned quadword values in xmm1 using truncation subject to writemask k1.

Opcode/Instruction: EVEX.256.66.0F.W0 78 /r VCVTTPS2UQQ ymm1 {k1}{z}, xmm2/m128/m32bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512DQ
Description: Convert four packed single precision floating-point values from xmm2/m128/m32bcst to four packed unsigned quadword values in ymm1 using truncation subject to writemask k1.

Opcode/Instruction: EVEX.512.66.0F.W0 78 /r VCVTTPS2UQQ zmm1 {k1}{z}, ymm2/m256/m32bcst{sae}
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512DQ
Description: Convert eight packed single precision floating-point values from ymm2/m256/m32bcst to eight packed unsigned quadword values in zmm1 using truncation subject to writemask k1.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Half ModRM:reg (w) ModRM:r/m (r) NA NA
VCVTTPS2UQQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN DEST[i+63:i] ← Convert_Single_Precision_To_UQuadInteger_Truncate(SRC[31:0])
                ELSE DEST[i+63:i] ← Convert_Single_Precision_To_UQuadInteger_Truncate(SRC[k+31:k])
            FI;
        ELSE
            IF *merging-masking*
                THEN *DEST[i+63:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+63:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTTPS2UQQ _mm<size>[_mask[z]]_cvtt[_round]ps_epu64
VCVTTPS2UQQ __m512i _mm512_cvttps_epu64( __m256 a);
VCVTTPS2UQQ __m512i _mm512_mask_cvttps_epu64( __m512i s, __mmask8 k, __m256 a);
VCVTTPS2UQQ __m512i _mm512_maskz_cvttps_epu64( __mmask8 k, __m256 a);
VCVTTPS2UQQ __m512i _mm512_cvtt_roundps_epu64( __m256 a, int sae);
VCVTTPS2UQQ __m512i _mm512_mask_cvtt_roundps_epu64( __m512i s, __mmask8 k, __m256 a, int sae);
VCVTTPS2UQQ __m512i _mm512_maskz_cvtt_roundps_epu64( __mmask8 k, __m256 a, int sae);
VCVTTPS2UQQ __m256i _mm256_mask_cvttps_epu64( __m256i s, __mmask8 k, __m128 a);
VCVTTPS2UQQ __m256i _mm256_maskz_cvttps_epu64( __mmask8 k, __m128 a);
VCVTTPS2UQQ __m128i _mm_mask_cvttps_epu64( __m128i s, __mmask8 k, __m128 a);
VCVTTPS2UQQ __m128i _mm_maskz_cvttps_epu64( __mmask8 k, __m128 a);
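The saturation behavior described above (2^w − 1 on a masked invalid conversion) is directly observable from C. A minimal sketch, illustrative only, assuming AVX512DQ and AVX512VL support:

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    /* -1.25f has no unsigned representation: with #I masked, the result is 2^64 - 1. */
    __m128 src = _mm_set_ps(0.0f, 0.0f, 7.75f, -1.25f);
    __m128i dst = _mm_maskz_cvttps_epu64(0x3, src);   /* VCVTTPS2UQQ, zeroing-masking */
    unsigned long long out[2];
    _mm_storeu_si128((__m128i *)out, dst);
    printf("%llu %llu\n", out[0], out[1]);            /* 18446744073709551615 7 */
    return 0;
}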
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E3.
#UD If EVEX.vvvv != 1111B.
VCVTTSD2USI—Convert with Truncation Scalar Double-Precision Floating-Point Value to
Unsigned Integer
Description
Converts with truncation a double-precision floating-point value in the source operand (the second operand) to an unsigned doubleword integer (or unsigned quadword integer if operand size is 64 bits) in the destination operand (the first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the double-precision floating-point value is contained in the low quadword of the register.
When a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w − 1 is returned, where w represents the number of bits in the destination format.
EVEX.W1 version: promotes the instruction to produce 64-bit data in 64-bit mode.
Operation
VCVTTSD2USI (EVEX encoded version)
IF 64-Bit Mode and OperandSize = 64
    THEN DEST[63:0] ← Convert_Double_Precision_Floating_Point_To_UInteger_Truncate(SRC[63:0]);
    ELSE DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_UInteger_Truncate(SRC[63:0]);
FI
Intel C/C++ Compiler Intrinsic Equivalent
VCVTTSD2USI unsigned int _mm_cvttsd_u32(__m128d);
VCVTTSD2USI unsigned int _mm_cvtt_roundsd_u32(__m128d, int sae);
VCVTTSD2USI unsigned __int64 _mm_cvttsd_u64(__m128d);
VCVTTSD2USI unsigned __int64 _mm_cvtt_roundsd_u64(__m128d, int sae);
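For instance, the scalar form can be driven through its intrinsic as in the brief sketch below (illustrative only; assumes AVX512F support):

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128d v = _mm_set_sd(41.99);
    unsigned int u = _mm_cvttsd_u32(v);   /* VCVTTSD2USI: truncate toward zero */
    printf("%u\n", u);                    /* 41 */
    return 0;
}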
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E3NF.
Opcode/Instruction: EVEX.LIG.F2.0F.W0 78 /r VCVTTSD2USI r32, xmm1/m64{sae}
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Convert one double-precision floating-point value from xmm1/m64 to one unsigned doubleword integer in r32 using truncation.

Opcode/Instruction: EVEX.LIG.F2.0F.W1 78 /r VCVTTSD2USI r64, xmm1/m64{sae}
Op/En: A    64/32-bit Mode Support: V/N.E.(1)    CPUID Feature Flag: AVX512F
Description: Convert one double-precision floating-point value from xmm1/m64 to one unsigned quadword integer zero-extended into r64 using truncation.

NOTES:
1. For this specific instruction, EVEX.W in non-64-bit mode is ignored; the instruction behaves as if the W0 version is used.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Fixed ModRM:reg (w) ModRM:r/m (r) NA NA
VCVTTSS2USI—Convert with Truncation Scalar Single-Precision Floating-Point Value to
Unsigned Integer
Description
Converts with truncation a single-precision floating-point value in the source operand (the second operand) to an unsigned doubleword integer (or unsigned quadword integer if operand size is 64 bits) in the destination operand (the first operand). The source operand can be an XMM register or a memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the single-precision floating-point value is contained in the low doubleword of the register.
When a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the integer value 2^w − 1 is returned, where w represents the number of bits in the destination format.
EVEX.W1 version: promotes the instruction to produce 64-bit data in 64-bit mode.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Opcode/Instruction: EVEX.LIG.F3.0F.W0 78 /r VCVTTSS2USI r32, xmm1/m32{sae}
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Convert one single-precision floating-point value from xmm1/m32 to one unsigned doubleword integer in r32 using truncation.

Opcode/Instruction: EVEX.LIG.F3.0F.W1 78 /r VCVTTSS2USI r64, xmm1/m32{sae}
Op/En: A    64/32-bit Mode Support: V/N.E.(1)    CPUID Feature Flag: AVX512F
Description: Convert one single-precision floating-point value from xmm1/m32 to one unsigned quadword integer in r64 using truncation.

NOTES:
1. For this specific instruction, EVEX.W in non-64-bit mode is ignored; the instruction behaves as if the W0 version is used.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Fixed ModRM:reg (w) ModRM:r/m (r) NA NA
Operation
VCVTTSS2USI (EVEX encoded version)
IF 64-bit Mode and OperandSize = 64
    THEN DEST[63:0] ← Convert_Single_Precision_Floating_Point_To_UInteger_Truncate(SRC[31:0]);
    ELSE DEST[31:0] ← Convert_Single_Precision_Floating_Point_To_UInteger_Truncate(SRC[31:0]);
FI;
Intel C/C++ Compiler Intrinsic Equivalent
VCVTTSS2USI unsigned int _mm_cvttss_u32( __m128 a);
VCVTTSS2USI unsigned int _mm_cvtt_roundss_u32( __m128 a, int sae);
VCVTTSS2USI unsigned __int64 _mm_cvttss_u64( __m128 a);
VCVTTSS2USI unsigned __int64 _mm_cvtt_roundss_u64( __m128 a, int sae);
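A negative input exercises the masked-invalid path described above; a minimal sketch (illustrative only; assumes AVX512F support):

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128 v = _mm_set_ss(-3.5f);
    unsigned int u = _mm_cvttss_u32(v);   /* invalid: masked #I returns 2^32 - 1 */
    printf("%u\n", u);                    /* 4294967295 */
    return 0;
}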
SIMD Floating-Point Exceptions
Invalid, Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E3NF.
VCVTUDQ2PD—Convert Packed Unsigned Doubleword Integers to Packed Double-Precision
Floating-Point Values
Description
Converts packed unsigned doubleword integers in the source operand (second operand) to packed double-precision floating-point values in the destination operand (first operand).
The source operand is a YMM/XMM/XMM (low 64 bits) register, a 256/128/64-bit memory location or a 256/128/64-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
An attempt to encode this instruction with EVEX embedded rounding is ignored.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTUDQ2PD (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_UInteger_To_Double_Precision_Floating_Point(SRC[k+31:k])
        ELSE
            IF *merging-masking*
                THEN *DEST[i+63:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+63:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Opcode/Instruction: EVEX.128.F3.0F.W0 7A /r VCVTUDQ2PD xmm1 {k1}{z}, xmm2/m64/m32bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
Description: Convert two packed unsigned doubleword integers from xmm2/m64/m32bcst to two packed double-precision floating-point values in xmm1 with writemask k1.

Opcode/Instruction: EVEX.256.F3.0F.W0 7A /r VCVTUDQ2PD ymm1 {k1}{z}, xmm2/m128/m32bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
Description: Convert four packed unsigned doubleword integers from xmm2/m128/m32bcst to four packed double-precision floating-point values in ymm1 with writemask k1.

Opcode/Instruction: EVEX.512.F3.0F.W0 7A /r VCVTUDQ2PD zmm1 {k1}{z}, ymm2/m256/m32bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Convert eight packed unsigned doubleword integers from ymm2/m256/m32bcst to eight packed double-precision floating-point values in zmm1 with writemask k1.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Half ModRM:reg (w) ModRM:r/m (r) NA NA
VCVTUDQ2PD (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN DEST[i+63:i] ← Convert_UInteger_To_Double_Precision_Floating_Point(SRC[31:0])
                ELSE DEST[i+63:i] ← Convert_UInteger_To_Double_Precision_Floating_Point(SRC[k+31:k])
            FI;
        ELSE
            IF *merging-masking*
                THEN *DEST[i+63:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+63:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTUDQ2PD __m512d _mm512_cvtepu32_pd( __m256i a);
VCVTUDQ2PD __m512d _mm512_mask_cvtepu32_pd( __m512d s, __mmask8 k, __m256i a);
VCVTUDQ2PD __m512d _mm512_maskz_cvtepu32_pd( __mmask8 k, __m256i a);
VCVTUDQ2PD __m256d _mm256_cvtepu32_pd( __m128i a);
VCVTUDQ2PD __m256d _mm256_mask_cvtepu32_pd( __m256d s, __mmask8 k, __m128i a);
VCVTUDQ2PD __m256d _mm256_maskz_cvtepu32_pd( __mmask8 k, __m128i a);
VCVTUDQ2PD __m128d _mm_cvtepu32_pd( __m128i a);
VCVTUDQ2PD __m128d _mm_mask_cvtepu32_pd( __m128d s, __mmask8 k, __m128i a);
VCVTUDQ2PD __m128d _mm_maskz_cvtepu32_pd( __mmask8 k, __m128i a);
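The unsigned interpretation is what distinguishes this instruction from CVTDQ2PD; a value with the sign bit set converts to a large positive double rather than a negative one. A minimal sketch (illustrative only; assumes AVX512F and AVX512VL support):

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    /* -1 carries the bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned. */
    __m128i src = _mm_set_epi32(0, 0, 1, -1);
    __m128d dst = _mm_cvtepu32_pd(src);     /* VCVTUDQ2PD converts the low two dwords */
    double out[2];
    _mm_storeu_pd(out, dst);
    printf("%.1f %.1f\n", out[0], out[1]);  /* 4294967295.0 1.0 */
    return 0;
}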
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E5.
#UD If EVEX.vvvv != 1111B.
VCVTUDQ2PS—Convert Packed Unsigned Doubleword Integers to Packed Single-Precision
Floating-Point Values
Description
Converts packed unsigned doubleword integers in the source operand (second operand) to single-precision
floating-point values in the destination operand (first operand).
The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector
broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally
updated with writemask k1.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTUDQ2PS (EVEX encoded version) when src operand is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← Convert_UInteger_To_Single_Precision_Floating_Point(SRC[i+31:i])
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+31:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Opcode/Instruction: EVEX.128.F2.0F.W0 7A /r VCVTUDQ2PS xmm1 {k1}{z}, xmm2/m128/m32bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
Description: Convert four packed unsigned doubleword integers from xmm2/m128/m32bcst to packed single-precision floating-point values in xmm1 with writemask k1.

Opcode/Instruction: EVEX.256.F2.0F.W0 7A /r VCVTUDQ2PS ymm1 {k1}{z}, ymm2/m256/m32bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
Description: Convert eight packed unsigned doubleword integers from ymm2/m256/m32bcst to packed single-precision floating-point values in ymm1 with writemask k1.

Opcode/Instruction: EVEX.512.F2.0F.W0 7A /r VCVTUDQ2PS zmm1 {k1}{z}, zmm2/m512/m32bcst{er}
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Convert sixteen packed unsigned doubleword integers from zmm2/m512/m32bcst to sixteen packed single-precision floating-point values in zmm1 with writemask k1.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) ModRM:r/m (r) NA NA
VCVTUDQ2PS (EVEX encoded version) when src operand is a memory source
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN DEST[i+31:i] ← Convert_UInteger_To_Single_Precision_Floating_Point(SRC[31:0])
                ELSE DEST[i+31:i] ← Convert_UInteger_To_Single_Precision_Floating_Point(SRC[i+31:i])
            FI;
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+31:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTUDQ2PS __m512 _mm512_cvtepu32_ps( __m512i a);
VCVTUDQ2PS __m512 _mm512_mask_cvtepu32_ps( __m512 s, __mmask16 k, __m512i a);
VCVTUDQ2PS __m512 _mm512_maskz_cvtepu32_ps( __mmask16 k, __m512i a);
VCVTUDQ2PS __m512 _mm512_cvt_roundepu32_ps( __m512i a, int r);
VCVTUDQ2PS __m512 _mm512_mask_cvt_roundepu32_ps( __m512 s, __mmask16 k, __m512i a, int r);
VCVTUDQ2PS __m512 _mm512_maskz_cvt_roundepu32_ps( __mmask16 k, __m512i a, int r);
VCVTUDQ2PS __m256 _mm256_cvtepu32_ps( __m256i a);
VCVTUDQ2PS __m256 _mm256_mask_cvtepu32_ps( __m256 s, __mmask8 k, __m256i a);
VCVTUDQ2PS __m256 _mm256_maskz_cvtepu32_ps( __mmask8 k, __m256i a);
VCVTUDQ2PS __m128 _mm_cvtepu32_ps( __m128i a);
VCVTUDQ2PS __m128 _mm_mask_cvtepu32_ps( __m128 s, __mmask8 k, __m128i a);
VCVTUDQ2PS __m128 _mm_maskz_cvtepu32_ps( __mmask8 k, __m128i a);
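Because a 32-bit unsigned value can exceed the 24-bit significand of single precision, results may be rounded (raising the masked Precision exception noted below). A minimal sketch, illustrative only, assuming AVX512F and AVX512VL support:

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128i src = _mm_set_epi32(16, 3, 2, -1);   /* -1 reads as 4294967295 unsigned */
    __m128 dst = _mm_cvtepu32_ps(src);           /* VCVTUDQ2PS */
    float out[4];
    _mm_storeu_ps(out, dst);
    printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
    /* 4294967296.0 2.0 3.0 16.0; the first value is rounded (inexact) */
    return 0;
}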
SIMD Floating-Point Exceptions
Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTUQQ2PD—Convert Packed Unsigned Quadword Integers to Packed Double-Precision
Floating-Point Values
Description
Converts packed unsigned quadword integers in the source operand (second operand) to packed double-precision
floating-point values in the destination operand (first operand).
The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector
broadcasted from a 64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally
updated with writemask k1.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTUQQ2PD (EVEX encoded version) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL == 512) AND (EVEX.b == 1)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← Convert_UQuadInteger_To_Double_Precision_Floating_Point(SRC[i+63:i])
        ELSE
            IF *merging-masking*
                THEN *DEST[i+63:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+63:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Opcode/Instruction: EVEX.128.F3.0F.W1 7A /r VCVTUQQ2PD xmm1 {k1}{z}, xmm2/m128/m64bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512DQ
Description: Convert two packed unsigned quadword integers from xmm2/m128/m64bcst to two packed double-precision floating-point values in xmm1 with writemask k1.

Opcode/Instruction: EVEX.256.F3.0F.W1 7A /r VCVTUQQ2PD ymm1 {k1}{z}, ymm2/m256/m64bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512DQ
Description: Convert four packed unsigned quadword integers from ymm2/m256/m64bcst to packed double-precision floating-point values in ymm1 with writemask k1.

Opcode/Instruction: EVEX.512.F3.0F.W1 7A /r VCVTUQQ2PD zmm1 {k1}{z}, zmm2/m512/m64bcst{er}
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512DQ
Description: Convert eight packed unsigned quadword integers from zmm2/m512/m64bcst to eight packed double-precision floating-point values in zmm1 with writemask k1.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) ModRM:r/m (r) NA NA
VCVTUQQ2PD (EVEX encoded version) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1)
                THEN DEST[i+63:i] ← Convert_UQuadInteger_To_Double_Precision_Floating_Point(SRC[63:0])
                ELSE DEST[i+63:i] ← Convert_UQuadInteger_To_Double_Precision_Floating_Point(SRC[i+63:i])
            FI;
        ELSE
            IF *merging-masking*
                THEN *DEST[i+63:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+63:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTUQQ2PD __m512d _mm512_cvtepu64_pd( __m512i a);
VCVTUQQ2PD __m512d _mm512_mask_cvtepu64_pd( __m512d s, __mmask8 k, __m512i a);
VCVTUQQ2PD __m512d _mm512_maskz_cvtepu64_pd( __mmask8 k, __m512i a);
VCVTUQQ2PD __m512d _mm512_cvt_roundepu64_pd( __m512i a, int r);
VCVTUQQ2PD __m512d _mm512_mask_cvt_roundepu64_pd( __m512d s, __mmask8 k, __m512i a, int r);
VCVTUQQ2PD __m512d _mm512_maskz_cvt_roundepu64_pd( __mmask8 k, __m512i a, int r);
VCVTUQQ2PD __m256d _mm256_cvtepu64_pd( __m256i a);
VCVTUQQ2PD __m256d _mm256_mask_cvtepu64_pd( __m256d s, __mmask8 k, __m256i a);
VCVTUQQ2PD __m256d _mm256_maskz_cvtepu64_pd( __mmask8 k, __m256i a);
VCVTUQQ2PD __m128d _mm_cvtepu64_pd( __m128i a);
VCVTUQQ2PD __m128d _mm_mask_cvtepu64_pd( __m128d s, __mmask8 k, __m128i a);
VCVTUQQ2PD __m128d _mm_maskz_cvtepu64_pd( __mmask8 k, __m128i a);
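Since a quadword can hold more precision than a double's 53-bit significand, large inputs round. A minimal sketch, illustrative only, assuming AVX512DQ and AVX512VL support:

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    /* -1 as a quadword reads as 2^64 - 1 unsigned; the double result is rounded. */
    __m128i src = _mm_set_epi64x(1, -1);
    __m128d dst = _mm_cvtepu64_pd(src);     /* VCVTUQQ2PD */
    double out[2];
    _mm_storeu_pd(out, dst);
    printf("%.1f %.1f\n", out[0], out[1]);  /* 18446744073709551616.0 1.0 */
    return 0;
}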
SIMD Floating-Point Exceptions
Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTUQQ2PS—Convert Packed Unsigned Quadword Integers to Packed Single-Precision
Floating-Point Values
Description
Converts packed unsigned quadword integers in the source operand (second operand) to single-precision floating-point values in the destination operand (first operand).
EVEX encoded versions: The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a YMM/XMM/XMM (low 64 bits) register conditionally updated with writemask k1.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Operation
VCVTUQQ2PS (EVEX encoded version) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← Convert_UQuadInteger_To_Single_Precision_Floating_Point(SRC[k+63:k])
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+31:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0
Opcode/Instruction: EVEX.128.F2.0F.W1 7A /r VCVTUQQ2PS xmm1 {k1}{z}, xmm2/m128/m64bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512DQ
Description: Convert two packed unsigned quadword integers from xmm2/m128/m64bcst to packed single-precision floating-point values in xmm1 with writemask k1.

Opcode/Instruction: EVEX.256.F2.0F.W1 7A /r VCVTUQQ2PS xmm1 {k1}{z}, ymm2/m256/m64bcst
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512DQ
Description: Convert four packed unsigned quadword integers from ymm2/m256/m64bcst to packed single-precision floating-point values in xmm1 with writemask k1.

Opcode/Instruction: EVEX.512.F2.0F.W1 7A /r VCVTUQQ2PS ymm1 {k1}{z}, zmm2/m512/m64bcst{er}
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512DQ
Description: Convert eight packed unsigned quadword integers from zmm2/m512/m64bcst to eight packed single-precision floating-point values in ymm1 with writemask k1.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) ModRM:r/m (r) NA NA
VCVTUQQ2PS (EVEX encoded version) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN DEST[i+31:i] ← Convert_UQuadInteger_To_Single_Precision_Floating_Point(SRC[63:0])
                ELSE DEST[i+31:i] ← Convert_UQuadInteger_To_Single_Precision_Floating_Point(SRC[k+63:k])
            FI;
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+31:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTUQQ2PS __m256 _mm512_cvtepu64_ps( __m512i a);
VCVTUQQ2PS __m256 _mm512_mask_cvtepu64_ps( __m256 s, __mmask8 k, __m512i a);
VCVTUQQ2PS __m256 _mm512_maskz_cvtepu64_ps( __mmask8 k, __m512i a);
VCVTUQQ2PS __m256 _mm512_cvt_roundepu64_ps( __m512i a, int r);
VCVTUQQ2PS __m256 _mm512_mask_cvt_roundepu64_ps( __m256 s, __mmask8 k, __m512i a, int r);
VCVTUQQ2PS __m256 _mm512_maskz_cvt_roundepu64_ps( __mmask8 k, __m512i a, int r);
VCVTUQQ2PS __m128 _mm256_cvtepu64_ps( __m256i a);
VCVTUQQ2PS __m128 _mm256_mask_cvtepu64_ps( __m128 s, __mmask8 k, __m256i a);
VCVTUQQ2PS __m128 _mm256_maskz_cvtepu64_ps( __mmask8 k, __m256i a);
VCVTUQQ2PS __m128 _mm_cvtepu64_ps( __m128i a);
VCVTUQQ2PS __m128 _mm_mask_cvtepu64_ps( __m128 s, __mmask8 k, __m128i a);
VCVTUQQ2PS __m128 _mm_maskz_cvtepu64_ps( __mmask8 k, __m128i a);
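The narrowing shape (quadword sources, single-precision results in the low half of the destination) can be seen in a minimal sketch, illustrative only, assuming AVX512DQ and AVX512VL support:

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128i src = _mm_set_epi64x(3, -1);    /* -1 reads as 2^64 - 1 unsigned */
    __m128 dst = _mm_cvtepu64_ps(src);      /* VCVTUQQ2PS: results in the low 64 bits */
    float out[4];
    _mm_storeu_ps(out, dst);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  /* 1.84467e+19 3 0 0 */
    return 0;
}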
SIMD Floating-Point Exceptions
Precision
Other Exceptions
EVEX-encoded instructions, see Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VCVTUSI2SD—Convert Unsigned Integer to Scalar Double-Precision Floating-Point Value
Description
Converts an unsigned doubleword integer (or unsigned quadword integer if operand size is 64 bits) in the second
source operand to a double-precision floating-point value in the destination operand. The result is stored in the low
quadword of the destination operand. When conversion is inexact, the value returned is rounded according to the
rounding control bits in the MXCSR register.
The second source operand can be a general-purpose register or a 32/64-bit memory location. The first source and destination operands are XMM registers. Bits (127:64) of the XMM destination register are copied from the corresponding bits of the first source operand. Bits (MAXVL-1:128) of the destination register are zeroed.
EVEX.W1 version: promotes the instruction to use a 64-bit input value in 64-bit mode.
EVEX.W0 version: an attempt to encode this instruction with EVEX embedded rounding is ignored.
Operation
VCVTUSI2SD (EVEX encoded version)
IF (SRC2 *is register*) AND (EVEX.b = 1)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
IF 64-Bit Mode And OperandSize = 64
    THEN DEST[63:0] ← Convert_UInteger_To_Double_Precision_Floating_Point(SRC2[63:0]);
    ELSE DEST[63:0] ← Convert_UInteger_To_Double_Precision_Floating_Point(SRC2[31:0]);
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Opcode/Instruction: EVEX.LIG.F2.0F.W0 7B /r VCVTUSI2SD xmm1, xmm2, r/m32
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Convert one unsigned doubleword integer from r/m32 to one double-precision floating-point value in xmm1.

Opcode/Instruction: EVEX.LIG.F2.0F.W1 7B /r VCVTUSI2SD xmm1, xmm2, r/m64{er}
Op/En: A    64/32-bit Mode Support: V/N.E.(1)    CPUID Feature Flag: AVX512F
Description: Convert one unsigned quadword integer from r/m64 to one double-precision floating-point value in xmm1.

NOTES:
1. For this specific instruction, EVEX.W in non-64-bit mode is ignored; the instruction behaves as if the W0 version is used.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) EVEX.vvvv ModRM:r/m (r) NA
Intel C/C++ Compiler Intrinsic Equivalent
VCVTUSI2SD __m128d _mm_cvtu32_sd( __m128d s, unsigned a);
VCVTUSI2SD __m128d _mm_cvtu64_sd( __m128d s, unsigned __int64 a);
VCVTUSI2SD __m128d _mm_cvt_roundu64_sd( __m128d s, unsigned __int64 a, int r);
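The pass-through of bits 127:64 from the first source is visible in the intrinsic form, which takes the pass-through register explicitly. A minimal sketch, illustrative only, assuming AVX512F support in 64-bit mode:

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128d pass = _mm_set_sd(0.0);   /* supplies bits 127:64 of the result */
    __m128d d = _mm_cvtu64_sd(pass, 18446744073709551615ull);  /* VCVTUSI2SD */
    printf("%.1f\n", _mm_cvtsd_f64(d));   /* 18446744073709551616.0 (rounded) */
    return 0;
}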
SIMD Floating-Point Exceptions
Precision
Other Exceptions
See Exceptions Type E3NF if W1, else type E10NF.
VCVTUSI2SS—Convert Unsigned Integer to Scalar Single-Precision Floating-Point Value
Description
Converts an unsigned doubleword integer (or unsigned quadword integer if operand size is 64 bits) in the source operand (second operand) to a single-precision floating-point value in the destination operand (first operand). The source operand can be a general-purpose register or a memory location. The destination operand is an XMM register. The result is stored in the low doubleword of the destination operand. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits.
The second source operand can be a general-purpose register or a 32/64-bit memory location. The first source and destination operands are XMM registers. Bits (127:32) of the XMM destination register are copied from the corresponding bits of the first source operand. Bits (MAXVL-1:128) of the destination register are zeroed.
EVEX.W1 version: promotes the instruction to use a 64-bit input value in 64-bit mode.
Operation
VCVTUSI2SS (EVEX encoded version)
IF (SRC2 *is register*) AND (EVEX.b = 1)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
IF 64-Bit Mode And OperandSize = 64
    THEN DEST[31:0] ← Convert_UInteger_To_Single_Precision_Floating_Point(SRC2[63:0]);
    ELSE DEST[31:0] ← Convert_UInteger_To_Single_Precision_Floating_Point(SRC2[31:0]);
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VCVTUSI2SS __m128 _mm_cvtu32_ss( __m128 s, unsigned a);
VCVTUSI2SS __m128 _mm_cvt_roundu32_ss( __m128 s, unsigned a, int r);
VCVTUSI2SS __m128 _mm_cvtu64_ss( __m128 s, unsigned __int64 a);
VCVTUSI2SS __m128 _mm_cvt_roundu64_ss( __m128 s, unsigned __int64 a, int r);
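The {er} forms let the conversion use an embedded rounding mode instead of MXCSR.RC. A minimal sketch, illustrative only, assuming AVX512F support:

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128 pass = _mm_set_ss(0.0f);
    /* Embedded rounding overrides MXCSR: round toward zero, suppress exceptions. */
    __m128 s = _mm_cvt_roundu32_ss(pass, 4294967295u,
                                   _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
    printf("%.1f\n", _mm_cvtss_f32(s));   /* 4294967040.0 */
    return 0;
}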
Opcode/Instruction: EVEX.LIG.F3.0F.W0 7B /r VCVTUSI2SS xmm1, xmm2, r/m32{er}
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Convert one unsigned doubleword integer from r/m32 to one single-precision floating-point value in xmm1.

Opcode/Instruction: EVEX.LIG.F3.0F.W1 7B /r VCVTUSI2SS xmm1, xmm2, r/m64{er}
Op/En: A    64/32-bit Mode Support: V/N.E.(1)    CPUID Feature Flag: AVX512F
Description: Convert one unsigned quadword integer from r/m64 to one single-precision floating-point value in xmm1.

NOTES:
1. For this specific instruction, EVEX.W in non-64-bit mode is ignored; the instruction behaves as if the W0 version is used.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) EVEX.vvvv ModRM:r/m (r) NA
SIMD Floating-Point Exceptions
Precision
Other Exceptions
See Exceptions Type E3NF.
VDBPSADBW—Double Block Packed Sum-Absolute-Differences (SAD) on Unsigned Bytes
Description
Compute packed SAD (sum of absolute differences) word results of unsigned bytes from two 32-bit dword elements. Packed SAD word results are calculated in multiples of qword superblocks, producing 4 SAD word results in each 64-bit superblock of the destination register.
Within each superblock of packed word results, the SAD results from two 32-bit dword elements are calculated as follows:
• The lower two word results are each calculated from the SAD operation between a sliding dword element within a qword superblock from an intermediate vector and a stationary dword element in the corresponding qword superblock of the first source operand. The intermediate vector, see "Tmp1" in Figure 5-8, is constructed from the second source operand, using the imm8 byte as shuffle control to select dword elements within a 128-bit lane of the second source operand. The two sliding dword elements in a qword superblock of Tmp1 are located at byte offsets 0 and 1 within the superblock, respectively. The stationary dword element in the qword superblock from the first source operand is located at byte offset 0.
• The next two word results are each calculated from the SAD operation between a sliding dword element within a qword superblock from the intermediate vector Tmp1 and a second stationary dword element in the corresponding qword superblock of the first source operand. The two sliding dword elements in a qword superblock of Tmp1 are located at byte offsets 2 and 3 within the superblock, respectively. The stationary dword element in the qword superblock from the first source operand is located at byte offset 4.
The intermediate vector is constructed in 128-bit lanes. Within each 128-bit lane, each dword element of the intermediate vector is selected by a two-bit field within the imm8 byte from the corresponding 128 bits of the second source operand. The imm8 byte serves as dword shuffle control within each 128-bit lane of the intermediate vector and the second source operand, similarly to PSHUFD.
The first source operand is a ZMM/YMM/XMM register. The second source operand is a ZMM/YMM/XMM register, or a 512/256/128-bit memory location. The destination operand is conditionally updated based on writemask k1 at 16-bit word granularity.
Opcode/Instruction: EVEX.128.66.0F3A.W0 42 /r ib VDBPSADBW xmm1 {k1}{z}, xmm2, xmm3/m128, imm8
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512BW
Description: Compute packed SAD word results of unsigned bytes in dword block from xmm2 with unsigned bytes of dword blocks transformed from xmm3/m128 using the shuffle controls in imm8. Results are written to xmm1 under the writemask k1.

Opcode/Instruction: EVEX.256.66.0F3A.W0 42 /r ib VDBPSADBW ymm1 {k1}{z}, ymm2, ymm3/m256, imm8
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512BW
Description: Compute packed SAD word results of unsigned bytes in dword block from ymm2 with unsigned bytes of dword blocks transformed from ymm3/m256 using the shuffle controls in imm8. Results are written to ymm1 under the writemask k1.

Opcode/Instruction: EVEX.512.66.0F3A.W0 42 /r ib VDBPSADBW zmm1 {k1}{z}, zmm2, zmm3/m512, imm8
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512BW
Description: Compute packed SAD word results of unsigned bytes in dword block from zmm2 with unsigned bytes of dword blocks transformed from zmm3/m512 using the shuffle controls in imm8. Results are written to zmm1 under the writemask k1.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full Mem ModRM:reg (w) EVEX.vvvv ModRM:r/m (r) Imm8
Figure 5-8. 64-bit Super Block of SAD Operation in VDBPSADBW
[Figure: for each 64-bit destination superblock, the imm8 two-bit fields (00B: DW0, 01B: DW1, 10B: DW2, 11B: DW3) select dword elements from each 128-bit lane of Src2 to form the Tmp1 qword superblock; four absolute-difference sums between sliding Tmp1 dwords and the stationary Src1 dwords 0 and 1 then produce the four word results of the destination qword superblock.]
Operation
VDBPSADBW (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
Selection of quadruplets:
FOR I ← 0 to VL step 128
    TMP1[I+31:I] ← select(SRC2[I+127:I], imm8[1:0])
    TMP1[I+63:I+32] ← select(SRC2[I+127:I], imm8[3:2])
    TMP1[I+95:I+64] ← select(SRC2[I+127:I], imm8[5:4])
    TMP1[I+127:I+96] ← select(SRC2[I+127:I], imm8[7:6])
ENDFOR
SAD of quadruplets:
FOR I ← 0 to VL step 64
    TMP_DEST[I+15:I] ← ABS(SRC1[I+7:I] - TMP1[I+7:I]) +
        ABS(SRC1[I+15:I+8] - TMP1[I+15:I+8]) +
        ABS(SRC1[I+23:I+16] - TMP1[I+23:I+16]) +
        ABS(SRC1[I+31:I+24] - TMP1[I+31:I+24])
    TMP_DEST[I+31:I+16] ← ABS(SRC1[I+7:I] - TMP1[I+15:I+8]) +
        ABS(SRC1[I+15:I+8] - TMP1[I+23:I+16]) +
        ABS(SRC1[I+23:I+16] - TMP1[I+31:I+24]) +
        ABS(SRC1[I+31:I+24] - TMP1[I+39:I+32])
    TMP_DEST[I+47:I+32] ← ABS(SRC1[I+39:I+32] - TMP1[I+23:I+16]) +
        ABS(SRC1[I+47:I+40] - TMP1[I+31:I+24]) +
        ABS(SRC1[I+55:I+48] - TMP1[I+39:I+32]) +
        ABS(SRC1[I+63:I+56] - TMP1[I+47:I+40])
    TMP_DEST[I+63:I+48] ← ABS(SRC1[I+39:I+32] - TMP1[I+31:I+24]) +
        ABS(SRC1[I+47:I+40] - TMP1[I+39:I+32]) +
        ABS(SRC1[I+55:I+48] - TMP1[I+47:I+40]) +
        ABS(SRC1[I+63:I+56] - TMP1[I+55:I+48])
ENDFOR
FOR j ← 0 TO KL-1
    i ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← TMP_DEST[i+15:i]
        ELSE
            IF *merging-masking*
                THEN *DEST[i+15:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+15:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VDBPSADBW __m512i _mm512_dbsad_epu8(__m512i a, __m512i b, int imm8);
VDBPSADBW __m512i _mm512_mask_dbsad_epu8(__m512i s, __mmask32 m, __m512i a, __m512i b, int imm8);
VDBPSADBW __m512i _mm512_maskz_dbsad_epu8(__mmask32 m, __m512i a, __m512i b, int imm8);
VDBPSADBW __m256i _mm256_dbsad_epu8(__m256i a, __m256i b, int imm8);
VDBPSADBW __m256i _mm256_mask_dbsad_epu8(__m256i s, __mmask16 m, __m256i a, __m256i b, int imm8);
VDBPSADBW __m256i _mm256_maskz_dbsad_epu8(__mmask16 m, __m256i a, __m256i b, int imm8);
VDBPSADBW __m128i _mm_dbsad_epu8(__m128i a, __m128i b, int imm8);
VDBPSADBW __m128i _mm_mask_dbsad_epu8(__m128i s, __mmask8 m, __m128i a, __m128i b, int imm8);
VDBPSADBW __m128i _mm_maskz_dbsad_epu8(__mmask8 m, __m128i a, __m128i b, int imm8);
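As a concrete illustration of the shuffle-then-SAD flow, the sketch below uses the identity shuffle 0xE4 so that Tmp1 equals the second source (illustrative only; assumes AVX512BW and AVX512VL support):

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128i a = _mm_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8,
                              9, 10, 11, 12, 13, 14, 15, 16);
    __m128i b = _mm_setr_epi8(2, 2, 2, 2, 2, 2, 2, 2,
                              2, 2, 2, 2, 2, 2, 2, 2);
    /* imm8 = 0xE4 = 11 10 01 00b selects DW0..DW3 in place (identity shuffle). */
    __m128i sad = _mm_dbsad_epu8(a, b, 0xE4);
    unsigned short w[8];
    _mm_storeu_si128((__m128i *)w, sad);
    for (int i = 0; i < 8; i++)
        printf("%u ", w[i]);     /* four word SAD sums per qword superblock */
    printf("\n");
    return 0;
}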
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4NF.nb.
VEXPANDPD—Load Sparse Packed Double-Precision Floating-Point Values from Dense Memory
Description
Expand (load) up to 8/4/2 contiguous double-precision floating-point values of the input vector in the source operand (the second operand) to sparse elements in the destination operand (the first operand), selected by the writemask k1.
The destination operand is a ZMM/YMM/XMM register; the source operand can be a ZMM/YMM/XMM register or a 512/256/128-bit memory location.
The input vector starts from the lowest element in the source operand. The writemask register k1 selects the destination elements (a partial vector or sparse elements if less than 8 elements) to be replaced by the ascending elements in the input vector. Destination elements not selected by the writemask k1 are either unmodified or zeroed, depending on EVEX.z.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Note that the compressed displacement assumes a pre-scaling (N) corresponding to the size of one single element instead of the size of the full vector.
Operation
VEXPANDPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
k ← 0
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            DEST[i+63:i] ← SRC[k+63:k];
            k ← k + 64
        ELSE
            IF *merging-masking*
                THEN *DEST[i+63:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+63:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Opcode/Instruction: EVEX.128.66.0F38.W1 88 /r VEXPANDPD xmm1 {k1}{z}, xmm2/m128
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
Description: Expand packed double-precision floating-point values from xmm2/m128 to xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W1 88 /r VEXPANDPD ymm1 {k1}{z}, ymm2/m256
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
Description: Expand packed double-precision floating-point values from ymm2/m256 to ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W1 88 /r VEXPANDPD zmm1 {k1}{z}, zmm2/m512
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Expand packed double-precision floating-point values from zmm2/m512 to zmm1 using writemask k1.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) ModRM:r/m (r) NA NA
Intel C/C++ Compiler Intrinsic Equivalent
VEXPANDPD __m512d _mm512_mask_expand_pd( __m512d s, __mmask8 k, __m512d a);
VEXPANDPD __m512d _mm512_maskz_expand_pd( __mmask8 k, __m512d a);
VEXPANDPD __m512d _mm512_mask_expandloadu_pd( __m512d s, __mmask8 k, void * a);
VEXPANDPD __m512d _mm512_maskz_expandloadu_pd( __mmask8 k, void * a);
VEXPANDPD __m256d _mm256_mask_expand_pd( __m256d s, __mmask8 k, __m256d a);
VEXPANDPD __m256d _mm256_maskz_expand_pd( __mmask8 k, __m256d a);
VEXPANDPD __m256d _mm256_mask_expandloadu_pd( __m256d s, __mmask8 k, void * a);
VEXPANDPD __m256d _mm256_maskz_expandloadu_pd( __mmask8 k, void * a);
VEXPANDPD __m128d _mm_mask_expand_pd( __m128d s, __mmask8 k, __m128d a);
VEXPANDPD __m128d _mm_maskz_expand_pd( __mmask8 k, __m128d a);
VEXPANDPD __m128d _mm_mask_expandloadu_pd( __m128d s, __mmask8 k, void * a);
VEXPANDPD __m128d _mm_maskz_expandloadu_pd( __mmask8 k, void * a);
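The expand-load form reads only as many elements from memory as the mask has set bits and scatters them into the selected destination lanes. A minimal sketch, illustrative only, assuming AVX512F and AVX512VL support:

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    /* Two dense values; mask 0b0101 places them in elements 0 and 2, zeroing the rest. */
    double dense[2] = { 1.0, 2.0 };
    __m256d v = _mm256_maskz_expandloadu_pd(0x5, dense);
    double out[4];
    _mm256_storeu_pd(out, v);
    printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);  /* 1.0 0.0 2.0 0.0 */
    return 0;
}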
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.nb.
#UD If EVEX.vvvv != 1111B.
VEXPANDPS—Load Sparse Packed Single-Precision Floating-Point Values from Dense Memory
Description
Expand (load) up to 16/8/4 contiguous single-precision floating-point values of the input vector in the source operand (the second operand) to sparse elements of the destination operand (the first operand), selected by the writemask k1.
The destination operand is a ZMM/YMM/XMM register; the source operand can be a ZMM/YMM/XMM register or a 512/256/128-bit memory location.
The input vector starts from the lowest element in the source operand. The writemask k1 selects the destination elements (a partial vector or sparse elements if less than 16 elements) to be replaced by the ascending elements in the input vector. Destination elements not selected by the writemask k1 are either unmodified or zeroed, depending on EVEX.z.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Note that the compressed displacement assumes a pre-scaling (N) corresponding to the size of one single element instead of the size of the full vector.
Operation
VEXPANDPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
k ← 0
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            DEST[i+31:i] ← SRC[k+31:k];
            k ← k + 32
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+31:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Opcode/Instruction: EVEX.128.66.0F38.W0 88 /r VEXPANDPS xmm1 {k1}{z}, xmm2/m128
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
Description: Expand packed single-precision floating-point values from xmm2/m128 to xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W0 88 /r VEXPANDPS ymm1 {k1}{z}, ymm2/m256
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
Description: Expand packed single-precision floating-point values from ymm2/m256 to ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W0 88 /r VEXPANDPS zmm1 {k1}{z}, zmm2/m512
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Expand packed single-precision floating-point values from zmm2/m512 to zmm1 using writemask k1.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) ModRM:r/m (r) NA NA
Intel C/C++ Compiler Intrinsic Equivalent
VEXPANDPS __m512 _mm512_mask_expand_ps( __m512 s, __mmask16 k, __m512 a);
VEXPANDPS __m512 _mm512_maskz_expand_ps( __mmask16 k, __m512 a);
VEXPANDPS __m512 _mm512_mask_expandloadu_ps( __m512 s, __mmask16 k, void * a);
VEXPANDPS __m512 _mm512_maskz_expandloadu_ps( __mmask16 k, void * a);
VEXPANDPS __m256 _mm256_mask_expand_ps( __m256 s, __mmask8 k, __m256 a);
VEXPANDPS __m256 _mm256_maskz_expand_ps( __mmask8 k, __m256 a);
VEXPANDPS __m256 _mm256_mask_expandloadu_ps( __m256 s, __mmask8 k, void * a);
VEXPANDPS __m256 _mm256_maskz_expandloadu_ps( __mmask8 k, void * a);
VEXPANDPS __m128 _mm_mask_expand_ps( __m128 s, __mmask8 k, __m128 a);
VEXPANDPS __m128 _mm_maskz_expand_ps( __mmask8 k, __m128 a);
VEXPANDPS __m128 _mm_mask_expandloadu_ps( __m128 s, __mmask8 k, void * a);
VEXPANDPS __m128 _mm_maskz_expandloadu_ps( __mmask8 k, void * a);
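The register form performs the same selection without a memory access; a brief sketch under the same AVX512F/AVX512VL assumption (illustrative only):

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128 src = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    /* Mask 0b0110: elements 1 and 2 receive the two lowest source elements. */
    __m128 dst = _mm_maskz_expand_ps(0x6, src);
    float out[4];
    _mm_storeu_ps(out, dst);
    printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);  /* 0.0 1.0 2.0 0.0 */
    return 0;
}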
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.nb.
#UD If EVEX.vvvv != 1111B.
VERR/VERW—Verify a Segment for Reading or Writing
Description
Verifies whether the code or data segment specified with the source operand is readable (VERR) or writable (VERW)
from the current privilege level (CPL). The source operand is a 16-bit register or a memory location that contains
the segment selector for the segment to be verified. If the segment is accessible and readable (VERR) or writable
(VERW), the ZF flag is set; otherwise, the ZF flag is cleared. Code segments are never verified as writable. This
check cannot be performed on system segments.
To set the ZF flag, the following conditions must be met:
• The segment selector is not NULL.
• The selector must denote a descriptor within the bounds of the descriptor table (GDT or LDT).
• The selector must denote the descriptor of a code or data segment (not that of a system segment or gate).
• For the VERR instruction, the segment must be readable.
• For the VERW instruction, the segment must be a writable data segment.
• If the segment is not a conforming code segment, the segment's DPL must be greater than or equal to (have less or the same privilege as) both the CPL and the segment selector's RPL.
The validation performed is the same as is performed when a segment selector is loaded into the DS, ES, FS, or GS
register, and the indicated access (read or write) is performed. The segment selector's value cannot result in a
protection exception, enabling the software to anticipate possible segment access problems.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode. The operand size is fixed at 16 bits.
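For software that wants this check from C, the ZF result can be captured with compiler-specific inline assembly. A minimal sketch, illustrative only, assuming GCC/Clang flag-output constraints and execution in protected mode with a selector to test:

#include <stdio.h>

/* Returns nonzero if the selector names a segment readable at the current CPL. */
static int segment_readable(unsigned short sel)
{
    unsigned char zf;
    __asm__ __volatile__ ("verr %1" : "=@ccz" (zf) : "rm" (sel));
    return zf;
}

int main(void)
{
    unsigned short cs;
    __asm__ ("mov %%cs, %0" : "=r" (cs));
    printf("CS readable: %d\n", segment_readable(cs));
    return 0;
}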
Operation
IF SRC(Offset) > (GDTR(Limit) or LDTR(Limit))
    THEN ZF ← 0; FI;
Read segment descriptor;
IF SegmentDescriptor(DescriptorType) = 0 (* system segment *)
    or (SegmentDescriptor(Type) ≠ conforming code segment)
    and ((CPL > DPL) or (RPL > DPL))
    THEN ZF ← 0;
    ELSE
        IF ((Instruction = VERR) and (Segment readable))
        or ((Instruction = VERW) and (Segment writable))
            THEN ZF ← 1;
        FI;
FI;
Opcode: 0F 00 /4    Instruction: VERR r/m16    Op/En: M    64-Bit Mode: Valid    Compat/Leg Mode: Valid
Description: Set ZF=1 if segment specified with r/m16 can be read.

Opcode: 0F 00 /5    Instruction: VERW r/m16    Op/En: M    64-Bit Mode: Valid    Compat/Leg Mode: Valid
Description: Set ZF=1 if segment specified with r/m16 can be written.

Instruction Operand Encoding
Op/En Operand 1 Operand 2 Operand 3 Operand 4
M     ModRM:r/m (r)    NA    NA    NA
Flags Affected
The ZF flag is set to 1 if the segment is accessible and readable (VERR) or writable (VERW); otherwise, it is set to 0.
Protected Mode Exceptions
The only exceptions generated for these instructions are those related to illegal addressing of the source operand.
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment
selector.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
#UD If the LOCK prefix is used.
Real-Address Mode Exceptions
#UD The VERR and VERW instructions are not recognized in real-address mode.
If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
#UD The VERR and VERW instructions are not recognized in virtual-8086 mode.
If the LOCK prefix is used.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#GP(0) If the memory address is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
#UD If the LOCK prefix is used.
VEXTRACTF128/VEXTRACTF32x4/VEXTRACTF64x2/VEXTRACTF32x8/VEXTRACTF64x4—Extract Packed Floating-Point Values
Description
VEXTRACTF128/VEXTRACTF32x4 and VEXTRACTF64x2 extract 128 bits of packed floating-point values from the source operand (the second operand) and store them to the low 128 bits of the destination operand (the first operand). The 128-bit data extraction occurs at a 128-bit granular offset specified by imm8[0] (256-bit source) or imm8[1:0] (512-bit source) as the multiply factor. The destination may be either a vector register or a 128-bit memory location.
VEXTRACTF32x4: The low 128 bits of the destination operand are updated at 32-bit granularity according to the writemask.
VEXTRACTF32x8 and VEXTRACTF64x4 extract 256 bits of packed floating-point values from the source operand (the second operand) and store them to the low 256 bits of the destination operand (the first operand). The 256-bit data extraction occurs at a 256-bit granular offset specified by imm8[0] as the multiply factor. The destination may be either a vector register or a 256-bit memory location.
VEXTRACTF64x4: The low 256 bits of the destination operand are updated at 64-bit granularity according to the writemask.
VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD.
The high 6 bits of the immediate are ignored.
An attempt to execute VEXTRACTF128 encoded with VEX.L = 0 will cause an #UD exception.
Opcode/Instruction: VEX.256.66.0F3A.W0 19 /r ib VEXTRACTF128 xmm1/m128, ymm2, imm8
Op/En: A    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX
Description: Extract 128 bits of packed floating-point values from ymm2 and store results in xmm1/m128.

Opcode/Instruction: EVEX.256.66.0F3A.W0 19 /r ib VEXTRACTF32X4 xmm1/m128 {k1}{z}, ymm2, imm8
Op/En: C    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
Description: Extract 128 bits of packed single-precision floating-point values from ymm2 and store results in xmm1/m128 subject to writemask k1.

Opcode/Instruction: EVEX.512.66.0F3A.W0 19 /r ib VEXTRACTF32x4 xmm1/m128 {k1}{z}, zmm2, imm8
Op/En: C    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Extract 128 bits of packed single-precision floating-point values from zmm2 and store results in xmm1/m128 subject to writemask k1.

Opcode/Instruction: EVEX.256.66.0F3A.W1 19 /r ib VEXTRACTF64X2 xmm1/m128 {k1}{z}, ymm2, imm8
Op/En: B    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512DQ
Description: Extract 128 bits of packed double-precision floating-point values from ymm2 and store results in xmm1/m128 subject to writemask k1.

Opcode/Instruction: EVEX.512.66.0F3A.W1 19 /r ib VEXTRACTF64X2 xmm1/m128 {k1}{z}, zmm2, imm8
Op/En: B    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512DQ
Description: Extract 128 bits of packed double-precision floating-point values from zmm2 and store results in xmm1/m128 subject to writemask k1.

Opcode/Instruction: EVEX.512.66.0F3A.W0 1B /r ib VEXTRACTF32X8 ymm1/m256 {k1}{z}, zmm2, imm8
Op/En: D    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512DQ
Description: Extract 256 bits of packed single-precision floating-point values from zmm2 and store results in ymm1/m256 subject to writemask k1.

Opcode/Instruction: EVEX.512.66.0F3A.W1 1B /r ib VEXTRACTF64x4 ymm1/m256 {k1}{z}, zmm2, imm8
Op/En: C    64/32-bit Mode Support: V/V    CPUID Feature Flag: AVX512F
Description: Extract 256 bits of packed double-precision floating-point values from zmm2 and store results in ymm1/m256 subject to writemask k1.

Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A NA ModRM:r/m (w) ModRM:reg (r) Imm8 NA
B Tuple2 ModRM:r/m (w) ModRM:reg (r) Imm8 NA
C Tuple4 ModRM:r/m (w) ModRM:reg (r) Imm8 NA
D Tuple8 ModRM:r/m (w) ModRM:reg (r) Imm8 NA
Operation
VEXTRACTF32x4 (EVEX encoded versions) when destination is a register
VL = 256, 512
IF VL = 256
    CASE (imm8[0]) OF
        0: TMP_DEST[127:0] ← SRC1[127:0]
        1: TMP_DEST[127:0] ← SRC1[255:128]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
        00: TMP_DEST[127:0] ← SRC1[127:0]
        01: TMP_DEST[127:0] ← SRC1[255:128]
        10: TMP_DEST[127:0] ← SRC1[383:256]
        11: TMP_DEST[127:0] ← SRC1[511:384]
    ESAC.
FI;
FOR j ← 0 TO 3
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+31:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:128] ← 0
VEXTRACTF32x4 (EVEX encoded versions) when destination is memory
VL = 256, 512
IF VL = 256
    CASE (imm8[0]) OF
        0: TMP_DEST[127:0] ← SRC1[127:0]
        1: TMP_DEST[127:0] ← SRC1[255:128]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
        00: TMP_DEST[127:0] ← SRC1[127:0]
        01: TMP_DEST[127:0] ← SRC1[255:128]
        10: TMP_DEST[127:0] ← SRC1[383:256]
        11: TMP_DEST[127:0] ← SRC1[511:384]
    ESAC.
FI;
FOR j ← 0 TO 3
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE *DEST[i+31:i] remains unchanged*   ; merging-masking
    FI;
ENDFOR
VEXTRACTF64x2 (EVEX encoded versions) when destination is a register
VL = 256, 512
IF VL = 256
    CASE (imm8[0]) OF
        0: TMP_DEST[127:0] ← SRC1[127:0]
        1: TMP_DEST[127:0] ← SRC1[255:128]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
        00: TMP_DEST[127:0] ← SRC1[127:0]
        01: TMP_DEST[127:0] ← SRC1[255:128]
        10: TMP_DEST[127:0] ← SRC1[383:256]
        11: TMP_DEST[127:0] ← SRC1[511:384]
    ESAC.
FI;
FOR j ← 0 TO 1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking*
                THEN *DEST[i+63:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+63:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:128] ← 0
VEXTRACTF64x2 (EVEX encoded versions) when destination is memory
VL = 256, 512
IF VL = 256
    CASE (imm8[0]) OF
        0: TMP_DEST[127:0] ← SRC1[127:0]
        1: TMP_DEST[127:0] ← SRC1[255:128]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
        00: TMP_DEST[127:0] ← SRC1[127:0]
        01: TMP_DEST[127:0] ← SRC1[255:128]
        10: TMP_DEST[127:0] ← SRC1[383:256]
        11: TMP_DEST[127:0] ← SRC1[511:384]
    ESAC.
FI;
FOR j ← 0 TO 1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE *DEST[i+63:i] remains unchanged*   ; merging-masking
    FI;
ENDFOR
VEXTRACTF32x8 (EVEX.U1.512 encoded version) when destination is a register
VL = 512
CASE (imm8[0]) OF
    0: TMP_DEST[255:0] ← SRC1[255:0]
    1: TMP_DEST[255:0] ← SRC1[511:256]
ESAC.
FOR j ← 0 TO 7
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+31:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:256] ← 0
VEXTRACTF32x8 (EVEX.U1.512 encoded version) when destination is memory
CASE (imm8[0]) OF
    0: TMP_DEST[255:0] ← SRC1[255:0]
    1: TMP_DEST[255:0] ← SRC1[511:256]
ESAC.
FOR j ← 0 TO 7
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE *DEST[i+31:i] remains unchanged*   ; merging-masking
    FI;
ENDFOR
VEXTRACTF64x4 (EVEX.512 encoded version) when destination is a register
VL = 512
CASE (imm8[0]) OF
    0: TMP_DEST[255:0] ← SRC1[255:0]
    1: TMP_DEST[255:0] ← SRC1[511:256]
ESAC.
FOR j ← 0 TO 3
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking*
                THEN *DEST[i+63:i] remains unchanged*   ; merging-masking
                ELSE DEST[i+63:i] ← 0                   ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:256] ← 0
VEXTRACTF64x4 (EVEX.512 encoded version) when destination is memory
CASE (imm8[0]) OF
    0: TMP_DEST[255:0] ← SRC1[255:0]
    1: TMP_DEST[255:0] ← SRC1[511:256]
ESAC.
FOR j ← 0 TO 3
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE *DEST[i+63:i] remains unchanged*   ; merging-masking
    FI;
ENDFOR
VEXTRACTF128 (memory destination form)
CASE (imm8[0]) OF
    0: DEST[127:0] ← SRC1[127:0]
    1: DEST[127:0] ← SRC1[255:128]
ESAC.

VEXTRACTF128 (register destination form)
CASE (imm8[0]) OF
    0: DEST[127:0] ← SRC1[127:0]
    1: DEST[127:0] ← SRC1[255:128]
ESAC.
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VEXTRACTF32x4 __m128 _mm512_extractf32x4_ps(__m512 a, const int nidx);
VEXTRACTF32x4 __m128 _mm512_mask_extractf32x4_ps(__m128 s, __mmask8 k, __m512 a, const int nidx);
VEXTRACTF32x4 __m128 _mm512_maskz_extractf32x4_ps( __mmask8 k, __m512 a, const int nidx);
VEXTRACTF32x4 __m128 _mm256_extractf32x4_ps(__m256 a, const int nidx);
VEXTRACTF32x4 __m128 _mm256_mask_extractf32x4_ps(__m128 s, __mmask8 k, __m256 a, const int nidx);
VEXTRACTF32x4 __m128 _mm256_maskz_extractf32x4_ps( __mmask8 k, __m256 a, const int nidx);
VEXTRACTF32x8 __m256 _mm512_extractf32x8_ps(__m512 a, const int nidx);
VEXTRACTF32x8 __m256 _mm512_mask_extractf32x8_ps(__m256 s, __mmask8 k, __m512 a, const int nidx);
VEXTRACTF32x8 __m256 _mm512_maskz_extractf32x8_ps( __mmask8 k, __m512 a, const int nidx);
VEXTRACTF64x2 __m128d _mm512_extractf64x2_pd(__m512d a, const int nidx);
VEXTRACTF64x2 __m128d _mm512_mask_extractf64x2_pd(__m128d s, __mmask8 k, __m512d a, const int nidx);
VEXTRACTF64x2 __m128d _mm512_maskz_extractf64x2_pd( __mmask8 k, __m512d a, const int nidx);
VEXTRACTF64x2 __m128d _mm256_extractf64x2_pd(__m256d a, const int nidx);
VEXTRACTF64x2 __m128d _mm256_mask_extractf64x2_pd(__m128d s, __mmask8 k, __m256d a, const int nidx);
VEXTRACTF64x2 __m128d _mm256_maskz_extractf64x2_pd( __mmask8 k, __m256d a, const int nidx);
VEXTRACTF64x4 __m256d _mm512_extractf64x4_pd( __m512d a, const int nidx);
VEXTRACTF64x4 __m256d _mm512_mask_extractf64x4_pd(__m256d s, __mmask8 k, __m512d a, const int nidx);
VEXTRACTF64x4 __m256d _mm512_maskz_extractf64x4_pd( __mmask8 k, __m512d a, const int nidx);
VEXTRACTF128 __m128 _mm256_extractf128_ps (__m256 a, int offset);
VEXTRACTF128 __m128d _mm256_extractf128_pd (__m256d a, int offset);
VEXTRACTF128 __m128i _mm256_extractf128_si256(__m256i a, int offset);
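For example, lane extraction from a 512-bit register through the intrinsic form (a minimal sketch, illustrative only, assuming AVX512F support):

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m512 v = _mm512_set_ps(15, 14, 13, 12, 11, 10, 9, 8,
                             7, 6, 5, 4, 3, 2, 1, 0);
    __m128 hi = _mm512_extractf32x4_ps(v, 3);   /* VEXTRACTF32x4: bits 511:384 */
    float out[4];
    _mm_storeu_ps(out, hi);
    printf("%.0f %.0f %.0f %.0f\n", out[0], out[1], out[2], out[3]);  /* 12 13 14 15 */
    return 0;
}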
SIMD Floating-Point Exceptions
None
Other Exceptions
VEX-encoded instructions, see Exceptions Type 6;
EVEX-encoded instructions, see Exceptions Type E6NF.
#UD If VEX.L = 0.
#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
VEXTRACTI128/VEXTRACTI32x4/VEXTRACTI64x2/VEXTRACTI32x8/VEXTRACTI64x4—Extract Packed Integer Values
VEX.256.66.0F3A.W0 39 /r ib  VEXTRACTI128 xmm1/m128, ymm2, imm8
    Op/En: A; 64/32-bit Mode: V/V; CPUID: AVX2
    Extract 128 bits of integer data from ymm2 and store results in xmm1/m128.

EVEX.256.66.0F3A.W0 39 /r ib  VEXTRACTI32x4 xmm1/m128 {k1}{z}, ymm2, imm8
    Op/En: C; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Extract 128 bits of double-word integer values from ymm2 and store results in xmm1/m128 subject to writemask k1.

EVEX.512.66.0F3A.W0 39 /r ib  VEXTRACTI32x4 xmm1/m128 {k1}{z}, zmm2, imm8
    Op/En: C; 64/32-bit Mode: V/V; CPUID: AVX512F
    Extract 128 bits of double-word integer values from zmm2 and store results in xmm1/m128 subject to writemask k1.

EVEX.256.66.0F3A.W1 39 /r ib  VEXTRACTI64x2 xmm1/m128 {k1}{z}, ymm2, imm8
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512DQ
    Extract 128 bits of quad-word integer values from ymm2 and store results in xmm1/m128 subject to writemask k1.

EVEX.512.66.0F3A.W1 39 /r ib  VEXTRACTI64x2 xmm1/m128 {k1}{z}, zmm2, imm8
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512DQ
    Extract 128 bits of quad-word integer values from zmm2 and store results in xmm1/m128 subject to writemask k1.

EVEX.512.66.0F3A.W0 3B /r ib  VEXTRACTI32x8 ymm1/m256 {k1}{z}, zmm2, imm8
    Op/En: D; 64/32-bit Mode: V/V; CPUID: AVX512DQ
    Extract 256 bits of double-word integer values from zmm2 and store results in ymm1/m256 subject to writemask k1.

EVEX.512.66.0F3A.W1 3B /r ib  VEXTRACTI64x4 ymm1/m256 {k1}{z}, zmm2, imm8
    Op/En: C; 64/32-bit Mode: V/V; CPUID: AVX512F
    Extract 256 bits of quad-word integer values from zmm2 and store results in ymm1/m256 subject to writemask k1.

Instruction Operand Encoding

Op/En  Tuple Type  Operand 1      Operand 2      Operand 3  Operand 4
A      NA          ModRM:r/m (w)  ModRM:reg (r)  Imm8       NA
B      Tuple2      ModRM:r/m (w)  ModRM:reg (r)  Imm8       NA
C      Tuple4      ModRM:r/m (w)  ModRM:reg (r)  Imm8       NA
D      Tuple8      ModRM:r/m (w)  ModRM:reg (r)  Imm8       NA

Description

VEXTRACTI128/VEXTRACTI32x4 and VEXTRACTI64x2 extract 128 bits of double-word/quad-word integer values from the
source operand (the second operand) and store them to the low 128 bits of the destination operand (the first
operand). The 128-bit data extraction occurs at a 128-bit granular offset specified by imm8[0] (for a 256-bit
source) or imm8[1:0] (for a 512-bit source) as the multiply factor. The destination may be either a vector
register or a 128-bit memory location.
VEXTRACTI32x4: The low 128 bits of the destination operand are updated at 32-bit granularity according to the
writemask.
VEXTRACTI64x2: The low 128 bits of the destination operand are updated at 64-bit granularity according to the
writemask.
VEXTRACTI32x8 and VEXTRACTI64x4 extract 256 bits of double-word/quad-word integer values from the source
operand (the second operand) and store them to the low 256 bits of the destination operand (the first operand).
The 256-bit data extraction occurs at a 256-bit granular offset specified by imm8[0] as the multiply factor. The
destination may be either a vector register or a 256-bit memory location.
VEXTRACTI32x8: The low 256 bits of the destination operand are updated at 32-bit granularity according to the
writemask.
VEXTRACTI64x4: The low 256 bits of the destination operand are updated at 64-bit granularity according to the
writemask.
VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise the instructions will #UD.
The high 7 bits (6 bits in EVEX.512) of the immediate are ignored.
An attempt to execute VEXTRACTI128 encoded with VEX.L = 0 will cause an #UD exception.
Operation
VEXTRACTI32x4 (EVEX encoded versions) when destination is a register
VL = 256, 512
IF VL = 256
    CASE (imm8[0]) OF
        0: TMP_DEST[127:0] ← SRC1[127:0]
        1: TMP_DEST[127:0] ← SRC1[255:128]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
        00: TMP_DEST[127:0] ← SRC1[127:0]
        01: TMP_DEST[127:0] ← SRC1[255:128]
        10: TMP_DEST[127:0] ← SRC1[383:256]
        11: TMP_DEST[127:0] ← SRC1[511:384]
    ESAC.
FI;
FOR j ← 0 TO 3
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:128] ← 0

VEXTRACTI32x4 (EVEX encoded versions) when destination is memory
VL = 256, 512
IF VL = 256
    CASE (imm8[0]) OF
        0: TMP_DEST[127:0] ← SRC1[127:0]
        1: TMP_DEST[127:0] ← SRC1[255:128]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
        00: TMP_DEST[127:0] ← SRC1[127:0]
        01: TMP_DEST[127:0] ← SRC1[255:128]
        10: TMP_DEST[127:0] ← SRC1[383:256]
        11: TMP_DEST[127:0] ← SRC1[511:384]
    ESAC.
FI;
FOR j ← 0 TO 3
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE *DEST[i+31:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
VEXTRACTI64x2 (EVEX encoded versions) when destination is a register
VL = 256, 512
IF VL = 256
    CASE (imm8[0]) OF
        0: TMP_DEST[127:0] ← SRC1[127:0]
        1: TMP_DEST[127:0] ← SRC1[255:128]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
        00: TMP_DEST[127:0] ← SRC1[127:0]
        01: TMP_DEST[127:0] ← SRC1[255:128]
        10: TMP_DEST[127:0] ← SRC1[383:256]
        11: TMP_DEST[127:0] ← SRC1[511:384]
    ESAC.
FI;
FOR j ← 0 TO 1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:128] ← 0
VEXTRACTI64x2 (EVEX encoded versions) when destination is memory
VL = 256, 512
IF VL = 256
    CASE (imm8[0]) OF
        0: TMP_DEST[127:0] ← SRC1[127:0]
        1: TMP_DEST[127:0] ← SRC1[255:128]
    ESAC.
FI;
IF VL = 512
    CASE (imm8[1:0]) OF
        00: TMP_DEST[127:0] ← SRC1[127:0]
        01: TMP_DEST[127:0] ← SRC1[255:128]
        10: TMP_DEST[127:0] ← SRC1[383:256]
        11: TMP_DEST[127:0] ← SRC1[511:384]
    ESAC.
FI;
FOR j ← 0 TO 1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE *DEST[i+63:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VEXTRACTI32x8 (EVEX.U1.512 encoded version) when destination is a register
VL = 512
CASE (imm8[0]) OF
    0: TMP_DEST[255:0] ← SRC1[255:0]
    1: TMP_DEST[255:0] ← SRC1[511:256]
ESAC.
FOR j ← 0 TO 7
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:256] ← 0
VEXTRACTI32x8 (EVEX.U1.512 encoded version) when destination is memory
CASE (imm8[0]) OF
    0: TMP_DEST[255:0] ← SRC1[255:0]
    1: TMP_DEST[255:0] ← SRC1[511:256]
ESAC.
FOR j ← 0 TO 7
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE *DEST[i+31:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VEXTRACTI64x4 (EVEX.512 encoded version) when destination is a register
VL = 512
CASE (imm8[0]) OF
    0: TMP_DEST[255:0] ← SRC1[255:0]
    1: TMP_DEST[255:0] ← SRC1[511:256]
ESAC.
FOR j ← 0 TO 3
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:256] ← 0

VEXTRACTI64x4 (EVEX.512 encoded version) when destination is memory
CASE (imm8[0]) OF
    0: TMP_DEST[255:0] ← SRC1[255:0]
    1: TMP_DEST[255:0] ← SRC1[511:256]
ESAC.
FOR j ← 0 TO 3
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE *DEST[i+63:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
VEXTRACTI128 (memory destination form)
CASE (imm8[0]) OF
    0: DEST[127:0] ← SRC1[127:0]
    1: DEST[127:0] ← SRC1[255:128]
ESAC.

VEXTRACTI128 (register destination form)
CASE (imm8[0]) OF
    0: DEST[127:0] ← SRC1[127:0]
    1: DEST[127:0] ← SRC1[255:128]
ESAC.
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VEXTRACTI32x4 __m128i _mm512_extracti32x4_epi32(__m512i a, const int nidx);
VEXTRACTI32x4 __m128i _mm512_mask_extracti32x4_epi32(__m128i s, __mmask8 k, __m512i a, const int nidx);
VEXTRACTI32x4 __m128i _mm512_maskz_extracti32x4_epi32( __mmask8 k, __m512i a, const int nidx);
VEXTRACTI32x4 __m128i _mm256_extracti32x4_epi32(__m256i a, const int nidx);
VEXTRACTI32x4 __m128i _mm256_mask_extracti32x4_epi32(__m128i s, __mmask8 k, __m256i a, const int nidx);
VEXTRACTI32x4 __m128i _mm256_maskz_extracti32x4_epi32( __mmask8 k, __m256i a, const int nidx);
VEXTRACTI32x8 __m256i _mm512_extracti32x8_epi32(__m512i a, const int nidx);
VEXTRACTI32x8 __m256i _mm512_mask_extracti32x8_epi32(__m256i s, __mmask8 k, __m512i a, const int nidx);
VEXTRACTI32x8 __m256i _mm512_maskz_extracti32x8_epi32( __mmask8 k, __m512i a, const int nidx);
VEXTRACTI64x2 __m128i _mm512_extracti64x2_epi64(__m512i a, const int nidx);
VEXTRACTI64x2 __m128i _mm512_mask_extracti64x2_epi64(__m128i s, __mmask8 k, __m512i a, const int nidx);
VEXTRACTI64x2 __m128i _mm512_maskz_extracti64x2_epi64( __mmask8 k, __m512i a, const int nidx);
VEXTRACTI64x2 __m128i _mm256_extracti64x2_epi64(__m256i a, const int nidx);
VEXTRACTI64x2 __m128i _mm256_mask_extracti64x2_epi64(__m128i s, __mmask8 k, __m256i a, const int nidx);
VEXTRACTI64x2 __m128i _mm256_maskz_extracti64x2_epi64( __mmask8 k, __m256i a, const int nidx);
VEXTRACTI64x4 __m256i _mm512_extracti64x4_epi64(__m512i a, const int nidx);
VEXTRACTI64x4 __m256i _mm512_mask_extracti64x4_epi64(__m256i s, __mmask8 k, __m512i a, const int nidx);
VEXTRACTI64x4 __m256i _mm512_maskz_extracti64x4_epi64( __mmask8 k, __m512i a, const int nidx);
VEXTRACTI128 __m128i _mm256_extracti128_si256(__m256i a, int offset);
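For illustration of the imm8 lane selection, a minimal sketch using the 512-bit intrinsic forms listed above (the helper names and mask value are placeholders, not part of any API):

    #include <immintrin.h>

    /* imm8[1:0] = 10b selects bits 383:256 of the 512-bit source. */
    __m128i third_i32x4_lane(__m512i v)
    {
        return _mm512_extracti32x4_epi32(v, 2);
    }

    /* Zeroing-masked form: lane elements whose k bit is 0 are written as 0.
       Mask 0x5 keeps elements 0 and 2 of the extracted 4-dword lane. */
    __m128i third_i32x4_lane_evens(__m512i v)
    {
        return _mm512_maskz_extracti32x4_epi32((__mmask8)0x5, v, 2);
    }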
SIMD Floating-Point Exceptions
None
Other Exceptions
VEX-encoded instructions, see Exceptions Type 6;
EVEX-encoded instructions, see Exceptions Type E6NF.
#UD If VEX.L = 0.
#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
VFIXUPIMMPD—Fix Up Special Packed Float64 Values
EVEX.128.66.0F3A.W1 54 /r ib  VFIXUPIMMPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst, imm8
    Op/En: A; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Fix up special numbers in float64 vector xmm1, float64 vector xmm2 and int64 vector xmm3/m128/m64bcst and store the result in xmm1, under writemask.

EVEX.256.66.0F3A.W1 54 /r ib  VFIXUPIMMPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst, imm8
    Op/En: A; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Fix up special numbers in float64 vector ymm1, float64 vector ymm2 and int64 vector ymm3/m256/m64bcst and store the result in ymm1, under writemask.

EVEX.512.66.0F3A.W1 54 /r ib  VFIXUPIMMPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{sae}, imm8
    Op/En: A; 64/32-bit Mode: V/V; CPUID: AVX512F
    Fix up elements of float64 vector in zmm2 using int64 vector table in zmm3/m512/m64bcst, combine with preserved elements from zmm1, and store the result in zmm1.

Instruction Operand Encoding

Op/En  Tuple Type  Operand 1         Operand 2  Operand 3      Operand 4
A      Full        ModRM:reg (r, w)  EVEX.vvvv  ModRM:r/m (r)  Imm8

Description

Performs a fix-up of quad-word elements encoded in double-precision floating-point format in the first source
operand (the second operand) using a 32-bit, two-level look-up table specified in the corresponding quadword
element of the second source operand (the third operand) with exception reporting specifier imm8. The elements
that are fixed-up are selected by mask bits of 1 specified in the opmask k1. Mask bits of 0 in the opmask k1 or a
table response action of 0000b preserve the corresponding element of the first operand. The fixed-up elements
from the first source operand and the preserved elements in the first operand are combined as the final results in
the destination operand (the first operand).
The destination and the first source operands are ZMM/YMM/XMM registers. The second source operand can be a
ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a
64-bit memory location.
The two-level look-up table performs a fix-up of each double-precision floating-point input in the first source
operand by decoding the input data encoding into 8 token types. A response table is defined for each token type
that converts the input encoding in the first source operand to one of 16 response actions.
This instruction is specifically intended for use in fixing up the results of arithmetic calculations involving one
source so that they match the spec, although it is generally useful for fixing up the results of multiple-instruction
sequences to reflect special-number inputs. For example, consider rcp(0). Input 0 to rcp, and you should get INF
according to the DX10 spec. However, evaluating rcp via Newton-Raphson, where x=approx(1/0), yields an
incorrect result. To deal with this, VFIXUPIMMPD can be used after the N-R reciprocal sequence to set the result
to the correct value (i.e. INF when the input is 0).
If MXCSR.DAZ is not set, denormal input elements in the first source operand are considered as normal inputs
and do not trigger any fixup nor fault reporting.
Imm8 is used to set the required flags reporting. It supports #ZE and #IE fault reporting (see details below).
MXCSR mask bits are ignored (treated as if all mask bits are set to a masked response). If any of the imm8 bits
is set and the condition is met for fault reporting, MXCSR.IE or MXCSR.ZE might be updated.
This instruction is writemasked, so only those elements with the corresponding bit set in vector mask register k1
are computed and stored into zmm1. Elements in the destination with the corresponding bit clear in k1 retain
their previous values or are set to 0.
Operation
enum TOKEN_TYPE
{
    QNAN_TOKEN ← 0,
    SNAN_TOKEN ← 1,
    ZERO_VALUE_TOKEN ← 2,
    POS_ONE_VALUE_TOKEN ← 3,
    NEG_INF_TOKEN ← 4,
    POS_INF_TOKEN ← 5,
    NEG_VALUE_TOKEN ← 6,
    POS_VALUE_TOKEN ← 7
}

FIXUPIMM_DP (dest[63:0], src1[63:0], tbl3[63:0], imm8[7:0]){
    tsrc[63:0] ← ((src1[62:52] = 0) AND (MXCSR.DAZ = 1)) ? 0.0 : src1[63:0]
    CASE(tsrc[63:0] of TOKEN_TYPE) {
        QNAN_TOKEN: j ← 0;
        SNAN_TOKEN: j ← 1;
        ZERO_VALUE_TOKEN: j ← 2;
        POS_ONE_VALUE_TOKEN: j ← 3;
        NEG_INF_TOKEN: j ← 4;
        POS_INF_TOKEN: j ← 5;
        NEG_VALUE_TOKEN: j ← 6;
        POS_VALUE_TOKEN: j ← 7;
    } ; end source special CASE(tsrc…)
    ; The required response from the src3 table is extracted
    token_response[3:0] ← tbl3[3+4*j:4*j];
    CASE(token_response[3:0]) {
        0000: dest[63:0] ← dest[63:0]; ; preserve content of DEST
        0001: dest[63:0] ← tsrc[63:0]; ; pass through src1 normal input value, denormal as zero
        0010: dest[63:0] ← QNaN(tsrc[63:0]);
        0011: dest[63:0] ← QNAN_Indefinite;
        0100: dest[63:0] ← -INF;
        0101: dest[63:0] ← +INF;
        0110: dest[63:0] ← tsrc.sign? -INF : +INF;
        0111: dest[63:0] ← -0;
        1000: dest[63:0] ← +0;
        1001: dest[63:0] ← -1;
        1010: dest[63:0] ← +1;
        1011: dest[63:0] ← ½;
        1100: dest[63:0] ← 90.0;
        1101: dest[63:0] ← PI/2;
        1110: dest[63:0] ← MAX_FLOAT;
        1111: dest[63:0] ← -MAX_FLOAT;
    } ; end of token_response CASE
    ; The required fault reporting from imm8 is extracted
    ; TOKENs are mutually exclusive and TOKEN priority defines the order.
    ; Multiple faults related to a single token can occur simultaneously.
    IF (tsrc[63:0] of TOKEN_TYPE: ZERO_VALUE_TOKEN) AND imm8[0] THEN set #ZE;
    IF (tsrc[63:0] of TOKEN_TYPE: ZERO_VALUE_TOKEN) AND imm8[1] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: POS_ONE_VALUE_TOKEN) AND imm8[2] THEN set #ZE;
    IF (tsrc[63:0] of TOKEN_TYPE: POS_ONE_VALUE_TOKEN) AND imm8[3] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: SNAN_TOKEN) AND imm8[4] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: NEG_INF_TOKEN) AND imm8[5] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: NEG_VALUE_TOKEN) AND imm8[6] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: POS_INF_TOKEN) AND imm8[7] THEN set #IE;
    ; end fault reporting
    return dest[63:0];
} ; end of FIXUPIMM_DP()
VFIXUPIMMPD
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+63:i] ← FIXUPIMM_DP(DEST[i+63:i], SRC1[i+63:i], SRC2[63:0], imm8[7:0])
                ELSE
                    DEST[i+63:i] ← FIXUPIMM_DP(DEST[i+63:i], SRC1[i+63:i], SRC2[i+63:i], imm8[7:0])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE DEST[i+63:i] ← 0 ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Immediate Control Description:

Figure 5-9. VFIXUPIMMPD Immediate Control Description
    imm8[0]: ZERO #ZE
    imm8[1]: ZERO #IE
    imm8[2]: ONE #ZE
    imm8[3]: ONE #IE
    imm8[4]: SNaN #IE
    imm8[5]: -INF #IE
    imm8[6]: -VE #IE
    imm8[7]: +INF #IE
Intel C/C++ Compiler Intrinsic Equivalent
VFIXUPIMMPD __m512d _mm512_fixupimm_pd( __m512d a, __m512i tbl, int imm);
VFIXUPIMMPD __m512d _mm512_mask_fixupimm_pd(__m512d s, __mmask8 k, __m512d a, __m512i tbl, int imm);
VFIXUPIMMPD __m512d _mm512_maskz_fixupimm_pd( __mmask8 k, __m512d a, __m512i tbl, int imm);
VFIXUPIMMPD __m512d _mm512_fixupimm_round_pd( __m512d a, __m512i tbl, int imm, int sae);
VFIXUPIMMPD __m512d _mm512_mask_fixupimm_round_pd(__m512d s, __mmask8 k, __m512d a, __m512i tbl, int imm, int sae);
VFIXUPIMMPD __m512d _mm512_maskz_fixupimm_round_pd( __mmask8 k, __m512d a, __m512i tbl, int imm, int sae);
VFIXUPIMMPD __m256d _mm256_fixupimm_pd( __m256d a, __m256i tbl, int imm);
VFIXUPIMMPD __m256d _mm256_mask_fixupimm_pd(__m256d s, __mmask8 k, __m256d a, __m256i tbl, int imm);
VFIXUPIMMPD __m256d _mm256_maskz_fixupimm_pd( __mmask8 k, __m256d a, __m256i tbl, int imm);
VFIXUPIMMPD __m128d _mm_fixupimm_pd( __m128d a, __m128i tbl, int imm);
VFIXUPIMMPD __m128d _mm_mask_fixupimm_pd(__m128d s, __mmask8 k, __m128d a, __m128i tbl, int imm);
VFIXUPIMMPD __m128d _mm_maskz_fixupimm_pd( __mmask8 k, __m128d a, __m128i tbl, int imm);
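To make the token/response encoding concrete, here is a minimal sketch using the _mm512_fixupimm_pd form listed above (note that some compilers expose a variant with a separate first-source vector; the table constant, macro and helper names here are illustrative, and imm8 must be a compile-time constant). It builds a table that passes every input through unchanged except zeros, which become a correctly signed infinity, and SNaNs, which are quieted; the nibble layout follows the Operation section, where token j selects response nibble tbl3[4j+3:4j]:

    #include <immintrin.h>

    /* One response nibble per token (j = 0..7, see enum TOKEN_TYPE), low nibble first:
       QNAN, SNAN, ZERO, POS_ONE, NEG_INF, POS_INF, NEG_VALUE, POS_VALUE.
       0x1 = pass input through, 0x2 = QNaN(input), 0x6 = tsrc.sign ? -INF : +INF. */
    #define FIXUP_TBL 0x11111621

    /* Zeros become ±INF (as rcp(±0) should yield), SNaNs are quieted,
       all other inputs pass through. imm8 = 0 requests no #ZE/#IE reporting. */
    __m512d fixup_like_rcp(__m512d a)
    {
        return _mm512_fixupimm_pd(a, _mm512_set1_epi64(FIXUP_TBL), 0);
    }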
SIMD Floating-Point Exceptions
Zero, Invalid
Other Exceptions
See Exceptions Type E2.
VFIXUPIMMPS—Fix Up Special Packed Float32 Values
EVEX.128.66.0F3A.W0 54 /r ib  VFIXUPIMMPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst, imm8
    Op/En: A; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Fix up special numbers in float32 vector xmm1, float32 vector xmm2 and int32 vector xmm3/m128/m32bcst and store the result in xmm1, under writemask.

EVEX.256.66.0F3A.W0 54 /r ib  VFIXUPIMMPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst, imm8
    Op/En: A; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Fix up special numbers in float32 vector ymm1, float32 vector ymm2 and int32 vector ymm3/m256/m32bcst and store the result in ymm1, under writemask.

EVEX.512.66.0F3A.W0 54 /r ib  VFIXUPIMMPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{sae}, imm8
    Op/En: A; 64/32-bit Mode: V/V; CPUID: AVX512F
    Fix up elements of float32 vector in zmm2 using int32 vector table in zmm3/m512/m32bcst, combine with preserved elements from zmm1, and store the result in zmm1.

Instruction Operand Encoding

Op/En  Tuple Type  Operand 1         Operand 2  Operand 3      Operand 4
A      Full        ModRM:reg (r, w)  EVEX.vvvv  ModRM:r/m (r)  Imm8

Description

Performs a fix-up of doubleword elements encoded in single-precision floating-point format in the first source
operand (the second operand) using a 32-bit, two-level look-up table specified in the corresponding doubleword
element of the second source operand (the third operand) with exception reporting specifier imm8. The elements
that are fixed-up are selected by mask bits of 1 specified in the opmask k1. Mask bits of 0 in the opmask k1 or a
table response action of 0000b preserve the corresponding element of the first operand. The fixed-up elements
from the first source operand and the preserved elements in the first operand are combined as the final results in
the destination operand (the first operand).
The destination and the first source operands are ZMM/YMM/XMM registers. The second source operand can be a
ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a
32-bit memory location.
The two-level look-up table performs a fix-up of each single-precision floating-point input in the first source
operand by decoding the input data encoding into 8 token types. A response table is defined for each token type
that converts the input encoding in the first source operand to one of 16 response actions.
This instruction is specifically intended for use in fixing up the results of arithmetic calculations involving one
source so that they match the spec, although it is generally useful for fixing up the results of multiple-instruction
sequences to reflect special-number inputs. For example, consider rcp(0). Input 0 to rcp, and you should get INF
according to the DX10 spec. However, evaluating rcp via Newton-Raphson, where x=approx(1/0), yields an
incorrect result. To deal with this, VFIXUPIMMPS can be used after the N-R reciprocal sequence to set the result
to the correct value (i.e. INF when the input is 0).
If MXCSR.DAZ is not set, denormal input elements in the first source operand are considered as normal inputs
and do not trigger any fixup nor fault reporting.
Imm8 is used to set the required flags reporting. It supports #ZE and #IE fault reporting (see details below).
MXCSR.DAZ is used and refers to the first source operand (zmm2) only (i.e., zmm1 is not considered as zero in
case MXCSR.DAZ is set). MXCSR mask bits are ignored (treated as if all mask bits are set to a masked response).
If any of the imm8 bits is set and the condition is met for fault reporting, MXCSR.IE or MXCSR.ZE might be
updated.
Operation
enum TOKEN_TYPE
{
    QNAN_TOKEN ← 0,
    SNAN_TOKEN ← 1,
    ZERO_VALUE_TOKEN ← 2,
    POS_ONE_VALUE_TOKEN ← 3,
    NEG_INF_TOKEN ← 4,
    POS_INF_TOKEN ← 5,
    NEG_VALUE_TOKEN ← 6,
    POS_VALUE_TOKEN ← 7
}

FIXUPIMM_SP (dest[31:0], src1[31:0], tbl3[31:0], imm8[7:0]){
    tsrc[31:0] ← ((src1[30:23] = 0) AND (MXCSR.DAZ = 1)) ? 0.0 : src1[31:0]
    CASE(tsrc[31:0] of TOKEN_TYPE) {
        QNAN_TOKEN: j ← 0;
        SNAN_TOKEN: j ← 1;
        ZERO_VALUE_TOKEN: j ← 2;
        POS_ONE_VALUE_TOKEN: j ← 3;
        NEG_INF_TOKEN: j ← 4;
        POS_INF_TOKEN: j ← 5;
        NEG_VALUE_TOKEN: j ← 6;
        POS_VALUE_TOKEN: j ← 7;
    } ; end source special CASE(tsrc…)
    ; The required response from the src3 table is extracted
    token_response[3:0] ← tbl3[3+4*j:4*j];
    CASE(token_response[3:0]) {
        0000: dest[31:0] ← dest[31:0]; ; preserve content of DEST
        0001: dest[31:0] ← tsrc[31:0]; ; pass through src1 normal input value, denormal as zero
        0010: dest[31:0] ← QNaN(tsrc[31:0]);
        0011: dest[31:0] ← QNAN_Indefinite;
        0100: dest[31:0] ← -INF;
        0101: dest[31:0] ← +INF;
        0110: dest[31:0] ← tsrc.sign? -INF : +INF;
        0111: dest[31:0] ← -0;
        1000: dest[31:0] ← +0;
        1001: dest[31:0] ← -1;
        1010: dest[31:0] ← +1;
        1011: dest[31:0] ← ½;
        1100: dest[31:0] ← 90.0;
        1101: dest[31:0] ← PI/2;
        1110: dest[31:0] ← MAX_FLOAT;
        1111: dest[31:0] ← -MAX_FLOAT;
    } ; end of token_response CASE
    ; The required fault reporting from imm8 is extracted
    ; TOKENs are mutually exclusive and TOKEN priority defines the order.
    ; Multiple faults related to a single token can occur simultaneously.
    IF (tsrc[31:0] of TOKEN_TYPE: ZERO_VALUE_TOKEN) AND imm8[0] THEN set #ZE;
    IF (tsrc[31:0] of TOKEN_TYPE: ZERO_VALUE_TOKEN) AND imm8[1] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: POS_ONE_VALUE_TOKEN) AND imm8[2] THEN set #ZE;
    IF (tsrc[31:0] of TOKEN_TYPE: POS_ONE_VALUE_TOKEN) AND imm8[3] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: SNAN_TOKEN) AND imm8[4] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: NEG_INF_TOKEN) AND imm8[5] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: NEG_VALUE_TOKEN) AND imm8[6] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: POS_INF_TOKEN) AND imm8[7] THEN set #IE;
    ; end fault reporting
    return dest[31:0];
} ; end of FIXUPIMM_SP()
VFIXUPIMMPS (EVEX)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+31:i] ← FIXUPIMM_SP(DEST[i+31:i], SRC1[i+31:i], SRC2[31:0], imm8[7:0])
                ELSE
                    DEST[i+31:i] ← FIXUPIMM_SP(DEST[i+31:i], SRC1[i+31:i], SRC2[i+31:i], imm8[7:0])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE DEST[i+31:i] ← 0 ; zeroing-masking
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Immediate Control Description:

Figure 5-10. VFIXUPIMMPS Immediate Control Description
    imm8[0]: ZERO #ZE
    imm8[1]: ZERO #IE
    imm8[2]: ONE #ZE
    imm8[3]: ONE #IE
    imm8[4]: SNaN #IE
    imm8[5]: -INF #IE
    imm8[6]: -VE #IE
    imm8[7]: +INF #IE
Intel C/C++ Compiler Intrinsic Equivalent
VFIXUPIMMPS __m512 _mm512_fixupimm_ps( __m512 a, __m512i tbl, int imm);
VFIXUPIMMPS __m512 _mm512_mask_fixupimm_ps(__m512 s, __mmask16 k, __m512 a, __m512i tbl, int imm);
VFIXUPIMMPS __m512 _mm512_maskz_fixupimm_ps( __mmask16 k, __m512 a, __m512i tbl, int imm);
VFIXUPIMMPS __m512 _mm512_fixupimm_round_ps( __m512 a, __m512i tbl, int imm, int sae);
VFIXUPIMMPS __m512 _mm512_mask_fixupimm_round_ps(__m512 s, __mmask16 k, __m512 a, __m512i tbl, int imm, int sae);
VFIXUPIMMPS __m512 _mm512_maskz_fixupimm_round_ps( __mmask16 k, __m512 a, __m512i tbl, int imm, int sae);
VFIXUPIMMPS __m256 _mm256_fixupimm_ps( __m256 a, __m256i tbl, int imm);
VFIXUPIMMPS __m256 _mm256_mask_fixupimm_ps(__m256 s, __mmask8 k, __m256 a, __m256i tbl, int imm);
VFIXUPIMMPS __m256 _mm256_maskz_fixupimm_ps( __mmask8 k, __m256 a, __m256i tbl, int imm);
VFIXUPIMMPS __m128 _mm_fixupimm_ps( __m128 a, __m128i tbl, int imm);
VFIXUPIMMPS __m128 _mm_mask_fixupimm_ps(__m128 s, __mmask8 k, __m128 a, __m128i tbl, int imm);
VFIXUPIMMPS __m128 _mm_maskz_fixupimm_ps( __mmask8 k, __m128 a, __m128i tbl, int imm);
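The imm8 fault-reporting bits map one-to-one onto the IF clauses in the Operation section; a brief sketch of named constants (the names are illustrative, not from any header) makes the encoding explicit:

    /* imm8 fault-reporting bits for VFIXUPIMMPS (see Figure 5-10). */
    enum {
        FIXUP_ZERO_ZE = 1 << 0,  /* zero input sets MXCSR.ZE */
        FIXUP_ZERO_IE = 1 << 1,  /* zero input sets MXCSR.IE */
        FIXUP_ONE_ZE  = 1 << 2,  /* +1.0 input sets MXCSR.ZE */
        FIXUP_ONE_IE  = 1 << 3,  /* +1.0 input sets MXCSR.IE */
        FIXUP_SNAN_IE = 1 << 4,  /* SNaN input sets MXCSR.IE */
        FIXUP_NINF_IE = 1 << 5,  /* -INF input sets MXCSR.IE */
        FIXUP_NEG_IE  = 1 << 6,  /* negative finite input sets MXCSR.IE */
        FIXUP_PINF_IE = 1 << 7   /* +INF input sets MXCSR.IE */
    };

For example, passing (FIXUP_ZERO_ZE | FIXUP_ZERO_IE) as the immediate requests both flags when a zero input element is encountered; the immediate must be a compile-time constant.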
SIMD Floating-Point Exceptions
Zero, Invalid
Other Exceptions
See Exceptions Type E2.
VFIXUPIMMSD—Fix Up Special Scalar Float64 Value
EVEX.LIG.66.0F3A.W1 55 /r ib  VFIXUPIMMSD xmm1 {k1}{z}, xmm2, xmm3/m64{sae}, imm8
    Op/En: A; 64/32-bit Mode: V/V; CPUID: AVX512F
    Fix up a float64 number in the low quadword element of xmm2 using the scalar int32 table in xmm3/m64 and store the result in xmm1.

Instruction Operand Encoding

Op/En  Tuple Type     Operand 1         Operand 2  Operand 3      Operand 4
A      Tuple1 Scalar  ModRM:reg (r, w)  EVEX.vvvv  ModRM:r/m (r)  Imm8

Description

Performs a fix-up of the low quadword element encoded in double-precision floating-point format in the first
source operand (the second operand) using a 32-bit, two-level look-up table specified in the low quadword
element of the second source operand (the third operand) with exception reporting specifier imm8. The element
that is fixed-up is selected by a mask bit of 1 specified in the opmask k1. A mask bit of 0 in the opmask k1 or a
table response action of 0000b preserves the corresponding element of the first operand. The fixed-up element
from the first source operand or the preserved element in the first operand becomes the low quadword element
of the destination operand (the first operand). Bits 127:64 of the destination operand are copied from the
corresponding bits of the first source operand. The destination and first source operands are XMM registers. The
second source operand can be an XMM register or a 64-bit memory location.
The two-level look-up table performs a fix-up of each double-precision floating-point input in the first source
operand by decoding the input data encoding into 8 token types. A response table is defined for each token type
that converts the input encoding in the first source operand to one of 16 response actions.
This instruction is specifically intended for use in fixing up the results of arithmetic calculations involving one
source so that they match the spec, although it is generally useful for fixing up the results of multiple-instruction
sequences to reflect special-number inputs. For example, consider rcp(0). Input 0 to rcp, and you should get INF
according to the DX10 spec. However, evaluating rcp via Newton-Raphson, where x=approx(1/0), yields an
incorrect result. To deal with this, VFIXUPIMMSD can be used after the N-R reciprocal sequence to set the result
to the correct value (i.e. INF when the input is 0).
If MXCSR.DAZ is not set, denormal input elements in the first source operand are considered as normal inputs
and do not trigger any fixup nor fault reporting.
Imm8 is used to set the required flags reporting. It supports #ZE and #IE fault reporting (see details below).
MXCSR.DAZ is used and refers to the first source operand only (i.e., the destination is not considered as zero
even when MXCSR.DAZ is set). MXCSR mask bits are ignored (treated as if all mask bits are set to a masked
response). If any of the imm8 bits is set and the condition is met for fault reporting, MXCSR.IE or MXCSR.ZE
might be updated.

Operation

enum TOKEN_TYPE
{
    QNAN_TOKEN ← 0,
    SNAN_TOKEN ← 1,
    ZERO_VALUE_TOKEN ← 2,
    POS_ONE_VALUE_TOKEN ← 3,
    NEG_INF_TOKEN ← 4,
    POS_INF_TOKEN ← 5,
    NEG_VALUE_TOKEN ← 6,
    POS_VALUE_TOKEN ← 7
}
FIXUPIMM_DP (dest[63:0], src1[63:0], tbl3[63:0], imm8[7:0]){
    tsrc[63:0] ← ((src1[62:52] = 0) AND (MXCSR.DAZ = 1)) ? 0.0 : src1[63:0]
    CASE(tsrc[63:0] of TOKEN_TYPE) {
        QNAN_TOKEN: j ← 0;
        SNAN_TOKEN: j ← 1;
        ZERO_VALUE_TOKEN: j ← 2;
        POS_ONE_VALUE_TOKEN: j ← 3;
        NEG_INF_TOKEN: j ← 4;
        POS_INF_TOKEN: j ← 5;
        NEG_VALUE_TOKEN: j ← 6;
        POS_VALUE_TOKEN: j ← 7;
    } ; end source special CASE(tsrc…)
    ; The required response from the src3 table is extracted
    token_response[3:0] ← tbl3[3+4*j:4*j];
    CASE(token_response[3:0]) {
        0000: dest[63:0] ← dest[63:0]; ; preserve content of DEST
        0001: dest[63:0] ← tsrc[63:0]; ; pass through src1 normal input value, denormal as zero
        0010: dest[63:0] ← QNaN(tsrc[63:0]);
        0011: dest[63:0] ← QNAN_Indefinite;
        0100: dest[63:0] ← -INF;
        0101: dest[63:0] ← +INF;
        0110: dest[63:0] ← tsrc.sign? -INF : +INF;
        0111: dest[63:0] ← -0;
        1000: dest[63:0] ← +0;
        1001: dest[63:0] ← -1;
        1010: dest[63:0] ← +1;
        1011: dest[63:0] ← ½;
        1100: dest[63:0] ← 90.0;
        1101: dest[63:0] ← PI/2;
        1110: dest[63:0] ← MAX_FLOAT;
        1111: dest[63:0] ← -MAX_FLOAT;
    } ; end of token_response CASE
    ; The required fault reporting from imm8 is extracted
    ; TOKENs are mutually exclusive and TOKEN priority defines the order.
    ; Multiple faults related to a single token can occur simultaneously.
    IF (tsrc[63:0] of TOKEN_TYPE: ZERO_VALUE_TOKEN) AND imm8[0] THEN set #ZE;
    IF (tsrc[63:0] of TOKEN_TYPE: ZERO_VALUE_TOKEN) AND imm8[1] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: POS_ONE_VALUE_TOKEN) AND imm8[2] THEN set #ZE;
    IF (tsrc[63:0] of TOKEN_TYPE: POS_ONE_VALUE_TOKEN) AND imm8[3] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: SNAN_TOKEN) AND imm8[4] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: NEG_INF_TOKEN) AND imm8[5] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: NEG_VALUE_TOKEN) AND imm8[6] THEN set #IE;
    IF (tsrc[63:0] of TOKEN_TYPE: POS_INF_TOKEN) AND imm8[7] THEN set #IE;
    ; end fault reporting
    return dest[63:0];
} ; end of FIXUPIMM_DP()
VFIXUPIMMSD (EVEX encoded version)
IF k1[0] OR *no writemask*
    THEN DEST[63:0] ← FIXUPIMM_DP(DEST[63:0], SRC1[63:0], SRC2[63:0], imm8[7:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE DEST[63:0] ← 0 ; zeroing-masking
        FI
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Immediate Control Description:

Figure 5-11. VFIXUPIMMSD Immediate Control Description
    imm8[0]: ZERO #ZE
    imm8[1]: ZERO #IE
    imm8[2]: ONE #ZE
    imm8[3]: ONE #IE
    imm8[4]: SNaN #IE
    imm8[5]: -INF #IE
    imm8[6]: -VE #IE
    imm8[7]: +INF #IE
Intel C/C++ Compiler Intrinsic Equivalent
VFIXUPIMMSD __m128d _mm_fixupimm_sd( __m128d a, __m128i tbl, int imm);
VFIXUPIMMSD __m128d _mm_mask_fixupimm_sd(__m128d s, __mmask8 k, __m128d a, __m128i tbl, int imm);
VFIXUPIMMSD __m128d _mm_maskz_fixupimm_sd( __mmask8 k, __m128d a, __m128i tbl, int imm);
VFIXUPIMMSD __m128d _mm_fixupimm_round_sd( __m128d a, __m128i tbl, int imm, int sae);
VFIXUPIMMSD __m128d _mm_mask_fixupimm_round_sd(__m128d s, __mmask8 k, __m128d a, __m128i tbl, int imm, int sae);
VFIXUPIMMSD __m128d _mm_maskz_fixupimm_round_sd( __mmask8 k, __m128d a, __m128i tbl, int imm, int sae);
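As a scalar counterpart to the packed example, a minimal sketch using the _mm_fixupimm_sd form listed above (table constant, macro and helper names are illustrative; only the low element is fixed up, and bits 127:64 are copied from the first source as described above):

    #include <immintrin.h>

    /* Same nibble encoding as the packed forms: ZERO token (j = 2) -> 0x6
       (signed INF), every other token -> 0x1 (pass input through). */
    #define SCALAR_FIXUP_TBL 0x11111611

    __m128d fixup_low_element(__m128d a)
    {
        const __m128i tbl = _mm_set1_epi64x(SCALAR_FIXUP_TBL);
        return _mm_fixupimm_sd(a, tbl, 0);   /* imm8 = 0: no fault reporting */
    }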
SIMD Floating-Point Exceptions
Zero, Invalid
Other Exceptions
See Exceptions Type E3.
VFIXUPIMMSS—Fix Up Special Scalar Float32 Value
EVEX.LIG.66.0F3A.W0 55 /r ib  VFIXUPIMMSS xmm1 {k1}{z}, xmm2, xmm3/m32{sae}, imm8
    Op/En: A; 64/32-bit Mode: V/V; CPUID: AVX512F
    Fix up a float32 number in the low doubleword element in xmm2 using the scalar int32 table in xmm3/m32 and store the result in xmm1.

Instruction Operand Encoding

Op/En  Tuple Type     Operand 1         Operand 2  Operand 3      Operand 4
A      Tuple1 Scalar  ModRM:reg (r, w)  EVEX.vvvv  ModRM:r/m (r)  Imm8

Description

Performs a fix-up of the low doubleword element encoded in single-precision floating-point format in the first
source operand (the second operand) using a 32-bit, two-level look-up table specified in the low doubleword
element of the second source operand (the third operand) with exception reporting specifier imm8. The element
that is fixed-up is selected by a mask bit of 1 specified in the opmask k1. A mask bit of 0 in the opmask k1 or a
table response action of 0000b preserves the corresponding element of the first operand. The fixed-up element
from the first source operand or the preserved element in the first operand becomes the low doubleword element
of the destination operand (the first operand). Bits 127:32 of the destination operand are copied from the
corresponding bits of the first source operand. The destination and first source operands are XMM registers. The
second source operand can be an XMM register or a 32-bit memory location.
The two-level look-up table performs a fix-up of each single-precision floating-point input in the first source
operand by decoding the input data encoding into 8 token types. A response table is defined for each token type
that converts the input encoding in the first source operand to one of 16 response actions.
This instruction is specifically intended for use in fixing up the results of arithmetic calculations involving one
source so that they match the spec, although it is generally useful for fixing up the results of multiple-instruction
sequences to reflect special-number inputs. For example, consider rcp(0). Input 0 to rcp, and you should get INF
according to the DX10 spec. However, evaluating rcp via Newton-Raphson, where x=approx(1/0), yields an
incorrect result. To deal with this, VFIXUPIMMSS can be used after the N-R reciprocal sequence to set the result
to the correct value (i.e. INF when the input is 0).
If MXCSR.DAZ is not set, denormal input elements in the first source operand are considered as normal inputs
and do not trigger any fixup nor fault reporting.
Imm8 is used to set the required flags reporting. It supports #ZE and #IE fault reporting (see details below).
MXCSR.DAZ is used and refers to the first source operand only (i.e., the destination is not considered as zero
even when MXCSR.DAZ is set). MXCSR mask bits are ignored (treated as if all mask bits are set to a masked
response). If any of the imm8 bits is set and the condition is met for fault reporting, MXCSR.IE or MXCSR.ZE
might be updated.

Operation

enum TOKEN_TYPE
{
    QNAN_TOKEN ← 0,
    SNAN_TOKEN ← 1,
    ZERO_VALUE_TOKEN ← 2,
    POS_ONE_VALUE_TOKEN ← 3,
    NEG_INF_TOKEN ← 4,
    POS_INF_TOKEN ← 5,
    NEG_VALUE_TOKEN ← 6,
    POS_VALUE_TOKEN ← 7
}
FIXUPIMM_SP (dest[31:0], src1[31:0], tbl3[31:0], imm8[7:0]){
    tsrc[31:0] ← ((src1[30:23] = 0) AND (MXCSR.DAZ = 1)) ? 0.0 : src1[31:0]
    CASE(tsrc[31:0] of TOKEN_TYPE) {
        QNAN_TOKEN: j ← 0;
        SNAN_TOKEN: j ← 1;
        ZERO_VALUE_TOKEN: j ← 2;
        POS_ONE_VALUE_TOKEN: j ← 3;
        NEG_INF_TOKEN: j ← 4;
        POS_INF_TOKEN: j ← 5;
        NEG_VALUE_TOKEN: j ← 6;
        POS_VALUE_TOKEN: j ← 7;
    } ; end source special CASE(tsrc…)
    ; The required response from the src3 table is extracted
    token_response[3:0] ← tbl3[3+4*j:4*j];
    CASE(token_response[3:0]) {
        0000: dest[31:0] ← dest[31:0]; ; preserve content of DEST
        0001: dest[31:0] ← tsrc[31:0]; ; pass through src1 normal input value, denormal as zero
        0010: dest[31:0] ← QNaN(tsrc[31:0]);
        0011: dest[31:0] ← QNAN_Indefinite;
        0100: dest[31:0] ← -INF;
        0101: dest[31:0] ← +INF;
        0110: dest[31:0] ← tsrc.sign? -INF : +INF;
        0111: dest[31:0] ← -0;
        1000: dest[31:0] ← +0;
        1001: dest[31:0] ← -1;
        1010: dest[31:0] ← +1;
        1011: dest[31:0] ← ½;
        1100: dest[31:0] ← 90.0;
        1101: dest[31:0] ← PI/2;
        1110: dest[31:0] ← MAX_FLOAT;
        1111: dest[31:0] ← -MAX_FLOAT;
    } ; end of token_response CASE
    ; The required fault reporting from imm8 is extracted
    ; TOKENs are mutually exclusive and TOKEN priority defines the order.
    ; Multiple faults related to a single token can occur simultaneously.
    IF (tsrc[31:0] of TOKEN_TYPE: ZERO_VALUE_TOKEN) AND imm8[0] THEN set #ZE;
    IF (tsrc[31:0] of TOKEN_TYPE: ZERO_VALUE_TOKEN) AND imm8[1] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: POS_ONE_VALUE_TOKEN) AND imm8[2] THEN set #ZE;
    IF (tsrc[31:0] of TOKEN_TYPE: POS_ONE_VALUE_TOKEN) AND imm8[3] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: SNAN_TOKEN) AND imm8[4] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: NEG_INF_TOKEN) AND imm8[5] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: NEG_VALUE_TOKEN) AND imm8[6] THEN set #IE;
    IF (tsrc[31:0] of TOKEN_TYPE: POS_INF_TOKEN) AND imm8[7] THEN set #IE;
    ; end fault reporting
    return dest[31:0];
} ; end of FIXUPIMM_SP()
VFIXUPIMMSS (EVEX encoded version)
IF k1[0] OR *no writemask*
    THEN DEST[31:0] ← FIXUPIMM_SP(DEST[31:0], SRC1[31:0], SRC2[31:0], imm8[7:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE DEST[31:0] ← 0 ; zeroing-masking
        FI
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
Immediate Control Description:

Figure 5-12. VFIXUPIMMSS Immediate Control Description
    imm8[0]: ZERO #ZE
    imm8[1]: ZERO #IE
    imm8[2]: ONE #ZE
    imm8[3]: ONE #IE
    imm8[4]: SNaN #IE
    imm8[5]: -INF #IE
    imm8[6]: -VE #IE
    imm8[7]: +INF #IE
Intel C/C++ Compiler Intrinsic Equivalent
VFIXUPIMMSS __m128 _mm_fixupimm_ss( __m128 a, __m128i tbl, int imm);
VFIXUPIMMSS __m128 _mm_mask_fixupimm_ss(__m128 s, __mmask8 k, __m128 a, __m128i tbl, int imm);
VFIXUPIMMSS __m128 _mm_maskz_fixupimm_ss( __mmask8 k, __m128 a, __m128i tbl, int imm);
VFIXUPIMMSS __m128 _mm_fixupimm_round_ss( __m128 a, __m128i tbl, int imm, int sae);
VFIXUPIMMSS __m128 _mm_mask_fixupimm_round_ss(__m128 s, __mmask8 k, __m128 a, __m128i tbl, int imm, int sae);
VFIXUPIMMSS __m128 _mm_maskz_fixupimm_round_ss( __mmask8 k, __m128 a, __m128i tbl, int imm, int sae);
SIMD Floating-Point Exceptions
Zero, Invalid
Other Exceptions
See Exceptions Type E3.
VFMADD132PD/VFMADD213PD/VFMADD231PD—Fused Multiply-Add of Packed Double-
Precision Floating-Point Values
VEX.128.66.0F38.W1 98 /r  VFMADD132PD xmm1, xmm2, xmm3/m128
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, add to xmm2 and put result in xmm1.

VEX.128.66.0F38.W1 A8 /r  VFMADD213PD xmm1, xmm2, xmm3/m128
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed double-precision floating-point values from xmm1 and xmm2, add to xmm3/mem and put result in xmm1.

VEX.128.66.0F38.W1 B8 /r  VFMADD231PD xmm1, xmm2, xmm3/m128
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed double-precision floating-point values from xmm2 and xmm3/mem, add to xmm1 and put result in xmm1.

VEX.256.66.0F38.W1 98 /r  VFMADD132PD ymm1, ymm2, ymm3/m256
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed double-precision floating-point values from ymm1 and ymm3/mem, add to ymm2 and put result in ymm1.

VEX.256.66.0F38.W1 A8 /r  VFMADD213PD ymm1, ymm2, ymm3/m256
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed double-precision floating-point values from ymm1 and ymm2, add to ymm3/mem and put result in ymm1.

VEX.256.66.0F38.W1 B8 /r  VFMADD231PD ymm1, ymm2, ymm3/m256
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed double-precision floating-point values from ymm2 and ymm3/mem, add to ymm1 and put result in ymm1.

EVEX.128.66.0F38.W1 98 /r  VFMADD132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed double-precision floating-point values from xmm1 and xmm3/m128/m64bcst, add to xmm2 and put result in xmm1.

EVEX.128.66.0F38.W1 A8 /r  VFMADD213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed double-precision floating-point values from xmm1 and xmm2, add to xmm3/m128/m64bcst and put result in xmm1.

EVEX.128.66.0F38.W1 B8 /r  VFMADD231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed double-precision floating-point values from xmm2 and xmm3/m128/m64bcst, add to xmm1 and put result in xmm1.

EVEX.256.66.0F38.W1 98 /r  VFMADD132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed double-precision floating-point values from ymm1 and ymm3/m256/m64bcst, add to ymm2 and put result in ymm1.

EVEX.256.66.0F38.W1 A8 /r  VFMADD213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed double-precision floating-point values from ymm1 and ymm2, add to ymm3/m256/m64bcst and put result in ymm1.

EVEX.256.66.0F38.W1 B8 /r  VFMADD231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed double-precision floating-point values from ymm2 and ymm3/m256/m64bcst, add to ymm1 and put result in ymm1.

EVEX.512.66.0F38.W1 98 /r  VFMADD132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er}
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512F
    Multiply packed double-precision floating-point values from zmm1 and zmm3/m512/m64bcst, add to zmm2 and put result in zmm1.

EVEX.512.66.0F38.W1 A8 /r  VFMADD213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er}
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512F
    Multiply packed double-precision floating-point values from zmm1 and zmm2, add to zmm3/m512/m64bcst and put result in zmm1.

EVEX.512.66.0F38.W1 B8 /r  VFMADD231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er}
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512F
    Multiply packed double-precision floating-point values from zmm2 and zmm3/m512/m64bcst, add to zmm1 and put result in zmm1.
Instruction Operand Encoding

Op/En  Tuple Type  Operand 1         Operand 2      Operand 3      Operand 4
A      NA          ModRM:reg (r, w)  VEX.vvvv (r)   ModRM:r/m (r)  NA
B      Full        ModRM:reg (r, w)  EVEX.vvvv (r)  ModRM:r/m (r)  NA

Description

Performs a set of SIMD multiply-add computations on packed double-precision floating-point values using three
source operands and writes the multiply-add results in the destination operand. The destination operand is also
the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD
register or a memory location.
VFMADD132PD: Multiplies the two, four or eight packed double-precision floating-point values from the first
source operand by the two, four or eight packed double-precision floating-point values in the third source
operand, adds the infinite precision intermediate result to the two, four or eight packed double-precision
floating-point values in the second source operand, performs rounding and stores the resulting two, four or eight
packed double-precision floating-point values to the destination operand (first source operand).
VFMADD213PD: Multiplies the two, four or eight packed double-precision floating-point values from the second
source operand by the two, four or eight packed double-precision floating-point values in the first source
operand, adds the infinite precision intermediate result to the two, four or eight packed double-precision
floating-point values in the third source operand, performs rounding and stores the resulting two, four or eight
packed double-precision floating-point values to the destination operand (first source operand).
VFMADD231PD: Multiplies the two, four or eight packed double-precision floating-point values from the second
source operand by the two, four or eight packed double-precision floating-point values in the third source
operand, adds the infinite precision intermediate result to the two, four or eight packed double-precision
floating-point values in the first source operand, performs rounding and stores the resulting two, four or eight
packed double-precision floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also first source operand) is a ZMM register and encoded in
reg_field. The second source operand is a ZMM register and encoded in EVEX.vvvv. The third source operand is a
ZMM register, a 512-bit memory location, or a 512-bit vector broadcasted from a 64-bit memory location. The
destination operand is conditionally updated with write mask k1.
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also first source operand) is an XMM register and encoded in
reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is
an XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM
destination register are zeroed.
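As an informal note on the form digits, the following sketch (helper name is a placeholder) summarizes the dataflow of each form in C comments; with the FMA intrinsic, the compiler selects whichever encoding form avoids extra register moves:

    #include <immintrin.h>

    /* All three forms compute a fused multiply-add with a single rounding; the
       digits name the operands (1..3, left to right) that are multiplied and added:
         VFMADD132PD x1, x2, x3  =>  x1 <- x1*x3 + x2
         VFMADD213PD x1, x2, x3  =>  x1 <- x2*x1 + x3
         VFMADD231PD x1, x2, x3  =>  x1 <- x2*x3 + x1  */
    __m128d fma_example(__m128d a, __m128d b, __m128d acc)
    {
        return _mm_fmadd_pd(a, b, acc);   /* fused a*b + acc */
    }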
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFMADD132PD DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR(DEST[n+63:n]*SRC3[n+63:n] + SRC2[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI

VFMADD213PD DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR(SRC2[n+63:n]*DEST[n+63:n] + SRC3[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI

VFMADD231PD DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR(SRC2[n+63:n]*SRC3[n+63:n] + DEST[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFMADD132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            RoundFPControl(DEST[i+63:i]*SRC3[i+63:i] + SRC2[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VFMADD132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[63:0] + SRC2[i+63:i])
                ELSE
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[i+63:i] + SRC2[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADD213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            RoundFPControl(SRC2[i+63:i]*DEST[i+63:i] + SRC3[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VFMADD213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] + SRC3[63:0])
                ELSE
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] + SRC3[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADD231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            RoundFPControl(SRC2[i+63:i]*SRC3[i+63:i] + DEST[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VFMADD231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[63:0] + DEST[i+63:i])
                ELSE
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[i+63:i] + DEST[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMADDxxxPD __m512d _mm512_fmadd_pd(__m512d a, __m512d b, __m512d c);
VFMADDxxxPD __m512d _mm512_fmadd_round_pd(__m512d a, __m512d b, __m512d c, int r);
VFMADDxxxPD __m512d _mm512_mask_fmadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
VFMADDxxxPD __m512d _mm512_maskz_fmadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
VFMADDxxxPD __m512d _mm512_mask3_fmadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
VFMADDxxxPD __m512d _mm512_mask_fmadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int r);
VFMADDxxxPD __m512d _mm512_maskz_fmadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int r);
VFMADDxxxPD __m512d _mm512_mask3_fmadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int r);
VFMADDxxxPD __m256d _mm256_mask_fmadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c);
VFMADDxxxPD __m256d _mm256_maskz_fmadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c);
VFMADDxxxPD __m256d _mm256_mask3_fmadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k);
VFMADDxxxPD __m128d _mm_mask_fmadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFMADDxxxPD __m128d _mm_maskz_fmadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFMADDxxxPD __m128d _mm_mask3_fmadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFMADDxxxPD __m128d _mm_fmadd_pd (__m128d a, __m128d b, __m128d c);
VFMADDxxxPD __m256d _mm256_fmadd_pd (__m256d a, __m256d b, __m256d c);
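For illustration, a minimal sketch of a fused dot-product accumulation with the 512-bit intrinsic listed above (the function name, array arguments and the divisibility assumption are ours; _mm512_reduce_add_pd is a compiler-provided sequence, not a single instruction):

    #include <immintrin.h>
    #include <stddef.h>

    /* Accumulates sum(x[i]*y[i]) with one rounding per fused multiply-add.
       Assumes n is a multiple of 8; remainder handling is omitted for brevity. */
    double dot8(const double *x, const double *y, size_t n)
    {
        __m512d acc = _mm512_setzero_pd();
        for (size_t i = 0; i < n; i += 8)
            acc = _mm512_fmadd_pd(_mm512_loadu_pd(x + i),
                                  _mm512_loadu_pd(y + i), acc);
        return _mm512_reduce_add_pd(acc);   /* horizontal sum of the 8 lanes */
    }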
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFMADD132PS/VFMADD213PS/VFMADD231PS—Fused Multiply-Add of Packed Single-
Precision Floating-Point Values
VEX.128.66.0F38.W0 98 /r  VFMADD132PS xmm1, xmm2, xmm3/m128
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, add to xmm2 and put result in xmm1.

VEX.128.66.0F38.W0 A8 /r  VFMADD213PS xmm1, xmm2, xmm3/m128
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed single-precision floating-point values from xmm1 and xmm2, add to xmm3/mem and put result in xmm1.

VEX.128.66.0F38.W0 B8 /r  VFMADD231PS xmm1, xmm2, xmm3/m128
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, add to xmm1 and put result in xmm1.

VEX.256.66.0F38.W0 98 /r  VFMADD132PS ymm1, ymm2, ymm3/m256
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, add to ymm2 and put result in ymm1.

VEX.256.66.0F38.W0 A8 /r  VFMADD213PS ymm1, ymm2, ymm3/m256
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed single-precision floating-point values from ymm1 and ymm2, add to ymm3/mem and put result in ymm1.

VEX.256.66.0F38.W0 B8 /r  VFMADD231PS ymm1, ymm2, ymm3/m256
    Op/En: A; 64/32-bit Mode: V/V; CPUID: FMA
    Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, add to ymm1 and put result in ymm1.

EVEX.128.66.0F38.W0 98 /r  VFMADD132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, add to xmm2 and put result in xmm1.

EVEX.128.66.0F38.W0 A8 /r  VFMADD213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed single-precision floating-point values from xmm1 and xmm2, add to xmm3/m128/m32bcst and put result in xmm1.

EVEX.128.66.0F38.W0 B8 /r  VFMADD231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, add to xmm1 and put result in xmm1.

EVEX.256.66.0F38.W0 98 /r  VFMADD132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, add to ymm2 and put result in ymm1.

EVEX.256.66.0F38.W0 A8 /r  VFMADD213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed single-precision floating-point values from ymm1 and ymm2, add to ymm3/m256/m32bcst and put result in ymm1.

EVEX.256.66.0F38.W0 B8 /r  VFMADD231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512VL AVX512F
    Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, add to ymm1 and put result in ymm1.

EVEX.512.66.0F38.W0 98 /r  VFMADD132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er}
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512F
    Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, add to zmm2 and put result in zmm1.

EVEX.512.66.0F38.W0 A8 /r  VFMADD213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er}
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512F
    Multiply packed single-precision floating-point values from zmm1 and zmm2, add to zmm3/m512/m32bcst and put result in zmm1.

EVEX.512.66.0F38.W0 B8 /r  VFMADD231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er}
    Op/En: B; 64/32-bit Mode: V/V; CPUID: AVX512F
    Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, add to zmm1 and put result in zmm1.
Instruction Operand Encoding
Description
Performs a set of SIMD multiply-add computation on packed single-precision floating-point values using three
source operands and writes the multiply-add results in the destination operand. The destination operand is also the
first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD
register or a memory location.
VFMADD132PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the first
source operand to the four, eight or sixteen packed single-precision floating-point values in the third source
operand, adds the infinite precision intermediate result to the four, eight or sixteen packed single-precision
floating-point values in the second source operand, performs rounding and stores the resulting four, eight or
sixteen packed single-precision floating-point values to the destination operand (first source operand).
VFMADD213PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the second
source operand to the four, eight or sixteen packed single-precision floating-point values in the first source
operand, adds the infinite precision intermediate result to the four, eight or sixteen packed single-precision
floating-point values in the third source operand, performs rounding and stores the resulting the four, eight or
sixteen packed single-precision floating-point values to the destination operand (first source operand).
VFMADD231PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the second
source operand to the four, eight or sixteen packed single-precision floating-point values in the third source
operand, adds the infinite precision intermediate result to the four, eight or sixteen packed single-precision
floating-point values in the first source operand, performs rounding and stores the resulting four, eight or sixteen
packed single-precision floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also first source operand) is a ZMM register and encoded in
reg_field. The second source operand is a ZMM register and encoded in EVEX.vvvv. The third source operand is a
ZMM register, a 512-bit memory location, or a 512-bit vector broadcasted from a 32-bit memory location. The
destination operand is conditionally updated with write mask k1.
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also first source operand) is an XMM register and encoded in
reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an
XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination
register are zeroed.
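As an informal illustration (not part of the architectural definition), the packed forms map directly onto the C
intrinsics listed later in this section. A minimal sketch, assuming a GCC/Clang-style toolchain with AVX-512F
enabled (e.g., -mavx512f); the function names are hypothetical:

#include <immintrin.h>

/* Computes dst[i] = a[i]*b[i] + c[i] for 16 single-precision elements.
   The compiler selects among the 132/213/231 forms based on register
   allocation; all three compute the same a*b + c result. */
__m512 fmadd16(__m512 a, __m512 b, __m512 c) {
    return _mm512_fmadd_ps(a, b, c);
}

/* With a scalar operand broadcast to all elements, compilers may encode
   the memory operand using the m32bcst (embedded broadcast) form. */
__m512 fmadd_bcast(__m512 a, __m512 b, float k) {
    return _mm512_fmadd_ps(a, b, _mm512_set1_ps(k));
}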
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFMADD132PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXNUM ← 4
ELSEIF (VEX.256)
    MAXNUM ← 8
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(DEST[n+31:n]*SRC3[n+31:n] + SRC2[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFMADD213PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXNUM ← 4
ELSEIF (VEX.256)
    MAXNUM ← 8
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(SRC2[n+31:n]*DEST[n+31:n] + SRC3[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFMADD231PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXNUM ← 4
ELSEIF (VEX.256)
    MAXNUM ← 8
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(SRC2[n+31:n]*SRC3[n+31:n] + DEST[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
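The single-rounding behavior in the pseudocode above can be modeled in standard C with fmaf from <math.h>,
which likewise computes x*y + z with one rounding of the infinite-precision product-sum on implementations where
it maps to the hardware FMA. A minimal sketch of the VEX.128 VFMADD132PS element loop, assuming default
MXCSR rounding state; the function name is illustrative only:

#include <math.h>

/* Reference model of VFMADD132PS xmm1, xmm2, xmm3 (4 packed singles):
   dest[n] = dest[n]*src3[n] + src2[n], rounded once per element. */
static void vfmadd132ps_model(float dest[4], const float src2[4],
                              const float src3[4]) {
    for (int n = 0; n < 4; n++)
        dest[n] = fmaf(dest[n], src3[n], src2[n]);
}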
VFMADD132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ←
            RoundFPControl(DEST[i+31:i]*SRC3[i+31:i] + SRC2[i+31:i])
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADD132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[31:0] + SRC2[i+31:i])
                ELSE
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[i+31:i] + SRC2[i+31:i])
            FI;
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADD213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ←
            RoundFPControl(SRC2[i+31:i]*DEST[i+31:i] + SRC3[i+31:i])
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADD213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] + SRC3[31:0])
                ELSE
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] + SRC3[i+31:i])
            FI;
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADD231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ←
            RoundFPControl(SRC2[i+31:i]*SRC3[i+31:i] + DEST[i+31:i])
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADD231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[31:0] + DEST[i+31:i])
                ELSE
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[i+31:i] + DEST[i+31:i])
            FI;
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMADDxxxPS __m512 _mm512_fmadd_ps(__m512 a, __m512 b, __m512 c);
VFMADDxxxPS __m512 _mm512_fmadd_round_ps(__m512 a, __m512 b, __m512 c, int r);
VFMADDxxxPS __m512 _mm512_mask_fmadd_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
VFMADDxxxPS __m512 _mm512_maskz_fmadd_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
VFMADDxxxPS __m512 _mm512_mask3_fmadd_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
VFMADDxxxPS __m512 _mm512_mask_fmadd_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int r);
VFMADDxxxPS __m512 _mm512_maskz_fmadd_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int r);
VFMADDxxxPS __m512 _mm512_mask3_fmadd_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int r);
VFMADDxxxPS __m256 _mm256_mask_fmadd_ps(__m256 a, __mmask8 k, __m256 b, __m256 c);
VFMADDxxxPS __m256 _mm256_maskz_fmadd_ps(__mmask8 k, __m256 a, __m256 b, __m256 c);
VFMADDxxxPS __m256 _mm256_mask3_fmadd_ps(__m256 a, __m256 b, __m256 c, __mmask8 k);
VFMADDxxxPS __m128 _mm_mask_fmadd_ps(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFMADDxxxPS __m128 _mm_maskz_fmadd_ps(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFMADDxxxPS __m128 _mm_mask3_fmadd_ps(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFMADDxxxPS __m128 _mm_fmadd_ps (__m128 a, __m128 b, __m128 c);
VFMADDxxxPS __m256 _mm256_fmadd_ps (__m256 a, __m256 b, __m256 c);
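A short usage sketch of the masked forms above, assuming AVX-512F; the function names and the 0x00FF mask
value are illustrative only:

#include <immintrin.h>

/* Merge-masking: lanes with a 0 mask bit keep the value from a. */
__m512 lo_half_fma(__m512 a, __m512 b, __m512 c) {
    __mmask16 k = 0x00FF;                    /* update lanes 0..7 only */
    return _mm512_mask_fmadd_ps(a, k, b, c); /* a[i]*b[i]+c[i] where k[i]=1 */
}

/* Zero-masking: lanes with a 0 mask bit are zeroed instead. */
__m512 lo_half_fma_z(__m512 a, __m512 b, __m512 c) {
    return _mm512_maskz_fmadd_ps(0x00FF, a, b, c);
}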
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFMADD132SD/VFMADD213SD/VFMADD231SD—Fused Multiply-Add of Scalar Double-
Precision Floating-Point Values
Instruction Operand Encoding

Op/En  Tuple Type     Operand 1         Operand 2      Operand 3      Operand 4
A      NA             ModRM:reg (r, w)  VEX.vvvv (r)   ModRM:r/m (r)  NA
B      Tuple1 Scalar  ModRM:reg (r, w)  EVEX.vvvv (r)  ModRM:r/m (r)  NA

Description
Performs a SIMD multiply-add computation on the low double-precision floating-point values using three source
operands and writes the multiply-add result in the destination operand. The destination operand is also the first
source operand. The first and second operands are XMM registers. The third source operand can be an XMM register
or a 64-bit memory location.
VFMADD132SD: Multiplies the low double-precision floating-point value from the first source operand to the low
double-precision floating-point value in the third source operand, adds the infinite precision intermediate result to
the low double-precision floating-point values in the second source operand, performs rounding and stores the
resulting double-precision floating-point value to the destination operand (first source operand).
VFMADD213SD: Multiplies the low double-precision floating-point value from the second source operand to the low
double-precision floating-point value in the first source operand, adds the infinite precision intermediate result to
the low double-precision floating-point value in the third source operand, performs rounding and stores the
resulting double-precision floating-point value to the destination operand (first source operand).
VFMADD231SD: Multiplies the low double-precision floating-point value from the second source operand to the low double-
precision floating-point value in the third source operand, adds the infinite precision intermediate result to the low
double-precision floating-point value in the first source operand, performs rounding and stores the resulting
double-precision floating-point value to the destination operand (first source operand).
VEX.128 and EVEX encoded version: The destination operand (also first source operand) is encoded in reg_field.
The second source operand is encoded in VEX.vvvv/EVEX.vvvv. The third source operand is encoded in rm_field.
Bits 127:64 of the destination are unchanged. Bits MAXVL-1:128 of the destination register are zeroed.
EVEX encoded version: The low quadword element of the destination is updated according to the writemask.
Opcode/Instruction    Op/En    64/32 bit Mode Support    CPUID Feature Flag    Description
VEX.LIG.66.0F38.W1 99 /r
VFMADD132SD xmm1, xmm2,
xmm3/m64
A V/V FMA Multiply scalar double-precision floating-point value
from xmm1 and xmm3/m64, add to xmm2 and put
result in xmm1.
VEX.LIG.66.0F38.W1 A9 /r
VFMADD213SD xmm1, xmm2,
xmm3/m64
A V/V FMA Multiply scalar double-precision floating-point value
from xmm1 and xmm2, add to xmm3/m64 and put
result in xmm1.
VEX.LIG.66.0F38.W1 B9 /r
VFMADD231SD xmm1, xmm2,
xmm3/m64
A V/V FMA Multiply scalar double-precision floating-point value
from xmm2 and xmm3/m64, add to xmm1 and put
result in xmm1.
EVEX.LIG.66.0F38.W1 99 /r
VFMADD132SD xmm1 {k1}{z}, xmm2,
xmm3/m64{er}
B V/V AVX512F Multiply scalar double-precision floating-point value
from xmm1 and xmm3/m64, add to xmm2 and put
result in xmm1.
EVEX.LIG.66.0F38.W1 A9 /r
VFMADD213SD xmm1 {k1}{z}, xmm2,
xmm3/m64{er}
B V/V AVX512F Multiply scalar double-precision floating-point value
from xmm1 and xmm2, add to xmm3/m64 and put
result in xmm1.
EVEX.LIG.66.0F38.W1 B9 /r
VFMADD231SD xmm1 {k1}{z}, xmm2,
xmm3/m64{er}
B V/V AVX512F Multiply scalar double-precision floating-point value
from xmm2 and xmm3/m64, add to xmm1 and put
result in xmm1.
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFMADD132SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(DEST[63:0]*SRC3[63:0] + SRC2[63:0])
    ELSE
        IF *merging-masking*    ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE                ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFMADD213SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(SRC2[63:0]*DEST[63:0] + SRC3[63:0])
    ELSE
        IF *merging-masking*    ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE                ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFMADD231SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(SRC2[63:0]*SRC3[63:0] + DEST[63:0])
    ELSE
        IF *merging-masking*    ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE                ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFMADD132SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(DEST[63:0]*SRC3[63:0] + SRC2[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0

VFMADD213SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*DEST[63:0] + SRC3[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0

VFMADD231SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*SRC3[63:0] + DEST[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMADDxxxSD __m128d _mm_fmadd_round_sd(__m128d a, __m128d b, __m128d c, int r);
VFMADDxxxSD __m128d _mm_mask_fmadd_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFMADDxxxSD __m128d _mm_maskz_fmadd_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFMADDxxxSD __m128d _mm_mask3_fmadd_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFMADDxxxSD __m128d _mm_mask_fmadd_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int r);
VFMADDxxxSD __m128d _mm_maskz_fmadd_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int r);
VFMADDxxxSD __m128d _mm_mask3_fmadd_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int r);
VFMADDxxxSD __m128d _mm_fmadd_sd (__m128d a, __m128d b, __m128d c);
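A usage sketch for the scalar double-precision form, assuming a toolchain with FMA support (e.g., -mfma); the
horner_step name is illustrative only:

#include <immintrin.h>

/* One Horner step: acc = acc*x + coeff, computed with a single rounding.
   Only the low element is fused-multiply-added; bits 127:64 of the
   result come from acc. */
__m128d horner_step(__m128d acc, __m128d x, __m128d coeff) {
    return _mm_fmadd_sd(acc, x, coeff);
}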
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 3.
EVEX-encoded instructions, see Exceptions Type E3.
VFMADD132SS/VFMADD213SS/VFMADD231SS—Fused Multiply-Add of Scalar Single-Precision
Floating-Point Values
Instruction Operand Encoding

Op/En  Tuple Type     Operand 1         Operand 2      Operand 3      Operand 4
A      NA             ModRM:reg (r, w)  VEX.vvvv (r)   ModRM:r/m (r)  NA
B      Tuple1 Scalar  ModRM:reg (r, w)  EVEX.vvvv (r)  ModRM:r/m (r)  NA

Description
Performs a SIMD multiply-add computation on single-precision floating-point values using three source operands
and writes the multiply-add results in the destination operand. The destination operand is also the first source
operand. The first and second operands are XMM registers. The third source operand can be an XMM register or a
32-bit memory location.
VFMADD132SS: Multiplies the low single-precision floating-point value from the first source operand to the low
single-precision floating-point value in the third source operand, adds the infinite precision intermediate result to
the low single-precision floating-point value in the second source operand, performs rounding and stores the
resulting single-precision floating-point value to the destination operand (first source operand).
VFMADD213SS: Multiplies the low single-precision floating-point value from the second source operand to the low
single-precision floating-point value in the first source operand, adds the infinite precision intermediate result to
the low single-precision floating-point value in the third source operand, performs rounding and stores the
resulting single-precision floating-point value to the destination operand (first source operand).
VFMADD231SS: Multiplies the low single-precision floating-point value from the second source operand to the low
single-precision floating-point value in the third source operand, adds the infinite precision intermediate result to
the low single-precision floating-point value in the first source operand, performs rounding and stores the resulting
single-precision floating-point value to the destination operand (first source operand).
VEX.128 and EVEX encoded version: The destination operand (also first source operand) is encoded in reg_field.
The second source operand is encoded in VEX.vvvv/EVEX.vvvv. The third source operand is encoded in rm_field.
Bits 127:32 of the destination are unchanged. Bits MAXVL-1:128 of the destination register are zeroed.
Opcode/Instruction    Op/En    64/32 bit Mode Support    CPUID Feature Flag    Description
VEX.LIG.66.0F38.W0 99 /r
VFMADD132SS xmm1, xmm2,
xmm3/m32
A V/V FMA Multiply scalar single-precision floating-point value
from xmm1 and xmm3/m32, add to xmm2 and put
result in xmm1.
VEX.LIG.66.0F38.W0 A9 /r
VFMADD213SS xmm1, xmm2,
xmm3/m32
A V/V FMA Multiply scalar single-precision floating-point value
from xmm1 and xmm2, add to xmm3/m32 and put
result in xmm1.
VEX.LIG.66.0F38.W0 B9 /r
VFMADD231SS xmm1, xmm2,
xmm3/m32
A V/V FMA Multiply scalar single-precision floating-point value
from xmm2 and xmm3/m32, add to xmm1 and put
result in xmm1.
EVEX.LIG.66.0F38.W0 99 /r
VFMADD132SS xmm1 {k1}{z}, xmm2,
xmm3/m32{er}
B V/V AVX512F Multiply scalar single-precision floating-point value
from xmm1 and xmm3/m32, add to xmm2 and put
result in xmm1.
EVEX.LIG.66.0F38.W0 A9 /r
VFMADD213SS xmm1 {k1}{z}, xmm2,
xmm3/m32{er}
B V/V AVX512F Multiply scalar single-precision floating-point value
from xmm1 and xmm2, add to xmm3/m32 and put
result in xmm1.
EVEX.LIG.66.0F38.W0 B9 /r
VFMADD231SS xmm1 {k1}{z}, xmm2,
xmm3/m32{er}
B V/V AVX512F Multiply scalar single-precision floating-point value
from xmm2 and xmm3/m32, add to xmm1 and put
result in xmm1.
EVEX encoded version: The low doubleword element of the destination is updated according to the writemask.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the
opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations
involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction
column.
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFMADD132SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(DEST[31:0]*SRC3[31:0] + SRC2[31:0])
    ELSE
        IF *merging-masking*    ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE                ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFMADD213SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(SRC2[31:0]*DEST[31:0] + SRC3[31:0])
    ELSE
        IF *merging-masking*    ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE                ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFMADD231SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(SRC2[31:0]*SRC3[31:0] + DEST[31:0])
    ELSE
        IF *merging-masking*    ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE                ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFMADD132SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(DEST[31:0]*SRC3[31:0] + SRC2[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0

VFMADD213SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(SRC2[31:0]*DEST[31:0] + SRC3[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0

VFMADD231SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(SRC2[31:0]*SRC3[31:0] + DEST[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMADDxxxSS __m128 _mm_fmadd_round_ss(__m128 a, __m128 b, __m128 c, int r);
VFMADDxxxSS __m128 _mm_mask_fmadd_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFMADDxxxSS __m128 _mm_maskz_fmadd_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFMADDxxxSS __m128 _mm_mask3_fmadd_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFMADDxxxSS __m128 _mm_mask_fmadd_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int r);
VFMADDxxxSS __m128 _mm_maskz_fmadd_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int r);
VFMADDxxxSS __m128 _mm_mask3_fmadd_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int r);
VFMADDxxxSS __m128 _mm_fmadd_ss (__m128 a, __m128 b, __m128 c);
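A usage sketch paralleling the scalar double-precision form, assuming FMA support; the function name is
illustrative only:

#include <immintrin.h>

/* Low-element single-precision FMA: dst[31:0] = a[0]*b[0] + c[0],
   with bits 127:32 of the result taken from a. */
__m128 fmadd_low(__m128 a, __m128 b, __m128 c) {
    return _mm_fmadd_ss(a, b, c);
}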
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 3.
EVEX-encoded instructions, see Exceptions Type E3.
VFMADDSUB132PD/VFMADDSUB213PD/VFMADDSUB231PD—Fused Multiply-Alternating
Add/Subtract of Packed Double-Precision Floating-Point Values
Opcode/Instruction    Op/En    64/32 bit Mode Support    CPUID Feature Flag    Description
VEX.128.66.0F38.W1 96 /r
VFMADDSUB132PD xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed double-precision floating-point values
from xmm1 and xmm3/mem, add/subtract elements in
xmm2 and put result in xmm1.
VEX.128.66.0F38.W1 A6 /r
VFMADDSUB213PD xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed double-precision floating-point values
from xmm1 and xmm2, add/subtract elements in
xmm3/mem and put result in xmm1.
VEX.128.66.0F38.W1 B6 /r
VFMADDSUB231PD xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed double-precision floating-point values
from xmm2 and xmm3/mem, add/subtract elements in
xmm1 and put result in xmm1.
VEX.256.66.0F38.W1 96 /r
VFMADDSUB132PD ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed double-precision floating-point values
from ymm1 and ymm3/mem, add/subtract elements in
ymm2 and put result in ymm1.
VEX.256.66.0F38.W1 A6 /r
VFMADDSUB213PD ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed double-precision floating-point values
from ymm1 and ymm2, add/subtract elements in
ymm3/mem and put result in ymm1.
VEX.256.66.0F38.W1 B6 /r
VFMADDSUB231PD ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed double-precision floating-point values
from ymm2 and ymm3/mem, add/subtract elements in
ymm1 and put result in ymm1.
EVEX.128.66.0F38.W1 A6 /r
VFMADDSUB213PD xmm1 {k1}{z},
xmm2, xmm3/m128/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from xmm1 and xmm2, add/subtract elements in
xmm3/m128/m64bcst and put result in xmm1 subject to
writemask k1.
EVEX.128.66.0F38.W1 B6 /r
VFMADDSUB231PD xmm1 {k1}{z},
xmm2, xmm3/m128/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from xmm2 and xmm3/m128/m64bcst, add/subtract
elements in xmm1 and put result in xmm1 subject to
writemask k1.
EVEX.128.66.0F38.W1 96 /r
VFMADDSUB132PD xmm1 {k1}{z},
xmm2, xmm3/m128/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from xmm1 and xmm3/m128/m64bcst, add/subtract
elements in xmm2 and put result in xmm1 subject to
writemask k1.
EVEX.256.66.0F38.W1 A6 /r
VFMADDSUB213PD ymm1 {k1}{z},
ymm2, ymm3/m256/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from ymm1 and ymm2, add/subtract elements in
ymm3/m256/m64bcst and put result in ymm1 subject to
writemask k1.
EVEX.256.66.0F38.W1 B6 /r
VFMADDSUB231PD ymm1 {k1}{z},
ymm2, ymm3/m256/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from ymm2 and ymm3/m256/m64bcst, add/subtract
elements in ymm1 and put result in ymm1 subject to
writemask k1.
EVEX.256.66.0F38.W1 96 /r
VFMADDSUB132PD ymm1 {k1}{z},
ymm2, ymm3/m256/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from ymm1 and ymm3/m256/m64bcst, add/subtract
elements in ymm2 and put result in ymm1 subject to
writemask k1.
EVEX.512.66.0F38.W1 A6 /r
VFMADDSUB213PD zmm1 {k1}{z},
zmm2, zmm3/m512/m64bcst{er}
B V/V AVX512F Multiply packed double-precision floating-point values
from zmm1 and zmm2, add/subtract elements in
zmm3/m512/m64bcst and put result in zmm1 subject to
writemask k1.
EVEX.512.66.0F38.W1 B6 /r
VFMADDSUB231PD zmm1 {k1}{z},
zmm2, zmm3/m512/m64bcst{er}
B V/V AVX512F Multiply packed double-precision floating-point values
from zmm2 and zmm3/m512/m64bcst, add/subtract
elements in zmm1 and put result in zmm1 subject to
writemask k1.
EVEX.512.66.0F38.W1 96 /r
VFMADDSUB132PD zmm1 {k1}{z},
zmm2, zmm3/m512/m64bcst{er}
B V/V AVX512F Multiply packed double-precision floating-point values
from zmm1 and zmm3/m512/m64bcst, add/subtract
elements in zmm2 and put result in zmm1 subject to
writemask k1.
Instruction Operand Encoding

Op/En  Tuple Type  Operand 1         Operand 2      Operand 3      Operand 4
A      NA          ModRM:reg (r, w)  VEX.vvvv (r)   ModRM:r/m (r)  NA
B      Full        ModRM:reg (r, w)  EVEX.vvvv (r)  ModRM:r/m (r)  NA

Description
VFMADDSUB132PD: Multiplies the two, four, or eight packed double-precision floating-point values from the first
source operand to the two, four, or eight packed double-precision floating-point values in the third source
operand. From the infinite precision intermediate result, adds the odd double-precision floating-point elements
and subtracts the even double-precision floating-point values in the second source operand, performs rounding
and stores the resulting two, four, or eight packed double-precision floating-point values to the destination
operand (first source operand).

VFMADDSUB213PD: Multiplies the two, four, or eight packed double-precision floating-point values from the
second source operand to the two, four, or eight packed double-precision floating-point values in the first source
operand. From the infinite precision intermediate result, adds the odd double-precision floating-point elements
and subtracts the even double-precision floating-point values in the third source operand, performs rounding and
stores the resulting two, four, or eight packed double-precision floating-point values to the destination operand
(first source operand).

VFMADDSUB231PD: Multiplies the two, four, or eight packed double-precision floating-point values from the
second source operand to the two, four, or eight packed double-precision floating-point values in the third source
operand. From the infinite precision intermediate result, adds the odd double-precision floating-point elements
and subtracts the even double-precision floating-point values in the first source operand, performs rounding and
stores the resulting two, four, or eight packed double-precision floating-point values to the destination operand
(first source operand).
EVEX encoded versions: The destination operand (also first source operand) and the second source operand are
ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory
location or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is
conditionally updated with write mask k1.
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also first source operand) is an XMM register and encoded in
reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an
XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination
register are zeroed.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the
opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations
involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction
column.
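The alternating subtract/add pattern is what makes these instructions useful for complex arithmetic. A minimal
sketch of one double-precision complex multiply (real part in element 0, imaginary part in element 1) using the
_mm_fmaddsub_pd intrinsic listed later in this section, assuming FMA support; the function name is illustrative,
and only the ar*b product is fused with single rounding:

#include <immintrin.h>

/* (ar + i*ai) * (br + i*bi):
     real = ar*br - ai*bi   (even element: subtract)
     imag = ar*bi + ai*br   (odd element: add) */
__m128d complex_mul(__m128d a, __m128d b) {
    __m128d ar = _mm_unpacklo_pd(a, a);        /* {ar, ar} */
    __m128d ai = _mm_unpackhi_pd(a, a);        /* {ai, ai} */
    __m128d bswap = _mm_shuffle_pd(b, b, 0x1); /* {bi, br} */
    /* ar*{br,bi} -/+ ai*{bi,br}: element 0 subtracts, element 1 adds. */
    return _mm_fmaddsub_pd(ar, b, _mm_mul_pd(ai, bswap));
}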
Operation
In the operations below, “*” and “-” symbols represent multiplication and subtraction with infinite precision inputs and outputs (no
rounding).
VFMADDSUB132PD DEST, SRC2, SRC3
IF (VEX.128) THEN
    DEST[63:0] ← RoundFPControl_MXCSR(DEST[63:0]*SRC3[63:0] - SRC2[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(DEST[127:64]*SRC3[127:64] + SRC2[127:64])
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[63:0] ← RoundFPControl_MXCSR(DEST[63:0]*SRC3[63:0] - SRC2[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(DEST[127:64]*SRC3[127:64] + SRC2[127:64])
    DEST[191:128] ← RoundFPControl_MXCSR(DEST[191:128]*SRC3[191:128] - SRC2[191:128])
    DEST[255:192] ← RoundFPControl_MXCSR(DEST[255:192]*SRC3[255:192] + SRC2[255:192])
FI

VFMADDSUB213PD DEST, SRC2, SRC3
IF (VEX.128) THEN
    DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*DEST[63:0] - SRC3[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(SRC2[127:64]*DEST[127:64] + SRC3[127:64])
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*DEST[63:0] - SRC3[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(SRC2[127:64]*DEST[127:64] + SRC3[127:64])
    DEST[191:128] ← RoundFPControl_MXCSR(SRC2[191:128]*DEST[191:128] - SRC3[191:128])
    DEST[255:192] ← RoundFPControl_MXCSR(SRC2[255:192]*DEST[255:192] + SRC3[255:192])
FI

VFMADDSUB231PD DEST, SRC2, SRC3
IF (VEX.128) THEN
    DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*SRC3[63:0] - DEST[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(SRC2[127:64]*SRC3[127:64] + DEST[127:64])
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*SRC3[63:0] - DEST[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(SRC2[127:64]*SRC3[127:64] + DEST[127:64])
    DEST[191:128] ← RoundFPControl_MXCSR(SRC2[191:128]*SRC3[191:128] - DEST[191:128])
    DEST[255:192] ← RoundFPControl_MXCSR(SRC2[255:192]*SRC3[255:192] + DEST[255:192])
FI
VFMADDSUB132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+63:i] ←
                    RoundFPControl(DEST[i+63:i]*SRC3[i+63:i] - SRC2[i+63:i])
                ELSE DEST[i+63:i] ←
                    RoundFPControl(DEST[i+63:i]*SRC3[i+63:i] + SRC2[i+63:i])
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[63:0] - SRC2[i+63:i])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[i+63:i] - SRC2[i+63:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[63:0] + SRC2[i+63:i])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[i+63:i] + SRC2[i+63:i])
                    FI;
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+63:i] ←
                    RoundFPControl(SRC2[i+63:i]*DEST[i+63:i] - SRC3[i+63:i])
                ELSE DEST[i+63:i] ←
                    RoundFPControl(SRC2[i+63:i]*DEST[i+63:i] + SRC3[i+63:i])
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] - SRC3[63:0])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] - SRC3[i+63:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] + SRC3[63:0])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] + SRC3[i+63:i])
                    FI;
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+63:i] ←
                    RoundFPControl(SRC2[i+63:i]*SRC3[i+63:i] - DEST[i+63:i])
                ELSE DEST[i+63:i] ←
                    RoundFPControl(SRC2[i+63:i]*SRC3[i+63:i] + DEST[i+63:i])
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[63:0] - DEST[i+63:i])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[i+63:i] - DEST[i+63:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[63:0] + DEST[i+63:i])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[i+63:i] + DEST[i+63:i])
                    FI;
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMADDSUBxxxPD __m512d _mm512_fmaddsub_pd(__m512d a, __m512d b, __m512d c);
VFMADDSUBxxxPD __m512d _mm512_fmaddsub_round_pd(__m512d a, __m512d b, __m512d c, int r);
VFMADDSUBxxxPD __m512d _mm512_mask_fmaddsub_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
VFMADDSUBxxxPD __m512d _mm512_maskz_fmaddsub_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
VFMADDSUBxxxPD __m512d _mm512_mask3_fmaddsub_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
VFMADDSUBxxxPD __m512d _mm512_mask_fmaddsub_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int r);
VFMADDSUBxxxPD __m512d _mm512_maskz_fmaddsub_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int r);
VFMADDSUBxxxPD __m512d _mm512_mask3_fmaddsub_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int r);
VFMADDSUBxxxPD __m256d _mm256_mask_fmaddsub_pd(__m256d a, __mmask8 k, __m256d b, __m256d c);
VFMADDSUBxxxPD __m256d _mm256_maskz_fmaddsub_pd(__mmask8 k, __m256d a, __m256d b, __m256d c);
VFMADDSUBxxxPD __m256d _mm256_mask3_fmaddsub_pd(__m256d a, __m256d b, __m256d c, __mmask8 k);
VFMADDSUBxxxPD __m128d _mm_mask_fmaddsub_pd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFMADDSUBxxxPD __m128d _mm_maskz_fmaddsub_pd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFMADDSUBxxxPD __m128d _mm_mask3_fmaddsub_pd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFMADDSUBxxxPD __m128d _mm_fmaddsub_pd (__m128d a, __m128d b, __m128d c);
VFMADDSUBxxxPD __m256d _mm256_fmaddsub_pd (__m256d a, __m256d b, __m256d c);
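A short sketch of the masked 512-bit form above, assuming AVX-512F; the 0x0F mask value and function name are
illustrative only:

#include <immintrin.h>

/* Lanes with a 0 mask bit keep the value from a; lanes with a 1 bit get
   a[i]*b[i] - c[i] for even i, or a[i]*b[i] + c[i] for odd i. */
__m512d fmaddsub_lo(__m512d a, __m512d b, __m512d c) {
    return _mm512_mask_fmaddsub_pd(a, (__mmask8)0x0F, b, c);
}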
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFMADDSUB132PS/VFMADDSUB213PS/VFMADDSUB231PS—Fused Multiply-Alternating
Add/Subtract of Packed Single-Precision Floating-Point Values
Opcode/Instruction    Op/En    64/32 bit Mode Support    CPUID Feature Flag    Description
VEX.128.66.0F38.W0 96 /r
VFMADDSUB132PS xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed single-precision floating-point values from
xmm1 and xmm3/mem, add/subtract elements in xmm2
and put result in xmm1.
VEX.128.66.0F38.W0 A6 /r
VFMADDSUB213PS xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed single-precision floating-point values from
xmm1 and xmm2, add/subtract elements in xmm3/mem
and put result in xmm1.
VEX.128.66.0F38.W0 B6 /r
VFMADDSUB231PS xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed single-precision floating-point values from
xmm2 and xmm3/mem, add/subtract elements in xmm1
and put result in xmm1.
VEX.256.66.0F38.W0 96 /r
VFMADDSUB132PS ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed single-precision floating-point values from
ymm1 and ymm3/mem, add/subtract elements in ymm2
and put result in ymm1.
VEX.256.66.0F38.W0 A6 /r
VFMADDSUB213PS ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed single-precision floating-point values from
ymm1 and ymm2, add/subtract elements in ymm3/mem
and put result in ymm1.
VEX.256.66.0F38.W0 B6 /r
VFMADDSUB231PS ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed single-precision floating-point values from
ymm2 and ymm3/mem, add/subtract elements in ymm1
and put result in ymm1.
EVEX.128.66.0F38.W0 A6 /r
VFMADDSUB213PS xmm1 {k1}{z},
xmm2, xmm3/m128/m32bcst
B V/V AVX512VL AVX512F
Multiply packed single-precision floating-point values from
xmm1 and xmm2, add/subtract elements in
xmm3/m128/m32bcst and put result in xmm1 subject to
writemask k1.
EVEX.128.66.0F38.W0 B6 /r
VFMADDSUB231PS xmm1 {k1}{z},
xmm2, xmm3/m128/m32bcst
B V/V AVX512VL AVX512F
Multiply packed single-precision floating-point values from
xmm2 and xmm3/m128/m32bcst, add/subtract elements
in xmm1 and put result in xmm1 subject to writemask k1.
EVEX.128.66.0F38.W0 96 /r
VFMADDSUB132PS xmm1 {k1}{z},
xmm2, xmm3/m128/m32bcst
B V/V AVX512VL AVX512F
Multiply packed single-precision floating-point values from
xmm1 and xmm3/m128/m32bcst, add/subtract elements
in xmm2 and put result in xmm1 subject to writemask k1.
EVEX.256.66.0F38.W0 A6 /r
VFMADDSUB213PS ymm1 {k1}{z},
ymm2, ymm3/m256/m32bcst
B V/V AVX512VL AVX512F
Multiply packed single-precision floating-point values from
ymm1 and ymm2, add/subtract elements in
ymm3/m256/m32bcst and put result in ymm1 subject to
writemask k1.
EVEX.256.66.0F38.W0 B6 /r
VFMADDSUB231PS ymm1 {k1}{z},
ymm2, ymm3/m256/m32bcst
B V/V AVX512VL AVX512F
Multiply packed single-precision floating-point values from
ymm2 and ymm3/m256/m32bcst, add/subtract elements
in ymm1 and put result in ymm1 subject to writemask k1.
EVEX.256.66.0F38.W0 96 /r
VFMADDSUB132PS ymm1 {k1}{z},
ymm2, ymm3/m256/m32bcst
B V/V AVX512VL AVX512F
Multiply packed single-precision floating-point values from
ymm1 and ymm3/m256/m32bcst, add/subtract elements
in ymm2 and put result in ymm1 subject to writemask k1.
EVEX.512.66.0F38.W0 A6 /r
VFMADDSUB213PS zmm1 {k1}{z},
zmm2, zmm3/m512/m32bcst{er}
B V/V AVX512F Multiply packed single-precision floating-point values from
zmm1 and zmm2, add/subtract elements in
zmm3/m512/m32bcst and put result in zmm1 subject to
writemask k1.
EVEX.512.66.0F38.W0 B6 /r
VFMADDSUB231PS zmm1 {k1}{z},
zmm2, zmm3/m512/m32bcst{er}
B V/V AVX512F Multiply packed single-precision floating-point values from
zmm2 and zmm3/m512/m32bcst, add/subtract elements
in zmm1 and put result in zmm1 subject to writemask k1.
EVEX.512.66.0F38.W0 96 /r
VFMADDSUB132PS zmm1 {k1}{z},
zmm2, zmm3/m512/m32bcst{er}
B V/V AVX512F Multiply packed single-precision floating-point values from
zmm1 and zmm3/m512/m32bcst, add/subtract elements
in zmm2 and put result in zmm1 subject to writemask k1.
Instruction Operand Encoding

Op/En  Tuple Type  Operand 1         Operand 2      Operand 3      Operand 4
A      NA          ModRM:reg (r, w)  VEX.vvvv (r)   ModRM:r/m (r)  NA
B      Full        ModRM:reg (r, w)  EVEX.vvvv (r)  ModRM:r/m (r)  NA

Description
VFMADDSUB132PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the first
source operand to the corresponding packed single-precision floating-point values in the third source operand.
From the infinite precision intermediate result, adds the odd single-precision floating-point elements and subtracts
the even single-precision floating-point values in the second source operand, performs rounding and stores the
resulting packed single-precision floating-point values to the destination operand (first source operand).
VFMADDSUB213PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the
second source operand to the corresponding packed single-precision floating-point values in the first source
operand. From the infinite precision intermediate result, adds the odd single-precision floating-point elements and
subtracts the even single-precision floating-point values in the third source operand, performs rounding and stores
the resulting packed single-precision floating-point values to the destination operand (first source operand).
VFMADDSUB231PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the
second source operand to the corresponding packed single-precision floating-point values in the third source
operand. From the infinite precision intermediate result, adds the odd single-precision floating-point elements and
subtracts the even single-precision floating-point values in the first source operand, performs rounding and stores
the resulting packed single-precision floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also first source operand) and the second source operand are
ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory
location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is
conditionally updated with write mask k1.
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also first source operand) is an XMM register and encoded in
reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an
XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination
register are zeroed.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the
opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations
involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction
column.
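A minimal single-precision sketch of the alternating pattern, using the intrinsic listed later in this section and
assuming FMA support; the function name is illustrative only:

#include <immintrin.h>

/* dst[i] = a[i]*b[i] - c[i] for even i, a[i]*b[i] + c[i] for odd i. */
__m256 fmaddsub8(__m256 a, __m256 b, __m256 c) {
    return _mm256_fmaddsub_ps(a, b, c);
}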
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFMADDSUB132PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(DEST[n+31:n]*SRC3[n+31:n] - SRC2[n+31:n])
    DEST[n+63:n+32] ← RoundFPControl_MXCSR(DEST[n+63:n+32]*SRC3[n+63:n+32] + SRC2[n+63:n+32])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFMADDSUB213PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(SRC2[n+31:n]*DEST[n+31:n] - SRC3[n+31:n])
    DEST[n+63:n+32] ← RoundFPControl_MXCSR(SRC2[n+63:n+32]*DEST[n+63:n+32] + SRC3[n+63:n+32])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFMADDSUB231PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(SRC2[n+31:n]*SRC3[n+31:n] - DEST[n+31:n])
    DEST[n+63:n+32] ← RoundFPControl_MXCSR(SRC2[n+63:n+32]*SRC3[n+63:n+32] + DEST[n+63:n+32])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFMADDSUB132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+31:i] ←
                    RoundFPControl(DEST[i+31:i]*SRC3[i+31:i] - SRC2[i+31:i])
                ELSE DEST[i+31:i] ←
                    RoundFPControl(DEST[i+31:i]*SRC3[i+31:i] + SRC2[i+31:i])
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[31:0] - SRC2[i+31:i])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[i+31:i] - SRC2[i+31:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[31:0] + SRC2[i+31:i])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[i+31:i] + SRC2[i+31:i])
                    FI;
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+31:i] ←
                    RoundFPControl(SRC2[i+31:i]*DEST[i+31:i] - SRC3[i+31:i])
                ELSE DEST[i+31:i] ←
                    RoundFPControl(SRC2[i+31:i]*DEST[i+31:i] + SRC3[i+31:i])
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] - SRC3[31:0])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] - SRC3[i+31:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] + SRC3[31:0])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] + SRC3[i+31:i])
                    FI;
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
THEN
SET_RM(EVEX.RC);
ELSE
SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+31:i] ←
                    RoundFPControl(SRC2[i+31:i]*SRC3[i+31:i] - DEST[i+31:i])
                ELSE DEST[i+31:i] ←
                    RoundFPControl(SRC2[i+31:i]*SRC3[i+31:i] + DEST[i+31:i])
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMADDSUB231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[31:0] - DEST[i+31:i])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[i+31:i] - DEST[i+31:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[31:0] + DEST[i+31:i])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[i+31:i] + DEST[i+31:i])
                    FI;
            FI
        ELSE
            IF *merging-masking*    ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE                ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMADDSUBxxxPS __m512 _mm512_fmaddsub_ps(__m512 a, __m512 b, __m512 c);
VFMADDSUBxxxPS __m512 _mm512_fmaddsub_round_ps(__m512 a, __m512 b, __m512 c, int r);
VFMADDSUBxxxPS __m512 _mm512_mask_fmaddsub_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
VFMADDSUBxxxPS __m512 _mm512_maskz_fmaddsub_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
VFMADDSUBxxxPS __m512 _mm512_mask3_fmaddsub_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
VFMADDSUBxxxPS __m512 _mm512_mask_fmaddsub_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int r);
VFMADDSUBxxxPS __m512 _mm512_maskz_fmaddsub_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int r);
VFMADDSUBxxxPS __m512 _mm512_mask3_fmaddsub_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int r);
VFMADDSUBxxxPS __m256 _mm256_mask_fmaddsub_ps(__m256 a, __mmask8 k, __m256 b, __m256 c);
VFMADDSUBxxxPS __m256 _mm256_maskz_fmaddsub_ps(__mmask8 k, __m256 a, __m256 b, __m256 c);
VFMADDSUBxxxPS __m256 _mm256_mask3_fmaddsub_ps(__m256 a, __m256 b, __m256 c, __mmask8 k);
VFMADDSUBxxxPS __m128 _mm_mask_fmaddsub_ps(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFMADDSUBxxxPS __m128 _mm_maskz_fmaddsub_ps(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFMADDSUBxxxPS __m128 _mm_mask3_fmaddsub_ps(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFMADDSUBxxxPS __m128 _mm_fmaddsub_ps (__m128 a, __m128 b, __m128 c);
VFMADDSUBxxxPS __m256 _mm256_fmaddsub_ps (__m256 a, __m256 b, __m256 c);
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFMSUBADD132PD/VFMSUBADD213PD/VFMSUBADD231PD—Fused Multiply-Alternating
Subtract/Add of Packed Double-Precision Floating-Point Values
Opcode/Instruction    Op/En    64/32 bit Mode Support    CPUID Feature Flag    Description
VEX.128.66.0F38.W1 97 /r
VFMSUBADD132PD xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed double-precision floating-point values
from xmm1 and xmm3/mem, subtract/add elements
in xmm2 and put result in xmm1.
VEX.128.66.0F38.W1 A7 /r
VFMSUBADD213PD xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed double-precision floating-point values
from xmm1 and xmm2, subtract/add elements in
xmm3/mem and put result in xmm1.
VEX.128.66.0F38.W1 B7 /r
VFMSUBADD231PD xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed double-precision floating-point values
from xmm2 and xmm3/mem, subtract/add elements
in xmm1 and put result in xmm1.
VEX.256.66.0F38.W1 97 /r
VFMSUBADD132PD ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed double-precision floating-point values
from ymm1 and ymm3/mem, subtract/add elements
in ymm2 and put result in ymm1.
VEX.256.66.0F38.W1 A7 /r
VFMSUBADD213PD ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed double-precision floating-point values
from ymm1 and ymm2, subtract/add elements in
ymm3/mem and put result in ymm1.
VEX.256.66.0F38.W1 B7 /r
VFMSUBADD231PD ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed double-precision floating-point values
from ymm2 and ymm3/mem, subtract/add elements
in ymm1 and put result in ymm1.
EVEX.128.66.0F38.W1 97 /r
VFMSUBADD132PD xmm1 {k1}{z},
xmm2, xmm3/m128/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from xmm1 and xmm3/m128/m64bcst, subtract/add
elements in xmm2 and put result in xmm1 subject to
writemask k1.
EVEX.128.66.0F38.W1 A7 /r
VFMSUBADD213PD xmm1 {k1}{z},
xmm2, xmm3/m128/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from xmm1 and xmm2, subtract/add elements in
xmm3/m128/m64bcst and put result in xmm1
subject to writemask k1.
EVEX.128.66.0F38.W1 B7 /r
VFMSUBADD231PD xmm1 {k1}{z},
xmm2, xmm3/m128/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from xmm2 and xmm3/m128/m64bcst, subtract/add
elements in xmm1 and put result in xmm1 subject to
writemask k1.
EVEX.256.66.0F38.W1 97 /r
VFMSUBADD132PD ymm1 {k1}{z},
ymm2, ymm3/m256/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from ymm1 and ymm3/m256/m64bcst, subtract/add
elements in ymm2 and put result in ymm1 subject to
writemask k1.
EVEX.256.66.0F38.W1 A7 /r
VFMSUBADD213PD ymm1 {k1}{z},
ymm2, ymm3/m256/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from ymm1 and ymm2, subtract/add elements in
ymm3/m256/m64bcst and put result in ymm1
subject to writemask k1.
EVEX.256.66.0F38.W1 B7 /r
VFMSUBADD231PD ymm1 {k1}{z},
ymm2, ymm3/m256/m64bcst
B V/V AVX512VL AVX512F
Multiply packed double-precision floating-point values
from ymm2 and ymm3/m256/m64bcst, subtract/add
elements in ymm1 and put result in ymm1 subject to
writemask k1.
EVEX.512.66.0F38.W1 97 /r
VFMSUBADD132PD zmm1 {k1}{z},
zmm2, zmm3/m512/m64bcst{er}
B V/V AVX512F Multiply packed double-precision floating-point values
from zmm1 and zmm3/m512/m64bcst, subtract/add
elements in zmm2 and put result in zmm1 subject to
writemask k1.
EVEX.512.66.0F38.W1 A7 /r
VFMSUBADD213PD zmm1 {k1}{z},
zmm2, zmm3/m512/m64bcst{er}
B V/V AVX512F Multiply packed double-precision floating-point values
from zmm1 and zmm2, subtract/add elements in
zmm3/m512/m64bcst and put result in zmm1 subject
to writemask k1.
EVEX.512.66.0F38.W1 B7 /r
VFMSUBADD231PD zmm1 {k1}{z},
zmm2, zmm3/m512/m64bcst{er}
B V/V AVX512F Multiply packed double-precision floating-point values
from zmm2 and zmm3/m512/m64bcst, subtract/add
elements in zmm1 and put result in zmm1 subject to
writemask k1.
Instruction Operand Encoding

Op/En  Tuple Type  Operand 1         Operand 2      Operand 3      Operand 4
A      NA          ModRM:reg (r, w)  VEX.vvvv (r)   ModRM:r/m (r)  NA
B      Full        ModRM:reg (r, w)  EVEX.vvvv (r)  ModRM:r/m (r)  NA

Description
VFMSUBADD132PD: Multiplies the two, four, or eight packed double-precision floating-point values from the first source operand by the two, four, or eight packed double-precision floating-point values in the third source operand. From the infinite-precision intermediate result, subtracts the odd double-precision floating-point elements and adds the even double-precision floating-point elements of the second source operand, performs rounding, and stores the resulting two, four, or eight packed double-precision floating-point values to the destination operand (first source operand).
VFMSUBADD213PD: Multiplies the two, four, or eight packed double-precision floating-point values from the second source operand by the two, four, or eight packed double-precision floating-point values in the first source operand. From the infinite-precision intermediate result, subtracts the odd double-precision floating-point elements and adds the even double-precision floating-point elements of the third source operand, performs rounding, and stores the resulting two, four, or eight packed double-precision floating-point values to the destination operand (first source operand).
VFMSUBADD231PD: Multiplies the two, four, or eight packed double-precision floating-point values from the second source operand by the two, four, or eight packed double-precision floating-point values in the third source operand. From the infinite-precision intermediate result, subtracts the odd double-precision floating-point elements and adds the even double-precision floating-point elements of the first source operand, performs rounding, and stores the resulting two, four, or eight packed double-precision floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also the first source operand) and the second source operand are ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcast from a 64-bit memory location. The destination operand is conditionally updated with write mask k1.
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also the first source operand) is an XMM register and encoded in reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of a complementary mnemonic in situations involving NaNs is governed by the definition of the instruction mnemonic in the opcode/instruction column.
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFMSUBADD132PD DEST, SRC2, SRC3
IF (VEX.128) THEN
    DEST[63:0] ← RoundFPControl_MXCSR(DEST[63:0]*SRC3[63:0] + SRC2[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(DEST[127:64]*SRC3[127:64] - SRC2[127:64])
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[63:0] ← RoundFPControl_MXCSR(DEST[63:0]*SRC3[63:0] + SRC2[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(DEST[127:64]*SRC3[127:64] - SRC2[127:64])
    DEST[191:128] ← RoundFPControl_MXCSR(DEST[191:128]*SRC3[191:128] + SRC2[191:128])
    DEST[255:192] ← RoundFPControl_MXCSR(DEST[255:192]*SRC3[255:192] - SRC2[255:192])
FI

VFMSUBADD213PD DEST, SRC2, SRC3
IF (VEX.128) THEN
    DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*DEST[63:0] + SRC3[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(SRC2[127:64]*DEST[127:64] - SRC3[127:64])
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*DEST[63:0] + SRC3[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(SRC2[127:64]*DEST[127:64] - SRC3[127:64])
    DEST[191:128] ← RoundFPControl_MXCSR(SRC2[191:128]*DEST[191:128] + SRC3[191:128])
    DEST[255:192] ← RoundFPControl_MXCSR(SRC2[255:192]*DEST[255:192] - SRC3[255:192])
FI

VFMSUBADD231PD DEST, SRC2, SRC3
IF (VEX.128) THEN
    DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*SRC3[63:0] + DEST[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(SRC2[127:64]*SRC3[127:64] - DEST[127:64])
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*SRC3[63:0] + DEST[63:0])
    DEST[127:64] ← RoundFPControl_MXCSR(SRC2[127:64]*SRC3[127:64] - DEST[127:64])
    DEST[191:128] ← RoundFPControl_MXCSR(SRC2[191:128]*SRC3[191:128] + DEST[191:128])
    DEST[255:192] ← RoundFPControl_MXCSR(SRC2[255:192]*SRC3[255:192] - DEST[255:192])
FI
VFMSUBADD132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+63:i] ←
                    RoundFPControl(DEST[i+63:i]*SRC3[i+63:i] + SRC2[i+63:i])
                ELSE DEST[i+63:i] ←
                    RoundFPControl(DEST[i+63:i]*SRC3[i+63:i] - SRC2[i+63:i])
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[63:0] + SRC2[i+63:i])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[i+63:i] + SRC2[i+63:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[63:0] - SRC2[i+63:i])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[i+63:i] - SRC2[i+63:i])
                    FI;
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+63:i] ←
                    RoundFPControl(SRC2[i+63:i]*DEST[i+63:i] + SRC3[i+63:i])
                ELSE DEST[i+63:i] ←
                    RoundFPControl(SRC2[i+63:i]*DEST[i+63:i] - SRC3[i+63:i])
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] + SRC3[63:0])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] + SRC3[i+63:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] - SRC3[63:0])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] - SRC3[i+63:i])
                    FI;
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+63:i] ←
                    RoundFPControl(SRC2[i+63:i]*SRC3[i+63:i] + DEST[i+63:i])
                ELSE DEST[i+63:i] ←
                    RoundFPControl(SRC2[i+63:i]*SRC3[i+63:i] - DEST[i+63:i])
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[63:0] + DEST[i+63:i])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[i+63:i] + DEST[i+63:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[63:0] - DEST[i+63:i])
                        ELSE
                            DEST[i+63:i] ←
                                RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[i+63:i] - DEST[i+63:i])
                    FI;
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMSUBADDxxxPD __m512d _mm512_fmsubadd_pd(__m512d a, __m512d b, __m512d c);
VFMSUBADDxxxPD __m512d _mm512_fmsubadd_round_pd(__m512d a, __m512d b, __m512d c, int r);
VFMSUBADDxxxPD __m512d _mm512_mask_fmsubadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
VFMSUBADDxxxPD __m512d _mm512_maskz_fmsubadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
VFMSUBADDxxxPD __m512d _mm512_mask3_fmsubadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
VFMSUBADDxxxPD __m512d _mm512_mask_fmsubadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int r);
VFMSUBADDxxxPD __m512d _mm512_maskz_fmsubadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int r);
VFMSUBADDxxxPD __m512d _mm512_mask3_fmsubadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int r);
VFMSUBADDxxxPD __m256d _mm256_mask_fmsubadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c);
VFMSUBADDxxxPD __m256d _mm256_maskz_fmsubadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c);
VFMSUBADDxxxPD __m256d _mm256_mask3_fmsubadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k);
VFMSUBADDxxxPD __m128d _mm_mask_fmsubadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFMSUBADDxxxPD __m128d _mm_maskz_fmsubadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFMSUBADDxxxPD __m128d _mm_mask3_fmsubadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFMSUBADDxxxPD __m128d _mm_fmsubadd_pd (__m128d a, __m128d b, __m128d c);
VFMSUBADDxxxPD __m256d _mm256_fmsubadd_pd (__m256d a, __m256d b, __m256d c);
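For illustration only (this example is an editorial sketch, not part of the architectural specification), the following minimal C program exercises the 256-bit form through the _mm256_fmsubadd_pd intrinsic listed above. It assumes an FMA-capable processor and a compiler switch such as -mfma; the input values are arbitrary.

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    /* _mm256_setr_pd lists elements from index 0 (low) to index 3 (high). */
    __m256d a = _mm256_setr_pd(1.0, 2.0, 3.0, 4.0);
    __m256d b = _mm256_set1_pd(10.0);
    __m256d c = _mm256_set1_pd(5.0);
    /* Even-indexed elements compute a*b + c; odd-indexed elements compute a*b - c. */
    __m256d r = _mm256_fmsubadd_pd(a, b, c);
    double out[4];
    _mm256_storeu_pd(out, r);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]); /* 15 15 35 35 */
    return 0;
}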
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFMSUBADD132PS/VFMSUBADD213PS/VFMSUBADD231PS—Fused Multiply-Alternating
Subtract/Add of Packed Single-Precision Floating-Point Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 97 /r VFMSUBADD132PS xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, subtract/add elements in xmm2 and put result in xmm1.
VEX.128.66.0F38.W0 A7 /r VFMSUBADD213PS xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm1 and xmm2, subtract/add elements in xmm3/mem and put result in xmm1.
VEX.128.66.0F38.W0 B7 /r VFMSUBADD231PS xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, subtract/add elements in xmm1 and put result in xmm1.
VEX.256.66.0F38.W0 97 /r VFMSUBADD132PS ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, subtract/add elements in ymm2 and put result in ymm1.
VEX.256.66.0F38.W0 A7 /r VFMSUBADD213PS ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm1 and ymm2, subtract/add elements in ymm3/mem and put result in ymm1.
VEX.256.66.0F38.W0 B7 /r VFMSUBADD231PS ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, subtract/add elements in ymm1 and put result in ymm1.
EVEX.128.66.0F38.W0 97 /r VFMSUBADD132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, subtract/add elements in xmm2 and put result in xmm1 subject to writemask k1.
EVEX.128.66.0F38.W0 A7 /r VFMSUBADD213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from xmm1 and xmm2, subtract/add elements in xmm3/m128/m32bcst and put result in xmm1 subject to writemask k1.
EVEX.128.66.0F38.W0 B7 /r VFMSUBADD231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, subtract/add elements in xmm1 and put result in xmm1 subject to writemask k1.
EVEX.256.66.0F38.W0 97 /r VFMSUBADD132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, subtract/add elements in ymm2 and put result in ymm1 subject to writemask k1.
EVEX.256.66.0F38.W0 A7 /r VFMSUBADD213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from ymm1 and ymm2, subtract/add elements in ymm3/m256/m32bcst and put result in ymm1 subject to writemask k1.
EVEX.256.66.0F38.W0 B7 /r VFMSUBADD231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, subtract/add elements in ymm1 and put result in ymm1 subject to writemask k1.
EVEX.512.66.0F38.W0 97 /r VFMSUBADD132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | B | V/V | AVX512F | Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, subtract/add elements in zmm2 and put result in zmm1 subject to writemask k1.
EVEX.512.66.0F38.W0 A7 /r VFMSUBADD213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | B | V/V | AVX512F | Multiply packed single-precision floating-point values from zmm1 and zmm2, subtract/add elements in zmm3/m512/m32bcst and put result in zmm1 subject to writemask k1.
EVEX.512.66.0F38.W0 B7 /r VFMSUBADD231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | B | V/V | AVX512F | Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, subtract/add elements in zmm1 and put result in zmm1 subject to writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
VFMSUBADD132PS: Multiplies the four, eight, or sixteen packed single-precision floating-point values from the first source operand by the corresponding packed single-precision floating-point values in the third source operand. From the infinite-precision intermediate result, subtracts the odd single-precision floating-point elements and adds the even single-precision floating-point elements of the second source operand, performs rounding, and stores the resulting packed single-precision floating-point values to the destination operand (first source operand).
VFMSUBADD213PS: Multiplies the four, eight, or sixteen packed single-precision floating-point values from the second source operand by the corresponding packed single-precision floating-point values in the first source operand. From the infinite-precision intermediate result, subtracts the odd single-precision floating-point elements and adds the even single-precision floating-point elements of the third source operand, performs rounding, and stores the resulting packed single-precision floating-point values to the destination operand (first source operand).
VFMSUBADD231PS: Multiplies the four, eight, or sixteen packed single-precision floating-point values from the second source operand by the corresponding packed single-precision floating-point values in the third source operand. From the infinite-precision intermediate result, subtracts the odd single-precision floating-point elements and adds the even single-precision floating-point elements of the first source operand, performs rounding, and stores the resulting packed single-precision floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also the first source operand) and the second source operand are ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcast from a 32-bit memory location. The destination operand is conditionally updated with write mask k1.
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also the first source operand) is an XMM register and encoded in reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of a complementary mnemonic in situations involving NaNs is governed by the definition of the instruction mnemonic in the opcode/instruction column.
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFMSUBADD132PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(DEST[n+31:n]*SRC3[n+31:n] + SRC2[n+31:n])
    DEST[n+63:n+32] ← RoundFPControl_MXCSR(DEST[n+63:n+32]*SRC3[n+63:n+32] - SRC2[n+63:n+32])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI

VFMSUBADD213PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(SRC2[n+31:n]*DEST[n+31:n] + SRC3[n+31:n])
    DEST[n+63:n+32] ← RoundFPControl_MXCSR(SRC2[n+63:n+32]*DEST[n+63:n+32] - SRC3[n+63:n+32])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI

VFMSUBADD231PS DEST, SRC2, SRC3
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(SRC2[n+31:n]*SRC3[n+31:n] + DEST[n+31:n])
    DEST[n+63:n+32] ← RoundFPControl_MXCSR(SRC2[n+63:n+32]*SRC3[n+63:n+32] - DEST[n+63:n+32])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFMSUBADD132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+31:i] ←
                    RoundFPControl(DEST[i+31:i]*SRC3[i+31:i] + SRC2[i+31:i])
                ELSE DEST[i+31:i] ←
                    RoundFPControl(DEST[i+31:i]*SRC3[i+31:i] - SRC2[i+31:i])
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[31:0] + SRC2[i+31:i])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[i+31:i] + SRC2[i+31:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[31:0] - SRC2[i+31:i])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[i+31:i] - SRC2[i+31:i])
                    FI;
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+31:i] ←
                    RoundFPControl(SRC2[i+31:i]*DEST[i+31:i] + SRC3[i+31:i])
                ELSE DEST[i+31:i] ←
                    RoundFPControl(SRC2[i+31:i]*DEST[i+31:i] - SRC3[i+31:i])
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] + SRC3[31:0])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] + SRC3[i+31:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] - SRC3[31:0])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] - SRC3[i+31:i])
                    FI;
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN DEST[i+31:i] ←
                    RoundFPControl(SRC2[i+31:i]*SRC3[i+31:i] + DEST[i+31:i])
                ELSE DEST[i+31:i] ←
                    RoundFPControl(SRC2[i+31:i]*SRC3[i+31:i] - DEST[i+31:i])
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUBADD231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF j *is even*
                THEN
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[31:0] + DEST[i+31:i])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[i+31:i] + DEST[i+31:i])
                    FI;
                ELSE
                    IF (EVEX.b = 1)
                        THEN
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[31:0] - DEST[i+31:i])
                        ELSE
                            DEST[i+31:i] ←
                                RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[i+31:i] - DEST[i+31:i])
                    FI;
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMSUBADDxxxPS __m512 _mm512_fmsubadd_ps(__m512 a, __m512 b, __m512 c);
VFMSUBADDxxxPS __m512 _mm512_fmsubadd_round_ps(__m512 a, __m512 b, __m512 c, int r);
VFMSUBADDxxxPS __m512 _mm512_mask_fmsubadd_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
VFMSUBADDxxxPS __m512 _mm512_maskz_fmsubadd_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
VFMSUBADDxxxPS __m512 _mm512_mask3_fmsubadd_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
VFMSUBADDxxxPS __m512 _mm512_mask_fmsubadd_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int r);
VFMSUBADDxxxPS __m512 _mm512_maskz_fmsubadd_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int r);
VFMSUBADDxxxPS __m512 _mm512_mask3_fmsubadd_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int r);
VFMSUBADDxxxPS __m256 _mm256_mask_fmsubadd_ps(__m256 a, __mmask8 k, __m256 b, __m256 c);
VFMSUBADDxxxPS __m256 _mm256_maskz_fmsubadd_ps(__mmask8 k, __m256 a, __m256 b, __m256 c);
VFMSUBADDxxxPS __m256 _mm256_mask3_fmsubadd_ps(__m256 a, __m256 b, __m256 c, __mmask8 k);
VFMSUBADDxxxPS __m128 _mm_mask_fmsubadd_ps(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFMSUBADDxxxPS __m128 _mm_maskz_fmsubadd_ps(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFMSUBADDxxxPS __m128 _mm_mask3_fmsubadd_ps(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFMSUBADDxxxPS __m128 _mm_fmsubadd_ps (__m128 a, __m128 b, __m128 c);
VFMSUBADDxxxPS __m256 _mm256_fmsubadd_ps (__m256 a, __m256 b, __m256 c);
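As an editorial illustration of the write-masked (merging) form, the sketch below uses the _mm512_mask_fmsubadd_ps intrinsic listed above; it assumes an AVX512F-capable processor and a compiler switch such as -mavx512f, and the mask value is arbitrary.

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m512 a = _mm512_set1_ps(2.0f);   /* also the merge source for masked-off elements */
    __m512 b = _mm512_set1_ps(3.0f);
    __m512 c = _mm512_set1_ps(1.0f);
    __mmask16 k = 0x00FF;              /* update elements 0..7; elements 8..15 keep a */
    __m512 r = _mm512_mask_fmsubadd_ps(a, k, b, c);
    float out[16];
    _mm512_storeu_ps(out, r);
    /* element 0 (even, active): 2*3 + 1 = 7; element 1 (odd, active): 2*3 - 1 = 5;
       element 15 (masked off): merged from a = 2 */
    printf("%g %g %g\n", out[0], out[1], out[15]);
    return 0;
}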
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFMSUB132PD/VFMSUB213PD/VFMSUB231PD—Fused Multiply-Subtract of Packed Double-
Precision Floating-Point Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W1 9A /r VFMSUB132PD xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, subtract xmm2 and put result in xmm1.
VEX.128.66.0F38.W1 AA /r VFMSUB213PD xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed double-precision floating-point values from xmm1 and xmm2, subtract xmm3/mem and put result in xmm1.
VEX.128.66.0F38.W1 BA /r VFMSUB231PD xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed double-precision floating-point values from xmm2 and xmm3/mem, subtract xmm1 and put result in xmm1.
VEX.256.66.0F38.W1 9A /r VFMSUB132PD ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed double-precision floating-point values from ymm1 and ymm3/mem, subtract ymm2 and put result in ymm1.
VEX.256.66.0F38.W1 AA /r VFMSUB213PD ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed double-precision floating-point values from ymm1 and ymm2, subtract ymm3/mem and put result in ymm1.
VEX.256.66.0F38.W1 BA /r VFMSUB231PD ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed double-precision floating-point values from ymm2 and ymm3/mem, subtract ymm1 and put result in ymm1.
EVEX.128.66.0F38.W1 9A /r VFMSUB132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from xmm1 and xmm3/m128/m64bcst, subtract xmm2 and put result in xmm1 subject to writemask k1.
EVEX.128.66.0F38.W1 AA /r VFMSUB213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from xmm1 and xmm2, subtract xmm3/m128/m64bcst and put result in xmm1 subject to writemask k1.
EVEX.128.66.0F38.W1 BA /r VFMSUB231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from xmm2 and xmm3/m128/m64bcst, subtract xmm1 and put result in xmm1 subject to writemask k1.
EVEX.256.66.0F38.W1 9A /r VFMSUB132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from ymm1 and ymm3/m256/m64bcst, subtract ymm2 and put result in ymm1 subject to writemask k1.
EVEX.256.66.0F38.W1 AA /r VFMSUB213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from ymm1 and ymm2, subtract ymm3/m256/m64bcst and put result in ymm1 subject to writemask k1.
EVEX.256.66.0F38.W1 BA /r VFMSUB231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from ymm2 and ymm3/m256/m64bcst, subtract ymm1 and put result in ymm1 subject to writemask k1.
EVEX.512.66.0F38.W1 9A /r VFMSUB132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | B | V/V | AVX512F | Multiply packed double-precision floating-point values from zmm1 and zmm3/m512/m64bcst, subtract zmm2 and put result in zmm1 subject to writemask k1.
EVEX.512.66.0F38.W1 AA /r VFMSUB213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | B | V/V | AVX512F | Multiply packed double-precision floating-point values from zmm1 and zmm2, subtract zmm3/m512/m64bcst and put result in zmm1 subject to writemask k1.
EVEX.512.66.0F38.W1 BA /r VFMSUB231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | B | V/V | AVX512F | Multiply packed double-precision floating-point values from zmm2 and zmm3/m512/m64bcst, subtract zmm1 and put result in zmm1 subject to writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
Performs a set of SIMD multiply-subtract computations on packed double-precision floating-point values using three source operands and writes the multiply-subtract results to the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.
VFMSUB132PD: Multiplies the two, four, or eight packed double-precision floating-point values from the first source operand by the two, four, or eight packed double-precision floating-point values in the third source operand. From the infinite-precision intermediate result, subtracts the two, four, or eight packed double-precision floating-point values in the second source operand, performs rounding, and stores the resulting two, four, or eight packed double-precision floating-point values to the destination operand (first source operand).
VFMSUB213PD: Multiplies the two, four, or eight packed double-precision floating-point values from the second source operand by the two, four, or eight packed double-precision floating-point values in the first source operand. From the infinite-precision intermediate result, subtracts the two, four, or eight packed double-precision floating-point values in the third source operand, performs rounding, and stores the resulting two, four, or eight packed double-precision floating-point values to the destination operand (first source operand).
VFMSUB231PD: Multiplies the two, four, or eight packed double-precision floating-point values from the second source operand by the two, four, or eight packed double-precision floating-point values in the third source operand. From the infinite-precision intermediate result, subtracts the two, four, or eight packed double-precision floating-point values in the first source operand, performs rounding, and stores the resulting two, four, or eight packed double-precision floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also the first source operand) and the second source operand are ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcast from a 64-bit memory location. The destination operand is conditionally updated with write mask k1.
VEX.256 encoded version: The destination operand (also the first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also the first source operand) is an XMM register and encoded in reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.
Operation
In the operations below, “*” and “-” symbols represent multiplication and subtraction with infinite precision inputs and outputs (no
rounding).
VFMSUB132PD DEST, SRC2, SRC3 (VEX encoded versions)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR(DEST[n+63:n]*SRC3[n+63:n] - SRC2[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI

VFMSUB213PD DEST, SRC2, SRC3 (VEX encoded versions)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR(SRC2[n+63:n]*DEST[n+63:n] - SRC3[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI

VFMSUB231PD DEST, SRC2, SRC3 (VEX encoded versions)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR(SRC2[n+63:n]*SRC3[n+63:n] - DEST[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFMSUB132PD DEST, SRC2, SRC3 (EVEX encoded versions, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            RoundFPControl(DEST[i+63:i]*SRC3[i+63:i] - SRC2[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VFMSUB132PD DEST, SRC2, SRC3 (EVEX encoded versions, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[63:0] - SRC2[i+63:i])
                ELSE
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(DEST[i+63:i]*SRC3[i+63:i] - SRC2[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUB213PD DEST, SRC2, SRC3 (EVEX encoded versions, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            RoundFPControl(SRC2[i+63:i]*DEST[i+63:i] - SRC3[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VFMSUB213PD DEST, SRC2, SRC3 (EVEX encoded versions, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] - SRC3[63:0])
                ELSE
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(SRC2[i+63:i]*DEST[i+63:i] - SRC3[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUB231PD DEST, SRC2, SRC3 (EVEX encoded versions, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            RoundFPControl(SRC2[i+63:i]*SRC3[i+63:i] - DEST[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VFMSUB231PD DEST, SRC2, SRC3 (EVEX encoded versions, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[63:0] - DEST[i+63:i])
                ELSE
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(SRC2[i+63:i]*SRC3[i+63:i] - DEST[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMSUBxxxPD __m512d _mm512_fmsub_pd(__m512d a, __m512d b, __m512d c);
VFMSUBxxxPD __m512d _mm512_fmsub_round_pd(__m512d a, __m512d b, __m512d c, int r);
VFMSUBxxxPD __m512d _mm512_mask_fmsub_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
VFMSUBxxxPD __m512d _mm512_maskz_fmsub_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
VFMSUBxxxPD __m512d _mm512_mask3_fmsub_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
VFMSUBxxxPD __m512d _mm512_mask_fmsub_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int r);
VFMSUBxxxPD __m512d _mm512_maskz_fmsub_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int r);
VFMSUBxxxPD __m512d _mm512_mask3_fmsub_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int r);
VFMSUBxxxPD __m256d _mm256_mask_fmsub_pd(__m256d a, __mmask8 k, __m256d b, __m256d c);
VFMSUBxxxPD __m256d _mm256_maskz_fmsub_pd(__mmask8 k, __m256d a, __m256d b, __m256d c);
VFMSUBxxxPD __m256d _mm256_mask3_fmsub_pd(__m256d a, __m256d b, __m256d c, __mmask8 k);
VFMSUBxxxPD __m128d _mm_mask_fmsub_pd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFMSUBxxxPD __m128d _mm_maskz_fmsub_pd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFMSUBxxxPD __m128d _mm_mask3_fmsub_pd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFMSUBxxxPD __m128d _mm_fmsub_pd (__m128d a, __m128d b, __m128d c);
VFMSUBxxxPD __m256d _mm256_fmsub_pd (__m256d a, __m256d b, __m256d c);
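As an editorial illustration (not part of the architectural specification), the sketch below contrasts the MXCSR-rounded intrinsic with the embedded-rounding ({er}) form listed above; it assumes an AVX512F-capable processor and a compiler switch such as -mavx512f.

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m512d a = _mm512_set1_pd(1.5);
    __m512d b = _mm512_set1_pd(2.0);
    __m512d c = _mm512_set1_pd(1.0);
    /* Rounding taken from MXCSR.RM. */
    __m512d r0 = _mm512_fmsub_pd(a, b, c);
    /* Static rounding via EVEX.RC with suppress-all-exceptions, per the {er} syntax. */
    __m512d r1 = _mm512_fmsub_round_pd(a, b, c,
                     _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    double out[8];
    _mm512_storeu_pd(out, r0);
    printf("%g\n", out[0]); /* 1.5*2.0 - 1.0 = 2.0 */
    _mm512_storeu_pd(out, r1);
    printf("%g\n", out[0]); /* 2.0 */
    return 0;
}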
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFMSUB132PS/VFMSUB213PS/VFMSUB231PS—Fused Multiply-Subtract of Packed Single-
Precision Floating-Point Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 9A /r VFMSUB132PS xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, subtract xmm2 and put result in xmm1.
VEX.128.66.0F38.W0 AA /r VFMSUB213PS xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm1 and xmm2, subtract xmm3/mem and put result in xmm1.
VEX.128.66.0F38.W0 BA /r VFMSUB231PS xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, subtract xmm1 and put result in xmm1.
VEX.256.66.0F38.W0 9A /r VFMSUB132PS ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, subtract ymm2 and put result in ymm1.
VEX.256.66.0F38.W0 AA /r VFMSUB213PS ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm1 and ymm2, subtract ymm3/mem and put result in ymm1.
VEX.256.66.0F38.W0 BA /r VFMSUB231PS ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, subtract ymm1 and put result in ymm1.
EVEX.128.66.0F38.W0 9A /r VFMSUB132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, subtract xmm2 and put result in xmm1.
EVEX.128.66.0F38.W0 AA /r VFMSUB213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from xmm1 and xmm2, subtract xmm3/m128/m32bcst and put result in xmm1.
EVEX.128.66.0F38.W0 BA /r VFMSUB231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, subtract xmm1 and put result in xmm1.
EVEX.256.66.0F38.W0 9A /r VFMSUB132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, subtract ymm2 and put result in ymm1.
EVEX.256.66.0F38.W0 AA /r VFMSUB213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from ymm1 and ymm2, subtract ymm3/m256/m32bcst and put result in ymm1.
EVEX.256.66.0F38.W0 BA /r VFMSUB231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, subtract ymm1 and put result in ymm1.
EVEX.512.66.0F38.W0 9A /r VFMSUB132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | B | V/V | AVX512F | Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, subtract zmm2 and put result in zmm1.
EVEX.512.66.0F38.W0 AA /r VFMSUB213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | B | V/V | AVX512F | Multiply packed single-precision floating-point values from zmm1 and zmm2, subtract zmm3/m512/m32bcst and put result in zmm1.
EVEX.512.66.0F38.W0 BA /r VFMSUB231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | B | V/V | AVX512F | Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, subtract zmm1 and put result in zmm1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
Performs a set of SIMD multiply-subtract computations on packed single-precision floating-point values using three source operands and writes the multiply-subtract results to the destination operand. The destination operand is also the first source operand. The second operand must be a SIMD register. The third source operand can be a SIMD register or a memory location.
VFMSUB132PS: Multiplies the four, eight, or sixteen packed single-precision floating-point values from the first source operand by the four, eight, or sixteen packed single-precision floating-point values in the third source operand. From the infinite-precision intermediate result, subtracts the four, eight, or sixteen packed single-precision floating-point values in the second source operand, performs rounding, and stores the resulting four, eight, or sixteen packed single-precision floating-point values to the destination operand (first source operand).
VFMSUB213PS: Multiplies the four, eight, or sixteen packed single-precision floating-point values from the second source operand by the four, eight, or sixteen packed single-precision floating-point values in the first source operand. From the infinite-precision intermediate result, subtracts the four, eight, or sixteen packed single-precision floating-point values in the third source operand, performs rounding, and stores the resulting four, eight, or sixteen packed single-precision floating-point values to the destination operand (first source operand).
VFMSUB231PS: Multiplies the four, eight, or sixteen packed single-precision floating-point values from the second source operand by the four, eight, or sixteen packed single-precision floating-point values in the third source operand. From the infinite-precision intermediate result, subtracts the four, eight, or sixteen packed single-precision floating-point values in the first source operand, performs rounding, and stores the resulting four, eight, or sixteen packed single-precision floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also the first source operand) and the second source operand are ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcast from a 32-bit memory location. The destination operand is conditionally updated with write mask k1.
VEX.256 encoded version: The destination operand (also the first source operand) is a YMM register and encoded in reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also the first source operand) is an XMM register and encoded in reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination register are zeroed.
Operation
In the operations below, “*” and “-” symbols represent multiplication and subtraction with infinite precision inputs and outputs (no
rounding).
VFMSUB132PS DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(DEST[n+31:n]*SRC3[n+31:n] - SRC2[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI

VFMSUB213PS DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(SRC2[n+31:n]*DEST[n+31:n] - SRC3[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI

VFMSUB231PS DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(SRC2[n+31:n]*SRC3[n+31:n] - DEST[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFMSUB132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ←
            RoundFPControl(DEST[i+31:i]*SRC3[i+31:i] - SRC2[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VFMSUB132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[31:0] - SRC2[i+31:i])
                ELSE
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(DEST[i+31:i]*SRC3[i+31:i] - SRC2[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUB213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ←
            RoundFPControl(SRC2[i+31:i]*DEST[i+31:i] - SRC3[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VFMSUB213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] - SRC3[31:0])
                ELSE
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(SRC2[i+31:i]*DEST[i+31:i] - SRC3[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFMSUB231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ←
            RoundFPControl(SRC2[i+31:i]*SRC3[i+31:i] - DEST[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VFMSUB231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[31:0] - DEST[i+31:i])
                ELSE
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(SRC2[i+31:i]*SRC3[i+31:i] - DEST[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMSUBxxxPS __m512 _mm512_fmsub_ps(__m512 a, __m512 b, __m512 c);
VFMSUBxxxPS __m512 _mm512_fmsub_round_ps(__m512 a, __m512 b, __m512 c, int r);
VFMSUBxxxPS __m512 _mm512_mask_fmsub_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
VFMSUBxxxPS __m512 _mm512_maskz_fmsub_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
VFMSUBxxxPS __m512 _mm512_mask3_fmsub_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
VFMSUBxxxPS __m512 _mm512_mask_fmsub_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int r);
VFMSUBxxxPS __m512 _mm512_maskz_fmsub_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int r);
VFMSUBxxxPS __m512 _mm512_mask3_fmsub_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int r);
VFMSUBxxxPS __m256 _mm256_mask_fmsub_ps(__m256 a, __mmask8 k, __m256 b, __m256 c);
VFMSUBxxxPS __m256 _mm256_maskz_fmsub_ps(__mmask8 k, __m256 a, __m256 b, __m256 c);
VFMSUBxxxPS __m256 _mm256_mask3_fmsub_ps(__m256 a, __m256 b, __m256 c, __mmask8 k);
VFMSUBxxxPS __m128 _mm_mask_fmsub_ps(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFMSUBxxxPS __m128 _mm_maskz_fmsub_ps(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFMSUBxxxPS __m128 _mm_mask3_fmsub_ps(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFMSUBxxxPS __m128 _mm_fmsub_ps (__m128 a, __m128 b, __m128 c);
VFMSUBxxxPS __m256 _mm256_fmsub_ps (__m256 a, __m256 b, __m256 c);
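The zeroing-masking path in the operation sections above can also be observed from C. The following editorial sketch uses the _mm512_maskz_fmsub_ps intrinsic listed above; it assumes an AVX512F-capable processor and a compiler switch such as -mavx512f, and the mask value is arbitrary.

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m512 a = _mm512_set1_ps(1.5f);
    __m512 b = _mm512_set1_ps(2.0f);
    __m512 c = _mm512_set1_ps(1.0f);
    __mmask16 k = 0x0003;              /* compute elements 0..1; zero elements 2..15 */
    __m512 r = _mm512_maskz_fmsub_ps(k, a, b, c);
    float out[16];
    _mm512_storeu_ps(out, r);
    printf("%g %g %g\n", out[0], out[1], out[2]); /* 2 2 0 */
    return 0;
}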
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFMSUB132SD/VFMSUB213SD/VFMSUB231SD—Fused Multiply-Subtract of Scalar Double-
Precision Floating-Point Values
Description
Performs a SIMD multiply-subtract computation on the low packed double-precision floating-point values using three source operands and writes the multiply-subtract result in the destination operand. The destination operand is also the first source operand. The second operand must be an XMM register. The third source operand can be an XMM register or a 64-bit memory location.
VFMSUB132SD: Multiplies the low packed double-precision floating-point value from the first source operand by the low packed double-precision floating-point value in the third source operand. From the infinite-precision intermediate result, subtracts the low packed double-precision floating-point value in the second source operand, performs rounding, and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).
VFMSUB213SD: Multiplies the low packed double-precision floating-point value from the second source operand by the low packed double-precision floating-point value in the first source operand. From the infinite-precision intermediate result, subtracts the low packed double-precision floating-point value in the third source operand, performs rounding, and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).
VFMSUB231SD: Multiplies the low packed double-precision floating-point value from the second source operand by the low packed double-precision floating-point value in the third source operand. From the infinite-precision intermediate result, subtracts the low packed double-precision floating-point value in the first source operand, performs rounding, and stores the resulting packed double-precision floating-point value to the destination operand (first source operand).
VEX.128 and EVEX encoded version: The destination operand (also first source operand) is encoded in reg_field.
The second source operand is encoded in VEX.vvvv/EVEX.vvvv. The third source operand is encoded in rm_field.
Bits 127:64 of the destination are unchanged. Bits MAXVL-1:128 of the destination register are zeroed.
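As an informal aid (not part of the instruction definition), the three operand orderings of the low-element computation can be modeled with the C99 fma() function, which, like the hardware, rounds only once; the helper names below are hypothetical, and masking and upper bits are not modeled:

#include <math.h>   /* link with -lm */

/* Hypothetical models: fma(x, y, z) computes x*y + z with a single
   rounding, so x*y - z is fma(x, y, -z). */
double vfmsub132sd_model(double dest, double src2, double src3)
{ return fma(dest, src3, -src2); }   /* DEST*SRC3 - SRC2 */
double vfmsub213sd_model(double dest, double src2, double src3)
{ return fma(src2, dest, -src3); }   /* SRC2*DEST - SRC3 */
double vfmsub231sd_model(double dest, double src2, double src3)
{ return fma(src2, src3, -dest); }   /* SRC2*SRC3 - DEST */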
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
VEX.LIG.66.0F38.W1 9B /r
VFMSUB132SD xmm1, xmm2,
xmm3/m64
A V/V FMA Multiply scalar double-precision floating-point value from
xmm1 and xmm3/m64, subtract xmm2 and put result in
xmm1.
VEX.LIG.66.0F38.W1 AB /r
VFMSUB213SD xmm1, xmm2,
xmm3/m64
A V/V FMA Multiply scalar double-precision floating-point value from
xmm1 and xmm2, subtract xmm3/m64 and put result in
xmm1.
VEX.LIG.66.0F38.W1 BB /r
VFMSUB231SD xmm1, xmm2,
xmm3/m64
A V/V FMA Multiply scalar double-precision floating-point value from
xmm2 and xmm3/m64, subtract xmm1 and put result in
xmm1.
EVEX.LIG.66.0F38.W1 9B /r
VFMSUB132SD xmm1 {k1}{z},
xmm2, xmm3/m64{er}
B V/V AVX512F Multiply scalar double-precision floating-point value from
xmm1 and xmm3/m64, subtract xmm2 and put result in
xmm1.
EVEX.LIG.66.0F38.W1 AB /r
VFMSUB213SD xmm1 {k1}{z},
xmm2, xmm3/m64{er}
B V/V AVX512F Multiply scalar double-precision floating-point value from
xmm1 and xmm2, subtract xmm3/m64 and put result in
xmm1.
EVEX.LIG.66.0F38.W1 BB /r
VFMSUB231SD xmm1 {k1}{z},
xmm2, xmm3/m64{er}
B V/V AVX512F Multiply scalar double-precision floating-point value from
xmm2 and xmm3/m64, subtract xmm1 and put result in
xmm1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A NA ModRM:reg (r, w) VEX.vvvv (r) ModRM:r/m (r) NA
B Tuple1 Scalar ModRM:reg (r, w) EVEX.vvvv (r) ModRM:r/m (r) NA
EVEX encoded version: The low quadword element of the destination is updated according to the writemask.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the
opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations
involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction
column.
Operation
In the operations below, “*” and “-” symbols represent multiplication and subtraction with infinite precision inputs and outputs (no
rounding).
VFMSUB132SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(DEST[63:0]*SRC3[63:0] - SRC2[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFMSUB213SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(SRC2[63:0]*DEST[63:0] - SRC3[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFMSUB231SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(SRC2[63:0]*SRC3[63:0] - DEST[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFMSUB132SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(DEST[63:0]*SRC3[63:0] - SRC2[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFMSUB213SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*DEST[63:0] - SRC3[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFMSUB231SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(SRC2[63:0]*SRC3[63:0] - DEST[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMSUBxxxSD __m128d _mm_fmsub_round_sd(__m128d a, __m128d b, __m128d c, int r);
VFMSUBxxxSD __m128d _mm_mask_fmsub_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFMSUBxxxSD __m128d _mm_maskz_fmsub_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFMSUBxxxSD __m128d _mm_mask3_fmsub_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFMSUBxxxSD __m128d _mm_mask_fmsub_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int r);
VFMSUBxxxSD __m128d _mm_maskz_fmsub_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int r);
VFMSUBxxxSD __m128d _mm_mask3_fmsub_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int r);
VFMSUBxxxSD __m128d _mm_fmsub_sd (__m128d a, __m128d b, __m128d c);
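A minimal, non-normative use of the unmasked scalar intrinsic (assumes FMA hardware and e.g. -mfma); it also shows that bits 127:64 of the result come through from the first operand:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m128d a = _mm_set_pd(99.0, 2.0);   /* high = 99.0, low = 2.0 */
    __m128d b = _mm_set_pd(-1.0, 3.0);
    __m128d c = _mm_set_pd(-1.0, 4.0);
    __m128d r = _mm_fmsub_sd(a, b, c);   /* low: 2*3 - 4 = 2; high kept from a */
    double out[2];
    _mm_storeu_pd(out, r);
    printf("low=%g high=%g\n", out[0], out[1]);   /* low=2 high=99 */
    return 0;
}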
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 3.
EVEX-encoded instructions, see Exceptions Type E3.
VFMSUB132SS/VFMSUB213SS/VFMSUB231SS—Fused Multiply-Subtract of Scalar Single-
Precision Floating-Point Values
Instruction Operand Encoding
Description
Performs a SIMD multiply-subtract computation on the low packed single-precision floating-point values using
three source operands and writes the multiply-subtract result in the destination operand. The destination operand
is also the first source operand. The second operand must be an XMM register. The third source operand can be an
XMM register or a 32-bit memory location.
VFMSUB132SS: Multiplies the low packed single-precision floating-point value from the first source operand to the
low packed single-precision floating-point value in the third source operand. From the infinite precision interme-
diate result, subtracts the low packed single-precision floating-point value in the second source operand, performs
rounding and stores the resulting packed single-precision floating-point value to the destination operand (first
source operand).
VFMSUB213SS: Multiplies the low packed single-precision floating-point value from the second source operand to
the low packed single-precision floating-point value in the first source operand. From the infinite precision interme-
diate result, subtracts the low packed single-precision floating-point value in the third source operand, performs
rounding and stores the resulting packed single-precision floating-point value to the destination operand (first
source operand).
VFMSUB231SS: Multiplies the low packed single-precision floating-point value from the second source to the low
packed single-precision floating-point value in the third source operand. From the infinite precision intermediate
result, subtracts the low packed single-precision floating-point value in the first source operand, performs rounding
and stores the resulting packed single-precision floating-point value to the destination operand (first source
operand).
VEX.128 and EVEX encoded version: The destination operand (also first source operand) is encoded in reg_field.
The second source operand is encoded in VEX.vvvv/EVEX.vvvv. The third source operand is encoded in rm_field.
Bits 127:32 of the destination are unchanged. Bits MAXVL-1:128 of the destination register are zeroed.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
VEX.LIG.66.0F38.W0 9B /r
VFMSUB132SS xmm1, xmm2,
xmm3/m32
A V/V FMA Multiply scalar single-precision floating-point value from
xmm1 and xmm3/m32, subtract xmm2 and put result in
xmm1.
VEX.LIG.66.0F38.W0 AB /r
VFMSUB213SS xmm1, xmm2,
xmm3/m32
A V/V FMA Multiply scalar single-precision floating-point value from
xmm1 and xmm2, subtract xmm3/m32 and put result in
xmm1.
VEX.LIG.66.0F38.W0 BB /r
VFMSUB231SS xmm1, xmm2,
xmm3/m32
A V/V FMA Multiply scalar single-precision floating-point value from
xmm2 and xmm3/m32, subtract xmm1 and put result in
xmm1.
EVEX.LIG.66.0F38.W0 9B /r
VFMSUB132SS xmm1 {k1}{z},
xmm2, xmm3/m32{er}
B V/V AVX512F Multiply scalar single-precision floating-point value from
xmm1 and xmm3/m32, subtract xmm2 and put result in
xmm1.
EVEX.LIG.66.0F38.W0 AB /r
VFMSUB213SS xmm1 {k1}{z},
xmm2, xmm3/m32{er}
B V/V AVX512F Multiply scalar single-precision floating-point value from
xmm1 and xmm2, subtract xmm3/m32 and put result in
xmm1.
EVEX.LIG.66.0F38.W0 BB /r
VFMSUB231SS xmm1 {k1}{z},
xmm2, xmm3/m32{er}
B V/V AVX512F Multiply scalar single-precision floating-point value from
xmm2 and xmm3/m32, subtract xmm1 and put result in
xmm1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A NA ModRM:reg (r, w) VEX.vvvv (r) ModRM:r/m (r) NA
B Tuple1 Scalar ModRM:reg (r, w) EVEX.vvvv (r) ModRM:r/m (r) NA
EVEX encoded version: The low doubleword element of the destination is updated according to the writemask.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the
opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations
involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction
column.
Operation
In the operations below, “*” and “-” symbols represent multiplication and subtraction with infinite precision inputs and outputs (no
rounding).
VFMSUB132SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(DEST[31:0]*SRC3[31:0] - SRC2[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFMSUB213SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(SRC2[31:0]*DEST[31:0] - SRC3[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFMSUB231SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(SRC2[31:0]*SRC3[31:0] - DEST[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFMSUB132SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(DEST[31:0]*SRC3[31:0] - SRC2[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFMSUB213SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(SRC2[31:0]*DEST[31:0] - SRC3[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFMSUB231SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(SRC2[31:0]*SRC3[31:0] - DEST[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFMSUBxxxSS __m128 _mm_fmsub_round_ss(__m128 a, __m128 b, __m128 c, int r);
VFMSUBxxxSS __m128 _mm_mask_fmsub_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFMSUBxxxSS __m128 _mm_maskz_fmsub_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFMSUBxxxSS __m128 _mm_mask3_fmsub_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFMSUBxxxSS __m128 _mm_mask_fmsub_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int r);
VFMSUBxxxSS __m128 _mm_maskz_fmsub_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int r);
VFMSUBxxxSS __m128 _mm_mask3_fmsub_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int r);
VFMSUBxxxSS __m128 _mm_fmsub_ss (__m128 a, __m128 b, __m128 c);
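For illustration only (not part of the instruction definition; assumes FMA hardware and e.g. -mfma), the fragment below confirms that only the low element is replaced while bits 127:32 come from the first operand:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m128 a = _mm_set_ps(7.0f, 6.0f, 5.0f, 2.0f);  /* low element = 2 */
    __m128 b = _mm_set_ps(0.0f, 0.0f, 0.0f, 3.0f);
    __m128 c = _mm_set_ps(0.0f, 0.0f, 0.0f, 4.0f);
    __m128 r = _mm_fmsub_ss(a, b, c);   /* low: 2*3 - 4 = 2; others from a */
    float out[4];
    _mm_storeu_ps(out, r);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  /* 2 5 6 7 */
    return 0;
}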
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 3.
EVEX-encoded instructions, see Exceptions Type E3.
VFNMADD132PD/VFNMADD213PD/VFNMADD231PD—Fused Negative Multiply-Add of Packed
Double-Precision Floating-Point Values
Opcode/
Instruction
Op/
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
VEX.128.66.0F38.W1 9C /r
VFNMADD132PD xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed double-precision floating-point values from
xmm1 and xmm3/mem, negate the multiplication result
and add to xmm2 and put result in xmm1.
VEX.128.66.0F38.W1 AC /r
VFNMADD213PD xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed double-precision floating-point values from
xmm1 and xmm2, negate the multiplication result and add
to xmm3/mem and put result in xmm1.
VEX.128.66.0F38.W1 BC /r
VFNMADD231PD xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed double-precision floating-point values from
xmm2 and xmm3/mem, negate the multiplication result
and add to xmm1 and put result in xmm1.
VEX.256.66.0F38.W1 9C /r
VFNMADD132PD ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed double-precision floating-point values from
ymm1 and ymm3/mem, negate the multiplication result and
add to ymm2 and put result in ymm1.
VEX.256.66.0F38.W1 AC /r
VFNMADD213PD ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed double-precision floating-point values from
ymm1 and ymm2, negate the multiplication result and add
to ymm3/mem and put result in ymm1.
VEX.256.66.0F38.W1 BC /r
VFNMADD231PD ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed double-precision floating-point values from
ymm2 and ymm3/mem, negate the multiplication result and
add to ymm1 and put result in ymm1.
EVEX.128.66.0F38.W1 9C /r
VFNMADD132PD xmm1 {k1}{z},
xmm2, xmm3/m128/m64bcst
B V/V AVX512VL
AVX512F
Multiply packed double-precision floating-point values from
xmm1 and xmm3/m128/m64bcst, negate the
multiplication result and add to xmm2 and put result in
xmm1.
EVEX.128.66.0F38.W1 AC /r
VFNMADD213PD xmm1 {k1}{z},
xmm2, xmm3/m128/m64bcst
B V/V AVX512VL
AVX512F
Multiply packed double-precision floating-point values from
xmm1 and xmm2, negate the multiplication result and add
to xmm3/m128/m64bcst and put result in xmm1.
EVEX.128.66.0F38.W1 BC /r
VFNMADD231PD xmm1 {k1}{z},
xmm2, xmm3/m128/m64bcst
B V/V AVX512VL
AVX512F
Multiply packed double-precision floating-point values from
xmm2 and xmm3/m128/m64bcst, negate the
multiplication result and add to xmm1 and put result in
xmm1.
EVEX.256.66.0F38.W1 9C /r
VFNMADD132PD ymm1 {k1}{z},
ymm2, ymm3/m256/m64bcst
B V/V AVX512VL
AVX512F
Multiply packed double-precision floating-point values from
ymm1 and ymm3/m256/m64bcst, negate the
multiplication result and add to ymm2 and put result in
ymm1.
EVEX.256.66.0F38.W1 AC /r
VFNMADD213PD ymm1 {k1}{z},
ymm2, ymm3/m256/m64bcst
B V/V AVX512VL
AVX512F
Multiply packed double-precision floating-point values from
ymm1 and ymm2, negate the multiplication result and add
to ymm3/m256/m64bcst and put result in ymm1.
EVEX.256.66.0F38.W1 BC /r
VFNMADD231PD ymm1 {k1}{z},
ymm2, ymm3/m256/m64bcst
B V/V AVX512VL
AVX512F
Multiply packed double-precision floating-point values from
ymm2 and ymm3/m256/m64bcst, negate the
multiplication result and add to ymm1 and put result in
ymm1.
EVEX.512.66.0F38.W1 9C /r
VFNMADD132PD zmm1 {k1}{z},
zmm2, zmm3/m512/m64bcst{er}
B V/V AVX512F Multiply packed double-precision floating-point values from
zmm1 and zmm3/m512/m64bcst, negate the multiplication
result and add to zmm2 and put result in zmm1.
EVEX.512.66.0F38.W1 AC /r
VFNMADD213PD zmm1 {k1}{z},
zmm2, zmm3/m512/m64bcst{er}
B V/V AVX512F Multiply packed double-precision floating-point values from
zmm1 and zmm2, negate the multiplication result and add
to zmm3/m512/m64bcst and put result in zmm1.
EVEX.512.66.0F38.W1 BC /r
VFNMADD231PD zmm1 {k1}{z},
zmm2, zmm3/m512/m64bcst{er}
B V/V AVX512F Multiply packed double-precision floating-point values from
zmm2 and zmm3/m512/m64bcst, negate the multiplication
result and add to zmm1 and put result in zmm1.
Instruction Operand Encoding
Description
VFNMADD132PD: Multiplies the two, four or eight packed double-precision floating-point values from the first
source operand to the two, four or eight packed double-precision floating-point values in the third source operand,
adds the negated infinite precision intermediate result to the two, four or eight packed double-precision floating-
point values in the second source operand, performs rounding and stores the resulting two, four or eight packed
double-precision floating-point values to the destination operand (first source operand).
VFNMADD213PD: Multiplies the two, four or eight packed double-precision floating-point values from the second
source operand to the two, four or eight packed double-precision floating-point values in the first source operand,
adds the negated infinite precision intermediate result to the two, four or eight packed double-precision floating-
point values in the third source operand, performs rounding and stores the resulting two, four or eight packed
double-precision floating-point values to the destination operand (first source operand).
VFNMADD231PD: Multiplies the two, four or eight packed double-precision floating-point values from the second
source to the two, four or eight packed double-precision floating-point values in the third source operand, adds the
negated infinite precision intermediate result to the two, four or eight packed double-precision floating-point values
in the first source operand, performs rounding and stores the resulting two, four or eight packed double-precision
floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also first source operand) and the second source operand are
ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory loca-
tion or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is condition-
ally updated with write mask k1.
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also first source operand) is an XMM register and encoded in
reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an
XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination
register are zeroed.
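Informally, every VFNMADD form computes -(x*y) + z on its elements. As a non-normative sketch, the single-rounding behavior of one double-precision element can be modeled with the C99 fma() function by negating one multiplicand (negating a multiplicand is exact, so only the final result is rounded):

#include <math.h>   /* link with -lm */

/* Hypothetical model of one element: -(x*y) + z == fma(-x, y, z),
   rounded once. */
double fnmadd_model(double x, double y, double z)
{ return fma(-x, y, z); }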
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A NA ModRM:reg (r, w) VEX.vvvv (r) ModRM:r/m (r) NA
B Full ModRM:reg (r, w) EVEX.vvvv (r) ModRM:r/m (r) NA
VFNMADD132PD DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR(-(DEST[n+63:n]*SRC3[n+63:n]) + SRC2[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMADD213PD DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR(-(SRC2[n+63:n]*DEST[n+63:n]) + SRC3[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMADD231PD DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR(-(SRC2[n+63:n]*SRC3[n+63:n]) + DEST[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMADD132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← RoundFPControl(-(DEST[i+63:i]*SRC3[i+63:i]) + SRC2[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ← RoundFPControl_MXCSR(-(DEST[i+63:i]*SRC3[63:0]) + SRC2[i+63:i])
                ELSE
                    DEST[i+63:i] ← RoundFPControl_MXCSR(-(DEST[i+63:i]*SRC3[i+63:i]) + SRC2[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← RoundFPControl(-(SRC2[i+63:i]*DEST[i+63:i]) + SRC3[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ← RoundFPControl_MXCSR(-(SRC2[i+63:i]*DEST[i+63:i]) + SRC3[63:0])
                ELSE
                    DEST[i+63:i] ← RoundFPControl_MXCSR(-(SRC2[i+63:i]*DEST[i+63:i]) + SRC3[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← RoundFPControl(-(SRC2[i+63:i]*SRC3[i+63:i]) + DEST[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ← RoundFPControl_MXCSR(-(SRC2[i+63:i]*SRC3[63:0]) + DEST[i+63:i])
                ELSE
                    DEST[i+63:i] ← RoundFPControl_MXCSR(-(SRC2[i+63:i]*SRC3[i+63:i]) + DEST[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFNMADDxxxPD __m512d _mm512_fnmadd_pd(__m512d a, __m512d b, __m512d c);
VFNMADDxxxPD __m512d _mm512_fnmadd_round_pd(__m512d a, __m512d b, __m512d c, int r);
VFNMADDxxxPD __m512d _mm512_mask_fnmadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
VFNMADDxxxPD __m512d _mm512_maskz_fnmadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
VFNMADDxxxPD __m512d _mm512_mask3_fnmadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
VFNMADDxxxPD __m512d _mm512_mask_fnmadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int r);
VFNMADDxxxPD __m512d _mm512_maskz_fnmadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int r);
VFNMADDxxxPD __m512d _mm512_mask3_fnmadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int r);
VFNMADDxxxPD __m256d _mm256_mask_fnmadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c);
VFNMADDxxxPD __m256d _mm256_maskz_fnmadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c);
VFNMADDxxxPD __m256d _mm256_mask3_fnmadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k);
VFNMADDxxxPD __m128d _mm_mask_fnmadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFNMADDxxxPD __m128d _mm_maskz_fnmadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFNMADDxxxPD __m128d _mm_mask3_fnmadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFNMADDxxxPD __m128d _mm_fnmadd_pd (__m128d a, __m128d b, __m128d c);
VFNMADDxxxPD __m256d _mm256_fnmadd_pd (__m256d a, __m256d b, __m256d c);
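A non-normative example of the merging writemask form (assumes AVX512F hardware and a flag such as -mavx512f); lanes whose mask bit is zero keep the value of the first operand:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m512d a = _mm512_set1_pd(2.0);
    __m512d b = _mm512_set1_pd(3.0);
    __m512d c = _mm512_set1_pd(10.0);
    __mmask8 k = 0x0F;                              /* compute lanes 0-3 only */
    __m512d r = _mm512_mask_fnmadd_pd(a, k, b, c);  /* -(2*3)+10 = 4 where k set */
    double out[8];
    _mm512_storeu_pd(out, r);
    for (int i = 0; i < 8; i++)
        printf("%g ", out[i]);                      /* 4 4 4 4 2 2 2 2 */
    printf("\n");
    return 0;
}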
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFNMADD132PS/VFNMADD213PS/VFNMADD231PS—Fused Negative Multiply-Add of Packed
Single-Precision Floating-Point Values
Opcode/
Instruction
Op/
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
VEX.128.66.0F38.W0 9C /r
VFNMADD132PS xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed single-precision floating-point values from
xmm1 and xmm3/mem, negate the multiplication result
and add to xmm2 and put result in xmm1.
VEX.128.66.0F38.W0 AC /r
VFNMADD213PS xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed single-precision floating-point values from
xmm1 and xmm2, negate the multiplication result and add
to xmm3/mem and put result in xmm1.
VEX.128.66.0F38.W0 BC /r
VFNMADD231PS xmm1, xmm2,
xmm3/m128
A V/V FMA Multiply packed single-precision floating-point values from
xmm2 and xmm3/mem, negate the multiplication result
and add to xmm1 and put result in xmm1.
VEX.256.66.0F38.W0 9C /r
VFNMADD132PS ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed single-precision floating-point values from
ymm1 and ymm3/mem, negate the multiplication result
and add to ymm2 and put result in ymm1.
VEX.256.66.0F38.W0 AC /r
VFNMADD213PS ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed single-precision floating-point values from
ymm1 and ymm2, negate the multiplication result and add
to ymm3/mem and put result in ymm1.
VEX.256.66.0F38.W0 BC /r
VFNMADD231PS ymm1, ymm2,
ymm3/m256
A V/V FMA Multiply packed single-precision floating-point values from
ymm2 and ymm3/mem, negate the multiplication result and
add to ymm1 and put result in ymm1.
EVEX.128.66.0F38.W0 9C /r
VFNMADD132PS xmm1 {k1}{z},
xmm2, xmm3/m128/m32bcst
B V/V AVX512VL
AVX512F
Multiply packed single-precision floating-point values from
xmm1 and xmm3/m128/m32bcst, negate the multiplication
result and add to xmm2 and put result in xmm1.
EVEX.128.66.0F38.W0 AC /r
VFNMADD213PS xmm1 {k1}{z},
xmm2, xmm3/m128/m32bcst
B V/V AVX512VL
AVX512F
Multiply packed single-precision floating-point values from
xmm1 and xmm2, negate the multiplication result and add
to xmm3/m128/m32bcst and put result in xmm1.
EVEX.128.66.0F38.W0 BC /r
VFNMADD231PS xmm1 {k1}{z},
xmm2, xmm3/m128/m32bcst
B V/V AVX512VL
AVX512F
Multiply packed single-precision floating-point values from
xmm2 and xmm3/m128/m32bcst, negate the multiplication
result and add to xmm1 and put result in xmm1.
EVEX.256.66.0F38.W0 9C /r
VFNMADD132PS ymm1 {k1}{z},
ymm2, ymm3/m256/m32bcst
B V/V AVX512VL
AVX512F
Multiply packed single-precision floating-point values from
ymm1 and ymm3/m256/m32bcst, negate the multiplication
result and add to ymm2 and put result in ymm1.
EVEX.256.66.0F38.W0 AC /r
VFNMADD213PS ymm1 {k1}{z},
ymm2, ymm3/m256/m32bcst
B V/V AVX512VL
AVX512F
Multiply packed single-precision floating-point values from
ymm1 and ymm2, negate the multiplication result and add
to ymm3/m256/m32bcst and put result in ymm1.
EVEX.256.66.0F38.W0 BC /r
VFNMADD231PS ymm1 {k1}{z},
ymm2, ymm3/m256/m32bcst
B V/V AVX512VL
AVX512F
Multiply packed single-precision floating-point values from
ymm2 and ymm3/m256/m32bcst, negate the multiplication
result and add to ymm1 and put result in ymm1.
EVEX.512.66.0F38.W0 9C /r
VFNMADD132PS zmm1 {k1}{z},
zmm2, zmm3/m512/m32bcst{er}
B V/V AVX512F
Multiply packed single-precision floating-point values from
zmm1 and zmm3/m512/m32bcst, negate the multiplication
result and add to zmm2 and put result in zmm1.
EVEX.512.66.0F38.W0 AC /r
VFNMADD213PS zmm1 {k1}{z},
zmm2, zmm3/m512/m32bcst{er}
B V/V AVX512F Multiply packed single-precision floating-point values from
zmm1 and zmm2, negate the multiplication result and add
to zmm3/m512/m32bcst and put result in zmm1.
EVEX.512.66.0F38.W0 BC /r
VFNMADD231PS zmm1 {k1}{z},
zmm2, zmm3/m512/m32bcst{er}
B V/V AVX512F Multiply packed single-precision floating-point values from
zmm2 and zmm3/m512/m32bcst, negate the multiplication
result and add to zmm1 and put result in zmm1.
Instruction Operand Encoding
Description
VFNMADD132PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the first
source operand to the four, eight or sixteen packed single-precision floating-point values in the third source
operand, adds the negated infinite precision intermediate result to the four, eight or sixteen packed single-preci-
sion floating-point values in the second source operand, performs rounding and stores the resulting four, eight or
sixteen packed single-precision floating-point values to the destination operand (first source operand).
VFNMADD213PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the second
source operand to the four, eight or sixteen packed single-precision floating-point values in the first source
operand, adds the negated infinite precision intermediate result to the four, eight or sixteen packed single-preci-
sion floating-point values in the third source operand, performs rounding and stores the resulting four, eight or
sixteen packed single-precision floating-point values to the destination operand (first source operand).
VFNMADD231PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the second
source operand to the four, eight or sixteen packed single-precision floating-point values in the third source
operand, adds the negated infinite precision intermediate result to the four, eight or sixteen packed single-preci-
sion floating-point values in the first source operand, performs rounding and stores the resulting four, eight or
sixteen packed single-precision floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also first source operand) and the second source operand are
ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory loca-
tion or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is condition-
ally updated with write mask k1.
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also first source operand) is an XMM register and encoded in
reg_field. The second source operand is an XMM register and encoded in VEX.vvvv. The third source operand is an
XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination
register are zeroed.
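The embedded-broadcast (m32bcst) form has no dedicated intrinsic; as a non-normative sketch (assuming AVX512F and e.g. -mavx512f), broadcasting a scalar with _mm512_set1_ps gives the compiler the option of folding the value into a broadcast memory operand:

#include <immintrin.h>

/* Illustrative helper (names are not from the SDM):
   y = -(x*scale) + y over 16 single-precision elements. */
void fnmadd_scale(const __m512 *x, __m512 *y, float scale) {
    *y = _mm512_fnmadd_ps(*x, _mm512_set1_ps(scale), *y);
}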
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFNMADD132PS DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(-(DEST[n+31:n]*SRC3[n+31:n]) + SRC2[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A NA ModRM:reg (r, w) VEX.vvvv (r) ModRM:r/m (r) NA
B Full ModRM:reg (r, w) EVEX.vvvv (r) ModRM:r/m (r) NA
VFNMADD213PS DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(-(SRC2[n+31:n]*DEST[n+31:n]) + SRC3[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMADD231PS DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR(-(SRC2[n+31:n]*SRC3[n+31:n]) + DEST[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMADD132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← RoundFPControl(-(DEST[i+31:i]*SRC3[i+31:i]) + SRC2[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← RoundFPControl_MXCSR(-(DEST[i+31:i]*SRC3[31:0]) + SRC2[i+31:i])
                ELSE
                    DEST[i+31:i] ← RoundFPControl_MXCSR(-(DEST[i+31:i]*SRC3[i+31:i]) + SRC2[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← RoundFPControl(-(SRC2[i+31:i]*DEST[i+31:i]) + SRC3[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← RoundFPControl_MXCSR(-(SRC2[i+31:i]*DEST[i+31:i]) + SRC3[31:0])
                ELSE
                    DEST[i+31:i] ← RoundFPControl_MXCSR(-(SRC2[i+31:i]*DEST[i+31:i]) + SRC3[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← RoundFPControl(-(SRC2[i+31:i]*SRC3[i+31:i]) + DEST[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMADD231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← RoundFPControl_MXCSR(-(SRC2[i+31:i]*SRC3[31:0]) + DEST[i+31:i])
                ELSE
                    DEST[i+31:i] ← RoundFPControl_MXCSR(-(SRC2[i+31:i]*SRC3[i+31:i]) + DEST[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFNMADDxxxPS __m512 _mm512_fnmadd_ps(__m512 a, __m512 b, __m512 c);
VFNMADDxxxPS __m512 _mm512_fnmadd_round_ps(__m512 a, __m512 b, __m512 c, int r);
VFNMADDxxxPS __m512 _mm512_mask_fnmadd_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
VFNMADDxxxPS __m512 _mm512_maskz_fnmadd_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
VFNMADDxxxPS __m512 _mm512_mask3_fnmadd_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
VFNMADDxxxPS __m512 _mm512_mask_fnmadd_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int r);
VFNMADDxxxPS __m512 _mm512_maskz_fnmadd_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int r);
VFNMADDxxxPS __m512 _mm512_mask3_fnmadd_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int r);
VFNMADDxxxPS __m256 _mm256_mask_fnmadd_ps(__m256 a, __mmask8 k, __m256 b, __m256 c);
VFNMADDxxxPS __m256 _mm256_maskz_fnmadd_ps(__mmask8 k, __m256 a, __m256 b, __m256 c);
VFNMADDxxxPS __m256 _mm256_mask3_fnmadd_ps(__m256 a, __m256 b, __m256 c, __mmask8 k);
VFNMADDxxxPS __m128 _mm_mask_fnmadd_ps(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFNMADDxxxPS __m128 _mm_maskz_fnmadd_ps(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFNMADDxxxPS __m128 _mm_mask3_fnmadd_ps(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFNMADDxxxPS __m128 _mm_fnmadd_ps (__m128 a, __m128 b, __m128 c);
VFNMADDxxxPS __m256 _mm256_fnmadd_ps (__m256 a, __m256 b, __m256 c);
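The {er} forms map to the _round_ intrinsics; a non-normative sketch (assumes AVX512F) of overriding MXCSR with round-toward-negative-infinity — the static-rounding encoding also suppresses exceptions, hence _MM_FROUND_NO_EXC:

#include <immintrin.h>

/* Per-instruction rounding override for the 512-bit form;
   _MM_FROUND_NO_EXC must accompany a static rounding mode. */
__m512 fnmadd_round_down(__m512 a, __m512 b, __m512 c) {
    return _mm512_fnmadd_round_ps(a, b, c,
                                  _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
}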
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFNMADD132SD/VFNMADD213SD/VFNMADD231SD—Fused Negative Multiply-Add of Scalar
Double-Precision Floating-Point Values
Instruction Operand Encoding
Description
VFNMADD132SD: Multiplies the low packed double-precision floating-point value from the first source operand to
the low packed double-precision floating-point value in the third source operand, adds the negated infinite preci-
sion intermediate result to the low packed double-precision floating-point value in the second source operand,
performs rounding and stores the resulting packed double-precision floating-point value to the destination operand
(first source operand).
VFNMADD213SD: Multiplies the low packed double-precision floating-point value from the second source operand
to the low packed double-precision floating-point value in the first source operand, adds the negated infinite preci-
sion intermediate result to the low packed double-precision floating-point value in the third source operand,
performs rounding and stores the resulting packed double-precision floating-point value to the destination operand
(first source operand).
VFNMADD231SD: Multiplies the low packed double-precision floating-point value from the second source to the low
packed double-precision floating-point value in the third source operand, adds the negated infinite precision inter-
mediate result to the low packed double-precision floating-point value in the first source operand, performs
rounding and stores the resulting packed double-precision floating-point value to the destination operand (first
source operand).
VEX.128 and EVEX encoded version: The destination operand (also first source operand) is encoded in reg_field.
The second source operand is encoded in VEX.vvvv/EVEX.vvvv. The third source operand is encoded in rm_field.
Bits 127:64 of the destination are unchanged. Bits MAXVL-1:128 of the destination register are zeroed.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
VEX.LIG.66.0F38.W1 9D /r
VFNMADD132SD xmm1, xmm2,
xmm3/m64
A V/V FMA Multiply scalar double-precision floating-point value from
xmm1 and xmm3/mem, negate the multiplication result and
add to xmm2 and put result in xmm1.
VEX.LIG.66.0F38.W1 AD /r
VFNMADD213SD xmm1, xmm2,
xmm3/m64
A V/V FMA Multiply scalar double-precision floating-point value from
xmm1 and xmm2, negate the multiplication result and add to
xmm3/mem and put result in xmm1.
VEX.LIG.66.0F38.W1 BD /r
VFNMADD231SD xmm1, xmm2,
xmm3/m64
A V/V FMA Multiply scalar double-precision floating-point value from
xmm2 and xmm3/mem, negate the multiplication result and
add to xmm1 and put result in xmm1.
EVEX.LIG.66.0F38.W1 9D /r
VFNMADD132SD xmm1 {k1}{z},
xmm2, xmm3/m64{er}
B V/V AVX512F Multiply scalar double-precision floating-point value from
xmm1 and xmm3/m64, negate the multiplication result and
add to xmm2 and put result in xmm1.
EVEX.LIG.66.0F38.W1 AD /r
VFNMADD213SD xmm1 {k1}{z},
xmm2, xmm3/m64{er}
B V/V AVX512F Multiply scalar double-precision floating-point value from
xmm1 and xmm2, negate the multiplication result and add to
xmm3/m64 and put result in xmm1.
EVEX.LIG.66.0F38.W1 BD /r
VFNMADD231SD xmm1 {k1}{z},
xmm2, xmm3/m64{er}
B V/V AVX512F Multiply scalar double-precision floating-point value from
xmm2 and xmm3/m64, negate the multiplication result and
add to xmm1 and put result in xmm1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A NA ModRM:reg (r, w) VEX.vvvv (r) ModRM:r/m (r) NA
B Tuple1 Scalar ModRM:reg (r, w) EVEX.vvvv (r) ModRM:r/m (r) NA
EVEX encoded version: The low quadword element of the destination is updated according to the writemask.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the
opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations
involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction
column.
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFNMADD132SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(-(DEST[63:0]*SRC3[63:0]) + SRC2[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMADD213SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(-(SRC2[63:0]*DEST[63:0]) + SRC3[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMADD231SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(-(SRC2[63:0]*SRC3[63:0]) + DEST[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMADD132SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(-(DEST[63:0]*SRC3[63:0]) + SRC2[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMADD213SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(-(SRC2[63:0]*DEST[63:0]) + SRC3[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMADD231SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(-(SRC2[63:0]*SRC3[63:0]) + DEST[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFNMADDxxxSD __m128d _mm_fnmadd_round_sd(__m128d a, __m128d b, __m128d c, int r);
VFNMADDxxxSD __m128d _mm_mask_fnmadd_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFNMADDxxxSD __m128d _mm_maskz_fnmadd_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFNMADDxxxSD __m128d _mm_mask3_fnmadd_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFNMADDxxxSD __m128d _mm_mask_fnmadd_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int r);
VFNMADDxxxSD __m128d _mm_maskz_fnmadd_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int r);
VFNMADDxxxSD __m128d _mm_mask3_fnmadd_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int r);
VFNMADDxxxSD __m128d _mm_fnmadd_sd (__m128d a, __m128d b, __m128d c);
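For illustration only (not part of the instruction definition; assumes FMA hardware and e.g. -mfma), the fragment below verifies -(a*b)+c in the low element and preservation of bits 127:64 from the first operand:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m128d a = _mm_set_pd(55.0, 2.0);   /* low = 2, high = 55 */
    __m128d b = _mm_set_pd(0.0, 3.0);
    __m128d c = _mm_set_pd(0.0, 10.0);
    __m128d r = _mm_fnmadd_sd(a, b, c);  /* low: -(2*3)+10 = 4; high from a */
    double out[2];
    _mm_storeu_pd(out, r);
    printf("low=%g high=%g\n", out[0], out[1]);   /* low=4 high=55 */
    return 0;
}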
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 3.
EVEX-encoded instructions, see Exceptions Type E3.
VFNMADD132SS/VFNMADD213SS/VFNMADD231SS—Fused Negative Multiply-Add of Scalar
Single-Precision Floating-Point Values
Instruction Operand Encoding
Description
VFNMADD132SS: Multiplies the low packed single-precision floating-point value from the first source operand to
the low packed single-precision floating-point value in the third source operand, adds the negated infinite precision
intermediate result to the low packed single-precision floating-point value in the second source operand, performs
rounding and stores the resulting packed single-precision floating-point value to the destination operand (first
source operand).
VFNMADD213SS: Multiplies the low packed single-precision floating-point value from the second source operand to
the low packed single-precision floating-point value in the first source operand, adds the negated infinite precision
intermediate result to the low packed single-precision floating-point value in the third source operand, performs
rounding and stores the resulting packed single-precision floating-point value to the destination operand (first
source operand).
VFNMADD231SS: Multiplies the low packed single-precision floating-point value from the second source operand
to the low packed single-precision floating-point value in the third source operand, adds the negated infinite preci-
sion intermediate result to the low packed single-precision floating-point value in the first source operand,
performs rounding and stores the resulting packed single-precision floating-point value to the destination operand
(first source operand).
VEX.128 and EVEX encoded version: The destination operand (also first source operand) is encoded in reg_field.
The second source operand is encoded in VEX.vvvv/EVEX.vvvv. The third source operand is encoded in rm_field.
Bits 127:32 of the destination are unchanged. Bits MAXVL-1:128 of the destination register are zeroed.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
VEX.LIG.66.0F38.W0 9D /r
VFNMADD132SS xmm1, xmm2,
xmm3/m32
A V/V FMA Multiply scalar single-precision floating-point value from
xmm1 and xmm3/m32, negate the multiplication result
and add to xmm2 and put result in xmm1.
VEX.LIG.66.0F38.W0 AD /r
VFNMADD213SS xmm1, xmm2,
xmm3/m32
A V/V FMA Multiply scalar single-precision floating-point value from
xmm1 and xmm2, negate the multiplication result and
add to xmm3/m32 and put result in xmm1.
VEX.LIG.66.0F38.W0 BD /r
VFNMADD231SS xmm1, xmm2,
xmm3/m32
A V/V FMA Multiply scalar single-precision floating-point value from
xmm2 and xmm3/m32, negate the multiplication result
and add to xmm1 and put result in xmm1.
EVEX.LIG.66.0F38.W0 9D /r
VFNMADD132SS xmm1 {k1}{z},
xmm2, xmm3/m32{er}
B V/V AVX512F Multiply scalar single-precision floating-point value from
xmm1 and xmm3/m32, negate the multiplication result
and add to xmm2 and put result in xmm1.
EVEX.LIG.66.0F38.W0 AD /r
VFNMADD213SS xmm1 {k1}{z},
xmm2, xmm3/m32{er}
B V/V AVX512F Multiply scalar single-precision floating-point value from
xmm1 and xmm2, negate the multiplication result and
add to xmm3/m32 and put result in xmm1.
EVEX.LIG.66.0F38.W0 BD /r
VFNMADD231SS xmm1 {k1}{z},
xmm2, xmm3/m32{er}
B V/V AVX512F Multiply scalar single-precision floating-point value from
xmm2 and xmm3/m32, negate the multiplication result
and add to xmm1 and put result in xmm1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A NA ModRM:reg (r, w) VEX.vvvv (r) ModRM:r/m (r) NA
B Tuple1 Scalar ModRM:reg (r, w) EVEX.vvvv (r) ModRM:r/m (r) NA
EVEX encoded version: The low doubleword element of the destination is updated according to the writemask.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the
opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations
involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction
column.
Operation
In the operations below, “*” and “+” symbols represent multiplication and addition with infinite precision inputs and outputs (no
rounding).
VFNMADD132SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(-(DEST[31:0]*SRC3[31:0]) + SRC2[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMADD213SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(-(SRC2[31:0]*DEST[31:0]) + SRC3[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMADD231SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(-(SRC2[31:0]*SRC3[31:0]) + DEST[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                THEN DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMADD132SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(-(DEST[31:0]*SRC3[31:0]) + SRC2[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMADD213SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(-(SRC2[31:0]*DEST[31:0]) + SRC3[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMADD231SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(-(SRC2[31:0]*SRC3[31:0]) + DEST[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFNMADDxxxSS __m128 _mm_fnmadd_round_ss(__m128 a, __m128 b, __m128 c, int r);
VFNMADDxxxSS __m128 _mm_mask_fnmadd_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFNMADDxxxSS __m128 _mm_maskz_fnmadd_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFNMADDxxxSS __m128 _mm_mask3_fnmadd_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFNMADDxxxSS __m128 _mm_mask_fnmadd_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int r);
VFNMADDxxxSS __m128 _mm_maskz_fnmadd_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int r);
VFNMADDxxxSS __m128 _mm_mask3_fnmadd_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int r);
VFNMADDxxxSS __m128 _mm_fnmadd_ss (__m128 a, __m128 b, __m128 c);
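A non-normative sketch of zeroing-masking in the scalar form (assumes AVX512F and e.g. -mavx512f); with k1[0] = 0 and {z}, the low element is zeroed while bits 127:32 still come from the first source:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m128 a = _mm_set_ps(9.0f, 8.0f, 7.0f, 2.0f);
    __m128 b = _mm_set_ps(0.0f, 0.0f, 0.0f, 3.0f);
    __m128 c = _mm_set_ps(0.0f, 0.0f, 0.0f, 10.0f);
    __m128 r = _mm_maskz_fnmadd_ss(0x0, a, b, c);  /* mask bit 0 clear */
    float out[4];
    _mm_storeu_ps(out, r);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  /* 0 7 8 9 */
    return 0;
}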
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 3.
EVEX-encoded instructions, see Exceptions Type E3.
VFNMSUB132PD/VFNMSUB213PD/VFNMSUB231PD—Fused Negative Multiply-Subtract of
Packed Double-Precision Floating-Point Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W1 9E /r VFNMSUB132PD xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, negate the multiplication result and subtract xmm2 and put result in xmm1.
VEX.128.66.0F38.W1 AE /r VFNMSUB213PD xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed double-precision floating-point values from xmm1 and xmm2, negate the multiplication result and subtract xmm3/mem and put result in xmm1.
VEX.128.66.0F38.W1 BE /r VFNMSUB231PD xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed double-precision floating-point values from xmm2 and xmm3/mem, negate the multiplication result and subtract xmm1 and put result in xmm1.
VEX.256.66.0F38.W1 9E /r VFNMSUB132PD ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed double-precision floating-point values from ymm1 and ymm3/mem, negate the multiplication result and subtract ymm2 and put result in ymm1.
VEX.256.66.0F38.W1 AE /r VFNMSUB213PD ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed double-precision floating-point values from ymm1 and ymm2, negate the multiplication result and subtract ymm3/mem and put result in ymm1.
VEX.256.66.0F38.W1 BE /r VFNMSUB231PD ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed double-precision floating-point values from ymm2 and ymm3/mem, negate the multiplication result and subtract ymm1 and put result in ymm1.
EVEX.128.66.0F38.W1 9E /r VFNMSUB132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from xmm1 and xmm3/m128/m64bcst, negate the multiplication result and subtract xmm2 and put result in xmm1.
EVEX.128.66.0F38.W1 AE /r VFNMSUB213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m128/m64bcst and put result in xmm1.
EVEX.128.66.0F38.W1 BE /r VFNMSUB231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from xmm2 and xmm3/m128/m64bcst, negate the multiplication result and subtract xmm1 and put result in xmm1.
EVEX.256.66.0F38.W1 9E /r VFNMSUB132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from ymm1 and ymm3/m256/m64bcst, negate the multiplication result and subtract ymm2 and put result in ymm1.
EVEX.256.66.0F38.W1 AE /r VFNMSUB213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from ymm1 and ymm2, negate the multiplication result and subtract ymm3/m256/m64bcst and put result in ymm1.
EVEX.256.66.0F38.W1 BE /r VFNMSUB231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Multiply packed double-precision floating-point values from ymm2 and ymm3/m256/m64bcst, negate the multiplication result and subtract ymm1 and put result in ymm1.
EVEX.512.66.0F38.W1 9E /r VFNMSUB132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | B | V/V | AVX512F | Multiply packed double-precision floating-point values from zmm1 and zmm3/m512/m64bcst, negate the multiplication result and subtract zmm2 and put result in zmm1.
EVEX.512.66.0F38.W1 AE /r VFNMSUB213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | B | V/V | AVX512F | Multiply packed double-precision floating-point values from zmm1 and zmm2, negate the multiplication result and subtract zmm3/m512/m64bcst and put result in zmm1.
EVEX.512.66.0F38.W1 BE /r VFNMSUB231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | B | V/V | AVX512F | Multiply packed double-precision floating-point values from zmm2 and zmm3/m512/m64bcst, negate the multiplication result and subtract zmm1 and put result in zmm1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
VFNMSUB132PD: Multiplies the two, four or eight packed double-precision floating-point values from the first
source operand to the two, four or eight packed double-precision floating-point values in the third source operand.
From negated infinite precision intermediate results, subtracts the two, four or eight packed double-precision
floating-point values in the second source operand, performs rounding and stores the resulting two, four or eight
packed double-precision floating-point values to the destination operand (first source operand).
VFNMSUB213PD: Multiplies the two, four or eight packed double-precision floating-point values from the second
source operand to the two, four or eight packed double-precision floating-point values in the first source operand.
From negated infinite precision intermediate results, subtracts the two, four or eight packed double-precision
floating-point values in the third source operand, performs rounding and stores the resulting two, four or eight
packed double-precision floating-point values to the destination operand (first source operand).
VFNMSUB231PD: Multiplies the two, four or eight packed double-precision floating-point values from the second
source to the two, four or eight packed double-precision floating-point values in the third source operand. From
negated infinite precision intermediate results, subtracts the two, four or eight packed double-precision floating-
point values in the first source operand, performs rounding and stores the resulting two, four or eight packed
double-precision floating-point values to the destination operand (first source operand).
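The three forms differ only in which operands supply the multiplicands and the subtrahend of -(x*y)-z. A plain-C sketch of the operand roles (note that the actual instructions round once, after the fused multiply-subtract, whereas this scalar model may round the product and the subtraction separately):

#include <stdio.h>

static double fnmsub(double x, double y, double z) { return -(x * y) - z; }

int main(void) {
    double dest = 2.0, src2 = 3.0, src3 = 4.0;
    printf("132: %g\n", fnmsub(dest, src3, src2)); /* -(dest*src3) - src2 = -11 */
    printf("213: %g\n", fnmsub(src2, dest, src3)); /* -(src2*dest) - src3 = -10 */
    printf("231: %g\n", fnmsub(src2, src3, dest)); /* -(src2*src3) - dest = -14 */
    return 0;
}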
EVEX encoded versions: The destination operand (also the first source operand) and the second source operand are ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is conditionally updated with write mask k1.
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in
reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a
XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination
register are zeroed.
Operation
In the operations below, “*” and “-” symbols represent multiplication and subtraction with infinite precision inputs and outputs (no
rounding).
VFNMSUB132PD DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR( - (DEST[n+63:n]*SRC3[n+63:n]) - SRC2[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMSUB132PD/VFNMSUB213PD/VFNMSUB231PD—Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point
INSTRUCTION SET REFERENCE, V-Z
5-220 Vol. 2C
VFNMSUB213PD DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR( - (SRC2[n+63:n]*DEST[n+63:n]) - SRC3[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMSUB231PD DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 64*i;
    DEST[n+63:n] ← RoundFPControl_MXCSR( - (SRC2[n+63:n]*SRC3[n+63:n]) - DEST[n+63:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMSUB132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            RoundFPControl(-(DEST[i+63:i]*SRC3[i+63:i]) - SRC2[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB132PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(-(DEST[i+63:i]*SRC3[63:0]) - SRC2[i+63:i])
                ELSE
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(-(DEST[i+63:i]*SRC3[i+63:i]) - SRC2[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            RoundFPControl(-(SRC2[i+63:i]*DEST[i+63:i]) - SRC3[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB213PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(-(SRC2[i+63:i]*DEST[i+63:i]) - SRC3[63:0])
                ELSE
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(-(SRC2[i+63:i]*DEST[i+63:i]) - SRC3[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            RoundFPControl(-(SRC2[i+63:i]*SRC3[i+63:i]) - DEST[i+63:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB231PD DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(-(SRC2[i+63:i]*SRC3[63:0]) - DEST[i+63:i])
                ELSE
                    DEST[i+63:i] ←
                        RoundFPControl_MXCSR(-(SRC2[i+63:i]*SRC3[i+63:i]) - DEST[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFNMSUBxxxPD __m512d _mm512_fnmsub_pd(__m512d a, __m512d b, __m512d c);
VFNMSUBxxxPD __m512d _mm512_fnmsub_round_pd(__m512d a, __m512d b, __m512d c, int r);
VFNMSUBxxxPD __m512d _mm512_mask_fnmsub_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
VFNMSUBxxxPD __m512d _mm512_maskz_fnmsub_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
VFNMSUBxxxPD __m512d _mm512_mask3_fnmsub_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
VFNMSUBxxxPD __m512d _mm512_mask_fnmsub_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int r);
VFNMSUBxxxPD __m512d _mm512_maskz_fnmsub_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int r);
VFNMSUBxxxPD __m512d _mm512_mask3_fnmsub_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int r);
VFNMSUBxxxPD __m256d _mm256_mask_fnmsub_pd(__m256d a, __mmask8 k, __m256d b, __m256d c);
VFNMSUBxxxPD __m256d _mm256_maskz_fnmsub_pd(__mmask8 k, __m256d a, __m256d b, __m256d c);
VFNMSUBxxxPD __m256d _mm256_mask3_fnmsub_pd(__m256d a, __m256d b, __m256d c, __mmask8 k);
VFNMSUBxxxPD __m128d _mm_mask_fnmsub_pd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFNMSUBxxxPD __m128d _mm_maskz_fnmsub_pd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFNMSUBxxxPD __m128d _mm_mask3_fnmsub_pd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFNMSUBxxxPD __m128d _mm_fnmsub_pd (__m128d a, __m128d b, __m128d c);
VFNMSUBxxxPD __m256d _mm256_fnmsub_pd (__m256d a, __m256d b, __m256d c);
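For illustration, a short usage sketch of the unmasked 256-bit intrinsic (assumes an FMA-capable target; the expected values follow directly from -(a*b) - c):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m256d a = _mm256_set_pd(4.0, 3.0, 2.0, 1.0);
    __m256d b = _mm256_set1_pd(2.0);
    __m256d c = _mm256_set1_pd(1.0);
    double out[4];
    _mm256_storeu_pd(out, _mm256_fnmsub_pd(a, b, c));   /* -(a*b) - c per element */
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  /* -3 -5 -7 -9 */
    return 0;
}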
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFNMSUB132PS/VFNMSUB213PS/VFNMSUB231PS—Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 9E /r VFNMSUB132PS xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, negate the multiplication result and subtract xmm2 and put result in xmm1.
VEX.128.66.0F38.W0 AE /r VFNMSUB213PS xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm1 and xmm2, negate the multiplication result and subtract xmm3/mem and put result in xmm1.
VEX.128.66.0F38.W0 BE /r VFNMSUB231PS xmm1, xmm2, xmm3/m128 | A | V/V | FMA | Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, negate the multiplication result and subtract xmm1 and put result in xmm1.
VEX.256.66.0F38.W0 9E /r VFNMSUB132PS ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, negate the multiplication result and subtract ymm2 and put result in ymm1.
VEX.256.66.0F38.W0 AE /r VFNMSUB213PS ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm1 and ymm2, negate the multiplication result and subtract ymm3/mem and put result in ymm1.
VEX.256.66.0F38.W0 BE /r VFNMSUB231PS ymm1, ymm2, ymm3/m256 | A | V/V | FMA | Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, negate the multiplication result and subtract ymm1 and put result in ymm1.
EVEX.128.66.0F38.W0 9E /r VFNMSUB132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, negate the multiplication result and subtract xmm2 and put result in xmm1.
EVEX.128.66.0F38.W0 AE /r VFNMSUB213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m128/m32bcst and put result in xmm1.
EVEX.128.66.0F38.W0 BE /r VFNMSUB231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, negate the multiplication result and subtract xmm1 and put result in xmm1.
EVEX.256.66.0F38.W0 9E /r VFNMSUB132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, negate the multiplication result and subtract ymm2 and put result in ymm1.
EVEX.256.66.0F38.W0 AE /r VFNMSUB213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from ymm1 and ymm2, negate the multiplication result and subtract ymm3/m256/m32bcst and put result in ymm1.
EVEX.256.66.0F38.W0 BE /r VFNMSUB231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, negate the multiplication result and subtract ymm1 and put result in ymm1.
EVEX.512.66.0F38.W0 9E /r VFNMSUB132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | B | V/V | AVX512F | Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, negate the multiplication result and subtract zmm2 and put result in zmm1.
EVEX.512.66.0F38.W0 AE /r VFNMSUB213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | B | V/V | AVX512F | Multiply packed single-precision floating-point values from zmm1 and zmm2, negate the multiplication result and subtract zmm3/m512/m32bcst and put result in zmm1.
EVEX.512.66.0F38.W0 BE /r VFNMSUB231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | B | V/V | AVX512F | Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, negate the multiplication result and subtract zmm1 and put result in zmm1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
VFNMSUB132PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the first
source operand to the four, eight or sixteen packed single-precision floating-point values in the third source
operand. From negated infinite precision intermediate results, subtracts the four, eight or sixteen packed single-
precision floating-point values in the second source operand, performs rounding and stores the resulting four, eight
or sixteen packed single-precision floating-point values to the destination operand (first source operand).
VFNMSUB213PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the second
source operand to the four, eight or sixteen packed single-precision floating-point values in the first source
operand. From negated infinite precision intermediate results, subtracts the four, eight or sixteen packed single-
precision floating-point values in the third source operand, performs rounding and stores the resulting four, eight
or sixteen packed single-precision floating-point values to the destination operand (first source operand).
VFNMSUB231PS: Multiplies the four, eight or sixteen packed single-precision floating-point values from the second
source to the four, eight or sixteen packed single-precision floating-point values in the third source operand. From
negated infinite precision intermediate results, subtracts the four, eight or sixteen packed single-precision floating-
point values in the first source operand, performs rounding and stores the resulting four, eight or sixteen packed
single-precision floating-point values to the destination operand (first source operand).
EVEX encoded versions: The destination operand (also the first source operand) and the second source operand are ZMM/YMM/XMM registers. The third source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is conditionally updated with write mask k1.
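For illustration, a sketch of the merging-masking behavior just described, using the 512-bit masked intrinsic (assumes an AVX512F target; lanes whose mask bit is zero keep the value of the first source/destination operand a):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m512 a = _mm512_set1_ps(1.0f);
    __m512 b = _mm512_set1_ps(2.0f);
    __m512 c = _mm512_set1_ps(3.0f);
    __mmask16 k = 0x00FF;                         /* update lanes 0..7 only */
    float out[16];
    _mm512_storeu_ps(out, _mm512_mask_fnmsub_ps(a, k, b, c));
    printf("%g %g\n", out[0], out[15]);           /* -(1*2)-3 = -5 in lane 0; 1 in lane 15 */
    return 0;
}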
VEX.256 encoded version: The destination operand (also first source operand) is a YMM register and encoded in
reg_field. The second source operand is a YMM register and encoded in VEX.vvvv. The third source operand is a
YMM register or a 256-bit memory location and encoded in rm_field.
VEX.128 encoded version: The destination operand (also first source operand) is a XMM register and encoded in
reg_field. The second source operand is a XMM register and encoded in VEX.vvvv. The third source operand is a
XMM register or a 128-bit memory location and encoded in rm_field. The upper 128 bits of the YMM destination
register are zeroed.
Operation
In the operations below, “*” and “-” symbols represent multiplication and subtraction with infinite precision inputs and outputs (no
rounding).
VFNMSUB132PS DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR( - (DEST[n+31:n]*SRC3[n+31:n]) - SRC2[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMSUB213PS DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR( - (SRC2[n+31:n]*DEST[n+31:n]) - SRC3[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMSUB231PS DEST, SRC2, SRC3 (VEX encoded version)
IF (VEX.128) THEN
    MAXNUM ← 2
ELSEIF (VEX.256)
    MAXNUM ← 4
FI
For i = 0 to MAXNUM-1 {
    n ← 32*i;
    DEST[n+31:n] ← RoundFPControl_MXCSR( - (SRC2[n+31:n]*SRC3[n+31:n]) - DEST[n+31:n])
}
IF (VEX.128) THEN
    DEST[MAXVL-1:128] ← 0
ELSEIF (VEX.256)
    DEST[MAXVL-1:256] ← 0
FI
VFNMSUB132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ←
            RoundFPControl(-(DEST[i+31:i]*SRC3[i+31:i]) - SRC2[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB132PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(-(DEST[i+31:i]*SRC3[31:0]) - SRC2[i+31:i])
                ELSE
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(-(DEST[i+31:i]*SRC3[i+31:i]) - SRC2[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ←
            RoundFPControl(-(SRC2[i+31:i]*DEST[i+31:i]) - SRC3[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB213PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(-(SRC2[i+31:i]*DEST[i+31:i]) - SRC3[31:0])
                ELSE
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(-(SRC2[i+31:i]*DEST[i+31:i]) - SRC3[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a register)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ←
            RoundFPControl(-(SRC2[i+31:i]*SRC3[i+31:i]) - DEST[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VFNMSUB231PS DEST, SRC2, SRC3 (EVEX encoded version, when src3 operand is a memory source)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(-(SRC2[i+31:i]*SRC3[31:0]) - DEST[i+31:i])
                ELSE
                    DEST[i+31:i] ←
                        RoundFPControl_MXCSR(-(SRC2[i+31:i]*SRC3[i+31:i]) - DEST[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFNMSUBxxxPS __m512 _mm512_fnmsub_ps(__m512 a, __m512 b, __m512 c);
VFNMSUBxxxPS __m512 _mm512_fnmsub_round_ps(__m512 a, __m512 b, __m512 c, int r);
VFNMSUBxxxPS __m512 _mm512_mask_fnmsub_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
VFNMSUBxxxPS __m512 _mm512_maskz_fnmsub_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
VFNMSUBxxxPS __m512 _mm512_mask3_fnmsub_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
VFNMSUBxxxPS __m512 _mm512_mask_fnmsub_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int r);
VFNMSUBxxxPS __m512 _mm512_maskz_fnmsub_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int r);
VFNMSUBxxxPS __m512 _mm512_mask3_fnmsub_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int r);
VFNMSUBxxxPS __m256 _mm256_mask_fnmsub_ps(__m256 a, __mmask8 k, __m256 b, __m256 c);
VFNMSUBxxxPS __m256 _mm256_maskz_fnmsub_ps(__mmask8 k, __m256 a, __m256 b, __m256 c);
VFNMSUBxxxPS __m256 _mm256_mask3_fnmsub_ps(__m256 a, __m256 b, __m256 c, __mmask8 k);
VFNMSUBxxxPS __m128 _mm_mask_fnmsub_ps(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFNMSUBxxxPS __m128 _mm_maskz_fnmsub_ps(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFNMSUBxxxPS __m128 _mm_mask3_fnmsub_ps(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFNMSUBxxxPS __m128 _mm_fnmsub_ps (__m128 a, __m128 b, __m128 c);
VFNMSUBxxxPS __m256 _mm256_fnmsub_ps (__m256 a, __m256 b, __m256 c);
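The {er} forms can be exercised from C through the _round_ intrinsics. A sketch (assumes an AVX512F target; _MM_FROUND_NO_EXC suppresses exception reporting, as with EVEX embedded rounding):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m512 a = _mm512_set1_ps(1.0f);
    __m512 b = _mm512_set1_ps(3.0f);
    __m512 c = _mm512_set1_ps(0.1f);
    /* Round toward negative infinity instead of using the MXCSR.RM setting. */
    __m512 r = _mm512_fnmsub_round_ps(a, b, c,
                   _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
    float out[16];
    _mm512_storeu_ps(out, r);
    printf("%.8f\n", out[0]);   /* approximately -(1*3) - 0.1 = -3.1 */
    return 0;
}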
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 2.
EVEX-encoded instructions, see Exceptions Type E2.
VFNMSUB132SD/VFNMSUB213SD/VFNMSUB231SD—Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.LIG.66.0F38.W1 9F /r VFNMSUB132SD xmm1, xmm2, xmm3/m64 | A | V/V | FMA | Multiply scalar double-precision floating-point value from xmm1 and xmm3/mem, negate the multiplication result and subtract xmm2 and put result in xmm1.
VEX.LIG.66.0F38.W1 AF /r VFNMSUB213SD xmm1, xmm2, xmm3/m64 | A | V/V | FMA | Multiply scalar double-precision floating-point value from xmm1 and xmm2, negate the multiplication result and subtract xmm3/mem and put result in xmm1.
VEX.LIG.66.0F38.W1 BF /r VFNMSUB231SD xmm1, xmm2, xmm3/m64 | A | V/V | FMA | Multiply scalar double-precision floating-point value from xmm2 and xmm3/mem, negate the multiplication result and subtract xmm1 and put result in xmm1.
EVEX.LIG.66.0F38.W1 9F /r VFNMSUB132SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | B | V/V | AVX512F | Multiply scalar double-precision floating-point value from xmm1 and xmm3/m64, negate the multiplication result and subtract xmm2 and put result in xmm1.
EVEX.LIG.66.0F38.W1 AF /r VFNMSUB213SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | B | V/V | AVX512F | Multiply scalar double-precision floating-point value from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m64 and put result in xmm1.
EVEX.LIG.66.0F38.W1 BF /r VFNMSUB231SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | B | V/V | AVX512F | Multiply scalar double-precision floating-point value from xmm2 and xmm3/m64, negate the multiplication result and subtract xmm1 and put result in xmm1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Tuple1 Scalar | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
VFNMSUB132SD: Multiplies the low packed double-precision floating-point value from the first source operand to
the low packed double-precision floating-point value in the third source operand. From negated infinite precision
intermediate result, subtracts the low double-precision floating-point value in the second source operand, performs
rounding and stores the resulting packed double-precision floating-point value to the destination operand (first
source operand).
VFNMSUB213SD: Multiplies the low packed double-precision floating-point value from the second source operand
to the low packed double-precision floating-point value in the first source operand. From negated infinite precision
intermediate result, subtracts the low double-precision floating-point value in the third source operand, performs
rounding and stores the resulting packed double-precision floating-point value to the destination operand (first
source operand).
VFNMSUB231SD: Multiplies the low packed double-precision floating-point value from the second source to the low
packed double-precision floating-point value in the third source operand. From negated infinite precision interme-
diate result, subtracts the low double-precision floating-point value in the first source operand, performs rounding
and stores the resulting packed double-precision floating-point value to the destination operand (first source
operand).
VEX.128 and EVEX encoded version: The destination operand (also first source operand) is encoded in reg_field.
The second source operand is encoded in VEX.vvvv/EVEX.vvvv. The third source operand is encoded in rm_field.
Bits 127:64 of the destination are unchanged. Bits MAXVL-1:128 of the destination register are zeroed.
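For illustration, a sketch showing that bits 127:64 pass through from the first source operand (assumes an FMA-capable target):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m128d a = _mm_set_pd(99.0, 2.0);   /* element 1 = 99.0, element 0 = 2.0 */
    __m128d b = _mm_set_pd(0.0, 3.0);
    __m128d c = _mm_set_pd(0.0, 4.0);
    double out[2];
    _mm_storeu_pd(out, _mm_fnmsub_sd(a, b, c));
    printf("%g %g\n", out[0], out[1]);   /* -(2*3)-4 = -10; upper element still 99 */
    return 0;
}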
EVEX encoded version: The low quadword element of the destination is updated according to the writemask.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction column.
Operation
In the operations below, “*” and “-” symbols represent multiplication and subtraction with infinite precision inputs and outputs (no
rounding).
VFNMSUB132SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(-(DEST[63:0]*SRC3[63:0]) - SRC2[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMSUB213SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(-(SRC2[63:0]*DEST[63:0]) - SRC3[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMSUB231SD DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundFPControl(-(SRC2[63:0]*SRC3[63:0]) - DEST[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMSUB132SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(- (DEST[63:0]*SRC3[63:0]) - SRC2[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMSUB213SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(- (SRC2[63:0]*DEST[63:0]) - SRC3[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
VFNMSUB231SD DEST, SRC2, SRC3 (VEX encoded version)
DEST[63:0] ← RoundFPControl_MXCSR(- (SRC2[63:0]*SRC3[63:0]) - DEST[63:0])
DEST[127:64] ← DEST[127:64]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFNMSUBxxxSD __m128d _mm_fnmsub_round_sd(__m128d a, __m128d b, __m128d c, int r);
VFNMSUBxxxSD __m128d _mm_mask_fnmsub_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);
VFNMSUBxxxSD __m128d _mm_maskz_fnmsub_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);
VFNMSUBxxxSD __m128d _mm_mask3_fnmsub_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);
VFNMSUBxxxSD __m128d _mm_mask_fnmsub_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int r);
VFNMSUBxxxSD __m128d _mm_maskz_fnmsub_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int r);
VFNMSUBxxxSD __m128d _mm_mask3_fnmsub_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int r);
VFNMSUBxxxSD __m128d _mm_fnmsub_sd (__m128d a, __m128d b, __m128d c);
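For illustration, a sketch of the writemask behavior on the low element (assumes an AVX512F target; with k = 0 and merging-masking the low quadword of a is kept):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m128d a = _mm_set_sd(7.0), b = _mm_set_sd(2.0), c = _mm_set_sd(1.0);
    double m1 = _mm_cvtsd_f64(_mm_mask_fnmsub_sd(a, 1, b, c)); /* -(7*2)-1 = -15 */
    double m0 = _mm_cvtsd_f64(_mm_mask_fnmsub_sd(a, 0, b, c)); /* low element of a: 7 */
    printf("%g %g\n", m1, m0);
    return 0;
}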
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 3.
EVEX-encoded instructions, see Exceptions Type E3.
VFNMSUB132SS/VFNMSUB213SS/VFNMSUB231SS—Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.LIG.66.0F38.W0 9F /r VFNMSUB132SS xmm1, xmm2, xmm3/m32 | A | V/V | FMA | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, negate the multiplication result and subtract xmm2 and put result in xmm1.
VEX.LIG.66.0F38.W0 AF /r VFNMSUB213SS xmm1, xmm2, xmm3/m32 | A | V/V | FMA | Multiply scalar single-precision floating-point value from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m32 and put result in xmm1.
VEX.LIG.66.0F38.W0 BF /r VFNMSUB231SS xmm1, xmm2, xmm3/m32 | A | V/V | FMA | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, negate the multiplication result and subtract xmm1 and put result in xmm1.
EVEX.LIG.66.0F38.W0 9F /r VFNMSUB132SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | B | V/V | AVX512F | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, negate the multiplication result and subtract xmm2 and put result in xmm1.
EVEX.LIG.66.0F38.W0 AF /r VFNMSUB213SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | B | V/V | AVX512F | Multiply scalar single-precision floating-point value from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m32 and put result in xmm1.
EVEX.LIG.66.0F38.W0 BF /r VFNMSUB231SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | B | V/V | AVX512F | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, negate the multiplication result and subtract xmm1 and put result in xmm1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Tuple1 Scalar | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
VFNMSUB132SS: Multiplies the low packed single-precision floating-point value from the first source operand to the low packed single-precision floating-point value in the third source operand. From the negated infinite precision intermediate result, subtracts the low single-precision floating-point value in the second source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).
VFNMSUB213SS: Multiplies the low packed single-precision floating-point value from the second source operand to the low packed single-precision floating-point value in the first source operand. From the negated infinite precision intermediate result, subtracts the low single-precision floating-point value in the third source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).
VFNMSUB231SS: Multiplies the low packed single-precision floating-point value from the second source to the low packed single-precision floating-point value in the third source operand. From the negated infinite precision intermediate result, subtracts the low single-precision floating-point value in the first source operand, performs rounding and stores the resulting packed single-precision floating-point value to the destination operand (first source operand).
VEX.128 and EVEX encoded version: The destination operand (also first source operand) is encoded in reg_field.
The second source operand is encoded in VEX.vvvv/EVEX.vvvv. The third source operand is encoded in rm_field.
Bits 127:32 of the destination are unchanged. Bits MAXVL-1:128 of the destination register are zeroed.
EVEX encoded version: The low doubleword element of the destination is updated according to the writemask.
Compiler tools may optionally support a complementary mnemonic for each instruction mnemonic listed in the opcode/instruction column of the summary table. The behavior of the complementary mnemonic in situations involving NaNs is governed by the definition of the instruction mnemonic defined in the opcode/instruction column.
Operation
In the operations below, “*” and “-” symbols represent multiplication and subtraction with infinite precision inputs and outputs (no
rounding).
VFNMSUB132SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(-(DEST[31:0]*SRC3[31:0]) - SRC2[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMSUB213SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(-(SRC2[31:0]*DEST[31:0]) - SRC3[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMSUB231SS DEST, SRC2, SRC3 (EVEX encoded version)
IF (EVEX.b = 1) and SRC3 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundFPControl(-(SRC2[31:0]*SRC3[31:0]) - DEST[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMSUB132SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(- (DEST[31:0]*SRC3[31:0]) - SRC2[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMSUB213SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(- (SRC2[31:0]*DEST[31:0]) - SRC3[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
VFNMSUB231SS DEST, SRC2, SRC3 (VEX encoded version)
DEST[31:0] ← RoundFPControl_MXCSR(- (SRC2[31:0]*SRC3[31:0]) - DEST[31:0])
DEST[127:32] ← DEST[127:32]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFNMSUBxxxSS __m128 _mm_fnmsub_round_ss(__m128 a, __m128 b, __m128 c, int r);
VFNMSUBxxxSS __m128 _mm_mask_fnmsub_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);
VFNMSUBxxxSS __m128 _mm_maskz_fnmsub_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);
VFNMSUBxxxSS __m128 _mm_mask3_fnmsub_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);
VFNMSUBxxxSS __m128 _mm_mask_fnmsub_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int r);
VFNMSUBxxxSS __m128 _mm_maskz_fnmsub_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int r);
VFNMSUBxxxSS __m128 _mm_mask3_fnmsub_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int r);
VFNMSUBxxxSS __m128 _mm_fnmsub_ss (__m128 a, __m128 b, __m128 c);
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal
Other Exceptions
VEX-encoded instructions, see Exceptions Type 3.
EVEX-encoded instructions, see Exceptions Type E3.
VFPCLASSPD—Tests Types of Packed Float64 Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W1 66 /r ib VFPCLASSPD k2 {k1}, xmm2/m128/m64bcst, imm8 | A | V/V | AVX512VL AVX512DQ | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result.
EVEX.256.66.0F3A.W1 66 /r ib VFPCLASSPD k2 {k1}, ymm2/m256/m64bcst, imm8 | A | V/V | AVX512VL AVX512DQ | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result.
EVEX.512.66.0F3A.W1 66 /r ib VFPCLASSPD k2 {k1}, zmm2/m512/m64bcst, imm8 | A | V/V | AVX512DQ | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
The VFPCLASSPD instruction checks packed double-precision floating-point values for special categories, specified by the set bits in the imm8 byte. Each set bit in imm8 specifies a category of floating-point values that the input data element is classified against. The classified results of all specified categories of an input value are ORed together to form the final boolean result for the input element. The result of each element is written to the corresponding bit in a mask register k2 according to the writemask k1. Bits [MAX_KL-1:8/4/2] of the destination are cleared.
The classification categories specified by imm8 are shown in Figure 5-13. The classification test for each category is listed in Table 5-4.
The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-bit memory location.
EVEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Figure 5-13. Imm8 Byte Specifier of Special Case FP Values for VFPCLASSPD/SD/PS/SS
[Figure: imm8 bit assignment: bit 7 = SNaN, bit 6 = Neg. Finite, bit 5 = Denormal, bit 4 = Neg. INF, bit 3 = +INF, bit 2 = Neg. 0, bit 1 = +0, bit 0 = QNaN]
Table 5-4. Classifier Operations for VFPCLASSPD/SD/PS/SS
Bits | Category | Classifier
Imm8[0] | QNAN | Checks for QNaN
Imm8[1] | PosZero | Checks for +0
Imm8[2] | NegZero | Checks for -0
Imm8[3] | PosINF | Checks for +INF
Imm8[4] | NegINF | Checks for -INF
Imm8[5] | Denormal | Checks for Denormal
Imm8[6] | Negative | Checks for Negative finite
Imm8[7] | SNAN | Checks for SNaN
Operation
CheckFPClassDP (tsrc[63:0], imm8[7:0]){
    //* Start checking the source operand for special type *//
    NegNum ← tsrc[63];
    IF (tsrc[62:52]=07FFh) Then ExpAllOnes ← 1; FI;
    IF (tsrc[62:52]=0h) Then ExpAllZeros ← 1; FI;
    IF (ExpAllZeros AND MXCSR.DAZ) Then
        MantAllZeros ← 1;
    ELSIF (tsrc[51:0]=0h) Then
        MantAllZeros ← 1;
    FI;
    ZeroNumber ← ExpAllZeros AND MantAllZeros;
    SignalingBit ← tsrc[51];
    sNaN_res ← ExpAllOnes AND NOT(MantAllZeros) AND NOT(SignalingBit); // sNaN
    qNaN_res ← ExpAllOnes AND NOT(MantAllZeros) AND SignalingBit; // qNaN
    Pzero_res ← NOT(NegNum) AND ExpAllZeros AND MantAllZeros; // +0
    Nzero_res ← NegNum AND ExpAllZeros AND MantAllZeros; // -0
    PInf_res ← NOT(NegNum) AND ExpAllOnes AND MantAllZeros; // +Inf
    NInf_res ← NegNum AND ExpAllOnes AND MantAllZeros; // -Inf
    Denorm_res ← ExpAllZeros AND NOT(MantAllZeros); // denorm
    FinNeg_res ← NegNum AND NOT(ExpAllOnes) AND NOT(ZeroNumber); // -finite
    bResult ← ( imm8[0] AND qNaN_res ) OR ( imm8[1] AND Pzero_res ) OR
              ( imm8[2] AND Nzero_res ) OR ( imm8[3] AND PInf_res ) OR
              ( imm8[4] AND NInf_res ) OR ( imm8[5] AND Denorm_res ) OR
              ( imm8[6] AND FinNeg_res ) OR ( imm8[7] AND sNaN_res );
    Return bResult;
} //* end of CheckFPClassDP() *//
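For illustration, a plain-C model of the classifier above (a hypothetical helper, not part of any library; the MXCSR.DAZ handling is omitted for brevity):

#include <stdint.h>
#include <string.h>

static int check_fp_class_dp(double x, uint8_t imm8) {
    uint64_t t;
    memcpy(&t, &x, sizeof t);                        /* raw binary64 bits */
    int neg       = (int)(t >> 63);
    int expOnes   = ((t >> 52) & 0x7FF) == 0x7FF;
    int expZeros  = ((t >> 52) & 0x7FF) == 0;
    int mantZeros = (t & 0x000FFFFFFFFFFFFFULL) == 0;
    int quietBit  = (int)((t >> 51) & 1);            /* tsrc[51] */
    int zero      = expZeros && mantZeros;
    int snan      = expOnes && !mantZeros && !quietBit;
    int qnan      = expOnes && !mantZeros && quietBit;
    int pzero     = !neg && zero, nzero = neg && zero;
    int pinf      = !neg && expOnes && mantZeros;
    int ninf      =  neg && expOnes && mantZeros;
    int denorm    = expZeros && !mantZeros;
    int finNeg    = neg && !expOnes && !zero;        /* negative finite, incl. denormals */
    return ((imm8 & 0x01) && qnan)   || ((imm8 & 0x02) && pzero)  ||
           ((imm8 & 0x04) && nzero)  || ((imm8 & 0x08) && pinf)   ||
           ((imm8 & 0x10) && ninf)   || ((imm8 & 0x20) && denorm) ||
           ((imm8 & 0x40) && finNeg) || ((imm8 & 0x80) && snan);
}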
VFPCLASSPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC *is memory*)
                THEN
                    DEST[j] ← CheckFPClassDP(SRC1[63:0], imm8[7:0]);
                ELSE
                    DEST[j] ← CheckFPClassDP(SRC1[i+63:i], imm8[7:0]);
            FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFPCLASSPD __mmask8 _mm512_fpclass_pd_mask( __m512d a, int c);
VFPCLASSPD __mmask8 _mm512_mask_fpclass_pd_mask( __mmask8 m, __m512d a, int c)
VFPCLASSPD __mmask8 _mm256_fpclass_pd_mask( __m256d a, int c)
VFPCLASSPD __mmask8 _mm256_mask_fpclass_pd_mask( __mmask8 m, __m256d a, int c)
VFPCLASSPD __mmask8 _mm_fpclass_pd_mask( __m128d a, int c)
VFPCLASSPD __mmask8 _mm_mask_fpclass_pd_mask( __mmask8 m, __m128d a, int c)
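A usage sketch (assumes an AVX512DQ target): with imm8 = 81H (bit 0 = QNaN and bit 7 = SNaN, per Table 5-4) the instruction yields a mask of the NaN lanes.

#include <immintrin.h>

__mmask8 nan_lanes(__m512d v) {
    return _mm512_fpclass_pd_mask(v, 0x81);   /* QNaN | SNaN */
}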
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4
#UD If EVEX.vvvv != 1111B.
VFPCLASSPS—Tests Types of Packed Float32 Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W0 66 /r ib VFPCLASSPS k2 {k1}, xmm2/m128/m32bcst, imm8 | A | V/V | AVX512VL AVX512DQ | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result.
EVEX.256.66.0F3A.W0 66 /r ib VFPCLASSPS k2 {k1}, ymm2/m256/m32bcst, imm8 | A | V/V | AVX512VL AVX512DQ | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result.
EVEX.512.66.0F3A.W0 66 /r ib VFPCLASSPS k2 {k1}, zmm2/m512/m32bcst, imm8 | A | V/V | AVX512DQ | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
The FPCLASSPS instruction checks the packed single-precision floating point values for special categories, specified
by the set bits in the imm8 byte. Each set bit in imm8 specifies a category of floating-point values that the input
data element is classified against. The classified results of all specified categories of an input value are ORed
together to form the final boolean result for the input element. The result of each element is written to the corre-
sponding bit in a mask register k2 according to the writemask k1. Bits [MAX_KL-1:16/8/4] of the destination are
cleared.
The classification categories specified by imm8 are shown in Figure 5-13. The classification test for each category
is listed in Table 5-4.
The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector
broadcasted from a 32-bit memory location.
EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
CheckFPClassSP (tsrc[31:0], imm8[7:0]){
    //* Start checking the source operand for special type *//
    NegNum ← tsrc[31];
    IF (tsrc[30:23]=0FFh) Then ExpAllOnes ← 1; FI;
    IF (tsrc[30:23]=0h) Then ExpAllZeros ← 1; FI;
    IF (ExpAllZeros AND MXCSR.DAZ) Then
        MantAllZeros ← 1;
    ELSIF (tsrc[22:0]=0h) Then
        MantAllZeros ← 1;
    FI;
    ZeroNumber ← ExpAllZeros AND MantAllZeros;
    SignalingBit ← tsrc[22];
    sNaN_res ← ExpAllOnes AND NOT(MantAllZeros) AND NOT(SignalingBit); // sNaN
    qNaN_res ← ExpAllOnes AND NOT(MantAllZeros) AND SignalingBit; // qNaN
    Pzero_res ← NOT(NegNum) AND ExpAllZeros AND MantAllZeros; // +0
    Nzero_res ← NegNum AND ExpAllZeros AND MantAllZeros; // -0
    PInf_res ← NOT(NegNum) AND ExpAllOnes AND MantAllZeros; // +Inf
    NInf_res ← NegNum AND ExpAllOnes AND MantAllZeros; // -Inf
    Denorm_res ← ExpAllZeros AND NOT(MantAllZeros); // denorm
    FinNeg_res ← NegNum AND NOT(ExpAllOnes) AND NOT(ZeroNumber); // -finite
    bResult ← ( imm8[0] AND qNaN_res ) OR ( imm8[1] AND Pzero_res ) OR
              ( imm8[2] AND Nzero_res ) OR ( imm8[3] AND PInf_res ) OR
              ( imm8[4] AND NInf_res ) OR ( imm8[5] AND Denorm_res ) OR
              ( imm8[6] AND FinNeg_res ) OR ( imm8[7] AND sNaN_res );
    Return bResult;
} //* end of CheckFPClassSP() *//
VFPCLASSPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC *is memory*)
                THEN
                    DEST[j] ← CheckFPClassSP(SRC1[31:0], imm8[7:0]);
                ELSE
                    DEST[j] ← CheckFPClassSP(SRC1[i+31:i], imm8[7:0]);
            FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFPCLASSPS __mmask16 _mm512_fpclass_ps_mask( __m512 a, int c);
VFPCLASSPS __mmask16 _mm512_mask_fpclass_ps_mask( __mmask16 m, __m512 a, int c)
VFPCLASSPS __mmask8 _mm256_fpclass_ps_mask( __m256 a, int c)
VFPCLASSPS __mmask8 _mm256_mask_fpclass_ps_mask( __mmask8 m, __m256 a, int c)
VFPCLASSPS __mmask8 _mm_fpclass_ps_mask( __m128 a, int c)
VFPCLASSPS __mmask8 _mm_mask_fpclass_ps_mask( __mmask8 m, __m128 a, int c)
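For illustration, a sketch combining the classifier with a masked move to zero out NaN lanes (assumes an AVX512DQ target; imm8 = 81H tests QNaN and SNaN):

#include <immintrin.h>

__m512 zero_nans(__m512 v) {
    __mmask16 nan = _mm512_fpclass_ps_mask(v, 0x81);
    return _mm512_maskz_mov_ps((__mmask16)~nan, v);   /* keep non-NaN lanes, zero the rest */
}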
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4
#UD If EVEX.vvvv != 1111B.
VFPCLASSSD—Tests Type of a Scalar Float64 Value
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F3A.W1 67 /r ib VFPCLASSSD k2 {k1}, xmm2/m64, imm8 | A | V/V | AVX512DQ | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
The FPCLASSSD instruction checks the low double precision floating point value in the source operand for special
categories, specified by the set bits in the imm8 byte. Each set bit in imm8 specifies a category of floating-point
values that the input data element is classified against. The classified results of all specified categories of an input
value are ORed together to form the final boolean result for the input element. The result is written to the low bit
in a mask register k2 according to the writemask k1. Bits MAX_KL-1: 1 of the destination are cleared.
The classification categories specified by imm8 are shown in Figure 5-13. The classification test for each category
is listed in Table 5-4.
EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
CheckFPClassDP (tsrc[63:0], imm8[7:0]){
    NegNum ← tsrc[63];
    IF (tsrc[62:52]=07FFh) Then ExpAllOnes ← 1; FI;
    IF (tsrc[62:52]=0h) Then ExpAllZeros ← 1; FI;
    IF (ExpAllZeros AND MXCSR.DAZ) Then
        MantAllZeros ← 1;
    ELSIF (tsrc[51:0]=0h) Then
        MantAllZeros ← 1;
    FI;
    ZeroNumber ← ExpAllZeros AND MantAllZeros;
    SignalingBit ← tsrc[51];
    sNaN_res ← ExpAllOnes AND NOT(MantAllZeros) AND NOT(SignalingBit); // sNaN
    qNaN_res ← ExpAllOnes AND NOT(MantAllZeros) AND SignalingBit; // qNaN
    Pzero_res ← NOT(NegNum) AND ExpAllZeros AND MantAllZeros; // +0
    Nzero_res ← NegNum AND ExpAllZeros AND MantAllZeros; // -0
    PInf_res ← NOT(NegNum) AND ExpAllOnes AND MantAllZeros; // +Inf
    NInf_res ← NegNum AND ExpAllOnes AND MantAllZeros; // -Inf
    Denorm_res ← ExpAllZeros AND NOT(MantAllZeros); // denorm
    FinNeg_res ← NegNum AND NOT(ExpAllOnes) AND NOT(ZeroNumber); // -finite
    bResult ← ( imm8[0] AND qNaN_res ) OR ( imm8[1] AND Pzero_res ) OR
              ( imm8[2] AND Nzero_res ) OR ( imm8[3] AND PInf_res ) OR
              ( imm8[4] AND NInf_res ) OR ( imm8[5] AND Denorm_res ) OR
              ( imm8[6] AND FinNeg_res ) OR ( imm8[7] AND sNaN_res );
    Return bResult;
} //* end of CheckFPClassDP() *//
VFPCLASSSD (EVEX encoded version)
IF k1[0] OR *no writemask*
    THEN DEST[0] ← CheckFPClassDP(SRC1[63:0], imm8[7:0])
    ELSE DEST[0] ← 0 ; zeroing-masking only
FI;
DEST[MAX_KL-1:1] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFPCLASSSD __mmask8 _mm_fpclass_sd_mask( __m128d a, int c)
VFPCLASSSD __mmask8 _mm_mask_fpclass_sd_mask( __mmask8 m, __m128d a, int c)
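A usage sketch (assumes an AVX512DQ target; imm8 = 20H selects the denormal test from Table 5-4):

#include <immintrin.h>

int is_denormal_sd(__m128d x) {
    return _mm_fpclass_sd_mask(x, 0x20) & 1;   /* bit 5 of imm8: denormal */
}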
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E6
#UD If EVEX.vvvv != 1111B.
VFPCLASSSS—Tests Type of a Scalar Float32 Value
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F3A.W0 67 /r ib VFPCLASSSS k2 {k1}, xmm2/m32, imm8 | A | V/V | AVX512DQ | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
The FPCLASSSS instruction checks the low single-precision floating point value in the source operand for special
categories, specified by the set bits in the imm8 byte. Each set bit in imm8 specifies a category of floating-point
values that the input data element is classified against. The classified results of all specified categories of an input
value are ORed together to form the final boolean result for the input element. The result is written to the low bit
in a mask register k2 according to the writemask k1. Bits MAX_KL-1: 1 of the destination are cleared.
The classification categories specified by imm8 are shown in Figure 5-13. The classification test for each category
is listed in Table 5-4.
EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Operation
CheckFPClassSP (tsrc[31:0], imm8[7:0]){
    //* Start checking the source operand for special type *//
    NegNum ← tsrc[31];
    IF (tsrc[30:23]=0FFh) Then ExpAllOnes ← 1; FI;
    IF (tsrc[30:23]=0h) Then ExpAllZeros ← 1; FI;
    IF (ExpAllZeros AND MXCSR.DAZ) Then
        MantAllZeros ← 1;
    ELSIF (tsrc[22:0]=0h) Then
        MantAllZeros ← 1;
    FI;
    ZeroNumber ← ExpAllZeros AND MantAllZeros;
    SignalingBit ← tsrc[22];
    sNaN_res ← ExpAllOnes AND NOT(MantAllZeros) AND NOT(SignalingBit); // sNaN
    qNaN_res ← ExpAllOnes AND NOT(MantAllZeros) AND SignalingBit; // qNaN
    Pzero_res ← NOT(NegNum) AND ExpAllZeros AND MantAllZeros; // +0
    Nzero_res ← NegNum AND ExpAllZeros AND MantAllZeros; // -0
    PInf_res ← NOT(NegNum) AND ExpAllOnes AND MantAllZeros; // +Inf
    NInf_res ← NegNum AND ExpAllOnes AND MantAllZeros; // -Inf
    Denorm_res ← ExpAllZeros AND NOT(MantAllZeros); // denorm
    FinNeg_res ← NegNum AND NOT(ExpAllOnes) AND NOT(ZeroNumber); // -finite
    bResult ← ( imm8[0] AND qNaN_res ) OR ( imm8[1] AND Pzero_res ) OR
              ( imm8[2] AND Nzero_res ) OR ( imm8[3] AND PInf_res ) OR
              ( imm8[4] AND NInf_res ) OR ( imm8[5] AND Denorm_res ) OR
              ( imm8[6] AND FinNeg_res ) OR ( imm8[7] AND sNaN_res );
    Return bResult;
} //* end of CheckFPClassSP() *//
VFPCLASSSS (EVEX encoded version)
IF k1[0] OR *no writemask*
    THEN DEST[0] ← CheckFPClassSP(SRC1[31:0], imm8[7:0])
    ELSE DEST[0] ← 0 ; zeroing-masking only
FI;
DEST[MAX_KL-1:1] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VFPCLASSSS __mmask8 _mm_fpclass_ss_mask( __m128 a, int c)
VFPCLASSSS __mmask8 _mm_mask_fpclass_ss_mask( __mmask8 m, __m128 a, int c)
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E6
#UD If EVEX.vvvv != 1111B.
VGATHERDPD/VGATHERQPD — Gather Packed DP FP Values Using Signed Dword/Qword Indices
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.128.66.0F38.W1 92 /r VGATHERDPD xmm1, vm32x, xmm2 | RMV | V/V | AVX2 | Using dword indices specified in vm32x, gather double-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.
VEX.128.66.0F38.W1 93 /r VGATHERQPD xmm1, vm64x, xmm2 | RMV | V/V | AVX2 | Using qword indices specified in vm64x, gather double-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.
VEX.256.66.0F38.W1 92 /r VGATHERDPD ymm1, vm32x, ymm2 | RMV | V/V | AVX2 | Using dword indices specified in vm32x, gather double-precision FP values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.
VEX.256.66.0F38.W1 93 /r VGATHERQPD ymm1, vm64y, ymm2 | RMV | V/V | AVX2 | Using qword indices specified in vm64y, gather double-precision FP values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMV | ModRM:reg (r, w) | BaseReg (R): VSIB:base, VectorReg (R): VSIB:index | VEX.vvvv (r, w) | NA
Description
The instruction conditionally loads up to 2 or 4 double-precision floating-point values from memory addresses specified by the memory operand (the second operand), using dword or qword indices. The memory operand uses the VSIB form of the SIB byte to specify a general purpose register operand as the common base, a vector register for an array of indices relative to the base and a constant scale factor.
The mask operand (the third operand) specifies the conditional load operation from each memory address and the corresponding update of each data element of the destination operand (the first operand). Conditionality is specified by the most significant bit of each data element of the mask register. If an element's mask bit is not set, the corresponding element of the destination register is left unchanged. The widths of the data elements in the destination register and mask register are identical. The entire mask register will be set to zero by this instruction unless the instruction causes an exception.
Using dword indices, which occupy only the lower half of the vector index register, the instruction conditionally loads up to 2 or 4 double-precision floating-point values from the VSIB addressing memory operand, and updates the destination register.
This instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination register and the mask operand are partially updated; those elements that have been gathered are placed into the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already gathered elements, they will be delivered in lieu of the exception; in this case, EFLAGS.RF is set to one so an instruction breakpoint is not re-triggered when the instruction is continued.
If the data size and index size are different, part of the destination register and part of the mask register do not correspond to any elements being gathered. This instruction sets those parts to zero. It may do this to one or both of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception before gathering any elements.
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.128.66.0F38.W1 92 /r VGATHERDPD xmm1, vm32x, xmm2 | RMV | V/V | AVX2 | Using dword indices specified in vm32x, gather double-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.
VEX.128.66.0F38.W1 93 /r VGATHERQPD xmm1, vm64x, xmm2 | RMV | V/V | AVX2 | Using qword indices specified in vm64x, gather double-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.
VEX.256.66.0F38.W1 92 /r VGATHERDPD ymm1, vm32x, ymm2 | RMV | V/V | AVX2 | Using dword indices specified in vm32x, gather double-precision FP values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.
VEX.256.66.0F38.W1 93 /r VGATHERQPD ymm1, vm64y, ymm2 | RMV | V/V | AVX2 | Using qword indices specified in vm64y, gather double-precision FP values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMV | ModRM:reg (r,w) | BaseReg (R): VSIB:base, VectorReg(R): VSIB:index | VEX.vvvv (r, w) | NA
VEX.128 version: The instruction will gather two double-precision floating-point values. For dword indices, only the
lower two indices in the vector index register are used.
VEX.256 version: The instruction will gather four double-precision floating-point values. For dword indices, only the
lower four indices in the vector index register are used.
Note that:
• If any pair of the index, mask, or destination registers are the same, this instruction results in a #UD fault.
• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.
• Elements may be gathered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable: given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• This instruction will cause a #UD if the address size attribute is 16-bit.
• This instruction will cause a #UD if the memory operand is encoded without the SIB byte.
• This instruction should not be used to access memory-mapped I/O, as the ordering of the individual loads it does is implementation-specific, and some implementations may use loads larger than the data element size or load elements an indeterminate number of times.
• The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are ignored.
Operation
DEST ← SRC1;
BASE_ADDR: base register encoded in VSIB addressing;
VINDEX: the vector index register encoded by VSIB addressing;
SCALE: scale factor encoded by SIB:[7:6];
DISP: optional 1- or 4-byte displacement;
MASK ← SRC3;
VGATHERDPD (VEX.128 version)
MASK[MAXVL-1:128] ← 0;
FOR j ← 0 to 1
    i ← j * 64;
    IF MASK[63+i] THEN
        MASK[i+63:i] ← FFFFFFFF_FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+63:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 1
    k ← j * 32;
    i ← j * 64;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[k+31:k]) * SCALE + DISP;
    IF MASK[63+i] THEN
        DEST[i+63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+63:i] ← 0;
ENDFOR
DEST[MAXVL-1:128] ← 0;
VGATHERQPD (VEX.128 version)
MASK[MAXVL-1:128] ← 0;
FOR j ← 0 to 1
    i ← j * 64;
    IF MASK[63+i] THEN
        MASK[i+63:i] ← FFFFFFFF_FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+63:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 1
    i ← j * 64;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[i+63:i]) * SCALE + DISP;
    IF MASK[63+i] THEN
        DEST[i+63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits this instruction
    FI;
    MASK[i+63:i] ← 0;
ENDFOR
DEST[MAXVL-1:128] ← 0;
VGATHERQPD (VEX.256 version)
MASK[MAXVL-1:256] ← 0;
FOR j ← 0 to 3
    i ← j * 64;
    IF MASK[63+i] THEN
        MASK[i+63:i] ← FFFFFFFF_FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+63:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 3
    i ← j * 64;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[i+63:i]) * SCALE + DISP;
    IF MASK[63+i] THEN
        DEST[i+63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+63:i] ← 0;
ENDFOR
DEST[MAXVL-1:256] ← 0;
VGATHERDPD (VEX.256 version)
MASK[MAXVL-1:256] ← 0;
FOR j ← 0 to 3
    i ← j * 64;
    IF MASK[63+i] THEN
        MASK[i+63:i] ← FFFFFFFF_FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+63:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 3
    k ← j * 32;
    i ← j * 64;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[k+31:k]) * SCALE + DISP;
    IF MASK[63+i] THEN
        DEST[i+63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+63:i] ← 0;
ENDFOR
DEST[MAXVL-1:256] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VGATHERDPD: __m128d _mm_i32gather_pd (double const * base, __m128i index, const int scale);
VGATHERDPD: __m128d _mm_mask_i32gather_pd (__m128d src, double const * base, __m128i index, __m128d mask, const int scale);
VGATHERDPD: __m256d _mm256_i32gather_pd (double const * base, __m128i index, const int scale);
VGATHERDPD: __m256d _mm256_mask_i32gather_pd (__m256d src, double const * base, __m128i index, __m256d mask, const int scale);
VGATHERQPD: __m128d _mm_i64gather_pd (double const * base, __m128i index, const int scale);
VGATHERQPD: __m128d _mm_mask_i64gather_pd (__m128d src, double const * base, __m128i index, __m128d mask, const int scale);
VGATHERQPD: __m256d _mm256_i64gather_pd (double const * base, __m256i index, const int scale);
VGATHERQPD: __m256d _mm256_mask_i64gather_pd (__m256d src, double const * base, __m256i index, __m256d mask, const int scale);
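For illustration (not part of the original manual), a minimal C sketch assuming a compiler with AVX2 support; the table contents and index values are arbitrary. With scale = 8 (the size of a double), each signed dword index selects one element of the base array:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    double table[8] = {0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5};
    __m128i idx = _mm_setr_epi32(7, 2, 5, 0);       /* four signed dword indices */
    __m256d g = _mm256_i32gather_pd(table, idx, 8); /* scale 8 = sizeof(double) */
    double out[4];
    _mm256_storeu_pd(out, g);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]); /* 7.5 2.5 5.5 0.5 */
    return 0;
}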
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 12.
VGATHERDPS/VGATHERQPS — Gather Packed SP FP values Using Signed Dword/Qword Indices
Description
The instruction conditionally loads up to 4 or 8 single-precision floating-point values from memory addresses spec-
ified by the memory operand (the second operand) and using dword indices. The memory operand uses the VSIB
form of the SIB byte to specify a general purpose register operand as the common base, a vector register for an
array of indices relative to the base and a constant scale factor.
The mask operand (the third operand) specifies the conditional load operation from each memory address and the
corresponding update of each data element of the destination operand (the first operand). Conditionality is speci-
fied by the most significant bit of each data element of the mask register. If an element’s mask bit is not set, the
corresponding element of the destination register is left unchanged. The widths of the data elements in the destination
register and mask register are identical. The entire mask register will be set to zero by this instruction unless the
instruction causes an exception.
Using qword indices, the instruction conditionally loads up to 2 or 4 single-precision floating-point values from the
VSIB addressing memory operand, and updates the lower half of the destination register. The upper 128 or 256 bits
of the destination register are zeroed with qword indices.
This instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception
is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination
register and the mask operand are partially updated; those elements that have been gathered are placed into the
destination register and have their mask bits set to zero. If any traps or interrupts are pending from already gath-
ered elements, they will be delivered in lieu of the exception; in this case, EFLAGS.RF is set to one so an instruction
breakpoint is not re-triggered when the instruction is continued.
If the data size and index size are different, part of the destination register and part of the mask register do not
correspond to any elements being gathered. This instruction sets those parts to zero. It may do this to one or both
of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception
before gathering any elements.
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 92 /r VGATHERDPS xmm1, vm32x, xmm2 | A | V/V | AVX2 | Using dword indices specified in vm32x, gather single-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.
VEX.128.66.0F38.W0 93 /r VGATHERQPS xmm1, vm64x, xmm2 | A | V/V | AVX2 | Using qword indices specified in vm64x, gather single-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.
VEX.256.66.0F38.W0 92 /r VGATHERDPS ymm1, vm32y, ymm2 | A | V/V | AVX2 | Using dword indices specified in vm32y, gather single-precision FP values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.
VEX.256.66.0F38.W0 93 /r VGATHERQPS xmm1, vm64y, xmm2 | A | V/V | AVX2 | Using qword indices specified in vm64y, gather single-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r,w) | BaseReg (R): VSIB:base, VectorReg(R): VSIB:index | VEX.vvvv (r, w) | NA
VEX.128 version: For dword indices, the instruction will gather four single-precision floating-point values. For
qword indices, the instruction will gather two values and zero the upper 64 bits of the destination.
VEX.256 version: For dword indices, the instruction will gather eight single-precision floating-point values. For
qword indices, the instruction will gather four values and zero the upper 128 bits of the destination.
Note that:
• If any pair of the index, mask, or destination registers are the same, this instruction results in a #UD fault.
• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.
• Elements may be gathered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable: given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• This instruction will cause a #UD if the address size attribute is 16-bit.
• This instruction will cause a #UD if the memory operand is encoded without the SIB byte.
• This instruction should not be used to access memory-mapped I/O, as the ordering of the individual loads it does is implementation-specific, and some implementations may use loads larger than the data element size or load elements an indeterminate number of times.
• The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are ignored.
Operation
DEST ← SRC1;
BASE_ADDR: base register encoded in VSIB addressing;
VINDEX: the vector index register encoded by VSIB addressing;
SCALE: scale factor encoded by SIB:[7:6];
DISP: optional 1- or 4-byte displacement;
MASK ← SRC3;
VGATHERDPS (VEX.128 version)
MASK[MAXVL-1:128] ← 0;
FOR j ← 0 to 3
    i ← j * 32;
    IF MASK[31+i] THEN
        MASK[i+31:i] ← FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+31:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 3
    i ← j * 32;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[i+31:i]) * SCALE + DISP;
    IF MASK[31+i] THEN
        DEST[i+31:i] ← FETCH_32BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+31:i] ← 0;
ENDFOR
DEST[MAXVL-1:128] ← 0;
VGATHERQPS (VEX.128 version)
MASK[MAXVL-1:64] ← 0;
FOR j ← 0 to 3
    i ← j * 32;
    IF MASK[31+i] THEN
        MASK[i+31:i] ← FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+31:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 1
    k ← j * 64;
    i ← j * 32;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[k+63:k]) * SCALE + DISP;
    IF MASK[31+i] THEN
        DEST[i+31:i] ← FETCH_32BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+31:i] ← 0;
ENDFOR
DEST[MAXVL-1:64] ← 0;
VGATHERDPS (VEX.256 version)
MASK[MAXVL-1:256] ← 0;
FOR j ← 0 to 7
    i ← j * 32;
    IF MASK[31+i] THEN
        MASK[i+31:i] ← FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+31:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 7
    i ← j * 32;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[i+31:i]) * SCALE + DISP;
    IF MASK[31+i] THEN
        DEST[i+31:i] ← FETCH_32BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+31:i] ← 0;
ENDFOR
DEST[MAXVL-1:256] ← 0;
VGATHERQPS (VEX.256 version)
MASK[MAXVL-1:128] ← 0;
FOR j ← 0 to 7
    i ← j * 32;
    IF MASK[31+i] THEN
        MASK[i+31:i] ← FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+31:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 3
    k ← j * 64;
    i ← j * 32;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[k+63:k]) * SCALE + DISP;
    IF MASK[31+i] THEN
        DEST[i+31:i] ← FETCH_32BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+31:i] ← 0;
ENDFOR
DEST[MAXVL-1:128] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VGATHERDPS: __m128 _mm_i32gather_ps (float const * base, __m128i index, const int scale);
VGATHERDPS: __m128 _mm_mask_i32gather_ps (__m128 src, float const * base, __m128i index, __m128 mask, const int scale);
VGATHERDPS: __m256 _mm256_i32gather_ps (float const * base, __m256i index, const int scale);
VGATHERDPS: __m256 _mm256_mask_i32gather_ps (__m256 src, float const * base, __m256i index, __m256 mask, const int scale);
VGATHERQPS: __m128 _mm_i64gather_ps (float const * base, __m128i index, const int scale);
VGATHERQPS: __m128 _mm_mask_i64gather_ps (__m128 src, float const * base, __m128i index, __m128 mask, const int scale);
VGATHERQPS: __m128 _mm256_i64gather_ps (float const * base, __m256i index, const int scale);
VGATHERQPS: __m128 _mm256_mask_i64gather_ps (__m128 src, float const * base, __m256i index, __m128 mask, const int scale);
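A masked-gather sketch (not part of the original manual), assuming AVX2; the values are illustrative. Only the lane whose mask element has its most significant bit set is loaded; the other lane keeps its src value, and the instruction zeroes the upper destination elements:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float table[4] = {10.0f, 11.0f, 12.0f, 13.0f};
    __m128i idx  = _mm_set_epi64x(3, 1);                  /* qword indices: lane0=1, lane1=3 */
    __m128  src  = _mm_setr_ps(-1.0f, -2.0f, 0.0f, 0.0f); /* pass-through values */
    __m128  mask = _mm_castsi128_ps(_mm_setr_epi32(-1, 0, 0, 0)); /* gather lane 0 only */
    __m128  g = _mm_mask_i64gather_ps(src, table, idx, mask, 4);
    float out[4];
    _mm_storeu_ps(out, g);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]); /* 11 -2 0 0 */
    return 0;
}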
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 12.
VGATHERDPS/VGATHERDPD—Gather Packed Single, Packed Double with Signed Dword
Description
A set of single-precision/double-precision floating-point memory locations pointed to by base address BASE_ADDR
and index vector V_INDEX with scale SCALE are gathered. The result is written into a vector register. The elements
are specified via the VSIB (i.e., the index register is a vector register, holding packed indices). Elements will only
be loaded if their corresponding mask bit is one. If an element's mask bit is not set, the corresponding element of
the destination register is left unchanged. The entire mask register will be set to zero by this instruction unless it
triggers an exception.
This instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception
is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination
register and the mask register (k1) are partially updated; those elements that have been gathered are placed into
the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already
gathered elements, they will be delivered in lieu of the exception; in this case, EFLAGS.RF is set to one so an
instruction breakpoint is not re-triggered when the instruction is continued.
If the data element size is less than the index element size, the higher part of the destination register and the mask
register do not correspond to any elements being gathered. This instruction sets those higher parts to zero. It may
update these unused parts of one or both of those registers even if the instruction triggers an exception, and even
if it triggers the exception before gathering any elements.
Note that:
• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination zmm will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.
• Elements may be gathered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable: given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 92 /vsib VGATHERDPS xmm1 {k1}, vm32x | A | V/V | AVX512VL AVX512F | Using signed dword indices, gather single-precision floating-point values from memory using k1 as completion mask.
EVEX.256.66.0F38.W0 92 /vsib VGATHERDPS ymm1 {k1}, vm32y | A | V/V | AVX512VL AVX512F | Using signed dword indices, gather single-precision floating-point values from memory using k1 as completion mask.
EVEX.512.66.0F38.W0 92 /vsib VGATHERDPS zmm1 {k1}, vm32z | A | V/V | AVX512F | Using signed dword indices, gather single-precision floating-point values from memory using k1 as completion mask.
EVEX.128.66.0F38.W1 92 /vsib VGATHERDPD xmm1 {k1}, vm32x | A | V/V | AVX512VL AVX512F | Using signed dword indices, gather float64 vector into float64 vector xmm1 using k1 as completion mask.
EVEX.256.66.0F38.W1 92 /vsib VGATHERDPD ymm1 {k1}, vm32x | A | V/V | AVX512VL AVX512F | Using signed dword indices, gather float64 vector into float64 vector ymm1 using k1 as completion mask.
EVEX.512.66.0F38.W1 92 /vsib VGATHERDPD zmm1 {k1}, vm32y | A | V/V | AVX512F | Using signed dword indices, gather float64 vector into float64 vector zmm1 using k1 as completion mask.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | BaseReg (R): VSIB:base, VectorReg(R): VSIB:index | NA | NA
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• Not valid with 16-bit effective addresses. Will deliver a #UD fault.
• Note that the presence of the VSIB byte is enforced in this instruction. Hence, the instruction will #UD fault if ModRM.rm is different from 100b.
This instruction has special disp8*N and alignment rules. N is considered to be the size of a single vector element.
The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are ignored.
The instruction will #UD fault if the destination vector zmm1 is the same as index vector VINDEX. The instruction will #UD fault if the k0 mask register is specified.
Operation
BASE_ADDR stands for the memory operand base address (a GPR); may not exist
VINDEX stands for the memory operand vector of indices (a vector register)
SCALE stands for the memory operand scalar (1, 2, 4 or 8)
DISP is the optional 1 or 4 byte displacement
VGATHERDPS (EVEX encoded version)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j]
        THEN DEST[i+31:i] ← MEM[BASE_ADDR + SignExtend(VINDEX[i+31:i]) * SCALE + DISP]
            k1[j] ← 0
        ELSE *DEST[i+31:i] remains unchanged*
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
DEST[MAXVL-1:VL] ← 0
VGATHERDPD (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j]
        THEN DEST[i+63:i] ← MEM[BASE_ADDR + SignExtend(VINDEX[k+31:k]) * SCALE + DISP]
            k1[j] ← 0
        ELSE *DEST[i+63:i] remains unchanged*
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VGATHERDPD __m512d _mm512_i32gather_pd( __m256i vdx, void * base, int scale);
VGATHERDPD __m512d _mm512_mask_i32gather_pd(__m512d s, __mmask8 k, __m256i vdx, void * base, int scale);
VGATHERDPD __m256d _mm256_mmask_i32gather_pd(__m256d s, __mmask8 k, __m128i vdx, void * base, int scale);
VGATHERDPD __m128d _mm_mmask_i32gather_pd(__m128d s, __mmask8 k, __m128i vdx, void * base, int scale);
VGATHERDPS __m512 _mm512_i32gather_ps( __m512i vdx, void * base, int scale);
VGATHERDPS __m512 _mm512_mask_i32gather_ps(__m512 s, __mmask16 k, __m512i vdx, void * base, int scale);
VGATHERDPS __m256 _mm256_mmask_i32gather_ps(__m256 s, __mmask8 k, __m256i vdx, void * base, int scale);
VGATHERDPS __m128 _mm_mmask_i32gather_ps(__m128 s, __mmask8 k, __m128i vdx, void * base, int scale);
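A minimal sketch (not part of the original manual), assuming a compiler with AVX512F support; the table and mask values are illustrative. k acts as a completion mask: lanes whose k bit is clear keep the src value, and set bits are cleared as elements complete:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float table[16];
    for (int i = 0; i < 16; i++) table[i] = (float)i;
    __m512i idx = _mm512_setr_epi32(15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
    __mmask16 k = 0x00FF;                    /* gather only the low 8 lanes */
    __m512 src  = _mm512_set1_ps(-1.0f);     /* kept where the k bit is 0 */
    __m512 g = _mm512_mask_i32gather_ps(src, k, idx, table, 4);
    float out[16];
    _mm512_storeu_ps(out, g);
    printf("%g %g\n", out[0], out[8]);       /* 15 -1 */
    return 0;
}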
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E12.
VGATHERQPS/VGATHERQPD—Gather Packed Single, Packed Double with Signed Qword Indices
Description
A set of up to 8 single-precision/double-precision floating-point memory locations pointed to by base address
BASE_ADDR and index vector V_INDEX with scale SCALE are gathered. The result is written into a vector register.
The elements are specified via the VSIB (i.e., the index register is a vector register, holding packed indices).
Elements will only be loaded if their corresponding mask bit is one. If an element's mask bit is not set, the
corresponding element of the destination register is left unchanged. The entire mask register will be set to zero by
this instruction unless it triggers an exception.
This instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception
is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination
register and the mask register (k1) are partially updated; those elements that have been gathered are placed into
the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already
gathered elements, they will be delivered in lieu of the exception; in this case, EFLAGS.RF is set to one so an
instruction breakpoint is not re-triggered when the instruction is continued.
If the data element size is less than the index element size, the higher part of the destination register and the mask
register do not correspond to any elements being gathered. This instruction sets those higher parts to zero. It may
update these unused parts of one or both of those registers even if the instruction triggers an exception, and even
if it triggers the exception before gathering any elements.
Note that:
• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination zmm will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 93 /vsib VGATHERQPS xmm1 {k1}, vm64x | A | V/V | AVX512VL AVX512F | Using signed qword indices, gather single-precision floating-point values from memory using k1 as completion mask.
EVEX.256.66.0F38.W0 93 /vsib VGATHERQPS xmm1 {k1}, vm64y | A | V/V | AVX512VL AVX512F | Using signed qword indices, gather single-precision floating-point values from memory using k1 as completion mask.
EVEX.512.66.0F38.W0 93 /vsib VGATHERQPS ymm1 {k1}, vm64z | A | V/V | AVX512F | Using signed qword indices, gather single-precision floating-point values from memory using k1 as completion mask.
EVEX.128.66.0F38.W1 93 /vsib VGATHERQPD xmm1 {k1}, vm64x | A | V/V | AVX512VL AVX512F | Using signed qword indices, gather float64 vector into float64 vector xmm1 using k1 as completion mask.
EVEX.256.66.0F38.W1 93 /vsib VGATHERQPD ymm1 {k1}, vm64y | A | V/V | AVX512VL AVX512F | Using signed qword indices, gather float64 vector into float64 vector ymm1 using k1 as completion mask.
EVEX.512.66.0F38.W1 93 /vsib VGATHERQPD zmm1 {k1}, vm64z | A | V/V | AVX512F | Using signed qword indices, gather float64 vector into float64 vector zmm1 using k1 as completion mask.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | BaseReg (R): VSIB:base, VectorReg(R): VSIB:index | NA | NA
• Elements may be gathered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable: given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• Not valid with 16-bit effective addresses. Will deliver a #UD fault.
• Note that the presence of the VSIB byte is enforced in this instruction. Hence, the instruction will #UD fault if ModRM.rm is different from 100b.
This instruction has special disp8*N and alignment rules. N is considered to be the size of a single vector element.
The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are ignored.
The instruction will #UD fault if the destination vector zmm1 is the same as index vector VINDEX. The instruction will #UD fault if the k0 mask register is specified.
Operation
BASE_ADDR stands for the memory operand base address (a GPR); may not exist
VINDEX stands for the memory operand vector of indices (a ZMM register)
SCALE stands for the memory operand scalar (1, 2, 4 or 8)
DISP is the optional 1 or 4 byte displacement
VGATHERQPS (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j]
        THEN DEST[i+31:i] ← MEM[BASE_ADDR + (VINDEX[k+63:k]) * SCALE + DISP]
            k1[j] ← 0
        ELSE *DEST[i+31:i] remains unchanged*
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
DEST[MAXVL-1:VL/2] ← 0
VGATHERQPD (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j]
        THEN DEST[i+63:i] ← MEM[BASE_ADDR + (VINDEX[i+63:i]) * SCALE + DISP]
            k1[j] ← 0
        ELSE *DEST[i+63:i] remains unchanged*
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VGATHERQPD __m512d _mm512_i64gather_pd( __m512i vdx, void * base, int scale);
VGATHERQPD __m512d _mm512_mask_i64gather_pd(__m512d s, __mmask8 k, __m512i vdx, void * base, int scale);
VGATHERQPD __m256d _mm256_mask_i64gather_pd(__m256d s, __mmask8 k, __m256i vdx, void * base, int scale);
VGATHERQPD __m128d _mm_mask_i64gather_pd(__m128d s, __mmask8 k, __m128i vdx, void * base, int scale);
VGATHERQPS __m256 _mm512_i64gather_ps( __m512i vdx, void * base, int scale);
VGATHERQPS __m256 _mm512_mask_i64gather_ps(__m256 s, __mmask16 k, __m512i vdx, void * base, int scale);
VGATHERQPS __m128 _mm256_mask_i64gather_ps(__m128 s, __mmask8 k, __m256i vdx, void * base, int scale);
VGATHERQPS __m128 _mm_mask_i64gather_ps(__m128 s, __mmask8 k, __m128i vdx, void * base, int scale);
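A sketch of the no-mask intrinsic form (not part of the original manual), assuming AVX512F; it behaves as if every element is selected. Note the vdx-before-base argument order, matching the prototypes above, and the illustrative data:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    double table[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    __m512i idx = _mm512_setr_epi64(7, 6, 5, 4, 3, 2, 1, 0); /* qword indices */
    __m512d g = _mm512_i64gather_pd(idx, table, 8);          /* scale 8 = sizeof(double) */
    double out[8];
    _mm512_storeu_pd(out, g);
    printf("%g %g\n", out[0], out[7]);                       /* 7 0 */
    return 0;
}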
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E12.
VGETEXPPD—Convert Exponents of Packed DP FP Values to DP FP Values
Description
Extracts the biased exponents from the normalized DP FP representation of each qword data element of the source
operand (the second operand) as unbiased signed integer values, or converts the denormal representation of input
data to unbiased negative integer values. Each integer value of the unbiased exponent is converted to a double-
precision FP value and written to the corresponding qword element of the destination operand (the first operand)
as a DP FP number.
The destination operand is a ZMM/YMM/XMM register and updated under the writemask. The source operand can
be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from
a 64-bit memory location.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Each GETEXP operation converts the exponent value into a FP number (permitting input value in denormal repre-
sentation). Special cases of input values are listed in Table 5-5.
The formula is:
GETEXP(x) = floor(log2(|x|))
Notation floor(x) stands for the greatest integer not exceeding real number x.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 42 /r VGETEXPPD xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512F | Convert the exponent of packed double-precision floating-point values in the source operand to DP FP results representing unbiased integer exponents and stores the results in the destination register.
EVEX.256.66.0F38.W1 42 /r VGETEXPPD ymm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512F | Convert the exponent of packed double-precision floating-point values in the source operand to DP FP results representing unbiased integer exponents and stores the results in the destination register.
EVEX.512.66.0F38.W1 42 /r VGETEXPPD zmm1 {k1}{z}, zmm2/m512/m64bcst{sae} | A | V/V | AVX512F | Convert the exponent of packed double-precision floating-point values in the source operand to DP FP results representing unbiased integer exponents and stores the results in the destination under writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Table 5-5. VGETEXPPD/SD Special Cases
Input Operand | Result
src1 = NaN | QNaN(src1)
0 < |src1| < INF | floor(log2(|src1|))
|src1| = +INF | +INF
|src1| = 0 | -INF
Comments: if (SRC = SNaN) then #IE; if (SRC = denormal) then #DE.
Operation
NormalizeExpTinyDPFP(SRC[63:0])
{
    // Jbit is the hidden integral bit of a FP number. In case of denormal number it has the value of ZERO.
    Src.Jbit ← 0;
    Dst.exp ← 1;
    Dst.fraction ← SRC[51:0];
    WHILE (Src.Jbit = 0)
    {
        Src.Jbit ← Dst.fraction[51]; // Get the fraction MSB
        Dst.fraction ← Dst.fraction << 1; // One bit shift left
        Dst.exp--; // Decrement the exponent
    }
    Dst.fraction ← 0; // zero out fraction bits
    Dst.sign ← 1; // Return negative sign
    TMP[63:0] ← MXCSR.DAZ ? 0 : ((Dst.sign << 63) OR (Dst.exp << 52) OR (Dst.fraction));
    Return (TMP[63:0]);
}
ConvertExpDPFP(SRC[63:0])
{
    Src.sign ← 0; // Zero out sign bit
    Src.exp ← SRC[62:52];
    Src.fraction ← SRC[51:0];
    // Check for NaN
    IF (SRC = NaN)
    {
        IF (SRC = SNAN) SET IE;
        Return QNAN(SRC);
    }
    // Check for +INF
    IF (SRC = +INF) Return (SRC);
    // Check for zero operand (or denormal treated as zero under DAZ)
    IF ((Src.exp = 0) AND ((Src.fraction = 0) OR (MXCSR.DAZ = 1))) Return (-INF);
    // Check for denormal operand (at this point MXCSR.DAZ = 0)
    IF ((Src.exp = 0) AND (Src.fraction != 0))
    {
        TMP[63:0] ← NormalizeExpTinyDPFP(SRC[63:0]); // Get normalized exponent
        Set #DE;
    }
    ELSE // exponent value is correct
    {
        TMP[63:0] ← (Src.sign << 63) OR (Src.exp << 52) OR (Src.fraction);
    }
    TMP ← SAR(TMP, 52); // Shift Arithmetic Right
    TMP ← TMP - 1023; // Subtract Bias
    Return CvtI2D(TMP); // Convert INT to Double-Precision FP number
}
VGETEXPPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC *is memory*)
                THEN DEST[i+63:i] ← ConvertExpDPFP(SRC[63:0])
                ELSE DEST[i+63:i] ← ConvertExpDPFP(SRC[i+63:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VGETEXPPD __m512d _mm512_getexp_pd(__m512d a);
VGETEXPPD __m512d _mm512_mask_getexp_pd(__m512d s, __mmask8 k, __m512d a);
VGETEXPPD __m512d _mm512_maskz_getexp_pd( __mmask8 k, __m512d a);
VGETEXPPD __m512d _mm512_getexp_round_pd(__m512d a, int sae);
VGETEXPPD __m512d _mm512_mask_getexp_round_pd(__m512d s, __mmask8 k, __m512d a, int sae);
VGETEXPPD __m512d _mm512_maskz_getexp_round_pd( __mmask8 k, __m512d a, int sae);
VGETEXPPD __m256d _mm256_getexp_pd(__m256d a);
VGETEXPPD __m256d _mm256_mask_getexp_pd(__m256d s, __mmask8 k, __m256d a);
VGETEXPPD __m256d _mm256_maskz_getexp_pd( __mmask8 k, __m256d a);
VGETEXPPD __m128d _mm_getexp_pd(__m128d a);
VGETEXPPD __m128d _mm_mask_getexp_pd(__m128d s, __mmask8 k, __m128d a);
VGETEXPPD __m128d _mm_maskz_getexp_pd( __mmask8 k, __m128d a);
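A minimal sketch (not part of the original manual), assuming AVX512F, showing that GETEXP(x) matches floor(log2(|x|)) on normal inputs; the input values are illustrative:

#include <immintrin.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    __m512d x = _mm512_setr_pd(1.0, 2.0, 3.0, 0.5, 10.0, 100.0, 1e-3, 48.0);
    __m512d e = _mm512_getexp_pd(x);   /* unbiased exponents as DP FP values */
    double in[8], out[8];
    _mm512_storeu_pd(in, x);
    _mm512_storeu_pd(out, e);
    for (int i = 0; i < 8; i++)        /* e.g., getexp(48) = 5, getexp(0.5) = -1 */
        printf("getexp(%g) = %g, floor(log2(|x|)) = %g\n",
               in[i], out[i], floor(log2(fabs(in[i]))));
    return 0;
}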
SIMD Floating-Point Exceptions
Invalid, Denormal
Other Exceptions
See Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VGETEXPPS—Convert Exponents of Packed SP FP Values to SP FP Values
Description
Extracts the biased exponents from the normalized SP FP representation of each dword element of the source
operand (the second operand) as unbiased signed integer values, or converts the denormal representation of input
data to unbiased negative integer values. Each integer value of the unbiased exponent is converted to a single-
precision FP value and written to the corresponding dword element of the destination operand (the first operand)
as an SP FP number.
The destination operand is a ZMM/YMM/XMM register and updated under the writemask. The source operand can
be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from
a 32-bit memory location.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Each GETEXP operation converts the exponent value into a FP number (permitting input value in denormal repre-
sentation). Special cases of input values are listed in Table 5-6.
The formula is:
GETEXP(x) = floor(log2(|x|))
Notation floor(x) stands for the greatest integer not exceeding real number x.
Software usage of the VGETEXPxx and VGETMANTxx instructions generally involves a combination of a GETEXP
operation and a GETMANT operation (see VGETMANTPD). Thus the VGETEXPxx instructions do not require software
to handle SIMD FP exceptions.
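As a sketch of that usage pattern (not part of the original manual, assuming AVX512F and <immintrin.h>), GETEXP and GETMANT together decompose x into mant * 2^exp with mant normalized to [1, 2); the enum names _MM_MANT_NORM_1_2 and _MM_MANT_SIGN_src are the compiler's spellings of the interv and sc controls described under VGETMANTPD/PS:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m512 x = _mm512_set1_ps(12.0f);
    __m512 e = _mm512_getexp_ps(x);   /* 3.0 = floor(log2(12)) */
    __m512 m = _mm512_getmant_ps(x, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src); /* 1.5 */
    float ef[16], mf[16];
    _mm512_storeu_ps(ef, e);
    _mm512_storeu_ps(mf, m);
    printf("12.0 = %g * 2^%g\n", mf[0], ef[0]);  /* 12.0 = 1.5 * 2^3 */
    return 0;
}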
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 42 /r VGETEXPPS xmm1 {k1}{z}, xmm2/m128/m32bcst | A | V/V | AVX512VL AVX512F | Convert the exponent of packed single-precision floating-point values in the source operand to SP FP results representing unbiased integer exponents and stores the results in the destination register.
EVEX.256.66.0F38.W0 42 /r VGETEXPPS ymm1 {k1}{z}, ymm2/m256/m32bcst | A | V/V | AVX512VL AVX512F | Convert the exponent of packed single-precision floating-point values in the source operand to SP FP results representing unbiased integer exponents and stores the results in the destination register.
EVEX.512.66.0F38.W0 42 /r VGETEXPPS zmm1 {k1}{z}, zmm2/m512/m32bcst{sae} | A | V/V | AVX512F | Convert the exponent of packed single-precision floating-point values in the source operand to SP FP results representing unbiased integer exponents and stores the results in the destination register.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Table 5-6. VGETEXPPS/SS Special Cases
Input Operand | Result
src1 = NaN | QNaN(src1)
0 < |src1| < INF | floor(log2(|src1|))
|src1| = +INF | +INF
|src1| = 0 | -INF
Comments: if (SRC = SNaN) then #IE; if (SRC = denormal) then #DE.
Figure 5-14 illustrates the VGETEXPPS functionality on input values with normalized representation.
[Figure 5-14. VGETEXPPS Functionality On Normal Input Values. Worked bit-level example for Src = 2^1: SAR(Src, 23) = 080h; subtracting the bias (07Fh) gives 01h; Cvt_PI2PS(01h) = 2^0 = 1.0.]
Operation
NormalizeExpTinySPFP(SRC[31:0])
{
    // Jbit is the hidden integral bit of a FP number. In case of denormal number it has the value of ZERO.
    Src.Jbit ← 0;
    Dst.exp ← 1;
    Dst.fraction ← SRC[22:0];
    WHILE (Src.Jbit = 0)
    {
        Src.Jbit ← Dst.fraction[22]; // Get the fraction MSB
        Dst.fraction ← Dst.fraction << 1; // One bit shift left
        Dst.exp--; // Decrement the exponent
    }
    Dst.fraction ← 0; // zero out fraction bits
    Dst.sign ← 1; // Return negative sign
    TMP[31:0] ← MXCSR.DAZ ? 0 : ((Dst.sign << 31) OR (Dst.exp << 23) OR (Dst.fraction));
    Return (TMP[31:0]);
}
ConvertExpSPFP(SRC[31:0])
{
    Src.sign ← 0; // Zero out sign bit
    Src.exp ← SRC[30:23];
    Src.fraction ← SRC[22:0];
    // Check for NaN
    IF (SRC = NaN)
    {
        IF (SRC = SNAN) SET IE;
        Return QNAN(SRC);
    }
    // Check for +INF
    IF (SRC = +INF) Return (SRC);
    // Check for zero operand (or denormal treated as zero under DAZ)
    IF ((Src.exp = 0) AND ((Src.fraction = 0) OR (MXCSR.DAZ = 1))) Return (-INF);
    // Check for denormal operand (at this point MXCSR.DAZ = 0)
    IF ((Src.exp = 0) AND (Src.fraction != 0))
    {
        TMP[31:0] ← NormalizeExpTinySPFP(SRC[31:0]); // Get normalized exponent
        Set #DE;
    }
    ELSE // exponent value is correct
    {
        TMP[31:0] ← (Src.sign << 31) OR (Src.exp << 23) OR (Src.fraction);
    }
    TMP ← SAR(TMP, 23); // Shift Arithmetic Right
    TMP ← TMP - 127; // Subtract Bias
    Return CvtI2D(TMP); // Convert INT to Single-Precision FP number
}
VGETEXPPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC *is memory*)
                THEN DEST[i+31:i] ← ConvertExpSPFP(SRC[31:0])
                ELSE DEST[i+31:i] ← ConvertExpSPFP(SRC[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VGETEXPPS __m512 _mm512_getexp_ps( __m512 a);
VGETEXPPS __m512 _mm512_mask_getexp_ps(__m512 s, __mmask16 k, __m512 a);
VGETEXPPS __m512 _mm512_maskz_getexp_ps( __mmask16 k, __m512 a);
VGETEXPPS __m512 _mm512_getexp_round_ps( __m512 a, int sae);
VGETEXPPS __m512 _mm512_mask_getexp_round_ps(__m512 s, __mmask16 k, __m512 a, int sae);
VGETEXPPS __m512 _mm512_maskz_getexp_round_ps( __mmask16 k, __m512 a, int sae);
VGETEXPPS __m256 _mm256_getexp_ps(__m256 a);
VGETEXPPS __m256 _mm256_mask_getexp_ps(__m256 s, __mmask8 k, __m256 a);
VGETEXPPS __m256 _mm256_maskz_getexp_ps( __mmask8 k, __m256 a);
VGETEXPPS __m128 _mm_getexp_ps(__m128 a);
VGETEXPPS __m128 _mm_mask_getexp_ps(__m128 s, __mmask8 k, __m128 a);
VGETEXPPS __m128 _mm_maskz_getexp_ps( __mmask8 k, __m128 a);
SIMD Floating-Point Exceptions
Invalid, Denormal
Other Exceptions
See Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VGETEXPSD—Convert Exponents of Scalar DP FP Values to DP FP Value
Description
Extracts the biased exponent from the normalized DP FP representation of the low qword data element of the
source operand (the third operand) as an unbiased signed integer value, or converts the denormal representation
of input data to an unbiased negative integer value. The integer value of the unbiased exponent is converted to a
double-precision FP value and written to the destination operand (the first operand) as a DP FP number. Bits
(127:64) of the XMM register destination are copied from the corresponding bits in the first source operand.
The destination must be an XMM register; the source operand can be an XMM register or a float64 memory location.
The low quadword element of the destination operand is conditionally updated with writemask k1.
Each GETEXP operation converts the exponent value into a FP number (permitting input value in denormal repre-
sentation). Special cases of input values are listed in Table 5-5.
The formula is:
GETEXP(x) = floor(log2(|x|))
Notation floor(x) stands for maximal integer not exceeding real number x.
Operation
// NormalizeExpTinyDPFP(SRC[63:0]) is defined in the Operation section of VGETEXPPD
// ConvertExpDPFP(SRC[63:0]) is defined in the Operation section of VGETEXPPD
VGETEXPSD (EVEX encoded version)
IF k1[0] OR *no writemask*
    THEN DEST[63:0] ← ConvertExpDPFP(SRC2[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F38.W1 43 /r VGETEXPSD xmm1 {k1}{z}, xmm2, xmm3/m64{sae} | A | V/V | AVX512F | Convert the biased exponent (bits 62:52) of the low double-precision floating-point value in xmm3/m64 to a DP FP value representing the unbiased integer exponent. Stores the result to the low 64 bits of xmm1 under the writemask k1 and merge with the other elements of xmm2.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
VGETEXPSD—Convert Exponents of Scalar DP FP Values to DP FP Value
INSTRUCTION SET REFERENCE, V-Z
Vol. 2C 5-269
Intel C/C++ Compiler Intrinsic Equivalent
VGETEXPSD __m128d _mm_getexp_sd( __m128d a, __m128d b);
VGETEXPSD __m128d _mm_mask_getexp_sd(__m128d s, __mmask8 k, __m128d a, __m128d b);
VGETEXPSD __m128d _mm_maskz_getexp_sd( __mmask8 k, __m128d a, __m128d b);
VGETEXPSD __m128d _mm_getexp_round_sd( __m128d a, __m128d b, int sae);
VGETEXPSD __m128d _mm_mask_getexp_round_sd(__m128d s, __mmask8 k, __m128d a, __m128d b, int sae);
VGETEXPSD __m128d _mm_maskz_getexp_round_sd( __mmask8 k, __m128d a, __m128d b, int sae);
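A scalar sketch (not part of the original manual), assuming AVX512F; the low lane of b is processed, while the upper bits of the result come from a. Values are illustrative:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m128d a = _mm_setzero_pd();
    __m128d b = _mm_set_sd(48.0);
    __m128d r = _mm_getexp_sd(a, b);      /* low lane: floor(log2(48)) = 5 */
    printf("%g\n", _mm_cvtsd_f64(r));     /* prints 5 */
    return 0;
}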
SIMD Floating-Point Exceptions
Invalid, Denormal
Other Exceptions
See Exceptions Type E3.
VGETEXPSS—Convert Exponents of Scalar SP FP Values to SP FP Value
Description
Extracts the biased exponent from the normalized SP FP representation of the low doubleword data element of the
source operand (the third operand) as an unbiased signed integer value, or converts the denormal representation
of input data to an unbiased negative integer value. The integer value of the unbiased exponent is converted to a
single-precision FP value and written to the destination operand (the first operand) as an SP FP number. Bits
(127:32) of the XMM register destination are copied from the corresponding bits in the first source operand.
The destination must be an XMM register; the source operand can be an XMM register or a float32 memory location.
The low doubleword element of the destination operand is conditionally updated with writemask k1.
Each GETEXP operation converts the exponent value into a FP number (permitting input value in denormal repre-
sentation). Special cases of input values are listed in Table 5-6.
The formula is:
GETEXP(x) = floor(log2(|x|))
Notation floor(x) stands for maximal integer not exceeding real number x.
Software usage of the VGETEXPxx and VGETMANTxx instructions generally involves a combination of a GETEXP
operation and a GETMANT operation (see VGETMANTPD). Thus the VGETEXPxx instructions do not require software
to handle SIMD FP exceptions.
Operation
// NormalizeExpTinySPFP(SRC[31:0]) is defined in the Operation section of VGETEXPPS
// ConvertExpSPFP(SRC[31:0]) is defined in the Operation section of VGETEXPPS
VGETEXPSS (EVEX encoded version)
IF k1[0] OR *no writemask*
    THEN DEST[31:0] ← ConvertExpSPFP(SRC2[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F38.W0 43 /r VGETEXPSS xmm1 {k1}{z}, xmm2, xmm3/m32{sae} | A | V/V | AVX512F | Convert the biased exponent (bits 30:23) of the low single-precision floating-point value in xmm3/m32 to an SP FP value representing the unbiased integer exponent. Stores the result to xmm1 under the writemask k1 and merge with the other elements of xmm2.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Intel C/C++ Compiler Intrinsic Equivalent
VGETEXPSS __m128 _mm_getexp_ss( __m128 a, __m128 b);
VGETEXPSS __m128 _mm_mask_getexp_ss(__m128 s, __mmask8 k, __m128 a, __m128 b);
VGETEXPSS __m128 _mm_maskz_getexp_ss( __mmask8 k, __m128 a, __m128 b);
VGETEXPSS __m128 _mm_getexp_round_ss( __m128 a, __m128 b, int sae);
VGETEXPSS __m128 _mm_mask_getexp_round_ss(__m128 s, __mmask8 k, __m128 a, __m128 b, int sae);
VGETEXPSS __m128 _mm_maskz_getexp_round_ss( __mmask8 k, __m128 a, __m128 b, int sae);
SIMD Floating-Point Exceptions
Invalid, Denormal
Other Exceptions
See Exceptions Type E3.
VGETMANTPD—Extract Float64 Vector of Normalized Mantissas from Float64 Vector
Description
Convert double-precision floating-point values in the source operand (the second operand) to DP FP values with the
mantissa normalization and sign control specified by the imm8 byte, see Figure 5-15. The converted results are
written to the destination operand (the first operand) using writemask k1. The normalized mantissa is specified by
interv (imm8[1:0]) and the sign control (sc) is specified by bits 3:2 of the immediate byte.
The destination operand is a ZMM/YMM/XMM register updated under the writemask. The source operand can be a
ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-
bit memory location.
For each input DP FP value x, the conversion operation is:
GetMant(x) = ±2^k * |x.significand|
where:
1 <= |x.significand| < 2
Unbiased exponent k depends on the interval range defined by interv and whether the exponent of the source is
even or odd. The sign of the final result is determined by sc and the source sign.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W1 26 /r ib VGETMANTPD xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8 | A | V/V | AVX512VL AVX512F | Get normalized mantissa from float64 vector xmm2/m128/m64bcst and store the result in xmm1, using imm8 for sign control and mantissa interval normalization, under writemask.
EVEX.256.66.0F3A.W1 26 /r ib VGETMANTPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | A | V/V | AVX512VL AVX512F | Get normalized mantissa from float64 vector ymm2/m256/m64bcst and store the result in ymm1, using imm8 for sign control and mantissa interval normalization, under writemask.
EVEX.512.66.0F3A.W1 26 /r ib VGETMANTPD zmm1 {k1}{z}, zmm2/m512/m64bcst{sae}, imm8 | A | V/V | AVX512F | Get normalized mantissa from float64 vector zmm2/m512/m64bcst and store the result in zmm1, using imm8 for sign control and mantissa interval normalization, under writemask.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | Imm8 | NA
Figure 5-15. Imm8 Controls for VGETMANTPD/SD/PS/SS
imm8[7:4]: must be zero
Sign Control (SC), imm8[3:2]:
  imm8[3:2] = 00b : sign(SRC)
  imm8[3:2] = 01b : 0
  imm8[3] = 1b : qNaN_Indefinite if sign(SRC) != 0, regardless of imm8[2]
Normalization Interval, imm8[1:0]:
  imm8[1:0] = 00b : interval is [1, 2)
  imm8[1:0] = 01b : interval is [1/2, 2)
  imm8[1:0] = 10b : interval is [1/2, 1)
  imm8[1:0] = 11b : interval is [3/4, 3/2)
VGETMANTPD—Extract Float64 Vector of Normalized Mantissas from Float64 Vector
INSTRUCTION SET REFERENCE, V-Z
Vol. 2C 5-273
If interv != 0 then k = -1; otherwise k = 0. The encoded value of imm8[1:0] and the sign control are shown in
Figure 5-15.
Each converted DP FP result is encoded according to the sign control, the unbiased exponent k (adding bias) and a
mantissa normalized to the range specified by interv.
The GetMant() function follows Table 5-7 when dealing with floating-point special numbers.
This instruction is writemasked, so only those elements with the corresponding bit set in vector mask register k1
are computed and stored into the destination. Elements in zmm1 with the corresponding bit clear in k1 retain their
previous values.
Note: EVEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Operation
GetNormalizedMantissaDP(SRC[63:0], SignCtrl[1:0], Interv[1:0])
{
    // Extract the SRC sign, exponent and mantissa fields
    Dst.sign ← SignCtrl[0] ? 0 : Src[63]; // Get sign bit
    Dst.exp ← SRC[62:52]; // Get original exponent value
    Dst.fraction ← SRC[51:0]; // Get original fraction value
    ZeroOperand ← (Dst.exp = 0) AND (Dst.fraction = 0);
    DenormOperand ← (Dst.exp = 0h) AND (Dst.fraction != 0);
    InfiniteOperand ← (Dst.exp = 07FFh) AND (Dst.fraction = 0);
    NaNOperand ← (Dst.exp = 07FFh) AND (Dst.fraction != 0);
    // Check for NAN operand
    IF (NaNOperand)
    {
        IF (SRC = SNaN) {Set #IE;}
        Return QNAN(SRC);
    }
    // Check for Zero and Infinite operands
    IF ((ZeroOperand) OR (InfiniteOperand))
    {
        Dst.exp ← 03FFh; // Override exponent with BIAS
        Return ((Dst.sign << 63) | (Dst.exp << 52) | (Dst.fraction));
    }
    // Check for negative operand (including -0.0)
    IF ((Src[63] = 1) AND SignCtrl[1])
    {
        Set #IE;
        Return QNaN_Indefinite;
    }
Table 5-7. GetMant() Special Float Values Behavior
Input | Result | Exceptions / Comments
NaN | QNaN(SRC) | Ignore interv; if (SRC = SNaN) then #IE
+INF | 1.0 | Ignore interv
+0 | 1.0 | Ignore interv
-0 | IF (SC[0]) THEN +1.0 ELSE -1.0 | Ignore interv
-INF | IF (SC[1]) THEN {QNaN_Indefinite} ELSE {IF (SC[0]) THEN +1.0 ELSE -1.0} | Ignore interv; if (SC[1]) then #IE
negative | SC[1] ? QNaN_Indefinite : Getmant(SRC) | If (SC[1]) then #IE
    // Check for denormal operands
    IF (DenormOperand)
    {
        IF (MXCSR.DAZ = 1) Dst.fraction ← 0; // Zero out fraction
        ELSE
        {
            // Jbit is the hidden integral bit. Zero in case of denormal operand.
            Src.Jbit ← 0; // Zero Src Jbit
            Dst.exp ← 03FFh; // Override exponent with BIAS
            WHILE (Src.Jbit = 0) { // normalize mantissa
                Src.Jbit ← Dst.fraction[51]; // Get the fraction MSB
                Dst.fraction ← (Dst.fraction << 1); // Start normalizing the mantissa
                Dst.exp--; // Adjust the exponent
            }
            SET #DE; // Set DE bit
        }
    } // At this point, Dst.fraction is normalized.
    // Determine the exponent response
    Unbiased.exp ← Dst.exp - 03FFh; // subtract the bias from exponent
    IsOddExp ← Unbiased.exp[0]; // recognize unbiased ODD exponent
    SignalingBit ← Dst.fraction[51];
    CASE (interv[1:0])
        00: Dst.exp ← 03FFh; // This is the bias
        01: Dst.exp ← (IsOddExp) ? 03FEh : 03FFh; // either bias-1, or bias
        10: Dst.exp ← 03FEh; // bias-1
        11: Dst.exp ← (SignalingBit) ? 03FEh : 03FFh; // either bias-1, or bias
    ESAC
    // At this point Dst.exp has the correct result. Form the final destination
    DEST[63:0] ← (Dst.sign << 63) OR (Dst.exp << 52) OR (Dst.fraction);
    Return (DEST);
}
VGETMANTPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
SignCtrl[1:0] ← IMM8[3:2];
Interv[1:0] ← IMM8[1:0];
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC *is memory*)
                THEN DEST[i+63:i] ← GetNormalizedMantissaDP(SRC[63:0], SignCtrl, Interv)
                ELSE DEST[i+63:i] ← GetNormalizedMantissaDP(SRC[i+63:i], SignCtrl, Interv)
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VGETMANTPD __m512d _mm512_getmant_pd( __m512d a, enum intv, enum sgn);
VGETMANTPD __m512d _mm512_mask_getmant_pd(__m512d s, __mmask8 k, __m512d a, enum intv, enum sgn);
VGETMANTPD __m512d _mm512_maskz_getmant_pd( __mmask8 k, __m512d a, enum intv, enum sgn);
VGETMANTPD __m512d _mm512_getmant_round_pd( __m512d a, enum intv, enum sgn, int r);
VGETMANTPD __m512d _mm512_mask_getmant_round_pd(__m512d s, __mmask8 k, __m512d a, enum intv, enum sgn, int r);
VGETMANTPD __m512d _mm512_maskz_getmant_round_pd( __mmask8 k, __m512d a, enum intv, enum sgn, int r);
VGETMANTPD __m256d _mm256_getmant_pd( __m256d a, enum intv, enum sgn);
VGETMANTPD __m256d _mm256_mask_getmant_pd(__m256d s, __mmask8 k, __m256d a, enum intv, enum sgn);
VGETMANTPD __m256d _mm256_maskz_getmant_pd( __mmask8 k, __m256d a, enum intv, enum sgn);
VGETMANTPD __m128d _mm_getmant_pd( __m128d a, enum intv, enum sgn);
VGETMANTPD __m128d _mm_mask_getmant_pd(__m128d s, __mmask8 k, __m128d a, enum intv, enum sgn);
VGETMANTPD __m128d _mm_maskz_getmant_pd( __mmask8 k, __m128d a, enum intv, enum sgn);
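A minimal sketch (not part of the original manual), assuming AVX512F; the enum arguments are the compiler's spellings of interv and sc. With interv = [1,2) and sc = sign(SRC), the input -48.0 (= -1.5 * 2^5) yields a mantissa of -1.5:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m512d x = _mm512_set1_pd(-48.0);
    __m512d m = _mm512_getmant_pd(x, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src);
    double out[8];
    _mm512_storeu_pd(out, m);
    printf("%g\n", out[0]);   /* prints -1.5 */
    return 0;
}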
SIMD Floating-Point Exceptions
Denormal, Invalid
Other Exceptions
See Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VGETMANTPS—Extract Float32 Vector of Normalized Mantissas from Float32 Vector
Description
Convert single-precision floating-point values in the source operand (the second operand) to SP FP values with the
mantissa normalization and sign control specified by the imm8 byte, see Figure 5-15. The converted results are
written to the destination operand (the first operand) using writemask k1. The normalized mantissa is specified by
interv (imm8[1:0]) and the sign control (sc) is specified by bits 3:2 of the immediate byte.
The destination operand is a ZMM/YMM/XMM register updated under the writemask. The source operand can be a
ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 32-
bit memory location.
For each input SP FP value x, The conversion operation is:
GetMant(x) = ±2k|x.significand|
where:
1 <= |x.significand| < 2
Unbiased exponent k depends on the interval range defined by interv and whether the exponent of the source is
even or odd. The sign of the final result is determined by sc and the source sign.
if interv != 0 then k = -1, otherwise K = 0. The encoded value of imm8[1:0] and sign control are shown
in Figure 5-15.
Each converted SP FP result is encoded according to the sign control, the unbiased exponent k (adding bias) and a
mantissa normalized to the range specified by interv.
The GetMant() function follows Table 5-7 when dealing with floating-point special numbers.
This instruction is writemasked, so only those elements with the corresponding bit set in vector mask register k1
are computed and stored into the destination. Elements in zmm1 with the corresponding bit clear in k1 retain their
previous values.
Note: EVEX.vvvv is reserved and must be 1111b, VEX.L must be 0; otherwise instructions will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W0 26 /r ib VGETMANTPS xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | A | V/V | AVX512VL AVX512F | Get normalized mantissa from float32 vector xmm2/m128/m32bcst and store the result in xmm1, using imm8 for sign control and mantissa interval normalization, under writemask.
EVEX.256.66.0F3A.W0 26 /r ib VGETMANTPS ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | A | V/V | AVX512VL AVX512F | Get normalized mantissa from float32 vector ymm2/m256/m32bcst and store the result in ymm1, using imm8 for sign control and mantissa interval normalization, under writemask.
EVEX.512.66.0F3A.W0 26 /r ib VGETMANTPS zmm1 {k1}{z}, zmm2/m512/m32bcst{sae}, imm8 | A | V/V | AVX512F | Get normalized mantissa from float32 vector zmm2/m512/m32bcst and store the result in zmm1, using imm8 for sign control and mantissa interval normalization, under writemask.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | Imm8 | NA
Operation
GetNormalizedMantissaSP(SRC[31:0], SignCtrl[1:0], Interv[1:0])
{
// Extracting the SRC sign, exponent and mantissa fields
Dst.sign ← SignCtrl[0] ? 0 : Src[31]; // Get sign bit
Dst.exp ← SRC[30:23]; // Get original exponent value
Dst.fraction ← SRC[22:0]; // Get original fraction value
ZeroOperand ← (Dst.exp = 0) AND (Dst.fraction = 0);
DenormOperand ← (Dst.exp = 0h) AND (Dst.fraction != 0);
InfiniteOperand ← (Dst.exp = 0FFh) AND (Dst.fraction = 0);
NaNOperand ← (Dst.exp = 0FFh) AND (Dst.fraction != 0);
// Check for NaN operand
IF (NaNOperand)
{ IF (SRC = SNaN) {Set #IE;}
Return QNAN(SRC);
}
// Check for Zero and Infinite operands
IF ((ZeroOperand) OR (InfiniteOperand))
{ Dst.exp ← 07Fh; // Override exponent with BIAS
Return ((Dst.sign << 31) | (Dst.exp << 23) | (Dst.fraction));
}
// Check for negative operand (including -0.0)
IF ((Src[31] = 1) AND SignCtrl[1])
{ Set #IE;
Return QNaN_Indefinite;
}
// Check for denormal operands
IF (DenormOperand)
{ IF (MXCSR.DAZ = 1) Dst.fraction ← 0; // Zero out fraction
ELSE
{ // Jbit is the hidden integral bit. Zero in case of denormal operand.
Src.Jbit ← 0; // Zero Src Jbit
Dst.exp ← 07Fh; // Override exponent with BIAS
WHILE (Src.Jbit = 0) { // Normalize mantissa
Src.Jbit ← Dst.fraction[22]; // Get the fraction MSB
Dst.fraction ← (Dst.fraction << 1); // Start normalizing the mantissa
Dst.exp--; // Adjust the exponent
}
SET #DE; // Set DE bit
}
} // At this point, Dst.fraction is normalized.
// Check for exponent response
Unbiased.exp ← Dst.exp – 07Fh; // Subtract the bias from the exponent
IsOddExp ← Unbiased.exp[0]; // Recognize an unbiased ODD exponent
SignalingBit ← Dst.fraction[22];
CASE (Interv[1:0])
00: Dst.exp ← 07Fh; // This is the bias
01: Dst.exp ← (IsOddExp) ? 07Eh : 07Fh; // Either bias-1 or bias
10: Dst.exp ← 07Eh; // bias-1
11: Dst.exp ← (SignalingBit) ? 07Eh : 07Fh; // Either bias-1 or bias
ESAC
// Form the final destination
DEST[31:0] ← (Dst.sign << 31) OR (Dst.exp << 23) OR (Dst.fraction);
Return (DEST);
}
VGETMANTPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
SignCtrl[1:0] ← IMM8[3:2];
Interv[1:0] ← IMM8[1:0];
FOR j ← 0 TO KL-1
i ← j * 32
IF k1[j] OR *no writemask*
THEN
IF (EVEX.b = 1) AND (SRC *is memory*)
THEN
DEST[i+31:i] ← GetNormalizedMantissaSP(SRC[31:0], SignCtrl, Interv)
ELSE
DEST[i+31:i] ← GetNormalizedMantissaSP(SRC[i+31:i], SignCtrl, Interv)
FI;
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VGETMANTPS __m512 _mm512_getmant_ps( __m512 a, enum intv, enum sgn);
VGETMANTPS __m512 _mm512_mask_getmant_ps(__m512 s, __mmask16 k, __m512 a, enum intv, enum sgn);
VGETMANTPS __m512 _mm512_maskz_getmant_ps(__mmask16 k, __m512 a, enum intv, enum sgn);
VGETMANTPS __m512 _mm512_getmant_round_ps( __m512 a, enum intv, enum sgn, int r);
VGETMANTPS __m512 _mm512_mask_getmant_round_ps(__m512 s, __mmask16 k, __m512 a, enum intv, enum sgn, int r);
VGETMANTPS __m512 _mm512_maskz_getmant_round_ps(__mmask16 k, __m512 a, enum intv, enum sgn, int r);
VGETMANTPS __m256 _mm256_getmant_ps( __m256 a, enum intv, enum sgn);
VGETMANTPS __m256 _mm256_mask_getmant_ps(__m256 s, __mmask8 k, __m256 a, enum intv, enum sgn);
VGETMANTPS __m256 _mm256_maskz_getmant_ps( __mmask8 k, __m256 a, enum intv, enum sgn);
VGETMANTPS __m128 _mm_getmant_ps( __m128 a, enum intv, enum sgn);
VGETMANTPS __m128 _mm_mask_getmant_ps(__m128 s, __mmask8 k, __m128 a, enum intv, enum sgn);
VGETMANTPS __m128 _mm_maskz_getmant_ps( __mmask8 k, __m128 a, enum intv, enum sgn);
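As a non-normative worked example (assuming AVX512F plus AVX512VL and the immintrin.h enum spellings _MM_MANT_NORM_* and _MM_MANT_SIGN_src), the fragment below shows how the interval encoding changes the result for x = 6.0: with interv = 00b the mantissa is 1.5 (6.0 = 1.5 * 2^2), while with interv = 10b, the [0.5, 1) interval, it is 0.75 (6.0 = 0.75 * 2^3).

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128 x = _mm_set1_ps(6.0f);
    /* interv = 00b: mantissa normalized to [1, 2) */
    __m128 m12  = _mm_getmant_ps(x, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src);
    /* interv = 10b: mantissa normalized to [0.5, 1) */
    __m128 mp51 = _mm_getmant_ps(x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src);
    printf("%g %g\n", _mm_cvtss_f32(m12), _mm_cvtss_f32(mp51)); /* 1.5 0.75 */
    return 0;
}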
SIMD Floating-Point Exceptions
Denormal, Invalid
Other Exceptions
See Exceptions Type E2.
#UD If EVEX.vvvv != 1111B.
VGETMANTSD—Extract Float64 of Normalized Mantissas from Float64 Scalar
Description
Converts the double-precision floating-point value in the low quadword element of the second source operand (the third operand) to a DP FP value with the mantissa normalization and sign control specified by the imm8 byte, see Figure 5-15. The converted result is written to the low quadword element of the destination operand (the first operand) using writemask k1. Bits (127:64) of the XMM register destination are copied from the corresponding bits in the first source operand. The normalized mantissa is specified by interv (imm8[1:0]) and the sign control (sc) is specified by bits 3:2 of the immediate byte.
The conversion operation is:
GetMant(x) = ±2^k |x.significand|
where:
1 <= |x.significand| < 2
The unbiased exponent k depends on the interval range defined by interv and whether the exponent of the source is even or odd. The sign of the final result is determined by sc and the source sign. If interv != 0 then k = -1, otherwise k = 0. The encoded values of imm8[1:0] and the sign control are shown in Figure 5-15.
The converted DP FP result is encoded according to the sign control, the unbiased exponent k (adding bias) and a mantissa normalized to the range specified by interv.
The GetMant() function follows Table 5-7 when dealing with floating-point special numbers.
This instruction is writemasked, so the result is computed and stored into the destination only when bit 0 of writemask register k1 is set; otherwise the low element of the destination retains its previous value.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F3A.W1 27 /r ib VGETMANTSD xmm1 {k1}{z}, xmm2, xmm3/m64{sae}, imm8 | A | V/V | AVX512F | Extract the normalized mantissa of the low float64 element in xmm3/m64 using imm8 for sign control and mantissa interval normalization. Store the mantissa to xmm1 under the writemask k1 and merge with the other elements of xmm2.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Operation
// GetNormalizedMantissaDP(SRC[63:0], SignCtrl[1:0], Interv[1:0]) is defined in the operation section of VGETMANTPD
VGETMANTSD (EVEX encoded version)
SignCtrl[1:0] ← IMM8[3:2];
Interv[1:0] ← IMM8[1:0];
IF k1[0] OR *no writemask*
THEN DEST[63:0] ← GetNormalizedMantissaDP(SRC2[63:0], SignCtrl, Interv)
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[63:0] remains unchanged*
ELSE ; zeroing-masking
DEST[63:0] ← 0
FI
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VGETMANTSD __m128d _mm_getmant_sd( __m128d a, __m128d b, enum intv, enum sgn);
VGETMANTSD __m128d _mm_mask_getmant_sd(__m128d s, __mmask8 k, __m128d a, __m128d b, enum intv, enum sgn);
VGETMANTSD __m128d _mm_maskz_getmant_sd( __mmask8 k, __m128d a, __m128d b, enum intv, enum sgn);
VGETMANTSD __m128d _mm_getmant_round_sd( __m128d a, __m128d b, enum intv, enum sgn, int r);
VGETMANTSD __m128d _mm_mask_getmant_round_sd(__m128d s, __mmask8 k, __m128d a, __m128d b, enum intv, enum sgn, int r);
VGETMANTSD __m128d _mm_maskz_getmant_round_sd( __mmask8 k, __m128d a, __m128d b, enum intv, enum sgn, int r);
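As a non-normative sketch of the scalar form (assuming AVX512F and the enum spellings noted above), the helper below makes the merge behavior explicit: only element 0 is computed from b, while element 1 of the result is taken from the first source operand a, mirroring DEST[127:64] ← SRC1[127:64] in the Operation section.

#include <immintrin.h>

/* Non-normative sketch: scalar getmant; upper lane comes from a. */
static inline __m128d getmant_low(__m128d a, __m128d b)
{
    return _mm_getmant_sd(a, b, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src);
}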
SIMD Floating-Point Exceptions
Denormal, Invalid
Other Exceptions
See Exceptions Type E3.
VGETMANTSS—Extract Float32 of Normalized Mantissa from Float32 Scalar
Description
Converts the single-precision floating-point value in the low doubleword element of the second source operand (the third operand) to a SP FP value with the mantissa normalization and sign control specified by the imm8 byte, see Figure 5-15. The converted result is written to the low doubleword element of the destination operand (the first operand) using writemask k1. Bits (127:32) of the XMM register destination are copied from the corresponding bits in the first source operand. The normalized mantissa is specified by interv (imm8[1:0]) and the sign control (sc) is specified by bits 3:2 of the immediate byte.
The conversion operation is:
GetMant(x) = ±2^k |x.significand|
where:
1 <= |x.significand| < 2
The unbiased exponent k depends on the interval range defined by interv and whether the exponent of the source is even or odd. The sign of the final result is determined by sc and the source sign. If interv != 0 then k = -1, otherwise k = 0. The encoded values of imm8[1:0] and the sign control are shown in Figure 5-15.
The converted SP FP result is encoded according to the sign control, the unbiased exponent k (adding bias) and a mantissa normalized to the range specified by interv.
The GetMant() function follows Table 5-7 when dealing with floating-point special numbers.
This instruction is writemasked, so the result is computed and stored into the destination only when bit 0 of writemask register k1 is set; otherwise the low element of the destination retains its previous value.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F3A.W0 27 /r ib VGETMANTSS xmm1 {k1}{z}, xmm2, xmm3/m32{sae}, imm8 | A | V/V | AVX512F | Extract the normalized mantissa from the low float32 element of xmm3/m32 using imm8 for sign control and mantissa interval normalization, store the mantissa to xmm1 under the writemask k1 and merge with the other elements of xmm2.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Operation
// GetNormalizedMantissaSP(SRC[31:0], SignCtrl[1:0], Interv[1:0]) is defined in the operation section of VGETMANTPS
VGETMANTSS (EVEX encoded version)
SignCtrl[1:0] ← IMM8[3:2];
Interv[1:0] ← IMM8[1:0];
IF k1[0] OR *no writemask*
THEN DEST[31:0] ← GetNormalizedMantissaSP(SRC2[31:0], SignCtrl, Interv)
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[31:0] remains unchanged*
ELSE ; zeroing-masking
DEST[31:0] ← 0
FI
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VGETMANTSS __m128 _mm_getmant_ss( __m128 a, __m128 b, enum intv, enum sgn);
VGETMANTSS __m128 _mm_mask_getmant_ss(__m128 s, __mmask8 k, __m128 a, __m128 b, enum intv, enum sgn);
VGETMANTSS __m128 _mm_maskz_getmant_ss( __mmask8 k, __m128 a, __m128 b, enum intv, enum sgn);
VGETMANTSS __m128 _mm_getmant_round_ss( __m128 a, __m128 b, enum intv, enum sgn, int r);
VGETMANTSS __m128 _mm_mask_getmant_round_ss(__m128 s, __mmask8 k, __m128 a, __m128 b, enum intv, enum sgn, int r);
VGETMANTSS __m128 _mm_maskz_getmant_round_ss( __mmask8 k, __m128 a, __m128 b, enum intv, enum sgn, int r);
SIMD Floating-Point Exceptions
Denormal, Invalid
Other Exceptions
See Exceptions Type E3.
VINSERTF128/VINSERTF32x4/VINSERTF64x2/VINSERTF32x8/VINSERTF64x4—Insert Packed
Floating-Point Values
Description
VINSERTF128/VINSERTF32x4 and VINSERTF64x2 insert 128 bits of packed floating-point values from the second source operand (the third operand) into the destination operand (the first operand) at a 128-bit granular offset multiplied by imm8[0] (256-bit) or imm8[1:0]. The remaining portions of the destination operand are copied from the corresponding fields of the first source operand (the second operand). The second source operand can be either an XMM register or a 128-bit memory location. The destination and first source operands are vector registers.
VINSERTF32x4: The destination operand is a ZMM/YMM register and updated at 32-bit granularity according to the writemask. The high 6/7 bits of the immediate are ignored.
VINSERTF64x2: The destination operand is a ZMM/YMM register and updated at 64-bit granularity according to the writemask. The high 6/7 bits of the immediate are ignored.
VINSERTF32x8 and VINSERTF64x4 insert 256 bits of packed floating-point values from the second source operand (the third operand) into the destination operand (the first operand) at a 256-bit granular offset multiplied by imm8[0]. The remaining portions of the destination are copied from the corresponding fields of the first source operand (the second operand). The second source operand can be either a YMM register or a 256-bit memory location. The high 7 bits of the immediate are ignored. The destination operand is a ZMM register and updated at 32/64-bit granularity according to the writemask.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.256.66.0F3A.W0 18 /r ib VINSERTF128 ymm1, ymm2, xmm3/m128, imm8 | A | V/V | AVX | Insert 128 bits of packed floating-point values from xmm3/m128 and the remaining values from ymm2 into ymm1.
EVEX.256.66.0F3A.W0 18 /r ib VINSERTF32X4 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8 | C | V/V | AVX512VL AVX512F | Insert 128 bits of packed single-precision floating-point values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.
EVEX.512.66.0F3A.W0 18 /r ib VINSERTF32X4 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8 | C | V/V | AVX512F | Insert 128 bits of packed single-precision floating-point values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.
EVEX.256.66.0F3A.W1 18 /r ib VINSERTF64X2 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8 | B | V/V | AVX512VL AVX512DQ | Insert 128 bits of packed double-precision floating-point values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.
EVEX.512.66.0F3A.W1 18 /r ib VINSERTF64X2 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8 | B | V/V | AVX512DQ | Insert 128 bits of packed double-precision floating-point values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.
EVEX.512.66.0F3A.W0 1A /r ib VINSERTF32X8 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8 | D | V/V | AVX512DQ | Insert 256 bits of packed single-precision floating-point values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1.
EVEX.512.66.0F3A.W1 1A /r ib VINSERTF64X4 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8 | C | V/V | AVX512F | Insert 256 bits of packed double-precision floating-point values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | Imm8
B | Tuple2 | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | Imm8
C | Tuple4 | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | Imm8
D | Tuple8 | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | Imm8
Operation
VINSERTF32x4 (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512)
TMP_DEST[VL-1:0] ← SRC1[VL-1:0]
IF VL = 256
CASE (imm8[0]) OF
0: TMP_DEST[127:0] ← SRC2[127:0]
1: TMP_DEST[255:128] ← SRC2[127:0]
ESAC.
FI;
IF VL = 512
CASE (imm8[1:0]) OF
00: TMP_DEST[127:0] ← SRC2[127:0]
01: TMP_DEST[255:128] ← SRC2[127:0]
10: TMP_DEST[383:256] ← SRC2[127:0]
11: TMP_DEST[511:384] ← SRC2[127:0]
ESAC.
FI;
FOR j ← 0 TO KL-1
i ← j * 32
IF k1[j] OR *no writemask*
THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VINSERTF64x2 (EVEX encoded versions)
(KL, VL) = (4, 256), (8, 512)
TMP_DEST[VL-1:0] ← SRC1[VL-1:0]
IF VL = 256
CASE (imm8[0]) OF
0: TMP_DEST[127:0] ← SRC2[127:0]
1: TMP_DEST[255:128] ← SRC2[127:0]
ESAC.
FI;
IF VL = 512
CASE (imm8[1:0]) OF
00: TMP_DEST[127:0] ← SRC2[127:0]
01: TMP_DEST[255:128] ← SRC2[127:0]
10: TMP_DEST[383:256] ← SRC2[127:0]
11: TMP_DEST[511:384] ← SRC2[127:0]
ESAC.
FI;
FOR j ← 0 TO KL-1
i ← j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VINSERTF32x8 (EVEX.U1.512 encoded version)
TMP_DEST[VL-1:0] ← SRC1[VL-1:0]
CASE (imm8[0]) OF
0: TMP_DEST[255:0] ← SRC2[255:0]
1: TMP_DEST[511:256] ← SRC2[255:0]
ESAC.
FOR j ← 0 TO 15
i ← j * 32
IF k1[j] OR *no writemask*
THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VINSERTF64x4 (EVEX.512 encoded version)
VL = 512
TMP_DEST[VL-1:0] ← SRC1[VL-1:0]
CASE (imm8[0]) OF
0: TMP_DEST[255:0] ← SRC2[255:0]
1: TMP_DEST[511:256] ← SRC2[255:0]
ESAC.
FOR j ← 0 TO 7
i ← j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VINSERTF128 (VEX encoded version)
TEMP[255:0] ← SRC1[255:0]
CASE (imm8[0]) OF
0: TEMP[127:0] ← SRC2[127:0]
1: TEMP[255:128] ← SRC2[127:0]
ESAC
DEST ← TEMP
Intel C/C++ Compiler Intrinsic Equivalent
VINSERTF32x4 __m512 _mm512_insertf32x4( __m512 a, __m128 b, int imm);
VINSERTF32x4 __m512 _mm512_mask_insertf32x4(__m512 s, __mmask16 k, __m512 a, __m128 b, int imm);
VINSERTF32x4 __m512 _mm512_maskz_insertf32x4( __mmask16 k, __m512 a, __m128 b, int imm);
VINSERTF32x4 __m256 _mm256_insertf32x4( __m256 a, __m128 b, int imm);
VINSERTF32x4 __m256 _mm256_mask_insertf32x4(__m256 s, __mmask8 k, __m256 a, __m128 b, int imm);
VINSERTF32x4 __m256 _mm256_maskz_insertf32x4( __mmask8 k, __m256 a, __m128 b, int imm);
VINSERTF32x8 __m512 _mm512_insertf32x8( __m512 a, __m256 b, int imm);
VINSERTF32x8 __m512 _mm512_mask_insertf32x8(__m512 s, __mmask16 k, __m512 a, __m256 b, int imm);
VINSERTF32x8 __m512 _mm512_maskz_insertf32x8( __mmask16 k, __m512 a, __m256 b, int imm);
VINSERTF64x2 __m512d _mm512_insertf64x2( __m512d a, __m128d b, int imm);
VINSERTF64x2 __m512d _mm512_mask_insertf64x2(__m512d s, __mmask8 k, __m512d a, __m128d b, int imm);
VINSERTF64x2 __m512d _mm512_maskz_insertf64x2( __mmask8 k, __m512d a, __m128d b, int imm);
VINSERTF64x2 __m256d _mm256_insertf64x2( __m256d a, __m128d b, int imm);
VINSERTF64x2 __m256d _mm256_mask_insertf64x2(__m256d s, __mmask8 k, __m256d a, __m128d b, int imm);
VINSERTF64x2 __m256d _mm256_maskz_insertf64x2( __mmask8 k, __m256d a, __m128d b, int imm);
VINSERTF64x4 __m512d _mm512_insertf64x4( __m512d a, __m256d b, int imm);
VINSERTF64x4 __m512d _mm512_mask_insertf64x4(__m512d s, __mmask8 k, __m512d a, __m256d b, int imm);
VINSERTF64x4 __m512d _mm512_maskz_insertf64x4( __mmask8 k, __m512d a, __m256d b, int imm);
VINSERTF128 __m256 _mm256_insertf128_ps (__m256 a, __m128 b, int offset);
VINSERTF128 __m256d _mm256_insertf128_pd (__m256d a, __m128d b, int offset);
VINSERTF128 __m256i _mm256_insertf128_si256 (__m256i a, __m128i b, int offset);
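As a non-normative sketch of the insert intrinsics above (assuming AVX512F support), the helper below assembles one __m512 from four 128-bit pieces; because the lane index is the imm8 immediate, it must be a compile-time constant.

#include <immintrin.h>

/* Non-normative sketch: build a __m512 from four 128-bit lanes. */
static inline __m512 gather_lanes(__m128 a, __m128 b, __m128 c, __m128 d)
{
    __m512 v = _mm512_castps128_ps512(a);   /* lane 0; upper lanes undefined */
    v = _mm512_insertf32x4(v, b, 1);        /* VINSERTF32X4, imm8 = 1 */
    v = _mm512_insertf32x4(v, c, 2);        /* imm8 = 2 */
    v = _mm512_insertf32x4(v, d, 3);        /* imm8 = 3 */
    return v;
}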
SIMD Floating-Point Exceptions
None
Other Exceptions
VEX-encoded instruction, see Exceptions Type 6; additionally
#UD If VEX.L = 0.
EVEX-encoded instruction, see Exceptions Type E6NF.
VINSERTI128/VINSERTI32x4/VINSERTI64x2/VINSERTI32x8/VINSERTI64x4—Insert Packed
Integer Values
Description
VINSERTI32x4 and VINSERTI64x2 insert 128 bits of packed integer values from the second source operand (the third operand) into the destination operand (the first operand) at a 128-bit granular offset multiplied by imm8[0] (256-bit) or imm8[1:0]. The remaining portions of the destination are copied from the corresponding fields of the first source operand (the second operand). The second source operand can be either an XMM register or a 128-bit memory location. The high 6/7 bits of the immediate are ignored. The destination operand is a ZMM/YMM register and updated at 32- and 64-bit granularity according to the writemask.
VINSERTI32x8 and VINSERTI64x4 insert 256 bits of packed integer values from the second source operand (the third operand) into the destination operand (the first operand) at a 256-bit granular offset multiplied by imm8[0]. The remaining portions of the destination are copied from the corresponding fields of the first source operand (the second operand). The second source operand can be either a YMM register or a 256-bit memory location. The upper bits of the immediate are ignored. The destination operand is a ZMM register and updated at 32- and 64-bit granularity according to the writemask.
VINSERTI128 inserts 128 bits of packed integer data from the second source operand (the third operand) into the destination operand (the first operand) at a 128-bit granular offset multiplied by imm8[0]. The remaining portions of the destination are copied from the corresponding fields of the first source operand (the second operand). The second source operand can be either an XMM register or a 128-bit memory location. The high 7 bits of the immediate are ignored. VEX.L must be 1; otherwise, an attempt to execute this instruction with VEX.L=0 will cause #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.256.66.0F3A.W0 38 /r ib VINSERTI128 ymm1, ymm2, xmm3/m128, imm8 | A | V/V | AVX2 | Insert 128 bits of integer data from xmm3/m128 and the remaining values from ymm2 into ymm1.
EVEX.256.66.0F3A.W0 38 /r ib VINSERTI32X4 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8 | C | V/V | AVX512VL AVX512F | Insert 128 bits of packed doubleword integer values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.
EVEX.512.66.0F3A.W0 38 /r ib VINSERTI32X4 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8 | C | V/V | AVX512F | Insert 128 bits of packed doubleword integer values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.
EVEX.256.66.0F3A.W1 38 /r ib VINSERTI64X2 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8 | B | V/V | AVX512VL AVX512DQ | Insert 128 bits of packed quadword integer values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1.
EVEX.512.66.0F3A.W1 38 /r ib VINSERTI64X2 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8 | B | V/V | AVX512DQ | Insert 128 bits of packed quadword integer values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1.
EVEX.512.66.0F3A.W0 3A /r ib VINSERTI32X8 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8 | D | V/V | AVX512DQ | Insert 256 bits of packed doubleword integer values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1.
EVEX.512.66.0F3A.W1 3A /r ib VINSERTI64X4 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8 | C | V/V | AVX512F | Insert 256 bits of packed quadword integer values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | Imm8
B | Tuple2 | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | Imm8
C | Tuple4 | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | Imm8
D | Tuple8 | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | Imm8
Operation
VINSERTI32x4 (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512)
TMP_DEST[VL-1:0] ← SRC1[VL-1:0]
IF VL = 256
CASE (imm8[0]) OF
0: TMP_DEST[127:0] ← SRC2[127:0]
1: TMP_DEST[255:128] ← SRC2[127:0]
ESAC.
FI;
IF VL = 512
CASE (imm8[1:0]) OF
00: TMP_DEST[127:0] ← SRC2[127:0]
01: TMP_DEST[255:128] ← SRC2[127:0]
10: TMP_DEST[383:256] ← SRC2[127:0]
11: TMP_DEST[511:384] ← SRC2[127:0]
ESAC.
FI;
FOR j ← 0 TO KL-1
i ← j * 32
IF k1[j] OR *no writemask*
THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VINSERTI64x2 (EVEX encoded versions)
(KL, VL) = (4, 256), (8, 512)
TMP_DEST[VL-1:0] ← SRC1[VL-1:0]
IF VL = 256
CASE (imm8[0]) OF
0: TMP_DEST[127:0] ← SRC2[127:0]
1: TMP_DEST[255:128] ← SRC2[127:0]
ESAC.
FI;
IF VL = 512
CASE (imm8[1:0]) OF
00: TMP_DEST[127:0] ← SRC2[127:0]
01: TMP_DEST[255:128] ← SRC2[127:0]
10: TMP_DEST[383:256] ← SRC2[127:0]
11: TMP_DEST[511:384] ← SRC2[127:0]
ESAC.
FI;
FOR j ← 0 TO KL-1
i ← j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VINSERTI32x8 (EVEX.U1.512 encoded version)
TMP_DEST[VL-1:0] ← SRC1[VL-1:0]
CASE (imm8[0]) OF
0: TMP_DEST[255:0] ← SRC2[255:0]
1: TMP_DEST[511:256] ← SRC2[255:0]
ESAC.
FOR j ← 0 TO 15
i ← j * 32
IF k1[j] OR *no writemask*
THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VINSERTI64x4 (EVEX.512 encoded version)
VL = 512
TMP_DEST[VL-1:0] ← SRC1[VL-1:0]
CASE (imm8[0]) OF
0: TMP_DEST[255:0] ← SRC2[255:0]
1: TMP_DEST[511:256] ← SRC2[255:0]
ESAC.
FOR j ← 0 TO 7
i ← j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VINSERTI128 (VEX encoded version)
TEMP[255:0] ← SRC1[255:0]
CASE (imm8[0]) OF
0: TEMP[127:0] ← SRC2[127:0]
1: TEMP[255:128] ← SRC2[127:0]
ESAC
DEST ← TEMP
Intel C/C++ Compiler Intrinsic Equivalent
VINSERTI32x4 __m512i _mm512_inserti32x4( __m512i a, __m128i b, int imm);
VINSERTI32x4 __m512i _mm512_mask_inserti32x4(__m512i s, __mmask16 k, __m512i a, __m128i b, int imm);
VINSERTI32x4 __m512i _mm512_maskz_inserti32x4( __mmask16 k, __m512i a, __m128i b, int imm);
VINSERTI32x4 __m256i _mm256_inserti32x4( __m256i a, __m128i b, int imm);
VINSERTI32x4 __m256i _mm256_mask_inserti32x4(__m256i s, __mmask8 k, __m256i a, __m128i b, int imm);
VINSERTI32x4 __m256i _mm256_maskz_inserti32x4( __mmask8 k, __m256i a, __m128i b, int imm);
VINSERTI32x8 __m512i _mm512_inserti32x8( __m512i a, __m256i b, int imm);
VINSERTI32x8 __m512i _mm512_mask_inserti32x8(__m512i s, __mmask16 k, __m512i a, __m256i b, int imm);
VINSERTI32x8 __m512i _mm512_maskz_inserti32x8( __mmask16 k, __m512i a, __m256i b, int imm);
VINSERTI64x2 __m512i _mm512_inserti64x2( __m512i a, __m128i b, int imm);
VINSERTI64x2 __m512i _mm512_mask_inserti64x2(__m512i s, __mmask8 k, __m512i a, __m128i b, int imm);
VINSERTI64x2 __m512i _mm512_maskz_inserti64x2( __mmask8 k, __m512i a, __m128i b, int imm);
VINSERTI64x2 __m256i _mm256_inserti64x2( __m256i a, __m128i b, int imm);
VINSERTI64x2 __m256i _mm256_mask_inserti64x2(__m256i s, __mmask8 k, __m256i a, __m128i b, int imm);
VINSERTI64x2 __m256i _mm256_maskz_inserti64x2( __mmask8 k, __m256i a, __m128i b, int imm);
VINSERTI64x4 __m512i _mm512_inserti64x4( __m512i a, __m256i b, int imm);
VINSERTI64x4 __m512i _mm512_mask_inserti64x4(__m512i s, __mmask8 k, __m512i a, __m256i b, int imm);
VINSERTI64x4 __m512i _mm512_maskz_inserti64x4( __mmask8 k, __m512i a, __m256i b, int imm);
VINSERTI128 __m256i _mm256_inserti128_si256 (__m256i a, __m128i b, int offset);
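As a non-normative sketch (assuming an AVX2-capable compiler), the helper below uses VINSERTI128 through its intrinsic to join two 128-bit integer vectors into one 256-bit vector; imm8[0] = 1 selects the upper lane.

#include <immintrin.h>

/* Non-normative sketch: join two 128-bit halves into a __m256i. */
static inline __m256i join128(__m128i lo, __m128i hi)
{
    __m256i v = _mm256_castsi128_si256(lo);    /* low lane; upper lane undefined */
    return _mm256_inserti128_si256(v, hi, 1);  /* VINSERTI128, imm8[0] = 1 */
}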
SIMD Floating-Point Exceptions
None
Other Exceptions
VEX-encoded instruction, see Exceptions Type 6; additionally
#UD If VEX.L = 0.
EVEX-encoded instruction, see Exceptions Type E6NF.
VMASKMOV—Conditional SIMD Packed Loads and Stores
Description
Conditionally moves packed data elements from the second source operand into the corresponding data element of the destination operand, depending on the mask bits associated with each data element. The mask bits are specified in the first source operand.
The mask bit for each data element is the most significant bit of that element in the first source operand. If a mask bit is 1, the corresponding data element is copied from the second source operand to the destination operand. If the mask bit is 0, the corresponding data element is set to zero in the load form of these instructions, and left unmodified in the store form.
The second source operand is a memory address for the load form of these instructions. The destination operand is a memory address for the store form of these instructions. The other operands are both XMM registers (for the VEX.128 version) or YMM registers (for the VEX.256 version).
Faults occur only for the memory accesses that the mask bits actually require; referencing a memory location whose corresponding mask bit is 0 will not fault. For example, no faults will be detected if the mask bits are all zero.
Unlike the previous MASKMOV instructions (MASKMOVQ and MASKMOVDQU), a nontemporal hint is not applied to these instructions.
Instruction behavior on alignment check reporting with mask bits of less than all 1s is the same as with mask bits of all 1s.
VMASKMOV should not be used to access memory-mapped I/O or uncached memory, as the access and the ordering of the individual loads or stores it performs are implementation-specific.
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 2C /r VMASKMOVPS xmm1, xmm2, m128 | RVM | V/V | AVX | Conditionally load packed single-precision values from m128 using mask in xmm2 and store in xmm1.
VEX.256.66.0F38.W0 2C /r VMASKMOVPS ymm1, ymm2, m256 | RVM | V/V | AVX | Conditionally load packed single-precision values from m256 using mask in ymm2 and store in ymm1.
VEX.128.66.0F38.W0 2D /r VMASKMOVPD xmm1, xmm2, m128 | RVM | V/V | AVX | Conditionally load packed double-precision values from m128 using mask in xmm2 and store in xmm1.
VEX.256.66.0F38.W0 2D /r VMASKMOVPD ymm1, ymm2, m256 | RVM | V/V | AVX | Conditionally load packed double-precision values from m256 using mask in ymm2 and store in ymm1.
VEX.128.66.0F38.W0 2E /r VMASKMOVPS m128, xmm1, xmm2 | MVR | V/V | AVX | Conditionally store packed single-precision values from xmm2 using mask in xmm1.
VEX.256.66.0F38.W0 2E /r VMASKMOVPS m256, ymm1, ymm2 | MVR | V/V | AVX | Conditionally store packed single-precision values from ymm2 using mask in ymm1.
VEX.128.66.0F38.W0 2F /r VMASKMOVPD m128, xmm1, xmm2 | MVR | V/V | AVX | Conditionally store packed double-precision values from xmm2 using mask in xmm1.
VEX.256.66.0F38.W0 2F /r VMASKMOVPD m256, ymm1, ymm2 | MVR | V/V | AVX | Conditionally store packed double-precision values from ymm2 using mask in ymm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
MVR | ModRM:r/m (w) | VEX.vvvv (r) | ModRM:reg (r) | NA
In cases where the mask bits indicate that data should not be loaded or stored, paging A and D bits will be set in an implementation-dependent way. However, A and D bits are always set for pages where data is actually loaded/stored.
Note: for load forms, the first source (the mask) is encoded in VEX.vvvv; the second source is encoded in rm_field, and the destination register is encoded in reg_field.
Note: for store forms, the first source (the mask) is encoded in VEX.vvvv; the second source register is encoded in reg_field, and the destination memory location is encoded in rm_field.
Operation
VMASKMOVPS - 128-bit load
DEST[31:0] ← IF (SRC1[31]) Load_32(mem) ELSE 0
DEST[63:32] ← IF (SRC1[63]) Load_32(mem + 4) ELSE 0
DEST[95:64] ← IF (SRC1[95]) Load_32(mem + 8) ELSE 0
DEST[127:96] ← IF (SRC1[127]) Load_32(mem + 12) ELSE 0
DEST[MAXVL-1:128] ← 0
VMASKMOVPS - 256-bit load
DEST[31:0] ← IF (SRC1[31]) Load_32(mem) ELSE 0
DEST[63:32] ← IF (SRC1[63]) Load_32(mem + 4) ELSE 0
DEST[95:64] ← IF (SRC1[95]) Load_32(mem + 8) ELSE 0
DEST[127:96] ← IF (SRC1[127]) Load_32(mem + 12) ELSE 0
DEST[159:128] ← IF (SRC1[159]) Load_32(mem + 16) ELSE 0
DEST[191:160] ← IF (SRC1[191]) Load_32(mem + 20) ELSE 0
DEST[223:192] ← IF (SRC1[223]) Load_32(mem + 24) ELSE 0
DEST[255:224] ← IF (SRC1[255]) Load_32(mem + 28) ELSE 0
DEST[MAXVL-1:256] ← 0
VMASKMOVPD - 128-bit load
DEST[63:0] ← IF (SRC1[63]) Load_64(mem) ELSE 0
DEST[127:64] ← IF (SRC1[127]) Load_64(mem + 8) ELSE 0
DEST[MAXVL-1:128] ← 0
VMASKMOVPD - 256-bit load
DEST[63:0] ← IF (SRC1[63]) Load_64(mem) ELSE 0
DEST[127:64] ← IF (SRC1[127]) Load_64(mem + 8) ELSE 0
DEST[191:128] ← IF (SRC1[191]) Load_64(mem + 16) ELSE 0
DEST[255:192] ← IF (SRC1[255]) Load_64(mem + 24) ELSE 0
DEST[MAXVL-1:256] ← 0
VMASKMOVPS - 128-bit store
IF (SRC1[31]) DEST[31:0] ← SRC2[31:0]
IF (SRC1[63]) DEST[63:32] ← SRC2[63:32]
IF (SRC1[95]) DEST[95:64] ← SRC2[95:64]
IF (SRC1[127]) DEST[127:96] ← SRC2[127:96]
VMASKMOVPS - 256-bit store
IF (SRC1[31]) DEST[31:0] ← SRC2[31:0]
IF (SRC1[63]) DEST[63:32] ← SRC2[63:32]
IF (SRC1[95]) DEST[95:64] ← SRC2[95:64]
IF (SRC1[127]) DEST[127:96] ← SRC2[127:96]
IF (SRC1[159]) DEST[159:128] ← SRC2[159:128]
IF (SRC1[191]) DEST[191:160] ← SRC2[191:160]
IF (SRC1[223]) DEST[223:192] ← SRC2[223:192]
IF (SRC1[255]) DEST[255:224] ← SRC2[255:224]
VMASKMOVPD - 128-bit store
IF (SRC1[63]) DEST[63:0] ← SRC2[63:0]
IF (SRC1[127]) DEST[127:64] ← SRC2[127:64]
VMASKMOVPD - 256-bit store
IF (SRC1[63]) DEST[63:0] ← SRC2[63:0]
IF (SRC1[127]) DEST[127:64] ← SRC2[127:64]
IF (SRC1[191]) DEST[191:128] ← SRC2[191:128]
IF (SRC1[255]) DEST[255:192] ← SRC2[255:192]
Intel C/C++ Compiler Intrinsic Equivalent
__m256 _mm256_maskload_ps(float const *a, __m256i mask);
void _mm256_maskstore_ps(float *a, __m256i mask, __m256 b);
__m256d _mm256_maskload_pd(double const *a, __m256i mask);
void _mm256_maskstore_pd(double *a, __m256i mask, __m256d b);
__m128 _mm_maskload_ps(float const *a, __m128i mask);
void _mm_maskstore_ps(float *a, __m128i mask, __m128 b);
__m128d _mm_maskload_pd(double const *a, __m128i mask);
void _mm_maskstore_pd(double *a, __m128i mask, __m128d b);
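As a non-normative sketch of the masked-tail idiom these instructions enable (assuming an AVX2-capable compiler for the integer compare), the fragment below scales the final n (< 8) floats of a buffer without reading or writing past the end; only elements whose index is below n have the most significant bit of their mask element set.

#include <immintrin.h>

/* Non-normative sketch: process the last n (0 <= n < 8) floats of a
   buffer with one masked load/store pair, never touching memory at or
   beyond index n. */
static void scale_tail(float *p, int n, float s)
{
    const __m256i idx = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    /* mask element i = all-ones (MSB set) iff i < n */
    __m256i mask = _mm256_cmpgt_epi32(_mm256_set1_epi32(n), idx);
    __m256 v = _mm256_maskload_ps(p, mask);        /* VMASKMOVPS load  */
    v = _mm256_mul_ps(v, _mm256_set1_ps(s));
    _mm256_maskstore_ps(p, mask, v);               /* VMASKMOVPS store */
}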
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 6 (No AC# reported for any mask bit combinations);
additionally
#UD If VEX.W = 1.
VPBLENDD — Blend Packed Dwords
Description
Dword elements from the source operand (second operand) are conditionally written to the destination operand (first operand) depending on bits in the immediate operand (third operand). The immediate bits (bits 7:0) form a mask that determines whether the corresponding dword in the destination is copied from the source. If a bit in the mask, corresponding to a dword, is “1", then the dword is copied, else the dword is unchanged.
VEX.128 encoded version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (MAXVL-1:128) of the corresponding YMM register are zeroed.
VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.
Operation
VPBLENDD (VEX.256 encoded version)
IF (imm8[0] == 1) THEN DEST[31:0] ← SRC2[31:0]
ELSE DEST[31:0] ← SRC1[31:0]
IF (imm8[1] == 1) THEN DEST[63:32] ← SRC2[63:32]
ELSE DEST[63:32] ← SRC1[63:32]
IF (imm8[2] == 1) THEN DEST[95:64] ← SRC2[95:64]
ELSE DEST[95:64] ← SRC1[95:64]
IF (imm8[3] == 1) THEN DEST[127:96] ← SRC2[127:96]
ELSE DEST[127:96] ← SRC1[127:96]
IF (imm8[4] == 1) THEN DEST[159:128] ← SRC2[159:128]
ELSE DEST[159:128] ← SRC1[159:128]
IF (imm8[5] == 1) THEN DEST[191:160] ← SRC2[191:160]
ELSE DEST[191:160] ← SRC1[191:160]
IF (imm8[6] == 1) THEN DEST[223:192] ← SRC2[223:192]
ELSE DEST[223:192] ← SRC1[223:192]
IF (imm8[7] == 1) THEN DEST[255:224] ← SRC2[255:224]
ELSE DEST[255:224] ← SRC1[255:224]
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.128.66.0F3A.W0 02 /r ib VPBLENDD xmm1, xmm2, xmm3/m128, imm8 | RVMI | V/V | AVX2 | Select dwords from xmm2 and xmm3/m128 from mask specified in imm8 and store the values into xmm1.
VEX.256.66.0F3A.W0 02 /r ib VPBLENDD ymm1, ymm2, ymm3/m256, imm8 | RVMI | V/V | AVX2 | Select dwords from ymm2 and ymm3/m256 from mask specified in imm8 and store the values into ymm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RVMI | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | Imm8
VPBLENDD (VEX.128 encoded version)
IF (imm8[0] == 1) THEN DEST[31:0] ← SRC2[31:0]
ELSE DEST[31:0] ← SRC1[31:0]
IF (imm8[1] == 1) THEN DEST[63:32] ← SRC2[63:32]
ELSE DEST[63:32] ← SRC1[63:32]
IF (imm8[2] == 1) THEN DEST[95:64] ← SRC2[95:64]
ELSE DEST[95:64] ← SRC1[95:64]
IF (imm8[3] == 1) THEN DEST[127:96] ← SRC2[127:96]
ELSE DEST[127:96] ← SRC1[127:96]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPBLENDD: __m128i _mm_blend_epi32 (__m128i v1, __m128i v2, const int mask)
VPBLENDD: __m256i _mm256_blend_epi32 (__m256i v1, __m256i v2, const int mask)
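As a non-normative sketch of the intrinsic form (assuming an AVX2-capable compiler), the helper below shows the imm8 mask as a per-dword selector: bit i = 1 takes element i from the second operand, so 0xAA (10101010b) keeps the even elements of v1 and the odd elements of v2. The immediate must be a compile-time constant.

#include <immintrin.h>

/* Non-normative sketch: per-dword blend with an immediate selector. */
static inline __m256i interleave_even_odd(__m256i v1, __m256i v2)
{
    return _mm256_blend_epi32(v1, v2, 0xAA);  /* VPBLENDD ymm, ymm, ymm, 0xAA */
}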
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4; additionally
#UD If VEX.W = 1.
VPBLENDMB/VPBLENDMW—Blend Byte/Word Vectors Using an Opmask Control
Description
Performs an element-by-element blending of byte/word elements between the first source operand byte vector register and the second source operand byte vector from memory or register, using the instruction mask as selector. The result is written into the destination byte vector register.
The destination and first source operands are ZMM/YMM/XMM registers. The second source operand can be a ZMM/YMM/XMM register or a 512/256/128-bit memory location.
The mask is not used as a writemask for this instruction. Instead, the mask is used as an element selector: every element of the destination is conditionally selected between the first source or the second source using the value of the related mask bit (0 for the first source, 1 for the second source).
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 66 /r VPBLENDMB xmm1 {k1}{z}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512BW | Blend byte integer vector xmm2 and byte vector xmm3/m128 and store the result in xmm1, under control mask.
EVEX.256.66.0F38.W0 66 /r VPBLENDMB ymm1 {k1}{z}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512BW | Blend byte integer vector ymm2 and byte vector ymm3/m256 and store the result in ymm1, under control mask.
EVEX.512.66.0F38.W0 66 /r VPBLENDMB zmm1 {k1}{z}, zmm2, zmm3/m512 | A | V/V | AVX512BW | Blend byte integer vector zmm2 and byte vector zmm3/m512 and store the result in zmm1, under control mask.
EVEX.128.66.0F38.W1 66 /r VPBLENDMW xmm1 {k1}{z}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512BW | Blend word integer vector xmm2 and word vector xmm3/m128 and store the result in xmm1, under control mask.
EVEX.256.66.0F38.W1 66 /r VPBLENDMW ymm1 {k1}{z}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512BW | Blend word integer vector ymm2 and word vector ymm3/m256 and store the result in ymm1, under control mask.
EVEX.512.66.0F38.W1 66 /r VPBLENDMW zmm1 {k1}{z}, zmm2, zmm3/m512 | A | V/V | AVX512BW | Blend word integer vector zmm2 and word vector zmm3/m512 and store the result in zmm1, under control mask.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full Mem | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA
Operation
VPBLENDMB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
FOR j ← 0 TO KL-1
i ← j * 8
IF k1[j] OR *no writemask*
THEN DEST[i+7:i] ← SRC2[i+7:i]
ELSE
IF *merging-masking* ; merging-masking
THEN DEST[i+7:i] ← SRC1[i+7:i]
ELSE ; zeroing-masking
DEST[i+7:i] ← 0
FI;
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0;
VPBLENDMW (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
i ← j * 16
IF k1[j] OR *no writemask*
THEN DEST[i+15:i] ← SRC2[i+15:i]
ELSE
IF *merging-masking* ; merging-masking
THEN DEST[i+15:i] ← SRC1[i+15:i]
ELSE ; zeroing-masking
DEST[i+15:i] ← 0
FI;
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPBLENDMB __m512i _mm512_mask_blend_epi8(__mmask64 m, __m512i a, __m512i b);
VPBLENDMB __m256i _mm256_mask_blend_epi8(__mmask32 m, __m256i a, __m256i b);
VPBLENDMB __m128i _mm_mask_blend_epi8(__mmask16 m, __m128i a, __m128i b);
VPBLENDMW __m512i _mm512_mask_blend_epi16(__mmask32 m, __m512i a, __m512i b);
VPBLENDMW __m256i _mm256_mask_blend_epi16(__mmask16 m, __m256i a, __m256i b);
VPBLENDMW __m128i _mm_mask_blend_epi16(__mmask8 m, __m128i a, __m128i b);
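As a non-normative sketch (assuming an AVX512BW-capable compiler), the helper below shows the opmask-as-selector pattern at byte granularity: a compare builds the __mmask64, and VPBLENDMB then picks, per byte, either the original value (mask bit 0) or the clamp limit (mask bit 1).

#include <immintrin.h>

/* Non-normative sketch: clamp bytes of v that exceed limit. */
static inline __m512i clamp_epi8(__m512i v, char limit)
{
    __m512i lim  = _mm512_set1_epi8(limit);
    __mmask64 gt = _mm512_cmpgt_epi8_mask(v, lim);  /* 1 where v > limit */
    return _mm512_mask_blend_epi8(gt, v, lim);      /* 0 -> v, 1 -> lim  */
}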
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
VPBLENDMD/VPBLENDMQ—Blend Int32/Int64 Vectors Using an OpMask Control
Description
Performs an element-by-element blending of dword/qword elements between the first source operand (the second operand) and the elements of the second source operand (the third operand), using an opmask register as select control. The blended result is written into the destination.
The destination and first source operands are ZMM/YMM/XMM registers. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 32-bit memory location (VPBLENDMD) or a 64-bit memory location (VPBLENDMQ).
The opmask register is not used as a writemask for this instruction. Instead, the mask is used as an element selector: every element of the destination is conditionally selected between the first source or the second source using the value of the related mask bit (0 for the first source operand, 1 for the second source operand).
If EVEX.z is set, the elements with a corresponding mask bit value of 0 in the destination operand are zeroed.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 64 /r VPBLENDMD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | A | V/V | AVX512VL AVX512F | Blend doubleword integer vector xmm2 and doubleword vector xmm3/m128/m32bcst and store the result in xmm1, under control mask.
EVEX.256.66.0F38.W0 64 /r VPBLENDMD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | A | V/V | AVX512VL AVX512F | Blend doubleword integer vector ymm2 and doubleword vector ymm3/m256/m32bcst and store the result in ymm1, under control mask.
EVEX.512.66.0F38.W0 64 /r VPBLENDMD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | A | V/V | AVX512F | Blend doubleword integer vector zmm2 and doubleword vector zmm3/m512/m32bcst and store the result in zmm1, under control mask.
EVEX.128.66.0F38.W1 64 /r VPBLENDMQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | A | V/V | AVX512VL AVX512F | Blend quadword integer vector xmm2 and quadword vector xmm3/m128/m64bcst and store the result in xmm1, under control mask.
EVEX.256.66.0F38.W1 64 /r VPBLENDMQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | A | V/V | AVX512VL AVX512F | Blend quadword integer vector ymm2 and quadword vector ymm3/m256/m64bcst and store the result in ymm1, under control mask.
EVEX.512.66.0F38.W1 64 /r VPBLENDMQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | A | V/V | AVX512F | Blend quadword integer vector zmm2 and quadword vector zmm3/m512/m64bcst and store the result in zmm1, under control mask.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA
Operation
VPBLENDMD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
i ← j * 32
IF k1[j] OR *no controlmask*
THEN
IF (EVEX.b = 1) AND (SRC2 *is memory*)
THEN
DEST[i+31:i] ← SRC2[31:0]
ELSE
DEST[i+31:i] ← SRC2[i+31:i]
FI;
ELSE
IF *merging-masking* ; merging-masking
THEN DEST[i+31:i] ← SRC1[i+31:i]
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI;
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0;
VPBLENDMQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
i ← j * 64
IF k1[j] OR *no controlmask*
THEN
IF (EVEX.b = 1) AND (SRC2 *is memory*)
THEN
DEST[i+63:i] ← SRC2[63:0]
ELSE
DEST[i+63:i] ← SRC2[i+63:i]
FI;
ELSE
IF *merging-masking* ; merging-masking
THEN DEST[i+63:i] ← SRC1[i+63:i]
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI;
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPBLENDMD __m512i _mm512_mask_blend_epi32(__mmask16 k, __m512i a, __m512i b);
VPBLENDMD __m256i _mm256_mask_blend_epi32(__mmask8 m, __m256i a, __m256i b);
VPBLENDMD __m128i _mm_mask_blend_epi32(__mmask8 m, __m128i a, __m128i b);
VPBLENDMQ __m512i _mm512_mask_blend_epi64(__mmask8 k, __m512i a, __m512i b);
VPBLENDMQ __m256i _mm256_mask_blend_epi64(__mmask8 m, __m256i a, __m256i b);
VPBLENDMQ __m128i _mm_mask_blend_epi64(__mmask8 m, __m128i a, __m128i b);
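As a non-normative sketch (assuming an AVX512F-capable compiler), the helper below shows how a compare mask feeds VPBLENDMD: mask bit 0 picks a's element, bit 1 picks b's. A real maximum would simply use _mm512_max_epi32; this only illustrates the selector pattern.

#include <immintrin.h>

/* Non-normative sketch: per-dword maximum via compare + opmask blend. */
static inline __m512i max_epi32_via_blend(__m512i a, __m512i b)
{
    __mmask16 b_gt_a = _mm512_cmpgt_epi32_mask(b, a);
    return _mm512_mask_blend_epi32(b_gt_a, a, b);
}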
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
VPBROADCASTB/W/D/Q—Load with Broadcast Integer Data from General Purpose Register
Description
Broadcasts an 8-bit, 16-bit, 32-bit or 64-bit value from a general-purpose register (the second operand) to all the locations in the destination vector register (the first operand), using the writemask k1.
EVEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 7A /r VPBROADCASTB xmm1 {k1}{z}, reg | A | V/V | AVX512VL AVX512BW | Broadcast an 8-bit value from a GPR to all bytes in the 128-bit destination subject to writemask k1.
EVEX.256.66.0F38.W0 7A /r VPBROADCASTB ymm1 {k1}{z}, reg | A | V/V | AVX512VL AVX512BW | Broadcast an 8-bit value from a GPR to all bytes in the 256-bit destination subject to writemask k1.
EVEX.512.66.0F38.W0 7A /r VPBROADCASTB zmm1 {k1}{z}, reg | A | V/V | AVX512BW | Broadcast an 8-bit value from a GPR to all bytes in the 512-bit destination subject to writemask k1.
EVEX.128.66.0F38.W0 7B /r VPBROADCASTW xmm1 {k1}{z}, reg | A | V/V | AVX512VL AVX512BW | Broadcast a 16-bit value from a GPR to all words in the 128-bit destination subject to writemask k1.
EVEX.256.66.0F38.W0 7B /r VPBROADCASTW ymm1 {k1}{z}, reg | A | V/V | AVX512VL AVX512BW | Broadcast a 16-bit value from a GPR to all words in the 256-bit destination subject to writemask k1.
EVEX.512.66.0F38.W0 7B /r VPBROADCASTW zmm1 {k1}{z}, reg | A | V/V | AVX512BW | Broadcast a 16-bit value from a GPR to all words in the 512-bit destination subject to writemask k1.
EVEX.128.66.0F38.W0 7C /r VPBROADCASTD xmm1 {k1}{z}, r32 | A | V/V | AVX512VL AVX512F | Broadcast a 32-bit value from a GPR to all double-words in the 128-bit destination subject to writemask k1.
EVEX.256.66.0F38.W0 7C /r VPBROADCASTD ymm1 {k1}{z}, r32 | A | V/V | AVX512VL AVX512F | Broadcast a 32-bit value from a GPR to all double-words in the 256-bit destination subject to writemask k1.
EVEX.512.66.0F38.W0 7C /r VPBROADCASTD zmm1 {k1}{z}, r32 | A | V/V | AVX512F | Broadcast a 32-bit value from a GPR to all double-words in the 512-bit destination subject to writemask k1.
EVEX.128.66.0F38.W1 7C /r VPBROADCASTQ xmm1 {k1}{z}, r64 | A | V/N.E.1 | AVX512VL AVX512F | Broadcast a 64-bit value from a GPR to all quad-words in the 128-bit destination subject to writemask k1.
EVEX.256.66.0F38.W1 7C /r VPBROADCASTQ ymm1 {k1}{z}, r64 | A | V/N.E.1 | AVX512VL AVX512F | Broadcast a 64-bit value from a GPR to all quad-words in the 256-bit destination subject to writemask k1.
EVEX.512.66.0F38.W1 7C /r VPBROADCASTQ zmm1 {k1}{z}, r64 | A | V/N.E.1 | AVX512F | Broadcast a 64-bit value from a GPR to all quad-words in the 512-bit destination subject to writemask k1.
NOTES:
1. EVEX.W in non-64-bit mode is ignored; the instruction behaves as if the W0 version is used.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Operation
VPBROADCASTB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
FOR j ← 0 TO KL-1
i ← j * 8
IF k1[j] OR *no writemask*
THEN DEST[i+7:i] ← SRC[7:0]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+7:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+7:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPBROADCASTW (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
i ← j * 16
IF k1[j] OR *no writemask*
THEN DEST[i+15:i] ← SRC[15:0]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+15:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+15:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPBROADCASTD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
i ← j * 32
IF k1[j] OR *no writemask*
THEN DEST[i+31:i] ← SRC[31:0]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPBROADCASTQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
i ← j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] ← SRC[63:0]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPBROADCASTB __m512i _mm512_mask_set1_epi8(__m512i s, __mmask64 k, int a);
VPBROADCASTB __m512i _mm512_maskz_set1_epi8( __mmask64 k, int a);
VPBROADCASTB __m256i _mm256_mask_set1_epi8(__m256i s, __mmask32 k, int a);
VPBROADCASTB __m256i _mm256_maskz_set1_epi8( __mmask32 k, int a);
VPBROADCASTB __m128i _mm_mask_set1_epi8(__m128i s, __mmask16 k, int a);
VPBROADCASTB __m128i _mm_maskz_set1_epi8( __mmask16 k, int a);
VPBROADCASTD __m512i _mm512_mask_set1_epi32(__m512i s, __mmask16 k, int a);
VPBROADCASTD __m512i _mm512_maskz_set1_epi32( __mmask16 k, int a);
VPBROADCASTD __m256i _mm256_mask_set1_epi32(__m256i s, __mmask8 k, int a);
VPBROADCASTD __m256i _mm256_maskz_set1_epi32( __mmask8 k, int a);
VPBROADCASTD __m128i _mm_mask_set1_epi32(__m128i s, __mmask8 k, int a);
VPBROADCASTD __m128i _mm_maskz_set1_epi32( __mmask8 k, int a);
VPBROADCASTQ __m512i _mm512_mask_set1_epi64(__m512i s, __mmask8 k, __int64 a);
VPBROADCASTQ __m512i _mm512_maskz_set1_epi64( __mmask8 k, __int64 a);
VPBROADCASTQ __m256i _mm256_mask_set1_epi64(__m256i s, __mmask8 k, __int64 a);
VPBROADCASTQ __m256i _mm256_maskz_set1_epi64( __mmask8 k, __int64 a);
VPBROADCASTQ __m128i _mm_mask_set1_epi64(__m128i s, __mmask8 k, __int64 a);
VPBROADCASTQ __m128i _mm_maskz_set1_epi64( __mmask8 k, __int64 a);
VPBROADCASTW __m512i _mm512_mask_set1_epi16(__m512i s, __mmask32 k, int a);
VPBROADCASTW __m512i _mm512_maskz_set1_epi16( __mmask32 k, int a);
VPBROADCASTW __m256i _mm256_mask_set1_epi16(__m256i s, __mmask16 k, int a);
VPBROADCASTW __m256i _mm256_maskz_set1_epi16( __mmask16 k, int a);
VPBROADCASTW __m128i _mm_mask_set1_epi16(__m128i s, __mmask8 k, int a);
VPBROADCASTW __m128i _mm_maskz_set1_epi16( __mmask8 k, int a);
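As a non-normative sketch (assuming an AVX512F-capable compiler), the helper below uses the masked set1 intrinsic, which maps to VPBROADCASTD from a GPR with merge-masking: elements whose mask bit is 0 keep their old value in v, while elements whose bit is 1 receive the scalar x. For example, fill_selected(v, 0x00FF, -1) writes -1 into the low eight dwords only.

#include <immintrin.h>

/* Non-normative sketch: merge-broadcast a scalar into selected dwords. */
static inline __m512i fill_selected(__m512i v, __mmask16 k, int x)
{
    return _mm512_mask_set1_epi32(v, k, x);
}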
Exceptions
EVEX-encoded instructions, see Exceptions Type E7NM.
#UD If EVEX.vvvv != 1111B.
VPBROADCAST—Load Integer and Broadcast
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
VEX.128.66.0F38.W0 78 /r
VPBROADCASTB xmm1, xmm2/m8
A V/V AVX2 Broadcast a byte integer in the source operand
to sixteen locations in xmm1.
VEX.256.66.0F38.W0 78 /r
VPBROADCASTB ymm1, xmm2/m8
A V/V AVX2 Broadcast a byte integer in the source operand
to thirty-two locations in ymm1.
EVEX.128.66.0F38.W0 78 /r
VPBROADCASTB xmm1{k1}{z}, xmm2/m8
BV/V AVX512VL
AVX512BW
Broadcast a byte integer in the source operand
to locations in xmm1 subject to writemask k1.
EVEX.256.66.0F38.W0 78 /r
VPBROADCASTB ymm1{k1}{z}, xmm2/m8
BV/V AVX512VL
AVX512BW
Broadcast a byte integer in the source operand
to locations in ymm1 subject to writemask k1.
EVEX.512.66.0F38.W0 78 /r
VPBROADCASTB zmm1{k1}{z}, xmm2/m8
B V/V AVX512BW Broadcast a byte integer in the source operand
to 64 locations in zmm1 subject to writemask
k1.
VEX.128.66.0F38.W0 79 /r
VPBROADCASTW xmm1, xmm2/m16
A V/V AVX2 Broadcast a word integer in the source
operand to eight locations in xmm1.
VEX.256.66.0F38.W0 79 /r
VPBROADCASTW ymm1, xmm2/m16
A V/V AVX2 Broadcast a word integer in the source
operand to sixteen locations in ymm1.
EVEX.128.66.0F38.W0 79 /r
VPBROADCASTW xmm1{k1}{z}, xmm2/m16
BV/V AVX512VL
AVX512BW
Broadcast a word integer in the source
operand to locations in xmm1 subject to
writemask k1.
EVEX.256.66.0F38.W0 79 /r
VPBROADCASTW ymm1{k1}{z}, xmm2/m16
BV/V AVX512VL
AVX512BW
Broadcast a word integer in the source
operand to locations in ymm1 subject to
writemask k1.
EVEX.512.66.0F38.W0 79 /r
VPBROADCASTW zmm1{k1}{z}, xmm2/m16
B V/V AVX512BW Broadcast a word integer in the source
operand to 32 locations in zmm1 subject to
writemask k1.
VEX.128.66.0F38.W0 58 /r
VPBROADCASTD xmm1, xmm2/m32
A V/V AVX2 Broadcast a dword integer in the source
operand to four locations in xmm1.
VEX.256.66.0F38.W0 58 /r
VPBROADCASTD ymm1, xmm2/m32
A V/V AVX2 Broadcast a dword integer in the source
operand to eight locations in ymm1.
EVEX.128.66.0F38.W0 58 /r
VPBROADCASTD xmm1 {k1}{z}, xmm2/m32
BV/V AVX512VL
AVX512F
Broadcast a dword integer in the source
operand to locations in xmm1 subject to
writemask k1.
EVEX.256.66.0F38.W0 58 /r
VPBROADCASTD ymm1 {k1}{z}, xmm2/m32
BV/V AVX512VL
AVX512F
Broadcast a dword integer in the source
operand to locations in ymm1 subject to
writemask k1.
EVEX.512.66.0F38.W0 58 /r
VPBROADCASTD zmm1 {k1}{z}, xmm2/m32
B V/V AVX512F Broadcast a dword integer in the source
operand to locations in zmm1 subject to
writemask k1.
VEX.128.66.0F38.W0 59 /r
VPBROADCASTQ xmm1, xmm2/m64
A V/V AVX2 Broadcast a qword element in source operand
to two locations in xmm1.
VEX.256.66.0F38.W0 59 /r
VPBROADCASTQ ymm1, xmm2/m64
A V/V AVX2 Broadcast a qword element in source operand
to four locations in ymm1.
EVEX.128.66.0F38.W1 59 /r
VPBROADCASTQ xmm1 {k1}{z}, xmm2/m64
BV/V AVX512VL
AVX512F
Broadcast a qword element in source operand
to locations in xmm1 subject to writemask k1.
EVEX.256.66.0F38.W1 59 /r
VPBROADCASTQ ymm1 {k1}{z}, xmm2/m64
B V/V AVX512VL
AVX512F
Broadcast a qword element in source operand
to locations in ymm1 subject to writemask k1.
EVEX.512.66.0F38.W1 59 /r
VPBROADCASTQ zmm1 {k1}{z}, xmm2/m64
B V/V AVX512F Broadcast a qword element in source operand
to locations in zmm1 subject to writemask k1.
EVEX.128.66.0F38.W0 59 /r
VBROADCASTI32x2 xmm1 {k1}{z}, xmm2/m64
C V/V AVX512VL
AVX512DQ
Broadcast two dword elements in source
operand to locations in xmm1 subject to
writemask k1.
VPBROADCAST—Load Integer and Broadcast
INSTRUCTION SET REFERENCE, V-Z
Vol. 2C 5-305
EVEX.256.66.0F38.W0 59 /r
VBROADCASTI32x2 ymm1 {k1}{z}, xmm2/m64
C V/V AVX512VL
AVX512DQ
Broadcast two dword elements in source
operand to locations in ymm1 subject to
writemask k1.
EVEX.512.66.0F38.W0 59 /r
VBROADCASTI32x2 zmm1 {k1}{z}, xmm2/m64
C V/V AVX512DQ Broadcast two dword elements in source
operand to locations in zmm1 subject to
writemask k1.
VEX.256.66.0F38.W0 5A /r
VBROADCASTI128 ymm1, m128
A V/V AVX2 Broadcast 128 bits of integer data in mem to
low and high 128-bits in ymm1.
EVEX.256.66.0F38.W0 5A /r
VBROADCASTI32X4 ymm1 {k1}{z}, m128
D V/V AVX512VL
AVX512F
Broadcast 128 bits of 4 doubleword integer
data in mem to locations in ymm1 using
writemask k1.
EVEX.512.66.0F38.W0 5A /r
VBROADCASTI32X4 zmm1 {k1}{z}, m128
D V/V AVX512F Broadcast 128 bits of 4 doubleword integer
data in mem to locations in zmm1 using
writemask k1.
EVEX.256.66.0F38.W1 5A /r
VBROADCASTI64X2 ymm1 {k1}{z}, m128
C V/V AVX512VL
AVX512DQ
Broadcast 128 bits of 2 quadword integer data
in mem to locations in ymm1 using writemask
k1.
EVEX.512.66.0F38.W1 5A /r
VBROADCASTI64X2 zmm1 {k1}{z}, m128
C V/V AVX512DQ Broadcast 128 bits of 2 quadword integer data
in mem to locations in zmm1 using writemask
k1.
EVEX.512.66.0F38.W0 5B /r
VBROADCASTI32X8 zmm1 {k1}{z}, m256
E V/V AVX512DQ Broadcast 256 bits of 8 doubleword integer
data in mem to locations in zmm1 using
writemask k1.
EVEX.512.66.0F38.W1 5B /r
VBROADCASTI64X4 zmm1 {k1}{z}, m256
D V/V AVX512F Broadcast 256 bits of 4 quadword integer data
in mem to locations in zmm1 using writemask
k1.
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A NA ModRM:reg (w) ModRM:r/m (r) NA NA
B Tuple1 Scalar ModRM:reg (w) ModRM:r/m (r) NA NA
C Tuple2 ModRM:reg (w) ModRM:r/m (r) NA NA
D Tuple4 ModRM:reg (w) ModRM:r/m (r) NA NA
E Tuple8 ModRM:reg (w) ModRM:r/m (r) NA NA
Description
Load integer data from the source operand (the second operand) and broadcast it to all elements of the destination
operand (the first operand).
VEX.256-encoded VPBROADCASTB/W/D/Q: The source operand is an 8-bit, 16-bit, 32-bit, or 64-bit memory location,
or the low 8-bit, 16-bit, 32-bit, or 64-bit data element of an XMM register. The destination operand is a YMM register.
VBROADCASTI128 supports only a 128-bit memory location as the source operand; the register-source encoding for
VBROADCASTI128 is reserved and will #UD. Bits (MAXVL-1:256) of the destination register are zeroed.
EVEX-encoded VPBROADCASTD/Q: The source operand is a 32-bit or 64-bit memory location, or the low 32-bit or
64-bit data element of an XMM register. The destination operand is a ZMM/YMM/XMM register updated according to
the writemask k1.
VBROADCASTI32X4 and VBROADCASTI64X4: The destination operand is a ZMM register, updated according to the
writemask k1. The source operand is a 128-bit or 256-bit memory location. Register source encodings for
VBROADCASTI32X4 and VBROADCASTI64X4 are reserved and will #UD.
Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instruction will #UD.
An attempt to execute VBROADCASTI128 encoded with VEX.L = 0 will cause an #UD exception.
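For illustration only (not part of the original reference): a minimal C sketch of the masked broadcast forms, assuming <immintrin.h> and an AVX-512F-capable compiler (e.g., built with -mavx512f); all identifiers here are hypothetical.

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m128i src = _mm_set1_epi32(7);      /* low dword of the XMM source is 7 */
    __m512i old = _mm512_set1_epi32(-1);  /* prior destination contents */
    /* Merging-masked VPBROADCASTD: broadcast 7 into the even dword
       elements only; the odd elements keep their old value of -1. */
    __m512i r = _mm512_mask_broadcastd_epi32(old, 0x5555, src);
    int out[16];
    _mm512_storeu_si512(out, r);
    for (int i = 0; i < 16; i++)
        printf("%d ", out[i]);            /* prints: 7 -1 7 -1 ... */
    printf("\n");
    return 0;
}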
Figure 5-16. VPBROADCASTD Operation (VEX.256 encoded version) [figure not reproduced]
Figure 5-17. VPBROADCASTD Operation (128-bit version) [figure not reproduced]
Figure 5-18. VPBROADCASTQ Operation (256-bit version) [figure not reproduced]
Operation
VPBROADCASTB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SRC[7:0]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Figure 5-19. VBROADCASTI128 Operation (256-bit version) [figure not reproduced]
Figure 5-20. VBROADCASTI256 Operation (512-bit version) [figure not reproduced]
VPBROADCASTW (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SRC[15:0]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPBROADCASTD (128-bit version)
temp ← SRC[31:0]
DEST[31:0] ← temp
DEST[63:32] ← temp
DEST[95:64] ← temp
DEST[127:96] ← temp
DEST[MAXVL-1:128] ← 0
VPBROADCASTD (VEX.256 encoded version)
temp ← SRC[31:0]
DEST[31:0] ← temp
DEST[63:32] ← temp
DEST[95:64] ← temp
DEST[127:96] ← temp
DEST[159:128] ← temp
DEST[191:160] ← temp
DEST[223:192] ← temp
DEST[255:224] ← temp
DEST[MAXVL-1:256] ← 0
VPBROADCASTD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC[31:0]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPBROADCASTQ (VEX.256 encoded version)
temp ← SRC[63:0]
DEST[63:0] ← temp
DEST[127:64] ← temp
DEST[191:128] ← temp
DEST[255:192] ← temp
DEST[MAXVL-1:256] ← 0
VPBROADCASTQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← SRC[63:0]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VBROADCASTI32x2 (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    n ← (j mod 2) * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC[n+31:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VBROADCASTI128 (VEX.256 encoded version)
temp ← SRC[127:0]
DEST[127:0] ← temp
DEST[255:128] ← temp
DEST[MAXVL-1:256] ← 0
VBROADCASTI32X4 (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    n ← (j modulo 4) * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC[n+31:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VBROADCASTI64X2 (EVEX encoded versions)
(KL, VL) = (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    n ← (j modulo 2) * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← SRC[n+63:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VBROADCASTI32X8 (EVEX.U1.512 encoded version)
FOR j ← 0 TO 15
    i ← j * 32
    n ← (j modulo 8) * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC[n+31:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VBROADCASTI64X4 (EVEX.512 encoded version)
FOR j ← 0 TO 7
    i ← j * 64
    n ← (j modulo 4) * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← SRC[n+63:n]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPBROADCASTB __m512i _mm512_broadcastb_epi8( __m128i a);
VPBROADCASTB __m512i _mm512_mask_broadcastb_epi8(__m512i s, __mmask64 k, __m128i a);
VPBROADCASTB __m512i _mm512_maskz_broadcastb_epi8( __mmask64 k, __m128i a);
VPBROADCASTB __m256i _mm256_broadcastb_epi8(__m128i a);
VPBROADCASTB __m256i _mm256_mask_broadcastb_epi8(__m256i s, __mmask32 k, __m128i a);
VPBROADCASTB __m256i _mm256_maskz_broadcastb_epi8( __mmask32 k, __m128i a);
VPBROADCASTB __m128i _mm_mask_broadcastb_epi8(__m128i s, __mmask16 k, __m128i a);
VPBROADCASTB __m128i _mm_maskz_broadcastb_epi8( __mmask16 k, __m128i a);
VPBROADCASTB __m128i _mm_broadcastb_epi8(__m128i a);
VPBROADCASTD __m512i _mm512_broadcastd_epi32( __m128i a);
VPBROADCASTD __m512i _mm512_mask_broadcastd_epi32(__m512i s, __mmask16 k, __m128i a);
VPBROADCASTD __m512i _mm512_maskz_broadcastd_epi32( __mmask16 k, __m128i a);
VPBROADCASTD __m256i _mm256_broadcastd_epi32( __m128i a);
VPBROADCASTD __m256i _mm256_mask_broadcastd_epi32(__m256i s, __mmask8 k, __m128i a);
VPBROADCASTD __m256i _mm256_maskz_broadcastd_epi32( __mmask8 k, __m128i a);
VPBROADCASTD __m128i _mm_broadcastd_epi32(__m128i a);
VPBROADCASTD __m128i _mm_mask_broadcastd_epi32(__m128i s, __mmask8 k, __m128i a);
VPBROADCASTD __m128i _mm_maskz_broadcastd_epi32( __mmask8 k, __m128i a);
VPBROADCASTQ __m512i _mm512_broadcastq_epi64( __m128i a);
VPBROADCASTQ __m512i _mm512_mask_broadcastq_epi64(__m512i s, __mmask8 k, __m128i a);
VPBROADCASTQ __m512i _mm512_maskz_broadcastq_epi64( __mmask8 k, __m128i a);
VPBROADCASTQ __m256i _mm256_broadcastq_epi64(__m128i a);
VPBROADCASTQ __m256i _mm256_mask_broadcastq_epi64(__m256i s, __mmask8 k, __m128i a);
VPBROADCASTQ __m256i _mm256_maskz_broadcastq_epi64( __mmask8 k, __m128i a);
VPBROADCASTQ __m128i _mm_broadcastq_epi64(__m128i a);
VPBROADCASTQ __m128i _mm_mask_broadcastq_epi64(__m128i s, __mmask8 k, __m128i a);
VPBROADCASTQ __m128i _mm_maskz_broadcastq_epi64( __mmask8 k, __m128i a);
VPBROADCASTW __m512i _mm512_broadcastw_epi16(__m128i a);
VPBROADCASTW __m512i _mm512_mask_broadcastw_epi16(__m512i s, __mmask32 k, __m128i a);
VPBROADCASTW __m512i _mm512_maskz_broadcastw_epi16( __mmask32 k, __m128i a);
VPBROADCASTW __m256i _mm256_broadcastw_epi16(__m128i a);
VPBROADCASTW __m256i _mm256_mask_broadcastw_epi16(__m256i s, __mmask16 k, __m128i a);
VPBROADCASTW __m256i _mm256_maskz_broadcastw_epi16( __mmask16 k, __m128i a);
VPBROADCASTW __m128i _mm_broadcastw_epi16(__m128i a);
VPBROADCASTW __m128i _mm_mask_broadcastw_epi16(__m128i s, __mmask8 k, __m128i a);
VPBROADCASTW __m128i _mm_maskz_broadcastw_epi16( __mmask8 k, __m128i a);
VBROADCASTI32x2 __m512i _mm512_broadcast_i32x2( __m128i a);
VBROADCASTI32x2 __m512i _mm512_mask_broadcast_i32x2(__m512i s, __mmask16 k, __m128i a);
VBROADCASTI32x2 __m512i _mm512_maskz_broadcast_i32x2( __mmask16 k, __m128i a);
VBROADCASTI32x2 __m256i _mm256_broadcast_i32x2( __m128i a);
VBROADCASTI32x2 __m256i _mm256_mask_broadcast_i32x2(__m256i s, __mmask8 k, __m128i a);
VBROADCASTI32x2 __m256i _mm256_maskz_broadcast_i32x2( __mmask8 k, __m128i a);
VBROADCASTI32x2 __m128i _mm_broadcast_i32x2(__m128i a);
VBROADCASTI32x2 __m128i _mm_mask_broadcast_i32x2(__m128i s, __mmask8 k, __m128i a);
VBROADCASTI32x2 __m128i _mm_maskz_broadcast_i32x2( __mmask8 k, __m128i a);
VBROADCASTI32x4 __m512i _mm512_broadcast_i32x4( __m128i a);
VBROADCASTI32x4 __m512i _mm512_mask_broadcast_i32x4(__m512i s, __mmask16 k, __m128i a);
VBROADCASTI32x4 __m512i _mm512_maskz_broadcast_i32x4( __mmask16 k, __m128i a);
VBROADCASTI32x4 __m256i _mm256_broadcast_i32x4( __m128i a);
VBROADCASTI32x4 __m256i _mm256_mask_broadcast_i32x4(__m256i s, __mmask8 k, __m128i a);
VBROADCASTI32x4 __m256i _mm256_maskz_broadcast_i32x4( __mmask8 k, __m128i a);
VBROADCASTI32x8 __m512i _mm512_broadcast_i32x8( __m256i a);
VBROADCASTI32x8 __m512i _mm512_mask_broadcast_i32x8(__m512i s, __mmask16 k, __m256i a);
VBROADCASTI32x8 __m512i _mm512_maskz_broadcast_i32x8( __mmask16 k, __m256i a);
VBROADCASTI64x2 __m512i _mm512_broadcast_i64x2( __m128i a);
VBROADCASTI64x2 __m512i _mm512_mask_broadcast_i64x2(__m512i s, __mmask8 k, __m128i a);
VBROADCASTI64x2 __m512i _mm512_maskz_broadcast_i64x2( __mmask8 k, __m128i a);
VBROADCASTI64x2 __m256i _mm256_broadcast_i64x2( __m128i a);
VBROADCASTI64x2 __m256i _mm256_mask_broadcast_i64x2(__m256i s, __mmask8 k, __m128i a);
VBROADCASTI64x2 __m256i _mm256_maskz_broadcast_i64x2( __mmask8 k, __m128i a);
VBROADCASTI64x4 __m512i _mm512_broadcast_i64x4( __m256i a);
VBROADCASTI64x4 __m512i _mm512_mask_broadcast_i64x4(__m512i s, __mmask8 k, __m256i a);
VBROADCASTI64x4 __m512i _mm512_maskz_broadcast_i64x4( __mmask8 k, __m256i a);
SIMD Floating-Point Exceptions
None
Other Exceptions
VEX-encoded instructions, see Exceptions Type 6;
EVEX-encoded instructions, syntax with reg/mem operand, see Exceptions Type E6.
#UD If VEX.L = 0 for VBROADCASTI128.
If EVEX.L’L = 0 for VBROADCASTI32X4/VBROADCASTI64X2.
If EVEX.L’L < 10b for VBROADCASTI32X8/VBROADCASTI64X4.
VPBROADCASTM—Broadcast Mask to Vector Register
Description
Broadcasts the zero-extended 64/32 bit value of the low byte/word of the source operand (the second operand) to
each 64/32 bit element of the destination operand (the first operand). The source operand is an opmask register.
The destination operand is a ZMM register (EVEX.512), YMM register (EVEX.256), or XMM register (EVEX.128).
EVEX.vvvv is reserved and must be 1111b; otherwise, the instruction will #UD.
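A short C sketch (illustrative, not part of the original text; assumes <immintrin.h> and AVX512CD support, e.g., -mavx512cd; the function name is hypothetical):

#include <immintrin.h>

/* VPBROADCASTMW2D: zero-extend the 16-bit opmask k into every
   dword element of the result vector. */
__m512i mask_to_lanes(__mmask16 k) {
    return _mm512_broadcastmw_epi32(k);
}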
Operation
VPBROADCASTMB2Q
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j*64
    DEST[i+63:i] ← ZeroExtend(SRC[7:0])
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPBROADCASTMW2D
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j*32
    DEST[i+31:i] ← ZeroExtend(SRC[15:0])
ENDFOR
DEST[MAXVL-1:VL] ← 0
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W1 2A /r
VPBROADCASTMB2Q xmm1, k1
RM V/V AVX512VL
AVX512CD
Broadcast low byte value in k1 to two locations in xmm1.
EVEX.256.F3.0F38.W1 2A /r
VPBROADCASTMB2Q ymm1, k1
RM V/V AVX512VL
AVX512CD
Broadcast low byte value in k1 to four locations in ymm1.
EVEX.512.F3.0F38.W1 2A /r
VPBROADCASTMB2Q zmm1, k1
RM V/V AVX512CD Broadcast low byte value in k1 to eight locations in zmm1.
EVEX.128.F3.0F38.W0 3A /r
VPBROADCASTMW2D xmm1, k1
RM V/V AVX512VL
AVX512CD
Broadcast low word value in k1 to four locations in xmm1.
EVEX.256.F3.0F38.W0 3A /r
VPBROADCASTMW2D ymm1, k1
RM V/V AVX512VL
AVX512CD
Broadcast low word value in k1 to eight locations in ymm1.
EVEX.512.F3.0F38.W0 3A /r
VPBROADCASTMW2D zmm1, k1
RM V/V AVX512CD Broadcast low word value in k1 to sixteen locations in
zmm1.
Instruction Operand Encoding
Op/En Operand 1 Operand 2 Operand 3 Operand 4
RM ModRM:reg (w) ModRM:r/m (r) NA NA
Intel C/C++ Compiler Intrinsic Equivalent
VPBROADCASTMB2Q __m512i _mm512_broadcastmb_epi64( __mmask8);
VPBROADCASTMW2D __m512i _mm512_broadcastmw_epi32( __mmask16);
VPBROADCASTMB2Q __m256i _mm256_broadcastmb_epi64( __mmask8);
VPBROADCASTMW2D __m256i _mm256_broadcastmw_epi32( __mmask16);
VPBROADCASTMB2Q __m128i _mm_broadcastmb_epi64( __mmask8);
VPBROADCASTMW2D __m128i _mm_broadcastmw_epi32( __mmask16);
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E6NF.
VPCMPB/VPCMPUB—Compare Packed Byte Values Into Mask
Description
Performs a SIMD compare of the packed byte values in the second source operand and the first source operand and
returns the results of the comparison to the mask destination operand. The comparison predicate operand (imme-
diate byte) specifies the type of comparison performed on each pair of packed values in the two source operands.
The result of each comparison is a single mask bit result of 1 (comparison true) or 0 (comparison false).
VPCMPB performs a comparison between pairs of signed byte values.
VPCMPUB performs a comparison between pairs of unsigned byte values.
The first source operand (second operand) is a ZMM/YMM/XMM register. The second source operand can be a
ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination operand (first operand) is a mask
register k1. Up to 64/32/16 comparisons are performed with results written to the destination operand under the
writemask k2.
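As a usage sketch (illustrative only; assumes <immintrin.h>, AVX512BW and POPCNT support; _MM_CMPINT_LT is the predicate constant corresponding to LT = 1, and the function name is hypothetical):

#include <immintrin.h>

/* Count the signed bytes of a that are strictly less than the
   corresponding bytes of b (VPCMPB with predicate LT). */
int count_lt_bytes(__m512i a, __m512i b) {
    __mmask64 m = _mm512_cmp_epi8_mask(a, b, _MM_CMPINT_LT);
    return (int)_mm_popcnt_u64(m);
}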
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W0 3F /r ib
VPCMPB k1 {k2}, xmm2,
xmm3/m128, imm8
A V/V AVX512VL
AVX512BW
Compare packed signed byte values in xmm3/m128 and
xmm2 using bits 2:0 of imm8 as a comparison predicate
with writemask k2 and leave the result in mask register
k1.
EVEX.256.66.0F3A.W0 3F /r ib
VPCMPB k1 {k2}, ymm2,
ymm3/m256, imm8
A V/V AVX512VL
AVX512BW
Compare packed signed byte values in ymm3/m256 and
ymm2 using bits 2:0 of imm8 as a comparison predicate
with writemask k2 and leave the result in mask register
k1.
EVEX.512.66.0F3A.W0 3F /r ib
VPCMPB k1 {k2}, zmm2,
zmm3/m512, imm8
A V/V AVX512BW Compare packed signed byte values in zmm3/m512 and
zmm2 using bits 2:0 of imm8 as a comparison predicate
with writemask k2 and leave the result in mask register
k1.
EVEX.128.66.0F3A.W0 3E /r ib
VPCMPUB k1 {k2}, xmm2,
xmm3/m128, imm8
A V/V AVX512VL
AVX512BW
Compare packed unsigned byte values in xmm3/m128
and xmm2 using bits 2:0 of imm8 as a comparison
predicate with writemask k2 and leave the result in mask
register k1.
EVEX.256.66.0F3A.W0 3E /r ib
VPCMPUB k1 {k2}, ymm2,
ymm3/m256, imm8
A V/V AVX512VL
AVX512BW
Compare packed unsigned byte values in ymm3/m256
and ymm2 using bits 2:0 of imm8 as a comparison
predicate with writemask k2 and leave the result in mask
register k1.
EVEX.512.66.0F3A.W0 3E /r ib
VPCMPUB k1 {k2}, zmm2,
zmm3/m512, imm8
A V/V AVX512BW Compare packed unsigned byte values in zmm3/m512
and zmm2 using bits 2:0 of imm8 as a comparison
predicate with writemask k2 and leave the result in mask
register k1.
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full Mem ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) Imm8
The comparison predicate operand is an 8-bit immediate: bits 2:0 define the type of comparison to be performed.
Bits 3 through 7 of the immediate are reserved. The compiler can implement the pseudo-op mnemonics listed in
Table 5-8.
Operation
CASE (COMPARISON PREDICATE) OF
0: OP ← EQ;
1: OP ← LT;
2: OP ← LE;
3: OP ← FALSE;
4: OP ← NEQ;
5: OP ← NLT;
6: OP ← NLE;
7: OP ← TRUE;
ESAC;
VPCMPB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    IF k2[j] OR *no writemask*
        THEN
            CMP ← SRC1[i+7:i] OP SRC2[i+7:i];
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Table 5-8. Pseudo-Op and VPCMP* Implementation
Pseudo-Op | PCMPM Implementation
VPCMPEQ* reg1, reg2, reg3 | VPCMP* reg1, reg2, reg3, 0
VPCMPLT* reg1, reg2, reg3 | VPCMP* reg1, reg2, reg3, 1
VPCMPLE* reg1, reg2, reg3 | VPCMP* reg1, reg2, reg3, 2
VPCMPNEQ* reg1, reg2, reg3 | VPCMP* reg1, reg2, reg3, 4
VPCMPNLT* reg1, reg2, reg3 | VPCMP* reg1, reg2, reg3, 5
VPCMPNLE* reg1, reg2, reg3 | VPCMP* reg1, reg2, reg3, 6
VPCMPUB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    IF k2[j] OR *no writemask*
        THEN
            CMP ← SRC1[i+7:i] OP SRC2[i+7:i];
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPCMPB __mmask64 _mm512_cmp_epi8_mask( __m512i a, __m512i b, int cmp);
VPCMPB __mmask64 _mm512_mask_cmp_epi8_mask( __mmask64 m, __m512i a, __m512i b, int cmp);
VPCMPB __mmask32 _mm256_cmp_epi8_mask( __m256i a, __m256i b, int cmp);
VPCMPB __mmask32 _mm256_mask_cmp_epi8_mask( __mmask32 m, __m256i a, __m256i b, int cmp);
VPCMPB __mmask16 _mm_cmp_epi8_mask( __m128i a, __m128i b, int cmp);
VPCMPB __mmask16 _mm_mask_cmp_epi8_mask( __mmask16 m, __m128i a, __m128i b, int cmp);
VPCMPB __mmask64 _mm512_cmp[eq|ge|gt|le|lt|neq]_epi8_mask( __m512i a, __m512i b);
VPCMPB __mmask64 _mm512_mask_cmp[eq|ge|gt|le|lt|neq]_epi8_mask( __mmask64 m, __m512i a, __m512i b);
VPCMPB __mmask32 _mm256_cmp[eq|ge|gt|le|lt|neq]_epi8_mask( __m256i a, __m256i b);
VPCMPB __mmask32 _mm256_mask_cmp[eq|ge|gt|le|lt|neq]_epi8_mask( __mmask32 m, __m256i a, __m256i b);
VPCMPB __mmask16 _mm_cmp[eq|ge|gt|le|lt|neq]_epi8_mask( __m128i a, __m128i b);
VPCMPB __mmask16 _mm_mask_cmp[eq|ge|gt|le|lt|neq]_epi8_mask( __mmask16 m, __m128i a, __m128i b);
VPCMPUB __mmask64 _mm512_cmp_epu8_mask( __m512i a, __m512i b, int cmp);
VPCMPUB __mmask64 _mm512_mask_cmp_epu8_mask( __mmask64 m, __m512i a, __m512i b, int cmp);
VPCMPUB __mmask32 _mm256_cmp_epu8_mask( __m256i a, __m256i b, int cmp);
VPCMPUB __mmask32 _mm256_mask_cmp_epu8_mask( __mmask32 m, __m256i a, __m256i b, int cmp);
VPCMPUB __mmask16 _mm_cmp_epu8_mask( __m128i a, __m128i b, int cmp);
VPCMPUB __mmask16 _mm_mask_cmp_epu8_mask( __mmask16 m, __m128i a, __m128i b, int cmp);
VPCMPUB __mmask64 _mm512_cmp[eq|ge|gt|le|lt|neq]_epu8_mask( __m512i a, __m512i b);
VPCMPUB __mmask64 _mm512_mask_cmp[eq|ge|gt|le|lt|neq]_epu8_mask( __mmask64 m, __m512i a, __m512i b);
VPCMPUB __mmask32 _mm256_cmp[eq|ge|gt|le|lt|neq]_epu8_mask( __m256i a, __m256i b);
VPCMPUB __mmask32 _mm256_mask_cmp[eq|ge|gt|le|lt|neq]_epu8_mask( __mmask32 m, __m256i a, __m256i b);
VPCMPUB __mmask16 _mm_cmp[eq|ge|gt|le|lt|neq]_epu8_mask( __m128i a, __m128i b);
VPCMPUB __mmask16 _mm_mask_cmp[eq|ge|gt|le|lt|neq]_epu8_mask( __mmask16 m, __m128i a, __m128i b);
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.nb.
VPCMPD/VPCMPUD—Compare Packed Integer Values into Mask
Description
Performs a SIMD compare of the packed integer values in the second source operand and the first source operand
and returns the results of the comparison to the mask destination operand. The comparison predicate operand
(immediate byte) specifies the type of comparison performed on each pair of packed values in the two source oper-
ands. The result of each comparison is a single mask bit result of 1 (comparison true) or 0 (comparison false).
VPCMPD/VPCMPUD performs a comparison between pairs of signed/unsigned doubleword integer values.
The first source operand (second operand) is a ZMM/YMM/XMM register. The second source operand can be a
ZMM/YMM/XMM register or a 512/256/128-bit memory location or a 512-bit vector broadcasted from a 32-bit
memory location. The destination operand (first operand) is a mask register k1. Up to 16/8/4 comparisons are
performed with results written to the destination operand under the writemask k2.
The comparison predicate operand is an 8-bit immediate: bits 2:0 define the type of comparison to be performed.
Bits 3 through 7 of the immediate are reserved. The compiler can implement the pseudo-op mnemonics listed in
Table 5-8.
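For example (an illustrative sketch, not part of the reference; assumes <immintrin.h> and AVX512F; the function name is hypothetical), comparing every dword lane against one scalar is a natural fit for the embedded-broadcast form:

#include <immintrin.h>

/* Return a mask of the dword elements of v that equal x. With
   optimization enabled, a compiler may fold the set1 into the
   m32bcst (embedded broadcast) form of VPCMPD with predicate EQ. */
__mmask16 lanes_equal(__m512i v, int x) {
    return _mm512_cmpeq_epi32_mask(v, _mm512_set1_epi32(x));
}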
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W0 1F /r ib
VPCMPD k1 {k2}, xmm2,
xmm3/m128/m32bcst, imm8
A V/V AVX512VL
AVX512F
Compare packed signed doubleword integer values in
xmm3/m128/m32bcst and xmm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
EVEX.256.66.0F3A.W0 1F /r ib
VPCMPD k1 {k2}, ymm2,
ymm3/m256/m32bcst, imm8
A V/V AVX512VL
AVX512F
Compare packed signed doubleword integer values in
ymm3/m256/m32bcst and ymm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
EVEX.512.66.0F3A.W0 1F /r ib
VPCMPD k1 {k2}, zmm2,
zmm3/m512/m32bcst, imm8
A V/V AVX512F Compare packed signed doubleword integer values in
zmm2 and zmm3/m512/m32bcst using bits 2:0 of imm8
as a comparison predicate. The comparison results are
written to the destination k1 under writemask k2.
EVEX.128.66.0F3A.W0 1E /r ib
VPCMPUD k1 {k2}, xmm2,
xmm3/m128/m32bcst, imm8
A V/V AVX512VL
AVX512F
Compare packed unsigned doubleword integer values in
xmm3/m128/m32bcst and xmm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
EVEX.256.66.0F3A.W0 1E /r ib
VPCMPUD k1 {k2}, ymm2,
ymm3/m256/m32bcst, imm8
A V/V AVX512VL
AVX512F
Compare packed unsigned doubleword integer values in
ymm3/m256/m32bcst and ymm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
EVEX.512.66.0F3A.W0 1E /r ib
VPCMPUD k1 {k2}, zmm2,
zmm3/m512/m32bcst, imm8
A V/V AVX512F Compare packed unsigned doubleword integer values in
zmm2 and zmm3/m512/m32bcst using bits 2:0 of imm8
as a comparison predicate. The comparison results are
written to the destination k1 under writemask k2.
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) Imm8
Operation
CASE (COMPARISON PREDICATE) OF
0: OP ← EQ;
1: OP ← LT;
2: OP ← LE;
3: OP ← FALSE;
4: OP ← NEQ;
5: OP ← NLT;
6: OP ← NLE;
7: OP ← TRUE;
ESAC;
VPCMPD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k2[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN CMP ← SRC1[i+31:i] OP SRC2[31:0];
                ELSE CMP ← SRC1[i+31:i] OP SRC2[i+31:i];
            FI;
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
VPCMPUD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k2[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN CMP ← SRC1[i+31:i] OP SRC2[31:0];
                ELSE CMP ← SRC1[i+31:i] OP SRC2[i+31:i];
            FI;
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPCMPD __mmask16 _mm512_cmp_epi32_mask( __m512i a, __m512i b, int imm);
VPCMPD __mmask16 _mm512_mask_cmp_epi32_mask(__mmask16 k, __m512i a, __m512i b, int imm);
VPCMPD __mmask16 _mm512_cmp[eq|ge|gt|le|lt|neq]_epi32_mask( __m512i a, __m512i b);
VPCMPD __mmask16 _mm512_mask_cmp[eq|ge|gt|le|lt|neq]_epi32_mask(__mmask16 k, __m512i a, __m512i b);
VPCMPUD __mmask16 _mm512_cmp_epu32_mask( __m512i a, __m512i b, int imm);
VPCMPUD __mmask16 _mm512_mask_cmp_epu32_mask(__mmask16 k, __m512i a, __m512i b, int imm);
VPCMPUD __mmask16 _mm512_cmp[eq|ge|gt|le|lt|neq]_epu32_mask( __m512i a, __m512i b);
VPCMPUD __mmask16 _mm512_mask_cmp[eq|ge|gt|le|lt|neq]_epu32_mask(__mmask16 k, __m512i a, __m512i b);
VPCMPD __mmask8 _mm256_cmp_epi32_mask( __m256i a, __m256i b, int imm);
VPCMPD __mmask8 _mm256_mask_cmp_epi32_mask(__mmask8 k, __m256i a, __m256i b, int imm);
VPCMPD __mmask8 _mm256_cmp[eq|ge|gt|le|lt|neq]_epi32_mask( __m256i a, __m256i b);
VPCMPD __mmask8 _mm256_mask_cmp[eq|ge|gt|le|lt|neq]_epi32_mask(__mmask8 k, __m256i a, __m256i b);
VPCMPUD __mmask8 _mm256_cmp_epu32_mask( __m256i a, __m256i b, int imm);
VPCMPUD __mmask8 _mm256_mask_cmp_epu32_mask(__mmask8 k, __m256i a, __m256i b, int imm);
VPCMPUD __mmask8 _mm256_cmp[eq|ge|gt|le|lt|neq]_epu32_mask( __m256i a, __m256i b);
VPCMPUD __mmask8 _mm256_mask_cmp[eq|ge|gt|le|lt|neq]_epu32_mask(__mmask8 k, __m256i a, __m256i b);
VPCMPD __mmask8 _mm_cmp_epi32_mask( __m128i a, __m128i b, int imm);
VPCMPD __mmask8 _mm_mask_cmp_epi32_mask(__mmask8 k, __m128i a, __m128i b, int imm);
VPCMPD __mmask8 _mm_cmp[eq|ge|gt|le|lt|neq]_epi32_mask( __m128i a, __m128i b);
VPCMPD __mmask8 _mm_mask_cmp[eq|ge|gt|le|lt|neq]_epi32_mask(__mmask8 k, __m128i a, __m128i b);
VPCMPUD __mmask8 _mm_cmp_epu32_mask( __m128i a, __m128i b, int imm);
VPCMPUD __mmask8 _mm_mask_cmp_epu32_mask(__mmask8 k, __m128i a, __m128i b, int imm);
VPCMPUD __mmask8 _mm_cmp[eq|ge|gt|le|lt|neq]_epu32_mask( __m128i a, __m128i b);
VPCMPUD __mmask8 _mm_mask_cmp[eq|ge|gt|le|lt|neq]_epu32_mask(__mmask8 k, __m128i a, __m128i b);
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.
VPCMPQ/VPCMPUQ—Compare Packed Integer Values into Mask
Description
Performs a SIMD compare of the packed integer values in the second source operand and the first source operand
and returns the results of the comparison to the mask destination operand. The comparison predicate operand
(immediate byte) specifies the type of comparison performed on each pair of packed values in the two source oper-
ands. The result of each comparison is a single mask bit result of 1 (comparison true) or 0 (comparison false).
VPCMPQ/VPCMPUQ performs a comparison between pairs of signed/unsigned quadword integer values.
The first source operand (second operand) is a ZMM/YMM/XMM register. The second source operand can be a
ZMM/YMM/XMM register or a 512/256/128-bit memory location or a 512-bit vector broadcasted from a 64-bit
memory location. The destination operand (first operand) is a mask register k1. Up to 8/4/2 comparisons are
performed with results written to the destination operand under the writemask k2.
The comparison predicate operand is an 8-bit immediate: bits 2:0 define the type of comparison to be performed.
Bits 3 through 7 of the immediate are reserved. The compiler can implement the pseudo-op mnemonics listed in
Table 5-8.
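The signed/unsigned distinction matters for lanes whose high bit is set, as in this illustrative C sketch (assumes <immintrin.h> and AVX512F; names are hypothetical):

#include <immintrin.h>

/* The same bit pattern orders differently under VPCMPQ vs. VPCMPUQ:
   as a signed quadword -1 < 1, but as an unsigned quadword
   0xFFFFFFFFFFFFFFFF > 1. */
void signed_vs_unsigned(__mmask8 *s, __mmask8 *u) {
    __m512i a = _mm512_set1_epi64(-1);
    __m512i b = _mm512_set1_epi64(1);
    *s = _mm512_cmplt_epi64_mask(a, b);  /* all eight lanes set */
    *u = _mm512_cmplt_epu64_mask(a, b);  /* no lanes set */
}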
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W1 1F /r ib
VPCMPQ k1 {k2}, xmm2,
xmm3/m128/m64bcst, imm8
A V/V AVX512VL
AVX512F
Compare packed signed quadword integer values in
xmm3/m128/m64bcst and xmm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
EVEX.256.66.0F3A.W1 1F /r ib
VPCMPQ k1 {k2}, ymm2,
ymm3/m256/m64bcst, imm8
A V/V AVX512VL
AVX512F
Compare packed signed quadword integer values in
ymm3/m256/m64bcst and ymm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
EVEX.512.66.0F3A.W1 1F /r ib
VPCMPQ k1 {k2}, zmm2,
zmm3/m512/m64bcst, imm8
A V/V AVX512F Compare packed signed quadword integer values in
zmm3/m512/m64bcst and zmm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
EVEX.128.66.0F3A.W1 1E /r ib
VPCMPUQ k1 {k2}, xmm2,
xmm3/m128/m64bcst, imm8
A V/V AVX512VL
AVX512F
Compare packed unsigned quadword integer values in
xmm3/m128/m64bcst and xmm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
EVEX.256.66.0F3A.W1 1E /r ib
VPCMPUQ k1 {k2}, ymm2,
ymm3/m256/m64bcst, imm8
A V/V AVX512VL
AVX512F
Compare packed unsigned quadword integer values in
ymm3/m256/m64bcst and ymm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
EVEX.512.66.0F3A.W1 1E /r ib
VPCMPUQ k1 {k2}, zmm2,
zmm3/m512/m64bcst, imm8
A V/V AVX512F Compare packed unsigned quadword integer values in
zmm3/m512/m64bcst and zmm2 using bits 2:0 of imm8
as a comparison predicate with writemask k2 and leave
the result in mask register k1.
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) Imm8
Operation
CASE (COMPARISON PREDICATE) OF
0: OP ← EQ;
1: OP ← LT;
2: OP ← LE;
3: OP ← FALSE;
4: OP ← NEQ;
5: OP ← NLT;
6: OP ← NLE;
7: OP ← TRUE;
ESAC;
VPCMPQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k2[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN CMP ← SRC1[i+63:i] OP SRC2[63:0];
                ELSE CMP ← SRC1[i+63:i] OP SRC2[i+63:i];
            FI;
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
VPCMPUQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k2[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN CMP ← SRC1[i+63:i] OP SRC2[63:0];
                ELSE CMP ← SRC1[i+63:i] OP SRC2[i+63:i];
            FI;
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPCMPQ __mmask8 _mm512_cmp_epi64_mask( __m512i a, __m512i b, int imm);
VPCMPQ __mmask8 _mm512_mask_cmp_epi64_mask(__mmask8 k, __m512i a, __m512i b, int imm);
VPCMPQ __mmask8 _mm512_cmp[eq|ge|gt|le|lt|neq]_epi64_mask( __m512i a, __m512i b);
VPCMPQ __mmask8 _mm512_mask_cmp[eq|ge|gt|le|lt|neq]_epi64_mask(__mmask8 k, __m512i a, __m512i b);
VPCMPUQ __mmask8 _mm512_cmp_epu64_mask( __m512i a, __m512i b, int imm);
VPCMPUQ __mmask8 _mm512_mask_cmp_epu64_mask(__mmask8 k, __m512i a, __m512i b, int imm);
VPCMPUQ __mmask8 _mm512_cmp[eq|ge|gt|le|lt|neq]_epu64_mask( __m512i a, __m512i b);
VPCMPUQ __mmask8 _mm512_mask_cmp[eq|ge|gt|le|lt|neq]_epu64_mask(__mmask8 k, __m512i a, __m512i b);
VPCMPQ __mmask8 _mm256_cmp_epi64_mask( __m256i a, __m256i b, int imm);
VPCMPQ __mmask8 _mm256_mask_cmp_epi64_mask(__mmask8 k, __m256i a, __m256i b, int imm);
VPCMPQ __mmask8 _mm256_cmp[eq|ge|gt|le|lt|neq]_epi64_mask( __m256i a, __m256i b);
VPCMPQ __mmask8 _mm256_mask_cmp[eq|ge|gt|le|lt|neq]_epi64_mask(__mmask8 k, __m256i a, __m256i b);
VPCMPUQ __mmask8 _mm256_cmp_epu64_mask( __m256i a, __m256i b, int imm);
VPCMPUQ __mmask8 _mm256_mask_cmp_epu64_mask(__mmask8 k, __m256i a, __m256i b, int imm);
VPCMPUQ __mmask8 _mm256_cmp[eq|ge|gt|le|lt|neq]_epu64_mask( __m256i a, __m256i b);
VPCMPUQ __mmask8 _mm256_mask_cmp[eq|ge|gt|le|lt|neq]_epu64_mask(__mmask8 k, __m256i a, __m256i b);
VPCMPQ __mmask8 _mm_cmp_epi64_mask( __m128i a, __m128i b, int imm);
VPCMPQ __mmask8 _mm_mask_cmp_epi64_mask(__mmask8 k, __m128i a, __m128i b, int imm);
VPCMPQ __mmask8 _mm_cmp[eq|ge|gt|le|lt|neq]_epi64_mask( __m128i a, __m128i b);
VPCMPQ __mmask8 _mm_mask_cmp[eq|ge|gt|le|lt|neq]_epi64_mask(__mmask8 k, __m128i a, __m128i b);
VPCMPUQ __mmask8 _mm_cmp_epu64_mask( __m128i a, __m128i b, int imm);
VPCMPUQ __mmask8 _mm_mask_cmp_epu64_mask(__mmask8 k, __m128i a, __m128i b, int imm);
VPCMPUQ __mmask8 _mm_cmp[eq|ge|gt|le|lt|neq]_epu64_mask( __m128i a, __m128i b);
VPCMPUQ __mmask8 _mm_mask_cmp[eq|ge|gt|le|lt|neq]_epu64_mask(__mmask8 k, __m128i a, __m128i b);
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.
VPCMPW/VPCMPUW—Compare Packed Word Values Into Mask
Description
Performs a SIMD compare of the packed integer word in the second source operand and the first source operand
and returns the results of the comparison to the mask destination operand. The comparison predicate operand
(immediate byte) specifies the type of comparison performed on each pair of packed values in the two source oper-
ands. The result of each comparison is a single mask bit result of 1 (comparison true) or 0 (comparison false).
VPCMPW performs a comparison between pairs of signed word values.
VPCMPUW performs a comparison between pairs of unsigned word values.
The first source operand (second operand) is a ZMM/YMM/XMM register. The second source operand can be a
ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination operand (first operand) is a mask
register k1. Up to 32/16/8 comparisons are performed with results written to the destination operand under the
writemask k2.
The comparison predicate operand is an 8-bit immediate: bits 2:0 define the type of comparison to be performed.
Bits 3 through 7 of the immediate are reserved. The compiler can implement the pseudo-op mnemonics listed in
Table 5-8.
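An illustrative sketch of the writemask behavior (assumes <immintrin.h> and AVX512BW; the function name is hypothetical):

#include <immintrin.h>

/* Word lanes masked off by m yield 0 in the result mask, mirroring
   the "under the writemask k2" behavior described above (VPCMPW
   with predicate LE). */
__mmask32 le_under_mask(__mmask32 m, __m512i a, __m512i b) {
    return _mm512_mask_cmple_epi16_mask(m, a, b);
}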
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W1 3F /r ib
VPCMPW k1 {k2}, xmm2,
xmm3/m128, imm8
A V/V AVX512VL
AVX512BW
Compare packed signed word integers in xmm3/m128
and xmm2 using bits 2:0 of imm8 as a comparison
predicate with writemask k2 and leave the result in
mask register k1.
EVEX.256.66.0F3A.W1 3F /r ib
VPCMPW k1 {k2}, ymm2,
ymm3/m256, imm8
A V/V AVX512VL
AVX512BW
Compare packed signed word integers in ymm3/m256
and ymm2 using bits 2:0 of imm8 as a comparison
predicate with writemask k2 and leave the result in
mask register k1.
EVEX.512.66.0F3A.W1 3F /r ib
VPCMPW k1 {k2}, zmm2,
zmm3/m512, imm8
A V/V AVX512BW Compare packed signed word integers in zmm3/m512
and zmm2 using bits 2:0 of imm8 as a comparison
predicate with writemask k2 and leave the result in
mask register k1.
EVEX.128.66.0F3A.W1 3E /r ib
VPCMPUW k1 {k2}, xmm2,
xmm3/m128, imm8
A V/V AVX512VL
AVX512BW
Compare packed unsigned word integers in xmm3/m128
and xmm2 using bits 2:0 of imm8 as a comparison
predicate with writemask k2 and leave the result in
mask register k1.
EVEX.256.66.0F3A.W1 3E /r ib
VPCMPUW k1 {k2}, ymm2,
ymm3/m256, imm8
A V/V AVX512VL
AVX512BW
Compare packed unsigned word integers in ymm3/m256
and ymm2 using bits 2:0 of imm8 as a comparison
predicate with writemask k2 and leave the result in
mask register k1.
EVEX.512.66.0F3A.W1 3E /r ib
VPCMPUW k1 {k2}, zmm2,
zmm3/m512, imm8
A V/V AVX512BW Compare packed unsigned word integers in zmm3/m512
and zmm2 using bits 2:0 of imm8 as a comparison
predicate with writemask k2 and leave the result in
mask register k1.
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full Mem ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) Imm8
Operation
CASE (COMPARISON PREDICATE) OF
0: OP ← EQ;
1: OP ← LT;
2: OP ← LE;
3: OP ← FALSE;
4: OP ← NEQ;
5: OP ← NLT;
6: OP ← NLE;
7: OP ← TRUE;
ESAC;
VPCMPW (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF k2[j] OR *no writemask*
        THEN
            CMP ← SRC1[i+15:i] OP SRC2[i+15:i];
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
VPCMPUW (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF k2[j] OR *no writemask*
        THEN
            CMP ← SRC1[i+15:i] OP SRC2[i+15:i];
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPCMPW __mmask32 _mm512_cmp_epi16_mask( __m512i a, __m512i b, int cmp);
VPCMPW __mmask32 _mm512_mask_cmp_epi16_mask( __mmask32 m, __m512i a, __m512i b, int cmp);
VPCMPW __mmask16 _mm256_cmp_epi16_mask( __m256i a, __m256i b, int cmp);
VPCMPW __mmask16 _mm256_mask_cmp_epi16_mask( __mmask16 m, __m256i a, __m256i b, int cmp);
VPCMPW __mmask8 _mm_cmp_epi16_mask( __m128i a, __m128i b, int cmp);
VPCMPW __mmask8 _mm_mask_cmp_epi16_mask( __mmask8 m, __m128i a, __m128i b, int cmp);
VPCMPW __mmask32 _mm512_cmp[eq|ge|gt|le|lt|neq]_epi16_mask( __m512i a, __m512i b);
VPCMPW __mmask32 _mm512_mask_cmp[eq|ge|gt|le|lt|neq]_epi16_mask( __mmask32 m, __m512i a, __m512i b);
VPCMPW __mmask16 _mm256_cmp[eq|ge|gt|le|lt|neq]_epi16_mask( __m256i a, __m256i b);
VPCMPW __mmask16 _mm256_mask_cmp[eq|ge|gt|le|lt|neq]_epi16_mask( __mmask16 m, __m256i a, __m256i b);
VPCMPW __mmask8 _mm_cmp[eq|ge|gt|le|lt|neq]_epi16_mask( __m128i a, __m128i b);
VPCMPW __mmask8 _mm_mask_cmp[eq|ge|gt|le|lt|neq]_epi16_mask( __mmask8 m, __m128i a, __m128i b);
VPCMPUW __mmask32 _mm512_cmp_epu16_mask( __m512i a, __m512i b, int cmp);
VPCMPUW __mmask32 _mm512_mask_cmp_epu16_mask( __mmask32 m, __m512i a, __m512i b, int cmp);
VPCMPUW __mmask16 _mm256_cmp_epu16_mask( __m256i a, __m256i b, int cmp);
VPCMPUW __mmask16 _mm256_mask_cmp_epu16_mask( __mmask16 m, __m256i a, __m256i b, int cmp);
VPCMPUW __mmask8 _mm_cmp_epu16_mask( __m128i a, __m128i b, int cmp);
VPCMPUW __mmask8 _mm_mask_cmp_epu16_mask( __mmask8 m, __m128i a, __m128i b, int cmp);
VPCMPUW __mmask32 _mm512_cmp[eq|ge|gt|le|lt|neq]_epu16_mask( __m512i a, __m512i b);
VPCMPUW __mmask32 _mm512_mask_cmp[eq|ge|gt|le|lt|neq]_epu16_mask( __mmask32 m, __m512i a, __m512i b);
VPCMPUW __mmask16 _mm256_cmp[eq|ge|gt|le|lt|neq]_epu16_mask( __m256i a, __m256i b);
VPCMPUW __mmask16 _mm256_mask_cmp[eq|ge|gt|le|lt|neq]_epu16_mask( __mmask16 m, __m256i a, __m256i b);
VPCMPUW __mmask8 _mm_cmp[eq|ge|gt|le|lt|neq]_epu16_mask( __m128i a, __m128i b);
VPCMPUW __mmask8 _mm_mask_cmp[eq|ge|gt|le|lt|neq]_epu16_mask( __mmask8 m, __m128i a, __m128i b);
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.nb.
VPCOMPRESSD—Store Sparse Packed Doubleword Integer Values into Dense Memory/Register
Description
Compress (store) up to 16/8/4 doubleword integer values from the source operand (second operand) to the desti-
nation operand (first operand). The source operand is a ZMM/YMM/XMM register, the destination operand can be a
ZMM/YMM/XMM register or a 512/256/128-bit memory location.
The opmask register k1 selects the active elements (partial vector or possibly non-contiguous if less than 16 active
elements) from the source operand to compress into a contiguous vector. The contiguous vector is written to the
destination starting from the low element of the destination operand.
Memory destination version: Only the contiguous vector is written to the destination memory location. EVEX.z
must be zero.
Register destination version: If the vector length of the contiguous vector is less than that of the input vector in the
source operand, the upper bits of the destination register are unmodified if EVEX.z is not set, otherwise the upper
bits are zeroed.
Note: EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Note that the compressed displacement assumes a pre-scaling factor (N) equal to the size of a single element
rather than the size of the full vector.
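A common use is left-packing selected elements, sketched below (illustrative only; assumes <immintrin.h>, AVX512F and POPCNT support; names are hypothetical):

#include <immintrin.h>

/* Append the negative dwords of v to out, returning the new tail.
   VPCMPD builds the selection mask; the store form of VPCOMPRESSD
   packs the selected elements contiguously at out. */
int *append_negatives(int *out, __m512i v) {
    __mmask16 m = _mm512_cmplt_epi32_mask(v, _mm512_setzero_si512());
    _mm512_mask_compressstoreu_epi32(out, m, v);
    return out + _mm_popcnt_u32((unsigned)m);
}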
Operation
VPCOMPRESSD (EVEX encoded versions) store form
(KL, VL) = (4, 128), (8, 256), (16, 512)
SIZE 32
k 0
FOR j 0 TO KL-1
i j * 32
IF k1[j] OR *no controlmask*
THEN
DEST[k+SIZE-1:k] SRC[i+31:i]
k k + SIZE
FI;
ENDFOR;
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 8B /r
VPCOMPRESSD xmm1/m128 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Compress packed doubleword integer values from
xmm2 to xmm1/m128 using controlmask k1.
EVEX.256.66.0F38.W0 8B /r
VPCOMPRESSD ymm1/m256 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Compress packed doubleword integer values from
ymm2 to ymm1/m256 using controlmask k1.
EVEX.512.66.0F38.W0 8B /r
VPCOMPRESSD zmm1/m512 {k1}{z}, zmm2
A V/V AVX512F Compress packed doubleword integer values from
zmm2 to zmm1/m512 using controlmask k1.
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:r/m (w) ModRM:reg (r) NA NA
VPCOMPRESSD (EVEX encoded versions) reg-reg form
(KL, VL) = (4, 128), (8, 256), (16, 512)
SIZE 32
k 0
FOR j 0 TO KL-1
i j * 32
IF k1[j] OR *no controlmask*
THEN
DEST[k+SIZE-1:k] SRC[i+31:i]
k k + SIZE
FI;
ENDFOR
IF *merging-masking*
THEN *DEST[VL-1:k] remains unchanged*
ELSE DEST[VL-1:k] 0
FI
DEST[MAXVL-1:VL] 0
Intel C/C++ Compiler Intrinsic Equivalent
VPCOMPRESSD __m512i _mm512_mask_compress_epi32(__m512i s, __mmask16 c, __m512i a);
VPCOMPRESSD __m512i _mm512_maskz_compress_epi32( __mmask16 c, __m512i a);
VPCOMPRESSD void _mm512_mask_compressstoreu_epi32(void * a, __mmask16 c, __m512i s);
VPCOMPRESSD __m256i _mm256_mask_compress_epi32(__m256i s, __mmask8 c, __m256i a);
VPCOMPRESSD __m256i _mm256_maskz_compress_epi32( __mmask8 c, __m256i a);
VPCOMPRESSD void _mm256_mask_compressstoreu_epi32(void * a, __mmask8 c, __m256i s);
VPCOMPRESSD __m128i _mm_mask_compress_epi32(__m128i s, __mmask8 c, __m128i a);
VPCOMPRESSD __m128i _mm_maskz_compress_epi32( __mmask8 c, __m128i a);
VPCOMPRESSD void _mm_mask_compressstoreu_epi32(void * a, __mmask8 c, __m128i s);
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.nb.
VPCOMPRESSQ—Store Sparse Packed Quadword Integer Values into Dense Memory/Register
Description
Compress (store) up to 8/4/2 quadword integer values from the source operand (second operand) to the destina-
tion operand (first operand). The source operand is a ZMM/YMM/XMM register, the destination operand can be a
ZMM/YMM/XMM register or a 512/256/128-bit memory location.
The opmask register k1 selects the active elements (partial vector or possibly non-contiguous if less than 8 active
elements) from the source operand to compress into a contiguous vector. The contiguous vector is written to the
destination starting from the low element of the destination operand.
Memory destination version: Only the contiguous vector is written to the destination memory location. EVEX.z
must be zero.
Register destination version: If the vector length of the contiguous vector is less than that of the input vector in the
source operand, the upper bits of the destination register are unmodified if EVEX.z is not set, otherwise the upper
bits are zeroed.
Note: EVEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
Note that the compressed displacement assumes a pre-scaling factor (N) equal to the size of a single element
rather than the size of the full vector.
Operation
VPCOMPRESSQ (EVEX encoded versions) store form
(KL, VL) = (2, 128), (4, 256), (8, 512)
SIZE 64
k 0
FOR j 0 TO KL-1
i j * 64
IF k1[j] OR *no controlmask*
THEN
DEST[k+SIZE-1:k] SRC[i+63:i]
k k + SIZE
FI;
ENFOR
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 8B /r
VPCOMPRESSQ xmm1/m128 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Compress packed quadword integer values from
xmm2 to xmm1/m128 using controlmask k1.
EVEX.256.66.0F38.W1 8B /r
VPCOMPRESSQ ymm1/m256 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Compress packed quadword integer values from
ymm2 to ymm1/m256 using controlmask k1.
EVEX.512.66.0F38.W1 8B /r
VPCOMPRESSQ zmm1/m512 {k1}{z}, zmm2
A V/V AVX512F Compress packed quadword integer values from
zmm2 to zmm1/m512 using controlmask k1.
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:r/m (w) ModRM:reg (r) NA NA
VPCOMPRESSQ (EVEX encoded versions) reg-reg form
(KL, VL) = (2, 128), (4, 256), (8, 512)
SIZE 64
k 0
FOR j 0 TO KL-1
i j * 64
IF k1[j] OR *no controlmask*
THEN
DEST[k+SIZE-1:k] SRC[i+63:i]
k k + SIZE
FI;
ENDFOR
IF *merging-masking*
THEN *DEST[VL-1:k] remains unchanged*
ELSE DEST[VL-1:k] 0
FI
DEST[MAXVL-1:VL] 0
Intel C/C++ Compiler Intrinsic Equivalent
VPCOMPRESSQ __m512i _mm512_mask_compress_epi64(__m512i s, __mmask8 c, __m512i a);
VPCOMPRESSQ __m512i _mm512_maskz_compress_epi64( __mmask8 c, __m512i a);
VPCOMPRESSQ void _mm512_mask_compressstoreu_epi64(void * a, __mmask8 c, __m512i s);
VPCOMPRESSQ __m256i _mm256_mask_compress_epi64(__m256i s, __mmask8 c, __m256i a);
VPCOMPRESSQ __m256i _mm256_maskz_compress_epi64( __mmask8 c, __m256i a);
VPCOMPRESSQ void _mm256_mask_compressstoreu_epi64(void * a, __mmask8 c, __m256i s);
VPCOMPRESSQ __m128i _mm_mask_compress_epi64(__m128i s, __mmask8 c, __m128i a);
VPCOMPRESSQ __m128i _mm_maskz_compress_epi64( __mmask8 c, __m128i a);
VPCOMPRESSQ void _mm_mask_compressstoreu_epi64(void * a, __mmask8 c, __m128i s);
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.nb.
VPCONFLICTD/Q—Detect Conflicts Within a Vector of Packed Dword/Qword Values into Dense
Memory/ Register
Description
Test each dword/qword element of the source operand (the second operand) for equality with all other elements in
the source operand closer to the least significant element. Each element’s comparison results form a bit vector,
which is then zero-extended and written to the destination according to the writemask.
EVEX.512 encoded version: The source operand is a ZMM register, a 512-bit memory location, or a 512-bit vector
broadcasted from a 32/64-bit memory location. The destination operand is a ZMM register, conditionally updated
using writemask k1.
EVEX.256 encoded version: The source operand is a YMM register, a 256-bit memory location, or a 256-bit vector
broadcasted from a 32/64-bit memory location. The destination operand is a YMM register, conditionally updated
using writemask k1.
EVEX.128 encoded version: The source operand is an XMM register, a 128-bit memory location, or a 128-bit vector
broadcasted from a 32/64-bit memory location. The destination operand is an XMM register, conditionally updated
using writemask k1.
EVEX.vvvv is reserved and must be 1111b; otherwise, the instruction will #UD.
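For illustration (a sketch, not part of the reference; assumes <immintrin.h>, AVX512CD, and AVX512F for the test intrinsic; the function name is hypothetical):

#include <immintrin.h>

/* Each result element of VPCONFLICTD is a bit vector flagging the
   less-significant elements it duplicates, so an all-zero result
   means every dword of v is unique. */
int all_lanes_unique(__m512i v) {
    __m512i r = _mm512_conflict_epi32(v);
    return _mm512_test_epi32_mask(r, r) == 0;
}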
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 C4 /r
VPCONFLICTD xmm1 {k1}{z},
xmm2/m128/m32bcst
A V/V AVX512VL
AVX512CD
Detect duplicate double-word values in
xmm2/m128/m32bcst using writemask k1.
EVEX.256.66.0F38.W0 C4 /r
VPCONFLICTD ymm1 {k1}{z},
ymm2/m256/m32bcst
A V/V AVX512VL
AVX512CD
Detect duplicate double-word values in
ymm2/m256/m32bcst using writemask k1.
EVEX.512.66.0F38.W0 C4 /r
VPCONFLICTD zmm1 {k1}{z},
zmm2/m512/m32bcst
A V/V AVX512CD Detect duplicate double-word values in
zmm2/m512/m32bcst using writemask k1.
EVEX.128.66.0F38.W1 C4 /r
VPCONFLICTQ xmm1 {k1}{z},
xmm2/m128/m64bcst
A V/V AVX512VL
AVX512CD
Detect duplicate quad-word values in
xmm2/m128/m64bcst using writemask k1.
EVEX.256.66.0F38.W1 C4 /r
VPCONFLICTQ ymm1 {k1}{z},
ymm2/m256/m64bcst
A V/V AVX512VL
AVX512CD
Detect duplicate quad-word values in
ymm2/m256/m64bcst using writemask k1.
EVEX.512.66.0F38.W1 C4 /r
VPCONFLICTQ zmm1 {k1}{z},
zmm2/m512/m64bcst
A V/V AVX512CD Detect duplicate quad-word values in
zmm2/m512/m64bcst using writemask k1.
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) ModRM:r/m (r) NA NA
Operation
VPCONFLICTD
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j*32
    IF MaskBit(j) OR *no writemask* THEN
        FOR k ← 0 TO j-1
            m ← k*32
            IF ((SRC[i+31:i] = SRC[m+31:m])) THEN
                DEST[i+k] ← 1
            ELSE
                DEST[i+k] ← 0
            FI
        ENDFOR
        DEST[i+31:i+j] ← 0
    ELSE
        IF *merging-masking* THEN
            *DEST[i+31:i] remains unchanged*
        ELSE
            DEST[i+31:i] ← 0
        FI
    FI
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPCONFLICTQ
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j*64
    IF MaskBit(j) OR *no writemask* THEN
        FOR k ← 0 TO j-1
            m ← k*64
            IF ((SRC[i+63:i] = SRC[m+63:m])) THEN
                DEST[i+k] ← 1
            ELSE
                DEST[i+k] ← 0
            FI
        ENDFOR
        DEST[i+63:i+j] ← 0
    ELSE
        IF *merging-masking* THEN
            *DEST[i+63:i] remains unchanged*
        ELSE
            DEST[i+63:i] ← 0
        FI
    FI
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPCONFLICTD __m512i _mm512_conflict_epi32( __m512i a);
VPCONFLICTD __m512i _mm512_mask_conflict_epi32(__m512i s, __mmask16 m, __m512i a);
VPCONFLICTD __m512i _mm512_maskz_conflict_epi32(__mmask16 m, __m512i a);
VPCONFLICTQ __m512i _mm512_conflict_epi64( __m512i a);
VPCONFLICTQ __m512i _mm512_mask_conflict_epi64(__m512i s, __mmask8 m, __m512i a);
VPCONFLICTQ __m512i _mm512_maskz_conflict_epi64(__mmask8 m, __m512i a);
VPCONFLICTD __m256i _mm256_conflict_epi32( __m256i a);
VPCONFLICTD __m256i _mm256_mask_conflict_epi32(__m256i s, __mmask8 m, __m256i a);
VPCONFLICTD __m256i _mm256_maskz_conflict_epi32(__mmask8 m, __m256i a);
VPCONFLICTQ __m256i _mm256_conflict_epi64( __m256i a);
VPCONFLICTQ __m256i _mm256_mask_conflict_epi64(__m256i s, __mmask8 m, __m256i a);
VPCONFLICTQ __m256i _mm256_maskz_conflict_epi64(__mmask8 m, __m256i a);
VPCONFLICTD __m128i _mm_conflict_epi32( __m128i a);
VPCONFLICTD __m128i _mm_mask_conflict_epi32(__m128i s, __mmask8 m, __m128i a);
VPCONFLICTD __m128i _mm_maskz_conflict_epi32(__mmask8 m, __m128i a);
VPCONFLICTQ __m128i _mm_conflict_epi64( __m128i a);
VPCONFLICTQ __m128i _mm_mask_conflict_epi64(__m128i s, __mmask8 m, __m128i a);
VPCONFLICTQ __m128i _mm_maskz_conflict_epi64(__mmask8 m, __m128i a);
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4NF.
VPERM2F128 — Permute Floating-Point Values
Description
Permute 128-bit floating-point fields from the first source operand (second operand) and the second source
operand (third operand) using bits in the 8-bit immediate and store results in the destination operand (first
operand). The first source operand is a YMM register, the second source operand is a YMM register or a 256-bit
memory location, and the destination operand is a YMM register.
Figure 5-21. VPERM2F128 Operation
imm8[1:0] selects the source for the first destination 128-bit field; imm8[5:4] selects the source for the second
destination field. If imm8[3] is set, the low 128-bit field is zeroed. If imm8[7] is set, the high 128-bit field is zeroed.
VEX.L must be 1, otherwise the instruction will #UD.
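For example (illustrative only; assumes <immintrin.h> and AVX support; the function name is hypothetical):

#include <immintrin.h>

/* imm8 = 0x21: imm8[1:0] = 1 selects SRC1's high 128-bit field for
   the low half, and imm8[5:4] = 2 selects SRC2's low field for the
   high half. Viewing b:a as one 512-bit value (a in the low half),
   this extracts its two middle 128-bit fields. */
__m256 middle_fields(__m256 a, __m256 b) {
    return _mm256_permute2f128_ps(a, b, 0x21);
}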
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.256.66.0F3A.W0 06 /r ib
VPERM2F128 ymm1, ymm2, ymm3/m256, imm8
RVMI V/V AVX Permute 128-bit floating-point fields in ymm2
and ymm3/mem using controls from imm8 and
store result in ymm1.
Instruction Operand Encoding
Op/En Operand 1 Operand 2 Operand 3 Operand 4
RVMI ModRM:reg (w) VEX.vvvv (r) ModRM:r/m (r) Imm8
Operation
VPERM2F128
CASE IMM8[1:0] of
    0: DEST[127:0] ← SRC1[127:0]
    1: DEST[127:0] ← SRC1[255:128]
    2: DEST[127:0] ← SRC2[127:0]
    3: DEST[127:0] ← SRC2[255:128]
ESAC
CASE IMM8[5:4] of
    0: DEST[255:128] ← SRC1[127:0]
    1: DEST[255:128] ← SRC1[255:128]
    2: DEST[255:128] ← SRC2[127:0]
    3: DEST[255:128] ← SRC2[255:128]
ESAC
IF (imm8[3])
    DEST[127:0] ← 0
FI
IF (imm8[7])
    DEST[MAXVL-1:128] ← 0
FI
Intel C/C++ Compiler Intrinsic Equivalent
VPERM2F128: __m256 _mm256_permute2f128_ps (__m256 a, __m256 b, int control)
VPERM2F128: __m256d _mm256_permute2f128_pd (__m256d a, __m256d b, int control)
VPERM2F128: __m256i _mm256_permute2f128_si256 (__m256i a, __m256i b, int control)
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type 6; additionally
#UD If VEX.L = 0.
If VEX.W = 1.
VPERM2I128 — Permute Integer Values
Description
Permute 128-bit integer data from the first source operand (second operand) and the second source operand (third
operand) using bits in the 8-bit immediate and store results in the destination operand (first operand). The first
source operand is a YMM register, the second source operand is a YMM register or a 256-bit memory location, and
the destination operand is a YMM register.
Figure 5-22. VPERM2I128 Operation
imm8[1:0] selects the source for the first destination 128-bit field; imm8[5:4] selects the source for the second
destination field. If imm8[3] is set, the low 128-bit field is zeroed. If imm8[7] is set, the high 128-bit field is zeroed.
VEX.L must be 1, otherwise the instruction will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.256.66.0F3A.W0 46 /r ib
VPERM2I128 ymm1, ymm2, ymm3/m256, imm8
RVMI V/V AVX2 Permute 128-bit integer data in ymm2 and
ymm3/mem using controls from imm8 and
store result in ymm1.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
RVMI ModRM:reg (w) VEX.vvvv ModRM:r/m (r) Imm8
[Figure 5-22 shows each 128-bit field of DEST selecting one of X0, X1 (SRC1 = X1:X0), Y0, or Y1 (SRC2 = Y1:Y0).]
Operation
VPERM2I128
CASE IMM8[1:0] of
    0: DEST[127:0] ← SRC1[127:0]
    1: DEST[127:0] ← SRC1[255:128]
    2: DEST[127:0] ← SRC2[127:0]
    3: DEST[127:0] ← SRC2[255:128]
ESAC
CASE IMM8[5:4] of
    0: DEST[255:128] ← SRC1[127:0]
    1: DEST[255:128] ← SRC1[255:128]
    2: DEST[255:128] ← SRC2[127:0]
    3: DEST[255:128] ← SRC2[255:128]
ESAC
IF (imm8[3])
    DEST[127:0] ← 0
FI
IF (imm8[7])
    DEST[255:128] ← 0
FI
Intel C/C++ Compiler Intrinsic Equivalent
VPERM2I128: __m256i _mm256_permute2x128_si256 (__m256i a, __m256i b, int control)
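As an illustrative sketch (not SDM text), a minimal C program that concatenates the low 128-bit lanes of two YMM registers; imm8 = 0x20 selects SRC1[127:0] for the low field and SRC2[127:0] for the high field (requires AVX2, e.g. -mavx2):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m256i a = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    __m256i b = _mm256_setr_epi32(10, 11, 12, 13, 14, 15, 16, 17);
    /* imm8[1:0] = 0 -> SRC1 low lane; imm8[5:4] = 2 -> SRC2 low lane. */
    __m256i lo_lo = _mm256_permute2x128_si256(a, b, 0x20);
    int out[8];
    _mm256_storeu_si256((__m256i *)out, lo_lo);
    for (int i = 0; i < 8; i++)
        printf("%d ", out[i]);    /* expected: 0 1 2 3 10 11 12 13 */
    printf("\n");
    return 0;
}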
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 6; additionally
#UD If VEX.L = 0.
If VEX.W = 1.
VPERMB—Permute Packed Bytes Elements
Description
Copies bytes from the second source operand (the third operand) to the destination operand (the first operand)
according to the byte indices in the first source operand (the second operand). Note that this instruction permits a
byte in the source operand to be copied to more than one location in the destination operand.
Only the low 6 (EVEX.512)/5 (EVEX.256)/4 (EVEX.128) bits of each byte index are used to select the location of the
source byte from the second source operand.
The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register
or a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register updated at byte
granularity by the writemask k1.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 8D /r VPERMB xmm1 {k1}{z}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in xmm3/m128 using byte indexes in xmm2 and store the result in xmm1 using writemask k1.
EVEX.256.66.0F38.W0 8D /r VPERMB ymm1 {k1}{z}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in ymm3/m256 using byte indexes in ymm2 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W0 8D /r VPERMB zmm1 {k1}{z}, zmm2, zmm3/m512 | A | V/V | AVX512_VBMI | Permute bytes in zmm3/m512 using byte indexes in zmm2 and store the result in zmm1 using writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA

Operation
VPERMB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
IF VL = 128:
    n ← 3;
ELSE IF VL = 256:
    n ← 4;
ELSE IF VL = 512:
    n ← 5;
FI;
FOR j ← 0 TO KL-1:
    id ← SRC1[j*8 + n : j*8] ; // location of the source byte
    IF k1[j] OR *no writemask* THEN
        DEST[j*8 + 7: j*8] ← SRC2[id*8 + 7: id*8];
    ELSE IF *zeroing-masking* THEN
        DEST[j*8 + 7: j*8] ← 0;
    *ELSE
        DEST[j*8 + 7: j*8] remains unchanged*
    FI
ENDFOR
DEST[MAX_VL-1:VL] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPERMB __m512i _mm512_permutexvar_epi8( __m512i idx, __m512i a);
VPERMB __m512i _mm512_mask_permutexvar_epi8(__m512i s, __mmask64 k, __m512i idx, __m512i a);
VPERMB __m512i _mm512_maskz_permutexvar_epi8( __mmask64 k, __m512i idx, __m512i a);
VPERMB __m256i _mm256_permutexvar_epi8( __m256i idx, __m256i a);
VPERMB __m256i _mm256_mask_permutexvar_epi8(__m256i s, __mmask32 k, __m256i idx, __m256i a);
VPERMB __m256i _mm256_maskz_permutexvar_epi8( __mmask32 k, __m256i idx, __m256i a);
VPERMB __m128i _mm_permutexvar_epi8( __m128i idx, __m128i a);
VPERMB __m128i _mm_mask_permutexvar_epi8(__m128i s, __mmask16 k, __m128i idx, __m128i a);
VPERMB __m128i _mm_maskz_permutexvar_epi8( __mmask16 k, __m128i idx, __m128i a);
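As an illustrative sketch (not SDM text), a minimal C program that reverses the 16 bytes of an XMM register with the unmasked 128-bit intrinsic above (requires AVX512VL + AVX512_VBMI, e.g. -mavx512vl -mavx512vbmi):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    /* Only the low 4 bits of each index byte are used at VL = 128. */
    __m128i idx = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                7, 6, 5, 4, 3, 2, 1, 0);
    __m128i a = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                              8, 9, 10, 11, 12, 13, 14, 15);
    __m128i r = _mm_permutexvar_epi8(idx, a);
    unsigned char out[16];
    _mm_storeu_si128((__m128i *)out, r);
    for (int i = 0; i < 16; i++)
        printf("%u ", out[i]);   /* expected: 15 14 ... 1 0 */
    printf("\n");
    return 0;
}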
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type E4NF.nb.
VPERMD/VPERMW—Permute Packed Doublewords/Words Elements
Description
Copies doublewords (or words) from the second source operand (the third operand) to the destination operand (the
first operand) according to the indices in the first source operand (the second operand). Note that this instruction
permits a doubleword (word) in the source operand to be copied to more than one location in the destination
operand.
VEX.256 encoded VPERMD: The first and second operands are YMM registers, the third operand can be a YMM
register or memory location. Bits (MAXVL-1:256) of the corresponding destination register are zeroed.
EVEX encoded VPERMD: The first and second operands are ZMM/YMM registers, the third operand can be a
ZMM/YMM register, a 512/256-bit memory location or a 512/256-bit vector broadcasted from a 32-bit memory
location. The elements in the destination are updated using the writemask k1.
VPERMW: the first and second operands are ZMM/YMM/XMM registers; the third operand can be a ZMM/YMM/XMM
register or a 512/256/128-bit memory location. The destination is updated using the writemask k1.
EVEX.128 encoded versions: Bits (MAXVL-1:128) of the corresponding ZMM register are zeroed.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.256.66.0F38.W0 36 /r VPERMD ymm1, ymm2, ymm3/m256 | A | V/V | AVX2 | Permute doublewords in ymm3/m256 using indices in ymm2 and store the result in ymm1.
EVEX.256.66.0F38.W0 36 /r VPERMD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Permute doublewords in ymm3/m256/m32bcst using indexes in ymm2 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W0 36 /r VPERMD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | B | V/V | AVX512F | Permute doublewords in zmm3/m512/m32bcst using indices in zmm2 and store the result in zmm1 using writemask k1.
EVEX.128.66.0F38.W1 8D /r VPERMW xmm1 {k1}{z}, xmm2, xmm3/m128 | C | V/V | AVX512VL AVX512BW | Permute word integers in xmm3/m128 using indexes in xmm2 and store the result in xmm1 using writemask k1.
EVEX.256.66.0F38.W1 8D /r VPERMW ymm1 {k1}{z}, ymm2, ymm3/m256 | C | V/V | AVX512VL AVX512BW | Permute word integers in ymm3/m256 using indexes in ymm2 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W1 8D /r VPERMW zmm1 {k1}{z}, zmm2, zmm3/m512 | C | V/V | AVX512BW | Permute word integers in zmm3/m512 using indexes in zmm2 and store the result in zmm1 using writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
C | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
VPERMD/VPERMW—Permute Packed Doublewords/Words Elements
INSTRUCTION SET REFERENCE, V-Z
Vol. 2C 5-341
Operation
VPERMD (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512)
IF VL = 256 THEN n ← 2; FI;
IF VL = 512 THEN n ← 3; FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    id ← 32*SRC1[i+n:i]
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN DEST[i+31:i] ← SRC2[31:0];
                ELSE DEST[i+31:i] ← SRC2[id+31:id];
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VPERMD (VEX.256 encoded version)
DEST[31:0] ← (SRC2[255:0] >> (SRC1[2:0] * 32))[31:0];
DEST[63:32] ← (SRC2[255:0] >> (SRC1[34:32] * 32))[31:0];
DEST[95:64] ← (SRC2[255:0] >> (SRC1[66:64] * 32))[31:0];
DEST[127:96] ← (SRC2[255:0] >> (SRC1[98:96] * 32))[31:0];
DEST[159:128] ← (SRC2[255:0] >> (SRC1[130:128] * 32))[31:0];
DEST[191:160] ← (SRC2[255:0] >> (SRC1[162:160] * 32))[31:0];
DEST[223:192] ← (SRC2[255:0] >> (SRC1[194:192] * 32))[31:0];
DEST[255:224] ← (SRC2[255:0] >> (SRC1[226:224] * 32))[31:0];
DEST[MAXVL-1:256] ← 0

VPERMW (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
IF VL = 128 THEN n ← 2; FI;
IF VL = 256 THEN n ← 3; FI;
IF VL = 512 THEN n ← 4; FI;
FOR j ← 0 TO KL-1
    i ← j * 16
    id ← 16*SRC1[i+n:i]
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SRC2[id+15:id]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPERMD __m512i _mm512_permutexvar_epi32( __m512i idx, __m512i a);
VPERMD __m512i _mm512_mask_permutexvar_epi32(__m512i s, __mmask16 k, __m512i idx, __m512i a);
VPERMD __m512i _mm512_maskz_permutexvar_epi32( __mmask16 k, __m512i idx, __m512i a);
VPERMD __m256i _mm256_permutexvar_epi32( __m256i idx, __m256i a);
VPERMD __m256i _mm256_mask_permutexvar_epi32(__m256i s, __mmask8 k, __m256i idx, __m256i a);
VPERMD __m256i _mm256_maskz_permutexvar_epi32( __mmask8 k, __m256i idx, __m256i a);
VPERMW __m512i _mm512_permutexvar_epi16( __m512i idx, __m512i a);
VPERMW __m512i _mm512_mask_permutexvar_epi16(__m512i s, __mmask32 k, __m512i idx, __m512i a);
VPERMW __m512i _mm512_maskz_permutexvar_epi16( __mmask32 k, __m512i idx, __m512i a);
VPERMW __m256i _mm256_permutexvar_epi16( __m256i idx, __m256i a);
VPERMW __m256i _mm256_mask_permutexvar_epi16(__m256i s, __mmask16 k, __m256i idx, __m256i a);
VPERMW __m256i _mm256_maskz_permutexvar_epi16( __mmask16 k, __m256i idx, __m256i a);
VPERMW __m128i _mm_permutexvar_epi16( __m128i idx, __m128i a);
VPERMW __m128i _mm_mask_permutexvar_epi16(__m128i s, __mmask8 k, __m128i idx, __m128i a);
VPERMW __m128i _mm_maskz_permutexvar_epi16( __mmask8 k, __m128i idx, __m128i a);
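As an illustrative sketch (not SDM text), a minimal C program that rotates the eight dwords of a YMM register left by one using the AVX512VL form of VPERMD listed above (compile with e.g. -mavx512vl -mavx512f); the plain AVX2 form is exposed by compilers as _mm256_permutevar8x32_epi32(a, idx):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    /* Only the low 3 bits of each dword index are used at VL = 256. */
    __m256i idx = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 0);
    __m256i a   = _mm256_setr_epi32(10, 11, 12, 13, 14, 15, 16, 17);
    __m256i r   = _mm256_permutexvar_epi32(idx, a);
    int out[8];
    _mm256_storeu_si256((__m256i *)out, r);
    for (int i = 0; i < 8; i++)
        printf("%d ", out[i]);   /* expected: 11 12 13 14 15 16 17 10 */
    printf("\n");
    return 0;
}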
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instruction, see Exceptions Type 4.
EVEX-encoded VPERMD, see Exceptions Type E4NF.
EVEX-encoded VPERMW, see Exceptions Type E4NF.nb.
#UD If VEX.L = 0.
If EVEX.L’L = 0 for VPERMD.
VPERMI2B—Full Permute of Bytes from Two Tables Overwriting the Index
Description
Permutes byte values in the second operand (the first source operand) and the third operand (the second source
operand) using the byte indices in the first operand (the destination operand) to select byte elements from the
second or third source operands. The selected byte elements are written to the destination at byte granularity
under the writemask k1.
The first and second operands are ZMM/YMM/XMM registers. The first operand contains the input indices that select
elements from the two input tables in the 2nd and 3rd operands. The first operand is also the destination of the
result. The third operand can be a ZMM/YMM/XMM register or a 512/256/128-bit memory location. In each index
byte, the id bit for table selection is bit 6/5/4, and bits [5:0]/[4:0]/[3:0] select the element within each input table.
Note that these instructions permit a byte value in the source operands to be copied to more than one location in
the destination operand. Also, the same tables can be reused in subsequent iterations, but the index elements are
overwritten.
Bits (MAX_VL-1:256/128) of the destination are zeroed for VL=256,128.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 75 /r VPERMI2B xmm1 {k1}{z}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in xmm3/m128 and xmm2 using byte indexes in xmm1 and store the byte results in xmm1 using writemask k1.
EVEX.256.66.0F38.W0 75 /r VPERMI2B ymm1 {k1}{z}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in ymm3/m256 and ymm2 using byte indexes in ymm1 and store the byte results in ymm1 using writemask k1.
EVEX.512.66.0F38.W0 75 /r VPERMI2B zmm1 {k1}{z}, zmm2, zmm3/m512 | A | V/V | AVX512_VBMI | Permute bytes in zmm3/m512 and zmm2 using byte indexes in zmm1 and store the byte results in zmm1 using writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full Mem | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Operation
VPERMI2B (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
IF VL = 128:
    id ← 3;
ELSE IF VL = 256:
    id ← 4;
ELSE IF VL = 512:
    id ← 5;
FI;
TMP_DEST[VL-1:0] ← DEST[VL-1:0];
FOR j ← 0 TO KL-1
    off ← 8*TMP_DEST[j*8 + id : j*8] ; // index bits [id:0] select the element
    IF k1[j] OR *no writemask*:
        DEST[j*8 + 7: j*8] ← TMP_DEST[j*8+id+1] ? SRC2[off+7:off] : SRC1[off+7:off];
    ELSE IF *zeroing-masking*
        DEST[j*8 + 7: j*8] ← 0;
    *ELSE
        DEST[j*8 + 7: j*8] remains unchanged*
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPERMI2B __m512i _mm512_permutex2var_epi8(__m512i a, __m512i idx, __m512i b);
VPERMI2B __m512i _mm512_mask2_permutex2var_epi8(__m512i a, __m512i idx, __mmask64 k, __m512i b);
VPERMI2B __m512i _mm512_maskz_permutex2var_epi8(__mmask64 k, __m512i a, __m512i idx, __m512i b);
VPERMI2B __m256i _mm256_permutex2var_epi8(__m256i a, __m256i idx, __m256i b);
VPERMI2B __m256i _mm256_mask2_permutex2var_epi8(__m256i a, __m256i idx, __mmask32 k, __m256i b);
VPERMI2B __m256i _mm256_maskz_permutex2var_epi8(__mmask32 k, __m256i a, __m256i idx, __m256i b);
VPERMI2B __m128i _mm_permutex2var_epi8(__m128i a, __m128i idx, __m128i b);
VPERMI2B __m128i _mm_mask2_permutex2var_epi8(__m128i a, __m128i idx, __mmask16 k, __m128i b);
VPERMI2B __m128i _mm_maskz_permutex2var_epi8(__mmask16 k, __m128i a, __m128i idx, __m128i b);
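As an illustrative sketch (not SDM text), a minimal C program that uses the 128-bit form to take byte i from table a when i is even and from table b when i is odd; at VL = 128, bit 4 of each index byte selects the table and bits [3:0] select the element (requires AVX512VL + AVX512_VBMI):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    unsigned char ia[16], av[16], bv[16];
    for (int i = 0; i < 16; i++) {
        ia[i] = (unsigned char)((i & 1) ? (0x10 | i) : i); /* odd -> table b */
        av[i] = (unsigned char)i;          /* table a: 0..15 */
        bv[i] = (unsigned char)(100 + i);  /* table b: 100..115 */
    }
    __m128i idx = _mm_loadu_si128((const __m128i *)ia);
    __m128i a   = _mm_loadu_si128((const __m128i *)av);
    __m128i b   = _mm_loadu_si128((const __m128i *)bv);
    __m128i r   = _mm_permutex2var_epi8(a, idx, b);
    unsigned char out[16];
    _mm_storeu_si128((__m128i *)out, r);
    for (int i = 0; i < 16; i++)
        printf("%u ", out[i]);  /* expected: 0 101 2 103 ... 14 115 */
    printf("\n");
    return 0;
}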
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type E4NF.nb.
VPERMI2W/D/Q/PS/PD—Full Permute From Two Tables Overwriting the Index
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 75 /r VPERMI2W xmm1 {k1}{z}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512BW | Permute word integers from two tables in xmm3/m128 and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1.
EVEX.256.66.0F38.W1 75 /r VPERMI2W ymm1 {k1}{z}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512BW | Permute word integers from two tables in ymm3/m256 and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W1 75 /r VPERMI2W zmm1 {k1}{z}, zmm2, zmm3/m512 | A | V/V | AVX512BW | Permute word integers from two tables in zmm3/m512 and zmm2 using indexes in zmm1 and store the result in zmm1 using writemask k1.
EVEX.128.66.0F38.W0 76 /r VPERMI2D xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Permute double-words from two tables in xmm3/m128/m32bcst and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1.
EVEX.256.66.0F38.W0 76 /r VPERMI2D ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Permute double-words from two tables in ymm3/m256/m32bcst and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W0 76 /r VPERMI2D zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | B | V/V | AVX512F | Permute double-words from two tables in zmm3/m512/m32bcst and zmm2 using indices in zmm1 and store the result in zmm1 using writemask k1.
EVEX.128.66.0F38.W1 76 /r VPERMI2Q xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Permute quad-words from two tables in xmm3/m128/m64bcst and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1.
EVEX.256.66.0F38.W1 76 /r VPERMI2Q ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Permute quad-words from two tables in ymm3/m256/m64bcst and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W1 76 /r VPERMI2Q zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | B | V/V | AVX512F | Permute quad-words from two tables in zmm3/m512/m64bcst and zmm2 using indices in zmm1 and store the result in zmm1 using writemask k1.
EVEX.128.66.0F38.W0 77 /r VPERMI2PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Permute single-precision FP values from two tables in xmm3/m128/m32bcst and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1.
EVEX.256.66.0F38.W0 77 /r VPERMI2PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Permute single-precision FP values from two tables in ymm3/m256/m32bcst and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W0 77 /r VPERMI2PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | B | V/V | AVX512F | Permute single-precision FP values from two tables in zmm3/m512/m32bcst and zmm2 using indices in zmm1 and store the result in zmm1 using writemask k1.
EVEX.128.66.0F38.W1 77 /r VPERMI2PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Permute double-precision FP values from two tables in xmm3/m128/m64bcst and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1.
EVEX.256.66.0F38.W1 77 /r VPERMI2PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Permute double-precision FP values from two tables in ymm3/m256/m64bcst and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W1 77 /r VPERMI2PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | B | V/V | AVX512F | Permute double-precision FP values from two tables in zmm3/m512/m64bcst and zmm2 using indices in zmm1 and store the result in zmm1 using writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full Mem | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA

Description
Permutes 16-bit/32-bit/64-bit values in the second operand (the first source operand) and the third operand (the
second source operand) using indices in the first operand to select elements from the second and third operands.
The selected elements are written to the destination operand (the first operand) according to the writemask k1.
The first and second operands are ZMM/YMM/XMM registers. The first operand contains the input indices that select
elements from the two input tables in the 2nd and 3rd operands. The first operand is also the destination of the
result.
D/Q/PS/PD element versions: The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit
memory location or a 512/256/128-bit vector broadcasted from a 32/64-bit memory location. Broadcast from the
low 32/64-bit memory location is performed if EVEX.b and the id bit for table selection are set (selecting table_2).
Dword/PS versions: The id bit for table selection is bit 4/3/2, depending on VL=512, 256, 128. Bits
[3:0]/[2:0]/[1:0] of each element in the input index vector select an element within the two source operands. If
the id bit is 0, table_1 (the first source) is selected; otherwise the second source operand is selected.
Qword/PD versions: The id bit for table selection is bit 3/2/1, and bits [2:0]/[1:0]/bit 0 select the element within
each input table.
Word element versions: The second source operand can be a ZMM/YMM/XMM register or a 512/256/128-bit
memory location. The id bit for table selection is bit 5/4/3, and bits [4:0]/[3:0]/[2:0] select the element within each
input table.
Note that these instructions permit a 16-bit/32-bit/64-bit value in the source operands to be copied to more than
one location in the destination operand. Note also that the same tables can be reused, for example, in a second
iteration, while the index elements are overwritten.
Bits (MAXVL-1:256/128) of the destination are zeroed for VL=256,128.
Operation
VPERMI2W (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
IF VL = 128
    id ← 2
FI;
IF VL = 256
    id ← 3
FI;
IF VL = 512
    id ← 4
FI;
TMP_DEST ← DEST
FOR j ← 0 TO KL-1
    i ← j * 16
    off ← 16*TMP_DEST[i+id:i]
    IF k1[j] OR *no writemask*
        THEN
            DEST[i+15:i] ← TMP_DEST[i+id+1] ? SRC2[off+15:off]
                                            : SRC1[off+15:off]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VPERMI2D/VPERMI2PS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF VL = 128
    id ← 1
FI;
IF VL = 256
    id ← 2
FI;
IF VL = 512
    id ← 3
FI;
TMP_DEST ← DEST
FOR j ← 0 TO KL-1
    i ← j * 32
    off ← 32*TMP_DEST[i+id:i]
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+31:i] ← TMP_DEST[i+id+1] ? SRC2[31:0]
                                                    : SRC1[off+31:off]
                ELSE
                    DEST[i+31:i] ← TMP_DEST[i+id+1] ? SRC2[off+31:off]
                                                    : SRC1[off+31:off]
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VPERMI2Q/VPERMI2PD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF VL = 128
    id ← 0
FI;
IF VL = 256
    id ← 1
FI;
IF VL = 512
    id ← 2
FI;
TMP_DEST ← DEST
FOR j ← 0 TO KL-1
    i ← j * 64
    off ← 64*TMP_DEST[i+id:i]
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+63:i] ← TMP_DEST[i+id+1] ? SRC2[63:0]
                                                    : SRC1[off+63:off]
                ELSE
                    DEST[i+63:i] ← TMP_DEST[i+id+1] ? SRC2[off+63:off]
                                                    : SRC1[off+63:off]
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPERMI2D __m512i _mm512_permutex2var_epi32(__m512i a, __m512i idx, __m512i b);
VPERMI2D __m512i _mm512_mask_permutex2var_epi32(__m512i a, __mmask16 k, __m512i idx, __m512i b);
VPERMI2D __m512i _mm512_mask2_permutex2var_epi32(__m512i a, __m512i idx, __mmask16 k, __m512i b);
VPERMI2D __m512i _mm512_maskz_permutex2var_epi32(__mmask16 k, __m512i a, __m512i idx, __m512i b);
VPERMI2D __m256i _mm256_permutex2var_epi32(__m256i a, __m256i idx, __m256i b);
VPERMI2D __m256i _mm256_mask_permutex2var_epi32(__m256i a, __mmask8 k, __m256i idx, __m256i b);
VPERMI2D __m256i _mm256_mask2_permutex2var_epi32(__m256i a, __m256i idx, __mmask8 k, __m256i b);
VPERMI2D __m256i _mm256_maskz_permutex2var_epi32(__mmask8 k, __m256i a, __m256i idx, __m256i b);
VPERMI2D __m128i _mm_permutex2var_epi32(__m128i a, __m128i idx, __m128i b);
VPERMI2D __m128i _mm_mask_permutex2var_epi32(__m128i a, __mmask8 k, __m128i idx, __m128i b);
VPERMI2D __m128i _mm_mask2_permutex2var_epi32(__m128i a, __m128i idx, __mmask8 k, __m128i b);
VPERMI2D __m128i _mm_maskz_permutex2var_epi32(__mmask8 k, __m128i a, __m128i idx, __m128i b);
VPERMI2PD __m512d _mm512_permutex2var_pd(__m512d a, __m512i idx, __m512d b);
VPERMI2PD __m512d _mm512_mask_permutex2var_pd(__m512d a, __mmask8 k, __m512i idx, __m512d b);
VPERMI2PD __m512d _mm512_mask2_permutex2var_pd(__m512d a, __m512i idx, __mmask8 k, __m512d b);
VPERMI2PD __m512d _mm512_maskz_permutex2var_pd(__mmask8 k, __m512d a, __m512i idx, __m512d b);
VPERMI2PD __m256d _mm256_permutex2var_pd(__m256d a, __m256i idx, __m256d b);
VPERMI2PD __m256d _mm256_mask_permutex2var_pd(__m256d a, __mmask8 k, __m256i idx, __m256d b);
VPERMI2PD __m256d _mm256_mask2_permutex2var_pd(__m256d a, __m256i idx, __mmask8 k, __m256d b);
VPERMI2PD __m256d _mm256_maskz_permutex2var_pd(__mmask8 k, __m256d a, __m256i idx, __m256d b);
VPERMI2PD __m128d _mm_permutex2var_pd(__m128d a, __m128i idx, __m128d b);
VPERMI2PD __m128d _mm_mask_permutex2var_pd(__m128d a, __mmask8 k, __m128i idx, __m128d b);
VPERMI2PD __m128d _mm_mask2_permutex2var_pd(__m128d a, __m128i idx, __mmask8 k, __m128d b);
VPERMI2PD __m128d _mm_maskz_permutex2var_pd(__mmask8 k, __m128d a, __m128i idx, __m128d b);
VPERMI2PS __m512 _mm512_permutex2var_ps(__m512 a, __m512i idx, __m512 b);
VPERMI2PS __m512 _mm512_mask_permutex2var_ps(__m512 a, __mmask16 k, __m512i idx, __m512 b);
VPERMI2PS __m512 _mm512_mask2_permutex2var_ps(__m512 a, __m512i idx, __mmask16 k, __m512 b);
VPERMI2PS __m512 _mm512_maskz_permutex2var_ps(__mmask16 k, __m512 a, __m512i idx, __m512 b);
VPERMI2PS __m256 _mm256_permutex2var_ps(__m256 a, __m256i idx, __m256 b);
VPERMI2PS __m256 _mm256_mask_permutex2var_ps(__m256 a, __mmask8 k, __m256i idx, __m256 b);
VPERMI2PS __m256 _mm256_mask2_permutex2var_ps(__m256 a, __m256i idx, __mmask8 k, __m256 b);
VPERMI2PS __m256 _mm256_maskz_permutex2var_ps(__mmask8 k, __m256 a, __m256i idx, __m256 b);
VPERMI2PS __m128 _mm_permutex2var_ps(__m128 a, __m128i idx, __m128 b);
VPERMI2PS __m128 _mm_mask_permutex2var_ps(__m128 a, __mmask8 k, __m128i idx, __m128 b);
VPERMI2PS __m128 _mm_mask2_permutex2var_ps(__m128 a, __m128i idx, __mmask8 k, __m128 b);
VPERMI2PS __m128 _mm_maskz_permutex2var_ps(__mmask8 k, __m128 a, __m128i idx, __m128 b);
VPERMI2Q __m512i _mm512_permutex2var_epi64(__m512i a, __m512i idx, __m512i b);
VPERMI2Q __m512i _mm512_mask_permutex2var_epi64(__m512i a, __mmask8 k, __m512i idx, __m512i b);
VPERMI2Q __m512i _mm512_mask2_permutex2var_epi64(__m512i a, __m512i idx, __mmask8 k, __m512i b);
VPERMI2Q __m512i _mm512_maskz_permutex2var_epi64(__mmask8 k, __m512i a, __m512i idx, __m512i b);
VPERMI2Q __m256i _mm256_permutex2var_epi64(__m256i a, __m256i idx, __m256i b);
VPERMI2Q __m256i _mm256_mask_permutex2var_epi64(__m256i a, __mmask8 k, __m256i idx, __m256i b);
VPERMI2Q __m256i _mm256_mask2_permutex2var_epi64(__m256i a, __m256i idx, __mmask8 k, __m256i b);
VPERMI2Q __m256i _mm256_maskz_permutex2var_epi64(__mmask8 k, __m256i a, __m256i idx, __m256i b);
VPERMI2Q __m128i _mm_permutex2var_epi64(__m128i a, __m128i idx, __m128i b);
VPERMI2Q __m128i _mm_mask_permutex2var_epi64(__m128i a, __mmask8 k, __m128i idx, __m128i b);
VPERMI2Q __m128i _mm_mask2_permutex2var_epi64(__m128i a, __m128i idx, __mmask8 k, __m128i b);
VPERMI2Q __m128i _mm_maskz_permutex2var_epi64(__mmask8 k, __m128i a, __m128i idx, __m128i b);
VPERMI2W __m512i _mm512_permutex2var_epi16(__m512i a, __m512i idx, __m512i b);
VPERMI2W __m512i _mm512_mask_permutex2var_epi16(__m512i a, __mmask32 k, __m512i idx, __m512i b);
VPERMI2W __m512i _mm512_mask2_permutex2var_epi16(__m512i a, __m512i idx, __mmask32 k, __m512i b);
VPERMI2W __m512i _mm512_maskz_permutex2var_epi16(__mmask32 k, __m512i a, __m512i idx, __m512i b);
VPERMI2W __m256i _mm256_permutex2var_epi16(__m256i a, __m256i idx, __m256i b);
VPERMI2W __m256i _mm256_mask_permutex2var_epi16(__m256i a, __mmask16 k, __m256i idx, __m256i b);
VPERMI2W __m256i _mm256_mask2_permutex2var_epi16(__m256i a, __m256i idx, __mmask16 k, __m256i b);
VPERMI2W __m256i _mm256_maskz_permutex2var_epi16(__mmask16 k, __m256i a, __m256i idx, __m256i b);
VPERMI2W __m128i _mm_permutex2var_epi16(__m128i a, __m128i idx, __m128i b);
VPERMI2W __m128i _mm_mask_permutex2var_epi16(__m128i a, __mmask8 k, __m128i idx, __m128i b);
VPERMI2W __m128i _mm_mask2_permutex2var_epi16(__m128i a, __m128i idx, __mmask8 k, __m128i b);
VPERMI2W __m128i _mm_maskz_permutex2var_epi16(__mmask8 k, __m128i a, __m128i idx, __m128i b);
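As an illustrative sketch (not SDM text), a minimal C program that treats ZMM registers a and b as one 32-entry dword table and gathers the even-indexed entries with VPERMI2D; bit 4 of each 5-bit index selects between a (0) and b (1) (requires AVX512F, e.g. -mavx512f):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m512i idx = _mm512_setr_epi32(0, 2, 4, 6, 8, 10, 12, 14,
                                    16, 18, 20, 22, 24, 26, 28, 30);
    __m512i a = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                  8, 9, 10, 11, 12, 13, 14, 15);
    __m512i b = _mm512_setr_epi32(16, 17, 18, 19, 20, 21, 22, 23,
                                  24, 25, 26, 27, 28, 29, 30, 31);
    __m512i r = _mm512_permutex2var_epi32(a, idx, b);
    int out[16];
    _mm512_storeu_si512(out, r);
    for (int i = 0; i < 16; i++)
        printf("%d ", out[i]);   /* expected: 0 2 4 ... 30 */
    printf("\n");
    return 0;
}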
SIMD Floating-Point Exceptions
None
Other Exceptions
VPERMI2D/Q/PS/PD: See Exceptions Type E4NF.
VPERMI2W: See Exceptions Type E4NF.nb.
VPERMILPD—Permute In-Lane of Pairs of Double-Precision Floating-Point Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 0D /r VPERMILPD xmm1, xmm2, xmm3/m128 | A | V/V | AVX | Permute double-precision floating-point values in xmm2 using controls from xmm3/m128 and store result in xmm1.
VEX.256.66.0F38.W0 0D /r VPERMILPD ymm1, ymm2, ymm3/m256 | A | V/V | AVX | Permute double-precision floating-point values in ymm2 using controls from ymm3/m256 and store result in ymm1.
EVEX.128.66.0F38.W1 0D /r VPERMILPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | C | V/V | AVX512VL AVX512F | Permute double-precision floating-point values in xmm2 using control from xmm3/m128/m64bcst and store the result in xmm1 using writemask k1.
EVEX.256.66.0F38.W1 0D /r VPERMILPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | C | V/V | AVX512VL AVX512F | Permute double-precision floating-point values in ymm2 using control from ymm3/m256/m64bcst and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W1 0D /r VPERMILPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | C | V/V | AVX512F | Permute double-precision floating-point values in zmm2 using control from zmm3/m512/m64bcst and store the result in zmm1 using writemask k1.
VEX.128.66.0F3A.W0 05 /r ib VPERMILPD xmm1, xmm2/m128, imm8 | B | V/V | AVX | Permute double-precision floating-point values in xmm2/m128 using controls from imm8.
VEX.256.66.0F3A.W0 05 /r ib VPERMILPD ymm1, ymm2/m256, imm8 | B | V/V | AVX | Permute double-precision floating-point values in ymm2/m256 using controls from imm8.
EVEX.128.66.0F3A.W1 05 /r ib VPERMILPD xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8 | D | V/V | AVX512VL AVX512F | Permute double-precision floating-point values in xmm2/m128/m64bcst using controls from imm8 and store the result in xmm1 using writemask k1.
EVEX.256.66.0F3A.W1 05 /r ib VPERMILPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | D | V/V | AVX512VL AVX512F | Permute double-precision floating-point values in ymm2/m256/m64bcst using controls from imm8 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F3A.W1 05 /r ib VPERMILPD zmm1 {k1}{z}, zmm2/m512/m64bcst, imm8 | D | V/V | AVX512F | Permute double-precision floating-point values in zmm2/m512/m64bcst using controls from imm8 and store the result in zmm1 using writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | NA | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
C | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
D | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
(variable control version)
Permute pairs of double-precision floating-point values in the first source operand (second operand), each using a
1-bit control field residing in the corresponding quadword element of the second source operand (third operand).
Permuted results are stored in the destination operand (first operand).
The control bits are located at bit 1 of each quadword element (see Figure 5-24). Each control determines which
source element of an input pair is selected for the destination element. Each pair of source elements must lie in
the same 128-bit region as the destination.
EVEX version: The second source operand (third operand) is a ZMM/YMM/XMM register, a 512/256/128-bit
memory location or a 512/256/128-bit vector broadcasted from a 64-bit memory location. Permuted results are
written to the destination under the writemask.
VEX.256 encoded version: Bits (MAXVL-1:256) of the corresponding ZMM register are zeroed.
(immediate control version)
Permute pairs of double-precision floating-point values in the first source operand (second operand), each pair
using a 1-bit control field in the imm8 byte. Each element in the destination operand (first operand) uses a separate
control bit of the imm8 byte.
VEX version: The source operand is a YMM/XMM register or a 256/128-bit memory location, and the destination
operand is a YMM/XMM register. The imm8 byte provides the lower 4/2 bits as permute control fields.
EVEX version: The source operand (second operand) is a ZMM/YMM/XMM register, a 512/256/128-bit memory
location or a 512/256/128-bit vector broadcasted from a 64-bit memory location. Permuted results are written to
the destination under the writemask. The imm8 byte provides the lower 8/4/2 bits as permute control fields.
Note: For the imm8 versions, VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instruction
will #UD.
[Figure 5-23. VPERMILPD Operation — each destination qword is selected from the pair X1:X0 (low 128 bits of SRC1) or the pair X3:X2 (high 128 bits of SRC1).]
[Figure 5-24. VPERMILPD Shuffle Control — the select bit of each control field is bit 1 of its qword (bits 1, 65, 129, 193); all other bits of each qword are ignored.]
Operation
VPERMILPD (EVEX immediate versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF (EVEX.b = 1) AND (SRC1 *is memory*)
        THEN TMP_SRC1[i+63:i] ← SRC1[63:0];
        ELSE TMP_SRC1[i+63:i] ← SRC1[i+63:i];
    FI;
ENDFOR;
IF (imm8[0] = 0) THEN TMP_DEST[63:0] ← TMP_SRC1[63:0]; FI;
IF (imm8[0] = 1) THEN TMP_DEST[63:0] ← TMP_SRC1[127:64]; FI;
IF (imm8[1] = 0) THEN TMP_DEST[127:64] ← TMP_SRC1[63:0]; FI;
IF (imm8[1] = 1) THEN TMP_DEST[127:64] ← TMP_SRC1[127:64]; FI;
IF VL >= 256
    IF (imm8[2] = 0) THEN TMP_DEST[191:128] ← TMP_SRC1[191:128]; FI;
    IF (imm8[2] = 1) THEN TMP_DEST[191:128] ← TMP_SRC1[255:192]; FI;
    IF (imm8[3] = 0) THEN TMP_DEST[255:192] ← TMP_SRC1[191:128]; FI;
    IF (imm8[3] = 1) THEN TMP_DEST[255:192] ← TMP_SRC1[255:192]; FI;
FI;
IF VL >= 512
    IF (imm8[4] = 0) THEN TMP_DEST[319:256] ← TMP_SRC1[319:256]; FI;
    IF (imm8[4] = 1) THEN TMP_DEST[319:256] ← TMP_SRC1[383:320]; FI;
    IF (imm8[5] = 0) THEN TMP_DEST[383:320] ← TMP_SRC1[319:256]; FI;
    IF (imm8[5] = 1) THEN TMP_DEST[383:320] ← TMP_SRC1[383:320]; FI;
    IF (imm8[6] = 0) THEN TMP_DEST[447:384] ← TMP_SRC1[447:384]; FI;
    IF (imm8[6] = 1) THEN TMP_DEST[447:384] ← TMP_SRC1[511:448]; FI;
    IF (imm8[7] = 0) THEN TMP_DEST[511:448] ← TMP_SRC1[447:384]; FI;
    IF (imm8[7] = 1) THEN TMP_DEST[511:448] ← TMP_SRC1[511:448]; FI;
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPERMILPD (256-bit immediate version)
IF (imm8[0] = 0) THEN DEST[63:0] ← SRC1[63:0]
IF (imm8[0] = 1) THEN DEST[63:0] ← SRC1[127:64]
IF (imm8[1] = 0) THEN DEST[127:64] ← SRC1[63:0]
IF (imm8[1] = 1) THEN DEST[127:64] ← SRC1[127:64]
IF (imm8[2] = 0) THEN DEST[191:128] ← SRC1[191:128]
IF (imm8[2] = 1) THEN DEST[191:128] ← SRC1[255:192]
IF (imm8[3] = 0) THEN DEST[255:192] ← SRC1[191:128]
IF (imm8[3] = 1) THEN DEST[255:192] ← SRC1[255:192]
DEST[MAXVL-1:256] ← 0

VPERMILPD (128-bit immediate version)
IF (imm8[0] = 0) THEN DEST[63:0] ← SRC1[63:0]
IF (imm8[0] = 1) THEN DEST[63:0] ← SRC1[127:64]
IF (imm8[1] = 0) THEN DEST[127:64] ← SRC1[63:0]
IF (imm8[1] = 1) THEN DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
VPERMILPD (EVEX variable versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF (EVEX.b = 1) AND (SRC2 *is memory*)
        THEN TMP_SRC2[i+63:i] ← SRC2[63:0];
        ELSE TMP_SRC2[i+63:i] ← SRC2[i+63:i];
    FI;
ENDFOR;
IF (TMP_SRC2[1] = 0) THEN TMP_DEST[63:0] ← SRC1[63:0]; FI;
IF (TMP_SRC2[1] = 1) THEN TMP_DEST[63:0] ← SRC1[127:64]; FI;
IF (TMP_SRC2[65] = 0) THEN TMP_DEST[127:64] ← SRC1[63:0]; FI;
IF (TMP_SRC2[65] = 1) THEN TMP_DEST[127:64] ← SRC1[127:64]; FI;
IF VL >= 256
    IF (TMP_SRC2[129] = 0) THEN TMP_DEST[191:128] ← SRC1[191:128]; FI;
    IF (TMP_SRC2[129] = 1) THEN TMP_DEST[191:128] ← SRC1[255:192]; FI;
    IF (TMP_SRC2[193] = 0) THEN TMP_DEST[255:192] ← SRC1[191:128]; FI;
    IF (TMP_SRC2[193] = 1) THEN TMP_DEST[255:192] ← SRC1[255:192]; FI;
FI;
IF VL >= 512
    IF (TMP_SRC2[257] = 0) THEN TMP_DEST[319:256] ← SRC1[319:256]; FI;
    IF (TMP_SRC2[257] = 1) THEN TMP_DEST[319:256] ← SRC1[383:320]; FI;
    IF (TMP_SRC2[321] = 0) THEN TMP_DEST[383:320] ← SRC1[319:256]; FI;
    IF (TMP_SRC2[321] = 1) THEN TMP_DEST[383:320] ← SRC1[383:320]; FI;
    IF (TMP_SRC2[385] = 0) THEN TMP_DEST[447:384] ← SRC1[447:384]; FI;
    IF (TMP_SRC2[385] = 1) THEN TMP_DEST[447:384] ← SRC1[511:448]; FI;
    IF (TMP_SRC2[449] = 0) THEN TMP_DEST[511:448] ← SRC1[447:384]; FI;
    IF (TMP_SRC2[449] = 1) THEN TMP_DEST[511:448] ← SRC1[511:448]; FI;
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPERMILPD (256-bit variable version)
IF (SRC2[1] = 0) THEN DEST[63:0] ← SRC1[63:0]
IF (SRC2[1] = 1) THEN DEST[63:0] ← SRC1[127:64]
IF (SRC2[65] = 0) THEN DEST[127:64] ← SRC1[63:0]
IF (SRC2[65] = 1) THEN DEST[127:64] ← SRC1[127:64]
IF (SRC2[129] = 0) THEN DEST[191:128] ← SRC1[191:128]
IF (SRC2[129] = 1) THEN DEST[191:128] ← SRC1[255:192]
IF (SRC2[193] = 0) THEN DEST[255:192] ← SRC1[191:128]
IF (SRC2[193] = 1) THEN DEST[255:192] ← SRC1[255:192]
DEST[MAXVL-1:256] ← 0

VPERMILPD (128-bit variable version)
IF (SRC2[1] = 0) THEN DEST[63:0] ← SRC1[63:0]
IF (SRC2[1] = 1) THEN DEST[63:0] ← SRC1[127:64]
IF (SRC2[65] = 0) THEN DEST[127:64] ← SRC1[63:0]
IF (SRC2[65] = 1) THEN DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPERMILPD __m512d _mm512_permute_pd( __m512d a, int imm);
VPERMILPD __m512d _mm512_mask_permute_pd(__m512d s, __mmask8 k, __m512d a, int imm);
VPERMILPD __m512d _mm512_maskz_permute_pd( __mmask8 k, __m512d a, int imm);
VPERMILPD __m256d _mm256_mask_permute_pd(__m256d s, __mmask8 k, __m256d a, int imm);
VPERMILPD __m256d _mm256_maskz_permute_pd( __mmask8 k, __m256d a, int imm);
VPERMILPD __m128d _mm_mask_permute_pd(__m128d s, __mmask8 k, __m128d a, int imm);
VPERMILPD __m128d _mm_maskz_permute_pd( __mmask8 k, __m128d a, int imm);
VPERMILPD __m512d _mm512_permutevar_pd( __m512i i, __m512d a);
VPERMILPD __m512d _mm512_mask_permutevar_pd(__m512d s, __mmask8 k, __m512i i, __m512d a);
VPERMILPD __m512d _mm512_maskz_permutevar_pd( __mmask8 k, __m512i i, __m512d a);
VPERMILPD __m256d _mm256_mask_permutevar_pd(__m256d s, __mmask8 k, __m256i i, __m256d a);
VPERMILPD __m256d _mm256_maskz_permutevar_pd( __mmask8 k, __m256i i, __m256d a);
VPERMILPD __m128d _mm_mask_permutevar_pd(__m128d s, __mmask8 k, __m128i i, __m128d a);
VPERMILPD __m128d _mm_maskz_permutevar_pd( __mmask8 k, __m128i i, __m128d a);
VPERMILPD __m128d _mm_permute_pd (__m128d a, int control);
VPERMILPD __m256d _mm256_permute_pd (__m256d a, int control);
VPERMILPD __m128d _mm_permutevar_pd (__m128d a, __m128i control);
VPERMILPD __m256d _mm256_permutevar_pd (__m256d a, __m256i control);
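As an illustrative sketch (not SDM text), a minimal C program that swaps the two doubles inside each 128-bit lane of a YMM register with the immediate form (imm8 = 0b0101; requires AVX, e.g. -mavx):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m256d a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
    /* imm8[0]=1, imm8[1]=0 swap the low lane; imm8[2]=1, imm8[3]=0
       swap the high lane. */
    __m256d r = _mm256_permute_pd(a, 0x5);
    double out[4];
    _mm256_storeu_pd(out, r);
    printf("%.0f %.0f %.0f %.0f\n", out[0], out[1], out[2], out[3]);
    /* expected: 1 0 3 2 */
    return 0;
}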
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instruction, see Exceptions Type 4; additionally
#UD If VEX.W = 1.
EVEX-encoded instruction, see Exceptions Type E4NF.
#UD If (E)VEX.vvvv != 1111B for the imm8 versions.
VPERMILPS—Permute In-Lane of Quadruples of Single-Precision Floating-Point Values
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 0C /r VPERMILPS xmm1, xmm2, xmm3/m128 | A | V/V | AVX | Permute single-precision floating-point values in xmm2 using controls from xmm3/m128 and store result in xmm1.
VEX.128.66.0F3A.W0 04 /r ib VPERMILPS xmm1, xmm2/m128, imm8 | B | V/V | AVX | Permute single-precision floating-point values in xmm2/m128 using controls from imm8 and store result in xmm1.
VEX.256.66.0F38.W0 0C /r VPERMILPS ymm1, ymm2, ymm3/m256 | A | V/V | AVX | Permute single-precision floating-point values in ymm2 using controls from ymm3/m256 and store result in ymm1.
VEX.256.66.0F3A.W0 04 /r ib VPERMILPS ymm1, ymm2/m256, imm8 | B | V/V | AVX | Permute single-precision floating-point values in ymm2/m256 using controls from imm8 and store result in ymm1.
EVEX.128.66.0F38.W0 0C /r VPERMILPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | C | V/V | AVX512VL AVX512F | Permute single-precision floating-point values in xmm2 using control from xmm3/m128/m32bcst and store the result in xmm1 using writemask k1.
EVEX.256.66.0F38.W0 0C /r VPERMILPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | C | V/V | AVX512VL AVX512F | Permute single-precision floating-point values in ymm2 using control from ymm3/m256/m32bcst and store the result in ymm1 using writemask k1.
EVEX.512.66.0F38.W0 0C /r VPERMILPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | C | V/V | AVX512F | Permute single-precision floating-point values in zmm2 using control from zmm3/m512/m32bcst and store the result in zmm1 using writemask k1.
EVEX.128.66.0F3A.W0 04 /r ib VPERMILPS xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | D | V/V | AVX512VL AVX512F | Permute single-precision floating-point values in xmm2/m128/m32bcst using controls from imm8 and store the result in xmm1 using writemask k1.
EVEX.256.66.0F3A.W0 04 /r ib VPERMILPS ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | D | V/V | AVX512VL AVX512F | Permute single-precision floating-point values in ymm2/m256/m32bcst using controls from imm8 and store the result in ymm1 using writemask k1.
EVEX.512.66.0F3A.W0 04 /r ib VPERMILPS zmm1 {k1}{z}, zmm2/m512/m32bcst, imm8 | D | V/V | AVX512F | Permute single-precision floating-point values in zmm2/m512/m32bcst using controls from imm8 and store the result in zmm1 using writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | NA | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
C | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
D | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
(variable control version)
Permute quadruples of single-precision floating-point values in the first source operand (second operand), each
quadruple using a 2-bit control field in the corresponding dword element of the second source operand. Permuted
results are stored in the destination operand (first operand).
The 2-bit control fields are located at the low two bits of each dword element (see Figure 5-26). Each control
determines which source element of an input quadruple is selected for the destination element. Each quadruple of
source elements must lie in the same 128-bit region as the destination.
EVEX version: The second source operand (third operand) is a ZMM/YMM/XMM register, a 512/256/128-bit
memory location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. Permuted results are
written to the destination under the writemask.
(immediate control version)
Permute quadruples of single-precision floating-point values in the first source operand (second operand), each
quadruple using a 2-bit control field in the imm8 byte. Each 128-bit lane in the destination operand (first operand)
uses the four control fields of the same imm8 byte.
VEX version: The source operand is a YMM/XMM register or a 256/128-bit memory location, and the destination
operand is a YMM/XMM register.
EVEX version: The source operand (second operand) is a ZMM/YMM/XMM register, a 512/256/128-bit memory
location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. Permuted results are written to
the destination under the writemask.
Note: For the imm8 version, VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instruction
will #UD.
[Figure 5-25. VPERMILPS Operation — each destination dword selects one of X3..X0 within the low 128-bit lane or one of X7..X4 within the high lane of SRC1.]
[Figure 5-26. VPERMILPS Shuffle Control — each 2-bit select field occupies the low two bits of its dword (bits 1:0, 33:32, ..., 225:224); the upper bits of each dword are ignored.]
Operation
Select4(SRC, control) {
CASE (control[1:0]) OF
    0: TMP ← SRC[31:0];
    1: TMP ← SRC[63:32];
    2: TMP ← SRC[95:64];
    3: TMP ← SRC[127:96];
ESAC;
RETURN TMP
}
VPERMILPS (EVEX immediate versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF (EVEX.b = 1) AND (SRC1 *is memory*)
        THEN TMP_SRC1[i+31:i] ← SRC1[31:0];
        ELSE TMP_SRC1[i+31:i] ← SRC1[i+31:i];
    FI;
ENDFOR;
TMP_DEST[31:0] ← Select4(TMP_SRC1[127:0], imm8[1:0]);
TMP_DEST[63:32] ← Select4(TMP_SRC1[127:0], imm8[3:2]);
TMP_DEST[95:64] ← Select4(TMP_SRC1[127:0], imm8[5:4]);
TMP_DEST[127:96] ← Select4(TMP_SRC1[127:0], imm8[7:6]);
IF VL >= 256
    TMP_DEST[159:128] ← Select4(TMP_SRC1[255:128], imm8[1:0]);
    TMP_DEST[191:160] ← Select4(TMP_SRC1[255:128], imm8[3:2]);
    TMP_DEST[223:192] ← Select4(TMP_SRC1[255:128], imm8[5:4]);
    TMP_DEST[255:224] ← Select4(TMP_SRC1[255:128], imm8[7:6]);
FI;
IF VL >= 512
    TMP_DEST[287:256] ← Select4(TMP_SRC1[383:256], imm8[1:0]);
    TMP_DEST[319:288] ← Select4(TMP_SRC1[383:256], imm8[3:2]);
    TMP_DEST[351:320] ← Select4(TMP_SRC1[383:256], imm8[5:4]);
    TMP_DEST[383:352] ← Select4(TMP_SRC1[383:256], imm8[7:6]);
    TMP_DEST[415:384] ← Select4(TMP_SRC1[511:384], imm8[1:0]);
    TMP_DEST[447:416] ← Select4(TMP_SRC1[511:384], imm8[3:2]);
    TMP_DEST[479:448] ← Select4(TMP_SRC1[511:384], imm8[5:4]);
    TMP_DEST[511:480] ← Select4(TMP_SRC1[511:384], imm8[7:6]);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*
                ELSE DEST[i+31:i] ← 0 ; zeroing-masking
            FI;
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPERMILPS (256-bit immediate version)
DEST[31:0] ← Select4(SRC1[127:0], imm8[1:0]);
DEST[63:32] ← Select4(SRC1[127:0], imm8[3:2]);
DEST[95:64] ← Select4(SRC1[127:0], imm8[5:4]);
DEST[127:96] ← Select4(SRC1[127:0], imm8[7:6]);
DEST[159:128] ← Select4(SRC1[255:128], imm8[1:0]);
DEST[191:160] ← Select4(SRC1[255:128], imm8[3:2]);
DEST[223:192] ← Select4(SRC1[255:128], imm8[5:4]);
DEST[255:224] ← Select4(SRC1[255:128], imm8[7:6]);
DEST[MAXVL-1:256] ← 0

VPERMILPS (128-bit immediate version)
DEST[31:0] ← Select4(SRC1[127:0], imm8[1:0]);
DEST[63:32] ← Select4(SRC1[127:0], imm8[3:2]);
DEST[95:64] ← Select4(SRC1[127:0], imm8[5:4]);
DEST[127:96] ← Select4(SRC1[127:0], imm8[7:6]);
DEST[MAXVL-1:128] ← 0
VPERMILPS (EVEX variable versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF (EVEX.b = 1) AND (SRC2 *is memory*)
        THEN TMP_SRC2[i+31:i] ← SRC2[31:0];
        ELSE TMP_SRC2[i+31:i] ← SRC2[i+31:i];
    FI;
ENDFOR;
TMP_DEST[31:0] ← Select4(SRC1[127:0], TMP_SRC2[1:0]);
TMP_DEST[63:32] ← Select4(SRC1[127:0], TMP_SRC2[33:32]);
TMP_DEST[95:64] ← Select4(SRC1[127:0], TMP_SRC2[65:64]);
TMP_DEST[127:96] ← Select4(SRC1[127:0], TMP_SRC2[97:96]);
IF VL >= 256
    TMP_DEST[159:128] ← Select4(SRC1[255:128], TMP_SRC2[129:128]);
    TMP_DEST[191:160] ← Select4(SRC1[255:128], TMP_SRC2[161:160]);
    TMP_DEST[223:192] ← Select4(SRC1[255:128], TMP_SRC2[193:192]);
    TMP_DEST[255:224] ← Select4(SRC1[255:128], TMP_SRC2[225:224]);
FI;
IF VL >= 512
    TMP_DEST[287:256] ← Select4(SRC1[383:256], TMP_SRC2[257:256]);
    TMP_DEST[319:288] ← Select4(SRC1[383:256], TMP_SRC2[289:288]);
    TMP_DEST[351:320] ← Select4(SRC1[383:256], TMP_SRC2[321:320]);
    TMP_DEST[383:352] ← Select4(SRC1[383:256], TMP_SRC2[353:352]);
    TMP_DEST[415:384] ← Select4(SRC1[511:384], TMP_SRC2[385:384]);
    TMP_DEST[447:416] ← Select4(SRC1[511:384], TMP_SRC2[417:416]);
    TMP_DEST[479:448] ← Select4(SRC1[511:384], TMP_SRC2[449:448]);
    TMP_DEST[511:480] ← Select4(SRC1[511:384], TMP_SRC2[481:480]);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*
                ELSE DEST[i+31:i] ← 0 ; zeroing-masking
            FI;
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPERMILPS (256-bit variable version)
DEST[31:0] ← Select4(SRC1[127:0], SRC2[1:0]);
DEST[63:32] ← Select4(SRC1[127:0], SRC2[33:32]);
DEST[95:64] ← Select4(SRC1[127:0], SRC2[65:64]);
DEST[127:96] ← Select4(SRC1[127:0], SRC2[97:96]);
DEST[159:128] ← Select4(SRC1[255:128], SRC2[129:128]);
DEST[191:160] ← Select4(SRC1[255:128], SRC2[161:160]);
DEST[223:192] ← Select4(SRC1[255:128], SRC2[193:192]);
DEST[255:224] ← Select4(SRC1[255:128], SRC2[225:224]);
DEST[MAXVL-1:256] ← 0

VPERMILPS (128-bit variable version)
DEST[31:0] ← Select4(SRC1[127:0], SRC2[1:0]);
DEST[63:32] ← Select4(SRC1[127:0], SRC2[33:32]);
DEST[95:64] ← Select4(SRC1[127:0], SRC2[65:64]);
DEST[127:96] ← Select4(SRC1[127:0], SRC2[97:96]);
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPERMILPS __m512 _mm512_permute_ps( __m512 a, int imm);
VPERMILPS __m512 _mm512_mask_permute_ps(__m512 s, __mmask16 k, __m512 a, int imm);
VPERMILPS __m512 _mm512_maskz_permute_ps( __mmask16 k, __m512 a, int imm);
VPERMILPS __m256 _mm256_mask_permute_ps(__m256 s, __mmask8 k, __m256 a, int imm);
VPERMILPS __m256 _mm256_maskz_permute_ps( __mmask8 k, __m256 a, int imm);
VPERMILPS __m128 _mm_mask_permute_ps(__m128 s, __mmask8 k, __m128 a, int imm);
VPERMILPS __m128 _mm_maskz_permute_ps( __mmask8 k, __m128 a, int imm);
VPERMILPS __m512 _mm512_permutevar_ps( __m512i i, __m512 a);
VPERMILPS __m512 _mm512_mask_permutevar_ps(__m512 s, __mmask16 k, __m512i i, __m512 a);
VPERMILPS __m512 _mm512_maskz_permutevar_ps( __mmask16 k, __m512i i, __m512 a);
VPERMILPS __m256 _mm256_mask_permutevar_ps(__m256 s, __mmask8 k, __m256i i, __m256 a);
VPERMILPS __m256 _mm256_maskz_permutevar_ps( __mmask8 k, __m256i i, __m256 a);
VPERMILPS __m128 _mm_mask_permutevar_ps(__m128 s, __mmask8 k, __m128i i, __m128 a);
VPERMILPS __m128 _mm_maskz_permutevar_ps( __mmask8 k, __m128i i, __m128 a);
VPERMILPS __m128 _mm_permute_ps (__m128 a, int control);
VPERMILPS __m256 _mm256_permute_ps (__m256 a, int control);
VPERMILPS __m128 _mm_permutevar_ps (__m128 a, __m128i control);
VPERMILPS __m256 _mm256_permutevar_ps (__m256 a, __m256i control);
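As an illustrative sketch (not SDM text), a minimal C program that reverses the four floats inside each 128-bit lane of a YMM register with the immediate form (imm8 = 0x1B; requires AVX, e.g. -mavx):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m256 a = _mm256_setr_ps(0, 1, 2, 3, 4, 5, 6, 7);
    /* 0x1B = fields 3,2,1,0 -> per-lane reversal. */
    __m256 r = _mm256_permute_ps(a, 0x1B);
    float out[8];
    _mm256_storeu_ps(out, r);
    for (int i = 0; i < 8; i++)
        printf("%.0f ", out[i]);   /* expected: 3 2 1 0 7 6 5 4 */
    printf("\n");
    return 0;
}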
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instruction, see Exceptions Type 4; additionally
#UD If VEX.W = 1.
EVEX-encoded instruction, see Exceptions Type E4NF.
#UD If (E)VEX.vvvv != 1111B for the imm8 version.
VPERMPD—Permute Double-Precision Floating-Point Elements
Description
The imm8 version: Copies quadword elements of double-precision floating-point values from the source operand
(the second operand) to the destination operand (the first operand) according to the indices specified by the
immediate operand (the third operand). Each two-bit value in the immediate byte selects a qword element in the
source operand.
VEX version: The source operand can be a YMM register or a memory location. Bits (MAXVL-1:256) of the
corresponding destination register are zeroed.
In the EVEX.512 encoded version, the elements in the destination are updated using the writemask k1, and the
imm8 bits are reused as control bits for the upper 256-bit half when the control bits come from the immediate. The
source operand can be a ZMM register, a 512-bit memory location or a 512-bit vector broadcasted from a 64-bit
memory location.
The imm8 versions: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instruction will #UD.
The vector control version: Copies quadword elements of double-precision floating-point values from the second
source operand (the third operand) to the destination operand (the first operand) according to the indices in the
first source operand (the second operand). The low 3 bits of each 64-bit element in the index operand select which
quadword in the second source operand to copy. The first and second operands are ZMM registers; the third
operand can be a ZMM register, a 512-bit memory location or a 512-bit vector broadcasted from a 64-bit memory
location. The elements in the destination are updated using the writemask k1.
Note that this instruction permits a qword in the source operand to be copied to multiple locations in the
destination operand.
An attempt to execute VPERMPD encoded with VEX.L = 0 will cause a #UD exception.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.256.66.0F3A.W1 01 /r ib VPERMPD ymm1, ymm2/m256, imm8 | A | V/V | AVX2 | Permute double-precision floating-point elements in ymm2/m256 using indices in imm8 and store the result in ymm1.
EVEX.256.66.0F3A.W1 01 /r ib VPERMPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | B | V/V | AVX512VL AVX512F | Permute double-precision floating-point elements in ymm2/m256/m64bcst using indexes in imm8 and store the result in ymm1 subject to writemask k1.
EVEX.512.66.0F3A.W1 01 /r ib VPERMPD zmm1 {k1}{z}, zmm2/m512/m64bcst, imm8 | B | V/V | AVX512F | Permute double-precision floating-point elements in zmm2/m512/m64bcst using indices in imm8 and store the result in zmm1 subject to writemask k1.
EVEX.256.66.0F38.W1 16 /r VPERMPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | C | V/V | AVX512VL AVX512F | Permute double-precision floating-point elements in ymm3/m256/m64bcst using indexes in ymm2 and store the result in ymm1 subject to writemask k1.
EVEX.512.66.0F38.W1 16 /r VPERMPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | C | V/V | AVX512F | Permute double-precision floating-point elements in zmm3/m512/m64bcst using indices in zmm2 and store the result in zmm1 subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | ModRM:r/m (r) | Imm8 | NA
B | Full | ModRM:reg (w) | ModRM:r/m (r) | Imm8 | NA
C | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Operation
VPERMPD (EVEX - imm8 control forms)
(KL, VL) = (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF (EVEX.b = 1) AND (SRC *is memory*)
        THEN TMP_SRC[i+63:i] ← SRC[63:0];
        ELSE TMP_SRC[i+63:i] ← SRC[i+63:i];
    FI;
ENDFOR;
TMP_DEST[63:0] ← (TMP_SRC[255:0] >> (IMM8[1:0] * 64))[63:0];
TMP_DEST[127:64] ← (TMP_SRC[255:0] >> (IMM8[3:2] * 64))[63:0];
TMP_DEST[191:128] ← (TMP_SRC[255:0] >> (IMM8[5:4] * 64))[63:0];
TMP_DEST[255:192] ← (TMP_SRC[255:0] >> (IMM8[7:6] * 64))[63:0];
IF VL >= 512
    TMP_DEST[319:256] ← (TMP_SRC[511:256] >> (IMM8[1:0] * 64))[63:0];
    TMP_DEST[383:320] ← (TMP_SRC[511:256] >> (IMM8[3:2] * 64))[63:0];
    TMP_DEST[447:384] ← (TMP_SRC[511:256] >> (IMM8[5:4] * 64))[63:0];
    TMP_DEST[511:448] ← (TMP_SRC[511:256] >> (IMM8[7:6] * 64))[63:0];
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPERMPD (EVEX - vector control forms)
(KL, VL) = (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF (EVEX.b = 1) AND (SRC2 *is memory*)
        THEN TMP_SRC2[i+63:i] ← SRC2[63:0];
        ELSE TMP_SRC2[i+63:i] ← SRC2[i+63:i];
    FI;
ENDFOR;
IF VL = 256
    TMP_DEST[63:0] ← (TMP_SRC2[255:0] >> (SRC1[1:0] * 64))[63:0];
    TMP_DEST[127:64] ← (TMP_SRC2[255:0] >> (SRC1[65:64] * 64))[63:0];
    TMP_DEST[191:128] ← (TMP_SRC2[255:0] >> (SRC1[129:128] * 64))[63:0];
    TMP_DEST[255:192] ← (TMP_SRC2[255:0] >> (SRC1[193:192] * 64))[63:0];
FI;
IF VL = 512
    TMP_DEST[63:0] ← (TMP_SRC2[511:0] >> (SRC1[2:0] * 64))[63:0];
    TMP_DEST[127:64] ← (TMP_SRC2[511:0] >> (SRC1[66:64] * 64))[63:0];
    TMP_DEST[191:128] ← (TMP_SRC2[511:0] >> (SRC1[130:128] * 64))[63:0];
    TMP_DEST[255:192] ← (TMP_SRC2[511:0] >> (SRC1[194:192] * 64))[63:0];
    TMP_DEST[319:256] ← (TMP_SRC2[511:0] >> (SRC1[258:256] * 64))[63:0];
    TMP_DEST[383:320] ← (TMP_SRC2[511:0] >> (SRC1[322:320] * 64))[63:0];
    TMP_DEST[447:384] ← (TMP_SRC2[511:0] >> (SRC1[386:384] * 64))[63:0];
    TMP_DEST[511:448] ← (TMP_SRC2[511:0] >> (SRC1[450:448] * 64))[63:0];
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPERMPD (VEX.256 encoded version)
DEST[63:0] ← (SRC[255:0] >> (IMM8[1:0] * 64))[63:0];
DEST[127:64] ← (SRC[255:0] >> (IMM8[3:2] * 64))[63:0];
DEST[191:128] ← (SRC[255:0] >> (IMM8[5:4] * 64))[63:0];
DEST[255:192] ← (SRC[255:0] >> (IMM8[7:6] * 64))[63:0];
DEST[MAXVL-1:256] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPERMPD __m512d _mm512_permutex_pd( __m512d a, int imm);
VPERMPD __m512d _mm512_mask_permutex_pd(__m512d s, __mmask8 k, __m512d a, int imm);
VPERMPD __m512d _mm512_maskz_permutex_pd( __mmask8 k, __m512d a, int imm);
VPERMPD __m512d _mm512_permutexvar_pd( __m512i i, __m512d a);
VPERMPD __m512d _mm512_mask_permutexvar_pd(__m512d s, __mmask8 k, __m512i i, __m512d a);
VPERMPD __m512d _mm512_maskz_permutexvar_pd( __mmask8 k, __m512i i, __m512d a);
VPERMPD __m256d _mm256_permutex_pd( __m256d a, int imm);
VPERMPD __m256d _mm256_mask_permutex_pd(__m256d s, __mmask8 k, __m256d a, int imm);
VPERMPD __m256d _mm256_maskz_permutex_pd( __mmask8 k, __m256d a, int imm);
VPERMPD __m256d _mm256_permutexvar_pd( __m256i i, __m256d a);
VPERMPD __m256d _mm256_mask_permutexvar_pd(__m256d s, __mmask8 k, __m256i i, __m256d a);
VPERMPD __m256d _mm256_maskz_permutexvar_pd( __mmask8 k, __m256i i, __m256d a);
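As an illustrative sketch (not SDM text), a minimal C program that reverses the four doubles of a YMM register with the immediate form; the VEX.256 encoding is exposed by compilers as _mm256_permute4x64_pd (requires AVX2, e.g. -mavx2):

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m256d a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
    /* imm8 = 0x1B selects elements 3, 2, 1, 0. */
    __m256d r = _mm256_permute4x64_pd(a, 0x1B);
    double out[4];
    _mm256_storeu_pd(out, r);
    printf("%.0f %.0f %.0f %.0f\n", out[0], out[1], out[2], out[3]);
    /* expected: 3 2 1 0 */
    return 0;
}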
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instruction, see Exceptions Type 4; additionally
#UD If VEX.L = 0.
If VEX.vvvv != 1111B.
EVEX-encoded instruction, see Exceptions Type E4NF.
#UD If encoded with EVEX.128.
If EVEX.vvvv != 1111B and with imm8.
VPERMPS—Permute Single-Precision Floating-Point Elements
INSTRUCTION SET REFERENCE, V-Z
5-364 Vol. 2C
VPERMPS—Permute Single-Precision Floating-Point Elements
Opcode/Instruction: VEX.256.66.0F38.W0 16 /r VPERMPS ymm1, ymm2, ymm3/m256
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX2
Description: Permute single-precision floating-point elements in ymm3/m256 using indices in ymm2 and store the result in ymm1.

Opcode/Instruction: EVEX.256.66.0F38.W0 16 /r VPERMPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute single-precision floating-point elements in ymm3/m256/m32bcst using indexes in ymm2 and store the result in ymm1 subject to write mask k1.

Opcode/Instruction: EVEX.512.66.0F38.W0 16 /r VPERMPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Permute single-precision floating-point values in zmm3/m512/m32bcst using indices in zmm2 and store the result in zmm1 subject to write mask k1.

Instruction Operand Encoding

Op/En A: Tuple Type NA; Operand 1: ModRM:reg (w); Operand 2: VEX.vvvv (r); Operand 3: ModRM:r/m (r); Operand 4: NA
Op/En B: Tuple Type Full; Operand 1: ModRM:reg (w); Operand 2: EVEX.vvvv (r); Operand 3: ModRM:r/m (r); Operand 4: NA

Description
Copies doubleword elements of single-precision floating-point values from the second source operand (the third
operand) to the destination operand (the first operand) according to the indices in the first source operand (the
second operand). Note that this instruction permits a doubleword in the source operand to be copied to more than
one location in the destination operand.
VEX.256 versions: The first and second operands are YMM registers, the third operand can be a YMM register or
memory location. Bits (MAXVL-1:256) of the corresponding destination register are zeroed.
EVEX encoded version: The first and second operands are ZMM registers, the third operand can be a ZMM register,
a 512-bit memory location or a 512-bit vector broadcasted from a 32-bit memory location. The elements in the
destination are updated using the writemask k1.
An attempt to execute VPERMPS encoded with VEX.L = 0 will cause an #UD exception.
Operation
VPERMPS (EVEX forms)
(KL, VL) = (8, 256), (16, 512)
FOR j ← 0 TO KL-1
i ← j * 32
IF (EVEX.b = 1) AND (SRC2 *is memory*)
THEN TMP_SRC2[i+31:i] ← SRC2[31:0];
ELSE TMP_SRC2[i+31:i] ← SRC2[i+31:i];
FI;
ENDFOR;
IF VL = 256
TMP_DEST[31:0] ← (TMP_SRC2[255:0] >> (SRC1[2:0] * 32))[31:0];
TMP_DEST[63:32] ← (TMP_SRC2[255:0] >> (SRC1[34:32] * 32))[31:0];
TMP_DEST[95:64] ← (TMP_SRC2[255:0] >> (SRC1[66:64] * 32))[31:0];
TMP_DEST[127:96] ← (TMP_SRC2[255:0] >> (SRC1[98:96] * 32))[31:0];
TMP_DEST[159:128] ← (TMP_SRC2[255:0] >> (SRC1[130:128] * 32))[31:0];
TMP_DEST[191:160] ← (TMP_SRC2[255:0] >> (SRC1[162:160] * 32))[31:0];
TMP_DEST[223:192] ← (TMP_SRC2[255:0] >> (SRC1[194:192] * 32))[31:0];
TMP_DEST[255:224] ← (TMP_SRC2[255:0] >> (SRC1[226:224] * 32))[31:0];
FI;
IF VL = 512
TMP_DEST[31:0] ← (TMP_SRC2[511:0] >> (SRC1[3:0] * 32))[31:0];
TMP_DEST[63:32] ← (TMP_SRC2[511:0] >> (SRC1[35:32] * 32))[31:0];
TMP_DEST[95:64] ← (TMP_SRC2[511:0] >> (SRC1[67:64] * 32))[31:0];
TMP_DEST[127:96] ← (TMP_SRC2[511:0] >> (SRC1[99:96] * 32))[31:0];
TMP_DEST[159:128] ← (TMP_SRC2[511:0] >> (SRC1[131:128] * 32))[31:0];
TMP_DEST[191:160] ← (TMP_SRC2[511:0] >> (SRC1[163:160] * 32))[31:0];
TMP_DEST[223:192] ← (TMP_SRC2[511:0] >> (SRC1[195:192] * 32))[31:0];
TMP_DEST[255:224] ← (TMP_SRC2[511:0] >> (SRC1[227:224] * 32))[31:0];
TMP_DEST[287:256] ← (TMP_SRC2[511:0] >> (SRC1[259:256] * 32))[31:0];
TMP_DEST[319:288] ← (TMP_SRC2[511:0] >> (SRC1[291:288] * 32))[31:0];
TMP_DEST[351:320] ← (TMP_SRC2[511:0] >> (SRC1[323:320] * 32))[31:0];
TMP_DEST[383:352] ← (TMP_SRC2[511:0] >> (SRC1[355:352] * 32))[31:0];
TMP_DEST[415:384] ← (TMP_SRC2[511:0] >> (SRC1[387:384] * 32))[31:0];
TMP_DEST[447:416] ← (TMP_SRC2[511:0] >> (SRC1[419:416] * 32))[31:0];
TMP_DEST[479:448] ← (TMP_SRC2[511:0] >> (SRC1[451:448] * 32))[31:0];
TMP_DEST[511:480] ← (TMP_SRC2[511:0] >> (SRC1[483:480] * 32))[31:0];
FI;
FOR j ← 0 TO KL-1
i ← j * 32
IF k1[j] OR *no writemask*
THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI;
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VPERMPS (VEX.256 encoded version)
DEST[31:0] ← (SRC2[255:0] >> (SRC1[2:0] * 32))[31:0];
DEST[63:32] ← (SRC2[255:0] >> (SRC1[34:32] * 32))[31:0];
DEST[95:64] ← (SRC2[255:0] >> (SRC1[66:64] * 32))[31:0];
DEST[127:96] ← (SRC2[255:0] >> (SRC1[98:96] * 32))[31:0];
DEST[159:128] ← (SRC2[255:0] >> (SRC1[130:128] * 32))[31:0];
DEST[191:160] ← (SRC2[255:0] >> (SRC1[162:160] * 32))[31:0];
DEST[223:192] ← (SRC2[255:0] >> (SRC1[194:192] * 32))[31:0];
DEST[255:224] ← (SRC2[255:0] >> (SRC1[226:224] * 32))[31:0];
DEST[MAXVL-1:256] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPERMPS __m512 _mm512_permutexvar_ps(__m512i i, __m512 a);
VPERMPS __m512 _mm512_mask_permutexvar_ps(__m512 s, __mmask16 k, __m512i i, __m512 a);
VPERMPS __m512 _mm512_maskz_permutexvar_ps( __mmask16 k, __m512i i, __m512 a);
VPERMPS __m256 _mm256_permutexvar_ps(__m256i i, __m256 a);
VPERMPS __m256 _mm256_mask_permutexvar_ps(__m256 s, __mmask8 k, __m256i i, __m256 a);
VPERMPS __m256 _mm256_maskz_permutexvar_ps( __mmask8 k, __m256i i, __m256 a);
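Example (informative, not part of the reference text): a minimal sketch rotating the 16 floats of a ZMM register left by one element position; assumes <immintrin.h> and AVX512F support.

#include <immintrin.h>

/* result[i] = a[(i + 1) % 16]: element i of the result takes the
   source element named by idx[i]. */
static __m512 rotate1_ps(__m512 a) {
    const __m512i idx = _mm512_set_epi32(0, 15, 14, 13, 12, 11, 10, 9,
                                         8, 7, 6, 5, 4, 3, 2, 1);
    return _mm512_permutexvar_ps(idx, a);   /* VPERMPS zmm, zmm, zmm */
}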
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instruction, see Exceptions Type 4; additionally
#UD If VEX.L = 0.
EVEX-encoded instruction, see Exceptions Type E4NF.
VPERMQ—Qwords Element Permutation
INSTRUCTION SET REFERENCE, V-Z
Vol. 2C 5-367
VPERMQ—Qwords Element Permutation
Opcode/Instruction: VEX.256.66.0F3A.W1 00 /r ib VPERMQ ymm1, ymm2/m256, imm8
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX2
Description: Permute qwords in ymm2/m256 using indices in imm8 and store the result in ymm1.

Opcode/Instruction: EVEX.256.66.0F3A.W1 00 /r ib VPERMQ ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute qwords in ymm2/m256/m64bcst using indexes in imm8 and store the result in ymm1.

Opcode/Instruction: EVEX.512.66.0F3A.W1 00 /r ib VPERMQ zmm1 {k1}{z}, zmm2/m512/m64bcst, imm8
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Permute qwords in zmm2/m512/m64bcst using indices in imm8 and store the result in zmm1.

Opcode/Instruction: EVEX.256.66.0F38.W1 36 /r VPERMQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst
Op/En: C; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute qwords in ymm3/m256/m64bcst using indexes in ymm2 and store the result in ymm1.

Opcode/Instruction: EVEX.512.66.0F38.W1 36 /r VPERMQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst
Op/En: C; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Permute qwords in zmm3/m512/m64bcst using indices in zmm2 and store the result in zmm1.

Instruction Operand Encoding

Op/En A: Tuple Type NA; Operand 1: ModRM:reg (w); Operand 2: ModRM:r/m (r); Operand 3: Imm8; Operand 4: NA
Op/En B: Tuple Type Full; Operand 1: ModRM:reg (w); Operand 2: ModRM:r/m (r); Operand 3: Imm8; Operand 4: NA
Op/En C: Tuple Type Full; Operand 1: ModRM:reg (w); Operand 2: EVEX.vvvv (r); Operand 3: ModRM:r/m (r); Operand 4: NA

Description
The imm8 version: Copies quadwords from the source operand (the second operand) to the destination operand
(the first operand) according to the indices specified by the immediate operand (the third operand). Each two-bit
value in the immediate byte selects a qword element in the source operand.
VEX version: The source operand can be a YMM register or a memory location. Bits (MAXVL-1:256) of the corre-
sponding destination register are zeroed.
In the EVEX.512 encoded version, the elements in the destination are updated using the writemask k1, and the imm8
bits are reused as control bits for the upper 256-bit half when the control bits come from the immediate. The
source operand can be a ZMM register, a 512-bit memory location or a 512-bit vector broadcasted from a 64-bit
memory location.
Immediate control versions: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise the instruction will
#UD.
The vector control version: Copies quadwords from the second source operand (the third operand) to the destination
operand (the first operand) according to the indices in the first source operand (the second operand). The first
3 bits of each 64-bit element in the index operand select which quadword in the second source operand to copy.
The first and second operands are ZMM registers, the third operand can be a ZMM register, a 512-bit memory loca-
tion or a 512-bit vector broadcasted from a 64-bit memory location. The elements in the destination are updated
using the writemask k1.
Note that this instruction permits a qword in the source operand to be copied to multiple locations in the destination
operand.
If VPERMQ is encoded with VEX.L = 0 or EVEX.128, an attempt to execute the instruction will cause an #UD exception.
Operation
VPERMQ (EVEX - imm8 control forms)
(KL, VL) = (4, 256), (8, 512)
FOR j ← 0 TO KL-1
i ← j * 64
IF (EVEX.b = 1) AND (SRC *is memory*)
THEN TMP_SRC[i+63:i] ← SRC[63:0];
ELSE TMP_SRC[i+63:i] ← SRC[i+63:i];
FI;
ENDFOR;
TMP_DEST[63:0] ← (TMP_SRC[255:0] >> (IMM8[1:0] * 64))[63:0];
TMP_DEST[127:64] ← (TMP_SRC[255:0] >> (IMM8[3:2] * 64))[63:0];
TMP_DEST[191:128] ← (TMP_SRC[255:0] >> (IMM8[5:4] * 64))[63:0];
TMP_DEST[255:192] ← (TMP_SRC[255:0] >> (IMM8[7:6] * 64))[63:0];
IF VL >= 512
TMP_DEST[319:256] ← (TMP_SRC[511:256] >> (IMM8[1:0] * 64))[63:0];
TMP_DEST[383:320] ← (TMP_SRC[511:256] >> (IMM8[3:2] * 64))[63:0];
TMP_DEST[447:384] ← (TMP_SRC[511:256] >> (IMM8[5:4] * 64))[63:0];
TMP_DEST[511:448] ← (TMP_SRC[511:256] >> (IMM8[7:6] * 64))[63:0];
FI;
FOR j ← 0 TO KL-1
i ← j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI;
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPERMQ (EVEX - vector control forms)
(KL, VL) = (4, 256), (8, 512)
FOR j ← 0 TO KL-1
i ← j * 64
IF (EVEX.b = 1) AND (SRC2 *is memory*)
THEN TMP_SRC2[i+63:i] ← SRC2[63:0];
ELSE TMP_SRC2[i+63:i] ← SRC2[i+63:i];
FI;
ENDFOR;
IF VL = 256
TMP_DEST[63:0] ← (TMP_SRC2[255:0] >> (SRC1[1:0] * 64))[63:0];
TMP_DEST[127:64] ← (TMP_SRC2[255:0] >> (SRC1[65:64] * 64))[63:0];
TMP_DEST[191:128] ← (TMP_SRC2[255:0] >> (SRC1[129:128] * 64))[63:0];
TMP_DEST[255:192] ← (TMP_SRC2[255:0] >> (SRC1[193:192] * 64))[63:0];
FI;
IF VL = 512
TMP_DEST[63:0] ← (TMP_SRC2[511:0] >> (SRC1[2:0] * 64))[63:0];
TMP_DEST[127:64] ← (TMP_SRC2[511:0] >> (SRC1[66:64] * 64))[63:0];
TMP_DEST[191:128] ← (TMP_SRC2[511:0] >> (SRC1[130:128] * 64))[63:0];
TMP_DEST[255:192] ← (TMP_SRC2[511:0] >> (SRC1[194:192] * 64))[63:0];
TMP_DEST[319:256] ← (TMP_SRC2[511:0] >> (SRC1[258:256] * 64))[63:0];
TMP_DEST[383:320] ← (TMP_SRC2[511:0] >> (SRC1[322:320] * 64))[63:0];
TMP_DEST[447:384] ← (TMP_SRC2[511:0] >> (SRC1[386:384] * 64))[63:0];
TMP_DEST[511:448] ← (TMP_SRC2[511:0] >> (SRC1[450:448] * 64))[63:0];
FI;
FOR j ← 0 TO KL-1
i ← j * 64
IF k1[j] OR *no writemask*
THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI;
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VPERMQ (VEX.256 encoded version)
DEST[63:0] ← (SRC[255:0] >> (IMM8[1:0] * 64))[63:0];
DEST[127:64] ← (SRC[255:0] >> (IMM8[3:2] * 64))[63:0];
DEST[191:128] ← (SRC[255:0] >> (IMM8[5:4] * 64))[63:0];
DEST[255:192] ← (SRC[255:0] >> (IMM8[7:6] * 64))[63:0];
DEST[MAXVL-1:256] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPERMQ __m512i _mm512_permutex_epi64( __m512i a, int imm);
VPERMQ __m512i _mm512_mask_permutex_epi64(__m512i s, __mmask8 k, __m512i a, int imm);
VPERMQ __m512i _mm512_maskz_permutex_epi64( __mmask8 k, __m512i a, int imm);
VPERMQ __m512i _mm512_permutexvar_epi64( __m512i a, __m512i b);
VPERMQ __m512i _mm512_mask_permutexvar_epi64(__m512i s, __mmask8 k, __m512i a, __m512i b);
VPERMQ __m512i _mm512_maskz_permutexvar_epi64( __mmask8 k, __m512i a, __m512i b);
VPERMQ __m256i _mm256_permutex_epi64( __m256i a, int imm);
VPERMQ __m256i _mm256_mask_permutex_epi64(__m256i s, __mmask8 k, __m256i a, int imm);
VPERMQ __m256i _mm256_maskz_permutex_epi64( __mmask8 k, __m256i a, int imm);
VPERMQ __m256i _mm256_permutexvar_epi64( __m256i a, __m256i b);
VPERMQ __m256i _mm256_mask_permutexvar_epi64(__m256i s, __mmask8 k, __m256i a, __m256i b);
VPERMQ __m256i _mm256_maskz_permutexvar_epi64( __mmask8 k, __m256i a, __m256i b);
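Example (informative, not part of the reference text): a minimal sketch that swaps the two 256-bit halves of a ZMM register of qwords with the vector-control form; assumes <immintrin.h> and AVX512F support.

#include <immintrin.h>

/* result qwords 0..3 = a qwords 4..7, result qwords 4..7 = a qwords 0..3. */
static __m512i swap_halves_epi64(__m512i a) {
    const __m512i idx = _mm512_set_epi64(3, 2, 1, 0, 7, 6, 5, 4);
    return _mm512_permutexvar_epi64(idx, a);   /* VPERMQ zmm, zmm, zmm */
}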
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instruction, see Exceptions Type 4; additionally
#UD If VEX.L = 0.
If VEX.vvvv != 1111B.
EVEX-encoded instruction, see Exceptions Type E4NF.
#UD If encoded with EVEX.128.
If EVEX.vvvv != 1111B and with imm8.
VPERMT2B—Full Permute of Bytes from Two Tables Overwriting a Table
Opcode/Instruction: EVEX.128.66.0F38.W0 7D /r VPERMT2B xmm1 {k1}{z}, xmm2, xmm3/m128
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512_VBMI
Description: Permute bytes in xmm3/m128 and xmm1 using byte indexes in xmm2 and store the byte results in xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W0 7D /r VPERMT2B ymm1 {k1}{z}, ymm2, ymm3/m256
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512_VBMI
Description: Permute bytes in ymm3/m256 and ymm1 using byte indexes in ymm2 and store the byte results in ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W0 7D /r VPERMT2B zmm1 {k1}{z}, zmm2, zmm3/m512
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512_VBMI
Description: Permute bytes in zmm3/m512 and zmm1 using byte indexes in zmm2 and store the byte results in zmm1 using writemask k1.

Instruction Operand Encoding

Op/En A: Tuple Type Full Mem; Operand 1: ModRM:reg (r, w); Operand 2: EVEX.vvvv (r); Operand 3: ModRM:r/m (r); Operand 4: NA

Description
Permutes byte values from two tables, comprising the first operand (also the destination operand) and the third
operand (the second source operand). The second operand (the first source operand) provides byte indices to
select byte results from the two tables. The selected byte elements are written to the destination at byte
granularity under the writemask k1.
The first and second operands are ZMM/YMM/XMM registers. The second operand contains input indices to select
elements from the two input tables in the 1st and 3rd operands. The first operand is also the destination of the
result. The second source operand can be a ZMM/YMM/XMM register, or a 512/256/128-bit memory location. In
each index byte, the id bit for table selection is bit 6/5/4, and bits [5:0]/[4:0]/[3:0] select the element within each
input table.
Note that these instructions permit a byte value in the source operands to be copied to more than one location in
the destination operand. Also, the second table and the indices can be reused in subsequent iterations, but the first
table is overwritten.
Bits (MAX_VL-1:256/128) of the destination are zeroed for VL=256,128.
Operation
VPERMT2B (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
IF VL = 128:
id ← 3;
ELSE IF VL = 256:
id ← 4;
ELSE IF VL = 512:
id ← 5;
FI;
TMP_DEST[VL-1:0] ← DEST[VL-1:0];
FOR j ← 0 TO KL-1
off ← 8*SRC1[j*8 + id: j*8];
IF k1[j] OR *no writemask*:
DEST[j*8 + 7: j*8] ← SRC1[j*8+id+1] ? SRC2[off+7:off] : TMP_DEST[off+7:off];
ELSE IF *zeroing-masking*:
DEST[j*8 + 7: j*8] ← 0;
ELSE: ; merging-masking
*DEST[j*8 + 7: j*8] remains unchanged*
FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPERMT2B __m512i _mm512_permutex2var_epi8(__m512i a, __m512i idx, __m512i b);
VPERMT2B __m512i _mm512_mask_permutex2var_epi8(__m512i a, __mmask64 k, __m512i idx, __m512i b);
VPERMT2B __m512i _mm512_maskz_permutex2var_epi8(__mmask64 k, __m512i a, __m512i idx, __m512i b);
VPERMT2B __m256i _mm256_permutex2var_epi8(__m256i a, __m256i idx, __m256i b);
VPERMT2B __m256i _mm256_mask_permutex2var_epi8(__m256i a, __mmask32 k, __m256i idx, __m256i b);
VPERMT2B __m256i _mm256_maskz_permutex2var_epi8(__mmask32 k, __m256i a, __m256i idx, __m256i b);
VPERMT2B __m128i _mm_permutex2var_epi8(__m128i a, __m128i idx, __m128i b);
VPERMT2B __m128i _mm_mask_permutex2var_epi8(__m128i a, __mmask16 k, __m128i idx, __m128i b);
VPERMT2B __m128i _mm_maskz_permutex2var_epi8(__mmask16 k, __m128i a, __m128i idx, __m128i b);
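Example (informative, not part of the reference text): a minimal sketch of a 128-entry byte-table lookup; assumes <immintrin.h> and a compiler and CPU supporting AVX512_VBMI. Note that the instruction form overwrites its first table operand; the intrinsic instead returns the result.

#include <immintrin.h>

/* For each of the 64 index bytes in idx, bit 6 selects between table lo
   (bit 6 = 0) and table hi (bit 6 = 1), and bits [5:0] select a byte
   within the chosen table. */
static __m512i lookup128(__m512i lo, __m512i hi, __m512i idx) {
    return _mm512_permutex2var_epi8(lo, idx, hi);   /* VPERMT2B */
}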
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type E4NF.nb.
VPERMT2W/D/Q/PS/PD—Full Permute from Two Tables Overwriting one Table
Opcode/Instruction: EVEX.128.66.0F38.W1 7D /r VPERMT2W xmm1 {k1}{z}, xmm2, xmm3/m128
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512BW
Description: Permute word integers from two tables in xmm3/m128 and xmm1 using indexes in xmm2 and store the result in xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W1 7D /r VPERMT2W ymm1 {k1}{z}, ymm2, ymm3/m256
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512BW
Description: Permute word integers from two tables in ymm3/m256 and ymm1 using indexes in ymm2 and store the result in ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W1 7D /r VPERMT2W zmm1 {k1}{z}, zmm2, zmm3/m512
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512BW
Description: Permute word integers from two tables in zmm3/m512 and zmm1 using indexes in zmm2 and store the result in zmm1 using writemask k1.

Opcode/Instruction: EVEX.128.66.0F38.W0 7E /r VPERMT2D xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute double-words from two tables in xmm3/m128/m32bcst and xmm1 using indexes in xmm2 and store the result in xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W0 7E /r VPERMT2D ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute double-words from two tables in ymm3/m256/m32bcst and ymm1 using indexes in ymm2 and store the result in ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W0 7E /r VPERMT2D zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Permute double-words from two tables in zmm3/m512/m32bcst and zmm1 using indices in zmm2 and store the result in zmm1 using writemask k1.

Opcode/Instruction: EVEX.128.66.0F38.W1 7E /r VPERMT2Q xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute quad-words from two tables in xmm3/m128/m64bcst and xmm1 using indexes in xmm2 and store the result in xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W1 7E /r VPERMT2Q ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute quad-words from two tables in ymm3/m256/m64bcst and ymm1 using indexes in ymm2 and store the result in ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W1 7E /r VPERMT2Q zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Permute quad-words from two tables in zmm3/m512/m64bcst and zmm1 using indices in zmm2 and store the result in zmm1 using writemask k1.

Opcode/Instruction: EVEX.128.66.0F38.W0 7F /r VPERMT2PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute single-precision FP values from two tables in xmm3/m128/m32bcst and xmm1 using indexes in xmm2 and store the result in xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W0 7F /r VPERMT2PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute single-precision FP values from two tables in ymm3/m256/m32bcst and ymm1 using indexes in ymm2 and store the result in ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W0 7F /r VPERMT2PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Permute single-precision FP values from two tables in zmm3/m512/m32bcst and zmm1 using indices in zmm2 and store the result in zmm1 using writemask k1.

Opcode/Instruction: EVEX.128.66.0F38.W1 7F /r VPERMT2PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute double-precision FP values from two tables in xmm3/m128/m64bcst and xmm1 using indexes in xmm2 and store the result in xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W1 7F /r VPERMT2PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Permute double-precision FP values from two tables in ymm3/m256/m64bcst and ymm1 using indexes in ymm2 and store the result in ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W1 7F /r VPERMT2PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst
Op/En: B; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Permute double-precision FP values from two tables in zmm3/m512/m64bcst and zmm1 using indices in zmm2 and store the result in zmm1 using writemask k1.
Instruction Operand Encoding

Op/En A: Tuple Type Full Mem; Operand 1: ModRM:reg (r, w); Operand 2: EVEX.vvvv (r); Operand 3: ModRM:r/m (r); Operand 4: NA
Op/En B: Tuple Type Full; Operand 1: ModRM:reg (r, w); Operand 2: EVEX.vvvv (r); Operand 3: ModRM:r/m (r); Operand 4: NA
Description
Permutes 16-bit/32-bit/64-bit values in the first operand and the third operand (the second source operand) using
indices in the second operand (the first source operand) to select elements from the first and third operands. The
selected elements are written to the destination operand (the first operand) according to the writemask k1.
The first and second operands are ZMM/YMM/XMM registers. The second operand contains input indices to select
elements from the two input tables in the 1st and 3rd operands. The first operand is also the destination of the
result.
D/Q/PS/PD element versions: The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit
memory location or a 512/256/128-bit vector broadcasted from a 32/64-bit memory location. Broadcast from the
low 32/64-bit memory location is performed if EVEX.b and the id bit for table selection are set (selecting table_2).
Dword/PS versions: The id bit for table selection is bit 4/3/2, depending on VL = 512, 256, 128. Bits
[3:0]/[2:0]/[1:0] of each element in the input index vector select an element within the two source operands. If
the id bit is 0, table_1 (the first source) is selected; otherwise the second source operand is selected.
Qword/PD versions: The id bit for table selection is bit 3/2/1, and bits [2:0]/[1:0]/bit 0 select the element within each
input table.
Word element versions: The second source operand can be a ZMM/YMM/XMM register, or a 512/256/128-bit
memory location. The id bit for table selection is bit 5/4/3, and bits [4:0]/[3:0]/[2:0] select the element within each
input table.
Note that these instructions permit a 16-bit/32-bit/64-bit value in the source operands to be copied to more than
one location in the destination operand. Note also that in this case, the same index can be reused for example for
a second iteration, while the table elements being permuted are overwritten.
Bits (MAXVL-1:256/128) of the destination are zeroed for VL=256,128.
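Illustration (informative, not part of the reference text): a scalar reference model of the VPERMT2D index decoding for VL = 512, written in C for clarity; the function name and array arguments are hypothetical.

#include <stdint.h>

/* Each 32-bit index uses bit 4 as the id bit to pick the table and
   bits [3:0] to pick one of the 16 dword elements within it. */
static uint32_t permt2d_elem512(const uint32_t t1[16], const uint32_t t2[16],
                                uint32_t index) {
    uint32_t off = index & 0xF;          /* element select, bits [3:0] */
    return (index & 0x10) ? t2[off]      /* id bit set: second table   */
                          : t1[off];     /* id bit clear: first table  */
}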
Operation
VPERMT2W (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
IF VL = 128
id ← 2
FI;
IF VL = 256
id ← 3
FI;
IF VL = 512
id ← 4
FI;
TMP_DEST ← DEST
FOR j ← 0 TO KL-1
i ← j * 16
off ← 16*SRC1[i+id:i]
IF k1[j] OR *no writemask*
THEN
DEST[i+15:i] ← SRC1[i+id+1] ? SRC2[off+15:off]
: TMP_DEST[off+15:off]
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+15:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+15:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPERMT2D/VPERMT2PS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF VL = 128
id ← 1
FI;
IF VL = 256
id ← 2
FI;
IF VL = 512
id ← 3
FI;
TMP_DEST ← DEST
FOR j ← 0 TO KL-1
i ← j * 32
off ← 32*SRC1[i+id:i]
IF k1[j] OR *no writemask*
THEN
IF (EVEX.b = 1) AND (SRC2 *is memory*)
THEN
DEST[i+31:i] ← SRC1[i+id+1] ? SRC2[31:0]
: TMP_DEST[off+31:off]
ELSE
DEST[i+31:i] ← SRC1[i+id+1] ? SRC2[off+31:off]
: TMP_DEST[off+31:off]
FI
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPERMT2Q/VPERMT2PD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF VL = 128
id ← 0
FI;
IF VL = 256
id ← 1
FI;
IF VL = 512
id ← 2
FI;
TMP_DEST ← DEST
FOR j ← 0 TO KL-1
i ← j * 64
off ← 64*SRC1[i+id:i]
IF k1[j] OR *no writemask*
THEN
IF (EVEX.b = 1) AND (SRC2 *is memory*)
THEN
DEST[i+63:i] ← SRC1[i+id+1] ? SRC2[63:0]
: TMP_DEST[off+63:off]
ELSE
DEST[i+63:i] ← SRC1[i+id+1] ? SRC2[off+63:off]
: TMP_DEST[off+63:off]
FI
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPERMT2D __m512i _mm512_permutex2var_epi32(__m512i a, __m512i idx, __m512i b);
VPERMT2D __m512i _mm512_mask_permutex2var_epi32(__m512i a, __mmask16 k, __m512i idx, __m512i b);
VPERMT2D __m512i _mm512_mask2_permutex2var_epi32(__m512i a, __m512i idx, __mmask16 k, __m512i b);
VPERMT2D __m512i _mm512_maskz_permutex2var_epi32(__mmask16 k, __m512i a, __m512i idx, __m512i b);
VPERMT2D __m256i _mm256_permutex2var_epi32(__m256i a, __m256i idx, __m256i b);
VPERMT2D __m256i _mm256_mask_permutex2var_epi32(__m256i a, __mmask8 k, __m256i idx, __m256i b);
VPERMT2D __m256i _mm256_mask2_permutex2var_epi32(__m256i a, __m256i idx, __mmask8 k, __m256i b);
VPERMT2D __m256i _mm256_maskz_permutex2var_epi32(__mmask8 k, __m256i a, __m256i idx, __m256i b);
VPERMT2D __m128i _mm_permutex2var_epi32(__m128i a, __m128i idx, __m128i b);
VPERMT2D __m128i _mm_mask_permutex2var_epi32(__m128i a, __mmask8 k, __m128i idx, __m128i b);
VPERMT2D __m128i _mm_mask2_permutex2var_epi32(__m128i a, __m128i idx, __mmask8 k, __m128i b);
VPERMT2D __m128i _mm_maskz_permutex2var_epi32(__mmask8 k, __m128i a, __m128i idx, __m128i b);
VPERMT2PD __m512d _mm512_permutex2var_pd(__m512d a, __m512i idx, __m512d b);
VPERMT2PD __m512d _mm512_mask_permutex2var_pd(__m512d a, __mmask8 k, __m512i idx, __m512d b);
VPERMT2PD __m512d _mm512_mask2_permutex2var_pd(__m512d a, __m512i idx, __mmask8 k, __m512d b);
VPERMT2PD __m512d _mm512_maskz_permutex2var_pd(__mmask8 k, __m512d a, __m512i idx, __m512d b);
VPERMT2PD __m256d _mm256_permutex2var_pd(__m256d a, __m256i idx, __m256d b);
VPERMT2PD __m256d _mm256_mask_permutex2var_pd(__m256d a, __mmask8 k, __m256i idx, __m256d b);
VPERMT2PD __m256d _mm256_mask2_permutex2var_pd(__m256d a, __m256i idx, __mmask8 k, __m256d b);
VPERMT2PD __m256d _mm256_maskz_permutex2var_pd(__mmask8 k, __m256d a, __m256i idx, __m256d b);
VPERMT2PD __m128d _mm_permutex2var_pd(__m128d a, __m128i idx, __m128d b);
VPERMT2PD __m128d _mm_mask_permutex2var_pd(__m128d a, __mmask8 k, __m128i idx, __m128d b);
VPERMT2PD __m128d _mm_mask2_permutex2var_pd(__m128d a, __m128i idx, __mmask8 k, __m128d b);
VPERMT2PD __m128d _mm_maskz_permutex2var_pd(__mmask8 k, __m128d a, __m128i idx, __m128d b);
VPERMT2PS __m512 _mm512_permutex2var_ps(__m512 a, __m512i idx, __m512 b);
VPERMT2PS __m512 _mm512_mask_permutex2var_ps(__m512 a, __mmask16 k, __m512i idx, __m512 b);
VPERMT2PS __m512 _mm512_mask2_permutex2var_ps(__m512 a, __m512i idx, __mmask16 k, __m512 b);
VPERMT2PS __m512 _mm512_maskz_permutex2var_ps(__mmask16 k, __m512 a, __m512i idx, __m512 b);
VPERMT2PS __m256 _mm256_permutex2var_ps(__m256 a, __m256i idx, __m256 b);
VPERMT2PS __m256 _mm256_mask_permutex2var_ps(__m256 a, __mmask8 k, __m256i idx, __m256 b);
VPERMT2PS __m256 _mm256_mask2_permutex2var_ps(__m256 a, __m256i idx, __mmask8 k, __m256 b);
VPERMT2PS __m256 _mm256_maskz_permutex2var_ps(__mmask8 k, __m256 a, __m256i idx, __m256 b);
VPERMT2PS __m128 _mm_permutex2var_ps(__m128 a, __m128i idx, __m128 b);
VPERMT2PS __m128 _mm_mask_permutex2var_ps(__m128 a, __mmask8 k, __m128i idx, __m128 b);
VPERMT2PS __m128 _mm_mask2_permutex2var_ps(__m128 a, __m128i idx, __mmask8 k, __m128 b);
VPERMT2PS __m128 _mm_maskz_permutex2var_ps(__mmask8 k, __m128 a, __m128i idx, __m128 b);
VPERMT2Q __m512i _mm512_permutex2var_epi64(__m512i a, __m512i idx, __m512i b);
VPERMT2Q __m512i _mm512_mask_permutex2var_epi64(__m512i a, __mmask8 k, __m512i idx, __m512i b);
VPERMT2Q __m512i _mm512_mask2_permutex2var_epi64(__m512i a, __m512i idx, __mmask8 k, __m512i b);
VPERMT2Q __m512i _mm512_maskz_permutex2var_epi64(__mmask8 k, __m512i a, __m512i idx, __m512i b);
VPERMT2Q __m256i _mm256_permutex2var_epi64(__m256i a, __m256i idx, __m256i b);
VPERMT2Q __m256i _mm256_mask_permutex2var_epi64(__m256i a, __mmask8 k, __m256i idx, __m256i b);
VPERMT2Q __m256i _mm256_mask2_permutex2var_epi64(__m256i a, __m256i idx, __mmask8 k, __m256i b);
VPERMT2Q __m256i _mm256_maskz_permutex2var_epi64(__mmask8 k, __m256i a, __m256i idx, __m256i b);
VPERMT2Q __m128i _mm_permutex2var_epi64(__m128i a, __m128i idx, __m128i b);
VPERMT2Q __m128i _mm_mask_permutex2var_epi64(__m128i a, __mmask8 k, __m128i idx, __m128i b);
VPERMT2Q __m128i _mm_mask2_permutex2var_epi64(__m128i a, __m128i idx, __mmask8 k, __m128i b);
VPERMT2Q __m128i _mm_maskz_permutex2var_epi64(__mmask8 k, __m128i a, __m128i idx, __m128i b);
VPERMT2W __m512i _mm512_permutex2var_epi16(__m512i a, __m512i idx, __m512i b);
VPERMT2W __m512i _mm512_mask_permutex2var_epi16(__m512i a, __mmask32 k, __m512i idx, __m512i b);
VPERMT2W __m512i _mm512_mask2_permutex2var_epi16(__m512i a, __m512i idx, __mmask32 k, __m512i b);
VPERMT2W __m512i _mm512_maskz_permutex2var_epi16(__mmask32 k, __m512i a, __m512i idx, __m512i b);
VPERMT2W __m256i _mm256_permutex2var_epi16(__m256i a, __m256i idx, __m256i b);
VPERMT2W __m256i _mm256_mask_permutex2var_epi16(__m256i a, __mmask16 k, __m256i idx, __m256i b);
VPERMT2W __m256i _mm256_mask2_permutex2var_epi16(__m256i a, __m256i idx, __mmask16 k, __m256i b);
VPERMT2W __m256i _mm256_maskz_permutex2var_epi16(__mmask16 k, __m256i a, __m256i idx, __m256i b);
VPERMT2W __m128i _mm_permutex2var_epi16(__m128i a, __m128i idx, __m128i b);
VPERMT2W __m128i _mm_mask_permutex2var_epi16(__m128i a, __mmask8 k, __m128i idx, __m128i b);
VPERMT2W __m128i _mm_mask2_permutex2var_epi16(__m128i a, __m128i idx, __mmask8 k, __m128i b);
VPERMT2W __m128i _mm_maskz_permutex2var_epi16(__mmask8 k, __m128i a, __m128i idx, __m128i b);
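Example (informative, not part of the reference text): a minimal sketch gathering the even-indexed floats from the 32-float concatenation of two ZMM registers; assumes <immintrin.h> and AVX512F support.

#include <immintrin.h>

/* Treat a as concat elements 0..15 and b as elements 16..31 (bit 4 of
   each index is the table-select id bit); result[i] = concat[2*i]. */
static __m512 even_elements(__m512 a, __m512 b) {
    const __m512i idx = _mm512_set_epi32(30, 28, 26, 24, 22, 20, 18, 16,
                                         14, 12, 10, 8, 6, 4, 2, 0);
    return _mm512_permutex2var_ps(a, idx, b);   /* VPERMT2PS */
}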
SIMD Floating-Point Exceptions
None.
Other Exceptions
VPERMT2D/Q/PS/PD: See Exceptions Type E4NF.
VPERMT2W: See Exceptions Type E4NF.nb.
VPEXPANDD—Load Sparse Packed Doubleword Integer Values from Dense Memory / Register
Opcode/Instruction: EVEX.128.66.0F38.W0 89 /r VPEXPANDD xmm1 {k1}{z}, xmm2/m128
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Expand packed double-word integer values from xmm2/m128 to xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W0 89 /r VPEXPANDD ymm1 {k1}{z}, ymm2/m256
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Expand packed double-word integer values from ymm2/m256 to ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W0 89 /r VPEXPANDD zmm1 {k1}{z}, zmm2/m512
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Expand packed double-word integer values from zmm2/m512 to zmm1 using writemask k1.

Instruction Operand Encoding

Op/En A: Tuple Type Tuple1 Scalar; Operand 1: ModRM:reg (w); Operand 2: ModRM:r/m (r); Operand 3: NA; Operand 4: NA
Description
Expand (load) up to 16 contiguous doubleword integer values of the input vector in the source operand (the second
operand) to sparse elements in the destination operand (the first operand), selected by the writemask k1. The
destination operand is a ZMM register, the source operand can be a ZMM register or memory location.
The input vector starts from the lowest element in the source operand. The opmask register k1 selects the desti-
nation elements (a partial vector or sparse elements if less than 8 elements) to be replaced by the ascending
elements in the input vector. Destination elements not selected by the writemask k1 are either unmodified or
zeroed, depending on EVEX.z.
Note: EVEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.
Note that the compressed displacement assumes a pre-scaling (N) corresponding to the size of one single element
instead of the size of the full vector.
Operation
VPEXPANDD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
k ← 0
FOR j ← 0 TO KL-1
i ← j * 32
IF k1[j] OR *no writemask*
THEN
DEST[i+31:i] ← SRC[k+31:k];
k ← k + 32
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+31:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPEXPANDD __m512i _mm512_mask_expandloadu_epi32(__m512i s, __mmask16 k, void * a);
VPEXPANDD __m512i _mm512_maskz_expandloadu_epi32( __mmask16 k, void * a);
VPEXPANDD __m512i _mm512_mask_expand_epi32(__m512i s, __mmask16 k, __m512i a);
VPEXPANDD __m512i _mm512_maskz_expand_epi32( __mmask16 k, __m512i a);
VPEXPANDD __m256i _mm256_mask_expandloadu_epi32(__m256i s, __mmask8 k, void * a);
VPEXPANDD __m256i _mm256_maskz_expandloadu_epi32( __mmask8 k, void * a);
VPEXPANDD __m256i _mm256_mask_expand_epi32(__m256i s, __mmask8 k, __m256i a);
VPEXPANDD __m256i _mm256_maskz_expand_epi32( __mmask8 k, __m256i a);
VPEXPANDD __m128i _mm_mask_expandloadu_epi32(__m128i s, __mmask8 k, void * a);
VPEXPANDD __m128i _mm_maskz_expandloadu_epi32( __mmask8 k, void * a);
VPEXPANDD __m128i _mm_mask_expand_epi32(__m128i s, __mmask8 k, __m128i a);
VPEXPANDD __m128i _mm_maskz_expand_epi32( __mmask8 k, __m128i a);
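Example (informative, not part of the reference text): a minimal expand-load sketch; assumes <immintrin.h> and AVX512F support, and that src points to at least four readable ints.

#include <immintrin.h>

/* Read four contiguous ints from src and place them, in ascending order,
   into the dword lanes whose mask bit is set (lanes 1, 4, 5, 7 for mask
   0xB2 = 1011 0010b); all other lanes are zeroed. */
static __m512i expand_demo(const int *src) {
    return _mm512_maskz_expandloadu_epi32((__mmask16)0xB2, src);
}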
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.nb.
#UD If EVEX.vvvv != 1111B.
VPEXPANDQ—Load Sparse Packed Quadword Integer Values from Dense Memory / Register
Opcode/Instruction: EVEX.128.66.0F38.W1 89 /r VPEXPANDQ xmm1 {k1}{z}, xmm2/m128
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Expand packed quad-word integer values from xmm2/m128 to xmm1 using writemask k1.

Opcode/Instruction: EVEX.256.66.0F38.W1 89 /r VPEXPANDQ ymm1 {k1}{z}, ymm2/m256
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Expand packed quad-word integer values from ymm2/m256 to ymm1 using writemask k1.

Opcode/Instruction: EVEX.512.66.0F38.W1 89 /r VPEXPANDQ zmm1 {k1}{z}, zmm2/m512
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Expand packed quad-word integer values from zmm2/m512 to zmm1 using writemask k1.

Instruction Operand Encoding

Op/En A: Tuple Type Tuple1 Scalar; Operand 1: ModRM:reg (w); Operand 2: ModRM:r/m (r); Operand 3: NA; Operand 4: NA
Description
Expand (load) up to 8 quadword integer values from the source operand (the second operand) to sparse elements
in the destination operand (the first operand), selected by the writemask k1. The destination operand is a ZMM
register, the source operand can be a ZMM register or memory location.
The input vector starts from the lowest element in the source operand. The opmask register k1 selects the desti-
nation elements (a partial vector or sparse elements if less than 8 elements) to be replaced by the ascending
elements in the input vector. Destination elements not selected by the writemask k1 are either unmodified or
zeroed, depending on EVEX.z.
Note: EVEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.
Note that the compressed displacement assumes a pre-scaling (N) corresponding to the size of one single element
instead of the size of the full vector.
Operation
VPEXPANDQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
k ← 0
FOR j ← 0 TO KL-1
i ← j * 64
IF k1[j] OR *no writemask*
THEN
DEST[i+63:i] ← SRC[k+63:k];
k ← k + 64
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE ; zeroing-masking
DEST[i+63:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPEXPANDQ __m512i _mm512_mask_expandloadu_epi64(__m512i s, __mmask8 k, void * a);
VPEXPANDQ __m512i _mm512_maskz_expandloadu_epi64( __mmask8 k, void * a);
VPEXPANDQ __m512i _mm512_mask_expand_epi64(__m512i s, __mmask8 k, __m512i a);
VPEXPANDQ __m512i _mm512_maskz_expand_epi64( __mmask8 k, __m512i a);
VPEXPANDQ __m256i _mm256_mask_expandloadu_epi64(__m256i s, __mmask8 k, void * a);
VPEXPANDQ __m256i _mm256_maskz_expandloadu_epi64( __mmask8 k, void * a);
VPEXPANDQ __m256i _mm256_mask_expand_epi64(__m256i s, __mmask8 k, __m256i a);
VPEXPANDQ __m256i _mm256_maskz_expand_epi64( __mmask8 k, __m256i a);
VPEXPANDQ __m128i _mm_mask_expandloadu_epi64(__m128i s, __mmask8 k, void * a);
VPEXPANDQ __m128i _mm_maskz_expandloadu_epi64( __mmask8 k, void * a);
VPEXPANDQ __m128i _mm_mask_expand_epi64(__m128i s, __mmask8 k, __m128i a);
VPEXPANDQ __m128i _mm_maskz_expand_epi64( __mmask8 k, __m128i a);
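Example (informative, not part of the reference text): a minimal register-to-register expand sketch; assumes <immintrin.h> and AVX512F support.

#include <immintrin.h>

/* Distribute the two lowest qwords of src into lanes 0 and 3 of the
   result (mask 0x09 = 0000 1001b); the remaining lanes keep the
   corresponding elements of s (merging form). */
static __m512i expand_q_demo(__m512i s, __m512i src) {
    return _mm512_mask_expand_epi64(s, (__mmask8)0x09, src);
}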
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.nb.
#UD If EVEX.vvvv != 1111B.
VPGATHERDD/VPGATHERQD — Gather Packed Dword Values Using Signed Dword/Qword Indices

Opcode/Instruction: VEX.128.66.0F38.W0 90 /r VPGATHERDD xmm1, vm32x, xmm2
Op/En: RMV; 64/32-bit Mode: V/V; CPUID Feature Flag: AVX2
Description: Using dword indices specified in vm32x, gather dword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.

Opcode/Instruction: VEX.128.66.0F38.W0 91 /r VPGATHERQD xmm1, vm64x, xmm2
Op/En: RMV; 64/32-bit Mode: V/V; CPUID Feature Flag: AVX2
Description: Using qword indices specified in vm64x, gather dword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.

Opcode/Instruction: VEX.256.66.0F38.W0 90 /r VPGATHERDD ymm1, vm32y, ymm2
Op/En: RMV; 64/32-bit Mode: V/V; CPUID Feature Flag: AVX2
Description: Using dword indices specified in vm32y, gather dword values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.

Opcode/Instruction: VEX.256.66.0F38.W0 91 /r VPGATHERQD xmm1, vm64y, xmm2
Op/En: RMV; 64/32-bit Mode: V/V; CPUID Feature Flag: AVX2
Description: Using qword indices specified in vm64y, gather dword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.

Instruction Operand Encoding

Op/En RMV: Operand 1: ModRM:reg (r, w); Operand 2: BaseReg (R): VSIB:base, VectorReg (R): VSIB:index; Operand 3: VEX.vvvv (r, w); Operand 4: NA

Description
The instruction conditionally loads up to 4 or 8 dword values from memory addresses specified by the memory
operand (the second operand) and using dword indices. The memory operand uses the VSIB form of the SIB byte
to specify a general purpose register operand as the common base, a vector register for an array of indices relative
to the base and a constant scale factor.
The mask operand (the third operand) specifies the conditional load operation from each memory address and the
corresponding update of each data element of the destination operand (the first operand). Conditionality is speci-
fied by the most significant bit of each data element of the mask register. If an element’s mask bit is not set, the
corresponding element of the destination register is left unchanged. The width of data element in the destination
register and mask register are identical. The entire mask register will be set to zero by this instruction unless the
instruction causes an exception.
Using qword indices, the instruction conditionally loads up to 2 or 4 qword values from the VSIB addressing
memory operand, and updates the lower half of the destination register. The upper 128 or 256 bits of the destina-
tion register are zero’ed with qword indices.
This instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception
is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination
register and the mask operand are partially updated; those elements that have been gathered are placed into the
destination register and have their mask bits set to zero. If any traps or interrupts are pending from already gath-
ered elements, they will be delivered in lieu of the exception; in this case, EFLAG.RF is set to one so an instruction
breakpoint is not re-triggered when the instruction is continued.
If the data size and index size are different, part of the destination register and part of the mask register do not
correspond to any elements being gathered. This instruction sets those parts to zero. It may do this to one or both
of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception
before gathering any elements.
VEX.128 version: For dword indices, the instruction will gather four dword values. For qword indices, the instruc-
tion will gather two values and zero the upper 64 bits of the destination.
VEX.256 version: For dword indices, the instruction will gather eight dword values. For qword indices, the instruc-
tion will gather four values and zero the upper 128 bits of the destination.
Note that:
• If any pair of the index, mask, or destination registers are the same, this instruction results in a #UD fault.
• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.
• Elements may be gathered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable - given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• This instruction will cause a #UD if the address size attribute is 16-bit.
• This instruction will cause a #UD if the memory operand is encoded without the SIB byte.
• This instruction should not be used to access memory mapped I/O as the ordering of the individual loads it does is implementation specific, and some implementations may use loads larger than the data element size or load elements an indeterminate number of times.
• The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are ignored.
Operation
DEST ← SRC1;
BASE_ADDR: base register encoded in VSIB addressing;
VINDEX: the vector index register encoded by VSIB addressing;
SCALE: scale factor encoded by SIB:[7:6];
DISP: optional 1, 4 byte displacement;
MASK ← SRC3;

VPGATHERDD (VEX.128 version)
MASK[MAXVL-1:128] ← 0;
FOR j ← 0 to 3
i ← j * 32;
IF MASK[31+i] THEN
MASK[i+31:i] ← FFFFFFFFH; // extend from most significant bit
ELSE
MASK[i+31:i] ← 0;
FI;
ENDFOR
FOR j ← 0 to 3
i ← j * 32;
DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX[i+31:i])*SCALE) + DISP;
IF MASK[31+i] THEN
DEST[i+31:i] ← FETCH_32BITS(DATA_ADDR); // a fault exits the instruction
FI;
MASK[i+31:i] ← 0;
ENDFOR
DEST[MAXVL-1:128] ← 0;
VPGATHERQD (VEX.128 version)
MASK[MAXVL-1:64] ← 0;
FOR j ← 0 to 3
i ← j * 32;
IF MASK[31+i] THEN
MASK[i+31:i] ← FFFFFFFFH; // extend from most significant bit
ELSE
MASK[i+31:i] ← 0;
FI;
ENDFOR
FOR j ← 0 to 1
k ← j * 64;
i ← j * 32;
DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX[k+63:k])*SCALE) + DISP;
IF MASK[31+i] THEN
DEST[i+31:i] ← FETCH_32BITS(DATA_ADDR); // a fault exits the instruction
FI;
MASK[i+31:i] ← 0;
ENDFOR
DEST[MAXVL-1:64] ← 0;

VPGATHERDD (VEX.256 version)
MASK[MAXVL-1:256] ← 0;
FOR j ← 0 to 7
i ← j * 32;
IF MASK[31+i] THEN
MASK[i+31:i] ← FFFFFFFFH; // extend from most significant bit
ELSE
MASK[i+31:i] ← 0;
FI;
ENDFOR
FOR j ← 0 to 7
i ← j * 32;
DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX[i+31:i])*SCALE) + DISP;
IF MASK[31+i] THEN
DEST[i+31:i] ← FETCH_32BITS(DATA_ADDR); // a fault exits the instruction
FI;
MASK[i+31:i] ← 0;
ENDFOR
DEST[MAXVL-1:256] ← 0;
VPGATHERQD (VEX.256 version)
MASK[MAXVL-1:128] ← 0;
FOR j ← 0 to 7
i ← j * 32;
IF MASK[31+i] THEN
MASK[i+31:i] ← FFFFFFFFH; // extend from most significant bit
ELSE
MASK[i+31:i] ← 0;
FI;
ENDFOR
FOR j ← 0 to 3
k ← j * 64;
i ← j * 32;
DATA_ADDR ← BASE_ADDR + (SignExtend(VINDEX[k+63:k])*SCALE) + DISP;
IF MASK[31+i] THEN
DEST[i+31:i] ← FETCH_32BITS(DATA_ADDR); // a fault exits the instruction
FI;
MASK[i+31:i] ← 0;
ENDFOR
DEST[MAXVL-1:128] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPGATHERDD: __m128i _mm_i32gather_epi32 (int const * base, __m128i index, const int scale);
VPGATHERDD: __m128i _mm_mask_i32gather_epi32 (__m128i src, int const * base, __m128i index, __m128i mask, const int scale);
VPGATHERDD: __m256i _mm256_i32gather_epi32 ( int const * base, __m256i index, const int scale);
VPGATHERDD: __m256i _mm256_mask_i32gather_epi32 (__m256i src, int const * base, __m256i index, __m256i mask, const int
scale);
VPGATHERQD: __m128i _mm_i64gather_epi32 (int const * base, __m128i index, const int scale);
VPGATHERQD: __m128i _mm_mask_i64gather_epi32 (__m128i src, int const * base, __m128i index, __m128i mask, const int scale);
VPGATHERQD: __m128i _mm256_i64gather_epi32 (int const * base, __m256i index, const int scale);
VPGATHERQD: __m128i _mm256_mask_i64gather_epi32 (__m128i src, int const * base, __m256i index, __m128i mask, const int
scale);
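Example (informative, not part of the reference text): a minimal AVX2 gather sketch; assumes <immintrin.h> and that every index in idx addresses a readable int. The scale argument must be a compile-time constant of 1, 2, 4, or 8; 4 matches sizeof(int) here.

#include <immintrin.h>

/* result[i] = table[idx[i]] for eight dword elements. */
static __m256i gather8(const int *table, __m256i idx) {
    return _mm256_i32gather_epi32(table, idx, 4);   /* VPGATHERDD */
}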
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 12.
VPGATHERDD/VPGATHERDQ—Gather Packed Dword, Packed Qword with Signed Dword Indices
Opcode/Instruction: EVEX.128.66.0F38.W0 90 /vsib VPGATHERDD xmm1 {k1}, vm32x
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Using signed dword indices, gather dword values from memory using writemask k1 for merging-masking.

Opcode/Instruction: EVEX.256.66.0F38.W0 90 /vsib VPGATHERDD ymm1 {k1}, vm32y
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Using signed dword indices, gather dword values from memory using writemask k1 for merging-masking.

Opcode/Instruction: EVEX.512.66.0F38.W0 90 /vsib VPGATHERDD zmm1 {k1}, vm32z
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Using signed dword indices, gather dword values from memory using writemask k1 for merging-masking.

Opcode/Instruction: EVEX.128.66.0F38.W1 90 /vsib VPGATHERDQ xmm1 {k1}, vm32x
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Using signed dword indices, gather quadword values from memory using writemask k1 for merging-masking.

Opcode/Instruction: EVEX.256.66.0F38.W1 90 /vsib VPGATHERDQ ymm1 {k1}, vm32x
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512VL AVX512F
Description: Using signed dword indices, gather quadword values from memory using writemask k1 for merging-masking.

Opcode/Instruction: EVEX.512.66.0F38.W1 90 /vsib VPGATHERDQ zmm1 {k1}, vm32y
Op/En: A; 64/32 bit Mode Support: V/V; CPUID Feature Flag: AVX512F
Description: Using signed dword indices, gather quadword values from memory using writemask k1 for merging-masking.

Instruction Operand Encoding

Op/En A: Tuple Type Tuple1 Scalar; Operand 1: ModRM:reg (w); Operand 2: BaseReg (R): VSIB:base, VectorReg (R): VSIB:index; Operand 3: NA; Operand 4: NA

Description
A set of 16 or 8 doubleword/quadword memory locations pointed to by base address BASE_ADDR and index vector
VINDEX with scale SCALE is gathered. The result is written into vector zmm1. The elements are specified via the
VSIB (i.e., the index register is a zmm, holding packed indices). Elements will only be loaded if their corresponding
mask bit is one. If an element’s mask bit is not set, the corresponding element of the destination register (zmm1)
is left unchanged. The entire mask register will be set to zero by this instruction unless it triggers an exception.
This instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception
is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination
register and the mask register (k1) are partially updated; those elements that have been gathered are placed into
the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already
gathered elements, they will be delivered in lieu of the exception; in this case, EFLAG.RF is set to one so an instruc-
tion breakpoint is not re-triggered when the instruction is continued.
If the data element size is less than the index element size, the higher part of the destination register and the mask
register do not correspond to any elements being gathered. This instruction sets those higher parts to zero. It may
do this to one or both of those registers even if the instruction triggers an exception, and even if the instruction
triggers the exception before gathering any elements.
Note that:
• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination zmm will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.
• Elements may be gathered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable - given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• Not valid with 16-bit effective addresses. Will deliver a #UD fault.
• These instructions do not accept zeroing-masking, since the 0 values in k1 are used to determine completion.
Note that the presence of the VSIB byte is enforced in this instruction. Hence, the instruction will #UD fault if
ModRM.r/m is different from 100b.
This instruction has the same disp8*N and alignment rules as for scalar instructions (Tuple 1).
The instruction will #UD fault if the destination vector zmm1 is the same as index vector VINDEX. The instruction
will #UD fault if the k0 mask register is specified.
The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit
mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are
ignored.
Operation
BASE_ADDR stands for the memory operand base address (a GPR); may not exist
VINDEX stands for the memory operand vector of indices (a ZMM register)
SCALE stands for the memory operand scalar (1, 2, 4 or 8)
DISP is the optional 1 or 4 byte displacement

VPGATHERDD (EVEX encoded version)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
i ← j * 32
IF k1[j]
THEN DEST[i+31:i] ← MEM[BASE_ADDR + SignExtend(VINDEX[i+31:i]) * SCALE + DISP]
k1[j] ← 0
ELSE *DEST[i+31:i] remains unchanged* ; Only merging masking is allowed
FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
DEST[MAXVL-1:VL] ← 0

VPGATHERDQ (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
i ← j * 64
k ← j * 32
IF k1[j]
THEN DEST[i+63:i] ← MEM[BASE_ADDR + SignExtend(VINDEX[k+31:k]) * SCALE + DISP]
k1[j] ← 0
ELSE *DEST[i+63:i] remains unchanged* ; Only merging masking is allowed
FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPGATHERDD __m512i _mm512_i32gather_epi32( __m512i vdx, void * base, int scale);
VPGATHERDD __m512i _mm512_mask_i32gather_epi32(__m512i s, __mmask16 k, __m512i vdx, void * base, int scale);
VPGATHERDD __m256i _mm256_mmask_i32gather_epi32(__m256i s, __mmask8 k, __m256i vdx, void * base, int scale);
VPGATHERDD __m128i _mm_mmask_i32gather_epi32(__m128i s, __mmask8 k, __m128i vdx, void * base, int scale);
VPGATHERDQ __m512i _mm512_i32logather_epi64( __m256i vdx, void * base, int scale);
VPGATHERDQ __m512i _mm512_mask_i32logather_epi64(__m512i s, __mmask8 k, __m256i vdx, void * base, int scale);
VPGATHERDQ __m256i _mm256_mmask_i32logather_epi64(__m256i s, __mmask8 k, __m128i vdx, void * base, int scale);
VPGATHERDQ __m128i _mm_mmask_i32gather_epi64(__m128i s, __mmask8 k, __m128i vdx, void * base, int scale);
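Example (informative, not part of the reference text): a minimal EVEX masked-gather sketch; assumes <immintrin.h>, AVX512F support, and that every index selected by k1 addresses a readable int.

#include <immintrin.h>

/* Lanes whose k1 bit is set receive table[idx[i]]; lanes whose bit is
   clear keep the corresponding element of src (merging-masking only). */
static __m512i gather16_masked(__m512i src, __mmask16 k1,
                               __m512i idx, const int *table) {
    return _mm512_mask_i32gather_epi32(src, k1, idx, table, 4);
}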
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E12.
VPGATHERDQ/VPGATHERQQ — Gather Packed Qword Values Using Signed Dword/Qword Indices

Opcode/Instruction: VEX.128.66.0F38.W1 90 /r VPGATHERDQ xmm1, vm32x, xmm2
Op/En: A; 64/32-bit Mode: V/V; CPUID Feature Flag: AVX2
Description: Using dword indices specified in vm32x, gather qword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.

Opcode/Instruction: VEX.128.66.0F38.W1 91 /r VPGATHERQQ xmm1, vm64x, xmm2
Op/En: A; 64/32-bit Mode: V/V; CPUID Feature Flag: AVX2
Description: Using qword indices specified in vm64x, gather qword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1.

Opcode/Instruction: VEX.256.66.0F38.W1 90 /r VPGATHERDQ ymm1, vm32x, ymm2
Op/En: A; 64/32-bit Mode: V/V; CPUID Feature Flag: AVX2
Description: Using dword indices specified in vm32x, gather qword values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.

Opcode/Instruction: VEX.256.66.0F38.W1 91 /r VPGATHERQQ ymm1, vm64y, ymm2
Op/En: A; 64/32-bit Mode: V/V; CPUID Feature Flag: AVX2
Description: Using qword indices specified in vm64y, gather qword values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1.

Instruction Operand Encoding

Op/En A: Operand 1: ModRM:reg (r, w); Operand 2: BaseReg (R): VSIB:base, VectorReg (R): VSIB:index; Operand 3: VEX.vvvv (r, w); Operand 4: NA

Description
The instruction conditionally loads up to 2 or 4 qword values from memory addresses specified by the memory
operand (the second operand) and using qword indices. The memory operand uses the VSIB form of the SIB byte
to specify a general purpose register operand as the common base, a vector register for an array of indices relative
to the base and a constant scale factor.
The mask operand (the third operand) specifies the conditional load operation from each memory address and the
corresponding update of each data element of the destination operand (the first operand). Conditionality is speci-
fied by the most significant bit of each data element of the mask register. If an element’s mask bit is not set, the
corresponding element of the destination register is left unchanged. The width of data element in the destination
register and mask register are identical. The entire mask register will be set to zero by this instruction unless the
instruction causes an exception.
Using dword indices in the lower half of the mask register, the instruction conditionally loads up to 2 or 4 qword
values from the VSIB addressing memory operand, and updates the destination register.
This instruction can be suspended by an exception if at least one element is already gathered (i.e., if the exception
is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination
register and the mask operand are partially updated; those elements that have been gathered are placed into the
destination register and have their mask bits set to zero. If any traps or interrupts are pending from already gath-
ered elements, they will be delivered in lieu of the exception; in this case, EFLAG.RF is set to one so an instruction
breakpoint is not re-triggered when the instruction is continued.
If the data size and index size are different, part of the destination register and part of the mask register do not
correspond to any elements being gathered. This instruction sets those parts to zero. It may do this to one or both
of those registers even if the instruction triggers an exception, and even if the instruction triggers the exception
before gathering any elements.
VEX.128 version: The instruction will gather two qword values. For dword indices, only the lower two indices in the
vector index register are used.
VEX.256 version: The instruction will gather four qword values. For dword indices, only the lower four indices in the
vector index register are used.
Note that:
• If any pair of the index, mask, or destination registers are the same, this instruction results in a #UD fault.
• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.
• Elements may be gathered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable - given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• This instruction will cause a #UD if the address size attribute is 16-bit.
• This instruction will cause a #UD if the memory operand is encoded without the SIB byte.
• This instruction should not be used to access memory mapped I/O as the ordering of the individual loads it does is implementation specific, and some implementations may use loads larger than the data element size or load elements an indeterminate number of times.
• The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are ignored.
Operation
DEST ← SRC1;
BASE_ADDR: base register encoded in VSIB addressing;
VINDEX: the vector index register encoded by VSIB addressing;
SCALE: scale factor encoded by SIB:[7:6];
DISP: optional 1, 4 byte displacement;
MASK ← SRC3;
VPGATHERDQ (VEX.128 version)
MASK[MAXVL-1:128] ← 0;
FOR j ← 0 to 1
    i ← j * 64;
    IF MASK[63+i] THEN
        MASK[i+63:i] ← FFFFFFFF_FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+63:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 1
    k ← j * 32;
    i ← j * 64;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[k+31:k])*SCALE + DISP;
    IF MASK[63+i] THEN
        DEST[i+63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+63:i] ← 0;
ENDFOR
DEST[MAXVL-1:128] ← 0;
VPGATHERDQ/VPGATHERQQ — Gather Packed Qword Values Using Signed Dword/Qword Indices
INSTRUCTION SET REFERENCE, V-Z
5-390 Vol. 2C
VPGATHERQQ (VEX.128 version)
MASK[MAXVL-1:128] ← 0;
FOR j ← 0 to 1
    i ← j * 64;
    IF MASK[63+i] THEN
        MASK[i+63:i] ← FFFFFFFF_FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+63:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 1
    i ← j * 64;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[i+63:i])*SCALE + DISP;
    IF MASK[63+i] THEN
        DEST[i+63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+63:i] ← 0;
ENDFOR
DEST[MAXVL-1:128] ← 0;
VPGATHERQQ (VEX.256 version)
MASK[MAXVL-1:256] ← 0;
FOR j ← 0 to 3
    i ← j * 64;
    IF MASK[63+i] THEN
        MASK[i+63:i] ← FFFFFFFF_FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+63:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 3
    i ← j * 64;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[i+63:i])*SCALE + DISP;
    IF MASK[63+i] THEN
        DEST[i+63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+63:i] ← 0;
ENDFOR
DEST[MAXVL-1:256] ← 0;
VPGATHERDQ (VEX.256 version)
MASK[MAXVL-1:256] ← 0;
FOR j ← 0 to 3
    i ← j * 64;
    IF MASK[63+i] THEN
        MASK[i+63:i] ← FFFFFFFF_FFFFFFFFH; // extend from most significant bit
    ELSE
        MASK[i+63:i] ← 0;
    FI;
ENDFOR
FOR j ← 0 to 3
    k ← j * 32;
    i ← j * 64;
    DATA_ADDR ← BASE_ADDR + SignExtend(VINDEX[k+31:k])*SCALE + DISP;
    IF MASK[63+i] THEN
        DEST[i+63:i] ← FETCH_64BITS(DATA_ADDR); // a fault exits the instruction
    FI;
    MASK[i+63:i] ← 0;
ENDFOR
DEST[MAXVL-1:256] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPGATHERDQ: __m128i _mm_i32gather_epi64 (__int64 const * base, __m128i index, const int scale);
VPGATHERDQ: __m128i _mm_mask_i32gather_epi64 (__m128i src, __int64 const * base, __m128i index, __m128i mask, const int scale);
VPGATHERDQ: __m256i _mm256_i32gather_epi64 (__int64 const * base, __m128i index, const int scale);
VPGATHERDQ: __m256i _mm256_mask_i32gather_epi64 (__m256i src, __int64 const * base, __m128i index, __m256i mask, const int scale);
VPGATHERQQ: __m128i _mm_i64gather_epi64 (__int64 const * base, __m128i index, const int scale);
VPGATHERQQ: __m128i _mm_mask_i64gather_epi64 (__m128i src, __int64 const * base, __m128i index, __m128i mask, const int scale);
VPGATHERQQ: __m256i _mm256_i64gather_epi64 (__int64 const * base, __m256i index, const int scale);
VPGATHERQQ: __m256i _mm256_mask_i64gather_epi64 (__m256i src, __int64 const * base, __m256i index, __m256i mask, const int scale);
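The following C fragment is an informative sketch, not part of the instruction specification; the table contents, indices, and mask pattern are illustrative assumptions. It shows the merging form of the dword-index gather via the intrinsics above, and assumes an AVX2-capable processor and a compiler option such as -mavx2.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: gather qwords from a hypothetical table using dword
   indices. Elements whose mask qword has its most significant bit clear
   are taken from src instead of from memory. */
int main(void) {
    long long table[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    __m128i index = _mm_setr_epi32(7, 0, 3, 5);            /* dword indices */
    __m256i src   = _mm256_set1_epi64x(-1);                /* fallback values */
    __m256i mask  = _mm256_setr_epi64x(~0LL, 0, ~0LL, 0);  /* MSB set = load */
    __m256i r = _mm256_mask_i32gather_epi64(src, table, index, mask, 8);
    long long out[4];
    _mm256_storeu_si256((__m256i *)out, r);
    printf("%lld %lld %lld %lld\n", out[0], out[1], out[2], out[3]);
    /* expected: 17 -1 13 -1 */
    return 0;
}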
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 12.
VPGATHERQD/VPGATHERQQ—Gather Packed Dword, Packed Qword with Signed Qword Indices
Description
A set of 8 doubleword/quadword memory locations, pointed to by base address BASE_ADDR and index vector VINDEX
with scale SCALE, is gathered. The result is written into a vector register. The elements are specified via the VSIB
(i.e., the index register is a vector register, holding packed indices). Elements will only be loaded if their
corresponding mask bit is one. If an element’s mask bit is not set, the corresponding element of the destination
register is left unchanged. The entire mask register will be set to zero by this instruction unless it triggers an exception.
This instruction can be suspended by an exception if at least one element has already been gathered (i.e., if the exception
is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination
register and the mask register (k1) are partially updated; those elements that have been gathered are placed into
the destination register and have their mask bits set to zero. If any traps or interrupts are pending from already-gathered
elements, they will be delivered in lieu of the exception; in this case, EFLAGS.RF is set to one so an instruction
breakpoint is not re-triggered when the instruction is continued.
If the data element size is less than the index element size, the higher part of the destination register and the mask
register do not correspond to any elements being gathered. This instruction sets those higher parts to zero. It may
set these unused parts of one or both of those registers to zero even if the instruction triggers an exception, and
even if it triggers the exception before gathering any elements.
Note that:
• The values may be read from memory in any order. Memory ordering with other instructions follows the Intel-64 memory-ordering model.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination zmm will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.
• Elements may be gathered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be gathered before the fault is delivered. A given implementation of this instruction is repeatable: given the same input values and architectural state, the same set of elements to the left of the faulting one will be gathered.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 91 /vsib VPGATHERQD xmm1 {k1}, vm64x | A | V/V | AVX512VL AVX512F | Using signed qword indices, gather dword values from memory using writemask k1 for merging-masking.
EVEX.256.66.0F38.W0 91 /vsib VPGATHERQD xmm1 {k1}, vm64y | A | V/V | AVX512VL AVX512F | Using signed qword indices, gather dword values from memory using writemask k1 for merging-masking.
EVEX.512.66.0F38.W0 91 /vsib VPGATHERQD ymm1 {k1}, vm64z | A | V/V | AVX512F | Using signed qword indices, gather dword values from memory using writemask k1 for merging-masking.
EVEX.128.66.0F38.W1 91 /vsib VPGATHERQQ xmm1 {k1}, vm64x | A | V/V | AVX512VL AVX512F | Using signed qword indices, gather quadword values from memory using writemask k1 for merging-masking.
EVEX.256.66.0F38.W1 91 /vsib VPGATHERQQ ymm1 {k1}, vm64y | A | V/V | AVX512VL AVX512F | Using signed qword indices, gather quadword values from memory using writemask k1 for merging-masking.
EVEX.512.66.0F38.W1 91 /vsib VPGATHERQQ zmm1 {k1}, vm64z | A | V/V | AVX512F | Using signed qword indices, gather quadword values from memory using writemask k1 for merging-masking.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | BaseReg (R): VSIB:base, VectorReg (R): VSIB:index | NA | NA
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• Not valid with 16-bit effective addresses; will deliver a #UD fault.
• These instructions do not accept zeroing-masking, since the 0 values in k1 are used to determine completion.
Note that the presence of the VSIB byte is enforced in this instruction. Hence, the instruction will #UD fault if
ModRM.rm is different than 100b.
This instruction has the same disp8*N and alignment rules as for scalar instructions (Tuple 1).
The instruction will #UD fault if the destination vector zmm1 is the same as the index vector VINDEX. The instruction
will #UD fault if the k0 mask register is specified.
The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit
mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are
ignored.
Operation
BASE_ADDR stands for the memory operand base address (a GPR); may not exist
VINDEX stands for the memory operand vector of indices (a ZMM register)
SCALE stands for the memory operand scalar (1, 2, 4 or 8)
DISP is the optional 1 or 4 byte displacement
VPGATHERQD (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j]
        THEN DEST[i+31:i] ← MEM[BASE_ADDR + (VINDEX[k+63:k]) * SCALE + DISP]
            k1[j] ← 0
        ELSE *DEST[i+31:i] remains unchanged* ; Only merging masking is allowed
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
DEST[MAXVL-1:VL/2] ← 0
VPGATHERQQ (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j]
        THEN DEST[i+63:i] ← MEM[BASE_ADDR + (VINDEX[i+63:i]) * SCALE + DISP]
            k1[j] ← 0
        ELSE *DEST[i+63:i] remains unchanged* ; Only merging masking is allowed
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPGATHERQD __m256i _mm512_i64gather_epi32(__m512i vdx, void * base, int scale);
VPGATHERQD __m256i _mm512_mask_i64gather_epi32lo(__m256i s, __mmask8 k, __m512i vdx, void * base, int scale);
VPGATHERQD __m128i _mm256_mask_i64gather_epi32lo(__m128i s, __mmask8 k, __m256i vdx, void * base, int scale);
VPGATHERQD __m128i _mm_mask_i64gather_epi32(__m128i s, __mmask8 k, __m128i vdx, void * base, int scale);
VPGATHERQQ __m512i _mm512_i64gather_epi64( __m512i vdx, void * base, int scale);
VPGATHERQQ __m512i _mm512_mask_i64gather_epi64(__m512i s, __mmask8 k, __m512i vdx, void * base, int scale);
VPGATHERQQ __m256i _mm256_mask_i64gather_epi64(__m256i s, __mmask8 k, __m256i vdx, void * base, int scale);
VPGATHERQQ __m128i _mm_mask_i64gather_epi64(__m128i s, __mmask8 k, __m128i vdx, void * base, int scale);
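The following C fragment is an informative sketch, not part of the SDM text; the table contents and indices are illustrative assumptions. It uses the unmasked form _mm512_i64gather_epi64 listed above, and assumes an AVX512F-capable processor and a compiler option such as -mavx512f.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: gather eight qwords through signed qword indices. */
int main(void) {
    long long table[16];
    for (int n = 0; n < 16; n++) table[n] = 100 + n;
    __m512i vindex = _mm512_setr_epi64(15, 1, 9, 3, 8, 5, 2, 0);
    __m512i r = _mm512_i64gather_epi64(vindex, table, 8); /* scale = 8 bytes */
    long long out[8];
    _mm512_storeu_si512(out, r);
    for (int n = 0; n < 8; n++) printf("%lld ", out[n]);
    printf("\n"); /* expected: 115 101 109 103 108 105 102 100 */
    return 0;
}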
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E12.
VPLZCNTD/Q—Count the Number of Leading Zero Bits for Packed Dword, Packed Qword Values
Description
Counts the number of leading most significant zero bits in each dword or qword element of the source operand (the
second operand) and stores the results in the destination register (the first operand) according to the writemask.
If an element is zero, the result for that element is the operand size of the element.
EVEX.512 encoded version: The source operand is a ZMM register, a 512-bit memory location, or a 512-bit vector
broadcasted from a 32/64-bit memory location. The destination operand is a ZMM register, conditionally updated
using writemask k1.
EVEX.256 encoded version: The source operand is a YMM register, a 256-bit memory location, or a 256-bit vector
broadcasted from a 32/64-bit memory location. The destination operand is a YMM register, conditionally updated
using writemask k1.
EVEX.128 encoded version: The source operand is an XMM register, a 128-bit memory location, or a 128-bit vector
broadcasted from a 32/64-bit memory location. The destination operand is an XMM register, conditionally updated
using writemask k1.
EVEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 44 /r VPLZCNTD xmm1 {k1}{z}, xmm2/m128/m32bcst | A | V/V | AVX512VL AVX512CD | Count the number of leading zero bits in each dword element of xmm2/m128/m32bcst using writemask k1.
EVEX.256.66.0F38.W0 44 /r VPLZCNTD ymm1 {k1}{z}, ymm2/m256/m32bcst | A | V/V | AVX512VL AVX512CD | Count the number of leading zero bits in each dword element of ymm2/m256/m32bcst using writemask k1.
EVEX.512.66.0F38.W0 44 /r VPLZCNTD zmm1 {k1}{z}, zmm2/m512/m32bcst | A | V/V | AVX512CD | Count the number of leading zero bits in each dword element of zmm2/m512/m32bcst using writemask k1.
EVEX.128.66.0F38.W1 44 /r VPLZCNTQ xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512CD | Count the number of leading zero bits in each qword element of xmm2/m128/m64bcst using writemask k1.
EVEX.256.66.0F38.W1 44 /r VPLZCNTQ ymm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512CD | Count the number of leading zero bits in each qword element of ymm2/m256/m64bcst using writemask k1.
EVEX.512.66.0F38.W1 44 /r VPLZCNTQ zmm1 {k1}{z}, zmm2/m512/m64bcst | A | V/V | AVX512CD | Count the number of leading zero bits in each qword element of zmm2/m512/m64bcst using writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Operation
VPLZCNTD
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j*32
    IF MaskBit(j) OR *no writemask*
        THEN
            temp ← 32
            DEST[i+31:i] ← 0
            WHILE (temp > 0) AND (SRC[i+temp-1] = 0)
            DO
                temp ← temp - 1
                DEST[i+31:i] ← DEST[i+31:i] + 1
            OD
        ELSE
            IF *merging-masking*
                THEN *DEST[i+31:i] remains unchanged*
                ELSE DEST[i+31:i] ← 0
            FI
    FI
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPLZCNTQ
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j*64
    IF MaskBit(j) OR *no writemask*
        THEN
            temp ← 64
            DEST[i+63:i] ← 0
            WHILE (temp > 0) AND (SRC[i+temp-1] = 0)
            DO
                temp ← temp - 1
                DEST[i+63:i] ← DEST[i+63:i] + 1
            OD
        ELSE
            IF *merging-masking*
                THEN *DEST[i+63:i] remains unchanged*
                ELSE DEST[i+63:i] ← 0
            FI
    FI
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPLZCNTD __m512i _mm512_lzcnt_epi32(__m512i a);
VPLZCNTD __m512i _mm512_mask_lzcnt_epi32(__m512i s, __mmask16 m, __m512i a);
VPLZCNTD __m512i _mm512_maskz_lzcnt_epi32( __mmask16 m, __m512i a);
VPLZCNTQ __m512i _mm512_lzcnt_epi64(__m512i a);
VPLZCNTQ __m512i _mm512_mask_lzcnt_epi64(__m512i s, __mmask8 m, __m512i a);
VPLZCNTQ __m512i _mm512_maskz_lzcnt_epi64(__mmask8 m, __m512i a);
VPLZCNTD __m256i _mm256_lzcnt_epi32(__m256i a);
VPLZCNTD __m256i _mm256_mask_lzcnt_epi32(__m256i s, __mmask8 m, __m256i a);
VPLZCNTD __m256i _mm256_maskz_lzcnt_epi32( __mmask8 m, __m256i a);
VPLZCNTQ __m256i _mm256_lzcnt_epi64(__m256i a);
VPLZCNTQ __m256i _mm256_mask_lzcnt_epi64(__m256i s, __mmask8 m, __m256i a);
VPLZCNTQ __m256i _mm256_maskz_lzcnt_epi64(__mmask8 m, __m256i a);
VPLZCNTD __m128i _mm_lzcnt_epi32(__m128i a);
VPLZCNTD __m128i _mm_mask_lzcnt_epi32(__m128i s, __mmask8 m, __m128i a);
VPLZCNTD __m128i _mm_maskz_lzcnt_epi32( __mmask8 m, __m128i a);
VPLZCNTQ __m128i _mm_lzcnt_epi64(__m128i a);
VPLZCNTQ __m128i _mm_mask_lzcnt_epi64(__m128i s, __mmask8 m, __m128i a);
VPLZCNTQ __m128i _mm_maskz_lzcnt_epi64(__mmask8 m, __m128i a);
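The following C fragment is an informative sketch, not part of the SDM text; the input values are illustrative assumptions. It demonstrates the per-element leading-zero count, including the all-zero case, which yields the element width. It assumes an AVX512CD-capable processor and a compiler option such as -mavx512cd.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: leading-zero count of 16 dwords. */
int main(void) {
    __m512i v = _mm512_setr_epi32(0, 1, (int)0x80000000u, 0xFF, 4, 5, 6, 7,
                                  8, 9, 10, 11, 12, 13, 14, 15);
    __m512i c = _mm512_lzcnt_epi32(v);
    int out[16];
    _mm512_storeu_si512(out, c);
    printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]); /* 32 31 0 24 */
    return 0;
}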
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.
VPMADD52HUQ—Packed Multiply of Unsigned 52-bit Integers and Add High 52-bit Products to 64-bit Accumulators
Description
Multiplies packed unsigned 52-bit integers in each qword element of the first source operand (the second oper-
and) with the packed unsigned 52-bit integers in the corresponding elements of the second source operand (the
third operand) to form packed 104-bit intermediate results. The high 52-bit, unsigned integer of each 104-bit
product is added to the corresponding qword unsigned integer of the destination operand (the first operand)
under the writemask k1.
The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM reg-
ister, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 64-bit memory loca-
tion. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1 at 64-bit
granularity.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 B5 /r VPMADD52HUQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | A | V/V | AVX512_IFMA AVX512VL | Multiply unsigned 52-bit integers in xmm2 and xmm3/m128/m64bcst and add the high 52 bits of the 104-bit product to the qword unsigned integers in xmm1 using writemask k1.
EVEX.256.66.0F38.W1 B5 /r VPMADD52HUQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | A | V/V | AVX512_IFMA AVX512VL | Multiply unsigned 52-bit integers in ymm2 and ymm3/m256/m64bcst and add the high 52 bits of the 104-bit product to the qword unsigned integers in ymm1 using writemask k1.
EVEX.512.66.0F38.W1 B5 /r VPMADD52HUQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | A | V/V | AVX512_IFMA | Multiply unsigned 52-bit integers in zmm2 and zmm3/m512/m64bcst and add the high 52 bits of the 104-bit product to the qword unsigned integers in zmm1 using writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Operation
VPMADD52HUQ (EVEX encoded)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64;
    IF k1[j] OR *no writemask* THEN
        IF src2 is Memory AND EVEX.b=1 THEN
            tsrc2[63:0] ← ZeroExtend64(src2[51:0]);
        ELSE
            tsrc2[63:0] ← ZeroExtend64(src2[i+51:i]);
        FI;
        Temp128[127:0] ← ZeroExtend64(src1[i+51:i]) * tsrc2[63:0];
        Temp2[63:0] ← DEST[i+63:i] + ZeroExtend64(Temp128[103:52]);
        DEST[i+63:i] ← Temp2[63:0];
    ELSE
        IF *zeroing-masking* THEN
            DEST[i+63:i] ← 0;
        ELSE *merge-masking*
            DEST[i+63:i] is unchanged;
        FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPMADD52HUQ __m512i _mm512_madd52hi_epu64( __m512i a, __m512i b, __m512i c);
VPMADD52HUQ __m512i _mm512_mask_madd52hi_epu64(__m512i s, __mmask8 k, __m512i a, __m512i b, __m512i c);
VPMADD52HUQ __m512i _mm512_maskz_madd52hi_epu64( __mmask8 k, __m512i a, __m512i b, __m512i c);
VPMADD52HUQ __m256i _mm256_madd52hi_epu64( __m256i a, __m256i b, __m256i c);
VPMADD52HUQ __m256i _mm256_mask_madd52hi_epu64(__m256i s, __mmask8 k, __m256i a, __m256i b, __m256i c);
VPMADD52HUQ __m256i _mm256_maskz_madd52hi_epu64( __mmask8 k, __m256i a, __m256i b, __m256i c);
VPMADD52HUQ __m128i _mm_madd52hi_epu64( __m128i a, __m128i b, __m128i c);
VPMADD52HUQ __m128i _mm_mask_madd52hi_epu64(__m128i s, __mmask8 k, __m128i a, __m128i b, __m128i c);
VPMADD52HUQ __m128i _mm_maskz_madd52hi_epu64( __mmask8 k, __m128i a, __m128i b, __m128i c);
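The following C fragment is an informative sketch, not part of the SDM text; the operand values are illustrative assumptions chosen so the high half of the product is easy to verify by hand. It assumes an AVX512_IFMA-capable processor and a compiler option such as -mavx512ifma.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: accumulate the high 52 bits of a 52x52-bit product. */
int main(void) {
    __m512i acc = _mm512_set1_epi64(1);          /* qword accumulators */
    __m512i a   = _mm512_set1_epi64(1LL << 51);  /* 52-bit operand */
    __m512i b   = _mm512_set1_epi64(4);
    /* product = 2^53; bits 103:52 of the product = 2, so each lane = 1 + 2 */
    __m512i r = _mm512_madd52hi_epu64(acc, a, b);
    long long out[8];
    _mm512_storeu_si512(out, r);
    printf("%lld\n", out[0]); /* expected: 3 */
    return 0;
}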
Flags Affected
None.
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
VPMADD52LUQ—Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
Description
Multiplies packed unsigned 52-bit integers in each qword element of the first source operand (the second oper-
and) with the packed unsigned 52-bit integers in the corresponding elements of the second source operand (the
third operand) to form packed 104-bit intermediate results. The low 52-bit, unsigned integer of each 104-bit
product is added to the corresponding qword unsigned integer of the destination operand (the first operand)
under the writemask k1.
The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM reg-
ister, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 64-bit memory loca-
tion. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1 at 64-bit
granularity.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 B4 /r VPMADD52LUQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | A | V/V | AVX512_IFMA AVX512VL | Multiply unsigned 52-bit integers in xmm2 and xmm3/m128/m64bcst and add the low 52 bits of the 104-bit product to the qword unsigned integers in xmm1 using writemask k1.
EVEX.256.66.0F38.W1 B4 /r VPMADD52LUQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | A | V/V | AVX512_IFMA AVX512VL | Multiply unsigned 52-bit integers in ymm2 and ymm3/m256/m64bcst and add the low 52 bits of the 104-bit product to the qword unsigned integers in ymm1 using writemask k1.
EVEX.512.66.0F38.W1 B4 /r VPMADD52LUQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | A | V/V | AVX512_IFMA | Multiply unsigned 52-bit integers in zmm2 and zmm3/m512/m64bcst and add the low 52 bits of the 104-bit product to the qword unsigned integers in zmm1 using writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Operation
VPMADD52LUQ (EVEX encoded)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64;
    IF k1[j] OR *no writemask* THEN
        IF src2 is Memory AND EVEX.b=1 THEN
            tsrc2[63:0] ← ZeroExtend64(src2[51:0]);
        ELSE
            tsrc2[63:0] ← ZeroExtend64(src2[i+51:i]);
        FI;
        Temp128[127:0] ← ZeroExtend64(src1[i+51:i]) * tsrc2[63:0];
        Temp2[63:0] ← DEST[i+63:i] + ZeroExtend64(Temp128[51:0]);
        DEST[i+63:i] ← Temp2[63:0];
    ELSE
        IF *zeroing-masking* THEN
            DEST[i+63:i] ← 0;
        ELSE *merge-masking*
            DEST[i+63:i] is unchanged;
        FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPMADD52LUQ __m512i _mm512_madd52lo_epu64( __m512i a, __m512i b, __m512i c);
VPMADD52LUQ __m512i _mm512_mask_madd52lo_epu64(__m512i s, __mmask8 k, __m512i a, __m512i b, __m512i c);
VPMADD52LUQ __m512i _mm512_maskz_madd52lo_epu64( __mmask8 k, __m512i a, __m512i b, __m512i c);
VPMADD52LUQ __m256i _mm256_madd52lo_epu64( __m256i a, __m256i b, __m256i c);
VPMADD52LUQ __m256i _mm256_mask_madd52lo_epu64(__m256i s, __mmask8 k, __m256i a, __m256i b, __m256i c);
VPMADD52LUQ __m256i _mm256_maskz_madd52lo_epu64( __mmask8 k, __m256i a, __m256i b, __m256i c);
VPMADD52LUQ __m128i _mm_madd52lo_epu64( __m128i a, __m128i b, __m128i c);
VPMADD52LUQ __m128i _mm_mask_madd52lo_epu64(__m128i s, __mmask8 k, __m128i a, __m128i b, __m128i c);
VPMADD52LUQ __m128i _mm_maskz_madd52lo_epu64( __mmask8 k, __m128i a, __m128i b, __m128i c);
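The following C fragment is an informative sketch, not part of the SDM text; the operand values are illustrative assumptions. It pairs the low-half intrinsic above with its high-half counterpart: together the two recover the full 104-bit product split at bit 52. It assumes an AVX512_IFMA-capable processor and a compiler option such as -mavx512ifma.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: low and high 52-bit halves of a 52x52-bit product. */
int main(void) {
    __m512i zero = _mm512_setzero_si512();
    __m512i a = _mm512_set1_epi64(3);
    __m512i b = _mm512_set1_epi64(5);
    __m512i lo = _mm512_madd52lo_epu64(zero, a, b); /* low 52 bits of 15 */
    __m512i hi = _mm512_madd52hi_epu64(zero, a, b); /* high 52 bits (0 here) */
    long long l[8], h[8];
    _mm512_storeu_si512(l, lo);
    _mm512_storeu_si512(h, hi);
    printf("lo=%lld hi=%lld\n", l[0], h[0]); /* expected: lo=15 hi=0 */
    return 0;
}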
Flags Affected
None.
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
VPMASKMOV — Conditional SIMD Integer Packed Loads and Stores
Description
Conditionally moves packed data elements from the second source operand into the corresponding data element of
the destination operand, depending on the mask bits associated with each data element. The mask bits are speci-
fied in the first source operand.
The mask bit for each data element is the most significant bit of that element in the first source operand. If a mask
is 1, the corresponding data element is copied from the second source operand to the destination operand. If the
mask is 0, the corresponding data element is set to zero in the load form of these instructions, and unmodified in
the store form.
The second source operand is a memory address for the load form of these instructions. The destination operand is
a memory address for the store form of these instructions. The other operands are either XMM registers (for
VEX.128 version) or YMM registers (for VEX.256 version).
Faults occur only for memory accesses required by mask bits that are set. Faults will not occur due to
referencing any memory location if the corresponding mask bit for that memory location is 0. For example, no
faults will be detected if the mask bits are all zero.
Unlike previous MASKMOV instructions (MASKMOVQ and MASKMOVDQU), a nontemporal hint is not applied to
these instructions.
Instruction behavior on alignment check reporting with mask bits of less than all 1s is the same as with mask bits
of all 1s.
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 8C /r VPMASKMOVD xmm1, xmm2, m128 | RVM | V/V | AVX2 | Conditionally load dword values from m128 using mask in xmm2 and store in xmm1.
VEX.256.66.0F38.W0 8C /r VPMASKMOVD ymm1, ymm2, m256 | RVM | V/V | AVX2 | Conditionally load dword values from m256 using mask in ymm2 and store in ymm1.
VEX.128.66.0F38.W1 8C /r VPMASKMOVQ xmm1, xmm2, m128 | RVM | V/V | AVX2 | Conditionally load qword values from m128 using mask in xmm2 and store in xmm1.
VEX.256.66.0F38.W1 8C /r VPMASKMOVQ ymm1, ymm2, m256 | RVM | V/V | AVX2 | Conditionally load qword values from m256 using mask in ymm2 and store in ymm1.
VEX.128.66.0F38.W0 8E /r VPMASKMOVD m128, xmm1, xmm2 | MVR | V/V | AVX2 | Conditionally store dword values from xmm2 using mask in xmm1.
VEX.256.66.0F38.W0 8E /r VPMASKMOVD m256, ymm1, ymm2 | MVR | V/V | AVX2 | Conditionally store dword values from ymm2 using mask in ymm1.
VEX.128.66.0F38.W1 8E /r VPMASKMOVQ m128, xmm1, xmm2 | MVR | V/V | AVX2 | Conditionally store qword values from xmm2 using mask in xmm1.
VEX.256.66.0F38.W1 8E /r VPMASKMOVQ m256, ymm1, ymm2 | MVR | V/V | AVX2 | Conditionally store qword values from ymm2 using mask in ymm1.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
MVR | ModRM:r/m (w) | VEX.vvvv (r) | ModRM:reg (r) | NA
VMASKMOV should not be used to access memory mapped I/O as the ordering of the individual loads or stores it
does is implementation specific.
In cases where mask bits indicate data should not be loaded or stored, paging A and D bits will be set in an
implementation-dependent way. However, A and D bits are always set for pages where data is actually loaded/stored.
Note: for load forms, the first source (the mask) is encoded in VEX.vvvv; the second source is encoded in rm_field,
and the destination register is encoded in reg_field.
Note: for store forms, the first source (the mask) is encoded in VEX.vvvv; the second source register is encoded in
reg_field, and the destination memory location is encoded in rm_field.
Operation
VPMASKMOVD - 256-bit load
DEST[31:0] ← IF (SRC1[31]) Load_32(mem) ELSE 0
DEST[63:32] ← IF (SRC1[63]) Load_32(mem + 4) ELSE 0
DEST[95:64] ← IF (SRC1[95]) Load_32(mem + 8) ELSE 0
DEST[127:96] ← IF (SRC1[127]) Load_32(mem + 12) ELSE 0
DEST[159:128] ← IF (SRC1[159]) Load_32(mem + 16) ELSE 0
DEST[191:160] ← IF (SRC1[191]) Load_32(mem + 20) ELSE 0
DEST[223:192] ← IF (SRC1[223]) Load_32(mem + 24) ELSE 0
DEST[255:224] ← IF (SRC1[255]) Load_32(mem + 28) ELSE 0
VPMASKMOVD - 128-bit load
DEST[31:0] ← IF (SRC1[31]) Load_32(mem) ELSE 0
DEST[63:32] ← IF (SRC1[63]) Load_32(mem + 4) ELSE 0
DEST[95:64] ← IF (SRC1[95]) Load_32(mem + 8) ELSE 0
DEST[127:96] ← IF (SRC1[127]) Load_32(mem + 12) ELSE 0
DEST[MAXVL-1:128] ← 0
VPMASKMOVQ - 256-bit load
DEST[63:0] ← IF (SRC1[63]) Load_64(mem) ELSE 0
DEST[127:64] ← IF (SRC1[127]) Load_64(mem + 8) ELSE 0
DEST[191:128] ← IF (SRC1[191]) Load_64(mem + 16) ELSE 0
DEST[255:192] ← IF (SRC1[255]) Load_64(mem + 24) ELSE 0
VPMASKMOVQ - 128-bit load
DEST[63:0] ← IF (SRC1[63]) Load_64(mem) ELSE 0
DEST[127:64] ← IF (SRC1[127]) Load_64(mem + 8) ELSE 0
DEST[MAXVL-1:128] ← 0
VPMASKMOVD - 256-bit store
IF (SRC1[31]) DEST[31:0] ← SRC2[31:0]
IF (SRC1[63]) DEST[63:32] ← SRC2[63:32]
IF (SRC1[95]) DEST[95:64] ← SRC2[95:64]
IF (SRC1[127]) DEST[127:96] ← SRC2[127:96]
IF (SRC1[159]) DEST[159:128] ← SRC2[159:128]
IF (SRC1[191]) DEST[191:160] ← SRC2[191:160]
IF (SRC1[223]) DEST[223:192] ← SRC2[223:192]
IF (SRC1[255]) DEST[255:224] ← SRC2[255:224]
VPMASKMOVD - 128-bit store
IF (SRC1[31]) DEST[31:0] ← SRC2[31:0]
IF (SRC1[63]) DEST[63:32] ← SRC2[63:32]
IF (SRC1[95]) DEST[95:64] ← SRC2[95:64]
IF (SRC1[127]) DEST[127:96] ← SRC2[127:96]
VPMASKMOVQ - 256-bit store
IF (SRC1[63]) DEST[63:0] ← SRC2[63:0]
IF (SRC1[127]) DEST[127:64] ← SRC2[127:64]
IF (SRC1[191]) DEST[191:128] ← SRC2[191:128]
IF (SRC1[255]) DEST[255:192] ← SRC2[255:192]
VPMASKMOVQ - 128-bit store
IF (SRC1[63]) DEST[63:0] ← SRC2[63:0]
IF (SRC1[127]) DEST[127:64] ← SRC2[127:64]
Intel C/C++ Compiler Intrinsic Equivalent
VPMASKMOVD: __m256i _mm256_maskload_epi32(int const *a, __m256i mask);
VPMASKMOVD: void _mm256_maskstore_epi32(int *a, __m256i mask, __m256i b);
VPMASKMOVQ: __m256i _mm256_maskload_epi64(__int64 const *a, __m256i mask);
VPMASKMOVQ: void _mm256_maskstore_epi64(__int64 *a, __m256i mask, __m256i b);
VPMASKMOVD: __m128i _mm_maskload_epi32(int const *a, __m128i mask);
VPMASKMOVD: void _mm_maskstore_epi32(int *a, __m128i mask, __m128i b);
VPMASKMOVQ: __m128i _mm_maskload_epi64(__int64 const *a, __m128i mask);
VPMASKMOVQ: void _mm_maskstore_epi64(__int64 *a, __m128i mask, __m128i b);
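The following C fragment is an informative sketch, not part of the SDM text; the array contents and mask pattern are illustrative assumptions. It shows a masked load followed by a masked store: lanes whose mask element has its MSB clear read as zero and do not fault on the load, and leave memory unmodified on the store. It assumes an AVX2-capable processor and a compiler option such as -mavx2.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: conditional dword load and store. */
int main(void) {
    int src[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int dst[8] = {0};
    __m256i mask = _mm256_setr_epi32(-1, 0, -1, 0, -1, 0, -1, 0);
    __m256i v = _mm256_maskload_epi32(src, mask);  /* loads lanes 0,2,4,6 */
    _mm256_maskstore_epi32(dst, mask, v);          /* writes lanes 0,2,4,6 */
    for (int n = 0; n < 8; n++) printf("%d ", dst[n]);
    printf("\n"); /* expected: 1 0 3 0 5 0 7 0 */
    return 0;
}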
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 6 (No AC# reported for any mask bit combinations).
VPMOVB2M/VPMOVW2M/VPMOVD2M/VPMOVQ2M—Convert a Vector Register to a Mask
Description
Converts a vector register to a mask register. Each bit in the destination mask register is set to 1 or 0 depending on
the value of the most significant bit of the corresponding element in the source register.
The source operand is a ZMM/YMM/XMM register. The destination operand is a mask register.
EVEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W0 29 /r VPMOVB2M k1, xmm1 | RM | V/V | AVX512VL AVX512BW | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding byte in XMM1.
EVEX.256.F3.0F38.W0 29 /r VPMOVB2M k1, ymm1 | RM | V/V | AVX512VL AVX512BW | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding byte in YMM1.
EVEX.512.F3.0F38.W0 29 /r VPMOVB2M k1, zmm1 | RM | V/V | AVX512BW | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding byte in ZMM1.
EVEX.128.F3.0F38.W1 29 /r VPMOVW2M k1, xmm1 | RM | V/V | AVX512VL AVX512BW | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding word in XMM1.
EVEX.256.F3.0F38.W1 29 /r VPMOVW2M k1, ymm1 | RM | V/V | AVX512VL AVX512BW | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding word in YMM1.
EVEX.512.F3.0F38.W1 29 /r VPMOVW2M k1, zmm1 | RM | V/V | AVX512BW | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding word in ZMM1.
EVEX.128.F3.0F38.W0 39 /r VPMOVD2M k1, xmm1 | RM | V/V | AVX512VL AVX512DQ | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding doubleword in XMM1.
EVEX.256.F3.0F38.W0 39 /r VPMOVD2M k1, ymm1 | RM | V/V | AVX512VL AVX512DQ | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding doubleword in YMM1.
EVEX.512.F3.0F38.W0 39 /r VPMOVD2M k1, zmm1 | RM | V/V | AVX512DQ | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding doubleword in ZMM1.
EVEX.128.F3.0F38.W1 39 /r VPMOVQ2M k1, xmm1 | RM | V/V | AVX512VL AVX512DQ | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding quadword in XMM1.
EVEX.256.F3.0F38.W1 39 /r VPMOVQ2M k1, ymm1 | RM | V/V | AVX512VL AVX512DQ | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding quadword in YMM1.
EVEX.512.F3.0F38.W1 39 /r VPMOVQ2M k1, zmm1 | RM | V/V | AVX512DQ | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding quadword in ZMM1.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Operation
VPMOVB2M (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    IF SRC[i+7]
        THEN DEST[j] ← 1
        ELSE DEST[j] ← 0
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
VPMOVW2M (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF SRC[i+15]
        THEN DEST[j] ← 1
        ELSE DEST[j] ← 0
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
VPMOVD2M (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF SRC[i+31]
        THEN DEST[j] ← 1
        ELSE DEST[j] ← 0
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
VPMOVQ2M (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF SRC[i+63]
        THEN DEST[j] ← 1
        ELSE DEST[j] ← 0
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalents
VPMOVB2M __mmask64 _mm512_movepi8_mask( __m512i );
VPMOVD2M __mmask16 _mm512_movepi32_mask( __m512i );
VPMOVQ2M __mmask8 _mm512_movepi64_mask( __m512i );
VPMOVW2M __mmask32 _mm512_movepi16_mask( __m512i );
VPMOVB2M __mmask32 _mm256_movepi8_mask( __m256i );
VPMOVD2M __mmask8 _mm256_movepi32_mask( __m256i );
VPMOVQ2M __mmask8 _mm256_movepi64_mask( __m256i );
VPMOVW2M __mmask16 _mm256_movepi16_mask( __m256i );
VPMOVB2M __mmask16 _mm_movepi8_mask( __m128i );
VPMOVD2M __mmask8 _mm_movepi32_mask( __m128i );
VPMOVQ2M __mmask8 _mm_movepi64_mask( __m128i );
VPMOVW2M __mmask8 _mm_movepi16_mask( __m128i );
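The following C fragment is an informative sketch, not part of the SDM text; the input values are illustrative assumptions. It extracts the sign bits of 16 dwords into a 16-bit mask, and assumes an AVX512DQ-capable processor and a compiler option such as -mavx512dq.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: vector register to mask conversion. */
int main(void) {
    __m512i v = _mm512_setr_epi32(-1, 2, -3, 4, -5, 6, -7, 8,
                                  9, -10, 11, -12, 13, -14, 15, -16);
    __mmask16 m = _mm512_movepi32_mask(v); /* bit j = MSB of element j */
    printf("0x%04x\n", (unsigned)m);       /* expected: 0xaa55 */
    return 0;
}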
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E7NM.
#UD If EVEX.vvvv != 1111B.
VPMOVDB/VPMOVSDB/VPMOVUSDB—Down Convert DWord to Byte
Description
VPMOVDB down-converts 32-bit integer elements in the source operand (the second operand) into packed bytes
using truncation. VPMOVSDB converts signed 32-bit integers into packed signed bytes using signed saturation.
VPMOVUSDB converts unsigned double-word values into unsigned byte values using unsigned saturation.
The source operand is a ZMM/YMM/XMM register. The destination operand is a XMM register or a 128/64/32-bit
memory location.
Down-converted byte elements are written to the destination operand (the first operand) from the least-significant
byte. Byte elements of the destination operand are updated according to the writemask. Bits (MAXVL-1:128/64/32)
of the register destination are zeroed.
EVEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W0 31 /r VPMOVDB xmm1/m32 {k1}{z}, xmm2 | A | V/V | AVX512VL AVX512F | Converts 4 packed double-word integers from xmm2 into 4 packed byte integers in xmm1/m32 with truncation under writemask k1.
EVEX.128.F3.0F38.W0 21 /r VPMOVSDB xmm1/m32 {k1}{z}, xmm2 | A | V/V | AVX512VL AVX512F | Converts 4 packed signed double-word integers from xmm2 into 4 packed signed byte integers in xmm1/m32 using signed saturation under writemask k1.
EVEX.128.F3.0F38.W0 11 /r VPMOVUSDB xmm1/m32 {k1}{z}, xmm2 | A | V/V | AVX512VL AVX512F | Converts 4 packed unsigned double-word integers from xmm2 into 4 packed unsigned byte integers in xmm1/m32 using unsigned saturation under writemask k1.
EVEX.256.F3.0F38.W0 31 /r VPMOVDB xmm1/m64 {k1}{z}, ymm2 | A | V/V | AVX512VL AVX512F | Converts 8 packed double-word integers from ymm2 into 8 packed byte integers in xmm1/m64 with truncation under writemask k1.
EVEX.256.F3.0F38.W0 21 /r VPMOVSDB xmm1/m64 {k1}{z}, ymm2 | A | V/V | AVX512VL AVX512F | Converts 8 packed signed double-word integers from ymm2 into 8 packed signed byte integers in xmm1/m64 using signed saturation under writemask k1.
EVEX.256.F3.0F38.W0 11 /r VPMOVUSDB xmm1/m64 {k1}{z}, ymm2 | A | V/V | AVX512VL AVX512F | Converts 8 packed unsigned double-word integers from ymm2 into 8 packed unsigned byte integers in xmm1/m64 using unsigned saturation under writemask k1.
EVEX.512.F3.0F38.W0 31 /r VPMOVDB xmm1/m128 {k1}{z}, zmm2 | A | V/V | AVX512F | Converts 16 packed double-word integers from zmm2 into 16 packed byte integers in xmm1/m128 with truncation under writemask k1.
EVEX.512.F3.0F38.W0 21 /r VPMOVSDB xmm1/m128 {k1}{z}, zmm2 | A | V/V | AVX512F | Converts 16 packed signed double-word integers from zmm2 into 16 packed signed byte integers in xmm1/m128 using signed saturation under writemask k1.
EVEX.512.F3.0F38.W0 11 /r VPMOVUSDB xmm1/m128 {k1}{z}, zmm2 | A | V/V | AVX512F | Converts 16 packed unsigned double-word integers from zmm2 into 16 packed unsigned byte integers in xmm1/m128 using unsigned saturation under writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Quarter Mem | ModRM:r/m (w) | ModRM:reg (r) | NA | NA
Operation
VPMOVDB instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← TruncateDoubleWordToByte (SRC[m+31:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/4] ← 0;
VPMOVDB instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← TruncateDoubleWordToByte (SRC[m+31:m])
        ELSE *DEST[i+7:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
VPMOVSDB instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateSignedDoubleWordToByte (SRC[m+31:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/4] ← 0;
VPMOVSDB instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateSignedDoubleWordToByte (SRC[m+31:m])
        ELSE *DEST[i+7:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
VPMOVUSDB instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateUnsignedDoubleWordToByte (SRC[m+31:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/4] ← 0;
VPMOVUSDB instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateUnsignedDoubleWordToByte (SRC[m+31:m])
        ELSE *DEST[i+7:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
Intel C/C++ Compiler Intrinsic Equivalents
VPMOVDB __m128i _mm512_cvtepi32_epi8( __m512i a);
VPMOVDB __m128i _mm512_mask_cvtepi32_epi8(__m128i s, __mmask16 k, __m512i a);
VPMOVDB __m128i _mm512_maskz_cvtepi32_epi8( __mmask16 k, __m512i a);
VPMOVDB void _mm512_mask_cvtepi32_storeu_epi8(void * d, __mmask16 k, __m512i a);
VPMOVSDB __m128i _mm512_cvtsepi32_epi8( __m512i a);
VPMOVSDB __m128i _mm512_mask_cvtsepi32_epi8(__m128i s, __mmask16 k, __m512i a);
VPMOVSDB __m128i _mm512_maskz_cvtsepi32_epi8( __mmask16 k, __m512i a);
VPMOVSDB void _mm512_mask_cvtsepi32_storeu_epi8(void * d, __mmask16 k, __m512i a);
VPMOVUSDB __m128i _mm512_cvtusepi32_epi8( __m512i a);
VPMOVUSDB __m128i _mm512_mask_cvtusepi32_epi8(__m128i s, __mmask16 k, __m512i a);
VPMOVUSDB __m128i _mm512_maskz_cvtusepi32_epi8( __mmask16 k, __m512i a);
VPMOVUSDB void _mm512_mask_cvtusepi32_storeu_epi8(void * d, __mmask16 k, __m512i a);
VPMOVUSDB __m128i _mm256_cvtusepi32_epi8(__m256i a);
VPMOVUSDB __m128i _mm256_mask_cvtusepi32_epi8(__m128i a, __mmask8 k, __m256i b);
VPMOVUSDB __m128i _mm256_maskz_cvtusepi32_epi8( __mmask8 k, __m256i b);
VPMOVUSDB void _mm256_mask_cvtusepi32_storeu_epi8(void * , __mmask8 k, __m256i b);
VPMOVUSDB __m128i _mm_cvtusepi32_epi8(__m128i a);
VPMOVUSDB __m128i _mm_mask_cvtusepi32_epi8(__m128i a, __mmask8 k, __m128i b);
VPMOVUSDB __m128i _mm_maskz_cvtusepi32_epi8( __mmask8 k, __m128i b);
VPMOVUSDB void _mm_mask_cvtusepi32_storeu_epi8(void * , __mmask8 k, __m128i b);
VPMOVSDB __m128i _mm256_cvtsepi32_epi8(__m256i a);
VPMOVSDB __m128i _mm256_mask_cvtsepi32_epi8(__m128i a, __mmask8 k, __m256i b);
VPMOVSDB __m128i _mm256_maskz_cvtsepi32_epi8( __mmask8 k, __m256i b);
VPMOVSDB void _mm256_mask_cvtsepi32_storeu_epi8(void * , __mmask8 k, __m256i b);
VPMOVSDB __m128i _mm_cvtsepi32_epi8(__m128i a);
VPMOVSDB __m128i _mm_mask_cvtsepi32_epi8(__m128i a, __mmask8 k, __m128i b);
VPMOVSDB __m128i _mm_maskz_cvtsepi32_epi8( __mmask8 k, __m128i b);
VPMOVSDB void _mm_mask_cvtsepi32_storeu_epi8(void * , __mmask8 k, __m128i b);
VPMOVDB __m128i _mm256_cvtepi32_epi8(__m256i a);
VPMOVDB __m128i _mm256_mask_cvtepi32_epi8(__m128i a, __mmask8 k, __m256i b);
VPMOVDB __m128i _mm256_maskz_cvtepi32_epi8( __mmask8 k, __m256i b);
VPMOVDB void _mm256_mask_cvtepi32_storeu_epi8(void * , __mmask8 k, __m256i b);
VPMOVDB __m128i _mm_cvtepi32_epi8(__m128i a);
VPMOVDB __m128i _mm_mask_cvtepi32_epi8(__m128i a, __mmask8 k, __m128i b);
VPMOVDB __m128i _mm_maskz_cvtepi32_epi8( __mmask8 k, __m128i b);
VPMOVDB void _mm_mask_cvtepi32_storeu_epi8(void * , __mmask8 k, __m128i b);
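The following C fragment is an informative sketch, not part of the SDM text; the input value is an illustrative assumption chosen to not fit in a byte. It contrasts truncation with signed saturation when narrowing dwords, and assumes an AVX512F-capable processor and a compiler option such as -mavx512f.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: truncating vs. saturating dword-to-byte down-convert. */
int main(void) {
    __m512i v = _mm512_set1_epi32(300);       /* 300 = 0x12C, > 127 */
    __m128i t = _mm512_cvtepi32_epi8(v);      /* truncate: 300 & 0xFF */
    __m128i s = _mm512_cvtsepi32_epi8(v);     /* signed saturate: 127 */
    signed char tb[16], sb[16];
    _mm_storeu_si128((__m128i *)tb, t);
    _mm_storeu_si128((__m128i *)sb, s);
    printf("trunc=%d sat=%d\n", tb[0], sb[0]); /* expected: trunc=44 sat=127 */
    return 0;
}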
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E6.
#UD If EVEX.vvvv != 1111B.
VPMOVDW/VPMOVSDW/VPMOVUSDW—Down Convert DWord to Word
Description
VPMOVDW down-converts 32-bit integer elements in the source operand (the second operand) into packed words
using truncation. VPMOVSDW converts signed 32-bit integers into packed signed words using signed saturation.
VPMOVUSDW converts unsigned double-word values into unsigned word values using unsigned saturation.
The source operand is a ZMM/YMM/XMM register. The destination operand is a YMM/XMM/XMM register or a
256/128/64-bit memory location.
Down-converted word elements are written to the destination operand (the first operand) from the least-significant
word. Word elements of the destination operand are updated according to the writemask. Bits (MAXVL-1:256/128/64)
of the register destination are zeroed.
EVEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W0 33 /r VPMOVDW xmm1/m64 {k1}{z}, xmm2 | A | V/V | AVX512VL AVX512F | Converts 4 packed double-word integers from xmm2 into 4 packed word integers in xmm1/m64 with truncation under writemask k1.
EVEX.128.F3.0F38.W0 23 /r VPMOVSDW xmm1/m64 {k1}{z}, xmm2 | A | V/V | AVX512VL AVX512F | Converts 4 packed signed double-word integers from xmm2 into 4 packed signed word integers in xmm1/m64 using signed saturation under writemask k1.
EVEX.128.F3.0F38.W0 13 /r VPMOVUSDW xmm1/m64 {k1}{z}, xmm2 | A | V/V | AVX512VL AVX512F | Converts 4 packed unsigned double-word integers from xmm2 into 4 packed unsigned word integers in xmm1/m64 using unsigned saturation under writemask k1.
EVEX.256.F3.0F38.W0 33 /r VPMOVDW xmm1/m128 {k1}{z}, ymm2 | A | V/V | AVX512VL AVX512F | Converts 8 packed double-word integers from ymm2 into 8 packed word integers in xmm1/m128 with truncation under writemask k1.
EVEX.256.F3.0F38.W0 23 /r VPMOVSDW xmm1/m128 {k1}{z}, ymm2 | A | V/V | AVX512VL AVX512F | Converts 8 packed signed double-word integers from ymm2 into 8 packed signed word integers in xmm1/m128 using signed saturation under writemask k1.
EVEX.256.F3.0F38.W0 13 /r VPMOVUSDW xmm1/m128 {k1}{z}, ymm2 | A | V/V | AVX512VL AVX512F | Converts 8 packed unsigned double-word integers from ymm2 into 8 packed unsigned word integers in xmm1/m128 using unsigned saturation under writemask k1.
EVEX.512.F3.0F38.W0 33 /r VPMOVDW ymm1/m256 {k1}{z}, zmm2 | A | V/V | AVX512F | Converts 16 packed double-word integers from zmm2 into 16 packed word integers in ymm1/m256 with truncation under writemask k1.
EVEX.512.F3.0F38.W0 23 /r VPMOVSDW ymm1/m256 {k1}{z}, zmm2 | A | V/V | AVX512F | Converts 16 packed signed double-word integers from zmm2 into 16 packed signed word integers in ymm1/m256 using signed saturation under writemask k1.
EVEX.512.F3.0F38.W0 13 /r VPMOVUSDW ymm1/m256 {k1}{z}, zmm2 | A | V/V | AVX512F | Converts 16 packed unsigned double-word integers from zmm2 into 16 packed unsigned word integers in ymm1/m256 using unsigned saturation under writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Half Mem | ModRM:r/m (w) | ModRM:reg (r) | NA | NA
Operation
VPMOVDW instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← TruncateDoubleWordToWord (SRC[m+31:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0;
VPMOVDW instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← TruncateDoubleWordToWord (SRC[m+31:m])
        ELSE *DEST[i+15:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
VPMOVSDW instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SaturateSignedDoubleWordToWord (SRC[m+31:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0;
VPMOVSDW instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SaturateSignedDoubleWordToWord (SRC[m+31:m])
        ELSE *DEST[i+15:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
VPMOVUSDW instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SaturateUnsignedDoubleWordToWord (SRC[m+31:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0;
VPMOVUSDW instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SaturateUnsignedDoubleWordToWord (SRC[m+31:m])
        ELSE *DEST[i+15:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
Intel C/C++ Compiler Intrinsic Equivalents
VPMOVDW __m256i _mm512_cvtepi32_epi16( __m512i a);
VPMOVDW __m256i _mm512_mask_cvtepi32_epi16(__m256i s, __mmask16 k, __m512i a);
VPMOVDW __m256i _mm512_maskz_cvtepi32_epi16( __mmask16 k, __m512i a);
VPMOVDW void _mm512_mask_cvtepi32_storeu_epi16(void * d, __mmask16 k, __m512i a);
VPMOVSDW __m256i _mm512_cvtsepi32_epi16( __m512i a);
VPMOVSDW __m256i _mm512_mask_cvtsepi32_epi16(__m256i s, __mmask16 k, __m512i a);
VPMOVSDW __m256i _mm512_maskz_cvtsepi32_epi16( __mmask16 k, __m512i a);
VPMOVSDW void _mm512_mask_cvtsepi32_storeu_epi16(void * d, __mmask16 k, __m512i a);
VPMOVUSDW __m256i _mm512_cvtusepi32_epi16(__m512i a);
VPMOVUSDW __m256i _mm512_mask_cvtusepi32_epi16(__m256i s, __mmask16 k, __m512i a);
VPMOVUSDW __m256i _mm512_maskz_cvtusepi32_epi16( __mmask16 k, __m512i a);
VPMOVUSDW void _mm512_mask_cvtusepi32_storeu_epi16(void * d, __mmask16 k, __m512i a);
VPMOVUSDW __m128i _mm256_cvtusepi32_epi16(__m256i a);
VPMOVUSDW __m128i _mm256_mask_cvtusepi32_epi16(__m128i a, __mmask8 k, __m256i b);
VPMOVUSDW __m128i _mm256_maskz_cvtusepi32_epi16( __mmask8 k, __m256i b);
VPMOVUSDW void _mm256_mask_cvtusepi32_storeu_epi16(void * , __mmask8 k, __m256i b);
VPMOVUSDW __m128i _mm_cvtusepi32_epi16(__m128i a);
VPMOVUSDW __m128i _mm_mask_cvtusepi32_epi16(__m128i a, __mmask8 k, __m128i b);
VPMOVUSDW __m128i _mm_maskz_cvtusepi32_epi16( __mmask8 k, __m128i b);
VPMOVUSDW void _mm_mask_cvtusepi32_storeu_epi16(void * , __mmask8 k, __m128i b);
VPMOVSDW __m128i _mm256_cvtsepi32_epi16(__m256i a);
VPMOVSDW __m128i _mm256_mask_cvtsepi32_epi16(__m128i a, __mmask8 k, __m256i b);
VPMOVSDW __m128i _mm256_maskz_cvtsepi32_epi16( __mmask8 k, __m256i b);
VPMOVSDW void _mm256_mask_cvtsepi32_storeu_epi16(void * , __mmask8 k, __m256i b);
VPMOVSDW __m128i _mm_cvtsepi32_epi16(__m128i a);
VPMOVSDW __m128i _mm_mask_cvtsepi32_epi16(__m128i a, __mmask8 k, __m128i b);
VPMOVSDW __m128i _mm_maskz_cvtsepi32_epi16( __mmask8 k, __m128i b);
VPMOVSDW void _mm_mask_cvtsepi32_storeu_epi16(void * , __mmask8 k, __m128i b);
VPMOVDW __m128i _mm256_cvtepi32_epi16(__m256i a);
VPMOVDW __m128i _mm256_mask_cvtepi32_epi16(__m128i a, __mmask8 k, __m256i b);
VPMOVDW __m128i _mm256_maskz_cvtepi32_epi16( __mmask8 k, __m256i b);
VPMOVDW void _mm256_mask_cvtepi32_storeu_epi16(void * , __mmask8 k, __m256i b);
VPMOVDW __m128i _mm_cvtepi32_epi16(__m128i a);
VPMOVDW __m128i _mm_mask_cvtepi32_epi16(__m128i a, __mmask8 k, __m128i b);
VPMOVDW __m128i _mm_maskz_cvtepi32_epi16( __mmask8 k, __m128i b);
VPMOVDW void _mm_mask_cvtepi32_storeu_epi16(void * , __mmask8 k, __m128i b);
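The following C fragment is an informative sketch, not part of the SDM text; the input value is an illustrative assumption chosen to exceed the word range. It shows unsigned saturation capping the value at 65535, and assumes an AVX512F-capable processor and a compiler option such as -mavx512f.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: dword-to-word down-convert with unsigned saturation. */
int main(void) {
    __m512i v = _mm512_set1_epi32(70000);       /* > 0xFFFF */
    __m256i u = _mm512_cvtusepi32_epi16(v);     /* unsigned saturate */
    unsigned short out[16];
    _mm256_storeu_si256((__m256i *)out, u);
    printf("%u\n", out[0]);                     /* expected: 65535 */
    return 0;
}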
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E6.
#UD If EVEX.vvvv != 1111B.
VPMOVM2B/VPMOVM2W/VPMOVM2D/VPMOVM2Q—Convert a Mask Register to a Vector Register
Description
Converts a mask register to a vector register. Each element in the destination register is set to all 1’s or all 0’s
depending on the value of the corresponding bit in the source mask register.
The source operand is a mask register. The destination operand is a ZMM/YMM/XMM register.
EVEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W0 28 /r VPMOVM2B xmm1, k1 | RM | V/V | AVX512VL AVX512BW | Sets each byte in XMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.256.F3.0F38.W0 28 /r VPMOVM2B ymm1, k1 | RM | V/V | AVX512VL AVX512BW | Sets each byte in YMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.512.F3.0F38.W0 28 /r VPMOVM2B zmm1, k1 | RM | V/V | AVX512BW | Sets each byte in ZMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.128.F3.0F38.W1 28 /r VPMOVM2W xmm1, k1 | RM | V/V | AVX512VL AVX512BW | Sets each word in XMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.256.F3.0F38.W1 28 /r VPMOVM2W ymm1, k1 | RM | V/V | AVX512VL AVX512BW | Sets each word in YMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.512.F3.0F38.W1 28 /r VPMOVM2W zmm1, k1 | RM | V/V | AVX512BW | Sets each word in ZMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.128.F3.0F38.W0 38 /r VPMOVM2D xmm1, k1 | RM | V/V | AVX512VL AVX512DQ | Sets each doubleword in XMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.256.F3.0F38.W0 38 /r VPMOVM2D ymm1, k1 | RM | V/V | AVX512VL AVX512DQ | Sets each doubleword in YMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.512.F3.0F38.W0 38 /r VPMOVM2D zmm1, k1 | RM | V/V | AVX512DQ | Sets each doubleword in ZMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.128.F3.0F38.W1 38 /r VPMOVM2Q xmm1, k1 | RM | V/V | AVX512VL AVX512DQ | Sets each quadword in XMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.256.F3.0F38.W1 38 /r VPMOVM2Q ymm1, k1 | RM | V/V | AVX512VL AVX512DQ | Sets each quadword in YMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
EVEX.512.F3.0F38.W1 38 /r VPMOVM2Q zmm1, k1 | RM | V/V | AVX512DQ | Sets each quadword in ZMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Operation
VPMOVM2B (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    IF SRC[j]
        THEN DEST[i+7:i] ← -1
        ELSE DEST[i+7:i] ← 0
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPMOVM2W (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF SRC[j]
        THEN DEST[i+15:i] ← -1
        ELSE DEST[i+15:i] ← 0
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPMOVM2D (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF SRC[j]
        THEN DEST[i+31:i] ← -1
        ELSE DEST[i+31:i] ← 0
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPMOVM2Q (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF SRC[j]
        THEN DEST[i+63:i] ← -1
        ELSE DEST[i+63:i] ← 0
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalents
VPMOVM2B __m512i _mm512_movm_epi8(__mmask64 );
VPMOVM2D __m512i _mm512_movm_epi32(__mmask16 );
VPMOVM2Q __m512i _mm512_movm_epi64(__mmask8 );
VPMOVM2W __m512i _mm512_movm_epi16(__mmask32 );
VPMOVM2B __m256i _mm256_movm_epi8(__mmask32 );
VPMOVM2D __m256i _mm256_movm_epi32(__mmask8 );
VPMOVM2Q __m256i _mm256_movm_epi64(__mmask8 );
VPMOVM2W __m256i _mm256_movm_epi16(__mmask16 );
VPMOVM2B __m128i _mm_movm_epi8(__mmask16 );
VPMOVM2D __m128i _mm_movm_epi32(__mmask8 );
VPMOVM2Q __m128i _mm_movm_epi64(__mmask8 );
VPMOVM2W __m128i _mm_movm_epi16(__mmask8 );
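The following C fragment is an informative sketch, not part of the SDM text; the mask value is an illustrative assumption. It expands a 16-bit mask to 16 dword lanes of all-ones/all-zeros, the inverse direction of VPMOVD2M, and assumes an AVX512DQ-capable processor and a compiler option such as -mavx512dq.

#include <immintrin.h>
#include <stdio.h>

/* Minimal sketch: mask register to vector conversion. */
int main(void) {
    __mmask16 m = 0x0005;             /* bits 0 and 2 set */
    __m512i v = _mm512_movm_epi32(m);
    int out[16];
    _mm512_storeu_si512(out, v);
    printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]); /* -1 0 -1 0 */
    return 0;
}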
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E7NM.
#UD If EVEX.vvvv != 1111B.
VPMOVQB/VPMOVSQB/VPMOVUSQB—Down Convert QWord to Byte
Description
VPMOVQB down-converts 64-bit integer elements in the source operand (the second operand) into packed byte
elements using truncation. VPMOVSQB converts signed 64-bit integers into packed signed bytes using signed saturation.
VPMOVUSQB converts unsigned quad-word values into unsigned byte values using unsigned saturation. The
source operand is a vector register. The destination operand is an XMM register or a memory location.
Down-converted byte elements are written to the destination operand (the first operand) from the least-significant
byte. Byte elements of the destination operand are updated according to the writemask. Bits (MAXVL-1:64) of the
destination are zeroed.
EVEX.vvvv is reserved and must be 1111b; otherwise instructions will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W0 32 /r
VPMOVQB xmm1/m16 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Converts 2 packed quad-word integers from xmm2
into 2 packed byte integers in xmm1/m16 with
truncation under writemask k1.
EVEX.128.F3.0F38.W0 22 /r
VPMOVSQB xmm1/m16 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Converts 2 packed signed quad-word integers from
xmm2 into 2 packed signed byte integers in
xmm1/m16 using signed saturation under writemask
k1.
EVEX.128.F3.0F38.W0 12 /r
VPMOVUSQB xmm1/m16 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Converts 2 packed unsigned quad-word integers
from xmm2 into 2 packed unsigned byte integers in
xmm1/m16 using unsigned saturation under
writemask k1.
EVEX.256.F3.0F38.W0 32 /r
VPMOVQB xmm1/m32 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Converts 4 packed quad-word integers from ymm2
into 4 packed byte integers in xmm1/m32 with
truncation under writemask k1.
EVEX.256.F3.0F38.W0 22 /r
VPMOVSQB xmm1/m32 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Converts 4 packed signed quad-word integers from
ymm2 into 4 packed signed byte integers in
xmm1/m32 using signed saturation under writemask
k1.
EVEX.256.F3.0F38.W0 12 /r
VPMOVUSQB xmm1/m32 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Converts 4 packed unsigned quad-word integers
from ymm2 into 4 packed unsigned byte integers in
xmm1/m32 using unsigned saturation under
writemask k1.
EVEX.512.F3.0F38.W0 32 /r
VPMOVQB xmm1/m64 {k1}{z}, zmm2
A V/V AVX512F Converts 8 packed quad-word integers from zmm2
into 8 packed byte integers in xmm1/m64 with
truncation under writemask k1.
EVEX.512.F3.0F38.W0 22 /r
VPMOVSQB xmm1/m64 {k1}{z}, zmm2
A V/V AVX512F Converts 8 packed signed quad-word integers from
zmm2 into 8 packed signed byte integers in
xmm1/m64 using signed saturation under writemask
k1.
EVEX.512.F3.0F38.W0 12 /r
VPMOVUSQB xmm1/m64 {k1}{z}, zmm2
A V/V AVX512F Converts 8 packed unsigned quad-word integers
from zmm2 into 8 packed unsigned byte integers in
xmm1/m64 using unsigned saturation under
writemask k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Eighth Mem ModRM:r/m (w) ModRM:reg (r) NA NA
Operation
VPMOVQB instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← TruncateQuadWordToByte (SRC[m+63:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/8] ← 0;

VPMOVQB instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← TruncateQuadWordToByte (SRC[m+63:m])
        ELSE *DEST[i+7:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VPMOVSQB instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateSignedQuadWordToByte (SRC[m+63:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/8] ← 0;
VPMOVSQB instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateSignedQuadWordToByte (SRC[m+63:m])
        ELSE *DEST[i+7:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VPMOVUSQB instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateUnsignedQuadWordToByte (SRC[m+63:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/8] ← 0;

VPMOVUSQB instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateUnsignedQuadWordToByte (SRC[m+63:m])
        ELSE *DEST[i+7:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
Intel C/C++ Compiler Intrinsic Equivalents
VPMOVQB __m128i _mm512_cvtepi64_epi8( __m512i a);
VPMOVQB __m128i _mm512_mask_cvtepi64_epi8(__m128i s, __mmask8 k, __m512i a);
VPMOVQB __m128i _mm512_maskz_cvtepi64_epi8( __mmask8 k, __m512i a);
VPMOVQB void _mm512_mask_cvtepi64_storeu_epi8(void * d, __mmask8 k, __m512i a);
VPMOVSQB __m128i _mm512_cvtsepi64_epi8( __m512i a);
VPMOVSQB __m128i _mm512_mask_cvtsepi64_epi8(__m128i s, __mmask8 k, __m512i a);
VPMOVSQB __m128i _mm512_maskz_cvtsepi64_epi8( __mmask8 k, __m512i a);
VPMOVSQB void _mm512_mask_cvtsepi64_storeu_epi8(void * d, __mmask8 k, __m512i a);
VPMOVUSQB __m128i _mm512_cvtusepi64_epi8( __m512i a);
VPMOVUSQB __m128i _mm512_mask_cvtusepi64_epi8(__m128i s, __mmask8 k, __m512i a);
VPMOVUSQB __m128i _mm512_maskz_cvtusepi64_epi8( __mmask8 k, __m512i a);
VPMOVUSQB void _mm512_mask_cvtusepi64_storeu_epi8(void * d, __mmask8 k, __m512i a);
VPMOVUSQB __m128i _mm256_cvtusepi64_epi8(__m256i a);
VPMOVUSQB __m128i _mm256_mask_cvtusepi64_epi8(__m128i a, __mmask8 k, __m256i b);
VPMOVUSQB __m128i _mm256_maskz_cvtusepi64_epi8( __mmask8 k, __m256i b);
VPMOVUSQB void _mm256_mask_cvtusepi64_storeu_epi8(void * , __mmask8 k, __m256i b);
VPMOVUSQB __m128i _mm_cvtusepi64_epi8(__m128i a);
VPMOVUSQB __m128i _mm_mask_cvtusepi64_epi8(__m128i a, __mmask8 k, __m128i b);
VPMOVUSQB __m128i _mm_maskz_cvtusepi64_epi8( __mmask8 k, __m128i b);
VPMOVUSQB void _mm_mask_cvtusepi64_storeu_epi8(void * , __mmask8 k, __m128i b);
VPMOVSQB __m128i _mm256_cvtsepi64_epi8(__m256i a);
VPMOVSQB __m128i _mm256_mask_cvtsepi64_epi8(__m128i a, __mmask8 k, __m256i b);
VPMOVSQB __m128i _mm256_maskz_cvtsepi64_epi8( __mmask8 k, __m256i b);
VPMOVSQB void _mm256_mask_cvtsepi64_storeu_epi8(void * , __mmask8 k, __m256i b);
VPMOVSQB __m128i _mm_cvtsepi64_epi8(__m128i a);
VPMOVSQB __m128i _mm_mask_cvtsepi64_epi8(__m128i a, __mmask8 k, __m128i b);
VPMOVSQB __m128i _mm_maskz_cvtsepi64_epi8( __mmask8 k, __m128i b);
VPMOVSQB void _mm_mask_cvtsepi64_storeu_epi8(void * , __mmask8 k, __m128i b);
VPMOVQB __m128i _mm256_cvtepi64_epi8(__m256i a);
VPMOVQB __m128i _mm256_mask_cvtepi64_epi8(__m128i a, __mmask8 k, __m256i b);
VPMOVQB __m128i _mm256_maskz_cvtepi64_epi8( __mmask8 k, __m256i b);
VPMOVQB void _mm256_mask_cvtepi64_storeu_epi8(void * , __mmask8 k, __m256i b);
VPMOVQB __m128i _mm_cvtepi64_epi8(__m128i a);
VPMOVQB __m128i _mm_mask_cvtepi64_epi8(__m128i a, __mmask8 k, __m128i b);
VPMOVQB __m128i _mm_maskz_cvtepi64_epi8( __mmask8 k, __m128i b);
VPMOVQB void _mm_mask_cvtepi64_storeu_epi8(void * , __mmask8 k, __m128i b);
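To make the truncation/saturation distinction concrete, here is a hedged sketch (illustrative only; assumes AVX512F, <immintrin.h>, and a hypothetical helper name qword_to_byte_demo):

#include <immintrin.h>

/* Down-convert eight 64-bit lanes holding 300 to bytes.
   Truncation (VPMOVQB) keeps the low byte: 300 & 0xFF = 44.
   Unsigned saturation (VPMOVUSQB) clamps to 255 instead. */
void qword_to_byte_demo(unsigned char out_trunc[8], unsigned char out_sat[8])
{
    __m512i a = _mm512_set1_epi64(300);
    _mm_storel_epi64((__m128i *)out_trunc, _mm512_cvtepi64_epi8(a));   /* 44, 44, ... */
    _mm_storel_epi64((__m128i *)out_sat,   _mm512_cvtusepi64_epi8(a)); /* 255, 255, ... */
}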
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E6.
#UD If EVEX.vvvv != 1111B.
VPMOVQD/VPMOVSQD/VPMOVUSQD—Down Convert QWord to DWord
Instruction Operand Encoding
Description
VPMOVQD down converts 64-bit integer elements in the source operand (the second operand) into packed double-
words using truncation. VPMOVSQD converts signed 64-bit integers into packed signed doublewords using signed
saturation. VPMOVUSQD converts unsigned quad-word values into unsigned double-word values using unsigned
saturation.
The source operand is a ZMM/YMM/XMM register. The destination operand is a YMM/XMM/XMM register or a
256/128/64-bit memory location.
Down-converted doubleword elements are written to the destination operand (the first operand) from the least-
significant doubleword. Doubleword elements of the destination operand are updated according to the writemask.
Bits (MAXVL-1:256/128/64) of the register destination are zeroed.
EVEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W0 35 /r
VPMOVQD xmm1/m64 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Converts 2 packed quad-word integers from xmm2
into 2 packed double-word integers in xmm1/m64
with truncation subject to writemask k1.
EVEX.128.F3.0F38.W0 25 /r
VPMOVSQD xmm1/m64 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Converts 2 packed signed quad-word integers from
xmm2 into 2 packed signed double-word integers
in xmm1/m64 using signed saturation subject to
writemask k1.
EVEX.128.F3.0F38.W0 15 /r
VPMOVUSQD xmm1/m64 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Converts 2 packed unsigned quad-word integers
from xmm2 into 2 packed unsigned double-word
integers in xmm1/m64 using unsigned saturation
subject to writemask k1.
EVEX.256.F3.0F38.W0 35 /r
VPMOVQD xmm1/m128 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Converts 4 packed quad-word integers from ymm2
into 4 packed double-word integers in xmm1/m128
with truncation subject to writemask k1.
EVEX.256.F3.0F38.W0 25 /r
VPMOVSQD xmm1/m128 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Converts 4 packed signed quad-word integers from
ymm2 into 4 packed signed double-word integers in
xmm1/m128 using signed saturation subject to
writemask k1.
EVEX.256.F3.0F38.W0 15 /r
VPMOVUSQD xmm1/m128 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Converts 4 packed unsigned quad-word integers
from ymm2 into 4 packed unsigned double-word
integers in xmm1/m128 using unsigned saturation
subject to writemask k1.
EVEX.512.F3.0F38.W0 35 /r
VPMOVQD ymm1/m256 {k1}{z}, zmm2
A V/V AVX512F Converts 8 packed quad-word integers from zmm2
into 8 packed double-word integers in ymm1/m256
with truncation subject to writemask k1.
EVEX.512.F3.0F38.W0 25 /r
VPMOVSQD ymm1/m256 {k1}{z}, zmm2
A V/V AVX512F Converts 8 packed signed quad-word integers from
zmm2 into 8 packed signed double-word integers in
ymm1/m256 using signed saturation subject to
writemask k1.
EVEX.512.F3.0F38.W0 15 /r
VPMOVUSQD ymm1/m256 {k1}{z}, zmm2
A V/V AVX512F Converts 8 packed unsigned quad-word integers
from zmm2 into 8 packed unsigned double-word
integers in ymm1/m256 using unsigned saturation
subject to writemask k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Half Mem ModRM:r/m (w) ModRM:reg (r) NA NA
Operation
VPMOVQD instruction (EVEX encoded version) reg-reg form
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TruncateQuadWordToDWord (SRC[m+63:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0;

VPMOVQD instruction (EVEX encoded version) memory form
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TruncateQuadWordToDWord (SRC[m+63:m])
        ELSE *DEST[i+31:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VPMOVSQD instruction (EVEX encoded version) reg-reg form
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SaturateSignedQuadWordToDWord (SRC[m+63:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0;
VPMOVSQD instruction (EVEX encoded version) memory form
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SaturateSignedQuadWordToDWord (SRC[m+63:m])
        ELSE *DEST[i+31:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VPMOVUSQD instruction (EVEX encoded version) reg-reg form
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SaturateUnsignedQuadWordToDWord (SRC[m+63:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0;

VPMOVUSQD instruction (EVEX encoded version) memory form
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SaturateUnsignedQuadWordToDWord (SRC[m+63:m])
        ELSE *DEST[i+31:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
Intel C/C++ Compiler Intrinsic Equivalents
VPMOVQD __m256i _mm512_cvtepi64_epi32( __m512i a);
VPMOVQD __m256i _mm512_mask_cvtepi64_epi32(__m256i s, __mmask8 k, __m512i a);
VPMOVQD __m256i _mm512_maskz_cvtepi64_epi32( __mmask8 k, __m512i a);
VPMOVQD void _mm512_mask_cvtepi64_storeu_epi32(void * d, __mmask8 k, __m512i a);
VPMOVSQD __m256i _mm512_cvtsepi64_epi32( __m512i a);
VPMOVSQD __m256i _mm512_mask_cvtsepi64_epi32(__m256i s, __mmask8 k, __m512i a);
VPMOVSQD __m256i _mm512_maskz_cvtsepi64_epi32( __mmask8 k, __m512i a);
VPMOVSQD void _mm512_mask_cvtsepi64_storeu_epi32(void * d, __mmask8 k, __m512i a);
VPMOVUSQD __m256i _mm512_cvtusepi64_epi32( __m512i a);
VPMOVUSQD __m256i _mm512_mask_cvtusepi64_epi32(__m256i s, __mmask8 k, __m512i a);
VPMOVUSQD __m256i _mm512_maskz_cvtusepi64_epi32( __mmask8 k, __m512i a);
VPMOVUSQD void _mm512_mask_cvtusepi64_storeu_epi32(void * d, __mmask8 k, __m512i a);
VPMOVUSQD __m128i _mm256_cvtusepi64_epi32(__m256i a);
VPMOVUSQD __m128i _mm256_mask_cvtusepi64_epi32(__m128i a, __mmask8 k, __m256i b);
VPMOVUSQD __m128i _mm256_maskz_cvtusepi64_epi32( __mmask8 k, __m256i b);
VPMOVUSQD void _mm256_mask_cvtusepi64_storeu_epi32(void * , __mmask8 k, __m256i b);
VPMOVUSQD __m128i _mm_cvtusepi64_epi32(__m128i a);
VPMOVUSQD __m128i _mm_mask_cvtusepi64_epi32(__m128i a, __mmask8 k, __m128i b);
VPMOVUSQD __m128i _mm_maskz_cvtusepi64_epi32( __mmask8 k, __m128i b);
VPMOVUSQD void _mm_mask_cvtusepi64_storeu_epi32(void * , __mmask8 k, __m128i b);
VPMOVSQD __m128i _mm256_cvtsepi64_epi32(__m256i a);
VPMOVSQD __m128i _mm256_mask_cvtsepi64_epi32(__m128i a, __mmask8 k, __m256i b);
VPMOVSQD __m128i _mm256_maskz_cvtsepi64_epi32( __mmask8 k, __m256i b);
VPMOVSQD void _mm256_mask_cvtsepi64_storeu_epi32(void * , __mmask8 k, __m256i b);
VPMOVSQD __m128i _mm_cvtsepi64_epi32(__m128i a);
VPMOVSQD __m128i _mm_mask_cvtsepi64_epi32(__m128i a, __mmask8 k, __m128i b);
VPMOVSQD __m128i _mm_maskz_cvtsepi64_epi32( __mmask8 k, __m128i b);
VPMOVSQD void _mm_mask_cvtsepi64_storeu_epi32(void * , __mmask8 k, __m128i b);
VPMOVQD __m128i _mm256_cvtepi64_epi32(__m256i a);
VPMOVQD __m128i _mm256_mask_cvtepi64_epi32(__m128i a, __mmask8 k, __m256i b);
VPMOVQD __m128i _mm256_maskz_cvtepi64_epi32( __mmask8 k, __m256i b);
VPMOVQD void _mm256_mask_cvtepi64_storeu_epi32(void * , __mmask8 k, __m256i b);
VPMOVQD __m128i _mm_cvtepi64_epi32(__m128i a);
VPMOVQD __m128i _mm_mask_cvtepi64_epi32(__m128i a, __mmask8 k, __m128i b);
VPMOVQD __m128i _mm_maskz_cvtepi64_epi32( __mmask8 k, __m128i b);
VPMOVQD void _mm_mask_cvtepi64_storeu_epi32(void * , __mmask8 k, __m128i b);
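The storeu intrinsics map to the memory-destination form, which merges rather than zeroes. A short sketch (illustrative; assumes AVX512F and <immintrin.h>; store_low_halves is a hypothetical name):

#include <immintrin.h>

/* Write only the selected down-converted dwords to memory
   (VPMOVQD with a memory destination); dword slots whose mask
   bit is clear are left untouched in dst. */
void store_low_halves(int *dst, __mmask8 k, __m512i qwords)
{
    _mm512_mask_cvtepi64_storeu_epi32(dst, k, qwords);
}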
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E6.
#UD If EVEX.vvvv != 1111B.
VPMOVQW/VPMOVSQW/VPMOVUSQW—Down Convert QWord to Word
Instruction Operand Encoding
Description
VPMOVQW down converts 64-bit integer elements in the source operand (the second operand) into packed words
using truncation. VPMOVSQW converts signed 64-bit integers into packed signed words using signed saturation.
VPMOVUSQW converts unsigned quad-word values into unsigned word values using unsigned saturation.
The source operand is a ZMM/YMM/XMM register. The destination operand is an XMM register or a 128/64/32-bit
memory location.
Down-converted word elements are written to the destination operand (the first operand) from the least-significant
word. Word elements of the destination operand are updated according to the writemask. Bits
(MAXVL-1:128/64/32) of the register destination are zeroed.
EVEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W0 34 /r
VPMOVQW xmm1/m32 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Converts 2 packed quad-word integers from xmm2
into 2 packed word integers in xmm1/m32 with
truncation under writemask k1.
EVEX.128.F3.0F38.W0 24 /r
VPMOVSQW xmm1/m32 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Converts 2 packed signed quad-word integers from
xmm2 into 2 packed signed word integers in
xmm1/m32 using signed saturation under writemask
k1.
EVEX.128.F3.0F38.W0 14 /r
VPMOVUSQW xmm1/m32 {k1}{z}, xmm2
A V/V AVX512VL
AVX512F
Converts 2 packed unsigned quad-word integers from
xmm2 into 2 packed unsigned word integers in
xmm1/m32 using unsigned saturation under
writemask k1.
EVEX.256.F3.0F38.W0 34 /r
VPMOVQW xmm1/m64 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Converts 4 packed quad-word integers from ymm2
into 4 packed word integers in xmm1/m64 with
truncation under writemask k1.
EVEX.256.F3.0F38.W0 24 /r
VPMOVSQW xmm1/m64 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Converts 4 packed signed quad-word integers from
ymm2 into 4 packed signed word integers in
xmm1/m64 using signed saturation under writemask
k1.
EVEX.256.F3.0F38.W0 14 /r
VPMOVUSQW xmm1/m64 {k1}{z}, ymm2
A V/V AVX512VL
AVX512F
Converts 4 packed unsigned quad-word integers from
ymm2 into 4 packed unsigned word integers in
xmm1/m64 using unsigned saturation under
writemask k1.
EVEX.512.F3.0F38.W0 34 /r
VPMOVQW xmm1/m128 {k1}{z}, zmm2
A V/V AVX512F Converts 8 packed quad-word integers from zmm2
into 8 packed word integers in xmm1/m128 with
truncation under writemask k1.
EVEX.512.F3.0F38.W0 24 /r
VPMOVSQW xmm1/m128 {k1}{z}, zmm2
A V/V AVX512F Converts 8 packed signed quad-word integers from
zmm2 into 8 packed signed word integers in
xmm1/m128 using signed saturation under
writemask k1.
EVEX.512.F3.0F38.W0 14 /r
VPMOVUSQW xmm1/m128 {k1}{z},
zmm2
A V/V AVX512F Converts 8 packed unsigned quad-word integers from
zmm2 into 8 packed unsigned word integers in
xmm1/m128 using unsigned saturation under
writemask k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Quarter Mem ModRM:r/m (w) ModRM:reg (r) NA NA
Operation
VPMOVQW instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← TruncateQuadWordToWord (SRC[m+63:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/4] ← 0;

VPMOVQW instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← TruncateQuadWordToWord (SRC[m+63:m])
        ELSE *DEST[i+15:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VPMOVSQW instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SaturateSignedQuadWordToWord (SRC[m+63:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/4] ← 0;
VPMOVSQW instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SaturateSignedQuadWordToWord (SRC[m+63:m])
        ELSE *DEST[i+15:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VPMOVUSQW instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SaturateUnsignedQuadWordToWord (SRC[m+63:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/4] ← 0;

VPMOVUSQW instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    m ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← SaturateUnsignedQuadWordToWord (SRC[m+63:m])
        ELSE *DEST[i+15:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
Intel C/C++ Compiler Intrinsic Equivalents
VPMOVQW __m128i _mm512_cvtepi64_epi16( __m512i a);
VPMOVQW __m128i _mm512_mask_cvtepi64_epi16(__m128i s, __mmask8 k, __m512i a);
VPMOVQW __m128i _mm512_maskz_cvtepi64_epi16( __mmask8 k, __m512i a);
VPMOVQW void _mm512_mask_cvtepi64_storeu_epi16(void * d, __mmask8 k, __m512i a);
VPMOVSQW __m128i _mm512_cvtsepi64_epi16( __m512i a);
VPMOVSQW __m128i _mm512_mask_cvtsepi64_epi16(__m128i s, __mmask8 k, __m512i a);
VPMOVSQW __m128i _mm512_maskz_cvtsepi64_epi16( __mmask8 k, __m512i a);
VPMOVSQW void _mm512_mask_cvtsepi64_storeu_epi16(void * d, __mmask8 k, __m512i a);
VPMOVUSQW __m128i _mm512_cvtusepi64_epi16( __m512i a);
VPMOVUSQW __m128i _mm512_mask_cvtusepi64_epi16(__m128i s, __mmask8 k, __m512i a);
VPMOVUSQW __m128i _mm512_maskz_cvtusepi64_epi16( __mmask8 k, __m512i a);
VPMOVUSQW void _mm512_mask_cvtusepi64_storeu_epi16(void * d, __mmask8 k, __m512i a);
VPMOVUSQW __m128i _mm256_cvtusepi64_epi16(__m256i a);
VPMOVUSQW __m128i _mm256_mask_cvtusepi64_epi16(__m128i a, __mmask8 k, __m256i b);
VPMOVUSQW __m128i _mm256_maskz_cvtusepi64_epi16( __mmask8 k, __m256i b);
VPMOVUSQW void _mm256_mask_cvtusepi64_storeu_epi16(void * , __mmask8 k, __m256i b);
VPMOVUSQW __m128i _mm_cvtusepi64_epi16(__m128i a);
VPMOVUSQW __m128i _mm_mask_cvtusepi64_epi16(__m128i a, __mmask8 k, __m128i b);
VPMOVUSQW __m128i _mm_maskz_cvtusepi64_epi16( __mmask8 k, __m128i b);
VPMOVUSQW void _mm_mask_cvtusepi64_storeu_epi16(void * , __mmask8 k, __m128i b);
VPMOVSQW __m128i _mm256_cvtsepi64_epi16(__m256i a);
VPMOVSQW __m128i _mm256_mask_cvtsepi64_epi16(__m128i a, __mmask8 k, __m256i b);
VPMOVSQW __m128i _mm256_maskz_cvtsepi64_epi16( __mmask8 k, __m256i b);
VPMOVSQW void _mm256_mask_cvtsepi64_storeu_epi16(void * , __mmask8 k, __m256i b);
VPMOVSQW __m128i _mm_cvtsepi64_epi16(__m128i a);
VPMOVSQW __m128i _mm_mask_cvtsepi64_epi16(__m128i a, __mmask8 k, __m128i b);
VPMOVSQW __m128i _mm_maskz_cvtsepi64_epi16( __mmask8 k, __m128i b);
VPMOVSQW void _mm_mask_cvtsepi64_storeu_epi16(void * , __mmask8 k, __m128i b);
VPMOVQW __m128i _mm256_cvtepi64_epi16(__m256i a);
VPMOVQW __m128i _mm256_mask_cvtepi64_epi16(__m128i a, __mmask8 k, __m256i b);
VPMOVQW __m128i _mm256_maskz_cvtepi64_epi16( __mmask8 k, __m256i b);
VPMOVQW void _mm256_mask_cvtepi64_storeu_epi16(void * , __mmask8 k, __m256i b);
VPMOVQW __m128i _mm_cvtepi64_epi16(__m128i a);
VPMOVQW __m128i _mm_mask_cvtepi64_epi16(__m128i a, __mmask8 k, __m128i b);
VPMOVQW __m128i _mm_maskz_cvtepi64_epi16( __mmask8 k, __m128i b);
VPMOVQW void _mm_mask_cvtepi64_storeu_epi16(void * , __mmask8 k, __m128i b);
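A brief sketch of the signed-saturating form (illustrative; assumes AVX512F and <immintrin.h>; qword_to_word_saturated is a hypothetical name):

#include <immintrin.h>

/* Signed saturation to 16 bits: values outside [-32768, 32767]
   clamp to the nearest bound instead of wrapping. */
__m128i qword_to_word_saturated(void)
{
    __m512i a = _mm512_set_epi64(70000, -70000, 1, -1, 32767, -32768, 0, 42);
    return _mm512_cvtsepi64_epi16(a);   /* VPMOVSQW: 70000 -> 32767, -70000 -> -32768 */
}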
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E6.
#UD If EVEX.vvvv != 1111B.
VPMOVWB/VPMOVSWB/VPMOVUSWB—Down Convert Word to Byte
Instruction Operand Encoding
Description
VPMOVWB down converts 16-bit integers into packed bytes using truncation. VPMOVSWB converts signed 16-bit
integers into packed signed bytes using signed saturation. VPMOVUSWB converts unsigned word values into
unsigned byte values using unsigned saturation.
The source operand is a ZMM/YMM/XMM register. The destination operand is a YMM/XMM/XMM register or a
256/128/64-bit memory location.
Down-converted byte elements are written to the destination operand (the first operand) from the least-significant
byte. Byte elements of the destination operand are updated according to the writemask. Bits
(MAXVL-1:256/128/64) of the register destination are zeroed.
Note: EVEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W0 30 /r
VPMOVWB xmm1/m64 {k1}{z}, xmm2
A V/V AVX512VL
AVX512BW
Converts 8 packed word integers from xmm2 into 8
packed bytes in xmm1/m64 with truncation under
writemask k1.
EVEX.128.F3.0F38.W0 20 /r
VPMOVSWB xmm1/m64 {k1}{z},
xmm2
A V/V AVX512VL
AVX512BW
Converts 8 packed signed word integers from xmm2
into 8 packed signed bytes in xmm1/m64 using
signed saturation under writemask k1.
EVEX.128.F3.0F38.W0 10 /r
VPMOVUSWB xmm1/m64 {k1}{z},
xmm2
A V/V AVX512VL
AVX512BW
Converts 8 packed unsigned word integers from
xmm2 into 8 packed unsigned bytes in xmm1/m64
using unsigned saturation under writemask k1.
EVEX.256.F3.0F38.W0 30 /r
VPMOVWB xmm1/m128 {k1}{z},
ymm2
A V/V AVX512VL
AVX512BW
Converts 16 packed word integers from ymm2 into
16 packed bytes in xmm1/m128 with truncation
under writemask k1.
EVEX.256.F3.0F38.W0 20 /r
VPMOVSWB xmm1/m128 {k1}{z},
ymm2
A V/V AVX512VL
AVX512BW
Converts 16 packed signed word integers from ymm2
into 16 packed signed bytes in xmm1/m128 using
signed saturation under writemask k1.
EVEX.256.F3.0F38.W0 10 /r
VPMOVUSWB xmm1/m128 {k1}{z},
ymm2
A V/V AVX512VL
AVX512BW
Converts 16 packed unsigned word integers from
ymm2 into 16 packed unsigned bytes in xmm1/m128
using unsigned saturation under writemask k1.
EVEX.512.F3.0F38.W0 30 /r
VPMOVWB ymm1/m256 {k1}{z},
zmm2
A V/V AVX512BW Converts 32 packed word integers from zmm2 into
32 packed bytes in ymm1/m256 with truncation
under writemask k1.
EVEX.512.F3.0F38.W0 20 /r
VPMOVSWB ymm1/m256 {k1}{z},
zmm2
A V/V AVX512BW Converts 32 packed signed word integers from zmm2
into 32 packed signed bytes in ymm1/m256 using
signed saturation under writemask k1.
EVEX.512.F3.0F38.W0 10 /r
VPMOVUSWB ymm1/m256 {k1}{z},
zmm2
A V/V AVX512BW Converts 32 packed unsigned word integers from
zmm2 into 32 packed unsigned bytes in ymm1/m256
using unsigned saturation under writemask k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Half Mem ModRM:r/m (w) ModRM:reg (r) NA NA
Operation
VPMOVWB instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← TruncateWordToByte (SRC[m+15:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0;

VPMOVWB instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← TruncateWordToByte (SRC[m+15:m])
        ELSE *DEST[i+7:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VPMOVSWB instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateSignedWordToByte (SRC[m+15:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0;
VPMOVSWB instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateSignedWordToByte (SRC[m+15:m])
        ELSE *DEST[i+7:i] remains unchanged* ; merging-masking
    FI;
ENDFOR

VPMOVUSWB instruction (EVEX encoded versions) when dest is a register
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateUnsignedWordToByte (SRC[m+15:m])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+7:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+7:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL/2] ← 0;

VPMOVUSWB instruction (EVEX encoded versions) when dest is memory
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    m ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+7:i] ← SaturateUnsignedWordToByte (SRC[m+15:m])
        ELSE *DEST[i+7:i] remains unchanged* ; merging-masking
    FI;
ENDFOR
Intel C/C++ Compiler Intrinsic Equivalents
VPMOVUSWB __m256i _mm512_cvtusepi16_epi8(__m512i a);
VPMOVUSWB __m256i _mm512_mask_cvtusepi16_epi8(__m256i a, __mmask32 k, __m512i b);
VPMOVUSWB __m256i _mm512_maskz_cvtusepi16_epi8( __mmask32 k, __m512i b);
VPMOVUSWB void _mm512_mask_cvtusepi16_storeu_epi8(void * , __mmask32 k, __m512i b);
VPMOVSWB __m256i _mm512_cvtsepi16_epi8(__m512i a);
VPMOVSWB __m256i _mm512_mask_cvtsepi16_epi8(__m256i a, __mmask32 k, __m512i b);
VPMOVSWB __m256i _mm512_maskz_cvtsepi16_epi8( __mmask32 k, __m512i b);
VPMOVSWB void _mm512_mask_cvtsepi16_storeu_epi8(void * , __mmask32 k, __m512i b);
VPMOVWB __m256i _mm512_cvtepi16_epi8(__m512i a);
VPMOVWB __m256i _mm512_mask_cvtepi16_epi8(__m256i a, __mmask32 k, __m512i b);
VPMOVWB __m256i _mm512_maskz_cvtepi16_epi8( __mmask32 k, __m512i b);
VPMOVWB void _mm512_mask_cvtepi16_storeu_epi8(void * , __mmask32 k, __m512i b);
VPMOVUSWB __m128i _mm256_cvtusepi16_epi8(__m256i a);
VPMOVUSWB __m128i _mm256_mask_cvtusepi16_epi8(__m128i a, __mmask16 k, __m256i b);
VPMOVUSWB __m128i _mm256_maskz_cvtusepi16_epi8( __mmask16 k, __m256i b);
VPMOVUSWB void _mm256_mask_cvtusepi16_storeu_epi8(void * , __mmask16 k, __m256i b);
VPMOVUSWB __m128i _mm_cvtusepi16_epi8(__m128i a);
VPMOVUSWB __m128i _mm_mask_cvtusepi16_epi8(__m128i a, __mmask8 k, __m128i b);
VPMOVUSWB __m128i _mm_maskz_cvtusepi16_epi8( __mmask8 k, __m128i b);
VPMOVUSWB void _mm_mask_cvtusepi16_storeu_epi8(void * , __mmask8 k, __m128i b);
VPMOVSWB __m128i _mm256_cvtsepi16_epi8(__m256i a);
VPMOVSWB __m128i _mm256_mask_cvtsepi16_epi8(__m128i a, __mmask16 k, __m256i b);
VPMOVSWB __m128i _mm256_maskz_cvtsepi16_epi8( __mmask16 k, __m256i b);
VPMOVSWB void _mm256_mask_cvtsepi16_storeu_epi8(void * , __mmask16 k, __m256i b);
VPMOVSWB __m128i _mm_cvtsepi16_epi8(__m128i a);
VPMOVSWB __m128i _mm_mask_cvtsepi16_epi8(__m128i a, __mmask8 k, __m128i b);
VPMOVSWB __m128i _mm_maskz_cvtsepi16_epi8( __mmask8 k, __m128i b);
VPMOVSWB void _mm_mask_cvtsepi16_storeu_epi8(void * , __mmask8 k, __m128i b);
VPMOVWB __m128i _mm256_cvtepi16_epi8(__m256i a);
VPMOVWB __m128i _mm256_mask_cvtepi16_epi8(__m128i a, __mmask16 k, __m256i b);
VPMOVWB __m128i _mm256_maskz_cvtepi16_epi8( __mmask16 k, __m256i b);
VPMOVWB void _mm256_mask_cvtepi16_storeu_epi8(void * , __mmask16 k, __m256i b);
VPMOVWB __m128i _mm_cvtepi16_epi8(__m128i a);
VPMOVWB __m128i _mm_mask_cvtepi16_epi8(__m128i a, __mmask8 k, __m128i b);
VPMOVWB __m128i _mm_maskz_cvtepi16_epi8( __mmask8 k, __m128i b);
VPMOVWB void _mm_mask_cvtepi16_storeu_epi8(void * , __mmask8 k, __m128i b);
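A typical use is narrowing 16-bit intermediates back to bytes after widening arithmetic. Sketch (illustrative; assumes AVX512BW and <immintrin.h>; words_to_bytes_sat is a hypothetical name):

#include <immintrin.h>

/* Narrow 32 words to 32 bytes with unsigned saturation
   (VPMOVUSWB ymm, zmm), e.g. as the last step of 8-bit pixel
   math carried out at 16-bit precision. */
__m256i words_to_bytes_sat(__m512i words)
{
    return _mm512_cvtusepi16_epi8(words);
}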
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E6.
#UD If EVEX.vvvv != 1111B.
VPMULTISHIFTQB—Select Packed Unaligned Bytes from Quadword Sources
Instruction Operand Encoding
Description
This instruction selects eight unaligned bytes from each input qword element of the second source operand (the
third operand) and writes eight assembled bytes for each qword element in the destination operand (the first
operand). Each byte result is selected using a byte-granular shift control within the corresponding qword element
of the first source operand (the second operand). Each byte result in the destination operand is updated under the
writemask k1.
Only the low 6 bits of each control byte are used to select the 8-bit slot from which the output byte is extracted from
the qword data in the second source operand. The starting bit of the 8-bit slot, specified by the low 6 bits of the
control byte, can be unaligned relative to any byte boundary. If the 8-bit slot would exceed the qword boundary, the
out-of-bound portion of the 8-bit slot is wrapped back to start from bit 0 of the input qword element.
The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM reg-
ister, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 64-bit memory loca-
tion. The destination operand is a ZMM/YMM/XMM register.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 83 /r
VPMULTISHIFTQB xmm1 {k1}{z},
xmm2,xmm3/m128/m64bcst
A V/V AVX512_VBMI
AVX512VL
Select unaligned bytes from qwords in
xmm3/m128/m64bcst using control bytes in
xmm2, write byte results to xmm1 under k1.
EVEX.256.66.0F38.W1 83 /r
VPMULTISHIFTQB ymm1 {k1}{z},
ymm2,ymm3/m256/m64bcst
A V/V AVX512_VBMI
AVX512VL
Select unaligned bytes from qwords in
ymm3/m256/m64bcst using control bytes in
ymm2, write byte results to ymm1 under k1.
EVEX.512.66.0F38.W1 83 /r
VPMULTISHIFTQB zmm1 {k1}{z},
zmm2,zmm3/m512/m64bcst
A V/V AVX512_VBMI Select unaligned bytes from qwords in
zmm3/m512/m64bcst using control bytes in
zmm2, write byte results to zmm1 under k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) NA
Operation
VPMULTISHIFTQB DEST, SRC1, SRC2 (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR i ← 0 TO KL-1
    IF EVEX.b = 1 AND src2 is memory THEN
        tcur ← src2.qword[0]; // broadcasting
    ELSE
        tcur ← src2.qword[i];
    FI;
    FOR j ← 0 to 7
        ctrl ← src1.qword[i].byte[j] & 63;
        FOR k ← 0 to 7
            res.bit[k] ← tcur.bit[ (ctrl+k) mod 64 ];
        ENDFOR
        IF k1[i*8+j] or no writemask THEN
            DEST.qword[i].byte[j] ← res;
        ELSE IF zeroing-masking THEN
            DEST.qword[i].byte[j] ← 0;
        FI;
    ENDFOR
ENDFOR
DEST[MAX_VL-1:VL] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPMULTISHIFTQB __m512i _mm512_multishift_epi64_epi8( __m512i a, __m512i b);
VPMULTISHIFTQB __m512i _mm512_mask_multishift_epi64_epi8(__m512i s, __mmask64 k, __m512i a, __m512i b);
VPMULTISHIFTQB __m512i _mm512_maskz_multishift_epi64_epi8( __mmask64 k, __m512i a, __m512i b);
VPMULTISHIFTQB __m256i _mm256_multishift_epi64_epi8( __m256i a, __m256i b);
VPMULTISHIFTQB __m256i _mm256_mask_multishift_epi64_epi8(__m256i s, __mmask32 k, __m256i a, __m256i b);
VPMULTISHIFTQB __m256i _mm256_maskz_multishift_epi64_epi8( __mmask32 k, __m256i a, __m256i b);
VPMULTISHIFTQB __m128i _mm_multishift_epi64_epi8( __m128i a, __m128i b);
VPMULTISHIFTQB __m128i _mm_mask_multishift_epi64_epi8(__m128i s, __mmask8 k, __m128i a, __m128i b);
VPMULTISHIFTQB __m128i _mm_maskz_multishift_epi64_epi8( __mmask8 k, __m128i a, __m128i b);
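As a cross-check of the Operation section, the following scalar model of one qword lane may help (illustrative only; plain C, no intrinsics; multishift_qword is a hypothetical name):

#include <stdint.h>

/* Scalar model of one qword lane of VPMULTISHIFTQB: control
   byte j gives a bit offset 0-63; an 8-bit field is read from
   src starting at that bit, wrapping past bit 63 back to bit 0. */
uint64_t multishift_qword(uint64_t ctrl, uint64_t src)
{
    uint64_t dst = 0;
    for (int j = 0; j < 8; j++) {
        unsigned off = (ctrl >> (8 * j)) & 63;   /* low 6 bits of control byte j */
        /* rotate src right by off (guard off = 0 to avoid a shift by 64) */
        uint64_t rot = off ? ((src >> off) | (src << (64 - off))) : src;
        dst |= (rot & 0xFF) << (8 * j);          /* place the extracted byte in slot j */
    }
    return dst;
}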
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type E4NF.
VPROLD/VPROLVD/VPROLQ/VPROLVQ—Bit Rotate Left
Instruction Operand Encoding
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 15 /r
VPROLVD xmm1 {k1}{z}, xmm2,
xmm3/m128/m32bcst
B V/V AVX512VL
AVX512F
Rotate doublewords in xmm2 left by count in the
corresponding element of xmm3/m128/m32bcst.
Result written to xmm1 under writemask k1.
EVEX.128.66.0F.W0 72 /1 ib
VPROLD xmm1 {k1}{z},
xmm2/m128/m32bcst, imm8
A V/V AVX512VL
AVX512F
Rotate doublewords in xmm2/m128/m32bcst left
by imm8. Result written to xmm1 using
writemask k1.
EVEX.128.66.0F38.W1 15 /r
VPROLVQ xmm1 {k1}{z}, xmm2,
xmm3/m128/m64bcst
B V/V AVX512VL
AVX512F
Rotate quadwords in xmm2 left by count in the
corresponding element of xmm3/m128/m64bcst.
Result written to xmm1 under writemask k1.
EVEX.128.66.0F.W1 72 /1 ib
VPROLQ xmm1 {k1}{z},
xmm2/m128/m64bcst, imm8
A V/V AVX512VL
AVX512F
Rotate quadwords in xmm2/m128/m64bcst left
by imm8. Result written to xmm1 using
writemask k1.
EVEX.256.66.0F38.W0 15 /r
VPROLVD ymm1 {k1}{z}, ymm2,
ymm3/m256/m32bcst
B V/V AVX512VL
AVX512F
Rotate doublewords in ymm2 left by count in the
corresponding element of ymm3/m256/m32bcst.
Result written to ymm1 under writemask k1.
EVEX.256.66.0F.W0 72 /1 ib
VPROLD ymm1 {k1}{z},
ymm2/m256/m32bcst, imm8
A V/V AVX512VL
AVX512F
Rotate doublewords in ymm2/m256/m32bcst left
by imm8. Result written to ymm1 using
writemask k1.
EVEX.256.66.0F38.W1 15 /r
VPROLVQ ymm1 {k1}{z}, ymm2,
ymm3/m256/m64bcst
B V/V AVX512VL
AVX512F
Rotate quadwords in ymm2 left by count in the
corresponding element of ymm3/m256/m64bcst.
Result written to ymm1 under writemask k1.
EVEX.256.66.0F.W1 72 /1 ib
VPROLQ ymm1 {k1}{z},
ymm2/m256/m64bcst, imm8
A V/V AVX512VL
AVX512F
Rotate quadwords in ymm2/m256/m64bcst left
by imm8. Result written to ymm1 using
writemask k1.
EVEX.512.66.0F38.W0 15 /r
VPROLVD zmm1 {k1}{z}, zmm2,
zmm3/m512/m32bcst
B V/V AVX512F Rotate left of doublewords in zmm2 by count in
the corresponding element of
zmm3/m512/m32bcst. Result written to zmm1
using writemask k1.
EVEX.512.66.0F.W0 72 /1 ib
VPROLD zmm1 {k1}{z},
zmm2/m512/m32bcst, imm8
A V/V AVX512F Rotate left of doublewords in
zmm2/m512/m32bcst by imm8. Result written to
zmm1 using writemask k1.
EVEX.512.66.0F38.W1 15 /r
VPROLVQ zmm1 {k1}{z}, zmm2,
zmm3/m512/m64bcst
B V/V AVX512F Rotate quadwords in zmm2 left by count in the
corresponding element of zmm3/m512/m64bcst.
Result written to zmm1 under writemask k1.
EVEX.512.66.0F.W1 72 /1 ib
VPROLQ zmm1 {k1}{z},
zmm2/m512/m64bcst, imm8
A V/V AVX512F Rotate quadwords in zmm2/m512/m64bcst left
by imm8. Result written to zmm1 using
writemask k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full EVEX.vvvv (w) ModRM:r/m (r) Imm8 NA
B Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) NA
Description
Rotates the bits in the individual data elements (doublewords or quadwords) in the first source operand to the left
by the number of bits specified in the count operand. If the value specified by the count operand is greater than 31
(for doublewords) or 63 (for quadwords), then the count operand modulo the data size (32 or 64) is used.
EVEX.128 encoded version: The destination operand is an XMM register. The source operand is an XMM register or
a memory location (for the immediate form). The count operand can come either from an XMM register, a memory
location, or an 8-bit immediate. Bits (MAXVL-1:128) of the corresponding ZMM register are zeroed.
EVEX.256 encoded version: The destination operand is a YMM register. The source operand is a YMM register or a
memory location (for the immediate form). The count operand can come either from a YMM register, a memory
location, or an 8-bit immediate. Bits (MAXVL-1:256) of the corresponding ZMM register are zeroed.
EVEX.512 encoded version: The destination operand is a ZMM register updated according to the writemask. For the
count operand in immediate form, the source operand can be a ZMM register, a 512-bit memory location, or a
512-bit vector broadcasted from a 32/64-bit memory location; the count operand is an 8-bit immediate. For the
count operand in variable form, the first source operand (the second operand) is a ZMM register and the count
operand (the third operand) is a ZMM register, a 512-bit memory location, or a 512-bit vector broadcasted from a
32/64-bit memory location.
Operation
LEFT_ROTATE_DWORDS(SRC, COUNT_SRC)
    COUNT ← COUNT_SRC modulo 32;
    DEST[31:0] ← (SRC << COUNT) | (SRC >> (32 - COUNT));

LEFT_ROTATE_QWORDS(SRC, COUNT_SRC)
    COUNT ← COUNT_SRC modulo 64;
    DEST[63:0] ← (SRC << COUNT) | (SRC >> (64 - COUNT));

VPROLD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC1 *is memory*)
            THEN DEST[i+31:i] ← LEFT_ROTATE_DWORDS(SRC1[31:0], imm8)
            ELSE DEST[i+31:i] ← LEFT_ROTATE_DWORDS(SRC1[i+31:i], imm8)
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE *zeroing-masking* ; zeroing-masking
                DEST[i+31:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPROLVD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+31:i] ← LEFT_ROTATE_DWORDS(SRC1[i+31:i], SRC2[31:0])
            ELSE DEST[i+31:i] ← LEFT_ROTATE_DWORDS(SRC1[i+31:i], SRC2[i+31:i])
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE *zeroing-masking* ; zeroing-masking
                DEST[i+31:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VPROLQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC1 *is memory*)
            THEN DEST[i+63:i] ← LEFT_ROTATE_QWORDS(SRC1[63:0], imm8)
            ELSE DEST[i+63:i] ← LEFT_ROTATE_QWORDS(SRC1[i+63:i], imm8)
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE *zeroing-masking* ; zeroing-masking
                DEST[i+63:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPROLVQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+63:i] ← LEFT_ROTATE_QWORDS(SRC1[i+63:i], SRC2[63:0])
            ELSE DEST[i+63:i] ← LEFT_ROTATE_QWORDS(SRC1[i+63:i], SRC2[i+63:i])
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE *zeroing-masking* ; zeroing-masking
                DEST[i+63:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPROLD __m512i _mm512_rol_epi32(__m512i a, int imm);
VPROLD __m512i _mm512_mask_rol_epi32(__m512i a, __mmask16 k, __m512i b, int imm);
VPROLD __m512i _mm512_maskz_rol_epi32( __mmask16 k, __m512i a, int imm);
VPROLD __m256i _mm256_rol_epi32(__m256i a, int imm);
VPROLD __m256i _mm256_mask_rol_epi32(__m256i a, __mmask8 k, __m256i b, int imm);
VPROLD __m256i _mm256_maskz_rol_epi32( __mmask8 k, __m256i a, int imm);
VPROLD __m128i _mm_rol_epi32(__m128i a, int imm);
VPROLD __m128i _mm_mask_rol_epi32(__m128i a, __mmask8 k, __m128i b, int imm);
VPROLD __m128i _mm_maskz_rol_epi32( __mmask8 k, __m128i a, int imm);
VPROLQ __m512i _mm512_rol_epi64(__m512i a, int imm);
VPROLQ __m512i _mm512_mask_rol_epi64(__m512i a, __mmask8 k, __m512i b, int imm);
VPROLQ __m512i _mm512_maskz_rol_epi64(__mmask8 k, __m512i a, int imm);
VPROLQ __m256i _mm256_rol_epi64(__m256i a, int imm);
VPROLQ __m256i _mm256_mask_rol_epi64(__m256i a, __mmask8 k, __m256i b, int imm);
VPROLQ __m256i _mm256_maskz_rol_epi64( __mmask8 k, __m256i a, int imm);
VPROLQ __m128i _mm_rol_epi64(__m128i a, int imm);
VPROLQ __m128i _mm_mask_rol_epi64(__m128i a, __mmask8 k, __m128i b, int imm);
VPROLQ __m128i _mm_maskz_rol_epi64( __mmask8 k, __m128i a, int imm);
VPROLVD __m512i _mm512_rolv_epi32(__m512i a, __m512i cnt);
VPROLVD __m512i _mm512_mask_rolv_epi32(__m512i a, __mmask16 k, __m512i b, __m512i cnt);
VPROLVD __m512i _mm512_maskz_rolv_epi32(__mmask16 k, __m512i a, __m512i cnt);
VPROLVD __m256i _mm256_rolv_epi32(__m256i a, __m256i cnt);
VPROLVD __m256i _mm256_mask_rolv_epi32(__m256i a, __mmask8 k, __m256i b, __m256i cnt);
VPROLVD __m256i _mm256_maskz_rolv_epi32(__mmask8 k, __m256i a, __m256i cnt);
VPROLVD __m128i _mm_rolv_epi32(__m128i a, __m128i cnt);
VPROLVD __m128i _mm_mask_rolv_epi32(__m128i a, __mmask8 k, __m128i b, __m128i cnt);
VPROLVD __m128i _mm_maskz_rolv_epi32(__mmask8 k, __m128i a, __m128i cnt);
VPROLVQ __m512i _mm512_rolv_epi64(__m512i a, __m512i cnt);
VPROLVQ __m512i _mm512_mask_rolv_epi64(__m512i a, __mmask8 k, __m512i b, __m512i cnt);
VPROLVQ __m512i _mm512_maskz_rolv_epi64( __mmask8 k, __m512i a, __m512i cnt);
VPROLVQ __m256i _mm256_rolv_epi64(__m256i a, __m256i cnt);
VPROLVQ __m256i _mm256_mask_rolv_epi64(__m256i a, __mmask8 k, __m256i b, __m256i cnt);
VPROLVQ __m256i _mm256_maskz_rolv_epi64(__mmask8 k, __m256i a, __m256i cnt);
VPROLVQ __m128i _mm_rolv_epi64(__m128i a, __m128i cnt);
VPROLVQ __m128i _mm_mask_rolv_epi64(__m128i a, __mmask8 k, __m128i b, __m128i cnt);
VPROLVQ __m128i _mm_maskz_rolv_epi64(__mmask8 k, __m128i a, __m128i cnt);
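A short sketch combining the variable and immediate forms (illustrative; assumes AVX512F and <immintrin.h>; rotate_demo is a hypothetical name):

#include <immintrin.h>

/* Rotate each dword left by a per-lane count (VPROLVD), then by a
   uniform immediate (VPROLD). Counts are taken modulo 32. */
__m512i rotate_demo(__m512i a, __m512i counts)
{
    __m512i v = _mm512_rolv_epi32(a, counts);
    return _mm512_rol_epi32(v, 13);
}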
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.
VPRORD/VPRORVD/VPRORQ/VPRORVQ—Bit Rotate Right
Instruction Operand Encoding
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 14 /r
VPRORVD xmm1 {k1}{z}, xmm2,
xmm3/m128/m32bcst
B V/V AVX512VL
AVX512F
Rotate doublewords in xmm2 right by count in
the corresponding element of
xmm3/m128/m32bcst, store result using
writemask k1.
EVEX.128.66.0F.W0 72 /0 ib
VPRORD xmm1 {k1}{z},
xmm2/m128/m32bcst, imm8
A V/V AVX512VL
AVX512F
Rotate doublewords in xmm2/m128/m32bcst
right by imm8, store result using writemask k1.
EVEX.128.66.0F38.W1 14 /r
VPRORVQ xmm1 {k1}{z}, xmm2,
xmm3/m128/m64bcst
B V/V AVX512VL
AVX512F
Rotate quadwords in xmm2 right by count in the
corresponding element of xmm3/m128/m64bcst,
store result using writemask k1.
EVEX.128.66.0F.W1 72 /0 ib
VPRORQ xmm1 {k1}{z},
xmm2/m128/m64bcst, imm8
A V/V AVX512VL
AVX512F
Rotate quadwords in xmm2/m128/m64bcst right
by imm8, store result using writemask k1.
EVEX.256.66.0F38.W0 14 /r
VPRORVD ymm1 {k1}{z}, ymm2,
ymm3/m256/m32bcst
B V/V AVX512VL
AVX512F
Rotate doublewords in ymm2 right by count in
the corresponding element of
ymm3/m256/m32bcst, store result using
writemask k1.
EVEX.256.66.0F.W0 72 /0 ib
VPRORD ymm1 {k1}{z},
ymm2/m256/m32bcst, imm8
A V/V AVX512VL
AVX512F
Rotate doublewords in ymm2/m256/m32bcst
right by imm8, store result using writemask k1.
EVEX.256.66.0F38.W1 14 /r
VPRORVQ ymm1 {k1}{z}, ymm2,
ymm3/m256/m64bcst
B V/V AVX512VL
AVX512F
Rotate quadwords in ymm2 right by count in the
corresponding element of ymm3/m256/m64bcst,
store result using writemask k1.
EVEX.256.66.0F.W1 72 /0 ib
VPRORQ ymm1 {k1}{z},
ymm2/m256/m64bcst, imm8
A V/V AVX512VL
AVX512F
Rotate quadwords in ymm2/m256/m64bcst right
by imm8, store result using writemask k1.
EVEX.512.66.0F38.W0 14 /r
VPRORVD zmm1 {k1}{z}, zmm2,
zmm3/m512/m32bcst
B V/V AVX512F Rotate doublewords in zmm2 right by count in
the corresponding element of
zmm3/m512/m32bcst, store result using
writemask k1.
EVEX.512.66.0F.W0 72 /0 ib
VPRORD zmm1 {k1}{z},
zmm2/m512/m32bcst, imm8
A V/V AVX512F Rotate doublewords in zmm2/m512/m32bcst
right by imm8, store result using writemask k1.
EVEX.512.66.0F38.W1 14 /r
VPRORVQ zmm1 {k1}{z}, zmm2,
zmm3/m512/m64bcst
B V/V AVX512F Rotate quadwords in zmm2 right by count in the
corresponding element of zmm3/m512/m64bcst,
store result using writemask k1.
EVEX.512.66.0F.W1 72 /0 ib
VPRORQ zmm1 {k1}{z},
zmm2/m512/m64bcst, imm8
A V/V AVX512F Rotate quadwords in zmm2/m512/m64bcst right
by imm8, store result using writemask k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full EVEX.vvvv (w) ModRM:r/m (r) Imm8 NA
B Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) NA
Description
Rotates the bits in the individual data elements (doublewords or quadwords) in the first source operand to the right
by the number of bits specified in the count operand. If the value specified by the count operand is greater than 31
(for doublewords) or 63 (for quadwords), then the count operand modulo the data size (32 or 64) is used.
EVEX.128 encoded version: The destination operand is an XMM register. The source operand is an XMM register or
a memory location (for the immediate form). The count operand can come either from an XMM register, a memory
location, or an 8-bit immediate. Bits (MAXVL-1:128) of the corresponding ZMM register are zeroed.
EVEX.256 encoded version: The destination operand is a YMM register. The source operand is a YMM register or a
memory location (for the immediate form). The count operand can come either from a YMM register, a memory
location, or an 8-bit immediate. Bits (MAXVL-1:256) of the corresponding ZMM register are zeroed.
EVEX.512 encoded version: The destination operand is a ZMM register updated according to the writemask. For the
count operand in immediate form, the source operand can be a ZMM register, a 512-bit memory location, or a
512-bit vector broadcasted from a 32/64-bit memory location; the count operand is an 8-bit immediate. For the
count operand in variable form, the first source operand (the second operand) is a ZMM register and the count
operand (the third operand) is a ZMM register, a 512-bit memory location, or a 512-bit vector broadcasted from a
32/64-bit memory location.
Operation
RIGHT_ROTATE_DWORDS(SRC, COUNT_SRC)
    COUNT ← COUNT_SRC modulo 32;
    DEST[31:0] ← (SRC >> COUNT) | (SRC << (32 - COUNT));

RIGHT_ROTATE_QWORDS(SRC, COUNT_SRC)
    COUNT ← COUNT_SRC modulo 64;
    DEST[63:0] ← (SRC >> COUNT) | (SRC << (64 - COUNT));

VPRORD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC1 *is memory*)
            THEN DEST[i+31:i] ← RIGHT_ROTATE_DWORDS(SRC1[31:0], imm8)
            ELSE DEST[i+31:i] ← RIGHT_ROTATE_DWORDS(SRC1[i+31:i], imm8)
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE *zeroing-masking* ; zeroing-masking
                DEST[i+31:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPRORVD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+31:i] ← RIGHT_ROTATE_DWORDS(SRC1[i+31:i], SRC2[31:0])
            ELSE DEST[i+31:i] ← RIGHT_ROTATE_DWORDS(SRC1[i+31:i], SRC2[i+31:i])
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE *zeroing-masking* ; zeroing-masking
                DEST[i+31:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VPRORQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC1 *is memory*)
            THEN DEST[i+63:i] ← RIGHT_ROTATE_QWORDS(SRC1[63:0], imm8)
            ELSE DEST[i+63:i] ← RIGHT_ROTATE_QWORDS(SRC1[i+63:i], imm8)
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE *zeroing-masking* ; zeroing-masking
                DEST[i+63:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VPRORVQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j 0 TO KL-1
i j * 64
IF k1[j] OR *no writemask* THEN
IF (EVEX.b = 1) AND (SRC2 *is memory*)
THEN DEST[i+63:i] RIGHT_ROTATE_QWORDS(SRC1[i+63:i], SRC2[63:0])
ELSE DEST[i+63:i] RIGHT_ROTATE_QWORDS(SRC1[i+63:i], SRC2[i+63:i])
FI;
ELSE
IF *merging-masking* ; merging-masking
THEN *DEST[i+63:i] remains unchanged*
ELSE *zeroing-masking* ; zeroing-masking
DEST[i+63:i] 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] 0
Intel C/C++ Compiler Intrinsic Equivalent
VPRORD __m512i _mm512_ror_epi32(__m512i a, int imm);
VPRORD __m512i _mm512_mask_ror_epi32(__m512i a, __mmask16 k, __m512i b, int imm);
VPRORD __m512i _mm512_maskz_ror_epi32( __mmask16 k, __m512i a, int imm);
VPRORD __m256i _mm256_ror_epi32(__m256i a, int imm);
VPRORD __m256i _mm256_mask_ror_epi32(__m256i a, __mmask8 k, __m256i b, int imm);
VPRORD __m256i _mm256_maskz_ror_epi32( __mmask8 k, __m256i a, int imm);
VPRORD __m128i _mm_ror_epi32(__m128i a, int imm);
VPRORD __m128i _mm_mask_ror_epi32(__m128i a, __mmask8 k, __m128i b, int imm);
VPRORD __m128i _mm_maskz_ror_epi32( __mmask8 k, __m128i a, int imm);
VPRORQ __m512i _mm512_ror_epi64(__m512i a, int imm);
VPRORQ __m512i _mm512_mask_ror_epi64(__m512i a, __mmask8 k, __m512i b, int imm);
VPRORQ __m512i _mm512_maskz_ror_epi64(__mmask8 k, __m512i a, int imm);
VPRORQ __m256i _mm256_ror_epi64(__m256i a, int imm);
VPRORQ __m256i _mm256_mask_ror_epi64(__m256i a, __mmask8 k, __m256i b, int imm);
VPRORQ __m256i _mm256_maskz_ror_epi64( __mmask8 k, __m256i a, int imm);
VPRORQ __m128i _mm_ror_epi64(__m128i a, int imm);
VPRORQ __m128i _mm_mask_ror_epi64(__m128i a, __mmask8 k, __m128i b, int imm);
VPRORQ __m128i _mm_maskz_ror_epi64( __mmask8 k, __m128i a, int imm);
VPRORVD __m512i _mm512_rorv_epi32(__m512i a, __m512i cnt);
VPRORVD __m512i _mm512_mask_rorv_epi32(__m512i a, __mmask16 k, __m512i b, __m512i cnt);
VPRORVD __m512i _mm512_maskz_rorv_epi32(__mmask16 k, __m512i a, __m512i cnt);
VPRORVD __m256i _mm256_rorv_epi32(__m256i a, __m256i cnt);
VPRORVD __m256i _mm256_mask_rorv_epi32(__m256i a, __mmask8 k, __m256i b, __m256i cnt);
VPRORVD __m256i _mm256_maskz_rorv_epi32(__mmask8 k, __m256i a, __m256i cnt);
VPRORVD __m128i _mm_rorv_epi32(__m128i a, __m128i cnt);
VPRORVD __m128i _mm_mask_rorv_epi32(__m128i a, __mmask8 k, __m128i b, __m128i cnt);
VPRORVD __m128i _mm_maskz_rorv_epi32(__mmask8 k, __m128i a, __m128i cnt);
VPRORVQ __m512i _mm512_rorv_epi64(__m512i a, __m512i cnt);
VPRORVQ __m512i _mm512_mask_rorv_epi64(__m512i a, __mmask8 k, __m512i b, __m512i cnt);
VPRORVQ __m512i _mm512_maskz_rorv_epi64( __mmask8 k, __m512i a, __m512i cnt);
VPRORVQ __m256i _mm256_rorv_epi64(__m256i a, __m256i cnt);
VPRORVQ __m256i _mm256_mask_rorv_epi64(__m256i a, __mmask8 k, __m256i b, __m256i cnt);
VPRORVQ __m256i _mm256_maskz_rorv_epi64(__mmask8 k, __m256i a, __m256i cnt);
VPRORVQ __m128i _mm_rorv_epi64(__m128i a, __m128i cnt);
VPRORVQ __m128i _mm_mask_rorv_epi64(__m128i a, __mmask8 k, __m128i b, __m128i cnt);
VPRORVQ __m128i _mm_maskz_rorv_epi64(__mmask8 k, __m128i a, __m128i cnt);
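Note that a right rotate by n is equivalent to a left rotate by (32 - n) modulo 32 for dwords (and likewise modulo 64 for qwords). Sketch (illustrative; assumes AVX512F and <immintrin.h>; ror7 is a hypothetical name):

#include <immintrin.h>

/* VPRORD by 7: produces the same result as _mm512_rol_epi32(a, 25). */
__m512i ror7(__m512i a)
{
    return _mm512_ror_epi32(a, 7);
}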
SIMD Floating-Point Exceptions
None
Other Exceptions
EVEX-encoded instruction, see Exceptions Type E4.
VPSCATTERDD/VPSCATTERDQ/VPSCATTERQD/VPSCATTERQQ—Scatter Packed Dword, Packed
Qword with Signed Dword, Signed Qword Indices
Instruction Operand Encoding
Description
Stores up to 16 doubleword elements (8 when qword indices are used) from a doubleword vector, or up to 8 quad-
word elements from a quadword vector, to the memory locations pointed to by base address BASE_ADDR and index
vector VINDEX, with scale SCALE. The elements are specified via the VSIB (i.e., the index register is a vector
register, holding packed indices). Elements will only be stored if their corresponding mask bit is one. The entire
mask register will be set to zero by this instruction unless it triggers an exception.
This instruction can be suspended by an exception if at least one element has already been scattered (i.e., if the
exception is triggered by an element other than the rightmost one with its mask bit set). When this happens, the
destination register and the mask register are partially updated. If any traps or interrupts are pending from already
scattered elements, they will be delivered in lieu of the exception; in this case, EFLAGS.RF is set to one so an
instruction breakpoint is not re-triggered when the instruction is continued.
Note that:
• Only writes to overlapping vector indices are guaranteed to be ordered with respect to each other (from LSB to
MSB of the source registers). Note that this also includes partially overlapping vector indices. Writes that are not
overlapped may happen in any order. Memory ordering with other instructions follows the Intel-64 memory
ordering model. Note that this does not account for non-overlapping indices that map into the same physical
address locations.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 A0 /vsib
VPSCATTERDD vm32x {k1}, xmm1
A V/V AVX512VL
AVX512F
Using signed dword indices, scatter dword values to
memory using writemask k1.
EVEX.256.66.0F38.W0 A0 /vsib
VPSCATTERDD vm32y {k1}, ymm1
AV/V AVX512VL
AVX512F
Using signed dword indices, scatter dword values to
memory using writemask k1.
EVEX.512.66.0F38.W0 A0 /vsib
VPSCATTERDD vm32z {k1}, zmm1
A V/V AVX512F Using signed dword indices, scatter dword values to
memory using writemask k1.
EVEX.128.66.0F38.W1 A0 /vsib
VPSCATTERDQ vm32x {k1}, xmm1
AV/V AVX512VL
AVX512F
Using signed dword indices, scatter qword values to
memory using writemask k1.
EVEX.256.66.0F38.W1 A0 /vsib
VPSCATTERDQ vm32x {k1}, ymm1
AV/V AVX512VL
AVX512F
Using signed dword indices, scatter qword values to
memory using writemask k1.
EVEX.512.66.0F38.W1 A0 /vsib
VPSCATTERDQ vm32y {k1}, zmm1
A V/V AVX512F Using signed dword indices, scatter qword values to
memory using writemask k1.
EVEX.128.66.0F38.W0 A1 /vsib
VPSCATTERQD vm64x {k1}, xmm1
AV/V AVX512VL
AVX512F
Using signed qword indices, scatter dword values to
memory using writemask k1.
EVEX.256.66.0F38.W0 A1 /vsib
VPSCATTERQD vm64y {k1}, xmm1
AV/V AVX512VL
AVX512F
Using signed qword indices, scatter dword values to
memory using writemask k1.
EVEX.512.66.0F38.W0 A1 /vsib
VPSCATTERQD vm64z {k1}, ymm1
A V/V AVX512F Using signed qword indices, scatter dword values to
memory using writemask k1.
EVEX.128.66.0F38.W1 A1 /vsib
VPSCATTERQQ vm64x {k1}, xmm1
AV/V AVX512VL
AVX512F
Using signed qword indices, scatter qword values to
memory using writemask k1.
EVEX.256.66.0F38.W1 A1 /vsib
VPSCATTERQQ vm64y {k1}, ymm1
AV/V AVX512VL
AVX512F
Using signed qword indices, scatter qword values to
memory using writemask k1.
EVEX.512.66.0F38.W1 A1 /vsib
VPSCATTERQQ vm64z {k1}, zmm1
A V/V AVX512F Using signed qword indices, scatter qword values to
memory using writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | BaseReg (R): VSIB:base, VectorReg (R): VSIB:index | ModRM:reg (r) | NA | NA
Description
Stores up to 16 doubleword elements (8 elements when qword indices are used) or up to 8 quadword elements to
the memory locations pointed to by the base address BASE_ADDR and index vector VINDEX, with scale SCALE. The
elements are specified via the VSIB (i.e., the index register is a vector register holding packed indices). Elements
are stored only if their corresponding mask bit is one. The entire mask register is set to zero by this instruction
unless it triggers an exception.
This instruction can be suspended by an exception if at least one element has already been scattered (i.e., if the
exception is triggered by an element other than the rightmost one with its mask bit set). When this happens, the
destination memory and the mask register are partially updated. If any traps or interrupts are pending from already
scattered elements, they will be delivered in lieu of the exception; in this case, EFLAGS.RF is set to one so an
instruction breakpoint is not re-triggered when the instruction is continued.
Note that:
• Only writes to overlapping vector indices are guaranteed to be ordered with respect to each other (from LSB to
MSB of the source registers). Note that this also includes partially overlapping vector indices. Writes that are
not overlapped may happen in any order. Memory ordering with other instructions follows the Intel-64 memory
ordering model. Note that this does not account for non-overlapping indices that map into the same physical
address locations.
• If two or more destination indices completely overlap, the “earlier” write(s) may be skipped.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all
elements closer to the LSB of the source register will be completed (and non-faulting). Individual elements
closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered
in the conventional order.
• Elements may be scattered in any order, but faults must be delivered in a right-to-left order; thus, elements to
the left of a faulting one may be scattered before the fault is delivered. A given implementation of this
instruction is repeatable - given the same input values and architectural state, the same set of elements to the
left of the faulting one will be scattered.
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• Not valid with 16-bit effective addresses. Will deliver a #UD fault.
• If this instruction overwrites itself and then takes a fault, only a subset of elements may be completed before
the fault is delivered (as described above). If the fault handler completes and attempts to re-execute this
instruction, the new instruction will be executed, and the scatter will not complete.
Note that the presence of the VSIB byte is enforced in this instruction. Hence, the instruction will #UD fault if
ModRM.rm is anything other than 100b.
This instruction has special disp8*N and alignment rules. N is considered to be the size of a single vector element.
The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit
mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits
are ignored.
The instruction will #UD fault if the k0 mask register is specified.
The instruction will #UD fault if EVEX.Z = 1.
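The following C sketch (an editorial illustration, not part of the original text; the function name is invented, and
an AVX-512F-capable compiler with immintrin.h is assumed) shows the ordering rule for completely overlapping
indices: the write from the more significant element position is performed last, so its value survives.

#include <immintrin.h>
#include <stdint.h>

void scatter_overlap_demo(int32_t table[15])
{
    __m512i vals = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 15);
    /* Elements 0 and 15 both target table[0]; all other indices are unique. */
    __m512i idx  = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 0);
    _mm512_i32scatter_epi32(table, idx, vals, 4);   /* scale 4 = sizeof(int32_t) */
    /* table[0] is now 15: the overlapping writes are ordered from LSB to MSB. */
}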
Operation
BASE_ADDR stands for the memory operand base address (a GPR); may not exist
VINDEX stands for the memory operand vector of indices (a ZMM register)
SCALE stands for the memory operand scalar (1, 2, 4 or 8)
DISP is the optional 1 or 4 byte displacement
VPSCATTERDD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN MEM[BASE_ADDR + SignExtend(VINDEX[i+31:i]) * SCALE + DISP] ← SRC[i+31:i]
            k1[j] ← 0
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0

VPSCATTERDQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN MEM[BASE_ADDR + SignExtend(VINDEX[k+31:k]) * SCALE + DISP] ← SRC[i+63:i]
            k1[j] ← 0
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
VPSCATTERQD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN MEM[BASE_ADDR + (VINDEX[k+63:k]) * SCALE + DISP] ← SRC[i+31:i]
            k1[j] ← 0
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0

VPSCATTERQQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN MEM[BASE_ADDR + (VINDEX[i+63:i]) * SCALE + DISP] ← SRC[i+63:i]
            k1[j] ← 0
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPSCATTERDD void _mm512_i32scatter_epi32(void * base, __m512i vdx, __m512i a, int scale);
VPSCATTERDD void _mm256_i32scatter_epi32(void * base, __m256i vdx, __m256i a, int scale);
VPSCATTERDD void _mm_i32scatter_epi32(void * base, __m128i vdx, __m128i a, int scale);
VPSCATTERDD void _mm512_mask_i32scatter_epi32(void * base, __mmask16 k, __m512i vdx, __m512i a, int scale);
VPSCATTERDD void _mm256_mask_i32scatter_epi32(void * base, __mmask8 k, __m256i vdx, __m256i a, int scale);
VPSCATTERDD void _mm_mask_i32scatter_epi32(void * base, __mmask8 k, __m128i vdx, __m128i a, int scale);
VPSCATTERDQ void _mm512_i32scatter_epi64(void * base, __m256i vdx, __m512i a, int scale);
VPSCATTERDQ void _mm256_i32scatter_epi64(void * base, __m128i vdx, __m256i a, int scale);
VPSCATTERDQ void _mm_i32scatter_epi64(void * base, __m128i vdx, __m128i a, int scale);
VPSCATTERDQ void _mm512_mask_i32scatter_epi64(void * base, __mmask8 k, __m256i vdx, __m512i a, int scale);
VPSCATTERDQ void _mm256_mask_i32scatter_epi64(void * base, __mmask8 k, __m128i vdx, __m256i a, int scale);
VPSCATTERDQ void _mm_mask_i32scatter_epi64(void * base, __mmask8 k, __m128i vdx, __m128i a, int scale);
VPSCATTERQD void _mm512_i64scatter_epi32(void * base, __m512i vdx, __m256i a, int scale);
VPSCATTERQD void _mm256_i64scatter_epi32(void * base, __m256i vdx, __m128i a, int scale);
VPSCATTERQD void _mm_i64scatter_epi32(void * base, __m128i vdx, __m128i a, int scale);
VPSCATTERQD void _mm512_mask_i64scatter_epi32(void * base, __mmask8 k, __m512i vdx, __m256i a, int scale);
VPSCATTERQD void _mm256_mask_i64scatter_epi32(void * base, __mmask8 k, __m256i vdx, __m128i a, int scale);
VPSCATTERQD void _mm_mask_i64scatter_epi32(void * base, __mmask8 k, __m128i vdx, __m128i a, int scale);
VPSCATTERQQ void _mm512_i64scatter_epi64(void * base, __m512i vdx, __m512i a, int scale);
VPSCATTERQQ void _mm256_i64scatter_epi64(void * base, __m256i vdx, __m256i a, int scale);
VPSCATTERQQ void _mm_i64scatter_epi64(void * base, __m128i vdx, __m128i a, int scale);
VPSCATTERQQ void _mm512_mask_i64scatter_epi64(void * base, __mmask8 k, __m512i vdx, __m512i a, int scale);
VPSCATTERQQ void _mm256_mask_i64scatter_epi64(void * base, __mmask8 k, __m256i vdx, __m256i a, int scale);
VPSCATTERQQ void _mm_mask_i64scatter_epi64(void * base, __mmask8 k, __m128i vdx, __m128i a, int scale);
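An illustrative usage sketch (an editorial addition; function and parameter names are invented, and AVX-512F
support with immintrin.h is assumed): a masked scatter stores only the lanes whose mask bit is set.

#include <immintrin.h>
#include <stdint.h>

void scatter_under_mask(int32_t *dst, const int32_t *idx,
                        const int32_t *val, __mmask16 live)
{
    __m512i vidx = _mm512_loadu_si512((const void *)idx);
    __m512i vval = _mm512_loadu_si512((const void *)val);
    /* Only lanes whose bit is set in 'live' are written to dst[idx[i]]. */
    _mm512_mask_i32scatter_epi32(dst, live, vidx, vval, 4);
}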
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E12.
VPSLLVW/VPSLLVD/VPSLLVQ—Variable Bit Shift Left Logical
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 47 /r VPSLLVD xmm1, xmm2, xmm3/m128 | A | V/V | AVX2 | Shift doublewords in xmm2 left by amount specified in the corresponding element of xmm3/m128 while shifting in 0s.
VEX.128.66.0F38.W1 47 /r VPSLLVQ xmm1, xmm2, xmm3/m128 | A | V/V | AVX2 | Shift quadwords in xmm2 left by amount specified in the corresponding element of xmm3/m128 while shifting in 0s.
VEX.256.66.0F38.W0 47 /r VPSLLVD ymm1, ymm2, ymm3/m256 | A | V/V | AVX2 | Shift doublewords in ymm2 left by amount specified in the corresponding element of ymm3/m256 while shifting in 0s.
VEX.256.66.0F38.W1 47 /r VPSLLVQ ymm1, ymm2, ymm3/m256 | A | V/V | AVX2 | Shift quadwords in ymm2 left by amount specified in the corresponding element of ymm3/m256 while shifting in 0s.
EVEX.128.66.0F38.W1 12 /r VPSLLVW xmm1 {k1}{z}, xmm2, xmm3/m128 | B | V/V | AVX512VL AVX512BW | Shift words in xmm2 left by amount specified in the corresponding element of xmm3/m128 while shifting in 0s using writemask k1.
EVEX.256.66.0F38.W1 12 /r VPSLLVW ymm1 {k1}{z}, ymm2, ymm3/m256 | B | V/V | AVX512VL AVX512BW | Shift words in ymm2 left by amount specified in the corresponding element of ymm3/m256 while shifting in 0s using writemask k1.
EVEX.512.66.0F38.W1 12 /r VPSLLVW zmm1 {k1}{z}, zmm2, zmm3/m512 | B | V/V | AVX512BW | Shift words in zmm2 left by amount specified in the corresponding element of zmm3/m512 while shifting in 0s using writemask k1.
EVEX.128.66.0F38.W0 47 /r VPSLLVD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | C | V/V | AVX512VL AVX512F | Shift doublewords in xmm2 left by amount specified in the corresponding element of xmm3/m128/m32bcst while shifting in 0s using writemask k1.
EVEX.256.66.0F38.W0 47 /r VPSLLVD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | C | V/V | AVX512VL AVX512F | Shift doublewords in ymm2 left by amount specified in the corresponding element of ymm3/m256/m32bcst while shifting in 0s using writemask k1.
EVEX.512.66.0F38.W0 47 /r VPSLLVD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | C | V/V | AVX512F | Shift doublewords in zmm2 left by amount specified in the corresponding element of zmm3/m512/m32bcst while shifting in 0s using writemask k1.
EVEX.128.66.0F38.W1 47 /r VPSLLVQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | C | V/V | AVX512VL AVX512F | Shift quadwords in xmm2 left by amount specified in the corresponding element of xmm3/m128/m64bcst while shifting in 0s using writemask k1.
EVEX.256.66.0F38.W1 47 /r VPSLLVQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | C | V/V | AVX512VL AVX512F | Shift quadwords in ymm2 left by amount specified in the corresponding element of ymm3/m256/m64bcst while shifting in 0s using writemask k1.
EVEX.512.66.0F38.W1 47 /r VPSLLVQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | C | V/V | AVX512F | Shift quadwords in zmm2 left by amount specified in the corresponding element of zmm3/m512/m64bcst while shifting in 0s using writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
C | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
Shifts the bits in the individual data elements (words, doublewords or quadwords) in the first source operand to
the left by the count value in the respective data elements of the second source operand. As the bits in the data
elements are shifted left, the emptied low-order bits are cleared (set to 0).
The count values are specified individually in each data element of the second source operand. If the unsigned
integer value specified in the respective data element of the second source operand is greater than 15 (for words),
31 (for doublewords), or 63 (for quadwords), then the destination data element is written with 0.
VEX.128 encoded version: The destination and first source operands are XMM registers. The count operand can be
either an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the corresponding destination register
are zeroed.
VEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be
either a YMM register or a 256-bit memory location. Bits (MAXVL-1:256) of the corresponding ZMM register are
zeroed.
EVEX encoded VPSLLVD/Q: The destination and first source operands are ZMM/YMM/XMM registers. The count
operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector
broadcasted from a 32/64-bit memory location. The destination is conditionally updated with writemask k1.
EVEX encoded VPSLLVW: The destination and first source operands are ZMM/YMM/XMM registers. The count
operand can be either a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination is condition-
ally updated with writemask k1.
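A brief worked example of the count rule: with doubleword elements, a source element of 00000003H and a count
element of 5 produce 00000060H (3 << 5 = 96); a count element of 33 (greater than 31) produces 00000000H.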
Operation
VPSLLVW (EVEX encoded version)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← ZeroExtend(SRC1[i+15:i] << SRC2[i+15:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0;
VPSLLVD (VEX.128 version)
COUNT_0 ← SRC2[31:0]
(* Repeat Each COUNT_i for the 2nd through 4th dwords of SRC2 *)
COUNT_3 ← SRC2[127:96];
IF COUNT_0 < 32 THEN
    DEST[31:0] ← ZeroExtend(SRC1[31:0] << COUNT_0);
ELSE
    DEST[31:0] ← 0;
(* Repeat shift operation for 2nd through 4th dwords *)
IF COUNT_3 < 32 THEN
    DEST[127:96] ← ZeroExtend(SRC1[127:96] << COUNT_3);
ELSE
    DEST[127:96] ← 0;
DEST[MAXVL-1:128] ← 0;

VPSLLVD (VEX.256 version)
COUNT_0 ← SRC2[31:0];
(* Repeat Each COUNT_i for the 2nd through 7th dwords of SRC2 *)
COUNT_7 ← SRC2[255:224];
IF COUNT_0 < 32 THEN
    DEST[31:0] ← ZeroExtend(SRC1[31:0] << COUNT_0);
ELSE
    DEST[31:0] ← 0;
(* Repeat shift operation for 2nd through 7th dwords *)
IF COUNT_7 < 32 THEN
    DEST[255:224] ← ZeroExtend(SRC1[255:224] << COUNT_7);
ELSE
    DEST[255:224] ← 0;
DEST[MAXVL-1:256] ← 0;
VPSLLVD (EVEX encoded version)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+31:i] ← ZeroExtend(SRC1[i+31:i] << SRC2[31:0])
            ELSE DEST[i+31:i] ← ZeroExtend(SRC1[i+31:i] << SRC2[i+31:i])
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+31:i] ← 0
        FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0;
VPSLLVQ (VEX.128 version)
COUNT_0 ← SRC2[63:0];
COUNT_1 ← SRC2[127:64];
IF COUNT_0 < 64 THEN
    DEST[63:0] ← ZeroExtend(SRC1[63:0] << COUNT_0);
ELSE
    DEST[63:0] ← 0;
IF COUNT_1 < 64 THEN
    DEST[127:64] ← ZeroExtend(SRC1[127:64] << COUNT_1);
ELSE
    DEST[127:64] ← 0;
DEST[MAXVL-1:128] ← 0;

VPSLLVQ (VEX.256 version)
COUNT_0 ← SRC2[63:0];
(* Repeat Each COUNT_i for the 2nd through 4th qwords of SRC2 *)
COUNT_3 ← SRC2[255:192];
IF COUNT_0 < 64 THEN
    DEST[63:0] ← ZeroExtend(SRC1[63:0] << COUNT_0);
ELSE
    DEST[63:0] ← 0;
(* Repeat shift operation for 2nd through 4th qwords *)
IF COUNT_3 < 64 THEN
    DEST[255:192] ← ZeroExtend(SRC1[255:192] << COUNT_3);
ELSE
    DEST[255:192] ← 0;
DEST[MAXVL-1:256] ← 0;
VPSLLVQ (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+63:i] ← ZeroExtend(SRC1[i+63:i] << SRC2[63:0])
            ELSE DEST[i+63:i] ← ZeroExtend(SRC1[i+63:i] << SRC2[i+63:i])
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] ← 0
        FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPSLLVW __m512i _mm512_sllv_epi16(__m512i a, __m512i cnt);
VPSLLVW __m512i _mm512_mask_sllv_epi16(__m512i s, __mmask32 k, __m512i a, __m512i cnt);
VPSLLVW __m512i _mm512_maskz_sllv_epi16( __mmask32 k, __m512i a, __m512i cnt);
VPSLLVW __m256i _mm256_mask_sllv_epi16(__m256i s, __mmask16 k, __m256i a, __m256i cnt);
VPSLLVW __m256i _mm256_maskz_sllv_epi16( __mmask16 k, __m256i a, __m256i cnt);
VPSLLVW __m128i _mm_mask_sllv_epi16(__m128i s, __mmask8 k, __m128i a, __m128i cnt);
VPSLLVW __m128i _mm_maskz_sllv_epi16( __mmask8 k, __m128i a, __m128i cnt);
VPSLLVD __m512i _mm512_sllv_epi32(__m512i a, __m512i cnt);
VPSLLVD __m512i _mm512_mask_sllv_epi32(__m512i s, __mmask16 k, __m512i a, __m512i cnt);
VPSLLVD __m512i _mm512_maskz_sllv_epi32( __mmask16 k, __m512i a, __m512i cnt);
VPSLLVD __m256i _mm256_mask_sllv_epi32(__m256i s, __mmask8 k, __m256i a, __m256i cnt);
VPSLLVD __m256i _mm256_maskz_sllv_epi32( __mmask8 k, __m256i a, __m256i cnt);
VPSLLVD __m128i _mm_mask_sllv_epi32(__m128i s, __mmask8 k, __m128i a, __m128i cnt);
VPSLLVD __m128i _mm_maskz_sllv_epi32( __mmask8 k, __m128i a, __m128i cnt);
VPSLLVQ __m512i _mm512_sllv_epi64(__m512i a, __m512i cnt);
VPSLLVQ __m512i _mm512_mask_sllv_epi64(__m512i s, __mmask8 k, __m512i a, __m512i cnt);
VPSLLVQ __m512i _mm512_maskz_sllv_epi64( __mmask8 k, __m512i a, __m512i cnt);
VPSLLVQ __m256i _mm256_mask_sllv_epi64(__m256i s, __mmask8 k, __m256i a, __m256i cnt);
VPSLLVQ __m256i _mm256_maskz_sllv_epi64( __mmask8 k, __m256i a, __m256i cnt);
VPSLLVQ __m128i _mm_mask_sllv_epi64(__m128i s, __mmask8 k, __m128i a, __m128i cnt);
VPSLLVQ __m128i _mm_maskz_sllv_epi64( __mmask8 k, __m128i a, __m128i cnt);
VPSLLVD __m256i _mm256_sllv_epi32 (__m256i m, __m256i count)
VPSLLVQ __m256i _mm256_sllv_epi64 (__m256i m, __m256i count)
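As an illustrative sketch (an editorial addition, not part of the original list; the function name is invented, and
an AVX2-capable compiler with immintrin.h is assumed), the variable shift makes per-lane power-of-two scaling a
single instruction:

#include <immintrin.h>

/* Multiply lane i by 2^i: lane i of the count vector holds i. */
__m256i mul_pow2_lanes(__m256i x)
{
    const __m256i k = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    return _mm256_sllv_epi32(x, k);   /* lane i becomes x[i] << i */
}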
SIMD Floating-Point Exceptions
None
Other Exceptions
VEX-encoded instructions, see Exceptions Type 4.
EVEX-encoded VPSLLVD/VPSLLVQ, see Exceptions Type E4.
EVEX-encoded VPSLLVW, see Exceptions Type E4.nb.
VPSRAVW/VPSRAVD/VPSRAVQ—Variable Bit Shift Right Arithmetic
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 46 /r VPSRAVD xmm1, xmm2, xmm3/m128 | A | V/V | AVX2 | Shift doublewords in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in sign bits.
VEX.256.66.0F38.W0 46 /r VPSRAVD ymm1, ymm2, ymm3/m256 | A | V/V | AVX2 | Shift doublewords in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in sign bits.
EVEX.128.66.0F38.W1 11 /r VPSRAVW xmm1 {k1}{z}, xmm2, xmm3/m128 | B | V/V | AVX512VL AVX512BW | Shift words in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in sign bits using writemask k1.
EVEX.256.66.0F38.W1 11 /r VPSRAVW ymm1 {k1}{z}, ymm2, ymm3/m256 | B | V/V | AVX512VL AVX512BW | Shift words in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in sign bits using writemask k1.
EVEX.512.66.0F38.W1 11 /r VPSRAVW zmm1 {k1}{z}, zmm2, zmm3/m512 | B | V/V | AVX512BW | Shift words in zmm2 right by amount specified in the corresponding element of zmm3/m512 while shifting in sign bits using writemask k1.
EVEX.128.66.0F38.W0 46 /r VPSRAVD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | C | V/V | AVX512VL AVX512F | Shift doublewords in xmm2 right by amount specified in the corresponding element of xmm3/m128/m32bcst while shifting in sign bits using writemask k1.
EVEX.256.66.0F38.W0 46 /r VPSRAVD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | C | V/V | AVX512VL AVX512F | Shift doublewords in ymm2 right by amount specified in the corresponding element of ymm3/m256/m32bcst while shifting in sign bits using writemask k1.
EVEX.512.66.0F38.W0 46 /r VPSRAVD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | C | V/V | AVX512F | Shift doublewords in zmm2 right by amount specified in the corresponding element of zmm3/m512/m32bcst while shifting in sign bits using writemask k1.
EVEX.128.66.0F38.W1 46 /r VPSRAVQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | C | V/V | AVX512VL AVX512F | Shift quadwords in xmm2 right by amount specified in the corresponding element of xmm3/m128/m64bcst while shifting in sign bits using writemask k1.
EVEX.256.66.0F38.W1 46 /r VPSRAVQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | C | V/V | AVX512VL AVX512F | Shift quadwords in ymm2 right by amount specified in the corresponding element of ymm3/m256/m64bcst while shifting in sign bits using writemask k1.
EVEX.512.66.0F38.W1 46 /r VPSRAVQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | C | V/V | AVX512F | Shift quadwords in zmm2 right by amount specified in the corresponding element of zmm3/m512/m64bcst while shifting in sign bits using writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
C | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
Shifts the bits in the individual data elements (words, doublewords or quadwords) in the first source operand (the
second operand) to the right by the number of bits specified in the count value of the respective data elements in
the second source operand (the third operand). As the bits in the data elements are shifted right, the emptied
high-order bits are filled with the sign bit (sign extension).
The count values are specified individually in each data element of the second source operand. If the unsigned
integer value specified in the respective data element of the second source operand is greater than 15 (for words),
31 (for doublewords), or 63 (for quadwords), then the destination data element is filled with the corresponding
sign bit of the source element.
VEX.128 encoded version: The destination and first source operands are XMM registers. The count operand can be
either an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the corresponding destination register
are zeroed.
VEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be
either a YMM register or a 256-bit memory location. Bits (MAXVL-1:256) of the corresponding destination register
are zeroed.
EVEX.512/256/128 encoded VPSRAVD/Q: The destination and first source operands are ZMM/YMM/XMM registers.
The count operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit
vector broadcasted from a 32/64-bit memory location. The destination is conditionally updated with writemask k1.
EVEX.512/256/128 encoded VPSRAVW: The destination and first source operands are ZMM/YMM/XMM registers.
The count operand can be either a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination
is conditionally updated with writemask k1.
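A brief worked example of the count rule: with doubleword elements, a source element of FFFF8000H (-32768) and
a count element of 3 produce FFFFF000H (-4096); a count element of 40 (greater than 31) produces FFFFFFFFH
(-1), i.e., a destination element consisting entirely of copies of the sign bit.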
Operation
VPSRAVW (EVEX encoded version)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF k1[j] OR *no writemask*
        THEN
            COUNT ← SRC2[i+15:i]
            IF COUNT < 16
                THEN DEST[i+15:i] ← SignExtend(SRC1[i+15:i] >> COUNT)
                ELSE
                    FOR k ← 0 TO 15
                        DEST[i+k] ← SRC1[i+15]
                    ENDFOR;
            FI
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0;
VPSRAVD (VEX.128 version)
COUNT_0 ← SRC2[31:0]
(* Repeat Each COUNT_i for the 2nd through 4th dwords of SRC2 *)
COUNT_3 ← SRC2[127:96];
DEST[31:0] ← SignExtend(SRC1[31:0] >> COUNT_0);
(* Repeat shift operation for 2nd through 4th dwords *)
DEST[127:96] ← SignExtend(SRC1[127:96] >> COUNT_3);
DEST[MAXVL-1:128] ← 0;

VPSRAVD (VEX.256 version)
COUNT_0 ← SRC2[31:0];
(* Repeat Each COUNT_i for the 2nd through 7th dwords of SRC2 *)
COUNT_7 ← SRC2[255:224];
DEST[31:0] ← SignExtend(SRC1[31:0] >> COUNT_0);
(* Repeat shift operation for 2nd through 7th dwords *)
DEST[255:224] ← SignExtend(SRC1[255:224] >> COUNT_7);
DEST[MAXVL-1:256] ← 0;
VPSRAVD (EVEX encoded version)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN
                COUNT ← SRC2[31:0]
                IF COUNT < 32
                    THEN DEST[i+31:i] ← SignExtend(SRC1[i+31:i] >> COUNT)
                    ELSE
                        FOR k ← 0 TO 31
                            DEST[i+k] ← SRC1[i+31]
                        ENDFOR;
                FI
            ELSE
                COUNT ← SRC2[i+31:i]
                IF COUNT < 32
                    THEN DEST[i+31:i] ← SignExtend(SRC1[i+31:i] >> COUNT)
                    ELSE
                        FOR k ← 0 TO 31
                            DEST[i+k] ← SRC1[i+31]
                        ENDFOR;
                FI
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+31:i] ← 0
        FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0;
VPSRAVQ (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN
                COUNT ← SRC2[63:0]
                IF COUNT < 64
                    THEN DEST[i+63:i] ← SignExtend(SRC1[i+63:i] >> COUNT)
                    ELSE
                        FOR k ← 0 TO 63
                            DEST[i+k] ← SRC1[i+63]
                        ENDFOR;
                FI
            ELSE
                COUNT ← SRC2[i+63:i]
                IF COUNT < 64
                    THEN DEST[i+63:i] ← SignExtend(SRC1[i+63:i] >> COUNT)
                    ELSE
                        FOR k ← 0 TO 63
                            DEST[i+k] ← SRC1[i+63]
                        ENDFOR;
                FI
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] ← 0
        FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPSRAVD __m512i _mm512_srav_epi32(__m512i a, __m512i cnt);
VPSRAVD __m512i _mm512_mask_srav_epi32(__m512i s, __mmask16 m, __m512i a, __m512i cnt);
VPSRAVD __m512i _mm512_maskz_srav_epi32(__mmask16 m, __m512i a, __m512i cnt);
VPSRAVD __m256i _mm256_srav_epi32(__m256i a, __m256i cnt);
VPSRAVD __m256i _mm256_mask_srav_epi32(__m256i s, __mmask8 m, __m256i a, __m256i cnt);
VPSRAVD __m256i _mm256_maskz_srav_epi32(__mmask8 m, __m256i a, __m256i cnt);
VPSRAVD __m128i _mm_srav_epi32(__m128i a, __m128i cnt);
VPSRAVD __m128i _mm_mask_srav_epi32(__m128i s, __mmask8 m, __m128i a, __m128i cnt);
VPSRAVD __m128i _mm_maskz_srav_epi32(__mmask8 m, __m128i a, __m128i cnt);
VPSRAVQ __m512i _mm512_srav_epi64(__m512i a, __m512i cnt);
VPSRAVQ __m512i _mm512_mask_srav_epi64(__m512i s, __mmask8 m, __m512i a, __m512i cnt);
VPSRAVQ __m512i _mm512_maskz_srav_epi64( __mmask8 m, __m512i a, __m512i cnt);
VPSRAVQ __m256i _mm256_srav_epi64(__m256i a, __m256i cnt);
VPSRAVQ __m256i _mm256_mask_srav_epi64(__m256i s, __mmask8 m, __m256i a, __m256i cnt);
VPSRAVQ __m256i _mm256_maskz_srav_epi64( __mmask8 m, __m256i a, __m256i cnt);
VPSRAVQ __m128i _mm_srav_epi64(__m128i a, __m128i cnt);
VPSRAVQ __m128i _mm_mask_srav_epi64(__m128i s, __mmask8 m, __m128i a, __m128i cnt);
VPSRAVQ __m128i _mm_maskz_srav_epi64( __mmask8 m, __m128i a, __m128i cnt);
VPSRAVW __m512i _mm512_srav_epi16(__m512i a, __m512i cnt);
VPSRAVW __m512i _mm512_mask_srav_epi16(__m512i s, __mmask32 m, __m512i a, __m512i cnt);
VPSRAVW __m512i _mm512_maskz_srav_epi16(__mmask32 m, __m512i a, __m512i cnt);
VPSRAVW __m256i _mm256_srav_epi16(__m256i a, __m256i cnt);
VPSRAVW __m256i _mm256_mask_srav_epi16(__m256i s, __mmask16 m, __m256i a, __m256i cnt);
VPSRAVW __m256i _mm256_maskz_srav_epi16(__mmask16 m, __m256i a, __m256i cnt);
VPSRAVW __m128i _mm_srav_epi16(__m128i a, __m128i cnt);
VPSRAVW __m128i _mm_mask_srav_epi16(__m128i s, __mmask8 m, __m128i a, __m128i cnt);
VPSRAVW __m128i _mm_maskz_srav_epi16(__mmask8 m, __m128i a, __m128i cnt);
VPSRAVD __m256i _mm256_srav_epi32 (__m256i m, __m256i count)
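A short editorial sketch (the function name is invented; assumes AVX2 and immintrin.h): per-lane arithmetic
scaling with sign preservation.

#include <immintrin.h>

/* Shift each signed dword lane right by its own count; negative lanes
   stay negative because sign bits are shifted in (rounds toward negative
   infinity, unlike integer division). */
__m256i scale_down_lanes(__m256i x, __m256i counts)
{
    return _mm256_srav_epi32(x, counts);
}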
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instruction, see Exceptions Type 4.
EVEX-encoded instruction, see Exceptions Type E4.
VPSRLVW/VPSRLVD/VPSRLVQ—Variable Bit Shift Right Logical
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
VEX.128.66.0F38.W0 45 /r VPSRLVD xmm1, xmm2, xmm3/m128 | A | V/V | AVX2 | Shift doublewords in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in 0s.
VEX.128.66.0F38.W1 45 /r VPSRLVQ xmm1, xmm2, xmm3/m128 | A | V/V | AVX2 | Shift quadwords in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in 0s.
VEX.256.66.0F38.W0 45 /r VPSRLVD ymm1, ymm2, ymm3/m256 | A | V/V | AVX2 | Shift doublewords in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in 0s.
VEX.256.66.0F38.W1 45 /r VPSRLVQ ymm1, ymm2, ymm3/m256 | A | V/V | AVX2 | Shift quadwords in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in 0s.
EVEX.128.66.0F38.W1 10 /r VPSRLVW xmm1 {k1}{z}, xmm2, xmm3/m128 | B | V/V | AVX512VL AVX512BW | Shift words in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in 0s using writemask k1.
EVEX.256.66.0F38.W1 10 /r VPSRLVW ymm1 {k1}{z}, ymm2, ymm3/m256 | B | V/V | AVX512VL AVX512BW | Shift words in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in 0s using writemask k1.
EVEX.512.66.0F38.W1 10 /r VPSRLVW zmm1 {k1}{z}, zmm2, zmm3/m512 | B | V/V | AVX512BW | Shift words in zmm2 right by amount specified in the corresponding element of zmm3/m512 while shifting in 0s using writemask k1.
EVEX.128.66.0F38.W0 45 /r VPSRLVD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | C | V/V | AVX512VL AVX512F | Shift doublewords in xmm2 right by amount specified in the corresponding element of xmm3/m128/m32bcst while shifting in 0s using writemask k1.
EVEX.256.66.0F38.W0 45 /r VPSRLVD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | C | V/V | AVX512VL AVX512F | Shift doublewords in ymm2 right by amount specified in the corresponding element of ymm3/m256/m32bcst while shifting in 0s using writemask k1.
EVEX.512.66.0F38.W0 45 /r VPSRLVD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | C | V/V | AVX512F | Shift doublewords in zmm2 right by amount specified in the corresponding element of zmm3/m512/m32bcst while shifting in 0s using writemask k1.
EVEX.128.66.0F38.W1 45 /r VPSRLVQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | C | V/V | AVX512VL AVX512F | Shift quadwords in xmm2 right by amount specified in the corresponding element of xmm3/m128/m64bcst while shifting in 0s using writemask k1.
EVEX.256.66.0F38.W1 45 /r VPSRLVQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | C | V/V | AVX512VL AVX512F | Shift quadwords in ymm2 right by amount specified in the corresponding element of ymm3/m256/m64bcst while shifting in 0s using writemask k1.
EVEX.512.66.0F38.W1 45 /r VPSRLVQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | C | V/V | AVX512F | Shift quadwords in zmm2 right by amount specified in the corresponding element of zmm3/m512/m64bcst while shifting in 0s using writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
C | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
Shifts the bits in the individual data elements (words, doublewords or quadwords) in the first source operand to
the right by the count value in the respective data elements of the second source operand. As the bits in the data
elements are shifted right, the emptied high-order bits are cleared (set to 0).
The count values are specified individually in each data element of the second source operand. If the unsigned
integer value specified in the respective data element of the second source operand is greater than 15 (for words),
31 (for doublewords), or 63 (for quadwords), then the destination data element is written with 0.
VEX.128 encoded version: The destination and first source operands are XMM registers. The count operand can be
either an XMM register or a 128-bit memory location. Bits (MAXVL-1:128) of the corresponding destination register
are zeroed.
VEX.256 encoded version: The destination and first source operands are YMM registers. The count operand can be
either a YMM register or a 256-bit memory location. Bits (MAXVL-1:256) of the corresponding ZMM register are
zeroed.
EVEX encoded VPSRLVD/Q: The destination and first source operands are ZMM/YMM/XMM registers. The count
operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector
broadcasted from a 32/64-bit memory location. The destination is conditionally updated with writemask k1.
EVEX encoded VPSRLVW: The destination and first source operands are ZMM/YMM/XMM registers. The count
operand can be either a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination is condition-
ally updated with writemask k1.
Operation
VPSRLVW (EVEX encoded version)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[i+15:i] ← ZeroExtend(SRC1[i+15:i] >> SRC2[i+15:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+15:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+15:i] ← 0
            FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0;
VPSRLVD (VEX.128 version)
COUNT_0 ← SRC2[31:0]
(* Repeat Each COUNT_i for the 2nd through 4th dwords of SRC2 *)
COUNT_3 ← SRC2[127:96];
IF COUNT_0 < 32 THEN
    DEST[31:0] ← ZeroExtend(SRC1[31:0] >> COUNT_0);
ELSE
    DEST[31:0] ← 0;
(* Repeat shift operation for 2nd through 4th dwords *)
IF COUNT_3 < 32 THEN
    DEST[127:96] ← ZeroExtend(SRC1[127:96] >> COUNT_3);
ELSE
    DEST[127:96] ← 0;
DEST[MAXVL-1:128] ← 0;
VPSRLVD (VEX.256 version)
COUNT_0 ← SRC2[31:0];
(* Repeat Each COUNT_i for the 2nd through 7th dwords of SRC2 *)
COUNT_7 ← SRC2[255:224];
IF COUNT_0 < 32 THEN
    DEST[31:0] ← ZeroExtend(SRC1[31:0] >> COUNT_0);
ELSE
    DEST[31:0] ← 0;
(* Repeat shift operation for 2nd through 7th dwords *)
IF COUNT_7 < 32 THEN
    DEST[255:224] ← ZeroExtend(SRC1[255:224] >> COUNT_7);
ELSE
    DEST[255:224] ← 0;
DEST[MAXVL-1:256] ← 0;
VPSRLVD (EVEX encoded version)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+31:i] ← ZeroExtend(SRC1[i+31:i] >> SRC2[31:0])
            ELSE DEST[i+31:i] ← ZeroExtend(SRC1[i+31:i] >> SRC2[i+31:i])
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+31:i] ← 0
        FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0;
VPSRLVQ (VEX.128 version)
COUNT_0 ← SRC2[63:0];
COUNT_1 ← SRC2[127:64];
IF COUNT_0 < 64 THEN
    DEST[63:0] ← ZeroExtend(SRC1[63:0] >> COUNT_0);
ELSE
    DEST[63:0] ← 0;
IF COUNT_1 < 64 THEN
    DEST[127:64] ← ZeroExtend(SRC1[127:64] >> COUNT_1);
ELSE
    DEST[127:64] ← 0;
DEST[MAXVL-1:128] ← 0;
VPSRLVQ (VEX.256 version)
COUNT_0 ← SRC2[63:0];
(* Repeat Each COUNT_i for the 2nd through 4th qwords of SRC2 *)
COUNT_3 ← SRC2[255:192];
IF COUNT_0 < 64 THEN
    DEST[63:0] ← ZeroExtend(SRC1[63:0] >> COUNT_0);
ELSE
    DEST[63:0] ← 0;
(* Repeat shift operation for 2nd through 4th qwords *)
IF COUNT_3 < 64 THEN
    DEST[255:192] ← ZeroExtend(SRC1[255:192] >> COUNT_3);
ELSE
    DEST[255:192] ← 0;
DEST[MAXVL-1:256] ← 0;
VPSRLVQ (EVEX encoded version)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+63:i] ← ZeroExtend(SRC1[i+63:i] >> SRC2[63:0])
            ELSE DEST[i+63:i] ← ZeroExtend(SRC1[i+63:i] >> SRC2[i+63:i])
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] ← 0
        FI
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VPSRLVW __m512i _mm512_srlv_epi16(__m512i a, __m512i cnt);
VPSRLVW __m512i _mm512_mask_srlv_epi16(__m512i s, __mmask32 k, __m512i a, __m512i cnt);
VPSRLVW __m512i _mm512_maskz_srlv_epi16( __mmask32 k, __m512i a, __m512i cnt);
VPSRLVW __m256i _mm256_mask_srlv_epi16(__m256i s, __mmask16 k, __m256i a, __m256i cnt);
VPSRLVW __m256i _mm256_maskz_srlv_epi16( __mmask16 k, __m256i a, __m256i cnt);
VPSRLVW __m128i _mm_mask_srlv_epi16(__m128i s, __mmask8 k, __m128i a, __m128i cnt);
VPSRLVW __m128i _mm_maskz_srlv_epi16( __mmask8 k, __m128i a, __m128i cnt);
VPSRLVD __m256i _mm256_srlv_epi32 (__m256i m, __m256i count)
VPSRLVD __m512i _mm512_srlv_epi32(__m512i a, __m512i cnt);
VPSRLVD __m512i _mm512_mask_srlv_epi32(__m512i s, __mmask16 k, __m512i a, __m512i cnt);
VPSRLVD __m512i _mm512_maskz_srlv_epi32( __mmask16 k, __m512i a, __m512i cnt);
VPSRLVD __m256i _mm256_mask_srlv_epi32(__m256i s, __mmask8 k, __m256i a, __m256i cnt);
VPSRLVD __m256i _mm256_maskz_srlv_epi32( __mmask8 k, __m256i a, __m256i cnt);
VPSRLVD __m128i _mm_mask_srlv_epi32(__m128i s, __mmask8 k, __m128i a, __m128i cnt);
VPSRLVD __m128i _mm_maskz_srlv_epi32( __mmask8 k, __m128i a, __m128i cnt);
VPSRLVQ __m512i _mm512_srlv_epi64(__m512i a, __m512i cnt);
VPSRLVQ __m512i _mm512_mask_srlv_epi64(__m512i s, __mmask8 k, __m512i a, __m512i cnt);
VPSRLVQ __m512i _mm512_maskz_srlv_epi64( __mmask8 k, __m512i a, __m512i cnt);
VPSRLVQ __m256i _mm256_mask_srlv_epi64(__m256i s, __mmask8 k, __m256i a, __m256i cnt);
VPSRLVQ __m256i _mm256_maskz_srlv_epi64( __mmask8 k, __m256i a, __m256i cnt);
VPSRLVQ __m128i _mm_mask_srlv_epi64(__m128i s, __mmask8 k, __m128i a, __m128i cnt);
VPSRLVQ __m128i _mm_maskz_srlv_epi64( __mmask8 k, __m128i a, __m128i cnt);
VPSRLVQ __m256i _mm256_srlv_epi64 (__m256i m, __m256i count)
VPSRLVD __m128i _mm_srlv_epi32( __m128i a, __m128i cnt);
VPSRLVQ __m128i _mm_srlv_epi64( __m128i a, __m128i cnt);
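A short editorial sketch (function and parameter names invented; assumes AVX2 and immintrin.h): the variable
logical right shift is a natural fit for extracting per-lane bit fields.

#include <immintrin.h>

/* Extract a bit field from each dword lane: shift lane i right by
   offsets[i], then mask to the field width (widthmask = (1 << w) - 1). */
__m256i extract_fields(__m256i x, __m256i offsets, __m256i widthmask)
{
    return _mm256_and_si256(_mm256_srlv_epi32(x, offsets), widthmask);
}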
SIMD Floating-Point Exceptions
None
Other Exceptions
VEX-encoded instructions, see Exceptions Type 4.
EVEX-encoded VPSRLVD/Q, see Exceptions Type E4.
EVEX-encoded VPSRLVW, see Exceptions Type E4.nb.
VPTERNLOGD/VPTERNLOGQ—Bitwise Ternary Logic
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W0 25 /r ib VPTERNLOGD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst, imm8 | A | V/V | AVX512VL AVX512F | Bitwise ternary logic taking xmm1, xmm2 and xmm3/m128/m32bcst as source operands and writing the result to xmm1 under writemask k1 with dword granularity. The immediate value determines the specific binary function being implemented.
EVEX.256.66.0F3A.W0 25 /r ib VPTERNLOGD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst, imm8 | A | V/V | AVX512VL AVX512F | Bitwise ternary logic taking ymm1, ymm2 and ymm3/m256/m32bcst as source operands and writing the result to ymm1 under writemask k1 with dword granularity. The immediate value determines the specific binary function being implemented.
EVEX.512.66.0F3A.W0 25 /r ib VPTERNLOGD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst, imm8 | A | V/V | AVX512F | Bitwise ternary logic taking zmm1, zmm2 and zmm3/m512/m32bcst as source operands and writing the result to zmm1 under writemask k1 with dword granularity. The immediate value determines the specific binary function being implemented.
EVEX.128.66.0F3A.W1 25 /r ib VPTERNLOGQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst, imm8 | A | V/V | AVX512VL AVX512F | Bitwise ternary logic taking xmm1, xmm2 and xmm3/m128/m64bcst as source operands and writing the result to xmm1 under writemask k1 with qword granularity. The immediate value determines the specific binary function being implemented.
EVEX.256.66.0F3A.W1 25 /r ib VPTERNLOGQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst, imm8 | A | V/V | AVX512VL AVX512F | Bitwise ternary logic taking ymm1, ymm2 and ymm3/m256/m64bcst as source operands and writing the result to ymm1 under writemask k1 with qword granularity. The immediate value determines the specific binary function being implemented.
EVEX.512.66.0F3A.W1 25 /r ib VPTERNLOGQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst, imm8 | A | V/V | AVX512F | Bitwise ternary logic taking zmm1, zmm2 and zmm3/m512/m64bcst as source operands and writing the result to zmm1 under writemask k1 with qword granularity. The immediate value determines the specific binary function being implemented.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (r, w) | EVEX.vvvv (r) | ModRM:r/m (r) | Imm8
Description
VPTERNLOGD/Q takes three bit vectors of 512-bit length (in the first, second and third operand) as input data to
form a set of 512 indices, where each index comprises one bit from each input vector. The imm8 byte specifies a
boolean logic table producing a binary value for each 3-bit index value. The final 512-bit boolean result is written
to the destination operand (the first operand) using the writemask k1 with doubleword or quadword element
granularity.
The destination operand is a ZMM (EVEX.512)/YMM (EVEX.256)/XMM (EVEX.128) register. The first source operand
is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit
memory location or a 512/256/128-bit vector broadcasted from a 32/64-bit memory location. The destination
operand is conditionally updated with writemask k1.
Table 5-9 shows two examples of Boolean functions specified by the immediate values 0xE2 and 0xE4, with the
lookup result listed in the column following the three columns containing all possible values of the 3-bit index.
Specifying different values in imm8 allows any arbitrary three-input Boolean function to be implemented in
software using VPTERNLOGD/Q. Table 5-1 and Table 5-2 provide a mapping of all 256 possible imm8 values to
various Boolean expressions.

Table 5-9. Examples of VPTERNLOGD/Q Imm8 Boolean Function and Input Index Values
(VPTERNLOGD reg1, reg2, src3, 0xE2 and VPTERNLOGD reg1, reg2, src3, 0xE4)
Bit(reg1) | Bit(reg2) | Bit(src3) | Bit Result with Imm8=0xE2 | Bit Result with Imm8=0xE4
0 | 0 | 0 | 0 | 0
0 | 0 | 1 | 1 | 0
0 | 1 | 0 | 0 | 1
0 | 1 | 1 | 0 | 0
1 | 0 | 0 | 0 | 0
1 | 0 | 1 | 1 | 1
1 | 1 | 0 | 1 | 1
1 | 1 | 1 | 1 | 1
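As an editorial worked example of deriving an imm8 value: the imm8 bit consulted for each element bit is at index
(Bit(reg1) << 2) + (Bit(reg2) << 1) + Bit(src3). To implement the bitwise select dst = (reg1 AND reg2) OR
(NOT reg1 AND src3), evaluate the function for indices 7 down to 0, giving 1, 1, 0, 0, 1, 0, 1, 0; reading these as
a binary byte yields 11001010B = 0xCA, so VPTERNLOGD reg1, reg2, src3, 0xCA computes the select in a single
instruction.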
Operation
VPTERNLOGD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            FOR k ← 0 TO 31
                IF (EVEX.b = 1) AND (SRC2 *is memory*)
                    THEN DEST[j][k] ← imm[(DEST[i+k] << 2) + (SRC1[i+k] << 1) + SRC2[k]]
                    ELSE DEST[j][k] ← imm[(DEST[i+k] << 2) + (SRC1[i+k] << 1) + SRC2[i+k]]
                FI; ; table lookup of immediate below
            ENDFOR;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[31+i:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[31+i:i] ← 0
            FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
VPTERNLOGQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            FOR k ← 0 TO 63
                IF (EVEX.b = 1) AND (SRC2 *is memory*)
                    THEN DEST[j][k] ← imm[(DEST[i+k] << 2) + (SRC1[i+k] << 1) + SRC2[k]]
                    ELSE DEST[j][k] ← imm[(DEST[i+k] << 2) + (SRC1[i+k] << 1) + SRC2[i+k]]
                FI; ; table lookup of immediate below
            ENDFOR;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[63+i:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[63+i:i] ← 0
            FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalents
VPTERNLOGD __m512i _mm512_ternarylogic_epi32(__m512i a, __m512i b, __m512i c, int imm);
VPTERNLOGD __m512i _mm512_mask_ternarylogic_epi32(__m512i s, __mmask16 m, __m512i a, __m512i b, int imm);
VPTERNLOGD __m512i _mm512_maskz_ternarylogic_epi32(__mmask16 m, __m512i a, __m512i b, __m512i c, int imm);
VPTERNLOGD __m256i _mm256_ternarylogic_epi32(__m256i a, __m256i b, __m256i c, int imm);
VPTERNLOGD __m256i _mm256_mask_ternarylogic_epi32(__m256i s, __mmask8 m, __m256i a, __m256i b, int imm);
VPTERNLOGD __m256i _mm256_maskz_ternarylogic_epi32(__mmask8 m, __m256i a, __m256i b, __m256i c, int imm);
VPTERNLOGD __m128i _mm_ternarylogic_epi32(__m128i a, __m128i b, __m128i c, int imm);
VPTERNLOGD __m128i _mm_mask_ternarylogic_epi32(__m128i s, __mmask8 m, __m128i a, __m128i b, int imm);
VPTERNLOGD __m128i _mm_maskz_ternarylogic_epi32(__mmask8 m, __m128i a, __m128i b, __m128i c, int imm);
VPTERNLOGQ __m512i _mm512_ternarylogic_epi64(__m512i a, __m512i b, __m512i c, int imm);
VPTERNLOGQ __m512i _mm512_mask_ternarylogic_epi64(__m512i s, __mmask8 m, __m512i a, __m512i b, int imm);
VPTERNLOGQ __m512i _mm512_maskz_ternarylogic_epi64(__mmask8 m, __m512i a, __m512i b, __m512i c, int imm);
VPTERNLOGQ __m256i _mm256_ternarylogic_epi64(__m256i a, __m256i b, __m256i c, int imm);
VPTERNLOGQ __m256i _mm256_mask_ternarylogic_epi64(__m256i s, __mmask8 m, __m256i a, __m256i b, int imm);
VPTERNLOGQ __m256i _mm256_maskz_ternarylogic_epi64(__mmask8 m, __m256i a, __m256i b, __m256i c, int imm);
VPTERNLOGQ __m128i _mm_ternarylogic_epi64(__m128i a, __m128i b, __m128i c, int imm);
VPTERNLOGQ __m128i _mm_mask_ternarylogic_epi64(__m128i s, __mmask8 m, __m128i a, __m128i b, int imm);
VPTERNLOGQ __m128i _mm_maskz_ternarylogic_epi64(__mmask8 m, __m128i a, __m128i b, __m128i c, int imm);
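An illustrative sketch (editorial addition; the function name is invented, AVX-512F and immintrin.h assumed)
using the 0xCA constant derived above. The first intrinsic argument supplies the DEST bit of the 3-bit index, the
second supplies the SRC1 bit, and the third the SRC2 bit.

#include <immintrin.h>

/* dst = (mask & a) | (~mask & b) in one instruction. */
__m512i bit_select(__m512i mask, __m512i a, __m512i b)
{
    return _mm512_ternarylogic_epi32(mask, a, b, 0xCA);
}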
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
VPTESTMB/VPTESTMW/VPTESTMD/VPTESTMQ—Logical AND and Set Mask
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 26 /r VPTESTMB k2 {k1}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512BW | Bitwise AND of packed byte integers in xmm2 and xmm3/m128 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.256.66.0F38.W0 26 /r VPTESTMB k2 {k1}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512BW | Bitwise AND of packed byte integers in ymm2 and ymm3/m256 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.512.66.0F38.W0 26 /r VPTESTMB k2 {k1}, zmm2, zmm3/m512 | A | V/V | AVX512BW | Bitwise AND of packed byte integers in zmm2 and zmm3/m512 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.128.66.0F38.W1 26 /r VPTESTMW k2 {k1}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512BW | Bitwise AND of packed word integers in xmm2 and xmm3/m128 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.256.66.0F38.W1 26 /r VPTESTMW k2 {k1}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512BW | Bitwise AND of packed word integers in ymm2 and ymm3/m256 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.512.66.0F38.W1 26 /r VPTESTMW k2 {k1}, zmm2, zmm3/m512 | A | V/V | AVX512BW | Bitwise AND of packed word integers in zmm2 and zmm3/m512 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.128.66.0F38.W0 27 /r VPTESTMD k2 {k1}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Bitwise AND of packed doubleword integers in xmm2 and xmm3/m128/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.256.66.0F38.W0 27 /r VPTESTMD k2 {k1}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Bitwise AND of packed doubleword integers in ymm2 and ymm3/m256/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.512.66.0F38.W0 27 /r VPTESTMD k2 {k1}, zmm2, zmm3/m512/m32bcst | B | V/V | AVX512F | Bitwise AND of packed doubleword integers in zmm2 and zmm3/m512/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.128.66.0F38.W1 27 /r VPTESTMQ k2 {k1}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Bitwise AND of packed quadword integers in xmm2 and xmm3/m128/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.256.66.0F38.W1 27 /r VPTESTMQ k2 {k1}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Bitwise AND of packed quadword integers in ymm2 and ymm3/m256/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.512.66.0F38.W1 27 /r VPTESTMQ k2 {k1}, zmm2, zmm3/m512/m64bcst | B | V/V | AVX512F | Bitwise AND of packed quadword integers in zmm2 and zmm3/m512/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
Performs a bitwise logical AND operation on the first source operand (the second operand) and second source
operand (the third operand) and stores the result in the destination operand (the first operand) under the
writemask. Each bit of the result is set to 1 if the bitwise AND of the corresponding elements of the first and
second source operands is non-zero; otherwise it is set to 0.
VPTESTMD/VPTESTMQ: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a
ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a
32/64-bit memory location. The destination operand is a mask register updated under the writemask.
VPTESTMB/VPTESTMW: The first source operand is a ZMM/YMM/XMM register. The second source operand can be a
ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination operand is a mask register
updated under the writemask.
Operation
VPTESTMB (EVEX encoded versions)
(KL, VL) = (16, 128), (32, 256), (64, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    IF k1[j] OR *no writemask*
        THEN DEST[j] ← (SRC1[i+7:i] BITWISE AND SRC2[i+7:i] != 0)? 1 : 0;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0

VPTESTMW (EVEX encoded versions)
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF k1[j] OR *no writemask*
        THEN DEST[j] ← (SRC1[i+15:i] BITWISE AND SRC2[i+15:i] != 0)? 1 : 0;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0

VPTESTMD (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN DEST[j] ← (SRC1[i+31:i] BITWISE AND SRC2[31:0] != 0)? 1 : 0;
                ELSE DEST[j] ← (SRC1[i+31:i] BITWISE AND SRC2[i+31:i] != 0)? 1 : 0;
            FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
VPTESTMQ (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN DEST[j] ← (SRC1[i+63:i] BITWISE AND SRC2[63:0] != 0)? 1 : 0;
                ELSE DEST[j] ← (SRC1[i+63:i] BITWISE AND SRC2[i+63:i] != 0)? 1 : 0;
            FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalents
VPTESTMB __mmask64 _mm512_test_epi8_mask( __m512i a, __m512i b);
VPTESTMB __mmask64 _mm512_mask_test_epi8_mask(__mmask64, __m512i a, __m512i b);
VPTESTMW __mmask32 _mm512_test_epi16_mask( __m512i a, __m512i b);
VPTESTMW __mmask32 _mm512_mask_test_epi16_mask(__mmask32, __m512i a, __m512i b);
VPTESTMD __mmask16 _mm512_test_epi32_mask( __m512i a, __m512i b);
VPTESTMD __mmask16 _mm512_mask_test_epi32_mask(__mmask16, __m512i a, __m512i b);
VPTESTMQ __mmask8 _mm512_test_epi64_mask(__m512i a, __m512i b);
VPTESTMQ __mmask8 _mm512_mask_test_epi64_mask(__mmask8, __m512i a, __m512i b);
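An illustrative sketch (editorial addition; the function name and the FLAG_BITS constant are invented, and
AVX-512F with immintrin.h is assumed): producing a lane mask from a bit test.

#include <immintrin.h>

/* Mask bit i is set when dword lane i has any bit of FLAG_BITS set. */
__mmask16 lanes_with_flags(__m512i v)
{
    const __m512i FLAG_BITS = _mm512_set1_epi32(0x00000003);
    return _mm512_test_epi32_mask(v, FLAG_BITS);
}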
SIMD Floating-Point Exceptions
None
Other Exceptions
VPTESTMD/Q: See Exceptions Type E4.
VPTESTMB/W: See Exceptions Type E4.nb.
VPTESTNMB/W/D/Q—Logical NAND and Set
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.F3.0F38.W0 26 /r VPTESTNMB k2 {k1}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512BW | Bitwise NAND of packed byte integers in xmm2 and xmm3/m128 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.256.F3.0F38.W0 26 /r VPTESTNMB k2 {k1}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512BW | Bitwise NAND of packed byte integers in ymm2 and ymm3/m256 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.512.F3.0F38.W0 26 /r VPTESTNMB k2 {k1}, zmm2, zmm3/m512 | A | V/V | AVX512F AVX512BW | Bitwise NAND of packed byte integers in zmm2 and zmm3/m512 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.128.F3.0F38.W1 26 /r VPTESTNMW k2 {k1}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512BW | Bitwise NAND of packed word integers in xmm2 and xmm3/m128 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.256.F3.0F38.W1 26 /r VPTESTNMW k2 {k1}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512BW | Bitwise NAND of packed word integers in ymm2 and ymm3/m256 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.512.F3.0F38.W1 26 /r VPTESTNMW k2 {k1}, zmm2, zmm3/m512 | A | V/V | AVX512F AVX512BW | Bitwise NAND of packed word integers in zmm2 and zmm3/m512 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.128.F3.0F38.W0 27 /r VPTESTNMD k2 {k1}, xmm2, xmm3/m128/m32bcst | B | V/V | AVX512VL AVX512F | Bitwise NAND of packed doubleword integers in xmm2 and xmm3/m128/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.256.F3.0F38.W0 27 /r VPTESTNMD k2 {k1}, ymm2, ymm3/m256/m32bcst | B | V/V | AVX512VL AVX512F | Bitwise NAND of packed doubleword integers in ymm2 and ymm3/m256/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.512.F3.0F38.W0 27 /r VPTESTNMD k2 {k1}, zmm2, zmm3/m512/m32bcst | B | V/V | AVX512F | Bitwise NAND of packed doubleword integers in zmm2 and zmm3/m512/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.128.F3.0F38.W1 27 /r VPTESTNMQ k2 {k1}, xmm2, xmm3/m128/m64bcst | B | V/V | AVX512VL AVX512F | Bitwise NAND of packed quadword integers in xmm2 and xmm3/m128/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.256.F3.0F38.W1 27 /r VPTESTNMQ k2 {k1}, ymm2, ymm3/m256/m64bcst | B | V/V | AVX512VL AVX512F | Bitwise NAND of packed quadword integers in ymm2 and ymm3/m256/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
EVEX.512.F3.0F38.W1 27 /r VPTESTNMQ k2 {k1}, zmm2, zmm3/m512/m64bcst | B | V/V | AVX512F | Bitwise NAND of packed quadword integers in zmm2 and zmm3/m512/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1.
Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
B | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
Performs a bitwise logical NAND operation on the byte/word/doubleword/quadword element of the first source
operand (the second operand) with the corresponding element of the second source operand (the third operand)
and stores the logical comparison result into each bit of the destination operand (the first operand) according to
the writemask k1. Each bit of the result is set to 1 if the bitwise AND of the corresponding elements of the first and
second source operands is zero; otherwise it is set to 0.
EVEX encoded VPTESTNMD/Q: The first source operand is a ZMM/YMM/XMM register. The second source operand
can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted
from a 32/64-bit memory location. The destination is updated according to the writemask.
EVEX encoded VPTESTNMB/W: The first source operand is a ZMM/YMM/XMM register. The second source operand
can be a ZMM/YMM/XMM register or a 512/256/128-bit memory location. The destination is updated according to
the writemask.
Operation
VPTESTNMB
(KL, VL) = (16, 128), (32, 256), (64, 512)
FOR j ← 0 TO KL-1
    i ← j * 8
    IF MaskBit(j) OR *no writemask*
        THEN DEST[j] ← (SRC1[i+7:i] BITWISE AND SRC2[i+7:i] == 0)? 1 : 0
        ELSE DEST[j] ← 0 ; zeroing masking only
    FI
ENDFOR
DEST[MAX_KL-1:KL] ← 0

VPTESTNMW
(KL, VL) = (8, 128), (16, 256), (32, 512)
FOR j ← 0 TO KL-1
    i ← j * 16
    IF MaskBit(j) OR *no writemask*
        THEN DEST[j] ← (SRC1[i+15:i] BITWISE AND SRC2[i+15:i] == 0)? 1 : 0
        ELSE DEST[j] ← 0 ; zeroing masking only
    FI
ENDFOR
DEST[MAX_KL-1:KL] ← 0
VPTESTNMD
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF MaskBit(j) OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN DEST[j] ← (SRC1[i+31:i] BITWISE AND SRC2[31:0] == 0)? 1 : 0
                ELSE DEST[j] ← (SRC1[i+31:i] BITWISE AND SRC2[i+31:i] == 0)? 1 : 0
            FI
        ELSE DEST[j] ← 0 ; zeroing masking only
    FI
ENDFOR
DEST[MAX_KL-1:KL] ← 0
VPTESTNMQ
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j*64
    IF MaskBit(j) OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN DEST[j] ← (SRC1[i+63:i] BITWISE AND SRC2[63:0] == 0)? 1 : 0;
                ELSE DEST[j] ← (SRC1[i+63:i] BITWISE AND SRC2[i+63:i] == 0)? 1 : 0;
            FI;
        ELSE DEST[j] ← 0; zeroing masking only
    FI
ENDFOR
DEST[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VPTESTNMB __mmask64 _mm512_testn_epi8_mask( __m512i a, __m512i b);
VPTESTNMB __mmask64 _mm512_mask_testn_epi8_mask(__mmask64, __m512i a, __m512i b);
VPTESTNMB __mmask32 _mm256_testn_epi8_mask(__m256i a, __m256i b);
VPTESTNMB __mmask32 _mm256_mask_testn_epi8_mask(__mmask32, __m256i a, __m256i b);
VPTESTNMB __mmask16 _mm_testn_epi8_mask(__m128i a, __m128i b);
VPTESTNMB __mmask16 _mm_mask_testn_epi8_mask(__mmask16, __m128i a, __m128i b);
VPTESTNMW __mmask32 _mm512_testn_epi16_mask( __m512i a, __m512i b);
VPTESTNMW __mmask32 _mm512_mask_testn_epi16_mask(__mmask32, __m512i a, __m512i b);
VPTESTNMW __mmask16 _mm256_testn_epi16_mask(__m256i a, __m256i b);
VPTESTNMW __mmask16 _mm256_mask_testn_epi16_mask(__mmask16, __m256i a, __m256i b);
VPTESTNMW __mmask8 _mm_testn_epi16_mask(__m128i a, __m128i b);
VPTESTNMW __mmask8 _mm_mask_testn_epi16_mask(__mmask8, __m128i a, __m128i b);
VPTESTNMD __mmask16 _mm512_testn_epi32_mask( __m512i a, __m512i b);
VPTESTNMD __mmask16 _mm512_mask_testn_epi32_mask(__mmask16, __m512i a, __m512i b);
VPTESTNMD __mmask8 _mm256_testn_epi32_mask(__m256i a, __m256i b);
VPTESTNMD __mmask8 _mm256_mask_testn_epi32_mask(__mmask8, __m256i a, __m256i b);
VPTESTNMD __mmask8 _mm_testn_epi32_mask(__m128i a, __m128i b);
VPTESTNMD __mmask8 _mm_mask_testn_epi32_mask(__mmask8, __m128i a, __m128i b);
VPTESTNMQ __mmask8 _mm512_testn_epi64_mask(__m512i a, __m512i b);
VPTESTNMQ __mmask8 _mm512_mask_testn_epi64_mask(__mmask8, __m512i a, __m512i b);
VPTESTNMQ __mmask8 _mm256_testn_epi64_mask(__m256i a, __m256i b);
VPTESTNMQ __mmask8 _mm256_mask_testn_epi64_mask(__mmask8, __m256i a, __m256i b);
VPTESTNMQ __mmask8 _mm_testn_epi64_mask(__m128i a, __m128i b);
VPTESTNMQ __mmask8 _mm_mask_testn_epi64_mask(__mmask8, __m128i a, __m128i b);
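As a usage sketch (illustrative, not part of the instruction definition; assumes an AVX512F-capable compiler and
CPU), the doubleword form can build a mask of the all-zero elements of a vector in a single instruction:

#include <immintrin.h>
#include <stdio.h>

int main(void) {
    // Elements 0 and 2 are zero; all other elements are non-zero.
    __m512i v = _mm512_setr_epi32(0, 1, 0, 3, 4, 5, 6, 7,
                                  8, 9, 10, 11, 12, 13, 14, 15);
    // VPTESTNMD with identical sources: bit j of the mask is 1 iff element j is zero.
    __mmask16 zeros = _mm512_testn_epi32_mask(v, v);
    printf("zero-element mask = 0x%04x\n", (unsigned)zeros);  // prints 0x0005
    return 0;
}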
SIMD Floating-Point Exceptions
None
Other Exceptions
VPTESTNMD/VPTESTNMQ: See Exceptions Type E4.
VPTESTNMB/VPTESTNMW: See Exceptions Type E4.nb.
VRANGEPD—Range Restriction Calculation For Packed Pairs of Float64 Values
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) Imm8
Description
This instruction calculates 2/4/8 range operation outputs from two sets of packed input double-precision FP values
in the first source operand (the second operand) and the second source operand (the third operand). The range
outputs are written to the destination operand (the first operand) under the writemask k1.
Bits 7:4 of the imm8 byte must be zero. The range operation is performed in two parts, each configured by a
two-bit control field within imm8[3:0]:
• Imm8[1:0] specifies the initial comparison operation to be one of max, min, max absolute value or min
absolute value of the input value pair. Each comparison of two input values produces an intermediate result that
combines with the sign selection control (Imm8[3:2]) to determine the final range operation output.
• Imm8[3:2] specifies the sign of the range operation output to be one of the following: from the first input
value, from the comparison result, set or clear.
The encodings of Imm8[1:0] and Imm8[3:2] are shown in Figure 5-27.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.128.66.0F3A.W1 50 /r ib
VRANGEPD xmm1 {k1}{z}, xmm2,
xmm3/m128/m64bcst, imm8
A V/V AVX512VL
AVX512DQ
Calculate two RANGE operation output values from 2 pairs
of double-precision floating-point values in xmm2 and
xmm3/m128/m64bcst, store the results to xmm1 under
the writemask k1. Imm8 specifies the comparison and sign
of the range operation.
EVEX.256.66.0F3A.W1 50 /r ib
VRANGEPD ymm1 {k1}{z}, ymm2,
ymm3/m256/m64bcst, imm8
A V/V AVX512VL
AVX512DQ
Calculate four RANGE operation output values from 4 pairs
of double-precision floating-point values in ymm2 and
ymm3/m256/m64bcst, store the results to ymm1 under
the writemask k1. Imm8 specifies the comparison and sign
of the range operation.
EVEX.512.66.0F3A.W1 50 /r ib
VRANGEPD zmm1 {k1}{z}, zmm2,
zmm3/m512/m64bcst{sae}, imm8
A V/V AVX512DQ Calculate eight RANGE operation output values from 8
pairs of double-precision floating-point values in zmm2
and zmm3/m512/m64bcst, store the results to zmm1
under the writemask k1. Imm8 specifies the comparison
and sign of the range operation.
Figure 5-27. Imm8 Controls for VRANGEPD/SD/PS/SS
Imm8[7:4] : Must Be Zero
Compare Operation Select:
Imm8[1:0] = 00b : Select Min value
Imm8[1:0] = 01b : Select Max value
Imm8[1:0] = 10b : Select Min-Abs value
Imm8[1:0] = 11b : Select Max-Abs value
Sign Control (SC):
Imm8[3:2] = 00b : Select sign(SRC1)
Imm8[3:2] = 01b : Select sign(Compare_Result)
Imm8[3:2] = 10b : Set sign to 0
Imm8[3:2] = 11b : Set sign to 1
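The following helper is an illustrative sketch (the macro and enumerator names are not from this manual) for
composing the imm8 byte from the two fields of Figure 5-27:

/* Compose the VRANGE* imm8: cmp = Imm8[1:0], sign = Imm8[3:2]; Imm8[7:4] stays zero. */
#define VRANGE_IMM(sign, cmp)  ((((sign) & 0x3) << 2) | ((cmp) & 0x3))

enum { RANGE_MIN = 0, RANGE_MAX = 1, RANGE_MIN_ABS = 2, RANGE_MAX_ABS = 3 }; /* Imm8[1:0] */
enum { SIGN_SRC1 = 0, SIGN_CMP = 1, SIGN_CLEAR = 2, SIGN_SET = 3 };          /* Imm8[3:2] */

/* Example: VRANGE_IMM(SIGN_SRC1, RANGE_MIN_ABS) == 0x02, the clamp encoding used
   in the example later in this section. */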
When one or more of the input values is a NaN, the comparison operation may signal an invalid exception (IE).
Details for cases where one or more input values are NaN are listed in Table 5-10. If the comparison raises an IE,
the sign select control (Imm8[3:2]) has no effect on the range operation output; this is also indicated in Table 5-10.
When both input values are zeros of opposite signs, the MIN/MAX comparison of the range operation differs
slightly from the conceptually similar FP MIN/MAX operation found in the instructions VMAXPD/VMINPD. The
details of the MIN/MAX/MIN_ABS/MAX_ABS operation of VRANGEPD/PS/SD/SS for magnitude-0, opposite-signed
input cases are listed in Table 5-11.
Additionally, non-zero, equal-magnitude, opposite-signed input values under the MIN_ABS or MAX_ABS
comparison operation produce the results listed in Table 5-12.
Table 5-10. Signaling of Comparison Operation of One or More NaN Input Values and Effect of Imm8[3:2]
Src1 Src2 Result IE Signaling Due to Comparison Imm8[3:2] Effect to Range Output
sNaN1 sNaN2 Quiet(sNaN1) Yes Ignored
sNaN1 qNaN2 Quiet(sNaN1) Yes Ignored
sNaN1 Norm2 Quiet(sNaN1) Yes Ignored
qNaN1 sNaN2 Quiet(sNaN2) Yes Ignored
qNaN1 qNaN2 qNaN1 No Applicable
qNaN1 Norm2 Norm2 No Applicable
Norm1 sNaN2 Quiet(sNaN2) Yes Ignored
Norm1 qNaN2 Norm1 No Applicable
Table 5-11. Comparison Result for Opposite-Signed Zero Cases for MIN, MIN_ABS and MAX, MAX_ABS
MIN and MIN_ABS                    MAX and MAX_ABS
Src1   Src2   Result               Src1   Src2   Result
+0     -0     -0                   +0     -0     +0
-0     +0     -0                   -0     +0     +0
Table 5-12. Comparison Result of Equal-Magnitude Input Cases for MIN_ABS and MAX_ABS, (|a| = |b|, a > 0, b < 0)
MIN_ABS (|a| = |b|, a > 0, b < 0)  MAX_ABS (|a| = |b|, a > 0, b < 0)
Src1   Src2   Result               Src1   Src2   Result
a      b      b                    a      b      a
b      a      b                    b      a      a
Operation
RangeDP(SRC1[63:0], SRC2[63:0], CmpOpCtl[1:0], SignSelCtl[1:0])
{
    // Check if SNAN and report IE, see also Table 5-10
    IF (SRC1 = SNAN) THEN RETURN (QNAN(SRC1), set IE);
    IF (SRC2 = SNAN) THEN RETURN (QNAN(SRC2), set IE);
    Src1.exp ← SRC1[62:52];
    Src1.fraction ← SRC1[51:0];
    IF ((Src1.exp = 0) and (Src1.fraction != 0)) THEN // Src1 is a denormal number
        IF DAZ THEN Src1.fraction ← 0;
        ELSE IF (SRC2 <> QNAN) Set DE; FI;
    FI;
    Src2.exp ← SRC2[62:52];
    Src2.fraction ← SRC2[51:0];
    IF ((Src2.exp = 0) and (Src2.fraction != 0)) THEN // Src2 is a denormal number
        IF DAZ THEN Src2.fraction ← 0;
        ELSE IF (SRC1 <> QNAN) Set DE; FI;
    FI;
    IF (SRC2 = QNAN) THEN {TMP[63:0] ← SRC1[63:0]}
    ELSE IF (SRC1 = QNAN) THEN {TMP[63:0] ← SRC2[63:0]}
    ELSE IF (Both SRC1, SRC2 are magnitude-0 and opposite-signed) TMP[63:0] ← from Table 5-11
    ELSE IF (Both SRC1, SRC2 are magnitude-equal and opposite-signed and CmpOpCtl[1:0] > 01) TMP[63:0] ← from Table 5-12
    ELSE
        Case(CmpOpCtl[1:0])
        00: TMP[63:0] ← (SRC1[63:0] ≤ SRC2[63:0]) ? SRC1[63:0] : SRC2[63:0];
        01: TMP[63:0] ← (SRC1[63:0] ≤ SRC2[63:0]) ? SRC2[63:0] : SRC1[63:0];
        10: TMP[63:0] ← (ABS(SRC1[63:0]) ≤ ABS(SRC2[63:0])) ? SRC1[63:0] : SRC2[63:0];
        11: TMP[63:0] ← (ABS(SRC1[63:0]) ≤ ABS(SRC2[63:0])) ? SRC2[63:0] : SRC1[63:0];
        ESAC;
    FI;
    Case(SignSelCtl[1:0])
    00: dest ← (SRC1[63] << 63) OR (TMP[62:0]); // Preserve Src1 sign bit
    01: dest ← TMP[63:0]; // Preserve sign of compare result
    10: dest ← (0 << 63) OR (TMP[62:0]); // Zero out sign bit
    11: dest ← (1 << 63) OR (TMP[62:0]); // Set the sign bit
    ESAC;
    RETURN dest[63:0];
}
CmpOpCtl[1:0] ← imm8[1:0];
SignSelCtl[1:0] ← imm8[3:2];
VRANGEPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b == 1) AND (SRC2 *is memory*)
            THEN DEST[i+63:i] ← RangeDP (SRC1[i+63:i], SRC2[63:0], CmpOpCtl[1:0], SignSelCtl[1:0]);
            ELSE DEST[i+63:i] ← RangeDP (SRC1[i+63:i], SRC2[i+63:i], CmpOpCtl[1:0], SignSelCtl[1:0]);
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] ← 0
        FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
The following example describes a common usage of this instruction for checking that the input operand is
bounded between ±1023.
VRANGEPD zmm_dst, zmm_src, zmm_1023, 02h;
Where:
zmm_dst is the destination operand.
zmm_src is the input operand to compare against ±1023 (this is SRC1).
zmm_1023 is the reference operand, containing the value 1023 (this is SRC2).
IMM8=02h (imm8[1:0]=10b) selects the Min-Abs operation with selection of SRC1's sign.
If |zmm_src| < 1023 (i.e. SRC1 is smaller than 1023 in magnitude), then its value will be written into
zmm_dst. Otherwise, zmm_dst will get the value of 1023 (received in zmm_1023, which is SRC2).
However, the sign control (imm8[3:2]=00b) selects the sign of SRC1 received from zmm_src. So, even
in the case of |zmm_src| ≥ 1023, the selected sign of SRC1 is kept.
Thus, if zmm_src < -1023, the result of VRANGEPD will be the minimal value of -1023, while if zmm_src > +1023,
the result of VRANGEPD will be the maximal value of +1023.
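A minimal intrinsic sketch of this clamp (illustrative; assumes AVX512DQ support and the corresponding compiler
flags):

#include <immintrin.h>

/* Clamp each element of a to [-1023.0, +1023.0] with VRANGEPD.
   imm8 = 0x02: Min-Abs compare (imm8[1:0] = 10b), keep the sign of SRC1 (imm8[3:2] = 00b). */
static inline __m512d clamp_pm1023(__m512d a) {
    const __m512d bound = _mm512_set1_pd(1023.0);
    return _mm512_range_pd(a, bound, 0x02);
}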
Intel C/C++ Compiler Intrinsic Equivalent
VRANGEPD __m512d _mm512_range_pd ( __m512d a, __m512d b, int imm);
VRANGEPD __m512d _mm512_range_round_pd ( __m512d a, __m512d b, int imm, int sae);
VRANGEPD __m512d _mm512_mask_range_pd (__m512d s, __mmask8 k, __m512d a, __m512d b, int imm);
VRANGEPD __m512d _mm512_mask_range_round_pd (__m512d s, __mmask8 k, __m512d a, __m512d b, int imm, int sae);
VRANGEPD __m512d _mm512_maskz_range_pd ( __mmask8 k, __m512d a, __m512d b, int imm);
VRANGEPD __m512d _mm512_maskz_range_round_pd ( __mmask8 k, __m512d a, __m512d b, int imm, int sae);
VRANGEPD __m256d _mm256_range_pd ( __m256d a, __m256d b, int imm);
VRANGEPD __m256d _mm256_mask_range_pd (__m256d s, __mmask8 k, __m256d a, __m256d b, int imm);
VRANGEPD __m256d _mm256_maskz_range_pd ( __mmask8 k, __m256d a, __m256d b, int imm);
VRANGEPD __m128d _mm_range_pd ( __m128d a, __m128d b, int imm);
VRANGEPD __m128d _mm_mask_range_pd (__m128d s, __mmask8 k, __m128d a, __m128d b, int imm);
VRANGEPD __m128d _mm_maskz_range_pd ( __mmask8 k, __m128d a, __m128d b, int imm);
SIMD Floating-Point Exceptions
Invalid, Denormal
Other Exceptions
See Exceptions Type E2.
VRANGEPS—Range Restriction Calculation For Packed Pairs of Float32 Values
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) Imm8
Description
This instruction calculates 4/8/16 range operation outputs from two sets of packed input single-precision FP values
in the first source operand (the second operand) and the second source operand (the third operand). The range
outputs are written to the destination operand (the first operand) under the writemask k1.
Bits 7:4 of the imm8 byte must be zero. The range operation is performed in two parts, each configured by a
two-bit control field within imm8[3:0]:
• Imm8[1:0] specifies the initial comparison operation to be one of max, min, max absolute value or min
absolute value of the input value pair. Each comparison of two input values produces an intermediate result
that combines with the sign selection control (Imm8[3:2]) to determine the final range operation output.
• Imm8[3:2] specifies the sign of the range operation output to be one of the following: from the first input
value, from the comparison result, set or clear.
The encodings of Imm8[1:0] and Imm8[3:2] are shown in Figure 5-27.
When one or more of the input values is a NaN, the comparison operation may signal an invalid exception (IE).
Details for cases where one or more input values are NaN are listed in Table 5-10. If the comparison raises an IE,
the sign select control (Imm8[3:2]) has no effect on the range operation output; this is also indicated in Table 5-10.
When both input values are zeros of opposite signs, the MIN/MAX comparison of the range operation differs
slightly from the conceptually similar FP MIN/MAX operation found in the instructions VMAXPD/VMINPD. The
details of the MIN/MAX/MIN_ABS/MAX_ABS operation of VRANGEPD/PS/SD/SS for magnitude-0, opposite-signed
input cases are listed in Table 5-11.
Additionally, non-zero, equal-magnitude, opposite-signed input values under the MIN_ABS or MAX_ABS
comparison operation produce the results listed in Table 5-12.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.128.66.0F3A.W0 50 /r ib
VRANGEPS xmm1 {k1}{z}, xmm2,
xmm3/m128/m32bcst, imm8
A V/V AVX512VL
AVX512DQ
Calculate four RANGE operation output values from 4 pairs
of single-precision floating-point values in xmm2 and
xmm3/m128/m32bcst, store the results to xmm1 under
the writemask k1. Imm8 specifies the comparison and sign
of the range operation.
EVEX.256.66.0F3A.W0 50 /r ib
VRANGEPS ymm1 {k1}{z}, ymm2,
ymm3/m256/m32bcst, imm8
A V/V AVX512VL
AVX512DQ
Calculate eight RANGE operation output values from 8 pairs
of single-precision floating-point values in ymm2 and
ymm3/m256/m32bcst, store the results to ymm1 under
the writemask k1. Imm8 specifies the comparison and sign
of the range operation.
EVEX.512.66.0F3A.W0 50 /r ib
VRANGEPS zmm1 {k1}{z}, zmm2,
zmm3/m512/m32bcst{sae}, imm8
A V/V AVX512DQ Calculate 16 RANGE operation output values from 16 pairs
of single-precision floating-point values in zmm2 and
zmm3/m512/m32bcst, store the results to zmm1 under
the writemask k1. Imm8 specifies the comparison and sign
of the range operation.
Operation
RangeSP(SRC1[31:0], SRC2[31:0], CmpOpCtl[1:0], SignSelCtl[1:0])
{
    // Check if SNAN and report IE, see also Table 5-10
    IF (SRC1 = SNAN) THEN RETURN (QNAN(SRC1), set IE);
    IF (SRC2 = SNAN) THEN RETURN (QNAN(SRC2), set IE);
    Src1.exp ← SRC1[30:23];
    Src1.fraction ← SRC1[22:0];
    IF ((Src1.exp = 0) and (Src1.fraction != 0)) THEN // Src1 is a denormal number
        IF DAZ THEN Src1.fraction ← 0;
        ELSE IF (SRC2 <> QNAN) Set DE; FI;
    FI;
    Src2.exp ← SRC2[30:23];
    Src2.fraction ← SRC2[22:0];
    IF ((Src2.exp = 0) and (Src2.fraction != 0)) THEN // Src2 is a denormal number
        IF DAZ THEN Src2.fraction ← 0;
        ELSE IF (SRC1 <> QNAN) Set DE; FI;
    FI;
    IF (SRC2 = QNAN) THEN {TMP[31:0] ← SRC1[31:0]}
    ELSE IF (SRC1 = QNAN) THEN {TMP[31:0] ← SRC2[31:0]}
    ELSE IF (Both SRC1, SRC2 are magnitude-0 and opposite-signed) TMP[31:0] ← from Table 5-11
    ELSE IF (Both SRC1, SRC2 are magnitude-equal and opposite-signed and CmpOpCtl[1:0] > 01) TMP[31:0] ← from Table 5-12
    ELSE
        Case(CmpOpCtl[1:0])
        00: TMP[31:0] ← (SRC1[31:0] ≤ SRC2[31:0]) ? SRC1[31:0] : SRC2[31:0];
        01: TMP[31:0] ← (SRC1[31:0] ≤ SRC2[31:0]) ? SRC2[31:0] : SRC1[31:0];
        10: TMP[31:0] ← (ABS(SRC1[31:0]) ≤ ABS(SRC2[31:0])) ? SRC1[31:0] : SRC2[31:0];
        11: TMP[31:0] ← (ABS(SRC1[31:0]) ≤ ABS(SRC2[31:0])) ? SRC2[31:0] : SRC1[31:0];
        ESAC;
    FI;
    Case(SignSelCtl[1:0])
    00: dest ← (SRC1[31] << 31) OR (TMP[30:0]); // Preserve Src1 sign bit
    01: dest ← TMP[31:0]; // Preserve sign of compare result
    10: dest ← (0 << 31) OR (TMP[30:0]); // Zero out sign bit
    11: dest ← (1 << 31) OR (TMP[30:0]); // Set the sign bit
    ESAC;
    RETURN dest[31:0];
}
CmpOpCtl[1:0] ← imm8[1:0];
SignSelCtl[1:0] ← imm8[3:2];
VRANGEPS
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b == 1) AND (SRC2 *is memory*)
            THEN DEST[i+31:i] ← RangeSP (SRC1[i+31:i], SRC2[31:0], CmpOpCtl[1:0], SignSelCtl[1:0]);
            ELSE DEST[i+31:i] ← RangeSP (SRC1[i+31:i], SRC2[i+31:i], CmpOpCtl[1:0], SignSelCtl[1:0]);
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+31:i] ← 0
        FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
The following example describes a common usage of this instruction for checking that the input operand is
bounded between ±150.
VRANGEPS zmm_dst, zmm_src, zmm_150, 02h;
Where:
zmm_dst is the destination operand.
zmm_src is the input operand to compare against ±150.
zmm_150 is the reference operand, containing the value 150.
IMM8=02h (imm8[1:0]=10b) selects the Min-Abs operation with selection of SRC1's sign.
If |zmm_src| < 150, then its value will be written into zmm_dst. Otherwise, zmm_dst will get the value
of 150 (received in zmm_150).
However, the sign control (imm8[3:2]=00b) selects the sign of SRC1 received from zmm_src. So, even
in the case of |zmm_src| ≥ 150, the selected sign of SRC1 is kept.
Thus, if zmm_src < -150, the result of VRANGEPS will be the minimal value of -150, while if zmm_src > +150,
the result of VRANGEPS will be the maximal value of +150.
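As a variant sketch (illustrative; same build assumptions as the VRANGEPD example), setting the sign control to
10b clears the sign bit, so the same compare yields min(|src|, 150) as a non-negative value:

#include <immintrin.h>

/* min(|a|, 150.0f) with the sign forced to 0:
   imm8 = 0x0A: Min-Abs compare (imm8[1:0] = 10b), set sign to 0 (imm8[3:2] = 10b). */
static inline __m512 min_abs_150(__m512 a) {
    const __m512 bound = _mm512_set1_ps(150.0f);
    return _mm512_range_ps(a, bound, 0x0A);
}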
Intel C/C++ Compiler Intrinsic Equivalent
VRANGEPS __m512 _mm512_range_ps ( __m512 a, __m512 b, int imm);
VRANGEPS __m512 _mm512_range_round_ps ( __m512 a, __m512 b, int imm, int sae);
VRANGEPS __m512 _mm512_mask_range_ps (__m512 s, __mmask16 k, __m512 a, __m512 b, int imm);
VRANGEPS __m512 _mm512_mask_range_round_ps (__m512 s, __mmask16 k, __m512 a, __m512 b, int imm, int sae);
VRANGEPS __m512 _mm512_maskz_range_ps ( __mmask16 k, __m512 a, __m512 b, int imm);
VRANGEPS __m512 _mm512_maskz_range_round_ps ( __mmask16 k, __m512 a, __m512 b, int imm, int sae);
VRANGEPS __m256 _mm256_range_ps ( __m256 a, __m256 b, int imm);
VRANGEPS __m256 _mm256_mask_range_ps (__m256 s, __mmask8 k, __m256 a, __m256 b, int imm);
VRANGEPS __m256 _mm256_maskz_range_ps ( __mmask8 k, __m256 a, __m256 b, int imm);
VRANGEPS __m128 _mm_range_ps ( __m128 a, __m128 b, int imm);
VRANGEPS __m128 _mm_mask_range_ps (__m128 s, __mmask8 k, __m128 a, __m128 b, int imm);
VRANGEPS __m128 _mm_maskz_range_ps ( __mmask8 k, __m128 a, __m128 b, int imm);
SIMD Floating-Point Exceptions
Invalid, Denormal
Other Exceptions
See Exceptions Type E2.
VRANGESD—Range Restriction Calculation From a pair of Scalar Float64 Values
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) Imm8
Description
This instruction calculates a range operation output from two input double-precision FP values in the low qword
element of the first source operand (the second operand) and second source operand (the third operand). The
range output is written to the low qword element of the destination operand (the first operand) under the
writemask k1.
Bits 7:4 of the imm8 byte must be zero. The range operation is performed in two parts, each configured by a
two-bit control field within imm8[3:0]:
• Imm8[1:0] specifies the initial comparison operation to be one of max, min, max absolute value or min
absolute value of the input value pair. Each comparison of two input values produces an intermediate result
that combines with the sign selection control (Imm8[3:2]) to determine the final range operation output.
• Imm8[3:2] specifies the sign of the range operation output to be one of the following: from the first input
value, from the comparison result, set or clear.
The encodings of Imm8[1:0] and Imm8[3:2] are shown in Figure 5-27.
Bits 127:64 of the destination operand are copied from the corresponding element of the first source operand.
When one or more of the input values is a NaN, the comparison operation may signal an invalid exception (IE).
Details for cases where one or more input values are NaN are listed in Table 5-10. If the comparison raises an IE,
the sign select control (Imm8[3:2]) has no effect on the range operation output; this is also indicated in Table 5-10.
When both input values are zeros of opposite signs, the MIN/MAX comparison of the range operation differs
slightly from the conceptually similar FP MIN/MAX operation found in the instructions VMAXPD/VMINPD. The
details of the MIN/MAX/MIN_ABS/MAX_ABS operation of VRANGEPD/PS/SD/SS for magnitude-0, opposite-signed
input cases are listed in Table 5-11.
Additionally, non-zero, equal-magnitude, opposite-signed input values under the MIN_ABS or MAX_ABS
comparison operation produce the results listed in Table 5-12.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.LIG.66.0F3A.W1 51 /r ib
VRANGESD xmm1 {k1}{z},
xmm2, xmm3/m64{sae}, imm8
A V/V AVX512DQ Calculate a RANGE operation output value from 2 double-
precision floating-point values in xmm2 and xmm3/m64,
store the output to xmm1 under writemask. Imm8 specifies
the comparison and sign of the range operation.
Operation
RangeDP(SRC1[63:0], SRC2[63:0], CmpOpCtl[1:0], SignSelCtl[1:0])
{
    // Check if SNAN and report IE, see also Table 5-10
    IF (SRC1 = SNAN) THEN RETURN (QNAN(SRC1), set IE);
    IF (SRC2 = SNAN) THEN RETURN (QNAN(SRC2), set IE);
    Src1.exp ← SRC1[62:52];
    Src1.fraction ← SRC1[51:0];
    IF ((Src1.exp = 0) and (Src1.fraction != 0)) THEN // Src1 is a denormal number
        IF DAZ THEN Src1.fraction ← 0;
        ELSE IF (SRC2 <> QNAN) Set DE; FI;
    FI;
    Src2.exp ← SRC2[62:52];
    Src2.fraction ← SRC2[51:0];
    IF ((Src2.exp = 0) and (Src2.fraction != 0)) THEN // Src2 is a denormal number
        IF DAZ THEN Src2.fraction ← 0;
        ELSE IF (SRC1 <> QNAN) Set DE; FI;
    FI;
    IF (SRC2 = QNAN) THEN {TMP[63:0] ← SRC1[63:0]}
    ELSE IF (SRC1 = QNAN) THEN {TMP[63:0] ← SRC2[63:0]}
    ELSE IF (Both SRC1, SRC2 are magnitude-0 and opposite-signed) TMP[63:0] ← from Table 5-11
    ELSE IF (Both SRC1, SRC2 are magnitude-equal and opposite-signed and CmpOpCtl[1:0] > 01) TMP[63:0] ← from Table 5-12
    ELSE
        Case(CmpOpCtl[1:0])
        00: TMP[63:0] ← (SRC1[63:0] ≤ SRC2[63:0]) ? SRC1[63:0] : SRC2[63:0];
        01: TMP[63:0] ← (SRC1[63:0] ≤ SRC2[63:0]) ? SRC2[63:0] : SRC1[63:0];
        10: TMP[63:0] ← (ABS(SRC1[63:0]) ≤ ABS(SRC2[63:0])) ? SRC1[63:0] : SRC2[63:0];
        11: TMP[63:0] ← (ABS(SRC1[63:0]) ≤ ABS(SRC2[63:0])) ? SRC2[63:0] : SRC1[63:0];
        ESAC;
    FI;
    Case(SignSelCtl[1:0])
    00: dest ← (SRC1[63] << 63) OR (TMP[62:0]); // Preserve Src1 sign bit
    01: dest ← TMP[63:0]; // Preserve sign of compare result
    10: dest ← (0 << 63) OR (TMP[62:0]); // Zero out sign bit
    11: dest ← (1 << 63) OR (TMP[62:0]); // Set the sign bit
    ESAC;
    RETURN dest[63:0];
}
CmpOpCtl[1:0] ← imm8[1:0];
SignSelCtl[1:0] ← imm8[3:2];
VRANGESD
IF k1[0] OR *no writemask*
    THEN DEST[63:0] ← RangeDP (SRC1[63:0], SRC2[63:0], CmpOpCtl[1:0], SignSelCtl[1:0]);
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
The following example describes a common usage of this instruction for checking that the input operand is
bounded between ±1023.
VRANGESD xmm_dst, xmm_src, xmm_1023, 02h;
Where:
xmm_dst is the destination operand.
xmm_src is the input operand to compare against ±1023.
xmm_1023 is the reference operand, containing the value 1023.
IMM8=02h (imm8[1:0]=10b) selects the Min-Abs operation with selection of SRC1's sign.
If |xmm_src| < 1023, then its value will be written into xmm_dst. Otherwise, xmm_dst will get the value
of 1023 (received in xmm_1023).
However, the sign control (imm8[3:2]=00b) selects the sign of SRC1 received from xmm_src. So, even
in the case of |xmm_src| ≥ 1023, the selected sign of SRC1 is kept.
Thus, if xmm_src < -1023, the result of VRANGESD will be the minimal value of -1023, while if xmm_src > +1023,
the result of VRANGESD will be the maximal value of +1023.
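The scalar form of the same clamp in intrinsic notation (illustrative; assumes AVX512DQ):

#include <immintrin.h>

/* Clamp the low double of a to [-1023.0, +1023.0]; bits 127:64 of the
   result are copied from a (the first source). imm8 = 0x02 as above. */
static inline __m128d clamp_sd_pm1023(__m128d a) {
    const __m128d bound = _mm_set_sd(1023.0);
    return _mm_range_sd(a, bound, 0x02);
}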
Intel C/C++ Compiler Intrinsic Equivalent
VRANGESD __m128d _mm_range_sd ( __m128d a, __m128d b, int imm);
VRANGESD __m128d _mm_range_round_sd ( __m128d a, __m128d b, int imm, int sae);
VRANGESD __m128d _mm_mask_range_sd (__m128d s, __mmask8 k, __m128d a, __m128d b, int imm);
VRANGESD __m128d _mm_mask_range_round_sd (__m128d s, __mmask8 k, __m128d a, __m128d b, int imm, int sae);
VRANGESD __m128d _mm_maskz_range_sd ( __mmask8 k, __m128d a, __m128d b, int imm);
VRANGESD __m128d _mm_maskz_range_round_sd ( __mmask8 k, __m128d a, __m128d b, int imm, int sae);
SIMD Floating-Point Exceptions
Invalid, Denormal
Other Exceptions
See Exceptions Type E3.
VRANGESS—Range Restriction Calculation From a Pair of Scalar Float32 Values
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) Imm8
Description
This instruction calculates a range operation output from two input single-precision FP values in the low dword
element of the first source operand (the second operand) and second source operand (the third operand). The
range output is written to the low dword element of the destination operand (the first operand) under the
writemask k1.
Bits 7:4 of the imm8 byte must be zero. The range operation is performed in two parts, each configured by a
two-bit control field within imm8[3:0]:
• Imm8[1:0] specifies the initial comparison operation to be one of max, min, max absolute value or min
absolute value of the input value pair. Each comparison of two input values produces an intermediate result that
combines with the sign selection control (Imm8[3:2]) to determine the final range operation output.
• Imm8[3:2] specifies the sign of the range operation output to be one of the following: from the first input
value, from the comparison result, set or clear.
The encodings of Imm8[1:0] and Imm8[3:2] are shown in Figure 5-27.
Bits 127:32 of the destination operand are copied from the corresponding elements of the first source operand.
When one or more of the input values is a NaN, the comparison operation may signal an invalid exception (IE).
Details for cases where one or more input values are NaN are listed in Table 5-10. If the comparison raises an IE,
the sign select control (Imm8[3:2]) has no effect on the range operation output; this is also indicated in Table 5-10.
When both input values are zeros of opposite signs, the MIN/MAX comparison of the range operation differs
slightly from the conceptually similar FP MIN/MAX operation found in the instructions VMAXPD/VMINPD. The
details of the MIN/MAX/MIN_ABS/MAX_ABS operation of VRANGEPD/PS/SD/SS for magnitude-0, opposite-signed
input cases are listed in Table 5-11.
Additionally, non-zero, equal-magnitude, opposite-signed input values under the MIN_ABS or MAX_ABS
comparison operation produce the results listed in Table 5-12.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.LIG.66.0F3A.W0 51 /r ib
VRANGESS xmm1 {k1}{z},
xmm2, xmm3/m32{sae}, imm8
A V/V AVX512DQ Calculate a RANGE operation output value from 2 single-
precision floating-point values in xmm2 and xmm3/m32,
store the output to xmm1 under writemask. Imm8 specifies
the comparison and sign of the range operation.
Operation
RangeSP(SRC1[31:0], SRC2[31:0], CmpOpCtl[1:0], SignSelCtl[1:0])
{
    // Check if SNAN and report IE, see also Table 5-10
    IF (SRC1 = SNAN) THEN RETURN (QNAN(SRC1), set IE);
    IF (SRC2 = SNAN) THEN RETURN (QNAN(SRC2), set IE);
    Src1.exp ← SRC1[30:23];
    Src1.fraction ← SRC1[22:0];
    IF ((Src1.exp = 0) and (Src1.fraction != 0)) THEN // Src1 is a denormal number
        IF DAZ THEN Src1.fraction ← 0;
        ELSE IF (SRC2 <> QNAN) Set DE; FI;
    FI;
    Src2.exp ← SRC2[30:23];
    Src2.fraction ← SRC2[22:0];
    IF ((Src2.exp = 0) and (Src2.fraction != 0)) THEN // Src2 is a denormal number
        IF DAZ THEN Src2.fraction ← 0;
        ELSE IF (SRC1 <> QNAN) Set DE; FI;
    FI;
    IF (SRC2 = QNAN) THEN {TMP[31:0] ← SRC1[31:0]}
    ELSE IF (SRC1 = QNAN) THEN {TMP[31:0] ← SRC2[31:0]}
    ELSE IF (Both SRC1, SRC2 are magnitude-0 and opposite-signed) TMP[31:0] ← from Table 5-11
    ELSE IF (Both SRC1, SRC2 are magnitude-equal and opposite-signed and CmpOpCtl[1:0] > 01) TMP[31:0] ← from Table 5-12
    ELSE
        Case(CmpOpCtl[1:0])
        00: TMP[31:0] ← (SRC1[31:0] ≤ SRC2[31:0]) ? SRC1[31:0] : SRC2[31:0];
        01: TMP[31:0] ← (SRC1[31:0] ≤ SRC2[31:0]) ? SRC2[31:0] : SRC1[31:0];
        10: TMP[31:0] ← (ABS(SRC1[31:0]) ≤ ABS(SRC2[31:0])) ? SRC1[31:0] : SRC2[31:0];
        11: TMP[31:0] ← (ABS(SRC1[31:0]) ≤ ABS(SRC2[31:0])) ? SRC2[31:0] : SRC1[31:0];
        ESAC;
    FI;
    Case(SignSelCtl[1:0])
    00: dest ← (SRC1[31] << 31) OR (TMP[30:0]); // Preserve Src1 sign bit
    01: dest ← TMP[31:0]; // Preserve sign of compare result
    10: dest ← (0 << 31) OR (TMP[30:0]); // Zero out sign bit
    11: dest ← (1 << 31) OR (TMP[30:0]); // Set the sign bit
    ESAC;
    RETURN dest[31:0];
}
CmpOpCtl[1:0] ← imm8[1:0];
SignSelCtl[1:0] ← imm8[3:2];
VRANGESS
IF k1[0] OR *no writemask*
    THEN DEST[31:0] ← RangeSP (SRC1[31:0], SRC2[31:0], CmpOpCtl[1:0], SignSelCtl[1:0]);
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
The following example describes a common usage of this instruction for checking that the input operand is
bounded between ±150.
VRANGESS xmm_dst, xmm_src, xmm_150, 02h;
Where:
xmm_dst is the destination operand.
xmm_src is the input operand to compare against ±150.
xmm_150 is the reference operand, containing the value 150.
IMM8=02h (imm8[1:0]=10b) selects the Min-Abs operation with selection of SRC1's sign.
If |xmm_src| < 150, then its value will be written into xmm_dst. Otherwise, xmm_dst will get the value
of 150 (received in xmm_150).
However, the sign control (imm8[3:2]=00b) selects the sign of SRC1 received from xmm_src. So, even
in the case of |xmm_src| ≥ 150, the selected sign of SRC1 is kept.
Thus, if xmm_src < -150, the result of VRANGESS will be the minimal value of -150, while if xmm_src > +150,
the result of VRANGESS will be the maximal value of +150.
Intel C/C++ Compiler Intrinsic Equivalent
VRANGESS __m128 _mm_range_ss ( __m128 a, __m128 b, int imm);
VRANGESS __m128 _mm_range_round_ss ( __m128 a, __m128 b, int imm, int sae);
VRANGESS __m128 _mm_mask_range_ss (__m128 s, __mmask8 k, __m128 a, __m128 b, int imm);
VRANGESS __m128 _mm_mask_range_round_ss (__m128 s, __mmask8 k, __m128 a, __m128 b, int imm, int sae);
VRANGESS __m128 _mm_maskz_range_ss ( __mmask8 k, __m128 a, __m128 b, int imm);
VRANGESS __m128 _mm_maskz_range_round_ss ( __mmask8 k, __m128 a, __m128 b, int imm, int sae);
SIMD Floating-Point Exceptions
Invalid, Denormal
Other Exceptions
See Exceptions Type E3.
VRCP14PD—Compute Approximate Reciprocals of Packed Float64 Values
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) ModRM:r/m (r) NA NA
Description
This instruction performs a SIMD computation of the approximate reciprocals of eight/four/two packed double-
precision floating-point values in the source operand (the second operand) and stores the packed double-precision
floating-point results in the destination operand. The maximum relative error for this approximation is less than
2^-14.
The source operand can be a ZMM register, a 512-bit memory location, or a 512-bit vector broadcasted from a 64-
bit memory location. The destination operand is a ZMM register conditionally updated according to the writemask.
The VRCP14PD instruction is not affected by the rounding control bits in the MXCSR register. When a source value
is a 0.0, an ∞ with the sign of the source value is returned. A denormal source value is treated as zero only if the
DAZ bit is set in MXCSR; otherwise it is treated correctly (i.e. not as a 0.0). Underflow results are flushed to zero
only if the FTZ bit is set in MXCSR; otherwise the correct underflow result is written, with the sign of the operand.
When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
MXCSR exception flags are not affected by this instruction and floating-point exceptions are not reported.
A numerically exact implementation of VRCP14xx can be found at
https://software.intel.com/en-us/articles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.128.66.0F38.W1 4C /r
VRCP14PD xmm1 {k1}{z},
xmm2/m128/m64bcst
A V/V AVX512VL
AVX512F
Computes the approximate reciprocals of the packed double-
precision floating-point values in xmm2/m128/m64bcst and
stores the results in xmm1. Under writemask.
EVEX.256.66.0F38.W1 4C /r
VRCP14PD ymm1 {k1}{z},
ymm2/m256/m64bcst
A V/V AVX512VL
AVX512F
Computes the approximate reciprocals of the packed double-
precision floating-point values in ymm2/m256/m64bcst and
stores the results in ymm1. Under writemask.
EVEX.512.66.0F38.W1 4C /r
VRCP14PD zmm1 {k1}{z},
zmm2/m512/m64bcst
A V/V AVX512F Computes the approximate reciprocals of the packed double-
precision floating-point values in zmm2/m512/m64bcst and
stores the results in zmm1. Under writemask.
Table 5-13. VRCP14PD/VRCP14SD Special Cases
Input value             Result value   Comments
0 ≤ X ≤ 2^-1024         INF            Very small denormal
-2^-1024 ≤ X ≤ -0       -INF           Very small denormal
X > 2^1022              Underflow      Up to 18 bits of fractions are returned*
X < -2^1022             -Underflow     Up to 18 bits of fractions are returned*
X = 2^-n                2^n
X = -2^-n               -2^n
* In this case the mantissa is shifted right by one or two bits.
Operation
VRCP14PD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC *is memory*)
            THEN DEST[i+63:i] ← APPROXIMATE(1.0/SRC[63:0]);
            ELSE DEST[i+63:i] ← APPROXIMATE(1.0/SRC[i+63:i]);
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] ← 0
        FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VRCP14PD __m512d _mm512_rcp14_pd( __m512d a);
VRCP14PD __m512d _mm512_mask_rcp14_pd(__m512d s, __mmask8 k, __m512d a);
VRCP14PD __m512d _mm512_maskz_rcp14_pd( __mmask8 k, __m512d a);
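A usage sketch (illustrative, not part of the instruction definition): the 2^-14 approximation is commonly refined
with Newton-Raphson steps, each of which roughly doubles the number of correct bits:

#include <immintrin.h>

/* Approximate 1/a, then refine with two Newton-Raphson steps x' = x*(2 - a*x).
   Starting from ~14 correct bits this gives roughly 28, then ~52 bits for
   normal, finite inputs; special cases follow Table 5-13. */
static inline __m512d recip_refined(__m512d a) {
    __m512d x = _mm512_rcp14_pd(a);                     // relative error < 2^-14
    const __m512d two = _mm512_set1_pd(2.0);
    x = _mm512_mul_pd(x, _mm512_fnmadd_pd(a, x, two));  // x * (2 - a*x)
    x = _mm512_mul_pd(x, _mm512_fnmadd_pd(a, x, two));  // second refinement
    return x;
}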
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
VRCP14SD—Compute Approximate Reciprocal of Scalar Float64 Value
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) NA
Description
This instruction performs a SIMD computation of the approximate reciprocal of the low double-precision floating-
point value in the second source operand (the third operand) and stores the result in the low quadword element
of the destination operand (the first operand) according to the writemask k1. Bits (127:64) of the XMM register
destination are copied from the corresponding bits in the first source operand (the second operand). The maximum
relative error for this approximation is less than 2^-14. The source operand can be an XMM register or a 64-bit
memory location. The destination operand is an XMM register.
The VRCP14SD instruction is not affected by the rounding control bits in the MXCSR register. When a source value
is a 0.0, an ∞ with the sign of the source value is returned. A denormal source value is treated as zero only if the
DAZ bit is set in MXCSR; otherwise it is treated correctly (i.e. not as a 0.0). Underflow results are flushed to zero
only if the FTZ bit is set in MXCSR; otherwise the correct underflow result is written, with the sign of the operand.
When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned. See
Table 5-13 for special-case input values.
MXCSR exception flags are not affected by this instruction and floating-point exceptions are not reported.
A numerically exact implementation of VRCP14xx can be found at:
https://software.intel.com/en-us/articles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.
Operation
VRCP14SD (EVEX version)
IF k1[0] OR *no writemask*
    THEN DEST[63:0] ← APPROXIMATE(1.0/SRC2[63:0]);
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Opcode/
Instruction
Op
/ En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.LIG.66.0F38.W1 4D /r
VRCP14SD xmm1 {k1}{z}, xmm2,
xmm3/m64
A V/V AVX512F Computes the approximate reciprocal of the scalar double-
precision floating-point value in xmm3/m64 and stores the
result in xmm1 using writemask k1. Also, upper double-
precision floating-point value (bits[127:64]) from xmm2 is
copied to xmm1[127:64].
Intel C/C++ Compiler Intrinsic Equivalent
VRCP14SD __m128d _mm_rcp14_sd( __m128d a, __m128d b);
VRCP14SD __m128d _mm_mask_rcp14_sd(__m128d s, __mmask8 k, __m128d a, __m128d b);
VRCP14SD __m128d _mm_maskz_rcp14_sd( __mmask8 k, __m128d a, __m128d b);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E5.
VRCP14PS—Compute Approximate Reciprocals of Packed Float32 Values
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) ModRM:r/m (r) NA NA
Description
This instruction performs a SIMD computation of the approximate reciprocals of the packed single-precision
floating-point values in the source operand (the second operand) and stores the packed single-precision floating-
point results in the destination operand (the first operand). The maximum relative error for this approximation is
less than 2^-14.
The source operand can be a ZMM register, a 512-bit memory location or a 512-bit vector broadcasted from a 32-
bit memory location. The destination operand is a ZMM register conditionally updated according to the writemask.
The VRCP14PS instruction is not affected by the rounding control bits in the MXCSR register. When a source value
is a 0.0, an ∞ with the sign of the source value is returned. A denormal source value is treated as zero only if the
DAZ bit is set in MXCSR; otherwise it is treated correctly (i.e. not as a 0.0). Underflow results are flushed to zero
only if the FTZ bit is set in MXCSR; otherwise the correct underflow result is written, with the sign of the operand.
When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
MXCSR exception flags are not affected by this instruction and floating-point exceptions are not reported.
A numerically exact implementation of VRCP14xx can be found at:
https://software.intel.com/en-us/articles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.128.66.0F38.W0 4C /r
VRCP14PS xmm1 {k1}{z},
xmm2/m128/m32bcst
A V/V AVX512VL
AVX512F
Computes the approximate reciprocals of the packed single-
precision floating-point values in xmm2/m128/m32bcst and
stores the results in xmm1. Under writemask.
EVEX.256.66.0F38.W0 4C /r
VRCP14PS ymm1 {k1}{z},
ymm2/m256/m32bcst
A V/V AVX512VL
AVX512F
Computes the approximate reciprocals of the packed single-
precision floating-point values in ymm2/m256/m32bcst and
stores the results in ymm1. Under writemask.
EVEX.512.66.0F38.W0 4C /r
VRCP14PS zmm1 {k1}{z},
zmm2/m512/m32bcst
A V/V AVX512F Computes the approximate reciprocals of the packed single-
precision floating-point values in zmm2/m512/m32bcst and
stores the results in zmm1. Under writemask.
Table 5-14. VRCP14PS/VRCP14SS Special Cases
Input value            Result value   Comments
0 ≤ X ≤ 2^-128         INF            Very small denormal
-2^-128 ≤ X ≤ -0       -INF           Very small denormal
X > 2^126              Underflow      Up to 18 bits of fractions are returned*
X < -2^126             -Underflow     Up to 18 bits of fractions are returned*
X = 2^-n               2^n
X = -2^-n              -2^n
* In this case the mantissa is shifted right by one or two bits.
Operation
VRCP14PS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC *is memory*)
            THEN DEST[i+31:i] ← APPROXIMATE(1.0/SRC[31:0]);
            ELSE DEST[i+31:i] ← APPROXIMATE(1.0/SRC[i+31:i]);
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+31:i] ← 0
        FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VRCP14PS __m512 _mm512_rcp14_ps( __m512 a);
VRCP14PS __m512 _mm512_mask_rcp14_ps(__m512 s, __mmask16 k, __m512 a);
VRCP14PS __m512 _mm512_maskz_rcp14_ps( __mmask16 k, __m512 a);
VRCP14PS __m256 _mm256_rcp14_ps( __m256 a);
VRCP14PS __m256 _mm256_mask_rcp14_ps(__m256 s, __mmask8 k, __m256 a);
VRCP14PS __m256 _mm256_maskz_rcp14_ps( __mmask8 k, __m256 a);
VRCP14PS __m128 _mm_rcp14_ps( __m128 a);
VRCP14PS __m128 _mm_mask_rcp14_ps(__m128 s, __mmask8 k, __m128 a);
VRCP14PS __m128 _mm_maskz_rcp14_ps( __mmask8 k, __m128 a);
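A usage sketch (illustrative): where a relative error of about 2^-14 is acceptable, a divide can be replaced by a
multiply with the approximate reciprocal:

#include <immintrin.h>

/* Approximate a/b as a * rcp14(b): one multiply instead of a divide, at
   ~2^-14 relative error; special cases follow Table 5-14. */
static inline __m512 fast_div_ps(__m512 a, __m512 b) {
    return _mm512_mul_ps(a, _mm512_rcp14_ps(b));
}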
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
VRCP14SS—Compute Approximate Reciprocal of Scalar Float32 Value
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) NA
Description
This instruction performs a SIMD computation of the approximate reciprocal of the low single-precision floating-
point value in the second source operand (the third operand) and stores the result in the low doubleword element
of the destination operand (the first operand) according to the writemask k1. Bits (127:32) of the XMM register
destination are copied from the corresponding bits in the first source operand (the second operand). The maximum
relative error for this approximation is less than 2^-14. The source operand can be an XMM register or a 32-bit
memory location. The destination operand is an XMM register.
The VRCP14SS instruction is not affected by the rounding control bits in the MXCSR register. When a source value
is a 0.0, an ∞ with the sign of the source value is returned. A denormal source value is treated as zero only if the
DAZ bit is set in MXCSR; otherwise it is treated correctly (i.e. not as a 0.0). Underflow results are flushed to zero
only if the FTZ bit is set in MXCSR; otherwise the correct underflow result is written, with the sign of the operand.
When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned. See
Table 5-14 for special-case input values.
MXCSR exception flags are not affected by this instruction and floating-point exceptions are not reported.
A numerically exact implementation of VRCP14xx can be found at
https://software.intel.com/en-us/articles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.
Operation
VRCP14SS (EVEX version)
IF k1[0] OR *no writemask*
    THEN DEST[31:0] ← APPROXIMATE(1.0/SRC2[31:0]);
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.LIG.66.0F38.W0 4D /r
VRCP14SS xmm1 {k1}{z}, xmm2,
xmm3/m32
A V/V AVX512F Computes the approximate reciprocal of the scalar single-
precision floating-point value in xmm3/m32 and stores the
result in xmm1 using writemask k1. Also, the upper single-
precision floating-point values (bits[127:32]) from xmm2 are
copied to xmm1[127:32].
Intel C/C++ Compiler Intrinsic Equivalent
VRCP14SS __m128 _mm_rcp14_ss( __m128 a, __m128 b);
VRCP14SS __m128 _mm_mask_rcp14_ss(__m128 s, __mmask8 k, __m128 a, __m128 b);
VRCP14SS __m128 _mm_maskz_rcp14_ss( __mmask8 k, __m128 a, __m128 b);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E5.
VREDUCEPD—Perform Reduction Transformation on Packed Float64 Values
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) ModRM:r/m (r) Imm8 NA
Description
Perform reduction transformation of the packed binary encoded double-precision FP values in the source operand
(the second operand) and store the reduced results in binary FP format to the destination operand (the first
operand) under the writemask k1.
The reduction transformation subtracts the integer part and the leading M fractional bits from the binary FP source
value, where M is an unsigned integer specified by imm8[7:4], see Figure 5-28. Specifically, the reduction
transformation can be expressed as:
dest = src - (ROUND(2^M * src)) * 2^-M;
where "ROUND()" treats "src", "2^M", and their product as binary FP numbers with normalized significand and
biased exponents.
The magnitude of the reduced result can be expressed by considering src = 2^p * man2, where 'man2' is the
normalized significand and 'p' is the unbiased exponent.
Then if RC = RNE: 0 ≤ |Reduced Result| ≤ 2^(p-M-1)
Then if RC ≠ RNE: 0 ≤ |Reduced Result| < 2^(p-M)
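As an illustrative worked example (not from the original text): for src = 1.375 (= 1.011b, so p = 0) and M = 2
under RNE, ROUND(2^2 * 1.375) = ROUND(5.5) = 6, so dest = 1.375 - 6 * 2^-2 = -0.125, whose magnitude
exactly attains the RNE bound 2^(p-M-1) = 2^-3.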
This instruction might end up with a precision exception set. However, if SPE is set (i.e. Suppress Precision
Exception, imm8[3]=1), no precision exception is reported.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.128.66.0F3A.W1 56 /r ib
VREDUCEPD xmm1 {k1}{z},
xmm2/m128/m64bcst, imm8
A V/V AVX512VL
AVX512DQ
Perform reduction transformation on packed double-precision
floating-point values in xmm2/m128/m64bcst by subtracting
a number of fraction bits specified by the imm8 field. Stores
the result in xmm1 register under writemask k1.
EVEX.256.66.0F3A.W1 56 /r ib
VREDUCEPD ymm1 {k1}{z},
ymm2/m256/m64bcst, imm8
A V/V AVX512VL
AVX512DQ
Perform reduction transformation on packed double-precision
floating-point values in ymm2/m256/m64bcst by subtracting
a number of fraction bits specified by the imm8 field. Stores
the result in ymm1 register under writemask k1.
EVEX.512.66.0F3A.W1 56 /r ib
VREDUCEPD zmm1 {k1}{z},
zmm2/m512/m64bcst{sae},
imm8
A V/V AVX512DQ Perform reduction transformation on packed double-precision
floating-point values in zmm2/m512/m64bcst by subtracting a
number of fraction bits specified by the imm8 field. Stores the
result in zmm1 register under writemask k1.
Figure 5-28. Imm8 Controls for VREDUCEPD/SD/PS/SS
Imm8[7:4] : Number of fixed points to subtract (fixed point length)
Imm8[3] (SPE, Suppress Precision Exception):
Imm8[3] = 0b : Use MXCSR exception mask
Imm8[3] = 1b : Suppress
Imm8[2] (RS, Round Select):
Imm8[2] = 0b : Use Imm8[1:0]
Imm8[2] = 1b : Use MXCSR
Imm8[1:0] (Round Control Override):
Imm8[1:0] = 00b : Round nearest even
Imm8[1:0] = 01b : Round down
Imm8[1:0] = 10b : Round up
Imm8[1:0] = 11b : Truncate
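A small helper (an illustrative sketch; the macro name is not from this manual) for composing this imm8:

/* Compose the VREDUCE* imm8 (see Figure 5-28):
   m = Imm8[7:4], spe = Imm8[3], rs = Imm8[2], rc = Imm8[1:0]. */
#define VREDUCE_IMM(m, spe, rs, rc) \
    ((((m) & 0xF) << 4) | (((spe) & 1) << 3) | (((rs) & 1) << 2) | ((rc) & 0x3))

/* Example: VREDUCE_IMM(0, 0, 0, 3) == 0x03 selects M = 0 with truncation,
   which isolates the fractional part: src - trunc(src). */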
Handling of special-case input values is listed in Table 5-15.
Operation
ReduceArgumentDP(SRC[63:0], imm8[7:0])
{
    // Check for NaN
    IF (SRC[63:0] = NAN) THEN
        RETURN (Convert SRC[63:0] to QNaN); FI;
    M ← imm8[7:4]; // Number of fraction bits of the normalized significand to be subtracted
    RC ← imm8[1:0]; // Round Control for ROUND() operation
    RC_source ← imm8[2];
    SPE ← imm8[3]; // Suppress Precision Exception
    TMP[63:0] ← 2^-M * {ROUND(2^M * SRC[63:0], SPE, RC_source, RC)}; // ROUND() treats SRC and 2^M as standard binary FP values
    TMP[63:0] ← SRC[63:0] - TMP[63:0]; // subtraction under the same RC, SPE controls
    RETURN TMP[63:0]; // binary encoded FP with biased exponent and normalized significand
}
VREDUCEPD
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b == 1) AND (SRC *is memory*)
            THEN DEST[i+63:i] ← ReduceArgumentDP(SRC[63:0], imm8[7:0]);
            ELSE DEST[i+63:i] ← ReduceArgumentDP(SRC[i+63:i], imm8[7:0]);
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] ← 0
        FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Table 5-15. VREDUCEPD/SD/PS/SS Special Cases
Input value               Round Mode        Returned value
|Src1| < 2^(-M-1)         RNE               Src1
|Src1| < 2^-M             RPI, Src1 > 0     Round(Src1 - 2^-M)*
                          RPI, Src1 ≤ 0     Src1
                          RNI, Src1 ≥ 0     Src1
                          RNI, Src1 < 0     Round(Src1 + 2^-M)*
Src1 = ±0, or             NOT RNI           +0.0
Dest = ±0 (Src1 ≠ INF)    RNI               -0.0
Src1 = ±INF               any               +0.0
Src1 = ±NAN               n/a               QNaN(Src1)
* Round control = (imm8[2]) ? MXCSR.RC : imm8[1:0]
Intel C/C++ Compiler Intrinsic Equivalent
VREDUCEPD __m512d _mm512_reduce_pd( __m512d a, int imm, int sae)
VREDUCEPD __m512d _mm512_mask_reduce_pd(__m512d s, __mmask8 k, __m512d a, int imm, int sae)
VREDUCEPD __m512d _mm512_maskz_reduce_pd(__mmask8 k, __m512d a, int imm, int sae)
VREDUCEPD __m256d _mm256_reduce_pd( __m256d a, int imm)
VREDUCEPD __m256d _mm256_mask_reduce_pd(__m256d s, __mmask8 k, __m256d a, int imm)
VREDUCEPD __m256d _mm256_maskz_reduce_pd(__mmask8 k, __m256d a, int imm)
VREDUCEPD __m128d _mm_reduce_pd( __m128d a, int imm)
VREDUCEPD __m128d _mm_mask_reduce_pd(__m128d s, __mmask8 k, __m128d a, int imm)
VREDUCEPD __m128d _mm_maskz_reduce_pd(__mmask8 k, __m128d a, int imm)
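As a usage sketch (illustrative; assumes AVX512DQ and that the unmasked _mm512_reduce_pd(__m512d, int)
form is available in the compiler's immintrin.h):

#include <immintrin.h>

/* Fractional part of each element: imm8 = 0x03 selects M = 0 (imm8[7:4] = 0)
   with truncation (imm8[1:0] = 11b), so each lane becomes src - trunc(src). */
static inline __m512d fract_pd(__m512d a) {
    return _mm512_reduce_pd(a, 0x03);
}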
SIMD Floating-Point Exceptions
Invalid, Precision
If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).
Other Exceptions
See Exceptions Type E2, additionally
#UD If EVEX.vvvv != 1111B.
VREDUCESD—Perform a Reduction Transformation on a Scalar Float64 Value
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) Imm8
Description
Perform a reduction transformation of the binary encoded double-precision FP value in the low qword element of
the second source operand (the third operand) and store the reduced result in binary FP format to the low qword
element of the destination operand (the first operand) under the writemask k1. Bits 127:64 of the destination
operand are copied from the corresponding qword element of the first source operand (the second operand).
The reduction transformation subtracts the integer part and the leading M fractional bits from the binary FP source
value, where M is an unsigned integer specified by imm8[7:4], see Figure 5-28. Specifically, the reduction
transformation can be expressed as:
dest = src - (ROUND(2^M * src)) * 2^-M;
where "ROUND()" treats "src", "2^M", and their product as binary FP numbers with normalized significand and
biased exponents.
The magnitude of the reduced result can be expressed by considering src = 2^p * man2, where 'man2' is the
normalized significand and 'p' is the unbiased exponent.
Then if RC = RNE: 0 ≤ |Reduced Result| ≤ 2^(p-M-1)
Then if RC ≠ RNE: 0 ≤ |Reduced Result| < 2^(p-M)
This instruction might end up with a precision exception set. However, if SPE is set (i.e. Suppress Precision
Exception, imm8[3]=1), no precision exception is reported.
The operation is write masked.
Handling of special-case input values is listed in Table 5-15.
Operation
ReduceArgumentDP(SRC[63:0], imm8[7:0])
{
    // Check for NaN
    IF (SRC[63:0] = NAN) THEN
        RETURN (Convert SRC[63:0] to QNaN); FI;
    M ← imm8[7:4]; // Number of fraction bits of the normalized significand to be subtracted
    RC ← imm8[1:0]; // Round Control for ROUND() operation
    RC_source ← imm8[2];
    SPE ← imm8[3]; // Suppress Precision Exception
    TMP[63:0] ← 2^-M * {ROUND(2^M * SRC[63:0], SPE, RC_source, RC)}; // ROUND() treats SRC and 2^M as standard binary FP values
    TMP[63:0] ← SRC[63:0] - TMP[63:0]; // subtraction under the same RC, SPE controls
    RETURN TMP[63:0]; // binary encoded FP with biased exponent and normalized significand
}
Opcode/
Instruction
Op /
En
64/32 bit
Mode
Support
CPUID
Feature
Flag
Description
EVEX.LIG.66.0F3A.W1 57 /r ib
VREDUCESD xmm1 {k1}{z},
xmm2, xmm3/m64{sae},
imm8
A V/V AVX512DQ
Perform a reduction transformation on a scalar double-precision
floating-point value in xmm3/m64 by subtracting a number of
fraction bits specified by the imm8 field. Also, the upper double-
precision floating-point value (bits[127:64]) from xmm2 is
copied to xmm1[127:64]. Stores the result in xmm1 register.
VREDUCESD
IF k1[0] OR *no writemask*
    THEN DEST[63:0] ← ReduceArgumentDP(SRC2[63:0], imm8[7:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VREDUCESD __m128d _mm_reduce_sd( __m128d a, __m128d b, int imm, int sae)
VREDUCESD __m128d _mm_mask_reduce_sd(__m128d s, __mmask8 k, __m128d a, __m128d b, int imm, int sae)
VREDUCESD __m128d _mm_maskz_reduce_sd(__mmask8 k, __m128d a, __m128d b, int imm, int sae)
SIMD Floating-Point Exceptions
Invalid, Precision
If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).
Other Exceptions
See Exceptions Type E3.
VREDUCEPS—Perform Reduction Transformation on Packed Float32 Values
Instruction Operand Encoding
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) ModRM:r/m (r) Imm8 NA
Description
Perform reduction transformation of the packed binary encoded single-precision FP values in the source operand (the second operand) and store the reduced results in binary FP format to the destination operand (the first operand) under the writemask k1.
The reduction transformation subtracts the integer part and the leading M fractional bits from the binary FP source value, where M is an unsigned integer specified by imm8[7:4], see Figure 5-28. Specifically, the reduction transformation can be expressed as:
dest = src - (ROUND(2^M * src)) * 2^(-M);
where “Round()” treats “src”, “2^M”, and their product as binary FP numbers with normalized significand and biased exponents.
The magnitude of the reduced result can be expressed by considering src = 2^p * man2,
where ‘man2’ is the normalized significand and ‘p’ is the unbiased exponent.
Then if RC = RNE: 0 ≤ |Reduced Result| ≤ 2^(p-M-1)
Then if RC ≠ RNE: 0 ≤ |Reduced Result| < 2^(p-M)
This instruction might set the precision exception. However, if SPE is set (i.e. Suppress Precision Exception, imm8[3]=1), no precision exception is reported.
EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Handling of special input values is listed in Table 5-15.
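As a worked example of the formula above: with src = 1.25 (1.01b), M = 1 and RC = RNE, ROUND(2^1 * 1.25) = ROUND(2.5) = 2, so dest = 1.25 - 2 * 2^(-1) = 0.25; the integer part and the leading fraction bit of the source have been removed.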
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W0 56 /r ib VREDUCEPS xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | A | V/V | AVX512VL AVX512DQ | Perform reduction transformation on packed single-precision floating-point values in xmm2/m128/m32bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in the xmm1 register under writemask k1.
EVEX.256.66.0F3A.W0 56 /r ib VREDUCEPS ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | A | V/V | AVX512VL AVX512DQ | Perform reduction transformation on packed single-precision floating-point values in ymm2/m256/m32bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in the ymm1 register under writemask k1.
EVEX.512.66.0F3A.W0 56 /r ib VREDUCEPS zmm1 {k1}{z}, zmm2/m512/m32bcst{sae}, imm8 | A | V/V | AVX512DQ | Perform reduction transformation on packed single-precision floating-point values in zmm2/m512/m32bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in the zmm1 register under writemask k1.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | Imm8 | NA
Operation
ReduceArgumentSP(SRC[31:0], imm8[7:0])
{
    // Check for NaN
    IF (SRC[31:0] = NAN) THEN
        RETURN (Convert SRC[31:0] to QNaN); FI;
    M ← imm8[7:4]; // Number of fraction bits of the normalized significand to be subtracted
    RC ← imm8[1:0]; // Round Control for ROUND() operation
    RC_source ← imm8[2];
    SPE ← 0; // Suppress Precision Exception
    TMP[31:0] ← 2^(-M) * {ROUND(2^M * SRC[31:0], SPE, RC_source, RC)}; // ROUND() treats SRC and 2^M as standard binary FP values
    TMP[31:0] ← SRC[31:0] - TMP[31:0]; // subtraction under the same RC, SPE controls
    RETURN TMP[31:0]; // binary encoded FP with biased exponent and normalized significand
}
VREDUCEPS
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b == 1) AND (SRC *is memory*)
            THEN DEST[i+31:i] ← ReduceArgumentSP(SRC[31:0], imm8[7:0]);
            ELSE DEST[i+31:i] ← ReduceArgumentSP(SRC[i+31:i], imm8[7:0]);
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+31:i] ← 0
        FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VREDUCEPS __m512 _mm512_reduce_ps( __m512 a, int imm, int sae)
VREDUCEPS __m512 _mm512_mask_reduce_ps(__m512 s, __mmask16 k, __m512 a, int imm, int sae)
VREDUCEPS __m512 _mm512_maskz_reduce_ps(__mmask16 k, __m512 a, int imm, int sae)
VREDUCEPS __m256 _mm256_reduce_ps( __m256 a, int imm)
VREDUCEPS __m256 _mm256_mask_reduce_ps(__m256 s, __mmask8 k, __m256 a, int imm)
VREDUCEPS __m256 _mm256_maskz_reduce_ps(__mmask8 k, __m256 a, int imm)
VREDUCEPS __m128 _mm_reduce_ps( __m128 a, int imm)
VREDUCEPS __m128 _mm_mask_reduce_ps(__m128 s, __mmask8 k, __m128 a, int imm)
VREDUCEPS __m128 _mm_maskz_reduce_ps(__mmask8 k, __m128 a, int imm)
SIMD Floating-Point Exceptions
Invalid, Precision
If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).
Other Exceptions
See Exceptions Type E2, additionally
#UD If EVEX.vvvv != 1111B.
VREDUCESS—Perform a Reduction Transformation on a Scalar Float32 Value
Instruction Operand Encoding
Description
Perform a reduction transformation of the binary encoded single-precision FP value in the low dword element of the
second source operand (the third operand) and store the reduced result in binary FP format to the low dword
element of the destination operand (the first operand) under the writemask k1. Bits 127:32 of the destination
operand are copied from respective dword elements of the first source operand (the second operand).
The reduction transformation subtracts the integer part and the leading M fractional bits from the binary FP source value, where M is an unsigned integer specified by imm8[7:4], see Figure 5-28. Specifically, the reduction transformation can be expressed as:
dest = src - (ROUND(2^M * src)) * 2^(-M);
where “Round()” treats “src”, “2^M”, and their product as binary FP numbers with normalized significand and biased exponents.
The magnitude of the reduced result can be expressed by considering src = 2^p * man2,
where ‘man2’ is the normalized significand and ‘p’ is the unbiased exponent.
Then if RC = RNE: 0 ≤ |Reduced Result| ≤ 2^(p-M-1)
Then if RC ≠ RNE: 0 ≤ |Reduced Result| < 2^(p-M)
This instruction might set the precision exception. However, if SPE is set (i.e. Suppress Precision Exception, imm8[3]=1), no precision exception is reported.
Handling of special input values is listed in Table 5-15.
Operation
ReduceArgumentSP(SRC[31:0], imm8[7:0])
{
    // Check for NaN
    IF (SRC[31:0] = NAN) THEN
        RETURN (Convert SRC[31:0] to QNaN); FI;
    M ← imm8[7:4]; // Number of fraction bits of the normalized significand to be subtracted
    RC ← imm8[1:0]; // Round Control for ROUND() operation
    RC_source ← imm8[2];
    SPE ← 0; // Suppress Precision Exception
    TMP[31:0] ← 2^(-M) * {ROUND(2^M * SRC[31:0], SPE, RC_source, RC)}; // ROUND() treats SRC and 2^M as standard binary FP values
    TMP[31:0] ← SRC[31:0] - TMP[31:0]; // subtraction under the same RC, SPE controls
    RETURN TMP[31:0]; // binary encoded FP with biased exponent and normalized significand
}
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F3A.W0 57 /r ib VREDUCESS xmm1 {k1}{z}, xmm2, xmm3/m32{sae}, imm8 | A | V/V | AVX512DQ | Perform a reduction transformation on a scalar single-precision floating-point value in xmm3/m32 by subtracting a number of fraction bits specified by the imm8 field. Also, the upper single-precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. Stores the result in the xmm1 register.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
VREDUCESS
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← ReduceArgumentSP(SRC2[31:0], imm8[7:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VREDUCESS __m128 _mm_reduce_ss( __m128 a, __m128 b, int imm, int sae)
VREDUCESS __m128 _mm_mask_reduce_ss(__m128 s, __mmask8 k, __m128 a, __m128 b, int imm, int sae)
VREDUCESS __m128 _mm_maskz_reduce_ss(__mmask8 k, __m128 a, __m128 b, int imm, int sae)
SIMD Floating-Point Exceptions
Invalid, Precision
If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).
Other Exceptions
See Exceptions Type E3.
VRNDSCALEPD—Round Packed Float64 Values To Include A Given Number Of Fraction Bits
Instruction Operand Encoding
Description
Round the double-precision floating-point values in the source operand by the rounding mode specified in the immediate operand (see Figure 5-29) and place the result in the destination operand.
The destination operand (the first operand) is a ZMM/YMM/XMM register conditionally updated according to the writemask. The source operand (the second operand) can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-bit memory location.
The rounding process rounds the input to an integral value, plus the number of fraction bits specified by imm8[7:4] (to be included in the result), and returns the result as a double-precision floating-point value.
Note that no overflow is induced while executing this instruction (although the source is scaled by the imm8[7:4] value).
The immediate operand also specifies control fields for the rounding operation. Three bit fields are defined and shown in the “Immediate Control Description” figure below. Bit 3 of the immediate byte controls the processor behavior for a precision exception, bit 2 selects the source of rounding-mode control, and bits 1:0 specify a non-sticky rounding-mode value (the immediate control table below lists the encoded values for the rounding-mode field).
The Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN, it is converted to a QNaN. If DAZ is set to ‘1’, denormals are converted to zero before rounding.
The sign of the result of this instruction is preserved, including the sign of zero.
The formula of the operation on each data element for VRNDSCALEPD is
ROUND(x) = 2^(-M) * Round_to_INT(x * 2^M, round_ctrl),
round_ctrl = imm[3:0];
M = imm[7:4];
The operation x * 2^M is computed as if the exponent range is unlimited (i.e. no overflow ever occurs).
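The per-element math can be modeled in C. The sketch below is illustrative only (rndscale_dp_model is not an Intel API): it maps imm8[1:0] onto the C99 rounding modes via fesetround() and does not model the MXCSR override (imm8[2]) or the SPE control (imm8[3]).

#include <fenv.h>
#include <math.h>

/* Model: round x to an integral value plus M fraction bits. */
double rndscale_dp_model(double x, unsigned imm8)
{
    static const int rc_to_fe[4] = {            /* imm8[1:0] encodings */
        FE_TONEAREST, FE_DOWNWARD, FE_UPWARD, FE_TOWARDZERO
    };
    int M = (imm8 >> 4) & 0xF;                  /* imm8[7:4]: fraction bits kept */
    int saved = fegetround();
    fesetround(rc_to_fe[imm8 & 3]);
    double r = scalbn(rint(scalbn(x, M)), -M);  /* 2^-M * Round_to_INT(x * 2^M) */
    fesetround(saved);
    return r;
}

For example, rndscale_dp_model(2.7, 0x11) (M = 1, round down) returns 2.5, the largest multiple of 0.5 that does not exceed 2.7.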
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W1 09 /r ib VRNDSCALEPD xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8 | A | V/V | AVX512VL AVX512F | Rounds packed double-precision floating-point values in xmm2/m128/m64bcst to a number of fraction bits specified by the imm8 field. Stores the result in the xmm1 register under writemask.
EVEX.256.66.0F3A.W1 09 /r ib VRNDSCALEPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | A | V/V | AVX512VL AVX512F | Rounds packed double-precision floating-point values in ymm2/m256/m64bcst to a number of fraction bits specified by the imm8 field. Stores the result in the ymm1 register under writemask.
EVEX.512.66.0F3A.W1 09 /r ib VRNDSCALEPD zmm1 {k1}{z}, zmm2/m512/m64bcst{sae}, imm8 | A | V/V | AVX512F | Rounds packed double-precision floating-point values in zmm2/m512/m64bcst to a number of fraction bits specified by the imm8 field. Stores the result in the zmm1 register using writemask k1.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | Imm8 | NA
VRNDSCALEPD is a more general form of the VEX-encoded VROUNDPD instruction. In VROUNDPD, the formula of the operation on each element is
ROUND(x) = Round_to_INT(x, round_ctrl),
round_ctrl = imm[3:0];
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Handling of special input values is listed in Table 5-16.
Figure 5-29. Imm8 Controls for VRNDSCALEPD/SD/PS/SS
Imm8[7:4] (fixed-point length): number of fixed points (fraction bits) to preserve.
Imm8[3] (SPE, Suppress Precision Exception): 0b = use MXCSR exception mask; 1b = suppress.
Imm8[2] (RS, Round Select): 0b = use Imm8[1:0]; 1b = use MXCSR.
Imm8[1:0] (Round Control Override): 00b = round nearest even; 01b = round down; 10b = round up; 11b = truncate.

Table 5-16. VRNDSCALEPD/SD/PS/SS Special Cases
Input value | Returned value
Src1 = ±inf | Src1
Src1 = ±NAN | Src1 converted to QNAN
Src1 = ±0 | Src1
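The imm8 byte is simply these four fields packed together. The macro below is an illustrative helper (IMM8_RNDSCALE is not part of any Intel header) for composing it in C.

/* Pack the imm8 fields: m = fraction bits to keep (imm8[7:4]),
   spe = suppress #PE (imm8[3]), rs = use MXCSR RC (imm8[2]),
   rc = round control override (imm8[1:0]). */
#define IMM8_RNDSCALE(m, spe, rs, rc) \
    ((((m) & 0xF) << 4) | (((spe) & 1) << 3) | (((rs) & 1) << 2) | ((rc) & 3))

For example, IMM8_RNDSCALE(0, 1, 0, 3) evaluates to 0x0B: keep no fraction bits, suppress the precision exception, and truncate.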
Operation
RoundToIntegerDP(SRC[63:0], imm8[7:0]) {
    IF (imm8[2] = 1)
        rounding_direction ← MXCSR:RC ; get round control from MXCSR
    ELSE
        rounding_direction ← imm8[1:0] ; get round control from imm8[1:0]
    FI
    M ← imm8[7:4] ; get the scaling factor
    CASE (rounding_direction)
        00: TMP[63:0] ← round_to_nearest_even_integer(2^M * SRC[63:0])
        01: TMP[63:0] ← round_to_equal_or_smaller_integer(2^M * SRC[63:0])
        10: TMP[63:0] ← round_to_equal_or_larger_integer(2^M * SRC[63:0])
        11: TMP[63:0] ← round_to_nearest_smallest_magnitude_integer(2^M * SRC[63:0])
    ESAC
    Dest[63:0] ← 2^(-M) * TMP[63:0] ; scale back down by 2^(-M)
    IF (imm8[3] = 0) THEN ; check SPE
        IF (SRC[63:0] != Dest[63:0]) THEN ; check precision lost
            set_precision() ; set #PE
        FI;
    FI;
    RETURN (Dest[63:0])
}
VRNDSCALEPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF *src is a memory operand*
    THEN TMP_SRC ← BROADCAST64(SRC, VL, k1)
    ELSE TMP_SRC ← SRC
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← RoundToIntegerDP(TMP_SRC[i+63:i], imm8[7:0])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VRNDSCALEPD __m512d _mm512_roundscale_pd( __m512d a, int imm);
VRNDSCALEPD __m512d _mm512_roundscale_round_pd( __m512d a, int imm, int sae);
VRNDSCALEPD __m512d _mm512_mask_roundscale_pd(__m512d s, __mmask8 k, __m512d a, int imm);
VRNDSCALEPD __m512d _mm512_mask_roundscale_round_pd(__m512d s, __mmask8 k, __m512d a, int imm, int sae);
VRNDSCALEPD __m512d _mm512_maskz_roundscale_pd( __mmask8 k, __m512d a, int imm);
VRNDSCALEPD __m512d _mm512_maskz_roundscale_round_pd( __mmask8 k, __m512d a, int imm, int sae);
VRNDSCALEPD __m256d _mm256_roundscale_pd( __m256d a, int imm);
VRNDSCALEPD __m256d _mm256_mask_roundscale_pd(__m256d s, __mmask8 k, __m256d a, int imm);
VRNDSCALEPD __m256d _mm256_maskz_roundscale_pd( __mmask8 k, __m256d a, int imm);
VRNDSCALEPD __m128d _mm_roundscale_pd( __m128d a, int imm);
VRNDSCALEPD __m128d _mm_mask_roundscale_pd(__m128d s, __mmask8 k, __m128d a, int imm);
VRNDSCALEPD __m128d _mm_maskz_roundscale_pd( __mmask8 k, __m128d a, int imm);
SIMD Floating-Point Exceptions
Invalid, Precision
If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).
Other Exceptions
See Exceptions Type E2.
VRNDSCALESD—Round Scalar Float64 Value To Include A Given Number Of Fraction Bits
Instruction Operand Encoding
Description
Rounds a double-precision floating-point value in the low quadword element of the second source operand (the third operand) by the rounding mode specified in the immediate operand (see Figure 5-29) and places the result in the corresponding element of the destination operand (the first operand) according to the writemask. The quadword element at bits 127:64 of the destination is copied from the first source operand (the second operand).
The destination and first source operands are XMM registers; the second source operand can be an XMM register or a memory location. Bits MAXVL-1:128 of the destination register are cleared.
The rounding process rounds the input to an integral value, plus the number of fraction bits specified by imm8[7:4] (to be included in the result), and returns the result as a double-precision floating-point value.
Note that no overflow is induced while executing this instruction (although the source is scaled by the imm8[7:4] value).
The immediate operand also specifies control fields for the rounding operation. Three bit fields are defined and shown in the “Immediate Control Description” figure below. Bit 3 of the immediate byte controls the processor behavior for a precision exception, bit 2 selects the source of rounding-mode control, and bits 1:0 specify a non-sticky rounding-mode value (the immediate control table below lists the encoded values for the rounding-mode field).
The Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN, it is converted to a QNaN. If DAZ is set to ‘1’, denormals are converted to zero before rounding.
The sign of the result of this instruction is preserved, including the sign of zero.
The formula of the operation for VRNDSCALESD is
ROUND(x) = 2^(-M) * Round_to_INT(x * 2^M, round_ctrl),
round_ctrl = imm[3:0];
M = imm[7:4];
The operation x * 2^M is computed as if the exponent range is unlimited (i.e. no overflow ever occurs).
VRNDSCALESD is a more general form of the VEX-encoded VROUNDSD instruction. In VROUNDSD, the formula of the operation is
ROUND(x) = Round_to_INT(x, round_ctrl),
round_ctrl = imm[3:0];
EVEX encoded version: The source operand is an XMM register or a 64-bit memory location. The destination operand is an XMM register.
Handling of special input values is listed in Table 5-16.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F3A.W1 0B /r ib VRNDSCALESD xmm1 {k1}{z}, xmm2, xmm3/m64{sae}, imm8 | A | V/V | AVX512F | Rounds scalar double-precision floating-point value in xmm3/m64 to a number of fraction bits specified by the imm8 field. Stores the result in the xmm1 register.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | Imm8
Operation
RoundToIntegerDP(SRC[63:0], imm8[7:0]) {
    IF (imm8[2] = 1)
        rounding_direction ← MXCSR:RC ; get round control from MXCSR
    ELSE
        rounding_direction ← imm8[1:0] ; get round control from imm8[1:0]
    FI
    M ← imm8[7:4] ; get the scaling factor
    CASE (rounding_direction)
        00: TMP[63:0] ← round_to_nearest_even_integer(2^M * SRC[63:0])
        01: TMP[63:0] ← round_to_equal_or_smaller_integer(2^M * SRC[63:0])
        10: TMP[63:0] ← round_to_equal_or_larger_integer(2^M * SRC[63:0])
        11: TMP[63:0] ← round_to_nearest_smallest_magnitude_integer(2^M * SRC[63:0])
    ESAC
    Dest[63:0] ← 2^(-M) * TMP[63:0] ; scale back down by 2^(-M)
    IF (imm8[3] = 0) THEN ; check SPE
        IF (SRC[63:0] != Dest[63:0]) THEN ; check precision lost
            set_precision() ; set #PE
        FI;
    FI;
    RETURN (Dest[63:0])
}
VRNDSCALESD (EVEX encoded version)
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← RoundToIntegerDP(SRC2[63:0], Zero_upper_imm[7:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VRNDSCALESD __m128d _mm_roundscale_sd ( __m128d a, __m128d b, int imm);
VRNDSCALESD __m128d _mm_roundscale_round_sd ( __m128d a, __m128d b, int imm, int sae);
VRNDSCALESD __m128d _mm_mask_roundscale_sd (__m128d s, __mmask8 k, __m128d a, __m128d b, int imm);
VRNDSCALESD __m128d _mm_mask_roundscale_round_sd (__m128d s, __mmask8 k, __m128d a, __m128d b, int imm, int sae);
VRNDSCALESD __m128d _mm_maskz_roundscale_sd ( __mmask8 k, __m128d a, __m128d b, int imm);
VRNDSCALESD __m128d _mm_maskz_roundscale_round_sd ( __mmask8 k, __m128d a, __m128d b, int imm, int sae);
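A usage sketch of the unmasked intrinsic listed above (assumes a compiler with AVX512F support; the immediate must be a compile-time constant):

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128d a = _mm_set_sd(0.0);   /* supplies bits 127:64 of the result */
    __m128d b = _mm_set_sd(2.7);   /* low element to be rounded */
    /* imm8 = 0x20: keep 2 fraction bits (M = 2), round nearest even ->
       the low result is the multiple of 0.25 closest to 2.7, i.e. 2.75. */
    __m128d r = _mm_roundscale_sd(a, b, 0x20);
    printf("%f\n", _mm_cvtsd_f64(r));   /* prints 2.750000 */
    return 0;
}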
SIMD Floating-Point Exceptions
Invalid, Precision
If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).
Other Exceptions
See Exceptions Type E3.
VRNDSCALEPS—Round Packed Float32 Values To Include A Given Number Of Fraction Bits
Instruction Operand Encoding
Description
Round the single-precision floating-point values in the source operand by the rounding mode specified in the immediate operand (see Figure 5-29) and place the result in the destination operand.
The destination operand (the first operand) is a ZMM register conditionally updated according to the writemask. The source operand (the second operand) can be a ZMM register, a 512-bit memory location, or a 512-bit vector broadcasted from a 32-bit memory location.
The rounding process rounds the input to an integral value, plus the number of fraction bits specified by imm8[7:4] (to be included in the result), and returns the result as a single-precision floating-point value.
Note that no overflow is induced while executing this instruction (although the source is scaled by the imm8[7:4] value).
The immediate operand also specifies control fields for the rounding operation. Three bit fields are defined and shown in the “Immediate Control Description” figure below. Bit 3 of the immediate byte controls the processor behavior for a precision exception, bit 2 selects the source of rounding-mode control, and bits 1:0 specify a non-sticky rounding-mode value (the immediate control table below lists the encoded values for the rounding-mode field).
The Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN, it is converted to a QNaN. If DAZ is set to ‘1’, denormals are converted to zero before rounding.
The sign of the result of this instruction is preserved, including the sign of zero.
The formula of the operation on each data element for VRNDSCALEPS is
ROUND(x) = 2^(-M) * Round_to_INT(x * 2^M, round_ctrl),
round_ctrl = imm[3:0];
M = imm[7:4];
The operation x * 2^M is computed as if the exponent range is unlimited (i.e. no overflow ever occurs).
VRNDSCALEPS is a more general form of the VEX-encoded VROUNDPS instruction. In VROUNDPS, the formula of the operation on each element is
ROUND(x) = Round_to_INT(x, round_ctrl),
round_ctrl = imm[3:0];
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F3A.W0 08 /r ib VRNDSCALEPS xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | A | V/V | AVX512VL AVX512F | Rounds packed single-precision floating-point values in xmm2/m128/m32bcst to a number of fraction bits specified by the imm8 field. Stores the result in the xmm1 register under writemask.
EVEX.256.66.0F3A.W0 08 /r ib VRNDSCALEPS ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | A | V/V | AVX512VL AVX512F | Rounds packed single-precision floating-point values in ymm2/m256/m32bcst to a number of fraction bits specified by the imm8 field. Stores the result in the ymm1 register under writemask.
EVEX.512.66.0F3A.W0 08 /r ib VRNDSCALEPS zmm1 {k1}{z}, zmm2/m512/m32bcst{sae}, imm8 | A | V/V | AVX512F | Rounds packed single-precision floating-point values in zmm2/m512/m32bcst to a number of fraction bits specified by the imm8 field. Stores the result in the zmm1 register using writemask.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | Imm8 | NA
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Handling of special input values is listed in Table 5-16.
Operation
RoundToIntegerSP(SRC[31:0], imm8[7:0]) {
    IF (imm8[2] = 1)
        rounding_direction ← MXCSR:RC ; get round control from MXCSR
    ELSE
        rounding_direction ← imm8[1:0] ; get round control from imm8[1:0]
    FI
    M ← imm8[7:4] ; get the scaling factor
    CASE (rounding_direction)
        00: TMP[31:0] ← round_to_nearest_even_integer(2^M * SRC[31:0])
        01: TMP[31:0] ← round_to_equal_or_smaller_integer(2^M * SRC[31:0])
        10: TMP[31:0] ← round_to_equal_or_larger_integer(2^M * SRC[31:0])
        11: TMP[31:0] ← round_to_nearest_smallest_magnitude_integer(2^M * SRC[31:0])
    ESAC;
    Dest[31:0] ← 2^(-M) * TMP[31:0] ; scale back down by 2^(-M)
    IF (imm8[3] = 0) THEN ; check SPE
        IF (SRC[31:0] != Dest[31:0]) THEN ; check precision lost
            set_precision() ; set #PE
        FI;
    FI;
    RETURN (Dest[31:0])
}
VRNDSCALEPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF *src is a memory operand*
    THEN TMP_SRC ← BROADCAST32(SRC, VL, k1)
    ELSE TMP_SRC ← SRC
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← RoundToIntegerSP(TMP_SRC[i+31:i], imm8[7:0])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VRNDSCALEPS __m512 _mm512_roundscale_ps( __m512 a, int imm);
VRNDSCALEPS __m512 _mm512_roundscale_round_ps( __m512 a, int imm, int sae);
VRNDSCALEPS __m512 _mm512_mask_roundscale_ps(__m512 s, __mmask16 k, __m512 a, int imm);
VRNDSCALEPS __m512 _mm512_mask_roundscale_round_ps(__m512 s, __mmask16 k, __m512 a, int imm, int sae);
VRNDSCALEPS __m512 _mm512_maskz_roundscale_ps( __mmask16 k, __m512 a, int imm);
VRNDSCALEPS __m512 _mm512_maskz_roundscale_round_ps( __mmask16 k, __m512 a, int imm, int sae);
VRNDSCALEPS __m256 _mm256_roundscale_ps( __m256 a, int imm);
VRNDSCALEPS __m256 _mm256_mask_roundscale_ps(__m256 s, __mmask8 k, __m256 a, int imm);
VRNDSCALEPS __m256 _mm256_maskz_roundscale_ps( __mmask8 k, __m256 a, int imm);
VRNDSCALEPS __m128 _mm_roundscale_ps( __m128 a, int imm);
VRNDSCALEPS __m128 _mm_mask_roundscale_ps(__m128 s, __mmask8 k, __m128 a, int imm);
VRNDSCALEPS __m128 _mm_maskz_roundscale_ps( __mmask8 k, __m128 a, int imm);
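A common use of the packed form is truncating toward zero without a round trip through integers. A usage sketch (assumes AVX512F compiler support):

#include <immintrin.h>

/* imm8 = 0x0B: keep 0 fraction bits (M = 0), suppress #PE (imm8[3] = 1),
   truncate (imm8[1:0] = 11b) -> per-lane trunc() of 16 floats. */
__m512 truncate_ps(__m512 x)
{
    return _mm512_roundscale_ps(x, 0x0B);
}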
SIMD Floating-Point Exceptions
Invalid, Precision
If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).
Other Exceptions
See Exceptions Type E2.
VRNDSCALESS—Round Scalar Float32 Value To Include A Given Number Of Fraction Bits
Instruction Operand Encoding
Description
Rounds the single-precision floating-point value in the low doubleword element of the second source operand (the
third operand) by the rounding mode specified in the immediate operand (see Figure 5-29) and places the result in
the corresponding element of the destination operand (the first operand) according to the writemask. The double-
word elements at bits 127:32 of the destination are copied from the first source operand (the second operand).
The destination and first source operands are XMM registers; the second source operand can be an XMM register or a memory location. Bits MAXVL-1:128 of the destination register are cleared.
The rounding process rounds the input to an integral value, plus the number of fraction bits specified by imm8[7:4] (to be included in the result), and returns the result as a single-precision floating-point value.
Note that no overflow is induced while executing this instruction (although the source is scaled by the imm8[7:4] value).
The immediate operand also specifies control fields for the rounding operation. Three bit fields are defined and shown in the “Immediate Control Description” figure below. Bit 3 of the immediate byte controls the processor behavior for a precision exception, bit 2 selects the source of rounding-mode control, and bits 1:0 specify a non-sticky rounding-mode value (the immediate control tables below list the encoded values for the rounding-mode field).
The Precision Floating-Point Exception is signaled according to the immediate operand. If any source operand is an SNaN, it is converted to a QNaN. If DAZ is set to ‘1’, denormals are converted to zero before rounding.
The sign of the result of this instruction is preserved, including the sign of zero.
The formula of the operation for VRNDSCALESS is
ROUND(x) = 2^(-M) * Round_to_INT(x * 2^M, round_ctrl),
round_ctrl = imm[3:0];
M = imm[7:4];
The operation x * 2^M is computed as if the exponent range is unlimited (i.e. no overflow ever occurs).
VRNDSCALESS is a more general form of the VEX-encoded VROUNDSS instruction. In VROUNDSS, the formula of the operation on each element is
ROUND(x) = Round_to_INT(x, round_ctrl),
round_ctrl = imm[3:0];
EVEX encoded version: The source operand is an XMM register or a 32-bit memory location. The destination operand is an XMM register.
Handling of special input values is listed in Table 5-16.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F3A.W0 0A /r ib VRNDSCALESS xmm1 {k1}{z}, xmm2, xmm3/m32{sae}, imm8 | A | V/V | AVX512F | Rounds scalar single-precision floating-point value in xmm3/m32 to a number of fraction bits specified by the imm8 field. Stores the result in the xmm1 register under writemask.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Operation
RoundToIntegerSP(SRC[31:0], imm8[7:0]) {
    IF (imm8[2] = 1)
        rounding_direction ← MXCSR:RC ; get round control from MXCSR
    ELSE
        rounding_direction ← imm8[1:0] ; get round control from imm8[1:0]
    FI
    M ← imm8[7:4] ; get the scaling factor
    CASE (rounding_direction)
        00: TMP[31:0] ← round_to_nearest_even_integer(2^M * SRC[31:0])
        01: TMP[31:0] ← round_to_equal_or_smaller_integer(2^M * SRC[31:0])
        10: TMP[31:0] ← round_to_equal_or_larger_integer(2^M * SRC[31:0])
        11: TMP[31:0] ← round_to_nearest_smallest_magnitude_integer(2^M * SRC[31:0])
    ESAC;
    Dest[31:0] ← 2^(-M) * TMP[31:0] ; scale back down by 2^(-M)
    IF (imm8[3] = 0) THEN ; check SPE
        IF (SRC[31:0] != Dest[31:0]) THEN ; check precision lost
            set_precision() ; set #PE
        FI;
    FI;
    RETURN (Dest[31:0])
}
VRNDSCALESS (EVEX encoded version)
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← RoundToIntegerSP(SRC2[31:0], Zero_upper_imm[7:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VRNDSCALESS __m128 _mm_roundscale_ss ( __m128 a, __m128 b, int imm);
VRNDSCALESS __m128 _mm_roundscale_round_ss ( __m128 a, __m128 b, int imm, int sae);
VRNDSCALESS __m128 _mm_mask_roundscale_ss (__m128 s, __mmask8 k, __m128 a, __m128 b, int imm);
VRNDSCALESS __m128 _mm_mask_roundscale_round_ss (__m128 s, __mmask8 k, __m128 a, __m128 b, int imm, int sae);
VRNDSCALESS __m128 _mm_maskz_roundscale_ss ( __mmask8 k, __m128 a, __m128 b, int imm);
VRNDSCALESS __m128 _mm_maskz_roundscale_round_ss ( __mmask8 k, __m128 a, __m128 b, int imm, int sae);
SIMD Floating-Point Exceptions
Invalid, Precision
If SPE is enabled, precision exception is not reported (regardless of MXCSR exception mask).
Other Exceptions
See Exceptions Type E3.
VRSQRT14PD—Compute Approximate Reciprocals of Square Roots of Packed Float64 Values
Instruction Operand Encoding
Description
This instruction performs a SIMD computation of the approximate reciprocals of the square roots of the eight packed double-precision floating-point values in the source operand (the second operand) and stores the packed double-precision floating-point results in the destination operand (the first operand) according to the writemask. The maximum relative error for this approximation is less than 2^(-14).
EVEX.512 encoded version: The source operand can be a ZMM register, a 512-bit memory location, or a 512-bit vector broadcasted from a 64-bit memory location. The destination operand is a ZMM register, conditionally updated using writemask k1.
EVEX.256 encoded version: The source operand is a YMM register, a 256-bit memory location, or a 256-bit vector broadcasted from a 64-bit memory location. The destination operand is a YMM register, conditionally updated using writemask k1.
EVEX.128 encoded version: The source operand is an XMM register, a 128-bit memory location, or a 128-bit vector broadcasted from a 64-bit memory location. The destination operand is an XMM register, conditionally updated using writemask k1.
The VRSQRT14PD instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ with the sign of the source value is returned. When the source operand is +∞, then +ZERO value is returned. A denormal source value is treated as zero only if the DAZ bit is set in MXCSR. Otherwise it is treated correctly and performs the approximation with the specified masked response. When a source value is a negative value (other than -0.0), a floating-point QNaN_Indefinite is returned. When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.
MXCSR exception flags are not affected by this instruction and floating-point exceptions are not reported.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
A numerically exact implementation of VRSQRT14xx can be found at https://software.intel.com/en-us/articles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 4E /r VRSQRT14PD xmm1 {k1}{z}, xmm2/m128/m64bcst | A | V/V | AVX512VL AVX512F | Computes the approximate reciprocal square roots of the packed double-precision floating-point values in xmm2/m128/m64bcst and stores the results in xmm1. Under writemask.
EVEX.256.66.0F38.W1 4E /r VRSQRT14PD ymm1 {k1}{z}, ymm2/m256/m64bcst | A | V/V | AVX512VL AVX512F | Computes the approximate reciprocal square roots of the packed double-precision floating-point values in ymm2/m256/m64bcst and stores the results in ymm1. Under writemask.
EVEX.512.66.0F38.W1 4E /r VRSQRT14PD zmm1 {k1}{z}, zmm2/m512/m64bcst | A | V/V | AVX512F | Computes the approximate reciprocal square roots of the packed double-precision floating-point values in zmm2/m512/m64bcst and stores the results in zmm1 under writemask.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Operation
VRSQRT14PD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC *is memory*)
            THEN DEST[i+63:i] ← APPROXIMATE(1.0 / SQRT(SRC[63:0]));
            ELSE DEST[i+63:i] ← APPROXIMATE(1.0 / SQRT(SRC[i+63:i]));
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] ← 0
        FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VRSQRT14PD __m512d _mm512_rsqrt14_pd( __m512d a);
VRSQRT14PD __m512d _mm512_mask_rsqrt14_pd(__m512d s, __mmask8 k, __m512d a);
VRSQRT14PD __m512d _mm512_maskz_rsqrt14_pd( __mmask8 k, __m512d a);
VRSQRT14PD __m256d _mm256_rsqrt14_pd( __m256d a);
VRSQRT14PD __m256d _mm256_mask_rsqrt14_pd(__m256d s, __mmask8 k, __m256d a);
VRSQRT14PD __m256d _mm256_maskz_rsqrt14_pd( __mmask8 k, __m256d a);
VRSQRT14PD __m128d _mm_rsqrt14_pd( __m128d a);
VRSQRT14PD __m128d _mm_mask_rsqrt14_pd(__m128d s, __mmask8 k, __m128d a);
VRSQRT14PD __m128d _mm_maskz_rsqrt14_pd( __mmask8 k, __m128d a);
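When more accuracy is needed, the 2^(-14) estimate is commonly tightened with a Newton-Raphson step, y' = y * (1.5 - 0.5 * x * y * y), roughly doubling the number of correct bits. The sketch below is illustrative (rsqrt_refined_pd is not an Intel API) and does not handle the special inputs of Table 5-17:

#include <immintrin.h>

__m512d rsqrt_refined_pd(__m512d x)
{
    const __m512d half       = _mm512_set1_pd(0.5);
    const __m512d three_half = _mm512_set1_pd(1.5);
    __m512d y = _mm512_rsqrt14_pd(x);                      /* |rel err| < 2^-14 */
    __m512d t = _mm512_mul_pd(_mm512_mul_pd(half, x),
                              _mm512_mul_pd(y, y));        /* 0.5 * x * y^2 */
    return _mm512_mul_pd(y, _mm512_sub_pd(three_half, t)); /* y * (1.5 - t) */
}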
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
Table 5-17. VRSQRT14PD Special Cases
Input value | Result value | Comments
Any denormal | Normal | Cannot generate overflow
X = 2^(-2n) | 2^n |
X < 0 | QNaN_Indefinite | Including -INF
X = -0 | -INF |
X = +0 | +INF |
X = +INF | +0 |
VRSQRT14SD—Compute Approximate Reciprocal of Square Root of Scalar Float64 Value
Instruction Operand Encoding
Description
Computes the approximate reciprocal of the square root of the scalar double-precision floating-point value in the low quadword element of the source operand (the second operand) and stores the result in the low quadword element of the destination operand (the first operand) according to the writemask. The maximum relative error for this approximation is less than 2^(-14). The source operand can be an XMM register or a 64-bit memory location. The destination operand is an XMM register.
Bits (127:64) of the XMM register destination are copied from the corresponding bits in the first source operand. Bits (MAXVL-1:128) of the destination register are zeroed.
The VRSQRT14SD instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ with the sign of the source value is returned. When the source operand is +∞, then +ZERO value is returned. A denormal source value is treated as zero only if the DAZ bit is set in MXCSR. Otherwise it is treated correctly and performs the approximation with the specified masked response. When a source value is a negative value (other than -0.0), a floating-point QNaN_Indefinite is returned. When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.
MXCSR exception flags are not affected by this instruction and floating-point exceptions are not reported.
A numerically exact implementation of VRSQRT14xx can be found at https://software.intel.com/en-us/articles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.
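A quick way to see the error bound in practice (illustrative only; assumes AVX512F compiler support):

#include <immintrin.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 3.0;
    __m128d v   = _mm_set_sd(x);
    __m128d est = _mm_rsqrt14_sd(v, v);      /* low element: ~1.0/sqrt(3.0) */
    double  y   = _mm_cvtsd_f64(est);
    double  rel = fabs(y * sqrt(x) - 1.0);   /* relative error vs. exact */
    printf("rel err = %g (bound: %g)\n", rel, ldexp(1.0, -14));
    return 0;
}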
Operation
VRSQRT14SD (EVEX version)
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← APPROXIMATE(1.0 / SQRT(SRC2[63:0]))
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F38.W1 4F /r VRSQRT14SD xmm1 {k1}{z}, xmm2, xmm3/m64 | A | V/V | AVX512F | Computes the approximate reciprocal square root of the scalar double-precision floating-point value in xmm3/m64 and stores the result in the low quadword element of xmm1 using writemask k1. Bits[127:64] of xmm2 are copied to xmm1[127:64].

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Intel C/C++ Compiler Intrinsic Equivalent
VRSQRT14SD __m128d _mm_rsqrt14_sd( __m128d a, __m128d b);
VRSQRT14SD __m128d _mm_mask_rsqrt14_sd(__m128d s, __mmask8 k, __m128d a, __m128d b);
VRSQRT14SD __m128d _mm_maskz_rsqrt14_sd( __mmask8 k, __m128d a, __m128d b);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E5.
Table 5-18. VRSQRT14SD Special Cases
Input value | Result value | Comments
Any denormal | Normal | Cannot generate overflow
X = 2^(-2n) | 2^n |
X < 0 | QNaN_Indefinite | Including -INF
X = -0 | -INF |
X = +0 | +INF |
X = +INF | +0 |
VRSQRT14PS—Compute Approximate Reciprocals of Square Roots of Packed Float32 Values
Instruction Operand Encoding
Description
This instruction performs a SIMD computation of the approximate reciprocals of the square roots of 16 packed
single-precision floating-point values in the source operand (the second operand) and stores the packed single-
precision floating-point results in the destination operand (the first operand) according to the writemask. The
maximum relative error for this approximation is less than 2^(-14).
EVEX.512 encoded version: The source operand can be a ZMM register, a 512-bit memory location or a 512-bit
vector broadcasted from a 32-bit memory location. The destination operand is a ZMM register, conditionally
updated using writemask k1.
EVEX.256 encoded version: The source operand is a YMM register, a 256-bit memory location, or a 256-bit vector
broadcasted from a 32-bit memory location. The destination operand is a YMM register, conditionally updated using
writemask k1.
EVEX.128 encoded version: The source operand is an XMM register, a 128-bit memory location, or a 128-bit vector broadcasted from a 32-bit memory location. The destination operand is an XMM register, conditionally updated using writemask k1.
The VRSQRT14PS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ with the sign of the source value is returned. When the source operand is +∞, then +ZERO value is returned. A denormal source value is treated as zero only if the DAZ bit is set in MXCSR. Otherwise it is treated correctly and performs the approximation with the specified masked response. When a source value is a negative value (other than -0.0), a floating-point QNaN_Indefinite is returned. When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.
MXCSR exception flags are not affected by this instruction and floating-point exceptions are not reported.
Note: EVEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
A numerically exact implementation of VRSQRT14xx can be found at https://software.intel.com/en-us/articles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 4E /r VRSQRT14PS xmm1 {k1}{z}, xmm2/m128/m32bcst | A | V/V | AVX512VL AVX512F | Computes the approximate reciprocal square roots of the packed single-precision floating-point values in xmm2/m128/m32bcst and stores the results in xmm1. Under writemask.
EVEX.256.66.0F38.W0 4E /r VRSQRT14PS ymm1 {k1}{z}, ymm2/m256/m32bcst | A | V/V | AVX512VL AVX512F | Computes the approximate reciprocal square roots of the packed single-precision floating-point values in ymm2/m256/m32bcst and stores the results in ymm1. Under writemask.
EVEX.512.66.0F38.W0 4E /r VRSQRT14PS zmm1 {k1}{z}, zmm2/m512/m32bcst | A | V/V | AVX512F | Computes the approximate reciprocal square roots of the packed single-precision floating-point values in zmm2/m512/m32bcst and stores the results in zmm1. Under writemask.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Operation
VRSQRT14PS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC *is memory*)
            THEN DEST[i+31:i] ← APPROXIMATE(1.0 / SQRT(SRC[31:0]));
            ELSE DEST[i+31:i] ← APPROXIMATE(1.0 / SQRT(SRC[i+31:i]));
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+31:i] ← 0
        FI;
    FI;
ENDFOR;
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VRSQRT14PS __m512 _mm512_rsqrt14_ps( __m512 a);
VRSQRT14PS __m512 _mm512_mask_rsqrt14_ps(__m512 s, __mmask16 k, __m512 a);
VRSQRT14PS __m512 _mm512_maskz_rsqrt14_ps( __mmask16 k, __m512 a);
VRSQRT14PS __m256 _mm256_rsqrt14_ps( __m256 a);
VRSQRT14PS __m256 _mm256_mask_rsqrt14_ps(__m256 s, __mmask8 k, __m256 a);
VRSQRT14PS __m256 _mm256_maskz_rsqrt14_ps( __mmask8 k, __m256 a);
VRSQRT14PS __m128 _mm_rsqrt14_ps( __m128 a);
VRSQRT14PS __m128 _mm_mask_rsqrt14_ps(__m128 s, __mmask8 k, __m128 a);
VRSQRT14PS __m128 _mm_maskz_rsqrt14_ps( __mmask8 k, __m128 a);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4.
Table 5-19. VRSQRT14PS Special Cases
Input value | Result value | Comments
Any denormal | Normal | Cannot generate overflow
X = 2^(-2n) | 2^n |
X < 0 | QNaN_Indefinite | Including -INF
X = -0 | -INF |
X = +0 | +INF |
X = +INF | +0 |
VRSQRT14SS—Compute Approximate Reciprocal of Square Root of Scalar Float32 Value
Instruction Operand Encoding
Description
Computes the approximate reciprocal of the square root of the scalar single-precision floating-point value in the low doubleword element of the source operand (the second operand) and stores the result in the low doubleword element of the destination operand (the first operand) according to the writemask. The maximum relative error for this approximation is less than 2^(-14). The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register.
Bits (127:32) of the XMM register destination are copied from the corresponding bits in the first source operand. Bits (MAXVL-1:128) of the destination register are zeroed.
The VRSQRT14SS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ with the sign of the source value is returned. When the source operand is an ∞, zero with the sign of the source value is returned. A denormal source value is treated as zero only if the DAZ bit is set in MXCSR. Otherwise it is treated correctly and performs the approximation with the specified masked response. When a source value is a negative value (other than -0.0), a floating-point indefinite is returned. When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.
MXCSR exception flags are not affected by this instruction and floating-point exceptions are not reported.
A numerically exact implementation of VRSQRT14xx can be found at https://software.intel.com/en-us/articles/reference-implementations-for-IA-approximation-instructions-vrcp14-vrsqrt14-vrcp28-vrsqrt28-vexp2.
Operation
VRSQRT14SS (EVEX version)
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← APPROXIMATE(1.0 / SQRT(SRC2[31:0]))
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F38.W0 4F /r VRSQRT14SS xmm1 {k1}{z}, xmm2, xmm3/m32 | A | V/V | AVX512F | Computes the approximate reciprocal square root of the scalar single-precision floating-point value in xmm3/m32 and stores the result in the low doubleword element of xmm1 using writemask k1. Bits[127:32] of xmm2 are copied to xmm1[127:32].

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Intel C/C++ Compiler Intrinsic Equivalent
VRSQRT14SS __m128 _mm_rsqrt14_ss( __m128 a, __m128 b);
VRSQRT14SS __m128 _mm_mask_rsqrt14_ss(__m128 s, __mmask8 k, __m128 a, __m128 b);
VRSQRT14SS __m128 _mm_maskz_rsqrt14_ss( __mmask8 k, __m128 a, __m128 b);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E5.
Table 5-20. VRSQRT14SS Special Cases
Input value | Result value | Comments
Any denormal | Normal | Cannot generate overflow
X = 2^(-2n) | 2^n |
X < 0 | QNaN_Indefinite | Including -INF
X = -0 | -INF |
X = +0 | +INF |
X = +INF | +0 |
VSCALEFPD—Scale Packed Float64 Values With Float64 Values
Instruction Operand Encoding
Description
Performs a floating-point scale of the packed double-precision floating-point values in the first source operand by multiplying them by 2 to the power of the double-precision floating-point values in the second source operand.
The equation of this operation is given by:
zmm1 := zmm2 * 2^floor(zmm3).
Floor(zmm3) means the maximum integer value ≤ zmm3.
If the result cannot be represented in double precision, then the proper overflow response (for positive scaling operand), or the proper underflow response (for negative scaling operand), is issued. The overflow and underflow responses are dependent on the rounding mode (for IEEE-compliant rounding), as well as on other settings in MXCSR (exception mask bits, FTZ bit), and on the SAE bit.
The first source operand is a ZMM/YMM/XMM register. The second source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
Handling of special-case input values is listed in Table 5-21 and Table 5-22.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W1 2C /r VSCALEFPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | A | V/V | AVX512VL AVX512F | Scale the packed double-precision floating-point values in xmm2 using values from xmm3/m128/m64bcst. Under writemask k1.
EVEX.256.66.0F38.W1 2C /r VSCALEFPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | A | V/V | AVX512VL AVX512F | Scale the packed double-precision floating-point values in ymm2 using values from ymm3/m256/m64bcst. Under writemask k1.
EVEX.512.66.0F38.W1 2C /r VSCALEFPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | A | V/V | AVX512F | Scale the packed double-precision floating-point values in zmm2 using values from zmm3/m512/m64bcst. Under writemask k1.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Table 5-21. VSCALEFPD/SD/PS/SS Special Cases
Src1 \ Src2 | ±NaN | +Inf | -Inf | 0/Denorm/Norm | Set IE
±QNaN | QNaN(Src1) | +INF | +0 | QNaN(Src1) | IF either source is SNAN
±SNaN | QNaN(Src1) | QNaN(Src1) | QNaN(Src1) | QNaN(Src1) | YES
±Inf | QNaN(Src2) | Src1 | QNaN_Indefinite | Src1 | IF Src2 is SNAN or -INF
±0 | QNaN(Src2) | QNaN_Indefinite | Src1 | Src1 | IF Src2 is SNAN or +INF
Denorm/Norm | QNaN(Src2) | ±INF (Src1 sign) | ±0 (Src1 sign) | Compute Result | IF Src2 is SNAN
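Outside the special cases above, the per-element arithmetic reduces to one line of C. A minimal model (scalef_model is not an Intel API; NaN/Inf handling and MXCSR.DAZ are deliberately omitted):

#include <math.h>

/* dest = src1 * 2^floor(src2); IEEE overflow/underflow rules apply. */
double scalef_model(double src1, double src2)
{
    return src1 * pow(2.0, floor(src2));
}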
Operation
SCALE(SRC1, SRC2)
{
    TMP_SRC2 ← SRC2
    TMP_SRC1 ← SRC1
    IF (SRC2 is denormal AND MXCSR.DAZ) THEN TMP_SRC2 ← 0
    IF (SRC1 is denormal AND MXCSR.DAZ) THEN TMP_SRC1 ← 0
    /* SRC2 is a 64-bit floating-point value */
    DEST[63:0] ← TMP_SRC1[63:0] * POW(2, Floor(TMP_SRC2[63:0]))
}
VSCALEFPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1) AND (SRC2 *is register*)
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+63:i] ← SCALE(SRC1[i+63:i], SRC2[63:0]);
            ELSE DEST[i+63:i] ← SCALE(SRC1[i+63:i], SRC2[i+63:i]);
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+63:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Table 5-22. Additional VSCALEFPD/SD Special Cases
Special Case | Returned value | Faults
|result| < 2^(-1074) | ±0 or ±Min-Denormal (Src1 sign) | Underflow
|result| ≥ 2^1024 | ±INF (Src1 sign) or ±Max-normal (Src1 sign) | Overflow
Intel C/C++ Compiler Intrinsic Equivalent
VSCALEFPD __m512d _mm512_scalef_round_pd(__m512d a, __m512d b, int rounding);
VSCALEFPD __m512d _mm512_mask_scalef_round_pd(__m512d s, __mmask8 k, __m512d a, __m512d b, int rounding);
VSCALEFPD __m512d _mm512_maskz_scalef_round_pd(__mmask8 k, __m512d a, __m512d b, int rounding);
VSCALEFPD __m512d _mm512_scalef_pd(__m512d a, __m512d b);
VSCALEFPD __m512d _mm512_mask_scalef_pd(__m512d s, __mmask8 k, __m512d a, __m512d b);
VSCALEFPD __m512d _mm512_maskz_scalef_pd(__mmask8 k, __m512d a, __m512d b);
VSCALEFPD __m256d _mm256_scalef_pd(__m256d a, __m256d b);
VSCALEFPD __m256d _mm256_mask_scalef_pd(__m256d s, __mmask8 k, __m256d a, __m256d b);
VSCALEFPD __m256d _mm256_maskz_scalef_pd(__mmask8 k, __m256d a, __m256d b);
VSCALEFPD __m128d _mm_scalef_pd(__m128d a, __m128d b);
VSCALEFPD __m128d _mm_mask_scalef_pd(__m128d s, __mmask8 k, __m128d a, __m128d b);
VSCALEFPD __m128d _mm_maskz_scalef_pd(__mmask8 k, __m128d a, __m128d b);
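A usage sketch of the unmasked 512-bit intrinsic listed above (assumes AVX512F compiler support); this is effectively a vectorized ldexp() when the exponents are integral:

#include <immintrin.h>

/* Each lane i computes x[i] * 2^floor(e[i]). */
__m512d mul_pow2_pd(__m512d x, __m512d e)
{
    return _mm512_scalef_pd(x, e);
}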
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal (for Src1).
Denormal is not reported for Src2.
Other Exceptions
See Exceptions Type E2.
VSCALEFSD—Scale Scalar Float64 Values With Float64 Values
Instruction Operand Encoding
Description
Performs a floating-point scale of the scalar double-precision floating-point value in the first source operand by multiplying it by 2 to the power of the double-precision floating-point value in the second source operand.
The equation of this operation is given by:
xmm1 := xmm2 * 2^floor(xmm3).
Floor(xmm3) means the maximum integer value ≤ xmm3.
If the result cannot be represented in double precision, then the proper overflow response (for positive scaling operand), or the proper underflow response (for negative scaling operand), is issued. The overflow and underflow responses are dependent on the rounding mode (for IEEE-compliant rounding), as well as on other settings in MXCSR (exception mask bits, FTZ bit), and on the SAE bit.
EVEX encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a memory location. The destination operand is an XMM register conditionally updated with writemask k1.
Handling of special-case input values is listed in Table 5-21 and Table 5-22.
Operation
SCALE(SRC1, SRC2)
{
    ; Check for denormal operands
    TMP_SRC2 ← SRC2
    TMP_SRC1 ← SRC1
    IF (SRC2 is denormal AND MXCSR.DAZ) THEN TMP_SRC2 ← 0
    IF (SRC1 is denormal AND MXCSR.DAZ) THEN TMP_SRC1 ← 0
    /* SRC2 is a 64-bit floating-point value */
    DEST[63:0] ← TMP_SRC1[63:0] * POW(2, Floor(TMP_SRC2[63:0]))
}
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.LIG.66.0F38.W1 2D /r VSCALEFSD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | A | V/V | AVX512F | Scale the scalar double-precision floating-point value in xmm2 using the value from xmm3/m64. Under writemask k1.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Tuple1 Scalar | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
VSCALEFSD (EVEX encoded version)
IF (EVEX.b = 1) AND SRC2 *is a register*
    THEN SET_RM(EVEX.RC);
    ELSE SET_RM(MXCSR.RM);
FI;
IF k1[0] OR *no writemask*
    THEN DEST[63:0] ← SCALE(SRC1[63:0], SRC2[63:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VSCALEFSD __m128d _mm_scalef_round_sd(__m128d a, __m128d b, int);
VSCALEFSD __m128d _mm_mask_scalef_round_sd(__m128d s, __mmask8 k, __m128d a, __m128d b, int);
VSCALEFSD __m128d _mm_maskz_scalef_round_sd(__mmask8 k, __m128d a, __m128d b, int);
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal (for Src1).
Denormal is not reported for Src2.
Other Exceptions
See Exceptions Type E3.
VSCALEFPS—Scale Packed Float32 Values With Float32 Values
Instruction Operand Encoding
Description
Performs a floating-point scale of the packed single-precision floating-point values in the first source operand by multiplying them by 2 to the power of the float32 values in the second source operand.
The equation of this operation is given by:
zmm1 := zmm2 * 2^floor(zmm3).
Floor(zmm3) means the maximum integer value ≤ zmm3.
If the result cannot be represented in single precision, then the proper overflow response (for positive scaling operand), or the proper underflow response (for negative scaling operand), is issued. The overflow and underflow responses are dependent on the rounding mode (for IEEE-compliant rounding), as well as on other settings in MXCSR (exception mask bits, FTZ bit), and on the SAE bit.
EVEX.512 encoded version: The first source operand is a ZMM register. The second source operand is a ZMM register, a 512-bit memory location or a 512-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM register conditionally updated with writemask k1.
EVEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register, a 256-bit memory location, or a 256-bit vector broadcasted from a 32-bit memory location. The destination operand is a YMM register, conditionally updated using writemask k1.
EVEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register, a 128-bit memory location, or a 128-bit vector broadcasted from a 32-bit memory location. The destination operand is an XMM register, conditionally updated using writemask k1.
Handling of special-case input values is listed in Table 5-21 and Table 5-23.
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
EVEX.128.66.0F38.W0 2C /r VSCALEFPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | A | V/V | AVX512VL AVX512F | Scale the packed single-precision floating-point values in xmm2 using values from xmm3/m128/m32bcst. Under writemask k1.
EVEX.256.66.0F38.W0 2C /r VSCALEFPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | A | V/V | AVX512VL AVX512F | Scale the packed single-precision floating-point values in ymm2 using values from ymm3/m256/m32bcst. Under writemask k1.
EVEX.512.66.0F38.W0 2C /r VSCALEFPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | A | V/V | AVX512F | Scale the packed single-precision floating-point values in zmm2 using floating-point values from zmm3/m512/m32bcst. Under writemask k1.

Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Table 5-23. Additional VSCALEFPS/SS Special Cases
Special Case        Returned Value                                 Faults
|result| < 2^-149   ±0 or ±Min-Denormal (Src1 sign)                Underflow
|result| ≥ 2^128    ±INF (Src1 sign) or ±Max-normal (Src1 sign)    Overflow
Operation
SCALE(SRC1, SRC2)
{ ; Check for denormal operands
    TMP_SRC2 ← SRC2
    TMP_SRC1 ← SRC1
    IF (SRC2 is denormal AND MXCSR.DAZ) THEN TMP_SRC2=0
    IF (SRC1 is denormal AND MXCSR.DAZ) THEN TMP_SRC1=0
    /* SRC2 is a 32-bit floating-point value */
    DEST[31:0] ← TMP_SRC1[31:0] * POW(2, Floor(TMP_SRC2[31:0]))
}
VSCALEFPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1) AND (SRC2 *is register*)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b = 1) AND (SRC2 *is memory*)
            THEN DEST[i+31:i] ← SCALE(SRC1[i+31:i], SRC2[31:0]);
            ELSE DEST[i+31:i] ← SCALE(SRC1[i+31:i], SRC2[i+31:i]);
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE ; zeroing-masking
                DEST[i+31:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VSCALEFPS __m512 _mm512_scalef_round_ps(__m512 a, __m512 b, int rounding);
VSCALEFPS __m512 _mm512_mask_scalef_round_ps(__m512 s, __mmask16 k, __m512 a, __m512 b, int rounding);
VSCALEFPS __m512 _mm512_maskz_scalef_round_ps(__mmask16 k, __m512 a, __m512 b, int rounding);
VSCALEFPS __m512 _mm512_scalef_ps(__m512 a, __m512 b);
VSCALEFPS __m512 _mm512_mask_scalef_ps(__m512 s, __mmask16 k, __m512 a, __m512 b);
VSCALEFPS __m512 _mm512_maskz_scalef_ps(__mmask16 k, __m512 a, __m512 b);
VSCALEFPS __m256 _mm256_scalef_ps(__m256 a, __m256 b);
VSCALEFPS __m256 _mm256_mask_scalef_ps(__m256 s, __mmask8 k, __m256 a, __m256 b);
VSCALEFPS __m256 _mm256_maskz_scalef_ps(__mmask8 k, __m256 a, __m256 b);
VSCALEFPS __m128 _mm_scalef_ps(__m128 a, __m128 b);
VSCALEFPS __m128 _mm_mask_scalef_ps(__m128 s, __mmask8 k, __m128 a, __m128 b);
VSCALEFPS __m128 _mm_maskz_scalef_ps(__mmask8 k, __m128 a, __m128 b);
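A minimal usage sketch (not part of this manual), assuming AVX-512F and a compiler flag such as -mavx512f; this is effectively a vectorized ldexp(), r[i] = a[i] * 2^floor(b[i]):

#include <immintrin.h>

/* Scale 16 packed floats at once; the instruction handles the overflow,
   underflow, and special-case inputs per Tables 5-21 and 5-23. */
__m512 scale_by_pow2(__m512 a, __m512 b)
{
    return _mm512_scalef_ps(a, b);
}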
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal (for Src1).
Denormal is not reported for Src2.
Other Exceptions
See Exceptions Type E2.
VSCALEFSS—Scale Scalar Float32 Value With Float32 Value
Instruction Operand Encoding
Description
Performs a floating-point scale of the scalar single-precision floating-point value in the first source operand by multiplying it by 2 to the power of the floor of the float32 value in the second source operand.
The equation of this operation is given by:
xmm1 := xmm2*2^floor(xmm3).
Floor(xmm3) means the maximum integer value ≤ xmm3.
If the result cannot be represented in single precision, then the proper overflow response (for positive scaling
operand), or the proper underflow response (for negative scaling operand) is issued. The overflow and underflow
responses are dependent on the rounding mode (for IEEE-compliant rounding), as well as on other settings in
MXCSR (exception mask bits, FTZ bit), and on the SAE bit.
EVEX encoded version: The first source operand is an XMM register. The second source operand is an XMM register
or a memory location. The destination operand is an XMM register conditionally updated with writemask k1.
Handling of special-case input values is listed in Table 5-21 and Table 5-23.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.LIG.66.0F38.W0 2D /r
VSCALEFSS xmm1 {k1}{z}, xmm2,
xmm3/m32{er}
A V/V AVX512F Scale the scalar single-precision floating-point value in
xmm2 using floating-point value from xmm3/m32. Under
writemask k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) NA
Operation
SCALE(SRC1, SRC2)
{
    ; Check for denormal operands
    TMP_SRC2 ← SRC2
    TMP_SRC1 ← SRC1
    IF (SRC2 is denormal AND MXCSR.DAZ) THEN TMP_SRC2=0
    IF (SRC1 is denormal AND MXCSR.DAZ) THEN TMP_SRC1=0
    /* SRC2 is a 32-bit floating-point value */
    DEST[31:0] ← TMP_SRC1[31:0] * POW(2, Floor(TMP_SRC2[31:0]))
}
VSCALEFSS (EVEX encoded version)
IF (EVEX.b = 1) AND SRC2 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] OR *no writemask*
    THEN DEST[31:0] ← SCALE(SRC1[31:0], SRC2[31:0])
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VSCALEFSS __m128 _mm_scalef_round_ss(__m128 a, __m128 b, int);
VSCALEFSS __m128 _mm_mask_scalef_round_ss(__m128 s, __mmask8 k, __m128 a, __m128 b, int);
VSCALEFSS __m128 _mm_maskz_scalef_round_ss(__mmask8 k, __m128 a, __m128 b, int);
SIMD Floating-Point Exceptions
Overflow, Underflow, Invalid, Precision, Denormal (for Src1).
Denormal is not reported for Src2.
Other Exceptions
See Exceptions Type E3.
VSCATTERDPS/VSCATTERDPD/VSCATTERQPS/VSCATTERQPD—Scatter Packed Single, Packed
Double with Signed Dword and Qword Indices
Instruction Operand Encoding
Description
Stores up to 16 elements (or 8 elements) in doubleword/quadword vector zmm1 to the memory locations pointed to by base address BASE_ADDR and index vector VINDEX, with scale SCALE. The elements are specified via the VSIB (i.e., the index register is a vector register, holding packed indices). Elements will only be stored if their corresponding mask bit is one. The entire mask register will be set to zero by this instruction unless it triggers an exception.
This instruction can be suspended by an exception if at least one element is already scattered (i.e., if the exception is triggered by an element other than the rightmost one with its mask bit set). When this happens, the destination register and the mask register (k1) are partially updated. If any traps or interrupts are pending from already scattered elements, they will be delivered in lieu of the exception; in this case, EFLAGS.RF is set to one so an instruction breakpoint is not re-triggered when the instruction is continued.
Note that:
• Only writes to overlapping vector indices are guaranteed to be ordered with respect to each other (from LSB to MSB of the source registers). Note that this also includes partially overlapping vector indices. Writes that are not overlapped may happen in any order. Memory ordering with other instructions follows the Intel-64 memory ordering model. Note that this does not account for non-overlapping indices that map into the same physical address locations.
Opcode/
Instruction
Op/E
n
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.128.66.0F38.W0 A2 /vsib
VSCATTERDPS vm32x {k1}, xmm1
A V/V AVX512VL
AVX512F
Using signed dword indices, scatter single-precision
floating-point values to memory using writemask k1.
EVEX.256.66.0F38.W0 A2 /vsib
VSCATTERDPS vm32y {k1}, ymm1
A V/V AVX512VL
AVX512F
Using signed dword indices, scatter single-precision
floating-point values to memory using writemask k1.
EVEX.512.66.0F38.W0 A2 /vsib
VSCATTERDPS vm32z {k1}, zmm1
A V/V AVX512F Using signed dword indices, scatter single-precision
floating-point values to memory using writemask k1.
EVEX.128.66.0F38.W1 A2 /vsib
VSCATTERDPD vm32x {k1}, xmm1
A V/V AVX512VL
AVX512F
Using signed dword indices, scatter double-precision
floating-point values to memory using writemask k1.
EVEX.256.66.0F38.W1 A2 /vsib
VSCATTERDPD vm32x {k1}, ymm1
A V/V AVX512VL
AVX512F
Using signed dword indices, scatter double-precision
floating-point values to memory using writemask k1.
EVEX.512.66.0F38.W1 A2 /vsib
VSCATTERDPD vm32y {k1}, zmm1
A V/V AVX512F Using signed dword indices, scatter double-precision
floating-point values to memory using writemask k1.
EVEX.128.66.0F38.W0 A3 /vsib
VSCATTERQPS vm64x {k1}, xmm1
A V/V AVX512VL
AVX512F
Using signed qword indices, scatter single-precision
floating-point values to memory using writemask k1.
EVEX.256.66.0F38.W0 A3 /vsib
VSCATTERQPS vm64y {k1}, xmm1
A V/V AVX512VL
AVX512F
Using signed qword indices, scatter single-precision
floating-point values to memory using writemask k1.
EVEX.512.66.0F38.W0 A3 /vsib
VSCATTERQPS vm64z {k1}, ymm1
A V/V AVX512F Using signed qword indices, scatter single-precision
floating-point values to memory using writemask k1.
EVEX.128.66.0F38.W1 A3 /vsib
VSCATTERQPD vm64x {k1}, xmm1
A V/V AVX512VL
AVX512F
Using signed qword indices, scatter double-precision
floating-point values to memory using writemask k1.
EVEX.256.66.0F38.W1 A3 /vsib
VSCATTERQPD vm64y {k1}, ymm1
A V/V AVX512VL
AVX512F
Using signed qword indices, scatter double-precision
floating-point values to memory using writemask k1.
EVEX.512.66.0F38.W1 A3 /vsib
VSCATTERQPD vm64z {k1}, zmm1
A V/V AVX512F Using signed qword indices, scatter double-precision
floating-point values to memory using writemask k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Tuple1 Scalar BaseReg (R): VSIB:base,
VectorReg(R): VSIB:index ModRM:reg (r) NA NA
• If two or more destination indices completely overlap, the “earlier” write(s) may be skipped.
• Faults are delivered in a right-to-left manner. That is, if a fault is triggered by an element and delivered, all elements closer to the LSB of the destination zmm will be completed (and non-faulting). Individual elements closer to the MSB may or may not be completed. If a given element triggers multiple faults, they are delivered in the conventional order.
• Elements may be scattered in any order, but faults must be delivered in a right-to-left order; thus, elements to the left of a faulting one may be scattered before the fault is delivered. A given implementation of this instruction is repeatable - given the same input values and architectural state, the same set of elements to the left of the faulting one will be scattered.
• This instruction does not perform AC checks, and so will never deliver an AC fault.
• Not valid with 16-bit effective addresses. Will deliver a #UD fault.
• If this instruction overwrites itself and then takes a fault, only a subset of elements may be completed before the fault is delivered (as described above). If the fault handler completes and attempts to re-execute this instruction, the new instruction will be executed, and the scatter will not complete.
Note that the presence of the VSIB byte is enforced in this instruction. Hence, the instruction will #UD fault if ModRM.rm is different than 100b.
This instruction has special disp8*N and alignment rules. N is considered to be the size of a single vector element.
The scaled index may require more bits to represent than the address bits used by the processor (e.g., in 32-bit
mode, if the scale is greater than one). In this case, the most significant bits beyond the number of address bits are
ignored.
The instruction will #UD fault if the k0 mask register is specified.
Operation
BASE_ADDR stands for the memory operand base address (a GPR); may not exist
VINDEX stands for the memory operand vector of indices (a ZMM register)
SCALE stands for the memory operand scalar (1, 2, 4 or 8)
DISP is the optional 1 or 4 byte displacement
VSCATTERDPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN MEM[BASE_ADDR + SignExtend(VINDEX[i+31:i]) * SCALE + DISP] ← SRC[i+31:i]
            k1[j] ← 0
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0

VSCATTERDPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN MEM[BASE_ADDR + SignExtend(VINDEX[k+31:k]) * SCALE + DISP] ← SRC[i+63:i]
            k1[j] ← 0
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
VSCATTERQPS (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN MEM[BASE_ADDR + (VINDEX[k+63:k]) * SCALE + DISP] ← SRC[i+31:i]
            k1[j] ← 0
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0

VSCATTERQPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN MEM[BASE_ADDR + (VINDEX[i+63:i]) * SCALE + DISP] ← SRC[i+63:i]
            k1[j] ← 0
    FI;
ENDFOR
k1[MAX_KL-1:KL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VSCATTERDPD void _mm512_i32scatter_pd(void * base, __m256i vdx, __m512d a, int scale);
VSCATTERDPD void _mm512_mask_i32scatter_pd(void * base, __mmask8 k, __m256i vdx, __m512d a, int scale);
VSCATTERDPS void _mm512_i32scatter_ps(void * base, __m512i vdx, __m512 a, int scale);
VSCATTERDPS void _mm512_mask_i32scatter_ps(void * base, __mmask16 k, __m512i vdx, __m512 a, int scale);
VSCATTERQPD void _mm512_i64scatter_pd(void * base, __m512i vdx, __m512d a, int scale);
VSCATTERQPD void _mm512_mask_i64scatter_pd(void * base, __mmask8 k, __m512i vdx, __m512d a, int scale);
VSCATTERQPS void _mm512_i64scatter_ps(void * base, __m512i vdx, __m256 a, int scale);
VSCATTERQPS void _mm512_mask_i64scatter_ps(void * base, __mmask8 k, __m512i vdx, __m256 a, int scale);
VSCATTERDPD void _mm256_i32scatter_pd(void * base, __m128i vdx, __m256d a, int scale);
VSCATTERDPD void _mm256_mask_i32scatter_pd(void * base, __mmask8 k, __m128i vdx, __m256d a, int scale);
VSCATTERDPS void _mm256_i32scatter_ps(void * base, __m256i vdx, __m256 a, int scale);
VSCATTERDPS void _mm256_mask_i32scatter_ps(void * base, __mmask8 k, __m256i vdx, __m256 a, int scale);
VSCATTERQPD void _mm256_i64scatter_pd(void * base, __m256i vdx, __m256d a, int scale);
VSCATTERQPD void _mm256_mask_i64scatter_pd(void * base, __mmask8 k, __m256i vdx, __m256d a, int scale);
VSCATTERQPS void _mm256_i64scatter_ps(void * base, __m256i vdx, __m128 a, int scale);
VSCATTERQPS void _mm256_mask_i64scatter_ps(void * base, __mmask8 k, __m256i vdx, __m128 a, int scale);
VSCATTERDPD void _mm_i32scatter_pd(void * base, __m128i vdx, __m128d a, int scale);
VSCATTERDPD void _mm_mask_i32scatter_pd(void * base, __mmask8 k, __m128i vdx, __m128d a, int scale);
VSCATTERDPS void _mm_i32scatter_ps(void * base, __m128i vdx, __m128 a, int scale);
VSCATTERDPS void _mm_mask_i32scatter_ps(void * base, __mmask8 k, __m128i vdx, __m128 a, int scale);
VSCATTERQPD void _mm_i64scatter_pd(void * base, __m128i vdx, __m128d a, int scale);
VSCATTERQPD void _mm_mask_i64scatter_pd(void * base, __mmask8 k, __m128i vdx, __m128d a, int scale);
VSCATTERQPS void _mm_i64scatter_ps(void * base, __m128i vdx, __m128 a, int scale);
VSCATTERQPS void _mm_mask_i64scatter_ps(void * base, __mmask8 k, __m128i vdx, __m128 a, int scale);
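A minimal usage sketch (not part of this manual), assuming AVX-512F: scatter 16 floats through a vector of dword indices. The scale argument must be a compile-time constant of 1, 2, 4, or 8; 4 matches sizeof(float), so idx[] counts in elements. Remember the ordering rule above: when two indices fully overlap, the earlier element's write may be skipped.

#include <immintrin.h>

void scatter16(float *dst, const int idx[16], __m512 v)
{
    __m512i vindex = _mm512_loadu_si512(idx);   /* 16 dword indices  */
    _mm512_i32scatter_ps(dst, vindex, v, 4);    /* dst[idx[i]] = v[i] */
}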
SIMD Floating-Point Exceptions
Invalid, Overflow, Underflow, Precision, Denormal
Other Exceptions
See Exceptions Type E12.
VSHUFF32x4/VSHUFF64x2/VSHUFI32x4/VSHUFI64x2—Shuffle Packed Values at 128-bit
Granularity
Instruction Operand Encoding
Description
256-bit Version: Moves one of the two 128-bit packed single-precision floating-point values from the first source operand (second operand) into the low 128 bits of the destination operand (first operand); moves one of the two packed 128-bit floating-point values from the second source operand (third operand) into the high 128 bits of the destination operand. The selector operand (imm8) determines which values are moved to the destination operand.
512-bit Version: Moves two of the four 128-bit packed single-precision floating-point values from the first source operand (second operand) into the low 256 bits of the destination operand (first operand); moves two of the four packed 128-bit floating-point values from the second source operand (third operand) into the high 256 bits of the destination operand. The selector operand (imm8) determines which values are moved to the destination operand.
The first source operand is a vector register. The second source operand can be a ZMM register, a 512-bit memory
location or a 512-bit vector broadcasted from a 32/64-bit memory location. The destination operand is a vector
register.
The writemask updates the destination operand with the granularity of 32/64-bit data elements.
Opcode/
Instruction
Op /
En
64/32
bit Mode
Support
CPUID
Feature
Flag
Description
EVEX.256.66.0F3A.W0 23 /r ib
VSHUFF32X4 ymm1{k1}{z}, ymm2,
ymm3/m256/m32bcst, imm8
A V/V AVX512VL
AVX512F
Shuffle 128-bit packed single-precision floating-point
values selected by imm8 from ymm2 and
ymm3/m256/m32bcst and place results in ymm1
subject to writemask k1.
EVEX.512.66.0F3A.W0 23 /r ib
VSHUFF32x4 zmm1{k1}{z}, zmm2,
zmm3/m512/m32bcst, imm8
A V/V AVX512F Shuffle 128-bit packed single-precision floating-point
values selected by imm8 from zmm2 and
zmm3/m512/m32bcst and place results in zmm1
subject to writemask k1.
EVEX.256.66.0F3A.W1 23 /r ib
VSHUFF64X2 ymm1{k1}{z}, ymm2,
ymm3/m256/m64bcst, imm8
A V/V AVX512VL
AVX512F
Shuffle 128-bit packed double-precision floating-point
values selected by imm8 from ymm2 and
ymm3/m256/m64bcst and place results in ymm1
subject to writemask k1.
EVEX.512.66.0F3A.W1 23 /r ib
VSHUFF64x2 zmm1{k1}{z}, zmm2,
zmm3/m512/m64bcst, imm8
A V/V AVX512F Shuffle 128-bit packed double-precision floating-point
values selected by imm8 from zmm2 and
zmm3/m512/m64bcst and place results in zmm1
subject to writemask k1.
EVEX.256.66.0F3A.W0 43 /r ib
VSHUFI32X4 ymm1{k1}{z}, ymm2,
ymm3/m256/m32bcst, imm8
A V/V AVX512VL
AVX512F
Shuffle 128-bit packed double-word values selected by
imm8 from ymm2 and ymm3/m256/m32bcst and place
results in ymm1 subject to writemask k1.
EVEX.512.66.0F3A.W0 43 /r ib
VSHUFI32x4 zmm1{k1}{z}, zmm2,
zmm3/m512/m32bcst, imm8
A V/V AVX512F Shuffle 128-bit packed double-word values selected by
imm8 from zmm2 and zmm3/m512/m32bcst and place
results in zmm1 subject to writemask k1.
EVEX.256.66.0F3A.W1 43 /r ib
VSHUFI64X2 ymm1{k1}{z}, ymm2,
ymm3/m256/m64bcst, imm8
A V/V AVX512VL
AVX512F
Shuffle 128-bit packed quad-word values selected by
imm8 from ymm2 and ymm3/m256/m64bcst and place
results in ymm1 subject to writemask k1.
EVEX.512.66.0F3A.W1 43 /r ib
VSHUFI64x2 zmm1{k1}{z}, zmm2,
zmm3/m512/m64bcst, imm8
A V/V AVX512F Shuffle 128-bit packed quad-word values selected by
imm8 from zmm2 and zmm3/m512/m64bcst and place
results in zmm1 subject to writemask k1.
Op/En Tuple Type Operand 1 Operand 2 Operand 3 Operand 4
A Full ModRM:reg (w) EVEX.vvvv (r) ModRM:r/m (r) NA
Operation
Select2(SRC, control) {
    CASE (control[0]) OF
        0: TMP ← SRC[127:0];
        1: TMP ← SRC[255:128];
    ESAC;
    RETURN TMP
}

Select4(SRC, control) {
    CASE (control[1:0]) OF
        0: TMP ← SRC[127:0];
        1: TMP ← SRC[255:128];
        2: TMP ← SRC[383:256];
        3: TMP ← SRC[511:384];
    ESAC;
    RETURN TMP
}
VSHUFF32x4 (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF (EVEX.b = 1) AND (SRC2 *is memory*)
        THEN TMP_SRC2[i+31:i] ← SRC2[31:0]
        ELSE TMP_SRC2[i+31:i] ← SRC2[i+31:i]
    FI;
ENDFOR;
IF VL = 256
    TMP_DEST[127:0] ← Select2(SRC1[255:0], imm8[0]);
    TMP_DEST[255:128] ← Select2(TMP_SRC2[255:0], imm8[1]);
FI;
IF VL = 512
    TMP_DEST[127:0] ← Select4(SRC1[511:0], imm8[1:0]);
    TMP_DEST[255:128] ← Select4(SRC1[511:0], imm8[3:2]);
    TMP_DEST[383:256] ← Select4(TMP_SRC2[511:0], imm8[5:4]);
    TMP_DEST[511:384] ← Select4(TMP_SRC2[511:0], imm8[7:6]);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VSHUFF64x2 (EVEX encoded versions)
(KL, VL) = (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF (EVEX.b = 1) AND (SRC2 *is memory*)
        THEN TMP_SRC2[i+63:i] ← SRC2[63:0]
        ELSE TMP_SRC2[i+63:i] ← SRC2[i+63:i]
    FI;
ENDFOR;
IF VL = 256
    TMP_DEST[127:0] ← Select2(SRC1[255:0], imm8[0]);
    TMP_DEST[255:128] ← Select2(TMP_SRC2[255:0], imm8[1]);
FI;
IF VL = 512
    TMP_DEST[127:0] ← Select4(SRC1[511:0], imm8[1:0]);
    TMP_DEST[255:128] ← Select4(SRC1[511:0], imm8[3:2]);
    TMP_DEST[383:256] ← Select4(TMP_SRC2[511:0], imm8[5:4]);
    TMP_DEST[511:384] ← Select4(TMP_SRC2[511:0], imm8[7:6]);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VSHUFI32x4 (EVEX encoded versions)
(KL, VL) = (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF (EVEX.b = 1) AND (SRC2 *is memory*)
        THEN TMP_SRC2[i+31:i] ← SRC2[31:0]
        ELSE TMP_SRC2[i+31:i] ← SRC2[i+31:i]
    FI;
ENDFOR;
IF VL = 256
    TMP_DEST[127:0] ← Select2(SRC1[255:0], imm8[0]);
    TMP_DEST[255:128] ← Select2(TMP_SRC2[255:0], imm8[1]);
FI;
IF VL = 512
    TMP_DEST[127:0] ← Select4(SRC1[511:0], imm8[1:0]);
    TMP_DEST[255:128] ← Select4(SRC1[511:0], imm8[3:2]);
    TMP_DEST[383:256] ← Select4(TMP_SRC2[511:0], imm8[5:4]);
    TMP_DEST[511:384] ← Select4(TMP_SRC2[511:0], imm8[7:6]);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← TMP_DEST[i+31:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
VSHUFI64x2 (EVEX encoded versions)
(KL, VL) = (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF (EVEX.b = 1) AND (SRC2 *is memory*)
        THEN TMP_SRC2[i+63:i] ← SRC2[63:0]
        ELSE TMP_SRC2[i+63:i] ← SRC2[i+63:i]
    FI;
ENDFOR;
IF VL = 256
    TMP_DEST[127:0] ← Select2(SRC1[255:0], imm8[0]);
    TMP_DEST[255:128] ← Select2(TMP_SRC2[255:0], imm8[1]);
FI;
IF VL = 512
    TMP_DEST[127:0] ← Select4(SRC1[511:0], imm8[1:0]);
    TMP_DEST[255:128] ← Select4(SRC1[511:0], imm8[3:2]);
    TMP_DEST[383:256] ← Select4(TMP_SRC2[511:0], imm8[5:4]);
    TMP_DEST[511:384] ← Select4(TMP_SRC2[511:0], imm8[7:6]);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← TMP_DEST[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE *zeroing-masking* ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VSHUFI32x4 __m512i _mm512_shuffle_i32x4(__m512i a, __m512i b, int imm);
VSHUFI32x4 __m512i _mm512_mask_shuffle_i32x4(__m512i s, __mmask16 k, __m512i a, __m512i b, int imm);
VSHUFI32x4 __m512i _mm512_maskz_shuffle_i32x4( __mmask16 k, __m512i a, __m512i b, int imm);
VSHUFI32x4 __m256i _mm256_shuffle_i32x4(__m256i a, __m256i b, int imm);
VSHUFI32x4 __m256i _mm256_mask_shuffle_i32x4(__m256i s, __mmask8 k, __m256i a, __m256i b, int imm);
VSHUFI32x4 __m256i _mm256_maskz_shuffle_i32x4( __mmask8 k, __m256i a, __m256i b, int imm);
VSHUFF32x4 __m512 _mm512_shuffle_f32x4(__m512 a, __m512 b, int imm);
VSHUFF32x4 __m512 _mm512_mask_shuffle_f32x4(__m512 s, __mmask16 k, __m512 a, __m512 b, int imm);
VSHUFF32x4 __m512 _mm512_maskz_shuffle_f32x4( __mmask16 k, __m512 a, __m512 b, int imm);
VSHUFI64x2 __m512i _mm512_shuffle_i64x2(__m512i a, __m512i b, int imm);
VSHUFI64x2 __m512i _mm512_mask_shuffle_i64x2(__m512i s, __mmask8 k, __m512i a, __m512i b, int imm);
VSHUFI64x2 __m512i _mm512_maskz_shuffle_i64x2( __mmask8 k, __m512i a, __m512i b, int imm);
VSHUFF64x2 __m512d _mm512_shuffle_f64x2(__m512d a, __m512d b, int imm);
VSHUFF64x2 __m512d _mm512_mask_shuffle_f64x2(__m512d s, __mmask8 k, __m512d a, __m512d b, int imm);
VSHUFF64x2 __m512d _mm512_maskz_shuffle_f64x2( __mmask8 k, __m512d a, __m512d b, int imm);
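A minimal usage sketch (not part of this manual), assuming AVX-512F: because each imm8 field selects a whole 128-bit lane, passing the same register for both sources turns the instruction into a within-register lane permute. imm8 = 0x4E (binary 01 00 11 10) selects lanes 2,3 from the first source and lanes 0,1 from the second, i.e., it swaps the 256-bit halves:

#include <immintrin.h>

__m512i swap_256bit_halves(__m512i a)
{
    return _mm512_shuffle_i32x4(a, a, 0x4E);   /* lane order becomes 2,3,0,1 */
}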
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type E4NF.
#UD If EVEX.L’L = 0 for VSHUFF32x4/VSHUFF64x2.
VTESTPD/VTESTPS—Packed Bit Test
Instruction Operand Encoding
Description
VTESTPS performs a bitwise comparison of all the sign bits of the packed single-precision elements in the first source operand and corresponding sign bits in the second source operand. If the AND of the source sign bits with the dest sign bits produces all zeros, the ZF is set; else the ZF is clear. If the AND of the source sign bits with the inverted dest sign bits produces all zeros, the CF is set; else the CF is clear. An attempt to execute VTESTPS with VEX.W=1 will cause #UD.
VTESTPD performs a bitwise comparison of all the sign bits of the double-precision elements in the first source operand and corresponding sign bits in the second source operand. If the AND of the source sign bits with the dest sign bits produces all zeros, the ZF is set; else the ZF is clear. If the AND of the source sign bits with the inverted dest sign bits produces all zeros, the CF is set; else the CF is clear. An attempt to execute VTESTPD with VEX.W=1 will cause #UD.
The first source register is specified by the ModR/M reg field.
128-bit version: The first source register is an XMM register. The second source register can be an XMM register or
a 128-bit memory location. The destination register is not modified.
VEX.256 encoded version: The first source register is a YMM register. The second source register can be a YMM
register or a 256-bit memory location. The destination register is not modified.
Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
Opcode/
Instruction
Op/
En
64/32 bit
Mode
Support
CPUID
Feature
Flag
Description
VEX.128.66.0F38.W0 0E /r
VTESTPS xmm1, xmm2/m128
RM V/V AVX Set ZF and CF depending on sign bit AND and
ANDN of packed single-precision floating-point
sources.
VEX.256.66.0F38.W0 0E /r
VTESTPS ymm1, ymm2/m256
RM V/V AVX Set ZF and CF depending on sign bit AND and
ANDN of packed single-precision floating-point
sources.
VEX.128.66.0F38.W0 0F /r
VTESTPD xmm1, xmm2/m128
RM V/V AVX Set ZF and CF depending on sign bit AND and
ANDN of packed double-precision floating-point
sources.
VEX.256.66.0F38.W0 0F /r
VTESTPD ymm1, ymm2/m256
RM V/V AVX Set ZF and CF depending on sign bit AND and
ANDN of packed double-precision floating-point
sources.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
RM ModRM:reg (r) ModRM:r/m (r) NA NA
Operation
VTESTPS (128-bit version)
TEMP[127:0] ← SRC[127:0] AND DEST[127:0]
IF (TEMP[31] = TEMP[63] = TEMP[95] = TEMP[127] = 0)
    THEN ZF ← 1;
    ELSE ZF ← 0;
TEMP[127:0] ← SRC[127:0] AND NOT DEST[127:0]
IF (TEMP[31] = TEMP[63] = TEMP[95] = TEMP[127] = 0)
    THEN CF ← 1;
    ELSE CF ← 0;
DEST (unmodified)
AF ← OF ← PF ← SF ← 0;
VTESTPS (VEX.256 encoded version)
TEMP[255:0] ← SRC[255:0] AND DEST[255:0]
IF (TEMP[31] = TEMP[63] = TEMP[95] = TEMP[127] = TEMP[159] = TEMP[191] = TEMP[223] = TEMP[255] = 0)
    THEN ZF ← 1;
    ELSE ZF ← 0;
TEMP[255:0] ← SRC[255:0] AND NOT DEST[255:0]
IF (TEMP[31] = TEMP[63] = TEMP[95] = TEMP[127] = TEMP[159] = TEMP[191] = TEMP[223] = TEMP[255] = 0)
    THEN CF ← 1;
    ELSE CF ← 0;
DEST (unmodified)
AF ← OF ← PF ← SF ← 0;
VTESTPD (128-bit version)
TEMP[127:0] ← SRC[127:0] AND DEST[127:0]
IF (TEMP[63] = TEMP[127] = 0)
    THEN ZF ← 1;
    ELSE ZF ← 0;
TEMP[127:0] ← SRC[127:0] AND NOT DEST[127:0]
IF (TEMP[63] = TEMP[127] = 0)
    THEN CF ← 1;
    ELSE CF ← 0;
DEST (unmodified)
AF ← OF ← PF ← SF ← 0;

VTESTPD (VEX.256 encoded version)
TEMP[255:0] ← SRC[255:0] AND DEST[255:0]
IF (TEMP[63] = TEMP[127] = TEMP[191] = TEMP[255] = 0)
    THEN ZF ← 1;
    ELSE ZF ← 0;
TEMP[255:0] ← SRC[255:0] AND NOT DEST[255:0]
IF (TEMP[63] = TEMP[127] = TEMP[191] = TEMP[255] = 0)
    THEN CF ← 1;
    ELSE CF ← 0;
DEST (unmodified)
AF ← OF ← PF ← SF ← 0;
Intel C/C++ Compiler Intrinsic Equivalent
VTESTPS
int _mm256_testz_ps (__m256 s1, __m256 s2);
int _mm256_testc_ps (__m256 s1, __m256 s2);
int _mm256_testnzc_ps (__m256 s1, __m256 s2);
int _mm_testz_ps (__m128 s1, __m128 s2);
int _mm_testc_ps (__m128 s1, __m128 s2);
int _mm_testnzc_ps (__m128 s1, __m128 s2);
VTESTPD
int _mm256_testz_pd (__m256d s1, __m256d s2);
int _mm256_testc_pd (__m256d s1, __m256d s2);
int _mm256_testnzc_pd (__m256d s1, __m256d s2);
int _mm_testz_pd (__m128d s1, __m128d s2);
int _mm_testc_pd (__m128d s1, __m128d s2);
int _mm_testnzc_pd (__m128d s1, __m128d s2);
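A minimal usage sketch (not part of this manual), assuming AVX: since only sign bits participate, testing against a mask of all sign bits answers "is any element negative?". Note that -0.0 and negatively signed NaNs count as negative here:

#include <immintrin.h>

int all_nonnegative(__m256 v)
{
    /* returns ZF: 1 iff no sign bit of v is set */
    return _mm256_testz_ps(v, _mm256_set1_ps(-0.0f));
}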
Flags Affected
The OF, AF, PF, SF flags are cleared and the ZF, CF flags are set according to the operation.
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type 4; additionally
#UD If VEX.vvvv ≠ 1111B.
If VEX.W = 1 for VTESTPS or VTESTPD.
VZEROALL—Zero All YMM Registers
Instruction Operand Encoding
Description
The instruction zeros the contents of all XMM or YMM registers.
Note: VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD. In Compatibility and legacy 32-bit
mode only the lower 8 registers are modified.
Operation
simd_reg_file[][] is a two dimensional array representing the SIMD register file containing all the overlapping xmm, ymm and zmm
registers present in that implementation. The major dimension is the register number: 0 for xmm0, ymm0 and zmm0; 1 for xmm1,
ymm1, and zmm1; etc. The minor dimension size is the width of the implemented SIMD state measured in bits. On a machine
supporting Intel AVX-512, the width is 512. On a machine supporting Intel AVX but not Intel AVX-512, the width is “MAXVL”.
VZEROALL (VEX.256 encoded version)
IF (64-bit mode)
    limit ← 15
ELSE
    limit ← 7
FOR i in 0 .. limit:
    simd_reg_file[i][MAXVL-1:0] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VZEROALL: _mm256_zeroall()
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type 8.
Opcode/
Instruction
Op/
En
64/32 bit
Mode
Support
CPUID
Feature
Flag
Description
VEX.256.0F.WIG 77
VZEROALL
ZO V/V AVX Zero all YMM registers.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
ZO NA NA NA NA
VZEROUPPER—Zero Upper Bits of YMM Registers
Instruction Operand Encoding
Description
The instruction zeros the bits in position 128 and higher of all YMM registers. The lower 128 bits of the registers (the corresponding XMM registers) are unmodified.
This instruction is recommended when transitioning between AVX and legacy SSE code - it will eliminate performance penalties caused by false dependencies.
Note: VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD. In Compatibility and legacy 32-bit
mode only the lower 8 registers are modified.
Operation
simd_reg_file[][] is a two dimensional array representing the SIMD register file containing all the overlapping xmm, ymm and zmm
registers present in that implementation. The major dimension is the register number: 0 for xmm0, ymm0 and zmm0; 1 for xmm1,
ymm1, and zmm1; etc. The minor dimension size is the width of the implemented SIMD state measured in bits. On a machine
supporting Intel AVX-512, the width is 512. On a machine supporting Intel AVX but not Intel AVX-512, the width is “MAXVL”.
VZEROUPPER
IF (64-bit mode)
    limit ← 15
ELSE
    limit ← 7
FOR i in 0 .. limit:
    simd_reg_file[i][MAXVL-1:128] ← 0
Intel C/C++ Compiler Intrinsic Equivalent
VZEROUPPER: _mm256_zeroupper()
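A minimal usage sketch (not part of this manual): issue the instruction before calling code that may have been compiled for legacy (non-VEX) SSE. Compilers typically insert VZEROUPPER automatically at such boundaries, so the explicit intrinsic mainly matters in hand-written AVX code:

#include <immintrin.h>

void avx_then_legacy_sse(float *p, void (*legacy_sse_fn)(void))
{
    __m256 v = _mm256_loadu_ps(p);
    v = _mm256_add_ps(v, v);
    _mm256_storeu_ps(p, v);
    _mm256_zeroupper();   /* zero bits 255:128 (and up) of all YMM registers */
    legacy_sse_fn();      /* no AVX-to-SSE transition penalty */
}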
SIMD Floating-Point Exceptions
None.
Other Exceptions
See Exceptions Type 8.
Opcode/
Instruction
Op/
En
64/32 bit
Mode
Support
CPUID
Feature
Flag
Description
VEX.128.0F.WIG 77
VZEROUPPER
ZO V/V AVX Zero upper 128 bits of all YMM registers.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
ZO NA NA NA NA
WAIT/FWAIT—Wait
Instruction Operand Encoding
Description
Causes the processor to check for and handle pending, unmasked, floating-point exceptions before proceeding.
(FWAIT is an alternate mnemonic for WAIT.)
This instruction is useful for synchronizing exceptions in critical sections of code. Coding a WAIT instruction after a
floating-point instruction ensures that any unmasked floating-point exceptions the instruction may raise are
handled before the processor can modify the instruction’s results. See the section titled “Floating-Point Exception
Synchronization” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1,
for more information on using the WAIT/FWAIT instruction.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
Operation
CheckForPendingUnmaskedFloatingPointExceptions;
FPU Flags Affected
The C0, C1, C2, and C3 flags are undefined.
Floating-Point Exceptions
None.
Protected Mode Exceptions
#NM If CR0.MP[bit 1] = 1 and CR0.TS[bit 3] = 1.
#UD If the LOCK prefix is used.
Real-Address Mode Exceptions
Same exceptions as in protected mode.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
Same exceptions as in protected mode.
Opcode Instruction Op/
En
64-Bit
Mode
Compat/
Leg Mode
Description
9B WAIT ZO Valid Valid Check pending unmasked floating-point
exceptions.
9B FWAIT ZO Valid Valid Check pending unmasked floating-point
exceptions.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
ZO NA NA NA NA
WBINVD—Write Back and Invalidate Cache
Instruction Operand Encoding
Description
Writes back all modified cache lines in the processor’s internal cache to main memory and invalidates (flushes) the
internal caches. The instruction then issues a special-function bus cycle that directs external caches to also write
back modified data and another bus cycle to indicate that the external caches should be invalidated.
After executing this instruction, the processor does not wait for the external caches to complete their write-back and flushing operations before proceeding with instruction execution. It is the responsibility of hardware to respond to the cache write-back and flush signals. The amount of time or cycles for WBINVD to complete will vary due to size and other factors of different cache hierarchies. As a consequence, the use of the WBINVD instruction can have an impact on logical processor interrupt/event response time. Additional information on WBINVD behavior in a cache hierarchy with hierarchical sharing topology can be found in Chapter 2 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.
The WBINVD instruction is a privileged instruction. When the processor is running in protected mode, the CPL of a
program or procedure must be 0 to execute this instruction. This instruction is also a serializing instruction (see
“Serializing Instructions” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual,
Volume 3A).
In situations where cache coherency with main memory is not a concern, software can use the INVD instruction.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
IA-32 Architecture Compatibility
The WBINVD instruction is implementation dependent, and its function may be implemented differently on future
Intel 64 and IA-32 processors. The instruction is not supported on IA-32 processors earlier than the Intel486
processor.
Operation
WriteBack(InternalCaches);
Flush(InternalCaches);
SignalWriteBack(ExternalCaches);
SignalFlush(ExternalCaches);
Continue; (* Continue execution *)
Intel C/C++ Compiler Intrinsic Equivalent
WBINVD void _wbinvd(void);
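A minimal ring-0 sketch (not part of this manual; the helper name is illustrative, and compiler support for the _wbinvd intrinsic varies). At CPL > 0 the instruction raises #GP(0), so this is only meaningful inside a kernel or similar privileged context:

#include <immintrin.h>

static void flush_all_caches(void)   /* hypothetical kernel-mode helper */
{
    _wbinvd();   /* write back all modified lines, then invalidate */
}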
Flags Affected
None.
Protected Mode Exceptions
#GP(0) If the current privilege level is not 0.
#UD If the LOCK prefix is used.
Opcode Instruction Op/
En
64-Bit
Mode
Compat/
Leg Mode
Description
0F 09 WBINVD ZO Valid Valid Write back and flush Internal caches; initiate
writing-back and flushing of external caches.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
ZO NA NA NA NA
Real-Address Mode Exceptions
#UD If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
#GP(0) WBINVD cannot be executed in virtual-8086 mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
Same exceptions as in protected mode.
WRFSBASE/WRGSBASE—Write FS/GS Segment Base
Instruction Operand Encoding
Description
Loads the FS or GS segment base address with the general-purpose register indicated by the modR/M:r/m field.
The source operand may be either a 32-bit or a 64-bit general-purpose register. The REX.W prefix indicates the
operand size is 64 bits. If no REX.W prefix is used, the operand size is 32 bits; the upper 32 bits of the source
register are ignored and upper 32 bits of the base address (for FS or GS) are cleared.
This instruction is supported only in 64-bit mode.
Operation
FS/GS segment base address ← SRC;
Flags Affected
None
C/C++ Compiler Intrinsic Equivalent
WRFSBASE: void _writefsbase_u32( unsigned int );
WRFSBASE: void _writefsbase_u64( unsigned __int64 );
WRGSBASE: void _writegsbase_u32( unsigned int );
WRGSBASE: void _writegsbase_u64( unsigned __int64 );
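A minimal usage sketch (not part of this manual), assuming 64-bit mode, an OS that has set CR4.FSGSBASE, and a compiler flag such as -mfsgsbase:

#include <immintrin.h>
#include <stdint.h>

/* Swap in a new GS base and return the previous one. */
uint64_t swap_gs_base(uint64_t new_base)
{
    uint64_t old = _readgsbase_u64();
    _writegsbase_u64(new_base);   /* #GP(0) if new_base is non-canonical */
    return old;
}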
Protected Mode Exceptions
#UD The WRFSBASE and WRGSBASE instructions are not recognized in protected mode.
Real-Address Mode Exceptions
#UD The WRFSBASE and WRGSBASE instructions are not recognized in real-address mode.
Virtual-8086 Mode Exceptions
#UD The WRFSBASE and WRGSBASE instructions are not recognized in virtual-8086 mode.
Compatibility Mode Exceptions
#UD The WRFSBASE and WRGSBASE instructions are not recognized in compatibility mode.
Opcode/
Instruction
Op/
En
64/32-
bit
Mode
CPUID Fea-
ture Flag
Description
F3 0F AE /2
WRFSBASE r32
M V/I FSGSBASE Load the FS base address with the 32-bit value in
the source register.
F3 REX.W 0F AE /2
WRFSBASE r64
M V/I FSGSBASE Load the FS base address with the 64-bit value in
the source register.
F3 0F AE /3
WRGSBASE r32
M V/I FSGSBASE Load the GS base address with the 32-bit value in
the source register.
F3 REX.W 0F AE /3
WRGSBASE r64
M V/I FSGSBASE Load the GS base address with the 64-bit value in
the source register.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
M ModRM:r/m (r) NA NA NA
64-Bit Mode Exceptions
#UD If the LOCK prefix is used.
If CR4.FSGSBASE[bit 16] = 0.
If CPUID.07H.0H:EBX.FSGSBASE[bit 0] = 0.
#GP(0) If the source register contains a non-canonical address.
WRMSR—Write to Model Specific Register
Instruction Operand Encoding
Description
Writes the contents of registers EDX:EAX into the 64-bit model specific register (MSR) specified in the ECX register.
(On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The contents of
the EDX register are copied to high-order 32 bits of the selected MSR and the contents of the EAX register are
copied to low-order 32 bits of the MSR. (On processors that support the Intel 64 architecture, the high-order 32
bits of each of RAX and RDX are ignored.) Undefined or reserved bits in an MSR should be set to values previously
read.
This instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection
exception #GP(0) is generated. Specifying a reserved or unimplemented MSR address in ECX will also cause a
general protection exception. The processor will also generate a general protection exception if software attempts
to write to bits in a reserved MSR.
When the WRMSR instruction is used to write to an MTRR, the TLBs are invalidated. This includes global entries (see
“Translation Lookaside Buffers (TLBs)” in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 3A).
MSRs control functions for testability, execution tracing, performance-monitoring and machine check errors.
Chapter 2, “Model-Specific Registers (MSRs)” of the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 4, lists all MSRs that can be written with this instruction and their addresses. Note that each
processor family has its own set of MSRs.
The WRMSR instruction is a serializing instruction (see “Serializing Instructions” in Chapter 8 of the Intel® 64 and
IA-32 Architectures Software Developer’s Manual, Volume 3A). Note that WRMSR to the IA32_TSC_DEADLINE
MSR (MSR index 6E0H) and the X2APIC MSRs (MSR indices 802H to 83FH) are not serializing.
The CPUID instruction should be used to determine whether MSRs are supported (CPUID.01H:EDX[5] = 1) before
using this instruction.
IA-32 Architecture Compatibility
The MSRs and the ability to write them with the WRMSR instruction were introduced into the IA-32 architecture with the Pentium processor. Execution of this instruction by an IA-32 processor earlier than the Pentium processor results in an invalid opcode exception #UD.
Operation
MSR[ECX] ← EDX:EAX;
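There is no portable compiler intrinsic for WRMSR; a common pattern in ring-0 code is a small inline-assembly wrapper like the sketch below (not part of this manual; the wrapper name is illustrative):

#include <stdint.h>

static inline void wrmsr(uint32_t msr, uint64_t value)
{
    uint32_t lo = (uint32_t)value;          /* goes to EAX */
    uint32_t hi = (uint32_t)(value >> 32);  /* goes to EDX */
    __asm__ volatile("wrmsr" : : "c"(msr), "a"(lo), "d"(hi));
}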
Flags Affected
None.
Opcode Instruction Op/
En
64-Bit
Mode
Compat/
Leg Mode
Description
0F 30 WRMSR ZO Valid Valid Write the value in EDX:EAX to MSR specified
by ECX.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
ZO NA NA NA NA
Protected Mode Exceptions
#GP(0) If the current privilege level is not 0.
If the value in ECX specifies a reserved or unimplemented MSR address.
If the value in EDX:EAX sets bits that are reserved in the MSR specified by ECX.
If the source register contains a non-canonical address and ECX specifies one of the following
MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE, IA32_KERNEL_GS_BASE,
IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP.
#UD If the LOCK prefix is used.
Real-Address Mode Exceptions
#GP If the value in ECX specifies a reserved or unimplemented MSR address.
If the value in EDX:EAX sets bits that are reserved in the MSR specified by ECX.
If the source register contains a non-canonical address and ECX specifies one of the following
MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE, IA32_KERNEL_GS_BASE,
IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP.
#UD If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
#GP(0) The WRMSR instruction is not recognized in virtual-8086 mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
Same exceptions as in protected mode.
WRPKRU—Write Data to User Page Key Register
Instruction Operand Encoding
Description
Writes the value of EAX into PKRU. ECX and EDX must be 0 when WRPKRU is executed; otherwise, a general-
protection exception (#GP) occurs.
WRPKRU can be executed only if CR4.PKE = 1; otherwise, an invalid-opcode exception (#UD) occurs. Software can
discover the value of CR4.PKE by examining CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4].
On processors that support the Intel 64 Architecture, the high-order 32-bits of RCX, RDX and RAX are ignored.
Operation
IF (ECX = 0 AND EDX = 0)
    THEN PKRU ← EAX;
    ELSE #GP(0);
FI;
Flags Affected
None.
C/C++ Compiler Intrinsic Equivalent
WRPKRU: void _wrpkru(uint32_t);
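A minimal usage sketch (not part of this manual), assuming CR4.PKE = 1 and a compiler flag such as -mpku. Protection key i occupies PKRU bits 2i (access-disable) and 2i+1 (write-disable):

#include <immintrin.h>
#include <stdint.h>

static void deny_writes_to_key1(void)
{
    uint32_t pkru = _rdpkru_u32();   /* read-modify-write the register   */
    pkru |= 1u << (2 * 1 + 1);       /* set the WD bit for key 1         */
    _wrpkru(pkru);                   /* intrinsic supplies ECX = EDX = 0 */
}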
Protected Mode Exceptions
#GP(0) If ECX ≠ 0.
If EDX ≠ 0.
#UD If the LOCK prefix is used.
If CR4.PKE = 0.
Real-Address Mode Exceptions
Same exceptions as in protected mode.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
Same exceptions as in protected mode.
Opcode* Instruction Op/
En
64/32bit
Mode
Support
CPUID
Feature
Flag
Description
NP 0F 01 EF WRPKRU ZO V/V OSPKE Writes EAX into PKRU.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
ZO NA NA NA NA
XACQUIRE/XRELEASE — Hardware Lock Elision Prefix Hints
Description
The XACQUIRE prefix is a hint to start lock elision on the memory address specified by the instruction and the
XRELEASE prefix is a hint to end lock elision on the memory address specified by the instruction.
The XACQUIRE prefix hint can only be used with the following instructions (these instructions are also referred to
as XACQUIRE-enabled when used with the XACQUIRE prefix):
• Instructions with an explicit LOCK prefix (F0H) prepended to forms of the instruction where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG.
• The XCHG instruction either with or without the presence of the LOCK prefix.
The XRELEASE prefix hint can only be used with the following instructions (also referred to as XRELEASE-enabled
when used with the XRELEASE prefix):
• Instructions with an explicit LOCK prefix (F0H) prepended to forms of the instruction where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG.
• The XCHG instruction either with or without the presence of the LOCK prefix.
• The “MOV mem, reg” (Opcode 88H/89H) and “MOV mem, imm” (Opcode C6H/C7H) instructions. In these cases, the XRELEASE is recognized without the presence of the LOCK prefix.
The lock variables must satisfy the guidelines described in Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 1, Section 16.3.3, for elision to be successful, otherwise an HLE abort may be signaled.
If an encoded byte sequence that meets XACQUIRE/XRELEASE requirements includes both prefixes, then the HLE semantic is determined by the prefix byte that is placed closest to the instruction opcode. For example, an F3F2C6 will not be treated as an XRELEASE-enabled instruction since the F2H (XACQUIRE) is closest to the instruction opcode C6. Similarly, an F2F3F0 prefixed instruction will be treated as an XRELEASE-enabled instruction since F3H (XRELEASE) is closest to the instruction opcode.
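A minimal sketch of an elided spinlock (not part of this manual), assuming GCC-style atomics with the __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE flags (e.g., built with -mhle), which emit the XACQUIRE/XRELEASE prefixes on the locked operations:

static void hle_lock(int *lock)
{
    /* XACQUIRE LOCK XCHG: start lock elision on the lock word */
    while (__atomic_exchange_n(lock, 1,
               __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
        while (__atomic_load_n(lock, __ATOMIC_RELAXED))
            ;   /* spin until the lock looks free */
}

static void hle_unlock(int *lock)
{
    /* XRELEASE MOV mem, imm: end lock elision, restoring the lock word */
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}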
Opcode/Instruction 64/32bit
Mode
Support
CPUID
Feature
Flag
Description
F2
XACQUIRE
V/V HLE1
NOTES:
1. Software is not required to check the HLE feature flag to use XACQUIRE or XRELEASE, as they are treated as regular prefixes if the HLE feature flag reports 0.
A hint used with an “XACQUIRE-enabled“ instruction to start lock
elision on the instruction memory operand address.
F3
XRELEASE
V/V HLE A hint used with an “XRELEASE-enabled“ instruction to end lock
elision on the instruction memory operand address.
Intel 64 and IA-32 Compatibility
The effect of the XACQUIRE/XRELEASE prefix hint is the same in non-64-bit modes and in 64-bit mode.
For instructions that do not support the XACQUIRE hint, the presence of the F2H prefix behaves the same way as in prior hardware, according to:
• REPNE/REPNZ semantics for string instructions,
• Serve as SIMD prefix for legacy SIMD instructions operating on XMM register,
• Cause #UD if prepending the VEX prefix,
• Undefined for non-string instructions or other situations.
For instructions that do not support the XRELEASE hint, the presence of the F3H prefix behaves the same way as in prior hardware, according to:
• REP/REPE/REPZ semantics for string instructions,
• Serve as SIMD prefix for legacy SIMD instructions operating on XMM register,
• Cause #UD if prepending the VEX prefix,
• Undefined for non-string instructions or other situations.
Operation
XACQUIRE
IF XACQUIRE-enabled instruction
THEN
IF (HLE_NEST_COUNT < MAX_HLE_NEST_COUNT) THEN
HLE_NEST_COUNT++
IF (HLE_NEST_COUNT = 1) THEN
HLE_ACTIVE ← 1
IF 64-bit mode
THEN
restartRIP ← instruction pointer of the XACQUIRE-enabled instruction
ELSE
restartEIP ← instruction pointer of the XACQUIRE-enabled instruction
FI;
Enter HLE Execution (* record register state, start tracking memory state *)
FI; (* HLE_NEST_COUNT = 1*)
IF ElisionBufferAvailable
THEN
Allocate elision buffer
Record address and data for forwarding and commit checking
Perform elision
ELSE
Perform lock acquire operation transactionally but without elision
FI;
ELSE (* HLE_NEST_COUNT = MAX_HLE_NEST_COUNT *)
GOTO HLE_ABORT_PROCESSING
FI;
ELSE
Treat instruction as non-XACQUIRE F2H prefixed legacy instruction
FI;
XRELEASE
IF XRELEASE-enabled instruction
THEN
IF (HLE_NEST_COUNT > 0)
THEN
HLE_NEST_COUNT--
IF lock address matches in elision buffer THEN
IF lock satisfies address and value requirements THEN
Deallocate elision buffer
ELSE
GOTO HLE_ABORT_PROCESSING
FI;
FI;
IF (HLE_NEST_COUNT = 0)
THEN
IF NoAllocatedElisionBuffer
THEN
Try to commit transactional execution
IF fail to commit transactional execution
THEN
GOTO HLE_ABORT_PROCESSING;
ELSE (* commit success *)
HLE_ACTIVE ← 0
FI;
ELSE
GOTO HLE_ABORT_PROCESSING
FI;
FI;
FI; (* HLE_NEST_COUNT > 0 *)
ELSE
Treat instruction as non-XRELEASE F3H prefixed legacy instruction
FI;
(* For any HLE abort condition encountered during HLE execution *)
HLE_ABORT_PROCESSING:
HLE_ACTIVE ← 0
HLE_NEST_COUNT ← 0
Restore architectural register state
Discard memory updates performed in transaction
Free any allocated lock elision buffers
IF 64-bit mode
THEN
RIP ← restartRIP
ELSE
EIP ← restartEIP
FI;
Execute and retire instruction at RIP (or EIP) and ignore any HLE hint
END
SIMD Floating-Point Exceptions
None
Other Exceptions
#GP(0) If the use of prefix causes instruction length to exceed 15 bytes.
XABORT — Transactional Abort
Instruction Operand Encoding
Description
XABORT forces an RTM abort. Following an RTM abort, the logical processor resumes execution at the fallback address computed through the outermost XBEGIN instruction. The EAX register is updated to reflect that an XABORT instruction caused the abort, and the imm8 argument will be provided in bits 31:24 of EAX.
Operation
XABORT
IF RTM_ACTIVE = 0
THEN
Treat as NOP;
ELSE
GOTO RTM_ABORT_PROCESSING;
FI;
(* For any RTM abort condition encountered during RTM execution *)
RTM_ABORT_PROCESSING:
Restore architectural register state;
Discard memory updates performed in transaction;
Update EAX with status and XABORT argument;
RTM_NEST_COUNT ← 0;
RTM_ACTIVE ← 0;
IF 64-bit Mode
THEN
RIP ← fallbackRIP;
ELSE
EIP ← fallbackEIP;
FI;
END
Flags Affected
None
Intel C/C++ Compiler Intrinsic Equivalent
XABORT: void _xabort( unsigned int);
SIMD Floating-Point Exceptions
None
Opcode/Instruction Op/
En
64/32bit
Mode
Support
CPUID
Feature
Flag
Description
C6 F8 ib
XABORT imm8
A V/V RTM Causes an RTM abort if in RTM execution
Op/En Operand 1 Operand2 Operand3 Operand4
A imm8 NA NA NA
Other Exceptions
#UD CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11] = 0.
If LOCK prefix is used.
XADD—Exchange and Add
Instruction Operand Encoding
Description
Exchanges the first operand (destination operand) with the second operand (source operand), then loads the sum
of the two values into the destination operand. The destination operand can be a register or a memory location; the
source operand is a register.
In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See
the summary chart at the beginning of this section for encoding data and limits.
This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.
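A minimal sketch (not part of this manual): on x86-64, GCC and Clang typically compile __atomic_fetch_add on a naturally aligned int to a LOCK XADD, whose exchange step is what returns the pre-addition value:

static int fetch_and_add(int *counter, int delta)
{
    /* returns the value *counter held before the addition */
    return __atomic_fetch_add(counter, delta, __ATOMIC_SEQ_CST);
}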
IA-32 Architecture Compatibility
IA-32 processors earlier than the Intel486 processor do not recognize this instruction. If this instruction is used,
you should provide an equivalent code sequence that runs on earlier processors.
Operation
TEMP ← SRC + DEST;
SRC ← DEST;
DEST ← TEMP;
Flags Affected
The CF, PF, AF, SF, ZF, and OF flags are set according to the result of the addition, which is stored in the destination
operand.
Protected Mode Exceptions
#GP(0) If the destination is located in a non-writable segment.
If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register contains a NULL segment selector.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
#UD If the LOCK prefix is used but the destination is not a memory operand.
Opcode Instruction Op/
En
64-Bit
Mode
Compat/
Leg Mode
Description
0F C0 /r XADD r/m8, r8 MR Valid Valid Exchange r8 and r/m8; load sum into r/m8.
REX + 0F C0 /r XADD r/m8*, r8* MR Valid N.E. Exchange r8 and r/m8; load sum into r/m8.
0F C1 /r XADD r/m16, r16 MR Valid Valid Exchange r16 and r/m16; load sum into r/m16.
0F C1 /r XADD r/m32, r32 MR Valid Valid Exchange r32 and r/m32; load sum into r/m32.
REX.W + 0F C1 /r XADD r/m64, r64 MR Valid N.E. Exchange r64 and r/m64; load sum into r/m64.
NOTES:
* In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
MR ModRM:r/m (r, w) ModRM:reg (r, w) NA NA
Real-Address Mode Exceptions
#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS If a memory operand effective address is outside the SS segment limit.
#UD If the LOCK prefix is used but the destination is not a memory operand.
Virtual-8086 Mode Exceptions
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made.
#UD If the LOCK prefix is used but the destination is not a memory operand.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#GP(0) If the memory address is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
#UD If the LOCK prefix is used but the destination is not a memory operand.
XBEGIN — Transactional Begin
Instruction Operand Encoding
Description
The XBEGIN instruction specifies the start of an RTM code region. If the logical processor was not already in transactional execution, then the XBEGIN instruction causes the logical processor to transition into transactional execution. The XBEGIN instruction that transitions the logical processor into transactional execution is referred to as the outermost XBEGIN instruction. The instruction also specifies a relative offset to compute the address of the fallback code path following a transactional abort.
On an RTM abort, the logical processor discards all architectural register and memory updates performed during the RTM execution and restores architectural state to that corresponding to the outermost XBEGIN instruction. The fallback address following an abort is computed from the outermost XBEGIN instruction.
Operation
XBEGIN
IF RTM_NEST_COUNT < MAX_RTM_NEST_COUNT
THEN
RTM_NEST_COUNT++
IF RTM_NEST_COUNT = 1 THEN
IF 64-bit Mode
THEN
fallbackRIP ← RIP + SignExtend64(IMM)
(* RIP is instruction following XBEGIN instruction *)
ELSE
fallbackEIP ← EIP + SignExtend32(IMM)
(* EIP is instruction following XBEGIN instruction *)
FI;
IF (64-bit mode)
THEN IF (fallbackRIP is not canonical)
THEN #GP(0)
FI;
ELSE IF (fallbackEIP outside code segment limit)
THEN #GP(0)
FI;
FI;
RTM_ACTIVE ← 1
Enter RTM Execution (* record register state, start tracking memory state*)
FI; (* RTM_NEST_COUNT = 1 *)
    ELSE (* RTM_NEST_COUNT = MAX_RTM_NEST_COUNT *)
        GOTO RTM_ABORT_PROCESSING
FI;

(* For any RTM abort condition encountered during RTM execution *)
RTM_ABORT_PROCESSING:
    Restore architectural register state
    Discard memory updates performed in transaction
    Update EAX with status
    RTM_NEST_COUNT ← 0
    RTM_ACTIVE ← 0
    IF 64-bit mode
        THEN RIP ← fallbackRIP
        ELSE EIP ← fallbackEIP
    FI;
END
Flags Affected
None
Intel C/C++ Compiler Intrinsic Equivalent
XBEGIN: unsigned int _xbegin( void );
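For illustration only, the following minimal sketch shows the intrinsic in use (GCC/Clang with -mrtm; the variable counter and the function fallback_increment are illustrative names, not part of the architecture):

#include <immintrin.h>   /* _xbegin, _xend, _XBEGIN_STARTED */

static long counter;     /* illustrative shared datum */

static void fallback_increment(void)
{
    __sync_fetch_and_add(&counter, 1);   /* illustrative non-transactional path */
}

void increment_transactionally(void)
{
    unsigned int status = _xbegin();     /* executes XBEGIN */
    if (status == _XBEGIN_STARTED) {
        counter++;                       /* updates tracked transactionally */
        _xend();                         /* executes XEND; attempts to commit */
    } else {
        /* RTM abort: control resumed here at the fallback address computed
           from the outermost XBEGIN; status holds the abort code from EAX. */
        fallback_increment();
    }
}

A production fallback path would normally take a lock that the transactional path also tests, so the two paths cannot run concurrently.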
SIMD Floating-Point Exceptions
None
Protected Mode Exceptions
#UD CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11]=0.
If LOCK prefix is used.
#GP(0) If the fallback address is outside the CS segment.
Real-Address Mode Exceptions
#GP(0) If the fallback address is outside the address space from 0000H to FFFFH.
#UD CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11]=0.
If LOCK prefix is used.
Virtual-8086 Mode Exceptions
#GP(0) If the fallback address is outside the address space from 0000H to FFFFH.
#UD CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11]=0.
If LOCK prefix is used.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-bit Mode Exceptions
#UD CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11] = 0.
If LOCK prefix is used.
#GP(0) If the fallback address is non-canonical.
XCHG—Exchange Register/Memory with Register

Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
90+rw | XCHG AX, r16 | O | Valid | Valid | Exchange r16 with AX.
90+rw | XCHG r16, AX | O | Valid | Valid | Exchange AX with r16.
90+rd | XCHG EAX, r32 | O | Valid | Valid | Exchange r32 with EAX.
REX.W + 90+rd | XCHG RAX, r64 | O | Valid | N.E. | Exchange r64 with RAX.
90+rd | XCHG r32, EAX | O | Valid | Valid | Exchange EAX with r32.
REX.W + 90+rd | XCHG r64, RAX | O | Valid | N.E. | Exchange RAX with r64.
86 /r | XCHG r/m8, r8 | MR | Valid | Valid | Exchange r8 (byte register) with byte from r/m8.
REX + 86 /r | XCHG r/m8*, r8* | MR | Valid | N.E. | Exchange r8 (byte register) with byte from r/m8.
86 /r | XCHG r8, r/m8 | RM | Valid | Valid | Exchange byte from r/m8 with r8 (byte register).
REX + 86 /r | XCHG r8*, r/m8* | RM | Valid | N.E. | Exchange byte from r/m8 with r8 (byte register).
87 /r | XCHG r/m16, r16 | MR | Valid | Valid | Exchange r16 with word from r/m16.
87 /r | XCHG r16, r/m16 | RM | Valid | Valid | Exchange word from r/m16 with r16.
87 /r | XCHG r/m32, r32 | MR | Valid | Valid | Exchange r32 with doubleword from r/m32.
REX.W + 87 /r | XCHG r/m64, r64 | MR | Valid | N.E. | Exchange r64 with quadword from r/m64.
87 /r | XCHG r32, r/m32 | RM | Valid | Valid | Exchange doubleword from r/m32 with r32.
REX.W + 87 /r | XCHG r64, r/m64 | RM | Valid | N.E. | Exchange quadword from r/m64 with r64.
NOTES:
* In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
O | AX/EAX/RAX (r, w) | opcode + rd (r, w) | NA | NA
O | opcode + rd (r, w) | AX/EAX/RAX (r, w) | NA | NA
MR | ModRM:r/m (r, w) | ModRM:reg (r) | NA | NA
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
Description
Exchanges the contents of the destination (first) and source (second) operands. The operands can be two general-
purpose registers or a register and a memory location. If a memory operand is referenced, the processor’s locking
protocol is automatically implemented for the duration of the exchange operation, regardless of the presence or
absence of the LOCK prefix or of the value of the IOPL. (See the LOCK prefix description in this chapter for more
information on the locking protocol.)
This instruction is useful for implementing semaphores or similar data structures for process synchronization. (See
“Bus Locking” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for
more information on bus locking.)
The XCHG instruction can also be used instead of the BSWAP instruction for 16-bit operands.
In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See
the summary chart at the beginning of this section for encoding data and limits.
NOTE
XCHG (E)AX, (E)AX (encoded instruction byte is 90H) is an alias for NOP regardless of data size
prefixes, including REX.W.
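Because the locking protocol is implicit when a memory operand is involved, XCHG can implement a simple test-and-set spinlock. A sketch using GCC extended inline assembly (the names lock, spin_acquire, and spin_release are illustrative):

static volatile int lock;   /* 0 = free, 1 = held (illustrative) */

static void spin_acquire(volatile int *l)
{
    int old = 1;
    do {
        /* XCHG with a memory operand asserts the lock implicitly;
           no LOCK prefix is required. */
        __asm__ __volatile__("xchgl %0, %1"
                             : "+r"(old), "+m"(*l) : : "memory");
    } while (old != 0);      /* retry until 0 (free) was observed */
}

static void spin_release(volatile int *l)
{
    __asm__ __volatile__("" ::: "memory");   /* compiler barrier */
    *l = 0;                                  /* ordinary store releases */
}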
Operation
TEMP ← DEST;
DEST ← SRC;
SRC ← TEMP;
Flags Affected
None.
Protected Mode Exceptions
#GP(0) If either operand is in a non-writable segment.
If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register contains a NULL segment selector.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
#UD If the LOCK prefix is used but the destination is not a memory operand.
Real-Address Mode Exceptions
#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS If a memory operand effective address is outside the SS segment limit.
#UD If the LOCK prefix is used but the destination is not a memory operand.
Virtual-8086 Mode Exceptions
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made.
#UD If the LOCK prefix is used but the destination is not a memory operand.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#GP(0) If the memory address is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
#UD If the LOCK prefix is used but the destination is not a memory operand.
XEND — Transactional End

Opcode/Instruction | Op/En | 64/32bit Mode Support | CPUID Feature Flag | Description
NP 0F 01 D5 XEND | A | V/V | RTM | Specifies the end of an RTM code region.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | NA | NA | NA
Description
The instruction marks the end of an RTM code region. If this corresponds to the outermost scope (that is, including this XEND instruction, the number of XBEGIN instructions matches the number of XEND instructions), the logical processor will attempt to commit the logical processor state atomically. If the commit fails, the logical processor will roll back all architectural register and memory updates performed during the RTM execution. The logical processor will resume execution at the fallback address computed from the outermost XBEGIN instruction. The EAX register is updated to reflect RTM abort information.
XEND executed outside a transactional region will cause a #GP (General Protection Fault).
Operation
XEND
IF (RTM_ACTIVE = 0) THEN
    SIGNAL #GP
ELSE
    RTM_NEST_COUNT--
    IF (RTM_NEST_COUNT = 0) THEN
        Try to commit transaction
        IF fail to commit transactional execution
            THEN
                GOTO RTM_ABORT_PROCESSING;
            ELSE (* commit success *)
                RTM_ACTIVE ← 0
        FI;
    FI;
FI;

(* For any RTM abort condition encountered during RTM execution *)
RTM_ABORT_PROCESSING:
    Restore architectural register state
    Discard memory updates performed in transaction
    Update EAX with status
    RTM_NEST_COUNT ← 0
    RTM_ACTIVE ← 0
    IF 64-bit Mode
        THEN RIP ← fallbackRIP
        ELSE EIP ← fallbackEIP
    FI;
END
Flags Affected
None
Intel C/C++ Compiler Intrinsic Equivalent
XEND: void _xend( void );
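Since XEND outside a transaction causes #GP(0), code shared between transactional and non-transactional paths can guard the commit with the intrinsic for the XTEST instruction (documented elsewhere in this chapter). A sketch (GCC/Clang with -mrtm; assumes RTM support was already verified via CPUID):

#include <immintrin.h>

void commit_if_transactional(void)
{
    if (_xtest())    /* nonzero while transactional execution is active */
        _xend();     /* safe: cannot raise #GP here */
}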
SIMD Floating-Point Exceptions
None
Other Exceptions
#UD CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11] = 0.
If LOCK prefix is used.
#GP(0) If RTM_ACTIVE = 0.
XGETBV—Get Value of Extended Control Register

Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
NP 0F 01 D0 | XGETBV | ZO | Valid | Valid | Reads an XCR specified by ECX into EDX:EAX.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
ZO | NA | NA | NA | NA
Description
Reads the contents of the extended control register (XCR) specified in the ECX register into registers EDX:EAX. (On
processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The EDX register is
loaded with the high-order 32 bits of the XCR and the EAX register is loaded with the low-order 32 bits. (On proces-
sors that support the Intel 64 architecture, the high-order 32 bits of each of RAX and RDX are cleared.) If fewer
than 64 bits are implemented in the XCR being read, the values returned to EDX:EAX in unimplemented bit loca-
tions are undefined.
XCR0 is supported on any processor that supports the XGETBV instruction. If
CPUID.(EAX=0DH,ECX=1):EAX.XG1[bit 2] = 1, executing XGETBV with ECX = 1 returns in EDX:EAX the logical-
AND of XCR0 and the current value of the XINUSE state-component bitmap. This allows software to discover the
state of the init optimization used by XSAVEOPT and XSAVES. See Chapter 13, “Managing State Using the XSAVE Feature Set,” in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.
Use of any other value for ECX results in a general-protection (#GP) exception.
Operation
EDX:EAX ← XCR[ECX];
Flags Affected
None.
Intel C/C++ Compiler Intrinsic Equivalent
XGETBV: unsigned __int64 _xgetbv( unsigned int);
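A typical use is checking that the operating system has enabled the state components an extension needs, for example AVX. A sketch (GCC/Clang with -mxsave; assumes CPUID has already reported OSXSAVE = 1, so XGETBV will not #UD; os_enabled_avx is an illustrative name):

#include <immintrin.h>

int os_enabled_avx(void)
{
    /* XCR0 is the XCR numbered 0; SSE state is bit 1, AVX state is bit 2. */
    unsigned long long xcr0 = _xgetbv(0);
    return (xcr0 & 0x6) == 0x6;   /* both SSE and AVX state enabled */
}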
Protected Mode Exceptions
#GP(0) If an invalid XCR is specified in ECX (includes ECX = 1 if
CPUID.(EAX=0DH,ECX=1):EAX.XG1[bit 2] = 0).
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Real-Address Mode Exceptions
#GP(0) If an invalid XCR is specified in ECX (includes ECX = 1 if
CPUID.(EAX=0DH,ECX=1):EAX.XG1[bit 2] = 0).
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
Same exceptions as in protected mode.
XLAT/XLATB—Table Look-up Translation

Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
D7 | XLAT m8 | ZO | Valid | Valid | Set AL to memory byte DS:[(E)BX + unsigned AL].
D7 | XLATB | ZO | Valid | Valid | Set AL to memory byte DS:[(E)BX + unsigned AL].
REX.W + D7 | XLATB | ZO | Valid | N.E. | Set AL to memory byte [RBX + unsigned AL].

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
ZO | NA | NA | NA | NA
Description
Locates a byte entry in a table in memory, using the contents of the AL register as a table index, then copies the
contents of the table entry back into the AL register. The index in the AL register is treated as an unsigned integer.
The XLAT and XLATB instructions get the base address of the table in memory from either the DS:EBX or the DS:BX
registers (depending on the address-size attribute of the instruction, 32 or 16, respectively). (The DS segment may
be overridden with a segment override prefix.)
At the assembly-code level, two forms of this instruction are allowed: the “explicit-operand” form and the “no-operand” form. The explicit-operand form (specified with the XLAT mnemonic) allows the base address of the table to be specified explicitly with a symbol. This form is provided only for documentation purposes, and the documentation it provides can be misleading: the symbol does not have to specify the correct base address. The base address is always taken from the DS:(E)BX registers, which must be loaded correctly before the XLAT instruction is executed.
The no-operand form (XLATB) provides a “short form” of the XLAT instruction. Here too the processor assumes that the DS:(E)BX registers contain the base address of the table.
In 64-bit mode, operation is similar to that in legacy or compatibility mode. AL is used to specify the table index
(the operand size is fixed at 8 bits). RBX, however, is used to specify the table’s base address. See the summary
chart at the beginning of this section for encoding data and limits.
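Functionally, the instruction is an unsigned byte-indexed table lookup. A C model of the behavior (the names are illustrative; the real instruction takes its base address from (E)BX/RBX and keeps both index and result in AL):

unsigned char xlat_model(const unsigned char table[256], unsigned char al)
{
    /* AL is treated as an unsigned index; the byte read replaces AL. */
    return table[al];
}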
Operation
IF AddressSize = 16
    THEN AL ← (DS:BX + ZeroExtend(AL));
ELSE IF (AddressSize = 32)
    AL ← (DS:EBX + ZeroExtend(AL));
ELSE (* AddressSize = 64 *)
    AL ← (RBX + ZeroExtend(AL));
FI;
Flags Affected
None.
Protected Mode Exceptions
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register contains a NULL segment selector.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#UD If the LOCK prefix is used.
Real-Address Mode Exceptions
#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS If a memory operand effective address is outside the SS segment limit.
#UD If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#UD If the LOCK prefix is used.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#GP(0) If the memory address is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#UD If the LOCK prefix is used.
XOR—Logical Exclusive OR
Instruction Operand Encoding
Description
Performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and
stores the result in the destination operand location. The source operand can be an immediate, a register, or a
memory location; the destination operand can be a register or a memory location. (However, two memory oper-
ands cannot be used in one instruction.) Each bit of the result is 1 if the corresponding bits of the operands are
different; each bit is 0 if the corresponding bits are the same.
This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
34 ib | XOR AL, imm8 | I | Valid | Valid | AL XOR imm8.
35 iw | XOR AX, imm16 | I | Valid | Valid | AX XOR imm16.
35 id | XOR EAX, imm32 | I | Valid | Valid | EAX XOR imm32.
REX.W + 35 id | XOR RAX, imm32 | I | Valid | N.E. | RAX XOR imm32 (sign-extended).
80 /6 ib | XOR r/m8, imm8 | MI | Valid | Valid | r/m8 XOR imm8.
REX + 80 /6 ib | XOR r/m8*, imm8 | MI | Valid | N.E. | r/m8 XOR imm8.
81 /6 iw | XOR r/m16, imm16 | MI | Valid | Valid | r/m16 XOR imm16.
81 /6 id | XOR r/m32, imm32 | MI | Valid | Valid | r/m32 XOR imm32.
REX.W + 81 /6 id | XOR r/m64, imm32 | MI | Valid | N.E. | r/m64 XOR imm32 (sign-extended).
83 /6 ib | XOR r/m16, imm8 | MI | Valid | Valid | r/m16 XOR imm8 (sign-extended).
83 /6 ib | XOR r/m32, imm8 | MI | Valid | Valid | r/m32 XOR imm8 (sign-extended).
REX.W + 83 /6 ib | XOR r/m64, imm8 | MI | Valid | N.E. | r/m64 XOR imm8 (sign-extended).
30 /r | XOR r/m8, r8 | MR | Valid | Valid | r/m8 XOR r8.
REX + 30 /r | XOR r/m8*, r8* | MR | Valid | N.E. | r/m8 XOR r8.
31 /r | XOR r/m16, r16 | MR | Valid | Valid | r/m16 XOR r16.
31 /r | XOR r/m32, r32 | MR | Valid | Valid | r/m32 XOR r32.
REX.W + 31 /r | XOR r/m64, r64 | MR | Valid | N.E. | r/m64 XOR r64.
32 /r | XOR r8, r/m8 | RM | Valid | Valid | r8 XOR r/m8.
REX + 32 /r | XOR r8*, r/m8* | RM | Valid | N.E. | r8 XOR r/m8.
33 /r | XOR r16, r/m16 | RM | Valid | Valid | r16 XOR r/m16.
33 /r | XOR r32, r/m32 | RM | Valid | Valid | r32 XOR r/m32.
REX.W + 33 /r | XOR r64, r/m64 | RM | Valid | N.E. | r64 XOR r/m64.
NOTES:
* In 64-bit mode, r/m8 can not be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
I | AL/AX/EAX/RAX | imm8/16/32 | NA | NA
MI | ModRM:r/m (r, w) | imm8/16/32 | NA | NA
MR | ModRM:r/m (r, w) | ModRM:reg (r) | NA | NA
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
In 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a
REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this
section for encoding data and limits.
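As a worked example of the truth table, the classic XOR swap exchanges two values without a temporary (illustrative only; on modern processors an ordinary temporary or XCHG is usually preferable):

void xor_swap(unsigned int *a, unsigned int *b)
{
    if (a != b) {    /* if a and b alias, a ^= a would zero the value */
        *a ^= *b;    /* a' = a XOR b */
        *b ^= *a;    /* b' = b XOR (a XOR b) = a */
        *a ^= *b;    /* a'' = (a XOR b) XOR a = b */
    }
}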
Operation
DEST ← DEST XOR SRC;
Flags Affected
The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is
undefined.
Protected Mode Exceptions
#GP(0) If the destination operand points to a non-writable segment.
If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register contains a NULL segment selector.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
#UD If the LOCK prefix is used but the destination is not a memory operand.
Real-Address Mode Exceptions
#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS If a memory operand effective address is outside the SS segment limit.
#UD If the LOCK prefix is used but the destination is not a memory operand.
Virtual-8086 Mode Exceptions
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made.
#UD If the LOCK prefix is used but the destination is not a memory operand.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#GP(0) If the memory address is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
#UD If the LOCK prefix is used but the destination is not a memory operand.
XORPD—Bitwise Logical XOR of Packed Double Precision Floating-Point Values

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F 57 /r XORPD xmm1, xmm2/m128 | A | V/V | SSE2 | Return the bitwise logical XOR of packed double-precision floating-point values in xmm1 and xmm2/mem.
VEX.128.66.0F.WIG 57 /r VXORPD xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Return the bitwise logical XOR of packed double-precision floating-point values in xmm2 and xmm3/mem.
VEX.256.66.0F.WIG 57 /r VXORPD ymm1, ymm2, ymm3/m256 | B | V/V | AVX | Return the bitwise logical XOR of packed double-precision floating-point values in ymm2 and ymm3/mem.
EVEX.128.66.0F.W1 57 /r VXORPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | C | V/V | AVX512VL AVX512DQ | Return the bitwise logical XOR of packed double-precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1.
EVEX.256.66.0F.W1 57 /r VXORPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | C | V/V | AVX512VL AVX512DQ | Return the bitwise logical XOR of packed double-precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1.
EVEX.512.66.0F.W1 57 /r VXORPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | C | V/V | AVX512DQ | Return the bitwise logical XOR of packed double-precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
B | NA | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
C | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
Performs a bitwise logical XOR of the two, four or eight packed double-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.
EVEX.512 encoded version: The first source operand is a ZMM register. The second source operand can be a ZMM
register or a vector memory location. The destination operand is a ZMM register conditionally updated with
writemask k1.
VEX.256 and EVEX.256 encoded versions: The first source operand is a YMM register. The second source operand
is a YMM register or a 256-bit memory location. The destination operand is a YMM register (conditionally updated
with writemask k1 in case of EVEX). The upper bits (MAXVL-1:256) of the corresponding ZMM register destination
are zeroed.
VEX.128 and EVEX.128 encoded versions: The first source operand is an XMM register. The second source operand
is an XMM register or 128-bit memory location. The destination operand is an XMM register (conditionally updated
with writemask k1 in case of EVEX). The upper bits (MAXVL-1:128) of the corresponding ZMM register destination
are zeroed.
128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding register destination are unmodified.
Operation
VXORPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b == 1) AND (SRC2 *is memory*)
            THEN DEST[i+63:i] ← SRC1[i+63:i] BITWISE XOR SRC2[63:0];
            ELSE DEST[i+63:i] ← SRC1[i+63:i] BITWISE XOR SRC2[i+63:i];
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+63:i] remains unchanged*
            ELSE *zeroing-masking* ; zeroing-masking
                DEST[i+63:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VXORPD (VEX.256 encoded version)
DEST[63:0] ← SRC1[63:0] BITWISE XOR SRC2[63:0]
DEST[127:64] ← SRC1[127:64] BITWISE XOR SRC2[127:64]
DEST[191:128] ← SRC1[191:128] BITWISE XOR SRC2[191:128]
DEST[255:192] ← SRC1[255:192] BITWISE XOR SRC2[255:192]
DEST[MAXVL-1:256] ← 0

VXORPD (VEX.128 encoded version)
DEST[63:0] ← SRC1[63:0] BITWISE XOR SRC2[63:0]
DEST[127:64] ← SRC1[127:64] BITWISE XOR SRC2[127:64]
DEST[MAXVL-1:128] ← 0

XORPD (128-bit Legacy SSE version)
DEST[63:0] ← DEST[63:0] BITWISE XOR SRC[63:0]
DEST[127:64] ← DEST[127:64] BITWISE XOR SRC[127:64]
DEST[MAXVL-1:128] (Unmodified)
Intel C/C++ Compiler Intrinsic Equivalent
VXORPD __m512d _mm512_xor_pd (__m512d a, __m512d b);
VXORPD __m512d _mm512_mask_xor_pd (__m512d a, __mmask8 m, __m512d b);
VXORPD __m512d _mm512_maskz_xor_pd (__mmask8 m, __m512d a);
VXORPD __m256d _mm256_xor_pd (__m256d a, __m256d b);
VXORPD __m256d _mm256_mask_xor_pd (__m256d a, __mmask8 m, __m256d b);
VXORPD __m256d _mm256_maskz_xor_pd (__mmask8 m, __m256d a);
XORPD __m128d _mm_xor_pd (__m128d a, __m128d b);
VXORPD __m128d _mm_mask_xor_pd (__m128d a, __mmask8 m, __m128d b);
VXORPD __m128d _mm_maskz_xor_pd (__mmask8 m, __m128d a);
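Because XOR with a mask of sign bits flips each element's sign while leaving the other bits intact, a common use of these intrinsics is vector negation. A sketch using the SSE2 form (negate_pd is an illustrative name):

#include <emmintrin.h>

__m128d negate_pd(__m128d v)
{
    const __m128d sign = _mm_set1_pd(-0.0);   /* only bit 63 set in each lane */
    return _mm_xor_pd(v, sign);               /* XORPD flips the sign bits */
}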
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instructions, see Exceptions Type 4.
EVEX-encoded instructions, see Exceptions Type E4.
XORPS—Bitwise Logical XOR of Packed Single Precision Floating-Point Values

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
NP 0F 57 /r XORPS xmm1, xmm2/m128 | A | V/V | SSE | Return the bitwise logical XOR of packed single-precision floating-point values in xmm1 and xmm2/mem.
VEX.128.0F.WIG 57 /r VXORPS xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Return the bitwise logical XOR of packed single-precision floating-point values in xmm2 and xmm3/mem.
VEX.256.0F.WIG 57 /r VXORPS ymm1, ymm2, ymm3/m256 | B | V/V | AVX | Return the bitwise logical XOR of packed single-precision floating-point values in ymm2 and ymm3/mem.
EVEX.128.0F.W0 57 /r VXORPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | C | V/V | AVX512VL AVX512DQ | Return the bitwise logical XOR of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1.
EVEX.256.0F.W0 57 /r VXORPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | C | V/V | AVX512VL AVX512DQ | Return the bitwise logical XOR of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1.
EVEX.512.0F.W0 57 /r VXORPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | C | V/V | AVX512DQ | Return the bitwise logical XOR of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1.

Instruction Operand Encoding
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
B | NA | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
C | Full | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | NA
Description
Performs a bitwise logical XOR of the four, eight or sixteen packed single-precision floating-point values from the first source operand and the second source operand, and stores the result in the destination operand.
EVEX.512 encoded version: The first source operand is a ZMM register. The second source operand can be a ZMM
register or a vector memory location. The destination operand is a ZMM register conditionally updated with
writemask k1.
VEX.256 and EVEX.256 encoded versions: The first source operand is a YMM register. The second source operand
is a YMM register or a 256-bit memory location. The destination operand is a YMM register (conditionally updated
with writemask k1 in case of EVEX). The upper bits (MAXVL-1:256) of the corresponding ZMM register destination
are zeroed.
VEX.128 and EVEX.128 encoded versions: The first source operand is an XMM register. The second source operand
is an XMM register or 128-bit memory location. The destination operand is an XMM register (conditionally updated
with writemask k1 in case of EVEX). The upper bits (MAXVL-1:128) of the corresponding ZMM register destination
are zeroed.
128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (MAXVL-1:128) of the corresponding register destination are unmodified.
Operation
VXORPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask* THEN
        IF (EVEX.b == 1) AND (SRC2 *is memory*)
            THEN DEST[i+31:i] ← SRC1[i+31:i] BITWISE XOR SRC2[31:0];
            ELSE DEST[i+31:i] ← SRC1[i+31:i] BITWISE XOR SRC2[i+31:i];
        FI;
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[i+31:i] remains unchanged*
            ELSE *zeroing-masking* ; zeroing-masking
                DEST[i+31:i] ← 0
        FI
    FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0

VXORPS (VEX.256 encoded version)
DEST[31:0] ← SRC1[31:0] BITWISE XOR SRC2[31:0]
DEST[63:32] ← SRC1[63:32] BITWISE XOR SRC2[63:32]
DEST[95:64] ← SRC1[95:64] BITWISE XOR SRC2[95:64]
DEST[127:96] ← SRC1[127:96] BITWISE XOR SRC2[127:96]
DEST[159:128] ← SRC1[159:128] BITWISE XOR SRC2[159:128]
DEST[191:160] ← SRC1[191:160] BITWISE XOR SRC2[191:160]
DEST[223:192] ← SRC1[223:192] BITWISE XOR SRC2[223:192]
DEST[255:224] ← SRC1[255:224] BITWISE XOR SRC2[255:224]
DEST[MAXVL-1:256] ← 0

VXORPS (VEX.128 encoded version)
DEST[31:0] ← SRC1[31:0] BITWISE XOR SRC2[31:0]
DEST[63:32] ← SRC1[63:32] BITWISE XOR SRC2[63:32]
DEST[95:64] ← SRC1[95:64] BITWISE XOR SRC2[95:64]
DEST[127:96] ← SRC1[127:96] BITWISE XOR SRC2[127:96]
DEST[MAXVL-1:128] ← 0

XORPS (128-bit Legacy SSE version)
DEST[31:0] ← SRC1[31:0] BITWISE XOR SRC2[31:0]
DEST[63:32] ← SRC1[63:32] BITWISE XOR SRC2[63:32]
DEST[95:64] ← SRC1[95:64] BITWISE XOR SRC2[95:64]
DEST[127:96] ← SRC1[127:96] BITWISE XOR SRC2[127:96]
DEST[MAXVL-1:128] (Unmodified)
Intel C/C++ Compiler Intrinsic Equivalent
VXORPS __m512 _mm512_xor_ps (__m512 a, __m512 b);
VXORPS __m512 _mm512_mask_xor_ps (__m512 a, __mmask16 m, __m512 b);
VXORPS __m512 _mm512_maskz_xor_ps (__mmask16 m, __m512 a);
VXORPS __m256 _mm256_xor_ps (__m256 a, __m256 b);
VXORPS __m256 _mm256_mask_xor_ps (__m256 a, __mmask8 m, __m256 b);
VXORPS __m256 _mm256_maskz_xor_ps (__mmask8 m, __m256 a);
XORPS __m128 _mm_xor_ps (__m128 a, __m128 b);
VXORPS __m128 _mm_mask_xor_ps (__m128 a, __mmask8 m, __m128 b);
VXORPS __m128 _mm_maskz_xor_ps (__mmask8 m, __m128 a);
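As with XORPD, XOR against a sign-bit mask yields a cheap vector negation. A sketch using the SSE form (negate_ps is an illustrative name):

#include <xmmintrin.h>

__m128 negate_ps(__m128 v)
{
    const __m128 sign = _mm_set1_ps(-0.0f);   /* only bit 31 set in each lane */
    return _mm_xor_ps(v, sign);               /* XORPS flips the sign bits */
}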
SIMD Floating-Point Exceptions
None
Other Exceptions
Non-EVEX-encoded instructions, see Exceptions Type 4.
EVEX-encoded instructions, see Exceptions Type E4.
XRSTOR—Restore Processor Extended States

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
NP 0F AE /5 XRSTOR mem | M | V/V | XSAVE | Restore state components specified by EDX:EAX from mem.
NP REX.W + 0F AE /5 XRSTOR64 mem | M | V/N.E. | XSAVE | Restore state components specified by EDX:EAX from mem.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (r) | NA | NA | NA
Description
Performs a full or partial restore of processor state components from the XSAVE area located at the memory
address specified by the source operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The
specific state components restored correspond to the bits set in the requested-feature bitmap (RFBM), which is the
logical-AND of EDX:EAX and XCR0.
The format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-
ware Developer’s Manual, Volume 1.
Section 13.8, “Operation of XRSTOR,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 provides a detailed description of the operation of the XRSTOR instruction. The following items provide a high-level outline:
• Execution of XRSTOR may take one of two forms: standard and compacted. Bit 63 of the XCOMP_BV field in the XSAVE header determines which form is used: value 0 specifies the standard form, while value 1 specifies the compacted form.
• If RFBM[i] = 0, XRSTOR does not update state component i. (There is an exception if RFBM[1] = 0 and RFBM[2] = 1. In this case, the standard form of XRSTOR will load MXCSR from memory, even though MXCSR is part of state component 1 — SSE. The compacted form of XRSTOR does not make this exception.)
• If RFBM[i] = 1 and bit i is clear in the XSTATE_BV field in the XSAVE header, XRSTOR initializes state component i.
• If RFBM[i] = 1 and XSTATE_BV[i] = 1, XRSTOR loads state component i from the XSAVE area.
• The standard form of XRSTOR treats MXCSR (which is part of state component 1 — SSE) differently from the XMM registers. If either form attempts to load MXCSR with an illegal value, a general-protection exception (#GP) occurs.
• XRSTOR loads the internal value XRSTOR_INFO, which may be used to optimize a subsequent execution of XSAVEOPT or XSAVES.
• Immediately following an execution of XRSTOR, the processor tracks as in-use (not in initial configuration) any state component i for which RFBM[i] = 1 and XSTATE_BV[i] = 1; it tracks as modified any state component i for which RFBM[i] = 0.
Use of a source operand not aligned to a 64-byte boundary (for 64-bit and 32-bit modes) results in a general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.
See Section 13.6, “Processor Tracking of XSAVE-Managed State,” of Intel® 64 and IA-32 Architectures Software
Developer’s Manual, Volume 1 for discussion of the bitmaps XINUSE and XMODIFIED and of the quantity
XRSTOR_INFO.
Operation
RFBM ← XCR0 AND EDX:EAX; /* bitwise logical AND */
COMPMASK ← XCOMP_BV field from XSAVE header;
RSTORMASK ← XSTATE_BV field from XSAVE header;
IF COMPMASK[63] = 0
    THEN /* Standard form of XRSTOR */
        TO_BE_RESTORED ← RFBM AND RSTORMASK;
        TO_BE_INITIALIZED ← RFBM AND NOT RSTORMASK;
        IF TO_BE_RESTORED[0] = 1
            THEN
                load x87 state from legacy region of XSAVE area;
                XINUSE[0] ← 1;
        ELSIF TO_BE_INITIALIZED[0] = 1
            THEN
                initialize x87 state;
                XINUSE[0] ← 0;
        FI;
        IF RFBM[1] = 1 OR RFBM[2] = 1
            THEN load MXCSR from legacy region of XSAVE area;
        FI;
        IF TO_BE_RESTORED[1] = 1
            THEN
                load XMM registers from legacy region of XSAVE area; // this step does not load MXCSR
                XINUSE[1] ← 1;
        ELSIF TO_BE_INITIALIZED[1] = 1
            THEN
                set all XMM registers to 0; // this step does not initialize MXCSR
                XINUSE[1] ← 0;
        FI;
        FOR i ← 2 TO 62
            IF TO_BE_RESTORED[i] = 1
                THEN
                    load XSAVE state component i at offset n from base of XSAVE area; // n enumerated by CPUID(EAX=0DH,ECX=i):EBX
                    XINUSE[i] ← 1;
            ELSIF TO_BE_INITIALIZED[i] = 1
                THEN
                    initialize XSAVE state component i;
                    XINUSE[i] ← 0;
            FI;
        ENDFOR;
    ELSE /* Compacted form of XRSTOR */
        IF CPUID.(EAX=0DH,ECX=1):EAX.XSAVEC[bit 1] = 0
            THEN /* compacted form not supported */
                #GP(0);
        FI;
        FORMAT ← COMPMASK AND 7FFFFFFF_FFFFFFFFH;
        RESTORE_FEATURES ← FORMAT AND RFBM;
        TO_BE_RESTORED ← RESTORE_FEATURES AND RSTORMASK;
        FORCE_INIT ← RFBM AND NOT FORMAT;
        TO_BE_INITIALIZED ← (RFBM AND NOT RSTORMASK) OR FORCE_INIT;
        IF TO_BE_RESTORED[0] = 1
            THEN
                load x87 state from legacy region of XSAVE area;
                XINUSE[0] ← 1;
        ELSIF TO_BE_INITIALIZED[0] = 1
            THEN
                initialize x87 state;
                XINUSE[0] ← 0;
        FI;
        IF TO_BE_RESTORED[1] = 1
            THEN
                load SSE state from legacy region of XSAVE area; // this step loads the XMM registers and MXCSR
                XINUSE[1] ← 1;
        ELSIF TO_BE_INITIALIZED[1] = 1
            THEN
                set all XMM registers to 0;
                MXCSR ← 1F80H;
                XINUSE[1] ← 0;
        FI;
        NEXT_FEATURE_OFFSET ← 576; // Legacy area and XSAVE header consume 576 bytes
        FOR i ← 2 TO 62
            IF FORMAT[i] = 1
                THEN
                    IF TO_BE_RESTORED[i] = 1
                        THEN
                            load XSAVE state component i at offset NEXT_FEATURE_OFFSET from base of XSAVE area;
                            XINUSE[i] ← 1;
                    FI;
                    NEXT_FEATURE_OFFSET ← NEXT_FEATURE_OFFSET + n; // n enumerated by CPUID(EAX=0DH,ECX=i):EAX
            FI;
            IF TO_BE_INITIALIZED[i] = 1
                THEN
                    initialize XSAVE state component i;
                    XINUSE[i] ← 0;
            FI;
        ENDFOR;
FI;
XMODIFIED_BV ← NOT RFBM;
IF in VMX non-root operation
    THEN VMXNR ← 1;
    ELSE VMXNR ← 0;
FI;
LAXA ← linear address of XSAVE area;
XRSTOR_INFO ← ⟨CPL, VMXNR, LAXA, COMPMASK⟩;
Flags Affected
None.
Intel C/C++ Compiler Intrinsic Equivalent
XRSTOR: void _xrstor( void * , unsigned __int64);
XRSTOR64: void _xrstor64( void * , unsigned __int64);
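A user-level sketch pairing XSAVE and XRSTOR through the intrinsics (GCC/Clang with -mxsave; the 4096-byte size is an illustrative upper bound, the required size should be taken from CPUID.(EAX=0DH,ECX=0):EBX; the 64-byte alignment and the initially zeroed header are mandatory):

#include <immintrin.h>
#include <stdint.h>

/* Static storage is zero-initialized, so the XSAVE header starts out zero. */
static _Alignas(64) uint8_t area[4096];

void save_then_restore(void)
{
    uint64_t rfbm = 0x3;      /* bit 0 = x87, bit 1 = SSE */
    _xsave(area, rfbm);       /* XSAVE with EDX:EAX = instruction mask */
    /* ... code that may clobber x87/SSE state ... */
    _xrstor(area, rfbm);      /* XRSTOR restores the requested components */
}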
Protected Mode Exceptions
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If bit 63 of the XCOMP_BV field of the XSAVE header is 1 and
CPUID.(EAX=0DH,ECX=1):EAX.XSAVEC[bit 1] = 0.
If the standard form is executed and a bit in XCR0 is 0 and the corresponding bit in the
XSTATE_BV field of the XSAVE header is 1.
If the standard form is executed and bytes 23:8 of the XSAVE header are not all zero.
If the compacted form is executed and a bit in XCR0 is 0 and the corresponding bit in the
XCOMP_BV field of the XSAVE header is 1.
If the compacted form is executed and a bit in the XCOMP_BV field in the XSAVE header is 0
and the corresponding bit in the XSTATE_BV field is 1.
If the compacted form is executed and bytes 63:16 of the XSAVE header are not all zero.
If attempting to write any reserved bits of the MXCSR register with 1.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a
general protection exception is signaled in its place. In addition, the width of the alignment
check may also vary with implementation. For instance, for a given implementation, an align-
ment check exception might be signaled for a 2-byte misalignment, whereas a general protec-
tion exception might be signaled for all other misalignments (4-, 8-, or 16-byte
misalignments).
Real-Address Mode Exceptions
#GP If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If any part of the operand lies outside the effective address space from 0 to FFFFH.
If bit 63 of the XCOMP_BV field of the XSAVE header is 1 and
CPUID.(EAX=0DH,ECX=1):EAX.XSAVEC[bit 1] = 0.
If the standard form is executed and a bit in XCR0 is 0 and the corresponding bit in the
XSTATE_BV field of the XSAVE header is 1.
If the standard form is executed and bytes 23:8 of the XSAVE header are not all zero.
If the compacted form is executed and a bit in XCR0 is 0 and the corresponding bit in the
XCOMP_BV field of the XSAVE header is 1.
If the compacted form is executed and a bit in the XCOMP_BV field in the XSAVE header is 0
and the corresponding bit in the XSTATE_BV field is 1.
If the compacted form is executed and bytes 63:16 of the XSAVE header are not all zero.
If attempting to write any reserved bits of the MXCSR register with 1.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#GP(0) If a memory address is in a non-canonical form.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If bit 63 of the XCOMP_BV field of the XSAVE header is 1 and
CPUID.(EAX=0DH,ECX=1):EAX.XSAVEC[bit 1] = 0.
If the standard form is executed and a bit in XCR0 is 0 and the corresponding bit in the
XSTATE_BV field of the XSAVE header is 1.
If the standard form is executed and bytes 23:8 of the XSAVE header are not all zero.
If the compacted form is executed and a bit in XCR0 is 0 and the corresponding bit in the
XCOMP_BV field of the XSAVE header is 1.
If the compacted form is executed and a bit in the XCOMP_BV field in the XSAVE header is 0
and the corresponding bit in the XSTATE_BV field is 1.
If the compacted form is executed and bytes 63:16 of the XSAVE header are not all zero.
If attempting to write any reserved bits of the MXCSR register with 1.
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a
general protection exception is signaled in its place. In addition, the width of the alignment
check may also vary with implementation. For instance, for a given implementation, an align-
ment check exception might be signaled for a 2-byte misalignment, whereas a general protec-
tion exception might be signaled for all other misalignments (4-, 8-, or 16-byte
misalignments).
XRSTORS—Restore Processor Extended States Supervisor

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
NP 0F C7 /3 XRSTORS mem | M | V/V | XSS | Restore state components specified by EDX:EAX from mem.
NP REX.W + 0F C7 /3 XRSTORS64 mem | M | V/N.E. | XSS | Restore state components specified by EDX:EAX from mem.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (r) | NA | NA | NA
Description
Performs a full or partial restore of processor state components from the XSAVE area located at the memory
address specified by the source operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask.
The specific state components restored correspond to the bits set in the requested-feature bitmap (RFBM), which
is the logical-AND of EDX:EAX and the logical-OR of XCR0 with the IA32_XSS MSR. XRSTORS may be executed only
if CPL = 0.
The format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-
ware Developer’s Manual, Volume 1.
Section 13.12, “Operation of XRSTORS,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 provides a detailed description of the operation of the XRSTORS instruction. The following items provide a high-level outline:
• Execution of XRSTORS is similar to that of the compacted form of XRSTOR; XRSTORS cannot restore from an XSAVE area in which the extended region is in the standard format (see Section 13.4.3, “Extended Region of an XSAVE Area”).
• XRSTORS differs from XRSTOR in that it can restore state components corresponding to bits set in the IA32_XSS MSR.
• If RFBM[i] = 0, XRSTORS does not update state component i.
• If RFBM[i] = 1 and bit i is clear in the XSTATE_BV field in the XSAVE header, XRSTORS initializes state component i.
• If RFBM[i] = 1 and XSTATE_BV[i] = 1, XRSTORS loads state component i from the XSAVE area.
• If XRSTORS attempts to load MXCSR with an illegal value, a general-protection exception (#GP) occurs.
• XRSTORS loads the internal value XRSTOR_INFO, which may be used to optimize a subsequent execution of XSAVEOPT or XSAVES.
• Immediately following an execution of XRSTORS, the processor tracks as in-use (not in initial configuration) any state component i for which RFBM[i] = 1 and XSTATE_BV[i] = 1; it tracks as modified any state component i for which RFBM[i] = 0.
Use of a source operand not aligned to a 64-byte boundary (for 64-bit and 32-bit modes) results in a general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.
See Section 13.6, “Processor Tracking of XSAVE-Managed State,” of Intel® 64 and IA-32 Architectures Software
Developer’s Manual, Volume 1 for discussion of the bitmaps XINUSE and XMODIFIED and of the quantity
XRSTOR_INFO.
Operation
RFBM ← (XCR0 OR IA32_XSS) AND EDX:EAX; /* bitwise logical OR and AND */
COMPMASK ← XCOMP_BV field from XSAVE header;
RSTORMASK ← XSTATE_BV field from XSAVE header;
FORMAT ← COMPMASK AND 7FFFFFFF_FFFFFFFFH;
RESTORE_FEATURES ← FORMAT AND RFBM;
TO_BE_RESTORED ← RESTORE_FEATURES AND RSTORMASK;
FORCE_INIT ← RFBM AND NOT FORMAT;
TO_BE_INITIALIZED ← (RFBM AND NOT RSTORMASK) OR FORCE_INIT;
IF TO_BE_RESTORED[0] = 1
    THEN
        load x87 state from legacy region of XSAVE area;
        XINUSE[0] ← 1;
ELSIF TO_BE_INITIALIZED[0] = 1
    THEN
        initialize x87 state;
        XINUSE[0] ← 0;
FI;
IF TO_BE_RESTORED[1] = 1
    THEN
        load SSE state from legacy region of XSAVE area; // this step loads the XMM registers and MXCSR
        XINUSE[1] ← 1;
ELSIF TO_BE_INITIALIZED[1] = 1
    THEN
        set all XMM registers to 0;
        MXCSR ← 1F80H;
        XINUSE[1] ← 0;
FI;
NEXT_FEATURE_OFFSET ← 576; // Legacy area and XSAVE header consume 576 bytes
FOR i ← 2 TO 62
    IF FORMAT[i] = 1
        THEN
            IF TO_BE_RESTORED[i] = 1
                THEN
                    load XSAVE state component i at offset NEXT_FEATURE_OFFSET from base of XSAVE area;
                    XINUSE[i] ← 1;
            FI;
            NEXT_FEATURE_OFFSET ← NEXT_FEATURE_OFFSET + n; // n enumerated by CPUID(EAX=0DH,ECX=i):EAX
    FI;
    IF TO_BE_INITIALIZED[i] = 1
        THEN
            initialize XSAVE state component i;
            XINUSE[i] ← 0;
    FI;
ENDFOR;
XMODIFIED_BV ← NOT RFBM;
IF in VMX non-root operation
    THEN VMXNR ← 1;
    ELSE VMXNR ← 0;
FI;
LAXA ← linear address of XSAVE area;
XRSTOR_INFO ← ⟨CPL, VMXNR, LAXA, COMPMASK⟩;
Flags Affected
None.
Intel C/C++ Compiler Intrinsic Equivalent
XRSTORS: void _xrstors( void * , unsigned __int64);
XRSTORS64: void _xrstors64( void * , unsigned __int64);
Protected Mode Exceptions
#GP(0) If CPL > 0.
If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If bit 63 of the XCOMP_BV field of the XSAVE header is 0.
If a bit in XCR0 is 0 and the corresponding bit in the XCOMP_BV field of the XSAVE header is 1.
If a bit in the XCOMP_BV field in the XSAVE header is 0 and the corresponding bit in the
XSTATE_BV field is 1.
If bytes 63:16 of the XSAVE header are not all zero.
If attempting to write any reserved bits of the MXCSR register with 1.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSS[bit 3] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a #GP
is signaled in its place. In addition, the width of the alignment check may also vary with imple-
mentation. For instance, for a given implementation, an alignment check exception might be
signaled for a 2-byte misalignment, whereas a #GP might be signaled for all other misalign-
ments (4-, 8-, or 16-byte misalignments).
Real-Address Mode Exceptions
#GP If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If any part of the operand lies outside the effective address space from 0 to FFFFH.
If bit 63 of the XCOMP_BV field of the XSAVE header is 0.
If a bit in XCR0 is 0 and the corresponding bit in the XCOMP_BV field of the XSAVE header is 1.
If a bit in the XCOMP_BV field in the XSAVE header is 0 and the corresponding bit in the
XSTATE_BV field is 1.
If bytes 63:16 of the XSAVE header are not all zero.
If attempting to write any reserved bits of the MXCSR register with 1.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSS[bit 3] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#GP(0) If CPL > 0.
If a memory address is in a non-canonical form.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If bit 63 of the XCOMP_BV field of the XSAVE header is 0.
If a bit in XCR0 is 0 and the corresponding bit in the XCOMP_BV field of the XSAVE header is 1.
If a bit in the XCOMP_BV field in the XSAVE header is 0 and the corresponding bit in the
XSTATE_BV field is 1.
If bytes 63:16 of the XSAVE header are not all zero.
If attempting to write any reserved bits of the MXCSR register with 1.
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSS[bit 3] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a
general protection exception is signaled in its place. In addition, the width of the alignment
check may also vary with implementation. For instance, for a given implementation, an align-
ment check exception might be signaled for a 2-byte misalignment, whereas a general protec-
tion exception might be signaled for all other misalignments (4-, 8-, or 16-byte
misalignments).
XSAVE—Save Processor Extended States

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
NP 0F AE /4 XSAVE mem | M | V/V | XSAVE | Save state components specified by EDX:EAX to mem.
NP REX.W + 0F AE /4 XSAVE64 mem | M | V/N.E. | XSAVE | Save state components specified by EDX:EAX to mem.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (w) | NA | NA | NA
Description
Performs a full or partial save of processor state components to the XSAVE area located at the memory address
specified by the destination operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The
specific state components saved correspond to the bits set in the requested-feature bitmap (RFBM), which is the
logical-AND of EDX:EAX and XCR0.
The format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-
ware Developer’s Manual, Volume 1.
Section 13.7, “Operation of XSAVE,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1
provides a detailed description of the operation of the XSAVE instruction. The following items provide a high-level
outline:
• XSAVE saves state component i if and only if RFBM[i] = 1. (An exception is made for MXCSR and MXCSR_MASK, which belong to state component 1 — SSE; XSAVE saves these values to memory if either RFBM[1] or RFBM[2] is 1.)
• XSAVE does not modify bytes 511:464 of the legacy region of the XSAVE area (see Section 13.4.1, “Legacy Region of an XSAVE Area”).
• XSAVE reads the XSTATE_BV field of the XSAVE header (see Section 13.4.2, “XSAVE Header”) and writes a modified value back to memory as follows. If RFBM[i] = 1, XSAVE writes XSTATE_BV[i] with the value of XINUSE[i]. (XINUSE is a bitmap by which the processor tracks the status of various state components. See Section 13.6, “Processor Tracking of XSAVE-Managed State.”) If RFBM[i] = 0, XSAVE writes XSTATE_BV[i] with the value that it read from memory (it does not modify the bit). XSAVE does not write to any part of the XSAVE header other than the XSTATE_BV field.
• XSAVE always uses the standard format of the extended region of the XSAVE area (see Section 13.4.3, “Extended Region of an XSAVE Area”).
Use of a destination operand not aligned to 64-byte boundary (in either 64-bit or 32-bit modes) results in a
general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.
Operation
RFBM ← XCR0 AND EDX:EAX; /* bitwise logical AND */
OLD_BV ← XSTATE_BV field from XSAVE header;
IF RFBM[0] = 1
    THEN store x87 state into legacy region of XSAVE area;
FI;
IF RFBM[1] = 1
    THEN store XMM registers into legacy region of XSAVE area; // this step does not save MXCSR or MXCSR_MASK
FI;
IF RFBM[1] = 1 OR RFBM[2] = 1
    THEN store MXCSR and MXCSR_MASK into legacy region of XSAVE area;
FI;
FOR i ← 2 TO 62
    IF RFBM[i] = 1
        THEN save XSAVE state component i at offset n from base of XSAVE area; // n enumerated by CPUID(EAX=0DH,ECX=i):EBX
    FI;
ENDFOR;
XSTATE_BV field in XSAVE header ← (OLD_BV AND NOT RFBM) OR (XINUSE AND RFBM);
Flags Affected
None.
Intel C/C++ Compiler Intrinsic Equivalent
XSAVE: void _xsave( void * , unsigned __int64);
XSAVE64: void _xsave64( void * , unsigned __int64);
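The buffer size needed for the standard format is enumerated by CPUID leaf 0DH. A sketch using the GCC/Clang <cpuid.h> helper (xsave_area_size is an illustrative name):

#include <cpuid.h>
#include <stdint.h>

/* Size in bytes of an XSAVE area covering the features enabled in XCR0. */
uint32_t xsave_area_size(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x0D, 0, &eax, &ebx, &ecx, &edx))
        return 0;     /* leaf 0DH not supported */
    return ebx;       /* EBX of sub-leaf 0 reflects the current XCR0 */
}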
Protected Mode Exceptions
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a
general protection exception is signaled in its place. In addition, the width of the alignment
check may also vary with implementation. For instance, for a given implementation, an align-
ment check exception might be signaled for a 2-byte misalignment, whereas a general protec-
tion exception might be signaled for all other misalignments (4-, 8-, or 16-byte
misalignments).
Real-Address Mode Exceptions
#GP If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If any part of the operand lies outside the effective address space from 0 to FFFFH.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#GP(0) If the memory address is in a non-canonical form.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a
general protection exception is signaled in its place. In addition, the width of the alignment
check may also vary with implementation. For instance, for a given implementation, an align-
ment check exception might be signaled for a 2-byte misalignment, whereas a general protec-
tion exception might be signaled for all other misalignments (4-, 8-, or 16-byte
misalignments).
XSAVEC—Save Processor Extended States with Compaction

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
NP 0F C7 /4 XSAVEC mem | M | V/V | XSAVEC | Save state components specified by EDX:EAX to mem with compaction.
NP REX.W + 0F C7 /4 XSAVEC64 mem | M | V/N.E. | XSAVEC | Save state components specified by EDX:EAX to mem with compaction.

Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (w) | NA | NA | NA
Description
Performs a full or partial save of processor state components to the XSAVE area located at the memory address
specified by the destination operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The
specific state components saved correspond to the bits set in the requested-feature bitmap (RFBM), which is the
logical-AND of EDX:EAX and XCR0.
The format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures Soft-
ware Developer’s Manual, Volume 1.
Section 13.10, “Operation of XSAVEC,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 provides a detailed description of the operation of the XSAVEC instruction. The following items provide a high-level outline:
• Execution of XSAVEC is similar to that of XSAVE. XSAVEC differs from XSAVE in that it uses compaction and that it may use the init optimization.
• XSAVEC saves state component i if and only if RFBM[i] = 1 and XINUSE[i] = 1.¹ (XINUSE is a bitmap by which the processor tracks the status of various state components. See Section 13.6, “Processor Tracking of XSAVE-Managed State.”)
• XSAVEC does not modify bytes 511:464 of the legacy region of the XSAVE area (see Section 13.4.1, “Legacy Region of an XSAVE Area”).
• XSAVEC writes the logical AND of RFBM and XINUSE to the XSTATE_BV field of the XSAVE header.²,³ (See Section 13.4.2, “XSAVE Header.”) XSAVEC sets bit 63 of the XCOMP_BV field and sets bits 62:0 of that field to RFBM[62:0]. XSAVEC does not write to any parts of the XSAVE header other than the XSTATE_BV and XCOMP_BV fields.
• XSAVEC always uses the compacted format of the extended region of the XSAVE area (see Section 13.4.3, “Extended Region of an XSAVE Area”).
Use of a destination operand not aligned to a 64-byte boundary (in either 64-bit or 32-bit modes) results in a general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.
NOTES:
1. There is an exception for state component 1 (SSE). MXCSR is part of SSE state, but XINUSE[1] may be 0 even if MXCSR does not have its initial value of 1F80H. In this case, XSAVEC saves SSE state as long as RFBM[1] = 1.
2. Unlike XSAVE and XSAVEOPT, XSAVEC clears bits in the XSTATE_BV field that correspond to bits that are clear in RFBM.
3. There is an exception for state component 1 (SSE). MXCSR is part of SSE state, but XINUSE[1] may be 0 even if MXCSR does not have its initial value of 1F80H. In this case, XSAVEC sets XSTATE_BV[1] to 1 as long as RFBM[1] = 1.
Operation
RFBM XCR0 AND EDX:EAX; /* bitwise logical AND */
TO_BE_SAVED RFBM AND XINUSE; /* bitwise logical AND */
If MXCSR 1F80H AND RFBM[1]
Opcode /
Instruction
Op/
En
64/32 bit
Mode
Support
CPUID
Feature
Flag
Description
NP 0F C7 /4
XSAVEC mem
M V/V XSAVEC Save state components specified by EDX:EAX to mem with
compaction.
NP REX.W + 0F C7 /4
XSAVEC64 mem
M V/N.E. XSAVEC Save state components specified by EDX:EAX to mem with
compaction.
Op/En Operand 1 Operand 2 Operand 3 Operand 4
M ModRM:r/m (w) NA NA NA
1. There is an exception for state component 1 (SSE). MXCSR is part of SSE state, but XINUSE[1] may be 0 even if MXCSR does not
have its initial value of 1F80H. In this case, XSAVEC saves SSE state as long as RFBM[1] = 1.
2. Unlike XSAVE and XSAVEOPT, XSAVEC clears bits in the XSTATE_BV field that correspond to bits that are clear in RFBM.
3. There is an exception for state component 1 (SSE). MXCSR is part of SSE state, but XINUSE[1] may be 0 even if MXCSR does not
have its initial value of 1F80H. In this case, XSAVEC sets XSTATE_BV[1] to 1 as long as RFBM[1] = 1.
XSAVEC—Save Processor Extended States with Compaction
INSTRUCTION SET REFERENCE, V-Z
Vol. 2C 5-599
TO_BE_SAVED[1] = 1;
FI;
IF TO_BE_SAVED[0] = 1
THEN store x87 state into legacy region of XSAVE area;
FI;
IF TO_BE_SAVED[1] = 1
THEN store SSE state into legacy region of XSAVE area; // this step saves the XMM registers, MXCSR, and MXCSR_MASK
FI;
NEXT_FEATURE_OFFSET = 576; // Legacy area and XSAVE header consume 576 bytes
FOR i 2 TO 62
IF RFBM[i] = 1
THEN
IF TO_BE_SAVED[i]
THEN save XSAVE state component i at offset NEXT_FEATURE_OFFSET from base of XSAVE area;
FI;
NEXT_FEATURE_OFFSET = NEXT_FEATURE_OFFSET + n (n enumerated by CPUID(EAX=0DH,ECX=i):EAX);
FI;
ENDFOR;
XSTATE_BV field in XSAVE header TO_BE_SAVED;
XCOMP_BV field in XSAVE header RFBM OR 80000000_00000000H;
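Software that consumes a compacted XSAVE area must reproduce the NEXT_FEATURE_OFFSET walk shown above
to locate a state component. The following C sketch is editorial rather than part of the architectural
specification; it assumes a GCC/Clang environment in which <cpuid.h> provides __get_cpuid_count, and for
simplicity it ignores the optional 64-byte component alignment enumerated by CPUID.(EAX=0DH,ECX=i):ECX[1].

#include <stdint.h>
#include <cpuid.h>

/* Returns the byte offset of state component `comp` (2..62) within a
 * compacted XSAVE area whose XCOMP_BV[62:0] equals `rfbm`, or 0 if the
 * component is not present in the area. */
static uint32_t compacted_offset(uint64_t rfbm, unsigned comp)
{
    uint32_t next = 576;            /* legacy region + XSAVE header */
    for (unsigned i = 2; i <= 62; i++) {
        if (!(rfbm & (1ULL << i)))
            continue;               /* component absent: no space reserved */
        if (i == comp)
            return next;
        unsigned int eax, ebx, ecx, edx;
        __get_cpuid_count(0x0D, i, &eax, &ebx, &ecx, &edx);
        next += eax;                /* EAX = size of component i in bytes */
    }
    return 0;
}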
Flags Affected
None.
Intel C/C++ Compiler Intrinsic Equivalent
XSAVEC: void _xsavec( void * , unsigned __int64);
XSAVEC64: void _xsavec64( void * , unsigned __int64);
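As a usage illustration only (the buffer size and function name below are not taken from this manual), the
intrinsic can be invoked on a 64-byte-aligned, zero-initialized buffer; production code should size the buffer
from CPUID.(EAX=0DH,ECX=0):EBX and verify XSAVEC support first.

#include <stdlib.h>
#include <string.h>
#include <immintrin.h>   /* _xsavec; compile with -mxsavec on GCC/Clang */

int save_example(void)
{
    void *area;
    /* 4096 bytes is an illustrative upper bound, not an architectural size. */
    if (posix_memalign(&area, 64, 4096) != 0)
        return -1;
    memset(area, 0, 4096);   /* keep reserved XSAVE-header bytes zero */
    _xsavec(area, 7);        /* instruction mask 7: x87, SSE, and AVX state */
    free(area);
    return 0;
}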
Protected Mode Exceptions
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSAVEC[bit 1] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled, a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation. In all implementations where #AC is not signaled, a general
protection exception is signaled in its place. In addition, the width of the alignment check
may also vary with implementation. For instance, for a given implementation, an alignment
check exception might be signaled for a 2-byte misalignment, whereas a general protection
exception might be signaled for all other misalignments (4-, 8-, or 16-byte misalignments).
Real-Address Mode Exceptions
#GP If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If any part of the operand lies outside the effective address space from 0 to FFFFH.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSAVEC[bit 1] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#GP(0) If the memory address is in a non-canonical form.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSAVEC[bit 1] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled, a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation. In all implementations where #AC is not signaled, a general
protection exception is signaled in its place. In addition, the width of the alignment check
may also vary with implementation. For instance, for a given implementation, an alignment
check exception might be signaled for a 2-byte misalignment, whereas a general protection
exception might be signaled for all other misalignments (4-, 8-, or 16-byte misalignments).
XSAVEOPT—Save Processor Extended States Optimized
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
NP 0F AE /6 XSAVEOPT mem | M | V/V | XSAVEOPT | Save state components specified by EDX:EAX to mem, optimizing if possible.
NP REX.W + 0F AE /6 XSAVEOPT64 mem | M | V/V | XSAVEOPT | Save state components specified by EDX:EAX to mem, optimizing if possible.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (w) | NA | NA | NA
Description
Performs a full or partial save of processor state components to the XSAVE area located at the memory address
specified by the destination operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The
specific state components saved correspond to the bits set in the requested-feature bitmap (RFBM), which is the
logical-AND of EDX:EAX and XCR0.
The format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 1.
Section 13.9, “Operation of XSAVEOPT,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual,
Volume 1 provides a detailed description of the operation of the XSAVEOPT instruction. The following items provide
a high-level outline:
• Execution of XSAVEOPT is similar to that of XSAVE. XSAVEOPT differs from XSAVE in that it may use the init
and modified optimizations. The performance of XSAVEOPT will be equal to or better than that of XSAVE.
• XSAVEOPT saves state component i only if RFBM[i] = 1 and XINUSE[i] = 1.¹ (XINUSE is a bitmap by which the
processor tracks the status of various state components. See Section 13.6, “Processor Tracking of XSAVE-
Managed State.”) Even if both bits are 1, XSAVEOPT may optimize and not save state component i if (1) state
component i has not been modified since the last execution of XRSTOR or XRSTORS; and (2) this execution of
XSAVEOPT corresponds to that last execution of XRSTOR or XRSTORS as determined by the internal value
XRSTOR_INFO (see the Operation section below).
• XSAVEOPT does not modify bytes 511:464 of the legacy region of the XSAVE area (see Section 13.4.1, “Legacy
Region of an XSAVE Area”).
• XSAVEOPT reads the XSTATE_BV field of the XSAVE header (see Section 13.4.2, “XSAVE Header”) and writes a
modified value back to memory as follows. If RFBM[i] = 1, XSAVEOPT writes XSTATE_BV[i] with the value of
XINUSE[i]. If RFBM[i] = 0, XSAVEOPT writes XSTATE_BV[i] with the value that it read from memory (it does
not modify the bit). XSAVEOPT does not write to any part of the XSAVE header other than the XSTATE_BV field.
• XSAVEOPT always uses the standard format of the extended region of the XSAVE area (see Section 13.4.3,
“Extended Region of an XSAVE Area”).
Use of a destination operand not aligned to a 64-byte boundary (in either 64-bit or 32-bit modes) will result in a
general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.
See Section 13.6, “Processor Tracking of XSAVE-Managed State,” of Intel® 64 and IA-32 Architectures Software
Developer’s Manual, Volume 1 for discussion of the bitmap XMODIFIED and of the quantity XRSTOR_INFO.
NOTE:
1. There is an exception made for MXCSR and MXCSR_MASK, which belong to state component 1 (SSE). XSAVEOPT always saves
these to memory if RFBM[1] = 1 or RFBM[2] = 1, regardless of the value of XINUSE.
Operation
RFBM ← XCR0 AND EDX:EAX; /* bitwise logical AND */
OLD_BV ← XSTATE_BV field from XSAVE header;
TO_BE_SAVED ← RFBM AND XINUSE;
IF in VMX non-root operation
    THEN VMXNR ← 1;
    ELSE VMXNR ← 0;
FI;
LAXA ← linear address of XSAVE area;
IF XRSTOR_INFO = ⟨CPL, VMXNR, LAXA, 00000000_00000000H⟩
    THEN TO_BE_SAVED ← TO_BE_SAVED AND XMODIFIED;
FI;
IF TO_BE_SAVED[0] = 1
    THEN store x87 state into legacy region of XSAVE area;
FI;
IF TO_BE_SAVED[1]
    THEN store XMM registers into legacy region of XSAVE area; // this step does not save MXCSR or MXCSR_MASK
FI;
IF RFBM[1] = 1 or RFBM[2] = 1
    THEN store MXCSR and MXCSR_MASK into legacy region of XSAVE area;
FI;
FOR i ← 2 TO 62
    IF TO_BE_SAVED[i] = 1
        THEN save XSAVE state component i at offset n from base of XSAVE area (n enumerated by CPUID(EAX=0DH,ECX=i):EBX);
    FI;
ENDFOR;
XSTATE_BV field in XSAVE header ← (OLD_BV AND NOT RFBM) OR (XINUSE AND RFBM);
Flags Affected
None.
Intel C/C++ Compiler Intrinsic Equivalent
XSAVEOPT: void _xsaveopt( void * , unsigned __int64);
XSAVEOPT64: void _xsaveopt64( void * , unsigned __int64);
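Before executing XSAVEOPT, software should confirm the CPUID bit named in the #UD conditions below. A
minimal sketch, assuming a GCC/Clang toolchain with <cpuid.h>; software must additionally verify that the
operating system has set CR4.OSXSAVE (reported as CPUID.01H:ECX.OSXSAVE[bit 27]) before using any
XSAVE-family instruction.

#include <cpuid.h>

/* Returns nonzero if CPUID.(EAX=0DH,ECX=1):EAX.XSAVEOPT[bit 0] = 1. */
static int has_xsaveopt(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(0x0D, 1, &eax, &ebx, &ecx, &edx))
        return 0;             /* leaf 0DH not available */
    return (eax & 1) != 0;    /* bit 0 = XSAVEOPT */
}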
Protected Mode Exceptions
#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSAVEOPT[bit 0] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Real-Address Mode Exceptions
#GP If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If any part of the operand lies outside the effective address space from 0 to FFFFH.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSAVEOPT[bit 0] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#GP(0) If the memory address is in a non-canonical form.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSAVEOPT[bit 0] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
XSAVES—Save Processor Extended States Supervisor
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
NP 0F C7 /5 XSAVES mem | M | V/V | XSS | Save state components specified by EDX:EAX to mem with compaction, optimizing if possible.
NP REX.W + 0F C7 /5 XSAVES64 mem | M | V/N.E. | XSS | Save state components specified by EDX:EAX to mem with compaction, optimizing if possible.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (w) | NA | NA | NA
Description
Performs a full or partial save of processor state components to the XSAVE area located at the memory address
specified by the destination operand. The implicit EDX:EAX register pair specifies a 64-bit instruction mask. The
specific state components saved correspond to the bits set in the requested-feature bitmap (RFBM), the logical-
AND of EDX:EAX and the logical-OR of XCR0 with the IA32_XSS MSR. XSAVES may be executed only if CPL = 0.
The format of the XSAVE area is detailed in Section 13.4, “XSAVE Area,” of Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 1.
Section 13.11, “Operation of XSAVES,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual,
Volume 1 provides a detailed description of the operation of the XSAVES instruction. The following items provide
a high-level outline:
• Execution of XSAVES is similar to that of XSAVEC. XSAVES differs from XSAVEC in that it can save state
components corresponding to bits set in the IA32_XSS MSR and that it may use the modified optimization.
• XSAVES saves state component i only if RFBM[i] = 1 and XINUSE[i] = 1.¹ (XINUSE is a bitmap by which the
processor tracks the status of various state components. See Section 13.6, “Processor Tracking of XSAVE-
Managed State.”) Even if both bits are 1, XSAVES may optimize and not save state component i if (1) state
component i has not been modified since the last execution of XRSTOR or XRSTORS; and (2) this execution of
XSAVES corresponds to that last execution of XRSTOR or XRSTORS as determined by XRSTOR_INFO (see the
Operation section below).
• XSAVES does not modify bytes 511:464 of the legacy region of the XSAVE area (see Section 13.4.1, “Legacy
Region of an XSAVE Area”).
• XSAVES writes the logical AND of RFBM and XINUSE to the XSTATE_BV field of the XSAVE header.² (See Section
13.4.2, “XSAVE Header.”) XSAVES sets bit 63 of the XCOMP_BV field and sets bits 62:0 of that field to
RFBM[62:0]. XSAVES does not write to any parts of the XSAVE header other than the XSTATE_BV and
XCOMP_BV fields.
• XSAVES always uses the compacted format of the extended region of the XSAVE area (see Section 13.4.3,
“Extended Region of an XSAVE Area”).
Use of a destination operand not aligned to a 64-byte boundary (in either 64-bit or 32-bit modes) results in a
general-protection (#GP) exception. In 64-bit mode, the upper 32 bits of RDX and RAX are ignored.
See Section 13.6, “Processor Tracking of XSAVE-Managed State,” of Intel® 64 and IA-32 Architectures Software
Developer’s Manual, Volume 1 for discussion of the bitmap XMODIFIED and of the quantity XRSTOR_INFO.
NOTES:
1. There is an exception for state component 1 (SSE). MXCSR is part of SSE state, but XINUSE[1] may be 0 even if MXCSR does not
have its initial value of 1F80H. In this case, the init optimization does not apply and XSAVES will save SSE state as long as RFBM[1] =
1 and the modified optimization is not being applied.
2. There is an exception for state component 1 (SSE). MXCSR is part of SSE state, but XINUSE[1] may be 0 even if MXCSR does not
have its initial value of 1F80H. In this case, XSAVES sets XSTATE_BV[1] to 1 as long as RFBM[1] = 1.
Operation
RFBM ← (XCR0 OR IA32_XSS) AND EDX:EAX; /* bitwise logical OR and AND */
IF in VMX non-root operation
    THEN VMXNR ← 1;
    ELSE VMXNR ← 0;
FI;
LAXA ← linear address of XSAVE area;
COMPMASK ← RFBM OR 80000000_00000000H;
TO_BE_SAVED ← RFBM AND XINUSE;
IF XRSTOR_INFO = ⟨CPL, VMXNR, LAXA, COMPMASK⟩
    THEN TO_BE_SAVED ← TO_BE_SAVED AND XMODIFIED;
FI;
IF MXCSR ≠ 1F80H AND RFBM[1]
    TO_BE_SAVED[1] = 1;
FI;
IF TO_BE_SAVED[0] = 1
    THEN store x87 state into legacy region of XSAVE area;
FI;
IF TO_BE_SAVED[1] = 1
    THEN store SSE state into legacy region of XSAVE area; // this step saves the XMM registers, MXCSR, and MXCSR_MASK
FI;
NEXT_FEATURE_OFFSET = 576; // Legacy area and XSAVE header consume 576 bytes
FOR i ← 2 TO 62
    IF RFBM[i] = 1
        THEN
            IF TO_BE_SAVED[i]
                THEN
                    save XSAVE state component i at offset NEXT_FEATURE_OFFSET from base of XSAVE area;
                    IF i = 8 // state component 8 is for PT state
                        THEN IA32_RTIT_CTL.TraceEn[bit 0] ← 0;
                    FI;
            FI;
            NEXT_FEATURE_OFFSET = NEXT_FEATURE_OFFSET + n (n enumerated by CPUID(EAX=0DH,ECX=i):EAX);
    FI;
ENDFOR;
XSTATE_BV field in XSAVE header ← TO_BE_SAVED;
XCOMP_BV field in XSAVE header ← COMPMASK;
Flags Affected
None.
Intel C/C++ Compiler Intrinsic Equivalent
XSAVES: void _xsaves( void * , unsigned __int64);
XSAVES64: void _xsaves64( void * , unsigned __int64);
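Because XSAVES executes only at CPL 0, any C-level illustration is necessarily kernel-style. The sketch below
shows only how RFBM is formed per the Operation section; rdmsr and read_xcr0 are hypothetical helpers (real
kernels supply their own), and IA32_XSS is MSR address 0DA0H.

#include <stdint.h>

#define MSR_IA32_XSS 0xDA0            /* IA32_XSS MSR address */

extern uint64_t rdmsr(uint32_t msr);  /* hypothetical MSR-read helper */
extern uint64_t read_xcr0(void);      /* hypothetical XGETBV(0) wrapper */

/* RFBM = (XCR0 OR IA32_XSS) AND EDX:EAX, as in the Operation section. */
static uint64_t xsaves_rfbm(uint64_t edx_eax_mask)
{
    return (read_xcr0() | rdmsr(MSR_IA32_XSS)) & edx_eax_mask;
}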
Protected Mode Exceptions
#GP(0) If CPL > 0.
If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
#SS(0) If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSS[bit 3] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled, a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation. In all implementations where #AC is not signaled, a general
protection exception is signaled in its place. In addition, the width of the alignment check
may also vary with implementation. For instance, for a given implementation, an alignment
check exception might be signaled for a 2-byte misalignment, whereas a general protection
exception might be signaled for all other misalignments (4-, 8-, or 16-byte misalignments).
Real-Address Mode Exceptions
#GP If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
If any part of the operand lies outside the effective address space from 0 to FFFFH.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSS[bit 3] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
Same exceptions as in protected mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
#GP(0) If CPL > 0.
If the memory address is in a non-canonical form.
If a memory operand is not aligned on a 64-byte boundary, regardless of segment.
#SS(0) If a memory address referencing the SS segment is in a non-canonical form.
#PF(fault-code) If a page fault occurs.
#NM If CR0.TS[bit 3] = 1.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0 or CPUID.(EAX=0DH,ECX=1):EAX.XSS[bit 3] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
#AC If this exception is disabled, a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 64-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation. In all implementations where #AC is not signaled, a general
protection exception is signaled in its place. In addition, the width of the alignment check
may also vary with implementation. For instance, for a given implementation, an alignment
check exception might be signaled for a 2-byte misalignment, whereas a general protection
exception might be signaled for all other misalignments (4-, 8-, or 16-byte misalignments).
XSETBV—Set Extended Control Register
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
NP 0F 01 D1 | XSETBV | ZO | Valid | Valid | Write the value in EDX:EAX to the XCR specified by ECX.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
ZO | NA | NA | NA | NA
Description
Writes the contents of registers EDX:EAX into the 64-bit extended control register (XCR) specified in the ECX
register. (On processors that support the Intel 64 architecture, the high-order 32 bits of RCX are ignored.) The
contents of the EDX register are copied to the high-order 32 bits of the selected XCR and the contents of the EAX
register are copied to the low-order 32 bits of the XCR. (On processors that support the Intel 64 architecture, the
high-order 32 bits of each of RAX and RDX are ignored.) Undefined or reserved bits in an XCR should be set to
values previously read.
This instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection
exception #GP(0) is generated. Specifying a reserved or unimplemented XCR in ECX will also cause a general
protection exception. The processor will also generate a general protection exception if software attempts to write
to reserved bits in an XCR.
Currently, only XCR0 is supported. Thus, all other values of ECX are reserved and will cause a #GP(0). Note that
bit 0 of XCR0 (corresponding to x87 state) must be set to 1; the instruction will cause a #GP(0) if an attempt is
made to clear this bit. In addition, the instruction causes a #GP(0) if an attempt is made to set XCR0[2] (AVX state)
while clearing XCR0[1] (SSE state); it is necessary to set both bits to use AVX instructions; see Section 13.3,
“Enabling the XSAVE Feature Set and XSAVE-Enabled Features,” of Intel® 64 and IA-32 Architectures Software
Developer’s Manual, Volume 1.
Operation
XCR[ECX] ← EDX:EAX;
Flags Affected
None.
Intel C/C++ Compiler Intrinsic Equivalent
XSETBV: void _xsetbv( unsigned int, unsigned __int64);
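A ring-0 sketch of the XCR0[2:1] rule described above, using the _xgetbv/_xsetbv intrinsics; this is
illustrative only and must run at CPL 0 with CR4.OSXSAVE = 1.

#include <stdint.h>
#include <immintrin.h>   /* _xgetbv/_xsetbv; compile with -mxsave */

static void enable_avx_state(void)
{
    uint64_t xcr0 = _xgetbv(0);   /* read current XCR0 */
    xcr0 |= 0x7;                  /* bit 0 x87, bit 1 SSE, bit 2 AVX: set together */
    _xsetbv(0, xcr0);             /* would #GP(0) if XCR0[2:1] were set to 10b */
}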
Protected Mode Exceptions
#GP(0) If the current privilege level is not 0.
If an invalid XCR is specified in ECX.
If the value in EDX:EAX sets bits that are reserved in the XCR specified by ECX.
If an attempt is made to clear bit 0 of XCR0.
If an attempt is made to set XCR0[2:1] to 10b.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Real-Address Mode Exceptions
#GP If an invalid XCR is specified in ECX.
If the value in EDX:EAX sets bits that are reserved in the XCR specified by ECX.
If an attempt is made to clear bit 0 of XCR0.
If an attempt is made to set XCR0[2:1] to 10b.
#UD If CPUID.01H:ECX.XSAVE[bit 26] = 0.
If CR4.OSXSAVE[bit 18] = 0.
If the LOCK prefix is used.
Virtual-8086 Mode Exceptions
#GP(0) The XSETBV instruction is not recognized in virtual-8086 mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
Same exceptions as in protected mode.
XTEST — Test If In Transactional Execution
Opcode/Instruction | Op/En | 64/32bit Mode Support | CPUID Feature Flag | Description
NP 0F 01 D6 XTEST | A | V/V | HLE or RTM | Test if executing in a transactional region.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | NA | NA | NA | NA
Description
The XTEST instruction queries the transactional execution status. If the instruction executes inside a transaction-
ally executing RTM region or a transactionally executing HLE region, then the ZF flag is cleared; otherwise, it is set.
Operation
XTEST
IF (RTM_ACTIVE = 1 OR HLE_ACTIVE = 1)
    THEN ZF ← 0
    ELSE ZF ← 1
FI;
Flags Affected
The ZF flag is cleared if the instruction is executed transactionally; otherwise, it is set to 1. The CF, OF, SF, PF, and
AF flags are cleared.
Intel C/C++ Compiler Intrinsic Equivalent
XTEST: int _xtest( void );
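A minimal self-contained sketch (assuming RTM hardware and a compiler accepting -mrtm) of the behavior the
intrinsic exposes: _xtest returns nonzero inside a transactional region and zero outside. I/O is kept outside
the region, since a system call inside a transaction would typically abort it.

#include <stdio.h>
#include <immintrin.h>

int main(void)
{
    printf("outside transaction: _xtest() = %d\n", _xtest());  /* prints 0 */
    if (_xbegin() == _XBEGIN_STARTED) {
        int in_tx = _xtest();   /* nonzero: ZF is clear inside the RTM region */
        _xend();
        printf("inside transaction: _xtest() = %d\n", in_tx);
    } else {
        printf("transaction aborted\n");
    }
    return 0;
}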
SIMD Floating-Point Exceptions
None
Other Exceptions
#UD If CPUID.(EAX=7, ECX=0):EBX.HLE[bit 4] = 0 and CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11] = 0.
If the LOCK prefix is used.