ARM® Compiler armasm User Guide DUI0801G

Page Count: 1721


ARM® Compiler
Version 6.6

armasm User Guide

Copyright © 2014-2016 ARM Limited or its affiliates. All rights reserved.
Release Information

Document History

Issue  Date              Confidentiality   Change
       14 March 2014     Non-Confidential  ARM Compiler v6.00 Release
       15 December 2014  Non-Confidential  ARM Compiler v6.01 Release
       30 June 2015      Non-Confidential  ARM Compiler v6.02 Release
       18 November 2015  Non-Confidential  ARM Compiler v6.3 Release
       24 February 2016  Non-Confidential  ARM Compiler v6.4 Release
       29 June 2016      Non-Confidential  ARM Compiler v6.5 Release
       04 November 2016  Non-Confidential  ARM Compiler v6.6 Release

Non-Confidential Proprietary Notice
This document is protected by copyright and other related rights and the practice or implementation of the information contained in
this document may be protected by one or more patents or pending patent applications. No part of this document may be
reproduced in any form by any means without the express prior written permission of ARM. No license, express or implied, by
estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.
Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE
WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, ARM makes no representation with respect to, and has
undertaken no analysis to identify or understand the scope and content of, third party patents, copyrights, trade secrets, or other
rights.
This document may include technical inaccuracies or typographical errors.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure of
this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof is
not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to ARM’s customers is
not intended to create or refer to any partnership relationship with any other company. ARM may make changes to this document at
any time and without notice.
If any of the provisions contained in these terms conflict with any of the provisions of any signed written agreement covering this
document with ARM, then the signed written agreement prevails over and supersedes the conflicting provisions of these terms.
This document may be translated into other languages for convenience, and you agree that if there is any conflict between the
English version of this document and any translation, the terms of the English version of the Agreement shall prevail.
Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM Limited or its affiliates in the EU and/or
elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of their respective
owners. Please follow ARM’s trademark usage guidelines at http://www.arm.com/about/trademark-usage-guidelines.php
Copyright © 2014-2016, ARM Limited or its affiliates. All rights reserved.

ARM DUI0801G

ARM Limited. Company 02557590 registered in England.
110 Fulbourn Road, Cambridge, England CB1 9NJ.
LES-PRE-20349
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by ARM and the party that ARM delivered this document to.
Unrestricted Access is an ARM internal classification.
Product Status
The information in this document is Final, that is for a developed product.
Web Address
http://www.arm.com

Contents
ARM® Compiler armasm User Guide

Preface
About this book ........ 43

Chapter 1 Overview of the Assembler
1.1 About the ARM Compiler toolchain assemblers ........ 1-47
1.2 Key features of the assembler ........ 1-48
1.3 How the assembler works ........ 1-49
1.4 Directives that can be omitted in pass 2 of the assembler ........ 1-51
1.5 Support level definitions ........ 1-53

Chapter 2 Overview of the ARM Architecture
2.1 About the ARM architecture ........ 2-57
2.2 A32 and T32 instruction sets ........ 2-58
2.3 A64 instruction set ........ 2-59
2.4 Changing between AArch64 and AArch32 states ........ 2-60
2.5 Advanced SIMD ........ 2-61
2.6 Floating-point hardware ........ 2-62

Chapter 3 Overview of AArch32 state
3.1 Changing between A32 and T32 instruction set states ........ 3-64
3.2 Processor modes, and privileged and unprivileged software execution ........ 3-65
3.3 Processor modes in ARMv6-M, ARMv7-M, and ARMv8-M ........ 3-66
3.4 Registers in AArch32 state ........ 3-67
3.5 General-purpose registers in AArch32 state ........ 3-69
3.6
3.7 Predeclared core register names in AArch32 state ........ 3-71
3.8 Predeclared extension register names in AArch32 state ........ 3-72
3.9 Program Counter in AArch32 state ........ 3-73
3.10 The Q flag in AArch32 state ........ 3-74
3.11 Application Program Status Register ........ 3-75
3.12 Current Program Status Register in AArch32 state ........ 3-76
3.13 Saved Program Status Registers in AArch32 state ........ 3-77
3.14 A32 and T32 instruction set overview ........ 3-78
3.15 Access to the inline barrel shifter in AArch32 state ........ 3-79

Chapter 4 Overview of AArch64 state
4.1 Registers in AArch64 state ........ 4-81
4.2 Exception levels ........ 4-82
4.3 Link registers ........ 4-83
4.4 Stack Pointer register ........ 4-84
4.5 Predeclared core register names in AArch64 state ........ 4-85
4.6 Predeclared extension register names in AArch64 state ........ 4-86
4.7 Program Counter in AArch64 state ........ 4-87
4.8 Conditional execution in AArch64 state ........ 4-88
4.9 The Q flag in AArch64 state ........ 4-89
4.10 Process State ........ 4-90
4.11 Saved Program Status Registers in AArch64 state ........ 4-91
4.12 A64 instruction set overview ........ 4-92

Chapter 5 Structure of Assembly Language Modules
5.1 Syntax of source lines in assembly language ........ 5-94
5.2 Literals ........ 5-96
5.3 ELF sections and the AREA directive ........ 5-97
5.4 An example ARM assembly language module ........ 5-98

Chapter 6 Writing A32/T32 Assembly Language
6.1 About the Unified Assembler Language ........ 6-102
6.2 Syntax differences between UAL and A64 assembly language ........ 6-103
6.3 Register usage in subroutine calls ........ 6-104
6.4 Load immediate values ........ 6-105
6.5 Load immediate values using MOV and MVN ........ 6-106
6.6 Load immediate values using MOV32 ........ 6-109
6.7 Load immediate values using LDR Rd, =const ........ 6-110
6.8 Literal pools ........ 6-111
6.9 Load addresses into registers ........ 6-113
6.10 Load addresses to a register using ADR ........ 6-114
6.11 Load addresses to a register using ADRL ........ 6-116
6.12 Load addresses to a register using LDR Rd, =label ........ 6-117
6.13 Other ways to load and store registers ........ 6-119
6.14 Load and store multiple register instructions ........ 6-120
6.15 Load and store multiple register instructions in A32 and T32 ........ 6-121
6.16 Stack implementation using LDM and STM ........ 6-122
6.17 Stack operations for nested subroutines ........ 6-124
6.18 Block copy with LDM and STM ........ 6-125
6.19 Memory accesses ........ 6-127
6.20 The Read-Modify-Write operation ........ 6-128
6.21 Optional hash with immediate constants ........ 6-129
6.22 Use of macros ........ 6-130
6.23 Test-and-branch macro example ........ 6-131
6.24 Unsigned integer division macro example ........ 6-132
6.25 Instruction and directive relocations ........ 6-134
6.26 Symbol versions ........ 6-136
6.27 Frame directives ........ 6-137
6.28 Exception tables and Unwind tables ........ 6-138

Chapter 7 Condition Codes
7.1 Conditional instructions ........ 7-140
7.2 Conditional execution in A32 code ........ 7-141
7.3 Conditional execution in T32 code ........ 7-142
7.4 Conditional execution in A64 code ........ 7-143
7.5 Condition flags ........ 7-144
7.6 Updates to the condition flags in A32/T32 code ........ 7-145
7.7 Updates to the condition flags in A64 code ........ 7-146
7.8 Floating-point instructions that update the condition flags ........ 7-147
7.9 Carry flag ........ 7-148
7.10 Overflow flag ........ 7-149
7.11 Condition code suffixes ........ 7-150
7.12 Condition code suffixes and related flags ........ 7-151
7.13 Comparison of condition code meanings in integer and floating-point code ........ 7-152
7.14 Benefits of using conditional execution in A32 and T32 code ........ 7-154
7.15 Example showing the benefits of conditional instructions in A32 and T32 code ........ 7-155
7.16 Optimization for execution speed ........ 7-158

Chapter 8 Using armasm
8.1 armasm command-line syntax ........ 8-160
8.2 Specify command-line options with an environment variable ........ 8-161
8.3 Using stdin to input source code to the assembler ........ 8-162
8.4 Built-in variables and constants ........ 8-163
8.5 Identifying versions of armasm in source code ........ 8-167
8.6 Diagnostic messages ........ 8-168
8.7 Interlocks diagnostics ........ 8-169
8.8 Automatic IT block generation in T32 code ........ 8-170
8.9 T32 branch target alignment ........ 8-171
8.10 T32 code size diagnostics ........ 8-172
8.11 A32 and T32 instruction portability diagnostics ........ 8-173
8.12 T32 instruction width diagnostics ........ 8-174
8.13 Two pass assembler diagnostics ........ 8-175
8.14 Using the C preprocessor ........ 8-176
8.15 Address alignment in A32/T32 code ........ 8-178
8.16 Address alignment in A64 code ........ 8-179
8.17 Instruction width selection in T32 code ........ 8-180

Chapter 9 Advanced SIMD Programming
9.1 Architecture support for Advanced SIMD ........ 9-182
9.2 Extension register bank mapping for Advanced SIMD in AArch32 state ........ 9-183
9.3 Extension register bank mapping for Advanced SIMD in AArch64 state ........ 9-185
9.4 Views of the Advanced SIMD register bank in AArch32 state ........ 9-187
9.5 Views of the Advanced SIMD register bank in AArch64 state ........ 9-188
9.6 Differences between A32/T32 and A64 Advanced SIMD instruction syntax ........ 9-189
9.7 Load values to Advanced SIMD registers ........ 9-191
9.8 Conditional execution of A32/T32 Advanced SIMD instructions ........ 9-192
9.9 Floating-point exceptions for Advanced SIMD in A32/T32 instructions ........ 9-193
9.10 Advanced SIMD data types in A32/T32 instructions ........ 9-194
9.11 Polynomial arithmetic over {0,1} ........ 9-195
9.12 Advanced SIMD vectors ........ 9-196
9.13 Normal, long, wide, and narrow Advanced SIMD instructions ........ 9-197
9.14 Saturating Advanced SIMD instructions ........ 9-198
9.15 Advanced SIMD scalars ........ 9-199
9.16 Extended notation extension for Advanced SIMD in A32/T32 code ........ 9-200
9.17 Advanced SIMD system registers in AArch32 state ........ 9-201
9.18 Flush-to-zero mode in Advanced SIMD ........ 9-202
9.19 When to use flush-to-zero mode in Advanced SIMD ........ 9-203
9.20 The effects of using flush-to-zero mode in Advanced SIMD ........ 9-204
9.21 Advanced SIMD operations not affected by flush-to-zero mode ........ 9-205

Chapter 10 Floating-point Programming
10.1 Architecture support for floating-point ........ 10-207
10.2 Extension register bank mapping for floating-point in AArch32 state ........ 10-208
10.3 Extension register bank mapping in AArch64 state ........ 10-210
10.4 Views of the floating-point extension register bank in AArch32 state ........ 10-211
10.5 Views of the floating-point extension register bank in AArch64 state ........ 10-212
10.6 Differences between A32/T32 and A64 floating-point instruction syntax ........ 10-213
10.7 Load values to floating-point registers ........ 10-214
10.8 Conditional execution of A32/T32 floating-point instructions ........ 10-215
10.9 Floating-point exceptions for floating-point in A32/T32 instructions ........ 10-216
10.10 Floating-point data types in A32/T32 instructions ........ 10-217
10.11 Extended notation extension for floating-point in A32/T32 code ........ 10-218
10.12 Floating-point system registers in AArch32 state ........ 10-219
10.13 Flush-to-zero mode in floating-point ........ 10-220
10.14 When to use flush-to-zero mode in floating-point ........ 10-221
10.15 The effects of using flush-to-zero mode in floating-point ........ 10-222
10.16 Floating-point operations not affected by flush-to-zero mode ........ 10-223

Chapter 11 armasm Command-line Options
11.1 --16 ........ 11-226
11.2 --32 ........ 11-227
11.3 --apcs=qualifier…qualifier ........ 11-228
11.4 --arm ........ 11-230
11.5 --arm_only ........ 11-231
11.6 --bi ........ 11-232
11.7 --bigend ........ 11-233
11.8 --brief_diagnostics, --no_brief_diagnostics ........ 11-234
11.9 --checkreglist ........ 11-235
11.10 --cpreproc ........ 11-236
11.11 --cpreproc_opts=option[,option,…] ........ 11-237
11.12 --cpu=list ........ 11-238
11.13 --cpu=name ........ 11-239
11.14 --debug ........ 11-242
11.15 --depend=dependfile ........ 11-243
11.16 --depend_format=string ........ 11-244
11.17 --diag_error=tag[,tag,…] ........ 11-245
11.18 --diag_remark=tag[,tag,…] ........ 11-246
11.19 --diag_style={arm|ide|gnu} ........ 11-247
11.20 --diag_suppress=tag[,tag,…] ........ 11-248
11.21 --diag_warning=tag[,tag,…] ........ 11-249
11.22 --dllexport_all ........ 11-250
11.23 --dwarf2 ........ 11-251
11.24 --dwarf3 ........ 11-252
11.25 --errors=errorfile ........ 11-253
11.26 --exceptions, --no_exceptions ........ 11-254
11.27 --exceptions_unwind, --no_exceptions_unwind ........ 11-255
11.28 --execstack, --no_execstack ........ 11-256
11.29 --execute_only ........ 11-257
11.30 --fpmode=model ........ 11-258
11.31 --fpu=list ........ 11-259
11.32 --fpu=name ........ 11-260
11.33 -g ........ 11-261
11.34 --help ........ 11-262
11.35 -idir[,dir, …] ........ 11-263
11.36 --keep ........ 11-264
11.37 --length=n ........ 11-265
11.38 --li ........ 11-266
11.39 --library_type=lib ........ 11-267
11.40 --list=file ........ 11-268
11.41 --list= ........ 11-269
11.42 --littleend ........ 11-270
11.43 -m ........ 11-271
11.44 --maxcache=n ........ 11-272
11.45 --md ........ 11-273
11.46 --no_code_gen ........ 11-274
11.47 --no_esc ........ 11-275
11.48 --no_hide_all ........ 11-276
11.49 --no_regs ........ 11-277
11.50 --no_terse ........ 11-278
11.51 --no_warn ........ 11-279
11.52 -o filename ........ 11-280
11.53 --pd ........ 11-281
11.54 --predefine "directive" ........ 11-282
11.55 --reduce_paths, --no_reduce_paths ........ 11-283
11.56 --regnames ........ 11-284
11.57 --report-if-not-wysiwyg ........ 11-285
11.58 --show_cmdline ........ 11-286
11.59 --thumb ........ 11-287
11.60 --unaligned_access, --no_unaligned_access ........ 11-288
11.61 --unsafe ........ 11-289
11.62 --untyped_local_labels ........ 11-290
11.63 --version_number ........ 11-291
11.64 --via=filename ........ 11-292
11.65 --vsn ........ 11-293
11.66 --width=n ........ 11-294
11.67 --xref ........ 11-295

Chapter 12 Symbols, Literals, Expressions, and Operators
12.1 Symbol naming rules ........ 12-298
12.2 Variables ........ 12-299
12.3 Numeric constants ........ 12-300
12.4 Assembly time substitution of variables ........ 12-301
12.5 Register-relative and PC-relative expressions ........ 12-302
12.6 Labels ........ 12-303
12.7 Labels for PC-relative addresses ........ 12-304
12.8 Labels for register-relative addresses ........ 12-305
12.9 Labels for absolute addresses ........ 12-306
12.10 Numeric local labels ........ 12-307
12.11 Syntax of numeric local labels ........ 12-308
12.12 String expressions ........ 12-309
12.13 String literals ........ 12-310
12.14 Numeric expressions ........ 12-311
12.15 Syntax of numeric literals ........ 12-312
12.16 Syntax of floating-point literals ........ 12-313
12.17 Logical expressions ........ 12-314
12.18 Logical literals ........ 12-315
12.19 Unary operators ........ 12-316
12.20 Binary operators ........ 12-317
12.21 Multiplicative operators ........ 12-318
12.22 String manipulation operators ........ 12-319
12.23 Shift operators ........ 12-320
12.24 Addition, subtraction, and logical operators ........ 12-321
12.25 Relational operators ........ 12-322
12.26 Boolean operators ........ 12-323
12.27 Operator precedence ........ 12-324
12.28 Difference between operator precedence in assembly language and C ........ 12-325

Chapter 13 A32 and T32 Instructions
13.1 A32 and T32 instruction summary ........ 13-332
13.2 Instruction width specifiers ........ 13-337
13.3 Flexible second operand (Operand2) ........ 13-338
13.4 Syntax of Operand2 as a constant ........ 13-339
Syntax of Operand2 as a register with optional shift ..................... ..................... 13-340
Shift operations .................................................................................................... 13-341
Saturating instructions ............................................ ............................................ 13-344
ADC .......................................................... .......................................................... 13-345
ADD .......................................................... .......................................................... 13-347
ADR (PC-relative) ................................................................................................ 13-349
ADR (register-relative) ............................................ ............................................ 13-351

13.12 ADRL pseudo-instruction .......... 13-353
13.13 AND .......... 13-355
13.14 ASR .......... 13-357
13.15 B .......... 13-359
13.16 BFC .......... 13-361
13.17 BFI .......... 13-362
13.18 BIC .......... 13-363
13.19 BKPT .......... 13-365
13.20 BL .......... 13-366
13.21 BLX, BLXNS .......... 13-368
13.22 BX, BXNS .......... 13-370
13.23 BXJ .......... 13-372
13.24 CBZ and CBNZ .......... 13-373
13.25 CDP and CDP2 .......... 13-374
13.26 CLREX .......... 13-375
13.27 CLZ .......... 13-376
13.28 CMP and CMN .......... 13-377
13.29 CPS .......... 13-379
13.30 CPY pseudo-instruction .......... 13-381
13.31 CRC32 .......... 13-382
13.32 CRC32C .......... 13-383
13.33 DBG .......... 13-384
13.34 DCPS1 (T32 instruction) .......... 13-385
13.35 DCPS2 (T32 instruction) .......... 13-386
13.36 DCPS3 (T32 instruction) .......... 13-387
13.37 DMB .......... 13-388
13.38 DSB .......... 13-390
13.39 EOR .......... 13-392
13.40 ERET .......... 13-394
13.41 ESB .......... 13-395
13.42 HLT .......... 13-396
13.43 HVC .......... 13-397
13.44 ISB .......... 13-398
13.45 IT .......... 13-399
13.46 LDA .......... 13-402
13.47 LDAEX .......... 13-403
13.48 LDC and LDC2 .......... 13-405
13.49 LDM .......... 13-407
13.50 LDR (immediate offset) .......... 13-409
13.51 LDR (PC-relative) .......... 13-411
13.52 LDR (register offset) .......... 13-413
13.53 LDR (register-relative) .......... 13-415
13.54 LDR pseudo-instruction .......... 13-417
13.55 LDR, unprivileged .......... 13-419
13.56 LDREX .......... 13-421
13.57 LSL .......... 13-423
13.58 LSR .......... 13-425
13.59 MCR and MCR2 .......... 13-427
13.60 MCRR and MCRR2 .......... 13-428
13.61 MLA .......... 13-429
13.62 MLS .......... 13-430
13.63 MOV .......... 13-431
13.64 MOV32 pseudo-instruction .......... 13-433
13.65 MOVT .......... 13-434
13.66 MRC and MRC2 .......... 13-435
13.67 MRRC and MRRC2 .......... 13-436
13.68 MRS (PSR to general-purpose register) .......... 13-437
13.69 MRS (system coprocessor register to ARM register) .......... 13-439
13.70 MSR (ARM register to system coprocessor register) .......... 13-440
13.71 MSR (general-purpose register to PSR) .......... 13-441
13.72 MUL .......... 13-443
13.73 MVN .......... 13-444
13.74 NEG pseudo-instruction .......... 13-446
13.75 NOP .......... 13-447
13.76 ORN (T32 only) .......... 13-448
13.77 ORR .......... 13-449
13.78 PKHBT and PKHTB .......... 13-451
13.79 PLD, PLDW, and PLI .......... 13-453
13.80 POP .......... 13-455
13.81 PUSH .......... 13-456
13.82 QADD .......... 13-457
13.83 QADD8 .......... 13-458
13.84 QADD16 .......... 13-459
13.85 QASX .......... 13-460
13.86 QDADD .......... 13-461
13.87 QDSUB .......... 13-462
13.88 QSAX .......... 13-463
13.89 QSUB .......... 13-464
13.90 QSUB8 .......... 13-465
13.91 QSUB16 .......... 13-466
13.92 RBIT .......... 13-467
13.93 REV .......... 13-468
13.94 REV16 .......... 13-469
13.95 REVSH .......... 13-470
13.96 RFE .......... 13-471
13.97 ROR .......... 13-473
13.98 RRX .......... 13-475
13.99 RSB .......... 13-477
13.100 RSC .......... 13-479
13.101 SADD8 .......... 13-480
13.102 SADD16 .......... 13-481
13.103 SASX .......... 13-482
13.104 SBC .......... 13-483
13.105 SBFX .......... 13-485
13.106 SDIV .......... 13-486
13.107 SEL .......... 13-487
13.108 SETEND .......... 13-489
13.109 SETPAN .......... 13-490
13.110 SEV .......... 13-491
13.111 SEVL .......... 13-492
13.112 SG .......... 13-493
13.113 SHADD8 .......... 13-494
13.114 SHADD16 .......... 13-495
13.115 SHASX .......... 13-496
13.116 SHSAX .......... 13-497
13.117 SHSUB8 .......... 13-498
13.118 SHSUB16 .......... 13-499
13.119 SMC .......... 13-500
13.120 SMLAxy .......... 13-501
13.121 SMLAD .......... 13-503
13.122 SMLAL .......... 13-504
13.123 SMLALD .......... 13-505
13.124 SMLALxy .......... 13-506
13.125 SMLAWy .......... 13-507
13.126 SMLSD .......... 13-508
13.127 SMLSLD .......... 13-509
13.128 SMMLA .......... 13-510
13.129 SMMLS .......... 13-511
13.130 SMMUL .......... 13-512
13.131 SMUAD .......... 13-513
13.132 SMULxy .......... 13-514
13.133 SMULL .......... 13-515
13.134 SMULWy .......... 13-516
13.135 SMUSD .......... 13-517
13.136 SRS .......... 13-518
13.137 SSAT .......... 13-520
13.138 SSAT16 .......... 13-521
13.139 SSAX .......... 13-522
13.140 SSUB8 .......... 13-523
13.141 SSUB16 .......... 13-524
13.142 STC and STC2 .......... 13-525
13.143 STL .......... 13-527
13.144 STLEX .......... 13-528
13.145 STM .......... 13-530
13.146 STR (immediate offset) .......... 13-532
13.147 STR (register offset) .......... 13-534
13.148 STR, unprivileged .......... 13-536
13.149 STREX .......... 13-538
13.150 SUB .......... 13-540
13.151 SUBS pc, lr .......... 13-542
13.152 SVC .......... 13-544
13.153 SWP and SWPB .......... 13-545
13.154 SXTAB .......... 13-546
13.155 SXTAB16 .......... 13-547
13.156 SXTAH .......... 13-548
13.157 SXTB .......... 13-549
13.158 SXTB16 .......... 13-550
13.159 SXTH .......... 13-551
13.160 SYS .......... 13-553
13.161 TBB and TBH .......... 13-554
13.162 TEQ .......... 13-555
13.163 TST .......... 13-556
13.164 TT, TTT, TTA, TTAT .......... 13-557
13.165 UADD8 .......... 13-559
13.166 UADD16 .......... 13-560
13.167 UASX .......... 13-561
13.168 UBFX .......... 13-563
13.169 UDF .......... 13-564
13.170 UDIV .......... 13-565
13.171 UHADD8 .......... 13-566
13.172 UHADD16 .......... 13-567
13.173 UHASX .......... 13-568
13.174 UHSAX .......... 13-569
13.175 UHSUB8 .......... 13-570
13.176 UHSUB16 .......... 13-571
13.177 UMAAL .......... 13-572
13.178 UMLAL .......... 13-573
13.179 UMULL .......... 13-574
13.180 UND pseudo-instruction .......... 13-575
13.181 UQADD8 .......... 13-576
13.182 UQADD16 .......... 13-577
13.183 UQASX .......... 13-578
13.184 UQSAX .......... 13-579
13.185 UQSUB8 .......... 13-580
13.186 UQSUB16 .......... 13-581
13.187 USAD8 .......... 13-582
13.188 USADA8 .......... 13-583
13.189 USAT .......... 13-584
13.190 USAT16 .......... 13-585
13.191 USAX .......... 13-586
13.192 USUB8 .......... 13-588
13.193 USUB16 .......... 13-589
13.194 UXTAB .......... 13-590
13.195 UXTAB16 .......... 13-591
13.196 UXTAH .......... 13-593
13.197 UXTB .......... 13-594
13.198 UXTB16 .......... 13-595
13.199 UXTH .......... 13-596
13.200 WFE .......... 13-597
13.201 WFI .......... 13-598
13.202 YIELD .......... 13-599

Chapter 14

Advanced SIMD Instructions (32-bit)
14.1 Summary of Advanced SIMD instructions .......... 14-604
14.2 Summary of shared Advanced SIMD and floating-point instructions .......... 14-607
14.3 Cryptographic instructions .......... 14-608
14.4 Interleaving provided by load and store element and structure instructions .......... 14-609
14.5 Alignment restrictions in load and store element and structure instructions .......... 14-610
14.6 FLDMDBX, FLDMIAX .......... 14-611
14.7 FSTMDBX, FSTMIAX .......... 14-612
14.8 VABA and VABAL .......... 14-613

14.9
14.10
14.11
14.12
14.13
14.14
14.15
14.16
14.17
14.18
14.19
14.20
14.21
14.22
14.23
14.24
14.25
14.26
14.27
14.28
14.29
14.30
14.31
14.32
14.33
14.34
14.35
14.36
14.37
14.38
14.39
14.40
14.41
14.42
14.43
14.44
14.45
14.46
14.47
14.48
14.49
14.50
14.51
14.52
14.53
14.54
14.55
14.56
14.57

VABD and VABDL ............................................... ............................................... 14-614
VABS ......................................................... ......................................................... 14-615
VACLE, VACLT, VACGE and VACGT .................................................................. 14-616
VADD .......... 14-617
VADDHN .......... 14-618
VADDL and VADDW .......... 14-619
VAND (immediate) .......... 14-620
VAND (register) .......... 14-621
VBIC (immediate) .......... 14-622
VBIC (register) .......... 14-623
VBIF .......... 14-624
VBIT .......... 14-625
VBSL .......... 14-626
VCADD .......... 14-627
VCEQ (immediate #0) .......... 14-628
VCEQ (register) .......... 14-629
VCGE (immediate #0) .......... 14-630
VCGE (register) .......... 14-631
VCGT (immediate #0) .......... 14-632
VCGT (register) .......... 14-633
VCLE (immediate #0) .......... 14-634
VCLS .......... 14-635
VCLE (register) .......... 14-636
VCLT (immediate #0) .......... 14-637
VCLT (register) .......... 14-638
VCLZ .......... 14-639
VCMLA .......... 14-640
VCMLA (by element) .......... 14-641
VCNT .......... 14-642
VCVT (between fixed-point or integer, and floating-point) .......... 14-643
VCVT (between half-precision and single-precision floating-point) .......... 14-644
VCVT (from floating-point to integer with directed rounding modes) .......... 14-645
VCVTB, VCVTT (between half-precision and double-precision) .......... 14-646
VDUP .......... 14-647
VEOR .......... 14-648
VEXT .......... 14-649
VFMA, VFMS .......... 14-650
VHADD .......... 14-651
VHSUB .......... 14-652
VLDn (single n-element structure to one lane) .......... 14-653
VLDn (single n-element structure to all lanes) .......... 14-655
VLDn (multiple n-element structures) .......... 14-657
VLDM .......... 14-659
VLDR .......... 14-660
VLDR (post-increment and pre-decrement) .......... 14-661
VLDR pseudo-instruction .......... 14-662
VMAX and VMIN .......... 14-663
VMAXNM, VMINNM .......... 14-664
VMLA .......... 14-665

14.58 VMLA (by scalar) .......... 14-666
14.59 VMLAL (by scalar) .......... 14-667
14.60 VMLAL .......... 14-668
14.61 VMLS (by scalar) .......... 14-669
14.62 VMLS .......... 14-670
14.63 VMLSL .......... 14-671
14.64 VMLSL (by scalar) .......... 14-672
14.65 VMOV (immediate) .......... 14-673
14.66 VMOV (register) .......... 14-674
14.67 VMOV (between two ARM registers and a 64-bit extension register) .......... 14-675
14.68 VMOV (between an ARM register and an Advanced SIMD scalar) .......... 14-676
14.69 VMOVL .......... 14-677
14.70 VMOVN .......... 14-678
14.71 VMOV2 .......... 14-679
14.72 VMRS .......... 14-680
14.73 VMSR .......... 14-681
14.74 VMUL .......... 14-682
14.75 VMUL (by scalar) .......... 14-683
14.76 VMULL .......... 14-684
14.77 VMULL (by scalar) .......... 14-685
14.78 VMVN (register) .......... 14-686
14.79 VMVN (immediate) .......... 14-687
14.80 VNEG .......... 14-688
14.81 VORN (register) .......... 14-689
14.82 VORN (immediate) .......... 14-690
14.83 VORR (register) .......... 14-691
14.84 VORR (immediate) .......... 14-692
14.85 VPADAL .......... 14-693
14.86 VPADD .......... 14-694
14.87 VPADDL .......... 14-695
14.88 VPMAX and VPMIN .......... 14-696
14.89 VPOP .......... 14-697
14.90 VPUSH .......... 14-698
14.91 VQABS .......... 14-699
14.92 VQADD .......... 14-700
14.93 VQDMLAL and VQDMLSL (by vector or by scalar) .......... 14-701
14.94 VQDMULH (by vector or by scalar) .......... 14-702
14.95 VQDMULL (by vector or by scalar) .......... 14-703
14.96 VQMOVN and VQMOVUN .......... 14-704
14.97 VQNEG .......... 14-705
14.98 VQRDMULH (by vector or by scalar) .......... 14-706
14.99 VQRSHL (by signed variable) .......... 14-707
14.100 VQRSHRN and VQRSHRUN (by immediate) .......... 14-708
14.101 VQSHL (by signed variable) .......... 14-709
14.102 VQSHL and VQSHLU (by immediate) .......... 14-710
14.103 VQSHRN and VQSHRUN (by immediate) .......... 14-711
14.104 VQSUB .......... 14-712
14.105 VRADDHN .......... 14-713
14.106 VRECPE .......... 14-714
14.107 VRECPS .......... 14-715

14.108 VREV16, VREV32, and VREV64 .......... 14-716
14.109 VRHADD .......... 14-717
14.110 VRSHL (by signed variable) .......... 14-718
14.111 VRSHR (by immediate) .......... 14-719
14.112 VRSHRN (by immediate) .......... 14-720
14.113 VRINT .......... 14-721
14.114 VRSQRTE .......... 14-722
14.115 VRSQRTS .......... 14-723
14.116 VRSRA (by immediate) .......... 14-724
14.117 VRSUBHN .......... 14-725
14.118 VSHL (by immediate) .......... 14-726
14.119 VSHL (by signed variable) .......... 14-727
14.120 VSHLL (by immediate) .......... 14-728
14.121 VSHR (by immediate) .......... 14-729
14.122 VSHRN (by immediate) .......... 14-730
14.123 VSLI .......... 14-731
14.124 VSRA (by immediate) .......... 14-732
14.125 VSRI .......... 14-733
14.126 VSTM .......... 14-734
14.127 VSTn (multiple n-element structures) .......... 14-735
14.128 VSTn (single n-element structure to one lane) .......... 14-737
14.129 VSTR .......... 14-739
14.130 VSTR (post-increment and pre-decrement) .......... 14-740
14.131 VSUB .......... 14-741
14.132 VSUBHN .......... 14-742
14.133 VSUBL and VSUBW .......... 14-743
14.134 VSWP .......... 14-744
14.135 VTBL and VTBX .......... 14-745
14.136 VTRN .......... 14-746
14.137 VTST .......... 14-747
14.138 VUZP .......... 14-748
14.139 VZIP .......... 14-749

Chapter 15

Floating-point Instructions (32-bit)
15.1 Summary of floating-point instructions .......... 15-752
15.2 VABS (floating-point) .......... 15-754
15.3 VADD (floating-point) .......... 15-755
15.4 VCMP, VCMPE .......... 15-756
15.5 VCVT (between single-precision and double-precision) .......... 15-757
15.6 VCVT (between floating-point and integer) .......... 15-758
15.7 VCVT (from floating-point to integer with directed rounding modes) .......... 15-759
15.8 VCVT (between floating-point and fixed-point) .......... 15-760
15.9 VCVTB, VCVTT (half-precision extension) .......... 15-761
15.10 VCVTB, VCVTT (between half-precision and double-precision) .......... 15-762
15.11 VDIV .......... 15-763
15.12 VFMA, VFMS, VFNMA, VFNMS (floating-point) .......... 15-764
15.13 VJCVT .......... 15-765
15.14 VLDM (floating-point) .......... 15-766
15.15 VLDR (floating-point) .......... 15-767
15.16 VLDR (post-increment and pre-decrement, floating-point) .......... 15-768
15.17 VLDR pseudo-instruction (floating-point) .......... 15-769
15.18 VMAXNM, VMINNM (floating-point) .......... 15-770
15.19 VMLA (floating-point) .......... 15-771
15.20 VMLS (floating-point) .......... 15-772
15.21 VMOV (floating-point) .......... 15-773
15.22 VMOV (between one ARM register and single precision floating-point register) .......... 15-774
15.23 VMOV (between two ARM registers and one or two extension registers) .......... 15-775
15.24 VMOV (between an ARM register and half a double precision floating-point register) .......... 15-776
15.25 VMRS (floating-point) .......... 15-777
15.26 VMSR (floating-point) .......... 15-778
15.27 VMUL (floating-point) .......... 15-779
15.28 VNEG (floating-point) .......... 15-780
15.29 VNMLA (floating-point) .......... 15-781
15.30 VNMLS (floating-point) .......... 15-782
15.31 VNMUL (floating-point) .......... 15-783
15.32 VPOP (floating-point) .......... 15-784
15.33 VPUSH (floating-point) .......... 15-785
15.34 VRINT (floating-point) .......... 15-786
15.35 VSEL .......... 15-787
15.36 VSQRT .......... 15-788
15.37 VSTM (floating-point) .......... 15-789
15.38 VSTR (floating-point) .......... 15-790
15.39 VSTR (post-increment and pre-decrement, floating-point) .......... 15-791
15.40 VSUB (floating-point) .......... 15-792

Chapter 16

A64 General Instructions
16.1 A64 instructions in alphabetical order .......... 16-797
16.2 Register restrictions for A64 instructions .......... 16-803
16.3 ADC .......... 16-804
16.4 ADCS .......... 16-805
16.5 ADD (extended register) .......... 16-806
16.6 ADD (immediate) .......... 16-808
16.7 ADD (shifted register) .......... 16-809
16.8 ADDS (extended register) .......... 16-810
16.9 ADDS (immediate) .......... 16-812
16.10 ADDS (shifted register) .......... 16-813
16.11 ADR .......... 16-814
16.12 ADRL pseudo-instruction .......... 16-815
16.13 ADRP .......... 16-816
16.14 AND (immediate) .......... 16-817
16.15 AND (shifted register) .......... 16-818
16.16 ANDS (immediate) .......... 16-819
16.17 ANDS (shifted register) .......... 16-820
16.18 ASR (register) .......... 16-821
16.19 ASR (immediate) .......... 16-822
16.20 ASRV .......... 16-823
16.21 AT .......... 16-824
16.22 AUTDA, AUTDZA .......... 16-825
16.23 AUTDB, AUTDZB .......... 16-826
16.24 AUTIA, AUTIZA, AUTIA1716, AUTIASP, AUTIAZ .......... 16-827
16.25 AUTIB, AUTIZB, AUTIB1716, AUTIBSP, AUTIBZ .......... 16-828
16.26 B.cond .......... 16-829
16.27 B .......... 16-830
16.28 BFC .......... 16-831
16.29 BFI .......... 16-832
16.30 BFM .......... 16-833
16.31 BFXIL .......... 16-834
16.32 BIC (shifted register) .......... 16-835
16.33 BICS (shifted register) .......... 16-836
16.34 BL .......... 16-837
16.35 BLR .......... 16-838
16.36 BLRAA, BLRAAZ, BLRAB, BLRABZ .......... 16-839
16.37 BR .......... 16-840
16.38 BRAA, BRAAZ, BRAB, BRABZ .......... 16-841
16.39 BRK .......... 16-842
16.40 CBNZ .......... 16-843
16.41 CBZ .......... 16-844
16.42 CCMN (immediate) .......... 16-845
16.43 CCMN (register) .......... 16-846
16.44 CCMP (immediate) .......... 16-847
16.45 CCMP (register) .......... 16-848
16.46 CINC .......... 16-849
16.47 CINV .......... 16-850
16.48 CLREX .......... 16-851
16.49 CLS .......... 16-852
16.50 CLZ .......... 16-853
16.51 CMN (extended register) .......... 16-854
16.52 CMN (immediate) .......... 16-856
16.53 CMN (shifted register) .......... 16-857
16.54 CMP (extended register) .......... 16-858
16.55 CMP (immediate) .......... 16-860
16.56 CMP (shifted register) .......... 16-861
16.57 CNEG .......... 16-862
16.58 CRC32B, CRC32H, CRC32W, CRC32X .......... 16-863
16.59 CRC32CB, CRC32CH, CRC32CW, CRC32CX .......... 16-864
16.60 CSEL .......... 16-865
16.61 CSET .......... 16-866
16.62 CSETM .......... 16-867
16.63 CSINC .......... 16-868
16.64 CSINV .......... 16-869
16.65 CSNEG .......... 16-870
16.66 DC .......... 16-871
16.67 DCPS1 .......... 16-872
16.68 DCPS2 .......... 16-873
16.69 DCPS3 .......... 16-874
16.70 DMB .......... 16-875
16.71 DRPS .......... 16-876
16.72 DSB .......... 16-877
16.73 EON (shifted register) .......... 16-878

16.74 EOR (immediate) .......... 16-879
16.75 EOR (shifted register) .......... 16-880
16.76 ERET .......... 16-881
16.77 ERETAA, ERETAB .......... 16-882
16.78 ESB .......... 16-883
16.79 EXTR .......... 16-884
16.80 HINT .......... 16-885
16.81 HLT .......... 16-886
16.82 HVC .......... 16-887
16.83 IC .......... 16-888
16.84 ISB .......... 16-889
16.85 LSL (register) .......... 16-890
16.86 LSL (immediate) .......... 16-891
16.87 LSLV .......... 16-892
16.88 LSR (register) .......... 16-893
16.89 LSR (immediate) .......... 16-894
16.90 LSRV .......... 16-895
16.91 MADD .......... 16-896
16.92 MNEG .......... 16-897
16.93 MOV (to or from SP) .......... 16-898
16.94 MOV (inverted wide immediate) .......... 16-899
16.95 MOV (wide immediate) .......... 16-900
16.96 MOV (bitmask immediate) .......... 16-901
16.97 MOV (register) .......... 16-902
16.98 MOVK .......... 16-903
16.99 MOVL pseudo-instruction .......... 16-904
16.100 MOVN .......... 16-905
16.101 MOVZ .......... 16-906
16.102 MRS .......... 16-907
16.103 MSR (immediate) .......... 16-908
16.104 MSR (register) .......... 16-909
16.105 MSUB .......... 16-910
16.106 MUL .......... 16-911
16.107 MVN .......... 16-912
16.108 NEG (shifted register) .......... 16-913
16.109 NEGS .......... 16-914
16.110 NGC .......... 16-915
16.111 NGCS .......... 16-916
16.112 NOP .......... 16-917
16.113 ORN (shifted register) .......... 16-918
16.114 ORR (immediate) .......... 16-919
16.115
16.116
16.117
16.118
16.119
16.120
16.121
16.122
16.123
ORR (shifted register) .......................................................................................... 16-920
PACDA, PACDZA ................................................................................................ 16-921
PACDB, PACDZB ................................................................................................ 16-922
PACGA ................................................................................................................ 16-923
PACIA, PACIZA, PACIA1716, PACIASP, PACIAZ ....................... ....................... 16-924
PACIB, PACIZB, PACIB1716, PACIBSP, PACIBZ ....................... ....................... 16-925
PSB .......................................................... .......................................................... 16-926
RBIT .......................................................... .......................................................... 16-927
RET .......................................................... .......................................................... 16-928

16.124 RETAA, RETAB .................... 16-929
16.125 REV16 .................... 16-930
16.126 REV32 .................... 16-931
16.127 REV64 .................... 16-932
16.128 REV .................... 16-933
16.129 ROR (immediate) .................... 16-934
16.130 ROR (register) .................... 16-935
16.131 RORV .................... 16-936
16.132 SBC .................... 16-937
16.133 SBCS .................... 16-938
16.134 SBFIZ .................... 16-939
16.135 SBFM .................... 16-940
16.136 SBFX .................... 16-941
16.137 SDIV .................... 16-942
16.138 SEV .................... 16-943
16.139 SEVL .................... 16-944
16.140 SMADDL .................... 16-945
16.141 SMC .................... 16-946
16.142 SMNEGL .................... 16-947
16.143 SMSUBL .................... 16-948
16.144 SMULH .................... 16-949
16.145 SMULL .................... 16-950
16.146 SUB (extended register) .................... 16-951
16.147 SUB (immediate) .................... 16-953
16.148 SUB (shifted register) .................... 16-954
16.149 SUBS (extended register) .................... 16-955
16.150 SUBS (immediate) .................... 16-957
16.151 SUBS (shifted register) .................... 16-958
16.152 SVC .................... 16-959
16.153 SXTB .................... 16-960
16.154 SXTH .................... 16-961
16.155 SXTW .................... 16-962
16.156 SYS .................... 16-963
16.157 SYSL .................... 16-964
16.158 TBNZ .................... 16-965
16.159 TBZ .................... 16-966
16.160 TLBI .................... 16-967
16.161 TST (immediate) .................... 16-969
16.162 TST (shifted register) .................... 16-970
16.163 UBFIZ .................... 16-971
16.164 UBFM .................... 16-972
16.165 UBFX .................... 16-973
16.166 UDIV .................... 16-974
16.167 UMADDL .................... 16-975
16.168 UMNEGL .................... 16-976
16.169 UMSUBL .................... 16-977
16.170 UMULH .................... 16-978
16.171 UMULL .................... 16-979
16.172 UXTB .................... 16-980
16.173 UXTH .................... 16-981
16.174 WFE .................... 16-982
16.175 WFI .................... 16-983
16.176 XPACD, XPACI, XPACLRI .................... 16-984
16.177 YIELD .................... 16-985

Chapter 17

A64 Data Transfer Instructions
17.1 A64 data transfer instructions in alphabetical order .................... 17-990
17.2 CASA, CASAL, CAS, CASL, CASAL, CAS, CASL .................... 17-996
17.3 CASAB, CASALB, CASB, CASLB .................... 17-997
17.4 CASAH, CASALH, CASH, CASLH .................... 17-998
17.5 CASPA, CASPAL, CASP, CASPL, CASPAL, CASP, CASPL .................... 17-999
17.6 LDADDA, LDADDAL, LDADD, LDADDL, LDADDAL, LDADD, LDADDL .................... 17-1001
17.7 LDADDAB, LDADDALB, LDADDB, LDADDLB .................... 17-1002
17.8 LDADDAH, LDADDALH, LDADDH, LDADDLH .................... 17-1003
17.9 LDAPR .................... 17-1004
17.10 LDAPRB .................... 17-1005
17.11 LDAPRH .................... 17-1006
17.12 LDAR .................... 17-1007
17.13 LDARB .................... 17-1008
17.14 LDARH .................... 17-1009
17.15 LDAXP .................... 17-1010
17.16 LDAXR .................... 17-1011
17.17 LDAXRB .................... 17-1012
17.18 LDAXRH .................... 17-1013
17.19 LDCLRA, LDCLRAL, LDCLR, LDCLRL, LDCLRAL, LDCLR, LDCLRL .................... 17-1014
17.20 LDCLRAB, LDCLRALB, LDCLRB, LDCLRLB .................... 17-1015
17.21 LDCLRAH, LDCLRALH, LDCLRH, LDCLRLH .................... 17-1016
17.22 LDEORA, LDEORAL, LDEOR, LDEORL, LDEORAL, LDEOR, LDEORL .................... 17-1017
17.23 LDEORAB, LDEORALB, LDEORB, LDEORLB .................... 17-1018
17.24 LDEORAH, LDEORALH, LDEORH, LDEORLH .................... 17-1019
17.25 LDLAR .................... 17-1020
17.26 LDLARB .................... 17-1021
17.27 LDLARH .................... 17-1022
17.28 LDNP .................... 17-1023
17.29 LDP .................... 17-1024
17.30 LDPSW .................... 17-1025
17.31 LDR (immediate) .................... 17-1026
17.32 LDR (literal) .................... 17-1027
17.33 LDR pseudo-instruction .................... 17-1028
17.34 LDR (register) .................... 17-1030
17.35 LDRAA, LDRAB, LDRAB .................... 17-1031
17.36 LDRB (immediate) .................... 17-1032
17.37 LDRB (register) .................... 17-1033
17.38 LDRH (immediate) .................... 17-1034
17.39 LDRH (register) .................... 17-1035
17.40 LDRSB (immediate) .................... 17-1036
17.41 LDRSB (register) .................... 17-1037
17.42 LDRSH (immediate) .................... 17-1038
17.43 LDRSH (register) .................... 17-1039
17.44 LDRSW (immediate) .................... 17-1040
17.45 LDRSW (literal) .................... 17-1041
17.46 LDRSW (register) .................... 17-1042
17.47 LDSETA, LDSETAL, LDSET, LDSETL, LDSETAL, LDSET, LDSETL .................... 17-1043
17.48 LDSETAB, LDSETALB, LDSETB, LDSETLB .................... 17-1044
17.49 LDSETAH, LDSETALH, LDSETH, LDSETLH .................... 17-1045
17.50 LDSMAXA, LDSMAXAL, LDSMAX, LDSMAXL, LDSMAXAL, LDSMAX, LDSMAXL .................... 17-1046
17.51 LDSMAXAB, LDSMAXALB, LDSMAXB, LDSMAXLB .................... 17-1047
17.52 LDSMAXAH, LDSMAXALH, LDSMAXH, LDSMAXLH .................... 17-1048
17.53 LDSMINA, LDSMINAL, LDSMIN, LDSMINL, LDSMINAL, LDSMIN, LDSMINL .................... 17-1049
17.54 LDSMINAB, LDSMINALB, LDSMINB, LDSMINLB .................... 17-1050
17.55 LDSMINAH, LDSMINALH, LDSMINH, LDSMINLH .................... 17-1051
17.56 LDTR .................... 17-1052
17.57 LDTRB .................... 17-1053
17.58 LDTRH .................... 17-1054
17.59 LDTRSB .................... 17-1055
17.60 LDTRSH .................... 17-1056
17.61 LDTRSW .................... 17-1057
17.62 LDUMAXA, LDUMAXAL, LDUMAX, LDUMAXL, LDUMAXAL, LDUMAX, LDUMAXL .................... 17-1058
17.63 LDUMAXAB, LDUMAXALB, LDUMAXB, LDUMAXLB .................... 17-1059
17.64 LDUMAXAH, LDUMAXALH, LDUMAXH, LDUMAXLH .................... 17-1060
17.65 LDUMINA, LDUMINAL, LDUMIN, LDUMINL, LDUMINAL, LDUMIN, LDUMINL .................... 17-1061
17.66 LDUMINAB, LDUMINALB, LDUMINB, LDUMINLB .................... 17-1062
17.67 LDUMINAH, LDUMINALH, LDUMINH, LDUMINLH .................... 17-1063
17.68 LDUR .................... 17-1064
17.69 LDURB .................... 17-1065
17.70 LDURH .................... 17-1066
17.71 LDURSB .................... 17-1067
17.72 LDURSH .................... 17-1068
17.73 LDURSW .................... 17-1069
17.74 LDXP .................... 17-1070
17.75 LDXR .................... 17-1071
17.76 LDXRB .................... 17-1072
17.77 LDXRH .................... 17-1073
17.78 PRFM (immediate) .................... 17-1074
17.79 PRFM (literal) .................... 17-1075
17.80 PRFM (register) .................... 17-1076
17.81 PRFUM (unscaled offset) .................... 17-1078
17.82 STADD, STADDL, STADDL .................... 17-1079
17.83 STADDB, STADDLB .................... 17-1080
17.84 STADDH, STADDLH .................... 17-1081
17.85 STCLR, STCLRL, STCLRL .................... 17-1082
17.86 STCLRB, STCLRLB .................... 17-1083
17.87 STCLRH, STCLRLH .................... 17-1084
17.88 STEOR, STEORL, STEORL .................... 17-1085
17.89 STEORB, STEORLB .................... 17-1086
17.90 STEORH, STEORLH .................... 17-1087
17.91 STLLR .................... 17-1088
17.92 STLLRB .................... 17-1089
17.93 STLLRH .................... 17-1090
17.94 STLR .................... 17-1091
17.95 STLRB .................... 17-1092
17.96 STLRH .................... 17-1093
17.97 STLXP .................... 17-1094
17.98 STLXR .................... 17-1096
17.99 STLXRB .................... 17-1098
17.100 STLXRH .................... 17-1099
17.101 STNP .................... 17-1100
17.102 STP .................... 17-1101
17.103 STR (immediate) .................... 17-1102
17.104 STR (register) .................... 17-1103
17.105 STRB (immediate) .................... 17-1104
17.106 STRB (register) .................... 17-1105
17.107 STRH (immediate) .................... 17-1106
17.108 STRH (register) .................... 17-1107
17.109 STSET, STSETL, STSETL .................... 17-1108
17.110 STSETB, STSETLB .................... 17-1109
17.111 STSETH, STSETLH .................... 17-1110
17.112 STSMAX, STSMAXL, STSMAXL .................... 17-1111
17.113 STSMAXB, STSMAXLB .................... 17-1112
17.114 STSMAXH, STSMAXLH .................... 17-1113
17.115 STSMIN, STSMINL, STSMINL .................... 17-1114
17.116 STSMINB, STSMINLB .................... 17-1115
17.117 STSMINH, STSMINLH .................... 17-1116
17.118 STTR .................... 17-1117
17.119 STTRB .................... 17-1118
17.120 STTRH .................... 17-1119
17.121 STUMAX, STUMAXL, STUMAXL .................... 17-1120
17.122 STUMAXB, STUMAXLB .................... 17-1121
17.123 STUMAXH, STUMAXLH .................... 17-1122
17.124 STUMIN, STUMINL, STUMINL .................... 17-1123
17.125 STUMINB, STUMINLB .................... 17-1124
17.126 STUMINH, STUMINLH .................... 17-1125
17.127 STUR .................... 17-1126
17.128 STURB .................... 17-1127
17.129 STURH .................... 17-1128
17.130 STXP .................... 17-1129
17.131 STXR .................... 17-1131
17.132 STXRB .................... 17-1132
17.133 STXRH .................... 17-1133
17.134 SWPA, SWPAL, SWP, SWPL, SWPAL, SWP, SWPL .................... 17-1134
17.135 SWPAB, SWPALB, SWPB, SWPLB .................... 17-1135
17.136 SWPAH, SWPALH, SWPH, SWPLH .................... 17-1136

Chapter 18

A64 Floating-point Instructions
18.1 A64 floating-point instructions in alphabetical order .................... 18-1139
18.2 FABS (scalar) .................... 18-1142
18.3 FADD (scalar) .................... 18-1143
18.4 FCCMP .................... 18-1144
18.5 FCCMPE .................... 18-1145
18.6 FCMP .................... 18-1146
18.7 FCMPE .................... 18-1148
18.8 FCSEL .................... 18-1150
18.9 FCVT .................... 18-1151
18.10 FCVTAS (scalar) .................... 18-1152
18.11 FCVTAU (scalar) .................... 18-1153
18.12 FCVTMS (scalar) .................... 18-1154
18.13 FCVTMU (scalar) .................... 18-1155
18.14 FCVTNS (scalar) .................... 18-1156
18.15 FCVTNU (scalar) .................... 18-1157
18.16 FCVTPS (scalar) .................... 18-1158
18.17 FCVTPU (scalar) .................... 18-1159
18.18 FCVTZS (scalar, fixed-point) .................... 18-1160
18.19 FCVTZS (scalar, integer) .................... 18-1161
18.20 FCVTZU (scalar, fixed-point) .................... 18-1162
18.21 FCVTZU (scalar, integer) .................... 18-1163
18.22 FDIV (scalar) .................... 18-1164
18.23 FJCVTZS .................... 18-1165
18.24 FMADD .................... 18-1166
18.25 FMAX (scalar) .................... 18-1167
18.26 FMAXNM (scalar) .................... 18-1168
18.27 FMIN (scalar) .................... 18-1169
18.28 FMINNM (scalar) .................... 18-1170
18.29 FMOV (register) .................... 18-1171
18.30 FMOV (general) .................... 18-1172
18.31 FMOV (scalar, immediate) .................... 18-1173
18.32 FMSUB .................... 18-1174
18.33 FMUL (scalar) .................... 18-1175
18.34 FNEG (scalar) .................... 18-1176
18.35 FNMADD .................... 18-1177
18.36 FNMSUB .................... 18-1178
18.37 FNMUL (scalar) .................... 18-1179
18.38 FRINTA (scalar) .................... 18-1180
18.39 FRINTI (scalar) .................... 18-1181
18.40 FRINTM (scalar) .................... 18-1182
18.41 FRINTN (scalar) .................... 18-1183
18.42 FRINTP (scalar) .................... 18-1184
18.43 FRINTX (scalar) .................... 18-1185
18.44 FRINTZ (scalar) .................... 18-1186
18.45 FSQRT (scalar) .................... 18-1187
18.46 FSUB (scalar) .................... 18-1188
18.47 LDNP (SIMD and FP) .................... 18-1189
18.48 LDP (SIMD and FP) .................... 18-1190
18.49 LDR (immediate, SIMD and FP) .................... 18-1192
18.50 LDR (literal, SIMD and FP) .................... 18-1194
18.51 LDR (register, SIMD and FP) .................... 18-1195
18.52 LDUR (SIMD and FP) .................... 18-1196
18.53 SCVTF (scalar, fixed-point) .................... 18-1197
18.54 SCVTF (scalar, integer) .................... 18-1199
18.55 STNP (SIMD and FP) .......... 18-1200
18.56 STP (SIMD and FP) .......... 18-1201
18.57 STR (immediate, SIMD and FP) .......... 18-1202
18.58 STR (register, SIMD and FP) .......... 18-1204
18.59 STUR (SIMD and FP) .......... 18-1205
18.60 UCVTF (scalar, fixed-point) .......... 18-1206
18.61 UCVTF (scalar, integer) .......... 18-1208

Chapter 19 A64 SIMD Scalar Instructions

19.1 A64 SIMD scalar instructions in alphabetical order .......... 19-1212
19.2 ABS (scalar) .......... 19-1217
19.3 ADD (scalar) .......... 19-1218
19.4 ADDP (scalar) .......... 19-1219
19.5 CMEQ (scalar, register) .......... 19-1220
19.6 CMEQ (scalar, zero) .......... 19-1221
19.7 CMGE (scalar, register) .......... 19-1222
19.8 CMGE (scalar, zero) .......... 19-1223
19.9 CMGT (scalar, register) .......... 19-1224
19.10 CMGT (scalar, zero) .......... 19-1225
19.11 CMHI (scalar, register) .......... 19-1226
19.12 CMHS (scalar, register) .......... 19-1227
19.13 CMLE (scalar, zero) .......... 19-1228
19.14 CMLT (scalar, zero) .......... 19-1229
19.15 CMTST (scalar) .......... 19-1230
19.16 DUP (scalar, element) .......... 19-1231
19.17 FABD (scalar) .......... 19-1232
19.18 FACGE (scalar) .......... 19-1233
19.19 FACGT (scalar) .......... 19-1234
19.20 FADDP (scalar) .......... 19-1235
19.21 FCMEQ (scalar, register) .......... 19-1236
19.22 FCMEQ (scalar, zero) .......... 19-1237
19.23 FCMGE (scalar, register) .......... 19-1238
19.24 FCMGE (scalar, zero) .......... 19-1239
19.25 FCMGT (scalar, register) .......... 19-1240
19.26 FCMGT (scalar, zero) .......... 19-1241
19.27 FCMLA (scalar, by element) .......... 19-1242
19.28 FCMLE (scalar, zero) .......... 19-1244
19.29 FCMLT (scalar, zero) .......... 19-1245
19.30 FCVTAS (scalar) .......... 19-1246
19.31 FCVTAU (scalar) .......... 19-1247
19.32 FCVTMS (scalar) .......... 19-1248
19.33 FCVTMU (scalar) .......... 19-1249
19.34 FCVTNS (scalar) .......... 19-1250
19.35 FCVTNU (scalar) .......... 19-1251
19.36 FCVTPS (scalar) .......... 19-1252
19.37 FCVTPU (scalar) .......... 19-1253
19.38 FCVTXN (scalar) .......... 19-1254
19.39 FCVTZS (scalar, fixed-point) .......... 19-1255
19.40 FCVTZS (scalar, integer) .......... 19-1256
19.41 FCVTZU (scalar, fixed-point) .......... 19-1257
19.42 FCVTZU (scalar, integer) .......... 19-1258
19.43 FMAXNMP (scalar) .......... 19-1259
19.44 FMAXP (scalar) .......... 19-1260
19.45 FMINNMP (scalar) .......... 19-1261
19.46 FMINP (scalar) .......... 19-1262
19.47 FMLA (scalar, by element) .......... 19-1263
19.48 FMLS (scalar, by element) .......... 19-1264
19.49 FMUL (scalar, by element) .......... 19-1265
19.50 FMULX (scalar, by element) .......... 19-1266
19.51 FMULX (scalar) .......... 19-1267
19.52 FRECPE (scalar) .......... 19-1268
19.53 FRECPS (scalar) .......... 19-1269
19.54 FRSQRTE (scalar) .......... 19-1270
19.55 FRSQRTS (scalar) .......... 19-1271
19.56 MOV (scalar) .......... 19-1272
19.57 NEG (scalar) .......... 19-1273
19.58 SCVTF (scalar, fixed-point) .......... 19-1274
19.59 SCVTF (scalar, integer) .......... 19-1275
19.60 SHL (scalar) .......... 19-1276
19.61 SLI (scalar) .......... 19-1277
19.62 SQABS (scalar) .......... 19-1278
19.63 SQADD (scalar) .......... 19-1279
19.64 SQDMLAL (scalar, by element) .......... 19-1280
19.65 SQDMLAL (scalar) .......... 19-1281
19.66 SQDMLSL (scalar, by element) .......... 19-1282
19.67 SQDMLSL (scalar) .......... 19-1283
19.68 SQDMULH (scalar, by element) .......... 19-1284
19.69 SQDMULH (scalar) .......... 19-1285
19.70 SQDMULL (scalar, by element) .......... 19-1286
19.71 SQDMULL (scalar) .......... 19-1287
19.72 SQNEG (scalar) .......... 19-1288
19.73 SQRDMLAH (scalar, by element) .......... 19-1289
19.74 SQRDMLAH (scalar) .......... 19-1290
19.75 SQRDMLSH (scalar, by element) .......... 19-1291
19.76 SQRDMLSH (scalar) .......... 19-1292
19.77 SQRDMULH (scalar, by element) .......... 19-1293
19.78 SQRDMULH (scalar) .......... 19-1294
19.79 SQRSHL (scalar) .......... 19-1295
19.80 SQRSHRN (scalar) .......... 19-1296
19.81 SQRSHRUN (scalar) .......... 19-1297
19.82 SQSHL (scalar, immediate) .......... 19-1298
19.83 SQSHL (scalar, register) .......... 19-1299
19.84 SQSHLU (scalar) .......... 19-1300
19.85 SQSHRN (scalar) .......... 19-1301
19.86 SQSHRUN (scalar) .......... 19-1302
19.87 SQSUB (scalar) .......... 19-1303
19.88 SQXTN (scalar) .......... 19-1304
19.89 SQXTUN (scalar) .......... 19-1305
19.90 SRI (scalar) .......... 19-1306
19.91 SRSHL (scalar) .......... 19-1307
19.92 SRSHR (scalar) .......... 19-1308
19.93 SRSRA (scalar) .......... 19-1309
19.94 SSHL (scalar) .......... 19-1310
19.95 SSHR (scalar) .......... 19-1311
19.96 SSRA (scalar) .......... 19-1312
19.97 SUB (scalar) .......... 19-1313
19.98 SUQADD (scalar) .......... 19-1314
19.99 UCVTF (scalar, fixed-point) .......... 19-1315
19.100 UCVTF (scalar, integer) .......... 19-1316
19.101 UQADD (scalar) .......... 19-1317
19.102 UQRSHL (scalar) .......... 19-1318
19.103 UQRSHRN (scalar) .......... 19-1319
19.104 UQSHL (scalar, immediate) .......... 19-1320
19.105 UQSHL (scalar, register) .......... 19-1321
19.106 UQSHRN (scalar) .......... 19-1322
19.107 UQSUB (scalar) .......... 19-1323
19.108 UQXTN (scalar) .......... 19-1324
19.109 URSHL (scalar) .......... 19-1325
19.110 URSHR (scalar) .......... 19-1326
19.111 URSRA (scalar) .......... 19-1327
19.112 USHL (scalar) .......... 19-1328
19.113 USHR (scalar) .......... 19-1329
19.114 USQADD (scalar) .......... 19-1330
19.115 USRA (scalar) .......... 19-1331

Chapter 20 A64 SIMD Vector Instructions

20.1 A64 SIMD Vector instructions in alphabetical order .......... 20-1338
20.2 ABS (vector) .......... 20-1349
20.3 ADD (vector) .......... 20-1350
20.4 ADDHN, ADDHN2 (vector) .......... 20-1351
20.5 ADDP (vector) .......... 20-1352
20.6 ADDV (vector) .......... 20-1353
20.7 AND (vector) .......... 20-1354
20.8 BIC (vector, immediate) .......... 20-1355
20.9 BIC (vector, register) .......... 20-1356
20.10 BIF (vector) .......... 20-1357
20.11 BIT (vector) .......... 20-1358
20.12 BSL (vector) .......... 20-1359
20.13 CLS (vector) .......... 20-1360
20.14 CLZ (vector) .......... 20-1361
20.15 CMEQ (vector, register) .......... 20-1362
20.16 CMEQ (vector, zero) .......... 20-1363
20.17 CMGE (vector, register) .......... 20-1364
20.18 CMGE (vector, zero) .......... 20-1365
20.19 CMGT (vector, register) .......... 20-1366
20.20 CMGT (vector, zero) .......... 20-1367
20.21 CMHI (vector, register) .......... 20-1368
20.22 CMHS (vector, register) .......... 20-1369
20.23 CMLE (vector, zero) .......... 20-1370
20.24 CMLT (vector, zero) .......... 20-1371
20.25 CMTST (vector) .......... 20-1372
20.26 CNT (vector) .......... 20-1373
20.27 DUP (vector, element) .......... 20-1374
20.28 DUP (vector, general) .......... 20-1375
20.29 EOR (vector) .......... 20-1376
20.30 EXT (vector) .......... 20-1377
20.31 FABD (vector) .......... 20-1378
20.32 FABS (vector) .......... 20-1379
20.33 FACGE (vector) .......... 20-1380
20.34 FACGT (vector) .......... 20-1381
20.35 FADD (vector) .......... 20-1382
20.36 FADDP (vector) .......... 20-1383
20.37 FCADD (vector) .......... 20-1384
20.38 FCMEQ (vector, register) .......... 20-1385
20.39 FCMEQ (vector, zero) .......... 20-1386
20.40 FCMGE (vector, register) .......... 20-1387
20.41 FCMGE (vector, zero) .......... 20-1388
20.42 FCMGT (vector, register) .......... 20-1389
20.43 FCMGT (vector, zero) .......... 20-1390
20.44 FCMLA (vector) .......... 20-1391
20.45 FCMLE (vector, zero) .......... 20-1392
20.46 FCMLT (vector, zero) .......... 20-1393
20.47 FCVTAS (vector) .......... 20-1394
20.48 FCVTAU (vector) .......... 20-1395
20.49 FCVTL, FCVTL2 (vector) .......... 20-1396
20.50 FCVTMS (vector) .......... 20-1397
20.51 FCVTMU (vector) .......... 20-1398
20.52 FCVTN, FCVTN2 (vector) .......... 20-1399
20.53 FCVTNS (vector) .......... 20-1400
20.54 FCVTNU (vector) .......... 20-1401
20.55 FCVTPS (vector) .......... 20-1402
20.56 FCVTPU (vector) .......... 20-1403
20.57 FCVTXN, FCVTXN2 (vector) .......... 20-1404
20.58 FCVTZS (vector, fixed-point) .......... 20-1405
20.59 FCVTZS (vector, integer) .......... 20-1406
20.60 FCVTZU (vector, fixed-point) .......... 20-1407
20.61 FCVTZU (vector, integer) .......... 20-1408
20.62 FDIV (vector) .......... 20-1409
20.63 FMAX (vector) .......... 20-1410
20.64 FMAXNM (vector) .......... 20-1411
20.65 FMAXNMP (vector) .......... 20-1412
20.66 FMAXNMV (vector) .......... 20-1413
20.67 FMAXP (vector) .......... 20-1414
20.68 FMAXV (vector) .......... 20-1415
20.69 FMIN (vector) .......... 20-1416
20.70 FMINNM (vector) .......... 20-1417
20.71 FMINNMP (vector) .......... 20-1418
20.72 FMINNMV (vector) .......... 20-1419
20.73 FMINP (vector) .......... 20-1420
20.74 FMINV (vector) .......... 20-1421
20.75 FMLA (vector, by element) .......... 20-1422
20.76
20.77
20.78
20.79
20.80
20.81
20.82
20.83
20.84
20.85
20.86
20.87
20.88
20.89
20.90
20.91
20.92
20.93
20.94
20.95
20.96
20.97
20.98
20.99
20.100
20.101
20.102
20.103
20.104
20.105
20.106
20.107
20.108
20.109
20.110
20.111
20.112
20.113
20.114
20.115
20.116
20.117
20.118
20.119
20.120
20.121
20.122
20.123
20.124

FMLA (vector) .................................................................................................... 20-1423
FMLS (vector, by element) ........................................ ........................................ 20-1424
FMLS (vector) .................................................................................................... 20-1425
FMOV (vector, immediate) ........................................ ........................................ 20-1426
FMUL (vector, by element) ................................................................................ 20-1428
FMUL (vector) .................................................................................................... 20-1429
FMULX (vector, by element) .............................................................................. 20-1430
FMULX (vector) ................................................ ................................................ 20-1432
FNEG (vector) ................................................. ................................................. 20-1433
FRECPE (vector) ............................................... ............................................... 20-1434
FRECPS (vector) ............................................... ............................................... 20-1435
FRECPX (vector) ............................................... ............................................... 20-1436
FRINTA (vector) ................................................ ................................................ 20-1437
FRINTI (vector) .................................................................................................. 20-1438
FRINTM (vector) ................................................................................................ 20-1439
FRINTN (vector) ................................................................................................ 20-1440
FRINTP (vector) ................................................ ................................................ 20-1441
FRINTX (vector) ................................................ ................................................ 20-1442
FRINTZ (vector) ................................................ ................................................ 20-1443
FRSQRTE (vector) ............................................................................................ 20-1444
FRSQRTS (vector) ............................................................................................ 20-1445
FSQRT (vector) .......... 20-1446
FSUB (vector) .......... 20-1447
INS (vector, element) .......... 20-1448
INS (vector, general) .......... 20-1449
LD1 (vector, multiple structures) .......... 20-1450
LD1 (vector, single structure) .......... 20-1453
LD1R (vector) .......... 20-1454
LD2 (vector, multiple structures) .......... 20-1455
LD2 (vector, single structure) .......... 20-1456
LD2R (vector) .......... 20-1457
LD3 (vector, multiple structures) .......... 20-1458
LD3 (vector, single structure) .......... 20-1459
LD3R (vector) .......... 20-1461
LD4 (vector, multiple structures) .......... 20-1462
LD4 (vector, single structure) .......... 20-1463
LD4R (vector) .......... 20-1465
MLA (vector, by element) .......... 20-1466
MLA (vector) .......... 20-1467
MLS (vector, by element) .......... 20-1468
MLS (vector) .......... 20-1469
MOV (vector, element) .......... 20-1470
MOV (vector, from general) .......... 20-1471
MOV (vector) .......... 20-1472
MOV (vector, to general) .......... 20-1473
MOVI (vector) .......... 20-1474
MUL (vector, by element) .......... 20-1475
MUL (vector) .......... 20-1476
MVN (vector) .......... 20-1477

20.125 MVNI (vector) .......... 20-1478
20.126 NEG (vector) .......... 20-1479
20.127 NOT (vector) .......... 20-1480
20.128 ORN (vector) .......... 20-1481
20.129 ORR (vector, immediate) .......... 20-1482
20.130 ORR (vector, register) .......... 20-1483
20.131 PMUL (vector) .......... 20-1484
20.132 PMULL, PMULL2 (vector) .......... 20-1485
20.133 RADDHN, RADDHN2 (vector) .......... 20-1486
20.134 RBIT (vector) .......... 20-1487
20.135 REV16 (vector) .......... 20-1488
20.136 REV32 (vector) .......... 20-1489
20.137 REV64 (vector) .......... 20-1490
20.138 RSHRN, RSHRN2 (vector) .......... 20-1491
20.139 RSUBHN, RSUBHN2 (vector) .......... 20-1492
20.140 SABA (vector) .......... 20-1493
20.141 SABAL, SABAL2 (vector) .......... 20-1494
20.142 SABD (vector) .......... 20-1495
20.143 SABDL, SABDL2 (vector) .......... 20-1496
20.144 SADALP (vector) .......... 20-1497
20.145 SADDL, SADDL2 (vector) .......... 20-1498
20.146 SADDLP (vector) .......... 20-1499
20.147 SADDLV (vector) .......... 20-1500
20.148 SADDW, SADDW2 (vector) .......... 20-1501
20.149 SCVTF (vector, fixed-point) .......... 20-1502
20.150 SCVTF (vector, integer) .......... 20-1503
20.151 SHADD (vector) .......... 20-1504
20.152 SHL (vector) .......... 20-1505
20.153 SHLL, SHLL2 (vector) .......... 20-1506
20.154 SHRN, SHRN2 (vector) .......... 20-1507
20.155 SHSUB (vector) .......... 20-1508
20.156 SLI (vector) .......... 20-1509
20.157 SMAX (vector) .......... 20-1510
20.158 SMAXP (vector) .......... 20-1511
20.159 SMAXV (vector) .......... 20-1512
20.160 SMIN (vector) .......... 20-1513
20.161 SMINP (vector) .......... 20-1514
20.162 SMINV (vector) .......... 20-1515
20.163 SMLAL, SMLAL2 (vector, by element) .......... 20-1516
20.164 SMLAL, SMLAL2 (vector) .......... 20-1517
20.165 SMLSL, SMLSL2 (vector, by element) .......... 20-1518
20.166 SMLSL, SMLSL2 (vector) .......... 20-1519
20.167 SMOV (vector) .......... 20-1520
20.168 SMULL, SMULL2 (vector, by element) .......... 20-1521
20.169 SMULL, SMULL2 (vector) .......... 20-1522
20.170 SQABS (vector) .......... 20-1523
20.171 SQADD (vector) .......... 20-1524
20.172 SQDMLAL, SQDMLAL2 (vector, by element) .......... 20-1525
20.173 SQDMLAL, SQDMLAL2 (vector) .......... 20-1527
20.174 SQDMLSL, SQDMLSL2 (vector, by element) .......... 20-1528

20.175 SQDMLSL, SQDMLSL2 (vector) .......... 20-1530
20.176 SQDMULH (vector, by element) .......... 20-1531
20.177 SQDMULH (vector) .......... 20-1532
20.178 SQDMULL, SQDMULL2 (vector, by element) .......... 20-1533
20.179 SQDMULL, SQDMULL2 (vector) .......... 20-1535
20.180 SQNEG (vector) .......... 20-1536
20.181 SQRDMLAH (vector, by element) .......... 20-1537
20.182 SQRDMLAH (vector) .......... 20-1538
20.183 SQRDMLSH (vector, by element) .......... 20-1539
20.184 SQRDMLSH (vector) .......... 20-1540
20.185 SQRDMULH (vector, by element) .......... 20-1541
20.186 SQRDMULH (vector) .......... 20-1542
20.187 SQRSHL (vector) .......... 20-1543
20.188 SQRSHRN, SQRSHRN2 (vector) .......... 20-1544
20.189 SQRSHRUN, SQRSHRUN2 (vector) .......... 20-1545
20.190 SQSHL (vector, immediate) .......... 20-1546
20.191 SQSHL (vector, register) .......... 20-1547
20.192 SQSHLU (vector) .......... 20-1548
20.193 SQSHRN, SQSHRN2 (vector) .......... 20-1549
20.194 SQSHRUN, SQSHRUN2 (vector) .......... 20-1550
20.195 SQSUB (vector) .......... 20-1551
20.196 SQXTN, SQXTN2 (vector) .......... 20-1552
20.197 SQXTUN, SQXTUN2 (vector) .......... 20-1553
20.198 SRHADD (vector) .......... 20-1554
20.199 SRI (vector) .......... 20-1555
20.200 SRSHL (vector) .......... 20-1556
20.201 SRSHR (vector) .......... 20-1557
20.202 SRSRA (vector) .......... 20-1558
20.203 SSHL (vector) .......... 20-1559
20.204 SSHLL, SSHLL2 (vector) .......... 20-1560
20.205 SSHR (vector) .......... 20-1561
20.206 SSRA (vector) .......... 20-1562
20.207 SSUBL, SSUBL2 (vector) .......... 20-1563
20.208 SSUBW, SSUBW2 (vector) .......... 20-1564
20.209 ST1 (vector, multiple structures) .......... 20-1565
20.210 ST1 (vector, single structure) .......... 20-1568
20.211 ST2 (vector, multiple structures) .......... 20-1569
20.212 ST2 (vector, single structure) .......... 20-1570
20.213 ST3 (vector, multiple structures) .......... 20-1571
20.214 ST3 (vector, single structure) .......... 20-1572
20.215 ST4 (vector, multiple structures) .......... 20-1573
20.216 ST4 (vector, single structure) .......... 20-1574
20.217 SUB (vector) .......... 20-1576
20.218 SUBHN, SUBHN2 (vector) .......... 20-1577
20.219 SUQADD (vector) .......... 20-1578
20.220 SXTL, SXTL2 (vector) .......... 20-1579
20.221 TBL (vector) .......... 20-1580
20.222 TBX (vector) .......... 20-1581
20.223 TRN1 (vector) .......... 20-1582
20.224 TRN2 (vector) .......... 20-1583

20.225 UABA (vector) .......... 20-1584
20.226 UABAL, UABAL2 (vector) .......... 20-1585
20.227 UABD (vector) .......... 20-1586
20.228 UABDL, UABDL2 (vector) .......... 20-1587
20.229 UADALP (vector) .......... 20-1588
20.230 UADDL, UADDL2 (vector) .......... 20-1589
20.231 UADDLP (vector) .......... 20-1590
20.232 UADDLV (vector) .......... 20-1591
20.233 UADDW, UADDW2 (vector) .......... 20-1592
20.234 UCVTF (vector, fixed-point) .......... 20-1593
20.235 UCVTF (vector, integer) .......... 20-1594
20.236 UHADD (vector) .......... 20-1595
20.237 UHSUB (vector) .......... 20-1596
20.238 UMAX (vector) .......... 20-1597
20.239 UMAXP (vector) .......... 20-1598
20.240 UMAXV (vector) .......... 20-1599
20.241 UMIN (vector) .......... 20-1600
20.242 UMINP (vector) .......... 20-1601
20.243 UMINV (vector) .......... 20-1602
20.244 UMLAL, UMLAL2 (vector, by element) .......... 20-1603
20.245 UMLAL, UMLAL2 (vector) .......... 20-1604
20.246 UMLSL, UMLSL2 (vector, by element) .......... 20-1605
20.247 UMLSL, UMLSL2 (vector) .......... 20-1606
20.248 UMOV (vector) .......... 20-1607
20.249 UMULL, UMULL2 (vector, by element) .......... 20-1608
20.250 UMULL, UMULL2 (vector) .......... 20-1609
20.251 UQADD (vector) .......... 20-1610
20.252 UQRSHL (vector) .......... 20-1611
20.253 UQRSHRN, UQRSHRN2 (vector) .......... 20-1612
20.254 UQSHL (vector, immediate) .......... 20-1613
20.255 UQSHL (vector, register) .......... 20-1614
20.256 UQSHRN, UQSHRN2 (vector) .......... 20-1615
20.257 UQSUB (vector) .......... 20-1617
20.258 UQXTN, UQXTN2 (vector) .......... 20-1618
20.259 URECPE (vector) .......... 20-1619
20.260 URHADD (vector) .......... 20-1620
20.261 URSHL (vector) .......... 20-1621
20.262 URSHR (vector) .......... 20-1622
20.263 URSQRTE (vector) .......... 20-1623
20.264 URSRA (vector) .......... 20-1624
20.265 USHL (vector) .......... 20-1625
20.266 USHLL, USHLL2 (vector) .......... 20-1626
20.267 USHR (vector) .......... 20-1627
20.268 USQADD (vector) .......... 20-1628
20.269 USRA (vector) .......... 20-1629
20.270 USUBL, USUBL2 (vector) .......... 20-1630
20.271 USUBW, USUBW2 (vector) .......... 20-1631
20.272 UXTL, UXTL2 (vector) .......... 20-1632
20.273 UZP1 (vector) .......... 20-1633
20.274 UZP2 (vector) .......... 20-1634
20.275 XTN, XTN2 (vector) .......... 20-1635
20.276 ZIP1 (vector) .......... 20-1636
20.277 ZIP2 (vector) .......... 20-1637

Chapter 21

Directives Reference
21.1 Alphabetical list of directives .......... 21-1640
21.2 About assembly control directives .......... 21-1641
21.3 About frame directives .......... 21-1642
21.4 ALIAS .......... 21-1643
21.5 ALIGN .......... 21-1644
21.6 AREA .......... 21-1646
21.7 ARM or CODE32 directive .......... 21-1649
21.8 ASSERT .......... 21-1650
21.9 ATTR .......... 21-1651
21.10 CN .......... 21-1652
21.11 CODE16 directive .......... 21-1653
21.12 COMMON .......... 21-1654
21.13 CP .......... 21-1655
21.14 DATA .......... 21-1656
21.15 DCB .......... 21-1657
21.16 DCD and DCDU .......... 21-1658
21.17 DCDO .......... 21-1659
21.18 DCFD and DCFDU .......... 21-1660
21.19 DCFS and DCFSU .......... 21-1661
21.20 DCI .......... 21-1662
21.21 DCQ and DCQU .......... 21-1663
21.22 DCW and DCWU .......... 21-1664
21.23 END .......... 21-1665
21.24 ENDFUNC or ENDP .......... 21-1666
21.25 ENTRY .......... 21-1667
21.26 EQU .......... 21-1668
21.27 EXPORT or GLOBAL .......... 21-1669
21.28 EXPORTAS .......... 21-1671
21.29 FIELD .......... 21-1672
21.30 FRAME ADDRESS .......... 21-1673
21.31 FRAME POP .......... 21-1674
21.32 FRAME PUSH .......... 21-1675
21.33 FRAME REGISTER .......... 21-1676
21.34 FRAME RESTORE .......... 21-1677
21.35 FRAME RETURN ADDRESS .......... 21-1678
21.36 FRAME SAVE .......... 21-1679
21.37 FRAME STATE REMEMBER .......... 21-1680
21.38 FRAME STATE RESTORE .......... 21-1681
21.39 FRAME UNWIND ON .......... 21-1682
21.40 FRAME UNWIND OFF .......... 21-1683
21.41 FUNCTION or PROC .......... 21-1684
21.42 GBLA, GBLL, and GBLS .......... 21-1685
21.43 GET or INCLUDE .......... 21-1686
21.44 IF, ELSE, ENDIF, and ELIF .......... 21-1687
21.45 IMPORT and EXTERN .......... 21-1689
21.46 INCBIN .......... 21-1691
21.47 INFO .......... 21-1692
21.48 KEEP .......... 21-1693
21.49 LCLA, LCLL, and LCLS .......... 21-1694
21.50 LTORG .......... 21-1695
21.51 MACRO and MEND .......... 21-1696
21.52 MAP .......... 21-1699
21.53 MEXIT .......... 21-1700
21.54 NOFP .......... 21-1701
21.55 OPT .......... 21-1702
21.56 QN, DN, and SN .......... 21-1704
21.57 RELOC .......... 21-1706
21.58 REQUIRE .......... 21-1707
21.59 REQUIRE8 and PRESERVE8 .......... 21-1708
21.60 RLIST .......... 21-1709
21.61 RN .......... 21-1710
21.62 ROUT .......... 21-1711
21.63 SETA, SETL, and SETS .......... 21-1712
21.64 SPACE or FILL .......... 21-1714
21.65 THUMB directive .......... 21-1715
21.66 TTL and SUBT .......... 21-1716
21.67 WHILE and WEND .......... 21-1717
21.68 WN and XN .......... 21-1718

Chapter 22

Via File Syntax
22.1 Overview of via files .......... 22-1720
22.2 Via file syntax rules .......... 22-1721

List of Figures
ARM® Compiler armasm User Guide

Figure 1-1 Integration boundaries in ARM Compiler 6. .......... 1-54
Figure 3-1 Organization of general-purpose registers and Program Status Registers .......... 3-68
Figure 9-1 Extension register bank for Advanced SIMD in AArch32 state .......... 9-183
Figure 9-2 Extension register bank for Advanced SIMD in AArch64 state .......... 9-185
Figure 10-1 Extension register bank for floating-point in AArch32 state .......... 10-208
Figure 10-2 Extension register bank for floating-point in AArch64 state .......... 10-210
Figure 13-1 ASR #3 .......... 13-341
Figure 13-2 LSR #3 .......... 13-342
Figure 13-3 LSL #3 .......... 13-342
Figure 13-4 ROR #3 .......... 13-342
Figure 13-5 RRX .......... 13-343
Figure 14-1 De-interleaving an array of 3-element structures .......... 14-609
Figure 14-2 Operation of doubleword VEXT for imm = 3 .......... 14-649
Figure 14-3 Example of operation of VPADAL (in this case for data type S16) .......... 14-693
Figure 14-4 Example of operation of VPADD (in this case, for data type I16) .......... 14-694
Figure 14-5 Example of operation of doubleword VPADDL (in this case, for data type S16) .......... 14-695
Figure 14-6 Operation of quadword VSHL.I64 Qd, Qm, #1 .......... 14-726
Figure 14-7 Operation of quadword VSLI.64 Qd, Qm, #1 .......... 14-731
Figure 14-8 Operation of doubleword VSRI.64 Dd, Dm, #2 .......... 14-733
Figure 14-9 Operation of doubleword VTRN.8 .......... 14-746
Figure 14-10 Operation of doubleword VTRN.32 .......... 14-746

List of Tables

Table 3-1 ARM processor modes
Table 3-2 Predeclared core registers in AArch32 state
Table 3-3 Predeclared extension registers in AArch32 state
Table 3-4 A32 instruction groups
Table 4-1 Predeclared core registers in AArch64 state
Table 4-2 Predeclared extension registers in AArch64 state
Table 4-3 A64 instruction groups
Table 6-1 Syntax differences between UAL and A64 assembly language
Table 6-2 A32 state immediate values (8-bit)
Table 6-3 A32 state immediate values in MOV instructions
Table 6-4 32-bit T32 immediate values
Table 6-5 32-bit T32 immediate values in MOV instructions
Table 6-6 Stack-oriented suffixes and equivalent addressing mode suffixes
Table 6-7 Suffixes for load and store multiple instructions
Table 7-1 Condition code suffixes
Table 7-2 Condition code suffixes and related flags
Table 7-3 Condition codes
Table 7-4 Conditional branches only
Table 7-5 All instructions conditional
Table 8-1 Built-in variables
Table 8-2 Built-in Boolean constants
Table 8-3 Predefined macros
Table 8-4 armclang equivalent command-line options

Table 9-1 Differences in syntax and mnemonics between A32/T32 and A64 Advanced SIMD instructions
Table 9-2 Advanced SIMD data types
Table 9-3 Advanced SIMD saturation ranges
Table 10-1 Differences in syntax and mnemonics between A32/T32 and A64 floating-point instructions
Table 11-1 Supported ARM architectures
Table 11-2 Severity of diagnostic messages
Table 11-3 Specifying a command-line option and an AREA directive for GNU-stack sections
Table 12-1 Unary operators that return strings
Table 12-2 Unary operators that return numeric or logical values
Table 12-3 Multiplicative operators
Table 12-4 String manipulation operators
Table 12-5 Shift operators
Table 12-6 Addition, subtraction, and logical operators
Table 12-7 Relational operators
Table 12-8 Boolean operators
Table 12-9 Operator precedence in ARM assembly language
Table 12-10 Operator precedence in C
Table 13-1 Summary of instructions
Table 13-2 PC-relative offsets
Table 13-3 Register-relative offsets
Table 13-4 B instruction availability and range
Table 13-5 BL instruction availability and range
Table 13-6 BLX instruction availability and range
Table 13-7 BX instruction availability and range
Table 13-8 BXJ instruction availability and range
Table 13-9 Permitted instructions inside an IT block
Table 13-10 Offsets and architectures, LDR, word, halfword, and byte
Table 13-11 PC-relative offsets
Table 13-12 Options and architectures, LDR (register offsets)
Table 13-13 Register-relative offsets
Table 13-14 Offsets and architectures, LDR (User mode)
Table 13-15 Offsets and architectures, STR, word, halfword, and byte
Table 13-16 Options and architectures, STR (register offsets)
Table 13-17 Offsets and architectures, STR (User mode)
Table 13-18 Range and encoding of expr
Table 14-1 Summary of Advanced SIMD instructions
Table 14-2 Summary of shared Advanced SIMD and floating-point instructions
Table 14-3 Patterns for immediate value in VBIC (immediate)
Table 14-4 Permitted combinations of parameters for VLDn (single n-element structure to one lane)
Table 14-5 Permitted combinations of parameters for VLDn (single n-element structure to all lanes)
Table 14-6 Permitted combinations of parameters for VLDn (multiple n-element structures)
Table 14-7 Available immediate values in VMOV (immediate)
Table 14-8 Available immediate values in VMVN (immediate)
Table 14-9 Patterns for immediate value in VORR (immediate)
Table 14-10 Available immediate ranges in VQRSHRN and VQRSHRUN (by immediate)
Table 14-11 Available immediate ranges in VQSHL and VQSHLU (by immediate)
Table 14-12 Available immediate ranges in VQSHRN and VQSHRUN (by immediate)

Table 14-13 Results for out-of-range inputs in VRECPE
Table 14-14 Results for out-of-range inputs in VRECPS
Table 14-15 Available immediate ranges in VRSHR (by immediate)
Table 14-16 Available immediate ranges in VRSHRN (by immediate)
Table 14-17 Results for out-of-range inputs in VRSQRTE
Table 14-18 Results for out-of-range inputs in VRSQRTS
Table 14-19 Available immediate ranges in VRSRA (by immediate)
Table 14-20 Available immediate ranges in VSHL (by immediate)
Table 14-21 Available immediate ranges in VSHLL (by immediate)
Table 14-22 Available immediate ranges in VSHR (by immediate)
Table 14-23 Available immediate ranges in VSHRN (by immediate)
Table 14-24 Available immediate ranges in VSRA (by immediate)
Table 14-25 Permitted combinations of parameters for VSTn (multiple n-element structures)
Table 14-26 Permitted combinations of parameters for VSTn (single n-element structure to one lane)
Table 14-27 Operation of doubleword VUZP.8
Table 14-28 Operation of quadword VUZP.32
Table 14-29 Operation of doubleword VZIP.8
Table 14-30 Operation of quadword VZIP.32
Table 15-1 Summary of floating-point instructions
Table 16-1 Summary of A64 general instructions
Table 16-2 ADD (64-bit general registers) specifier combinations
Table 16-3 ADDS (64-bit general registers) specifier combinations
Table 16-4 SYS parameter values corresponding to AT operations
Table 16-5 CMN (64-bit general registers) specifier combinations
Table 16-6 CMP (64-bit general registers) specifier combinations
Table 16-7 SYS parameter values corresponding to DC operations
Table 16-8 SYS parameter values corresponding to IC operations
Table 16-9 SUB (64-bit general registers) specifier combinations
Table 16-10 SUBS (64-bit general registers) specifier combinations
Table 16-11 SYS parameter values corresponding to TLBI operations
Table 17-1 Summary of A64 data transfer instructions
Table 18-1 Summary of A64 floating-point instructions
Table 19-1 Summary of A64 SIMD scalar instructions
Table 19-2 DUP (Scalar) specifier combinations
Table 19-3 FCMLA (Scalar) specifier combinations
Table 19-4 FCVTZS (Scalar) specifier combinations
Table 19-5 FCVTZU (Scalar) specifier combinations
Table 19-6 FMLA (Scalar, single-precision and double-precision) specifier combinations
Table 19-7 FMLS (Scalar, single-precision and double-precision) specifier combinations
Table 19-8 FMUL (Scalar, single-precision and double-precision) specifier combinations
Table 19-9 FMULX (Scalar, single-precision and double-precision) specifier combinations
Table 19-10 MOV (Scalar) specifier combinations
Table 19-11 SCVTF (Scalar) specifier combinations
Table 19-12 SQDMLAL (Scalar) specifier combinations
Table 19-13 SQDMLAL (Scalar) specifier combinations
Table 19-14 SQDMLSL (Scalar) specifier combinations
Table 19-15 SQDMLSL (Scalar) specifier combinations
Table 19-16 SQDMULH (Scalar) specifier combinations

Table 19-17 SQDMULL (Scalar) specifier combinations

Table 19-18 SQDMULL (Scalar) specifier combinations
Table 19-19 SQRDMLAH (Scalar) specifier combinations
Table 19-20 SQRDMLSH (Scalar) specifier combinations
Table 19-21 SQRDMULH (Scalar) specifier combinations
Table 19-22 SQRSHRN (Scalar) specifier combinations
Table 19-23 SQRSHRUN (Scalar) specifier combinations
Table 19-24 SQSHL (Scalar) specifier combinations
Table 19-25 SQSHLU (Scalar) specifier combinations
Table 19-26 SQSHRN (Scalar) specifier combinations
Table 19-27 SQSHRUN (Scalar) specifier combinations
Table 19-28 SQXTN (Scalar) specifier combinations
Table 19-29 SQXTUN (Scalar) specifier combinations
Table 19-30 UCVTF (Scalar) specifier combinations
Table 19-31 UQRSHRN (Scalar) specifier combinations
Table 19-32 UQSHL (Scalar) specifier combinations
Table 19-33 UQSHRN (Scalar) specifier combinations
Table 19-34 UQXTN (Scalar) specifier combinations
Table 20-1 Summary of A64 SIMD Vector instructions
Table 20-2 ADDHN, ADDHN2 (Vector) specifier combinations
Table 20-3 ADDV (Vector) specifier combinations
Table 20-4 DUP (Vector) specifier combinations
Table 20-5 DUP (Vector) specifier combinations
Table 20-6 EXT (Vector) specifier combinations
Table 20-7 FCVTL, FCVTL2 (Vector) specifier combinations
Table 20-8 FCVTN, FCVTN2 (Vector) specifier combinations
Table 20-9 FCVTXN{2} (Vector) specifier combinations
Table 20-10 FCVTZS (Vector) specifier combinations
Table 20-11 FCVTZU (Vector) specifier combinations
Table 20-12 INS (Vector) specifier combinations
Table 20-13 INS (Vector) specifier combinations
Table 20-14 LD1 (One register, immediate offset) specifier combinations
Table 20-15 LD1 (Two registers, immediate offset) specifier combinations
Table 20-16 LD1 (Three registers, immediate offset) specifier combinations
Table 20-17 LD1 (Four registers, immediate offset) specifier combinations
Table 20-18 LD1R (Immediate offset) specifier combinations
Table 20-19 LD2R (Immediate offset) specifier combinations
Table 20-20 LD3R (Immediate offset) specifier combinations
Table 20-21 LD4R (Immediate offset) specifier combinations
Table 20-22 MLA (Vector) specifier combinations
Table 20-23 MLS (Vector) specifier combinations
Table 20-24 MOV (Vector) specifier combinations
Table 20-25 MOV (Vector) specifier combinations
Table 20-26 MUL (Vector) specifier combinations
Table 20-27 PMULL, PMULL2 (Vector) specifier combinations
Table 20-28 RADDHN, RADDHN2 (Vector) specifier combinations
Table 20-29 RSHRN, RSHRN2 (Vector) specifier combinations
Table 20-30 RSUBHN, RSUBHN2 (Vector) specifier combinations
Table 20-31 SABAL, SABAL2 (Vector) specifier combinations
Table 20-32 SABDL, SABDL2 (Vector) specifier combinations


Table 20-33 SADALP (Vector) specifier combinations

Table 20-34 SADDL, SADDL2 (Vector) specifier combinations
Table 20-35 SADDLP (Vector) specifier combinations
Table 20-36 SADDLV (Vector) specifier combinations
Table 20-37 SADDW, SADDW2 (Vector) specifier combinations
Table 20-38 SCVTF (Vector) specifier combinations
Table 20-39 SHL (Vector) specifier combinations
Table 20-40 SHLL, SHLL2 (Vector) specifier combinations
Table 20-41 SHRN, SHRN2 (Vector) specifier combinations
Table 20-42 SLI (Vector) specifier combinations
Table 20-43 SMAXV (Vector) specifier combinations
Table 20-44 SMINV (Vector) specifier combinations
Table 20-45 SMLAL, SMLAL2 (Vector) specifier combinations
Table 20-46 SMLAL, SMLAL2 (Vector) specifier combinations
Table 20-47 SMLSL, SMLSL2 (Vector) specifier combinations
Table 20-48 SMLSL, SMLSL2 (Vector) specifier combinations
Table 20-49 SMOV (32-bit) specifier combinations
Table 20-50 SMOV (64-bit) specifier combinations
Table 20-51 SMULL, SMULL2 (Vector) specifier combinations
Table 20-52 SMULL, SMULL2 (Vector) specifier combinations
Table 20-53 SQDMLAL{2} (Vector) specifier combinations
Table 20-54 SQDMLAL{2} (Vector) specifier combinations
Table 20-55 SQDMLSL{2} (Vector) specifier combinations
Table 20-56 SQDMLSL{2} (Vector) specifier combinations
Table 20-57 SQDMULH (Vector) specifier combinations
Table 20-58 SQDMULL{2} (Vector) specifier combinations
Table 20-59 SQDMULL{2} (Vector) specifier combinations
Table 20-60 SQRDMLAH (Vector) specifier combinations
Table 20-61 SQRDMLSH (Vector) specifier combinations
Table 20-62 SQRDMULH (Vector) specifier combinations
Table 20-63 SQRSHRN{2} (Vector) specifier combinations
Table 20-64 SQRSHRUN{2} (Vector) specifier combinations
Table 20-65 SQSHL (Vector) specifier combinations
Table 20-66 SQSHLU (Vector) specifier combinations
Table 20-67 SQSHRN{2} (Vector) specifier combinations
Table 20-68 SQSHRUN{2} (Vector) specifier combinations
Table 20-69 SQXTN{2} (Vector) specifier combinations
Table 20-70 SQXTUN{2} (Vector) specifier combinations
Table 20-71 SRI (Vector) specifier combinations
Table 20-72 SRSHR (Vector) specifier combinations
Table 20-73 SRSRA (Vector) specifier combinations
Table 20-74 SSHLL, SSHLL2 (Vector) specifier combinations
Table 20-75 SSHR (Vector) specifier combinations
Table 20-76 SSRA (Vector) specifier combinations
Table 20-77 SSUBL, SSUBL2 (Vector) specifier combinations
Table 20-78 SSUBW, SSUBW2 (Vector) specifier combinations
Table 20-79 ST1 (One register, immediate offset) specifier combinations
Table 20-80 ST1 (Two registers, immediate offset) specifier combinations
Table 20-81 ST1 (Three registers, immediate offset) specifier combinations
Table 20-82 ST1 (Four registers, immediate offset) specifier combinations


Table 20-83 SUBHN, SUBHN2 (Vector) specifier combinations

Table 20-84 SXTL, SXTL2 (Vector) specifier combinations
Table 20-85 UABAL, UABAL2 (Vector) specifier combinations
Table 20-86 UABDL, UABDL2 (Vector) specifier combinations
Table 20-87 UADALP (Vector) specifier combinations
Table 20-88 UADDL, UADDL2 (Vector) specifier combinations
Table 20-89 UADDLP (Vector) specifier combinations
Table 20-90 UADDLV (Vector) specifier combinations
Table 20-91 UADDW, UADDW2 (Vector) specifier combinations
Table 20-92 UCVTF (Vector) specifier combinations
Table 20-93 UMAXV (Vector) specifier combinations
Table 20-94 UMINV (Vector) specifier combinations
Table 20-95 UMLAL, UMLAL2 (Vector) specifier combinations
Table 20-96 UMLAL, UMLAL2 (Vector) specifier combinations
Table 20-97 UMLSL, UMLSL2 (Vector) specifier combinations
Table 20-98 UMLSL, UMLSL2 (Vector) specifier combinations
Table 20-99 UMOV (32-bit) specifier combinations
Table 20-100 UMULL, UMULL2 (Vector) specifier combinations
Table 20-101 UMULL, UMULL2 (Vector) specifier combinations
Table 20-102 UQRSHRN{2} (Vector) specifier combinations
Table 20-103 UQSHL (Vector) specifier combinations
Table 20-104 UQSHRN{2} (Vector) specifier combinations
Table 20-105 UQXTN{2} (Vector) specifier combinations
Table 20-106 URSHR (Vector) specifier combinations
Table 20-107 URSRA (Vector) specifier combinations
Table 20-108 USHLL, USHLL2 (Vector) specifier combinations
Table 20-109 USHR (Vector) specifier combinations
Table 20-110 USRA (Vector) specifier combinations
Table 20-111 USUBL, USUBL2 (Vector) specifier combinations
Table 20-112 USUBW, USUBW2 (Vector) specifier combinations
Table 20-113 UXTL, UXTL2 (Vector) specifier combinations
Table 20-114 XTN, XTN2 (Vector) specifier combinations
Table 21-1 List of directives
Table 21-2 OPT directive settings

ARM DUI0801G

Preface

This preface introduces the ARM® Compiler armasm User Guide.
It contains the following:
• About this book on page 43.

About this book
The ARM® Compiler armasm User Guide provides topic-based documentation for using the
ARM assembler (armasm). It covers command-line options, the A32, T32, and A64
instruction sets, Advanced SIMD and floating-point instructions, and assembler directives, for the
ARMv7 and ARMv8 architectures.
Using this book
This book is organized into the following chapters:
Chapter 1 Overview of the Assembler
Gives an overview of the assemblers provided with ARM® Compiler toolchain.
Chapter 2 Overview of the ARM Architecture
Gives an overview of the ARMv8 architecture.
Chapter 3 Overview of AArch32 state
Gives an overview of the AArch32 state of ARMv8.
Chapter 4 Overview of AArch64 state
Gives an overview of the AArch64 state of ARMv8.
Chapter 5 Structure of Assembly Language Modules
Describes the structure of assembly language source files.
Chapter 6 Writing A32/T32 Assembly Language
Describes the use of a few basic A32 and T32 instructions and the use of macros.
Chapter 7 Condition Codes
Describes condition codes and conditional execution of A64, A32, and T32 code.
Chapter 8 Using armasm
Describes how to use armasm.
Chapter 9 Advanced SIMD Programming
Describes Advanced SIMD assembly language programming.
Chapter 10 Floating-point Programming
Describes floating-point assembly language programming.
Chapter 11 armasm Command-line Options
Describes the armasm command-line syntax and command-line options.
Chapter 12 Symbols, Literals, Expressions, and Operators
Describes how you can use symbols to represent variables, addresses, and constants in code, and
how you can combine these with operators to create numeric or string expressions.
Chapter 13 A32 and T32 Instructions
Describes the A32 and T32 instructions supported in AArch32 state.
Chapter 14 Advanced SIMD Instructions (32-bit)
Describes Advanced SIMD assembly language instructions.
Chapter 15 Floating-point Instructions (32-bit)
Describes floating-point assembly language instructions.
Chapter 16 A64 General Instructions
Describes the A64 general instructions.
Chapter 17 A64 Data Transfer Instructions
Describes the A64 data transfer instructions.
Chapter 18 A64 Floating-point Instructions
Describes the A64 floating-point instructions.

Chapter 19 A64 SIMD Scalar Instructions
Describes the A64 SIMD scalar instructions.
Chapter 20 A64 SIMD Vector Instructions
Describes the A64 SIMD vector instructions.
Chapter 21 Directives Reference
Describes the directives that are provided by the ARM assembler, armasm.
Chapter 22 Via File Syntax
Describes the syntax of via files accepted by armasm.
Glossary
The ARM Glossary is a list of terms used in ARM documentation, together with definitions for those
terms. The ARM Glossary does not contain terms that are industry standard unless the ARM meaning
differs from the generally accepted meaning.
See the ARM Glossary for more information.
Typographic conventions
italic
Introduces special terminology, and denotes cross-references and citations.
bold
Highlights interface elements, such as menu names. Denotes signal names. Also used for terms
in descriptive lists, where appropriate.
monospace
Denotes text that you can enter at the keyboard, such as commands, file and program names,
and source code.
monospace (underlined)
Denotes a permitted abbreviation for a command or option. You can enter the underlined text
instead of the full command or option name.
monospace italic
Denotes arguments to monospace text where the argument is to be replaced by a specific value.
monospace bold
Denotes language keywords when used outside example code.
<and>
Encloses replaceable terms for assembler syntax where they appear in code or code fragments.
For example:
MRC p15, 0, <Rd>, <CRn>, <CRm>, <Opcode_2>
SMALL CAPITALS

Used in body text for a few terms that have specific technical meanings, that are defined in the
ARM glossary. For example, IMPLEMENTATION DEFINED, IMPLEMENTATION SPECIFIC, UNKNOWN, and
UNPREDICTABLE.
Feedback
Feedback on this product
If you have any comments or suggestions about this product, contact your supplier and give:
• The product name.
• The product revision or version.
• An explanation with as much information as you can provide. Include symptoms and diagnostic
procedures if appropriate.
Feedback on content
If you have comments on content then send an e-mail to errata@arm.com. Give:
• The title ARM® Compiler armasm User Guide.
• The number ARM DUI0801G.
• If applicable, the page number(s) to which your comments refer.
• A concise explanation of your comments.

ARM also welcomes general suggestions for additions and improvements.
Note
ARM tests the PDF only in Adobe Acrobat and Acrobat Reader, and cannot guarantee the quality of the
represented document when used with any other PDF reader.

Other information
• ARM Information Center.
• ARM Technical Support Knowledge Articles.
• Support and Maintenance.
• ARM Glossary.

Chapter 1
Overview of the Assembler

Gives an overview of the assemblers provided with ARM® Compiler toolchain.
It contains the following sections:
• 1.1 About the ARM Compiler toolchain assemblers on page 1-47.
• 1.2 Key features of the assembler on page 1-48.
• 1.3 How the assembler works on page 1-49.
• 1.4 Directives that can be omitted in pass 2 of the assembler on page 1-51.
• 1.5 Support level definitions on page 1-53.

1.1 About the ARM Compiler toolchain assemblers
The ARM Compiler toolchain provides three assemblers:
• The freestanding legacy assembler, armasm. Use armasm to assemble existing A64, A32, and T32
assembly language code written in ARM syntax.
• The armclang integrated assembler. Use this to assemble assembly language code written in GNU
syntax.
• An optimizing inline assembler built into armclang. Use this to assemble assembly language code
written in GNU syntax that is used inline in C or C++ source code.
Note
This book only applies to armasm. For information on armclang, see the armclang Reference Guide.
Note
Be aware of the following:
• Generated code might be different between two ARM Compiler releases.
• For a feature release, there might be significant code generation differences.
Note
The command-line option descriptions and related information in the individual ARM Compiler tools
documents describe all the features that ARM Compiler supports. Any features not documented are not
supported and are used at your own risk. You are responsible for making sure that any generated code
using community features on page 1-53 is operating correctly.

Related information
ARM Compiler armclang Reference Guide.
Mixing Assembly Code with C or C++ Code.
Assembling ARM and GNU syntax assembly code.

1.2 Key features of the assembler
The ARM assembler supports instructions, directives, and user-defined macros.
It supports:
• Unified Assembly Language (UAL) for both A32 and T32 code.
• Assembly language for A64 code.
• Advanced SIMD instructions in A64, A32, and T32 code.
• Floating-point instructions in A64, A32, and T32 code.
• Directives in assembly source code.
• Processing of user-defined macros.
Related concepts
1.3 How the assembler works on page 1-49.
6.1 About the Unified Assembler Language on page 6-102.
9.1 Architecture support for Advanced SIMD on page 9-182.
6.22 Use of macros on page 6-130.
Related references
Chapter 9 Advanced SIMD Programming on page 9-181.
Chapter 21 Directives Reference on page 21-1638.

1.3 How the assembler works
armasm reads the assembly language source code twice before it outputs object code. Each read of the
source code is called a pass.
This is because assembly language source code often contains forward references. A forward reference
occurs when a label is used as an operand, for example as a branch target, earlier in the code than the
definition of the label. The assembler cannot know the address of the forward reference label until it
reads the definition of the label.
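As an illustration, the following minimal sketch shows a forward reference in A32 code. The area name fwdref and the labels start and done are hypothetical, chosen only for this example.

```armasm
        AREA    fwdref, CODE, READONLY
        ARM
start
        CMP     r0, #0
        BEQ     done            ; forward reference: done is not defined yet
        SUB     r0, r0, #1
done                            ; definition of done, after its first use
        BX      lr
        END
```

When the assembler first reads the BEQ instruction, it cannot yet know the offset of done. It learns the offset only when it reads the label definition two lines later.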
During each pass, the assembler performs different functions. In the first pass, the assembler:
• Checks the syntax of the instruction or directive. It faults if there is an error in the syntax, for
example if a label is specified on a directive that does not accept one.
• Determines the size of the instruction and data being assembled and reserves space.
• Determines offsets of labels within sections.
• Creates a symbol table containing label definitions and their memory addresses.

In the second pass, the assembler:
• Faults if an undefined reference is specified in an instruction operand or directive.
• Encodes the instructions using the label offsets from pass 1, where applicable.
• Generates relocations.
• Generates debug information if requested.
• Outputs the object file.
Memory addresses of labels are determined and finalized in the first pass. Therefore, the assembly code
must not change during the second pass. All instructions must be seen in both passes. Therefore you
must not define a symbol after a :DEF: test for the symbol. The assembler faults if it sees code in pass 2
that was not seen in pass 1.
Line not seen in pass 1
The following example shows that num EQU 42 is not seen in pass 1 but is seen in pass 2:
        AREA x,CODE
        [ :DEF: foo         ; false in pass 1 (foo is not yet defined), true in pass 2
num     EQU 42              ; so this line is seen only in pass 2
        ]
foo     DCD num
        END

Assembling this code generates the error:
A1903E: Line not seen in first pass; cannot be assembled.

Line not seen in pass 2
The following example shows that MOV r1,r2 is seen in pass 1 but not in pass 2:
        AREA x,CODE
        [ :LNOT: :DEF: foo  ; true in pass 1 (foo is not yet defined), false in pass 2
        MOV r1, r2          ; so this line is seen only in pass 1
        ]
foo     MOV r3, r4
        END

Assembling this code generates the error:
A1909E: Line not seen in second pass; cannot be assembled.

Related concepts
8.13 Two pass assembler diagnostics on page 8-175.
6.25 Instruction and directive relocations on page 6-134.

Related references
1.4 Directives that can be omitted in pass 2 of the assembler on page 1-51.
11.17 --diag_error=tag[,tag,…] on page 11-245.
11.14 --debug on page 11-242.

1.4 Directives that can be omitted in pass 2 of the assembler
Most directives must appear in both passes of the assembly process. You can omit some directives from
the assembler's second pass over the source code, but this is strongly discouraged.
Directives that can be omitted from pass 2 are:
• GBLA, GBLL, GBLS.
• LCLA, LCLL, LCLS.
• SETA, SETL, SETS.
• RN, RLIST.
• CN, CP.
• SN, DN, QN.
• EQU.
• MAP, FIELD.
• GET, INCLUDE.
• IF, ELSE, ELIF, ENDIF.
• WHILE, WEND.
• ASSERT.
• ATTR.
• COMMON.
• EXPORTAS.
• IMPORT.
• EXTERN.
• KEEP.
• MACRO, MEND, MEXIT.
• REQUIRE8.
• PRESERVE8.
Note
Macros that appear only in pass 1 and not in pass 2 must contain only these directives.

ASSERT directive appears in pass 1 only
The code in the following example assembles without error although the ASSERT directive does not
appear in pass 2:
        AREA ||.text||,CODE
x       EQU 42
        IF :LNOT: :DEF: sym
            ASSERT x == 42
        ENDIF
sym     EQU 1
        END

Use of ELSE and ELIF directives
Directives that appear in pass 2 but do not appear in pass 1 cause an assembly error. However, this does
not cause an assembly error when using the ELSE and ELIF directives if their matching IF directive
appears in pass 1. The following example assembles without error because the IF directive appears in
pass 1:
        AREA ||.text||,CODE
x       EQU 42
        IF :DEF: sym
        ELSE
            ASSERT x == 42
        ENDIF
sym     EQU 1
        END

Related concepts
1.3 How the assembler works on page 1-49.
8.13 Two pass assembler diagnostics on page 8-175.

1.5 Support level definitions
This section describes the levels of support for various ARM Compiler 6 features.
ARM Compiler 6 is built on Clang and LLVM technology and as such, has more functionality than the
set of product features described in the documentation. The following definitions clarify the levels of
support and guarantees on functionality that are expected from these features.
ARM welcomes feedback regarding the use of all ARM Compiler 6 features, and endeavors to support
users to a level that is appropriate for that feature. You can contact support at
http://www.arm.com/support.
Identification in the documentation
All features that are documented in the ARM Compiler 6 documentation are product features, except
where explicitly stated. The limitations of non-product features are explicitly stated.
Product features
Product features are suitable for use in a production environment. The functionality is well-tested, and is
expected to be stable across feature and update releases.
• ARM endeavors to give advance notice of significant functionality changes to product features.
• If you have a support and maintenance contract, ARM provides full support for use of all product
features.
• ARM welcomes feedback on product features.
• Any issues with product features that ARM encounters or is made aware of are considered for fixing
in future versions of ARM Compiler.
In addition to fully supported product features, some product features are only alpha or beta quality.
Beta product features
Beta product features are implementation complete, but have not been sufficiently tested to be
regarded as suitable for use in production environments.
Beta product features are indicated with [BETA].
• ARM endeavors to document known limitations on beta product features.
• Beta product features are expected to eventually become product features in a future release
of ARM Compiler 6.
• ARM encourages the use of beta product features, and welcomes feedback on them.
• Any issues with beta product features that ARM encounters or is made aware of are
considered for fixing in future versions of ARM Compiler.
Alpha product features
Alpha product features are not implementation complete, and are subject to change in future
releases, therefore the stability level is lower than in beta product features.
Alpha product features are indicated with [ALPHA].
• ARM endeavors to document known limitations of alpha product features.
• ARM encourages the use of alpha product features, and welcomes feedback on them.
• Any issues with alpha product features that ARM encounters or is made aware of are
considered for fixing in future versions of ARM Compiler.
Community features
ARM Compiler 6 is built on LLVM technology and preserves the functionality of that technology where
possible. This means that there are additional features available in ARM Compiler that are not listed in
the documentation. These additional features are known as community features. For information on these
community features, see the documentation for the Clang/LLVM project.

Where community features are referenced in the documentation, they are indicated with
[COMMUNITY].
• ARM makes no claims about the quality level or the degree of functionality of these features, except
when explicitly stated in this documentation.
• Functionality might change significantly between feature releases.
• ARM makes no guarantees that community features are going to remain functional across update
releases, although changes are expected to be unlikely.
Some community features might become product features in the future, but ARM provides no roadmap
for this. ARM is interested in understanding your use of these features, and welcomes feedback on them.
ARM supports customers using these features on a best-effort basis, unless the features are unsupported.
ARM accepts defect reports on these features, but does not guarantee that these issues are going to be
fixed in future releases.
Guidance on use of community features
There are several factors to consider when assessing the likelihood of a community feature being
functional:
• The following figure shows the structure of the ARM Compiler 6 toolchain:

[Figure: flow from source code, headers, and assembly through armclang (LLVM Project clang) and
armasm to objects, then through armlink, together with the ARM C library, the ARM C++ library
(LLVM Project libc++), and scatter, steering, or symdefs files, to an image.]

Figure 1-1 Integration boundaries in ARM Compiler 6.

The dashed boxes are toolchain components, and any interaction between these components is an
integration boundary. Community features that span an integration boundary might have significant
limitations in functionality. The exception to this is if the interaction is codified in one of the
standards supported by ARM Compiler 6. See Application Binary Interface (ABI) for the ARM®
Architecture. Community features that do not span integration boundaries are more likely to work as
expected.
• Features primarily used when targeting hosted environments such as Linux or BSD, might have
significant limitations, or might not be applicable, when targeting bare-metal environments.
• The Clang implementations of compiler features, particularly those that have been present for a long
time in other toolchains, are likely to be mature. The functionality of new features, such as support
for new language features, is likely to be less mature and therefore more likely to have limited
functionality.

Unsupported features
With both the product and community feature categories, specific features and use-cases are known not
to function correctly, or are not intended for use with ARM Compiler 6.
Limitations of product features are stated in the documentation. ARM cannot provide an exhaustive list
of unsupported features or use-cases for community features. The known limitations on community
features are listed in Community features on page 1-53.
List of known unsupported features
The following is an incomplete list of unsupported features, and might change over time:
• The Clang option -stdlib=libstdc++ is not supported.
• C++ static initialization of local variables is not thread-safe when linked against the standard C++
libraries. For thread-safety, you must provide your own implementation of thread-safe functions as
described in Standard C++ library implementation definition.
Note
This restriction does not apply to the [ALPHA]-supported multi-threaded C++ libraries. Contact the
ARM Support team for more details.
• Use of C11 library features is unsupported.
• Any community feature that exclusively pertains to non-ARM architectures is not supported by ARM
Compiler 6.
• Compilation for targets that implement architectures older than ARMv7 or ARMv6-M is not
supported.

Chapter 2
Overview of the ARM Architecture

Gives an overview of the ARMv8 architecture.
It contains the following sections:
• 2.1 About the ARM architecture on page 2-57.
• 2.2 A32 and T32 instruction sets on page 2-58.
• 2.3 A64 instruction set on page 2-59.
• 2.4 Changing between AArch64 and AArch32 states on page 2-60.
• 2.5 Advanced SIMD on page 2-61.
• 2.6 Floating-point hardware on page 2-62.

2.1 About the ARM architecture
The ARM architecture is a load-store architecture. The addressing range depends on whether you are
using the 32-bit or the 64-bit architecture.
ARM processors are typical of RISC processors in that only load and store instructions can access
memory. Data processing instructions operate on register contents only.
ARMv8 is the next major architectural update after ARMv7. It introduces a 64-bit architecture, but
maintains compatibility with existing 32-bit architectures. It uses two execution states:
AArch32
In AArch32 state, code has access to 32-bit general purpose registers.
Code executing in AArch32 state can only use the A32 and T32 instruction sets. This state is
broadly compatible with the ARMv7-A architecture.
AArch64
In AArch64 state, code has access to 64-bit general purpose registers. The AArch64 state exists
only in the ARMv8 architecture.
Code executing in AArch64 state can only use the A64 instruction set.
In the AArch32 execution state, there are the following instruction set states:
A32 state
The state that executes A32 instructions.
T32 state
The state that executes T32 instructions.
Note
Detailed information about the ARMv8 architecture is available under license. Contact your ARM
Account Representative for details.

Related information
ARM Architecture Reference Manual.

2.2 A32 and T32 instruction sets
A32 instructions are 32 bits wide. T32 instructions are 16 bits or 32 bits wide, depending on the
instruction and the architecture.
The A32 instruction set provides a comprehensive range of operations.
With the 16-bit T32 instructions alone, most of the functionality of the A32 instruction set is
available, but some operations require more instructions. The 16-bit instructions therefore provide
better code density at the expense of performance.
The 32-bit and 16-bit T32 instructions together provide almost exactly the same functionality as the A32
instruction set. Used together, they achieve the high performance of A32 code along with the
benefits of better code density.
ARMv6-M, ARMv7-M, ARMv8-M.baseline, and ARMv8-M.mainline do not support the A32
instruction set. On these architectures, instructions must not attempt to change to A32 state. ARMv7-A,
ARMv7-R, ARMv8-A, and ARMv8-R support both A32 and T32 instruction sets.
Note
With the exception of ARMv6-M and ARMv6S-M, assembling code for architectures earlier than
ARMv7 is not supported in ARM Compiler 6.
In ARMv8, the A32 and T32 instruction sets are largely unchanged from ARMv7. They are only
available when the processor is in AArch32 state. The main changes in ARMv8 are the addition of a few
new instructions and the deprecation of some behavior, including many uses of the IT instruction.
ARMv8 also defines an optional Crypto Extension. This extension provides cryptographic and hash
instructions in the A32 instruction set.
Note
• The term A32 is an alias for the ARM instruction set.
• The term T32 is an alias for the Thumb® instruction set.

Related references
3.14 A32 and T32 instruction set overview on page 3-78.

2.3 A64 instruction set
A64 instructions are 32 bits wide.
ARMv8 introduces a new set of 32-bit instructions called A64, with new encodings and assembly
language. A64 is only available when the processor is in AArch64 state. It provides similar functionality
to the A32 and T32 instruction sets, but gives access to a larger virtual address space, and has some other
changes, including reduced conditionality.
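As a rough illustration of reduced conditionality, A64 code typically uses a conditional select instruction where A32 code might use conditionally executed data-processing instructions. The following sketch returns the larger of two signed 32-bit values. The area name a64demo and the label max32 are hypothetical, and the sketch assumes that armasm is invoked for an AArch64 target.

```armasm
        AREA    a64demo, CODE, READONLY
max32
        CMP     w0, w1          ; set the condition flags from w0 - w1
        CSEL    w0, w0, w1, GT  ; w0 = w0 if w0 > w1 (signed), otherwise w1
        RET                     ; return to the address in x30
        END
```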
ARMv8 also defines an optional Crypto Extension. This extension provides cryptographic and hash
instructions in the A64 instruction set.
Related references
4.12 A64 instruction set overview on page 4-92.

2.4 Changing between AArch64 and AArch32 states
The processor must be in the correct execution state for the instructions it is executing.
A processor that is executing A64 instructions is operating in AArch64 state. In this state, the
instructions can access both the 64-bit and 32-bit registers.
A processor that is executing A32 or T32 instructions is operating in AArch32 state. In this state, the
instructions can only access the 32-bit registers, and not the 64-bit registers.
A processor based on ARMv8 can run applications built for AArch32 and AArch64 states but a change
between AArch32 and AArch64 states can only happen at exception boundaries.
ARM Compiler toolchain builds images for either the AArch32 state or AArch64 state. Therefore, an
image built with ARM Compiler toolchain can either contain only A32 and T32 instructions or only A64
instructions.
A processor can only execute instructions from the instruction set that matches its current execution
state. A processor in AArch32 state cannot execute A64 instructions, and a processor in AArch64 state
cannot execute A32 or T32 instructions. You must ensure that the processor never receives instructions
from the wrong instruction set for the current execution state.
Related references
13.21 BLX, BLXNS on page 13-368.
13.22 BX, BXNS on page 13-370.
21.7 ARM or CODE32 directive on page 21-1649.
21.11 CODE16 directive on page 21-1653.
21.65 THUMB directive on page 21-1715.

2.5 Advanced SIMD
Advanced SIMD is a 64-bit and 128-bit hybrid Single Instruction Multiple Data (SIMD) technology
targeted at advanced media and signal processing applications and embedded processors.
Advanced SIMD is implemented as part of the ARM core, but has its own execution pipelines and a
register bank that is distinct from the ARM core register bank.
Advanced SIMD instructions are available in both A32 and A64. The A64 Advanced SIMD instructions
are based on those in A32. The main differences are the following:
• Different instruction mnemonics and syntax.
• Thirty-two 128-bit vector registers, increased from sixteen in A32.
• A different register packing scheme:
— In A64, smaller registers occupy the low order bits of larger registers. For example, S31 maps to
bits[31:0] of D31.
— In A32, smaller registers are packed into larger registers. For example, S31 maps to bits[63:32] of
D15.
• A64 Advanced SIMD instructions support both single-precision and double-precision floating-point
data types and arithmetic.
• A32 Advanced SIMD instructions support only single-precision floating-point data types.
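The A32 register packing scheme can be sketched as follows. This is an illustrative fragment only: the area name packing is hypothetical, and the sketch assumes a target that implements the floating-point extension with support for floating-point immediates.

```armasm
        AREA    packing, CODE, READONLY
        ARM
        VMOV.F32 s30, #1.0      ; S30 is bits[31:0] of D15
        VMOV.F32 s31, #2.0      ; S31 is bits[63:32] of D15
        VMOV    r0, r1, d15     ; r0 = D15[31:0], r1 = D15[63:32]
        BX      lr
        END
```

After this sequence, r0 holds the bit pattern written through S30 and r1 holds the bit pattern written through S31. In A64, by contrast, a write to S31 affects the low-order bits of D31, not D15.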
Related concepts
9.1 Architecture support for Advanced SIMD on page 9-182.
9.4 Views of the Advanced SIMD register bank in AArch32 state on page 9-187.
9.5 Views of the Advanced SIMD register bank in AArch64 state on page 9-188.
Related references
Chapter 9 Advanced SIMD Programming on page 9-181.

2.6 Floating-point hardware
There are several floating-point architecture versions and variants.
The floating-point hardware, together with associated support code, provides single-precision and
double-precision floating-point arithmetic, as defined by IEEE Std. 754-2008 IEEE Standard for
Floating-Point Arithmetic. This document is referred to as the IEEE 754 standard.
The floating-point hardware uses a register bank that is distinct from the ARM core register bank.
Note
The floating-point register bank is shared with the SIMD register bank.
In AArch32 state, floating-point support is largely unchanged from VFPv4, apart from the addition of a
few instructions for compliance with the IEEE 754 standard.
The floating-point architecture in AArch64 state is also based on VFPv4. The main differences are the
following:
• In AArch64 state, the number of 128-bit SIMD and floating-point registers increases from sixteen to
thirty-two.
• Single-precision registers are no longer packed into double-precision registers, so register Sx is
Dx[31:0].
• The presence of floating-point hardware is mandated, so software floating-point linkage is not
supported.
• Earlier versions of the floating-point architecture, for instance VFPv2, VFPv3, and VFPv4, are not
supported in AArch64 state.
• VFP vector mode is not supported in either AArch32 or AArch64 state. Use Advanced SIMD
instructions for vector floating-point.
• Some new instructions have been added, including:
— Direct conversion between half-precision and double-precision.
— Load and store pair, replacing load and store multiple.
— Fused multiply-add and multiply-subtract.
— Instructions for IEEE 754-2008 compatibility.
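The following A64 fragment is a minimal sketch of some of the instructions in this list. The area name fpnew, the label fp_sketch, and the choice of registers are assumptions made for illustration, and the sketch assumes that armasm is invoked for an AArch64 target.

```armasm
        AREA    fpnew, CODE, READONLY
fp_sketch
        LDP     d0, d1, [x0]        ; load pair: two doubles from the address in x0
        FMADD   d2, d0, d1, d2      ; fused multiply-add: d2 = (d0 * d1) + d2
        FCVT    d3, h4              ; convert half-precision h4 directly to double-precision
        STP     d2, d3, [x1]        ; store pair: two doubles to the address in x1
        RET
        END
```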
Related concepts
10.1 Architecture support for floating-point on page 10-207.
10.4 Views of the floating-point extension register bank in AArch32 state on page 10-211.
10.5 Views of the floating-point extension register bank in AArch64 state on page 10-212.
Related references
Chapter 10 Floating-point Programming on page 10-206.


Chapter 3
Overview of AArch32 state

Gives an overview of the AArch32 state of ARMv8.
It contains the following sections:
• 3.1 Changing between A32 and T32 instruction set states on page 3-64.
• 3.2 Processor modes, and privileged and unprivileged software execution on page 3-65.
• 3.3 Processor modes in ARMv6-M, ARMv7-M, and ARMv8-M on page 3-66.
• 3.4 Registers in AArch32 state on page 3-67.
• 3.5 General-purpose registers in AArch32 state on page 3-69.
• 3.6 Register accesses in AArch32 state on page 3-70.
• 3.7 Predeclared core register names in AArch32 state on page 3-71.
• 3.8 Predeclared extension register names in AArch32 state on page 3-72.
• 3.9 Program Counter in AArch32 state on page 3-73.
• 3.10 The Q flag in AArch32 state on page 3-74.
• 3.11 Application Program Status Register on page 3-75.
• 3.12 Current Program Status Register in AArch32 state on page 3-76.
• 3.13 Saved Program Status Registers in AArch32 state on page 3-77.
• 3.14 A32 and T32 instruction set overview on page 3-78.
• 3.15 Access to the inline barrel shifter in AArch32 state on page 3-79.

3.1 Changing between A32 and T32 instruction set states
A processor that is executing A32 instructions is operating in A32 instruction set state. A processor that
is executing T32 instructions is operating in T32 instruction set state. For brevity, this document refers to
them as the A32 state and T32 state respectively.
A processor in A32 state cannot execute T32 instructions, and a processor in T32 state cannot execute
A32 instructions. You must ensure that the processor never receives instructions of the wrong instruction
set for the current state.
The initial state after reset depends on the processor being used and its configuration.
To direct armasm to generate A32 or T32 instruction encodings, you must set the assembler mode using
an ARM or THUMB directive. Assembly code using CODE32 and CODE16 directives can still be assembled,
but ARM recommends you use ARM and THUMB for new code.
These directives do not change the instruction set state of the processor. To do this, you must use an
appropriate instruction, for example BX or BLX to change between A32 and T32 states when performing a
branch.
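The following minimal sketch shows how the directives and the branch work together. The ARM and
THUMB directives only select which encodings armasm generates, and the BX instruction performs the
state change. The area name and labels are illustrative, and the sketch assumes a processor that
starts in A32 state and implements both instruction sets.

```armasm
        AREA    StateChange, CODE, READONLY
        ENTRY
        ARM                         ; assemble the following as A32
start
        ADR     r0, into_t32 + 1    ; bit[0] set marks the target as T32 code
        BX      r0                  ; branch and change to T32 state

        THUMB                       ; assemble the following as T32
into_t32
        MOVS    r0, #10
        MOVS    r1, #3
        ADDS    r0, r0, r1          ; r0 = 13, executed in T32 state
stop
        B       stop                ; illustrative end of the sketch
        END
```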
Related references
13.21 BLX, BLXNS on page 13-368.
13.22 BX, BXNS on page 13-370.
21.7 ARM or CODE32 directive on page 21-1649.
21.11 CODE16 directive on page 21-1653.
21.65 THUMB directive on page 21-1715.

3.2 Processor modes, and privileged and unprivileged software execution
The ARM architecture supports different levels of execution privilege. The privilege level depends on
the processor mode.
Note
ARMv6-M, ARMv7-M, ARMv8-M.baseline, and ARMv8-M.mainline do not support the same modes as
other ARM architectures and profiles. Some of the processor modes listed here do not apply to these
architectures.

Table 3-1 ARM processor modes
Processor mode  Mode number
User            0b10000
FIQ             0b10001
IRQ             0b10010
Supervisor      0b10011
Monitor         0b10110
Abort           0b10111
Hyp             0b11010
Undefined       0b11011
System          0b11111

User mode is an unprivileged mode, and has restricted access to system resources. All other modes have
full access to system resources in the current security state, can change mode freely, and execute
software as privileged.
Applications that require task protection usually execute in User mode. Some embedded applications
might run entirely in a mode other than User mode. An application that requires full access to system
resources usually executes in System mode.
Modes other than User mode are entered to service exceptions, or to access privileged resources.
Code can run in either a Secure state or in a Non-secure state. Hypervisor (Hyp) mode has privileged
execution in Non-secure state.
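As an illustration of entering a privileged mode to service an exception, the following sketch shows
unprivileged code requesting a service with the SVC instruction. The routine name, the argument in
r0, and the SVC number are hypothetical, and the sketch assumes that a Supervisor mode handler is
installed and returns to the instruction after the SVC.

```armasm
        AREA    UserApp, CODE, READONLY
        ARM
request_service                     ; hypothetical routine, runs in User mode
        MOV     r0, #1              ; assumed service argument
        SVC     #0                  ; takes an exception: the processor enters Supervisor mode
        BX      lr                  ; execution resumes here after the handler returns
        END
```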
Related concepts
3.3 Processor modes in ARMv6-M, ARMv7-M, and ARMv8-M on page 3-66.
Related information
ARM Architecture Reference Manual.

3.3 Processor modes in ARMv6-M, ARMv7-M, and ARMv8-M
The processor modes available in ARMv6-M, ARMv7-M, ARMv8-M.baseline, and ARMv8-M.mainline
are Thread mode and Handler mode.
Thread mode is the normal mode that programs run in. Software in Thread mode can execute as
privileged or unprivileged. Handler mode is the mode that exceptions are handled in. Software in
Handler mode always executes as privileged.
Related concepts
3.2 Processor modes, and privileged and unprivileged software execution on page 3-65.
Related information
ARM Architecture Reference Manual.

3.4 Registers in AArch32 state
ARM processors provide general-purpose and special-purpose registers. Some additional registers are
available in privileged execution modes.
In all ARM processors in AArch32 state, the following registers are available and accessible in any
processor mode:
• 15 general-purpose registers R0-R12, the Stack Pointer (SP), and Link Register (LR).
• 1 Program Counter (PC).
• 1 Application Program Status Register (APSR).
Note
SP and LR can be used as general-purpose registers, although ARM deprecates using SP other than as
a stack pointer.

Additional registers are available in privileged software execution. ARM processors have a total of 43
registers. The registers are arranged in partially overlapping banks. There is a different register bank for
each processor mode. The banked registers give rapid context switching for dealing with processor
exceptions and privileged operations.
The additional registers in ARM processors are:
• 2 supervisor mode registers for banked SP and LR.
• 2 abort mode registers for banked SP and LR.
• 2 undefined mode registers for banked SP and LR.
• 2 interrupt mode registers for banked SP and LR.
• 7 FIQ mode registers for banked R8-R12, SP and LR.
• 2 monitor mode registers for banked SP and LR.
• 1 Hyp mode register for banked SP.
• 7 Saved Program Status Registers (SPSRs), one for each exception mode.
• 1 Hyp mode register for ELR_Hyp to store the preferred return address from Hyp mode.
Note

In privileged software execution, CPSR is an alias for APSR and gives access to additional bits.
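Because SP is banked, each exception mode needs its own stack pointer value. The following sketch
shows one common way to initialize two of the banked stack pointers by changing mode and writing SP
in each mode. The routine name, the imported stack symbols, and the I_Bit and F_Bit values are
assumptions for illustration. The sketch also assumes it is entered in a privileged mode, and it
returns in Supervisor mode.

```armasm
Mode_IRQ    EQU     0x12            ; 0b10010, from Table 3-1
Mode_SVC    EQU     0x13            ; 0b10011, from Table 3-1
I_Bit       EQU     0x80            ; assumed: CPSR bit[7] disables IRQ
F_Bit       EQU     0x40            ; assumed: CPSR bit[6] disables FIQ

        AREA    StackSetup, CODE, READONLY
        ARM
        IMPORT  irq_stack_top       ; hypothetical symbols defined elsewhere
        IMPORT  svc_stack_top

init_stacks                         ; hypothetical routine
        MOV     r0, lr              ; LR is banked too, so keep the return address in r0
        MSR     CPSR_c, #Mode_IRQ:OR:I_Bit:OR:F_Bit
        LDR     sp, =irq_stack_top  ; writes SP_irq
        MSR     CPSR_c, #Mode_SVC:OR:I_Bit:OR:F_Bit
        LDR     sp, =svc_stack_top  ; writes SP_svc
        BX      r0                  ; r0 is not banked, so it still holds the return address
        END
```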
The following figure shows how the registers are banked in the ARM architecture.

Application level view: R0-R12, SP, LR, PC, APSR.
System level view:
Mode         Registers
User         R0_usr-R12_usr, SP_usr, LR_usr, PC, CPSR
System       No banked registers
Hyp †        SP_hyp, SPSR_hyp, ELR_hyp
Supervisor   SP_svc, LR_svc, SPSR_svc
Abort        SP_abt, LR_abt, SPSR_abt
Undefined    SP_und, LR_und, SPSR_und
Monitor ‡    SP_mon, LR_mon, SPSR_mon
IRQ          SP_irq, LR_irq, SPSR_irq
FIQ          R8_fiq-R12_fiq, SP_fiq, LR_fiq, SPSR_fiq

‡ Exists only in Secure state.
† Exists only in Non-secure state.
Where a mode has no banked copy of a general-purpose register, the User mode register is used.

Figure 3-1 Organization of general-purpose registers and Program Status Registers

In ARMv6-M, ARMv7-M, ARMv8-M.baseline, and ARMv8-M.mainline based processors, SP is an
alias for the two banked stack pointer registers:
• Main stack pointer register, that is only available in privileged software execution.
• Process stack pointer register.
Related concepts
3.5 General-purpose registers in AArch32 state on page 3-69.
3.9 Program Counter in AArch32 state on page 3-73.
3.11 Application Program Status Register on page 3-75.
3.13 Saved Program Status Registers in AArch32 state on page 3-77.
3.12 Current Program Status Register in AArch32 state on page 3-76.
3.2 Processor modes, and privileged and unprivileged software execution on page 3-65.
Related information
ARM Architecture Reference Manual.

3.5 General-purpose registers in AArch32 state
There are restrictions on the use of SP and LR as general-purpose registers.
With the exception of ARMv6-M, ARMv7-M, ARMv8-M.baseline, and ARMv8-M.mainline based
processors, there are 33 general-purpose 32-bit registers, including the banked SP and LR registers.
Fifteen general-purpose registers are visible at any one time, depending on the current processor mode.
These are R0-R12, SP, and LR. The PC (R15) is not considered a general-purpose register.
SP (or R13) is the stack pointer. The C and C++ compilers always use SP as the stack pointer. ARM
deprecates most uses of SP as a general-purpose register. In T32 state, SP is strictly defined as the stack
pointer. The instruction descriptions in Chapter 13 A32 and T32 Instructions on page 13-327 describe
when SP and PC can be used.
In User mode, LR (or R14) is used as a link register to store the return address when a subroutine call is
made. It can also be used as a general-purpose register if the return address is stored on the stack.
In the exception handling modes, LR holds the return address for the exception, or a subroutine return
address if subroutine calls are executed within an exception. LR can be used as a general-purpose register
if the return address is stored on the stack.
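The following sketch illustrates the point about storing the return address on the stack. Once the
address is saved, LR can be overwritten, either by a nested BL or by using it as a scratch register.
The function names are hypothetical.

```armasm
        AREA    LinkDemo, CODE, READONLY
        ARM
outer                               ; hypothetical non-leaf function
        PUSH    {r4, lr}            ; the return address is now on the stack
        BL      inner               ; BL overwrites LR with the address after this BL
        MOV     lr, r0              ; LR used here as a general-purpose register
        ADD     r0, r0, lr          ; r0 = 2 * (value returned by inner)
        POP     {r4, pc}            ; return by loading the saved address into PC

inner                               ; hypothetical leaf function
        ADD     r0, r0, #1
        BX      lr                  ; return using the address that BL placed in LR
        END
```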
Related concepts
3.9 Program Counter in AArch32 state on page 3-73.
3.6 Register accesses in AArch32 state on page 3-70.
Related references
3.7 Predeclared core register names in AArch32 state on page 3-71.
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.

3.6 Register accesses in AArch32 state
16-bit T32 instructions can access only a limited set of registers. There are also some restrictions on the
use of special-purpose registers by A32 and 32-bit T32 instructions.
Most 16-bit T32 instructions can only access R0 to R7. Only a small number of T32 instructions can
access R8-R12, SP, LR, and PC. Registers R0 to R7 are called Lo registers. Registers R8-R12, SP, LR,
and PC are called Hi registers.
All 32-bit T32 instructions can access R0 to R12, and LR. However, apart from a few designated stack
manipulation instructions, most T32 instructions cannot use SP. Except for a few specific instructions
where PC is useful, most T32 instructions cannot use PC.
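The following sketch contrasts the register access of 16-bit and 32-bit T32 instructions. It uses the
.N and .W instruction width specifiers to request a particular encoding size, and assumes a target
that implements 32-bit T32 instructions. The area name is illustrative.

```armasm
        AREA    RegAccess, CODE, READONLY
        THUMB
        ADDS.N  r0, r1, r2          ; 16-bit encoding: Lo registers only
        MOV.N   r8, r0              ; one of the few 16-bit instructions that accept a Hi register
        ADD.W   r8, r9, r10         ; 32-bit encoding: R0-R12 and LR are available
        END
```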
In A32 state, all instructions can access R0 to R12, SP, and LR, and most instructions can also access PC
(R15). However, the use of the SP in an A32 instruction, in any way that is not possible in the
corresponding T32 instruction, is deprecated. Explicit use of the PC in an A32 instruction is not usually
useful, and except for specific instances that are useful, such use is deprecated. Implicit use of the PC, for
example in branch instructions or load (literal) instructions, is never deprecated.
The MRS instruction can move the contents of a status register to a general-purpose register, where they
can be manipulated by normal data processing operations. You can use the MSR instruction to move the
contents of a general-purpose register to a status register.
Related concepts
3.5 General-purpose registers in AArch32 state on page 3-69.
3.9 Program Counter in AArch32 state on page 3-73.
3.11 Application Program Status Register on page 3-75.
3.12 Current Program Status Register in AArch32 state on page 3-76.
3.13 Saved Program Status Registers in AArch32 state on page 3-77.
6.20 The Read-Modify-Write operation on page 6-128.
Related references
3.7 Predeclared core register names in AArch32 state on page 3-71.
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.

3.7 Predeclared core register names in AArch32 state
Many of the core register names have synonyms.
The following table shows the predeclared core registers:
Table 3-2 Predeclared core registers in AArch32 state
Register names     Meaning
r0-r15 and R0-R15  General purpose registers.
a1-a4              Argument, result or scratch registers. These are synonyms for R0 to R3.
v1-v8              Variable registers. These are synonyms for R4 to R11.
SB                 Static base register. This is a synonym for R9.
IP                 Intra-procedure call scratch register. This is a synonym for R12.
SP                 Stack pointer. This is a synonym for R13.
LR                 Link register. This is a synonym for R14.
PC                 Program counter. This is a synonym for R15.

With the exception of a1-a4 and v1-v8, you can write the register names either in all upper case or all
lower case.
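As a brief illustration, each pair of instructions in the following sketch names the same registers
in two ways, so both lines of a pair are expected to assemble to the same encoding. The area name is
illustrative.

```armasm
        AREA    Synonyms, CODE, READONLY
        ARM
        MOV     a1, #1              ; a1 is a synonym for r0
        MOV     r0, #1

        ADD     v1, a1, a2          ; v1, a1, a2 are synonyms for r4, r0, r1
        ADD     r4, r0, r1

        MOV     ip, sp              ; ip and sp are synonyms for r12 and r13
        MOV     R12, R13
        END
```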
Related concepts
3.5 General-purpose registers in AArch32 state on page 3-69.

3.8 Predeclared extension register names in AArch32 state
You can write the names of Advanced SIMD and floating-point registers either in upper case or lower
case.
The following table shows the predeclared extension register names:
Table 3-3 Predeclared extension registers in AArch32 state
Register names  Meaning
Q0-Q15          Advanced SIMD quadword registers
D0-D31          Advanced SIMD doubleword registers, floating-point double-precision registers
S0-S31          Floating-point single-precision registers

You can write the register names either in upper case or lower case.
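The following sketch uses one register name from each row of the table. It assumes a target that
implements the floating-point and Advanced SIMD extensions. The area name is illustrative.

```armasm
        AREA    ExtRegs, CODE, READONLY
        ARM
        VADD.F32 s0, s1, s2         ; single-precision floating-point add
        VADD.F64 D0, D1, D2         ; double-precision add, names written in upper case
        VADD.I32 q0, q1, q2         ; Advanced SIMD add on four 32-bit lanes
        END
```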
Related concepts
9.2 Extension register bank mapping for Advanced SIMD in AArch32 state on page 9-183.

3.9 Program Counter in AArch32 state
You can use the Program Counter explicitly, for example in some T32 data processing instructions, and
implicitly, for example in branch instructions.
The Program Counter (PC) is accessed as PC (or R15). It is incremented by the size of the instruction
executed, which is always four bytes in A32 state. Branch instructions load the destination address into
the PC. You can also load the PC directly using data operation instructions. For example, to branch to the
address in a general purpose register, use:
    MOV     PC, R0

During execution, the PC does not contain the address of the currently executing instruction. The address
of the currently executing instruction is typically PC–8 for A32, or PC–4 for T32.
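The offset can be observed by reading the PC and comparing it with the address of the instruction
that read it. The following is a minimal sketch for A32 state. The routine name and label are
illustrative.

```armasm
        AREA    PcOffset, CODE, READONLY
        ARM
pc_offset                           ; hypothetical routine
here    MOV     r0, pc              ; r0 = address of here + 8
        ADR     r1, here            ; r1 = address of here
        SUB     r0, r0, r1          ; r0 = 8 in A32 state
        BX      lr
        END
```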
Note
ARM recommends you use the BX instruction to jump to an address or to return from a function, rather
than writing to the PC directly.

Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
13.15 B on page 13-359.
13.22 BX, BXNS on page 13-370.
13.24 CBZ and CBNZ on page 13-373.
13.161 TBB and TBH on page 13-554.

3.10 The Q flag in AArch32 state
The Q flag indicates overflow or saturation. It is one of the program status flags held in the APSR.
The Q flag is set to 1 when saturation occurs in saturating arithmetic instructions, or when overflow
occurs in certain multiply instructions.
The Q flag is a sticky flag. Although the saturating and certain multiply instructions can set the flag, they
cannot clear it. You can execute a series of such instructions, and then test the flag to find out whether
saturation or overflow occurred at any point in the series, without having to check the flag after each
instruction.
To clear the Q flag, use MRS and MSR instructions to read-modify-write the APSR:
    MRS     r5, APSR
    BIC     r5, r5, #(1<<27)
    MSR     APSR_nzcvq, r5

The state of the Q flag cannot be tested directly by the condition codes. To read the state of the Q flag,
use an MRS instruction.
    MRS     r6, APSR
    TST     r6, #(1<<27)        ; Z is clear if Q flag was set
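The sticky behavior can be sketched as follows. The first QADD saturates and sets Q, the second does
not saturate, and Q is still set when it is tested. The routine name is hypothetical, and the sketch
assumes a target that implements the saturating arithmetic instructions.

```armasm
        AREA    QFlagDemo, CODE, READONLY
        ARM
q_demo                              ; hypothetical routine
        LDR     r0, =0x7FFFFFFF     ; largest positive 32-bit signed value
        MOV     r1, #1
        QADD    r2, r0, r1          ; saturates: r2 = 0x7FFFFFFF and Q is set
        QADD    r3, r1, r1          ; no saturation: r3 = 2, Q is not cleared
        MRS     r4, APSR
        TST     r4, #(1<<27)        ; Z is clear because Q was set by the first QADD
        BX      lr
        END
```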

Related concepts
6.20 The Read-Modify-Write operation on page 6-128.
Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.
13.82 QADD on page 13-457.
13.132 SMULxy on page 13-514.
13.134 SMULWy on page 13-516.

3.11 Application Program Status Register
The Application Program Status Register (APSR) holds the program status flags that are accessible in
any processor mode.
It holds copies of the N, Z, C, and V condition flags. The processor uses them to determine whether or
not to execute conditional instructions.
The APSR also holds:
• The Q (saturation) flag.
• The GE (Greater than or Equal) flags. The GE flags can be set by the parallel add and subtract
instructions. They are used by the SEL instruction to perform byte-based selection from two
registers.
These flags are accessible in all modes, using the MSR and MRS instructions.
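As an example of how the GE flags and SEL combine, the following sketch computes the per-byte
unsigned maximum of two registers. The routine name is hypothetical, and the sketch assumes a target
that implements the parallel add and subtract instructions.

```armasm
        AREA    ByteMax, CODE, READONLY
        ARM
max_bytes                           ; hypothetical routine: r0 and r1 each hold four unsigned bytes
        USUB8   r2, r0, r1          ; GE[i] is set where byte i of r0 >= byte i of r1
        SEL     r0, r0, r1          ; byte i comes from r0 if GE[i] is set, otherwise from r1
        BX      lr                  ; r0 holds the larger byte in each position
        END
```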
Related concepts
7.1 Conditional instructions on page 7-140.
Related references
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.
13.107 SEL on page 13-487.

3.12 Current Program Status Register in AArch32 state
The Current Program Status Register (CPSR) holds the same program status flags as the APSR, and
some additional information.
It holds:
• The APSR flags.
• The processor mode.
• The interrupt disable flags.
• Either:
— The instruction set state for ARMv8 (A32 or T32).
— The instruction set state for ARMv7 (ARM or Thumb).
• The endianness state.
• The execution state bits for the IT block.
The execution state bits control conditional execution in the IT block.
Only the APSR flags are accessible in all modes. ARM deprecates using an MSR instruction to change the
endianness bit (E) of the CPSR, in any mode. Each exception level can have its own endianness, but
mixed endianness within an exception level is deprecated.
The SETEND instruction is deprecated in A32 and T32 and has no equivalent in A64.
The execution state bits for the IT block (IT[1:0]) and the T32 bit (T) can be accessed by MRS only in
Debug state.
Related concepts
3.13 Saved Program Status Registers in AArch32 state on page 3-77.
Related references
13.45 IT on page 13-399.
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.
13.108 SETEND on page 13-489.
7.6 Updates to the condition flags in A32/T32 code on page 7-145.

3.13 Saved Program Status Registers in AArch32 state
The Saved Program Status Register (SPSR) stores the current value of the CPSR when an exception is
taken so that it can be restored after handling the exception.
Each exception handling mode can access its own SPSR. User mode and System mode do not have an
SPSR because they are not exception handling modes.
The execution state bits, including the endianness state and current instruction set state, can be
accessed from the SPSR in any exception mode, using the MSR and MRS instructions. You cannot access the SPSR
using MSR or MRS in User or System mode.
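The following sketch shows an exception handler that copies its SPSR to the stack and restores it
before returning. Saving the SPSR in this way matters if the same exception can be taken again while
the handler runs. The handler name and the return address adjustment for IRQ are assumptions for
illustration.

```armasm
        AREA    IrqEntry, CODE, READONLY
        ARM
irq_handler                         ; hypothetical handler, entered in IRQ mode
        SUB     lr, lr, #4          ; assumed return address adjustment for IRQ
        STMFD   sp!, {r0-r3, r12, lr}
        MRS     r0, SPSR            ; read SPSR_irq
        STMFD   sp!, {r0}           ; keep a copy on the stack
        ; ... service the interrupt here ...
        LDMFD   sp!, {r0}
        MSR     SPSR_cxsf, r0       ; write the saved value back to SPSR_irq
        LDMFD   sp!, {r0-r3, r12, pc}^  ; ^ copies the SPSR to the CPSR on return
        END
```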
Related concepts
3.12 Current Program Status Register in AArch32 state on page 3-76.

3.14 A32 and T32 instruction set overview
A32 and T32 instructions can be grouped by functional area.
All A32 instructions are 32 bits long. Instructions are stored word-aligned, so the least significant two
bits of instruction addresses are always zero in A32 state.
T32 instructions are either 16 or 32 bits long. Instructions are stored half-word aligned. Some
instructions use the least significant bit of the address to determine whether the code being branched to is
T32 or A32.
Before the introduction of 32-bit T32 instructions, the T32 instruction set was limited to a restricted
subset of the functionality of the A32 instruction set. Almost all T32 instructions were 16-bit. Together,
the 32-bit and 16-bit T32 instructions provide functionality that is almost identical to that of the A32
instruction set.
The following table describes some of the functional groupings of the available instructions.
Table 3-4 A32 instruction groups
Instruction group         Description
Branch and control        These instructions do the following:
                          • Branch to subroutines.
                          • Branch backwards to form loops.
                          • Branch forward in conditional structures.
                          • Make the following instruction conditional without branching.
                          • Change the processor between A32 state and T32 state.
Data processing           These instructions operate on the general-purpose registers, for example to
                          perform arithmetic or bitwise logic operations.
Register load and store   These instructions load or store the value of a single register from or to
                          memory. They can load or store a 32-bit word, a 16-bit halfword, or an 8-bit
                          unsigned byte. Byte and halfword loads can either be sign extended or zero
                          extended to fill the 32-bit register.
                          A few instructions are also defined that can load or store 64-bit doubleword
                          values into two 32-bit registers.
Multiple register load    These instructions load or store any subset of the general-purpose registers
and store                 from or to memory.
Status register access    These instructions move the contents of a status register to or from a
                          general-purpose register.

Related concepts
6.14 Load and store multiple register instructions on page 6-120.

3.15 Access to the inline barrel shifter in AArch32 state
The ARM arithmetic logic unit has a 32-bit barrel shifter that is capable of shift and rotate operations.
The second operand to many A32 and T32 data-processing and single register data-transfer instructions
can be shifted, before the data-processing or data-transfer is executed, as part of the instruction. This
supports, but is not limited to:
• Scaled addressing.
• Multiplication by an immediate value.
• Constructing immediate values.
32-bit T32 instructions give almost the same access to the barrel shifter as A32 instructions.
16-bit T32 instructions only allow access to the barrel shifter using separate instructions.
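The following A32 sketch gives one instruction for each of the uses listed above. The area name and
register choices are illustrative.

```armasm
        AREA    ShiftDemo, CODE, READONLY
        ARM
        LDR     r2, [r3, r4, LSL #2]    ; scaled addressing: loads the word at r3 + (r4 * 4)
        ADD     r0, r1, r1, LSL #2      ; multiplication by an immediate value: r0 = r1 * 5
        MOV     r5, #0xFF000000         ; immediate constructed from an 8-bit value and a rotation
        END
```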
Related concepts
6.4 Load immediate values on page 6-105.
6.5 Load immediate values using MOV and MVN on page 6-106.

Chapter 4
Overview of AArch64 state

Gives an overview of the AArch64 state of ARMv8.
It contains the following sections:
• 4.1 Registers in AArch64 state on page 4-81.
• 4.2 Exception levels on page 4-82.
• 4.3 Link registers on page 4-83.
• 4.4 Stack Pointer register on page 4-84.
• 4.5 Predeclared core register names in AArch64 state on page 4-85.
• 4.6 Predeclared extension register names in AArch64 state on page 4-86.
• 4.7 Program Counter in AArch64 state on page 4-87.
• 4.8 Conditional execution in AArch64 state on page 4-88.
• 4.9 The Q flag in AArch64 state on page 4-89.
• 4.10 Process State on page 4-90.
• 4.11 Saved Program Status Registers in AArch64 state on page 4-91.
• 4.12 A64 instruction set overview on page 4-92.

4.1 Registers in AArch64 state
ARM processors provide general-purpose and special-purpose registers. Some additional registers are
available in privileged execution modes.
In AArch64 state, the following registers are available:
• Thirty-one 64-bit general-purpose registers X0-X30, the bottom halves of which are accessible as
W0-W30.
• Four stack pointer registers SP_EL0, SP_EL1, SP_EL2, SP_EL3.
• Three exception link registers ELR_EL1, ELR_EL2, ELR_EL3.
• Three saved program status registers SPSR_EL1, SPSR_EL2, SPSR_EL3.
• One program counter.
All these registers are 64 bits wide except SPSR_EL1, SPSR_EL2, and SPSR_EL3, which are 32 bits
wide.
Most A64 integer instructions can operate on either 32-bit or 64-bit registers. The register width is
determined by the register identifier, where W means 32-bit and X means 64-bit. The names Wn and Xn,
where n is in the range 0-30, refer to the same register. When you use the 32-bit form of an instruction,
the upper 32 bits of the source registers are ignored and the upper 32 bits of the destination register are
set to zero.
There is no register named W31 or X31. Depending on the instruction, register 31 is either the stack
pointer or the zero register. When used as the stack pointer, you refer to it as SP. When used as the zero
register, you refer to it as WZR in a 32-bit context or XZR in a 64-bit context.
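The following A64 sketch illustrates the register naming rules described above. The area name is
illustrative, and the sketch assumes that armasm is invoked for an AArch64 target.

```armasm
        AREA    WidthDemo, CODE, READONLY
        ADD     w0, w1, w2          ; 32-bit form: bits[63:32] of X0 are set to zero
        ADD     x0, x1, x2          ; 64-bit form on the same registers
        MOV     x3, xzr             ; register 31 named as the zero register: x3 = 0
        ADD     x4, sp, #16         ; register 31 named as the stack pointer
        END
```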
Related concepts
4.2 Exception levels on page 4-82.
4.3 Link registers on page 4-83.
4.4 Stack Pointer register on page 4-84.
4.7 Program Counter in AArch64 state on page 4-87.
4.8 Conditional execution in AArch64 state on page 4-88.
4.11 Saved Program Status Registers in AArch64 state on page 4-91.

4.2 Exception levels
ARMv8 defines four exception levels, EL0 to EL3, where EL3 is the highest exception level with the
most execution privilege. When taking an exception, the exception level can either increase or remain the
same, and when returning from an exception, it can either decrease or remain the same.
The following is a common usage model for the exception levels:
EL0
Applications.
EL1
OS kernels and associated functions that are typically described as privileged.
EL2
Hypervisor.
EL3
Secure monitor.
When taking an exception to a higher exception level, the execution state can either remain the same, or
change from AArch32 to AArch64.
When returning to a lower exception level, the execution state can either remain the same or change from
AArch64 to AArch32.
The only way the execution state can change is by taking or returning from an exception. It is not
possible to change between execution states in the same way as changing between A32 and T32 code in
AArch32 state.
On powerup and on reset, the processor enters the highest implemented exception level. The execution
state for this exception level is a property of the implementation, and might be determined by a
configuration input signal.
For exception levels other than EL0, the execution state is determined by one or more control register
configuration bits. These bits can be set only in a higher exception level.
For EL0, the execution state is determined as part of the exception return to EL0, under the control of the
exception level that the execution is returning from.
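Software can find out which exception level it is running at by reading the CurrentEL register. The
following is a minimal sketch, assuming that the exception level is held in bits[3:2] of CurrentEL
and that the code runs at EL1 or higher. The routine name is hypothetical.

```armasm
        AREA    ReadEL, CODE, READONLY
current_el                          ; hypothetical routine, must run at EL1 or higher
        MRS     x0, CurrentEL       ; assumed layout: exception level in bits[3:2]
        LSR     x0, x0, #2          ; x0 = current exception level number
        RET
        END
```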
Related concepts
4.3 Link registers on page 4-83.
4.11 Saved Program Status Registers in AArch64 state on page 4-91.
2.4 Changing between AArch64 and AArch32 states on page 2-60.
4.10 Process State on page 4-90.

4.3 Link registers
In AArch64 state, the Link Register (LR) stores the return address when a subroutine call is made. It can
also be used as a general-purpose register if the return address is stored on the stack. The LR maps to
register 30. Unlike in AArch32 state, the LR is distinct from the Exception Link Registers (ELRs) and is
therefore unbanked.
There are three Exception Link Registers, ELR_EL1, ELR_EL2, and ELR_EL3, that correspond to each
of the exception levels. When an exception is taken, the Exception Link Register for the target exception
level stores the return address to jump to after the handling of that exception completes. If the exception
was taken from AArch32 state, the top 32 bits in the ELR are all set to zero. Subroutine calls within the
exception level use the LR to store the return address from the subroutine.
For example, when the exception level changes from EL0 to EL1, the return address is stored in
ELR_EL1.
When in an exception level, if you enable interrupts that use the same exception level, you must ensure
you store the ELR on the stack because it will be overwritten with a new return address when the
interrupt is taken.
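The following fragment sketches one way to preserve the ELR before enabling interrupts at the same
exception level. It also saves SPSR_EL1, which is overwritten in the same way. The label is
hypothetical, and the sketch assumes that the code runs at EL1, that x0 and x1 have already been
saved by the handler, and that bit 1 of the DAIFSet and DAIFClr immediate selects the IRQ mask.
Restoring x0 and x1 before the ERET is omitted.

```armasm
        AREA    NestedIrq, CODE, READONLY
el1_handler_body                    ; hypothetical fragment of an EL1 exception handler
        MRS     x0, ELR_EL1         ; return address for this exception
        MRS     x1, SPSR_EL1        ; saved process state for this exception
        STP     x0, x1, [sp, #-16]! ; keep both on the stack
        MSR     DAIFClr, #2         ; assumed: unmask IRQ
        ; ... work that can itself be interrupted ...
        MSR     DAIFSet, #2         ; mask IRQ again
        LDP     x0, x1, [sp], #16
        MSR     ELR_EL1, x0         ; restore before returning
        MSR     SPSR_EL1, x1
        ERET
        END
```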
Related concepts
4.7 Program Counter in AArch64 state on page 4-87.
3.5 General-purpose registers in AArch32 state on page 3-69.
Related references
4.5 Predeclared core register names in AArch64 state on page 4-85.

4.4 Stack Pointer register
In AArch64 state, SP represents the 64-bit Stack Pointer. SP_EL0 is an alias for SP. Do not use SP as a
general purpose register.
You can only use SP as an operand in the following instructions:
• As the base register for loads and stores. In this case it must be quadword-aligned before adding any
offset, or a stack alignment exception occurs.
• As a source or destination for arithmetic instructions, but it cannot be used as the destination in
instructions that set the condition flags.
• In logical instructions, for example in order to align it.
Exception levels EL1, EL2, and EL3 each have a dedicated stack pointer: SP_EL1, SP_EL2, and SP_EL3.
Within an exception level you can either use the dedicated stack pointer for that exception level or you
can use SP_EL0, the stack pointer associated with EL0. You can use the SPSel register to select which
stack pointer to use in the exception level.
The choice of stack pointer is indicated by the letter t or h appended to the exception level name, for
example EL0t or EL3h. The t suffix indicates that the exception level uses SP_EL0 and the h suffix
indicates it uses SP_ELx, where x is the current exception level number. EL0 always uses SP_EL0 so
cannot have an h suffix.
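The following sketch shows SP used as a base register and as an operand of arithmetic instructions
in a function prologue and epilogue. The function names and the frame size are illustrative. Every
adjustment is a multiple of 16 bytes so that SP stays quadword-aligned.

```armasm
        AREA    FrameDemo, CODE, READONLY
        IMPORT  callee              ; hypothetical external function
func                                ; hypothetical function
        STP     x29, x30, [sp, #-16]!   ; SP as the base register of a store
        MOV     x29, sp             ; SP as a source operand
        SUB     sp, sp, #32         ; SP as source and destination of an arithmetic instruction
        BL      callee
        ADD     sp, sp, #32
        LDP     x29, x30, [sp], #16 ; SP as the base register of a load
        RET
        END
```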
Related concepts
3.5 General-purpose registers in AArch32 state on page 3-69.
4.2 Exception levels on page 4-82.
4.10 Process State on page 4-90.
Related references
4.1 Registers in AArch64 state on page 4-81.

4.5 Predeclared core register names in AArch64 state
In AArch64 state, the predeclared core registers are different from those in AArch32 state.
The following table shows the predeclared core registers in AArch64 state:
Table 4-1 Predeclared core registers in AArch64 state
Register names  Meaning
W0-W30          32-bit general purpose registers.
X0-X30          64-bit general purpose registers.
WZR             32-bit RAZ/WI register. This is the name for register 31 when it is used as the zero
                register in a 32-bit context.
XZR             64-bit RAZ/WI register. This is the name for register 31 when it is used as the zero
                register in a 64-bit context.
WSP             32-bit stack pointer. This is the name for register 31 when it is used as the stack
                pointer in a 32-bit context.
SP              64-bit stack pointer. This is the name for register 31 when it is used as the stack
                pointer in a 64-bit context.
LR              Link register. This is a synonym for X30.

You can write the register names either in all upper case or all lower case.
Note
In AArch64 state, the PC is not a general purpose register and you cannot access it by name.

Related concepts
4.3 Link registers on page 4-83.
4.4 Stack Pointer register on page 4-84.
4.7 Program Counter in AArch64 state on page 4-87.
Related references
3.7 Predeclared core register names in AArch32 state on page 3-71.
4.1 Registers in AArch64 state on page 4-81.

4.6 Predeclared extension register names in AArch64 state
You can write the names of Advanced SIMD and floating-point registers either in upper case or lower
case.
The following table shows the predeclared extension register names in AArch64 state:
Table 4-2 Predeclared extension registers in AArch64 state
Register names  Meaning
V0-V31          Advanced SIMD 128-bit vector registers.
Q0-Q31          Advanced SIMD registers holding a 128-bit scalar.
D0-D31          Advanced SIMD registers holding a 64-bit scalar, floating-point double-precision
                registers.
S0-S31          Advanced SIMD registers holding a 32-bit scalar, floating-point single-precision
                registers.
H0-H31          Advanced SIMD registers holding a 16-bit scalar, floating-point half-precision
                registers.
B0-B31          Advanced SIMD registers holding an 8-bit scalar.

Related concepts
9.3 Extension register bank mapping for Advanced SIMD in AArch64 state on page 9-185.
Related references
3.8 Predeclared extension register names in AArch32 state on page 3-72.
4.1 Registers in AArch64 state on page 4-81.

4.7 Program Counter in AArch64 state
In AArch64 state, the Program Counter (PC) contains the address of the currently executing instruction.
It is incremented by the size of the instruction executed, which is always four bytes.
In AArch64 state, the PC is not a general purpose register and you cannot access it explicitly. The
following types of instructions read it implicitly:
• Instructions that compute a PC-relative address.
• PC-relative literal loads.
• Direct branches to a PC-relative label.
• Branch and link instructions, which store it in the procedure link register.

The only types of instructions that can write to the PC are:
• Conditional and unconditional branches.
• Exception generation and exception returns.
Branch instructions load the destination address into the PC.
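The following sketch contains one instruction of each kind described above. The routine names, the
labels, and the data word are illustrative.

```armasm
        AREA    PcRelative, CODE, READONLY
get_value                           ; hypothetical routine
        ADR     x0, value           ; computes the PC-relative address of value
        LDR     w1, value           ; PC-relative literal load
        CBZ     w1, is_zero         ; direct branch to a PC-relative label
        RET                         ; branch that writes the address in X30 to the PC
is_zero
        MOV     w1, #1
        RET

caller                              ; hypothetical routine
        STR     x30, [sp, #-16]!    ; keep this routine's own return address
        BL      get_value           ; branch and link: the return address is stored in X30
        LDR     x30, [sp], #16
        RET

value   DCD     42                  ; illustrative data word
        END
```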
Related concepts
3.9 Program Counter in AArch32 state on page 3-73.
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
13.15 B on page 13-359.
13.20 BL on page 13-366.
13.21 BLX, BLXNS on page 13-368.
13.22 BX, BXNS on page 13-370.

4.8 Conditional execution in AArch64 state
In AArch64 state, the NZCV register holds copies of the N, Z, C, and V condition flags. The processor
uses them to determine whether or not to execute conditional instructions. The NZCV register contains
the flags in bits[31:28].
The condition flags are accessible in all exception levels, using the MSR and MRS instructions.
A64 makes less use of conditionality than A32. For example, in A64:
• Only a few instructions can set or test the condition flags.
• There is no equivalent of the T32 IT instruction.
• The only conditionally executed instruction, which behaves as a NOP if the condition is false, is the
conditional branch, B.cond.
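The layout of the flags in bits[31:28] can be illustrated with a short Python sketch. This is not armasm code and not part of the toolchain. The function names decode_nzcv and encode_nzcv are hypothetical, and the sketch assumes the architectural ordering N = bit 31, Z = bit 30, C = bit 29, V = bit 28.

```python
def decode_nzcv(value):
    """Return the condition flags held in bits[31:28] of an NZCV value."""
    return {
        "N": (value >> 31) & 1,  # Negative
        "Z": (value >> 30) & 1,  # Zero
        "C": (value >> 29) & 1,  # Carry
        "V": (value >> 28) & 1,  # Overflow
    }


def encode_nzcv(n, z, c, v):
    """Pack four flag bits into the value that MSR would write to NZCV."""
    return (n << 31) | (z << 30) | (c << 29) | (v << 28)


flags = decode_nzcv(0x60000000)   # Z and C set, N and V clear
```

A value read with MRS from NZCV could be passed to decode_nzcv in a host-side tool, for example when interpreting a register dump.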
Related concepts
3.11 Application Program Status Register on page 3-75.
7.1 Conditional instructions on page 7-140.
Related references
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
7.7 Updates to the condition flags in A64 code on page 7-146.
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.

4.9 The Q flag in AArch64 state
In AArch64 state, you cannot read or write to the Q flag because in A64 there are no saturating
arithmetic instructions that operate on the general purpose registers.
The Advanced SIMD saturating arithmetic instructions set the QC bit in the floating-point status register
(FPSR) to indicate that saturation has occurred. You can identify such instructions by the Q mnemonic
modifier, for example SQADD.
Related references
Chapter 19 A64 SIMD Scalar Instructions on page 19-1209.
Chapter 20 A64 SIMD Vector Instructions on page 20-1332.

4.10 Process State
In AArch64 state, there is no Current Program Status Register (CPSR). You can access the different
components of the traditional CPSR independently as Process State fields.
The Process State fields are:
• N, Z, C, and V condition flags (NZCV).
• Current register width (nRW).
• Stack pointer selection bit (SPSel).
• Interrupt disable flags (DAIF).
• Current exception level (EL).
• Single step process state bit (SS).
• Illegal exception return state bit (IL).
You can use MSR to write to:
• The N, Z, C, and V flags in the NZCV register.
• The interrupt disable flags in the DAIF register.
• The SP selection bit in the SPSel register, in EL1 or higher.
You can use MRS to read:
• The N, Z, C, and V flags in the NZCV register.
• The interrupt disable flags in the DAIF register.
• The exception level bits in the CurrentEL register, in EL1 or higher.
• The SP selection bit in the SPSel register, in EL1 or higher.
When an exception occurs, all Process State fields associated with the current exception level are stored
in a single register associated with the target exception level, the SPSR. You can access the SS, IL, and
nRW bits only from the SPSR.
Related concepts
3.12 Current Program Status Register in AArch32 state on page 3-76.
3.13 Saved Program Status Registers in AArch32 state on page 3-77.
4.11 Saved Program Status Registers in AArch64 state on page 4-91.
Related references
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
7.7 Updates to the condition flags in A64 code on page 7-146.
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.

4.11 Saved Program Status Registers in AArch64 state
The Saved Program Status Registers (SPSRs) are 32-bit registers that store the process state of the
current exception level when an exception is taken to an exception level that uses AArch64 state. This
allows the process state to be restored after the exception has been handled.
In AArch64 state, each target exception level has its own SPSR:
• SPSR_EL1.
• SPSR_EL2.
• SPSR_EL3.
When taking an exception, the process state of the current exception level is stored in the SPSR of the
target exception level. On returning from an exception, the exception handler uses the SPSR of the
exception level that is being returned from to restore the process state of the exception level that is being
returned to.
Note
On returning from an exception, the preferred return address is restored from the ELR associated with the
exception level that is being returned from.
The SPSRs store the following information:
• N, Z, C, and V flags.
• D, A, I, and F interrupt disable bits.
• The register width.
• The execution mode.
• The IL and SS bits.
Related concepts
4.4 Stack Pointer register on page 4-84.
4.10 Process State on page 4-90.
3.13 Saved Program Status Registers in AArch32 state on page 3-77.

4.12 A64 instruction set overview
A64 instructions can be grouped by functional area.
The following table describes some of the functional groupings of the instructions in A64.
Table 4-3 A64 instruction groups

Instruction group: Branch and control
Description: These instructions do the following:
• Branch to and return from subroutines.
• Branch backwards to form loops.
• Branch forward in conditional structures.
• Generate and return from exceptions.

Instruction group: Data processing
Description: These instructions operate on the general-purpose registers. They can perform operations
such as addition, subtraction, or bitwise logic on the contents of two registers and place the result in a
third register. They can also operate on the value in a single register, or on a value in a register and an
immediate value supplied within the instruction.
The addition and subtraction instructions can optionally left shift the immediate operand, or can sign or
zero-extend and shift the final source operand register.
A64 includes signed and unsigned 32-bit and 64-bit multiply and divide instructions.

Instruction group: Register load and store
Description: These instructions load or store the value of a single register or pair of registers from or to
memory. You can load or store a single 64-bit doubleword, 32-bit word, 16-bit halfword, or 8-bit byte, or
a pair of words or doublewords. Byte and halfword loads can either be sign-extended or zero-extended to
fill the 32-bit register.
You can also load and sign-extend a signed byte, halfword or word into a 64-bit register, or load a pair of
signed words into two 64-bit registers.

Instruction group: System register access
Description: These instructions move the contents of a system register to or from a general-purpose
register.

Related references
3.14 A32 and T32 instruction set overview on page 3-78.
Chapter 16 A64 General Instructions on page 16-793.
Chapter 17 A64 Data Transfer Instructions on page 17-986.


Chapter 5
Structure of Assembly Language Modules

Describes the structure of assembly language source files.
It contains the following sections:
• 5.1 Syntax of source lines in assembly language on page 5-94.
• 5.2 Literals on page 5-96.
• 5.3 ELF sections and the AREA directive on page 5-97.
• 5.4 An example ARM assembly language module on page 5-98.

5.1 Syntax of source lines in assembly language
The assembler parses and assembles assembly language to produce object code.
Syntax
Each line of assembly language source code has this general form:
{symbol} {instruction|directive|pseudo-instruction} {;comment}

All three sections of the source line are optional.
symbol is usually a label. In instructions and pseudo-instructions it is always a label. In some directives it
is a symbol for a variable or a constant. The description of the directive makes this clear in each case.
symbol must begin in the first column. It cannot contain any white space character such as a space or a
tab unless it is enclosed by bars (|).
Labels are symbolic representations of addresses. You can use labels to mark specific addresses that you
want to refer to from other parts of the code. Numeric local labels are a subclass of labels that begin with
a number in the range 0-99. Unlike other labels, a numeric local label can be defined many times. This
makes them useful when generating labels with a macro.
Directives provide important information to the assembler that either affects the assembly process or
affects the final output image.
Instructions and pseudo-instructions make up the code a processor uses to perform tasks.
Note
Instructions, pseudo-instructions, and directives must be preceded by white space, such as a space or a
tab, irrespective of whether there is a preceding label or not.
Some directives do not allow the use of a label.
A comment is the final part of a source line. The first semicolon on a line marks the beginning of a
comment except where the semicolon appears inside a string literal. The end of the line is the end of the
comment. A comment alone is a valid line. The assembler ignores all comments. You can use blank lines
to make your code more readable.
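The layout rules above can be modelled with a short Python sketch. It is an illustration only and does not reproduce the armasm parser. The function name split_source_line is hypothetical, and the sketch assumes that string literals are delimited by double quotes and ignores character literals and other special cases.

```python
def split_source_line(line):
    """Split one source line into (symbol, operation, comment).

    Each part is None when it is absent from the line.
    """
    # The first semicolon outside a double-quoted string starts the comment.
    in_string = False
    code, comment = line, None
    for index, char in enumerate(line):
        if char == '"':
            in_string = not in_string
        elif char == ";" and not in_string:
            code, comment = line[:index], line[index + 1:].strip()
            break

    # A symbol, if present, begins in the first column.
    symbol = None
    if code and not code[0].isspace():
        if code[0] == "|":
            # Bars allow white space inside the symbol name.
            end = code.index("|", 1) + 1
        else:
            end = 0
            while end < len(code) and not code[end].isspace():
                end += 1
        symbol, code = code[:end], code[end:]

    # Whatever remains is the instruction, directive, or pseudo-instruction.
    operation = code.strip() or None
    return symbol, operation, comment


parts = split_source_line("start   MOV r0, #10 ; Set up parameters")
# parts == ("start", "MOV r0, #10", "Set up parameters")
```

The sketch raises ValueError for a symbol that opens with a bar but never closes it. A real assembler would report its own diagnostic instead.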
Considerations when writing assembly language source code
        ENTRY                   ; Mark first instruction to execute
start
        MOV      r0, #10        ; Set up parameters
        MOV      r1, #3
        ADD      r0, r0, r1     ; r0 = r0 + r1
stop
        MOV      r0, #0x18      ; angel_SWIreason_ReportException
        LDR      r1, =0x20026   ; ADP_Stopped_ApplicationExit
        SVC      #0x123456      ; ARM semihosting (formerly SWI)
        END                     ; Mark end of file

You must write instruction mnemonics, pseudo-instructions, directives, and symbolic register names
(except a1-a4 and v1-v8 in A32 or T32 instructions) in either all uppercase or all lowercase. You must
not use mixed case. Labels and comments can be in uppercase, lowercase, or mixed case.
        AREA     A32ex, CODE, READONLY
                                ; Name this block of code A32ex
        ENTRY                   ; Mark first instruction to execute
start
        MOV      r0, #10        ; Set up parameters
        MOV      r1, #3
        ADD      r0, r0, r1     ; r0 = r0 + r1
stop
        MOV      r0, #0x18      ; angel_SWIreason_ReportException
        LDR      r1, =0x20026   ; ADP_Stopped_ApplicationExit
        SVC      #0x123456      ; ARM semihosting (formerly SWI)
        END                     ; Mark end of file

To make source files easier to read, you can split a long line of source into several lines by placing a
backslash character (\) at the end of the line. The backslash must not be followed by any other
characters, including spaces and tabs. The assembler treats the backslash followed by end-of-line
sequence as white space. You can also use blank lines to make your code more readable.
Note
Do not use the backslash followed by end-of-line sequence within quoted strings.
The limit on the length of lines, including any extensions using backslashes, is 4095 characters.
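The continuation rule can be sketched in Python as follows. This is a simplified model and not the armasm implementation. The name join_continued_lines is hypothetical, and the sketch assumes that the 4095-character limit applies to the joined line with each backslash and end-of-line sequence counted as one space. It does not detect a backslash inside a quoted string.

```python
MAX_LINE_LENGTH = 4095  # limit stated in the text, continuations included


def join_continued_lines(physical_lines):
    """Join lines that end in a backslash into single logical lines."""
    logical_lines = []
    pending = ""
    for line in physical_lines:
        if line.endswith("\\"):
            # Backslash followed by end-of-line is treated as white space.
            pending += line[:-1] + " "
            continue
        logical = pending + line
        pending = ""
        if len(logical) > MAX_LINE_LENGTH:
            raise ValueError("line longer than 4095 characters")
        logical_lines.append(logical)
    if pending:
        # The last physical line ended with a backslash.
        logical_lines.append(pending)
    return logical_lines
```

A line in which the backslash is followed by a space or a tab is not joined, which matches the rule that nothing may follow the backslash.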
Related concepts
12.6 Labels on page 12-303.
12.10 Numeric local labels on page 12-307.
12.13 String literals on page 12-310.
Related references
5.2 Literals on page 5-96.
12.1 Symbol naming rules on page 12-298.
12.15 Syntax of numeric literals on page 12-312.

5.2 Literals
Assembly language source code can contain numeric, string, Boolean, and single character literals.
Literals can be expressed as:
• Decimal numbers, for example 123.
• Hexadecimal numbers, for example 0x7B.
• Numbers in any base from 2 to 9, for example 5_204 is a number in base 5.
• Floating point numbers, for example 123.4.
• Boolean values {TRUE} or {FALSE}.
• Single character values enclosed by single quotes, for example 'w'.
• Strings enclosed in double quotes, for example "This is a string".
Note
In most cases, a string containing a single character is accepted as a single character value. For example
ADD r0,r1,#"a" is accepted, but ADD r0,r1,#"ab" is faulted.
You can also use variables and names to represent literals.
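The numeric forms listed above can be illustrated with a small Python sketch that converts the decimal, hexadecimal, and base-n notations to an integer. It is an illustration only. The function name parse_numeric_literal is hypothetical, and the sketch does not handle floating-point, Boolean, character, or string literals.

```python
def parse_numeric_literal(text):
    """Convert a decimal, 0x-prefixed, or n_digits literal to an integer."""
    if text[:2] in ("0x", "0X"):
        return int(text[2:], 16)
    if "_" in text:
        # n_digits: the part before the underscore is the base, 2 to 9.
        base_text, digits = text.split("_", 1)
        base = int(base_text)
        if not 2 <= base <= 9:
            raise ValueError("base must be in the range 2 to 9")
        if not digits.isdigit():
            raise ValueError("digits expected after the underscore")
        return int(digits, base)
    return int(text)


value = parse_numeric_literal("5_204")   # 2*25 + 0*5 + 4 = 54
```

A digit that is not valid in the chosen base, such as 5_205, makes int raise ValueError.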
Related references
5.1 Syntax of source lines in assembly language on page 5-94.

5.3 ELF sections and the AREA directive
Object files produced by the assembler are divided into sections. In assembly source code, you use the
AREA directive to mark the start of a section.
ELF sections are independent, named, indivisible sequences of code or data. A single code section is the
minimum required to produce an application.
The output of an assembly or compilation can include:
• One or more code sections. These are usually read-only sections.
• One or more data sections. These are usually read-write sections. They might be zero-initialized (ZI).
The linker places each section in a program image according to section placement rules. Sections that are
adjacent in source files are not necessarily adjacent in the application image.
Use the AREA directive to name the section and set its attributes. The attributes are placed after the name,
separated by commas.
You can choose any name for your sections. However, names starting with any non-alphabetic character
must be enclosed in bars, or an AREA name missing error is generated. For example, |1_DataArea|.
The following example defines a single read-only section called A32ex that contains code:
        AREA     A32ex, CODE, READONLY   ; Name this block of code A32ex

Related concepts
5.4 An example ARM assembly language module on page 5-98.
Related references
21.6 AREA on page 21-1646.
Related information
Information about scatter files.

5.4 An example ARM assembly language module
An ARM assembly language module has several constituent parts.
These are:
• ELF sections (defined by the AREA directive).
• Application entry (defined by the ENTRY directive).
• Application execution.
• Application termination.
• Program end (defined by the END directive).
Constituents of an A32 assembly language module
The following example defines a single section called A32ex that contains code and is marked as being
READONLY. This example uses the A32 instruction set.
        AREA     A32ex, CODE, READONLY
                                ; Name this block of code A32ex
        ENTRY                   ; Mark first instruction to execute
start
        MOV      r0, #10        ; Set up parameters
        MOV      r1, #3
        ADD      r0, r0, r1     ; r0 = r0 + r1
stop
        MOV      r0, #0x18      ; angel_SWIreason_ReportException
        LDR      r1, =0x20026   ; ADP_Stopped_ApplicationExit
        SVC      #0x123456      ; ARM semihosting (formerly SWI)
        END                     ; Mark end of file

Constituents of an A64 assembly language module
The following example defines a single section called A64ex that contains code and is marked as being
READONLY. This example uses the A64 instruction set.
        AREA     A64ex, CODE, READONLY
                                ; Name this block of code A64ex
        ENTRY                   ; Mark first instruction to execute
start
        MOV      w0, #10        ; Set up parameters
        MOV      w1, #3
        ADD      w0, w0, w1     ; w0 = w0 + w1
stop
        MOV      x1, #0x26
        MOVK     x1, #2, LSL #16
        STR      x1, [sp,#0]    ; ADP_Stopped_ApplicationExit
        MOV      x0, #0
        STR      x0, [sp,#8]    ; Exit status code
        MOV      x1, sp         ; x1 contains the address of parameter block
        MOV      w0, #0x18      ; angel_SWIreason_ReportException
        HLT      0xf000         ; AArch64 semihosting
        END                     ; Mark end of file

Constituents of a T32 assembly language module
The following example defines a single section called T32ex that contains code and is marked as being
READONLY. This example uses the T32 instruction set.
        AREA     T32ex, CODE, READONLY
                                ; Name this block of code T32ex
        ENTRY                   ; Mark first instruction to execute
        THUMB
start
        MOV      r0, #10        ; Set up parameters
        MOV      r1, #3
        ADD      r0, r0, r1     ; r0 = r0 + r1
stop
        MOV      r0, #0x18      ; angel_SWIreason_ReportException
        LDR      r1, =0x20026   ; ADP_Stopped_ApplicationExit
        SVC      #0xAB          ; ARM semihosting (formerly SWI), T32 SVC takes an 8-bit immediate
        END                     ; Mark end of file

Application entry
The ENTRY directive declares an entry point to the program. It marks the first instruction to be executed.
In applications using the C library, an entry point is also contained within the C library initialization
code. Initialization code and exception handlers also contain entry points.
Application execution in A32 or T32 code
The application code begins executing at the label start, where it loads the decimal values 10 and 3 into
registers R0 and R1. These registers are added together and the result placed in R0.
Application execution in A64 code
The application code begins executing at the label start, where it loads the decimal values 10 and 3 into
registers W0 and W1. These registers are added together and the result placed in W0.
Application termination
After executing the main code, the application terminates by returning control to the debugger. You do
this in A32 using the A32 semihosting SVC (0x123456 by default), or in A64, using HLT 0xF000 to
invoke the semihosting interface.
A32 code uses the following parameters:
• R0 equal to angel_SWIreason_ReportException (0x18).
• R1 equal to ADP_Stopped_ApplicationExit (0x20026).
A64 code uses the following parameters:
• W0 equal to angel_SWIreason_ReportException (0x18).
• X1 is the address of a block of two parameters. The first is the exception type,
ADP_Stopped_ApplicationExit (0x20026) and the second is the exit status code.
Program end
The END directive instructs the assembler to stop processing this source file. Every assembly language
source module must finish with an END directive on a line by itself. Any lines following the END directive
are ignored by the assembler.
Related concepts
5.3 ELF sections and the AREA directive on page 5-97.
Related references
21.23 END on page 21-1665.
21.25 ENTRY on page 21-1667.


Chapter 6
Writing A32/T32 Assembly Language

Describes the use of a few basic A32 and T32 instructions and the use of macros.
It contains the following sections:
• 6.1 About the Unified Assembler Language on page 6-102.
• 6.2 Syntax differences between UAL and A64 assembly language on page 6-103.
• 6.3 Register usage in subroutine calls on page 6-104.
• 6.4 Load immediate values on page 6-105.
• 6.5 Load immediate values using MOV and MVN on page 6-106.
• 6.6 Load immediate values using MOV32 on page 6-109.
• 6.7 Load immediate values using LDR Rd, =const on page 6-110.
• 6.8 Literal pools on page 6-111.
• 6.9 Load addresses into registers on page 6-113.
• 6.10 Load addresses to a register using ADR on page 6-114.
• 6.11 Load addresses to a register using ADRL on page 6-116.
• 6.12 Load addresses to a register using LDR Rd, =label on page 6-117.
• 6.13 Other ways to load and store registers on page 6-119.
• 6.14 Load and store multiple register instructions on page 6-120.
• 6.15 Load and store multiple register instructions in A32 and T32 on page 6-121.
• 6.16 Stack implementation using LDM and STM on page 6-122.
• 6.17 Stack operations for nested subroutines on page 6-124.
• 6.18 Block copy with LDM and STM on page 6-125.
• 6.19 Memory accesses on page 6-127.
• 6.20 The Read-Modify-Write operation on page 6-128.
• 6.21 Optional hash with immediate constants on page 6-129.
• 6.22 Use of macros on page 6-130.
• 6.23 Test-and-branch macro example on page 6-131.

• 6.24 Unsigned integer division macro example on page 6-132.
• 6.25 Instruction and directive relocations on page 6-134.
• 6.26 Symbol versions on page 6-136.
• 6.27 Frame directives on page 6-137.
• 6.28 Exception tables and Unwind tables on page 6-138.

6.1 About the Unified Assembler Language
Unified Assembler Language (UAL) is a common syntax for A32 and T32 instructions. It supersedes
earlier versions of both the ARM and Thumb assembler languages.
Code that is written using UAL can be assembled for A32 or T32 for any ARM processor. armasm faults
the use of unavailable instructions.
armasm can assemble code that is written in pre-UAL and UAL syntax.

By default, armasm expects source code to be written in UAL. armasm accepts UAL syntax if any of the
directives CODE32, ARM, or THUMB is used or if you assemble with any of the --32, --arm, or --thumb
command-line options. armasm also accepts source code that is written in pre-UAL ARM assembly
language when you assemble with CODE32 or ARM.
armasm accepts source code that is written in pre-UAL Thumb assembly language when you assemble
using the --16 command-line option, or the CODE16 directive in the source code.

Note
The pre-UAL Thumb assembly language does not support 32-bit T32 instructions.

Related references
11.1 --16 on page 11-226.
21.7 ARM or CODE32 directive on page 21-1649.
21.11 CODE16 directive on page 21-1653.
21.65 THUMB directive on page 21-1715.
11.2 --32 on page 11-227.
11.4 --arm on page 11-230.
11.59 --thumb on page 11-287.

6.2 Syntax differences between UAL and A64 assembly language
UAL is the assembler syntax that is used by the A32 and T32 instruction sets. A64 assembly language is
the assembler syntax that is used by the A64 instruction set.
UAL in ARMv8 is unchanged from ARMv7.
The general statement format and operand order of A64 assembly language is the same as UAL, but
there are some differences between them. The following table describes the main differences:
Table 6-1 Syntax differences between UAL and A64 assembly language

UAL: You make an instruction conditional by appending a condition code suffix directly to the mnemonic,
with no delimiter. For example:
        BEQ label
A64: For conditionally executed instructions, you separate the condition code suffix from the mnemonic
using a . delimiter. For example:
        B.EQ label

UAL: Apart from the IT instruction, there are no unconditionally executed integer instructions that use a
condition code as an operand.
A64: A64 provides several unconditionally executed instructions that use a condition code as an operand.
For these instructions, you specify the condition code to test for in the final operand position. For
example:
        CSEL w1,w2,w3,EQ

UAL: The .W and .N instruction width specifiers control whether the assembler generates a 32-bit or 16-bit
encoding for a T32 instruction.
A64: A64 is a fixed width 32-bit instruction set so does not support .W and .N qualifiers.

UAL: The core register names are R0-R15.
A64: Qualify register names to indicate the operand data size, either 32-bit (W0-W30) or 64-bit (X0-X30).

UAL: You can refer to registers R13, R14, and R15 as synonyms for SP, LR, and PC respectively.
A64: In AArch64, there is no register that is named W31 or X31. Instead, you can refer to register 31 as
SP, WZR, or XZR, depending on the context. You cannot refer to PC either by name or number. LR is an
alias for register 30.

UAL: A32 has no equivalent of the extend operators.
A64: You can specify an extend operator in several instructions to control how a portion of the second
source register value is sign or zero extended. For example, in the following instruction, UXTB is the
extend type (zero extend, byte) and #2 is an optional left shift amount:
        ADD X1, X2, W3, UXTB #2

6.3 Register usage in subroutine calls
You use branch instructions to call and return from subroutines. The Procedure Call Standard for the
ARM Architecture defines how to use registers in subroutine calls.
A subroutine is a block of code that performs a task based on some arguments and optionally returns a
result. By convention, you use registers R0 to R3 to pass arguments to subroutines, and R0 to pass a
result back to the callers. A subroutine that requires more than four inputs uses the stack for the
additional inputs.
To call subroutines, use a branch and link instruction. The syntax is:
        BL       destination
where destination is usually the label on the first instruction of the subroutine. destination can also
be a PC-relative expression.

The BL instruction:
• Places the return address in the link register.
• Sets the PC to the address of the subroutine.
After the subroutine code has executed you can use a BX LR instruction to return.
Note
Calls between separately assembled or compiled modules must comply with the restrictions and
conventions defined by the Procedure Call Standard for the ARM Architecture.

Example
The following example shows a subroutine, doadd, that adds the values of two arguments and returns a
result in R0:
        AREA     subrout, CODE, READONLY
                                ; Name this block of code
        ENTRY                   ; Mark first instruction to execute
start   MOV      r0, #10        ; Set up parameters
        MOV      r1, #3
        BL       doadd          ; Call subroutine
stop    MOV      r0, #0x18      ; angel_SWIreason_ReportException
        LDR      r1, =0x20026   ; ADP_Stopped_ApplicationExit
        SVC      #0x123456      ; ARM semihosting (formerly SWI)
doadd   ADD      r0, r0, r1     ; Subroutine code
        BX       lr             ; Return from subroutine
        END                     ; Mark end of file

Related concepts
6.17 Stack operations for nested subroutines on page 6-124.
Related references
13.20 BL on page 13-366.
13.22 BX, BXNS on page 13-370.
Related information
Procedure Call Standard for the ARM Architecture.
Procedure Call Standard for the ARM 64-bit Architecture (AArch64).

6.4 Load immediate values
To represent some immediate values, you might have to use a sequence of instructions rather than a
single instruction.
A32 instructions are 32 bits wide and T32 instructions are 16 or 32 bits wide, so no single instruction can
encode every 32-bit value as an immediate operand. You can use a MOV or MVN instruction to load a
register with an immediate value from a range that depends on the instruction set. Values outside that
range cannot be represented as an immediate operand to a single instruction, although you can load these
values from memory in a single instruction.
You can load any 32-bit immediate value into a register with two instructions, a MOV followed by a MOVT.
Or, you can use a pseudo-instruction, MOV32, to construct the instruction sequence for you.
You can also use the LDR pseudo-instruction to load immediate values into a register.
You can include many commonly-used immediate values directly as operands within data processing
instructions, without a separate load operation. The range of immediate values that you can include as
operands in 16-bit T32 instructions is much smaller.
Related concepts
6.5 Load immediate values using MOV and MVN on page 6-106.
6.6 Load immediate values using MOV32 on page 6-109.
6.7 Load immediate values using LDR Rd, =const on page 6-110.
Related references
13.54 LDR pseudo-instruction on page 13-417.

6.5 Load immediate values using MOV and MVN
The MOV and MVN instructions can write a range of immediate values to a register.
In A32:
• MOV can load any 8-bit immediate value, giving a range of 0x0-0xFF (0-255).
It can also rotate these values by any even number of bits.
These values are also available as immediate operands in many data processing operations, without
being loaded in a separate instruction.
• MVN can load the bitwise complements of these values. The numerical values are -(n+1), where n is
the value available in MOV.
• MOV can load any 16-bit number, giving a range of 0x0-0xFFFF (0-65535).

The following table shows the range of 8-bit values that can be loaded in a single A32 MOV or MVN
instruction (for data processing operations). The value to load must be a multiple of the value shown in
the Step column.
Table 6-2 A32 state immediate values (8-bit)
Binary                            Decimal        Step  Hexadecimal    MVN value (a)   Notes
000000000000000000000000abcdefgh  0-255          1     0-0xFF         –1 to –256      -
0000000000000000000000abcdefgh00  0-1020         4     0-0x3FC        –4 to –1024     -
00000000000000000000abcdefgh0000  0-4080         16    0-0xFF0        –16 to –4096    -
000000000000000000abcdefgh000000  0-16320        64    0-0x3FC0       –64 to –16384   -
...                               ...
abcdefgh000000000000000000000000  0-255 x 2^24   2^24  0-0xFF000000   1-256 x –2^24   -
cdefgh000000000000000000000000ab  (bit pattern)  -     (bit pattern)                  See b in Note
efgh000000000000000000000000abcd  (bit pattern)  -     (bit pattern)                  See b in Note
gh000000000000000000000000abcdef  (bit pattern)  -     (bit pattern)                  See b in Note

The following table shows the range of 16-bit values that can be loaded in a single MOV A32 instruction:
Table 6-3 A32 state immediate values in MOV instructions
Binary                            Decimal  Step  Hexadecimal  MVN value  Notes
0000000000000000abcdefghijklmnop  0-65535  1     0-0xFFFF     -          See c in Note

Note
These notes give extra information on both tables.
a  The MVN values are only available directly as operands in MVN instructions.
b  These values are available in A32 only. All the other values in this table are also available in 32-bit
   T32 instructions.
c  These values are not available directly as operands in other instructions.
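The A32 rules above can be checked with a short Python sketch. It models only the encodings described in this section and is not how armasm is implemented. The names rotate_right, is_a32_immediate, and a32_single_move are hypothetical. The sketch assumes that the rotation is a rotate right within a 32-bit word.

```python
MASK32 = 0xFFFFFFFF


def rotate_right(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def is_a32_immediate(value):
    """True if value is an 8-bit number rotated right by an even amount."""
    value &= MASK32
    # Undo each even rotation and check whether only 8 bits remain.
    return any(rotate_right(value, 32 - rot) <= 0xFF
               for rot in range(0, 32, 2))


def a32_single_move(value):
    """Return (mnemonic, immediate) for a one-instruction load, else None."""
    value &= MASK32
    if is_a32_immediate(value):
        return ("MOV", value)
    inverted = ~value & MASK32      # MVN loads the bitwise complement
    if is_a32_immediate(inverted):
        return ("MVN", inverted)
    if value <= 0xFFFF:
        return ("MOV", value)       # 16-bit immediate form of MOV
    return None
```

For example, 0x3FC (0xFF shifted left by two bits) is accepted, while 0x1FE (0xFF shifted left by one bit) is rejected because the shift is odd. The value 0xFFFFFF00 is loaded as MVN with 0xFF, which matches the -(n+1) relationship because 0xFFFFFF00 is -256.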
In T32:
• The 32-bit MOV instruction can load:
— Any 8-bit immediate value, giving a range of 0x0-0xFF (0-255).
— Any 8-bit immediate value, shifted left by any number of bits.
— Any 8-bit pattern duplicated in all four bytes of a register.
— Any 8-bit pattern duplicated in bytes 0 and 2, with bytes 1 and 3 set to 0.
— Any 8-bit pattern duplicated in bytes 1 and 3, with bytes 0 and 2 set to 0.
These values are also available as immediate operands in many data processing operations, without
being loaded in a separate instruction.
• The 32-bit MVN instruction can load the bitwise complements of these values. The numerical values
are -(n+1), where n is the value available in MOV.
• The 32-bit MOV instruction can load any 16-bit number, giving a range of 0x0-0xFFFF (0-65535).
These values are not available as immediate operands in data processing operations.

In architectures with T32, the 16-bit T32 MOV instruction can load any immediate value in the range
0-255.
The following table shows the range of values that can be loaded in a single 32-bit T32 MOV or MVN
instruction (for data processing operations). The value to load must be a multiple of the value shown in
the Step column.
Table 6-4 32-bit T32 immediate values
Binary                            Decimal        Step  Hexadecimal      MVN value (a)   Notes
000000000000000000000000abcdefgh  0-255          1     0x0-0xFF         –1 to –256      -
00000000000000000000000abcdefgh0  0-510          2     0x0-0x1FE        –2 to –512      -
0000000000000000000000abcdefgh00  0-1020         4     0x0-0x3FC        –4 to –1024     -
...
0abcdefgh00000000000000000000000  0-255 x 2^23   2^23  0x0-0x7F800000   1-256 x –2^23   -
abcdefgh000000000000000000000000  0-255 x 2^24   2^24  0x0-0xFF000000   1-256 x –2^24   -
abcdefghabcdefghabcdefghabcdefgh  (bit pattern)  -     0xXYXYXYXY       0xXYXYXYXY      -
00000000abcdefgh00000000abcdefgh  (bit pattern)  -     0x00XY00XY       0xFFXYFFXY      -
abcdefgh00000000abcdefgh00000000  (bit pattern)  -     0xXY00XY00       0xXYFFXYFF      -
00000000000000000000abcdefghijkl  0-4095         1     0x0-0xFFF        -               See b in Note

The following table shows the range of 16-bit values that can be loaded by the MOV 32-bit T32
instruction:
Table 6-5 32-bit T32 immediate values in MOV instructions
Binary                            Decimal  Step  Hexadecimal  MVN value  Notes
0000000000000000abcdefghijklmnop  0-65535  1     0x0-0xFFFF   -          See c in Note

Note
These notes give extra information on the tables.
a  The MVN values are only available directly as operands in MVN instructions.
b  These values are available directly as operands in ADD, SUB, and MOV instructions, but not in MVN
   or any other data processing instructions.
c  These values are only available in MOV instructions.
In both A32 and T32, you do not have to decide whether to use MOV or MVN. The assembler uses
whichever is appropriate. This is useful if the value is an assembly-time variable.
If you write an instruction with an immediate value that is not available, the assembler reports the error:
Immediate n out of range for this operation.
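The 32-bit T32 immediate forms listed in this section can be tested with a short Python sketch. It is an illustration and not the check that armasm performs. The name is_t32_immediate is hypothetical. The sketch covers only the shifted 8-bit values and the duplicated byte patterns. It does not cover the 12-bit and 16-bit ranges that are limited to particular instructions.

```python
MASK32 = 0xFFFFFFFF


def is_t32_immediate(value):
    """True if value is a shifted 8-bit value or a duplicated byte pattern."""
    value &= MASK32
    # Any 8-bit value shifted left by 0 to 24 bits.
    if any(value == ((value >> shift) & 0xFF) << shift
           for shift in range(25)):
        return True
    low_byte = value & 0xFF
    high_byte = (value >> 8) & 0xFF
    return (value == low_byte * 0x00010001       # 0x00XY00XY
            or value == high_byte * 0x01000100   # 0xXY00XY00
            or value == low_byte * 0x01010101)   # 0xXYXYXYXY
```

Unlike A32, the shift can be odd, so 0x1FE is accepted here. The wrapped pattern 0xF000000F is rejected, which agrees with note b of the A32 table.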

Related concepts
6.4 Load immediate values on page 6-105.

6.6 Load immediate values using MOV32
To load any 32-bit immediate value, a pair of MOV and MOVT instructions is equivalent to a MOV32
pseudo-instruction.
Both A32 and T32 instruction sets include:
• A MOV instruction that can load any value in the range 0x00000000 to 0x0000FFFF into a register.
• A MOVT instruction that can load any value in the range 0x0000 to 0xFFFF into the most significant
half of a register, without altering the contents of the least significant half.
You can use these two instructions to construct any 32-bit immediate value in a register. Alternatively,
you can use the MOV32 pseudo-instruction. The assembler generates the MOV, MOVT instruction pair for
you.
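As an illustration only, the following Python sketch models how a 32-bit value divides into the two 16-bit halves that the MOV and MOVT instructions load. The function names mov32_pair and apply_pair are hypothetical and are not part of the assembler.

```python
def mov32_pair(value):
    """Split a 32-bit immediate into the halves loaded by MOV and MOVT."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    low = value & 0xFFFF            # loaded by MOV (range 0x0000-0xFFFF)
    high = (value >> 16) & 0xFFFF   # loaded by MOVT into the top half
    return low, high


def apply_pair(low, high):
    """Model the register contents after MOV Rd,#low then MOVT Rd,#high."""
    reg = low                               # MOV writes the whole register
    reg = (reg & 0xFFFF) | (high << 16)     # MOVT keeps the bottom half
    return reg
```

For example, mov32_pair(0x12345678) gives the halves 0x5678 (for MOV) and 0x1234 (for MOVT), and applying them in that order rebuilds the original value.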
You can also use the MOV32 pseudo-instruction to load addresses into registers by using a label or any PC-relative
expression in place of an immediate value. The assembler puts a relocation directive into the object file
for the linker to resolve the address at link-time.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
13.64 MOV32 pseudo-instruction on page 13-433.

6.7 Load immediate values using LDR Rd, =const
The LDR Rd,=const pseudo-instruction generates the most efficient single instruction to load any 32-bit
number.
You can use this pseudo-instruction to generate constants that are out of range of the MOV and MVN
instructions.
The LDR pseudo-instruction generates the most efficient single instruction for the specified immediate
value:
• If the immediate value can be constructed with a single MOV or MVN instruction, the assembler
generates the appropriate instruction.
• If the immediate value cannot be constructed with a single MOV or MVN instruction, the assembler:
— Places the value in a literal pool (a portion of memory embedded in the code to hold constant
values).
— Generates an LDR instruction with a PC-relative address that reads the constant from the literal
pool.
For example:
        LDR     rn, [pc, #offset to literal pool]
                                ; load register n with one word
                                ; from the address [pc + offset]

You must ensure that there is a literal pool within range of the LDR instruction generated by the
assembler.
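The selection described above can be modelled roughly in Python. This is a sketch under stated assumptions, not the assembler's actual algorithm: it considers only A32 immediates of the form "8-bit value rotated right by an even number of bits" and ignores the 16-bit MOV form. The names is_a32_immediate and choose_load are hypothetical.

```python
def is_a32_immediate(value):
    """True if value is an 8-bit value rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        undone = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if undone <= 0xFF:
            return True
    return False


def choose_load(value):
    """Return which form LDR Rd,=value would take in this simplified model."""
    value &= 0xFFFFFFFF
    if is_a32_immediate(value):
        return "MOV"
    if is_a32_immediate(~value & 0xFFFFFFFF):
        return "MVN"                 # MVN loads the bitwise inverse
    return "literal pool"            # needs a PC-relative LDR
```

In this model, 42 maps to MOV, 0xFFFFFFFF maps to MVN (the inverse of 0), and 0x55555555 needs a literal pool.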
Related concepts
6.8 Literal pools on page 6-111.
Related references
13.54 LDR pseudo-instruction on page 13-417.

6.8 Literal pools
The assembler uses literal pools to store some constant data in code sections. You can use the LTORG
directive to ensure a literal pool is within range.
The assembler places a literal pool at the end of each section. The end of a section is defined either by
the END directive at the end of the assembly or by the AREA directive at the start of the following section.
The END directive at the end of an included file does not signal the end of a section.
In large sections the default literal pool can be out of range of one or more LDR instructions. The offset
from the PC to the constant must be:
• Less than 4KB in A32 or T32 code when the 32-bit LDR instruction is available, but can be in either
direction.
• Forward and less than 1KB when only the 16-bit T32 LDR instruction is available.
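The two range rules can be expressed as a small check. The following Python sketch is illustrative only. It assumes that offset is the byte distance from the PC value to the constant, with negative values meaning the pool is before the instruction. The function name literal_in_range and the encoding labels "ldr32" and "ldr16" are hypothetical.

```python
def literal_in_range(offset, encoding):
    """Check a PC-to-constant byte offset against the documented limits."""
    if encoding == "ldr32":
        # 32-bit LDR available (A32 or T32): less than 4KB, either direction.
        return -4096 < offset < 4096
    if encoding == "ldr16":
        # Only the 16-bit T32 LDR available: forward and less than 1KB.
        return 0 <= offset < 1024
    raise ValueError("unknown encoding: " + encoding)
```

A pool that lies 4200 bytes after the instruction fails the "ldr32" check. That is the situation in which an LTORG directive is needed closer to the LDR pseudo-instruction.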

When an LDR Rd,=const pseudo-instruction requires the immediate value to be placed in a literal pool,
the assembler:
• Checks if the value is available and addressable in any previous literal pools. If so, it addresses the
existing constant.
• Attempts to place the value in the next literal pool if it is not already available.
If the next literal pool is out of range, the assembler generates an error message. In this case you must
use the LTORG directive to place an additional literal pool in the code. Place the LTORG directive after the
failed LDR pseudo-instruction, and within the valid range for an LDR instruction.
You must place literal pools where the processor does not attempt to execute them as instructions. Place
them after unconditional branch instructions, or after the return instruction at the end of a subroutine.
Example of placing literal pools
The following example shows the placement of literal pools. The instructions listed as comments are the
A32 instructions generated by the assembler.
        AREA    Loadcon, CODE, READONLY
        ENTRY                           ; Mark first instruction to execute
start
        BL      func1                   ; Branch to first subroutine
        BL      func2                   ; Branch to second subroutine
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SVC     #0x123456               ; ARM semihosting (formerly SWI)
func1
        LDR     r0, =42                 ; => MOV R0, #42
        LDR     r1, =0x55555555         ; => LDR R1, [PC, #offset to
                                        ; Literal Pool 1]
        LDR     r2, =0xFFFFFFFF         ; => MVN R2, #0
        BX      lr
        LTORG                           ; Literal Pool 1 contains
                                        ; literal 0x55555555
func2
        LDR     r3, =0x55555555         ; => LDR R3, [PC, #offset to
                                        ; Literal Pool 1]
        ; LDR r4, =0x66666666           ; If this is uncommented it
                                        ; fails, because Literal Pool 2
                                        ; is out of reach
        BX      lr
LargeTable
        SPACE   4200                    ; Starting at the current location,
                                        ; clears a 4200 byte area of memory
                                        ; to zero
        END                             ; Literal Pool 2 is inserted here,
                                        ; but is out of range of the LDR
                                        ; pseudo-instruction that needs it

Related concepts
6.7 Load immediate values using LDR Rd, =const on page 6-110.


Related references
21.50 LTORG on page 21-1695.

6.9 Load addresses into registers
It is often necessary to load an address into a register. There are several ways to do this.
For example, you might have to load the address of a variable, a string literal, or the start location of a
jump table.
Addresses are normally expressed as offsets from a label, or from the current PC or other register.
You can load an address into a register either:
• Using the instruction ADR.
• Using the pseudo-instruction ADRL.
• Using the pseudo-instruction MOV32.
• From a literal pool using the pseudo-instruction LDR Rd,=Label.
Related concepts
6.10 Load addresses to a register using ADR on page 6-114.
6.11 Load addresses to a register using ADRL on page 6-116.
6.6 Load immediate values using MOV32 on page 6-109.
6.12 Load addresses to a register using LDR Rd, =label on page 6-117.

6.10 Load addresses to a register using ADR
The ADR instruction loads an address within a certain range, without performing a data load.
ADR accepts a PC-relative expression, that is, a label with an optional offset where the address of the label
is relative to the PC.
Note
The label used with ADR must be within the same code section. The assembler faults references to labels
that are out of range in the same section.
The available range of addresses for the ADR instruction depends on the instruction set and encoding:
A32
Any value that can be produced by rotating an 8-bit value right by any even number of bits
within a 32-bit word. The range is relative to the PC.
32-bit T32 encoding
±4095 bytes to a byte, halfword, or word-aligned address.
16-bit T32 encoding
0 to 1020 bytes. label must be word-aligned. You can use the ALIGN directive to ensure this.
Example of a jump table implementation with ADR
This example shows A32 code that implements a jump table. Here, the ADR instruction loads the address
of the jump table.

        AREA    Jump, CODE, READONLY    ; Name this block of code
        ARM                             ; Following code is A32 code
num     EQU     2                       ; Number of entries in jump table
        ENTRY                           ; Mark first instruction to execute
start                                   ; First instruction to call
        MOV     r0, #0                  ; Set up the three arguments
        MOV     r1, #3
        MOV     r2, #2
        BL      arithfunc               ; Call the function
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SVC     #0x123456               ; ARM semihosting (formerly SWI)
arithfunc                               ; Label the function
        CMP     r0, #num                ; Treat function code as unsigned
                                        ; integer
        BXHS    lr                      ; If code is >= num then return
        ADR     r3, JumpTable           ; Load address of jump table
        LDR     pc, [r3,r0,LSL#2]       ; Jump to the appropriate routine
JumpTable
        DCD     DoAdd
        DCD     DoSub
DoAdd
        ADD     r0, r1, r2              ; Operation 0
        BX      lr                      ; Return
DoSub
        SUB     r0, r1, r2              ; Operation 1
        BX      lr                      ; Return
        END                             ; Mark the end of this file

In this example, the function arithfunc takes three arguments and returns a result in R0. The first
argument determines the operation to be carried out on the second and third arguments:
argument1=0
Result = argument2 + argument3.
argument1=1
Result = argument2 – argument3.
The jump table is implemented with the following instructions and assembler directives:


EQU

Is an assembler directive. You use it to give a value to a symbol. In this example, it assigns the
value 2 to num. When num is used elsewhere in the code, the value 2 is substituted. Using EQU in
this way is similar to using #define to define a constant in C.
DCD

Declares one or more words of store. In this example, each DCD stores the address of a routine
that handles a particular clause of the jump table.
LDR

The LDR PC,[R3,R0,LSL#2] instruction loads the address of the required clause of the jump
table into the PC. It:
• Multiplies the clause number in R0 by 4 to give a word offset.
• Adds the result to the address of the jump table.
• Loads the contents of the combined address into the PC.
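The dispatch logic of arithfunc can be modelled in Python as follows. This is an illustrative sketch only. A list of functions stands in for the DCD entries, the names are hypothetical, and results are not truncated to 32 bits as they would be in a register.

```python
def do_add(a, b):
    return a + b        # Operation 0


def do_sub(a, b):
    return a - b        # Operation 1


JUMP_TABLE = [do_add, do_sub]   # plays the role of the two DCD entries
NUM = len(JUMP_TABLE)           # plays the role of num EQU 2


def arithfunc(op, a, b):
    # CMP r0,#num then BXHS lr: the comparison is unsigned, so any code
    # >= num (including negative values) returns with R0 unchanged.
    if (op & 0xFFFFFFFF) >= NUM:
        return op
    # LDR pc,[r3,r0,LSL#2]: index the table by the operation code.
    return JUMP_TABLE[op](a, b)
```

With the arguments used in the example (0, 3, 2), the model selects the add routine and returns 5.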
Related concepts
6.12 Load addresses to a register using LDR Rd, =label on page 6-117.
6.11 Load addresses to a register using ADRL on page 6-116.
Related references
13.10 ADR (PC-relative) on page 13-349.

6.11 Load addresses to a register using ADRL
The ADRL pseudo-instruction loads an address within a certain range, without performing a data load. The
range is wider than that of the ADR instruction.
ADRL accepts a PC-relative expression, that is, a label with an optional offset where the address of the
label is relative to the current PC.
Note
The label used with ADRL must be within the same code section. The assembler faults references to labels
that are out of range in the same section.
The assembler converts an ADRL rn,label pseudo-instruction by generating:
• Two data processing instructions that load the address, if it is in range.
• An error message if the address cannot be constructed in two instructions.
The available range depends on the instruction set and encoding.
A32
Any value that can be generated by two ADD or two SUB instructions. That is, any value that can
be produced by the addition of two values, each of which is 8 bits rotated right by any even
number of bits within a 32-bit word. The range is relative to the PC.
32-bit T32 encoding
±1MB to a byte, halfword, or word-aligned address.
16-bit T32 encoding
ADRL is not available.
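The A32 rule can be explored with a brute-force model. The following Python sketch is illustrative only and is not how the assembler works internally. It treats an offset as reachable if its magnitude is the sum of two values, each of which is 8 bits rotated right by an even number of bits. The names a32_immediates and adrl_a32_reachable are hypothetical.

```python
def a32_immediates():
    """All values formed by rotating an 8-bit value right by an even amount."""
    values = set()
    for rot in range(0, 32, 2):
        for imm8 in range(256):
            rotated = ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF
            values.add(rotated)
    return values


IMMEDIATES = a32_immediates()


def adrl_a32_reachable(offset):
    """True if |offset| is the sum of two A32 immediates (two ADDs or two SUBs)."""
    magnitude = abs(offset)
    return any((magnitude - first) in IMMEDIATES
               for first in IMMEDIATES if first <= magnitude)
```

For example, an offset of 0x1234 is reachable as 0x1200 + 0x34, whereas 0x01010101 is not, because its set bits are too widely spread for two 8-bit fields to cover.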
Related concepts
6.10 Load addresses to a register using ADR on page 6-114.
6.12 Load addresses to a register using LDR Rd, =label on page 6-117.

6.12 Load addresses to a register using LDR Rd, =label
The LDR Rd,=label pseudo-instruction places an address in a literal pool and then loads the address into
a register.
LDR Rd,=label can load any 32-bit numeric value into a register. It also accepts PC-relative expressions
such as labels, and labels with offsets.
The assembler converts an LDR Rd,=label pseudo-instruction by:
• Placing the address of label in a literal pool (a portion of memory embedded in the code to hold
constant values).
• Generating a PC-relative LDR instruction that reads the address from the literal pool, for example:
LDR rn, [pc, #offset_to_literal_pool]
; load register n with one word
; from the address [pc + offset]

You must ensure that the literal pool is within range of the LDR pseudo-instruction that needs to access
it.
Example of loading using LDR Rd, =label
The following example shows a section with two literal pools. The final LDR pseudo-instruction needs to
access the second literal pool, but it is out of range. Uncommenting this line causes the assembler to
generate an error.
The instructions listed in the comments are the A32 instructions generated by the assembler.
        AREA    LDRlabel, CODE, READONLY
        ENTRY                           ; Mark first instruction to execute
start
        BL      func1                   ; Branch to first subroutine
        BL      func2                   ; Branch to second subroutine
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SVC     #0x123456               ; ARM semihosting (formerly SWI)
func1
        LDR     r0, =start              ; => LDR r0,[PC, #offset into Literal Pool 1]
        LDR     r1, =Darea + 12         ; => LDR r1,[PC, #offset into Literal Pool 1]
        LDR     r2, =Darea + 6000       ; => LDR r2,[PC, #offset into Literal Pool 1]
        BX      lr                      ; Return
        LTORG                           ; Literal Pool 1
func2
        LDR     r3, =Darea + 6000       ; => LDR r3,[PC, #offset into Literal Pool 1]
                                        ; (sharing with previous literal)
        ; LDR   r4, =Darea + 6004       ; If uncommented, produces an error because
                                        ; Literal Pool 2 is out of range.
        BX      lr                      ; Return
Darea   SPACE   8000                    ; Starting at the current location, clears
                                        ; a 8000 byte area of memory to zero.
        END                             ; Literal Pool 2 is automatically inserted
                                        ; after the END directive.
                                        ; It is out of range of all the LDR
                                        ; pseudo-instructions in this example.

Example of string copy
The following example shows an A32 code routine that overwrites one string with another. It uses the
LDR pseudo-instruction to load the addresses of the two strings from a data section. The following are
particularly significant:
DCB

The DCB directive defines one or more bytes of store. In addition to integer values, DCB accepts
quoted strings. Each character of the string is placed in a consecutive byte.


LDR, STR

The LDR and STR instructions use post-indexed addressing to update their address registers. For
example, the instruction:
        LDRB    r2,[r1],#1

loads R2 with the contents of the address pointed to by R1 and then increments R1 by 1.
The example also shows how, unlike the ADR instruction and the ADRL pseudo-instruction, you can use the LDR pseudo-instruction with labels that are outside the current section. The assembler places a relocation directive in
the object code when the source file is assembled. The relocation directive instructs the linker to resolve
the address at link time. The address remains valid wherever the linker places the section containing the
LDR and the literal pool.
        AREA    StrCopy, CODE, READONLY
        ENTRY                           ; Mark first instruction to execute
start
        LDR     r1, =srcstr             ; Pointer to first string
        LDR     r0, =dststr             ; Pointer to second string
        BL      strcopy                 ; Call subroutine to do copy
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SVC     #0x123456               ; ARM semihosting (formerly SWI)
strcopy
        LDRB    r2, [r1],#1             ; Load byte and update address
        STRB    r2, [r0],#1             ; Store byte and update address
        CMP     r2, #0                  ; Check for zero terminator
        BNE     strcopy                 ; Keep going if not
        MOV     pc,lr                   ; Return
        AREA    Strings, DATA, READWRITE
srcstr  DCB     "First string - source",0
dststr  DCB     "Second string - destination",0
        END

Related concepts
6.11 Load addresses to a register using ADRL on page 6-116.
6.7 Load immediate values using LDR Rd, =const on page 6-110.
Related references
13.54 LDR pseudo-instruction on page 13-417.
21.15 DCB on page 21-1657.

6.13 Other ways to load and store registers
You can load and store registers using LDR, STR and MOV (register) instructions.
You can load any 32-bit value from memory into a register with an LDR data load instruction. To store
registers into memory you can use the STR data store instruction.
You can use the MOV instruction to move any 32-bit data from one register to another.
Related concepts
6.14 Load and store multiple register instructions on page 6-120.
6.15 Load and store multiple register instructions in A32 and T32 on page 6-121.
Related references
13.63 MOV on page 13-431.

6.14 Load and store multiple register instructions
The A32 and T32 instruction sets include instructions that load and store multiple registers. These
instructions can provide a more efficient way of transferring the contents of several registers to and from
memory than using single register loads and stores.
Multiple register transfer instructions are most often used for block copy and for stack operations at
subroutine entry and exit. The advantages of using a multiple register transfer instruction instead of a
series of single data transfer instructions include:
• Smaller code size.
• A single instruction fetch overhead, rather than many instruction fetches.
• On uncached ARM processors, the first word of data transferred by a load or store multiple is always
a nonsequential memory cycle, but all subsequent words transferred can be sequential memory
cycles. Sequential memory cycles are faster in most systems.
Note
The lowest numbered register is transferred to or from the lowest memory address accessed, and the
highest numbered register to or from the highest address accessed. The order of the registers in the
register list in the instructions makes no difference.
You can use the --diag_warning 1206 assembler command line option to check that registers in register
lists are specified in increasing order.
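The register ordering rule can be shown with a small Python model of a store multiple with increment after. This is a sketch for illustration only. Memory is modelled as a dictionary keyed by byte address, and the name stmia is hypothetical.

```python
def stmia(memory, base, reg_list, regs):
    """Model STMIA base,{reg_list}: store registers at ascending addresses."""
    address = base
    # Registers are always transferred lowest-numbered first, whatever
    # order they appear in within the register list.
    for number in sorted(set(reg_list)):
        memory[address] = regs[number]
        address += 4
    return address      # the value written back to the base if ! is used
```

Calling the model with the register list written as r2, r0, r1 still places R0 at the lowest address and R2 at the highest.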

Related concepts
6.15 Load and store multiple register instructions in A32 and T32 on page 6-121.
6.16 Stack implementation using LDM and STM on page 6-122.
6.17 Stack operations for nested subroutines on page 6-124.
6.18 Block copy with LDM and STM on page 6-125.

6.15 Load and store multiple register instructions in A32 and T32
Instructions are available in both the A32 and T32 instruction sets to load and store multiple registers.
They are:
LDM

Load Multiple registers.
STM

Store Multiple registers.
PUSH

Store multiple registers onto the stack and update the stack pointer.
POP

Load multiple registers off the stack, and update the stack pointer.
In LDM and STM instructions:
• The list of registers loaded or stored can include:
— In A32 instructions, any or all of R0-R12, SP, LR, and PC.
— In 32-bit T32 instructions, any or all of R0-R12, and optionally LR or PC (LDM only) with some
restrictions.
— In 16-bit T32 instructions, any or all of R0-R7.
• The address must be word-aligned. It can be:
— Incremented after each transfer.
— Incremented before each transfer (A32 instructions only).
— Decremented after each transfer (A32 instructions only).
— Decremented before each transfer (not in 16-bit encoded T32 instructions).
• The base register can be either:
— Updated to point to the next block of data in memory.
— Left as it was before the instruction.
When the base register is updated to point to the next block in memory, this is called writeback, that is,
the adjusted address is written back to the base register.
In PUSH and POP instructions:
• The stack pointer (SP) is the base register, and is always updated.
• The address is incremented after each transfer in POP instructions, and decremented before each
transfer in PUSH instructions.
• The list of registers loaded or stored can include:
— In A32 instructions, any or all of R0-R12, SP, LR, and PC.
— In 32-bit T32 instructions, any or all of R0-R12, and optionally LR or PC (POP only) with some
restrictions.
— In 16-bit T32 instructions, any or all of R0-R7, and optionally LR (PUSH only) or PC (POP only).
Note
Use of SP in the list of registers in these A32 instructions is deprecated.
A32 STM and PUSH instructions that use PC in the list of registers, and A32 LDM and POP instructions that
use both PC and LR in the list of registers are deprecated.

Related concepts
6.14 Load and store multiple register instructions on page 6-120.

6.16 Stack implementation using LDM and STM
You can use the LDM and STM instructions to implement pop and push operations respectively. You use a
suffix to indicate the stack type.
The load and store multiple instructions can update the base register. For stack operations, the base
register is usually the stack pointer, SP. This means that you can use these instructions to implement push
and pop operations for any number of registers in a single instruction.
The load and store multiple instructions can be used with several types of stack:
Descending or ascending
The stack grows downwards, starting with a high address and progressing to a lower one (a
descending stack), or upwards, starting from a low address and progressing to a higher address
(an ascending stack).
Full or empty
The stack pointer can either point to the last item in the stack (a full stack), or the next free space
on the stack (an empty stack).
To make it easier for the programmer, stack-oriented suffixes can be used instead of the increment or
decrement, and before or after suffixes. The following table shows the stack-oriented suffixes and their
equivalent addressing mode suffixes for load and store instructions:
Table 6-6 Stack-oriented suffixes and equivalent addressing mode suffixes
Stack-oriented suffix         For store or push instructions  For load or pop instructions
FD (Full Descending stack)    DB (Decrement Before)           IA (Increment After)
FA (Full Ascending stack)     IB (Increment Before)           DA (Decrement After)
ED (Empty Descending stack)   DA (Decrement After)            IB (Increment Before)
EA (Empty Ascending stack)    IA (Increment After)            DB (Decrement Before)

The following table shows the load and store multiple instructions with the stack-oriented suffixes for the
various stack types:
Table 6-7 Suffixes for load and store multiple instructions
Stack type         Store                            Load
Full descending    STMFD (STMDB, Decrement Before)  LDMFD (LDM, increment after)
Full ascending     STMFA (STMIB, Increment Before)  LDMFA (LDMDA, Decrement After)
Empty descending   STMED (STMDA, Decrement After)   LDMED (LDMIB, Increment Before)
Empty ascending    STMEA (STM, increment after)     LDMEA (LDMDB, Decrement Before)

For example:
        STMFD   sp!, {r0-r5}    ; Push onto a Full Descending Stack
        LDMFD   sp!, {r0-r5}    ; Pop from a Full Descending Stack
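The relationship between stack types and addressing modes can be checked with a small Python model. This is an illustrative sketch only. It assumes word-sized items and writeback, models memory as a dictionary keyed by byte address, and pairs each stack type with the modes used by the STM and LDM forms in Table 6-7 (for example, STMEA is STM with increment after, and LDMEA is LDMDB). The function names are hypothetical.

```python
STORE_MODE = {"FD": "DB", "FA": "IB", "ED": "DA", "EA": "IA"}
LOAD_MODE = {"FD": "IA", "FA": "DA", "ED": "IB", "EA": "DB"}


def transfer_addresses(base, count, mode):
    """Return (lowest address accessed, base after writeback)."""
    size = 4 * count
    if mode == "IA":                       # Increment After
        return base, base + size
    if mode == "IB":                       # Increment Before
        return base + 4, base + size
    if mode == "DA":                       # Decrement After
        return base - size + 4, base - size
    if mode == "DB":                       # Decrement Before
        return base - size, base - size
    raise ValueError(mode)


def push(memory, sp, values, stack_type):
    """Store values (lowest register first) and return the new SP."""
    start, new_sp = transfer_addresses(sp, len(values), STORE_MODE[stack_type])
    for i, value in enumerate(values):
        memory[start + 4 * i] = value
    return new_sp


def pop(memory, sp, count, stack_type):
    """Load count words and return (values, new SP)."""
    start, new_sp = transfer_addresses(sp, count, LOAD_MODE[stack_type])
    return [memory[start + 4 * i] for i in range(count)], new_sp
```

For every stack type, a push followed by a pop of the same number of words returns the original values and restores the original stack pointer. This is the property that makes the store and load suffixes a matching pair.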

Note
The Procedure Call Standard for the ARM Architecture (AAPCS), and armclang always use a full
descending stack.
The PUSH and POP instructions assume a full descending stack. They are the preferred synonyms for
STMDB and LDM with writeback.


Related concepts
6.14 Load and store multiple register instructions on page 6-120.
Related references
13.49 LDM on page 13-407.
Related information
Procedure Call Standard for the ARM Architecture.

6.17 Stack operations for nested subroutines
Stack operations can be very useful at subroutine entry and exit to avoid losing register contents if other
subroutines are called.
At the start of a subroutine, any working registers required can be stored on the stack, and at exit they
can be popped off again.
In addition, if the link register is pushed onto the stack at entry, additional subroutine calls can be made
safely without causing the return address to be lost. If you do this, you can also return from a subroutine
by popping the PC off the stack at exit, instead of popping the LR and then moving that value into the
PC. For example:
subroutine
        PUSH    {r5-r7,lr}      ; Push work registers and lr
        ; code
        BL      somewhere_else
        ; code
        POP     {r5-r7,pc}      ; Pop work registers and pc

Related concepts
6.3 Register usage in subroutine calls on page 6-104.
6.14 Load and store multiple register instructions on page 6-120.
Related information
Procedure Call Standard for the ARM Architecture.
Procedure Call Standard for the ARM 64-bit Architecture (AArch64).

6.18 Block copy with LDM and STM
You can sometimes make code more efficient by using LDM and STM instead of LDR and STR instructions.
Example of block copy without LDM and STM
The following example is an A32 code routine that copies a set of words from a source location to a
destination a single word at a time:
        AREA    Word, CODE, READONLY    ; name the block of code
num     EQU     20                      ; set number of words to be copied
        ENTRY                           ; mark the first instruction called
start
        LDR     r0, =src                ; r0 = pointer to source block
        LDR     r1, =dst                ; r1 = pointer to destination block
        MOV     r2, #num                ; r2 = number of words to copy
wordcopy
        LDR     r3, [r0], #4            ; load a word from the source and
        STR     r3, [r1], #4            ; store it to the destination
        SUBS    r2, r2, #1              ; decrement the counter
        BNE     wordcopy                ; ... copy more
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SVC     #0x123456               ; ARM semihosting (formerly SWI)
        AREA    BlockData, DATA, READWRITE
src     DCD     1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4
dst     DCD     0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
        END

You can make this module more efficient by using LDM and STM for as much of the copying as possible.
Eight is a sensible number of words to transfer at a time, given the number of available registers. You can
find the number of eight-word multiples in the block to be copied (if R2 = number of words to be copied)
using:
        MOVS    r3, r2, LSR #3          ; number of eight word multiples

You can use this value to control the number of iterations through a loop that copies eight words per
iteration. When there are fewer than eight words left, you can find the number of words left (assuming
that R2 has not been corrupted) using:
        ANDS    r2, r2, #7
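The arithmetic behind these two instructions can be modelled in Python. This is an illustrative sketch only, and the names split_block and block_copy are hypothetical.

```python
def split_block(num_words):
    """Return (eight-word multiples, words left over)."""
    eights = num_words >> 3      # same result as MOVS r3, r2, LSR #3
    leftover = num_words & 7     # same result as ANDS r2, r2, #7
    return eights, leftover


def block_copy(src):
    """Copy a list of words, eight at a time and then one at a time."""
    dst = []
    eights, leftover = split_block(len(src))
    pos = 0
    for _ in range(eights):          # eight-word loop
        dst.extend(src[pos:pos + 8])
        pos += 8
    for _ in range(leftover):        # single-word loop
        dst.append(src[pos])
        pos += 1
    return dst
```

For a block of 20 words, split_block(20) gives two eight-word transfers followed by four single-word transfers.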

Example of block copy using LDM and STM
The following example lists the block copy module rewritten to use LDM and STM for copying:
        AREA    Block, CODE, READONLY   ; name this block of code
num     EQU     20                      ; set number of words to be copied
        ENTRY                           ; mark the first instruction called
start
        LDR     r0, =src                ; r0 = pointer to source block
        LDR     r1, =dst                ; r1 = pointer to destination block
        MOV     r2, #num                ; r2 = number of words to copy
        MOV     sp, #0x400              ; Set up stack pointer (sp)
blockcopy
        MOVS    r3,r2, LSR #3           ; Number of eight word multiples
        BEQ     copywords               ; Fewer than eight words to move?
        PUSH    {r4-r11}                ; Save some working registers
octcopy
        LDM     r0!, {r4-r11}           ; Load 8 words from the source
        STM     r1!, {r4-r11}           ; and put them at the destination
        SUBS    r3, r3, #1              ; Decrement the counter
        BNE     octcopy                 ; ... copy more
        POP     {r4-r11}                ; Don't require these now - restore
                                        ; originals
copywords
        ANDS    r2, r2, #7              ; Number of odd words to copy
        BEQ     stop                    ; No words left to copy?
wordcopy
        LDR     r3, [r0], #4            ; Load a word from the source and
        STR     r3, [r1], #4            ; store it to the destination
        SUBS    r2, r2, #1              ; Decrement the counter
        BNE     wordcopy                ; ... copy more
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SVC     #0x123456               ; ARM semihosting (formerly SWI)
        AREA    BlockData, DATA, READWRITE
src     DCD     1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4
dst     DCD     0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
        END

Note
The purpose of this example is to show the use of the LDM and STM instructions. There are other ways to
perform bulk copy operations, the most efficient of which depends on many factors and is outside the
scope of this document.

Related information
What is the fastest way to copy memory on a Cortex-A8?.

6.19 Memory accesses
Many load and store instructions support different addressing modes.
Offset addressing
The offset value is applied to an address obtained from the base register. The result is used as the
address for the memory access. The base register is unchanged. The assembly language syntax
for this mode is:
[Rn, offset]

Pre-indexed addressing
The offset value is applied to an address obtained from the base register. The result is used as the
address for the memory access, and written back into the base register. The assembly language
syntax for this mode is:
[Rn, offset]!

Post-indexed addressing
The address obtained from the base register is used, unchanged, as the address for the memory
access. The offset value is applied to the address, and written back into the base register. The
assembly language syntax for this mode is:
[Rn], offset

In each case, Rn is the base register and offset can be:
• An immediate constant.
• An index register, Rm.
• A shifted index register, such as Rm, LSL #shift.
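The difference between the three modes is which address is used for the access and what is left in the base register afterwards. The following Python sketch models this for illustration only. Each function returns the access address and the value of Rn after the instruction. The function names are hypothetical.

```python
def offset_addressing(rn, offset):
    """[Rn, offset]: access at Rn + offset, Rn unchanged."""
    return rn + offset, rn


def pre_indexed(rn, offset):
    """[Rn, offset]!: access at Rn + offset, and write that back to Rn."""
    address = rn + offset
    return address, address


def post_indexed(rn, offset):
    """[Rn], offset: access at Rn, then write Rn + offset back to Rn."""
    return rn, rn + offset
```

With Rn = 0x1000 and an offset of 4, offset addressing and pre-indexed addressing both access 0x1004, but only pre-indexed addressing updates Rn. Post-indexed addressing accesses 0x1000 and leaves 0x1004 in Rn.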
Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
3.4 Registers in AArch32 state on page 3-67.

6.20 The Read-Modify-Write operation
The read-modify-write operation ensures that you modify only the specific bits in a system register that
you want to change.
Individual bits in a system register control different system functionality. Modifying the wrong bits in a
system register might cause your program to behave incorrectly.
        VMRS    r10,FPSCR               ; copy FPSCR into the general-purpose r10
        BIC     r10,r10,#0x00370000     ; clear STRIDE bits[21:20] and LEN bits[18:16]
        ORR     r10,r10,#0x00030000     ; set bits[17:16] (STRIDE =1 and LEN = 4)
        VMSR    FPSCR,r10               ; copy r10 back into FPSCR

To read-modify-write a system register, the instruction sequence is:
1. The first instruction copies the value from the target system register to a temporary general-purpose
register.
2. The next one or more instructions modify the required bits in the general-purpose register. This can
be one or both of:
• BIC to clear to 0 only the bits that must be cleared.
• ORR to set to 1 only the bits that must be set.
3. The final instruction writes the value from the general-purpose register to the target system register.
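The effect of the BIC and ORR steps on a 32-bit value can be modelled in Python. This is an illustrative sketch only. It operates on a plain integer and does not access any system register, and the name read_modify_write is hypothetical.

```python
def read_modify_write(value, clear_mask, set_mask):
    """Clear the bits in clear_mask, then set the bits in set_mask."""
    value &= ~clear_mask & 0xFFFFFFFF   # BIC: clear only the chosen bits
    value |= set_mask                   # ORR: set only the chosen bits
    return value & 0xFFFFFFFF
```

With the masks from the FPSCR example, read_modify_write(0xFFFFFFFF, 0x00370000, 0x00030000) gives 0xFFCBFFFF. Bits outside the clear mask keep their original values.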
Related concepts
3.6 Register accesses in AArch32 state on page 3-70.
Related references
3.10 The Q flag in AArch32 state on page 3-74.
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.
14.72 VMRS on page 14-680.

6.21 Optional hash with immediate constants
You do not have to specify a hash before an immediate constant in any instruction syntax.
This applies to A32, T32, Advanced SIMD, and floating-point instructions. For example, the following
are valid instructions:
BKPT 100
MOVT R1, 256
VCEQ.I8 Q1, Q2, 0

By default, the assembler warns if you do not specify a hash:
WARNING: A1865W: '#' not seen before constant expression.

You can suppress this warning with --diag_suppress=1865.
If you use the assembly code with another assembler, you are advised to use the # before all immediates.
The disassembler always shows the # for clarity.
Related references
Chapter 13 A32 and T32 Instructions on page 13-327.
Chapter 14 Advanced SIMD Instructions (32-bit) on page 14-600.

6.22 Use of macros
A macro definition is a block of code enclosed between MACRO and MEND directives. It defines a name that
you can use as a convenient alternative to repeating the block of code.
The main uses for a macro are:
• To make it easier to follow the logic of the source code by replacing a block of code with a single
meaningful name.
• To avoid repeating a block of code several times.
Related concepts
6.23 Test-and-branch macro example on page 6-131.
6.24 Unsigned integer division macro example on page 6-132.
Related references
21.51 MACRO and MEND on page 21-1696.

6.23 Test-and-branch macro example
You can use a macro to perform a test-and-branch operation.
In A32 code, a test-and-branch operation requires two instructions to implement.
You can define a macro such as this:
        MACRO
$label  TestAndBranch $dest, $reg, $cc
$label  CMP     $reg, #0
        B$cc    $dest
        MEND

The line after the MACRO directive is the macro prototype statement. This defines the name
(TestAndBranch) you use to invoke the macro. It also defines parameters ($label, $dest, $reg, and
$cc). Unspecified parameters are substituted with an empty string. For this macro you must give values
for $dest, $reg and $cc to avoid syntax errors. The assembler substitutes the values you give into the
code.
This macro can be invoked as follows:
test    TestAndBranch NonZero, r0, NE
        ...
        ...
NonZero

After substitution this becomes:
test    CMP     r0, #0
        BNE     NonZero
        ...
        ...
NonZero

Related concepts
6.22 Use of macros on page 6-130.
6.24 Unsigned integer division macro example on page 6-132.
12.10 Numeric local labels on page 12-307.


6.24

Unsigned integer division macro example
You can use a macro to perform unsigned integer division.
The macro takes the following parameters:
$Bot
    The register that holds the divisor.
$Top
    The register that holds the dividend before the instructions are executed. After the instructions
    are executed, it holds the remainder.
$Div
    The register where the quotient of the division is placed. It can be NULL ("") if only the
    remainder is required.
$Temp
    A temporary register used during the calculation.
Example unsigned integer division with a macro
        MACRO
$Lab    DivMod  $Div,$Top,$Bot,$Temp
        ASSERT  $Top <> $Bot             ; Produce an error message if the
        ASSERT  $Top <> $Temp            ; registers supplied are
        ASSERT  $Bot <> $Temp            ; not all different
        IF      "$Div" <> ""
            ASSERT  $Div <> $Top         ; These three only matter if $Div
            ASSERT  $Div <> $Bot         ; is not null ("")
            ASSERT  $Div <> $Temp        ;
        ENDIF
$Lab
        MOV     $Temp, $Bot              ; Put divisor in $Temp
        CMP     $Temp, $Top, LSR #1      ; double it until
90      MOVLS   $Temp, $Temp, LSL #1     ; 2 * $Temp > $Top
        CMP     $Temp, $Top, LSR #1
        BLS     %b90                     ; The b means search backwards
        IF      "$Div" <> ""             ; Omit next instruction if $Div
                                         ; is null
            MOV     $Div, #0             ; Initialize quotient
        ENDIF
91      CMP     $Top, $Temp              ; Can we subtract $Temp?
        SUBCS   $Top, $Top,$Temp         ; If we can, do so
        IF      "$Div" <> ""             ; Omit next instruction if $Div
                                         ; is null
            ADC     $Div, $Div, $Div     ; Double $Div
        ENDIF
        MOV     $Temp, $Temp, LSR #1     ; Halve $Temp,
        CMP     $Temp, $Bot              ; and loop until
        BHS     %b91                     ; less than divisor
        MEND
The macro checks that no two parameters use the same register. It also optimizes the code produced if
only the remainder is required.
To avoid multiple definitions of labels if DivMod is used more than once in the assembler source, the
macro uses numeric local labels (90, 91).
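
The shift-and-subtract method that DivMod uses can be modelled in C to check the logic outside the assembler. The following is a minimal sketch under stated assumptions, not ARM-supplied code: the function name div_mod and its pointer parameters are hypothetical, and the divisor is assumed to be non-zero (with a zero divisor the doubling loop, like the one in the macro, does not terminate). Passing a null quotient pointer mirrors invoking the macro with an empty $Div.

```c
#include <stdint.h>

/* Hypothetical C model of the DivMod macro. bot must be non-zero. */
void div_mod(uint32_t top, uint32_t bot, uint32_t *quotient, uint32_t *remainder)
{
    uint32_t temp = bot;                 /* MOV   $Temp, $Bot                 */
    uint32_t div = 0;                    /* MOV   $Div, #0                    */

    /* Label 90: double temp until 2 * temp > top.                            */
    while (temp <= (top >> 1)) {         /* CMP / MOVLS / BLS %b90            */
        temp <<= 1;
    }

    /* Label 91: one quotient bit per iteration, most significant first.      */
    do {
        uint32_t carry = (top >= temp);  /* CMP   $Top, $Temp (C = no borrow) */
        if (carry) {
            top -= temp;                 /* SUBCS $Top, $Top, $Temp           */
        }
        div = (div << 1) | carry;        /* ADC   $Div, $Div, $Div            */
        temp >>= 1;                      /* MOV   $Temp, $Temp, LSR #1        */
    } while (temp >= bot);               /* CMP   $Temp, $Bot / BHS %b91      */

    if (quotient) {
        *quotient = div;                 /* skipped when only the remainder is wanted */
    }
    *remainder = top;
}
```

For example, calling div_mod with a dividend of 7 and a divisor of 2 stores a quotient of 3 and a remainder of 1.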
The following example shows the code that this macro produces if it is invoked as follows:
ratio   DivMod  R0,R5,R4,R2

Output from the example division macro

        ASSERT  r5 <> r4                 ; Produce an error if the
        ASSERT  r5 <> r2                 ; registers supplied are
        ASSERT  r4 <> r2                 ; not all different
        ASSERT  r0 <> r5                 ; These three only matter if $Div
        ASSERT  r0 <> r4                 ; is not null ("")
        ASSERT  r0 <> r2                 ;
ratio
        MOV     r2, r4                   ; Put divisor in $Temp
        CMP     r2, r5, LSR #1           ; double it until
90      MOVLS   r2, r2, LSL #1           ; 2 * r2 > r5
        CMP     r2, r5, LSR #1
        BLS     %b90                     ; The b means search backwards
        MOV     r0, #0                   ; Initialize quotient
91      CMP     r5, r2                   ; Can we subtract r2?
        SUBCS   r5, r5, r2               ; If we can, do so
        ADC     r0, r0, r0               ; Double r0
        MOV     r2, r2, LSR #1           ; Halve r2,
        CMP     r2, r4                   ; and loop until
        BHS     %b91                     ; less than divisor

Related concepts
6.22 Use of macros on page 6-130.
6.23 Test-and-branch macro example on page 6-131.
12.10 Numeric local labels on page 12-307.


6.25

Instruction and directive relocations
The assembler can embed relocation directives in object files to indicate labels with addresses that are
unknown at assembly time. The assembler can relocate several types of instruction.
A relocation is a directive embedded in the object file that enables source code to refer to a label whose
target address is unknown or cannot be calculated at assembly time. The assembler emits a relocation in
the object file, and the linker resolves this to the address where the target is placed.
The assembler relocates the data directives DCB, DCW, DCWU, DCD, and DCDU if their syntax contains an
external symbol, that is a symbol declared using IMPORT or EXTERN. This causes the bottom 8, 16, or 32
bits of the address to be used at link-time.
The REQUIRE directive emits a relocation to signal to the linker that the target label must be present if the
current section is present.
The assembler is permitted to emit a relocation for these instructions:
LDR (PC-relative)
    All A32 and T32 instructions, except the T32 doubleword instruction, can be relocated.
PLD, PLDW, and PLI
    All A32 and T32 instructions can be relocated.
B, BL, and BLX
    All A32 and T32 instructions can be relocated.
CBZ and CBNZ
    All T32 instructions can be relocated but this is discouraged because of the limited branch range
    of these instructions.
LDC and LDC2
    Only A32 instructions can be relocated.
VLDR
    Only A32 instructions can be relocated.
The assembler emits a relocation for these instructions if the label used meets any of the following
requirements, as appropriate for the instruction type:
• The label is WEAK.
• The label is not in the same AREA.
• The label is external to the object (IMPORT or EXTERN).

For B, BL, and BLX instructions, the assembler emits a relocation also if:
• The label is a function.
• The label is exported using EXPORT or GLOBAL.
Note
You can use the RELOC directive to control the relocation at a finer level, but this requires knowledge of
the ABI.

Example
        IMPORT sym      ; sym is an external symbol
        DCW sym         ; Because DCW only outputs 16 bits, only the lower
                        ; 16 bits of the address of sym are inserted at
                        ; link-time.

Related references
21.6 AREA on page 21-1646.
21.27 EXPORT or GLOBAL on page 21-1669.
21.45 IMPORT and EXTERN on page 21-1689.
21.58 REQUIRE on page 21-1707.
21.57 RELOC on page 21-1706.
21.15 DCB on page 21-1657.
21.16 DCD and DCDU on page 21-1658.
21.22 DCW and DCWU on page 21-1664.
13.51 LDR (PC-relative) on page 13-411.
13.10 ADR (PC-relative) on page 13-349.
13.79 PLD, PLDW, and PLI on page 13-453.
13.15 B on page 13-359.
13.24 CBZ and CBNZ on page 13-373.
13.48 LDC and LDC2 on page 13-405.
14.52 VLDR on page 14-660.
Related information
ELF for the ARM Architecture.


6.26

Symbol versions
The ARM linker conforms to the Base Platform ABI for the ARM Architecture (BPABI) and supports
the GNU-extended symbol versioning model.
To add a symbol version to an existing symbol, you must define a version symbol at the same address. A
version symbol is of the form:
• name@ver if ver is a non-default version of name.
• name@@ver if ver is the default version of name.
The version symbols must be enclosed in vertical bars.
For example, to define a default version:
|my_versioned_symbol@@ver2|             ; Default version
my_asm_function PROC
        ...
        BX lr
        ENDP

To define a non-default version:
|my_versioned_symbol@ver1|              ; Non-default version
my_old_asm_function PROC
        ...
        BX lr
        ENDP

Related information
Base Platform ABI for the ARM Architecture.
Accessing and managing symbols with armlink.


6.27

Frame directives
Frame directives provide information in object files that enables debugging and profiling of assembly
language functions.
You must use frame directives to describe the way that your code uses the stack if you want to be able to
do either of the following:
• Debug your application using stack unwinding.
• Use either flat or call-graph profiling.

The assembler uses frame directives to insert DWARF debug frame information into the object file in
ELF format that it produces. This information is required by a debugger for stack unwinding and for
profiling.
Be aware of the following:
• Frame directives do not affect the code produced by the assembler.
• The assembler does not validate the information in frame directives against the instructions emitted.
Related concepts
6.28 Exception tables and Unwind tables on page 6-138.
Related references
21.3 About frame directives on page 21-1642.
Related information
Procedure Call Standard for the ARM Architecture.


6.28

Exception tables and Unwind tables
You use FRAME directives to enable the assembler to generate unwind tables.
Note
Not supported for AArch64 state.
Exception tables are necessary to handle exceptions thrown by functions in high-level languages such as
C++. Unwind tables contain debug frame information which is also necessary for the handling of such
exceptions. An exception can only propagate through a function with an unwind table.
An assembly language function is code enclosed by either PROC and ENDP or FUNC and ENDFUNC
directives. Functions written in C++ have unwind information by default. However, for assembly
language functions that are called from C++ code, you must ensure that there are exception tables and
unwind tables to enable the exceptions to propagate through them.
An exception cannot propagate through a function with a nounwind table. The exception handling
runtime environment terminates the program if it encounters a nounwind table during exception
processing.
The assembler can generate nounwind table entries for all functions and non-functions. The assembler
can generate an unwind table for a function only if the function contains sufficient FRAME directives to
describe the use of the stack within the function. To be able to create an unwind table for a function, each
POP or PUSH instruction must be followed by a FRAME POP or FRAME PUSH directive respectively.
Functions must conform to the conditions set out in the Exception Handling ABI for the ARM
Architecture (EHABI), section 9.1 Constraints on Use. If the assembler cannot generate an unwind table
it generates a nounwind table.
Related concepts
6.27 Frame directives on page 6-137.
Related references
21.3 About frame directives on page 21-1642.
11.26 --exceptions, --no_exceptions on page 11-254.
11.27 --exceptions_unwind, --no_exceptions_unwind on page 11-255.
21.39 FRAME UNWIND ON on page 21-1682.
21.40 FRAME UNWIND OFF on page 21-1683.
21.41 FUNCTION or PROC on page 21-1684.
21.24 ENDFUNC or ENDP on page 21-1666.
Related information
Exception Handling ABI for the ARM Architecture.


Chapter 7
Condition Codes

Describes condition codes and conditional execution of A64, A32, and T32 code.
It contains the following sections:
• 7.1 Conditional instructions.
• 7.2 Conditional execution in A32 code.
• 7.3 Conditional execution in T32 code.
• 7.4 Conditional execution in A64 code.
• 7.5 Condition flags.
• 7.6 Updates to the condition flags in A32/T32 code.
• 7.7 Updates to the condition flags in A64 code.
• 7.8 Floating-point instructions that update the condition flags.
• 7.9 Carry flag.
• 7.10 Overflow flag.
• 7.11 Condition code suffixes.
• 7.12 Condition code suffixes and related flags.
• 7.13 Comparison of condition code meanings in integer and floating-point code.
• 7.14 Benefits of using conditional execution in A32 and T32 code.
• 7.15 Example showing the benefits of conditional instructions in A32 and T32 code.
• 7.16 Optimization for execution speed.


7.1

Conditional instructions
A32 and T32 instructions can execute conditionally on the condition flags set by a previous instruction.
The conditional instruction can occur either:
• Immediately after the instruction that updated the flags.
• After any number of intervening instructions that have not updated the flags.

In AArch32 state, whether an instruction can be conditional or not depends on the instruction set state
that the processor is in. Few A64 instructions can be conditionally executed.
To make an instruction conditional, you must add a condition code suffix to the instruction mnemonic.
The condition code suffix enables the processor to test a condition based on the flags. If the condition test
of a conditional instruction fails, the instruction:
• Does not execute.
• Does not write any value to its destination register.
• Does not affect any of the flags.
• Does not generate any exception.
Related concepts
7.2 Conditional execution in A32 code on page 7-141.
7.3 Conditional execution in T32 code on page 7-142.
Related references
7.12 Condition code suffixes and related flags on page 7-151.
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
7.7 Updates to the condition flags in A64 code on page 7-146.


7.2

Conditional execution in A32 code
Almost all A32 instructions can be executed conditionally on the value of the condition flags in the
APSR. You can either add a condition code suffix to the instruction or you can conditionally skip over
the instruction using a conditional branch instruction.
Using conditional branch instructions to control the flow of execution can be more efficient when a
series of instructions depend on the same condition.
Conditional instructions to control execution
; flags set by a previous instruction
LSLEQ r0, r0, #24
ADDEQ r0, r0, #2
;…

Conditional branch to control execution
        ; flags set by a previous instruction
        BNE     over
        LSL     r0, r0, #24
        ADD     r0, r0, #2
over
        ;…

Related concepts
7.3 Conditional execution in T32 code on page 7-142.


7.3

Conditional execution in T32 code
In T32 code, there are several ways to achieve conditional execution. You can conditionally skip over the
instruction using a conditional branch instruction.
Instructions can also be conditionally executed by using either of the following:
• CBZ and CBNZ.
• The IT (If-Then) instruction.

The T32 CBZ (Conditional Branch on Zero) and CBNZ (Conditional Branch on Non-Zero) instructions
compare the value of a register against zero and branch on the result.
IT is a 16-bit instruction that enables a single subsequent 16-bit T32 instruction from a restricted set to
be conditionally executed, based on the value of the condition flags, and the condition code suffix
specified.

Conditional instructions using IT block
        ; flags set by a previous instruction
        IT      EQ
        LSLEQ   r0, r0, #24
        ;…

The use of the IT instruction is deprecated when any of the following are true:
• There is more than one instruction in the IT block.
• There is a 32-bit instruction in the IT block.
• The instruction in the IT block references the PC.
Related concepts
7.2 Conditional execution in A32 code on page 7-141.
Related references
13.45 IT on page 13-399.
13.24 CBZ and CBNZ on page 13-373.


7.4

Conditional execution in A64 code
In the A64 instruction set, there are a few instructions that are truly conditional. Truly conditional means
that when the condition is false, the instruction advances the program counter but has no other effect.
The conditional branch, B.cond is a truly conditional instruction. The condition code is appended to the
instruction with a '.' delimiter, for example B.EQ.
There are other truly conditional branch instructions that test the value of a register, or of a single bit
in a register, rather than the condition flags. You cannot append any condition code suffix to them.
These instructions are:
• CBNZ.
• CBZ.
• TBNZ.
• TBZ.

There are a few A64 instructions that are unconditionally executed but use the condition code as a source
operand. These instructions always execute but the operation depends on the value of the condition code.
These instructions can be categorized as:
• Conditional data processing instructions, for example CSEL.
• Conditional comparison instructions, CCMN and CCMP.
In these instructions, you specify the condition code in the final operand position, for example CSEL
Wd,Wm,Wn,NE.
Related concepts
7.3 Conditional execution in T32 code on page 7-142.
7.2 Conditional execution in A32 code on page 7-141.


7.5

Condition flags
The N, Z, C, and V condition flags are held in the APSR.
They are set or cleared as follows:
N
Set to 1 when the result of the operation is negative, cleared to 0 otherwise.
Z
Set to 1 when the result of the operation is zero, cleared to 0 otherwise.
C
Set to 1 when the operation results in a carry, or when a subtraction results in no borrow, cleared
to 0 otherwise.
V
Set to 1 when the operation causes overflow, cleared to 0 otherwise.
C is set in one of the following ways:
• For an addition, including the comparison instruction CMN, C is set to 1 if the addition produced a
carry (that is, an unsigned overflow), and to 0 otherwise.
• For a subtraction, including the comparison instruction CMP, C is set to 0 if the subtraction produced a
borrow (that is, an unsigned underflow), and to 1 otherwise.
• For non-addition/subtractions that incorporate a shift operation, C is set to the last bit shifted out of
the value by the shifter.
• For other non-addition/subtractions, C is normally left unchanged, but see the individual instruction
descriptions for any special cases.
Overflow occurs if the result of a signed add, subtract, or compare is greater than or equal to 2^31, or less
than –2^31.
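
The rules above can be checked with a small C model of a 32-bit flag-setting addition and subtraction. This is an illustrative sketch, not part of the toolchain: the type flags_t and the functions flags_add32 and flags_sub32 are hypothetical names, and the model covers only the add and subtract cases, not shifts or other instructions.

```c
#include <stdint.h>

/* Hypothetical record of the four condition flags, each 0 or 1. */
typedef struct {
    unsigned n, z, c, v;
} flags_t;

/* Flags for a + b, as set by an addition such as ADDS or CMN. */
flags_t flags_add32(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                      /* result modulo 2^32 */
    flags_t f;
    f.n = r >> 31;                           /* N: bit 31 of the result */
    f.z = (r == 0);                          /* Z: result is zero */
    f.c = (r < a);                           /* C: carry out, that is, unsigned overflow */
    f.v = (~(a ^ b) & (a ^ r)) >> 31;        /* V: operands share a sign, result does not */
    return f;
}

/* Flags for a - b, as set by a subtraction such as SUBS or CMP. */
flags_t flags_sub32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                      /* result modulo 2^32 */
    flags_t f;
    f.n = r >> 31;
    f.z = (r == 0);
    f.c = (a >= b);                          /* C: 1 when there is no borrow */
    f.v = ((a ^ b) & (a ^ r)) >> 31;         /* V: operands differ in sign, result sign differs from a */
    return f;
}
```

In this model, adding 1 to 0x7FFFFFFF sets N and V and clears Z and C, and subtracting 2 from 1 sets N and clears Z, C, and V.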
Related references
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
7.7 Updates to the condition flags in A64 code on page 7-146.
7.12 Condition code suffixes and related flags on page 7-151.


7.6

Updates to the condition flags in A32/T32 code
In AArch32 state, the condition flags are held in the Application Program Status Register (APSR). You
can read and modify the flags using the read-modify-write procedure.
Most A32 and T32 data processing instructions have an option to update the condition flags according to
the result of the operation. Instructions with the optional S suffix update the flags. Conditional
instructions that are not executed have no effect on the flags.
Which flags are updated depends on the instruction. Some instructions update all flags, and some update
a subset of the flags. If a flag is not updated, the original value is preserved. The description of each
instruction mentions the effect that it has on the flags.
Note
Most instructions update the condition flags only if the S suffix is specified. The instructions CMP, CMN,
TEQ, and TST always update the flags.

Related concepts
7.1 Conditional instructions on page 7-140.
Related references
7.5 Condition flags on page 7-144.
7.7 Updates to the condition flags in A64 code on page 7-146.
7.12 Condition code suffixes and related flags on page 7-151.
Chapter 13 A32 and T32 Instructions on page 13-327.


7.7

Updates to the condition flags in A64 code
In AArch64 state, the N, Z, C, and V condition flags are held in the NZCV system register, which is part
of the process state. You can access the flags using the MSR and MRS instructions.
Note
An instruction updates the condition flags only if the S suffix is specified, except the instructions CMP,
CMN, CCMP, CCMN, and TST, which always update the condition flags. The instruction also determines
which flags get updated. If a conditional instruction does not execute, it does not affect the flags.

Example
This example shows the read-modify-write procedure to change some of the condition flags in A64 code.
        MRS     x1, NZCV                ; copy N, Z, C, and V flags into general-purpose x1
        MOV     x2, #0x30000000
        BIC     x1,x1,x2                ; clears the C and V flags (bits 29,28)
        ORR     x1,x1,#0xC0000000       ; sets the N and Z flags (bits 31,30)
        MSR     NZCV, x1                ; copy x1 back into NZCV register to update the condition flags
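
The bit manipulation in this example can also be written as a C function that operates on a copy of the register value. This is a sketch for checking the masks only: the function name nzcv_set_nz_clear_cv is hypothetical, and the code works on a plain integer rather than reading or writing the real NZCV register.

```c
#include <stdint.h>

/* Hypothetical helper: takes a copy of NZCV, returns the value to write back. */
uint64_t nzcv_set_nz_clear_cv(uint64_t nzcv)
{
    nzcv &= ~UINT64_C(0x30000000);   /* BIC: clear C (bit 29) and V (bit 28) */
    nzcv |= UINT64_C(0xC0000000);    /* ORR: set N (bit 31) and Z (bit 30)   */
    return nzcv;
}
```

Whatever the starting value of the four flag bits, the result has N and Z set and C and V clear. All other bits pass through unchanged.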

Related concepts
7.1 Conditional instructions on page 7-140.
Related references
7.5 Condition flags on page 7-144.
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
7.12 Condition code suffixes and related flags on page 7-151.


7.8

Floating-point instructions that update the condition flags
The only A32/T32 floating-point instructions that can update the condition flags are VCMP and VCMPE.
Other floating-point or Advanced SIMD instructions cannot modify the flags.
VCMP and VCMPE do not update the flags directly, but update a separate set of flags in the Floating-Point
Status and Control Register (FPSCR). To use these flags to control conditional instructions, including
conditional floating-point instructions, you must first update the condition flags yourself. To do this,
copy the flags from the FPSCR into the APSR using a VMRS instruction:
VMRS APSR_nzcv, FPSCR

All A64 floating-point comparison instructions can update the condition flags. These instructions update
the flags directly in the NZCV register.
Related concepts
6.20 The Read-Modify-Write operation on page 6-128.
7.9 Carry flag on page 7-148.
7.10 Overflow flag on page 7-149.
Related references
7.7 Updates to the condition flags in A64 code on page 7-146.
15.4 VCMP, VCMPE on page 15-756.
14.72 VMRS on page 14-680.
15.25 VMRS (floating-point) on page 15-777.
Related information
ARM Architecture Reference Manual.


7.9

Carry flag
The carry (C) flag is set when an operation results in a carry, or when a subtraction results in no borrow.
In A32/T32 code, C is set in one of the following ways:
• For an addition, including the comparison instruction CMN, C is set to 1 if the addition produced a
carry (that is, an unsigned overflow), and to 0 otherwise.
• For a subtraction, including the comparison instruction CMP, C is set to 0 if the subtraction produced a
borrow (that is, an unsigned underflow), and to 1 otherwise.
• For non-additions/subtractions that incorporate a shift operation, C is set to the last bit shifted out of
the value by the shifter.
• For other non-additions/subtractions, C is normally left unchanged, but see the individual instruction
descriptions for any special cases.
• The floating-point compare instructions, VCMP and VCMPE set the C flag and the other condition flags
in the FPSCR to the result of the comparison.
In A64 code, C is set in one of the following ways:
• For an addition, including the comparison instruction CMN, C is set to 1 if the addition produced a
carry (that is, an unsigned overflow), and to 0 otherwise.
• For a subtraction, including the comparison instruction CMP and the negate instructions NEGS and
NGCS, C is set to 0 if the subtraction produced a borrow (that is, an unsigned underflow), and to 1
otherwise.
• For the integer and floating-point conditional compare instructions CCMP, CCMN, FCCMP, and FCCMPE, C
and the other condition flags are set either to the result of the comparison, or directly from an
immediate value.
• For the floating-point compare instructions, FCMP and FCMPE, C and the other condition flags are set
to the result of the comparison.
• For other instructions, C is normally left unchanged, but see the individual instruction descriptions
for any special cases.
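
The shifter rule in the A32/T32 list above, where C takes the last bit shifted out, can be illustrated with a short C model. This is a sketch only: the function names carry_lsl and carry_lsr are hypothetical, and the shift amount n is assumed to be in the range 1 to 31. Other shift amounts have their own rules in the individual instruction descriptions and are not modelled here.

```c
#include <stdint.h>

/* Carry-out of a logical shift left by n, for 1 <= n <= 31.
   The last bit shifted out of the top is bit (32 - n) of the original value. */
unsigned carry_lsl(uint32_t value, unsigned n)
{
    return (value >> (32u - n)) & 1u;
}

/* Carry-out of a logical shift right by n, for 1 <= n <= 31.
   The last bit shifted out of the bottom is bit (n - 1) of the original value. */
unsigned carry_lsr(uint32_t value, unsigned n)
{
    return (value >> (n - 1u)) & 1u;
}
```

For example, shifting 0x80000000 left by one place shifts bit 31 out, so the model returns a carry of 1.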
Related concepts
7.10 Overflow flag on page 7-149.
Related references
3.7 Predeclared core register names in AArch32 state on page 3-71.
4.5 Predeclared core register names in AArch64 state on page 4-85.
7.12 Condition code suffixes and related flags on page 7-151.
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
7.7 Updates to the condition flags in A64 code on page 7-146.


7.10

Overflow flag
Overflow can occur for add, subtract, and compare operations.
In A32/T32 code, overflow occurs if the result of the operation is greater than or equal to 2^31, or less than
–2^31.
In A64 instructions that use the 64-bit X registers, overflow occurs if the result of the operation is greater
than or equal to 2^63, or less than –2^63.
In A64 instructions that use the 32-bit W registers, overflow occurs if the result of the operation is
greater than or equal to 2^31, or less than –2^31.
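
The difference between the 32-bit and 64-bit ranges can be seen in a small C model of signed addition. This is an illustrative sketch: the function names overflow_add_w and overflow_add_x are hypothetical, and only addition is modelled.

```c
#include <stdint.h>

/* Overflow test for a 32-bit (W register) signed addition.
   The exact sum fits in 64 bits, so it can be range-checked directly. */
int overflow_add_w(int32_t a, int32_t b)
{
    int64_t exact = (int64_t)a + (int64_t)b;
    return exact >= INT64_C(2147483648) || exact < -INT64_C(2147483648);
}

/* Overflow test for a 64-bit (X register) signed addition.
   The exact sum might not fit in int64_t, so the operands are compared
   against the limits instead of being added. */
int overflow_add_x(int64_t a, int64_t b)
{
    if (b > 0) {
        return a > INT64_MAX - b;    /* sum would be >= 2^63 */
    }
    return a < INT64_MIN - b;        /* sum would be < -2^63 */
}
```

Adding 1 to 2147483647 overflows in the 32-bit model but not in the 64-bit model, because the same sum is well inside the 64-bit range.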
Related concepts
7.9 Carry flag on page 7-148.
Related references
3.7 Predeclared core register names in AArch32 state on page 3-71.
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
7.7 Updates to the condition flags in A64 code on page 7-146.


7.11

Condition code suffixes
Instructions that can be conditional have an optional two character condition code suffix.
Condition codes are shown in syntax descriptions as {cond}. The following table shows the condition
codes that you can use:
Table 7-1 Condition code suffixes
Suffix  Meaning
EQ      Equal
NE      Not equal
CS      Carry set (identical to HS)
HS      Unsigned higher or same (identical to CS)
CC      Carry clear (identical to LO)
LO      Unsigned lower (identical to CC)
MI      Minus or negative result
PL      Positive or zero result
VS      Overflow
VC      No overflow
HI      Unsigned higher
LS      Unsigned lower or same
GE      Signed greater than or equal
LT      Signed less than
GT      Signed greater than
LE      Signed less than or equal
AL      Always (this is the default)

Note
The meaning of some of these condition codes depends on whether the instruction that last updated the
condition flags is a floating-point or integer instruction.

Related concepts
9.8 Conditional execution of A32/T32 Advanced SIMD instructions on page 9-192.
10.8 Conditional execution of A32/T32 floating-point instructions on page 10-215.
Related references
7.13 Comparison of condition code meanings in integer and floating-point code on page 7-152.
13.45 IT on page 13-399.
14.72 VMRS on page 14-680.
15.25 VMRS (floating-point) on page 15-777.


7.12

Condition code suffixes and related flags
Condition code suffixes define the conditions that must be met for the instruction to execute.
The following table shows the condition codes that you can use and the flag settings they depend on:
Table 7-2 Condition code suffixes and related flags
Suffix    Flags                           Meaning
EQ        Z set                           Equal
NE        Z clear                         Not equal
CS or HS  C set                           Higher or same (unsigned >=)
CC or LO  C clear                         Lower (unsigned <)
MI        N set                           Negative
PL        N clear                         Positive or zero
VS        V set                           Overflow
VC        V clear                         No overflow
HI        C set and Z clear               Higher (unsigned >)
LS        C clear or Z set                Lower or same (unsigned <=)
GE        N and V the same                Signed >=
LT        N and V differ                  Signed <
GT        Z clear, and N and V the same   Signed >
LE        Z set, or N and V differ        Signed <=
AL        Any                             Always. This suffix is normally omitted.

The optional condition code is shown in syntax descriptions as {cond}. This condition is encoded in A32
instructions and in A64 instructions. For T32 instructions, the condition is encoded in a preceding IT
instruction. An instruction with a condition code is only executed if the condition flags meet the
specified condition.
The following is an example of conditional execution in A32 code:
        ADD     r0, r1, r2      ; r0 = r1 + r2, don't update flags
        ADDS    r0, r1, r2      ; r0 = r1 + r2, and update flags
        ADDSCS  r0, r1, r2      ; If C flag set then r0 = r1 + r2,
                                ; and update flags
        CMP     r0, r1          ; update flags based on r0-r1.
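
The flag tests in Table 7-2 can be written out as a C function, which makes it easy to check which suffixes pass for a given set of flags. This is a sketch under stated assumptions: the function name cond_passes and its string-based interface are hypothetical, and each flag argument is assumed to be exactly 0 or 1.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the condition passes, 0 if it fails,
   and -1 if the suffix is not one of those in Table 7-2.
   n, z, c, and v must each be 0 or 1. */
int cond_passes(const char *cc, int n, int z, int c, int v)
{
    if (strcmp(cc, "EQ") == 0) return z;
    if (strcmp(cc, "NE") == 0) return !z;
    if (strcmp(cc, "CS") == 0 || strcmp(cc, "HS") == 0) return c;
    if (strcmp(cc, "CC") == 0 || strcmp(cc, "LO") == 0) return !c;
    if (strcmp(cc, "MI") == 0) return n;
    if (strcmp(cc, "PL") == 0) return !n;
    if (strcmp(cc, "VS") == 0) return v;
    if (strcmp(cc, "VC") == 0) return !v;
    if (strcmp(cc, "HI") == 0) return c && !z;        /* C set and Z clear */
    if (strcmp(cc, "LS") == 0) return !c || z;        /* C clear or Z set */
    if (strcmp(cc, "GE") == 0) return n == v;         /* N and V the same */
    if (strcmp(cc, "LT") == 0) return n != v;         /* N and V differ */
    if (strcmp(cc, "GT") == 0) return !z && n == v;
    if (strcmp(cc, "LE") == 0) return z || n != v;
    if (strcmp(cc, "AL") == 0) return 1;
    return -1;                                        /* unknown suffix */
}
```

For example, comparing 1 with 2 using CMP leaves N set and Z, C, and V clear. With those flags the model reports that LT, LO, and NE pass, and that GE and HI fail.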

Related concepts
7.1 Conditional instructions on page 7-140.
Related references
7.5 Condition flags on page 7-144.
7.13 Comparison of condition code meanings in integer and floating-point code on page 7-152.
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
7.7 Updates to the condition flags in A64 code on page 7-146.
Chapter 13 A32 and T32 Instructions on page 13-327.


7.13

Comparison of condition code meanings in integer and floating-point code
The meaning of the condition code mnemonic suffixes depends on whether the condition flags were set
by a floating-point instruction or by an A32 or T32 data processing instruction.
This is because:
• Floating-point values are never unsigned, so the unsigned conditions are not required.
• Not-a-Number (NaN) values have no ordering relationship with numbers or with each other, so
additional conditions are required to account for unordered results.
The meaning of the condition code mnemonic suffixes is shown in the following table:
Table 7-3 Condition codes
Suffix  Meaning after integer data processing instruction  Meaning after floating-point instruction
EQ      Equal                                              Equal
NE      Not equal                                          Not equal, or unordered
CS      Carry set                                          Greater than or equal, or unordered
HS      Unsigned higher or same                            Greater than or equal, or unordered
CC      Carry clear                                        Less than
LO      Unsigned lower                                     Less than
MI      Negative                                           Less than
PL      Positive or zero                                   Greater than or equal, or unordered
VS      Overflow                                           Unordered (at least one NaN operand)
VC      No overflow                                        Not unordered
HI      Unsigned higher                                    Greater than, or unordered
LS      Unsigned lower or same                             Less than or equal
GE      Signed greater than or equal                       Greater than or equal
LT      Signed less than                                   Less than, or unordered
GT      Signed greater than                                Greater than
LE      Signed less than or equal                          Less than or equal, or unordered
AL      Always (normally omitted)                          Always (normally omitted)

Note
The type of the instruction that last updated the condition flags determines the meaning of the condition
codes.
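
The floating-point column follows from the flag values that a floating-point comparison produces. The following C sketch assumes the flag encodings defined for floating-point comparisons in the ARM Architecture Reference Manual (equal gives N=0 Z=1 C=1 V=0, less than gives N=1 Z=0 C=0 V=0, greater than gives N=0 Z=0 C=1 V=0, unordered gives N=0 Z=0 C=1 V=1). The names fp_flags_t, fp_compare, cond_ge, cond_hs, and cond_lt are hypothetical.

```c
#include <math.h>

/* Hypothetical record of the flags after a floating-point comparison. */
typedef struct {
    int n, z, c, v;
} fp_flags_t;

/* Model of the flags produced by comparing a with b. */
fp_flags_t fp_compare(double a, double b)
{
    fp_flags_t f = { 0, 0, 0, 0 };
    if (isnan(a) || isnan(b)) {
        f.c = 1; f.v = 1;            /* unordered: at least one NaN operand */
    } else if (a == b) {
        f.z = 1; f.c = 1;            /* equal */
    } else if (a < b) {
        f.n = 1;                     /* less than */
    } else {
        f.c = 1;                     /* greater than */
    }
    return f;
}

/* The same flag tests that integer code uses for GE, HS (or CS), and LT. */
int cond_ge(fp_flags_t f) { return f.n == f.v; }
int cond_hs(fp_flags_t f) { return f.c; }
int cond_lt(fp_flags_t f) { return f.n != f.v; }
```

With an unordered result, cond_ge fails while cond_hs and cond_lt pass, which matches the GE, HS, and LT rows of the table.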

Related concepts
7.1 Conditional instructions on page 7-140.
Related references
7.12 Condition code suffixes and related flags on page 7-151.
7.6 Updates to the condition flags in A32/T32 code on page 7-145.
7.7 Updates to the condition flags in A64 code on page 7-146.
15.4 VCMP, VCMPE on page 15-756.
14.72 VMRS on page 14-680.
15.25 VMRS (floating-point) on page 15-777.
Related information
ARM Architecture Reference Manual.


7.14

Benefits of using conditional execution in A32 and T32 code
It can be more efficient to use conditional instructions rather than conditional branches.
You can use conditional execution of A32 instructions to reduce the number of branch instructions in
your code, and improve code density. The IT instruction in T32 achieves a similar improvement.
Branch instructions are also expensive in processor cycles. On ARM processors without branch
prediction hardware, it typically takes three processor cycles to refill the processor pipeline each time a
branch is taken.
Some ARM processors have branch prediction hardware. In systems using these processors, the pipeline
only has to be flushed and refilled when there is a misprediction.
Related concepts
7.15 Example showing the benefits of conditional instructions in A32 and T32 code on page 7-155.


7.15

Example showing the benefits of conditional instructions in A32 and T32 code
Using conditional instructions rather than conditional branches can save both code size and cycles.
This example shows the difference between using branches and using conditional instructions. It uses the
Euclid algorithm for the Greatest Common Divisor (gcd) to show how conditional instructions improve
code size and speed.
In C the gcd algorithm can be expressed as:
int gcd(int a, int b)
{
    while (a != b)
    {
        if (a > b)
            a = a - b;
        else
            b = b - a;
    }
    return a;
}

The following examples show implementations of the gcd algorithm with and without conditional
instructions.
Example of conditional execution using branches in A32 code
This example is an A32 code implementation of the gcd algorithm. It achieves conditional execution by
using conditional branches, rather than individual conditional instructions:
gcd     CMP     r0, r1
        BEQ     end
        BLT     less
        SUBS    r0, r0, r1      ; could be SUB r0, r0, r1 for A32
        B       gcd
less
        SUBS    r1, r1, r0      ; could be SUB r1, r1, r0 for A32
        B       gcd
end

The code is seven instructions long because of the number of branches. Every time a branch is taken, the
processor must refill the pipeline and continue from the new location. The other instructions and
non-executed branches use a single cycle each.
The following table shows the number of cycles this implementation uses on an ARM7™ processor when
R0 equals 1 and R1 equals 2.
Table 7-4 Conditional branches only
R0: a   R1: b   Instruction       Cycles (ARM7)
1       2       CMP r0, r1        1
1       2       BEQ end           1 (not executed)
1       2       BLT less          3
1       2       SUB r1, r1, r0    1
1       1       B gcd             3
1       1       CMP r0, r1        1
1       1       BEQ end           3
                                  Total = 13

Example of conditional execution using conditional instructions in A32 code
This example is an A32 code implementation of the gcd algorithm that uses individual conditional
instructions. The gcd algorithm takes only four instructions:
gcd
        CMP     r0, r1
        SUBGT   r0, r0, r1
        SUBLT   r1, r1, r0
        BNE     gcd

In addition to improving code size, in most cases this code executes faster than the version that uses only
branches.
The following table shows the number of cycles this implementation uses on an ARM7 processor when
R0 equals 1 and R1 equals 2.
Table 7-5 All instructions conditional
R0: a   R1: b   Instruction       Cycles (ARM7)
1       2       CMP r0, r1        1
1       2       SUBGT r0,r0,r1    1 (not executed)
1       2       SUBLT r1,r1,r0    1
1       1       BNE gcd           3
1       1       CMP r0,r1         1
1       1       SUBGT r0,r0,r1    1 (not executed)
1       1       SUBLT r1,r1,r0    1 (not executed)
1       1       BNE gcd           1 (not executed)
                                  Total = 10

Comparing this with the example that uses only branches:
• Replacing branches with conditional execution of all instructions saves three cycles.
• Where R0 equals R1, both implementations execute in the same number of cycles. For all other cases,
the implementation that uses conditional instructions executes in fewer cycles than the
implementation that uses branches only.
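The cycle totals for the two A32 implementations can be checked with a small model. The following C
sketch is an illustration only. It assumes the cost model stated in this section (three cycles for a
taken branch, one cycle for every other instruction, including conditional instructions and branches
that are not executed) and positive input values. The function names cycles_with_branches and
cycles_with_conditionals are hypothetical and are not part of any ARM tool.

```c
/* Assumed ARM7 cost model from the text: a taken branch costs 3 cycles,
   every other instruction (executed or not) costs 1 cycle. */
enum { CYCLES_TAKEN_BRANCH = 3, CYCLES_OTHER = 1 };

/* Models the seven-instruction version that uses conditional branches. */
int cycles_with_branches(int a, int b)
{
    int cycles = 0;
    for (;;) {
        cycles += CYCLES_OTHER;                /* CMP r0, r1 */
        if (a == b) {
            cycles += CYCLES_TAKEN_BRANCH;     /* BEQ end (taken) */
            return cycles;
        }
        cycles += CYCLES_OTHER;                /* BEQ end (not taken) */
        if (a < b) {
            cycles += CYCLES_TAKEN_BRANCH;     /* BLT less (taken) */
            b -= a;
        } else {
            cycles += CYCLES_OTHER;            /* BLT less (not taken) */
            a -= b;
        }
        cycles += CYCLES_OTHER;                /* SUBS */
        cycles += CYCLES_TAKEN_BRANCH;         /* B gcd (always taken) */
    }
}

/* Models the four-instruction version that uses conditional instructions. */
int cycles_with_conditionals(int a, int b)
{
    int cycles = 0;
    for (;;) {
        int equal = (a == b);                  /* flags set by CMP */
        cycles += CYCLES_OTHER;                /* CMP r0, r1 */
        cycles += CYCLES_OTHER;                /* SUBGT (1 cycle either way) */
        cycles += CYCLES_OTHER;                /* conditional SUB for a < b */
        if (a > b)
            a -= b;
        else if (a < b)
            b -= a;
        if (equal) {
            cycles += CYCLES_OTHER;            /* BNE gcd (not taken) */
            return cycles;
        }
        cycles += CYCLES_TAKEN_BRANCH;         /* BNE gcd (taken) */
    }
}
```

Under these assumptions, calling both functions with 1 and 2 gives 13 cycles for the version that uses
branches and 10 cycles for the version that uses conditional instructions. When both arguments are
equal, both functions return the same count.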
Example of conditional execution using conditional instructions in T32 code
You can use the IT instruction to write conditional instructions in T32 code. The T32 code
implementation of the gcd algorithm using conditional instructions is similar to the implementation in
A32 code. The implementation in T32 code is:
gcd
        CMP     r0, r1
        ITE     GT
        SUBGT   r0, r0, r1
        SUBLE   r1, r1, r0
        BNE     gcd

These instructions assemble equally well to A32 or T32 code. The assembler checks the IT instructions,
but omits them on assembly to A32 code.
The T32 version requires one more instruction (the IT instruction) than the A32 version, but the overall
code size is 10 bytes in T32 code, compared with 16 bytes in A32 code.
Example of conditional execution code using branches in T32 code
In architectures before ARMv6T2, there is no IT instruction and therefore T32 instructions cannot be
executed conditionally except for the B branch instruction. The gcd algorithm must be written with
conditional branches and is similar to the A32 code implementation using branches, without conditional
instructions.
The T32 code implementation of the gcd algorithm without conditional instructions requires seven
instructions. The overall code size is 14 bytes. This figure is even less than the A32 implementation that
uses conditional instructions, which uses 16 bytes.
In addition, on a system using 16-bit memory this T32 implementation runs faster than both A32
implementations because only one memory access is required for each 16-bit T32 instruction, whereas
each 32-bit A32 instruction requires two fetches.
Related concepts
7.14 Benefits of using conditional execution in A32 and T32 code on page 7-154.
7.16 Optimization for execution speed on page 7-158.
Related references
13.45 IT on page 13-399.
7.12 Condition code suffixes and related flags on page 7-151.
Related information
ARM Architecture Reference Manual.

7.16 Optimization for execution speed
To optimize code for execution speed you must have detailed knowledge of the instruction timings,
branch prediction logic, and cache behavior of your target system.
For more information, see the Technical Reference Manual for your processor.
Related information
ARM Architecture Reference Manual.
Further reading.

Chapter 8
Using armasm

Describes how to use armasm.
It contains the following sections:
• 8.1 armasm command-line syntax on page 8-160.
• 8.2 Specify command-line options with an environment variable on page 8-161.
• 8.3 Using stdin to input source code to the assembler on page 8-162.
• 8.4 Built-in variables and constants on page 8-163.
• 8.5 Identifying versions of armasm in source code on page 8-167.
• 8.6 Diagnostic messages on page 8-168.
• 8.7 Interlocks diagnostics on page 8-169.
• 8.8 Automatic IT block generation in T32 code on page 8-170.
• 8.9 T32 branch target alignment on page 8-171.
• 8.10 T32 code size diagnostics on page 8-172.
• 8.11 A32 and T32 instruction portability diagnostics on page 8-173.
• 8.12 T32 instruction width diagnostics on page 8-174.
• 8.13 Two pass assembler diagnostics on page 8-175.
• 8.14 Using the C preprocessor on page 8-176.
• 8.15 Address alignment in A32/T32 code on page 8-178.
• 8.16 Address alignment in A64 code on page 8-179.
• 8.17 Instruction width selection in T32 code on page 8-180.

8.1 armasm command-line syntax
You can use a command line to invoke armasm. You must specify an input source file and you can
specify various options.
The command for invoking the assembler is:
armasm {options} inputfile

where:
options
    are commands that instruct the assembler how to assemble the inputfile. You can invoke
    armasm with any combination of options separated by spaces. You can specify values for some
    options. To specify a value for an option, use either ‘=’ (option=value) or a space character
    (option value).
inputfile
    is an assembly source file. It must contain UAL, pre-UAL A32 or T32, or A64 assembly
    language.
The assembler command line is case-insensitive, except in filenames and where specified. The assembler
uses the same command-line ordering rules as the compiler. This means that if the command line
contains options that conflict with each other, then the last option found always takes precedence.

8.2 Specify command-line options with an environment variable
The ARMCOMPILER6_ASMOPT environment variable can hold command-line options for the assembler.
The syntax is identical to the command-line syntax. The assembler reads the value of
ARMCOMPILER6_ASMOPT and inserts it at the front of the command string. This means that options
specified in ARMCOMPILER6_ASMOPT can be overridden by arguments on the command line.
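The effect of this ordering can be modeled as follows. This C sketch is not how armasm is implemented.
It assumes options of the form name=value separated by spaces, and it combines the rule above with the
rule that the last conflicting option takes precedence. The helper names last_option_value and
effective_option are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the LAST "name=value" token in a
   space-separated option string and copy its value into out.
   Returns 1 if the option was found, 0 otherwise. */
int last_option_value(const char *opts, const char *name,
                      char *out, size_t outlen)
{
    size_t namelen = strlen(name);
    int found = 0;
    const char *p = opts;

    if (outlen == 0)
        return 0;
    while (*p != '\0') {
        const char *tok;
        size_t toklen;

        while (*p == ' ')                 /* skip separators */
            p++;
        tok = p;
        while (*p != '\0' && *p != ' ')   /* scan to the end of the token */
            p++;
        toklen = (size_t)(p - tok);
        if (toklen > namelen && tok[namelen] == '=' &&
            strncmp(tok, name, namelen) == 0) {
            size_t vallen = toklen - namelen - 1;
            if (vallen >= outlen)
                vallen = outlen - 1;      /* truncate to fit the buffer */
            memcpy(out, tok + namelen + 1, vallen);
            out[vallen] = '\0';
            found = 1;                    /* a later match overwrites this one */
        }
    }
    return found;
}

/* Hypothetical model of the documented rule: the environment variable
   contents are placed in front of the command-line arguments, so an
   option given on the command line is seen last and takes precedence. */
int effective_option(const char *env_opts, const char *cli_opts,
                     const char *name, char *out, size_t outlen)
{
    char combined[1024];

    snprintf(combined, sizeof combined, "%s %s",
             env_opts ? env_opts : "", cli_opts ? cli_opts : "");
    return last_option_value(combined, name, out, outlen);
}
```

In this model, if the environment string is "--cpu=7-A" and the command line contains --cpu=8-A.32, the
effective value of --cpu is 8-A.32. If the command line does not specify --cpu, the value from the
environment string, 7-A, is used.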

Related concepts
8.1 armasm command-line syntax on page 8-160.
Related information
Toolchain environment variables.

8.3 Using stdin to input source code to the assembler
You can use stdin to pipe output from another program into armasm or to input source code directly on
the command line. This is useful if you want to test a short piece of code without having to create a file
for it.
To use stdin to pipe output from another program into armasm, invoke the program and the assembler
using the pipe character (|). Use the minus character (-) as the source filename to instruct the assembler
to take input from stdin. You must specify the output filename using the -o option. You can specify the
command-line options you want to use. For example to pipe output from fromelf:
fromelf --disassemble A32input.o | armasm --cpu=8-A.32 -o A32output.o -

Note
The source code from stdin is stored in an internal cache that can hold up to 8 MB. You can increase
this cache size using the --maxcache command-line option.
To use stdin to input source code directly on the command line:
Procedure
1. Invoke the assembler with the command-line options you want to use. Use the minus character (-) as
the source filename to instruct the assembler to take input from stdin. You must specify the output
filename using the -o option. For example:
armasm --cpu=8-A.32 -o output.o -

2. Enter your input. For example:
        AREA    A32ex, CODE, READONLY
                                ; Name this block of code A32ex
        ENTRY                   ; Mark first instruction to execute
start
        MOV     r0, #10         ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1      ; r0 = r0 + r1
stop
        MOV     r0, #0x18       ; angel_SWIreason_ReportException
        LDR     r1, =0x20026    ; ADP_Stopped_ApplicationExit
        SVC     #0x123456       ; ARM semihosting (formerly SWI)
        END                     ; Mark end of file

3. Terminate your input by entering:
• Ctrl+Z then Return on Microsoft Windows systems.
• Ctrl+D on Unix-based operating systems.
Related concepts
8.1 armasm command-line syntax on page 8-160.
Related references
11.44 --maxcache=n on page 11-272.

8.4 Built-in variables and constants
armasm defines built-in variables that hold information about, for example, the state of armasm, the
command-line options used, and the target architecture or processor.
The following table lists the built-in variables defined by armasm:
Table 8-1 Built-in variables
{ARCHITECTURE}

Holds the name of the selected ARM architecture.

{AREANAME}

Holds the name of the current AREA.

{ARMASM_VERSION}

Holds an integer that increases with each version of armasm. The format of the
version number is Mmmuuxx where:
• M is the major version number, 6.
• mm is the minor version number.
• uu is the update number.
• xx is reserved for ARM internal use. You can ignore this for the purposes of
  checking whether the current release is a specific version or within a range of
  versions.
Note
The built-in variable |ads$version| is deprecated.

|ads$version|

Has the same value as {ARMASM_VERSION}.

{CODESIZE}

Is a synonym for {CONFIG}.

{COMMANDLINE}

Holds the contents of the command line.

{CONFIG}

Has the value:
• 64 if the assembler is assembling A64 code.
• 32 if the assembler is assembling A32 code.
• 16 if the assembler is assembling T32 code.

{CPU}

Holds the name of the selected processor. The value of {CPU} is derived from the
value specified in the --cpu option on the command line.

{ENDIAN}

Has the value "big" if the assembler is in big-endian mode, or "little" if it is in
little-endian mode.

{FPU}

Holds the name of the selected FPU. The default in AArch32 state is "FP-ARMv8".
The default in AArch64 state is "A64".

{INPUTFILE}

Holds the name of the current source file.

{INTER}

Has the Boolean value {True} if --apcs=/inter is set. The default is {False}.

{LINENUM}

Holds an integer indicating the line number in the current source file.

{LINENUMUP}

When used in a macro, holds an integer indicating the line number of the current
macro. The value is the same as {LINENUM} when used in a non-macro context.

{LINENUMUPPER}

When used in a macro, holds an integer indicating the line number of the top macro.
The value is the same as {LINENUM} when used in a non-macro context.

{OPT}

Value of the currently-set listing option. You can use the OPT directive to save the
current listing option, force a change in it, or restore its original value.

{PC} or .

Address of current instruction.

{PCSTOREOFFSET}

Is the offset between the address of the STR PC,[…] or STM Rb,{…, PC}
instruction and the value of PC stored out. This varies depending on the processor or
architecture specified.

{ROPI}

Has the Boolean value {True} if --apcs=/ropi is set. The default is {False}.

{RWPI}

Has the Boolean value {True} if --apcs=/rwpi is set. The default is {False}.

{VAR} or @

Current value of the storage area location counter.

You can use built-in variables in expressions or conditions in assembly source code. For example:
IF {ARCHITECTURE} = "8-A"

They cannot be set using the SETA, SETL, or SETS directives.
The names of the built-in variables can be in uppercase, lowercase, or mixed, for example:
IF {CpU} = "Generic ARM"

Note
All built-in string variables contain case-sensitive values. Relational operations on these built-in
variables do not match with strings that contain an incorrect case. Use the command-line options --cpu
and --fpu to determine valid values for {CPU}, {ARCHITECTURE}, and {FPU}.
The assembler defines the built-in Boolean constants TRUE and FALSE.
Table 8-2 Built-in Boolean constants
{FALSE}  Logical constant false.
{TRUE}   Logical constant true.

The following table lists the target processor-related built-in variables that are predefined by the
assembler. Where the value field is empty, the symbol is a Boolean value and the meaning column
describes when its value is {TRUE}.
Table 8-3 Predefined macros
Name

Value

Meaning

{TARGET_ARCH_AARCH32}

boolean

{TRUE} when assembling for AArch32 state. {FALSE} when assembling
for AArch64 state.

{TARGET_ARCH_AARCH64}

boolean

{TRUE} when assembling for AArch64 state. {FALSE} when assembling
for AArch32 state.

{TARGET_ARCH_ARM}

num

The number of the A32 base architecture of the target processor
irrespective of whether the assembler is assembling for A32 or T32. The
value is defined as zero when assembling for A64, and eight when
assembling for A32/T32.

{TARGET_ARCH_THUMB}

num

The number of the T32 base architecture of the target processor
irrespective of whether the assembler is assembling for A32 or T32. The
value is defined as zero when assembling for A64, and five when
assembling for A32/T32.

{TARGET_ARCH_XX}

–

XX represents the target architecture and its value depends on the target
processor:
For the ARMv8 architecture:
• If you specify the assembler option --cpu=8-A.32 or --cpu=8-A.64 then
  {TARGET_ARCH_8_A} is defined.
• If you specify the assembler option --cpu=8.1-A.32 or --cpu=8.1-A.64 then
  {TARGET_ARCH_8_1_A} is defined.
For the ARMv7 architecture, if you specify --cpu=Cortex-A8, for
example, then {TARGET_ARCH_7_A} is defined.

{TARGET_FEATURE_EXTENSION_REGISTER_COUNT}

num

The number of 64-bit extension registers available in Advanced SIMD or
floating-point.

{TARGET_FEATURE_CLZ}

–

If the target processor supports the CLZ instruction.

{TARGET_FEATURE_CRYPTOGRAPHY}

–

If the target processor has cryptographic instructions.

{TARGET_FEATURE_DIVIDE}

–

If the target processor supports the hardware divide instructions SDIV and
UDIV.

{TARGET_FEATURE_DOUBLEWORD}

–

If the target processor supports doubleword load and store instructions,
for example the A32 and T32 instructions LDRD and STRD (except
ARMv6-M).

{TARGET_FEATURE_DSPMUL}

–

If the DSP-enhanced multiplier (for example the SMLAxy instruction) is
available.

{TARGET_FEATURE_MULTIPLY}

–

If the target processor supports long multiply instructions, for example the
A32 and T32 instructions SMULL, SMLAL, UMULL, and UMLAL (that is, all
architectures except ARMv6-M).

{TARGET_FEATURE_MULTIPROCESSING}

–

If assembling for a target processor with Multiprocessing Extensions.

{TARGET_FEATURE_NEON}

–

If the target processor has Advanced SIMD.

{TARGET_FEATURE_NEON_FP16}

–

If the target processor has Advanced SIMD with half-precision floating-point operations.

{TARGET_FEATURE_NEON_FP32}

–

If the target processor has Advanced SIMD with single-precision floating-point operations.

{TARGET_FEATURE_NEON_INTEGER}

–

If the target processor has Advanced SIMD with integer operations.

{TARGET_FEATURE_UNALIGNED}

–

If the target processor has support for unaligned accesses (all architectures
except ARMv6-M).

{TARGET_FPU_SOFTVFP}

–

If assembling with the option --fpu=SoftVFP.

{TARGET_FPU_SOFTVFP_VFP}

–

If assembling for a target processor with SoftVFP and floating-point
hardware, for example --fpu=SoftVFP+FP-ARMv8.

{TARGET_FPU_VFP}

–

If assembling for a target processor with floating-point hardware, without
using SoftVFP, for example --fpu=FP-ARMv8.

{TARGET_FPU_VFPV2}

–

If assembling for a target processor with VFPv2.

{TARGET_FPU_VFPV3}

–

If assembling for a target processor with VFPv3.

{TARGET_FPU_VFPV4}

–

If assembling for a target processor with VFPv4.

{TARGET_PROFILE_A}

–

If assembling for a Cortex™-A profile processor, for example, if you
specify the assembler option --cpu=7-A.

{TARGET_PROFILE_M}

–

If assembling for a Cortex-M profile processor, for example, if you
specify the assembler option --cpu=7-M.

{TARGET_PROFILE_R}

–

If assembling for a Cortex-R profile processor, for example, if you specify
the assembler option --cpu=7-R.

Related concepts
8.5 Identifying versions of armasm in source code on page 8-167.
Related references
11.13 --cpu=name on page 11-239.
11.32 --fpu=name on page 11-260.

8.5 Identifying versions of armasm in source code
The assembler defines the built-in variable ARMASM_VERSION to hold the version number of the
assembler.
You can use it as follows:
    IF ( {ARMASM_VERSION} / 1000000) >= 6
        ; using armasm in ARM Compiler 6
    ELIF ( {ARMASM_VERSION} / 1000000) = 5
        ; using armasm in ARM Compiler 5
    ELSE
        ; using armasm in ARM Compiler 4.1 or earlier
    ENDIF

Note
The built-in variable |ads$version| is deprecated.
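The Mmmuuxx format of the version number can be decoded with integer division. The following C sketch
only illustrates that arithmetic. The function names are hypothetical, and the value 6060000 used in the
note that follows is an assumed example, not a value quoted from any armasm release.

```c
/* Decode a version number of the form Mmmuuxx (hypothetical helpers). */

/* M: the major version number is the leading digit. */
int armasm_major(long version)
{
    return (int)(version / 1000000);
}

/* mm: the minor version number is the next two digits. */
int armasm_minor(long version)
{
    return (int)((version / 10000) % 100);
}

/* uu: the update number is the two digits before the reserved xx field. */
int armasm_update(long version)
{
    return (int)((version / 100) % 100);
}
```

For the assumed value 6060000, armasm_major returns 6, armasm_minor returns 6, and armasm_update
returns 0. The reserved xx digits do not affect any of the three results.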

Related references
8.4 Built-in variables and constants on page 8-163.

8.6 Diagnostic messages
The assembler can provide extra error, warning, and remark diagnostic messages in addition to the
default ones.
By default, these additional diagnostic messages are not displayed. However, you can enable them using
the command-line options --diag_error, --diag_warning, and --diag_remark.
Related concepts
8.7 Interlocks diagnostics on page 8-169.
8.8 Automatic IT block generation in T32 code on page 8-170.
8.9 T32 branch target alignment on page 8-171.
8.10 T32 code size diagnostics on page 8-172.
8.11 A32 and T32 instruction portability diagnostics on page 8-173.
8.12 T32 instruction width diagnostics on page 8-174.
8.13 Two pass assembler diagnostics on page 8-175.
Related references
11.17 --diag_error=tag[,tag,…] on page 11-245.

8.7 Interlocks diagnostics
armasm can report warning messages about possible interlocks in your code caused by the pipeline of the
processor chosen by the --cpu option.

To do this, use the --diag_warning 1563 command-line option when invoking armasm.
Note
• armasm does not have an accurate model of the target processor, so these messages are not reliable
  when used with a multi-issue processor such as Cortex-A8.
• Interlocks diagnostics apply to A32 and T32 code, but not to A64 code.

Related concepts
8.8 Automatic IT block generation in T32 code on page 8-170.
8.9 T32 branch target alignment on page 8-171.
8.12 T32 instruction width diagnostics on page 8-174.
8.6 Diagnostic messages on page 8-168.
Related references
11.21 --diag_warning=tag[,tag,…] on page 11-249.

8.8 Automatic IT block generation in T32 code
armasm can automatically insert an IT block for conditional instructions in T32 code, without requiring
the use of explicit IT instructions.

If you write the following code:
    AREA x, CODE
    THUMB
    MOVNE   r0,r1
    NOP
    IT      NE
    MOVNE   r0,r1
    END

armasm generates the following instructions:
    IT      NE
    MOVNE   r0,r1
    NOP
    IT      NE
    MOVNE   r0,r1

You can receive warning messages about the automatic generation of IT blocks when assembling T32
code. To do this, use the --diag_warning 1763 command-line option when invoking armasm.
Related concepts
8.6 Diagnostic messages on page 8-168.
Related references
11.21 --diag_warning=tag[,tag,…] on page 11-249.

8.9 T32 branch target alignment
armasm can issue warnings about non word-aligned branch targets in T32 code.

On some processors, non word-aligned T32 instructions sometimes take one or more additional cycles to
execute in loops. This means that it can be an advantage to ensure that branch targets are word-aligned.
To ensure armasm reports such warnings, use the --diag_warning 1604 command-line option when
invoking it.
Related concepts
8.6 Diagnostic messages on page 8-168.
Related references
11.21 --diag_warning=tag[,tag,…] on page 11-249.

8.10 T32 code size diagnostics
In T32 code, some instructions, for example a branch or LDR (PC-relative), can be encoded as either a
32-bit or 16-bit instruction. armasm chooses the size of the instruction encoding.
armasm can issue a warning when it assembles a T32 instruction to a 32-bit encoding when it could have
used a 16-bit encoding.
To enable this warning, use the --diag_warning 1813 command-line option when invoking armasm.
Related concepts
8.17 Instruction width selection in T32 code on page 8-180.
2.2 A32 and T32 instruction sets on page 2-58.
8.6 Diagnostic messages on page 8-168.
Related references
11.21 --diag_warning=tag[,tag,…] on page 11-249.

8.11 A32 and T32 instruction portability diagnostics
armasm can issue warnings about instructions that cannot assemble to both A32 and T32 code.

There are a few UAL instructions that can assemble as either A32 code or T32 code, but not both. You
can identify these instructions in the source code using the --diag_warning 1812 command-line option
when invoking armasm.
armasm warns about any instruction that cannot be assembled in the other instruction set. This is only
a hint, and other factors, such as relocation availability or target distance, might affect the accuracy
of the message.
Related concepts
2.2 A32 and T32 instruction sets on page 2-58.
8.6 Diagnostic messages on page 8-168.
Related references
11.21 --diag_warning=tag[,tag,…] on page 11-249.

8.12 T32 instruction width diagnostics
armasm can issue a warning when it assembles a T32 instruction to a 32-bit encoding when it could have
used a 16-bit encoding.
If you use the .W specifier, the instruction is encoded in 32 bits even if it could be encoded in 16 bits. You
can use a diagnostic warning to detect when a branch instruction could have been encoded in 16 bits, but
has been encoded in 32 bits. To do this, use the --diag_warning 1607 command-line option when
invoking armasm.
Note
This diagnostic does not produce a warning for relocated branch instructions, because the final address is
not known. The linker might even insert a veneer, if the branch is out of range for a 32-bit instruction.

Related concepts
8.6 Diagnostic messages on page 8-168.
Related references
11.21 --diag_warning=tag[,tag,…] on page 11-249.

8.13 Two pass assembler diagnostics
armasm can issue a warning about code that might not be identical in both assembler passes.
armasm is a two pass assembler and the input code that the assembler reads must be identical in both
passes. If a symbol is defined after the :DEF: test for that symbol, then the code read in pass one might
be different from the code read in pass two. armasm can warn in this situation.

To do this, use the --diag_warning 1907 command-line option when invoking armasm.
Example
The following example shows that the symbol foo is defined after the :DEF: foo test.
    AREA x,CODE
    [ :DEF: foo
    ]
foo MOV r3, r4
    END

Assembling this code with --diag_warning 1907 generates the message:
Warning A1907W: Test for this symbol has been seen and may cause failure in the second pass.

Related concepts
8.8 Automatic IT block generation in T32 code on page 8-170.
8.9 T32 branch target alignment on page 8-171.
8.12 T32 instruction width diagnostics on page 8-174.
8.6 Diagnostic messages on page 8-168.
1.3 How the assembler works on page 1-49.
Related references
11.21 --diag_warning=tag[,tag,…] on page 11-249.
1.4 Directives that can be omitted in pass 2 of the assembler on page 1-51.

8.14 Using the C preprocessor
armasm can invoke armclang to preprocess an assembly language source file before assembling it. This
allows you to use C preprocessor commands in assembly source code.
If you do this, you must use the --cpreproc command-line option together with the --cpreproc_opts
command-line option when invoking the assembler. This causes armasm to call armclang to preprocess
the file before assembling it.
Note
As a minimum, you must specify the armclang --target option and either the -mcpu or -march option
with --cpreproc_opts.
armasm looks for the armclang binary in the same directory as the armasm binary. If it does not find the
binary, it expects it to be on the PATH.
armasm passes the following options by default to armclang if present on the command line:
• Basic pre-processor configuration options, such as -E.
• User specified include directories, -I directives.
• User specified licensing options, such as --site_license.
• Anything specified in --cpreproc_opts.

Some of the options that armasm passes to armclang are converted to the armclang equivalent
beforehand. These are shown in the following table:
Table 8-4 armclang equivalent command-line options
armasm     armclang
--thumb    -mthumb
--arm      -marm
-i         -I

armasm correctly interprets the preprocessed #line commands. It can generate error messages and
debug_line tables using the information in the #line commands.

Preprocessing an assembly language source file
The following example shows the command you write to preprocess and assemble a file, source.S. The
example also passes the compiler options to define a macro called RELEASE, and to undefine a macro
called ALPHA.
armasm --cpu=cortex-m3 --cpreproc --cpreproc_opts=--target=arm-arm-none-eabi,-mcpu=cortex-a9,-D,RELEASE,-U,ALPHA source.S

Preprocessing an assembly language source file manually
Alternatively, you can manually call armclang to preprocess the file before calling armasm. The
following example shows the commands you write to manually preprocess and assemble a file,
source.S:
armclang --target=arm-arm-none-eabi -mcpu=cortex-m3 -E source.S > preprocessed.S
armasm --cpu=cortex-m3 preprocessed.S

In this example, the preprocessor outputs a file called preprocessed.S, and armasm assembles it.

Related references
11.10 --cpreproc on page 11-236.
11.11 --cpreproc_opts=option[,option,…] on page 11-237.
Related information
Specifying a target architecture, processor, and instruction set.
-march armclang option.
-mcpu armclang option.
--target armclang option.

8.15 Address alignment in A32/T32 code
In ARMv7-A and ARMv7-R, the A bit in the System Control Register (SCTLR) controls whether
alignment checking is enabled or disabled. In ARMv7-M, the UNALIGN_TRP bit, bit 3, in the
Configuration and Control Register (CCR) controls this.
If alignment checking is enabled, all unaligned word and halfword transfers cause an alignment
exception. If disabled, unaligned accesses are permitted for the LDR, LDRH, STR, STRH, LDRSH, LDRT, STRT,
LDRSHT, LDRHT, STRHT, and TBH instructions. Other data-accessing instructions always cause an alignment
exception for unaligned data.
For STRD and LDRD, the specified address must be word-aligned.
If all your data accesses are aligned, you can use the --no_unaligned_access command-line option to
declare that the output object was not permitted to make unaligned access. The linker can then avoid
linking in any library functions that support unaligned access if all input objects declare that they were
not permitted to use unaligned accesses.
Related references
11.60 --unaligned_access, --no_unaligned_access on page 11-288.

8.16 Address alignment in A64 code
If alignment checking is not enabled, then unaligned accesses are permitted for all load and store
instructions other than exclusive load, exclusive store, load acquire, and store release instructions. If
alignment checking is enabled, then unaligned accesses are not permitted.
This means all load and store instructions must use addresses that are aligned to the size of the data being
accessed. In other words, addresses for 8-byte transfers must be 8-byte aligned, addresses for 4-byte
transfers must be 4-byte aligned, and addresses for 2-byte transfers must be 2-byte aligned. Unaligned
accesses cause an alignment exception.
For any memory access, if the stack pointer is used as the base register, then it must be quadword
aligned. Otherwise it generates a stack alignment exception.
If all your data accesses are aligned, you can use the --no_unaligned_access command-line option to
declare that the output object was not permitted to make unaligned access. The linker can then avoid
linking in any library functions that support unaligned access if all input objects declare that they were
not permitted to use unaligned accesses.
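The alignment rules in this section can be expressed as simple address checks. The following C sketch
is an illustration only, with hypothetical function names. It assumes that the transfer size is a power of
two and that quadword alignment means alignment to 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is aligned to size bytes, 0 otherwise.
   size must be a power of two (for example 2, 4, 8, or 16). */
int is_aligned(uint64_t addr, size_t size)
{
    return (addr & ((uint64_t)size - 1)) == 0;
}

/* Returns 1 if a stack pointer used as a base register is
   quadword (16-byte) aligned, 0 otherwise. */
int sp_is_quadword_aligned(uint64_t sp)
{
    return is_aligned(sp, 16);
}
```

For example, in this model the address 0x1004 is suitably aligned for a 4-byte transfer but not for an
8-byte transfer, and a stack pointer value of 0x7FF8 is not quadword aligned.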

8.17 Instruction width selection in T32 code
Some T32 instructions can have either a 16-bit encoding or a 32-bit encoding.
If you do not specify the instruction size, by default:
• For forward reference LDR, ADR, and B instructions, armasm always generates a 16-bit instruction,
even if that results in failure for a target that could be reached using a 32-bit instruction.
• For external reference LDR and B instructions, armasm always generates a 32-bit instruction.
• In all other cases, armasm generates the smallest size encoding that can be output.
If you want to override this behavior, you can use the .W or .N width specifier to ensure a particular
instruction size. armasm faults if it cannot generate an instruction with the specified width.
The .W specifier is ignored when assembling to A32 code, so you can safely use this specifier in code
that might assemble to either A32 or T32 code. However, the .N specifier is faulted when assembling to
A32 code.
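The default rules listed in this section can be summarized as a decision function. This C sketch is a
simplified model and is not the armasm implementation. The ref_kind enumeration and the
default_t32_width function are hypothetical, and the caller is assumed to supply the smallest encoding
width (16 or 32) that can represent the instruction.

```c
/* Hypothetical classification of how an instruction refers to its target. */
enum ref_kind {
    REF_FORWARD,   /* forward reference (LDR, ADR, and B instructions) */
    REF_EXTERNAL,  /* external reference (LDR and B instructions) */
    REF_OTHER      /* all other cases */
};

/* Returns the encoding width in bits chosen when no .W or .N
   specifier is given, following the default rules in the text. */
int default_t32_width(enum ref_kind kind, int smallest_bits)
{
    switch (kind) {
    case REF_FORWARD:
        return 16;              /* always 16-bit, even if the target is then out of range */
    case REF_EXTERNAL:
        return 32;              /* always 32-bit */
    default:
        return smallest_bits;   /* smallest encoding that can be output */
    }
}
```

In this model, a forward-reference branch is given a 16-bit encoding even when only a 32-bit encoding
could reach the target, which is the case where the .W specifier is needed.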
Related concepts
8.10 T32 code size diagnostics on page 8-172.
Related references
13.2 Instruction width specifiers on page 13-337.

Chapter 9
Advanced SIMD Programming

Describes Advanced SIMD assembly language programming.
It contains the following sections:
• 9.1 Architecture support for Advanced SIMD on page 9-182.
• 9.2 Extension register bank mapping for Advanced SIMD in AArch32 state on page 9-183.
• 9.3 Extension register bank mapping for Advanced SIMD in AArch64 state on page 9-185.
• 9.4 Views of the Advanced SIMD register bank in AArch32 state on page 9-187.
• 9.5 Views of the Advanced SIMD register bank in AArch64 state on page 9-188.
• 9.6 Differences between A32/T32 and A64 Advanced SIMD instruction syntax on page 9-189.
• 9.7 Load values to Advanced SIMD registers on page 9-191.
• 9.8 Conditional execution of A32/T32 Advanced SIMD instructions on page 9-192.
• 9.9 Floating-point exceptions for Advanced SIMD in A32/T32 instructions on page 9-193.
• 9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
• 9.11 Polynomial arithmetic over {0,1} on page 9-195.
• 9.12 Advanced SIMD vectors on page 9-196.
• 9.13 Normal, long, wide, and narrow Advanced SIMD instructions on page 9-197.
• 9.14 Saturating Advanced SIMD instructions on page 9-198.
• 9.15 Advanced SIMD scalars on page 9-199.
• 9.16 Extended notation extension for Advanced SIMD in A32/T32 code on page 9-200.
• 9.17 Advanced SIMD system registers in AArch32 state on page 9-201.
• 9.18 Flush-to-zero mode in Advanced SIMD on page 9-202.
• 9.19 When to use flush-to-zero mode in Advanced SIMD on page 9-203.
• 9.20 The effects of using flush-to-zero mode in Advanced SIMD on page 9-204.
• 9.21 Advanced SIMD operations not affected by flush-to-zero mode on page 9-205.

9.1 Architecture support for Advanced SIMD
Advanced SIMD is an optional extension to the ARMv8 and ARMv7 architectures.
All Advanced SIMD instructions are available on systems that support Advanced SIMD. In A32, some
of these instructions are also available on systems that implement the floating-point extension without
Advanced SIMD. These are called shared instructions.
In AArch32 state, the Advanced SIMD register bank consists of thirty-two 64-bit registers, and smaller
registers are packed into larger ones, as in ARMv7.
In AArch64 state, the Advanced SIMD register bank includes thirty-two 128-bit registers and has a new
register packing model.
Note
Advanced SIMD and floating-point instructions share the same extension register bank.
Advanced SIMD instructions in A64 are closely based on VFPv4 and A32, but with new instruction
mnemonics and some functional enhancements.
Related information
Floating-point support.
Further reading.

9.2 Extension register bank mapping for Advanced SIMD in AArch32 state
The Advanced SIMD extension register bank is a collection of registers that can be accessed as either
64-bit or 128-bit registers.
Advanced SIMD and floating-point instructions use the same extension register bank, which is distinct
from the ARM register bank.
The following figure shows the views of the extension register bank, and the overlap between the
different size registers. For example, the 128-bit register Q0 is an alias for two consecutive 64-bit
registers D0 and D1. The 128-bit register Q8 is an alias for two consecutive 64-bit registers D16 and D17.
D0,  D1    Q0
D2,  D3    Q1
...
D14, D15   Q7
D16, D17   Q8
...
D30, D31   Q15

Figure 9-1 Extension register bank for Advanced SIMD in AArch32 state

Note
If your processor supports both Advanced SIMD and floating-point, all the Advanced SIMD registers
overlap with the floating-point registers.
The aliased views enable half-precision, single-precision, and double-precision values, and Advanced
SIMD vectors to coexist in different non-overlapped registers at the same time.
You can also use the same overlapped registers to store half-precision, single-precision, and
double-precision values, and Advanced SIMD vectors at different times.
Do not attempt to use overlapped 64-bit and 128-bit registers at the same time because it creates
meaningless results.
The mapping between the registers is as follows:
• D<2n> maps to the least significant half of Q<n>.
• D<2n+1> maps to the most significant half of Q<n>.
For example, you can access the least significant half of the elements of a vector in Q6 by referring to
D12, and the most significant half of the elements by referring to D13.
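The mapping can be checked with a toy model of the register bank. The following Python sketch is illustrative only: the class ExtensionBankAArch32 and its methods are hypothetical names, and the model stores nothing but the thirty-two 64-bit D registers, so every Q access is expressed through the D<2n> and D<2n+1> aliases.

```python
class ExtensionBankAArch32:
    """Toy model: thirty-two 64-bit D registers; Q<n> aliases D<2n> and D<2n+1>."""

    MASK64 = (1 << 64) - 1

    def __init__(self):
        self.d = [0] * 32

    def write_q(self, n, value):
        self.d[2 * n] = value & self.MASK64              # least significant half
        self.d[2 * n + 1] = (value >> 64) & self.MASK64  # most significant half

    def read_q(self, n):
        return (self.d[2 * n + 1] << 64) | self.d[2 * n]


bank = ExtensionBankAArch32()
bank.write_q(6, 0x0123456789ABCDEF_FEDCBA9876543210)
print(hex(bank.d[12]))  # 0xfedcba9876543210, the least significant half of Q6
print(hex(bank.d[13]))  # 0x123456789abcdef, the most significant half of Q6
```

Because the Q and D views share storage in this model, a later write to D12 would silently change Q6, which is the overlap hazard described above.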

Related concepts
9.3 Extension register bank mapping for Advanced SIMD in AArch64 state on page 9-185.
10.4 Views of the floating-point extension register bank in AArch32 state on page 10-211.
9.4 Views of the Advanced SIMD register bank in AArch32 state on page 9-187.

9.3 Extension register bank mapping for Advanced SIMD in AArch64 state
The extension register bank is a collection of registers that can be accessed as 8-bit, 16-bit, 32-bit, 64-bit,
or 128-bit registers.
Advanced SIMD and floating-point instructions use the same extension register bank, which is distinct
from the ARM register bank.
The following figure shows the views of the extension register bank, and the overlap between the
different size registers.
[Figure: each 128-bit register Vn, for n from 0 to 31, drawn with the Dn, Sn, Hn, and Bn views nested in
its least significant bits.]

Figure 9-2 Extension register bank for Advanced SIMD in AArch64 state

The mapping between the registers is as follows:
• D<n> maps to the least significant half of V<n>.
• S<n> maps to the least significant half of D<n>.
• H<n> maps to the least significant half of S<n>.
• B<n> maps to the least significant half of H<n>.
For example, you can access the least significant half of the elements of a vector in V7 by referring to D7.
Registers Q0-Q31 map directly to registers V0-V31.
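Because each narrower view is the least significant half of the next wider one, every view of a register is simply a low-order slice of the 128-bit value. The Python sketch below illustrates this; the names VIEW_BITS and read_view are hypothetical and the register contents are arbitrary.

```python
# Width in bits of each view of a V register.
VIEW_BITS = {"B": 8, "H": 16, "S": 32, "D": 64, "Q": 128}


def read_view(v_value, view):
    """Return the B, H, S, D, or Q view of a 128-bit V register value."""
    return v_value & ((1 << VIEW_BITS[view]) - 1)


v7 = 0x00112233445566778899AABBCCDDEEFF
print(hex(read_view(v7, "D")))  # 0x8899aabbccddeeff
print(hex(read_view(v7, "S")))  # 0xccddeeff
print(hex(read_view(v7, "H")))  # 0xeeff
print(hex(read_view(v7, "B")))  # 0xff
```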
Related concepts
9.2 Extension register bank mapping for Advanced SIMD in AArch32 state on page 9-183.
10.4 Views of the floating-point extension register bank in AArch32 state on page 10-211.
9.4 Views of the Advanced SIMD register bank in AArch32 state on page 9-187.

9.4 Views of the Advanced SIMD register bank in AArch32 state
Advanced SIMD can have different views of the extension register bank in AArch32 state.
It can view the extension register bank as:
• Sixteen 128-bit registers, Q0-Q15.
• Thirty-two 64-bit registers, D0-D31.
• A combination of registers from these views.
Advanced SIMD views each register as containing a vector of 1, 2, 4, 8, or 16 elements, all of the same
size and type. Individual elements can also be accessed as scalars.
In Advanced SIMD, the 64-bit registers are called doubleword registers and the 128-bit registers are
called quadword registers.
Related concepts
9.5 Views of the Advanced SIMD register bank in AArch64 state on page 9-188.
9.2 Extension register bank mapping for Advanced SIMD in AArch32 state on page 9-183.
10.4 Views of the floating-point extension register bank in AArch32 state on page 10-211.

9.5 Views of the Advanced SIMD register bank in AArch64 state
Advanced SIMD can have different views of the extension register bank in AArch64 state.
It can view the extension register bank as:
• Thirty-two 128-bit registers V0-V31.
• Thirty-two 64-bit registers D0-D31.
• Thirty-two 32-bit registers S0-S31.
• Thirty-two 16-bit registers H0-H31.
• Thirty-two 8-bit registers B0-B31.
• A combination of registers from these views.
Related concepts
9.4 Views of the Advanced SIMD register bank in AArch32 state on page 9-187.
9.2 Extension register bank mapping for Advanced SIMD in AArch32 state on page 9-183.
10.4 Views of the floating-point extension register bank in AArch32 state on page 10-211.

9.6 Differences between A32/T32 and A64 Advanced SIMD instruction syntax
The syntax and mnemonics of A64 Advanced SIMD instructions are based on those in A32/T32 but with
some differences.
The following table describes the main differences.
Table 9-1 Differences in syntax and mnemonics between A32/T32 and A64 Advanced SIMD instructions

Mnemonics
A32/T32: All Advanced SIMD instruction mnemonics begin with V, for example VMAX.
A64: The first letter of the instruction mnemonic indicates the data type of the instruction. For example,
SMAX, UMAX, and FMAX mean signed, unsigned, and floating-point respectively. No suffix means the type is
irrelevant and P means polynomial.

Element type and width
A32/T32: A mnemonic qualifier specifies the type and width of elements in a vector. For example, in the
following instruction, U32 means 32-bit unsigned integers:
    VMAX.U32 Q0, Q1, Q2
A64: A register qualifier specifies the data width and the number of elements in the register. For example,
in the following instruction .4S means 4 32-bit elements:
    UMAX V0.4S, V1.4S, V2.4S

Register names
A32/T32: The 128-bit vector registers are named Q0-Q15 and the 64-bit vector registers are named D0-D31.
A64: All vector registers are named Vn, where n is a register number between 0 and 31. You only use one of
the qualified register names Qn, Dn, Sn, Hn or Bn when referring to a scalar register, to indicate the
number of significant bits.

Loading a single element
A32/T32: You load a single element into one or more vector registers by appending an index to each register
individually, for example:
    VLD4.8 {D0[3], D1[3], D2[3], D3[3]}, [R0]
A64: You load a single element into one or more vector registers by appending the index to the register
list, for example:
    LD4 {V0.B, V1.B, V2.B, V3.B}[3], [X0]

Conditional execution
A32/T32: You can append a condition code to most Advanced SIMD instruction mnemonics to make them
conditional.
A64: A64 has no conditionally executed floating-point or Advanced SIMD instructions.

Long, wide, and narrow variants
A32/T32: L, W and N suffixes indicate long, wide and narrow variants of Advanced SIMD data processing
instructions. A32/T32 Advanced SIMD does not include vector narrowing or widening second part instructions.
A64: L, W and N suffixes indicate long, wide and narrow variants of Advanced SIMD data processing
instructions. You can additionally append a 2 to implement the second part of a narrowing or widening
operation, for example:
    UADDL2 V0.4S, V1.8H, V2.8H ; take input from 4 high-numbered lanes of V1 and V2

Vector reduction
A32/T32: A32/T32 Advanced SIMD does not include vector reduction instructions.
A64: The V Advanced SIMD mnemonic suffix identifies vector reduction instructions, in which the operand is
a vector and the result a scalar, for example:
    ADDV S0, V1.4S

Pairwise instructions
A32/T32: The P mnemonic qualifier which indicates pairwise instructions is a prefix, for example, VPADD.
A64: The P mnemonic qualifier is a suffix, for example ADDP.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
9.8 Conditional execution of A32/T32 Advanced SIMD instructions on page 9-192.
9.15 Advanced SIMD scalars on page 9-199.
9.13 Normal, long, wide, and narrow Advanced SIMD instructions on page 9-197.
6.2 Syntax differences between UAL and A64 assembly language on page 6-103.
Related references
15.35 VSEL on page 15-787.
18.8 FCSEL on page 18-1150.

9.7 Load values to Advanced SIMD registers
To load a register with a floating-point immediate value, use VMOV in A32 or FMOV in A64. Both
instructions exist in scalar and vector forms.
The A32 Advanced SIMD instructions VMOV and VMVN can also load integer immediates. The A64
Advanced SIMD instructions to load integer immediates are MOVI and MVNI.
You can load any 64-bit integer, single-precision, or double-precision floating-point value from a literal
pool using the VLDR pseudo-instruction.
Related references
14.54 VLDR pseudo-instruction on page 14-662.
15.21 VMOV (floating-point) on page 15-773.
14.65 VMOV (immediate) on page 14-673.

9.8 Conditional execution of A32/T32 Advanced SIMD instructions
Most Advanced SIMD instructions always execute unconditionally.
You cannot use any of the following Advanced SIMD instructions in an IT block:
• VCVT{A, N, P, M}.
• VMAXNM.
• VMINNM.
• VRINT{N, X, A, Z, M, P}.
• All instructions in the Crypto extension.
In addition, specifying any other Advanced SIMD instruction in an IT block is deprecated.
ARM deprecates conditionally executing any Advanced SIMD instruction unless it is a shared Advanced
SIMD and floating-point instruction.
Related concepts
7.2 Conditional execution in A32 code on page 7-141.
7.3 Conditional execution in T32 code on page 7-142.
Related references
7.13 Comparison of condition code meanings in integer and floating-point code on page 7-152.
7.11 Condition code suffixes on page 7-150.

9.9 Floating-point exceptions for Advanced SIMD in A32/T32 instructions
The Advanced SIMD extension records floating-point exceptions in the FPSCR cumulative flags.
It records the following exceptions:
Invalid operation
The exception is caused if the result of an operation has no mathematical value or cannot be
represented.
Division by zero
The exception is caused if a divide operation has a zero divisor and a dividend that is not zero,
an infinity or a NaN.
Overflow
The exception is caused if the absolute value of the result of an operation, produced after
rounding, is greater than the maximum positive normalized number for the destination precision.
Underflow
The exception is caused if the absolute value of the result of an operation, produced before
rounding, is less than the minimum positive normalized number for the destination precision,
and the rounded result is inexact.
Inexact
The exception is caused if the result of an operation is not equivalent to the value that would be
produced if the operation were performed with unbounded precision and exponent range.
Input denormal
The exception is caused if a denormalized input operand is replaced in the computation by a
zero.
The descriptions of the Advanced SIMD instructions that can cause floating-point exceptions include a
subsection listing the exceptions. If there is no such subsection, that instruction cannot cause any
floating-point exception.
Related concepts
9.18 Flush-to-zero mode in Advanced SIMD on page 9-202.
Related references
Chapter 9 Advanced SIMD Programming on page 9-181.
Related information
ARM Architecture Reference Manual.
Further reading.

9.10 Advanced SIMD data types in A32/T32 instructions
Most Advanced SIMD instructions use a data type specifier to define the size and type of data that the
instruction operates on.
Data type specifiers in Advanced SIMD instructions consist of a letter indicating the type of data, usually
followed by a number indicating the width. They are separated from the instruction mnemonic by a
point. The following table shows the data types available in Advanced SIMD instructions:
Table 9-2 Advanced SIMD data types
                             8-bit          16-bit  32-bit         64-bit
Unsigned integer             U8             U16     U32            U64
Signed integer               S8             S16     S32            S64
Integer of unspecified type  I8             I16     I32            I64
Floating-point number        not available  F16     F32 (or F)     not available
Polynomial over {0,1}        P8             P16     not available  not available

The datatype of the second (or only) operand is specified in the instruction.
Note
Most instructions have a restricted range of permitted data types. See the instruction descriptions for
details. However, the data type description is flexible:
• If the description specifies I, you can also use the S or U data types.
• If only the data size is specified, you can specify a type (I, S, U, P or F).
• If no data type is specified, you can specify a data type.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
9.11 Polynomial arithmetic over {0,1} on page 9-195.

9.11 Polynomial arithmetic over {0,1}
The coefficients 0 and 1 are manipulated using the rules of Boolean arithmetic.
The following rules apply:
• 0 + 0 = 1 + 1 = 0.
• 0 + 1 = 1 + 0 = 1.
• 0 * 0 = 0 * 1 = 1 * 0 = 0.
• 1 * 1 = 1.
That is, adding two polynomials over {0,1} is the same as a bitwise exclusive OR, and multiplying two
polynomials over {0,1} is the same as integer multiplication except that partial products are
exclusive-ORed instead of being added.
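These rules are easy to try out on integers whose bits are the polynomial coefficients. The following Python sketch is illustrative only: the function names poly_add and poly_mul are hypothetical, and the code models the arithmetic, not any particular instruction.

```python
def poly_add(a, b):
    """Add two polynomials over {0,1}: a bitwise exclusive OR."""
    return a ^ b


def poly_mul(a, b):
    """Multiply two polynomials over {0,1} by shift and exclusive OR."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # partial products are exclusive-ORed, not added
        a <<= 1
        b >>= 1
    return result


# (x + 1) * (x + 1) = x^2 + 1 over {0,1}, whereas integer 3 * 3 = 9.
print(bin(poly_mul(0b11, 0b11)))  # 0b101
print(bin(poly_add(0b11, 0b01)))  # 0b10
```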
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.

9.12 Advanced SIMD vectors
An Advanced SIMD operand can be a vector or a scalar. An Advanced SIMD vector can be a 64-bit
doubleword vector or a 128-bit quadword vector.
In A32/T32 Advanced SIMD instructions, the size of the elements in an Advanced SIMD vector is
specified by a datatype suffix appended to the mnemonic. In A64 Advanced SIMD instructions, the size
and number of the elements in an Advanced SIMD vector are specified by a suffix appended to the
register.
Doubleword vectors can contain:
• Eight 8-bit elements.
• Four 16-bit elements.
• Two 32-bit elements.
• One 64-bit element.

Quadword vectors can contain:
• Sixteen 8-bit elements.
• Eight 16-bit elements.
• Four 32-bit elements.
• Two 64-bit elements.
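In both lists the element count is the vector width divided by the element width. The Python sketch below splits a register value into its elements on that basis. The function name split_lanes is hypothetical, and the choice to list the least significant element first is an assumption made for the illustration.

```python
def split_lanes(value, vector_bits, element_bits):
    """Split a 64-bit or 128-bit vector value into elements, lowest lane first."""
    if vector_bits not in (64, 128) or element_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported vector or element width")
    mask = (1 << element_bits) - 1
    count = vector_bits // element_bits
    return [(value >> (i * element_bits)) & mask for i in range(count)]


print(split_lanes(0x0004000300020001, 64, 16))  # [1, 2, 3, 4]
print(len(split_lanes(0, 128, 8)))              # 16
```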
Related concepts
9.15 Advanced SIMD scalars on page 9-199.
9.2 Extension register bank mapping for Advanced SIMD in AArch32 state on page 9-183.
9.16 Extended notation extension for Advanced SIMD in A32/T32 code on page 9-200.
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
9.13 Normal, long, wide, and narrow Advanced SIMD instructions on page 9-197.

9.13 Normal, long, wide, and narrow Advanced SIMD instructions
Many A32/T32 and A64 Advanced SIMD data processing instructions are available in Normal, Long,
Wide, Narrow, and saturating variants.
Normal operation
The operands can be any of the vector types. The result vector is the same width, and usually the
same type, as the operand vectors, for example:
VADD.I16 D0, D1, D2

You can specify that the operands and result of a normal A32/T32 Advanced SIMD instruction
must all be quadwords by appending a Q to the instruction mnemonic. If you do this, armasm
produces an error if the operands or result are not quadwords.
Long operation
The operands are doubleword vectors and the result is a quadword vector. The elements of the
result are usually twice the width of the elements of the operands, and the same type.
Long operation is specified using an L appended to the instruction mnemonic, for example:
VADDL.S16 Q0, D2, D3

Wide operation
One operand vector is doubleword and the other is quadword. The result vector is quadword.
The elements of the result and the first operand are twice the width of the elements of the second
operand.
Wide operation is specified using a W appended to the instruction mnemonic, for example:
VADDW.S16 Q0, Q1, D4

Narrow operation
The operands are quadword vectors and the result is a doubleword vector. The elements of the
result are half the width of the elements of the operands.
Narrow operation is specified using an N appended to the instruction mnemonic, for example:
VADDHN.I16 D0, Q1, Q2
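The four shapes differ only in the element widths of the operands and the result. The following Python sketch models them for addition on lists of lane values. It is an illustration under stated assumptions, not a description of the hardware: the function names are hypothetical, the long and wide forms model signed elements as in the .S16 examples, and the narrow form keeps the most significant half of each sum as VADDHN does.

```python
def wrap(value, bits, signed=False):
    """Reduce value modulo 2**bits, optionally as a two's complement number."""
    value &= (1 << bits) - 1
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def vadd(a, b, bits):
    """Normal: result lanes are as wide as the operand lanes."""
    return [wrap(x + y, bits) for x, y in zip(a, b)]


def vaddl_s(a, b, bits):
    """Long: signed lanes of width bits give result lanes of width 2 * bits."""
    return [wrap(x + y, 2 * bits, signed=True) for x, y in zip(a, b)]


def vaddw_s(wide, narrow, bits):
    """Wide: the first operand and result are 2 * bits wide, the second is bits wide."""
    return [wrap(x + y, 2 * bits, signed=True) for x, y in zip(wide, narrow)]


def vaddhn(a, b, bits):
    """Narrow: keep the most significant half of each bits-wide sum."""
    return [wrap(x + y, bits) >> (bits // 2) for x, y in zip(a, b)]


print(vadd([0xFFFF, 1], [1, 2], 16))                # [0, 3]
print(vaddl_s([32767, -32768], [1, -1], 16))        # [32768, -32769]
print(vaddhn([0x1234, 0xFF00], [0x0100, 0x0100], 16))  # [19, 0]
```

The long form cannot overflow because each result lane has twice the width of its inputs, which is why the second call returns values outside the 16-bit signed range.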

Related concepts
9.12 Advanced SIMD vectors on page 9-196.

9.14 Saturating Advanced SIMD instructions
Saturating instructions saturate the result to the value of the upper limit or lower limit if the result
overflows or underflows.
The saturation limits depend on the datatype of the instruction. The following table shows the ranges that
Advanced SIMD saturating instructions saturate to, where x is the result of the operation.
Table 9-3 Advanced SIMD saturation ranges
Data type                    Saturation range of x
Signed byte (S8)             -2^7 <= x < 2^7
Signed halfword (S16)        -2^15 <= x < 2^15
Signed word (S32)            -2^31 <= x < 2^31
Signed doubleword (S64)      -2^63 <= x < 2^63
Unsigned byte (U8)           0 <= x < 2^8
Unsigned halfword (U16)      0 <= x < 2^16
Unsigned word (U32)          0 <= x < 2^32
Unsigned doubleword (U64)    0 <= x < 2^64

Saturating Advanced SIMD arithmetic instructions set the QC bit in the floating-point status register
(FPSCR in AArch32 or FPSR in AArch64) to indicate that saturation has occurred.
Saturating instructions are specified using a Q prefix. In A32/T32 Advanced SIMD instructions, the Q is
inserted between the V and the instruction mnemonic. In A64 Advanced SIMD instructions, it is inserted
between the S or U and the mnemonic.
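Saturation amounts to clamping the exact result to the range for the data type and noting whether the clamp changed it. The Python sketch below shows this for a single element. The function names are hypothetical, and the returned flag only models the condition under which the QC bit would be set; it does not model the status register itself.

```python
def saturation_range(bits, signed):
    """Return (lowest, highest) representable value for the element type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def saturating_add(a, b, bits, signed):
    """Return (result, saturated) for one element."""
    low, high = saturation_range(bits, signed)
    exact = a + b
    result = min(max(exact, low), high)
    return result, result != exact


print(saturating_add(100, 100, 8, signed=True))    # (127, True)
print(saturating_add(-100, -100, 8, signed=True))  # (-128, True)
print(saturating_add(200, 100, 8, signed=False))   # (255, True)
print(saturating_add(1, 2, 8, signed=False))       # (3, False)
```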
Related references
13.7 Saturating instructions on page 13-344.

9.15 Advanced SIMD scalars
Some Advanced SIMD instructions act on scalars in combination with vectors. Advanced SIMD scalars
can be 8-bit, 16-bit, 32-bit, or 64-bit.
In A32/T32 Advanced SIMD instructions, the instruction syntax refers to a single element in a vector
register using an index, x, into the vector, so that Dm[x] is the xth element in vector Dm. In A64 Advanced
SIMD instructions, you append the index to the element size specifier, so that Vm.D[x] is the xth
doubleword element in vector Vm.
In A64 Advanced SIMD scalar instructions, you refer to registers using a name that indicates the number
of significant bits. The names are Bn, Hn, Sn, or Dn, where n is the register number (0-31). The unused
high bits are ignored on a read and set to zero on a write.
Other than A32/T32 Advanced SIMD multiply instructions, instructions that access scalars can access
any element in the register bank.
A32/T32 Advanced SIMD multiply instructions only allow 16-bit or 32-bit scalars, and can only access
the first 32 scalars in the register bank. That is, in multiply instructions:
• 16-bit scalars are restricted to registers D0-D7, with x in the range 0-3.
• 32-bit scalars are restricted to registers D0-D15, with x either 0 or 1.
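Both restrictions describe the same limit of 32 addressable scalars: eight registers of four 16-bit elements, or sixteen registers of two 32-bit elements. The Python sketch below encodes the two rules as a check. The function name multiply_scalar_allowed is hypothetical and the check covers only the limits listed above.

```python
def multiply_scalar_allowed(element_bits, d_register, index):
    """Check whether Dm[x] can be the scalar of an A32/T32 multiply instruction."""
    if element_bits == 16:
        return 0 <= d_register <= 7 and 0 <= index <= 3
    if element_bits == 32:
        return 0 <= d_register <= 15 and 0 <= index <= 1
    return False  # only 16-bit and 32-bit scalars are allowed


print(multiply_scalar_allowed(16, 7, 3))   # True
print(multiply_scalar_allowed(16, 8, 0))   # False
print(multiply_scalar_allowed(32, 15, 1))  # True

# Count the scalars that each element size can reach.
for bits in (16, 32):
    count = sum(multiply_scalar_allowed(bits, d, x)
                for d in range(32) for x in range(4))
    print(bits, count)  # 32 in both cases
```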
Related concepts
9.12 Advanced SIMD vectors on page 9-196.
9.2 Extension register bank mapping for Advanced SIMD in AArch32 state on page 9-183.

9.16 Extended notation extension for Advanced SIMD in A32/T32 code
armasm implements an extension to the architectural Advanced SIMD assembly syntax, called extended
notation. This extension allows you to include datatype information or scalar indexes in register names.
Note
Extended notation is not supported for A64 code.
If you use extended notation, you do not have to include the data type or scalar index information in
every instruction.
Register names can be any of the following:
Untyped
The register name specifies the register, but not what datatype it contains, nor any index to a
particular scalar within the register.
Untyped with scalar index
The register name specifies the register, but not what datatype it contains. It specifies an index to
a particular scalar within the register.
Typed
The register name specifies the register, and what datatype it contains, but not any index to a
particular scalar within the register.
Typed with scalar index
The register name specifies the register, what datatype it contains, and an index to a particular
scalar within the register.
Use the DN and QN directives to define names for typed and scalar registers.
Related concepts
9.12 Advanced SIMD vectors on page 9-196.
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
9.15 Advanced SIMD scalars on page 9-199.
Related references
21.56 QN, DN, and SN on page 21-1704.

9.17 Advanced SIMD system registers in AArch32 state
Advanced SIMD system registers are accessible in all implementations of Advanced SIMD.
For exception levels using AArch32, the following Advanced SIMD system registers are accessible in all
Advanced SIMD implementations:
• FPSCR, the floating-point status and control register.
• FPEXC, the floating-point exception register.
• FPSID, the floating-point system ID register.

A particular Advanced SIMD implementation can have additional registers. For more information, see
the Technical Reference Manual for your processor.
Note
Advanced SIMD technology shares the same set of system registers as floating-point.

Related concepts
6.20 The Read-Modify-Write operation on page 6-128.
Related information
ARM Architecture Reference Manual.
Further reading.

9.18 Flush-to-zero mode in Advanced SIMD
Flush-to-zero mode replaces denormalized numbers with zero. This does not comply with IEEE 754
arithmetic, but in some circumstances can improve performance considerably.
Flush-to-zero mode in Advanced SIMD always preserves the sign bit.
Advanced SIMD always uses flush-to-zero mode.
Related concepts
9.20 The effects of using flush-to-zero mode in Advanced SIMD on page 9-204.
Related references
9.19 When to use flush-to-zero mode in Advanced SIMD on page 9-203.
9.21 Advanced SIMD operations not affected by flush-to-zero mode on page 9-205.

9.19 When to use flush-to-zero mode in Advanced SIMD
You can change between flush-to-zero mode and normal mode, depending on the requirements of
different parts of your code.
You must select flush-to-zero mode if all the following are true:
• IEEE 754 compliance is not a requirement for your system.
• The algorithms you are using sometimes generate denormalized numbers.
• Your system uses support code to handle denormalized numbers.
• The algorithms you are using do not depend for their accuracy on the preservation of denormalized
numbers.
• The algorithms you are using do not generate frequent exceptions as a result of replacing
denormalized numbers with 0.
You select flush-to-zero mode in one of the following ways:
• In A32 code, by setting the FZ bit in the FPSCR to 1. You do this using the VMRS and VMSR
instructions.
• In A64 code, by setting the FZ bit in the FPCR to 1. You do this using the MRS and MSR instructions.
You can change between flush-to-zero and normal mode at any time, if different parts of your code have
different requirements. Numbers already in registers are not affected by changing mode.
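Setting or clearing the FZ bit is a read-modify-write of the control register: read the register, change one bit, and write the value back. The Python sketch below shows only the bit manipulation on a 32-bit value. The function names are hypothetical, and the position of FZ at bit 24 is taken from the architecture documentation rather than from this guide, so verify it against the ARM Architecture Reference Manual.

```python
FZ_BIT = 1 << 24  # assumed position of the FZ bit in FPSCR and FPCR


def enable_flush_to_zero(control_word):
    """Return the register value with FZ set and all other bits unchanged."""
    return control_word | FZ_BIT


def disable_flush_to_zero(control_word):
    """Return the register value with FZ cleared and all other bits unchanged."""
    return control_word & ~FZ_BIT


print(hex(enable_flush_to_zero(0x00000000)))   # 0x1000000
print(hex(disable_flush_to_zero(0x03000000)))  # 0x2000000
```

In A32 code the value would be read and written with VMRS and VMSR, and in A64 code with MRS and MSR, as described above.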
Related concepts
9.18 Flush-to-zero mode in Advanced SIMD on page 9-202.
9.20 The effects of using flush-to-zero mode in Advanced SIMD on page 9-204.

9.20 The effects of using flush-to-zero mode in Advanced SIMD
In flush-to-zero mode, denormalized inputs are treated as zero. Results that are too small to be
represented in a normalized number are replaced with zero.
With certain exceptions, flush-to-zero mode has the following effects on floating-point operations:
• A denormalized number is treated as 0 when used as an input to a floating-point operation. The
source register is not altered.
• If the result of a single-precision floating-point operation, before rounding, is in the range -2^-126 to
+2^-126, it is replaced by 0.
• If the result of a double-precision floating-point operation, before rounding, is in the range -2^-1022 to
+2^-1022, it is replaced by 0.
In flush-to-zero mode, an Input Denormal exception occurs whenever a denormalized number is used as
an operand. An Underflow exception occurs when a result is flushed-to-zero.
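The replacement rule can be modelled as a threshold test against the smallest normalized magnitude for the precision. The Python sketch below is an approximation for illustration: the name flush_to_zero is hypothetical, values are held as Python floats, the range is treated as excluding its endpoints, and the sign of the zero is preserved. The returned flag corresponds to Input Denormal when the function is applied to an operand and to Underflow when it is applied to a result.

```python
import math

# Smallest positive normalized magnitude for each precision.
MIN_NORMAL = {"single": 2.0 ** -126, "double": 2.0 ** -1022}


def flush_to_zero(value, precision="single"):
    """Return (value, flushed); tiny nonzero values become a signed zero."""
    if value != 0.0 and abs(value) < MIN_NORMAL[precision]:
        return math.copysign(0.0, value), True
    return value, False


print(flush_to_zero(2.0 ** -130))           # (0.0, True)
print(flush_to_zero(-(2.0 ** -130)))        # (-0.0, True)
print(flush_to_zero(2.0 ** -126))           # unchanged, flag is False
print(flush_to_zero(2.0 ** -1030, "double"))  # (0.0, True)
```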
Related concepts
9.18 Flush-to-zero mode in Advanced SIMD on page 9-202.
Related references
9.21 Advanced SIMD operations not affected by flush-to-zero mode on page 9-205.

9.21 Advanced SIMD operations not affected by flush-to-zero mode
Some Advanced SIMD instructions can be carried out on denormalized numbers even in flush-to-zero
mode, without flushing the results to zero.
These instructions are as follows:
• Copy, absolute value, and negate (VMOV, VMVN, V{Q}ABS, and V{Q}NEG).
• Duplicate (VDUP).
• Swap (VSWP).
• Load and store (VLDR and VSTR).
• Load multiple and store multiple (VLDM and VSTM).
• Transfer between extension registers and ARM general-purpose registers (VMOV).
Related concepts
9.18 Flush-to-zero mode in Advanced SIMD on page 9-202.
Related references
14.10 VABS on page 14-615.
15.2 VABS (floating-point) on page 15-754.
14.42 VDUP on page 14-647.
14.51 VLDM on page 14-659.
14.52 VLDR on page 14-660.
14.66 VMOV (register) on page 14-674.
14.67 VMOV (between two ARM registers and a 64-bit extension register) on page 14-675.
14.68 VMOV (between an ARM register and an Advanced SIMD scalar) on page 14-676.
14.134 VSWP on page 14-744.

Chapter 10
Floating-point Programming

Describes floating-point assembly language programming.
It contains the following sections:
• 10.1 Architecture support for floating-point on page 10-207.
• 10.2 Extension register bank mapping for floating-point in AArch32 state on page 10-208.
• 10.3 Extension register bank mapping in AArch64 state on page 10-210.
• 10.4 Views of the floating-point extension register bank in AArch32 state on page 10-211.
• 10.5 Views of the floating-point extension register bank in AArch64 state on page 10-212.
• 10.6 Differences between A32/T32 and A64 floating-point instruction syntax on page 10-213.
• 10.7 Load values to floating-point registers on page 10-214.
• 10.8 Conditional execution of A32/T32 floating-point instructions on page 10-215.
• 10.9 Floating-point exceptions for floating-point in A32/T32 instructions on page 10-216.
• 10.10 Floating-point data types in A32/T32 instructions on page 10-217.
• 10.11 Extended notation extension for floating-point in A32/T32 code on page 10-218.
• 10.12 Floating-point system registers in AArch32 state on page 10-219.
• 10.13 Flush-to-zero mode in floating-point on page 10-220.
• 10.14 When to use flush-to-zero mode in floating-point on page 10-221.
• 10.15 The effects of using flush-to-zero mode in floating-point on page 10-222.
• 10.16 Floating-point operations not affected by flush-to-zero mode on page 10-223.

10.1 Architecture support for floating-point
Floating-point is an optional extension to the ARM architecture. There are versions that provide
additional instructions.
The floating-point instruction set supported in A32 is based on VFPv4, but with the addition of some
new instructions, including the following:
• Floating-point round to integral.
• Conversion from floating-point to integer with a directed rounding mode.
• Direct conversion between half-precision and double-precision floating-point.
• Floating-point conditional select.
In AArch32 state, the register bank consists of thirty-two 64-bit registers, and smaller registers are
packed into larger ones, as in ARMv7 and earlier.
In AArch64 state, the register bank includes thirty-two 128-bit registers and has a new register packing
model.
Floating-point instructions in A64 are closely based on VFPv4 and A32, but with new instruction
mnemonics and some functional enhancements.
Related information
Floating-point support.
Further reading.

10.2 Extension register bank mapping for floating-point in AArch32 state
The floating-point extension register bank is a collection of registers that can be accessed as either 32-bit
or 64-bit registers. It is distinct from the ARM register bank.
The following figure shows the views of the extension register bank, and the overlap between the
different size registers. For example, the 64-bit register D0 is an alias for two consecutive 32-bit registers
S0 and S1. The 64-bit registers D16 and D17 do not have an alias.
[Figure: the 32-bit registers S0-S31 shown alongside the 64-bit registers D0-D31 that they overlap.]
Figure 10-1 Extension register bank for floating-point in AArch32 state

The aliased views enable half-precision, single-precision, and double-precision values to coexist in
different non-overlapped registers at the same time.
You can also use the same overlapped registers to store half-precision, single-precision, and
double-precision values at different times.
Do not attempt to use overlapped 32-bit and 64-bit registers at the same time because it creates
meaningless results.
The mapping between the registers is as follows:
• S<2n> maps to the least significant half of D<n>
• S<2n+1> maps to the most significant half of D<n>
For example, you can access the least significant half of register D6 by referring to S12, and the most
significant half of D6 by referring to S13.
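The mapping rule above can be checked with a short Python sketch. The function names `s_to_d` and `d_to_s` are hypothetical helpers introduced only for illustration. The sketch assumes that only D0-D15 have single-precision aliases, because S0-S31 cover exactly sixteen 64-bit registers.

```python
from typing import Optional, Tuple


def s_to_d(s_index: int) -> Tuple[int, str]:
    """Return the D register number and the half that S<s_index> aliases."""
    if not 0 <= s_index <= 31:
        raise ValueError("S register index must be in the range 0-31")
    # S<2n> is the least significant half of D<n>, S<2n+1> the most significant half.
    half = "most significant" if s_index % 2 else "least significant"
    return s_index // 2, half


def d_to_s(d_index: int) -> Optional[Tuple[int, int]]:
    """Return the pair of S register numbers that alias D<d_index>, or None."""
    if not 0 <= d_index <= 31:
        raise ValueError("D register index must be in the range 0-31")
    if d_index > 15:
        # Assumption: D16-D31 have no single-precision aliases.
        return None
    return 2 * d_index, 2 * d_index + 1


print(s_to_d(12))  # D6, least significant half
print(s_to_d(13))  # D6, most significant half
print(d_to_s(6))   # S12 and S13
```

This reproduces the D6 example: S12 and S13 are the two halves of D6.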
Related concepts
10.4 Views of the floating-point extension register bank in AArch32 state on page 10-211.

10.3 Extension register bank mapping in AArch64 state
The extension register bank is a collection of registers that can be accessed as 16-bit, 32-bit, or 64-bit
registers. It is distinct from the ARM register bank.
The following figure shows the views of the extension register bank, and the overlap between the
different size registers.
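[Figure: the 16-bit registers H0-H31, the 32-bit registers S0-S31, and the 64-bit registers D0-D31, showing how they overlap.]
Figure 10-2 Extension register bank for floating-point in AArch64 state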

The mapping between the registers is as follows:
• S<n> maps to the least significant half of D<n>
• H<n> maps to the least significant half of S<n>
For example, you can access the least significant half of register D7 by referring to S7.
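As a rough illustration of this packing, the following Python sketch treats the contents of a D register as a 64-bit integer and extracts the bits that the same-numbered S and H views would see. The helper names `s_view` and `h_view` are hypothetical, and the sketch models raw bit patterns only, not floating-point values.

```python
def s_view(d_bits: int) -> int:
    """Bits visible through S<n>: the least significant 32 bits of D<n>."""
    return d_bits & 0xFFFF_FFFF


def h_view(d_bits: int) -> int:
    """Bits visible through H<n>: the least significant 16 bits of S<n>."""
    return s_view(d_bits) & 0xFFFF


d7 = 0x4009_21FB_5444_2D18  # arbitrary 64-bit pattern held in D7
print(hex(s_view(d7)))  # bits seen through S7
print(hex(h_view(d7)))  # bits seen through H7
```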
Related concepts
10.5 Views of the floating-point extension register bank in AArch64 state on page 10-212.

10.4 Views of the floating-point extension register bank in AArch32 state
Floating-point can have different views of the extension register bank in AArch32 state.
The floating-point extension register bank can be viewed as:
• Thirty-two 64-bit registers, D0-D31.
• Thirty-two 32-bit registers, S0-S31. Only half of the register bank is accessible in this view.
• A combination of registers from these views.
64-bit floating-point registers are called double-precision registers and can contain double-precision
floating-point values. 32-bit floating-point registers are called single-precision registers and can contain
either a single-precision or two half-precision floating-point values.
Related concepts
10.2 Extension register bank mapping for floating-point in AArch32 state on page 10-208.

10.5 Views of the floating-point extension register bank in AArch64 state
Floating-point can have different views of the extension register bank in AArch64 state.
The floating-point extension register bank can be viewed as:
• Thirty-two 64-bit registers D0-D31.
• Thirty-two 32-bit registers S0-S31.
• Thirty-two 16-bit registers H0-H31.
• A combination of registers from these views.
Related concepts
10.3 Extension register bank mapping in AArch64 state on page 10-210.

10.6 Differences between A32/T32 and A64 floating-point instruction syntax
The syntax and mnemonics of A64 floating-point instructions are based on those in A32/T32 but with
some differences.
The following table describes the main differences.
Table 10-1 Differences in syntax and mnemonics between A32/T32 and A64 floating-point instructions

A32/T32: All floating-point instruction mnemonics begin with V, for example VMAX.
A64: The first letter of the instruction mnemonic indicates the data type of the instruction. For example, SMAX, UMAX, and FMAX mean signed, unsigned, and floating-point respectively. No suffix means the type is irrelevant and P means polynomial.

A32/T32: A mnemonic qualifier specifies the type and width of elements in a vector. For example, in the following instruction, U32 means 32-bit unsigned integers:
VMAX.U32 Q0, Q1, Q2
A64: A register qualifier specifies the data width and the number of elements in the register. For example, in the following instruction .4S means 4 32-bit elements:
UMAX V0.4S, V1.4S, V2.4S

A32/T32: You can append a condition code to most floating-point instruction mnemonics to make them conditional.
A64: A64 has no conditionally executed floating-point instructions.

A32/T32: The floating-point select instruction, VSEL, is unconditionally executed but uses a condition code as an operand. You append the condition code to the mnemonic, for example:
VSELEQ.F32 S1,S2,S3
A64: There are several floating-point instructions that use a condition code as an operand. You specify the condition code in the final operand position, for example:
FCSEL S1,S2,S3,EQ

A32/T32: The P mnemonic qualifier which indicates pairwise instructions is a prefix, for example, VPADD.
A64: The P mnemonic qualifier is a suffix, for example ADDP.

10.7 Load values to floating-point registers
To load a register with a floating-point immediate value, use VMOV in A32 or FMOV in A64. Both
instructions exist in scalar and vector forms.
You can load any 64-bit integer, single-precision, or double-precision floating-point value from a literal
pool using the VLDR pseudo-instruction.
Related references
15.17 VLDR pseudo-instruction (floating-point) on page 15-769.
15.21 VMOV (floating-point) on page 15-773.
18.31 FMOV (scalar, immediate) on page 18-1173.

10.8 Conditional execution of A32/T32 floating-point instructions
You can execute floating-point instructions conditionally, in the same way as most A32 and T32
instructions.
You cannot use any of the following floating-point instructions in an IT block:
• VRINT{A, N, P, M}.
• VSEL.
• VCVT{A, N, P, M}.
• VMAXNM.
• VMINNM.
In addition, specifying any other floating-point instruction in an IT block is deprecated.
Most A32 floating-point instructions can be conditionally executed, by appending a condition code suffix
to the instruction.
Related concepts
7.2 Conditional execution in A32 code on page 7-141.
7.3 Conditional execution in T32 code on page 7-142.
Related references
7.13 Comparison of condition code meanings in integer and floating-point code on page 7-152.
7.11 Condition code suffixes on page 7-150.

10.9 Floating-point exceptions for floating-point in A32/T32 instructions
The floating-point extension records floating-point exceptions in the FPSCR cumulative flags.
It records the following exceptions:
Invalid operation
The exception is caused if the result of an operation has no mathematical value or cannot be
represented.
Division by zero
The exception is caused if a divide operation has a zero divisor and a dividend that is not zero,
an infinity or a NaN.
Overflow
The exception is caused if the absolute value of the result of an operation, produced after
rounding, is greater than the maximum positive normalized number for the destination precision.
Underflow
The exception is caused if the absolute value of the result of an operation, produced before
rounding, is less than the minimum positive normalized number for the destination precision,
and the rounded result is inexact.
Inexact
The exception is caused if the result of an operation is not equivalent to the value that would be
produced if the operation were performed with unbounded precision and exponent range.
Input denormal
The exception is caused if a denormalized input operand is replaced in the computation by a
zero.
The descriptions of the floating-point instructions that can cause floating-point exceptions include a
subsection listing the exceptions. If there is no such subsection, that instruction cannot cause any
floating-point exception.
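The following Python sketch shows one way to decode the cumulative flags from a raw FPSCR value, for example one captured in a debugger. The bit positions are taken from the ARM Architecture Reference Manual, not from this section, so check them against the manual for your architecture version. The function name `decode_cumulative_flags` is a hypothetical helper.

```python
# Cumulative exception flag positions in the FPSCR
# (IOC, DZC, OFC, UFC, IXC, IDC), as given in the ARM Architecture Reference Manual.
FPSCR_CUMULATIVE_FLAGS = {
    "Invalid operation": 0,
    "Division by zero": 1,
    "Overflow": 2,
    "Underflow": 3,
    "Inexact": 4,
    "Input denormal": 7,
}


def decode_cumulative_flags(fpscr: int) -> list:
    """Return the names of the exceptions whose cumulative flag is set."""
    return [
        name
        for name, bit in FPSCR_CUMULATIVE_FLAGS.items()
        if (fpscr >> bit) & 1
    ]


print(decode_cumulative_flags(0x00000012))  # bits 1 and 4 are set
```

Because the flags are cumulative, they stay set until software clears them, so a decoded value reports every exception recorded since the last clear.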
Related concepts
10.13 Flush-to-zero mode in floating-point on page 10-220.
Related references
Chapter 15 Floating-point Instructions (32-bit) on page 15-750.
Related information
ARM Architecture Reference Manual.
Further reading.

10.10 Floating-point data types in A32/T32 instructions
Most floating-point instructions use a data type specifier to define the size and type of data that the
instruction operates on.
Data type specifiers in floating-point instructions consist of a letter indicating the type of data, usually
followed by a number indicating the width. They are separated from the instruction mnemonic by a
point.
The following data types are available in floating-point instructions:
16-bit   F16
32-bit   F32 (or F)
64-bit   F64 (or D)

The datatype of the second (or only) operand is specified in the instruction.
Note
Most instructions have a restricted range of permitted data types. See the instruction descriptions for
details. However, the data type description is flexible:
• If the description specifies I, you can also use the S or U data types.
• If only the data size is specified, you can specify a type (S, U, P or F).
• If no data type is specified, you can specify a data type.

Related concepts
9.11 Polynomial arithmetic over {0,1} on page 9-195.

10.11 Extended notation extension for floating-point in A32/T32 code
armasm implements an extension to the architectural floating-point assembly syntax, called extended
notation. This extension allows you to include datatype information or scalar indexes in register names.
Note
Extended notation is not supported for A64 code.
If you use extended notation, you do not have to include the data type or scalar index information in
every instruction.
Register names can be any of the following:
Untyped
The register name specifies the register, but not what datatype it contains, nor any index to a
particular scalar within the register.
Untyped with scalar index
The register name specifies the register and an index to a particular scalar within the register,
but not what datatype it contains.
Typed
The register name specifies the register, and what datatype it contains, but not any index to a
particular scalar within the register.
Typed with scalar index
The register name specifies the register, what datatype it contains, and an index to a particular
scalar within the register.
Use the SN and DN directives to define names for typed and scalar registers.
Related concepts
10.10 Floating-point data types in A32/T32 instructions on page 10-217.
Related references
21.56 QN, DN, and SN on page 21-1704.

10.12 Floating-point system registers in AArch32 state
Floating-point system registers are accessible in all implementations of floating-point.
For exception levels using AArch32, the following floating-point system registers are accessible in all
floating-point implementations:
• FPSCR, the floating-point status and control register.
• FPEXC, the floating-point exception register.
• FPSID, the floating-point system ID register.
A particular floating-point implementation can have additional registers. For more information, see the
Technical Reference Manual for your processor.
Related concepts
6.20 The Read-Modify-Write operation on page 6-128.
Related information
ARM Architecture Reference Manual.
Further reading.

10.13 Flush-to-zero mode in floating-point
Flush-to-zero mode replaces denormalized numbers with zero. This does not comply with IEEE 754
arithmetic, but in some circumstances can improve performance considerably.
Some implementations of floating-point use support code to handle denormalized numbers. In
calculations involving denormalized numbers, the performance of such systems is much lower than it is
in normal calculations.
Flush-to-zero mode in floating-point always preserves the sign bit.
Related concepts
10.15 The effects of using flush-to-zero mode in floating-point on page 10-222.
Related references
10.14 When to use flush-to-zero mode in floating-point on page 10-221.
10.16 Floating-point operations not affected by flush-to-zero mode on page 10-223.

10.14 When to use flush-to-zero mode in floating-point
You can change between flush-to-zero mode and normal mode, depending on the requirements of
different parts of your code.
You must select flush-to-zero mode if all the following are true:
• IEEE 754 compliance is not a requirement for your system.
• The algorithms you are using sometimes generate denormalized numbers.
• Your system uses support code to handle denormalized numbers.
• The algorithms you are using do not depend for their accuracy on the preservation of denormalized
numbers.
• The algorithms you are using do not generate frequent exceptions as a result of replacing
denormalized numbers with 0.
You select flush-to-zero mode in one of the following ways:
• In A32 code, by setting the FZ bit in the FPSCR to 1. You do this using the VMRS and VMSR
instructions.
• In A64 code, by setting the FZ bit in the FPCR to 1. You do this using the MRS and MSR instructions.
You can change between flush-to-zero and normal mode at any time, if different parts of your code have
different requirements. Numbers already in registers are not affected by changing mode.
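Setting the FZ bit is a read-modify-write operation: read the register, change only the FZ bit, and write the value back. The Python sketch below models the middle step on a 32-bit register value. It assumes that FZ is bit 24 of both the FPSCR and the FPCR, as given in the ARM Architecture Reference Manual. The function names are hypothetical, and on real hardware the read and write are done with VMRS and VMSR (A32) or MRS and MSR (A64).

```python
FZ_BIT = 1 << 24          # assumed position of the FZ bit in FPSCR and FPCR
REGISTER_MASK = 0xFFFF_FFFF


def enable_flush_to_zero(value: int) -> int:
    """Return the register value with FZ set and all other bits unchanged."""
    return (value | FZ_BIT) & REGISTER_MASK


def disable_flush_to_zero(value: int) -> int:
    """Return the register value with FZ cleared and all other bits unchanged."""
    return value & ~FZ_BIT & REGISTER_MASK


fpscr = 0x00C0_0000                       # example value with other control bits set
fpscr = enable_flush_to_zero(fpscr)       # flush-to-zero mode selected
print(hex(fpscr))
fpscr = disable_flush_to_zero(fpscr)      # back to normal mode
print(hex(fpscr))
```

Preserving the other bits matters because the same register also holds the rounding mode and the exception flags.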
Related concepts
10.13 Flush-to-zero mode in floating-point on page 10-220.
10.15 The effects of using flush-to-zero mode in floating-point on page 10-222.

10.15 The effects of using flush-to-zero mode in floating-point
In flush-to-zero mode, denormalized inputs are treated as zero. Results that are too small to be
represented in a normalized number are replaced with zero.
With certain exceptions, flush-to-zero mode has the following effects on floating-point operations:
• A denormalized number is treated as 0 when used as an input to a floating-point operation. The
source register is not altered.
• If the result of a single-precision floating-point operation, before rounding, is in the range -2^-126 to
+2^-126, it is replaced by 0.
• If the result of a double-precision floating-point operation, before rounding, is in the range -2^-1022 to
+2^-1022, it is replaced by 0.
In flush-to-zero mode, an Input Denormal exception occurs whenever a denormalized number is used as
an operand. An Underflow exception occurs when a result is flushed-to-zero.
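The behavior described above can be approximated in Python for a single-precision multiply. This is a simplified model under stated assumptions, not a description of any particular hardware: the operands are assumed to be values representable in single precision, the Python double product stands in for the unrounded result, overflow and final rounding to single precision are not modeled, and the function names are hypothetical.

```python
import math

MIN_NORMAL_SINGLE = 2.0 ** -126  # smallest positive normalized single-precision value


def is_denormal_single(value: float) -> bool:
    """True for nonzero values smaller in magnitude than the minimum normal."""
    return value != 0.0 and abs(value) < MIN_NORMAL_SINGLE


def flush_single(value: float) -> float:
    """Replace a denormalized value with a zero of the same sign."""
    return math.copysign(0.0, value) if is_denormal_single(value) else value


def ftz_multiply_single(a: float, b: float):
    """Multiply in flush-to-zero mode; return (result, recorded exceptions)."""
    exceptions = []
    if is_denormal_single(a) or is_denormal_single(b):
        exceptions.append("Input Denormal")
    # Inputs are treated as zero; the caller's values are not altered.
    result = flush_single(a) * flush_single(b)
    if is_denormal_single(result):
        exceptions.append("Underflow")
        result = math.copysign(0.0, result)
    return result, exceptions


print(ftz_multiply_single(2.0 ** -100, 2.0 ** -100))  # result too small, flushed
print(ftz_multiply_single(-(2.0 ** -130), 1.0))       # denormal input, sign kept
```

The second call returns a negative zero, which illustrates that flushing preserves the sign.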
Related concepts
10.13 Flush-to-zero mode in floating-point on page 10-220.
Related references
10.16 Floating-point operations not affected by flush-to-zero mode on page 10-223.

10.16 Floating-point operations not affected by flush-to-zero mode
Some floating-point instructions can be carried out on denormalized numbers even in flush-to-zero
mode, without flushing the results to zero.
These instructions are as follows:
• Absolute value and negate (VABS and VNEG).
• Load and store (VLDR and VSTR).
• Load multiple and store multiple (VLDM and VSTM).
• Transfer between extension registers and ARM general-purpose registers (VMOV).
Related concepts
10.13 Flush-to-zero mode in floating-point on page 10-220.
Related references
15.2 VABS (floating-point) on page 15-754.
15.14 VLDM (floating-point) on page 15-766.
15.15 VLDR (floating-point) on page 15-767.
15.37 VSTM (floating-point) on page 15-789.
15.38 VSTR (floating-point) on page 15-790.
14.51 VLDM on page 14-659.
14.52 VLDR on page 14-660.
14.126 VSTM on page 14-734.
14.129 VSTR on page 14-739.
15.22 VMOV (between one ARM register and single precision floating-point register) on page 15-774.
14.67 VMOV (between two ARM registers and a 64-bit extension register) on page 14-675.
15.28 VNEG (floating-point) on page 15-780.
14.80 VNEG on page 14-688.

Chapter 11
armasm Command-line Options

Describes the armasm command-line syntax and command-line options.
It contains the following sections:
• 11.1 --16 on page 11-226.
• 11.2 --32 on page 11-227.
• 11.3 --apcs=qualifier…qualifier on page 11-228.
• 11.4 --arm on page 11-230.
• 11.5 --arm_only on page 11-231.
• 11.6 --bi on page 11-232.
• 11.7 --bigend on page 11-233.
• 11.8 --brief_diagnostics, --no_brief_diagnostics on page 11-234.
• 11.9 --checkreglist on page 11-235.
• 11.10 --cpreproc on page 11-236.
• 11.11 --cpreproc_opts=option[,option,…] on page 11-237.
• 11.12 --cpu=list on page 11-238.
• 11.13 --cpu=name on page 11-239.
• 11.14 --debug on page 11-242.
• 11.15 --depend=dependfile on page 11-243.
• 11.16 --depend_format=string on page 11-244.
• 11.17 --diag_error=tag[,tag,…] on page 11-245.
• 11.18 --diag_remark=tag[,tag,…] on page 11-246.
• 11.19 --diag_style={arm|ide|gnu} on page 11-247.
• 11.20 --diag_suppress=tag[,tag,…] on page 11-248.
• 11.21 --diag_warning=tag[,tag,…] on page 11-249.
• 11.22 --dllexport_all on page 11-250.
• 11.23 --dwarf2 on page 11-251.

• 11.24 --dwarf3 on page 11-252.
• 11.25 --errors=errorfile on page 11-253.
• 11.26 --exceptions, --no_exceptions on page 11-254.
• 11.27 --exceptions_unwind, --no_exceptions_unwind on page 11-255.
• 11.28 --execstack, --no_execstack on page 11-256.
• 11.29 --execute_only on page 11-257.
• 11.30 --fpmode=model on page 11-258.
• 11.31 --fpu=list on page 11-259.
• 11.32 --fpu=name on page 11-260.
• 11.33 -g on page 11-261.
• 11.34 --help on page 11-262.
• 11.35 -idir[,dir, …] on page 11-263.
• 11.36 --keep on page 11-264.
• 11.37 --length=n on page 11-265.
• 11.38 --li on page 11-266.
• 11.39 --library_type=lib on page 11-267.
• 11.40 --list=file on page 11-268.
• 11.41 --list= on page 11-269.
• 11.42 --littleend on page 11-270.
• 11.43 -m on page 11-271.
• 11.44 --maxcache=n on page 11-272.
• 11.45 --md on page 11-273.
• 11.46 --no_code_gen on page 11-274.
• 11.47 --no_esc on page 11-275.
• 11.48 --no_hide_all on page 11-276.
• 11.49 --no_regs on page 11-277.
• 11.50 --no_terse on page 11-278.
• 11.51 --no_warn on page 11-279.
• 11.52 -o filename on page 11-280.
• 11.53 --pd on page 11-281.
• 11.54 --predefine "directive" on page 11-282.
• 11.55 --reduce_paths, --no_reduce_paths on page 11-283.
• 11.56 --regnames on page 11-284.
• 11.57 --report-if-not-wysiwyg on page 11-285.
• 11.58 --show_cmdline on page 11-286.
• 11.59 --thumb on page 11-287.
• 11.60 --unaligned_access, --no_unaligned_access on page 11-288.
• 11.61 --unsafe on page 11-289.
• 11.62 --untyped_local_labels on page 11-290.
• 11.63 --version_number on page 11-291.
• 11.64 --via=filename on page 11-292.
• 11.65 --vsn on page 11-293.
• 11.66 --width=n on page 11-294.
• 11.67 --xref on page 11-295.

11.1 --16
Instructs armasm to interpret instructions as T32 instructions using the pre-UAL T32 syntax.
This option is equivalent to a CODE16 directive at the head of the source file. Use the --thumb option to
specify T32 instructions using the UAL syntax.
Note
Not supported for AArch64 state.

Related references
11.59 --thumb on page 11-287.
21.11 CODE16 directive on page 21-1653.

11.2 --32
A synonym for the --arm command-line option.
Note
Not supported for AArch64 state.

Related references
11.4 --arm on page 11-230.

11.3 --apcs=qualifier…qualifier
Controls interworking and position independence when generating code.
Syntax
--apcs=qualifier...qualifier
Where qualifier...qualifier denotes a list of qualifiers. There must be:
• At least one qualifier present.
• No spaces or commas separating individual qualifiers in the list.

Each instance of qualifier must be one of:
none
Specifies that the input file does not use AAPCS. AAPCS registers are not set up. Other
qualifiers are not permitted if you use none.
/interwork, /nointerwork
For ARMv7-A, /interwork specifies that the code in the input file can interwork between
ARM and Thumb safely.
For ARMv8-A, /interwork specifies that the code in the input file can interwork between A32
and T32 safely.
The default is /nointerwork.
/nointerwork is not supported for AArch64 state.
/inter, /nointer
Are synonyms for /interwork and /nointerwork.
/inter is not supported for AArch64 state.
/ropi, /noropi
/ropi specifies that the code in the input file is Read-Only Position-Independent (ROPI). The
default is /noropi.
/pic, /nopic
Are synonyms for /ropi and /noropi.
/rwpi, /norwpi
/rwpi specifies that the code in the input file is Read-Write Position-Independent (RWPI). The
default is /norwpi.
/pid, /nopid
Are synonyms for /rwpi and /norwpi.
/fpic, /nofpic
/fpic specifies that the code in the input file is read-only position-independent and references to
addresses are suitable for use in a Linux shared object. The default is /nofpic.
/hardfp, /softfp
Requests hardware or software floating-point linkage. This enables the procedure call standard
to be specified separately from the version of the floating-point hardware available through the
--fpu option. It is still possible to specify the procedure call standard by using the --fpu option,
but ARM recommends you use --apcs. If floating-point support is not permitted (for example,
because --fpu=none is specified, or because of other means), then /hardfp and /softfp are
ignored. If floating-point support is permitted and the softfp calling convention is used
(--fpu=softvfp or --fpu=softvfp+fp-armv8), then /hardfp gives an error.
/softfp is not supported for AArch64 state.

Usage
This option specifies whether you are using the Procedure Call Standard for the ARM Architecture
(AAPCS). It can also specify some attributes of code sections.

The AAPCS forms part of the Base Standard Application Binary Interface for the ARM Architecture
(BSABI) specification. By writing code that adheres to the AAPCS, you can ensure that separately
compiled and assembled modules can work together.
Note
AAPCS qualifiers do not affect the code produced by armasm. They are an assertion by the programmer
that the code in the input file complies with a particular variant of AAPCS. They cause attributes to be
set in the object file produced by armasm. The linker uses these attributes to check compatibility of files,
and to select appropriate library variants.

Example
armasm --cpu=8-A.32 --apcs=/inter/hardfp inputfile.s
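The syntax rules for the qualifier list can be illustrated with a small Python sketch that splits the value given to --apcs into individual qualifiers. This is a hypothetical helper written only for illustration. It does not reproduce armasm's own parser or its diagnostics, and it checks only the rules stated in this section: at least one qualifier, no spaces or commas, and none used on its own.

```python
# Qualifier names listed in this section (none is handled separately).
KNOWN_QUALIFIERS = {
    "/interwork", "/nointerwork", "/inter", "/nointer",
    "/ropi", "/noropi", "/pic", "/nopic",
    "/rwpi", "/norwpi", "/pid", "/nopid",
    "/fpic", "/nofpic", "/hardfp", "/softfp",
}


def split_apcs_qualifiers(value: str) -> list:
    """Split the text after --apcs= into a list of qualifiers."""
    if not value:
        raise ValueError("at least one qualifier must be present")
    if " " in value or "," in value:
        raise ValueError("qualifiers must not be separated by spaces or commas")
    if value == "none":
        return ["none"]  # no other qualifiers are permitted with none
    if not value.startswith("/"):
        raise ValueError("each qualifier other than none starts with '/'")
    qualifiers = ["/" + name for name in value[1:].split("/")]
    unknown = [q for q in qualifiers if q not in KNOWN_QUALIFIERS]
    if unknown:
        raise ValueError("unknown qualifier: " + ", ".join(unknown))
    return qualifiers


print(split_apcs_qualifiers("/inter/hardfp"))
```

For the example command above, the value /inter/hardfp splits into the two qualifiers /inter and /hardfp.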

Related information
Procedure Call Standard for the ARM Architecture.
Application Binary Interface (ABI) for the ARM Architecture.

11.4 --arm
Instructs armasm to interpret instructions as A32 instructions. It does not, however, guarantee A32-only
code in the object file. This is the default. Using this option is equivalent to specifying the ARM or CODE32
directive at the start of the source file.
Note
Not supported for AArch64 state.

Related references
11.2 --32 on page 11-227.
11.5 --arm_only on page 11-231.
21.7 ARM or CODE32 directive on page 21-1649.

11.5 --arm_only
Instructs armasm to only generate A32 code. This is similar to --arm but also has the property that
armasm does not permit the generation of any T32 code.
Note
Not supported for AArch64 state.

Related references
11.4 --arm on page 11-230.

11.6 --bi
A synonym for the --bigend command-line option.
Related references
11.7 --bigend on page 11-233.
11.42 --littleend on page 11-270.

11.7 --bigend
Generates code suitable for an ARM processor using big-endian memory access.
The default is --littleend.
Related references
11.42 --littleend on page 11-270.
11.6 --bi on page 11-232.

11.8 --brief_diagnostics, --no_brief_diagnostics
Enables and disables the output of brief diagnostic messages.
This option instructs the assembler whether to use a shorter form of the diagnostic output. In this form,
the original source line is not displayed and the error message text is not wrapped when it is too long to
fit on a single line. The default is --no_brief_diagnostics.
Related references
11.17 --diag_error=tag[,tag,…] on page 11-245.
11.21 --diag_warning=tag[,tag,…] on page 11-249.

11.9 --checkreglist
Instructs armasm to check RLIST, LDM, and STM register lists to ensure that all registers are provided in
increasing register number order.
When this option is used, armasm gives a warning if the registers are not listed in order.
Note
In AArch32 state, this option is deprecated. Use --diag_warning 1206 instead. In AArch64 state, this
option is not supported.

Related references
11.21 --diag_warning=tag[,tag,…] on page 11-249.

11.10 --cpreproc
Instructs armasm to call armclang to preprocess the input file before assembling it.
Restrictions
You must use --cpreproc_opts with this option to correctly configure the armclang compiler for preprocessing.
armasm only passes the following command-line options to armclang by default:

•
•
•
•

Related concepts
8.14 Using the C preprocessor on page 8-176.
Related references
11.11 --cpreproc_opts=option[,option,…] on page 11-237.
Related information
-x armclang option.
Command-line options for preprocessing assembly source code.

11.11 --cpreproc_opts=option[,option,…]
Enables armasm to pass options to armclang when using the C preprocessor.
Syntax
--cpreproc_opts=option[,option,…]

Where option[,option,…] is a comma-separated list of C preprocessing options.
At least one option must be specified.
Restrictions
As a minimum, you must specify the armclang options --target and either -mcpu or -march in
--cpreproc_opts.

To assemble code containing C directives that require the C preprocessor, the input assembly source
filename must have an upper-case extension .S.
You cannot pass the armclang option -x assembler-with-cpp, because it gets added to armclang after
the source file name.
Note
Ensure that you specify compatible architectures in the armclang options --target, -mcpu or -march,
and the armasm --cpu option.

Example
The options to the preprocessor in this example are
--cpreproc_opts=--target=arm-arm-none-eabi,-mcpu=cortex-a9,-D,DEF1,-D,DEF2.
armasm --cpu=cortex-a9 --cpreproc --cpreproc_opts=--target=arm-arm-none-eabi,-mcpu=cortex-a9,-D,DEF1,-D,DEF2 -I /path/to/includes1 -I /path/to/includes2 input.S
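When a build script assembles this command line, the preprocessor options have to be joined into one comma-separated value. The Python sketch below shows one possible way to do that. The function name `build_armasm_command` and its parameters are hypothetical, and the sketch only checks the restrictions stated in this section: at least one option, and a source filename with an upper-case .S extension.

```python
def build_armasm_command(cpu, preproc_options, include_dirs, source):
    """Return an argument list for armasm with C preprocessing enabled."""
    if not preproc_options:
        raise ValueError("--cpreproc_opts needs at least one option")
    if not source.endswith(".S"):
        raise ValueError("source filename must have an upper-case .S extension")
    command = [
        "armasm",
        "--cpu=" + cpu,
        "--cpreproc",
        # All armclang options travel in a single comma-separated value.
        "--cpreproc_opts=" + ",".join(preproc_options),
    ]
    for directory in include_dirs:
        command += ["-I", directory]
    command.append(source)
    return command


command = build_armasm_command(
    cpu="cortex-a9",
    preproc_options=["--target=arm-arm-none-eabi", "-mcpu=cortex-a9",
                     "-D", "DEF1", "-D", "DEF2"],
    include_dirs=["/path/to/includes1", "/path/to/includes2"],
    source="input.S",
)
print(" ".join(command))
```

Returning a list rather than a single string lets the result be passed to `subprocess.run` without shell quoting.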

Related concepts
8.14 Using the C preprocessor on page 8-176.
Related references
11.10 --cpreproc on page 11-236.
Related information
Command-line options for preprocessing assembly source code.
Specifying a target architecture, processor, and instruction set.
-march armclang option.
-mcpu armclang option.
--target armclang option.
-x armclang option.

11.12 --cpu=list
Lists the architecture and processor names that are supported by the --cpu=name option.
Syntax
--cpu=list

Related references
11.13 --cpu=name on page 11-239.

11.13 --cpu=name
Enables code generation for the selected ARM processor or architecture.
Syntax
--cpu=name

Where name is the name of a processor or architecture.
Processor and architecture names are not case-sensitive.
Wildcard characters are not accepted.
The following table shows the supported architectures. For a complete list of the supported architecture
and processor names, specify the --cpu=list option.
Table 11-1 Supported ARM architectures

Architecture name   Description
6-M                 ARMv6 microcontroller profile.
6S-M                ARMv6 microcontroller profile with OS extensions.
7-A                 ARMv7 application profile.
7-A.security        ARMv7-A architecture profile with Security Extensions and includes the SMC instruction (formerly SMI).
7-R                 ARMv7 real-time profile.
7-M                 ARMv7 microcontroller profile.
7E-M                ARMv7-M architecture profile with DSP extension.
8-A.32              ARMv8-A architecture profile, AArch32 state.
8-A.32.crypto       ARMv8-A architecture profile, AArch32 state with cryptographic instructions.
8-A.64              ARMv8-A architecture profile, AArch64 state.
8-A.64.crypto       ARMv8-A architecture profile, AArch64 state with cryptographic instructions.
8.1-A.32            ARMv8.1, for ARMv8-A architecture profile, AArch32 state.
8.1-A.32.crypto     ARMv8.1, for ARMv8-A architecture profile, AArch32 state with cryptographic instructions.
8.1-A.64            ARMv8.1, for ARMv8-A architecture profile, AArch64 state.
8.1-A.64.crypto     ARMv8.1, for ARMv8-A architecture profile, AArch64 state with cryptographic instructions.
8.2-A.32            ARMv8.2, for ARMv8-A architecture profile, AArch32 state.
8.2-A.32.crypto     ARMv8.2, for ARMv8-A architecture profile, AArch32 state with cryptographic instructions.
8.2-A.64            ARMv8.2, for ARMv8-A architecture profile, AArch64 state.
8.2-A.64.crypto     ARMv8.2, for ARMv8-A architecture profile, AArch64 state with cryptographic instructions.
8.3-A.32            ARMv8.3, for ARMv8-A architecture profile, AArch32 state.
8.3-A.32.crypto     ARMv8.3, for ARMv8-A architecture profile, AArch32 state with cryptographic instructions.
8.3-A.64            ARMv8.3, for ARMv8-A architecture profile, AArch64 state.
8.3-A.64.crypto     ARMv8.3, for ARMv8-A architecture profile, AArch64 state with cryptographic instructions.
8-R                 ARMv8-R architecture profile.
8-M.Base            ARMv8-M baseline architecture profile. Derived from the ARMv6-M architecture.
8-M.Main            ARMv8-M mainline architecture profile. Derived from the ARMv7-M architecture.
8-M.Main.dsp        ARMv8-M mainline architecture profile with DSP extension.

Note
The full list of supported architectures and processors depends on your license.

Default
There is no default option for --cpu.
Usage
The following general points apply to processor and architecture options:
Processors
• Selecting the processor selects the appropriate architecture, Floating-Point Unit (FPU), and
memory organization.
• If you specify a processor for the --cpu option, the generated code is optimized for that
processor. This enables the assembler to use specific coprocessors or instruction scheduling
for optimum performance.
Architectures
• If you specify an architecture name for the --cpu option, the generated code can run on any
processor supporting that architecture. For example, --cpu=7-A produces code that can be
used by the Cortex®-A9 processor.
FPU
• Some specifications of --cpu imply an --fpu selection.
Note
Any explicit FPU, set with --fpu on the command line, overrides an implicit FPU.
• If no --fpu option is specified and the --cpu option does not imply an --fpu selection, then
--fpu=softvfp is used.
A32/T32
• Specifying a processor or architecture that supports T32 instructions, such as
--cpu=cortex-a9, does not make the assembler generate T32 code. It only enables features
of the processor to be used, such as long multiply. Use the --thumb option to generate T32
code, unless the processor only supports T32 instructions.
Note
Specifying the target processor or architecture might make the generated object code
incompatible with other ARM processors. For example, A32 code generated for architecture
ARMv8 might not run on a Cortex-A9 processor, if the generated object code includes
instructions specific to ARMv8. Therefore, you must choose the lowest common
denominator processor suited to your purpose.
• If the architecture only supports T32, you do not have to specify --thumb on the command
line. For example, if building for Cortex-M4 or ARMv7-M with --cpu=7-M, you do not have
to specify --thumb on the command line, because ARMv7-M only supports T32. The same
applies to ARMv6-M and other T32-only architectures.
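The FPU precedence described in this section can be summarized in a few lines of Python. This is only a sketch of the documented rule, not armasm code: the function name effective_fpu and the sample FPU strings are hypothetical, and the sketch does not model which --cpu values imply an FPU.

```python
from typing import Optional


def effective_fpu(explicit_fpu: Optional[str], implied_fpu: Optional[str]) -> str:
    """Model the documented precedence between --fpu and an FPU implied by --cpu.

    explicit_fpu: value given with --fpu on the command line, or None.
    implied_fpu:  FPU implied by the --cpu selection, or None.
    """
    if explicit_fpu is not None:
        return explicit_fpu      # an explicit --fpu overrides an implicit FPU
    if implied_fpu is not None:
        return implied_fpu       # otherwise the FPU implied by --cpu is used
    return "softvfp"             # neither given: --fpu=softvfp is used


# The FPU names below are placeholder strings for illustration only.
print(effective_fpu(None, None))
print(effective_fpu("explicit-fpu", "implied-fpu"))
print(effective_fpu(None, "implied-fpu"))
```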

Restrictions
You cannot specify both a processor and an architecture on the same command line.
Example
armasm --cpu=Cortex-A17 inputfile.s

Related references
11.3 --apcs=qualifier…qualifier on page 11-228.
11.12 --cpu=list on page 11-238.
11.32 --fpu=name on page 11-260.
11.59 --thumb on page 11-287.
11.61 --unsafe on page 11-289.
Related information
ARM Architecture Reference Manual.

11.14

--debug
Instructs the assembler to generate DWARF debug tables.
--debug is a synonym for -g. The default is DWARF 3.

Note
Local symbols are not preserved with --debug. You must specify --keep if you want to preserve the
local symbols to aid debugging.

Related references
11.23 --dwarf2 on page 11-251.
11.24 --dwarf3 on page 11-252.
11.36 --keep on page 11-264.
11.33 -g on page 11-261.

11.15

--depend=dependfile
Writes makefile dependency lines to a file.
Source file dependency lists are suitable for use with make utilities.
Related references
11.45 --md on page 11-273.
11.16 --depend_format=string on page 11-244.

11.16

--depend_format=string
Specifies the format of output dependency files, for compatibility with some UNIX make programs.
Syntax
--depend_format=string

Where string is one of:
unix

generates dependency file entries using UNIX-style path separators.
unix_escaped

is the same as unix, but escapes spaces with \.
unix_quoted

is the same as unix, but surrounds path names with double quotes.
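As a rough illustration of how the three formats differ, the following Python sketch models the path formatting described above. It is not armasm's implementation: the function name format_dependency_path and the sample path are hypothetical, and the sketch assumes that "UNIX-style path separators" means forward slashes in place of backslashes.

```python
def format_dependency_path(path: str, style: str) -> str:
    """Model the --depend_format variants for a single path (illustrative only)."""
    unix_path = path.replace("\\", "/")        # assumed meaning of UNIX-style separators
    if style == "unix":
        return unix_path
    if style == "unix_escaped":
        return unix_path.replace(" ", "\\ ")   # escape each space with a backslash
    if style == "unix_quoted":
        return '"' + unix_path + '"'           # surround the path name with double quotes
    raise ValueError("unknown format: " + style)


# Hypothetical include path containing a space.
sample = "C:\\my project\\inc\\defs.inc"
for style in ("unix", "unix_escaped", "unix_quoted"):
    print(style, format_dependency_path(sample, style))
```

The escaped and quoted forms matter when a path contains spaces, because many make programs otherwise split such a path into several prerequisites.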
Related references
11.15 --depend=dependfile on page 11-243.

11.17

--diag_error=tag[,tag,…]
Sets diagnostic messages that have a specific tag to Error severity.
Syntax
--diag_error=tag[,tag,…]
Where tag can be:
• A diagnostic message number to set to error severity. This is the four-digit number, nnnn, with the
tool letter prefix, but without the letter suffix indicating the severity.
• warning, to treat all warnings as errors.

Usage
Diagnostic messages output by the assembler can be identified by a tag in the form of {prefix}number,
where the prefix is A.
You can specify more than one tag with this option by separating each tag using a comma. You can
specify the optional assembler prefix A before the tag number. If any prefix other than A is included, the
message number is ignored.
The following table shows the meaning of the term severity used in the option descriptions:
Table 11-2 Severity of diagnostic messages
Severity Description
Error

Errors indicate violations in the syntactic or semantic rules of assembly language. Assembly continues, but object code is
not generated.

Warning

Warnings indicate unusual conditions in your code that might indicate a problem. Assembly continues, and object code is
generated unless any problems with an Error severity are detected.

Remark

Remarks indicate common, but not recommended, use of assembly language. These diagnostics are not issued by default.
Assembly continues, and object code is generated unless any problems with an Error severity are detected.

Related references
11.8 --brief_diagnostics, --no_brief_diagnostics on page 11-234.
11.18 --diag_remark=tag[,tag,…] on page 11-246.
11.20 --diag_suppress=tag[,tag,…] on page 11-248.
11.21 --diag_warning=tag[,tag,…] on page 11-249.

11.18

--diag_remark=tag[,tag,…]
Sets diagnostic messages that have a specific tag to Remark severity.
Syntax
--diag_remark=tag[,tag,…]

Where tag is a comma-separated list of diagnostic message numbers. This is the four-digit number,
nnnn, with the tool letter prefix, but without the letter suffix indicating the severity.
Usage
Diagnostic messages output by the assembler can be identified by a tag in the form of {prefix}number,
where the prefix is A.
You can specify more than one tag with this option by separating each tag using a comma. You can
specify the optional assembler prefix A before the tag number. If any prefix other than A is included, the
message number is ignored.
Related references
11.8 --brief_diagnostics, --no_brief_diagnostics on page 11-234.
11.17 --diag_error=tag[,tag,…] on page 11-245.
11.20 --diag_suppress=tag[,tag,…] on page 11-248.
11.21 --diag_warning=tag[,tag,…] on page 11-249.

11.19

--diag_style={arm|ide|gnu}
Specifies the display style for diagnostic messages.
Syntax
--diag_style=string

Where string is one of:
arm

Display messages using the ARM compiler style.
ide

Include the line number and character count for any line that is in error. These values are
displayed in parentheses.
gnu

Display messages in the format used by gcc.
Usage
--diag_style=gnu matches the format reported by the GNU Compiler, gcc.
--diag_style=ide matches the format reported by Microsoft Visual Studio.

Choosing the option --diag_style=ide implicitly selects the option --brief_diagnostics. Explicitly
selecting --no_brief_diagnostics on the command line overrides the selection of
--brief_diagnostics implied by --diag_style=ide.
Selecting either the option --diag_style=arm or the option --diag_style=gnu does not imply any
selection of --brief_diagnostics.
Default
The default is --diag_style=arm.
Related references
11.8 --brief_diagnostics, --no_brief_diagnostics on page 11-234.

11.20

--diag_suppress=tag[,tag,…]
Suppresses diagnostic messages that have a specific tag.
Syntax
--diag_suppress=tag[,tag,…]
Where tag can be:
• A diagnostic message number to be suppressed. This is the four-digit number, nnnn, with the tool
letter prefix, but without the letter suffix indicating the severity.
• error, to suppress all errors that can be downgraded.
• warning, to suppress all warnings.

Diagnostic messages output by armasm can be identified by a tag in the form of {prefix}number, where
the prefix is A.
You can specify more than one tag with this option by separating each tag using a comma.
Example
For example, to suppress the warning messages that have numbers 1293 and 187, use the following
command:
armasm --cpu=8-A.64 --diag_suppress=1293,187

You can specify the optional assembler prefix A before the tag number. For example:
armasm --cpu=8-A.64 --diag_suppress=A1293,A187

If any prefix other than A is included, the message number is ignored. Diagnostic message tags can be
cut and pasted directly into a command line.
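The tag rules above (comma-separated tags, an optional A prefix, and any other prefix causing the message number to be ignored) can be modeled with a short Python sketch. The function name parse_diag_tags is hypothetical, the sketch assumes the prefix is an uppercase A, and it does not handle the error and warning keywords. It illustrates the documented rules only and is not how armasm parses its options.

```python
from typing import List


def parse_diag_tags(option_value: str) -> List[int]:
    """Return the message numbers selected by a comma-separated tag list."""
    numbers = []
    for tag in option_value.split(","):
        tag = tag.strip()
        if tag[:1].isalpha():
            prefix, digits = tag[0], tag[1:]
            if prefix != "A":
                continue          # any prefix other than A: the number is ignored
        else:
            digits = tag          # no prefix: the tag is the bare message number
        if digits.isdigit():
            numbers.append(int(digits))
    return numbers


print(parse_diag_tags("1293,187"))     # bare numbers
print(parse_diag_tags("A1293,A187"))   # same messages with the assembler prefix
print(parse_diag_tags("L6200,A1293"))  # the L-prefixed tag is ignored
```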
Related references
11.8 --brief_diagnostics, --no_brief_diagnostics on page 11-234.
11.17 --diag_error=tag[,tag,…] on page 11-245.
11.18 --diag_remark=tag[,tag,…] on page 11-246.
11.20 --diag_suppress=tag[,tag,…] on page 11-248.
11.21 --diag_warning=tag[,tag,…] on page 11-249.

11.21

--diag_warning=tag[,tag,…]
Sets diagnostic messages that have a specific tag to Warning severity.
Syntax
--diag_warning=tag[,tag,…]
Where tag can be:
• A diagnostic message number to set to warning severity. This is the four-digit number, nnnn, with the
tool letter prefix, but without the letter suffix indicating the severity.
• error, to set all errors that can be downgraded to warnings.

Diagnostic messages output by the assembler can be identified by a tag in the form of {prefix}number,
where the prefix is A.
You can specify more than one tag with this option by separating each tag using a comma.
You can specify the optional assembler prefix A before the tag number. If any prefix other than A is
included, the message number is ignored.
Related references
11.8 --brief_diagnostics, --no_brief_diagnostics on page 11-234.
11.17 --diag_error=tag[,tag,…] on page 11-245.
11.18 --diag_remark=tag[,tag,…] on page 11-246.
11.20 --diag_suppress=tag[,tag,…] on page 11-248.

11.22

--dllexport_all
Controls symbol visibility when building DLLs.
This option gives all exported global symbols STV_PROTECTED visibility in ELF rather than STV_HIDDEN,
unless overridden by source directives.
Related references
21.27 EXPORT or GLOBAL on page 21-1669.

11.23

--dwarf2
Uses DWARF 2 debug table format.
Note
Not supported for AArch64 state.
This option can be used with --debug, to instruct armasm to generate DWARF 2 debug tables.
Related references
11.14 --debug on page 11-242.
11.24 --dwarf3 on page 11-252.

11.24

--dwarf3
Uses DWARF 3 debug table format.
This option can be used with --debug, to instruct the assembler to generate DWARF 3 debug tables. This
is the default if --debug is specified.
Related references
11.14 --debug on page 11-242.
11.23 --dwarf2 on page 11-251.

11.25

--errors=errorfile
Redirects the output of diagnostic messages from stderr to the specified errors file.

11.26

--exceptions, --no_exceptions
Enables or disables exception handling.
Note
Not supported for AArch64 state.
These options instruct armasm to switch on or off exception table generation for all functions defined by
FUNCTION (or PROC) and ENDFUNC (or ENDP) directives.
--no_exceptions causes no tables to be generated. It is the default.

Related references
11.27 --exceptions_unwind, --no_exceptions_unwind on page 11-255.
21.39 FRAME UNWIND ON on page 21-1682.
21.40 FRAME UNWIND OFF on page 21-1683.
21.41 FUNCTION or PROC on page 21-1684.
21.24 ENDFUNC or ENDP on page 21-1666.

11.27

--exceptions_unwind, --no_exceptions_unwind
Enables or disables function unwinding for exception-aware code. This option is only effective if
--exceptions is enabled.
Note
Not supported for AArch64 state.
The default is --exceptions_unwind.
For finer control, use the FRAME UNWIND ON and FRAME UNWIND OFF directives.
Related references
11.26 --exceptions, --no_exceptions on page 11-254.
21.39 FRAME UNWIND ON on page 21-1682.
21.40 FRAME UNWIND OFF on page 21-1683.
21.41 FUNCTION or PROC on page 21-1684.
21.24 ENDFUNC or ENDP on page 21-1666.

11.28

--execstack, --no_execstack
Generates a .note.GNU-stack section marking the stack as either executable or non-executable.
You can also use the AREA directive to generate either an executable or non-executable .note.GNU-stack
section. The following code generates an executable .note.GNU-stack section. Omitting the CODE
attribute generates a non-executable .note.GNU-stack section.
    AREA |.note.GNU-stack|,ALIGN=0,READONLY,NOALLOC,CODE

In the absence of --execstack and --no_execstack, the .note.GNU-stack section is not generated
unless it is specified by the AREA directive.
If both the command-line option and source directive are used and are different, then the stack is marked
as executable.
Table 11-3 Specifying a command-line option and an AREA directive for GNU-stack sections
                              --execstack command-line option   --no_execstack command-line option
execstack AREA directive      execstack                         execstack
no_execstack AREA directive   execstack                         no_execstack
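The combination rule for the command-line option and the AREA directive can be expressed as a small Python sketch. It models only the behavior stated in this section. The function name stack_note and the use of True, False, and None to stand for executable, non-executable, and not specified are assumptions made for the illustration.

```python
from typing import Optional


def stack_note(cmdline: Optional[bool], area: Optional[bool]) -> Optional[bool]:
    """Combine the command-line request and the AREA directive request.

    True  = executable stack requested (--execstack, or AREA with CODE)
    False = non-executable stack requested (--no_execstack, or AREA without CODE)
    None  = not specified
    Returns None when no .note.GNU-stack section is generated.
    """
    requests = [r for r in (cmdline, area) if r is not None]
    if not requests:
        return None           # neither given: the section is not generated
    return any(requests)      # if the two differ, the stack is marked executable


print(stack_note(True, False))
print(stack_note(False, False))
print(stack_note(None, None))
```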

Related references
21.6 AREA on page 21-1646.

11.29

--execute_only
Adds the EXECONLY AREA attribute to all code sections.
Usage
The EXECONLY AREA attribute causes the linker to treat the section as execute-only.
It is the user's responsibility to ensure that the code in the section is safe to run in execute-only memory.
For example:
• The code must not contain literal pools.
• The code must not attempt to load data from the same, or another, execute-only section.
Restrictions
This option is only supported for:
• Processors that support the ARMv8-M.mainline or ARMv8-M.baseline architecture.
• Processors that support the ARMv7-M architecture, such as Cortex-M3, Cortex-M4, and Cortex-M7.
• Processors that support the ARMv6-M architecture.
Note
ARM has only performed limited testing of execute-only code on ARMv6-M targets.

11.30

--fpmode=model
Specifies floating-point standard conformance and sets library attributes and floating-point
optimizations.
Syntax
--fpmode=model

Where model is one of:
none

Source code is not permitted to use any floating-point type or floating-point instruction. This
option overrides any explicit --fpu=name option.
ieee_full

All facilities, operations, and representations guaranteed by the IEEE standard are available in
single and double-precision. Modes of operation can be selected dynamically at runtime.
ieee_fixed

IEEE standard with round-to-nearest and no inexact exceptions.
ieee_no_fenv

IEEE standard with round-to-nearest and no exceptions. This mode is compatible with the Java
floating-point arithmetic model.
std

IEEE finite values with denormals flushed to zero, round-to-nearest and no exceptions. It is C
and C++ compatible. This is the default option.
Finite values are as predicted by the IEEE standard. It is not guaranteed that NaNs and infinities
are produced in all circumstances defined by the IEEE model, or that when they are produced,
they have the same sign. Also, it is not guaranteed that the sign of zero is that predicted by the
IEEE model.
fast

Some value-altering optimizations, where accuracy is sacrificed for fast execution. This is not
IEEE compatible, and is not standard C.
Note
This does not cause any changes to the code that you write.

Example
armasm --cpu=8-A.32 --fpmode ieee_full inputfile.s

Related references
11.32 --fpu=name on page 11-260.
Related information
IEEE Standards Association.

11.31

--fpu=list
Lists the FPU architecture names that are supported by the --fpu=name option.
Example
armasm --fpu=list

Related references
11.30 --fpmode=model on page 11-258.
11.32 --fpu=name on page 11-260.

11.32

--fpu=name
Specifies the target FPU architecture.
Syntax
--fpu=name

Where name is the name of the target FPU architecture. Specify --fpu=list to list the supported FPU
architecture names that you can use with --fpu=name.
The default floating-point architecture depends on the target architecture.
Note
Software floating-point linkage is not supported for AArch64 state.

Usage
If you specify this option, it overrides any implicit FPU option that appears on the command line, for
example, where you use the --cpu option. Floating-point instructions also produce either errors or
warnings if assembled for the wrong target FPU.
armasm sets a build attribute corresponding to name in the object file. The linker determines
compatibility between object files, and selection of libraries, accordingly.
Related references
11.30 --fpmode=model on page 11-258.

11.33

-g
Enables the generation of debug tables.
This option is a synonym for --debug.
Related references
11.14 --debug on page 11-242.

11.34

--help
Displays a summary of the main command-line options.
Default
This is the default if you specify armasm without any options or source files.
Related references
11.63 --version_number on page 11-291.
11.65 --vsn on page 11-293.

11.35

-idir[,dir, …]
Adds directories to the source file include path.
Any directories added using this option have to be fully qualified.
Related references
21.43 GET or INCLUDE on page 21-1686.

11.36

--keep
Instructs the assembler to keep named local labels in the symbol table of the object file, for use by the
debugger.
Related references
21.48 KEEP on page 21-1693.

11.37

--length=n
Sets the listing page length.
Length zero means an unpaged listing. The default is 66 lines.
Related references
11.40 --list=file on page 11-268.

11.38

--li
A synonym for the --littleend command-line option.
Related references
11.42 --littleend on page 11-270.
11.7 --bigend on page 11-233.

11.39

--library_type=lib
Enables the selected library to be used at link time.
Syntax
--library_type=lib

Where lib is one of:
standardlib

Specifies that the full ARM runtime libraries are selected at link time. This is the default.
microlib

Specifies that the C micro-library (microlib) is selected at link time.
Note
• This option can be used with the compiler, assembler, or linker when use of the libraries requires
more specialized optimizations.
• This option can be overridden at link time by providing it to the linker.
• microlib is not supported for AArch64 state.

Related information
Building an application with microlib.

11.40

--list=file
Instructs the assembler to output a detailed listing of the assembly language produced by the assembler to
a file.
If - is given as file, the listing is sent to stdout.
Use the following command-line options to control the behavior of --list:
• --no_terse.
• --width.
• --length.
• --xref.
Related references
11.50 --no_terse on page 11-278.
11.66 --width=n on page 11-294.
11.37 --length=n on page 11-265.
11.67 --xref on page 11-295.
21.55 OPT on page 21-1702.

11.41

--list=
Instructs the assembler to send the detailed assembly language listing to inputfile.lst.
Note
You can use --list without the equals sign and filename to send the output to inputfile.lst.
However, this syntax is deprecated and the assembler issues a warning. This syntax is to be removed in a
later release. Use --list= instead.

Related references
11.40 --list=file on page 11-268.

11.42

--littleend
Generates code suitable for an ARM processor using little-endian memory access.
Related references
11.7 --bigend on page 11-233.
11.38 --li on page 11-266.

11.43

-m
Instructs the assembler to write source file dependency lists to stdout.
Related references
11.45 --md on page 11-273.

11.44

--maxcache=n
Sets the maximum source cache size in bytes.
The default is 8MB. armasm gives a warning if the size is less than 8MB.

11.45

--md
Creates makefile dependency lists.
This option instructs the assembler to write source file dependency lists to inputfile.d.
Related references
11.43 -m on page 11-271.

11.46

--no_code_gen
Instructs the assembler to exit after pass 1, generating no object file. This option is useful if you only
want to check the syntax of the source code or directives.

11.47

--no_esc
Instructs the assembler to ignore C-style escaped special characters, such as \n and \t.

11.48

--no_hide_all
Gives all exported and imported global symbols STV_DEFAULT visibility in ELF rather than STV_HIDDEN,
unless overridden using source directives.
You can use the following directives to specify an attribute that overrides the implicit symbol visibility:
• EXPORT.
• EXTERN.
• GLOBAL.
• IMPORT.
Related references
21.27 EXPORT or GLOBAL on page 21-1669.
21.45 IMPORT and EXTERN on page 21-1689.

11.49

--no_regs
Instructs armasm not to predefine register names.
Note
This option is deprecated. In AArch32 state, use --regnames=none instead.

Related references
11.56 --regnames on page 11-284.

11.50

--no_terse
Instructs the assembler to show in the list file the lines of assembly code that it has skipped because of
conditional assembly.
If you do not specify this option, the assembler does not output the skipped assembly code to the list file.
This option turns off the terse flag. By default the terse flag is on.
Related references
11.40 --list=file on page 11-268.

11.51

--no_warn
Turns off warning messages.
Related references
11.21 --diag_warning=tag[,tag,…] on page 11-249.

11.52

-o filename
Specifies the name of the output file.
If this option is not used, the assembler creates an object filename in the form inputfilename.o. This
option is case-sensitive.

11.53

--pd
A synonym for the --predefine command-line option.
Related references
11.54 --predefine "directive" on page 11-282.

11.54

--predefine "directive"
Instructs armasm to pre-execute one of the SETA, SETL, or SETS directives.
You must enclose directive in quotes, for example:
armasm --cpu=8-A.64 --predefine "VariableName SETA 20" inputfile.s
armasm also executes a corresponding GBLL, GBLS, or GBLA directive to define the variable before setting
its value.
The variable name is case-sensitive. The variables defined using the command line are global to armasm
source files specified on the command line.
Considerations when using --predefine
Be aware of the following:
• The command-line interface of your system might require you to enter special character
combinations, such as \", to include strings in directive. Alternatively, you can use --via file to
include a --predefine argument. The command-line interface does not alter arguments from --via
files.
• --predefine is not equivalent to the compiler option -Dname. --predefine defines a global variable
whereas -Dname defines a macro that the C preprocessor expands.
Although you can use predefined global variables in combination with assembly control directives,
for example IF and ELSE to control conditional assembly, they are not intended to provide the same
functionality as the C preprocessor in armasm. If you require this functionality, ARM recommends
you use the compiler to pre-process your assembly code.
Related references
11.53 --pd on page 11-281.
21.42 GBLA, GBLL, and GBLS on page 21-1685.
21.44 IF, ELSE, ENDIF, and ELIF on page 21-1687.
21.63 SETA, SETL, and SETS on page 21-1712.

11.55

--reduce_paths, --no_reduce_paths
Enables or disables the elimination of redundant path name information in file paths.
Windows systems impose a 260 character limit on file paths. Where relative pathnames exist whose
absolute names expand to longer than 260 characters, you can use the --reduce_paths option to reduce
absolute pathname length by matching up directories with corresponding instances of .. and eliminating
the directory/.. sequences in pairs.
--no_reduce_paths is the default.
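The pairwise elimination of directory/.. sequences can be sketched in Python as follows. This is an illustrative model of the idea only, not armasm's algorithm: the function name reduce_path and the sample paths are hypothetical, and the sketch deliberately leaves leading .. components, drive prefixes, and empty components untouched.

```python
def reduce_path(path: str, sep: str = "\\") -> str:
    """Remove each 'directory/..' pair from a path (illustrative only)."""
    reduced = []
    for part in path.split(sep):
        can_cancel = (
            part == ".."
            and reduced
            and reduced[-1] not in ("..", "")   # keep leading '..' and root markers
            and not reduced[-1].endswith(":")   # never cancel a drive prefix
        )
        if can_cancel:
            reduced.pop()        # 'directory' and the following '..' cancel out
        else:
            reduced.append(part)
    return sep.join(reduced)


print(reduce_path("..\\..\\src\\asm\\..\\inc\\defs.inc"))
print(reduce_path("a\\b\\..\\..\\c"))
```

Each cancelled pair shortens the path that results when the relative name is expanded to an absolute one, which is how the option helps to stay within the path length limit.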

Note
ARM recommends that you avoid using long and deeply nested file paths, in preference to minimizing
path lengths using the --reduce_paths option.
Note
This option is valid for 32-bit Windows systems only.

11.56

--regnames
Controls the predefinition of register names.
Note
Not supported for AArch64 state.

Syntax
--regnames=option

Where option is one of the following:
none
Instructs armasm not to predefine register names.
callstd
Defines additional register names based on the AAPCS variant that you are using, as specified
by the --apcs option.
all
Defines all AAPCS registers regardless of the value of --apcs.
Related references
11.49 --no_regs on page 11-277.
3.7 Predeclared core register names in AArch32 state on page 3-71.
3.8 Predeclared extension register names in AArch32 state on page 3-72.
11.56 --regnames on page 11-284.
11.3 --apcs=qualifier…qualifier on page 11-228.

11.57

--report-if-not-wysiwyg
Instructs armasm to report when it outputs an encoding that was not directly requested in the source code.
This can happen when armasm:
• Uses a pseudo-instruction that is not available in other assemblers, for example MOV32.
• Outputs an encoding that does not directly match the instruction mnemonic, for example if the
assembler outputs the MVN encoding when assembling the MOV instruction.
• Inserts additional instructions where necessary for instruction syntax semantics, for example armasm
can insert a missing IT instruction before a conditional T32 instruction.
Note
Not supported for AArch64 state.

11.58

--show_cmdline
Outputs the command line used by the assembler.
Usage
Shows the command line after processing by the assembler, and can be useful to check:
• The command line a build system is using.
• How the assembler is interpreting the supplied command line, for example, the ordering of
command-line options.
The commands are shown normalized, and the contents of any via files are expanded.
The output is sent to the standard error stream (stderr).
Related references
11.64 --via=filename on page 11-292.

11.59

--thumb
Instructs armasm to interpret instructions as T32 instructions, using UAL syntax. This is equivalent to a
THUMB directive at the start of the source file.
Note
Not supported for AArch64 state.

Related references
11.4 --arm on page 11-230.
21.65 THUMB directive on page 21-1715.

11.60

--unaligned_access, --no_unaligned_access
Enables or disables unaligned accesses to data on ARM architecture-based processors.
These options instruct the assembler to set an attribute in the object file to enable or disable the use of
unaligned accesses.

11.61

--unsafe
Enables instructions for other architectures to be assembled without error.
Note
Not supported for AArch64 state.
It downgrades error messages to corresponding warning messages. It also suppresses warnings about
operator precedence.
Related concepts
12.20 Binary operators on page 12-317.
Related references
11.17 --diag_error=tag[,tag,…] on page 11-245.
11.21 --diag_warning=tag[,tag,…] on page 11-249.

11.62

--untyped_local_labels
Causes armasm not to set the T32 bit for the address of a numeric local label referenced in an LDR
pseudo-instruction.
Note
Not supported for AArch64 state.
When this option is not used, if you reference a numeric local label in an LDR pseudo-instruction, and the
label is in T32 code, then armasm sets the T32 bit (bit 0) of the address. You can then use the address as
the target for a BX or BLX instruction.
If you require the actual address of the numeric local label, without the T32 bit set, then use this option.
Note
When using this option, if you use the address in a branch (register) instruction, armasm treats it as an
A32 code address. The branch therefore arrives in A32 state, and the code at that address is interpreted
as A32 instructions.

Example
THUMB
...
1
...
LDR r0,=%B1 ; r0 contains the address of numeric local label "1",
; T32 bit is not set if --untyped_local_labels was used
...
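The effect on the loaded value comes down to whether bit 0 of the address is set. The following Python sketch models that arithmetic under the rules stated in this section. The function name ldr_label_value and the sample address are hypothetical, and the sketch is not a description of armasm internals.

```python
def ldr_label_value(address: int, in_t32_code: bool, untyped_local_labels: bool) -> int:
    """Value loaded by an LDR pseudo-instruction for a numeric local label."""
    if in_t32_code and not untyped_local_labels:
        return address | 1   # T32 bit (bit 0) set, so BX or BLX stays in T32 state
    return address           # actual address, without the T32 bit


label_address = 0x8000  # made-up address of numeric local label "1"
print(hex(ldr_label_value(label_address, in_t32_code=True, untyped_local_labels=False)))
print(hex(ldr_label_value(label_address, in_t32_code=True, untyped_local_labels=True)))
```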

Related concepts
12.10 Numeric local labels on page 12-307.
Related references
13.54 LDR pseudo-instruction on page 13-417.
13.15 B on page 13-359.

11.63

--version_number
Displays the version of armasm you are using.
Usage
The assembler displays the version number in the format Mmmuuxx, where:
• M is the major version number, 6.
• mm is the minor version number.
• uu is the update number.
• xx is reserved for ARM internal use. You can ignore this for the purposes of checking whether the
current release is a specific version or within a range of versions.
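A build script that needs to test for a release can split the Mmmuuxx value into its fields. The Python sketch below shows one way to do this. The function name parse_version_number and the sample value 6060001 are made up for illustration, and the sketch assumes the tool output has already been captured as a seven-digit string.

```python
def parse_version_number(text: str) -> dict:
    """Split a version string in the Mmmuuxx format into its fields."""
    digits = text.strip()
    if len(digits) != 7 or not digits.isdigit():
        raise ValueError("expected seven digits in the form Mmmuuxx")
    return {
        "major": int(digits[0]),      # M
        "minor": int(digits[1:3]),    # mm
        "update": int(digits[3:5]),   # uu
        "internal": digits[5:7],      # xx, reserved for ARM internal use
    }


fields = parse_version_number("6060001")  # made-up sample value
release = (fields["major"], fields["minor"], fields["update"])
print(fields)
print(release >= (6, 5, 0))  # tuple comparison checks a minimum release
```

Comparing the (major, minor, update) tuple leaves out the xx field, which can be ignored when checking for a specific version or a range of versions.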

11.64

--via=filename
Reads an additional list of input filenames and assembler options from filename.
Syntax
--via=filename

Where filename is the name of a via file containing options to be included on the command line.
Usage
You can enter multiple --via options on the assembler command line. The --via options can also be
included within a via file.
Related concepts
22.1 Overview of via files on page 22-1720.
Related references
22.2 Via file syntax rules on page 22-1721.

11.65

--vsn
Displays the version information and the license details.
Note
--vsn is intended to report the version information for manual inspection. The Component line indicates
the release of ARM Compiler you are using. If you need to access the version in other tools or scripts, for
example in build scripts, use the output from --version_number.

Example
> armasm --vsn
Product: ARM Compiler N.n
Component: ARM Compiler N.n
Tool: armasm [tool_id]

license_type

Software supplied by: ARM Limited

11.66 --width=n
Sets the listing page width.
The default is 79 characters.
Related references
11.40 --list=file on page 11-268.

11.67 --xref
Instructs the assembler to list cross-referencing information on symbols, including where they were
defined and where they were used, both inside and outside macros.
The default is off.
Related references
11.40 --list=file on page 11-268.


Chapter 12
Symbols, Literals, Expressions, and Operators

Describes how you can use symbols to represent variables, addresses, and constants in code, and how
you can combine these with operators to create numeric or string expressions.
It contains the following sections:
• 12.1 Symbol naming rules on page 12-298.
• 12.2 Variables on page 12-299.
• 12.3 Numeric constants on page 12-300.
• 12.4 Assembly time substitution of variables on page 12-301.
• 12.5 Register-relative and PC-relative expressions on page 12-302.
• 12.6 Labels on page 12-303.
• 12.7 Labels for PC-relative addresses on page 12-304.
• 12.8 Labels for register-relative addresses on page 12-305.
• 12.9 Labels for absolute addresses on page 12-306.
• 12.10 Numeric local labels on page 12-307.
• 12.11 Syntax of numeric local labels on page 12-308.
• 12.12 String expressions on page 12-309.
• 12.13 String literals on page 12-310.
• 12.14 Numeric expressions on page 12-311.
• 12.15 Syntax of numeric literals on page 12-312.
• 12.16 Syntax of floating-point literals on page 12-313.
• 12.17 Logical expressions on page 12-314.
• 12.18 Logical literals on page 12-315.
• 12.19 Unary operators on page 12-316.
• 12.20 Binary operators on page 12-317.
• 12.21 Multiplicative operators on page 12-318.
• 12.22 String manipulation operators on page 12-319.
• 12.23 Shift operators on page 12-320.
• 12.24 Addition, subtraction, and logical operators on page 12-321.
• 12.25 Relational operators on page 12-322.
• 12.26 Boolean operators on page 12-323.
• 12.27 Operator precedence on page 12-324.
• 12.28 Difference between operator precedence in assembly language and C on page 12-325.

12.1 Symbol naming rules
You must follow some rules when naming symbols in assembly language source code.
The following rules apply:
• Symbol names must be unique within their scope.
• You can use uppercase letters, lowercase letters, numeric characters, or the underscore character in
symbol names. Symbol names are case-sensitive, and all characters in the symbol name are
significant.
• Do not use numeric characters for the first character of symbol names, except in numeric local labels.
• Symbols must not use the same name as built-in variable names or predefined symbol names.
• If you use the same name as an instruction mnemonic or directive, use double bars to delimit the
symbol name. For example:
||ASSERT||

The bars are not part of the symbol.
• You must not use the symbols |$a|, |$t|, or |$d| as program labels. These are mapping symbols
that mark the beginning of A32, T32, and A64 code, and data within the object file. You must not use
|$x| in A64 code.
• Symbols beginning with the characters $v are mapping symbols that relate to floating-point code.
ARM recommends you avoid using symbols beginning with $v in your source code.

If you have to use a wider range of characters in symbols, for example, when working with compilers,
use single bars to delimit the symbol name. For example:
|.text|

The bars are not part of the symbol. You cannot use bars, semicolons, or newlines within the bars.
Related concepts
12.10 Numeric local labels on page 12-307.
Related references
3.7 Predeclared core register names in AArch32 state on page 3-71.
3.8 Predeclared extension register names in AArch32 state on page 3-72.
8.4 Built-in variables and constants on page 8-163.

12.2 Variables
You can declare numeric, logical, or string variables using assembler directives.
The value of a variable can be changed as assembly proceeds. Variables are local to the assembler. This
means that in the generated code or data, every instance of the variable has a fixed value.
The type of a variable cannot be changed. Variables are one of the following types:
• Numeric.
• Logical.
• String.
The range of possible values of a numeric variable is the same as the range of possible values of a
numeric constant or numeric expression.
The possible values of a logical variable are {TRUE} or {FALSE}.
The range of possible values of a string variable is the same as the range of values of a string expression.
Use the GBLA, GBLL, GBLS, LCLA, LCLL, and LCLS directives to declare symbols representing variables, and
assign values to them using the SETA, SETL, and SETS directives.
Example
a       SETA 100
L1      MOV R1, #(a*5)  ; In the object file, this is MOV R1, #500
a       SETA 200        ; Value of 'a' is 200 only after this point.
                        ; The previous instruction is always MOV R1, #500
        …
        BNE L1          ; When the processor branches to L1, it executes
                        ; MOV R1, #500

Related concepts
12.14 Numeric expressions on page 12-311.
12.12 String expressions on page 12-309.
12.3 Numeric constants on page 12-300.
12.17 Logical expressions on page 12-314.
Related references
21.42 GBLA, GBLL, and GBLS on page 21-1685.
21.49 LCLA, LCLL, and LCLS on page 21-1694.
21.63 SETA, SETL, and SETS on page 21-1712.

12.3 Numeric constants
You can define 32-bit numeric constants using the EQU assembler directive.
Numeric constants are 32-bit integers in A32 and T32 code. You can set them using unsigned numbers in
the range 0 to 2^32-1, or signed numbers in the range -2^31 to 2^31-1. However, the assembler makes no
distinction between -n and 2^32-n.
In A64 code, numeric constants are 64-bit integers. You can set them using unsigned numbers in the
range 0 to 2^64-1, or signed numbers in the range -2^63 to 2^63-1. However, the assembler makes no
distinction between -n and 2^64-n.
Relational operators such as >= use the unsigned interpretation. This means that 0 > -1 is {FALSE}.
Use the EQU directive to define constants. You cannot change the value of a numeric constant after you
define it. You can construct expressions by combining numeric constants and binary operators.
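
The unsigned interpretation can be checked with ordinary modular arithmetic. The following Python sketch is a model of the rule stated above, not of armasm itself; as_unsigned and greater_than are hypothetical helper names, and the bits parameter selects 32 for A32 and T32 code or 64 for A64 code.

```python
def as_unsigned(value, bits=32):
    """Reduce value modulo 2^bits, so that -n and 2^bits-n coincide."""
    return value & ((1 << bits) - 1)


def greater_than(a, b, bits=32):
    """Model of the > relational operator using the unsigned interpretation."""
    return as_unsigned(a, bits) > as_unsigned(b, bits)


print(hex(as_unsigned(-1)))     # -1 and 2^32-1 are the same constant
print(greater_than(0, -1))      # the comparison 0 > -1
```

Because -1 is indistinguishable from the largest unsigned value, 0 cannot be greater than it, which is why 0 > -1 evaluates to {FALSE}.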
Related concepts
12.14 Numeric expressions on page 12-311.
Related references
12.15 Syntax of numeric literals on page 12-312.
21.26 EQU on page 21-1668.

12.4 Assembly time substitution of variables
You can assign a string variable to all or part of a line of assembly language code. A string variable can
contain numeric and logical variables.
Use the variable with a $ prefix in the places where the value is to be substituted for the variable. The
dollar character instructs armasm to substitute the string into the source code line before checking the
syntax of the line. armasm faults if the substituted line is larger than the source line limit.
Numeric and logical variables can also be substituted. The current value of the variable is converted to a
hexadecimal string (or T or F for logical variables) before substitution.
Use a dot to mark the end of the variable name if the following character would be permissible in a
symbol name. You must set the contents of the variable before you can use it.
If you require a $ that you do not want to be substituted, use $$. This is converted to a single $.
You can include a variable with a $ prefix in a string. Substitution occurs in the same way as anywhere
else.
Substitution does not occur within vertical bars, except that vertical bars within double quotes do not
affect substitution.
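
The substitution rules can be approximated by a short text-processing routine. The following Python sketch is a simplified model, not the armasm implementation: substitute is a hypothetical helper, the variables dictionary stands in for the assembler's symbol table, numeric values are rendered as 8 uppercase hexadecimal digits as in A32 and T32 code, a dot directly after a variable name is assumed to be consumed as the terminator, and the vertical-bar rule is deliberately not modelled.

```python
import re

# "$$" (escaped dollar) or "$name" with an optional terminating dot.
_TOKEN = re.compile(r"\$\$|\$([A-Za-z_][A-Za-z0-9_]*)\.?")


def substitute(line, variables):
    """Expand $-prefixed variables in one source line (simplified model)."""
    def replace(match):
        name = match.group(1)
        if name is None:                 # matched "$$"
            return "$"
        value = variables[name]          # KeyError if the variable is not set
        if isinstance(value, bool):      # logical variable
            return "T" if value else "F"
        if isinstance(value, int):       # numeric variable
            return format(value & 0xFFFFFFFF, "08X")
        return value                     # string variable
    return _TOKEN.sub(replace, line)


variables = {"add4ff": "ADD r4,r4,#0xFF", "count": 14, "s2": "abc"}
print(substitute("$add4ff.00", variables))
print(substitute("a$$b$count", variables))
```

The lookup raises an error for an unset name, mirroring the requirement that you set the contents of a variable before you use it.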
Example
        ; straightforward substitution
        GBLS    add4ff
        ;
add4ff  SETS    "ADD r4,r4,#0xFF"   ; set up add4ff
        $add4ff.00                  ; invoke add4ff
        ; this produces
        ADD r4,r4,#0xFF00
        ; elaborate substitution
        GBLS    s1
        GBLS    s2
        GBLS    fixup
        GBLA    count
        ;
count   SETA    14
s1      SETS    "a$$b$count"        ; s1 now has value a$b0000000E
s2      SETS    "abc"
fixup   SETS    "|xy$s2.z|"         ; fixup now has value |xyabcz|
|C$$code| MOV   r4,#16              ; but the label here is C$$code

Related references
5.1 Syntax of source lines in assembly language on page 5-94.
12.1 Symbol naming rules on page 12-298.

12.5 Register-relative and PC-relative expressions
The assembler supports PC-relative and register-relative expressions.
A register-relative expression evaluates to a named register combined with a numeric expression.
You write a PC-relative expression in source code as a label or the PC, optionally combined with a
numeric expression. Some instructions can also accept PC-relative expressions in the form [PC,
#number].
If you specify a label, the assembler calculates the offset from the PC value of the current instruction to
the address of the label. The assembler encodes the offset in the instruction. If the offset is too large, the
assembler produces an error. The offset is either added to or subtracted from the PC value to form the
required address.
ARM recommends you write PC-relative expressions using labels rather than the PC because the value
of the PC depends on the instruction set.
Note
• In A32 code, the value of the PC is the address of the current instruction plus 8 bytes.
• In T32 code:
  - For B, BL, CBNZ, and CBZ instructions, the value of the PC is the address of the current instruction
    plus 4 bytes.
  - For all other instructions that use labels, the value of the PC is the address of the current
    instruction plus 4 bytes, with bit[1] of the result cleared to 0 to make it word-aligned.
• In A64 code, the value of the PC is the address of the current instruction.
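
The rules in this note can be written as a small function. The following Python sketch only restates the note in executable form; pc_value is a hypothetical helper, and its branch argument stands for the B, BL, CBNZ, and CBZ case in T32 code.

```python
def pc_value(address, state, branch=False):
    """PC value seen by the instruction at address, per instruction set."""
    if state == "A32":
        return address + 8
    if state == "T32":
        value = address + 4
        # Non-branch instructions that use labels see a word-aligned PC.
        return value if branch else value & ~0b10
    if state == "A64":
        return address
    raise ValueError("unknown instruction set state: %r" % state)


for state in ("A32", "T32", "A64"):
    print(state, hex(pc_value(0x8002 if state == "T32" else 0x8000, state)))
```

The three different results for the same kind of reference illustrate why labels are preferred over explicit use of the PC.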

Example

        LDR     r4,=data+4*n    ; n is an assembly-time variable
        ; code
        MOV     pc,lr
data    DCD     value_0
        ; n-1 DCD directives
        DCD     value_n         ; data+4*n points here
        ; more DCD directives

Related concepts
12.6 Labels on page 12-303.
Related references
21.52 MAP on page 21-1699.

12.6 Labels
A label is a symbol that represents the memory address of an instruction or data.
The address can be PC-relative, register-relative, or absolute. Labels are local to the source file unless
you make them global using the EXPORT directive.
The address given by a label is calculated during assembly. armasm calculates the address of a label
relative to the origin of the section where the label is defined. A reference to a label within the same
section can use the PC plus or minus an offset. This is called PC-relative addressing.
Addresses of labels in other sections are calculated at link time, when the linker has allocated specific
locations in memory for each section.
Related concepts
12.7 Labels for PC-relative addresses on page 12-304.
12.8 Labels for register-relative addresses on page 12-305.
12.9 Labels for absolute addresses on page 12-306.
Related references
5.1 Syntax of source lines in assembly language on page 5-94.
21.27 EXPORT or GLOBAL on page 21-1669.

12.7 Labels for PC-relative addresses
A label can represent the PC value plus or minus the offset from the PC to the label. Use these labels as
targets for branch instructions, or to access small items of data embedded in code sections.
You can define PC-relative labels using a label on an instruction or on one of the data definition
directives.
You can also use the section name of an AREA directive as a label for PC-relative addresses. In this case
the label points to the first byte of the specified AREA. ARM does not recommend using AREA names as
branch targets because when branching from A32 to T32 state or T32 to A32 state in this way, the
processor does not change the state properly.
Related references
21.6 AREA on page 21-1646.
21.15 DCB on page 21-1657.
21.16 DCD and DCDU on page 21-1658.
21.18 DCFD and DCFDU on page 21-1660.
21.19 DCFS and DCFSU on page 21-1661.
21.20 DCI on page 21-1662.
21.21 DCQ and DCQU on page 21-1663.
21.22 DCW and DCWU on page 21-1664.

12.8 Labels for register-relative addresses
A label can represent a named register plus a numeric value. You define these labels in a storage map.
They are most commonly used to access data in data sections.
You can use the EQU directive to define additional register-relative labels, based on labels defined in
storage maps.
Note
Register-relative addresses are not supported in A64 code.

Example of storage map definitions
        MAP     0,r9
        MAP     0xff,r9

Related references
21.17 DCDO on page 21-1659.
21.26 EQU on page 21-1668.
21.52 MAP on page 21-1699.
21.64 SPACE or FILL on page 21-1714.

12.9 Labels for absolute addresses
A label can represent the absolute address of code or data.
These labels are numeric constants. In A32 and T32 code they are integers in the range 0 to 2^32-1. In A64
code, they are integers in the range 0 to 2^64-1. They address the memory directly. You can use labels to
represent absolute addresses using the EQU directive. To ensure that the labels are used correctly when
referenced in code, you can specify the absolute address as:
• A32 code with the ARM directive.
• T32 code with the THUMB directive.
• Data.
Example of defining labels for absolute address
abc EQU 2           ; assigns the value 2 to the symbol abc
xyz EQU label+8     ; assigns the address (label+8) to the symbol xyz
fiq EQU 0x1C, ARM   ; assigns the absolute address 0x1C to the symbol fiq
                    ; and marks it as A32 code

Related concepts
12.6 Labels on page 12-303.
12.7 Labels for PC-relative addresses on page 12-304.
12.8 Labels for register-relative addresses on page 12-305.
Related references
21.26 EQU on page 21-1668.

12.10 Numeric local labels
Numeric local labels are a type of label that you refer to by number rather than by name. They are used
in a similar way to PC-relative labels, but their scope is more limited.
A numeric local label is a number in the range 0-99, optionally followed by a name. Unlike other labels,
a numeric local label can be defined many times and the same number can be used for more than one
numeric local label in an area.
Numeric local labels do not appear in the object file. This means that, for example, a debugger cannot set
a breakpoint directly on a numeric local label, like it can for named local labels kept using the KEEP
directive.
A numeric local label can be used in place of symbol in source lines in an assembly language module:
• On its own, that is, where there is no instruction or directive.
• On a line that contains an instruction.
• On a line that contains a code- or data-generating directive.

A numeric local label is generally used where you might use a PC-relative label.
Numeric local labels are typically used for loops and conditional code within a routine, or for small
subroutines that are only used locally. They are particularly useful when you are generating labels in
macros.
The scope of numeric local labels is limited by the AREA directive. Use the ROUT directive to limit the
scope of numeric local labels more tightly. A reference to a numeric local label refers to a matching label
within the same scope. If there is no matching label within the scope in either direction, armasm
generates an error message and the assembly fails.
You can use the same number for more than one numeric local label even within the same scope. By
default, armasm links a numeric local label reference to:
• The most recent numeric local label with the same number, if there is one within the scope.
• The next following numeric local label with the same number, if there is not a preceding one within
the scope.
Use the optional parameters to modify this search pattern if required.
Related concepts
12.6 Labels on page 12-303.
Related references
5.1 Syntax of source lines in assembly language on page 5-94.
12.11 Syntax of numeric local labels on page 12-308.
21.51 MACRO and MEND on page 21-1696.
21.48 KEEP on page 21-1693.
21.62 ROUT on page 21-1711.

12.11 Syntax of numeric local labels
When referring to numeric local labels you can specify how armasm searches for the label.
Syntax
n[routname] ; a numeric local label
%[F|B][A|T]n[routname] ; a reference to a numeric local label

where:
n

is the number of the numeric local label in the range 0-99.
routname

is the name of the current scope.
%

introduces the reference.
F

instructs armasm to search forwards only.
B

instructs armasm to search backwards only.
A

instructs armasm to search all macro levels.
T

instructs armasm to look at this macro level only.
Usage
If neither F nor B is specified, armasm searches backwards first, then forwards.
If neither A nor T is specified, armasm searches all macros from the current level to the top level, but does
not search lower level macros.
If routname is specified in either a label or a reference to a label, armasm checks it against the name of
the nearest preceding ROUT directive. If it does not match, armasm generates an error message and the
assembly fails.
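
The F and B search rules can be modelled as a lookup over the positions of matching labels. The following Python sketch is an illustration under stated assumptions: resolve_local_label is a hypothetical helper, positions are source line numbers within a single scope, a label never shares a line with the reference, and the A and T macro-level options are not modelled.

```python
def resolve_local_label(ref_line, label_lines, direction=""):
    """Return the line of the label that a reference binds to.

    direction is "" (backwards first, then forwards), "F", or "B".
    """
    before = [n for n in label_lines if n < ref_line]
    after = [n for n in label_lines if n > ref_line]
    if direction != "F" and before:
        return max(before)      # most recent preceding label
    if direction != "B" and after:
        return min(after)       # next following label
    raise LookupError("no matching numeric local label in scope")


print(resolve_local_label(10, [5, 20]))        # default search
print(resolve_local_label(10, [5, 20], "F"))   # forwards only
```

The LookupError corresponds to the case where armasm generates an error message because no matching label exists in the permitted direction.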
Related concepts
12.10 Numeric local labels on page 12-307.
Related references
21.62 ROUT on page 21-1711.

12.12 String expressions
String expressions consist of combinations of string literals, string variables, string manipulation
operators, and parentheses.
Characters that cannot be placed in string literals can be placed in string expressions using the :CHR:
unary operator. Any ASCII character from 0 to 255 is permitted.
The value of a string expression cannot exceed 5120 characters in length. It can be of zero length.
Example
improb  SETS    "literal":CC:(strvar2:LEFT:4)
        ; sets the variable improb to the value "literal"
        ; with the left-most four characters of the
        ; contents of string variable strvar2 appended

Related concepts
12.13 String literals on page 12-310.
12.19 Unary operators on page 12-316.
12.2 Variables on page 12-299.
Related references
12.22 String manipulation operators on page 12-319.
21.63 SETA, SETL, and SETS on page 21-1712.

12.13 String literals
String literals consist of a series of characters or spaces contained between double quote characters.
The length of a string literal is restricted by the length of the input line.
To include a double quote character or a dollar character within the string literal, include the character
twice as a pair. For example, you must use $$ if you require a single $ in the string.
C string escape sequences are also enabled and can be used within the string, unless --no_esc is
specified.
Examples
abc     SETS    "this string contains only one "" double quote"
def     SETS    "this string contains only one $$ dollar symbol"

Related references
5.1 Syntax of source lines in assembly language on page 5-94.
11.47 --no_esc on page 11-275.

12.14 Numeric expressions
Numeric expressions consist of combinations of numeric constants, numeric variables, ordinary numeric
literals, binary operators, and parentheses.
Numeric expressions can contain register-relative or program-relative expressions if the overall
expression evaluates to a value that does not include a register or the PC.
Numeric expressions evaluate to 32-bit integers in A32 and T32 code. You can interpret them as
unsigned numbers in the range 0 to 2^32-1, or signed numbers in the range -2^31 to 2^31-1. However, armasm
makes no distinction between -n and 2^32-n. Relational operators such as >= use the unsigned
interpretation. This means that 0 > -1 is {FALSE}.
In A64 code, numeric expressions evaluate to 64-bit integers. You can interpret them as unsigned
numbers in the range 0 to 2^64-1, or signed numbers in the range -2^63 to 2^63-1. However, armasm makes no
distinction between -n and 2^64-n.
Note
armasm does not support 64-bit arithmetic variables. See 21.63 SETA, SETL, and SETS on page 21-1712
(Restrictions) for a workaround.
ARM recommends that you only use armasm for legacy ARM syntax assembly code, and that you use
the armclang assembler and GNU syntax for all new assembly files.

Example
a       SETA    256*256         ; 256*256 is a numeric expression
        MOV     r1,#(a*22)      ; (a*22) is a numeric expression

Related concepts
12.20 Binary operators on page 12-317.
12.2 Variables on page 12-299.
12.3 Numeric constants on page 12-300.
Related references
12.15 Syntax of numeric literals on page 12-312.
21.63 SETA, SETL, and SETS on page 21-1712.

12.15 Syntax of numeric literals
Numeric literals consist of a sequence of characters, or a single character in quotes, evaluating to an
integer.
They can take any of the following forms:
• decimal-digits.
• 0xhexadecimal-digits.
• &hexadecimal-digits.
• n_base-n-digits.
• 'character'.
where:
decimal-digits

Is a sequence of characters using only the digits 0 to 9.
hexadecimal-digits

Is a sequence of characters using only the digits 0 to 9 and the letters A to F or a to f.
n_

Is a single digit between 2 and 9 inclusive, followed by an underscore character.
base-n-digits

Is a sequence of characters using only the digits 0 to (n-1).
character

Is any single character except a single quote. Use the standard C escape character (\') if you
require a single quote. The character must be enclosed within opening and closing single quotes.
In this case, the value of the numeric literal is the numeric code of the character.
You must not use any other characters. The sequence of characters must evaluate to an integer.
In A32/T32 code, the range is 0 to 2^32-1, except in DCQ, DCQU, DCO, and DCOU directives.
In A64 code, the range is 0 to 2^64-1, except in DCO and DCOU directives.
Note
• In the DCQ and DCQU directives, the integer range is 0 to 2^64-1.
• In the DCO and DCOU directives, the integer range is 0 to 2^128-1.
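
The literal forms listed in this section can be evaluated with a short routine. The following Python sketch is a model for checking values by hand, not the armasm parser: parse_numeric_literal is a hypothetical helper, it accepts only the five forms described here, it handles \' as the only escape sequence, and it does not apply the range limits.

```python
_DIGITS = "0123456789abcdef"


def _to_int(digits, base):
    """Convert a digit string in the given base, rejecting other characters."""
    if not digits:
        raise ValueError("no digits")
    value = 0
    for ch in digits.lower():
        digit = _DIGITS.find(ch)
        if digit < 0 or digit >= base:
            raise ValueError("%r is not a base-%d digit" % (ch, base))
        value = value * base + digit
    return value


def parse_numeric_literal(text):
    """Evaluate one numeric literal written in armasm syntax (simplified)."""
    if len(text) >= 3 and text[0] == "'" and text[-1] == "'":
        body = text[1:-1]
        if body == "\\'":                    # escaped single quote
            return ord("'")
        if len(body) == 1 and body != "'":
            return ord(body)
        raise ValueError("unsupported character literal")
    if text.startswith("0x"):
        return _to_int(text[2:], 16)
    if text.startswith("&"):
        return _to_int(text[1:], 16)
    if len(text) > 2 and text[0] in "23456789" and text[1] == "_":
        return _to_int(text[2:], int(text[0]))   # n_base-n-digits
    return _to_int(text, 10)


for literal in ("34906", "0xA10E", "&1000000F", "2_11001010", "8_74007", "'A'"):
    print(literal, parse_numeric_literal(literal))
```

The digit check is done by hand rather than with int(text, base) because the built-in conversion also accepts signs, whitespace, and underscores, which the forms above do not include.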

Examples
a       SETA    34906
addr    DCD     0xA10E
        LDR     r4,=&1000000F
        DCD     2_11001010
c3      SETA    8_74007
        DCQ     0x0123456789abcdef
        LDR     r1,='A'         ; pseudo-instruction loading 65 into r1
        ADD     r3,r2,#'\''     ; add 39 to contents of r2, result to r3

Related concepts
12.3 Numeric constants on page 12-300.

12.16 Syntax of floating-point literals
Floating-point literals consist of a sequence of characters evaluating to a floating-point number.
They can take any of the following forms:
• {-}digitsE{-}digits
• {-}{digits}.digits
• {-}{digits}.digitsE{-}digits
• 0xhexdigits
• &hexdigits
• 0f_hexdigits
• 0d_hexdigits

where:
digits

Are sequences of characters using only the digits 0 to 9. You can write E in uppercase or
lowercase. These forms correspond to normal floating-point notation.
hexdigits

Are sequences of characters using only the digits 0 to 9 and the letters A to F or a to f. These
forms correspond to the internal representation of the numbers in the computer. Use these forms
to enter infinities and NaNs, or if you want to be sure of the exact bit patterns you are using.
The 0x and & forms allow the floating-point bit pattern to be specified by any number of hex digits.
The 0f_ form requires the floating-point bit pattern to be specified by exactly 8 hex digits.
The 0d_ form requires the floating-point bit pattern to be specified by exactly 16 hex digits.
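
To see which value a given bit pattern represents, you can decode it on a host machine. The following Python sketch is an illustration, not part of armasm: decode_fp_bits is a hypothetical helper that handles only the 0f_ and 0d_ forms and assumes the patterns are IEEE 754 single-precision and double-precision encodings.

```python
import math
import string
import struct


def decode_fp_bits(literal):
    """Decode a 0f_ (8 hex digits) or 0d_ (16 hex digits) bit pattern."""
    prefix, hexdigits = literal[:3], literal[3:]
    if not hexdigits or any(ch not in string.hexdigits for ch in hexdigits):
        raise ValueError("expected hexadecimal digits in %r" % literal)
    if prefix == "0f_" and len(hexdigits) == 8:
        return struct.unpack(">f", bytes.fromhex(hexdigits))[0]
    if prefix == "0d_" and len(hexdigits) == 16:
        return struct.unpack(">d", bytes.fromhex(hexdigits))[0]
    raise ValueError("wrong prefix or digit count in %r" % literal)


print(decode_fp_bits("0f_3F800000"))
print(math.isnan(decode_fp_bits("0f_7FC00000")))
print(decode_fp_bits("0d_FFF0000000000000"))
```

The length checks mirror the requirement for exactly 8 or exactly 16 hexadecimal digits.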
The range for half-precision floating-point values is:
• Maximum 65504 (IEEE format) or 131008 (alternative format).
• Minimum 0.00012201070785522461.
The range for single-precision floating-point values is:
• Maximum 3.40282347e+38.
• Minimum 1.17549435e-38.
The range for double-precision floating-point values is:
• Maximum 1.79769313486231571e+308.
• Minimum 2.22507385850720138e-308.
Floating-point numbers are only available if your system has floating-point or Advanced SIMD with
floating-point.
Examples
        DCFD    1E308,-4E-100
        DCFS    1.0
        DCFS    0.02
        DCFD    3.725e15
        DCFS    0x7FC00000          ; Quiet NaN
        DCFD    &FFF0000000000000   ; Minus infinity

Related concepts
12.3 Numeric constants on page 12-300.
Related references
12.15 Syntax of numeric literals on page 12-312.

12.17 Logical expressions
Logical expressions consist of combinations of logical literals ({TRUE} or {FALSE}), logical variables,
Boolean operators, relations, and parentheses.
Relations consist of combinations of variables, literals, constants, or expressions with appropriate
relational operators.
Related references
12.26 Boolean operators on page 12-323.
12.25 Relational operators on page 12-322.

12.18 Logical literals
Logical or Boolean literals can have one of two values, {TRUE} or {FALSE}.
Related concepts
12.13 String literals on page 12-310.
Related references
12.15 Syntax of numeric literals on page 12-312.

12.19 Unary operators
Unary operators return a string, numeric, or logical value. They have higher precedence than other
operators and are evaluated first.
A unary operator precedes its operand. Adjacent operators are evaluated from right to left.
The following table lists the unary operators that return strings:
Table 12-1 Unary operators that return strings

Operator      Usage                   Description
:CHR:         :CHR:A                  Returns the character with ASCII code A.
:LOWERCASE:   :LOWERCASE:string       Returns the given string, with all uppercase characters converted to lowercase.
:REVERSE_CC:  :REVERSE_CC:cond_code   Returns the inverse of the condition code in cond_code, or an error if cond_code does not contain a valid condition code.
:STR:         :STR:A                  In A32 and T32 code, returns an 8-digit hexadecimal string corresponding to a numeric expression, or the string "T" or "F" if used on a logical expression. In A64 code, returns a 16-digit hexadecimal string.
:UPPERCASE:   :UPPERCASE:string       Returns the given string, with all lowercase characters converted to uppercase.

The following table lists the unary operators that return numeric or logical values:
Table 12-2 Unary operators that return numeric or logical values
Operator       Usage                    Description
?              ?A                       Number of bytes of code generated by line defining symbol A.
+ and -        +A, -A                   Unary plus. Unary minus. + and - can act on numeric and PC-relative expressions.
:BASE:         :BASE:A                  If A is a PC-relative or register-relative expression, :BASE: returns the number of its register component. :BASE: is most useful in macros.
:CC_ENCODING:  :CC_ENCODING:cond_code   Returns the numeric value of the condition code in cond_code, or an error if cond_code does not contain a valid condition code.
:DEF:          :DEF:A                   {TRUE} if A is defined, otherwise {FALSE}.
:INDEX:        :INDEX:A                 If A is a register-relative expression, :INDEX: returns the offset from that base register. :INDEX: is most useful in macros.
:LEN:          :LEN:A                   Length of string A.
:LNOT:         :LNOT:A                  Logical complement of A.
:NOT:          :NOT:A                   Bitwise complement of A (~ is an alias, for example ~A).
:RCONST:       :RCONST:Rn               Number of register. In A32/T32 code, 0-15 corresponds to R0-R15. In A64 code, 0-30 corresponds to W0-W30 or X0-X30.

Related concepts
12.20 Binary operators on page 12-317.

12.20 Binary operators
You write binary operators between the pair of sub-expressions they operate on. They have lower
precedence than unary operators.
Note
The order of precedence is not the same as in C.

Related concepts
12.28 Difference between operator precedence in assembly language and C on page 12-325.
Related references
12.21 Multiplicative operators on page 12-318.
12.22 String manipulation operators on page 12-319.
12.23 Shift operators on page 12-320.
12.24 Addition, subtraction, and logical operators on page 12-321.
12.25 Relational operators on page 12-322.
12.26 Boolean operators on page 12-323.

12.21 Multiplicative operators
Multiplicative operators have the highest precedence of all binary operators. They act only on numeric
expressions.
The following table shows the multiplicative operators:
Table 12-3 Multiplicative operators
Operator  Alias  Usage    Explanation
*                A*B      Multiply
/                A/B      Divide
:MOD:     %      A:MOD:B  A modulo B

You can use the :MOD: operator on PC-relative expressions to ensure code is aligned correctly. These
alignment checks have the form PC-relative:MOD:Constant. For example:

        AREA x,CODE
        ASSERT ({PC}:MOD:4) == 0
        DCB 1
y       DCB 2
        ASSERT (y:MOD:4) == 1
        ASSERT ({PC}:MOD:4) == 2
        END

Related concepts
12.20 Binary operators on page 12-317.
12.5 Register-relative and PC-relative expressions on page 12-302.
12.14 Numeric expressions on page 12-311.
Related references
12.15 Syntax of numeric literals on page 12-312.

12.22 String manipulation operators
You can use string manipulation operators to concatenate two strings, or to extract a substring.
The following table shows the string manipulation operators. In CC, both A and B must be strings. In the
slicing operators LEFT and RIGHT:
• A must be a string.
• B must be a numeric expression.
Table 12-4 String manipulation operators
Operator  Usage      Explanation
:CC:      A:CC:B     B concatenated onto the end of A
:LEFT:    A:LEFT:B   The left-most B characters of A
:RIGHT:   A:RIGHT:B  The right-most B characters of A
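
The three operators correspond to familiar string operations. The following Python sketch models them for checking expected results by hand; cc, left, and right are hypothetical helper names. The source does not say what armasm does when B is negative or larger than the length of A, so the sketch rejects those cases rather than guessing.

```python
def _check(a, b):
    """Reject counts whose behaviour is not described in this section."""
    if not 0 <= b <= len(a):
        raise ValueError("count %d is outside 0..%d" % (b, len(a)))


def cc(a, b):
    """A:CC:B, B concatenated onto the end of A."""
    return a + b


def left(a, b):
    """A:LEFT:B, the left-most B characters of A."""
    _check(a, b)
    return a[:b]


def right(a, b):
    """A:RIGHT:B, the right-most B characters of A."""
    _check(a, b)
    return a[len(a) - b:]    # a[-b:] would return all of A when b is 0


# Models "literal":CC:(strvar2:LEFT:4) with strvar2 set to "abcdefgh".
print(cc("literal", left("abcdefgh", 4)))
```

The explicit start index in right avoids the slicing pitfall noted in the comment.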

Related concepts
12.12 String expressions on page 12-309.
12.14 Numeric expressions on page 12-311.

12.23 Shift operators
Shift operators act on numeric expressions, by shifting or rotating the first operand by the amount
specified by the second.
The following table shows the shift operators:
Table 12-5 Shift operators
Operator  Alias  Usage    Explanation
:ROL:            A:ROL:B  Rotate A left by B bits
:ROR:            A:ROR:B  Rotate A right by B bits
:SHL:     <<     A:SHL:B  Shift A left by B bits
:SHR:     >>     A:SHR:B  Shift A right by B bits

Note
SHR is a logical shift and does not propagate the sign bit.
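
The four operators can be modelled on fixed-width unsigned values. The following Python sketch is an illustration for A32 and T32 code, where expressions are 32 bits wide: shl, shr, rol, and ror are hypothetical helper names, the bits parameter is an assumption added so that the same code can model 64-bit A64 expressions, and rotate amounts are reduced modulo the width as a modelling choice.

```python
def _mask(bits):
    return (1 << bits) - 1


def shl(a, b, bits=32):
    """A:SHL:B, bits shifted out at the top are discarded."""
    return (a << b) & _mask(bits)


def shr(a, b, bits=32):
    """A:SHR:B, a logical shift that does not propagate the sign bit."""
    return (a & _mask(bits)) >> b


def rol(a, b, bits=32):
    """A:ROL:B, rotate left within the given width."""
    a &= _mask(bits)
    b %= bits
    return ((a << b) | (a >> (bits - b))) & _mask(bits)


def ror(a, b, bits=32):
    """A:ROR:B, rotate right within the given width."""
    a &= _mask(bits)
    b %= bits
    return ((a >> b) | (a << (bits - b))) & _mask(bits)


print(hex(shr(0x80000000, 4)))   # zeros enter at the top
print(hex(rol(0x80000001, 1)))
```

The first result shows the logical shift described in the note: zeros, not copies of the sign bit, fill the vacated positions.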

Related concepts
12.20 Binary operators on page 12-317.

12.24 Addition, subtraction, and logical operators
Addition, subtraction, and logical operators act on numeric expressions.
Logical operations are performed bitwise, that is, independently on each bit of the operands to produce
the result.
The following table shows the addition, subtraction, and logical operators:
Table 12-6 Addition, subtraction, and logical operators
Operator   Alias   Usage     Explanation
+                  A+B       Add A to B
-                  A-B       Subtract B from A
:AND:              A:AND:B   Bitwise AND of A and B
:EOR:              A:EOR:B   Bitwise Exclusive OR of A and B
:OR:       |       A:OR:B    Bitwise OR of A and B


The use of | as an alias for :OR: is deprecated.
Related concepts
12.20 Binary operators on page 12-317.

12.25 Relational operators
Relational operators act on two operands of the same type to produce a logical value.
The operands can be one of:
• Numeric.
• PC-relative.
• Register-relative.
• Strings.
Strings are sorted using ASCII ordering. String A is less than string B if it is a leading substring of string
B, or if the left-most character in which the two strings differ is less in string A than in string B.
Arithmetic values are unsigned, so the value of 0>-1 is {FALSE}.
The following table shows the relational operators:
Table 12-7 Relational operators
Operator   Alias   Usage   Explanation
=                  A=B     A equal to B
>                  A>B     A greater than B
>=                 A>=B    A greater than or equal to B
<                  A<B     A less than B
<=                 A<=B    A less than or equal to B
/=         <> !=   A/=B    A not equal to B
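
The comparison rules described in this section can be sketched in Python. This is an illustrative model, not assembler behavior taken from a specification: it assumes 32-bit numeric values, and unsigned_gt and string_lt are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF  # assumption: numeric values are 32 bits wide


def unsigned_gt(a, b):
    """Model of A>B on numeric operands, which are treated as unsigned."""
    return (a & MASK32) > (b & MASK32)


def string_lt(a, b):
    """Model of A<B on strings: a leading substring sorts first, otherwise
    the left-most differing character decides (ASCII ordering)."""
    return a < b
```

Here unsigned_gt(0, -1) is False because -1 is treated as 0xFFFFFFFF, which matches the statement that 0>-1 is {FALSE}.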

Related concepts
12.20 Binary operators on page 12-317.

12.26 Boolean operators
Boolean operators perform standard logical operations on their operands. They have the lowest
precedence of all operators.
In all three cases, both A and B must be expressions that evaluate to either {TRUE} or {FALSE}.
The following table shows the Boolean operators:
Table 12-8 Boolean operators
Operator   Alias   Usage      Explanation
:LAND:             A:LAND:B   Logical AND of A and B
:LEOR:             A:LEOR:B   Logical Exclusive OR of A and B
:LOR:              A:LOR:B    Logical OR of A and B

Related concepts
12.20 Binary operators on page 12-317.

12.27 Operator precedence
armasm includes an extensive set of operators for use in expressions. It evaluates them using a strict order
of precedence.
Many of the operators resemble their counterparts in high-level languages such as C.
armasm evaluates operators in the following order:
1. Expressions in parentheses are evaluated first.
2. Operators are applied in precedence order.
3. Adjacent unary operators are evaluated from right to left.
4. Binary operators of equal precedence are evaluated from left to right.

Related concepts
12.19 Unary operators on page 12-316.
12.20 Binary operators on page 12-317.
12.28 Difference between operator precedence in assembly language and C on page 12-325.
Related references
12.21 Multiplicative operators on page 12-318.
12.22 String manipulation operators on page 12-319.
12.23 Shift operators on page 12-320.
12.24 Addition, subtraction, and logical operators on page 12-321.
12.25 Relational operators on page 12-322.
12.26 Boolean operators on page 12-323.

12.28 Difference between operator precedence in assembly language and C
armasm does not follow exactly the same order of precedence when evaluating operators as a C compiler.

For example, (1 + 2 :SHR: 3) evaluates as (1 + (2 :SHR: 3)) = 1 in assembly language. The
equivalent expression in C evaluates as ((1 + 2) >> 3) = 0.
ARM recommends you use brackets to make the precedence explicit.
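
The two groupings in the example above can be checked with a short Python sketch. It is illustrative only: shr is a hypothetical helper that models :SHR: on 32-bit values, and the parentheses spell out how each language groups the expression.

```python
def shr(a, b):
    """Model of A:SHR:B as a 32-bit logical shift right."""
    return (a & 0xFFFFFFFF) >> b


# armasm grouping: the shift binds more tightly than the addition.
armasm_result = 1 + shr(2, 3)

# C grouping: the addition binds more tightly than the shift.
c_result = shr(1 + 2, 3)
```

The first expression gives 1 and the second gives 0, so explicit brackets are the only way to make one source expression mean the same thing in both languages.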
If your code contains an expression that would parse differently in C, and you are not using the --unsafe
option, armasm gives a warning:
A1466W: Operator precedence means that expression would evaluate differently in C

In the following tables:
• The highest precedence operators are at the top of the list.
• The highest precedence operators are evaluated first.
• Operators of equal precedence are evaluated from left to right.
The following table shows the order of precedence of operators in assembly language, and a comparison
with the order in C.
Table 12-9 Operator precedence in ARM assembly language
Assembly language precedence   Equivalent C operators
unary operators                unary operators
* / :MOD:                      * / %
string manipulation            n/a
:SHL: :SHR: :ROR: :ROL:        << >>
+ - :AND: :OR: :EOR:           + - & | ^
= > >= < <= /= <>              == > >= < <= !=
:LAND: :LOR: :LEOR:            && ||


The following table shows the order of precedence of operators in C.
Table 12-10 Operator precedence in C
C precedence
unary operators
* / %
+ - (as binary operators)
<< >>
< <= > >=
== !=
&
^
|
&&
||

Related concepts
12.20 Binary operators on page 12-317.
Related references
12.27 Operator precedence on page 12-324.


Chapter 13
A32 and T32 Instructions

Describes the A32 and T32 instructions supported in AArch32 state.
It contains the following sections:
• 13.1 A32 and T32 instruction summary on page 13-332.
• 13.2 Instruction width specifiers on page 13-337.
• 13.3 Flexible second operand (Operand2) on page 13-338.
• 13.4 Syntax of Operand2 as a constant on page 13-339.
• 13.5 Syntax of Operand2 as a register with optional shift on page 13-340.
• 13.6 Shift operations on page 13-341.
• 13.7 Saturating instructions on page 13-344.
• 13.8 ADC on page 13-345.
• 13.9 ADD on page 13-347.
• 13.10 ADR (PC-relative) on page 13-349.
• 13.11 ADR (register-relative) on page 13-351.
• 13.12 ADRL pseudo-instruction on page 13-353.
• 13.13 AND on page 13-355.
• 13.14 ASR on page 13-357.
• 13.15 B on page 13-359.
• 13.16 BFC on page 13-361.
• 13.17 BFI on page 13-362.
• 13.18 BIC on page 13-363.
• 13.19 BKPT on page 13-365.
• 13.20 BL on page 13-366.
• 13.21 BLX, BLXNS on page 13-368.
• 13.22 BX, BXNS on page 13-370.
• 13.23 BXJ on page 13-372.

• 13.24 CBZ and CBNZ on page 13-373.
• 13.25 CDP and CDP2 on page 13-374.
• 13.26 CLREX on page 13-375.
• 13.27 CLZ on page 13-376.
• 13.28 CMP and CMN on page 13-377.
• 13.29 CPS on page 13-379.
• 13.30 CPY pseudo-instruction on page 13-381.
• 13.31 CRC32 on page 13-382.
• 13.32 CRC32C on page 13-383.
• 13.33 DBG on page 13-384.
• 13.34 DCPS1 (T32 instruction) on page 13-385.
• 13.35 DCPS2 (T32 instruction) on page 13-386.
• 13.36 DCPS3 (T32 instruction) on page 13-387.
• 13.37 DMB on page 13-388.
• 13.38 DSB on page 13-390.
• 13.39 EOR on page 13-392.
• 13.40 ERET on page 13-394.
• 13.41 ESB on page 13-395.
• 13.42 HLT on page 13-396.
• 13.43 HVC on page 13-397.
• 13.44 ISB on page 13-398.
• 13.45 IT on page 13-399.
• 13.46 LDA on page 13-402.
• 13.47 LDAEX on page 13-403.
• 13.48 LDC and LDC2 on page 13-405.
• 13.49 LDM on page 13-407.
• 13.50 LDR (immediate offset) on page 13-409.
• 13.51 LDR (PC-relative) on page 13-411.
• 13.52 LDR (register offset) on page 13-413.
• 13.53 LDR (register-relative) on page 13-415.
• 13.54 LDR pseudo-instruction on page 13-417.
• 13.55 LDR, unprivileged on page 13-419.
• 13.56 LDREX on page 13-421.
• 13.57 LSL on page 13-423.
• 13.58 LSR on page 13-425.
• 13.59 MCR and MCR2 on page 13-427.
• 13.60 MCRR and MCRR2 on page 13-428.
• 13.61 MLA on page 13-429.
• 13.62 MLS on page 13-430.
• 13.63 MOV on page 13-431.
• 13.64 MOV32 pseudo-instruction on page 13-433.
• 13.65 MOVT on page 13-434.
• 13.66 MRC and MRC2 on page 13-435.
• 13.67 MRRC and MRRC2 on page 13-436.
• 13.68 MRS (PSR to general-purpose register) on page 13-437.
• 13.69 MRS (system coprocessor register to ARM register) on page 13-439.
• 13.70 MSR (ARM register to system coprocessor register) on page 13-440.
• 13.71 MSR (general-purpose register to PSR) on page 13-441.
• 13.72 MUL on page 13-443.
• 13.73 MVN on page 13-444.
• 13.74 NEG pseudo-instruction on page 13-446.
• 13.75 NOP on page 13-447.
• 13.76 ORN (T32 only) on page 13-448.
• 13.77 ORR on page 13-449.
• 13.78 PKHBT and PKHTB on page 13-451.
• 13.79 PLD, PLDW, and PLI on page 13-453.
• 13.80 POP on page 13-455.
• 13.81 PUSH on page 13-456.
• 13.82 QADD on page 13-457.
• 13.83 QADD8 on page 13-458.
• 13.84 QADD16 on page 13-459.
• 13.85 QASX on page 13-460.
• 13.86 QDADD on page 13-461.
• 13.87 QDSUB on page 13-462.
• 13.88 QSAX on page 13-463.
• 13.89 QSUB on page 13-464.
• 13.90 QSUB8 on page 13-465.
• 13.91 QSUB16 on page 13-466.
• 13.92 RBIT on page 13-467.
• 13.93 REV on page 13-468.
• 13.94 REV16 on page 13-469.
• 13.95 REVSH on page 13-470.
• 13.96 RFE on page 13-471.
• 13.97 ROR on page 13-473.
• 13.98 RRX on page 13-475.
• 13.99 RSB on page 13-477.
• 13.100 RSC on page 13-479.
• 13.101 SADD8 on page 13-480.
• 13.102 SADD16 on page 13-481.
• 13.103 SASX on page 13-482.
• 13.104 SBC on page 13-483.
• 13.105 SBFX on page 13-485.
• 13.106 SDIV on page 13-486.
• 13.107 SEL on page 13-487.
• 13.108 SETEND on page 13-489.
• 13.109 SETPAN on page 13-490.
• 13.110 SEV on page 13-491.
• 13.111 SEVL on page 13-492.
• 13.112 SG on page 13-493.
• 13.113 SHADD8 on page 13-494.
• 13.114 SHADD16 on page 13-495.
• 13.115 SHASX on page 13-496.
• 13.116 SHSAX on page 13-497.
• 13.117 SHSUB8 on page 13-498.
• 13.118 SHSUB16 on page 13-499.
• 13.119 SMC on page 13-500.
• 13.120 SMLAxy on page 13-501.
• 13.121 SMLAD on page 13-503.
• 13.122 SMLAL on page 13-504.
• 13.123 SMLALD on page 13-505.
• 13.124 SMLALxy on page 13-506.
• 13.125 SMLAWy on page 13-507.
• 13.126 SMLSD on page 13-508.
• 13.127 SMLSLD on page 13-509.
• 13.128 SMMLA on page 13-510.
• 13.129 SMMLS on page 13-511.
• 13.130 SMMUL on page 13-512.
• 13.131 SMUAD on page 13-513.
• 13.132 SMULxy on page 13-514.
• 13.133 SMULL on page 13-515.
• 13.134 SMULWy on page 13-516.
• 13.135 SMUSD on page 13-517.
• 13.136 SRS on page 13-518.
• 13.137 SSAT on page 13-520.
• 13.138 SSAT16 on page 13-521.
• 13.139 SSAX on page 13-522.
• 13.140 SSUB8 on page 13-523.
• 13.141 SSUB16 on page 13-524.
• 13.142 STC and STC2 on page 13-525.
• 13.143 STL on page 13-527.
• 13.144 STLEX on page 13-528.
• 13.145 STM on page 13-530.
• 13.146 STR (immediate offset) on page 13-532.
• 13.147 STR (register offset) on page 13-534.
• 13.148 STR, unprivileged on page 13-536.
• 13.149 STREX on page 13-538.
• 13.150 SUB on page 13-540.
• 13.151 SUBS pc, lr on page 13-542.
• 13.152 SVC on page 13-544.
• 13.153 SWP and SWPB on page 13-545.
• 13.154 SXTAB on page 13-546.
• 13.155 SXTAB16 on page 13-547.
• 13.156 SXTAH on page 13-548.
• 13.157 SXTB on page 13-549.
• 13.158 SXTB16 on page 13-550.
• 13.159 SXTH on page 13-551.
• 13.160 SYS on page 13-553.
• 13.161 TBB and TBH on page 13-554.
• 13.162 TEQ on page 13-555.
• 13.163 TST on page 13-556.
• 13.164 TT, TTT, TTA, TTAT on page 13-557.
• 13.165 UADD8 on page 13-559.
• 13.166 UADD16 on page 13-560.
• 13.167 UASX on page 13-561.
• 13.168 UBFX on page 13-563.
• 13.169 UDF on page 13-564.
• 13.170 UDIV on page 13-565.
• 13.171 UHADD8 on page 13-566.
• 13.172 UHADD16 on page 13-567.
• 13.173 UHASX on page 13-568.
• 13.174 UHSAX on page 13-569.
• 13.175 UHSUB8 on page 13-570.
• 13.176 UHSUB16 on page 13-571.
• 13.177 UMAAL on page 13-572.
• 13.178 UMLAL on page 13-573.
• 13.179 UMULL on page 13-574.
• 13.180 UND pseudo-instruction on page 13-575.
• 13.181 UQADD8 on page 13-576.
• 13.182 UQADD16 on page 13-577.
• 13.183 UQASX on page 13-578.
• 13.184 UQSAX on page 13-579.
• 13.185 UQSUB8 on page 13-580.
• 13.186 UQSUB16 on page 13-581.
• 13.187 USAD8 on page 13-582.
• 13.188 USADA8 on page 13-583.
• 13.189 USAT on page 13-584.
• 13.190 USAT16 on page 13-585.
• 13.191 USAX on page 13-586.
• 13.192 USUB8 on page 13-588.
• 13.193 USUB16 on page 13-589.
• 13.194 UXTAB on page 13-590.
• 13.195 UXTAB16 on page 13-591.
• 13.196 UXTAH on page 13-593.
• 13.197 UXTB on page 13-594.
• 13.198 UXTB16 on page 13-595.
• 13.199 UXTH on page 13-596.
• 13.200 WFE on page 13-597.
• 13.201 WFI on page 13-598.
• 13.202 YIELD on page 13-599.

13.1 A32 and T32 instruction summary
An overview of the instructions available in the A32 and T32 instruction sets.
Table 13-1 Summary of instructions
Mnemonic                        Brief description
ADC, ADD                        Add with Carry, Add
ADR                             Load program or register-relative address (short range)
ADRL pseudo-instruction         Load program or register-relative address (medium range)
AND                             Logical AND
ASR                             Arithmetic Shift Right
B                               Branch
BFC, BFI                        Bit Field Clear and Insert
BIC                             Bit Clear
BKPT                            Software breakpoint
BL                              Branch with Link
BLX, BLXNS                      Branch with Link, change instruction set, Branch with Link and Exchange (Non-secure)
BX, BXNS                        Branch, change instruction set, Branch and Exchange (Non-secure)
CBZ, CBNZ                       Compare and Branch if {Non}Zero
CDP                             Coprocessor Data Processing operation
CDP2                            Coprocessor Data Processing operation
CLREX                           Clear Exclusive
CLZ                             Count leading zeros
CMN, CMP                        Compare Negative, Compare
CPS                             Change Processor State
CPY pseudo-instruction          Copy
CRC32
CRC32C
DBG                             Debug
DCPS1                           Debug switch to exception level 1
DCPS2                           Debug switch to exception level 2
DCPS3                           Debug switch to exception level 3
DMB, DSB                        Data Memory Barrier, Data Synchronization Barrier
DSB                             Data Synchronization Barrier
EOR                             Exclusive OR
ERET                            Exception Return
ESB                             Error Synchronization Barrier
HLT                             Halting breakpoint
HVC                             Hypervisor Call
ISB                             Instruction Synchronization Barrier
IT                              If-Then
LDAEX, LDAEXB, LDAEXH, LDAEXD   Load-Acquire Register Exclusive Word, Byte, Halfword, Doubleword
LDC, LDC2                       Load Coprocessor
LDM                             Load Multiple registers
LDR                             Load Register with word
LDR pseudo-instruction          Load Register pseudo-instruction
LDA, LDAB, LDAH                 Load-Acquire Register Word, Byte, Halfword
LDRB                            Load Register with Byte
LDRBT                           Load Register with Byte, user mode
LDRD                            Load Registers with two words
LDREX, LDREXB, LDREXH, LDREXD   Load Register Exclusive Word, Byte, Halfword, Doubleword
LDRH                            Load Register with Halfword
LDRHT                           Load Register with Halfword, user mode
LDRSB                           Load Register with Signed Byte
LDRSBT                          Load Register with Signed Byte, user mode
LDRSH                           Load Register with Signed Halfword
LDRSHT                          Load Register with Signed Halfword, user mode
LDRT                            Load Register with word, user mode
LSL, LSR                        Logical Shift Left, Logical Shift Right
MCR                             Move from Register to Coprocessor
MCRR                            Move from Registers to Coprocessor
MLA                             Multiply Accumulate
MLS                             Multiply and Subtract
MOV                             Move
MOVT                            Move Top
MOV32 pseudo-instruction        Move 32-bit immediate to register
MRC                             Move from Coprocessor to Register
MRRC                            Move from Coprocessor to Registers
MRS                             Move from PSR to Register
MRS pseudo-instruction          Move from system Coprocessor to Register
MSR                             Move from Register to PSR
MSR pseudo-instruction          Move from Register to system Coprocessor
MUL                             Multiply
MVN                             Move Not
NEG pseudo-instruction          Negate
NOP                             No Operation
ORN                             Logical OR NOT
ORR                             Logical OR
PKHBT, PKHTB                    Pack Halfwords
PLD                             Preload Data
PLDW                            Preload Data with intent to Write
PLI                             Preload Instruction
PUSH, POP                       PUSH registers to stack, POP registers from stack
QADD, QDADD, QDSUB, QSUB        Saturating arithmetic
QADD8, QADD16, QASX, QSUB8,     Parallel signed saturating arithmetic
QSUB16, QSAX
RBIT                            Reverse Bits
REV, REV16, REVSH               Reverse byte order
RFE                             Return From Exception
ROR                             Rotate Right Register
RRX                             Rotate Right with Extend
RSB                             Reverse Subtract
RSC                             Reverse Subtract with Carry
SADD8, SADD16, SASX             Parallel Signed arithmetic
SBC                             Subtract with Carry
SBFX, UBFX                      Signed, Unsigned Bit Field eXtract
SDIV                            Signed Divide
SEL                             Select bytes according to APSR GE flags
SETEND                          Set Endianness for memory accesses
SETPAN                          Set Privileged Access Never
SEV                             Set Event
SEVL                            Set Event Locally
SG                              Secure Gateway
SHADD8, SHADD16, SHASX,         Parallel Signed Halving arithmetic
SHSUB8, SHSUB16, SHSAX
SMC                             Secure Monitor Call
SMLAxy                          Signed Multiply with Accumulate (32 <= 16 x 16 + 32)
SMLAD                           Dual Signed Multiply Accumulate (32 <= 32 + 16 x 16 + 16 x 16)
SMLAL                           Signed Multiply Accumulate (64 <= 64 + 32 x 32)
SMLALxy                         Signed Multiply Accumulate (64 <= 64 + 16 x 16)
SMLALD                          Dual Signed Multiply Accumulate Long (64 <= 64 + 16 x 16 + 16 x 16)
SMLAWy                          Signed Multiply with Accumulate (32 <= 32 x 16 + 32)
SMLSD                           Dual Signed Multiply Subtract Accumulate (32 <= 32 + 16 x 16 – 16 x 16)
SMLSLD                          Dual Signed Multiply Subtract Accumulate Long (64 <= 64 + 16 x 16 – 16 x 16)
SMMLA                           Signed top word Multiply with Accumulate (32 <= TopWord(32 x 32 + 32))
SMMLS                           Signed top word Multiply with Subtract (32 <= TopWord(32 - 32 x 32))
SMMUL                           Signed top word Multiply (32 <= TopWord(32 x 32))
SMUAD, SMUSD                    Dual Signed Multiply, and Add or Subtract products
SMULxy                          Signed Multiply (32 <= 16 x 16)
SMULL                           Signed Multiply (64 <= 32 x 32)
SMULWy                          Signed Multiply (32 <= 32 x 16)
SRS                             Store Return State
SSAT                            Signed Saturate
SSAT16                          Signed Saturate, parallel halfwords
SSUB8, SSUB16, SSAX             Parallel Signed arithmetic
STC                             Store Coprocessor
STM                             Store Multiple registers
STR                             Store Register with word
STRB                            Store Register with Byte
STRBT                           Store Register with Byte, user mode
STRD                            Store Registers with two words
STREX, STREXB, STREXH, STREXD   Store Register Exclusive Word, Byte, Halfword, Doubleword
STRH                            Store Register with Halfword
STRHT                           Store Register with Halfword, user mode
STL, STLB, STLH                 Store-Release Word, Byte, Halfword
STLEX, STLEXB, STLEXH, STLEXD   Store-Release Exclusive Word, Byte, Halfword, Doubleword
STRT                            Store Register with word, user mode
SUB                             Subtract
SUBS pc, lr                     Exception return, no stack
SVC (formerly SWI)              Supervisor Call
SXTAB, SXTAB16, SXTAH           Signed extend, with Addition
SXTB, SXTH                      Signed extend
SXTB16                          Signed extend
SYS                             Execute System coprocessor instruction
TBB, TBH                        Table Branch Byte, Halfword
TEQ                             Test Equivalence
TST                             Test
TT, TTT, TTA, TTAT              Test Target (Alternate Domain, Unprivileged)
UADD8, UADD16, UASX             Parallel Unsigned arithmetic
UDF                             Permanently Undefined
UDIV                            Unsigned Divide
UHADD8, UHADD16, UHASX,         Parallel Unsigned Halving arithmetic
UHSUB8, UHSUB16, UHSAX
UMAAL                           Unsigned Multiply Accumulate Accumulate Long (64 <= 32 + 32 + 32 x 32)
UMLAL, UMULL                    Unsigned Multiply Accumulate, Unsigned Multiply (64 <= 32 x 32 + 64), (64 <= 32 x 32)
UQADD8, UQADD16, UQASX,         Parallel Unsigned Saturating arithmetic
UQSUB8, UQSUB16, UQSAX
USAD8                           Unsigned Sum of Absolute Differences
USADA8                          Accumulate Unsigned Sum of Absolute Differences
USAT                            Unsigned Saturate
USAT16                          Unsigned Saturate, parallel halfwords
USUB8, USUB16, USAX             Parallel Unsigned arithmetic
UXTAB, UXTAB16, UXTAH           Unsigned extend with Addition
UXTB, UXTH                      Unsigned extend
UXTB16                          Unsigned extend
V*                              See Chapter 14 Advanced SIMD Instructions (32-bit) on page 14-600 and
                                Chapter 15 Floating-point Instructions (32-bit) on page 15-750
WFE, WFI, YIELD                 Wait For Event, Wait For Interrupt, Yield

13.2 Instruction width specifiers
The instruction width specifiers .W and .N control the size of T32 instruction encodings.
In T32 code the .W width specifier forces the assembler to generate a 32-bit encoding, even if a 16-bit
encoding is available. The .W specifier has no effect when assembling to A32 code.
In T32 code the .N width specifier forces the assembler to generate a 16-bit encoding. In this case, if the
instruction cannot be encoded in 16 bits or if .N is used in A32 code, the assembler generates an error.
If you use an instruction width specifier, you must place it immediately after the instruction mnemonic
and any condition code, for example:
    BCS.W   label   ; forces 32-bit instruction even for a short branch
    B.N     label   ; faults if label out of range for 16-bit instruction

13.3 Flexible second operand (Operand2)
Many A32 and T32 general data processing instructions have a flexible second operand.
This is shown as Operand2 in the descriptions of the syntax of each instruction.
Operand2 can be a:
• Constant.
• Register with optional shift.

Related concepts
13.6 Shift operations on page 13-341.
Related references
13.4 Syntax of Operand2 as a constant on page 13-339.
13.5 Syntax of Operand2 as a register with optional shift on page 13-340.

13.4 Syntax of Operand2 as a constant
An Operand2 constant in an instruction has a limited range of values.
Syntax
#constant

where constant is an expression evaluating to a numeric value.
Usage
In A32 instructions, constant can have any value that can be produced by rotating an 8-bit value right
by any even number of bits within a 32-bit word.
In T32 instructions, constant can be:
• Any constant that can be produced by shifting an 8-bit value left by any number of bits within a 32-bit word.
• Any constant of the form 0x00XY00XY.
• Any constant of the form 0xXY00XY00.
• Any constant of the form 0xXYXYXYXY.

Note
In these constants, X and Y are hexadecimal digits.
In addition, in a small number of instructions, constant can take a wider range of values. These are
listed in the individual instruction descriptions.
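
The general rules above can be turned into a quick check. The following Python sketch is a model under stated assumptions, not the assembler's implementation: the function names are hypothetical, and the wider ranges allowed by individual instructions are not covered.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((value << n) | (value >> (32 - n))) & MASK32


def is_a32_operand2_constant(value):
    """True if value is an 8-bit value rotated right by an even amount.

    Rotating left by the same even amount undoes the rotation, so the
    constant qualifies if any even left rotation fits in 8 bits.
    """
    value &= MASK32
    return any(rol32(value, rot) <= 0xFF for rot in range(0, 32, 2))


def is_t32_operand2_constant(value):
    """True if value matches one of the four T32 forms listed above."""
    value &= MASK32
    # An 8-bit value shifted left within a 32-bit word.
    for shift in range(0, 25):
        if value == ((value >> shift) & 0xFF) << shift:
            return True
    xy = value & 0xFF
    # 0x00XY00XY and 0xXYXYXYXY.
    if value in (xy * 0x00010001, xy * 0x01010101):
        return True
    # 0xXY00XY00.
    return value == ((value >> 8) & 0xFF) * 0x01000100
```

For example, 0x1FE (0xFF shifted left by one bit) passes the T32 check but fails the A32 check, because the A32 rotation amount must be even.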
When an Operand2 constant is used with the instructions MOVS, MVNS, ANDS, ORRS, ORNS, EORS, BICS, TEQ
or TST, the carry flag is updated to bit[31] of the constant, if the constant is greater than 255 and can be
produced by shifting an 8-bit value. These instructions do not affect the carry flag if Operand2 is any
other constant.
Instruction substitution
If the value of an Operand2 constant is not available, but its logical inverse or negation is available, then
the assembler produces an equivalent instruction and inverts or negates the constant.
For example, an assembler might assemble the instruction CMP Rd, #0xFFFFFFFE as the equivalent
instruction CMN Rd, #0x2.
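
The CMP to CMN example can be reproduced with a small Python sketch. It is illustrative only: it models the A32 constant rule with a hypothetical helper and shows why the negated constant is the one that can be encoded.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((value << n) | (value >> (32 - n))) & MASK32


def is_a32_operand2_constant(value):
    """True if value is an 8-bit value rotated right by an even amount."""
    value &= MASK32
    return any(rol32(value, rot) <= 0xFF for rot in range(0, 32, 2))


def negate32(value):
    """Two's complement negation within 32 bits."""
    return (-value) & MASK32


original = 0xFFFFFFFE
negated = negate32(original)  # 0x2
```

Here the original constant fails the check while its negation, 0x2, passes, which is the situation in which the substitution described above applies.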
Be aware of this when comparing disassembly listings with source code.
You can use the --diag_warning 1645 assembler command line option to check when an instruction
substitution occurs.
Related concepts
13.6 Shift operations on page 13-341.
Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.5 Syntax of Operand2 as a register with optional shift on page 13-340.

13.5 Syntax of Operand2 as a register with optional shift
When you use an Operand2 register in an instruction, you can optionally also specify a shift value.
Syntax
Rm {, shift}

where:
Rm
    is the register holding the data for the second operand.
shift
    is an optional constant or register-controlled shift to be applied to Rm. It can be one of:
    ASR #n
        arithmetic shift right n bits, 1 ≤ n ≤ 32.
    LSL #n
        logical shift left n bits, 1 ≤ n ≤ 31.
    LSR #n
        logical shift right n bits, 1 ≤ n ≤ 32.
    ROR #n
        rotate right n bits, 1 ≤ n ≤ 31.
    RRX
        rotate right one bit, with extend.
    type Rs
        where type is one of ASR, LSL, LSR, ROR, and Rs is a register supplying the shift amount, of which only
        the least significant byte is used.
    -
        if omitted, no shift occurs, equivalent to LSL #0.
Usage
If you omit the shift, or specify LSL #0, the instruction uses the value in Rm.
If you specify a shift, the shift is applied to the value in Rm, and the resulting 32-bit value is used by the
instruction. However, the contents of the register Rm remain unchanged. Specifying a register with shift
also updates the carry flag when used with certain instructions.
Related concepts
13.6 Shift operations on page 13-341.
Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.4 Syntax of Operand2 as a constant on page 13-339.

13.6 Shift operations
Register shift operations move the bits in a register left or right by a specified number of bits, called the
shift length.
Register shift can be performed:
• Directly by the instructions ASR, LSR, LSL, ROR, and RRX, and the result is written to a destination
register.
• During the calculation of Operand2 by the instructions that specify the second operand as a register
with shift. The result is used by the instruction.
The permitted shift lengths depend on the shift type and the instruction, see the individual instruction
description or the flexible second operand description. If the shift length is 0, no shift occurs. Register
shift operations update the carry flag except when the specified shift length is 0.
Arithmetic shift right (ASR)
Arithmetic shift right by n bits moves the left-hand 32-n bits of a register to the right by n places, into the
right-hand 32-n bits of the result. It copies the original bit[31] of the register into the left-hand n bits of
the result.
You can use the ASR #n operation to divide the value in the register Rm by 2^n, with the result being
rounded towards negative infinity.
When the instruction is ASRS or when ASR #n is used in Operand2 with the instructions MOVS, MVNS,
ANDS, ORRS, ORNS, EORS, BICS, TEQ or TST, the carry flag is updated to the last bit shifted out, bit[n-1], of
the register Rm.
Note
• If n is 32 or more, then all the bits in the result are set to the value of bit[31] of Rm.
• If n is 32 or more and the carry flag is updated, it is updated to the value of bit[31] of Rm.

Figure 13-1 ASR #3
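
The ASR rules above, including the carry flag and the cases where n is 32 or more, can be sketched as a Python model. This is an illustration of the description in this section, not processor pseudocode, and the function name asr is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def asr(value, n, carry_in=0):
    """Model ASR #n on a 32-bit value. Returns (result, carry_out)."""
    value &= MASK32
    if n == 0:
        return value, carry_in          # no shift, carry unchanged
    sign = value >> 31                   # original bit[31]
    if n >= 32:
        # Every result bit, and the carry, take the value of bit[31].
        return (MASK32 if sign else 0), sign
    signed = value - (1 << 32) if sign else value
    result = (signed >> n) & MASK32      # Python >> is arithmetic on ints
    carry_out = (value >> (n - 1)) & 1   # last bit shifted out, bit[n-1]
    return result, carry_out
```

For instance, asr(0xFFFFFFF9, 1) models -7 shifted right by one bit and gives 0xFFFFFFFC, which is -4, showing the rounding towards negative infinity.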

Logical shift right (LSR)
Logical shift right by n bits moves the left-hand 32-n bits of a register to the right by n places, into the
right-hand 32-n bits of the result. It sets the left-hand n bits of the result to 0.
You can use the LSR #n operation to divide the value in the register Rm by 2^n, if the value is regarded as
an unsigned integer.
When the instruction is LSRS or when LSR #n is used in Operand2 with the instructions MOVS, MVNS,
ANDS, ORRS, ORNS, EORS, BICS, TEQ or TST, the carry flag is updated to the last bit shifted out, bit[n-1], of
the register Rm.
Note
• If n is 32 or more, then all the bits in the result are cleared to 0.
• If n is 33 or more and the carry flag is updated, it is updated to 0.

Figure 13-2 LSR #3

Logical shift left (LSL)
Logical shift left by n bits moves the right-hand 32-n bits of a register to the left by n places, into the
left-hand 32-n bits of the result. It sets the right-hand n bits of the result to 0.
You can use the LSL #n operation to multiply the value in the register Rm by 2^n, if the value is regarded
as an unsigned integer or a two’s complement signed integer. Overflow can occur without warning.
When the instruction is LSLS or when LSL #n, with non-zero n, is used in Operand2 with the instructions
MOVS, MVNS, ANDS, ORRS, ORNS, EORS, BICS, TEQ or TST, the carry flag is updated to the last bit shifted out,
bit[32-n], of the register Rm. These instructions do not affect the carry flag when used with LSL #0.
Note
• If n is 32 or more, then all the bits in the result are cleared to 0.
• If n is 33 or more and the carry flag is updated, it is updated to 0.

Figure 13-3 LSL #3
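
The LSR and LSL descriptions can be modeled together. The following Python sketch follows the rules stated in this section, including the difference between a shift length of 32 and one of 33 or more. The function names lsr and lsl are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lsr(value, n, carry_in=0):
    """Model LSR #n on a 32-bit value. Returns (result, carry_out)."""
    value &= MASK32
    if n == 0:
        return value, carry_in
    if n > 32:
        return 0, 0
    # For n == 32 the result is 0 and the carry is the original bit[31].
    return value >> n, (value >> (n - 1)) & 1


def lsl(value, n, carry_in=0):
    """Model LSL #n on a 32-bit value. Returns (result, carry_out)."""
    value &= MASK32
    if n == 0:
        return value, carry_in           # LSL #0 leaves the carry alone
    if n > 32:
        return 0, 0
    # For n == 32 the result is 0 and the carry is the original bit[0].
    return (value << n) & MASK32, (value >> (32 - n)) & 1
```

In this model lsl(0x80000001, 1) returns (0x2, 1): the top bit is lost from the result, which is the silent overflow mentioned above, and it is captured only in the carry.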

Rotate right (ROR)
Rotate right by n bits moves the left-hand 32-n bits of a register to the right by n places, into the
right-hand 32-n bits of the result. It also moves the right-hand n bits of the register into the left-hand n bits of
the result.
When the instruction is RORS or when ROR #n is used in Operand2 with the instructions MOVS, MVNS,
ANDS, ORRS, ORNS, EORS, BICS, TEQ or TST, the carry flag is updated to the last bit rotated, bit[n-1], of the
register Rm.
Note
• If n is 32, then the value of the result is the same as the value in Rm, and if the carry flag is updated, it is
  updated to bit[31] of Rm.
• ROR with shift length, n, more than 32 is the same as ROR with shift length n-32.

Figure 13-4 ROR #3


Rotate right with extend (RRX)
Rotate right with extend moves the bits of a register to the right by one bit. It copies the carry flag into
bit[31] of the result.
When the instruction is RRXS or when RRX is used in Operand2 with the instructions MOVS, MVNS, ANDS,
ORRS, ORNS, EORS, BICS, TEQ or TST, the carry flag is updated to bit[0] of the register Rm.

Figure 13-5 RRX
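
The two rotate operations can be sketched in the same style. This Python model is illustrative and follows the descriptions of ROR and RRX in this section. The function names ror and rrx are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ror(value, n, carry_in=0):
    """Model ROR #n on a 32-bit value. Returns (result, carry_out)."""
    value &= MASK32
    if n == 0:
        return value, carry_in
    n %= 32                              # ROR by n > 32 acts like ROR by n-32
    if n == 0:
        return value, value >> 31        # multiple of 32: carry is bit[31]
    result = ((value >> n) | (value << (32 - n))) & MASK32
    # The last bit rotated, bit[n-1] of the source, ends up in bit[31].
    return result, result >> 31


def rrx(value, carry_in):
    """Model RRX: shift right one bit, with the carry entering bit[31]."""
    value &= MASK32
    result = ((carry_in & 1) << 31) | (value >> 1)
    return result, value & 1             # carry_out is the original bit[0]
```

Because RRX passes bit[0] out through the carry and takes the old carry in at bit[31], chaining it across registers models a right shift of a value wider than 32 bits.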

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.4 Syntax of Operand2 as a constant on page 13-339.
13.5 Syntax of Operand2 as a register with optional shift on page 13-340.

13.7 Saturating instructions
Some A32 and T32 instructions perform saturating arithmetic.
The saturating instructions are:
• QADD.
• QDADD.
• QDSUB.
• QSUB.
• SSAT.
• USAT.
Some of the parallel instructions are also saturating.
Saturating arithmetic
Saturation means that, for some value of 2^n that depends on the instruction:
• For a signed saturating operation, if the full result would be less than -2^n, the result returned is -2^n.
• For an unsigned saturating operation, if the full result would be negative, the result returned is zero.
• If the full result would be greater than 2^n - 1, the result returned is 2^n - 1.

When any of these occurs, it is called saturation. Some instructions set the Q flag when saturation occurs.
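
The saturation rules can be expressed as a clamp. The following Python sketch is a model of the definition above, with n passed in explicitly because its value depends on the instruction. The function names are hypothetical, and the returned flag only indicates that clamping happened; it does not model which instructions set the Q flag.

```python
def signed_saturate(value, n):
    """Clamp value to the range -2^n to 2^n - 1. Returns (result, saturated)."""
    low, high = -(1 << n), (1 << n) - 1
    result = max(low, min(high, value))
    return result, result != value


def unsigned_saturate(value, n):
    """Clamp value to the range 0 to 2^n - 1. Returns (result, saturated)."""
    high = (1 << n) - 1
    result = max(0, min(high, value))
    return result, result != value
```

For example, signed_saturate(0x7FFFFFFF + 1, 31) returns (0x7FFFFFFF, True), so a sum that overflows the signed 32-bit range sticks at the largest positive value instead of wrapping.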
Note
Saturating instructions do not clear the Q flag when saturation does not occur. To clear the Q flag, use an
MSR instruction.

The Q flag can also be set by two other instructions, but these instructions do not saturate.
Related concepts
9.14 Saturating Advanced SIMD instructions on page 9-198.
Related references
13.82 QADD on page 13-457.
13.89 QSUB on page 13-464.
13.86 QDADD on page 13-461.
13.87 QDSUB on page 13-462.
13.120 SMLAxy on page 13-501.
13.125 SMLAWy on page 13-507.
13.132 SMULxy on page 13-514.
13.134 SMULWy on page 13-516.
13.137 SSAT on page 13-520.
13.189 USAT on page 13-584.
13.71 MSR (general-purpose register to PSR) on page 13-441.

13.8 ADC
Add with Carry.
Syntax
ADC{S}{cond} {Rd}, Rn, Operand2

where:
S
    is an optional suffix. If S is specified, the condition flags are updated on the result of the
    operation.
cond
    is an optional condition code.
Rd
    is the destination register.
Rn
    is the register holding the first operand.
Operand2
    is a flexible second operand.
Usage
The ADC (Add with Carry) instruction adds the values in Rn and Operand2, together with the carry flag.
You can use ADC to synthesize multiword arithmetic.
In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when
reading disassembly listings.
Use of PC and SP in T32 instructions
You cannot use PC (R15) for Rd, or any operand with the ADC command.
You cannot use SP (R13) for Rd, or any operand with the ADC command.
Use of PC and SP in A32 instructions
You cannot use PC for Rd or any operand in any data processing instruction that has a register-controlled
shift.
Use of PC for any operand, in instructions without register-controlled shift, is deprecated.
If you use PC (R15) as Rn or Operand2, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
Use of SP with the ADC A32 instruction is deprecated.
Condition flags
If S is specified, the ADC instruction updates the N, Z, C and V flags according to the result.
16-bit instructions
The following forms of this instruction are available in T32 code, and are 16-bit instructions:
ADCS Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
ADC{cond} Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.

Multiword arithmetic examples
These two instructions add a 64-bit integer contained in R2 and R3 to another 64-bit integer contained in
R0 and R1, and place the result in R4 and R5.
    ADDS    r4, r0, r2      ; adding the least significant words
    ADC     r5, r1, r3      ; adding the most significant words

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
7.11 Condition code suffixes on page 7-150.

13.9 ADD
Add without Carry.
Syntax
ADD{S}{cond} {Rd}, Rn, Operand2
ADD{cond} {Rd}, Rn, #imm12 ; T32, 32-bit encoding only

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
imm12

is any value in the range 0-4095.
Operation
The ADD instruction adds the values in Rn and Operand2 or imm12.
In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when
reading disassembly listings.
Use of PC and SP in T32 instructions
Generally, you cannot use PC (R15) for Rd, or any operand.
The exceptions are:
• You can use PC for Rn in 32-bit encodings of T32 ADD instructions, with a constant Operand2 value in
the range 0-4095, and no S suffix. These instructions are useful for generating PC-relative addresses.
Bit[1] of the PC value reads as 0 in this case, so that the base address for the calculation is always
word-aligned.
• You can use PC in 16-bit encodings of T32 ADD{cond} Rd, Rd, Rm instructions, provided that Rd and Rm
are not both PC. However, the following 16-bit T32 instructions are deprecated:
— ADD{cond} PC, SP, PC.
— ADD{cond} SP, SP, PC.

Generally, you cannot use SP (R13) for Rd, or any operand. The exceptions are:
• You can use SP for Rn in ADD instructions.
• ADD{cond} SP, SP, SP is permitted but is deprecated in ARMv6T2 and above.
• ADD{S}{cond} SP, SP, Rm{,shift} and SUB{S}{cond} SP, SP, Rm{,shift} are permitted if
shift is omitted or LSL #1, LSL #2, or LSL #3.
Use of PC and SP in A32 instructions
You cannot use PC for Rd or any operand in any data processing instruction that has a register-controlled
shift.
In ADD instructions without register-controlled shift, use of PC is deprecated except for the following
cases:
• Use of PC for Rd in instructions that do not add SP to a register.
• Use of PC for Rn and use of PC for Rm in instructions that add two registers other than SP.
• Use of PC for Rn in the instruction ADD{cond} Rd, Rn, #Constant.

If you use PC (R15) as Rn or Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
You can use SP for Rn in ADD instructions. However, ADDS PC, SP, #Constant is deprecated.
You can use SP in ADD (register) if Rn is SP and shift is omitted or LSL #1, LSL #2, or LSL #3.
Other uses of SP in these A32 instructions are deprecated.
Condition flags
If S is specified, these instructions update the N, Z, C and V flags according to the result.
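The following Python sketch shows one way to compute the four flags for a 32-bit addition. It is an illustrative model that assumes the usual meaning of each flag (negative, zero, unsigned carry out, signed overflow), which this section does not spell out. The name adds_flags is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def adds_flags(rn, op2):
    """Model a 32-bit ADDS; returns (result, flags)."""
    rn &= MASK32
    op2 &= MASK32
    total = rn + op2
    result = total & MASK32
    flags = {
        "N": result >> 31,             # bit 31 of the result
        "Z": int(result == 0),         # result is zero
        "C": total >> 32,              # unsigned carry out of bit 31
        # Signed overflow: both operands share a sign that the result lacks.
        "V": ((~(rn ^ op2) & (rn ^ result)) >> 31) & 1,
    }
    return result, flags
```

For example, adding 1 to 0x7FFFFFFF sets N and V in this model, whereas adding 1 to 0xFFFFFFFF sets Z and C.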
16-bit instructions
The following forms of these instructions are available in T32 code, and are 16-bit instructions:
ADDS Rd, Rn, #imm
imm range 0-7. Rd and Rn must both be Lo registers. This form can only be used outside an IT block.
ADD{cond} Rd, Rn, #imm
imm range 0-7. Rd and Rn must both be Lo registers. This form can only be used inside an IT block.
ADDS Rd, Rn, Rm
Rd, Rn and Rm must all be Lo registers. This form can only be used outside an IT block.
ADD{cond} Rd, Rn, Rm
Rd, Rn and Rm must all be Lo registers. This form can only be used inside an IT block.
ADDS Rd, Rd, #imm
imm range 0-255. Rd must be a Lo register. This form can only be used outside an IT block.
ADD{cond} Rd, Rd, #imm
imm range 0-255. Rd must be a Lo register. This form can only be used inside an IT block.
ADD SP, SP, #imm
imm range 0-508, word aligned.
ADD Rd, SP, #imm
imm range 0-1020, word aligned. Rd must be a Lo register.
ADD Rd, pc, #imm
imm range 0-1020, word aligned. Rd must be a Lo register. Bits[1:0] of the PC are read as 0 in
this instruction.
Example
    ADD     r2, r1, r3

Multiword arithmetic example
These two instructions add a 64-bit integer contained in R2 and R3 to another 64-bit integer contained in
R0 and R1, and place the result in R4 and R5.
    ADDS    r4, r0, r2      ; adding the least significant words
    ADC     r5, r1, r3      ; adding the most significant words

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
7.11 Condition code suffixes on page 7-150.
13.151 SUBS pc, lr on page 13-542.
13.10 ADR (PC-relative)
Generate a PC-relative address in the destination register, for a label in the current area.
Syntax
ADR{cond}{.W} Rd,label

where:
cond

is an optional condition code.
.W

is an optional instruction width specifier.
Rd

is the destination register to load.
label

is a PC-relative expression.
label must be within a limited distance of the current instruction.

Usage
ADR produces position-independent code, because the assembler generates an instruction that adds a
value to, or subtracts a value from, the PC.
Use the ADRL pseudo-instruction to assemble a wider range of effective addresses.
label must evaluate to an address in the same assembler area as the ADR instruction.

If you use ADR to generate a target for a BX or BLX instruction, it is your responsibility to set the T32 bit
(bit 0) of the address if the target contains T32 instructions.
Offset range and architectures
The assembler calculates the offset from the PC for you. The assembler generates an error if label is out
of range.
The following table shows the possible offsets between the label and the current instruction:
Table 13-2 PC-relative offsets
Instruction                     Offset range
A32 ADR                         See 13.4 Syntax of Operand2 as a constant on page 13-339.
T32 ADR, 32-bit encoding        ±4095
T32 ADR, 16-bit encoding (a)    0-1020 (b)

ADR in T32
You can use the .W width specifier to force ADR to generate a 32-bit instruction in T32 code. ADR with .W
always generates a 32-bit instruction, even if the address can be generated in a 16-bit instruction.
For forward references, ADR without .W always generates a 16-bit instruction in T32 code, even if that
results in failure for an address that could be generated in a 32-bit T32 ADD instruction.
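The offset limits in Table 13-2 can be expressed as a small check. The following Python sketch is a hedged illustration only. It assumes that an A32 Operand2 constant is an 8-bit value rotated right by an even number of bits within a 32-bit word, and that the offset has already been computed relative to the PC value the instruction reads. It does not model which encoding the assembler selects. The names is_a32_constant and adr_encodings are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def is_a32_constant(value):
    """True if value is an 8-bit number rotated right by an even amount."""
    for rot in range(0, 32, 2):
        # Undo a rotate-right by rot with a rotate-left by rot.
        unrotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if unrotated < 256:
            return True
    return False


def adr_encodings(offset):
    """List the ADR encodings whose offset range can hold offset."""
    fits = []
    if is_a32_constant(abs(offset)):
        fits.append("A32")
    if -4095 <= offset <= 4095:
        fits.append("T32 32-bit")
    if 0 <= offset <= 1020 and offset % 4 == 0:
        fits.append("T32 16-bit")
    return fits
```

In this model an offset of 0x101 fits only the 32-bit T32 encoding, because it is neither a multiple of 4 nor a rotated 8-bit value.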
Restrictions
In T32 code, Rd cannot be PC or SP.
In A32 code, Rd can be PC or SP but use of SP is deprecated.
Footnotes to Table 13-2:
a  Rd must be in the range R0-R7.
b  Must be a multiple of 4.
Related concepts
6.10 Load addresses to a register using ADR on page 6-114.
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
13.4 Syntax of Operand2 as a constant on page 13-339.
13.12 ADRL pseudo-instruction on page 13-353.
21.6 AREA on page 21-1646.
7.11 Condition code suffixes on page 7-150.

13.11 ADR (register-relative)
Generate a register-relative address in the destination register, for a label defined in a storage map.
Syntax
ADR{cond}{.W} Rd,label

where:
cond

is an optional condition code.
.W

is an optional instruction width specifier.
Rd

is the destination register to load.
label

is a symbol defined by the FIELD directive. label specifies an offset from the base register
which is defined using the MAP directive.
label must be within a limited distance from the base register.

Usage
ADR generates code to easily access named fields inside a storage map.

Use the ADRL pseudo-instruction to assemble a wider range of effective addresses.
Restrictions
In T32 code:
• Rd cannot be PC.
• Rd can be SP only if the base register is SP.
Offset range and architectures
The assembler calculates the offset from the base register for you. The assembler generates an error if
label is out of range.
The following table shows the possible offsets between the label and the current instruction:
Table 13-3 Register-relative offsets
Instruction                                          Offset range
A32 ADR                                              See 13.4 Syntax of Operand2 as a constant on page 13-339.
T32 ADR, 32-bit encoding                             ±4095
T32 ADR, 16-bit encoding, base register is SP (c)    0-1020 (d)

ADR in T32
You can use the .W width specifier to force ADR to generate a 32-bit instruction in T32 code. ADR with .W
always generates a 32-bit instruction, even if the address can be generated in a 16-bit instruction.
For forward references, ADR without .W, with base register SP, always generates a 16-bit instruction in
T32 code, even if that results in failure for an address that could be generated in a 32-bit T32 ADD
instruction.

Footnotes to Table 13-3:
c  Rd must be in the range R0-R7 or SP. If Rd is SP, the offset range is –508 to 508 and must be a multiple of 4.
d  Must be a multiple of 4.

Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
13.4 Syntax of Operand2 as a constant on page 13-339.
13.12 ADRL pseudo-instruction on page 13-353.
21.52 MAP on page 21-1699.
21.29 FIELD on page 21-1672.
7.11 Condition code suffixes on page 7-150.

13.12 ADRL pseudo-instruction
Load a PC-relative or register-relative address into a register.
Syntax
ADRL{cond} Rd,label

where:
cond

is an optional condition code.
Rd

is the register to load.
label

is a PC-relative or register-relative expression.
Usage
ADRL always assembles to two 32-bit instructions. Even if the address can be reached in a single
instruction, a second, redundant instruction is produced.
If the assembler cannot construct the address in two instructions, it generates an error message and the
assembly fails. You can use the LDR pseudo-instruction for loading a wider range of addresses.
ADRL is similar to the ADR instruction, except ADRL can load a wider range of addresses because it
generates two data processing instructions.
ADRL produces position-independent code, because the address is PC-relative or register-relative.

If label is PC-relative, it must evaluate to an address in the same assembler area as the ADRL
pseudo-instruction.
If you use ADRL to generate a target for a BX or BLX instruction, it is your responsibility to set the T32 bit
(bit 0) of the address if the target contains T32 instructions.
Architectures and range
The available range depends on the instruction set in use:
A32
The range of the instruction is any value that can be generated by two ADD or two SUB
instructions. That is, any value that can be produced by the addition of two values, each of
which is 8 bits rotated right by any even number of bits within a 32-bit word.
T32, 32-bit encoding
±1MB to a byte, halfword, or word-aligned address.
T32, 16-bit encoding
ADRL is not available.
The given range is relative to a point four bytes (in T32 code) or two words (in A32 code) after the
address of the current instruction.
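The A32 range described above, the sum of two rotated 8-bit values, can be tested by brute force. The following Python sketch is an illustration under assumptions: the offset is already relative to the reference point given above, a negative offset is treated as two SUB instructions, and the names a32_constants, CONSTANTS, and adrl_a32_reachable are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def a32_constants():
    """All values formed by rotating an 8-bit number right by an even amount."""
    values = set()
    for rot in range(0, 32, 2):
        for imm8 in range(256):
            values.add(((imm8 >> rot) | (imm8 << (32 - rot))) & MASK32)
    return values


CONSTANTS = a32_constants()


def adrl_a32_reachable(offset):
    """True if the magnitude of offset is the sum of two such constants."""
    magnitude = abs(offset)
    # Zero is itself a valid constant, so single-constant offsets also pass,
    # matching the redundant second instruction that ADRL always emits.
    return any(magnitude - first in CONSTANTS for first in CONSTANTS)
```

In this model 0xFFFF is reachable as 0xFF00 + 0xFF, whereas 0xFFFFFF is not, because it needs more than two 8-bit fields.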
Note
ADRL is not available in ARMv6-M and ARMv8-M.baseline.

Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
6.4 Load immediate values on page 6-105.
Related references
13.4 Syntax of Operand2 as a constant on page 13-339.
13.54 LDR pseudo-instruction on page 13-417.
21.6 AREA on page 21-1646.
13.9 ADD on page 13-347.
7.11 Condition code suffixes on page 7-150.
Related information
ARM Architecture Reference Manual.

13.13 AND
Logical AND.
Syntax
AND{S}{cond} Rd, Rn, Operand2

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
Operation
The AND instruction performs bitwise AND operations on the values in Rn and Operand2.
In certain circumstances, the assembler can substitute BIC for AND, or AND for BIC. Be aware of this when
reading disassembly listings.
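The substitution is possible because AND with a value and BIC with the complement of that value give the same result. The following Python sketch illustrates the identity on 32-bit values. It does not describe when the assembler chooses to substitute, and the names and_ and bic are hypothetical helpers.

```python
MASK32 = 0xFFFFFFFF


def and_(rn, op2):
    """Bitwise AND of two 32-bit values."""
    return rn & op2 & MASK32


def bic(rn, op2):
    """Bit Clear: AND rn with the complement of op2."""
    return rn & ~op2 & MASK32


def same_result(rn, constant):
    """Check that AND with constant equals BIC with its 32-bit complement."""
    return and_(rn, constant) == bic(rn, ~constant & MASK32)
```

For example, and_(0x12345678, 0xFFFFFF00) and bic(0x12345678, 0x000000FF) both give 0x12345600.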
Use of PC in T32 instructions
You cannot use PC (R15) for Rd or any operand with the AND instruction.
Use of PC and SP in A32 instructions
You can use PC and SP with the AND A32 instruction but this is deprecated.
If you use PC as Rn, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.
Condition flags
If S is specified, the AND instruction:
• Updates the N and Z flags according to the result.
• Can update the C flag during the calculation of Operand2.
• Does not affect the V flag.
16-bit instructions
The following forms of this instruction are available in T32 code, and are 16-bit instructions:
ANDS Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
AND{cond} Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.

It does not matter if you specify AND{S} Rd, Rm, Rd. The instruction is the same.


Examples
    AND     r9, r2, #0xFF00
    ANDS    r9, r8, #0x19

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.151 SUBS pc, lr on page 13-542.
7.11 Condition code suffixes on page 7-150.

13.14 ASR
Arithmetic Shift Right. This instruction is a preferred synonym for MOV instructions with shifted register
operands.
Syntax
ASR{S}{cond} Rd, Rm, Rs
ASR{S}{cond} Rd, Rm, #sh

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
Rd

is the destination register.
Rm

is the register holding the first operand. This operand is shifted right.
Rs

is a register holding a shift value to apply to the value in Rm. Only the least significant byte is
used.
sh

is a constant shift. The range of values permitted is 1-32.
Operation
ASR provides the signed value of the contents of a register divided by a power of two. It copies the sign
bit into vacated bit positions on the left.
Restrictions in T32 code
T32 instructions must not use PC or SP.
Use of SP and PC in A32 instructions
You can use SP in the ASR A32 instruction but this is deprecated.
You cannot use PC in instructions with the ASR{S}{cond} Rd, Rm, Rs syntax. You can use PC for Rd
and Rm in the other syntax, but this is deprecated.
If you use PC as Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, the SPSR of the current mode is copied to the CPSR. You can use this to
return from exceptions.
Note
The A32 instruction ASRS{cond} pc,Rm,#sh always disassembles to the preferred form MOVS{cond}
pc,Rm{,shift}.


Condition flags
If S is specified, the ASR instruction updates the N and Z flags according to the result.
The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.
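The sign fill and the carry out can be illustrated with a short Python model. This is a sketch that covers only the constant shift range 1-32 given in the syntax description, and the name asr is a hypothetical helper.

```python
def asr(rm, shift):
    """Model ASR on a 32-bit value for shift amounts 1-32.

    Returns (result, carry_out), where carry_out is the last bit shifted out.
    """
    if not 1 <= shift <= 32:
        raise ValueError("shift must be in the range 1-32")
    # Reinterpret the 32-bit pattern as a signed integer so that Python's
    # right shift copies the sign bit into the vacated positions.
    signed = rm - (1 << 32) if rm & 0x80000000 else rm
    carry_out = (signed >> (shift - 1)) & 1
    return (signed >> shift) & 0xFFFFFFFF, carry_out
```

In this model a negative value is rounded towards minus infinity, so shifting -11 (0xFFFFFFF5) right by 1 gives -6 (0xFFFFFFFA) with a carry out of 1.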
16-bit instructions
The following forms of these instructions are available in T32 code, and are 16-bit instructions:
ASRS Rd, Rm, #sh
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
ASR{cond} Rd, Rm, #sh
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.
ASRS Rd, Rd, Rs
Rd and Rs must both be Lo registers. This form can only be used outside an IT block.
ASR{cond} Rd, Rd, Rs
Rd and Rs must both be Lo registers. This form can only be used inside an IT block.

Architectures
This instruction is available in A32 and T32.
Example
    ASR     r7, r8, r9

Related references
13.63 MOV on page 13-431.
7.11 Condition code suffixes on page 7-150.

13.15 B
Branch.
Syntax
B{cond}{.W} label

where:
cond

is an optional condition code.
.W

is an optional instruction width specifier to force the use of a 32-bit B instruction in T32.
label

is a PC-relative expression.
Operation
The B instruction causes a branch to label.
Instruction availability and branch ranges
The following table shows the branch ranges that are available in A32 and T32 code. Instructions that are
not shown in this table are not available.
Table 13-4 B instruction availability and range
Instruction      A32      T32, 16-bit encoding    T32, 32-bit encoding
B label          ±32MB    ±2KB                    ±16MB (e)
B{cond} label    ±32MB    –252 to +258            ±1MB (e)

Extending branch ranges
Machine-level B instructions have restricted ranges from the address of the current instruction. However,
you can use these instructions even if label is out of range. Often you do not know where the linker
places label. When necessary, the linker adds code to enable longer branches. The added code is called
a veneer.
B in T32
You can use the .W width specifier to force B to generate a 32-bit instruction in T32 code.
B.W always generates a 32-bit instruction, even if the target could be reached using a 16-bit instruction.

For forward references, B without .W always generates a 16-bit instruction in T32 code, even if that
results in failure for a target that could be reached using a 32-bit T32 instruction.
Condition flags
The B instruction does not change the flags.
Architectures
See the earlier table for details of availability of the B instruction.
Example
    B       loopA

Footnote to Table 13-4:
e  Use .W to instruct the assembler to use this 32-bit instruction.

Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
7.11 Condition code suffixes on page 7-150.
Related information
Information about image structure and generation.

13.16 BFC
Bit Field Clear.
Syntax
BFC{cond} Rd, #lsb, #width

where:
cond

is an optional condition code.
Rd

is the destination register.
lsb

is the least significant bit that is to be cleared.
width

is the number of bits to be cleared. width must not be 0, and (width+lsb) must be less than or
equal to 32.
Operation
Clears adjacent bits in a register. width bits in Rd are cleared, starting at lsb. Other bits in Rd are
unchanged.
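The effect of BFC can be modeled as a mask operation. The following Python sketch is illustrative only, and the name bfc is a hypothetical helper.

```python
def bfc(rd, lsb, width):
    """Clear width bits of a 32-bit value, starting at bit lsb."""
    if width < 1 or lsb < 0 or lsb + width > 32:
        raise ValueError("need width >= 1 and lsb + width <= 32")
    mask = ((1 << width) - 1) << lsb    # ones over the field to clear
    return rd & ~mask & 0xFFFFFFFF
```

For example, clearing 8 bits from bit 4 of 0xFFFFFFFF leaves 0xFFFFF00F.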
Register restrictions
You cannot use PC for any register.
You can use SP in the BFC A32 instruction but this is deprecated. You cannot use SP in the BFC T32
instruction.
Condition flags
The BFC instruction does not change the flags.
Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.17 BFI
Bit Field Insert.
Syntax
BFI{cond} Rd, Rn, #lsb, #width

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the source register.
lsb

is the least significant bit in Rd that is to be replaced.
width

is the number of bits to be copied. width must not be 0, and (width+lsb) must be less than or
equal to 32.
Operation
Inserts adjacent bits from one register into another. width bits in Rd, starting at lsb, are replaced by
width bits from Rn, starting at bit[0]. Other bits in Rd are unchanged.
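The insertion can be modeled as a clear followed by a merge. The following Python sketch is illustrative only, and the name bfi is a hypothetical helper.

```python
def bfi(rd, rn, lsb, width):
    """Replace width bits of rd, starting at lsb, with the low width bits of rn."""
    if width < 1 or lsb < 0 or lsb + width > 32:
        raise ValueError("need width >= 1 and lsb + width <= 32")
    field = (1 << width) - 1
    cleared = rd & ~(field << lsb)       # remove the old field from rd
    inserted = (rn & field) << lsb       # take bits from rn, starting at bit[0]
    return (cleared | inserted) & 0xFFFFFFFF
```

For example, inserting the low 8 bits of 0x5A at bit 8 of 0xFFFFFFFF gives 0xFFFF5AFF.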
Register restrictions
You cannot use PC for any register.
You can use SP in the BFI A32 instruction but this is deprecated. You cannot use SP in the BFI T32
instruction.
Condition flags
The BFI instruction does not change the flags.
Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.18 BIC
Bit Clear.
Syntax
BIC{S}{cond} Rd, Rn, Operand2

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
Operation
The BIC (Bit Clear) instruction performs an AND operation on the bits in Rn with the complements of the
corresponding bits in the value of Operand2.
In certain circumstances, the assembler can substitute BIC for AND, or AND for BIC. Be aware of this when
reading disassembly listings.
Use of PC in T32 instructions
You cannot use PC (R15) for Rd or any operand in a BIC instruction.
Use of PC and SP in A32 instructions
You can use PC and SP with the BIC A32 instruction but this is deprecated.
If you use PC as Rn, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.
Condition flags
If S is specified, the BIC instruction:
• Updates the N and Z flags according to the result.
• Can update the C flag during the calculation of Operand2.
• Does not affect the V flag.
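The result and the N and Z flags can be modeled in a few lines of Python. This sketch is illustrative only. It does not model the C flag, which depends on how Operand2 is calculated, and the name bics is a hypothetical helper.

```python
def bics(rn, op2):
    """Model BICS on 32-bit values; returns (result, flags) with N and Z only."""
    result = rn & ~op2 & 0xFFFFFFFF
    flags = {
        "N": result >> 31,          # bit 31 of the result
        "Z": int(result == 0),      # result is zero
    }
    return result, flags
```

For example, clearing the bits of 0xAB in 0xFFFFFFFF gives 0xFFFFFF54 with N set and Z clear.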
16-bit instructions
The following forms of the BIC instruction are available in T32 code, and are 16-bit instructions:
BICS Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
BIC{cond} Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.


Example
    BIC     r0, r1, #0xab

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.151 SUBS pc, lr on page 13-542.
7.11 Condition code suffixes on page 7-150.

13.19 BKPT
Breakpoint.
Syntax
BKPT #imm

where:
imm

is an expression evaluating to an integer in the range:
• 0-65535 (a 16-bit value) in an A32 instruction.
• 0-255 (an 8-bit value) in a 16-bit T32 instruction.
Usage
The BKPT instruction causes the processor to enter Debug state. Debug tools can use this to investigate
system state when the instruction at a particular address is reached.
In both A32 state and T32 state, imm is ignored by the ARM hardware. However, a debugger can use it to
store additional information about the breakpoint.
BKPT is an unconditional instruction. It must not have a condition code in A32 code. In T32 code, the
BKPT instruction does not require a condition code suffix because BKPT always executes irrespective of its
condition code suffix.
Architectures
This instruction is available in A32 and T32.
In T32, it is only available as a 16-bit instruction.

13.20 BL
Branch with Link.
Syntax
BL{cond}{.W} label

where:
cond

is an optional condition code. cond is not available on all forms of this instruction.
.W

is an optional instruction width specifier to force the use of a 32-bit BL instruction in T32.
label

is a PC-relative expression.
Operation
The BL instruction causes a branch to label, and copies the address of the next instruction into LR (R14,
the link register).
Instruction availability and branch ranges
The following table shows the BL instructions that are available in A32 and T32 state. Instructions that
are not shown in this table are not available.
Table 13-5 BL instruction availability and range
Instruction       A32      T32, 16-bit encoding    T32, 32-bit encoding
BL label          ±32MB    ±4MB (f)                ±16MB
BL{cond} label    ±32MB    -                       -

Extending branch ranges
Machine-level BL instructions have restricted ranges from the address of the current instruction.
However, you can use these instructions even if label is out of range. Often you do not know where the
linker places label. When necessary, the linker adds code to enable longer branches. The added code is
called a veneer.
Condition flags
The BL instruction does not change the flags.
Availability
See the preceding table for details of availability of the BL instruction in both instruction sets.
Examples
    BLE     ng+8
    BL      subC
    BLLT    rtX

Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
7.11 Condition code suffixes on page 7-150.
Footnote to Table 13-5:
f  BL label and BLX label are an instruction pair.

Related information
Information about image structure and generation.

13.21 BLX, BLXNS
Branch with Link and exchange instruction set and Branch with Link and Exchange (Non-secure).
Syntax
BLX{cond}{q} label
BLX{cond}{q} Rm
BLXNS{cond}{q} Rm (ARMv8-M only)

Where:
cond

Is an optional condition code. cond is not available on all forms of this instruction.
q

Is an optional instruction width specifier. Must be set to .W when label is used.
label

Is a PC-relative expression.
Rm

Is a register containing an address to branch to.
Operation
The BLX instruction causes a branch to label, or to the address contained in Rm. In addition:
• The BLX instruction copies the address of the next instruction into LR (R14, the link register).
• The BLX instruction can change the instruction set.
BLX label always changes the instruction set. It changes a processor in A32 state to T32 state, or a
processor in T32 state to A32 state.
BLX Rm derives the target instruction set from bit[0] of Rm:
— If bit[0] of Rm is 0, the processor changes to, or remains in, A32 state.
— If bit[0] of Rm is 1, the processor changes to, or remains in, T32 state.
Note
• There are no equivalent instructions to BLX to change between AArch32 and AArch64 state. The only
way to change execution state is by a change of exception level.
• ARMv8-M, ARMv7-M, and ARMv6-M only support the T32 instruction set. An attempt to change
the instruction execution state causes the processor to take an exception on the instruction at the
target address.

The BLXNS instruction calls a subroutine at an address and instruction set specified by a register, and
causes a transition from the Secure to the Non-secure domain. This variant of the instruction must only
be used when additional steps required to make such a transition safe are taken.
Instruction availability and branch ranges
The following table shows the instructions that are available in A32 and T32 state. Instructions that are
not shown in this table are not available.
Table 13-6 BLX instruction availability and range
Instruction     A32          T32, 16-bit encoding    T32, 32-bit encoding
BLX label       ±32MB        ±4MB (g)                ±16MB
BLX Rm          Available    Available               Use 16-bit
BLX{cond} Rm    Available    -                       -
BLXNS           -            Available               -
g  BLX label and BL label are an instruction pair.


Register restrictions
You can use PC for Rm in the A32 BLX instruction, but this is deprecated. You cannot use PC in other A32
instructions.
You can use PC for Rm in the T32 BLX instruction. You cannot use PC in other T32 instructions.
You can use SP for Rm in this A32 instruction but this is deprecated.
You can use SP for Rm in the T32 BLX and BLXNS instructions, but this is deprecated. You cannot use SP
in the other T32 instructions.
Condition flags
These instructions do not change the flags.
Availability
See the preceding table for details of availability of the BLX and BLXNS instructions in both instruction
sets.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
7.11 Condition code suffixes on page 7-150.
13.2 Instruction width specifiers on page 13-337.
Related information
Information about image structure and generation.

13.22 BX, BXNS
Branch and exchange instruction set and Branch and Exchange Non-secure.
Syntax
BX{cond}{q} Rm
BXNS{cond}{q} Rm (ARMv8-M only)

Where:
cond

Is an optional condition code. cond is not available on all forms of this instruction.
q

Is an optional instruction width specifier.
Rm

Is a register containing an address to branch to.
Operation
The BX instruction causes a branch to the address contained in Rm and, if necessary, changes the
instruction set.
BX Rm derives the target instruction set from bit[0] of Rm:
• If bit[0] of Rm is 0, the processor changes to, or remains in, A32 state.
• If bit[0] of Rm is 1, the processor changes to, or remains in, T32 state.
Note
• There are no equivalent instructions to BX to change between AArch32 and AArch64 state. The only
way to change execution state is by a change of exception level.
• ARMv8-M, ARMv7-M, and ARMv6-M only support the T32 instruction set. An attempt to change
the instruction execution state causes the processor to take an exception on the instruction at the
target address.


BX can also be used for an exception return.
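The bit[0] rule for a register branch can be sketched in Python. This is an illustrative model only. It assumes that bit[0] selects the instruction set and is not part of the branch address, and it does not model exception return or the M-profile behavior described in the note. The name interworking_target is hypothetical.

```python
def interworking_target(rm):
    """Return (instruction_set, address) for a branch through register value rm."""
    if rm & 1:
        # bit[0] set: T32 state. The sketch clears bit[0] to form the address.
        return "T32", rm & 0xFFFFFFFE
    # bit[0] clear: A32 state.
    return "A32", rm & 0xFFFFFFFF
```

This is also why code that builds a branch target with ADR or ADRL must set bit 0 itself when the target contains T32 instructions.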

The BXNS instruction causes a branch to an address and instruction set specified by a register, and causes
a transition from the Secure to the Non-secure domain. This variant of the instruction must only be used
when additional steps required to make such a transition safe are taken.
Instruction availability and branch ranges
The following table shows the instructions that are available in A32 and T32 state. Instructions that are
not shown in this table are not available.
Table 13-7 BX instruction availability and range
Instruction    A32          T32, 16-bit encoding    T32, 32-bit encoding
BX Rm          Available    Available               Use 16-bit
BX{cond} Rm    Available    -                       -
BXNS           -            Available               -

Register restrictions
You can use PC for Rm in the A32 BX instruction, but this is deprecated. You cannot use PC in other A32
instructions.


You can use PC for Rm in the T32 BX and BXNS instructions. You cannot use PC in other T32 instructions.
You can use SP for Rm in the A32 BX instruction but this is deprecated.
You can use SP for Rm in the T32 BX and BXNS instructions, but this is deprecated.
Condition flags
These instructions do not change the flags.
Availability
See the preceding table for details of availability of the BX and BXNS instructions in both instruction sets.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
7.11 Condition code suffixes on page 7-150.
13.2 Instruction width specifiers on page 13-337.
Related information
Information about image structure and generation.

13.23 BXJ
Branch and change to Jazelle state.
Syntax
BXJ{cond} Rm

where:
cond

is an optional condition code. cond is not available on all forms of this instruction.
Rm

is a register containing an address to branch to.
Operation
The BXJ instruction causes a branch to the address contained in Rm and changes the instruction set state to
Jazelle.
Note
In ARMv8, BXJ behaves as a BX instruction. This means it causes a branch to an address and instruction
set specified by a register.

Instruction availability and branch ranges
The following table shows the BXJ instructions that are available in A32 and T32 state. Instructions that
are not shown in this table are not available.
Table 13-8 BXJ instruction availability and range
Instruction     A32        T32, 16-bit encoding  T32, 32-bit encoding
BXJ Rm          Available  -                     Available
BXJ{cond} Rm    Available  -                     -

Register restrictions
You can use SP for Rm in the BXJ A32 instruction but this is deprecated.
You cannot use SP in the BXJ T32 instruction.
Condition flags
The BXJ instruction does not change the flags.
Availability
See the preceding table for details of availability of the BXJ instruction in both instruction sets.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
7.11 Condition code suffixes on page 7-150.
Related information
Information about image structure and generation.

13.24 CBZ and CBNZ
Compare and Branch on Zero, Compare and Branch on Non-Zero.
Syntax
CBZ{q} Rn, label
CBNZ{q} Rn, label

where:
q

Is an optional instruction width specifier.
Rn

Is the register holding the operand.
label

Is the branch destination.
Usage
You can use the CBZ or CBNZ instructions to avoid changing the condition flags and to reduce the number
of instructions.
Except that it does not change the condition flags, CBZ Rn, label is equivalent to:
CMP     Rn, #0
BEQ     label

Except that it does not change the condition flags, CBNZ Rn, label is equivalent to:
CMP     Rn, #0
BNE     label

Restrictions
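The branch destination must be at an offset that is a multiple of 2 in the range 0 to 126 bytes from the
PC value of the instruction, which is the address of the instruction plus 4, and must be in the same
execution state. These instructions can only branch forwards.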
These instructions must not be used inside an IT block.
Condition flags
These instructions do not change the flags.
Architectures
These 16-bit instructions are available in ARMv7-A Thumb, ARMv8-A T32, and ARMv8-M only.
There are no ARMv7-A ARM, or ARMv8-A A32 or 32-bit T32 encodings of these instructions.
Related references
13.15 B on page 13-359.
13.28 CMP and CMN on page 13-377.
13.2 Instruction width specifiers on page 13-337.

13.25 CDP and CDP2
Coprocessor data operations.
Note
CDP and CDP2 are not supported in ARMv8.

Syntax
CDP{cond} coproc, #opcode1, CRd, CRn, CRm{, #opcode2}
CDP2{cond} coproc, #opcode1, CRd, CRn, CRm{, #opcode2}

where:
cond

is an optional condition code.
In A32 code, cond is not permitted for CDP2.
coproc

is the name of the coprocessor the instruction is for. The standard name is pn, where n is an
integer in the range 0-15.
opcode1

is a 4-bit coprocessor-specific opcode.
opcode2

is an optional 3-bit coprocessor-specific opcode.
CRd, CRn, CRm

are coprocessor registers.
Usage
The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.
Architectures
These 32-bit instructions are available in A32 and T32.
There are no 16-bit versions of these instructions in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.26 CLREX
Clear Exclusive.
Syntax
CLREX{cond}

where:
cond

is an optional condition code.
Note
cond is permitted only in T32 code, using a preceding IT instruction, but this is deprecated in
ARMv8. This is an unconditional instruction in A32.

Usage
Use the CLREX instruction to clear the local record of the executing processor that an address has had a
request for an exclusive access.
CLREX returns a closely-coupled exclusive access monitor to its open-access state. This removes the
requirement for a dummy store to memory.

It is implementation defined whether CLREX also clears the global record of the executing processor that
an address has had a request for an exclusive access.
Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit CLREX instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.
Related information
ARM Architecture Reference Manual.

13.27 CLZ
Count Leading Zeros.
Syntax
CLZ{cond} Rd, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the operand register.
Operation
The CLZ instruction counts the number of leading zeros in the value in Rm and returns the result in Rd.
The result value is 32 if no bits are set in the source register, and zero if bit 31 is set.
Register restrictions
You cannot use PC for any operand.
You can use SP in these A32 instructions but this is deprecated.
You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Examples
CLZ     r4,r9
CLZNE   r2,r3

Use the CLZ T32 instruction followed by a left shift of Rm by the resulting Rd value to normalize the value
of register Rm. Use MOVS, rather than MOV, to flag the case where Rm is zero:
CLZ r5, r9
MOVS r9, r9, LSL r5
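
The counting and normalization behavior described above can be sketched as a small Python model. This is an illustrative model only, not ARM pseudocode. The names clz32 and normalize are hypothetical, and the model assumes 32-bit register values.

```python
MASK32 = 0xFFFFFFFF  # assumed register width of 32 bits

def clz32(value):
    """Model of CLZ: count the leading zero bits of a 32-bit value."""
    value &= MASK32
    # int.bit_length() is the index of the highest set bit plus one, and
    # 0 for a zero input, so the result is 32 when no bits are set.
    return 32 - value.bit_length()

def normalize(value):
    """Model of CLZ followed by MOVS Rm, Rm, LSL Rd.

    Returns (normalized value, Z flag). Z is 1 only when the input is
    zero, which is the case that MOVS is used to flag.
    """
    shift = clz32(value)
    # A shift of 32 moves every bit out of the register, leaving zero.
    result = (value << shift) & MASK32
    return result, int(result == 0)
```

For example, normalize(0x000000F0) shifts left by 24 places so that bit 31 of the result is set, and normalize(0) reports the zero case through the modeled Z flag.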

Related references
7.11 Condition code suffixes on page 7-150.

13.28 CMP and CMN
Compare and Compare Negative.
Syntax
CMP{cond} Rn, Operand2
CMN{cond} Rn, Operand2

where:
cond

is an optional condition code.
Rn

is the ARM register holding the first operand.
Operand2

is a flexible second operand.
Operation
These instructions compare the value in a register with Operand2. They update the condition flags on the
result, but do not place the result in any register.
The CMP instruction subtracts the value of Operand2 from the value in Rn. This is the same as a SUBS
instruction, except that the result is discarded.
The CMN instruction adds the value of Operand2 to the value in Rn. This is the same as an ADDS
instruction, except that the result is discarded.
In certain circumstances, the assembler can substitute CMN for CMP, or CMP for CMN. Be aware of this when
reading disassembly listings.
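
The flag results of the two comparisons can be sketched in Python as follows. This is an illustrative model under the assumption of 32-bit operands, not ARM pseudocode, and the function names cmp_flags and cmn_flags are hypothetical. Each function returns the tuple (N, Z, C, V) and discards the arithmetic result, as the instructions do.

```python
MASK32 = 0xFFFFFFFF  # assumed operand width of 32 bits

def cmp_flags(rn, operand2):
    """Model of CMP: flags of Rn - Operand2; the result is discarded."""
    a, b = rn & MASK32, operand2 & MASK32
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)  # C is set when the subtraction needs no borrow
    # Signed overflow: the operands differ in sign and the sign of the
    # result differs from the sign of the first operand.
    v = ((a ^ b) & (a ^ result)) >> 31
    return n, z, c, v

def cmn_flags(rn, operand2):
    """Model of CMN: flags of Rn + Operand2; the result is discarded."""
    a, b = rn & MASK32, operand2 & MASK32
    total = a + b
    result = total & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(total > MASK32)  # C is set on an unsigned carry out
    # Signed overflow: the operands agree in sign and the sign of the
    # result differs from it.
    v = (~(a ^ b) & (a ^ result) & MASK32) >> 31
    return n, z, c, v
```

In this model, cmp_flags(x, 0xFFFFFFFF) and cmn_flags(x, 1) return the same flags for the values tried, which illustrates the kind of equivalence that makes a CMP and CMN substitution possible.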
Use of PC in A32 and T32 instructions
You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.
You can use PC (R15) in these A32 instructions without a register-controlled shift, but this is deprecated.
If you use PC as Rn in A32 instructions, the value used is the address of the instruction plus 8.
You cannot use PC for any operand in these T32 instructions.
Use of SP in A32 and T32 instructions
You can use SP for Rn in A32 and T32 instructions.
You can use SP for Rm in A32 instructions but this is deprecated.
You can use SP for Rm in a 16-bit T32 CMP Rn, Rm instruction but this is deprecated. Other uses of SP for
Rm are not permitted in T32.
Condition flags
These instructions update the N, Z, C and V flags according to the result.
16-bit instructions
The following forms of these instructions are available in T32 code, and are 16-bit instructions:
CMP Rn, Rm

Lo register restriction does not apply.
CMN Rn, Rm
Rn and Rm must both be Lo registers.

CMP Rn, #imm
Rn must be a Lo register. imm range 0-255.

Correct examples
CMP     r2, r9
CMN     r0, #6400
CMPGT   sp, r7, LSL #2

Incorrect example
CMP     r2, pc, ASR r0 ; PC not permitted with register-controlled
                       ; shift.

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
7.11 Condition code suffixes on page 7-150.

13.29 CPS
Change Processor State.
Syntax
CPSeffect iflags{, #mode}
CPS #mode

where:
effect

is one of:
IE

Interrupt or abort enable.
ID

Interrupt or abort disable.
iflags

is a sequence of one or more of:
a

Enables or disables imprecise aborts.
i

Enables or disables IRQ interrupts.
f

Enables or disables FIQ interrupts.
mode

specifies the number of the mode to change to.
Usage
Changes one or more of the mode, A, I, and F bits in the CPSR, without changing the other CPSR bits.
CPS is only permitted in privileged software execution, and has no effect in User mode.
CPS cannot be conditional, and is not permitted in an IT block.
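
The effect on the CPSR can be sketched with a small Python model. It is illustrative only. The function name cps and its parameters are hypothetical, and the model assumes the CPSR layout in which A is bit 8, I is bit 7, F is bit 6, and the mode field is bits 4 to 0. It models privileged execution only, so it does not reproduce the case where CPS has no effect in User mode.

```python
# Assumed CPSR bit positions: A = bit 8, I = bit 7, F = bit 6, mode = bits 4:0.
A_BIT, I_BIT, F_BIT = 1 << 8, 1 << 7, 1 << 6
MODE_MASK = 0x1F
FLAG_BITS = {"a": A_BIT, "i": I_BIT, "f": F_BIT}

def cps(cpsr, effect=None, iflags="", mode=None):
    """Return the CPSR value after a CPS instruction in privileged code.

    effect is "IE", "ID", or None for the CPS #mode form.
    iflags is a string made up of the letters a, i, and f.
    """
    mask = 0
    for flag in iflags.lower():
        mask |= FLAG_BITS[flag]
    if effect == "ID":
        cpsr |= mask        # a set mask bit disables the interrupt or abort
    elif effect == "IE":
        cpsr &= ~mask       # a clear mask bit enables it
    if mode is not None:
        cpsr = (cpsr & ~MODE_MASK) | (mode & MODE_MASK)
    return cpsr             # all other CPSR bits are left unchanged
```

For example, cps(0x13, "ID", "ai", 17) models CPSID ai, #17 executed in Supervisor mode: the A and I bits are set and the mode field changes to 17 (FIQ), giving 0x191.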

Condition flags
This instruction does not change the condition flags.
16-bit instructions
The following forms of these instructions are available in T32 code, and are 16-bit instructions:
• CPSIE iflags.
• CPSID iflags.
You cannot specify a mode change in a 16-bit T32 instruction.
Architectures
This instruction is available in A32 and T32.
In T32, 16-bit and 32-bit versions of this instruction are available.
Examples
CPSIE if        ; Enable IRQ and FIQ interrupts.
CPSID A         ; Disable imprecise aborts.
CPSID ai, #17   ; Disable imprecise aborts and interrupts, and enter
                ; FIQ mode.
CPS #16         ; Enter User mode.

Related concepts
3.2 Processor modes, and privileged and unprivileged software execution on page 3-65.

13.30 CPY pseudo-instruction
Copy a value from one register to another.
Syntax
CPY{cond} Rd, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register holding the value to be copied.
Operation
The CPY pseudo-instruction copies a value from one register to another, without changing the condition
flags.
CPY Rd, Rm assembles to MOV Rd, Rm.

Architectures
This pseudo-instruction is available in A32 code and in T32 code.
Register restrictions
Using SP or PC for both Rd and Rm is deprecated.
Condition flags
This instruction does not change the condition flags.
Related references
13.63 MOV on page 13-431.

13.31 CRC32
CRC32 performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose
register.
Syntax
CRC32B{q} Rd, Rn, Rm ; A1 Wd = CRC32(Wn, Rm[<7:0>])
CRC32H{q} Rd, Rn, Rm ; A1 Wd = CRC32(Wn, Rm[<15:0>])
CRC32W{q} Rd, Rn, Rm ; A1 Wd = CRC32(Wn, Rm[<31:0>])
CRC32B{q} Rd, Rn, Rm ; T1 Wd = CRC32(Wn, Rm[<7:0>])
CRC32H{q} Rd, Rn, Rm ; T1 Wd = CRC32(Wn, Rm[<15:0>])
CRC32W{q} Rd, Rn, Rm ; T1 Wd = CRC32(Wn, Rm[<31:0>])

Where:
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual. A CRC32
instruction must be unconditional.
Rd

Is the general-purpose accumulator output register.
Rn

Is the general-purpose accumulator input register.
Rm

Is the general-purpose data source register.
Architectures supported
Supported in architecture ARMv8.1 and later. Optionally supported in ARMv8-A.
Usage
CRC32 takes an input CRC value in the first source operand, performs a CRC on the input value in the
second source operand, and returns the output CRC value. The second source operand can be 8, 16, or 32
bits. To align with common usage, the bit order of the values is reversed as part of the operation, and the
polynomial 0x04C11DB7 is used for the CRC calculation.
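
The calculation can be sketched as a bit-by-bit Python model. It is illustrative only and is not ARM pseudocode. The function names are hypothetical, and the model works least significant bit first with the bit-reversed polynomial, which is one common way to express the bit reversal described above.

```python
POLY_REFLECTED = 0xEDB88320  # 0x04C11DB7 with its 32 bits reversed

def crc32_update(acc, data, size):
    """Fold the low `size` bits (8, 16, or 32) of data into the CRC in acc."""
    acc = (acc & 0xFFFFFFFF) ^ (data & ((1 << size) - 1))
    for _ in range(size):
        # Processing the least significant bit first is the bit-reversed
        # form of dividing by the polynomial.
        if acc & 1:
            acc = (acc >> 1) ^ POLY_REFLECTED
        else:
            acc >>= 1
    return acc

def crc32b(acc, value):
    """Model of CRC32B: 8-bit data."""
    return crc32_update(acc, value, 8)

def crc32h(acc, value):
    """Model of CRC32H: 16-bit data."""
    return crc32_update(acc, value, 16)

def crc32w(acc, value):
    """Model of CRC32W: 32-bit data."""
    return crc32_update(acc, value, 32)
```

The model applies no initial value and no final inversion. Starting from 0xFFFFFFFF, folding in the ASCII bytes of "123456789" one at a time with crc32b, and then inverting the result gives 0xCBF43926, the usual CRC-32 check value.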
Note
ID_ISAR5.CRC32 indicates whether this instruction is supported in the T32 and A32 instruction sets.
Note
For more information about the CONSTRAINED UNPREDICTABLE behavior, see Architectural Constraints on
UNPREDICTABLE behaviors in the ARMv8-A Architecture Reference Manual.

Related references
13.32 CRC32C on page 13-383.
13.1 A32 and T32 instruction summary on page 13-332.

13.32 CRC32C
CRC32C performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose
register.
Syntax
CRC32CB{q} Rd, Rn, Rm ; A1 Wd = CRC32C(Wn, Rm[<7:0>])
CRC32CH{q} Rd, Rn, Rm ; A1 Wd = CRC32C(Wn, Rm[<15:0>])
CRC32CW{q} Rd, Rn, Rm ; A1 Wd = CRC32C(Wn, Rm[<31:0>])
CRC32CB{q} Rd, Rn, Rm ; T1 Wd = CRC32C(Wn, Rm[<7:0>])
CRC32CH{q} Rd, Rn, Rm ; T1 Wd = CRC32C(Wn, Rm[<15:0>])
CRC32CW{q} Rd, Rn, Rm ; T1 Wd = CRC32C(Wn, Rm[<31:0>])

Where:
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual. A CRC32C
instruction must be unconditional.
Rd

Is the general-purpose accumulator output register.
Rn

Is the general-purpose accumulator input register.
Rm

Is the general-purpose data source register.
Architectures supported
Supported in architecture ARMv8.1 and later. Optionally supported in ARMv8-A.
Usage
CRC32C takes an input CRC value in the first source operand, performs a CRC on the input value in the
second source operand, and returns the output CRC value. The second source operand can be 8, 16, or 32
bits. To align with common usage, the bit order of the values is reversed as part of the operation, and the
polynomial 0x1EDC6F41 is used for the CRC calculation.
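
The following Python sketch shows how the bit reversal and the polynomial fit together. It is an illustrative model, not ARM pseudocode. The names bit_reverse32, crc32c_update, and crc32c_bytes are hypothetical, and the initial value and final inversion in crc32c_bytes are the conventional software wrapper, which is an assumption that is not part of the instruction.

```python
def bit_reverse32(value):
    """Reverse the order of the 32 bits in value."""
    return int(format(value & 0xFFFFFFFF, "032b")[::-1], 2)

POLY = 0x1EDC6F41
POLY_REFLECTED = bit_reverse32(POLY)  # polynomial in reversed bit order

def crc32c_update(acc, data, size):
    """Model of CRC32CB/CH/CW: fold `size` bits (8, 16, or 32) into acc."""
    acc = (acc & 0xFFFFFFFF) ^ (data & ((1 << size) - 1))
    for _ in range(size):
        # One step of the division, least significant bit first.
        if acc & 1:
            acc = (acc >> 1) ^ POLY_REFLECTED
        else:
            acc >>= 1
    return acc

def crc32c_bytes(data):
    """Conventional CRC-32C of a byte string, one modeled CRC32CB per byte.

    The 0xFFFFFFFF initial value and the final inversion are applied by
    software around the instruction; they are assumptions of this sketch.
    """
    acc = 0xFFFFFFFF
    for byte in data:
        acc = crc32c_update(acc, byte, 8)
    return acc ^ 0xFFFFFFFF
```

With these assumptions, crc32c_bytes(b"123456789") gives 0xE3069283, the usual CRC-32C check value.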
Note
ID_ISAR5.CRC32 indicates whether this instruction is supported in the T32 and A32 instruction sets.
Note
For more information about the CONSTRAINED UNPREDICTABLE behavior, see Architectural Constraints on
UNPREDICTABLE behaviors in the ARMv8-A Architecture Reference Manual.

Related references
13.31 CRC32 on page 13-382.
13.1 A32 and T32 instruction summary on page 13-332.

13.33 DBG
Debug.
Syntax
DBG{cond} {option}

where:
cond

is an optional condition code.
option

is an optional limitation on the operation of the hint. The range is 0-15.
Usage
DBG is a hint instruction. It is optional whether it is implemented or not. If it is not implemented, it
behaves as a NOP. The assembler produces a diagnostic message if the instruction executes as NOP on the
target.
Debug hint provides a hint to a debugger and related tools. See your debugger and related tools
documentation to determine the use, if any, of this instruction.
Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
13.75 NOP on page 13-447.
7.11 Condition code suffixes on page 7-150.

13.34 DCPS1 (T32 instruction)
Debug switch to exception level 1 (EL1).
Note
This instruction is supported only in ARMv8.

Syntax
DCPS1

Usage
This instruction is valid in Debug state only, and is always UNDEFINED in Non-debug state.
DCPS1 targets EL1 and:
• If EL1 is using AArch32, the processing element (PE) enters SVC mode. If EL3 is using AArch32,
Secure SVC is an EL3 mode. This means DCPS1 causes the PE to enter EL3.
• If EL1 is using AArch64, the PE enters EL1h, and executes future instructions as A64 instructions.
In Non-debug state, use the SVC instruction to generate a trap to EL1.
Availability
This 32-bit instruction is available in T32 only.
There is no 16-bit version of this instruction in T32.
Related references
13.152 SVC on page 13-544.
13.35 DCPS2 (T32 instruction) on page 13-386.
13.36 DCPS3 (T32 instruction) on page 13-387.
Related information
ARM Architecture Reference Manual.

13.35 DCPS2 (T32 instruction)
Debug switch to exception level 2.
Note
This instruction is supported only in ARMv8.

Syntax
DCPS2

Usage
This instruction is valid in Debug state only, and is always UNDEFINED in Non-debug state.
DCPS2 targets EL2 and:
• If EL2 is using AArch32, the PE enters Hyp mode.
• If EL2 is using AArch64, the PE enters EL2h, and executes future instructions as A64 instructions.
In Non-debug state, use the HVC instruction to generate a trap to EL2.
Availability
This 32-bit instruction is available in T32 only.
There is no 16-bit version of this instruction in T32.
Related references
13.43 HVC on page 13-397.
13.34 DCPS1 (T32 instruction) on page 13-385.
13.36 DCPS3 (T32 instruction) on page 13-387.
Related information
ARM Architecture Reference Manual.

13.36 DCPS3 (T32 instruction)
Debug switch to exception level 3.
Note
This instruction is supported only in ARMv8.

Syntax
DCPS3

Usage
This instruction is valid in Debug state only, and is always UNDEFINED in Non-debug state.
DCPS3 targets EL3 and:
• If EL3 is using AArch32, the PE enters Monitor mode.
• If EL3 is using AArch64, the PE enters EL3h, and executes future instructions as A64 instructions.
In Non-debug state, use the SMC instruction to generate a trap to EL3.
Availability
This 32-bit instruction is available in T32 only.
There is no 16-bit version of this instruction in T32.
Related references
13.119 SMC on page 13-500.
13.35 DCPS2 (T32 instruction) on page 13-386.
13.34 DCPS1 (T32 instruction) on page 13-385.
Related information
ARM Architecture Reference Manual.

13.37 DMB
Data Memory Barrier.
Syntax
DMB{cond} {option}

where:
cond

is an optional condition code.
Note
cond is permitted only in T32 code. This is an unconditional instruction in A32.
option

is an optional limitation on the operation of the hint. Permitted values are:
SY

Full system barrier operation. This is the default and can be omitted.
LD

Barrier operation that waits only for loads to complete.
ST

Barrier operation that waits only for stores to complete.
ISH

Barrier operation only to the inner shareable domain.
ISHLD

Barrier operation that waits only for loads to complete, and only applies to the inner
shareable domain.
ISHST

Barrier operation that waits only for stores to complete, and only to the inner shareable
domain.
NSH

Barrier operation only out to the point of unification.
NSHLD

Barrier operation that waits only for loads to complete and only applies out to the point
of unification.
NSHST

Barrier operation that waits only for stores to complete and only out to the point of
unification.
OSH

Barrier operation only to the outer shareable domain.
OSHLD

DMB operation that waits only for loads to complete, and only applies to the outer
shareable domain.
OSHST

Barrier operation that waits only for stores to complete, and only to the outer shareable
domain.
Note
The options LD, ISHLD, NSHLD, and OSHLD are supported only in ARMv8.

Operation
Data Memory Barrier acts as a memory barrier. It ensures that all explicit memory accesses that appear in
program order before the DMB instruction are observed before any explicit memory accesses that appear
in program order after the DMB instruction. It does not affect the ordering of any other instructions
executing on the processor.
Alias
The following alternative values of option are supported, but ARM recommends that you do not use
them:
• SH is an alias for ISH.
• SHST is an alias for ISHST.
• UN is an alias for NSH.
• UNST is an alias for NSHST.
Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.38 DSB
Data Synchronization Barrier.
Syntax
DSB{cond} {option}

where:
cond

is an optional condition code.
Note
cond is permitted only in T32 code. This is an unconditional instruction in A32.
option

is an optional limitation on the operation of the hint. Permitted values are:
SY

Full system barrier operation. This is the default and can be omitted.
LD

Barrier operation that waits only for loads to complete.
ST

Barrier operation that waits only for stores to complete.
ISH

Barrier operation only to the inner shareable domain.
ISHLD

Barrier operation that waits only for loads to complete, and only applies to the inner
shareable domain.
ISHST

Barrier operation that waits only for stores to complete, and only to the inner shareable
domain.
NSH

Barrier operation only out to the point of unification.
NSHLD

Barrier operation that waits only for loads to complete and only applies out to the point
of unification.
NSHST

Barrier operation that waits only for stores to complete and only out to the point of
unification.
OSH

Barrier operation only to the outer shareable domain.
OSHLD

Barrier operation that waits only for loads to complete, and only applies to the outer
shareable domain.
OSHST

Barrier operation that waits only for stores to complete, and only to the outer shareable
domain.
Note
The options LD, ISHLD, NSHLD, and OSHLD are supported only in ARMv8.

Operation
Data Synchronization Barrier acts as a special kind of memory barrier. No instruction in program order
after this instruction executes until this instruction completes. This instruction completes when:
• All explicit memory accesses before this instruction complete.
• All Cache, Branch predictor and TLB maintenance operations before this instruction complete.
Alias
The following alternative values of option are supported for DSB, but ARM recommends that you do not
use them:
• SH is an alias for ISH.
• SHST is an alias for ISHST.
• UN is an alias for NSH.
• UNST is an alias for NSHST.
Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.39 EOR
Logical Exclusive OR.
Syntax
EOR{S}{cond} Rd, Rn, Operand2

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
Operation
The EOR instruction performs bitwise Exclusive OR operations on the values in Rn and Operand2.
Use of PC in T32 instructions
You cannot use PC (R15) for Rd or any operand in an EOR instruction.
Use of PC and SP in A32 instructions
You can use PC and SP with the EOR instruction but they are deprecated.
If you use PC as Rn, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.
Condition flags
If S is specified, the EOR instruction:
• Updates the N and Z flags according to the result.
• Can update the C flag during the calculation of Operand2.
• Does not affect the V flag.
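
The flag behavior listed above can be sketched in Python. This is an illustrative model only. The function name eors and its parameters are hypothetical, and 32-bit operands are assumed. The shifter_carry parameter stands for the carry-out produced while Operand2 is calculated, when there is one.

```python
MASK32 = 0xFFFFFFFF  # assumed operand width of 32 bits

def eors(rn, operand2, shifter_carry=None, c_flag=0, v_flag=0):
    """Model of EORS: return (result, N, Z, C, V).

    shifter_carry is the carry-out from calculating Operand2, or None
    when that calculation does not produce one. c_flag and v_flag are
    the flag values before the instruction executes.
    """
    result = (rn ^ operand2) & MASK32
    n = result >> 31          # N follows bit 31 of the result
    z = int(result == 0)      # Z is set when the result is zero
    # C changes only if the Operand2 calculation produced a carry-out.
    c = c_flag if shifter_carry is None else shifter_carry
    return result, n, z, c, v_flag  # V is passed through unchanged
```

For example, eors(5, 5, v_flag=1) returns a zero result with Z set and V still 1.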
16-bit instructions
The following forms of the EOR instruction are available in T32 code, and are 16-bit instructions:
EORS Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
EOR{cond} Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.

It does not matter if you specify EOR{S} Rd, Rm, Rd. The instruction is the same.
Correct examples
EORS    r0,r0,r3,ROR r6
EORS    r7, r11, #0x18181818

Incorrect example
EORS    r0,pc,r3,ROR r6 ; PC not permitted with register
                        ; controlled shift

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.151 SUBS pc, lr on page 13-542.
7.11 Condition code suffixes on page 7-150.

13.40 ERET
Exception Return.
Syntax
ERET{cond}

where:
cond

is an optional condition code.
Usage
In a processor that implements the Virtualization Extensions, you can use ERET to perform a return from
an exception taken to Hyp mode.
Operation
When executed in Hyp mode, ERET loads the PC from ELR_hyp and loads the CPSR from SPSR_hyp.
When executed in any other mode, apart from User or System, it behaves as:
• MOVS PC, LR in the A32 instruction set.
• SUBS PC, LR, #0 in the T32 instruction set.
Notes
You must not use ERET in User or System mode. The assembler cannot warn you about this because it
has no information about what the processor mode is likely to be at execution time.
ERET is the preferred synonym for SUBS PC, LR, #0 in the T32 instruction set.

Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related concepts
3.2 Processor modes, and privileged and unprivileged software execution on page 3-65.
Related references
13.63 MOV on page 13-431.
13.151 SUBS pc, lr on page 13-542.
7.11 Condition code suffixes on page 7-150.
13.43 HVC on page 13-397.

13.41 ESB
Error Synchronization Barrier.
Syntax
ESB{c}{q} ; A1 general registers (A32)
ESB{c}.W ; T1 general registers (T32)

Where:
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
c

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
Usage
Error Synchronization Barrier.
Related references
13.1 A32 and T32 instruction summary on page 13-332.

13.42 HLT
Halting breakpoint.
Note
This instruction is supported only in ARMv8.

Syntax
HLT{Q} #imm

Where:
Q

is an optional suffix. It only has an effect when Halting debug-mode is disabled. In this case, if Q
is specified, the instruction behaves as a NOP. If Q is not specified, the instruction is UNDEFINED.
imm

is an expression evaluating to an integer in the range:
• 0-65535 (a 16-bit value) in an A32 instruction.
• 0-63 (a 6-bit value) in a 16-bit T32 instruction.
Usage
The HLT instruction causes the processor to enter Debug state if Halting debug-mode is enabled.
In both A32 state and T32 state, imm is ignored by the ARM hardware. However, a debugger can use it
to store additional information about the breakpoint.
HLT is an unconditional instruction. It must not have a condition code in A32 code. In T32 code, the HLT
instruction does not require a condition code suffix because it always executes irrespective of its
condition code suffix.
Availability
This instruction is available in A32 and T32.
In T32, it is only available as a 16-bit instruction.
Related references
13.75 NOP on page 13-447.

13.43 HVC
Hypervisor Call.
Syntax
HVC #imm

where:
imm

is an expression evaluating to an integer in the range 0-65535.
Operation
In a processor that implements the Virtualization Extensions, the HVC instruction causes a Hypervisor
Call exception. This means that the processor enters Hyp mode, the CPSR value is saved to the Hyp
mode SPSR, and execution branches to the HVC vector.
HVC must not be used if the processor is in Secure state, or in User mode in Non-secure state.
imm is ignored by the processor. However, it can be retrieved by the exception handler to determine what
service is being requested.
HVC cannot be conditional, and is not permitted in an IT block.

Notes
The ERET instruction performs an exception return from Hyp mode.
Architectures
This 32-bit instruction is available in A32 and T32. It is available in ARMv7 architectures that include
the Virtualization Extensions.
There is no 16-bit version of this instruction in T32.
Related concepts
3.2 Processor modes, and privileged and unprivileged software execution on page 3-65.
Related references
13.40 ERET on page 13-394.

13.44 ISB
Instruction Synchronization Barrier.
Syntax
ISB{cond} {option}

where:
cond

is an optional condition code.
Note
cond is permitted only in T32 code. This is an unconditional instruction in A32.
option

is an optional limitation on the operation of the hint. The permitted value is:
SY

Full system barrier operation. This is the default and can be omitted.
Operation
Instruction Synchronization Barrier flushes the pipeline in the processor, so that all instructions following
the ISB are fetched from cache or memory, after the instruction has been completed. It ensures that the
effects of context altering operations, such as changing the ASID, or completed TLB maintenance
operations, or branch predictor maintenance operations, in addition to all changes to the CP15 registers,
executed before the ISB instruction are visible to the instructions fetched after the ISB.
In addition, the ISB instruction ensures that any branches that appear in program order after it are always
written into the branch prediction logic with the context that is visible after the ISB instruction. This is
required to ensure correct execution of the instruction stream.
Note
When the target architecture is ARMv7-M, you cannot use an ISB instruction in an IT block, unless it is
the last instruction in the block.

Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.45 IT
The IT (If-Then) instruction makes a single following instruction (the IT block) conditional. The
conditional instruction must be from a restricted set of 16-bit instructions.
Syntax
IT cond

where:
cond

specifies the condition for the following instruction.
Deprecated syntax
IT{x{y{z}}} {cond}

where:
x

specifies the condition switch for the second instruction in the IT block.
y

specifies the condition switch for the third instruction in the IT block.
z

specifies the condition switch for the fourth instruction in the IT block.
cond

specifies the condition for the first instruction in the IT block.
The condition switches for the second, third, and fourth instructions in the IT block can be either:
T

Then. Applies the condition cond to the instruction.
E

Else. Applies the inverse condition of cond to the instruction.
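
The way the condition switches expand can be sketched in Python. This is an illustrative model of the deprecated syntax only, not a description of how the assembler is implemented. The function name it_block_conditions and the INVERSE table are hypothetical. The model treats AL as having no inverse, so it rejects an E switch with AL.

```python
# Each condition code paired with its logical inverse.
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "HS": "LO", "LO": "HS", "MI": "PL", "PL": "MI",
    "VS": "VC", "VC": "VS", "HI": "LS", "LS": "HI",
    "GE": "LT", "LT": "GE", "GT": "LE", "LE": "GT",
}

def it_block_conditions(mnemonic, cond):
    """Expand, for example, ("ITTE", "EQ") into per-instruction conditions."""
    mnemonic, cond = mnemonic.upper(), cond.upper()
    switches = mnemonic[2:]
    if not mnemonic.startswith("IT") or len(switches) > 3:
        raise ValueError("expected IT followed by up to three T or E switches")
    conditions = [cond]  # the first instruction always uses cond itself
    for switch in switches:
        if switch == "T":
            conditions.append(cond)
        elif switch == "E":
            if cond not in INVERSE:
                raise ValueError(cond + " has no inverse condition in this model")
            conditions.append(INVERSE[cond])
        else:
            raise ValueError("condition switch must be T or E")
    return conditions
```

For example, it_block_conditions("ITTE", "EQ") returns ["EQ", "EQ", "NE"], so the three instructions in the block would be written with the EQ, EQ, and NE condition codes.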
Usage
The IT block can contain between two and four conditional instructions, where the conditions can be all
the same, or some of them can be the logical inverse of the others, but this is deprecated in ARMv8.
The conditional instruction (including branches, but excluding the BKPT instruction) must specify the
condition in the {cond} part of its syntax.
You are not required to write IT instructions in your code, because the assembler generates them for you
automatically according to the conditions specified on the following instructions. However, if you do
write IT instructions, the assembler validates the conditions specified in the IT instructions against the
conditions specified in the following instructions.
Writing the IT instructions ensures that you consider the placing of conditional instructions, and the
choice of conditions, in the design of your code.
When assembling to A32 code, the assembler performs the same checks, but does not generate any IT
instructions.
With the exception of CMP, CMN, and TST, the 16-bit instructions that normally affect the condition flags,
do not affect them when used inside an IT block.

A BKPT instruction in an IT block is always executed, so it does not require a condition in the {cond} part
of its syntax. The IT block continues from the next instruction. Using a BKPT or HLT instruction inside an
IT block is deprecated.
Note
You can use an IT block for unconditional instructions by using the AL condition.
Conditional branches inside an IT block have a longer branch range than those outside the IT block.
Restrictions
The following instructions are not permitted in an IT block:
• IT.
• CBZ and CBNZ.
• TBB and TBH.
• CPS, CPSID and CPSIE.
• SETEND.

Other restrictions when using an IT block are:
• A branch or any instruction that modifies the PC is only permitted in an IT block if it is the last
instruction in the block.
• You cannot branch to any instruction in an IT block, except when returning from an exception
handler.
• You cannot use any assembler directives in an IT block.
Note
armasm shows a diagnostic message when any of these instructions are used in an IT block.

Using any instruction not listed in the following table in an IT block is deprecated. Also, any explicit
reference to R15 (the PC) in the IT block is deprecated.
Table 13-9 Permitted instructions inside an IT block

16-bit instruction               When deprecated
MOV, MVN                         When Rm or Rd is the PC
LDR, LDRB, LDRH, LDRSB, LDRSH    For PC-relative forms
STR, STRB, STRH                  -
ADD, ADC, RSB, SBC, SUB          ADD SP, SP, #imm or SUB SP, SP, #imm or when Rm, Rdn
                                 or Rdm is the PC
CMP, CMN                         When Rm or Rn is the PC
MUL                              -
ASR, LSL, LSR, ROR               -
AND, BIC, EOR, ORR, TST          -
BX, BLX                          When Rm is the PC

Condition flags
This instruction does not change the flags.


Exceptions
Exceptions can occur between an IT instruction and the corresponding IT block, or within an IT block.
Such an exception results in entry to the appropriate exception handler, with suitable return information
in LR and SPSR.
Instructions designed for use as exception returns can be used as normal to return from the exception,
and execution of the IT block resumes correctly. This is the only way that a PC-modifying instruction
can branch to an instruction in an IT block.
Availability
This 16-bit instruction is available in T32 only.
In A32 code, IT is a pseudo-instruction that does not generate any code.
There is no 32-bit version of this instruction.
Correct examples
    IT      GT
    LDRGT   r0, [r1,#4]

    IT      EQ
    ADDEQ   r0, r1, r2

Incorrect examples
    IT      NE
    ADD     r0,r0,r1        ; syntax error: no condition code used in IT block

    ITT     EQ
    MOVEQ   r0,r1
    ADDEQ   r0,r0,#1        ; IT block covering more than one instruction is deprecated

    IT      GT
    LDRGT   r0,label        ; LDR (PC-relative) is deprecated in an IT block

    IT      EQ
    ADDEQ   PC,r0           ; ADD is deprecated when Rdn is the PC

13.46 LDA
Load-Acquire Register.
Note
This instruction is supported only in ARMv8.

Syntax
LDA{cond} Rt, [Rn]
LDAB{cond} Rt, [Rn]
LDAH{cond} Rt, [Rn]

where:
cond
    is an optional condition code.
Rt
    is the register to load.
Rn
    is the register on which the memory address is based.
Operation
LDA loads data from memory. If any loads or stores appear after a load-acquire in program order, then all
observers are guaranteed to observe the load-acquire before observing the loads and stores. Loads and
stores appearing before a load-acquire are unaffected.
If a store-release follows a load-acquire, each observer is guaranteed to observe them in program order.
There is no requirement that a load-acquire be paired with a store-release.
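The ordering described above can be sketched with a polling consumer. This is only an illustration for an ARMv8 target: the label wait_ready, the register assignments, and the assumption that a producer writes the payload and then sets the flag with STL are all hypothetical.

```asm
        AREA    LdaExample, CODE, READONLY
        ARM
; Hypothetical consumer: r0 = address of a ready flag, r1 = address of a payload word.
; Assumes a producer writes the payload and then sets the flag with STL.
wait_ready
        LDA     r2, [r0]        ; load-acquire the flag
        CMP     r2, #0
        BEQ     wait_ready      ; not set yet: poll again
        LDR     r3, [r1]        ; later in program order, so observed after the LDA
        BX      lr
        END
```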
Restrictions
The address specified must be naturally aligned, or an alignment fault is generated.
The PC must not be used for Rt or Rn.
Availability
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction.
Related references
13.47 LDAEX on page 13-403.
13.143 STL on page 13-527.
13.144 STLEX on page 13-528.
7.11 Condition code suffixes on page 7-150.

13.47 LDAEX
Load-Acquire Register Exclusive.
Note
This instruction is supported only in ARMv8.

Syntax
LDAEX{cond} Rt, [Rn]
LDAEXB{cond} Rt, [Rn]
LDAEXH{cond} Rt, [Rn]
LDAEXD{cond} Rt, Rt2, [Rn]

where:
cond
    is an optional condition code.
Rt
    is the register to load.
Rt2
    is the second register for doubleword loads.
Rn
    is the register on which the memory address is based.
Operation
LDAEX loads data from memory.
• If the physical address has the Shared TLB attribute, LDAEX tags the physical address as exclusive
  access for the current processor, and clears any exclusive access tag for this processor for any other
  physical address.
• Otherwise, it tags the fact that the executing processor has an outstanding tagged physical address.
• If any loads or stores appear after LDAEX in program order, then all observers are guaranteed to
  observe the LDAEX before observing the loads and stores. Loads and stores appearing before LDAEX
  are unaffected.

Restrictions
The PC must not be used for any of Rt, Rt2, or Rn.
For A32 instructions:
• SP can be used, but use of SP for Rt or Rt2 is deprecated.
• For LDAEXD, Rt must be an even numbered register, and not LR.
• Rt2 must be R(t+1).

For T32 instructions:
• SP can be used for Rn, but must not be used for Rt or Rt2.
• For LDAEXD, Rt and Rt2 must not be the same register.
Usage
Use LDAEX and STLEX to implement interprocess communication in multiple-processor and shared-memory
systems.

For reasons of performance, keep the number of instructions between corresponding LDAEX and STLEX
instructions to a minimum.
Note
The address used in a STLEX instruction must be the same as the address in the most recently executed
LDAEX instruction.
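As an illustration of the pairing described above, the following sketch increments a shared word atomically on an ARMv8 target. The label atomic_inc and the register assignments are assumptions made for this example only.

```asm
        AREA    LdaexExample, CODE, READONLY
        ARM
; Hypothetical routine: atomically increment the word at the address in r0.
atomic_inc
        LDAEX   r1, [r0]        ; load the value and tag the address for exclusive access
        ADD     r1, r1, #1
        STLEX   r2, r1, [r0]    ; store only if still exclusive; r2 = 0 on success
        CMP     r2, #0
        BNE     atomic_inc      ; exclusive access was lost: retry
        BX      lr
        END
```

Only one instruction sits between the LDAEX and the STLEX, and both use the same address in r0, in line with the guidance above.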

Availability
These 32-bit instructions are available in A32 and T32.
There are no 16-bit versions of these instructions.
Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
13.143 STL on page 13-527.
13.46 LDA on page 13-402.
13.144 STLEX on page 13-528.
7.11 Condition code suffixes on page 7-150.

13.48 LDC and LDC2
Transfer Data from memory to Coprocessor.
Note
LDC2 is not supported in ARMv8.

Syntax
op{L}{cond} coproc, CRd, [Rn]
op{L}{cond} coproc, CRd, [Rn, #{-}offset] ; offset addressing
op{L}{cond} coproc, CRd, [Rn, #{-}offset]! ; pre-index addressing
op{L}{cond} coproc, CRd, [Rn], #{-}offset ; post-index addressing
op{L}{cond} coproc, CRd, label
op{L}{cond} coproc, CRd, [Rn], {option}

where:
op
    is LDC or LDC2.
cond
    is an optional condition code.
    In A32 code, cond is not permitted for LDC2.
L
    is an optional suffix specifying a long transfer.
coproc
    is the name of the coprocessor the instruction is for. The standard name is pn, where n is an
    integer whose value must be:
    • In the range 0 to 15 in ARMv7 and earlier.
    • 14 in ARMv8.
CRd
    is the coprocessor register to load.
Rn
    is the register on which the memory address is based. If PC is specified, the value used is the
    address of the current instruction plus eight.
-
    is an optional minus sign. If - is present, the offset is subtracted from Rn. Otherwise, the offset is
    added to Rn.
offset
    is an expression evaluating to a multiple of 4, in the range 0 to 1020.
!
    is an optional suffix. If ! is present, the address including the offset is written back into Rn.
label
    is a word-aligned PC-relative expression.
    label must be within 1020 bytes of the current instruction.
option
    is a coprocessor option in the range 0-255, enclosed in braces.
Usage
The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.

Architectures
These 32-bit instructions are available in A32 and T32.
There are no 16-bit versions of these instructions in T32.
Register restrictions
You cannot use PC for Rn in the pre-index and post-index instructions. These are the forms that write
back to Rn.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

13.49 LDM
Load Multiple registers.
Syntax
LDM{addr_mode}{cond} Rn{!}, reglist{^}

where:
addr_mode
    is any one of the following:
    IA
        Increment address After each transfer. This is the default, and can be omitted.
    IB
        Increment address Before each transfer (A32 only).
    DA
        Decrement address After each transfer (A32 only).
    DB
        Decrement address Before each transfer.
    You can also use the stack oriented addressing mode suffixes, for example, when implementing
    stacks.
cond
    is an optional condition code.
Rn
    is the base register, the ARM register holding the initial address for the transfer. Rn must not be
    PC.
!
    is an optional suffix. If ! is present, the final address is written back into Rn.
reglist
    is a list of one or more registers to be loaded, enclosed in braces. It can contain register ranges.
    It must be comma separated if it contains more than one register or register range. Any
    combination of registers R0 to R15 (PC) can be transferred in A32 state, but there are some
    restrictions in T32 state.
^
    is an optional suffix, available in A32 state only. You must not use it in User mode or System
    mode. It has the following purposes:
    • If reglist contains the PC (R15), in addition to the normal multiple register transfer, the
      SPSR is copied into the CPSR. This is for returning from exception handlers. Use this only
      from exception modes.
    • Otherwise, data is transferred into or out of the User mode registers instead of the current
      mode registers.
Restrictions on reglist in 32-bit T32 instructions
In 32-bit T32 instructions:
• The SP cannot be in the list.
• The PC and LR cannot both be in the list.
• There must be two or more registers in the list.
If you write an LDM instruction with only one register in reglist, the assembler automatically substitutes
the equivalent LDR instruction. Be aware of this when comparing disassembly listings with source code.
You can use the --diag_warning 1645 assembler command line option to check when an instruction
substitution occurs.


Restrictions on reglist in A32 instructions
A32 load instructions can have SP and PC in the reglist, but instructions that include SP in the
reglist, or both PC and LR in the reglist, are deprecated.
16-bit instructions
16-bit versions of a subset of these instructions are available in T32 code.
The following restrictions apply to the 16-bit instructions:
• All registers in reglist must be Lo registers.
• Rn must be a Lo register.
• addr_mode must be omitted (or IA), meaning increment address after each transfer.
• Writeback must be specified for LDM instructions where Rn is not in the reglist.
In addition, the PUSH and POP instructions are subsets of the STM and LDM instructions and can therefore
be expressed using the STM and LDM instructions. Some forms of PUSH and POP are also 16-bit
instructions.
Loading to the PC
A load to the PC causes a branch to the instruction at the address loaded.
Also:
• Bits[1:0] of the address loaded must not be 0b10.
• If bit[0] is 1, execution continues in T32 state.
• If bit[0] is 0, execution continues in A32 state.
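A common use of loading to the PC is a function return that restores saved registers in the same instruction. The sketch below assumes A32 code; the label leaf_wrapper and the choice of r4-r6 are illustrative.

```asm
        AREA    LdmExample, CODE, READONLY
        ARM
leaf_wrapper                        ; hypothetical function
        STMDB   sp!, {r4-r6, lr}    ; save work registers and the return address
        ; ... body that uses r4-r6 ...
        LDMIA   sp!, {r4-r6, pc}    ; restore registers; loading PC performs the return
        END
```

Because the saved LR value is loaded into the PC, bit[0] of that value selects the instruction set state on return, as described above.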
Loading or storing the base register, with writeback
In A32 or 16-bit T32 instructions, if Rn is in reglist, and writeback is specified with the ! suffix:
• If the instruction is STM{addr_mode}{cond} and Rn is the lowest-numbered register in reglist, the
initial value of Rn is stored. These instructions are deprecated.
• Otherwise, the loaded or stored value of Rn cannot be relied on, so these instructions are not
permitted.
32-bit T32 instructions are not permitted if Rn is in reglist, and writeback is specified with the ! suffix.
Correct example
    LDM     r8,{r0,r2,r9}   ; LDMIA is a synonym for LDM

Incorrect example
    LDMDA   r2, {}          ; must be at least one register in list

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
13.80 POP on page 13-455.
7.11 Condition code suffixes on page 7-150.

13.50 LDR (immediate offset)
Load with immediate offset, pre-indexed immediate offset, or post-indexed immediate offset.
Syntax
LDR{type}{cond} Rt, [Rn {, #offset}] ; immediate offset
LDR{type}{cond} Rt, [Rn, #offset]! ; pre-indexed
LDR{type}{cond} Rt, [Rn], #offset ; post-indexed
LDRD{cond} Rt, Rt2, [Rn {, #offset}] ; immediate offset, doubleword
LDRD{cond} Rt, Rt2, [Rn, #offset]! ; pre-indexed, doubleword
LDRD{cond} Rt, Rt2, [Rn], #offset ; post-indexed, doubleword

where:
type
    can be any one of:
    B
        unsigned Byte (Zero extend to 32 bits on loads.)
    SB
        signed Byte (LDR only. Sign extend to 32 bits.)
    H
        unsigned Halfword (Zero extend to 32 bits on loads.)
    SH
        signed Halfword (LDR only. Sign extend to 32 bits.)
    -
        omitted, for Word.
cond
    is an optional condition code.
Rt
    is the register to load.
Rn
    is the register on which the memory address is based.
offset
    is an offset. If offset is omitted, the address is the contents of Rn.
Rt2
    is the additional register to load for doubleword operations.
Not all options are available in every instruction set and architecture.
Offset ranges and architectures
The following table shows the ranges of offsets and availability of these instructions:
Table 13-10 Offsets and architectures, LDR, word, halfword, and byte

Instruction                                      Immediate offset    Pre-indexed         Post-indexed
A32, word or byte (h)                            –4095 to 4095       –4095 to 4095       –4095 to 4095
A32, signed byte, halfword, or signed halfword   –255 to 255         –255 to 255         –255 to 255
A32, doubleword                                  –255 to 255         –255 to 255         –255 to 255
T32 32-bit encoding, word, halfword,             –255 to 4095        –255 to 255         –255 to 255
  signed halfword, byte, or signed byte (h)
T32 32-bit encoding, doubleword                  –1020 to 1020 (i)   –1020 to 1020 (i)   –1020 to 1020 (i)
T32 16-bit encoding, word (j)                    0 to 124 (i)        Not available       Not available
T32 16-bit encoding, unsigned halfword (j)       0 to 62 (k)         Not available       Not available
T32 16-bit encoding, unsigned byte (j)           0 to 31             Not available       Not available
T32 16-bit encoding, word, Rn is SP (l)          0 to 1020 (i)       Not available       Not available

(h) For word loads, Rt can be the PC. A load to the PC causes a branch to the address loaded. In ARMv4,
    bits[1:0] of the address loaded must be 0b00. In ARMv5T and above, bits[1:0] must not be 0b10, and if
    bit[0] is 1, execution continues in T32 state, otherwise execution continues in A32 state.

Doubleword register restrictions
Rn must be different from Rt2 in the pre-index and post-index forms.

For T32 instructions, you must not specify SP or PC for either Rt or Rt2.
For A32 instructions:
• Rt must be an even-numbered register.
• Rt must not be LR.
• ARM strongly recommends that you do not use R12 for Rt.
• Rt2 must be R(t + 1).
Use of PC
In A32 code you can use PC for Rt in LDR word instructions and PC for Rn in LDR instructions.
Other uses of PC are not permitted in these A32 instructions.
In T32 code you can use PC for Rt in LDR word instructions and PC for Rn in LDR instructions. Other uses
of PC in these T32 instructions are not permitted.
Use of SP
You can use SP for Rn.
In A32 code, you can use SP for Rt in word instructions. You can use SP for Rt in non-word instructions
in A32 code but this is deprecated.
In T32 code, you can use SP for Rt in word instructions only. All other uses of SP for Rt in these
instructions are not permitted in T32 code.
Examples
    LDR     r8,[r10]        ; loads R8 from the address in R10.
    LDRNE   r2,[r5,#960]!   ; (conditionally) loads R2 from a word
                            ; 960 bytes above the address in R5, and
                            ; increments R5 by 960.
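The three addressing forms differ only in when, and whether, the base register is updated. The following A32 sketch shows them side by side; the area name and register choices are illustrative, and r1 is assumed to hold a valid, suitably aligned address.

```asm
        AREA    LdrImmExample, CODE, READONLY
        ARM
; r1 is assumed to hold a valid, suitably aligned address.
        LDR     r0, [r1, #8]        ; immediate offset: r0 = word at r1+8, r1 unchanged
        LDR     r0, [r1, #8]!       ; pre-indexed: r1 = r1+8, then r0 = word at r1
        LDR     r0, [r1], #8        ; post-indexed: r0 = word at r1, then r1 = r1+8
        LDRSB   r2, [r1, #-1]       ; signed byte at r1-1, sign-extended to 32 bits
        LDRD    r4, r5, [r1, #16]   ; doubleword: r4 = word at r1+16, r5 = word at r1+20
        END
```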

Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.
(i) Must be divisible by 4.
(j) Rt and Rn must be in the range R0-R7.
(k) Must be divisible by 2.
(l) Rt must be in the range R0-R7.

13.51 LDR (PC-relative)
Load register. The address is an offset from the PC.
Syntax
LDR{type}{cond}{.W} Rt, label
LDRD{cond} Rt, Rt2, label ; Doubleword

where:
type
    can be any one of:
    B
        unsigned Byte (Zero extend to 32 bits on loads.)
    SB
        signed Byte (LDR only. Sign extend to 32 bits.)
    H
        unsigned Halfword (Zero extend to 32 bits on loads.)
    SH
        signed Halfword (LDR only. Sign extend to 32 bits.)
    -
        omitted, for Word.
cond
    is an optional condition code.
.W
    is an optional instruction width specifier.
Rt
    is the register to load or store.
Rt2
    is the second register to load or store.
label
    is a PC-relative expression.
    label must be within a limited distance of the current instruction.

Note
Equivalent syntaxes are available for the STR instruction in A32 code but they are deprecated.

Offset range and architectures
The assembler calculates the offset from the PC for you. The assembler generates an error if label is out
of range.
The following table shows the possible offsets between the label and the current instruction:
Table 13-11 PC-relative offsets

Instruction                                    Offset range
A32 LDR, LDRB, LDRSB, LDRH, LDRSH (m)          +/– 4095
A32 LDRD                                       +/– 255
32-bit T32 LDR, LDRB, LDRSB, LDRH, LDRSH (m)   +/– 4095
32-bit T32 LDRD (n)                            +/– 1020 (o)
16-bit T32 LDR (p)                             0-1020 (o)

LDR (PC-relative) in T32
You can use the .W width specifier to force LDR to generate a 32-bit instruction in T32 code. LDR.W
always generates a 32-bit instruction, even if the target could be reached using a 16-bit LDR.
For forward references, LDR without .W always generates a 16-bit instruction in T32 code, even if that
results in failure for a target that could be reached using a 32-bit T32 LDR instruction.
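The effect of the width specifier on a forward reference can be sketched as follows. The labels get_consts and const1 and the constant value are illustrative assumptions.

```asm
        AREA    LdrPcRelExample, CODE, READONLY
        THUMB
get_consts                      ; hypothetical routine
        LDR     r0, const1      ; forward reference without .W: 16-bit encoding
        LDR.W   r1, const1      ; .W forces the 32-bit encoding
        BX      lr
        ALIGN                   ; the 16-bit LDR needs a word-aligned target
const1  DCD     0x12345678
        END
```

Here const1 is only a few bytes ahead, so both forms can reach it. If the target were beyond the 16-bit range, the first LDR would fail and only the .W form would assemble.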
Doubleword register restrictions
For 32-bit T32 instructions, you must not specify SP or PC for either Rt or Rt2.
For A32 instructions:
• Rt must be an even-numbered register.
• Rt must not be LR.
• ARM strongly recommends that you do not use R12 for Rt.
• Rt2 must be R(t + 1).
Use of SP
In A32 code, you can use SP for Rt in LDR word instructions. You can use SP for Rt in LDR non-word
A32 instructions but this is deprecated.
In T32 code, you can use SP for Rt in LDR word instructions only. All other uses of SP in these
instructions are not permitted in T32 code.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

(m) For word loads, Rt can be the PC. A load to the PC causes a branch to the address loaded. In ARMv4,
    bits[1:0] of the address loaded must be 0b00. In ARMv5T and above, bits[1:0] must not be 0b10, and if
    bit[0] is 1, execution continues in T32 state, otherwise execution continues in A32 state.
(n) In ARMv7-M, LDRD (PC-relative) instructions must be on a word-aligned address.
(o) Must be a multiple of 4.
(p) Rt must be in the range R0-R7. There are no byte, halfword, or doubleword 16-bit instructions.

13.52 LDR (register offset)
Load with register offset, pre-indexed register offset, or post-indexed register offset.
Syntax
LDR{type}{cond} Rt, [Rn, ±Rm {, shift}] ; register offset
LDR{type}{cond} Rt, [Rn, ±Rm {, shift}]! ; pre-indexed ; A32 only
LDR{type}{cond} Rt, [Rn], ±Rm {, shift} ; post-indexed ; A32 only
LDRD{cond} Rt, Rt2, [Rn, ±Rm] ; register offset, doubleword ; A32 only
LDRD{cond} Rt, Rt2, [Rn, ±Rm]! ; pre-indexed, doubleword ; A32 only
LDRD{cond} Rt, Rt2, [Rn], ±Rm ; post-indexed, doubleword ; A32 only

where:
type
    can be any one of:
    B
        unsigned Byte (Zero extend to 32 bits on loads.)
    SB
        signed Byte (LDR only. Sign extend to 32 bits.)
    H
        unsigned Halfword (Zero extend to 32 bits on loads.)
    SH
        signed Halfword (LDR only. Sign extend to 32 bits.)
    -
        omitted, for Word.
cond
    is an optional condition code.
Rt
    is the register to load.
Rn
    is the register on which the memory address is based.
Rm
    is a register containing a value to be used as the offset. –Rm is not permitted in T32 code.
shift
    is an optional shift.
Rt2
    is the additional register to load for doubleword operations.
Not all options are available in every instruction set and architecture.
Offset register and shift options
The following table shows the ranges of offsets and availability of these instructions:
Table 13-12 Options and architectures, LDR (register offsets)

Instruction                                      +/–Rm (q)   shift
A32, word or byte (r)                            +/–Rm       LSL #0-31, LSR #1-32, ASR #1-32, ROR #1-31, RRX
A32, signed byte, halfword, or signed halfword   +/–Rm       Not available
A32, doubleword                                  +/–Rm       Not available
T32 32-bit encoding, word, halfword,             +Rm         LSL #0-3
  signed halfword, byte, or signed byte (r)
T32 16-bit encoding, all except doubleword (s)   +Rm         Not available

(q) Where +/–Rm is shown, you can use –Rm, +Rm, or Rm. Where +Rm is shown, you cannot use –Rm.
(r) For word loads, Rt can be the PC. A load to the PC causes a branch to the address loaded. In ARMv4,
    bits[1:0] of the address loaded must be 0b00. In ARMv5T and above, bits[1:0] must not be 0b10, and if
    bit[0] is 1, execution continues in T32 state, otherwise execution continues in A32 state.
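A typical use of the shifted register offset is indexing an array of words. The following A32 sketch is illustrative only: the area name and register assignments are assumptions.

```asm
        AREA    LdrRegExample, CODE, READONLY
        ARM
; r1 = base address of an array of words, r2 = element index (both illustrative).
        LDR     r0, [r1, r2, LSL #2]    ; r0 = array[r2]: offset is r2 * 4
        LDRB    r3, [r1, r2]            ; byte at r1 + r2, zero-extended
        LDR     r0, [r1, -r2]           ; A32 only: subtract the offset register
        LDR     r0, [r1], r2, LSL #2    ; A32 only: post-indexed, then r1 = r1 + r2*4
        END
```

The first two forms are also within the options listed for the 32-bit T32 encoding, because they use +Rm and a shift of LSL #0-3.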

Register restrictions
In the pre-index and post-index forms, Rn must be different from Rt.
Doubleword register restrictions
For A32 instructions:
• Rt must be an even-numbered register.
• Rt must not be LR.
• ARM strongly recommends that you do not use R12 for Rt.
• Rt2 must be R(t + 1).
• Rm must be different from Rt and Rt2 in LDRD instructions.
• Rn must be different from Rt2 in the pre-index and post-index forms.
Use of PC
In A32 instructions you can use PC for Rt in LDR word instructions, and you can use PC for Rn in LDR
instructions with register offset syntax (that is, the forms that do not write back to Rn).
Other uses of PC are not permitted in A32 instructions.
In T32 instructions you can use PC for Rt in LDR word instructions. Other uses of PC in these T32
instructions are not permitted.
Use of SP
You can use SP for Rn.
In A32 code, you can use SP for Rt in word instructions. You can use SP for Rt in non-word A32
instructions but this is deprecated.
You can use SP for Rm in A32 instructions but this is deprecated.
In T32 code, you can use SP for Rt in word instructions only. All other uses of SP for Rt in these
instructions are not permitted in T32 code.
Use of SP for Rm is not permitted in T32 state.
Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

(s) Rt, Rn, and Rm must all be in the range R0-R7.

13.53 LDR (register-relative)
Load register. The address is an offset from a base register.
Syntax
LDR{type}{cond}{.W} Rt, label
LDRD{cond} Rt, Rt2, label ; Doubleword

where:
type
    can be any one of:
    B
        unsigned Byte (Zero extend to 32 bits on loads.)
    SB
        signed Byte (LDR only. Sign extend to 32 bits.)
    H
        unsigned Halfword (Zero extend to 32 bits on loads.)
    SH
        signed Halfword (LDR only. Sign extend to 32 bits.)
    -
        omitted, for Word.
cond
    is an optional condition code.
.W
    is an optional instruction width specifier.
Rt
    is the register to load or store.
Rt2
    is the second register to load or store.
label
    is a symbol defined by the FIELD directive. label specifies an offset from the base register
    which is defined using the MAP directive.
    label must be within a limited distance of the value in the base register.
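The relationship between MAP, FIELD, and a register-relative load can be sketched as follows. The field names count, limit, and tag, and the use of r9 as the base register, are illustrative assumptions.

```asm
        MAP     0, r9               ; storage map based at r9, starting at offset 0
count   FIELD   4                   ; count is r9 + 0
limit   FIELD   4                   ; limit is r9 + 4
tag     FIELD   1                   ; tag   is r9 + 8

        AREA    LdrRegRelExample, CODE, READONLY
        ARM
        LDR     r0, count           ; assembled as LDR  r0, [r9, #0]
        LDR     r1, limit           ; assembled as LDR  r1, [r9, #4]
        LDRB    r2, tag             ; assembled as LDRB r2, [r9, #8]
        END
```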

Offset range and architectures
The assembler calculates the offset from the base register for you. The assembler generates an error if
label is out of range.
The following table shows the possible offsets between the label and the value in the base register:
Table 13-13 Register-relative offsets

Instruction                                     Offset range
A32 LDR, LDRB (t)                               +/– 4095
A32 LDRSB, LDRH, LDRSH                          +/– 255
A32 LDRD                                        +/– 255
T32, 32-bit LDR, LDRB, LDRSB, LDRH, LDRSH (t)   –255 to 4095
T32, 32-bit LDRD                                +/– 1020 (u)
T32, 16-bit LDR (v)                             0 to 124 (u)
T32, 16-bit LDRH (v)                            0 to 62 (w)
T32, 16-bit LDRB (v)                            0 to 31
T32, 16-bit LDR, base register is SP (x)        0 to 1020 (u)

(t) For word loads, Rt can be the PC. A load to the PC causes a branch to the address loaded. In ARMv4,
    bits[1:0] of the address loaded must be 0b00. In ARMv5T and above, bits[1:0] must not be 0b10, and if
    bit[0] is 1, execution continues in T32 state, otherwise execution continues in A32 state.
(u) Must be a multiple of 4.
(v) Rt and base register must be in the range R0-R7.
(w) Must be a multiple of 2.
(x) Rt must be in the range R0-R7.

LDR (register-relative) in T32
You can use the .W width specifier to force LDR to generate a 32-bit instruction in T32 code. LDR.W
always generates a 32-bit instruction, even if the target could be reached using a 16-bit LDR.
For forward references, LDR without .W always generates a 16-bit instruction in T32 code, even if that
results in failure for a target that could be reached using a 32-bit T32 LDR instruction.
Doubleword register restrictions
For 32-bit T32 instructions, you must not specify SP or PC for either Rt or Rt2.
For A32 instructions:
• Rt must be an even-numbered register.
• Rt must not be LR.
• ARM strongly recommends that you do not use R12 for Rt.
• Rt2 must be R(t + 1).
Use of PC
You can use PC for Rt in word instructions. Other uses of PC are not permitted in these instructions.
Use of SP
In A32 code, you can use SP for Rt in word instructions. You can use SP for Rt in non-word A32
instructions but this is deprecated.
In T32 code, you can use SP for Rt in word instructions only. All other uses of SP for Rt in these
instructions are not permitted in T32 code.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
21.29 FIELD on page 21-1672.
21.52 MAP on page 21-1699.
7.11 Condition code suffixes on page 7-150.

13.54 LDR pseudo-instruction
Load a register with either a 32-bit immediate value or an address.
Note
This describes the LDR pseudo-instruction only, and not the LDR instruction.

Syntax
LDR{cond}{.W} Rt, =expr
LDR{cond}{.W} Rt, =label_expr

where:
cond
    is an optional condition code.
.W
    is an optional instruction width specifier.
Rt
    is the register to be loaded.
expr
    evaluates to a numeric value.
label_expr
    is a PC-relative or external expression of an address in the form of a label plus or minus a
    numeric value.
Usage
When using the LDR pseudo-instruction:
• If the value of expr can be loaded with a valid MOV or MVN instruction, the assembler uses that
  instruction.
• If a valid MOV or MVN instruction cannot be used, or if the label_expr syntax is used, the assembler
  places the constant in a literal pool and generates a PC-relative LDR instruction that reads the constant
  from the literal pool.
  Note
  - An address loaded in this way is fixed at link time, so the code is not position-independent.
  - The address holding the constant remains valid regardless of where the linker places the ELF
    section containing the LDR instruction.

The assembler places the value of label_expr in a literal pool and generates a PC-relative LDR
instruction that loads the value from the literal pool.
If label_expr is an external expression, or is not contained in the current section, the assembler places a
linker relocation directive in the object file. The linker generates the address at link time.
If label_expr is either a named or numeric local label, the assembler places a linker relocation directive
in the object file and generates a symbol for that local label. The address is generated at link time. If the
local label references T32 code, the T32 bit (bit 0) of the address is set.
The offset from the PC to the value in the literal pool must be less than ±4KB (in an A32 or 32-bit T32
encoding) or in the range 0 to +1KB (16-bit T32 encoding). You are responsible for ensuring that there is
a literal pool within range.
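One way to keep a literal pool within range is to place it explicitly with the LTORG directive. The following A32 sketch is illustrative: the labels func1 and buffer, the constants, and the size of the data block are assumptions.

```asm
        AREA    LitPoolExample, CODE, READONLY
        ARM
func1                               ; hypothetical function
        LDR     r0, =0xff           ; fits a MOV, so no literal pool entry is needed
        LDR     r1, =0x55555555     ; not a valid MOV/MVN immediate: loaded from the pool
        LDR     r2, =buffer         ; address: loaded from the pool
        BX      lr
        LTORG                       ; assemble the literal pool here, within range
buffer  SPACE   4200                ; without LTORG, the default pool at END would be
                                    ; more than 4KB from the LDR instructions
        END
```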


If the label referenced is in T32 code, the LDR pseudo-instruction sets the T32 bit (bit 0) of label_expr.
Note
In RealView® Compilation Tools (RVCT) v2.2, the T32 bit of the address was not set. If you have code
that relies on this behavior, use the command line option --untyped_local_labels to force the
assembler not to set the T32 bit when referencing labels in T32 code.

LDR in T32 code
You can use the .W width specifier to force LDR to generate a 32-bit instruction in T32 code. LDR.W
always generates a 32-bit instruction, even if the immediate value could be loaded in a 16-bit MOV, or
there is a literal pool within reach of a 16-bit PC-relative load.
If the value to be loaded is not known in the first pass of the assembler, LDR without .W generates a 16-bit
instruction in T32 code, even if that results in a 16-bit PC-relative load for a value that could be
generated in a 32-bit MOV or MVN instruction. However, if the value is known in the first pass, and it can
be generated using a 32-bit MOV or MVN instruction, the MOV or MVN instruction is used.
In UAL syntax, the LDR pseudo-instruction never generates a 16-bit flag-setting MOV instruction. Use the
--diag_warning 1727 assembler command line option to check when a 16-bit instruction could have
been used.
You can use the MOV32 pseudo-instruction for generating immediate values or addresses without loading
from a literal pool.
Examples
    LDR     r3,=0xff0       ; loads 0xff0 into R3
                            ; => MOV.W r3,#0xff0
    LDR     r1,=0xfff       ; loads 0xfff into R1
                            ; => LDR r1,[pc,offset_to_litpool]
                            ; ...
                            ; litpool DCD 0xfff
    LDR     r2,=place       ; loads the address of
                            ; place into R2
                            ; => LDR r2,[pc,offset_to_litpool]
                            ; ...
                            ; litpool DCD place

Related concepts
12.3 Numeric constants on page 12-300.
12.5 Register-relative and PC-relative expressions on page 12-302.
12.10 Numeric local labels on page 12-307.
Related references
11.62 --untyped_local_labels on page 11-290.
13.64 MOV32 pseudo-instruction on page 13-433.
7.11 Condition code suffixes on page 7-150.
21.50 LTORG on page 21-1695.

13.55 LDR, unprivileged
Unprivileged load byte, halfword, or word.
Syntax
LDR{type}T{cond} Rt, [Rn {, #offset}] ; immediate offset (32-bit T32 encoding only)
LDR{type}T{cond} Rt, [Rn] {, #offset} ; post-indexed (A32 only)
LDR{type}T{cond} Rt, [Rn], ±Rm {, shift} ; post-indexed (register) (A32 only)

where:
type
    can be any one of:
    B
        unsigned Byte (Zero extend to 32 bits on loads.)
    SB
        signed Byte (Sign extend to 32 bits.)
    H
        unsigned Halfword (Zero extend to 32 bits on loads.)
    SH
        signed Halfword (Sign extend to 32 bits.)
    -
        omitted, for Word.
cond
    is an optional condition code.
Rt
    is the register to load.
Rn
    is the register on which the memory address is based.
offset
    is an offset. If offset is omitted, the address is the value in Rn.
Rm
    is a register containing a value to be used as the offset. Rm must not be PC.
shift
    is an optional shift.
Operation
When these instructions are executed by privileged software, they access memory with the same
restrictions as they would have if they were executed by unprivileged software.
When executed by unprivileged software these instructions behave in exactly the same way as the
corresponding load instruction, for example LDRSBT behaves in the same way as LDRSB.
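As an illustration, privileged code might use an unprivileged load to read through a pointer supplied by unprivileged code, so that the access is checked against the caller's permissions. The A32 sketch below is hypothetical: the label read_user_word and the register assignments are assumptions.

```asm
        AREA    LdrtExample, CODE, READONLY
        ARM
; Hypothetical privileged helper: r1 = pointer supplied by unprivileged code.
read_user_word
        LDRT    r0, [r1], #4    ; load with unprivileged access checks, then r1 = r1 + 4
        BX      lr
        END
```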
Offset ranges and architectures
The following table shows the ranges of offsets and availability of these instructions.
Table 13-14 Offsets and architectures, LDR (User mode)

Instruction                                      Immediate offset   Post-indexed     +/–Rm (y)       shift
A32, word or byte                                Not available      –4095 to 4095    +/–Rm           LSL #0-31, LSR #1-32, ASR #1-32,
                                                                                                     ROR #1-31, RRX
A32, signed byte, halfword, or signed halfword   Not available      –255 to 255      +/–Rm           Not available
T32, 32-bit encoding, word, halfword,            0 to 255           Not available    Not available   Not available
  signed halfword, byte, or signed byte

(y) You can use –Rm, +Rm, or Rm.

Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

13.56 LDREX
Load Register Exclusive.
Syntax
LDREX{cond} Rt, [Rn {, #offset}]
LDREXB{cond} Rt, [Rn]
LDREXH{cond} Rt, [Rn]
LDREXD{cond} Rt, Rt2, [Rn]

where:
cond

is an optional condition code.
Rt

is the register to load.
Rt2

is the second register for doubleword loads.
Rn

is the register on which the memory address is based.
offset

is an optional offset applied to the value in Rn. offset is permitted only in 32-bit T32
instructions. If offset is omitted, an offset of zero is assumed.
Operation
LDREX loads data from memory.
• If the physical address has the Shared TLB attribute, LDREX tags the physical address as exclusive
access for the current processor, and clears any exclusive access tag for this processor for any other
physical address.
• Otherwise, it tags the fact that the executing processor has an outstanding tagged physical address.

LDREXB and LDREXH zero extend the value loaded.

Restrictions
PC must not be used for any of Rt, Rt2, or Rn.
For A32 instructions:
• SP can be used, but use of SP for either Rt or Rt2 is deprecated.
• For LDREXD, Rt must be an even numbered register, and not LR.
• Rt2 must be R(t+1).
• offset is not permitted.

For T32 instructions:
• SP can be used for Rn, but must not be used for Rt or Rt2.
• For LDREXD, Rt and Rt2 must not be the same register.
• The value of offset can be any multiple of four in the range 0-1020.
Usage
Use LDREX and STREX to implement interprocess communication in multiple-processor and
shared-memory systems.

For reasons of performance, keep the number of instructions between corresponding LDREX and STREX
instructions to a minimum.
Note
The address used in a STREX instruction must be the same as the address in the most recently executed
LDREX instruction.

Architectures
These 32-bit instructions are available in A32 and T32.
The LDREXD instruction is not available in the ARMv7-M architecture.
There are no 16-bit versions of these instructions in T32.
Examples
    MOV r1, #0x1                ; load the ‘lock taken’ value
try
    LDREX r0, [LockAddr]        ; load the lock value
    CMP r0, #0                  ; is the lock free?
    STREXEQ r0, r1, [LockAddr]  ; try and claim the lock
    CMPEQ r0, #0                ; did this succeed?
    BNE try                     ; no – try again
    ....                        ; yes – we have the lock
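The retry loop in this example can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: one processor, one tagged address, and no interference from other observers. The class LocalMonitor and the function try_lock are hypothetical names used for illustration, and are not part of any ARM tool.

```python
class LocalMonitor:
    """Toy model of one processor's exclusive monitor (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory      # dict mapping address -> word
        self.tagged = None        # address currently tagged as exclusive

    def ldrex(self, addr):
        # Tag the address and return the value loaded from it.
        self.tagged = addr
        return self.memory[addr]

    def strex(self, addr, value):
        # Store only if addr is still tagged. Returns 0 on success, 1 on failure.
        if self.tagged != addr:
            return 1
        self.memory[addr] = value
        self.tagged = None
        return 0


def try_lock(monitor, lock_addr):
    """Mirror LDREX / CMP / STREXEQ / CMPEQ: True if the lock was claimed."""
    if monitor.ldrex(lock_addr) != 0:       # is the lock free?
        return False
    return monitor.strex(lock_addr, 1) == 0  # try and claim the lock


memory = {0x1000: 0}              # 0 means the lock is free
cpu = LocalMonitor(memory)
first = try_lock(cpu, 0x1000)     # claims the lock
second = try_lock(cpu, 0x1000)    # finds the lock already taken
```

In the model, a store to an address that is not tagged reports failure, which corresponds to the BNE try path in the assembly sequence.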

Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

13.57 LSL
Logical Shift Left. This instruction is a preferred synonym for MOV instructions with shifted register
operands.
Syntax
LSL{S}{cond} Rd, Rm, Rs
LSL{S}{cond} Rd, Rm, #sh

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
Rd

is the destination register.
Rm

is the register holding the first operand. This operand is shifted left.
Rs

is a register holding a shift value to apply to the value in Rm. Only the least significant byte is
used.
sh

is a constant shift. The range of values permitted is 0-31.
Operation
LSL provides the value of a register multiplied by a power of two, inserting zeros into the vacated bit
positions.
Restrictions in T32 code
T32 instructions must not use PC or SP.
You cannot specify zero for the sh value in an LSL instruction in an IT block.
Use of SP and PC in A32 instructions
You can use SP in these A32 instructions but this is deprecated.
You cannot use PC in instructions with the LSL{S}{cond} Rd, Rm, Rs syntax. You can use PC for Rd
and Rm in the other syntax, but this is deprecated.
If you use PC as Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, the SPSR of the current mode is copied to the CPSR. You can use this to
return from exceptions.
Note
The A32 instruction LSLS{cond} pc,Rm,#sh always disassembles to the preferred form MOVS{cond}
pc,Rm{,shift}.

You cannot use PC for Rd or any operand in the LSL instruction if it has a register-controlled shift.
Condition flags
If S is specified, the LSL instruction updates the N and Z flags according to the result.
The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.
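The shift and carry behavior described above can be sketched in Python. This is an illustrative model only, not ARM's definition of the instruction. The function name lsl and the carry_in parameter are hypothetical, and the handling of register shift amounts greater than 32 (result 0, carry 0) is taken from the architecture rather than from this section.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount, carry_in=0):
    """Return (result, carry_out) for a 32-bit logical shift left."""
    amount &= 0xFF                 # only the least significant byte of Rs is used
    value &= MASK32
    if amount == 0:
        return value, carry_in     # C flag is unaffected
    if amount > 32:
        return 0, 0                # every bit, and the carry, is shifted away
    carry_out = (value >> (32 - amount)) & 1   # last bit shifted out
    return (value << amount) & MASK32, carry_out
```

For example, lsl(1, 4) multiplies 1 by 2 to the power 4 and leaves the carry clear, while lsl(0x80000001, 1) shifts the top bit out into the carry.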
16-bit instructions
The following forms of these instructions are available in T32 code, and are 16-bit instructions:
LSLS Rd, Rm, #sh
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
LSL{cond} Rd, Rm, #sh
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.
LSLS Rd, Rd, Rs
Rd and Rs must both be Lo registers. This form can only be used outside an IT block.
LSL{cond} Rd, Rd, Rs
Rd and Rs must both be Lo registers. This form can only be used inside an IT block.

Architectures
The 32-bit encodings of this instruction are available in A32 and T32.
The 16-bit encodings of this instruction are available in T32 only.
Example
LSLS r1, r2, r3

Related references
13.63 MOV on page 13-431.
7.11 Condition code suffixes on page 7-150.

13.58 LSR
Logical Shift Right. This instruction is a preferred synonym for MOV instructions with shifted register
operands.
Syntax
LSR{S}{cond} Rd, Rm, Rs
LSR{S}{cond} Rd, Rm, #sh

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
Rd

is the destination register.
Rm

is the register holding the first operand. This operand is shifted right.
Rs

is a register holding a shift value to apply to the value in Rm. Only the least significant byte is
used.
sh

is a constant shift. The range of values permitted is 1-32.
Operation
LSR provides the unsigned value of a register divided by a variable power of two, inserting zeros into the
vacated bit positions.

Restrictions in T32 code
T32 instructions must not use PC or SP.
Use of SP and PC in A32 instructions
You can use SP in these A32 instructions but this is deprecated.
You cannot use PC in instructions with the LSR{S}{cond} Rd, Rm, Rs syntax. You can use PC for Rd
and Rm in the other syntax, but this is deprecated.
If you use PC as Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, the SPSR of the current mode is copied to the CPSR. You can use this to
return from exceptions.
Note
The A32 instruction LSRS{cond} pc,Rm,#sh always disassembles to the preferred form MOVS{cond}
pc,Rm{,shift}.

Condition flags
If S is specified, the instruction updates the N and Z flags according to the result.
The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.
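The shift and carry behavior described above can be sketched in Python. This is an illustrative model only. The function name lsr and the carry_in parameter are hypothetical, and the handling of register shift amounts greater than 32 (result 0, carry 0) is taken from the architecture rather than from this section.

```python
MASK32 = 0xFFFFFFFF


def lsr(value, amount, carry_in=0):
    """Return (result, carry_out) for a 32-bit logical shift right."""
    amount &= 0xFF                 # only the least significant byte of Rs is used
    value &= MASK32
    if amount == 0:
        return value, carry_in     # C flag is unaffected
    if amount > 32:
        return 0, 0                # every bit, and the carry, is shifted away
    carry_out = (value >> (amount - 1)) & 1    # last bit shifted out
    return value >> amount, carry_out
```

For example, lsr(100, 2) yields the unsigned quotient of 100 divided by 4, and a shift of 32 moves bit 31 into the carry and leaves a zero result.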
16-bit instructions
The following forms of these instructions are available in T32 code, and are 16-bit instructions:
LSRS Rd, Rm, #sh
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
LSR{cond} Rd, Rm, #sh
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.
LSRS Rd, Rd, Rs
Rd and Rs must both be Lo registers. This form can only be used outside an IT block.
LSR{cond} Rd, Rd, Rs
Rd and Rs must both be Lo registers. This form can only be used inside an IT block.

Architectures
The 32-bit encodings of this instruction are available in A32 and T32.
The 16-bit encodings of this instruction are available in T32 only.
Example
LSR r4, r5, r6

Related references
13.63 MOV on page 13-431.
7.11 Condition code suffixes on page 7-150.

13.59 MCR and MCR2
Move to Coprocessor from ARM Register. Depending on the coprocessor, you might be able to specify
various additional operations.
Note
MCR2 is not supported in ARMv8.

Syntax
MCR{cond} coproc, #opcode1, Rt, CRn, CRm{, #opcode2}
MCR2{cond} coproc, #opcode1, Rt, CRn, CRm{, #opcode2}

where:
cond

is an optional condition code.
In A32 code, cond is not permitted for MCR2.
coproc

is the name of the coprocessor the instruction is for. The standard name is pn, where n is an
integer whose value must be:
• In the range 0-15 in ARMv7 and earlier.
• 14 or 15 in ARMv8.
opcode1

is a 3-bit coprocessor-specific opcode.
opcode2

is an optional 3-bit coprocessor-specific opcode.
Rt

is an ARM source register. Rt must not be PC.
CRn, CRm

are coprocessor registers.

13.60 MCRR and MCRR2
Move to Coprocessor from ARM Registers. Depending on the coprocessor, you might be able to specify
various additional operations.
Note
MCRR2 is not supported in ARMv8.

Syntax
MCRR{cond} coproc, #opcode, Rt, Rt2, CRn
MCRR2{cond} coproc, #opcode, Rt, Rt2, CRn

where:
cond

is an optional condition code.
In A32 code, cond is not permitted for MCRR2.
coproc

is the name of the coprocessor the instruction is for. The standard name is pn, where n is an
integer whose value must be:
• In the range 0-15 in ARMv7 and earlier.
• 14 or 15 in ARMv8.
opcode

is a 4-bit coprocessor-specific opcode.
Rt, Rt2

are ARM source registers. Rt and Rt2 must not be PC.
CRn

is a coprocessor register.
Usage
The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.
Architectures
These 32-bit instructions are available in A32 and T32.
There are no 16-bit versions of these instructions in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.61 MLA
Multiply-Accumulate with signed or unsigned 32-bit operands, giving the least significant 32 bits of the
result.
Syntax
MLA{S}{cond} Rd, Rn, Rm, Ra

where:
cond

is an optional condition code.
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
Rd

is the destination register.
Rn, Rm

are registers holding the values to be multiplied.
Ra

is a register holding the value to be added.
Operation
The MLA instruction multiplies the values from Rn and Rm, adds the value from Ra, and places the least
significant 32 bits of the result in Rd.
Register restrictions
You cannot use PC for any register.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
If S is specified, the MLA instruction:
• Updates the N and Z flags according to the result.
• Does not affect the C or V flag.
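As an illustration of the operation and flag updates described above, the following Python sketch models MLAS on 32-bit register values. The function name mla and its return layout are hypothetical choices made for this example.

```python
MASK32 = 0xFFFFFFFF


def mla(rn, rm, ra):
    """Return (result, n_flag, z_flag) for Rn * Rm + Ra, truncated to 32 bits."""
    result = (rn * rm + ra) & MASK32   # keep the least significant 32 bits
    n_flag = result >> 31              # N is bit 31 of the result
    z_flag = int(result == 0)          # Z is set when the result is zero
    return result, n_flag, z_flag
```

The C and V flags are not modelled because the instruction does not affect them.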
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Example
MLA

r10, r2, r1, r5

Related references
7.11 Condition code suffixes on page 7-150.

13.62 MLS
Multiply-Subtract, with signed or unsigned 32-bit operands, giving the least significant 32 bits of the
result.
Syntax
MLS{cond} Rd, Rn, Rm, Ra

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are registers holding the values to be multiplied.
Ra

is a register holding the value to be subtracted from.
Operation
The MLS instruction multiplies the values in Rn and Rm, subtracts the result from the value in Ra, and
places the least significant 32 bits of the final result in Rd.
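The operation above can be sketched in Python as follows. This is an illustrative model, and the function name mls is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mls(rn, rm, ra):
    """Return Ra - (Rn * Rm), truncated to the least significant 32 bits."""
    return (ra - rn * rm) & MASK32
```

When the product is larger than the value in Ra, the result wraps modulo 2 to the power 32. For example, mls(3, 4, 5) gives 0xFFFFFFF9, which is -7 when read as a signed value.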
Register restrictions
You cannot use PC for any register.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Example
MLS r4, r5, r6, r7

Related references
7.11 Condition code suffixes on page 7-150.

13.63 MOV
Move.
Syntax
MOV{S}{cond} Rd, Operand2
MOV{cond} Rd, #imm16

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Operand2

is a flexible second operand.
imm16

is any value in the range 0-65535.
Operation
The MOV instruction copies the value of Operand2 into Rd.
In certain circumstances, the assembler can substitute MVN for MOV, or MOV for MVN. Be aware of this when
reading disassembly listings.
Use of PC and SP in 32-bit T32 encodings
You cannot use PC (R15) for Rd, or in Operand2, in 32-bit T32 MOV instructions. With the following
exceptions, you cannot use SP (R13) for Rd, or in Operand2:
• MOV{cond}.W Rd, SP, where Rd is not SP.
• MOV{cond}.W SP, Rm, where Rm is not SP.
Use of PC and SP in 16-bit T32 encodings
You can use PC or SP in 16-bit T32 MOV{cond} Rd, Rm instructions, but instructions in which both
Rd and Rm are SP or PC are deprecated.
You cannot use PC or SP in any other MOV{S} 16-bit T32 instructions.
Use of PC and SP in A32 MOV
You cannot use PC for Rd or any operand in any data processing instruction that has a register-controlled
shift.
In instructions without register-controlled shift, the use of PC is deprecated except for the following
cases:
• MOVS PC, LR.
• MOV PC, Rm when Rm is not PC or SP.
• MOV Rd, PC when Rd is not PC or SP.

You can use SP for Rd or Rm, but this is deprecated except for the following cases:
• MOV SP, Rm when Rm is not PC or SP.
• MOV Rd, SP when Rd is not PC or SP.

Note
• You cannot use PC for Rd in MOV Rd, #imm16 if the #imm16 value is not a permitted Operand2 value.
You can use PC in forms with Operand2 without register-controlled shift.

If you use PC as Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
Condition flags
If S is specified, the instruction:
• Updates the N and Z flags according to the result.
• Can update the C flag during the calculation of Operand2.
• Does not affect the V flag.
16-bit instructions
The following forms of this instruction are available in T32 code, and are 16-bit instructions:
MOVS Rd, #imm
Rd must be a Lo register. imm range 0-255. This form can only be used outside an IT block.
MOV{cond} Rd, #imm
Rd must be a Lo register. imm range 0-255. This form can only be used inside an IT block.
MOVS Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
MOV{cond} Rd, Rm
Rd or Rm can be Lo or Hi registers.

Availability
These instructions are available in A32 and T32.
In T32, 16-bit and 32-bit versions of these instructions are available.
Related concepts
6.5 Load immediate values using MOV and MVN on page 6-106.
Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.151 SUBS pc, lr on page 13-542.
7.11 Condition code suffixes on page 7-150.

13.64 MOV32 pseudo-instruction
Load a register with either a 32-bit immediate value or any address.
Syntax
MOV32{cond} Rd, expr

where:
cond

is an optional condition code.
Rd

is the register to be loaded. Rd must not be SP or PC.
expr

can be any one of the following:
symbol

A label in this or another program area.
#constant

Any 32-bit immediate value.
symbol + constant

A label plus a 32-bit immediate value.
Usage
MOV32 always generates two 32-bit instructions, a MOV, MOVT pair. This enables you to load any 32-bit
immediate, or to access the whole 32-bit address space.
The main purposes of the MOV32 pseudo-instruction are:
• To generate literal constants when an immediate value cannot be generated in a single instruction.
• To load a PC-relative or external address into a register. The address remains valid regardless of
where the linker places the ELF section containing the MOV32.
Note
An address loaded in this way is fixed at link time, so the code is not position-independent.
MOV32 sets the T32 bit (bit 0) of the address if the label referenced is in T32 code.
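To show how a 32-bit immediate divides between the two instructions of the pair, the following Python sketch splits a constant into its lower and upper halfwords. It is illustrative only: the function name mov32_expand is hypothetical, the text it produces is not guaranteed to match the assembler's listing output, and it handles constants only, not labels.

```python
def mov32_expand(rd, value):
    """Return a MOV, MOVT pair (as text) that loads a 32-bit constant into rd."""
    value &= 0xFFFFFFFF
    low = value & 0xFFFF        # written by MOV, which also clears the top half
    high = value >> 16          # written by MOVT into bits [31:16]
    return [f"MOV {rd}, #0x{low:04X}", f"MOVT {rd}, #0x{high:04X}"]


pair = mov32_expand("r3", 0xABCDEF12)
```

For the constant 0xABCDEF12, the lower halfword is 0xEF12 and the upper halfword is 0xABCD.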

Architectures
This pseudo-instruction is available in A32 and T32.
Examples
MOV32 r3, #0xABCDEF12 ; loads 0xABCDEF12 into R3
MOV32 r1, Trigger+12  ; loads the address that is 12 bytes
                      ; higher than the address Trigger into R1

Related references
7.11 Condition code suffixes on page 7-150.

13.65 MOVT
Move Top.
Syntax
MOVT{cond} Rd, #imm16

where:
cond

is an optional condition code.
Rd

is the destination register.
imm16

is a 16-bit immediate value.
Usage
MOVT writes imm16 to Rd[31:16], without affecting Rd[15:0].

You can generate any 32-bit immediate with a MOV, MOVT instruction pair. The assembler implements the
MOV32 pseudo-instruction for convenient generation of this instruction pair.
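The effect of MOVT on the destination register can be sketched in Python. This is an illustrative model, and the function name movt is hypothetical.

```python
def movt(rd_value, imm16):
    """Write imm16 to bits [31:16] and keep bits [15:0] of the old value."""
    return (rd_value & 0x0000FFFF) | ((imm16 & 0xFFFF) << 16)


r3 = 0xEF12              # register contents after MOV r3, #0xEF12
r3 = movt(r3, 0xABCD)    # register contents after MOVT r3, #0xABCD
```

After the two steps, the modelled register holds 0xABCDEF12.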
Register restrictions
You cannot use PC in A32 or T32 instructions.
You can use SP for Rd in A32 instructions but this is deprecated.
You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
13.64 MOV32 pseudo-instruction on page 13-433.
7.11 Condition code suffixes on page 7-150.

13.66 MRC and MRC2
Move to ARM Register from Coprocessor. Depending on the coprocessor, you might be able to specify
various additional operations.
Note
MRC2 is not supported in ARMv8.

Syntax
MRC{cond} coproc, #opcode1, Rt, CRn, CRm{, #opcode2}
MRC2{cond} coproc, #opcode1, Rt, CRn, CRm{, #opcode2}

where:
cond

is an optional condition code.
In A32 code, cond is not permitted for MRC2.
coproc

is the name of the coprocessor the instruction is for. The standard name is pn, where n is an
integer whose value must be:
• In the range 0-15 in ARMv7 and earlier.
• 14 or 15 in ARMv8.
opcode1

is a 3-bit coprocessor-specific opcode.
opcode2

is an optional 3-bit coprocessor-specific opcode.
Rt

is the ARM destination register. Rt must not be PC.
Rt can be APSR_nzcv. This means that the coprocessor executes an instruction that changes the
value of the condition flags in the APSR.
CRn, CRm

are coprocessor registers.

13.67 MRRC and MRRC2
Move to ARM Registers from Coprocessor. Depending on the coprocessor, you might be able to specify
various additional operations.
Note
MRRC2 is not supported in ARMv8.

Syntax
MRRC{cond} coproc, #opcode, Rt, Rt2, CRm
MRRC2{cond} coproc, #opcode, Rt, Rt2, CRm

where:
cond

is an optional condition code.
In A32 code, cond is not permitted for MRRC2.
coproc

is the name of the coprocessor the instruction is for. The standard name is pn, where n is an
integer whose value must be:
• In the range 0-15 in ARMv7 and earlier.
• 14 or 15 in ARMv8.
opcode

is a 4-bit coprocessor-specific opcode.
Rt, Rt2

are ARM destination registers. Rt and Rt2 must not be PC.
CRm

is a coprocessor register.

13.68 MRS (PSR to general-purpose register)
Move the contents of a PSR to a general-purpose register.
Syntax
MRS{cond} Rd, psr

where:
cond

is an optional condition code.
Rd

is the destination register.
psr

is one of:
APSR

on any processor, in any mode.
CPSR

deprecated synonym for APSR and for use in Debug state, on any processor except
ARMv7-M and ARMv6-M.
SPSR

on any processor, except ARMv6-M, ARMv7-M, ARMv8-M.baseline, and ARMv8-M.mainline, in
privileged software execution only.
Mpsr

on ARMv6-M, ARMv7-M, ARMv8-M.baseline, and ARMv8-M.mainline processors
only.
Mpsr

can be any of: IPSR, EPSR, IEPSR, IAPSR, EAPSR, MSP, PSP, XPSR, PRIMASK, BASEPRI,
BASEPRI_MAX, FAULTMASK, or CONTROL.
Usage
Use MRS in combination with MSR as part of a read-modify-write sequence for updating a PSR, for
example to change processor mode, or to clear the Q flag.
In process swap code, the programmers’ model state of the process being swapped out must be saved,
including relevant PSR contents. Similarly, the state of the process being swapped in must also be
restored. These operations make use of MRS/store and load/MSR instruction sequences.
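The read-modify-write sequence mentioned above can be sketched in Python for the case of clearing the Q flag. This is an illustrative model only. The function name clear_q is hypothetical, and the position of Q at bit 27 is taken from the nzcvq field, PSR[31:27].

```python
Q_BIT = 1 << 27   # Q is the lowest bit of the nzcvq field, PSR[31:27]


def clear_q(psr):
    """Model MRS, then a bit clear, then MSR for the Q flag."""
    value = psr                       # MRS: read the PSR into a register
    value &= ~Q_BIT & 0xFFFFFFFF      # clear bit 27, keep every other bit
    return value                      # MSR: write the value back
```

All bits other than Q are written back unchanged, which is the purpose of reading the PSR first.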
SPSR
You must not attempt to access the SPSR when the processor is in User or System mode. This is your
responsibility. The assembler cannot warn you about this, because it has no information about the
processor mode at execution time.
CPSR
ARM deprecates reading the CPSR endianness bit (E) with an MRS instruction.
The CPSR execution state bits, other than the E bit, can only be read when the processor is in Debug
state, halting debug-mode. Otherwise, the execution state bits in the CPSR read as zero.
The condition flags can be read in any mode on any processor. Use APSR if you are only interested in
accessing the condition flags in User mode.
Register restrictions
You cannot use PC for Rd in A32 instructions. You can use SP for Rd in A32 instructions but this is
deprecated.
You cannot use PC or SP for Rd in T32 instructions.
Condition flags
This instruction does not change the flags.
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related concepts
3.12 Current Program Status Register in AArch32 state on page 3-76.
Related references
13.69 MRS (system coprocessor register to ARM register) on page 13-439.
13.70 MSR (ARM register to system coprocessor register) on page 13-440.
13.71 MSR (general-purpose register to PSR) on page 13-441.
7.11 Condition code suffixes on page 7-150.

13.69 MRS (system coprocessor register to ARM register)
Move to ARM register from system coprocessor register.
Syntax
MRS{cond} Rn, coproc_register
MRS{cond} APSR_nzcv, special_register

where:
cond

is an optional condition code.
coproc_register

is the name of the coprocessor register.
special_register

is the name of the coprocessor register that can be written to APSR_nzcv. This is only possible
for the coprocessor register DBGDSCRint.
Rn

is the ARM destination register. Rn must not be PC.
Usage
You can use this pseudo-instruction to read CP14 or CP15 coprocessor registers, with the exception of
write-only registers. A complete list of the applicable coprocessor register names is in the ARMv7-AR
Architecture Reference Manual. For example:
MRS R1, SCTLR ; writes the contents of the CP15 coprocessor
; register SCTLR into R1

Architectures
This pseudo-instruction is available in ARMv7-A and ARMv7-R in A32 and 32-bit T32 code.
There is no 16-bit version of this pseudo-instruction in T32.
Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.70 MSR (ARM register to system coprocessor register) on page 13-440.
13.71 MSR (general-purpose register to PSR) on page 13-441.
7.11 Condition code suffixes on page 7-150.
Related information
ARM Architecture Reference Manual.

13.70 MSR (ARM register to system coprocessor register)
Move to system coprocessor register from ARM register.
Syntax
MSR{cond} coproc_register, Rn

where:
cond

is an optional condition code.
coproc_register

is the name of the coprocessor register.
Rn

is the ARM source register. Rn must not be PC.
Usage
You can use this pseudo-instruction to write to any CP14 or CP15 coprocessor writable register. A
complete list of the applicable coprocessor register names is in the ARM Architecture Reference Manual.
For example:
MSR SCTLR, R1 ; writes the contents of R1 into the CP15
; coprocessor register SCTLR

Availability
This pseudo-instruction is available in ARMv7-A and ARMv7-R in A32 and 32-bit T32 code.
There is no 16-bit version of this pseudo-instruction in T32.
Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.69 MRS (system coprocessor register to ARM register) on page 13-439.
13.71 MSR (general-purpose register to PSR) on page 13-441.
7.11 Condition code suffixes on page 7-150.
13.160 SYS on page 13-553.
Related information
ARM Architecture Reference Manual.

13.71 MSR (general-purpose register to PSR)
Load an immediate value, or the contents of a general-purpose register, into the specified fields of a
Program Status Register (PSR).
Syntax
MSR{cond} APSR_flags, Rm

where:
cond

is an optional condition code.
flags

specifies the APSR flags to be moved. flags can be one or more of:
nzcvq

ALU flags field mask, PSR[31:27] (User mode)
g

SIMD GE flags field mask, PSR[19:16] (User mode).
Rm

is the source register. Rm must not be PC.
Syntax
You can also use the following syntax on architectures other than ARMv6-M, ARMv7-M,
ARMv8-M.baseline, and ARMv8-M.mainline:
MSR{cond} APSR_flags, #constant
MSR{cond} psr_fields, #constant
MSR{cond} psr_fields, Rm

where:
cond

is an optional condition code.
flags

specifies the APSR flags to be moved. flags can be one or more of:
nzcvq

ALU flags field mask, PSR[31:27] (User mode)
g

SIMD GE flags field mask, PSR[19:16] (User mode).
constant

is an expression evaluating to a numeric value. The value must correspond to an 8-bit pattern
rotated by an even number of bits within a 32-bit word. Not available in T32.
Rm

is the source register. Rm must not be PC.
psr

is one of:
CPSR

for use in Debug state, also deprecated synonym for APSR
SPSR

on any processor, in privileged software execution only.
fields

specifies the SPSR or CPSR fields to be moved. fields can be one or more of:
c

control field mask byte, PSR[7:0] (privileged software execution)
x

extension field mask byte, PSR[15:8] (privileged software execution)
s

status field mask byte, PSR[23:16] (privileged software execution)
f

flags field mask byte, PSR[31:24] (privileged software execution).
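Two of the rules above lend themselves to a short Python sketch: the requirement that constant is an 8-bit pattern rotated by an even number of bits, and the mapping from fields letters to PSR byte masks. The function names are hypothetical, and the letter s for the status byte PSR[23:16] is assumed here. This is a model for checking values by hand, not the assembler's own logic.

```python
def rol32(value, bits):
    """Rotate a 32-bit value left by bits."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & 0xFFFFFFFF


def is_rotated_imm8(value):
    """True if value is an 8-bit pattern rotated by an even number of bits."""
    value &= 0xFFFFFFFF
    return any(rol32(value, rot) <= 0xFF for rot in range(0, 32, 2))


FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00, "s": 0x00FF0000, "f": 0xFF000000}


def psr_field_mask(fields):
    """Combine the byte masks selected by a fields string such as 'cf'."""
    mask = 0
    for letter in fields:
        mask |= FIELD_MASKS[letter]
    return mask
```

For example, 0xFF000000 passes the check, but 0x1FE does not, because it would need a rotation by an odd number of bits.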
Syntax
You can also use the following syntax on ARMv6-M, ARMv7-M, ARMv8-M.baseline, and
ARMv8-M.mainline only:
MSR{cond} psr, Rm

where:
cond

is an optional condition code.
Rm

is the source register. Rm must not be PC.
psr

can be any of: APSR, IPSR, EPSR, IEPSR, IAPSR, EAPSR, XPSR, MSP, PSP, PRIMASK, BASEPRI,
BASEPRI_MAX, FAULTMASK, or CONTROL.
Usage
In User mode:
• Use APSR to access the condition flags, Q, or GE bits.
• Writes to unallocated, privileged or execution state bits in the CPSR are ignored. This ensures that
User mode programs cannot change to privileged software execution.
ARM deprecates using MSR to change the endianness bit (E) of the CPSR, in any mode.
You must not attempt to access the SPSR when the processor is in User or System mode.
Register restrictions
You cannot use PC in A32 instructions. You can use SP for Rm in A32 instructions but this is deprecated.
You cannot use PC or SP in T32 instructions.
Condition flags
This instruction updates the flags explicitly if the APSR_nzcvq or CPSR_f field is specified.
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.69 MRS (system coprocessor register to ARM register) on page 13-439.
13.70 MSR (ARM register to system coprocessor register) on page 13-440.
7.11 Condition code suffixes on page 7-150.

13.72 MUL
Multiply with signed or unsigned 32-bit operands, giving the least significant 32 bits of the result.
Syntax
MUL{S}{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
Rd

is the destination register.
Rn, Rm

are registers holding the values to be multiplied.
Operation
The MUL instruction multiplies the values from Rn and Rm, and places the least significant 32 bits of the
result in Rd.
Register restrictions
You cannot use PC for any register.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
If S is specified, the MUL instruction:
• Updates the N and Z flags according to the result.
• Does not affect the C or V flag.
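Because only the least significant 32 bits of the product are kept, the same instruction serves signed and unsigned operands. The following Python sketch illustrates this. The function names mul_low32 and to_signed are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul_low32(rn, rm):
    """Return the least significant 32 bits of Rn * Rm."""
    return (rn * rm) & MASK32


def to_signed(value):
    """Interpret a 32-bit register value as a two's complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


product = mul_low32(-2 & MASK32, 3)   # register pattern for -2, multiplied by 3
```

The register pattern for -2 is 0xFFFFFFFE. Multiplying it by 3 gives 0xFFFFFFFA, which is both the unsigned product modulo 2 to the power 32 and the two's complement encoding of -6.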
16-bit instructions
The following forms of the MUL instruction are available in T32 code, and are 16-bit instructions:
MULS Rd, Rn, Rd
Rd and Rn must both be Lo registers. This form can only be used outside an IT block.
MUL{cond} Rd, Rn, Rd
Rd and Rn must both be Lo registers. This form can only be used inside an IT block.

There are no other T32 multiply instructions that can update the condition flags.
Availability
This instruction is available in A32 and T32.
The MULS instruction is available in T32 in a 16-bit encoding.
Examples
MUL r10, r2, r5
MULS r0, r2, r2
MULLT r2, r3, r2

Related references
7.11 Condition code suffixes on page 7-150.

13.73 MVN
Move Not.
Syntax
MVN{S}{cond} Rd, Operand2

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Operand2

is a flexible second operand.
Operation
The MVN instruction takes the value of Operand2, performs a bitwise logical NOT operation on the value,
and places the result into Rd.
In certain circumstances, the assembler can substitute MVN for MOV, or MOV for MVN. Be aware of this when
reading disassembly listings.
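The following Python sketch models the bitwise NOT and gives one plausible reading of the MOV and MVN substitution. It assumes that the substitution applies when the complement of a constant, but not the constant itself, is an A32 rotated 8-bit pattern. The exact rules that the assembler applies might differ. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mvn(operand2):
    """Return the bitwise NOT of a 32-bit value."""
    return ~operand2 & MASK32


def is_rotated_imm8(value):
    """True if value is an 8-bit pattern rotated by an even number of bits."""
    value &= MASK32
    for rot in range(0, 32, 2):
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated <= 0xFF:
            return True
    return False


def pick_mov_or_mvn(value):
    """Return (mnemonic, immediate) able to load value, or None if neither fits."""
    value &= MASK32
    if is_rotated_imm8(value):
        return ("MOV", value)
    if is_rotated_imm8(mvn(value)):
        return ("MVN", mvn(value))     # assumed substitution, see the text above
    return None
```

Under this assumption, a request to load 0xFFFFFF00 is satisfied by MVN with the immediate 0xFF, which is why a disassembly listing can show MVN where the source used MOV.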
Use of PC and SP in 32-bit T32 MVN
You cannot use PC (R15) for Rd, or in Operand2, in 32-bit T32 MVN instructions. You cannot use SP (R13)
for Rd, or in Operand2.
Use of PC and SP in 16-bit T32 instructions
You cannot use PC or SP in any MVN{S} 16-bit T32 instructions.
Use of PC and SP in A32 MVN
You cannot use PC for Rd or any operand in any data processing instruction that has a register-controlled
shift.
In instructions without register-controlled shift, use of PC is deprecated.
You can use SP for Rd or Rm, but this is deprecated.
Note
Use of PC and SP in A32 instructions is deprecated.

16-bit instructions
The following forms of this instruction are available in T32 code, and are 16-bit instructions:
MVNS Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
MVN{cond} Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.

Architectures
This instruction is available in A32 and T32.
Correct example
MVNNE r11, #0xF000000B ; A32 only. This immediate value is not
                       ; available in T32.

Incorrect example
MVN pc,r3,ASR r0       ; PC not permitted with
                       ; register-controlled shift

Related concepts
6.5 Load immediate values using MOV and MVN on page 6-106.
Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.151 SUBS pc, lr on page 13-542.
7.11 Condition code suffixes on page 7-150.

13.74 NEG pseudo-instruction
Negate the value in a register.
Syntax
NEG{cond} Rd, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register containing the value that is subtracted from zero.
Operation
The NEG pseudo-instruction negates the value in one register and stores the result in a second register.
NEG{cond} Rd, Rm assembles to RSBS{cond} Rd, Rm, #0.
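The reverse subtract from zero can be sketched in Python as follows. This is an illustrative model of the result value only. The function name neg is hypothetical, and the flag updates are not modelled.

```python
MASK32 = 0xFFFFFFFF


def neg(rm):
    """Return 0 - Rm, truncated to 32 bits, as RSBS Rd, Rm, #0 computes it."""
    return (0 - rm) & MASK32
```

Negating 0x80000000 returns 0x80000000, because the most negative 32-bit value has no positive counterpart in two's complement.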

Architectures
The 32-bit encoding of this pseudo-instruction is available in A32 and T32.
There is no 16-bit encoding of this pseudo-instruction available in T32.
Register restrictions
In A32 instructions, using SP or PC for Rd or Rm is deprecated. In T32 instructions, you cannot use SP or
PC for Rd or Rm.
Condition flags
This pseudo-instruction updates the condition flags, based on the result.
Related references
13.9 ADD on page 13-347.

13.75 NOP
No Operation.
Syntax
NOP{cond}

where:
cond

is an optional condition code.
Usage
NOP does nothing. If NOP is not implemented as a specific instruction on your target architecture, the
assembler treats it as a pseudo-instruction and generates an alternative instruction that does nothing, such
as MOV r0, r0 (A32) or MOV r8, r8 (T32).
NOP is not necessarily a time-consuming NOP. The processor might remove it from the pipeline before it
reaches the execution stage.
You can use NOP for padding, for example to place the following instruction on a 64-bit boundary in A32,
or a 32-bit boundary in T32.
Architectures
This instruction is available in A32 and T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.76 ORN (T32 only)
Logical OR NOT.
Syntax
ORN{S}{cond} Rd, Rn, Operand2

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
Operation
The ORN T32 instruction performs an OR operation on the bits in Rn with the complements of the
corresponding bits in the value of Operand2.
In certain circumstances, the assembler can substitute ORN for ORR, or ORR for ORN. Be aware of this when
reading disassembly listings.
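The following Python sketch is a minimal model of the result that ORN computes, assuming Operand2 has already been evaluated to a 32-bit value. The name orn32 is hypothetical, and the sketch does not model the condition flags.

```python
MASK32 = 0xFFFFFFFF

def orn32(rn, operand2):
    """Model ORN: OR Rn with the bitwise complement of Operand2 (32-bit)."""
    return (rn | (~operand2 & MASK32)) & MASK32
```

Because ORN with a value gives the same result as ORR with the complement of that value, a model like this can help when reading a disassembly listing in which one has been substituted for the other.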
Use of PC
You cannot use PC (R15) for Rd or any operand in the ORN instruction.
Condition flags
If S is specified, the ORN instruction:
• Updates the N and Z flags according to the result.
• Can update the C flag during the calculation of Operand2.
• Does not affect the V flag.
Examples
ORN     r7, r11, lr, ROR #4
ORNS    r7, r11, lr, ASR #32

Architectures
This 32-bit instruction is available in T32.
There is no A32 or 16-bit T32 ORN instruction.
Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.151 SUBS pc, lr on page 13-542.
7.11 Condition code suffixes on page 7-150.

13.77 ORR
Logical OR.
Syntax
ORR{S}{cond} Rd, Rn, Operand2

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
Operation
The ORR instruction performs bitwise OR operations on the values in Rn and Operand2.
In certain circumstances, the assembler can substitute ORN for ORR, or ORR for ORN. Be aware of this when
reading disassembly listings.
Use of PC in 32-bit T32 instructions
You cannot use PC (R15) for Rd or any operand with the ORR instruction.
Use of PC and SP in A32 instructions
You can use PC and SP with the ORR instruction but this is deprecated.
If you use PC as Rn, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
You cannot use PC for any operand in any data processing instruction that has a register-controlled shift.
Condition flags
If S is specified, the ORR instruction:
• Updates the N and Z flags according to the result.
• Can update the C flag during the calculation of Operand2.
• Does not affect the V flag.
16-bit instructions
The following forms of the ORR instruction are available in T32 code, and are 16-bit instructions:
ORRS Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
ORR{cond} Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.

It does not matter if you specify ORR{S} Rd, Rm, Rd. The instruction is the same.

Example
ORREQ   r2,r0,r5

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.151 SUBS pc, lr on page 13-542.
7.11 Condition code suffixes on page 7-150.

13.78 PKHBT and PKHTB
Halfword Packing instructions that combine a halfword from one register with a halfword from another
register. One of the operands can be shifted before extraction of the halfword.
Syntax
PKHBT{cond} {Rd}, Rn, Rm{, LSL #leftshift}
PKHTB{cond} {Rd}, Rn, Rm{, ASR #rightshift}

where:
PKHBT

Combines bits[15:0] of Rn with bits[31:16] of the shifted value from Rm.
PKHTB

Combines bits[31:16] of Rn with bits[15:0] of the shifted value from Rm.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Rm

is the register holding the second operand.
leftshift

is in the range 0 to 31.
rightshift

is in the range 1 to 32.
Register restrictions
You cannot use PC for any register.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
These instructions do not change the flags.
Architectures
These instructions are available in A32.
These 32-bit instructions are available in T32. For the ARMv7-M architecture, they are only available in an
ARMv7E-M implementation.
There are no 16-bit versions of these instructions in T32.
Correct examples
PKHBT   r0, r3, r5          ; combine the bottom halfword of R3
                            ; with the top halfword of R5
PKHBT   r0, r3, r5, LSL #16 ; combine the bottom halfword of R3
                            ; with the bottom halfword of R5
PKHTB   r0, r3, r5, ASR #16 ; combine the top halfword of R3
                            ; with the top halfword of R5

You can also scale the second operand by using different values of shift.
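The packing behaviour can be sketched in Python as follows. This is an illustrative model only: the function names pkhbt and pkhtb and their parameters are hypothetical, register values are assumed to be unsigned 32-bit integers, and the shift amounts are assumed to be within the ranges given in the syntax description.

```python
MASK32 = 0xFFFFFFFF

def pkhbt(rn, rm, lsl=0):
    """Bits[15:0] from Rn, bits[31:16] from (Rm LSL #lsl), with lsl in 0..31."""
    shifted = (rm << lsl) & MASK32
    return (shifted & 0xFFFF0000) | (rn & 0x0000FFFF)

def pkhtb(rn, rm, asr):
    """Bits[31:16] from Rn, bits[15:0] from (Rm ASR #asr), with asr in 1..32."""
    signed = rm - (1 << 32) if rm & 0x80000000 else rm   # view Rm as signed
    shifted = (signed >> asr) & MASK32                    # arithmetic shift right
    return (rn & 0xFFFF0000) | (shifted & 0x0000FFFF)
```

With R3 = 0x11112222 and R5 = 0xAAAABBBB, the three correct examples correspond to pkhbt(0x11112222, 0xAAAABBBB), pkhbt(0x11112222, 0xAAAABBBB, 16), and pkhtb(0x11112222, 0xAAAABBBB, 16) in this model.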
Incorrect example
PKHBTEQ r4, r5, r1, ASR #8  ; ASR not permitted with PKHBT

Related references
7.11 Condition code suffixes on page 7-150.

13.79 PLD, PLDW, and PLI
Preload Data and Preload Instruction allow the processor to signal the memory system that a data or
instruction load from an address is likely in the near future.
Syntax
PLtype{cond} [Rn {, #offset}]
PLtype{cond} [Rn, ±Rm {, shift}]
PLtype{cond} label

where:
type

can be one of:
D

Data address.
DW

Data address with intention to write.
I

Instruction address.
type cannot be DW if the syntax specifies label.
cond

is an optional condition code.
Note
cond is permitted only in T32 code, using a preceding IT instruction, but this is deprecated in
ARMv8. This is an unconditional instruction in A32 code and you must not use cond.
Rn

is the register on which the memory address is based.
offset

is an immediate offset. If offset is omitted, the address is the value in Rn.
Rm

is a register containing a value to be used as the offset.
shift

is an optional shift.
label

is a PC-relative expression.
Range of offsets
The offset is applied to the value in Rn before the preload takes place. The result is used as the memory
address for the preload. The range of offsets permitted is:
• –4095 to +4095 for A32 instructions.
• –255 to +4095 for T32 instructions, when Rn is not PC.
• –4095 to +4095 for T32 instructions, when Rn is PC.
The assembler calculates the offset from the PC for you. The assembler generates an error if label is out
of range.
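The offset ranges listed above can be expressed as a small check. The following Python sketch is illustrative only; the function name preload_offset_in_range and the "A32"/"T32" string tags are hypothetical and are not part of armasm.

```python
def preload_offset_in_range(offset, isa, rn_is_pc=False):
    """Return True if an immediate offset fits the ranges listed above.

    isa is "A32" or "T32" (string tags used only in this sketch).
    """
    if isa == "A32":
        return -4095 <= offset <= 4095
    if isa == "T32":
        low = -4095 if rn_is_pc else -255   # wider negative range when Rn is PC
        return low <= offset <= 4095
    raise ValueError("isa must be 'A32' or 'T32'")
```

For instance, an offset of -256 is within range for A32, and for T32 only when Rn is PC.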
Register or shifted register offset
In A32 code, the value in Rm is added to or subtracted from the value in Rn. In T32 code, the value in Rm
can only be added to the value in Rn. The result is used as the memory address for the preload.

The range of shifts permitted is:
• LSL #0 to #3 for T32 instructions.
• Any one of the following for A32 instructions:
— LSL #0 to #31.
— LSR #1 to #32.
— ASR #1 to #32.
— ROR #1 to #31.
— RRX.
Address alignment for preloads
No alignment checking is performed for preload instructions.
Register restrictions
Rm must not be PC. For T32 instructions Rm must also not be SP.
Rn must not be PC for T32 instructions of the syntax PLtype{cond} [Rn, ±Rm{, #shift}].

Architectures
The PLD instruction is available in A32.
The 32-bit encoding of PLD is available in T32.
PLDW is available only in ARMv7 and above that implement the Multiprocessing Extensions.
PLI is available only in ARMv7 and above.

There are no 16-bit encodings of these instructions in T32.
These are hint instructions, and their implementation is optional. If they are not implemented, they
execute as NOPs.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
7.11 Condition code suffixes on page 7-150.

13.80 POP
Pop registers off a full descending stack.
Syntax
POP{cond} reglist

where:
cond

is an optional condition code.
reglist

is a non-empty list of registers, enclosed in braces. It can contain register ranges. It must be
comma separated if it contains more than one register or register range.
Operation
POP is a synonym for LDMIA sp!, reglist. POP is the preferred mnemonic.
Note
LDM and LDMFD are synonyms of LDMIA.
Registers are stored on the stack in numerical order, with the lowest numbered register at the lowest
address.
POP, with reglist including the PC
This instruction causes a branch to the address popped off the stack into the PC. This is usually a return
from a subroutine, where the LR was pushed onto the stack at the start of the subroutine.
Also:
• Bits[1:0] must not be 0b10.
• If bit[0] is 1, execution continues in T32 state.
• If bit[0] is 0, execution continues in A32 state.
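The rules above for a value popped into the PC can be sketched as follows. This Python model is an assumption-laden illustration: the name pop_pc_target is hypothetical, a ValueError stands in for the case that the rules prohibit, and clearing bit[0] of the branch target for T32 state reflects the usual interworking behaviour rather than anything stated in this section.

```python
def pop_pc_target(value):
    """Classify a 32-bit value popped into the PC; returns (state, address)."""
    value &= 0xFFFFFFFF
    if value & 1:
        return "T32", value & ~1 & 0xFFFFFFFF   # bit[0] selects T32 state
    if value & 0b11 == 0b10:
        raise ValueError("bits[1:0] must not be 0b10")
    return "A32", value                          # word-aligned A32 target
```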
T32 instructions
A subset of this instruction is available in the T32 instruction set.
The following restriction applies to the 16-bit POP instruction:
• reglist can only include the Lo registers and the PC.

The following restrictions apply to the 32-bit POP instruction:
• reglist must not include the SP.
• reglist can include either the LR or the PC, but not both.
Restrictions on reglist in A32 instructions
The A32 POP instruction cannot have SP but can have PC in the reglist. The instruction that includes
both PC and LR in the reglist is deprecated.
Example
POP     {r0,r10,pc} ; no 16-bit version available

Related references
13.49 LDM on page 13-407.
13.81 PUSH on page 13-456.
7.11 Condition code suffixes on page 7-150.

13.81 PUSH
Push registers onto a full descending stack.
Syntax
PUSH{cond} reglist

where:
cond

is an optional condition code.
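reglist

is a non-empty list of registers, enclosed in braces. It can contain register ranges. It must be
comma separated if it contains more than one register or register range.
Operation
PUSH is a synonym for STMDB sp!, reglist. PUSH is the preferred mnemonic.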
Note
STMFD is a synonym of STMDB.
Registers are stored on the stack in numerical order, with the lowest numbered register at the lowest
address.
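The stack layout described above can be sketched in Python. This is a minimal model, assuming a dictionary as a stand-in for word-addressed memory; the function name push and its parameters are hypothetical.

```python
def push(sp, memory, regs):
    """Model PUSH on a full descending stack and return the new SP.

    regs maps register numbers to 32-bit values; memory maps addresses
    to words. Both containers are stand-ins used only in this sketch.
    """
    sp -= 4 * len(regs)                    # decrement before storing (STMDB)
    for i, num in enumerate(sorted(regs)):
        memory[sp + 4 * i] = regs[num]     # lowest register at lowest address
    return sp
```

Pushing {r2,lr} with SP at 0x1000 leaves SP at 0xFF8 in this model, with the R2 value at 0xFF8 and the LR value at 0xFFC.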
T32 instructions
The following restriction applies to the 16-bit PUSH instruction:
• reglist can only include the Lo registers and the LR.

The following restrictions apply to the 32-bit PUSH instruction:
• reglist must not include the SP.
• reglist must not include the PC.
Restrictions on reglist in A32 instructions
The A32 PUSH instruction can have SP and PC in the reglist but the instruction that includes SP or PC
in the reglist is deprecated.
Examples
PUSH    {r0,r4-r7}
PUSH    {r2,lr}

Related references
13.49 LDM on page 13-407.
13.80 POP on page 13-455.
7.11 Condition code suffixes on page 7-150.

13.82 QADD
Signed saturating addition.
Syntax
QADD{cond} {Rd}, Rm, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the registers holding the operands.
Operation
The QADD instruction adds the values in Rm and Rn. It saturates the result to the signed range
-2^31 ≤ x ≤ 2^31 - 1.
Note
All values are treated as two’s complement signed integers by this instruction.
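A minimal Python sketch of the saturating addition follows, assuming the inputs are already signed values within the 32-bit range. The name qadd and the returned pair are hypothetical. The second element only indicates whether this operation would set the Q flag; it does not model the flag staying set afterwards.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def qadd(rm, rn):
    """Return (result, q_set) for a signed saturating 32-bit addition."""
    total = rm + rn
    clamped = max(INT32_MIN, min(INT32_MAX, total))   # saturate to the signed range
    return clamped, clamped != total                  # Q is set only on saturation
```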

Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Q flag
If saturation occurs, this instruction sets the Q flag. To read the state of the Q flag, use an MRS instruction.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Example
QADD    r0, r1, r9

Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
3.10 The Q flag in AArch32 state on page 3-74.
7.11 Condition code suffixes on page 7-150.

13.83 QADD8
Signed saturating parallel byte-wise addition.
Syntax
QADD8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs four signed integer additions on the corresponding bytes of the operands and
writes the results into the corresponding bytes of the destination. It saturates the results to the signed
range -2^7 ≤ x ≤ 2^7 - 1. The Q flag is not affected even if this operation saturates.
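The byte-wise behaviour can be sketched in Python as follows. This is an illustrative model only, with a hypothetical function name qadd8 and register values treated as unsigned 32-bit integers holding four signed bytes.

```python
def qadd8(rn, rm):
    """Byte-wise signed saturating add of two 32-bit register values."""
    result = 0
    for lane in range(4):
        a = (rn >> (8 * lane)) & 0xFF
        b = (rm >> (8 * lane)) & 0xFF
        a = a - 256 if a & 0x80 else a       # interpret each byte as signed
        b = b - 256 if b & 0x80 else b
        s = max(-128, min(127, a + b))       # saturate to the signed byte range
        result |= (s & 0xFF) << (8 * lane)
    return result
```

In this model, the top byte 0x7F plus 0x01 saturates to 0x7F, and the byte 0x80 plus 0xFF saturates to 0x80, while the other lanes add normally.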
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
3.10 The Q flag in AArch32 state on page 3-74.
7.11 Condition code suffixes on page 7-150.

13.84 QADD16
Signed saturating parallel halfword-wise addition.
Syntax
QADD16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs two signed integer additions on the corresponding halfwords of the operands
and writes the results into the corresponding halfwords of the destination. It saturates the results to the
signed range -2^15 ≤ x ≤ 2^15 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
3.10 The Q flag in AArch32 state on page 3-74.
7.11 Condition code suffixes on page 7-150.

13.85 QASX
Signed saturating parallel add and subtract halfwords with exchange.
Syntax
QASX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction exchanges the two halfwords of the second operand, then performs an addition on the
two top halfwords of the operands and a subtraction on the bottom two halfwords. It writes the results
into the corresponding halfwords of the destination. It saturates the results to the signed range
-2^15 ≤ x ≤ 2^15 - 1. The Q flag is not affected even if this operation saturates.
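The exchange, add, and subtract steps can be sketched in Python. This is a minimal model under the assumption that Rn is the first operand and Rm the second; the names qasx, _s16, and _sat16 are hypothetical helpers for this sketch.

```python
def _s16(x):
    """Interpret the low 16 bits of x as a signed halfword."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def _sat16(x):
    """Saturate x to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, x))

def qasx(rn, rm):
    """Model QASX on two 32-bit register values."""
    top = _sat16(_s16(rn >> 16) + _s16(rm))       # Rn[31:16] + Rm[15:0]
    bottom = _sat16(_s16(rn) - _s16(rm >> 16))    # Rn[15:0] - Rm[31:16]
    return ((top & 0xFFFF) << 16) | (bottom & 0xFFFF)
```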
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
3.10 The Q flag in AArch32 state on page 3-74.
7.11 Condition code suffixes on page 7-150.

13.86 QDADD
Signed saturating Double and Add.
Syntax
QDADD{cond} {Rd}, Rm, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the registers holding the operands.
Operation

QDADD calculates SAT(Rm + SAT(Rn * 2)). It saturates the result to the signed range -2^31 ≤ x ≤ 2^31 - 1.

Saturation can occur on the doubling operation, on the addition, or on both. If saturation occurs on the
doubling but not on the addition, the Q flag is set but the final result is unsaturated.
Note
All values are treated as two’s complement signed integers by this instruction.
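The two saturation points can be made explicit with a short Python sketch. It assumes signed inputs already within the 32-bit range; the names qdadd and _sat32 are hypothetical, and the returned boolean only indicates whether the operation would set the Q flag.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def _sat32(x):
    """Return (clamped value, whether clamping changed it)."""
    clamped = max(INT32_MIN, min(INT32_MAX, x))
    return clamped, clamped != x

def qdadd(rm, rn):
    """Model SAT(Rm + SAT(Rn * 2)); returns (result, q_set)."""
    doubled, q_double = _sat32(2 * rn)       # saturation can occur on the doubling
    total, q_add = _sat32(rm + doubled)      # and again on the addition
    return total, q_double or q_add
```

For example, qdadd(-10, 0x40000000) saturates on the doubling only: Q would be set, and the result is 0x7FFFFFF5 rather than a saturation limit.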

13.87 QDSUB
Signed saturating Double and Subtract.
Syntax
QDSUB{cond} {Rd}, Rm, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the registers holding the operands.
Operation

QDSUB calculates SAT(Rm - SAT(Rn * 2)). It saturates the result to the signed range -2^31 ≤ x ≤ 2^31 - 1.

Saturation can occur on the doubling operation, on the subtraction, or on both. If saturation occurs on the
doubling but not on the subtraction, the Q flag is set but the final result is unsaturated.
Note
All values are treated as two’s complement signed integers by this instruction.

Related references
3.10 The Q flag in AArch32 state on page 3-74.
13.68 MRS (PSR to general-purpose register) on page 13-437.
7.11 Condition code suffixes on page 7-150.

13.88 QSAX
Signed saturating parallel subtract and add halfwords with exchange.
Syntax
QSAX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the
two top halfwords of the operands and an addition on the bottom two halfwords. It writes the results into
the corresponding halfwords of the destination. It saturates the results to the signed range
-2^15 ≤ x ≤ 2^15 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
3.10 The Q flag in AArch32 state on page 3-74.
7.11 Condition code suffixes on page 7-150.

13.89 QSUB
Signed saturating Subtract.
Syntax
QSUB{cond} {Rd}, Rm, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the registers holding the operands.
Operation
The QSUB instruction subtracts the value in Rn from the value in Rm. It saturates the result to the signed
range -2^31 ≤ x ≤ 2^31 - 1.
Note
All values are treated as two’s complement signed integers by this instruction.

13.90 QSUB8
Signed saturating parallel byte-wise subtraction.
Syntax
QSUB8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction subtracts each byte of the second operand from the corresponding byte of the first
operand and writes the results into the corresponding bytes of the destination. It saturates the results to
the signed range -2^7 ≤ x ≤ 2^7 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
3.10 The Q flag in AArch32 state on page 3-74.
7.11 Condition code suffixes on page 7-150.

13.91 QSUB16
Signed saturating parallel halfword-wise subtraction.
Syntax
QSUB16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction subtracts each halfword of the second operand from the corresponding halfword of the
first operand and writes the results into the corresponding halfwords of the destination. It saturates the
results to the signed range -2^15 ≤ x ≤ 2^15 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
3.10 The Q flag in AArch32 state on page 3-74.
7.11 Condition code suffixes on page 7-150.

13.92 RBIT
Reverse the bit order in a 32-bit word.
Syntax
RBIT{cond} Rd, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the operand.
Register restrictions
You cannot use PC for any register.
You can use SP in the A32 instruction but this is deprecated. You cannot use SP in the T32 instruction.
Condition flags
This instruction does not change the flags.
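As an illustration of the bit reversal, the following Python sketch models the result of RBIT on a 32-bit value. The function name rbit is hypothetical.

```python
def rbit(rn):
    """Reverse the order of the 32 bits in rn."""
    result = 0
    for i in range(32):
        result |= ((rn >> i) & 1) << (31 - i)   # bit i moves to bit 31 - i
    return result
```

Applying the operation twice returns the original value in this model.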
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Example
RBIT    r7, r8

Related references
7.11 Condition code suffixes on page 7-150.

13.93 REV
Reverse the byte order in a word.
Syntax
REV{cond} Rd, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the operand.
Usage
You can use this instruction to change endianness. REV converts 32-bit big-endian data into little-endian
data or 32-bit little-endian data into big-endian data.
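The endianness conversion can be sketched in Python using the standard int.to_bytes and int.from_bytes methods. The function name rev is hypothetical, and the sketch models only the resulting register value.

```python
def rev(rn):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((rn & 0xFFFFFFFF).to_bytes(4, "little"), "big")
```

For example, 0x11223344 becomes 0x44332211.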
Register restrictions
You cannot use PC for any register.
You can use SP in the A32 instruction but this is deprecated. You cannot use SP in the T32 instruction.
Condition flags
This instruction does not change the flags.
16-bit instructions
The following form of this instruction is available in T32 code, and is a 16-bit instruction:
REV Rd, Rm
Rd and Rm must both be Lo registers.

Architectures
This instruction is available in A32 and T32.
Example
REV     r3, r7

Related references
7.11 Condition code suffixes on page 7-150.

13.94 REV16
Reverse the byte order in each halfword independently.
Syntax
REV16{cond} Rd, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the operand.
Usage
You can use this instruction to change endianness. REV16 converts 16-bit big-endian data into
little-endian data or 16-bit little-endian data into big-endian data.
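The following Python sketch models the result of REV16 on a 32-bit value, swapping the two bytes within each halfword while leaving the halfwords in place. The function name rev16 is hypothetical.

```python
def rev16(rn):
    """Swap the bytes within each halfword of a 32-bit value."""
    rn &= 0xFFFFFFFF
    return ((rn & 0x00FF00FF) << 8) | ((rn >> 8) & 0x00FF00FF)
```

For example, 0x11223344 becomes 0x22114433.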
Register restrictions
You cannot use PC for any register.
You can use SP in the A32 instruction but this is deprecated. You cannot use SP in the T32 instruction.
Condition flags
This instruction does not change the flags.
16-bit instructions
The following form of this instruction is available in T32 code, and is a 16-bit instruction:
REV16 Rd, Rm
Rd and Rm must both be Lo registers.

Architectures
This instruction is available in A32 and T32.
Example
REV16   r0, r0

Related references
7.11 Condition code suffixes on page 7-150.

13.95 REVSH
Reverse the byte order in the bottom halfword, and sign extend to 32 bits.
Syntax
REVSH{cond} Rd, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the operand.
Usage
You can use this instruction to change endianness. REVSH converts either:
• 16-bit signed big-endian data into 32-bit signed little-endian data.
• 16-bit signed little-endian data into 32-bit signed big-endian data.
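A minimal Python sketch of the byte swap followed by sign extension is shown here. The function name revsh is hypothetical, and the result is returned as an unsigned 32-bit value.

```python
def revsh(rn):
    """Byte-reverse the bottom halfword of rn and sign extend to 32 bits."""
    half = ((rn & 0xFF) << 8) | ((rn >> 8) & 0xFF)   # swap bytes of bits[15:0]
    if half & 0x8000:
        half |= 0xFFFF0000                            # sign extend a negative halfword
    return half
```

In this model the top halfword of the source is ignored, so 0xABCD3412 gives 0x00001234, while 0x00001280 gives 0xFFFF8012.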
Register restrictions
You cannot use PC for any register.
You can use SP in the A32 instruction but this is deprecated. You cannot use SP in the T32 instruction.
Condition flags
This instruction does not change the flags.
16-bit instructions
The following form of this instruction is available in T32 code, and is a 16-bit instruction:
REVSH Rd, Rm
Rd and Rm must both be Lo registers.

Architectures
This instruction is available in A32 and T32.
Example
REVSH   r0, r5 ; Reverse Signed Halfword

Related references
7.11 Condition code suffixes on page 7-150.

13.96 RFE
Return From Exception.
Syntax
RFE{addr_mode}{cond} Rn{!}

where:
addr_mode

is any one of the following:
IA

Increment address After each transfer (Full Descending stack)
IB

Increment address Before each transfer (A32 only)
DA

Decrement address After each transfer (A32 only)
DB

Decrement address Before each transfer.
If addr_mode is omitted, it defaults to Increment After.
cond

is an optional condition code.
Note
cond is permitted only in T32 code, using a preceding IT instruction, but this is deprecated in
ARMv8. This is an unconditional instruction in A32 code.
Rn

specifies the base register. Rn must not be PC.
!

is an optional suffix. If ! is present, the final address is written back into Rn.
Usage
You can use RFE to return from an exception if you previously saved the return state using the SRS
instruction. Rn is usually the SP where the return state information was saved.
Operation
Loads the PC and the CPSR from the address contained in Rn, and the following address. Optionally
updates Rn.
Notes
RFE writes an address to the PC. The alignment of this address must be correct for the instruction set in
use after the exception return:
• For a return to A32, the address written to the PC must be word-aligned.
• For a return to T32, the address written to the PC must be halfword-aligned.
• For a return to Jazelle, there are no alignment restrictions on the address written to the PC.
No special precautions are required in software to follow these rules, if you use the instruction to return
after a valid exception entry mechanism.
Where addresses are not word-aligned, RFE ignores the least significant two bits of Rn.
The time order of the accesses to individual words of memory generated by RFE is not architecturally
defined. Do not use this instruction on memory-mapped I/O locations where access order matters.
Do not use RFE in unprivileged software execution.
Architectures
This instruction is available in A32.
This 32-bit T32 instruction is available, except in the ARMv7-M and ARMv8-M.mainline architectures.
There is no 16-bit version of this instruction.
Example
RFE sp!

Related concepts
3.2 Processor modes, and privileged and unprivileged software execution on page 3-65.
Related references
13.136 SRS on page 13-518.
7.11 Condition code suffixes on page 7-150.

13.97 ROR
Rotate Right. This instruction is a preferred synonym for MOV instructions with shifted register operands.
Syntax
ROR{S}{cond} Rd, Rm, Rs
ROR{S}{cond} Rd, Rm, #sh

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register holding the first operand. This operand is shifted right.
Rs

is a register holding a shift value to apply to the value in Rm. Only the least significant byte is
used.
sh

is a constant shift. The range of values is 1-31.
Operation
ROR provides the value of the contents of a register rotated by a value. The bits that are rotated off the
right end are inserted into the vacated bit positions on the left.
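The rotation can be sketched in Python as follows. This is an illustrative model of the result value only: the function name ror32 is hypothetical, and the condition flags are not modelled.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by amount bit positions."""
    amount %= 32
    value &= MASK32
    # Bits shifted off the right end re-enter at the left.
    return ((value >> amount) | (value << (32 - amount))) & MASK32
```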
Restrictions in T32 code
T32 instructions must not use PC or SP.
Use of SP and PC in A32 instructions
You can use SP in these A32 instructions but this is deprecated.
You cannot use PC in instructions with the ROR{S}{cond} Rd, Rm, Rs syntax. You can use PC for Rd
and Rm in the other syntax, but this is deprecated.
If you use PC as Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, the SPSR of the current mode is copied to the CPSR. You can use this to
return from exceptions.
Note
The A32 instruction RORS{cond} pc,Rm,#sh always disassembles to the preferred form MOVS{cond}
pc,Rm{,shift}.

Condition flags
If S is specified, the instruction updates the N and Z flags according to the result.
The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.
16-bit instructions
The following forms of this instruction are available in T32 code, and are 16-bit instructions:
RORS Rd, Rd, Rs
Rd and Rs must both be Lo registers. This form can only be used outside an IT block.
ROR{cond} Rd, Rd, Rs
Rd and Rs must both be Lo registers. This form can only be used inside an IT block.

Architectures
This instruction is available in A32 and T32.
Example
ROR     r4, r5, r6

Related references
13.63 MOV on page 13-431.
7.11 Condition code suffixes on page 7-150.

13.98 RRX
Rotate Right with Extend. This instruction is a preferred synonym for MOV instructions with shifted
register operands.
Syntax
RRX{S}{cond} Rd, Rm

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register holding the first operand. This operand is shifted right.
Operation
RRX provides the value of the contents of a register shifted right one bit. The old carry flag is shifted into
bit[31]. If the S suffix is present, the old bit[0] is placed in the carry flag.
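The following Python sketch models the operation just described. The function name rrx and the returned pair are hypothetical; the second element is the value that would be placed in the carry flag when the S suffix is present.

```python
MASK32 = 0xFFFFFFFF

def rrx(rm, carry_in):
    """Shift rm right one bit with the old carry entering at bit[31]."""
    result = ((carry_in & 1) << 31) | ((rm & MASK32) >> 1)
    carry_out = rm & 1            # old bit[0], used only with the S suffix
    return result, carry_out
```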

Restrictions in T32 code
T32 instructions must not use PC or SP.
Use of SP and PC in A32 instructions
You can use SP in this A32 instruction but this is deprecated.
If you use PC as Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, the SPSR of the current mode is copied to the CPSR. You can use this to
return from exceptions.
Note
The A32 instruction RRXS{cond} pc,Rm always disassembles to the preferred form MOVS{cond}
pc,Rm{,shift}.

Caution
Do not use the S suffix when using PC as Rd in User mode or System mode. The assembler cannot warn
you about this because it has no information about what the processor mode is likely to be at execution
time.
You cannot use PC for Rd or any operand in this instruction if it has a register-controlled shift.
Condition flags
If S is specified, the instruction updates the N and Z flags according to the result.
The C flag is unaffected if the shift value is 0. Otherwise, the C flag is updated to the last bit shifted out.
Architectures
The 32-bit instruction is available in A32 and T32.

There is no 16-bit instruction in T32.
Related references
13.63 MOV on page 13-431.
7.11 Condition code suffixes on page 7-150.

13.99 RSB
Reverse Subtract without carry.
Syntax
RSB{S}{cond} {Rd}, Rn, Operand2

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
Operation
The RSB instruction subtracts the value in Rn from the value of Operand2. This is useful because of the
wide range of options for Operand2.
In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when
reading disassembly listings.
Use of PC and SP in T32 instructions
You cannot use PC (R15) for Rd or any operand.
You cannot use SP (R13) for Rd or any operand.
Use of PC and SP in A32 instructions
You cannot use PC for Rd or any operand in an RSB instruction that has a register-controlled shift.
Use of PC for any operand, in instructions without register-controlled shift, is deprecated.
If you use PC (R15) as Rn or Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
Use of SP and PC in A32 instructions is deprecated.
Condition flags
If S is specified, the RSB instruction updates the N, Z, C and V flags according to the result.
16-bit instructions
The following forms of this instruction are available in T32 code, and are 16-bit instructions:
RSBS Rd, Rn, #0
Rd and Rn must both be Lo registers. This form can only be used outside an IT block.
RSB{cond} Rd, Rn, #0
Rd and Rn must both be Lo registers. This form can only be used inside an IT block.

Example
    RSB     r4, r4, #1280       ; subtracts contents of R4 from 1280

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
7.11 Condition code suffixes on page 7-150.

13.100 RSC
Reverse Subtract with Carry.
Syntax
RSC{S}{cond} {Rd}, Rn, Operand2

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
Usage
The RSC instruction subtracts the value in Rn from the value of Operand2. If the carry flag is clear, the
result is reduced by one.
You can use RSC to synthesize multiword arithmetic.
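A minimal Python sketch of this multiword use follows. It models negating a 64-bit value with RSBS lo, lo, #0 followed by RSC hi, hi, #0. The function names are hypothetical, and the model assumes the usual A32 convention that the carry flag is set when a subtraction does not borrow.

```python
MASK32 = 0xFFFFFFFF


def rsbs(rn, operand2):
    """Model RSBS: Operand2 - Rn. Returns (result, carry); carry is 1 when no borrow occurs."""
    rn &= MASK32
    operand2 &= MASK32
    return (operand2 - rn) & MASK32, int(operand2 >= rn)


def rsc(rn, operand2, carry):
    """Model RSC: Operand2 - Rn, reduced by one if the carry flag is clear."""
    return (operand2 - rn - (1 - carry)) & MASK32


def negate64(hi, lo):
    """Model RSBS lo, lo, #0 followed by RSC hi, hi, #0."""
    lo, carry = rsbs(lo, 0)
    hi = rsc(hi, 0, carry)
    return hi, lo
```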
In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when
reading disassembly listings.
RSC is not available in T32 code.

Use of PC and SP
Use of PC and SP is deprecated.
You cannot use PC for Rd or any operand in an RSC instruction that has a register-controlled shift.
If you use PC (R15) as Rn or Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
Condition flags
If S is specified, the RSC instruction updates the N, Z, C and V flags according to the result.
Correct example
    RSCSLE  r0, r5, r0, LSL r4  ; conditional, flags set

Incorrect example
    RSCSLE  r0, pc, r0, LSL r4  ; PC not permitted with register-controlled shift

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
7.11 Condition code suffixes on page 7-150.

13.101 SADD8
Signed parallel byte-wise addition.
Syntax
SADD8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs four signed integer additions on the corresponding bytes of the operands and
writes the results into the corresponding bytes of the destination. The results are modulo 2^8. It sets the
APSR GE flags.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
GE flags
This instruction does not affect the N, Z, C, V, or Q flags.
It sets the GE flags in the APSR as follows:
GE[0]

for bits[7:0] of the result.
GE[1]

for bits[15:8] of the result.
GE[2]

for bits[23:16] of the result.
GE[3]

for bits[31:24] of the result.
It sets a GE flag to 1 to indicate that the corresponding result is greater than or equal to zero. This is
equivalent to an ADDS instruction setting the N and V condition flags to the same value, so that the GE
condition passes.
You can use these flags to control a following SEL instruction.
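The following Python sketch models the byte lanes and the GE flags described above. It is an illustration under the stated semantics, not a definitive implementation. The names sadd8 and to_signed are hypothetical, and the GE flags are returned as a 4-bit integer with GE[0] in bit 0.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def sadd8(rn, rm):
    """Model SADD8. Returns (result, ge)."""
    result, ge = 0, 0
    for lane in range(4):
        shift = 8 * lane
        total = to_signed(rn >> shift, 8) + to_signed(rm >> shift, 8)
        result |= (total & 0xFF) << shift   # each byte result is modulo 2^8
        if total >= 0:                      # GE reflects the sign of the full sum
            ge |= 1 << lane
    return result, ge
```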
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
13.107 SEL on page 13-487.
7.11 Condition code suffixes on page 7-150.

13.102 SADD16
Signed parallel halfword-wise addition.
Syntax
SADD16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs two signed integer additions on the corresponding halfwords of the operands
and writes the results into the corresponding halfwords of the destination. The results are modulo 2^16. It
sets the APSR GE flags.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
GE flags
This instruction does not affect the N, Z, C, V, or Q flags.
It sets the GE flags in the APSR as follows:
GE[1:0]

for bits[15:0] of the result.
GE[3:2]

for bits[31:16] of the result.
It sets a pair of GE flags to 1 to indicate that the corresponding result is greater than or equal to zero.
This is equivalent to an ADDS instruction setting the N and V condition flags to the same value, so that the
GE condition passes.
You can use these flags to control a following SEL instruction.
Note
GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
13.107 SEL on page 13-487.
7.11 Condition code suffixes on page 7-150.

13.103 SASX
Signed parallel add and subtract halfwords with exchange.
Syntax
SASX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
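Rm, Rn

are the ARM registers holding the operands.
GE flags
This instruction sets the GE flags in the APSR as follows:
GE[1:0]

for bits[15:0] of the result.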
GE[3:2]

for bits[31:16] of the result.
It sets a pair of GE flags to 1 to indicate that the corresponding result is greater than or equal to zero.
This is equivalent to an ADDS or SUBS instruction setting the N and V condition flags to the same value,
so that the GE condition passes.
You can use these flags to control a following SEL instruction.
Note
GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

13.104 SBC
Subtract with Carry.
Syntax
SBC{S}{cond} {Rd}, Rn, Operand2

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
Usage
The SBC (Subtract with Carry) instruction subtracts the value of Operand2 from the value in Rn. If the
carry flag is clear, the result is reduced by one.
You can use SBC to synthesize multiword arithmetic.
In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when
reading disassembly listings.
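As a sketch of how the carry flag propagates a borrow through a multiword subtraction, the following Python model chains SUBS, SBCS, and SBC over three 32-bit words. The function names are hypothetical. Each value is passed as a (low, middle, high) tuple of words, which is an assumption of this sketch rather than anything armasm requires.

```python
MASK32 = 0xFFFFFFFF


def subs(rn, operand2):
    """Model SUBS. Returns (result, carry); carry is 1 when no borrow occurs."""
    rn &= MASK32
    operand2 &= MASK32
    return (rn - operand2) & MASK32, int(rn >= operand2)


def sbcs(rn, operand2, carry):
    """Model SBCS: Rn - Operand2, reduced by one if the carry flag is clear."""
    rn &= MASK32
    operand2 &= MASK32
    borrow = 1 - carry
    return (rn - operand2 - borrow) & MASK32, int(rn >= operand2 + borrow)


def sub96(a, b):
    """Subtract 96-bit b from 96-bit a, least significant word first."""
    lo, carry = subs(a[0], b[0])
    mid, carry = sbcs(a[1], b[1], carry)
    hi, _ = sbcs(a[2], b[2], carry)   # final SBC; the carry out is not needed
    return lo, mid, hi
```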
Use of PC and SP in T32 instructions
You cannot use PC (R15) for Rd, or any operand.
You cannot use SP (R13) for Rd, or any operand.
Use of PC and SP in A32 instructions
You cannot use PC for Rd or any operand in an SBC instruction that has a register-controlled shift.
Use of PC for any operand in instructions without register-controlled shift, is deprecated.
If you use PC (R15) as Rn or Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.
Use of SP and PC in SBC A32 instructions is deprecated.
Condition flags
If S is specified, the SBC instruction updates the N, Z, C and V flags according to the result.
16-bit instructions
The following forms of this instruction are available in T32 code, and are 16-bit instructions:
SBCS Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used outside an IT block.
SBC{cond} Rd, Rd, Rm
Rd and Rm must both be Lo registers. This form can only be used inside an IT block.

Multiword arithmetic examples
These instructions subtract one 96-bit integer contained in R9, R10, and R11 from another 96-bit integer
contained in R6, R7, and R8, and place the result in R3, R4, and R5:
    SUBS    r3, r6, r9
    SBCS    r4, r7, r10
    SBC     r5, r8, r11

For clarity, the above examples use consecutive registers for multiword values. There is no requirement
to do this. The following, for example, is perfectly valid:
    SUBS    r6, r6, r9
    SBCS    r9, r2, r1
    SBC     r2, r8, r11

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
7.11 Condition code suffixes on page 7-150.

13.105 SBFX
Signed Bit Field Extract.
Syntax
SBFX{cond} Rd, Rn, #lsb, #width

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the source register.
lsb

is the bit number of the least significant bit in the bitfield, in the range 0 to 31.
width

is the width of the bitfield, in the range 1 to (32–lsb).
Operation
Copies adjacent bits from one register into the least significant bits of a second register, and sign extends
to 32 bits.
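A minimal Python model of the extraction and sign extension, for illustration. The function name sbfx is hypothetical, and the range check mirrors the limits given for lsb and width above.

```python
def sbfx(rn, lsb, width):
    """Model SBFX Rd, Rn, #lsb, #width. Returns the 32-bit value written to Rd."""
    if not (0 <= lsb <= 31 and 1 <= width <= 32 - lsb):
        raise ValueError("lsb or width out of range")
    field = ((rn & 0xFFFFFFFF) >> lsb) & ((1 << width) - 1)
    if field & (1 << (width - 1)):   # top bit of the field is the sign bit
        field -= 1 << width
    return field & 0xFFFFFFFF
```

For example, extracting a 4-bit field whose top bit is set produces a value with bits [31:4] all set.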
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not alter any flags.
Architectures
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.106 SDIV
Signed Divide.
Syntax
SDIV{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the value to be divided.
Rm

is a register holding the divisor.
Register restrictions
PC or SP cannot be used for Rd, Rn, or Rm.
Architectures
This 32-bit T32 instruction is available in ARMv7-R, ARMv7-M, and ARMv8-M.mainline.
This 32-bit A32 instruction is optional in ARMv7-R.
This 32-bit A32 and T32 instruction is available in ARMv7-A if Virtualization Extensions are
implemented, and optional if not.
There is no 16-bit T32 SDIV instruction.
Related references
7.11 Condition code suffixes on page 7-150.

13.107 SEL
Select bytes from each operand according to the state of the APSR GE flags.
Syntax
SEL{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Rm

is the register holding the second operand.
Operation
The SEL instruction selects bytes from Rn or Rm according to the APSR GE flags:
• If GE[0] is set, Rd[7:0] come from Rn[7:0], otherwise from Rm[7:0].
• If GE[1] is set, Rd[15:8] come from Rn[15:8], otherwise from Rm[15:8].
• If GE[2] is set, Rd[23:16] come from Rn[23:16], otherwise from Rm[23:16].
• If GE[3] is set, Rd[31:24] come from Rn[31:24], otherwise from Rm[31:24].
Usage
Use the SEL instruction after one of the signed parallel instructions. You can use this to select maximum
or minimum values in multiple byte or halfword data.
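The byte selection can be sketched in Python as follows. The names sel, usub8_ge, and umin8 are hypothetical. The usub8_ge helper assumes that USUB8 sets GE[i] when byte i of its first operand is greater than or equal to byte i of its second operand, treated as unsigned values. The GE flags are passed as a 4-bit integer with GE[0] in bit 0.

```python
def sel(rn, rm, ge):
    """Model SEL Rd, Rn, Rm for a given set of GE flags."""
    result = 0
    for lane in range(4):
        source = rn if ge & (1 << lane) else rm
        result |= source & (0xFF << (8 * lane))
    return result


def usub8_ge(rn, rm):
    """GE flags produced by USUB8 Rd, Rn, Rm (assumed: set when no borrow in the lane)."""
    ge = 0
    for lane in range(4):
        a = (rn >> (8 * lane)) & 0xFF
        b = (rm >> (8 * lane)) & 0xFF
        if a >= b:
            ge |= 1 << lane
    return ge


def umin8(r1, r2):
    """Model USUB8 r4, r1, r2 followed by SEL r4, r2, r1."""
    return sel(r2, r1, usub8_ge(r1, r2))
```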
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Examples
    SEL     r0, r4, r5
    SELLT   r4, r0, r4

The following instruction sequence sets each byte in R4 equal to the unsigned minimum of the
corresponding bytes of R1 and R2:
    USUB8   r4, r1, r2
    SEL     r4, r2, r1

Related concepts
3.11 Application Program Status Register on page 3-75.
Related references
7.11 Condition code suffixes on page 7-150.

13.108 SETEND
Set the endianness bit in the CPSR, without affecting any other bits in the CPSR.
Note
This instruction is deprecated in ARMv8.

Syntax
SETEND specifier

where:
specifier

is one of:
BE

Big-endian.
LE

Little-endian.
Usage
Use SETEND to access data of different endianness, for example, to access several big-endian DMA-formatted data fields from an otherwise little-endian application.
SETEND cannot be conditional, and is not permitted in an IT block.

Architectures
This instruction is available in A32 and 16-bit T32.
This 16-bit instruction is available in T32, except in the ARMv6-M and ARMv7-M architectures.
There is no 32-bit version of this instruction in T32.
Example
    SETEND  BE                  ; Set the CPSR E bit for big-endian accesses
    LDR     r0, [r2, #header]
    LDR     r1, [r2, #CRC32]
    SETEND  le                  ; Set the CPSR E bit for little-endian accesses
                                ; for the rest of the application

13.109 SETPAN
Set Privileged Access Never.
Syntax
SETPAN{q} #imm ; A1 general registers (A32)
SETPAN{q} #imm ; T1 general registers (T32)

Where:
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
imm

Is the unsigned immediate 0 or 1.
Usage
Set Privileged Access Never writes a new value to PSTATE.PAN.
This instruction is available only in privileged mode and it is a NOP when executed in User mode.
Related references
13.1 A32 and T32 instruction summary on page 13-332.

13.110 SEV
Set Event.
Syntax
SEV{cond}

where:
cond

is an optional condition code.
Operation
This is a hint instruction. It is optional whether it is implemented or not. If it is not implemented, it
executes as a NOP. The assembler produces a diagnostic message if the instruction executes as a NOP on
the target.
SEV causes an event to be signaled to all cores within a multiprocessor system. If SEV is implemented,
WFE must also be implemented.

Availability
This instruction is available in A32 and T32.
Related references
13.111 SEVL on page 13-492.
13.75 NOP on page 13-447.
7.11 Condition code suffixes on page 7-150.

13.111 SEVL
Set Event Locally.
Note
This instruction is supported only in ARMv8.

Syntax
SEVL{cond}

where:
cond

is an optional condition code.
Operation
This is a hint instruction. It is optional whether it is implemented or not. If it is not implemented, it
executes as a NOP. armasm produces a diagnostic message if the instruction executes as a NOP on the
target.
SEVL causes an event to be signaled locally, on the current processor. SEVL is not required to affect other
processors, although it is permitted to do so.

Availability
This instruction is available in A32 and T32.
Related references
13.110 SEV on page 13-491.
13.75 NOP on page 13-447.
7.11 Condition code suffixes on page 7-150.

13.112 SG
Secure Gateway.
Syntax
SG

Usage
Secure Gateway marks a valid branch target for branches from Non-secure code that wants to call Secure
code.

13.113 SHADD8
Signed halving parallel byte-wise addition.
Syntax
SHADD8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs four signed integer additions on the corresponding bytes of the operands,
halves the results, and writes the results into the corresponding bytes of the destination. This cannot
cause overflow.
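To show why halving prevents overflow, the following Python sketch models each byte lane. The names shadd8 and to_signed are hypothetical. The sum of two signed bytes needs at most 9 bits, so after halving it always fits in 8 bits.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def shadd8(rn, rm):
    """Model SHADD8 Rd, Rn, Rm. Returns the 32-bit value written to Rd."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        total = to_signed(rn >> shift, 8) + to_signed(rm >> shift, 8)
        # Python's >> on a negative int is an arithmetic shift, which keeps the sign.
        result |= ((total >> 1) & 0xFF) << shift
    return result
```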
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.114 SHADD16
Signed halving parallel halfword-wise addition.
Syntax
SHADD16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs two signed integer additions on the corresponding halfwords of the operands,
halves the results, and writes the results into the corresponding halfwords of the destination. This cannot
cause overflow.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.115 SHASX
Signed halving parallel add and subtract halfwords with exchange.
Syntax
SHASX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction exchanges the two halfwords of the second operand, then performs an addition on the
two top halfwords of the operands and a subtraction on the bottom two halfwords. It halves the results
and writes them into the corresponding halfwords of the destination. This cannot cause overflow.
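A Python sketch of the exchange, add, subtract, and halve steps described above, for illustration. The names shasx and to_signed are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def shasx(rn, rm):
    """Model SHASX Rd, Rn, Rm. Returns the 32-bit value written to Rd."""
    rn_hi, rn_lo = to_signed(rn >> 16, 16), to_signed(rn, 16)
    rm_hi, rm_lo = to_signed(rm >> 16, 16), to_signed(rm, 16)
    # After the exchange, the top halfword of Rn pairs with the bottom halfword of Rm.
    top = (rn_hi + rm_lo) >> 1
    bottom = (rn_lo - rm_hi) >> 1
    return ((top & 0xFFFF) << 16) | (bottom & 0xFFFF)
```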
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.116 SHSAX
Signed halving parallel subtract and add halfwords with exchange.
Syntax
SHSAX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the
two top halfwords of the operands and an addition on the bottom two halfwords. It halves the results and
writes them into the corresponding halfwords of the destination. This cannot cause overflow.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.117 SHSUB8
Signed halving parallel byte-wise subtraction.
Syntax
SHSUB8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction subtracts each byte of the second operand from the corresponding byte of the first
operand, halves the results, and writes the results into the corresponding bytes of the destination. This
cannot cause overflow.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.118 SHSUB16
Signed halving parallel halfword-wise subtraction.
Syntax
SHSUB16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction subtracts each halfword of the second operand from the corresponding halfword of the
first operand, halves the results, and writes the results into the corresponding halfwords of the
destination. This cannot cause overflow.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.119 SMC
Secure Monitor Call.
Syntax
SMC{cond} #imm4

where:
cond

is an optional condition code.
imm4

is a 4-bit immediate value. This is ignored by the ARM processor, but can be used by the SMC
exception handler to determine what service is being requested.
Note
SMC was called SMI in earlier versions of the A32 assembly language. SMI instructions disassemble to
SMC, with a comment to say that this was formerly SMI.

Architectures
This 32-bit instruction is available in A32 and T32, if the ARM architecture has the Security Extensions.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.
Related information
ARM Architecture Reference Manual.

13.120 SMLAxy
Signed Multiply Accumulate, with 16-bit operands and a 32-bit result and accumulator.
Syntax
SMLA<x><y>{cond} Rd, Rn, Rm, Ra

where:
<x>

is either B or T. B means use the bottom half (bits [15:0]) of Rn, T means use the top half (bits
[31:16]) of Rn.
<y>

is either B or T. B means use the bottom half (bits [15:0]) of Rm, T means use the top half (bits
[31:16]) of Rm.
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are the registers holding the values to be multiplied.
Ra

is the register holding the value to be added.
Operation
SMLAxy multiplies the 16-bit signed integers from the selected halves of Rn and Rm, adds the 32-bit result
to the 32-bit value in Ra, and places the result in Rd.
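The halfword selection and accumulation can be sketched in Python as follows. The function name smla and its x and y string parameters are hypothetical stand-ins for the B or T letters in the mnemonic. The sketch returns the 32-bit result only and does not model any overflow indication.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def smla(x, y, rn, rm, ra):
    """Model SMLA<x><y> Rd, Rn, Rm, Ra, where x and y are each "B" or "T"."""
    a = to_signed(rn >> 16 if x == "T" else rn, 16)   # selected half of Rn
    b = to_signed(rm >> 16 if y == "T" else rm, 16)   # selected half of Rm
    return (a * b + to_signed(ra, 32)) & 0xFFFFFFFF
```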

r0, r2, r1, r10
r0, r0, r3, r5

Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.
7.11 Condition code suffixes on page 7-150.

13.121 SMLAD
Dual 16-bit Signed Multiply with Addition of products and 32-bit accumulation.
Syntax
SMLAD{X}{cond} Rd, Rn, Rm, Ra

where:
cond

is an optional condition code.
X

is an optional parameter. If X is present, the most and least significant halfwords of the second
operand are exchanged before the multiplications occur.
Rd

is the destination register.
Rn, Rm

are the registers holding the operands.
Ra

is the register holding the accumulate operand.
Operation
SMLAD multiplies the bottom halfword of Rn with the bottom halfword of Rm, and the top halfword of Rn
with the top halfword of Rm. It then adds both products to the value in Ra and stores the sum to Rd.
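As an illustration, the dual multiply and accumulate can be modelled in Python. The function name smlad and the exchange parameter (standing in for the optional X) are hypothetical. The sketch returns the 32-bit result only and does not model any overflow indication.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def smlad(rn, rm, ra, exchange=False):
    """Model SMLAD{X} Rd, Rn, Rm, Ra."""
    rm &= 0xFFFFFFFF
    if exchange:
        # X: swap the halfwords of the second operand before multiplying
        rm = ((rm << 16) | (rm >> 16)) & 0xFFFFFFFF
    bottom = to_signed(rn, 16) * to_signed(rm, 16)
    top = to_signed(rn >> 16, 16) * to_signed(rm >> 16, 16)
    return (bottom + top + to_signed(ra, 32)) & 0xFFFFFFFF
```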

Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Example
    SMLADLT r1, r2, r4, r1

Related references
7.11 Condition code suffixes on page 7-150.

13.122 SMLAL
Signed Long Multiply, with optional Accumulate, with 32-bit operands, and 64-bit result and
accumulator.
Syntax
SMLAL{S}{cond} RdLo, RdHi, Rn, Rm

where:
S

is an optional suffix available in A32 state only. If S is specified, the condition flags are updated
on the result of the operation.
cond

is an optional condition code.
RdLo, RdHi

are the destination registers. They also hold the accumulating value. RdLo and RdHi must be
different registers.
Rn, Rm

are ARM registers holding the operands.
Operation
The SMLAL instruction interprets the values from Rn and Rm as two’s complement signed integers. It
multiplies these integers, and adds the 64-bit result to the 64-bit signed integer contained in RdHi and
RdLo.
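A minimal Python model of the 64-bit accumulation, for illustration. The function name smlal is hypothetical, and the result is returned as a (RdLo, RdHi) pair in the same order as the assembler operands.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def smlal(rd_lo, rd_hi, rn, rm):
    """Model SMLAL RdLo, RdHi, Rn, Rm. Returns the new (RdLo, RdHi)."""
    accumulator = to_signed(((rd_hi & MASK32) << 32) | (rd_lo & MASK32), 64)
    total = (to_signed(rn, 32) * to_signed(rm, 32) + accumulator) & MASK64
    return total & MASK32, total >> 32
```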
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
If S is specified, this instruction:
• Updates the N and Z flags according to the result.
• Does not affect the C or V flags.
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.123 SMLALD
Dual 16-bit Signed Multiply with Addition of products and 64-bit Accumulation.
Syntax
SMLALD{X}{cond} RdLo, RdHi, Rn, Rm

where:
X

is an optional parameter. If X is present, the most and least significant halfwords of the second
operand are exchanged before the multiplications occur.
cond

is an optional condition code.
RdLo, RdHi

are the destination registers for the 64-bit result. They also hold the 64-bit accumulate operand.
RdHi and RdLo must be different registers.
Rn, Rm

are the registers holding the operands.
Operation
SMLALD multiplies the bottom halfword of Rn with the bottom halfword of Rm, and the top halfword of Rn
with the top halfword of Rm. It then adds both products to the value in RdLo, RdHi and stores the sum to
RdLo, RdHi.

r10, r11, r5, r1

Related references
7.11 Condition code suffixes on page 7-150.

13.124 SMLALxy
Signed Multiply-Accumulate with 16-bit operands and a 64-bit accumulator.
Syntax
SMLAL<x><y>{cond} RdLo, RdHi, Rn, Rm

where:
<x>

is either B or T. B means use the bottom half (bits [15:0]) of Rn, T means use the top half (bits
[31:16]) of Rn.
<y>

is either B or T. B means use the bottom half (bits [15:0]) of Rm, T means use the top half (bits
[31:16]) of Rm.
cond

is an optional condition code.
RdLo, RdHi

are the destination registers. They also hold the accumulate value. RdHi and RdLo must be
different registers.
Rn, Rm

are the registers holding the values to be multiplied.
Operation
SMLALxy multiplies the signed integer from the selected half of Rm by the signed integer from the selected
half of Rn, and adds the 32-bit result to the 64-bit value in RdHi and RdLo.
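The following Python sketch models this operation, including the wrap-around of the 64-bit accumulator. The function name smlal_xy and its x and y string parameters are hypothetical stand-ins for the B or T letters in the mnemonic.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def smlal_xy(x, y, rd_lo, rd_hi, rn, rm):
    """Model SMLAL<x><y> RdLo, RdHi, Rn, Rm. Returns the new (RdLo, RdHi)."""
    a = to_signed(rn >> 16 if x == "T" else rn, 16)   # selected half of Rn
    b = to_signed(rm >> 16 if y == "T" else rm, 16)   # selected half of Rm
    accumulator = ((rd_hi & MASK32) << 32) | (rd_lo & MASK32)
    total = (accumulator + a * b) & MASK64            # wraps modulo 2^64
    return total & MASK32, total >> 32
```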

Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Note
SMLALxy cannot raise an exception. If overflow occurs on this instruction, the result wraps round without
any warning.

r2, r3, r7, r1
r0, r1, r9, r2

Related references
7.11 Condition code suffixes on page 7-150.

13.125 SMLAWy
Signed Multiply-Accumulate Wide, with one 32-bit and one 16-bit operand, and a 32-bit accumulate
value, providing the top 32 bits of the result.
Syntax
SMLAW<y>{cond} Rd, Rn, Rm, Ra

where:
<y>

is either B or T. B means use the bottom half (bits [15:0]) of Rm, T means use the top half (bits
[31:16]) of Rm.
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are the registers holding the values to be multiplied.
Ra

is the register holding the value to be added.
Operation
SMLAWy multiplies the signed 16-bit integer from the selected half of Rm by the signed 32-bit integer from
Rn, adds the top 32 bits of the 48-bit result to the 32-bit value in Ra, and places the result in Rd.
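A Python sketch of the 32-bit by 16-bit multiply, for illustration. The function name smlaw and its y parameter are hypothetical. The sketch returns the 32-bit result only and does not model the Q flag.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def smlaw(y, rn, rm, ra):
    """Model SMLAW<y> Rd, Rn, Rm, Ra, where y is "B" or "T"."""
    b = to_signed(rm >> 16 if y == "T" else rm, 16)   # selected half of Rm
    product = to_signed(rn, 32) * b                   # 48-bit signed product
    # Keep the top 32 bits of the 48-bit product, then accumulate.
    return ((product >> 16) + to_signed(ra, 32)) & 0xFFFFFFFF
```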

Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, or V flags.
If overflow occurs in the accumulation, SMLAWy sets the Q flag.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
7.11 Condition code suffixes on page 7-150.

13.126 SMLSD
Dual 16-bit Signed Multiply with Subtraction of products and 32-bit accumulation.
Syntax
SMLSD{X}{cond} Rd, Rn, Rm, Ra

where:
cond

is an optional condition code.
X

is an optional parameter. If X is present, the most and least significant halfwords of the second
operand are exchanged before the multiplications occur.
Rd

is the destination register.
Rn, Rm

are the registers holding the operands.
Ra

is the register holding the accumulate operand.
Operation
SMLSD multiplies the bottom halfword of Rn with the bottom halfword of Rm, and the top halfword of Rn
with the top halfword of Rm. It then subtracts the second product from the first, adds the difference to the
value in Ra, and stores the result to Rd.
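As a rough illustration of the operation, the following Python sketch models the dual multiply, the subtraction, and the accumulation. The names `to_signed` and `smlsd` are hypothetical, registers are passed as unsigned 32-bit integers, and status flags are not modeled.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smlsd(rn, rm, ra, exchange=False):
    """Model SMLSD{X}. Returns the 32-bit value written to Rd."""
    if exchange:  # X suffix: swap the halfwords of the second operand
        rm = ((rm >> 16) | (rm << 16)) & 0xFFFFFFFF
    product1 = to_signed(rn, 16) * to_signed(rm, 16)              # bottom x bottom
    product2 = to_signed(rn >> 16, 16) * to_signed(rm >> 16, 16)  # top x top
    return (product1 - product2 + to_signed(ra, 32)) & 0xFFFFFFFF
```

With Rn = 0x00030002, Rm = 0x00050004, and Ra = 10, the sketch computes (2 × 4) – (3 × 5) + 10 = 3.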

Examples
    SMLSD   r1, r2, r0, r7
    SMLSDX  r11, r10, r2, r3

Related references
7.11 Condition code suffixes on page 7-150.

13.127 SMLSLD
Dual 16-bit Signed Multiply with Subtraction of products and 64-bit accumulation.
Syntax
SMLSLD{X}{cond} RdLo, RdHi, Rn, Rm

where:
X

is an optional parameter. If X is present, the most and least significant halfwords of the second
operand are exchanged before the multiplications occur.
cond

is an optional condition code.
RdLo, RdHi

are the destination registers for the 64-bit result. They also hold the 64-bit accumulate operand.
RdHi and RdLo must be different registers.
Rn, Rm

are the registers holding the operands.
Operation
SMLSLD multiplies the bottom halfword of Rn with the bottom halfword of Rm, and the top halfword of Rn
with the top halfword of Rm. It then subtracts the second product from the first, adds the difference to the
value in RdLo, RdHi, and stores the result to RdLo, RdHi.
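The 64-bit accumulation can be sketched in Python as follows. This is an illustrative model under stated assumptions: `to_signed` and `smlsld` are hypothetical names, RdLo and RdHi are passed and returned as unsigned 32-bit integers, and the 64-bit result is assumed to wrap on overflow.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smlsld(rdlo, rdhi, rn, rm, exchange=False):
    """Model SMLSLD{X}. Returns the new (RdLo, RdHi) pair."""
    if exchange:  # X suffix: swap the halfwords of the second operand
        rm = ((rm >> 16) | (rm << 16)) & 0xFFFFFFFF
    product1 = to_signed(rn, 16) * to_signed(rm, 16)              # bottom x bottom
    product2 = to_signed(rn >> 16, 16) * to_signed(rm >> 16, 16)  # top x top
    accumulator = to_signed((rdhi << 32) | rdlo, 64)              # RdHi:RdLo as one 64-bit value
    result = (product1 - product2 + accumulator) & 0xFFFFFFFFFFFFFFFF
    return result & 0xFFFFFFFF, result >> 32
```

A carry out of the low word propagates into RdHi: `smlsld(0xFFFFFFFF, 0, 0x00000002, 0x00000004)` returns `(7, 1)`.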

Example
    SMLSLD  r3, r0, r5, r1

Related references
7.11 Condition code suffixes on page 7-150.

13.128 SMMLA
Signed Most significant word Multiply with Accumulation.
Syntax
SMMLA{R}{cond} Rd, Rn, Rm, Ra

where:
R

is an optional parameter. If R is present, the result is rounded, otherwise it is truncated.
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are the registers holding the operands.
Ra

is a register holding the value to be added or subtracted from.
Operation
SMMLA multiplies the values from Rn and Rm, adds the value in Ra to the most significant 32 bits of the
product, and stores the result in Rd.

If the optional R parameter is specified, 0x80000000 is added before extracting the most significant 32
bits. This has the effect of rounding the result.
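The effect of the R option is easiest to see with concrete values. The Python sketch below models the operation described above; `to_signed` and `smmla` are hypothetical names, and registers are passed as unsigned 32-bit integers.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smmla(rn, rm, ra, rounded=False):
    """Model SMMLA{R}. Returns the 32-bit value written to Rd."""
    # Placing Ra in the upper word is the same as adding it to the top 32 bits of the product.
    result = (to_signed(ra, 32) << 32) + to_signed(rn, 32) * to_signed(rm, 32)
    if rounded:
        result += 0x80000000      # R suffix: round instead of truncating
    return (result >> 32) & 0xFFFFFFFF
```

With Rn = 0x40000000 and Rm = 2, the product is 0x80000000, exactly half of one unit in the most significant word. `smmla(0x40000000, 2, 0)` therefore returns 0, while `smmla(0x40000000, 2, 0, rounded=True)` returns 1.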
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.129 SMMLS
Signed Most significant word Multiply with Subtraction.
Syntax
SMMLS{R}{cond} Rd, Rn, Rm, Ra

where:
R

is an optional parameter. If R is present, the result is rounded, otherwise it is truncated.
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are the registers holding the operands.
Ra

is a register holding the value to be added or subtracted from.
Operation
SMMLS multiplies the values from Rn and Rm, subtracts the product from the value in Ra shifted left by 32
bits, and stores the most significant 32 bits of the result in Rd.
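A minimal Python sketch of this operation follows. The names `to_signed` and `smmls` are hypothetical, registers are passed as unsigned 32-bit integers, and the sketch assumes that the R option rounds in the same way as for SMMLA, by adding 0x80000000 before the most significant 32 bits are extracted.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smmls(rn, rm, ra, rounded=False):
    """Model SMMLS{R}. Returns the 32-bit value written to Rd."""
    result = (to_signed(ra, 32) << 32) - to_signed(rn, 32) * to_signed(rm, 32)
    if rounded:
        result += 0x80000000      # assumed rounding constant, as for SMMLA
    return (result >> 32) & 0xFFFFFFFF
```

Because the subtraction is performed on the full 64-bit value, even a small product borrows from the upper word: `smmls(1, 1, 5)` returns 4, whereas `smmls(1, 1, 5, rounded=True)` returns 5.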

13.130 SMMUL
Signed Most significant word Multiply.
Syntax
SMMUL{R}{cond} {Rd}, Rn, Rm

where:
R

is an optional parameter. If R is present, the result is rounded, otherwise it is truncated.
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are the registers holding the operands.
Operation
SMMUL multiplies the 32-bit values from Rn and Rm, and stores the most significant 32 bits of the 64-bit
result to Rd.

Examples
    SMMULGE r6, r4, r3
    SMMULR  r2, r2, r2

Related references
7.11 Condition code suffixes on page 7-150.

13.131 SMUAD
Dual 16-bit Signed Multiply with Addition of products, and optional exchange of operand halves.
Syntax
SMUAD{X}{cond} {Rd}, Rn, Rm

where:
X

is an optional parameter. If X is present, the most and least significant halfwords of the second
operand are exchanged before the multiplications occur.
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are the registers holding the operands.
Operation
SMUAD multiplies the bottom halfword of Rn with the bottom halfword of Rm, and the top halfword of Rn
with the top halfword of Rm. It then adds the products and stores the sum to Rd.

Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Q flag
The SMUAD instruction sets the Q flag if the addition overflows.
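The following Python sketch models the dual multiply and the Q flag condition. It is an illustration only: `to_signed` and `smuad` are hypothetical names, registers are passed as unsigned 32-bit integers, and the Q flag is returned as a Boolean.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smuad(rn, rm, exchange=False):
    """Model SMUAD{X}. Returns (Rd, q); q is True if the addition overflowed."""
    if exchange:  # X suffix: swap the halfwords of the second operand
        rm = ((rm >> 16) | (rm << 16)) & 0xFFFFFFFF
    total = (to_signed(rn, 16) * to_signed(rm, 16)                 # bottom x bottom
             + to_signed(rn >> 16, 16) * to_signed(rm >> 16, 16))  # top x top
    q = not (-(1 << 31) <= total < (1 << 31))
    return total & 0xFFFFFFFF, q
```

In this model the sum overflows only when all four halfwords are 0x8000: `smuad(0x80008000, 0x80008000)` returns `(0x80000000, True)`.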
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Examples
    SMUAD   r2, r3, r2

Related references
7.11 Condition code suffixes on page 7-150.

13.132 SMULxy
Signed Multiply, with 16-bit operands and a 32-bit result.
Syntax
SMUL<x><y>{cond} {Rd}, Rn, Rm

where:
<x>

is either B or T. B means use the bottom half (bits [15:0]) of Rn, T means use the top half (bits
[31:16]) of Rn.
<y>

is either B or T. B means use the bottom half (bits [15:0]) of Rm, T means use the top half (bits
[31:16]) of Rm.
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are the registers holding the values to be multiplied.
Operation
SMULxy multiplies the 16-bit signed integers from the selected halves of Rn and Rm, and places the 32-bit
result in Rd.
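The halfword selection can be sketched in Python as shown below. The names `to_signed` and `smul` are hypothetical; the `x` and `y` arguments stand for the B or T letters in the mnemonic, and registers are passed as unsigned 32-bit integers.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smul(rn, rm, x, y):
    """Model SMUL<x><y>. Returns the 32-bit value written to Rd."""
    operand1 = to_signed(rn >> 16 if x == "T" else rn, 16)   # selected half of Rn
    operand2 = to_signed(rm >> 16 if y == "T" else rm, 16)   # selected half of Rm
    return (operand1 * operand2) & 0xFFFFFFFF
```

For instance, `smul(0xFFFF0003, 0x00020004, "T", "B")` multiplies –1 (top half of Rn) by 4 (bottom half of Rm) and returns 0xFFFFFFFC.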

Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
These instructions do not affect the N, Z, C, or V flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Examples
    SMULTBEQ    r8, r7, r9

Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
13.71 MSR (general-purpose register to PSR) on page 13-441.
7.11 Condition code suffixes on page 7-150.

13.133 SMULL
Signed Long Multiply, with 32-bit operands and 64-bit result.
Syntax
SMULL{S}{cond} RdLo, RdHi, Rn, Rm

where:
S

is an optional suffix available in A32 state only. If S is specified, the condition flags are updated
on the result of the operation.
cond

is an optional condition code.
RdLo, RdHi

are the destination registers. RdLo and RdHi must be different registers.
Rn, Rm

are ARM registers holding the operands.
Operation
The SMULL instruction interprets the values from Rn and Rm as two’s complement signed integers. It
multiplies these integers and places the least significant 32 bits of the result in RdLo, and the most
significant 32 bits of the result in RdHi.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
If S is specified, this instruction:
• Updates the N and Z flags according to the result.
• Does not affect the C or V flags.
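The split of the 64-bit product across RdLo and RdHi, and the flag values produced when S is specified, can be sketched in Python. The names `to_signed` and `smull` are hypothetical, and the N and Z values are returned as Booleans instead of being written to the APSR.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smull(rn, rm):
    """Model SMULL. Returns (RdLo, RdHi, n, z); n and z apply only to SMULLS."""
    result = (to_signed(rn, 32) * to_signed(rm, 32)) & 0xFFFFFFFFFFFFFFFF
    n = bool(result >> 63)        # N: bit 63 of the 64-bit result
    z = result == 0               # Z: the whole 64-bit result is zero
    return result & 0xFFFFFFFF, result >> 32, n, z
```

Multiplying –1 by 2 gives –2, so `smull(0xFFFFFFFF, 2)` returns `(0xFFFFFFFE, 0xFFFFFFFF, True, False)`.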
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.134 SMULWy
Signed Multiply Wide, with one 32-bit and one 16-bit operand, providing the top 32 bits of the result.
Syntax
SMULW<y>{cond} {Rd}, Rn, Rm

where:
<y>

is either B or T. B means use the bottom half (bits [15:0]) of Rm, T means use the top half (bits
[31:16]) of Rm.
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are the registers holding the values to be multiplied.
Operation
SMULWy multiplies the signed 16-bit integer from the selected half of Rm by the signed 32-bit integer
from Rn, and places the upper 32 bits of the 48-bit result in Rd.

Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, or V flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
7.11 Condition code suffixes on page 7-150.

13.135 SMUSD
Dual 16-bit Signed Multiply with Subtraction of products, and optional exchange of operand halves.
Syntax
SMUSD{X}{cond} {Rd}, Rn, Rm

where:
X

is an optional parameter. If X is present, the most and least significant halfwords of the second
operand are exchanged before the multiplications occur.
cond

is an optional condition code.
Rd

is the destination register.
Rn, Rm

are the registers holding the operands.
Operation
SMUSD multiplies the bottom halfword of Rn with the bottom halfword of Rm, and the top halfword of Rn
with the top halfword of Rm. It then subtracts the second product from the first, and stores the difference
to Rd.

Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Example
    SMUSDXNE    r0, r1, r2

Related references
7.11 Condition code suffixes on page 7-150.

13.136 SRS
Store Return State onto a stack.
Syntax
SRS{addr_mode}{cond} sp{!}, #modenum
SRS{addr_mode}{cond} #modenum{!} ; This is pre-UAL syntax

where:
addr_mode

is any one of the following:
IA

Increment address After each transfer
IB

Increment address Before each transfer (A32 only)
DA

Decrement address After each transfer (A32 only)
DB

Decrement address Before each transfer (Full Descending stack).
If addr_mode is omitted, it defaults to Increment After. You can also use the stack-oriented
addressing mode suffixes, for example, when implementing stacks.
cond

is an optional condition code.
Note
cond is permitted only in T32 code, using a preceding IT instruction, but this is deprecated in
ARMv8. This is an unconditional instruction in A32.
!

is an optional suffix. If ! is present, the final address is written back into the SP of the mode
specified by modenum.
modenum

specifies the number of the mode whose banked SP is used as the base register. You must use
only the defined mode numbers.
Operation
SRS stores the LR and the SPSR of the current mode at the address contained in the SP of the mode
specified by modenum and at the following word, respectively. It optionally updates the SP of the mode
specified by modenum. This is compatible with the normal use of the STM instruction for stack accesses.

Note
For full descending stack, you must use SRSFD or SRSDB.

Usage
You can use SRS to store return state for an exception handler on a different stack from the one
automatically selected.
Notes
Where addresses are not word-aligned, SRS ignores the least significant two bits of the specified address.
The time order of the accesses to individual words of memory generated by SRS is not architecturally
defined. Do not use this instruction on memory-mapped I/O locations where access order matters.

Do not use SRS in User and System modes because these modes do not have an SPSR.
SRS is not permitted in Non-secure state if modenum specifies Monitor mode.

Example
R13_usr EQU     16
        SRSFD   sp,#R13_usr

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
3.2 Processor modes, and privileged and unprivileged software execution on page 3-65.
Related references
13.49 LDM on page 13-407.
7.11 Condition code suffixes on page 7-150.

13.137 SSAT
Signed Saturate to any bit position, with optional shift before saturating.
Syntax
SSAT{cond} Rd, #sat, Rm{, shift}

where:
cond

is an optional condition code.
Rd

is the destination register.
sat

specifies the bit position to saturate to, in the range 1 to 32.
Rm

is the register containing the operand.
shift

is an optional shift. It must be one of the following:
ASR #n

where n is in the range 1-32 (A32) or 1-31 (T32).
LSL #n

where n is in the range 0-31.
Operation
The SSAT instruction applies the specified shift, then saturates a signed value to the signed range
$-2^{sat-1} \le x \le 2^{sat-1} - 1$.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Q flag
If saturation occurs, this instruction sets the Q flag. To read the state of the Q flag, use an MRS instruction.
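The shift-then-saturate behavior can be illustrated with a short Python model. This is a sketch under stated assumptions: `to_signed` and `ssat` are hypothetical names, the shift is given as the string "LSL" or "ASR" (or None for no shift), and the Q flag is returned as a Boolean instead of being written to the APSR.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def ssat(rm, sat, shift=None, amount=0):
    """Model SSAT. Returns (Rd, q); q is True if saturation occurred."""
    value = to_signed(rm, 32)
    if shift == "LSL":
        value = to_signed(value << amount, 32)   # bits shifted out of the register are lost
    elif shift == "ASR":
        value >>= amount                         # arithmetic shift keeps the sign
    low, high = -(1 << (sat - 1)), (1 << (sat - 1)) - 1
    clamped = max(low, min(high, value))
    return clamped & 0xFFFFFFFF, clamped != value
```

Applied to the operands of `SSAT r7, #16, r7, LSL #4` with R7 holding 0x00001000, the shifted value 0x10000 exceeds the 16-bit signed maximum, so `ssat(0x00001000, 16, "LSL", 4)` returns `(0x7FFF, True)`.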
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Example
    SSAT    r7, #16, r7, LSL #4

Related references
13.138 SSAT16 on page 13-521.
13.68 MRS (PSR to general-purpose register) on page 13-437.
7.11 Condition code suffixes on page 7-150.

13.138 SSAT16
Parallel halfword Saturate.
Syntax
SSAT16{cond} Rd, #sat, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
sat

specifies the bit position to saturate to, in the range 1 to 16.
Rn

is the register holding the operand.
Operation
Halfword-wise signed saturation to any bit position.
The SSAT16 instruction saturates each signed halfword to the signed range $-2^{sat-1} \le x \le 2^{sat-1} - 1$.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Q flag
If saturation occurs on either halfword, this instruction sets the Q flag. To read the state of the Q flag, use
an MRS instruction.
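A Python sketch of the halfword-wise saturation follows. The names `to_signed` and `ssat16` are hypothetical, the register is passed as an unsigned 32-bit integer, and the Q flag is returned as a Boolean.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def ssat16(rn, sat):
    """Model SSAT16. Returns (Rd, q); q is True if either halfword saturated."""
    low, high = -(1 << (sat - 1)), (1 << (sat - 1)) - 1
    result, q = 0, False
    for shift in (0, 16):                        # bottom halfword, then top halfword
        half = to_signed(rn >> shift, 16)
        clamped = max(low, min(high, half))
        q = q or clamped != half
        result |= (clamped & 0xFFFF) << shift
    return result, q
```

With a saturation position of 12, each halfword is limited to the range –2048 to 2047, so `ssat16(0x7FFF0010, 12)` returns `(0x07FF0010, True)`: only the top halfword saturates.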
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Correct example
    SSAT16  r7, #12, r7

Incorrect example
    SSAT16  r1, #16, r2, LSL #4 ; shifts not permitted with halfword
                                ; saturations

Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
7.11 Condition code suffixes on page 7-150.

13.139 SSAX
Signed parallel subtract and add halfwords with exchange.
Syntax
SSAX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

for bits[15:0] of the result.
GE[3:2]

13.140 SSUB8
Signed parallel byte-wise subtraction.
Syntax
SSUB8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction subtracts each byte of the second operand from the corresponding byte of the first
operand and writes the results into the corresponding bytes of the destination. The results are modulo $2^{8}$.
It sets the APSR GE flags.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
GE flags
This instruction does not affect the N, Z, C, V, or Q flags.
It sets the GE flags in the APSR as follows:
GE[0]

for bits[7:0] of the result.
GE[1]

for bits[15:8] of the result.
GE[2]

for bits[23:16] of the result.
GE[3]

for bits[31:24] of the result.
It sets a GE flag to 1 to indicate that the corresponding result is greater than or equal to zero. This is
equivalent to a SUBS instruction setting the N and V condition flags to the same value, so that the GE
condition passes.
You can use these flags to control a following SEL instruction.
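The relationship between the wrapped byte results and the GE flags can be sketched in Python. The names `to_signed` and `ssub8` are hypothetical; registers are passed as unsigned 32-bit integers, and GE[3:0] is returned as a 4-bit integer.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def ssub8(rn, rm):
    """Model SSUB8. Returns (Rd, ge), where bit i of ge is GE[i]."""
    result, ge = 0, 0
    for lane in range(4):
        shift = 8 * lane
        diff = to_signed(rn >> shift, 8) - to_signed(rm >> shift, 8)
        result |= (diff & 0xFF) << shift         # stored result wraps modulo 2^8
        if diff >= 0:                            # GE reflects the unwrapped difference
            ge |= 1 << lane
    return result, ge
```

The flags are derived from the true difference, not from the wrapped byte. In `ssub8(0x05807F00, 0x0301FF01)`, which returns `(0x027F80FF, 0b1010)`, byte 1 computes 127 – (–1) = 128: the stored byte is 0x80, yet GE[1] is set because the difference is not negative.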
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
13.107 SEL on page 13-487.
7.11 Condition code suffixes on page 7-150.

13.141 SSUB16
Signed parallel halfword-wise subtraction.
Syntax
SSUB16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

for bits[15:0] of the result.
GE[3:2]

for bits[31:16] of the result.
It sets a pair of GE flags to 1 to indicate that the corresponding result is greater than or equal to zero.
This is equivalent to a SUBS instruction setting the N and V condition flags to the same value, so that the
GE condition passes.
You can use these flags to control a following SEL instruction.
Note
GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

13.142 STC and STC2
Transfer Data between memory and Coprocessor.
Note
STC2 is not supported in ARMv8.

where:
op

is one of STC or STC2.
cond

is an optional condition code.
In A32 code, cond is not permitted for STC2.
L

is an optional suffix specifying a long transfer.
coproc

is the name of the coprocessor the instruction is for. The standard name is pn, where n is an
integer whose value must be:
• In the range 0-15 in ARMv7 and earlier.
• 14 in ARMv8.
CRd

is the coprocessor register to store.
Rn

is the register on which the memory address is based. If PC is specified, the value used is the
address of the current instruction plus eight.
-

is an optional minus sign. If - is present, the offset is subtracted from Rn. Otherwise, the offset is
added to Rn.
offset

is an expression evaluating to a multiple of 4, in the range 0 to 1020.
!

is an optional suffix. If ! is present, the address including the offset is written back into Rn.
option

is a coprocessor option in the range 0-255, enclosed in braces.
Usage
The use of these instructions depends on the coprocessor. See the coprocessor documentation for details.
Architectures
These 32-bit instructions are available in A32 and T32.
There are no 16-bit versions of these instructions in T32.

Register restrictions
You cannot use PC for Rn in the pre-index and post-index instructions. These are the forms that write
back to Rn.
You cannot use PC for Rn in T32 STC and STC2 instructions.
A32 STC and STC2 instructions where Rn is PC are deprecated.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

13.143 STL
Store-Release Register.
Note
This instruction is supported only in ARMv8.

Syntax
STL{cond} Rt, [Rn]
STLB{cond} Rt, [Rn]
STLH{cond} Rt, [Rn]

where:
cond

is an optional condition code.
Rt

is the register to store.
Rn

is the register on which the memory address is based.
Operation
STL stores data to memory. If any loads or stores appear before a store-release in program order, then all
observers are guaranteed to observe the loads and stores before observing the store-release. Loads and
stores appearing after a store-release are unaffected.
If a store-release follows a load-acquire, each observer is guaranteed to observe them in program order.
There is no requirement that a store-release be paired with a load-acquire.
All store-release operations are multi-copy atomic, meaning that in a multiprocessing system, if one
observer observes a write to memory because of a store-release operation, then all observers observe it.
Also, all observers observe all such writes to the same location in the same order.
Restrictions
The address specified must be naturally aligned, or an alignment fault is generated.
The PC must not be used for Rt or Rn.
Availability
This 32-bit instruction is available in A32 and T32.
There is no 16-bit version of this instruction.
Related references
13.47 LDAEX on page 13-403.
13.46 LDA on page 13-402.
13.144 STLEX on page 13-528.
7.11 Condition code suffixes on page 7-150.

13.144 STLEX
Store-Release Register Exclusive.
Note
This instruction is supported only in ARMv8.

Syntax
STLEX{cond} Rd, Rt, [Rn]
STLEXB{cond} Rd, Rt, [Rn]
STLEXH{cond} Rd, Rt, [Rn]
STLEXD{cond} Rd, Rt, Rt2, [Rn]

where:
cond

is an optional condition code.
Rd

is the destination register for the returned status.
Rt

is the register to load or store.
Rt2

is the second register for doubleword loads or stores.
Rn

is the register on which the memory address is based.
Operation
STLEX performs a conditional store to memory. The conditions are as follows:
• If the physical address does not have the Shared TLB attribute, and the executing processor has an
  outstanding tagged physical address, the store takes place, the tag is cleared, and the value 0 is
  returned in Rd.
• If the physical address does not have the Shared TLB attribute, and the executing processor does not
  have an outstanding tagged physical address, the store does not take place, and the value 1 is returned
  in Rd.
• If the physical address has the Shared TLB attribute, and the physical address is tagged as exclusive
  access for the executing processor, the store takes place, the tag is cleared, and the value 0 is returned
  in Rd.
• If the physical address has the Shared TLB attribute, and the physical address is not tagged as
  exclusive access for the executing processor, the store does not take place, and the value 1 is returned
  in Rd.

If any loads or stores appear before STLEX in program order, then all observers are guaranteed to observe
the loads and stores before observing the store-release. Loads and stores appearing after STLEX are
unaffected.
All store-release operations are multi-copy atomic.
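The four store conditions listed above can be summarized as a small state model. The Python sketch below is a toy illustration, not a description of any real monitor implementation: the class `ExclusiveMonitorModel` and its `tag` method (a stand-in for the tagging performed by a preceding LDAEX) are hypothetical, memory is a plain dictionary, and the ordering guarantees of store-release are not modeled.

```python
class ExclusiveMonitorModel:
    """Toy model of the exclusive-access tags seen by one processor."""

    def __init__(self):
        self.local_tag = False       # outstanding tag for memory without the Shared attribute
        self.shared_tags = set()     # Shared addresses tagged for this processor
        self.memory = {}

    def tag(self, address, shared):
        """Hypothetical stand-in for the tagging done by a preceding LDAEX."""
        if shared:
            self.shared_tags.add(address)
        else:
            self.local_tag = True

    def stlex(self, address, value, shared):
        """Return the status written to Rd: 0 if the store took place, 1 if not."""
        if shared:
            tagged = address in self.shared_tags
            self.shared_tags.discard(address)    # a successful store clears the tag
        else:
            tagged = self.local_tag
            self.local_tag = False
        if tagged:
            self.memory[address] = value
            return 0
        return 1
```

In this model, a second `stlex` to the same address without an intervening `tag` call returns 1 and leaves memory unchanged, which is the behavior a retry loop built from LDAEX and STLEX relies on.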
Restrictions
The PC must not be used for any of Rd, Rt, Rt2, or Rn.
For STLEX, Rd must not be the same register as Rt, Rt2, or Rn.

For A32 instructions:
• SP can be used but use of SP for any of Rd, Rt, or Rt2 is deprecated.
• For STLEXD, Rt must be an even numbered register, and not LR.
• Rt2 must be R(t+1).
For T32 instructions, SP can be used for Rn, but must not be used for any of Rd, Rt, or Rt2.
Usage
Use LDAEX and STLEX to implement interprocess communication in multiple-processor and shared-memory systems.
For reasons of performance, keep the number of instructions between corresponding LDAEX and STLEX
instructions to a minimum.
Note
The address used in an STLEX instruction must be the same as the address in the most recently executed
LDAEX instruction.

Availability
These 32-bit instructions are available in A32 and T32.
There are no 16-bit versions of these instructions.
Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
13.47 LDAEX on page 13-403.
13.143 STL on page 13-527.
13.46 LDA on page 13-402.
7.11 Condition code suffixes on page 7-150.

13.145 STM
Store Multiple registers.
Syntax
STM{addr_mode}{cond} Rn{!}, reglist{^}

where:
addr_mode

is any one of the following:
IA

Increment address After each transfer. This is the default, and can be omitted.
IB

Increment address Before each transfer (A32 only).
DA

Decrement address After each transfer (A32 only).
DB

Decrement address Before each transfer.
You can also use the stack-oriented addressing mode suffixes, for example when implementing
stacks.
cond

is an optional condition code.
Rn

is the base register, the ARM register holding the initial address for the transfer. Rn must not be
PC.
!

is an optional suffix. If ! is present, the final address is written back into Rn.
reglist

is a list of one or more registers to be stored, enclosed in braces. It can contain register ranges. It
must be comma-separated if it contains more than one register or register range. Any
combination of registers R0 to R15 (PC) can be transferred in A32 state, but there are some
restrictions in T32 state.
^

is an optional suffix, available in A32 state only. You must not use it in User mode or System
mode. Data is transferred into or out of the User mode registers instead of the current mode
registers.
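The four addressing modes differ only in where the block of words is placed relative to the base address and in the value written back to Rn. The Python sketch below illustrates this; `stm_addresses` is a hypothetical helper, and it assumes the usual A32/T32 behavior that registers are stored in ascending register order at consecutive word addresses. Address wraparound at 32 bits is not modeled.

```python
def stm_addresses(base, count, addr_mode="IA"):
    """Return (addresses, final) for an STM of `count` registers from base address `base`.

    addresses lists the word address used for each register, lowest-numbered first.
    final is the value written back to Rn when the ! suffix is present.
    """
    size = 4 * count
    starts = {"IA": base, "IB": base + 4, "DA": base - size + 4, "DB": base - size}
    finals = {"IA": base + size, "IB": base + size, "DA": base - size, "DB": base - size}
    start = starts[addr_mode]                    # KeyError for an unknown mode
    return [start + 4 * i for i in range(count)], finals[addr_mode]
```

For a six-register store such as `STMDB r1!,{r3-r6,r11,r12}` with R1 holding 0x1000, `stm_addresses(0x1000, 6, "DB")` places the registers at 0xFE8 to 0xFFC and gives 0xFE8 as the value written back.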
Restrictions on reglist in 32-bit T32 instructions
In 32-bit T32 instructions:
• The SP cannot be in the list.
• The PC cannot be in the list.
• There must be two or more registers in the list.
If you write an STM instruction with only one register in reglist, the assembler automatically substitutes
the equivalent STR instruction. Be aware of this when comparing disassembly listings with source code.
You can use the --diag_warning 1645 assembler command-line option to check when an instruction
substitution occurs.
Restrictions on reglist in A32 instructions
A32 store instructions can have SP and PC in the reglist, but instructions that include SP or PC in
the reglist are deprecated.

16-bit instruction
A 16-bit version of this instruction is available in T32 code.
The following restrictions apply to the 16-bit instruction:
• All registers in reglist must be Lo registers.
• Rn must be a Lo register.
• addr_mode must be omitted (or IA), meaning increment address after each transfer.
• Writeback must be specified for STM instructions.
Note
16-bit T32 STM instructions with writeback that specify Rn as the lowest register in the reglist are
deprecated.
In addition, the PUSH and POP instructions are subsets of the STM and LDM instructions and can therefore
be expressed using the STM and LDM instructions. Some forms of PUSH and POP are also 16-bit
instructions.
Storing the base register, with writeback
In A32 or 16-bit T32 instructions, if Rn is in reglist, and writeback is specified with the ! suffix:
• If the instruction is STM{addr_mode}{cond} and Rn is the lowest-numbered register in reglist, the
initial value of Rn is stored. These instructions are deprecated.
• Otherwise, the stored value of Rn cannot be relied on, so these instructions are not permitted.
32-bit T32 instructions are not permitted if Rn is in reglist, and writeback is specified with the ! suffix.
Correct example
    STMDB   r1!,{r3-r6,r11,r12}

Incorrect example
    STM     r5!,{r5,r4,r9} ; value stored for R5 unknown

13.146 STR (immediate offset)
Store with immediate offset, pre-indexed immediate offset, or post-indexed immediate offset.
Syntax
STR{type}{cond} Rt, [Rn {, #offset}] ; immediate offset
STR{type}{cond} Rt, [Rn, #offset]! ; pre-indexed
STR{type}{cond} Rt, [Rn], #offset ; post-indexed
STRD{cond} Rt, Rt2, [Rn {, #offset}] ; immediate offset, doubleword
STRD{cond} Rt, Rt2, [Rn, #offset]! ; pre-indexed, doubleword
STRD{cond} Rt, Rt2, [Rn], #offset ; post-indexed, doubleword

where:
type

can be any one of:
B

Byte
H

Halfword
-

omitted, for Word.
cond

is an optional condition code.
Rt

is the register to store.
Rn

is the register on which the memory address is based.
offset

is an offset. If offset is omitted, the address is the contents of Rn.
Rt2

is the additional register to store for doubleword operations.
Not all options are available in every instruction set and architecture.
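The three syntax forms differ in which address is used for the store and in whether Rn is updated. The following Python sketch illustrates the difference; `str_effective_address` and the form names "offset", "pre", and "post" are hypothetical labels chosen for this example, and the permitted offset ranges are not checked.

```python
MASK32 = 0xFFFFFFFF


def str_effective_address(rn, offset, form):
    """Return (store_address, rn_after) for one STR immediate form."""
    if form == "offset":     # [Rn, #offset]: Rn is left unchanged
        return (rn + offset) & MASK32, rn
    if form == "pre":        # [Rn, #offset]!: Rn is updated, then used for the store
        address = (rn + offset) & MASK32
        return address, address
    if form == "post":       # [Rn], #offset: store at Rn, then update Rn
        return rn, (rn + offset) & MASK32
    raise ValueError("form must be 'offset', 'pre', or 'post'")
```

With Rn holding 0x2000 and an offset of 8, the immediate offset form stores to 0x2008 and leaves Rn unchanged, the pre-indexed form stores to 0x2008 and sets Rn to 0x2008, and the post-indexed form stores to 0x2000 and then sets Rn to 0x2008.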
Offset ranges and architectures
The following table shows the ranges of offsets and availability of this instruction:
Table 13-15 Offsets and architectures, STR, word, halfword, and byte

Instruction                                    Immediate offset   Pre-indexed        Post-indexed
A32, word or byte                              –4095 to 4095      –4095 to 4095      –4095 to 4095
A32, halfword                                  –255 to 255        –255 to 255        –255 to 255
A32, doubleword                                –255 to 255        –255 to 255        –255 to 255
T32 32-bit encoding, word, halfword, or byte   –255 to 4095       –255 to 255        –255 to 255
T32 32-bit encoding, doubleword                –1020 to 1020 z    –1020 to 1020 z    –1020 to 1020 z
T32 16-bit encoding, word aa                   0 to 124 z         Not available      Not available
T32 16-bit encoding, halfword aa               0 to 62 ac         Not available      Not available
T32 16-bit encoding, byte aa                   0 to 31            Not available      Not available
T32 16-bit encoding, word, Rn is SP ab         0 to 1020 z        Not available      Not available

z   Must be divisible by 4.
aa  Rt and Rn must be in the range R0-R7.

Doubleword register restrictions
Rn must be different from Rt2 in the pre-index and post-index forms.

For T32 instructions, you must not specify SP or PC for either Rt or Rt2.
For A32 instructions:
• Rt must be an even-numbered register.
• Rt must not be LR.
• ARM strongly recommends that you do not use R12 for Rt.
• Rt2 must be R(t + 1).
Use of PC
In A32 instructions you can use PC for Rt in STR word instructions and PC for Rn in STR instructions
with immediate offset syntax (that is, the forms that do not write back to Rn). However, this is
deprecated.
Other uses of PC are not permitted in these A32 instructions.
In T32 code, using PC in STR instructions is not permitted.
Use of SP
You can use SP for Rn.
In A32 code, you can use SP for Rt in word instructions. You can use SP for Rt in non-word instructions
in A32 code but this is deprecated.
In T32 code, you can use SP for Rt in word instructions only. All other use of SP for Rt in this
instruction is not permitted in T32 code.
Example
    STR     r2,[r9,#consta-struc]   ; consta-struc is an expression
                                    ; evaluating to a constant in
                                    ; the range 0-4095.

Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

ab  Rt must be in the range R0-R7.
ac  Must be divisible by 2.

13.147 STR (register offset)
Store with register offset, pre-indexed register offset, or post-indexed register offset.
Syntax
STR{type}{cond} Rt, [Rn, ±Rm {, shift}] ; register offset
STR{type}{cond} Rt, [Rn, ±Rm {, shift}]! ; pre-indexed ; A32 only
STR{type}{cond} Rt, [Rn], ±Rm {, shift} ; post-indexed ; A32 only
STRD{cond} Rt, Rt2, [Rn, ±Rm] ; register offset, doubleword ; A32 only
STRD{cond} Rt, Rt2, [Rn, ±Rm]! ; pre-indexed, doubleword ; A32 only
STRD{cond} Rt, Rt2, [Rn], ±Rm ; post-indexed, doubleword ; A32 only

where:
type

can be any one of:
B

Byte
H

Halfword
-

omitted, for Word.
cond

is an optional condition code.
Rt

is the register to store.
Rn

is the register on which the memory address is based.
Rm

is a register containing a value to be used as the offset. –Rm is not permitted in T32 code.
shift

is an optional shift.
Rt2

is the additional register to store for doubleword operations.
Not all options are available in every instruction set and architecture.
Offset register and shift options
The following table shows the ranges of offsets and availability of this instruction:
Table 13-16 Options and architectures, STR (register offsets)

Instruction                                     +/–Rm ad   shift
A32, word or byte                               +/–Rm      LSL #0-31, LSR #1-32, ASR #1-32, ROR #1-31, RRX
A32, halfword                                   +/–Rm      Not available
A32, doubleword                                 +/–Rm      Not available
T32 32-bit encoding, word, halfword, or byte    +Rm        LSL #0-3
T32 16-bit encoding, all except doubleword ae   +Rm        Not available

ad  Where +/–Rm is shown, you can use –Rm, +Rm, or Rm. Where +Rm is shown, you cannot use –Rm.
ae  Rt, Rn, and Rm must all be in the range R0-R7.
Register restrictions
In the pre-index and post-index forms, Rn must be different from Rt.
Doubleword register restrictions
For A32 instructions:
• Rt must be an even-numbered register.
• Rt must not be LR.
• ARM strongly recommends that you do not use R12 for Rt.
• Rt2 must be R(t + 1).
• Rn must be different from Rt2 in the pre-index and post-index forms.
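The restrictions above can be checked mechanically. The following Python sketch is one way to express them. The function `check_strd_registers` is hypothetical and only encodes the rules listed in this section, including the earlier rule that Rn must differ from Rt in the pre-index and post-index forms. It does not model the recommendation about R12.

```python
def check_strd_registers(t, t2, n, writeback):
    """Return a list of violated A32 STRD (register offset) restrictions.

    t, t2, n:  register numbers (0-15) used for Rt, Rt2, and Rn.
    writeback: True for the pre-indexed and post-indexed forms.
    """
    problems = []
    if t % 2 != 0:
        problems.append("Rt must be an even-numbered register")
    if t == 14:
        problems.append("Rt must not be LR")
    if t2 != t + 1:
        problems.append("Rt2 must be R(t + 1)")
    if writeback and n in (t, t2):
        problems.append("Rn must differ from Rt and Rt2 when writing back")
    return problems
```

An empty list means that none of the listed restrictions is violated, for example for `STRD r2, r3, [r0, r5]!`.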
Use of PC
In A32 instructions you can use PC for Rt in STR word instructions, and you can use PC for Rn in STR
instructions with register offset syntax (that is, the forms that do not writeback to the Rn). However, this
is deprecated.
Other uses of PC are not permitted in A32 instructions.
Use of PC in STR T32 instructions is not permitted.
Use of SP
You can use SP for Rn.
In A32 code, you can use SP for Rt in word instructions. You can use SP for Rt in non-word A32
instructions but this is deprecated.
You can use SP for Rm in A32 instructions but this is deprecated.
In T32 code, you can use SP for Rt in word instructions only. All other use of SP for Rt in this
instruction is not permitted in T32 code.
Use of SP for Rm is not permitted in T32 state.
Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

13.148 STR, unprivileged
Unprivileged Store, byte, halfword, or word.
Syntax
STR{type}T{cond} Rt, [Rn {, #offset}] ; immediate offset (T32, 32-bit encoding only)
STR{type}T{cond} Rt, [Rn] {, #offset} ; post-indexed (A32 only)
STR{type}T{cond} Rt, [Rn], ±Rm {, shift} ; post-indexed (register) (A32 only)

where:
type

can be any one of:
B

Byte
H

Halfword
-

omitted, for Word.
cond

is an optional condition code.
Rt

is the register to store.
Rn

is the register on which the memory address is based.
offset

is an offset. If offset is omitted, the address is the value in Rn.
Rm

is a register containing a value to be used as the offset. Rm must not be PC.
shift

is an optional shift.
Operation
When these instructions are executed by privileged software, they access memory with the same
restrictions as they would have if they were executed by unprivileged software.
When executed by unprivileged software, these instructions behave in exactly the same way as the
corresponding store instruction, for example STRBT behaves in the same way as STRB.
Offset ranges and architectures
The following table shows the ranges of offsets and availability of this instruction:
Table 13-17 Offsets and architectures, STR (User mode)
Instruction                                    Immediate offset   Post-indexed    +/–Rm af        shift
A32, word or byte                              Not available      –4095 to 4095   +/–Rm           LSL #0-31, LSR #1-32, ASR #1-32, ROR #1-31, RRX
A32, halfword                                  Not available      –255 to 255     +/–Rm           Not available
T32 32-bit encoding, word, halfword, or byte   0 to 255           Not available   Not available   Not available

af You can use –Rm, +Rm, or Rm.

Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

13.149 STREX
Store Register Exclusive.
Syntax
STREX{cond} Rd, Rt, [Rn {, #offset}]
STREXB{cond} Rd, Rt, [Rn]
STREXH{cond} Rd, Rt, [Rn]
STREXD{cond} Rd, Rt, Rt2, [Rn]

where:
cond

is an optional condition code.
Rd

is the destination register for the returned status.
Rt

is the register to store.
Rt2

is the second register for doubleword stores.
Rn

is the register on which the memory address is based.
offset

is an optional offset applied to the value in Rn. offset is permitted only in T32 instructions. If
offset is omitted, an offset of 0 is assumed.
Operation
STREX performs a conditional store to memory. The conditions are as follows:
• If the physical address does not have the Shared TLB attribute, and the executing processor has an
outstanding tagged physical address, the store takes place, the tag is cleared, and the value 0 is
returned in Rd.
• If the physical address does not have the Shared TLB attribute, and the executing processor does not
have an outstanding tagged physical address, the store does not take place, and the value 1 is returned
in Rd.
• If the physical address has the Shared TLB attribute, and the physical address is tagged as exclusive
access for the executing processor, the store takes place, the tag is cleared, and the value 0 is returned
in Rd.
• If the physical address has the Shared TLB attribute, and the physical address is not tagged as
exclusive access for the executing processor, the store does not take place, and the value 1 is returned
in Rd.
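In every case listed above, the store takes place and 0 is returned only if the executing processor holds a matching exclusive tag. Otherwise nothing is stored and 1 is returned. The following Python sketch models that behavior for a single processor. The class `ExclusiveMonitor` is a deliberately simplified assumption: it tracks one tagged address and ignores the Shared attribute, and it is not a description of any real monitor implementation.

```python
class ExclusiveMonitor:
    """Toy model of one processor's exclusive-access tag."""

    def __init__(self):
        self.tagged_address = None

    def ldrex(self, mem, address):
        # LDREX tags the address for exclusive access and returns its value.
        self.tagged_address = address
        return mem.get(address, 0)

    def strex(self, mem, address, value):
        # Returns the status that STREX would write to Rd.
        if self.tagged_address == address:
            mem[address] = value
            self.tagged_address = None  # the tag is cleared on success
            return 0
        # This section does not say what happens to the tag on failure,
        # so the sketch leaves it unchanged.
        return 1
```

A STREX that is not preceded by a matching LDREX returns 1 and leaves memory unchanged, which is what a lock-acquisition loop relies on.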
Restrictions
PC must not be used for any of Rd, Rt, Rt2, or Rn.
For STREX, Rd must not be the same register as Rt, Rt2, or Rn.
For A32 instructions:
• SP can be used but use of SP for any of Rd, Rt, or Rt2 is deprecated.
• For STREXD, Rt must be an even numbered register, and not LR.
• Rt2 must be R(t+1).
• offset is not permitted.

For T32 instructions:
• SP can be used for Rn, but must not be used for any of Rd, Rt, or Rt2.
• The value of offset can be any multiple of four in the range 0-1020.

Usage
Use LDREX and STREX to implement interprocess communication in multiple-processor and shared-memory systems.
For reasons of performance, keep the number of instructions between corresponding LDREX and STREX
instructions to a minimum.
Note
The address used in a STREX instruction must be the same as the address in the most recently executed
LDREX instruction.

Availability
All these 32-bit instructions are available in A32 and T32.
There are no 16-bit versions of these instructions.
Examples
    MOV r1, #0x1                    ; load the ‘lock taken’ value
try LDREX r0, [LockAddr]            ; load the lock value
    CMP r0, #0                      ; is the lock free?
    STREXEQ r0, r1, [LockAddr]      ; try and claim the lock
    CMPEQ r0, #0                    ; did this succeed?
    BNE try                         ; no – try again
    ....                            ; yes – we have the lock

Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.
Related references
7.11 Condition code suffixes on page 7-150.

13.150 SUB
Subtract without carry.
Syntax
SUB{S}{cond} {Rd}, Rn, Operand2
SUB{cond} {Rd}, Rn, #imm12 ; T32, 32-bit encoding only

where:
S

is an optional suffix. If S is specified, the condition flags are updated on the result of the
operation.
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Operand2

is a flexible second operand.
imm12

is any value in the range 0-4095.
Operation
The SUB instruction subtracts the value of Operand2 or imm12 from the value in Rn.
In certain circumstances, the assembler can substitute one instruction for another. Be aware of this when
reading disassembly listings.
Use of PC and SP in T32 instructions
In general, you cannot use PC (R15) for Rd, or any operand. The exception is that you can use PC for Rn in
32-bit T32 SUB instructions, with a constant Operand2 value in the range 0-4095, and no S suffix. These
instructions are useful for generating PC-relative addresses. Bit[1] of the PC value reads as 0 in this case,
so that the base address for the calculation is always word-aligned.
Generally, you cannot use SP (R13) for Rd, or any operand, except that you can use SP for Rn.
Use of PC and SP in A32 instructions
You cannot use PC for Rd or any operand in a SUB instruction that has a register-controlled shift.
In SUB instructions without register-controlled shift, use of PC is deprecated except for the following
cases:
• Use of PC for Rd.
• Use of PC for Rn in the instruction SUB{cond} Rd, Rn, #Constant.

If you use PC (R15) as Rn or Rm, the value used is the address of the instruction plus 8.
If you use PC as Rd:
• Execution branches to the address corresponding to the result.
• If you use the S suffix, see the SUBS pc,lr instruction.

You can use SP for Rn in SUB instructions. However, SUBS PC, SP, #Constant is deprecated.
You can use SP in SUB (register) if Rn is SP and shift is omitted or LSL #1, LSL #2, or LSL #3.

Other uses of SP in A32 SUB instructions are deprecated.
Note
Use of SP and PC is deprecated in A32 instructions.

Condition flags
If S is specified, the SUB instruction updates the N, Z, C and V flags according to the result.
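The following Python sketch shows how the four flags can be derived from a 32-bit subtraction. It assumes the usual ARM convention that C is set when the subtraction does not produce a borrow, which this section does not spell out. The function name `subs` is hypothetical.

```python
def subs(rn, op2):
    """Return (result, N, Z, C, V) for a 32-bit SUBS Rd, Rn, Operand2."""
    rn &= 0xFFFFFFFF
    op2 &= 0xFFFFFFFF
    result = (rn - op2) & 0xFFFFFFFF
    n = result >> 31                 # N: bit[31] of the result
    z = int(result == 0)             # Z: result is zero
    c = int(rn >= op2)               # C: set when no borrow occurs (assumed convention)
    # V: signed overflow, when the operands have different signs and the
    # sign of the result differs from the sign of rn.
    v = ((rn ^ op2) & (rn ^ result)) >> 31
    return result, n, z, c, v
```

For example, `subs(5, 5)` returns a zero result with Z and C set, and `subs(0, 1)` returns 0xFFFFFFFF with N set and C clear.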
16-bit instructions
The following forms of this instruction are available in T32 code, and are 16-bit instructions:
SUBS Rd, Rn, Rm
Rd, Rn and Rm must all be Lo registers. This form can only be used outside an IT block.
SUB{cond} Rd, Rn, Rm
Rd, Rn and Rm must all be Lo registers. This form can only be used inside an IT block.
SUBS Rd, Rn, #imm
imm range 0-7. Rd and Rn must both be Lo registers. This form can only be used outside an IT block.
SUB{cond} Rd, Rn, #imm
imm range 0-7. Rd and Rn must both be Lo registers. This form can only be used inside an IT block.
SUBS Rd, Rd, #imm
imm range 0-255. Rd must be a Lo register. This form can only be used outside an IT block.
SUB{cond} Rd, Rd, #imm
imm range 0-255. Rd must be a Lo register. This form can only be used inside an IT block.
SUB{cond} SP, SP, #imm
imm range 0-508, word aligned.

Example
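    SUBS    r8, r6, #240        ; sets the flags based on the result

The following instructions subtract the 96-bit integer in R9, R10, and R11 from the 96-bit integer in
R6, R7, and R8, and place the result in R3, R4, and R5:
    SUBS    r3, r6, r9
    SBCS    r4, r7, r10
    SBC     r5, r8, r11

For clarity, the above examples use consecutive registers for multiword values. There is no requirement
to do this. The following, for example, is perfectly valid:
    SUBS    r6, r6, r9
    SBCS    r9, r2, r1
    SBC     r2, r8, r11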
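A SUBS, SBCS, SBC sequence passes the borrow from one word to the next through the C flag. The following Python sketch models a three-word subtraction in this way. It assumes the ARM convention that SBC computes Rn - Operand2 - NOT(C), and the helper names `sub_with_carry` and `sub96` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sub_with_carry(rn, op2, carry_in):
    """Compute rn - op2 - NOT(carry_in) and return (result, carry_out)."""
    total = (rn & MASK32) + (~op2 & MASK32) + carry_in
    return total & MASK32, total >> 32

def sub96(a_words, b_words):
    """Subtract two 96-bit values given as [low, middle, high] 32-bit words."""
    low, carry = sub_with_carry(a_words[0], b_words[0], 1)       # SUBS
    mid, carry = sub_with_carry(a_words[1], b_words[1], carry)   # SBCS
    high, _ = sub_with_carry(a_words[2], b_words[2], carry)      # SBC
    return [low, mid, high]
```

Subtracting 1 from 2^64 with this model borrows through both lower words, so `sub96([0, 0, 1], [1, 0, 0])` yields `[0xFFFFFFFF, 0xFFFFFFFF, 0]`.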

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
13.151 SUBS pc, lr on page 13-542.
7.11 Condition code suffixes on page 7-150.

13.151 SUBS pc, lr
Exception return, without popping anything from the stack.
Syntax
SUBS{cond} pc, lr, #imm ; A32 and T32 code
MOVS{cond} pc, lr ; A32 and T32 code
op1S{cond} pc, Rn, #imm ; A32 code only and is deprecated
op1S{cond} pc, Rn, Rm {, shift} ; A32 code only and is deprecated
op2S{cond} pc, #imm ; A32 code only and is deprecated
op2S{cond} pc, Rm {, shift} ; A32 code only and is deprecated

where:
op1

is one of ADC, ADD, AND, BIC, EOR, ORN, ORR, RSB, RSC, SBC, and SUB.
op2

is one of MOV and MVN.
cond

is an optional condition code.
imm

is an immediate value. In T32 code, it is limited to the range 0-255. In A32 code, it is a flexible
second operand.
Rn

is the first operand register. ARM deprecates the use of any register except LR.
Rm

is the optionally shifted second or only operand register.
shift

is an optional shift.
Usage
SUBS pc, lr, #imm subtracts a value from the link register and loads the PC with the result, then copies
the SPSR to the CPSR.

You can use SUBS pc, lr, #imm to return from an exception if there is no return state on the stack. The
value of #imm depends on the exception to return from.
Notes
SUBS pc, lr, #imm writes an address to the PC. The alignment of this address must be correct for the
instruction set in use after the exception return:
• For a return to A32, the address written to the PC must be word-aligned.
• For a return to T32, the address written to the PC must be halfword-aligned.
• For a return to Jazelle, there are no alignment restrictions on the address written to the PC.

No special precautions are required in software to follow these rules, if you use the instruction to return
after a valid exception entry mechanism.
In T32, only SUBS{cond} pc, lr, #imm is a valid instruction. MOVS pc, lr is a synonym of SUBS pc,
lr, #0. Other instructions are undefined.

In A32, only SUBS{cond} pc, lr, #imm and MOVS{cond} pc, lr are valid instructions. Other
instructions are deprecated.
Caution
Do not use these instructions in User mode or System mode. The assembler cannot warn you about this.

Availability
This 32-bit instruction is available in A32 and T32.
The 32-bit T32 instruction is not available in the ARMv7-M architecture.
There is no 16-bit version of this instruction in T32.
Related references
13.13 AND on page 13-355.
13.63 MOV on page 13-431.
13.3 Flexible second operand (Operand2) on page 13-338.
13.9 ADD on page 13-347.
7.11 Condition code suffixes on page 7-150.

13.152 SVC
SuperVisor Call.
Syntax
SVC{cond} #imm

where:
cond

is an optional condition code.
imm

is an expression evaluating to an integer in the range:
• 0 to 2^24-1 (a 24-bit value) in an A32 instruction.
• 0-255 (an 8-bit value) in a T32 instruction.
Operation
The SVC instruction causes an exception. This means that the processor mode changes to Supervisor, the
CPSR is saved to the Supervisor mode SPSR, and execution branches to the SVC vector.
imm is ignored by the processor. However, it can be retrieved by the exception handler to determine what
service is being requested.
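Because the processor ignores imm, a handler that needs the value has to read it from the SVC instruction itself. The following Python sketch shows only the bit extraction. It assumes the standard encodings, in which imm occupies bits[23:0] of the A32 instruction and bits[7:0] of the 16-bit T32 instruction. Those field positions come from the ARM Architecture Reference Manual rather than from this section, and the function names are hypothetical.

```python
def svc_number_a32(instruction):
    """Extract the 24-bit imm from a 32-bit A32 SVC instruction word."""
    return instruction & 0x00FFFFFF

def svc_number_t32(instruction):
    """Extract the 8-bit imm from a 16-bit T32 SVC instruction halfword."""
    return instruction & 0xFF
```

In a real handler the instruction word would be loaded from the address just before the return address. That step is outside the scope of this sketch.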

Note
SVC was called SWI in earlier versions of the A32 assembly language. SWI instructions disassemble to
SVC, with a comment to say that this was formerly SWI.

Condition flags
This instruction does not change the flags.
Availability
This instruction is available in A32 and 16-bit T32 and in the ARMv7 architectures.
There is no 32-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.153 SWP and SWPB
Swap data between registers and memory.
Note
These instructions are not supported in ARMv8.

Syntax
SWP{B}{cond} Rt, Rt2, [Rn]

where:
cond

is an optional condition code.
B

is an optional suffix. If B is present, a byte is swapped. Otherwise, a 32-bit word is swapped.
Rt

is the destination register. Rt must not be PC.
Rt2

is the source register. Rt2 can be the same register as Rt. Rt2 must not be PC.
Rn

contains the address in memory. Rn must be a different register from both Rt and Rt2. Rn must
not be PC.
Usage
You can use SWP and SWPB to implement semaphores:
• Data from memory is loaded into Rt.
• The contents of Rt2 are saved to memory.
• If Rt2 is the same register as Rt, the contents of the register are swapped with the contents of the
memory location.
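The following Python sketch models the swap as described above. The function `swp`, the dictionary-based register file, and the dictionary-based memory are illustrative assumptions, and the sketch does not model the atomicity of the real instruction.

```python
def swp(regs, mem, rt, rt2, rn, byte=False):
    """Model SWP{B} Rt, Rt2, [Rn].

    regs: dict mapping register names to values (hypothetical register file)
    mem:  dict mapping addresses to values (hypothetical memory)
    byte: True models SWPB, which swaps a byte instead of a 32-bit word
    """
    mask = 0xFF if byte else 0xFFFFFFFF
    address = regs[rn]
    loaded = mem.get(address, 0) & mask
    # Rt2 is read before Rt is written, so Rt and Rt2 can be the same register.
    mem[address] = regs[rt2] & mask
    regs[rt] = loaded
```

As a simple semaphore, swapping the value 1 into a lock word returns 0 in Rt if the lock was free and 1 if another owner had already taken it.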
Note
The use of SWP and SWPB is deprecated. You can use LDREX and STREX instructions to implement more
sophisticated semaphores.
Availability
These instructions are available in A32.
There are no T32 SWP or SWPB instructions.
Related references
13.56 LDREX on page 13-421.
7.11 Condition code suffixes on page 7-150.

13.154 SXTAB
Sign extend Byte with Add, to extend an 8-bit value to a 32-bit value.
Syntax
SXTAB{cond} {Rd}, Rn, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the number to add.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
This instruction does the following:
1. Rotate the value from Rm right by 0, 8, 16 or 24 bits.
2. Extract bits[7:0] from the value obtained.
3. Sign extend to 32 bits.
4. Add the value from Rn.
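The four steps above can be sketched in Python as follows. The function names `ror32` and `sxtab` are hypothetical, and register values are modeled as plain 32-bit integers.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    value &= 0xFFFFFFFF
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def sxtab(rn, rm, rotation=0):
    """Model SXTAB Rd, Rn, Rm, ROR #rotation and return the value of Rd."""
    rotated = ror32(rm, rotation)                        # step 1
    byte = rotated & 0xFF                                # step 2
    extended = byte - 0x100 if byte & 0x80 else byte     # step 3
    return (rn + extended) & 0xFFFFFFFF                  # step 4
```

For example, `sxtab(10, 0xFF)` treats the byte 0xFF as -1 and returns 9.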
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.155 SXTAB16
Sign extend two Bytes with Add, to extend two 8-bit values to two 16-bit values.
Syntax
SXTAB16{cond} {Rd}, Rn, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the number to add.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
This instruction does the following:
1. Rotate the value from Rm right by 0, 8, 16 or 24 bits.
2. Extract bits[23:16] and bits[7:0] from the value obtained.
3. Sign extend to 16 bits.
4. Add them to bits[31:16] and bits[15:0] respectively of Rn to form bits[31:16] and bits[15:0] of the
result.
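The following Python sketch follows the same steps for both halfword lanes. The function names are hypothetical, and each lane is added modulo 2^16 without any carry into the other lane.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    value &= 0xFFFFFFFF
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def sxtab16(rn, rm, rotation=0):
    """Model SXTAB16 Rd, Rn, Rm, ROR #rotation and return the value of Rd."""
    rotated = ror32(rm, rotation)
    result = 0
    for shift in (0, 16):
        # bits[7:0] feed the low lane, bits[23:16] feed the high lane.
        byte = (rotated >> shift) & 0xFF
        extended = byte - 0x100 if byte & 0x80 else byte
        half = (((rn >> shift) & 0xFFFF) + extended) & 0xFFFF
        result |= half << shift
    return result
```

For example, `sxtab16(0x00010001, 0x00FF0001)` adds -1 to the top halfword and +1 to the bottom halfword, which gives 0x00000002.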
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.156 SXTAH
Sign extend Halfword with Add, to extend a 16-bit value to a 32-bit value.
Syntax
SXTAH{cond} {Rd}, Rn, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the number to add.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
This instruction does the following:
1. Rotate the value from Rm right by 0, 8, 16 or 24 bits.
2. Extract bits[15:0] from the value obtained.
3. Sign extend to 32 bits.
4. Add the value from Rn.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.157 SXTB
Sign extend Byte, to extend an 8-bit value to a 32-bit value.
Syntax
SXTB{cond} {Rd}, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
This instruction does the following:
1. Rotates the value from Rm right by 0, 8, 16 or 24 bits.
2. Extracts bits[7:0] from the value obtained.
3. Sign extends to 32 bits.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
16-bit instructions
The following form of this instruction is available in T32 code, and is a 16-bit instruction:
SXTB Rd, Rm
Rd and Rm must both be Lo registers.

Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
The 16-bit instruction is available in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.158 SXTB16
Sign extend two bytes.
Syntax
SXTB16{cond} {Rd}, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
SXTB16 extends two 8-bit values to two 16-bit values. It does this by:
1. Rotating the value from Rm right by 0, 8, 16 or 24 bits.
2. Extracting bits[23:16] and bits[7:0] from the value obtained.
3. Sign extending to 16 bits each.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.159 SXTH
Sign extend Halfword.
Syntax
SXTH{cond} {Rd}, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
SXTH extends a 16-bit value to a 32-bit value. It does this by:

1. Rotating the value from Rm right by 0, 8, 16 or 24 bits.
2. Extracting bits[15:0] from the value obtained.
3. Sign extending to 32 bits.
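The following Python sketch models these steps and also rejects rotation amounts that the instruction cannot encode. The function name `sxth` is hypothetical.

```python
def sxth(rm, rotation=0):
    """Model SXTH Rd, Rm, ROR #rotation and return the value of Rd."""
    if rotation not in (0, 8, 16, 24):
        raise ValueError("rotation must be 0, 8, 16, or 24")
    rm &= 0xFFFFFFFF
    # Step 1: rotate right by the requested amount.
    rotated = ((rm >> rotation) | (rm << (32 - rotation))) & 0xFFFFFFFF
    # Step 2: extract bits[15:0].
    half = rotated & 0xFFFF
    # Step 3: sign extend to 32 bits.
    return half | 0xFFFF0000 if half & 0x8000 else half
```

For example, `sxth(0x00008001)` returns 0xFFFF8001, and `sxth(0x12345678, 16)` returns 0x00001234.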
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
16-bit instructions
The following form of this instruction is available in T32 code, and is a 16-bit instruction:
SXTH Rd, Rm
Rd and Rm must both be Lo registers.

r3, r9, r4

Incorrect example
    SXTH    r9, r3, r2, ROR #12 ; rotation must be by 0, 8, 16, or 24.

Related references
7.11 Condition code suffixes on page 7-150.

13.160 SYS
Execute system coprocessor instruction.
Syntax
SYS{cond} instruction{, Rn}

where:
cond

is an optional condition code.
instruction

is the coprocessor instruction to execute.
Rn

is an operand to the instruction. For instructions that take an argument, Rn is compulsory. For
instructions that do not take an argument, Rn is optional and if it is not specified, R0 is used. Rn
must not be PC.
Usage
You can use this pseudo-instruction to execute special coprocessor instructions such as cache, branch
predictor, and TLB operations. The instructions operate by writing to special write-only coprocessor
registers. The instruction names are the same as the write-only coprocessor register names and are listed
in the ARM Architecture Reference Manual. For example:
SYS ICIALLUIS ; invalidates all instruction caches Inner Shareable
; to Point of Unification and also flushes branch
; target cache.

Availability
This 32-bit instruction is available in A32 and T32.
The 32-bit T32 instruction is not available in the ARMv7-M architecture.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.
Related information
ARM Architecture Reference Manual.

13.161 TBB and TBH
Table Branch Byte and Table Branch Halfword.
Syntax
TBB [Rn, Rm]
TBH [Rn, Rm, LSL #1]

where:
Rn

is the base register. This contains the address of the table of branch lengths. Rn must not be SP.
If PC is specified for Rn, the value used is the address of the instruction plus 4.
Rm

is the index register. This contains an index into the table.
Rm must not be PC or SP.

Operation
These instructions cause a PC-relative forward branch using a table of single byte offsets (TBB) or
halfword offsets (TBH). Rn provides a pointer to the table, and Rm supplies an index into the table. The
branch length is twice the value of the byte (TBB) or the halfword (TBH) returned from the table. The
target of the branch table must be in the same execution state.
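The following Python sketch shows how a branch target follows from a table entry. It assumes that the branch is taken relative to the T32 PC value, that is, the address of the TBB or TBH instruction plus 4. This section states that value only for the case where PC is used as Rn. The function name `table_branch_target` and the list-based table are hypothetical.

```python
def table_branch_target(table, index, instruction_address):
    """Return the branch target of TBB or TBH.

    table: list of unsigned table entries (bytes for TBB, halfwords for TBH)
    index: the value held in Rm
    instruction_address: address of the TBB or TBH instruction itself
    """
    pc = instruction_address + 4      # assumed T32 PC value
    # The branch length is twice the value read from the table.
    return pc + 2 * table[index]
```

Because the entries are unsigned, the computed target is never below the PC value, which matches the forward-only branch described above.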
Architectures
These 32-bit T32 instructions are available.
There are no versions of these instructions in A32 or in 16-bit T32 encodings.
Related concepts
8.15 Address alignment in A32/T32 code on page 8-178.

13.162 TEQ
Test Equivalence.
Syntax
TEQ{cond} Rn, Operand2

where:
cond

is an optional condition code.
Rn

is the ARM register holding the first operand.
Operand2

is a flexible second operand.
Usage
This instruction tests the value in a register against Operand2. It updates the condition flags on the result,
but does not place the result in any register.
The TEQ instruction performs a bitwise Exclusive OR operation on the value in Rn and the value of
Operand2. This is the same as an EORS instruction, except that the result is discarded.
Use the TEQ instruction to test if two values are equal without affecting the V or C flags. CMP, by contrast, updates both.
TEQ is also useful for testing the sign of a value. After the comparison, the N flag is the logical Exclusive
OR of the sign bits of the two operands.
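The following Python sketch models the N and Z results of TEQ to illustrate both uses. The function name `teq` is hypothetical, and the sketch does not model any C flag update from the calculation of Operand2.

```python
def teq(rn, op2):
    """Return (N, Z) as set by TEQ Rn, Operand2."""
    result = (rn ^ op2) & 0xFFFFFFFF
    n = result >> 31          # 1 when the sign bits of the operands differ
    z = int(result == 0)      # 1 when the operands are equal
    return n, z
```

Here `teq(5, 5)` returns `(0, 1)` because the values are equal, and `teq(0x80000000, 1)` returns `(1, 0)` because the operands have different signs.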

Register restrictions
In this T32 instruction, you cannot use SP or PC for Rn or Operand2.
In this A32 instruction, use of SP or PC is deprecated.
For A32 instructions:
• If you use PC (R15) as Rn, the value used is the address of the instruction plus 8.
• You cannot use PC for any operand in any data processing instruction that has a register-controlled
shift.
Condition flags
This instruction:
• Updates the N and Z flags according to the result.
• Can update the C flag during the calculation of Operand2.
• Does not affect the V flag.
Architectures
This instruction is available in A32 and T32.
Correct example
    TEQEQ   r10, r9

Incorrect example
    TEQ     pc, r1, ROR r0      ; PC not permitted with register
                                ; controlled shift

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
7.11 Condition code suffixes on page 7-150.

13.163 TST
Test bits.
Syntax
TST{cond} Rn, Operand2

where:
cond

is an optional condition code.
Rn

is the ARM register holding the first operand.
Operand2

is a flexible second operand.
Operation
This instruction tests the value in a register against Operand2. It updates the condition flags on the result,
but does not place the result in any register.
The TST instruction performs a bitwise AND operation on the value in Rn and the value of Operand2.
This is the same as an ANDS instruction, except that the result is discarded.
Register restrictions
In this T32 instruction, you cannot use SP or PC for Rn or Operand2.
In this A32 instruction, use of SP or PC is deprecated.
For A32 instructions:
• If you use PC (R15) as Rn, the value used is the address of the instruction plus 8.
• You cannot use PC for any operand in any data processing instruction that has a register-controlled
shift.
Condition flags
This instruction:
• Updates the N and Z flags according to the result.
• Can update the C flag during the calculation of Operand2.
• Does not affect the V flag.
16-bit instructions
The following form of the TST instruction is available in T32 code, and is a 16-bit instruction:
TST Rn, Rm
Rn and Rm must both be Lo registers.

Architectures
This instruction is available in A32 and T32.
Examples
    TST     r0, #0x3F8
    TSTNE   r1, r5, ASR r1

Related references
13.3 Flexible second operand (Operand2) on page 13-338.
7.11 Condition code suffixes on page 7-150.

13.164 TT, TTT, TTA, TTAT
Test Target (Alternate Domain, Unprivileged).
Syntax
TT{cond}{q} Rd, Rn ; T1 TT general registers (T32)
TTA{cond}{q} Rd, Rn ; T1 TTA general registers (T32)
TTAT{cond}{q} Rd, Rn ; T1 TTAT general registers (T32)
TTT{cond}{q} Rd, Rn ; T1 TTT general registers (T32)

Where:
cond

Is an optional field. It specifies the condition under which the instruction is executed. If cond is
omitted, it defaults to always (AL).
q

Is an optional instruction width specifier.
Rd

Is the destination general-purpose register into which the status result of the target test is written.
Rn

Is the general-purpose base register.
Usage
Test Target (TT) queries the security state and access permissions of a memory location.
Test Target Unprivileged (TTT) queries the security state and access permissions of a memory location
for an unprivileged access to that location.
Test Target Alternate Domain (TTA) and Test Target Alternate Domain Unprivileged (TTAT) query the
security state and access permissions of a memory location for a Non-secure access to that location.
These instructions are only valid when executing in Secure state, and are UNDEFINED if used from Non-secure state.
These instructions return the security state and access permissions in the destination register, the contents
of which are as follows:
Bits      Name      Description
[7:0]     MREGION   The MPU region that the address maps to. This field is 0 if MRVALID is 0.
[15:8]    SREGION   The SAU region that the address maps to. This field is only valid if the instruction is
                    executed from Secure state. This field is 0 if SRVALID is 0.
[16]      MRVALID   Set to 1 if the MREGION content is valid. Set to 0 if the MREGION content is invalid.
[17]      SRVALID   Set to 1 if the SREGION content is valid. Set to 0 if the SREGION content is invalid.
[18]      R         Read accessibility. Set to 1 if the memory location can be read according to the
                    permissions of the selected MPU when operating in the current mode. For TTT and TTAT,
                    this bit returns the permissions for unprivileged access, regardless of whether the
                    current mode is privileged or unprivileged.
[19]      RW        Read/write accessibility. Set to 1 if the memory location can be read and written
                    according to the permissions of the selected MPU when operating in the current mode.
                    For TTT and TTAT, this bit returns the permissions for unprivileged access, regardless
                    of whether the current mode is privileged or unprivileged.
[20]      NSR       Equal to R AND NOT S. Can be used in combination with the LSLS (immediate) instruction
                    to check both the MPU and SAU/IDAU permissions. This bit is only valid if the
                    instruction is executed from Secure state and the R field is valid.
[21]      NSRW      Equal to RW AND NOT S. Can be used in combination with the LSLS (immediate) instruction
                    to check both the MPU and SAU/IDAU permissions. This bit is only valid if the
                    instruction is executed from Secure state and the RW field is valid.
[22]      S         Security. A value of 1 indicates the memory location is Secure, and a value of 0
                    indicates the memory location is Non-secure. This bit is only valid if the instruction
                    is executed from Secure state.
[23]      IRVALID   IREGION valid flag. For a Secure request, indicates the validity of the IREGION field.
                    Set to 1 if the IREGION content is valid. Set to 0 if the IREGION content is invalid.
                    This bit is always 0 if the IDAU cannot provide a region number, the address is exempt
                    from security attribution, or if the requesting TT instruction is executed from the
                    Non-secure state.
[31:24]   IREGION   IDAU region number. Indicates the IDAU region number containing the target address.
                    This field is 0 if IRVALID is 0.

Invalid fields are 0.
The MREGION field is invalid and 0 if any of the following conditions are true:
• The MPU is not present or MPU_CTRL.ENABLE is 0.
• The address did not match any enabled MPU regions.
• The address matched multiple MPU regions.
• TT or TTT was executed from an unprivileged mode.

The SREGION field is invalid and 0 if any of the following conditions are true:
• SAU_CTRL.ENABLE is set to 0.
• The address did not match any enabled SAU regions.
• The address matched multiple SAU regions.
• The SAU attributes were overridden by the IDAU.
• The instruction is executed from Non-secure state, or is executed on a processor that does not implement the ARMv8-M Security Extensions.

The R and RW bits are invalid and 0 if any of the following conditions are true:
• The address matched multiple MPU regions.
• TT or TTT is executed from an unprivileged mode.
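The field layout of the destination register can be sketched as a small Python decoder. This is an illustrative model only, not assembler input. The function name decode_tt_response is hypothetical, and the names R, RW, and S for bits [18], [19], and [22] are taken from the NSR and NSRW descriptions.

```python
def decode_tt_response(value):
    """Split a 32-bit TT/TTT/TTA/TTAT result into its named fields.

    Hypothetical helper for illustration; it only slices bits and does
    not check which fields are valid in the current security state.
    """
    def bit(n):
        return (value >> n) & 1

    return {
        "MREGION": value & 0xFF,           # bits [7:0]
        "SREGION": (value >> 8) & 0xFF,    # bits [15:8]
        "MRVALID": bit(16),
        "SRVALID": bit(17),
        "R": bit(18),
        "RW": bit(19),
        "NSR": bit(20),
        "NSRW": bit(21),
        "S": bit(22),
        "IRVALID": bit(23),
        "IREGION": (value >> 24) & 0xFF,   # bits [31:24]
    }
```

For example, decode_tt_response(0x000D0003) reports MREGION 3 with MRVALID, R, and RW set, and every other field 0.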
Related references
7.11 Condition code suffixes on page 7-150.
13.2 Instruction width specifiers on page 13-337.

13.165 UADD8
Unsigned parallel byte-wise addition.
Syntax
UADD8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs four unsigned integer additions on the corresponding bytes of the operands and
writes the results into the corresponding bytes of the destination. The results are modulo 2^8. It sets the
APSR GE flags.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
GE flags
This instruction does not affect the N, Z, C, V, or Q flags.
It sets the GE flags in the APSR as follows:
GE[0]

for bits[7:0] of the result.
GE[1]

for bits[15:8] of the result.
GE[2]

for bits[23:16] of the result.
GE[3]

for bits[31:24] of the result.
It sets a GE flag to 1 to indicate that the corresponding result overflowed, generating a carry. This is
equivalent to an ADDS instruction setting the C condition flag to 1.
You can use these flags to control a following SEL instruction.
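The byte-lane arithmetic and GE behaviour can be sketched as a Python reference model. The function name uadd8 and the representation of the GE flags as a 4-bit mask (bit i is GE[i]) are choices made for this sketch, not part of the assembler.

```python
def uadd8(rn, rm):
    """Model UADD8: return (rd, ge), with ge a 4-bit mask of GE[3:0]."""
    rd = 0
    ge = 0
    for i in range(4):
        a = (rn >> (8 * i)) & 0xFF
        b = (rm >> (8 * i)) & 0xFF
        total = a + b
        rd |= (total & 0xFF) << (8 * i)   # each result is modulo 2^8
        if total > 0xFF:                  # carry out of this byte lane
            ge |= 1 << i
    return rd, ge
```

With operands 0x01FF7F80 and 0x01017F80, bytes 0 and 2 overflow, so the model returns 0x0200FE00 with GE[0] and GE[2] set.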
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
13.107 SEL on page 13-487.
7.11 Condition code suffixes on page 7-150.

13.166 UADD16
Unsigned parallel halfword-wise addition.
Syntax
UADD16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs two unsigned integer additions on the corresponding halfwords of the operands
and writes the results into the corresponding halfwords of the destination. The results are modulo 2^16. It
sets the APSR GE flags.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
GE flags
This instruction does not affect the N, Z, C, V, or Q flags.
It sets the GE flags in the APSR as follows:
GE[1:0]

for bits[15:0] of the result.
GE[3:2]

for bits[31:16] of the result.
It sets a pair of GE flags to 1 to indicate that the corresponding result overflowed, generating a carry.
This is equivalent to an ADDS instruction setting the C condition flag to 1.
You can use these flags to control a following SEL instruction.
Note
GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.
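A minimal Python sketch of the halfword lanes follows. It shows how each carry drives a pair of GE bits. The name uadd16 and the 4-bit GE mask are assumptions of the sketch.

```python
def uadd16(rn, rm):
    """Model UADD16: return (rd, ge), with ge a 4-bit mask of GE[3:0]."""
    rd = 0
    ge = 0
    for i in range(2):
        a = (rn >> (16 * i)) & 0xFFFF
        b = (rm >> (16 * i)) & 0xFFFF
        total = a + b
        rd |= (total & 0xFFFF) << (16 * i)  # each result is modulo 2^16
        if total > 0xFFFF:                  # carry out of this halfword
            ge |= 0b11 << (2 * i)           # GE pair is set together
    return rd, ge
```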

13.167 UASX
Unsigned parallel add and subtract halfwords with exchange.
Syntax
UASX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
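Rm, Rn

are the ARM registers holding the operands.
GE flags
It sets the GE flags in the APSR as follows:
GE[1:0]

for bits[15:0] of the result.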
GE[3:2]

for bits[31:16] of the result.
It sets GE[1:0] to 1 to indicate that the subtraction gave a result greater than or equal to zero, meaning a
borrow did not occur. This is equivalent to a SUBS instruction setting the C condition flag to 1.
It sets GE[3:2] to 1 to indicate that the addition overflowed, generating a carry. This is equivalent to an
ADDS instruction setting the C condition flag to 1.
You can use these flags to control a following SEL instruction.
Note
GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.
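The GE behaviour can be sketched in Python. The sketch assumes the usual UASX data flow, with the halfwords of the second operand exchanged: the bottom halfword of the result is Rn[15:0] minus Rm[31:16], and the top halfword is Rn[31:16] plus Rm[15:0]. That data flow, the name uasx, and the 4-bit GE mask are assumptions of the sketch.

```python
def uasx(rn, rm):
    """Model UASX: return (rd, ge), with ge a 4-bit mask of GE[3:0]."""
    rn_hi, rn_lo = (rn >> 16) & 0xFFFF, rn & 0xFFFF
    rm_hi, rm_lo = (rm >> 16) & 0xFFFF, rm & 0xFFFF
    total = rn_hi + rm_lo          # addition on the top halfword
    diff = rn_lo - rm_hi           # subtraction on the bottom halfword
    rd = ((total & 0xFFFF) << 16) | (diff & 0xFFFF)
    ge = 0
    if diff >= 0:                  # no borrow occurred
        ge |= 0b0011
    if total > 0xFFFF:             # the addition generated a carry
        ge |= 0b1100
    return rd, ge
```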

Related references
13.107 SEL on page 13-487.
7.11 Condition code suffixes on page 7-150.

13.168 UBFX
Unsigned Bit Field Extract.
Syntax
UBFX{cond} Rd, Rn, #lsb, #width

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the source register.
lsb

is the bit number of the least significant bit in the bitfield, in the range 0 to 31.
width

is the width of the bitfield, in the range 1 to (32–lsb).
Operation
Copies adjacent bits from one register into the least significant bits of a second register, and zero extends
to 32 bits.
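The extraction amounts to a shift and a mask, as this Python sketch shows. The function name ubfx is hypothetical. The range check mirrors the lsb and width limits given in the syntax description.

```python
def ubfx(rn, lsb, width):
    """Model UBFX: copy width bits of rn, starting at lsb, to bit 0."""
    if not (0 <= lsb <= 31 and 1 <= width <= 32 - lsb):
        raise ValueError("lsb or width out of range")
    # Bits above the field are cleared, which is the zero extension.
    return (rn >> lsb) & ((1 << width) - 1)
```

For example, ubfx(0xABCD1234, 8, 8) returns 0x12.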
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not alter any flags.
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.169 UDF
Permanently Undefined.
Syntax
UDF{c}{q} {#}imm ; A1 general registers (A32)
UDF{c}{q} {#}imm ; T1 general registers (T32)
UDF{c}.W {#}imm ; T2 general registers (T32)

Where:
imm

Depends on the instruction variant:
A1 general registers
Is a 16-bit unsigned immediate, in the range 0 to 65535. The PE ignores the value of
this constant.
T1 general registers
Is an 8-bit unsigned immediate, in the range 0 to 255. The PE ignores the value of this
constant.
T2 general registers
Is a 16-bit unsigned immediate, in the range 0 to 65535. The PE ignores the value of
this constant.
c

For T32, see Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
ARM deprecates using any c value other than AL.
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
Usage
Permanently Undefined generates an Undefined Instruction exception.
The encodings for UDF used in this section are defined as permanently UNDEFINED in the ARMv8-A
architecture. However:
• With the T32 instruction set, ARM deprecates using the UDF instruction in an IT block.
• In the A32 instruction set, UDF is not conditional.
Related references
13.1 A32 and T32 instruction summary on page 13-332.

13.170 UDIV
Unsigned Divide.
Syntax
UDIV{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the value to be divided.
Rm

is a register holding the divisor.
Register restrictions
PC or SP cannot be used for Rd, Rn, or Rm.
Architectures
This 32-bit T32 instruction is available in ARMv7-R, ARMv7-M and ARMv8-M.mainline.
This 32-bit A32 instruction is optional in ARMv7-R.
This 32-bit A32 and T32 instruction is available in ARMv7-A if Virtualization Extensions are
implemented, and optional if not.
There is no 16-bit T32 UDIV instruction.
Related references
7.11 Condition code suffixes on page 7-150.

13.171 UHADD8
Unsigned halving parallel byte-wise addition.
Syntax
UHADD8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs four unsigned integer additions on the corresponding bytes of the operands,
halves the results, and writes the results into the corresponding bytes of the destination. This cannot
cause overflow.
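A short Python sketch shows why the halving step rules out overflow: each 9-bit sum is shifted right by one before it is written back, so it always fits in a byte. The name uhadd8 is hypothetical.

```python
def uhadd8(rn, rm):
    """Model UHADD8: add corresponding bytes, halve, and repack."""
    rd = 0
    for i in range(4):
        a = (rn >> (8 * i)) & 0xFF
        b = (rm >> (8 * i)) & 0xFF
        # a + b is at most 0x1FE, so the halved value is at most 0xFF.
        rd |= ((a + b) >> 1) << (8 * i)
    return rd
```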
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.172 UHADD16
Unsigned halving parallel halfword-wise addition.
Syntax
UHADD16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs two unsigned integer additions on the corresponding halfwords of the
operands, halves the results, and writes the results into the corresponding halfwords of the destination.
This cannot cause overflow.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.173 UHASX
Unsigned halving parallel add and subtract halfwords with exchange.
Syntax
UHASX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction exchanges the two halfwords of the second operand, then performs an addition on the
two top halfwords of the operands and a subtraction on the bottom two halfwords. It halves the results
and writes them into the corresponding halfwords of the destination. This cannot cause overflow.
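The exchange, add, subtract, and halve steps can be sketched in Python as follows. The name uhasx is hypothetical. The sketch assumes that a negative difference is halved as a two's complement value and then truncated to 16 bits.

```python
def uhasx(rn, rm):
    """Model UHASX: exchange Rm halfwords, add top, subtract bottom, halve."""
    rn_hi, rn_lo = (rn >> 16) & 0xFFFF, rn & 0xFFFF
    rm_hi, rm_lo = (rm >> 16) & 0xFFFF, rm & 0xFFFF
    total = rn_hi + rm_lo        # top halfword: addition
    diff = rn_lo - rm_hi         # bottom halfword: subtraction
    # Python's >> on a negative int is an arithmetic shift, which matches
    # taking bits [16:1] of the two's complement difference.
    return (((total >> 1) & 0xFFFF) << 16) | ((diff >> 1) & 0xFFFF)
```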
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.174 UHSAX
Unsigned halving parallel subtract and add halfwords with exchange.
Syntax
UHSAX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the
two top halfwords of the operands and an addition on the bottom two halfwords. It halves the results and
writes them into the corresponding halfwords of the destination. This cannot cause overflow.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.175 UHSUB8
Unsigned halving parallel byte-wise subtraction.
Syntax
UHSUB8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.

13.176 UHSUB16
Unsigned halving parallel halfword-wise subtraction.
Syntax
UHSUB16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.

13.177 UMAAL
Unsigned Multiply Accumulate Accumulate Long.
Syntax
UMAAL{cond} RdLo, RdHi, Rn, Rm

where:
cond

is an optional condition code.
RdLo, RdHi

are the destination registers for the 64-bit result. They also hold the two 32-bit accumulate
operands. RdLo and RdHi must be different registers.
Rn, Rm

are the registers holding the multiply operands.
Operation
The UMAAL instruction multiplies the 32-bit values in Rn and Rm, adds the two 32-bit values in RdHi and
RdLo, and stores the 64-bit result to RdLo, RdHi.
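The arithmetic can be sketched in Python. The name umaal and the argument order are choices made for this sketch. The largest possible result, (2^32 - 1)^2 + 2(2^32 - 1), equals 2^64 - 1, so the 64-bit result never overflows.

```python
MASK32 = 0xFFFFFFFF

def umaal(rd_lo, rd_hi, rn, rm):
    """Model UMAAL: return the new (RdLo, RdHi) pair."""
    result = rn * rm + rd_hi + rd_lo      # always fits in 64 bits
    return result & MASK32, (result >> 32) & MASK32
```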
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Examples
    UMAAL       r8, r9, r2, r3
    UMAALGE     r2, r0, r5, r3

Related references
7.11 Condition code suffixes on page 7-150.

13.178 UMLAL
Unsigned Long Multiply, with optional Accumulate, with 32-bit operands and 64-bit result and
accumulator.
Syntax
UMLAL{S}{cond} RdLo, RdHi, Rn, Rm

where:
S

is an optional suffix available in A32 state only. If S is specified, the condition flags are updated
based on the result of the operation.
cond

is an optional condition code.
RdLo, RdHi

are the destination registers. They also hold the accumulating value. RdLo and RdHi must be
different registers.
Rn, Rm

are ARM registers holding the operands.
Operation
The UMLAL instruction interprets the values from Rn and Rm as unsigned integers. It multiplies these
integers, and adds the 64-bit result to the 64-bit unsigned integer contained in RdHi and RdLo.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
If S is specified, this instruction:
• Updates the N and Z flags according to the result.
• Does not affect the C or V flags.
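A Python sketch of the accumulate and of the N and Z results when S is specified follows. The name umlal and the returned tuple layout are hypothetical. The sketch assumes that the accumulation wraps modulo 2^64, that N is bit 63 of the result, and that Z reflects the whole 64-bit result.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def umlal(rd_lo, rd_hi, rn, rm):
    """Model UMLALS: return (RdLo, RdHi, N, Z)."""
    acc = (rd_hi << 32) | rd_lo            # 64-bit accumulator
    result = (rn * rm + acc) & MASK64      # assumed to wrap modulo 2^64
    n = (result >> 63) & 1                 # N: top bit of the result
    z = int(result == 0)                   # Z: 64-bit result is zero
    return result & MASK32, result >> 32, n, z
```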
Architectures
This ARM instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Example
    UMLALS      r4, r5, r3, r8

Related references
7.11 Condition code suffixes on page 7-150.

13.179 UMULL
Unsigned Long Multiply, with 32-bit operands, and 64-bit result.
Syntax
UMULL{S}{cond} RdLo, RdHi, Rn, Rm

where:
S

is an optional suffix available in A32 state only. If S is specified, the condition flags are updated
based on the result of the operation.
cond

is an optional condition code.
RdLo, RdHi

are the destination registers. RdLo and RdHi must be different registers.
Rn, Rm

are ARM registers holding the operands.
Operation
The UMULL instruction interprets the values from Rn and Rm as unsigned integers. It multiplies these
integers and places the least significant 32 bits of the result in RdLo, and the most significant 32 bits of
the result in RdHi.
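The split of the 64-bit product across the two destination registers can be sketched in Python. The name umull is hypothetical, and the sketch does not model the optional flag update.

```python
MASK32 = 0xFFFFFFFF

def umull(rn, rm):
    """Model UMULL: return (RdLo, RdHi) for the unsigned 64-bit product."""
    product = (rn & MASK32) * (rm & MASK32)
    return product & MASK32, product >> 32
```

For example, umull(0xFFFFFFFF, 0xFFFFFFFF) returns (0x00000001, 0xFFFFFFFE).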
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
If S is specified, this instruction:
• Updates the N and Z flags according to the result.
• Does not affect the C or V flags.
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Example
    UMULL       r0, r4, r5, r6

Related references
7.11 Condition code suffixes on page 7-150.

13.180 UND pseudo-instruction
Generate an architecturally undefined instruction.
Syntax
UND{cond}{.W} {#expr}

where:
cond

is an optional condition code.
.W

is an optional instruction width specifier.
expr

evaluates to a numeric value. The following table shows the range and encoding of expr in the
instruction, where Y shows the locations of the bits that encode for expr and V is the 4 bits that
encode for the condition code.
If expr is omitted, the value 0 is used.
Table 13-18 Range and encoding of expr
Instruction           Encoding     Number of bits for expr   Range
A32                   0xV7FYYYFY   16                        0-65535
T32 32-bit encoding   0xF7FYAYFY   12                        0-4095
T32 16-bit encoding   0xDEYY       8                         0-255

Usage
An attempt to execute an undefined instruction causes the Undefined instruction exception.
Architecturally undefined instructions are expected to remain undefined.
UND in T32 code
You can use the .W width specifier to force UND to generate a 32-bit instruction in T32 code. UND.W
always generates a 32-bit instruction, even if expr is in the range 0-255.
Disassembly
The encodings that this pseudo-instruction produces disassemble to DCI.
Related references
7.11 Condition code suffixes on page 7-150.

13.181 UQADD8
Unsigned saturating parallel byte-wise addition.
Syntax
UQADD8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs four unsigned integer additions on the corresponding bytes of the operands and
writes the results into the corresponding bytes of the destination. It saturates the results to the unsigned
range 0 ≤ x ≤ 2^8 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
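Saturation differs from the modulo behaviour of UADD8 in that an overflowing byte is clamped to the maximum instead of wrapping. A minimal Python sketch, with the hypothetical name uqadd8:

```python
def uqadd8(rn, rm):
    """Model UQADD8: add corresponding bytes, clamping each to 0xFF."""
    rd = 0
    for i in range(4):
        a = (rn >> (8 * i)) & 0xFF
        b = (rm >> (8 * i)) & 0xFF
        rd |= min(a + b, 0xFF) << (8 * i)   # saturate to 2^8 - 1
    return rd
```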
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.182 UQADD16
Unsigned saturating parallel halfword-wise addition.
Syntax
UQADD16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction performs two unsigned integer additions on the corresponding halfwords of the operands
and writes the results into the corresponding halfwords of the destination. It saturates the results to the
unsigned range 0 ≤ x ≤ 2^16 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.183 UQASX
Unsigned saturating parallel add and subtract halfwords with exchange.
Syntax
UQASX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction exchanges the two halfwords of the second operand, then performs an addition on the
two top halfwords of the operands and a subtraction on the bottom two halfwords. It writes the results
into the corresponding halfwords of the destination. It saturates the results to the unsigned range
0 ≤ x ≤ 2^16 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
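Both ends of the unsigned range matter here: the addition can exceed 2^16 - 1 and the subtraction can fall below 0. The following Python sketch, with the hypothetical name uqasx, clamps each halfword separately.

```python
def uqasx(rn, rm):
    """Model UQASX: exchange Rm halfwords, add top, subtract bottom, clamp."""
    rn_hi, rn_lo = (rn >> 16) & 0xFFFF, rn & 0xFFFF
    rm_hi, rm_lo = (rm >> 16) & 0xFFFF, rm & 0xFFFF
    top = min(rn_hi + rm_lo, 0xFFFF)   # addition saturates at 2^16 - 1
    bottom = max(rn_lo - rm_hi, 0)     # subtraction saturates at 0
    return (top << 16) | bottom
```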
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.184 UQSAX
Unsigned saturating parallel subtract and add halfwords with exchange.
Syntax
UQSAX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction exchanges the two halfwords of the second operand, then performs a subtraction on the
two top halfwords of the operands and an addition on the bottom two halfwords. It writes the results into
the corresponding halfwords of the destination. It saturates the results to the unsigned range
0 ≤ x ≤ 2^16 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.185 UQSUB8
Unsigned saturating parallel byte-wise subtraction.
Syntax
UQSUB8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction subtracts each byte of the second operand from the corresponding byte of the first
operand and writes the results into the corresponding bytes of the destination. It saturates the results to
the unsigned range 0 ≤ x ≤ 2^8 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.186 UQSUB16
Unsigned saturating parallel halfword-wise subtraction.
Syntax
UQSUB16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction subtracts each halfword of the second operand from the corresponding halfword of the
first operand and writes the results into the corresponding halfwords of the destination. It saturates the
results to the unsigned range 0 ≤ x ≤ 2^16 - 1. The Q flag is not affected even if this operation saturates.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not affect the N, Z, C, V, Q, or GE flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.187 USAD8
Unsigned Sum of Absolute Differences.
Syntax
USAD8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Rm

is the register holding the second operand.
Operation
The USAD8 instruction finds the four differences between the unsigned values in corresponding bytes of
Rn and Rm. It adds the absolute values of the four differences, and saves the result to Rd.
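The operation can be sketched in Python as a sum over the four byte lanes. The name usad8 is hypothetical.

```python
def usad8(rn, rm):
    """Model USAD8: sum of absolute differences of corresponding bytes."""
    total = 0
    for i in range(4):
        a = (rn >> (8 * i)) & 0xFF
        b = (rm >> (8 * i)) & 0xFF
        total += abs(a - b)
    return total   # at most 4 * 255, so it always fits in Rd
```

For example, usad8(0x01020304, 0x04030201) returns 8.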
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not alter any flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Example
    USAD8       r2, r4, r6

Related references
7.11 Condition code suffixes on page 7-150.

13.188 USADA8
Unsigned Sum of Absolute Differences and Accumulate.
Syntax
USADA8{cond} Rd, Rn, Rm, Ra

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the first operand.
Rm

is the register holding the second operand.
Ra

is the register holding the accumulate operand.
Operation
The USADA8 instruction finds the four differences between the unsigned values in corresponding bytes of
Rn and Rm. It adds the absolute values of the four differences to the value in Ra, and saves the result to Rd.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not alter any flags.
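A Python sketch of the accumulate form follows. The name usada8 is hypothetical, and the sketch assumes that the sum written to Rd is truncated to 32 bits.

```python
def usada8(rn, rm, ra):
    """Model USADA8: Ra plus the sum of absolute byte differences."""
    total = ra
    for i in range(4):
        a = (rn >> (8 * i)) & 0xFF
        b = (rm >> (8 * i)) & 0xFF
        total += abs(a - b)
    return total & 0xFFFFFFFF   # assumed truncation to the 32-bit Rd
```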
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Correct examples
    USADA8      r0, r3, r5, r2
    USADA8VS    r0, r4, r0, r1
Incorrect examples
    USADA8      r2, r4, r6      ; USADA8 requires four registers
    USADA16     r0, r4, r0, r1  ; no such instruction

Related references
7.11 Condition code suffixes on page 7-150.

13.189 USAT
Unsigned Saturate to any bit position, with optional shift before saturating.
Syntax
USAT{cond} Rd, #sat, Rm{, shift}

where:
cond

is an optional condition code.
Rd

is the destination register.
sat

specifies the bit position to saturate to, in the range 0 to 31.
Rm

is the register containing the operand.
shift

is an optional shift. It must be one of the following:
ASR #n

where n is in the range 1-32 (A32) or 1-31 (T32).
LSL #n

where n is in the range 0-31.
Operation
The USAT instruction applies the specified shift to a signed value, then saturates to the unsigned range
0 ≤ x ≤ 2^sat - 1.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Q flag
If saturation occurs, this instruction sets the Q flag. To read the state of the Q flag, use an MRS instruction.
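The shift, saturate, and Q flag steps can be sketched in Python. The name usat, the string form of the shift argument, and the boolean return value for "saturation occurred" are choices made for this sketch. It assumes that LSL operates on the 32-bit register value, discarding bits shifted out, before the value is read as signed.

```python
def usat(sat, rm, shift="LSL", n=0):
    """Model USAT: return (rd, saturated) for a signed 32-bit operand."""
    if shift not in ("LSL", "ASR"):
        raise ValueError("shift must be LSL or ASR")
    value = rm & 0xFFFFFFFF
    if shift == "LSL":
        value = (value << n) & 0xFFFFFFFF          # assumed 32-bit shift
    signed = value - (1 << 32) if value & 0x80000000 else value
    if shift == "ASR":
        signed >>= n                               # arithmetic shift right
    upper = (1 << sat) - 1                         # 2^sat - 1
    clamped = min(max(signed, 0), upper)
    return clamped, clamped != signed              # True means Q would be set
```

With sat set to 7, as in USAT r0, #7, r5, an operand of 200 saturates to 127 and a negative operand saturates to 0.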
Architectures
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Example
    USATNE      r0, #7, r5

Related references
13.138 SSAT16 on page 13-521.
13.68 MRS (PSR to general-purpose register) on page 13-437.
7.11 Condition code suffixes on page 7-150.

13.190 USAT16
Parallel halfword Saturate.
Syntax
USAT16{cond} Rd, #sat, Rn

where:
cond

is an optional condition code.
Rd

is the destination register.
sat

specifies the bit position to saturate to, in the range 0 to 15.
Rn

is the register holding the operand.
Operation
Halfword-wise unsigned saturation to any bit position.
The USAT16 instruction saturates each signed halfword to the unsigned range 0 ≤ x ≤ 2^sat - 1.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Q flag
If saturation occurs on either halfword, this instruction sets the Q flag. To read the state of the Q flag, use
an MRS instruction.
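A Python sketch of the halfword form follows. Each halfword is read as a signed 16-bit value and clamped independently. The name usat16 and the boolean standing in for the Q flag are assumptions of the sketch.

```python
def usat16(sat, rn):
    """Model USAT16: return (rd, saturated) for two signed halfwords."""
    upper = (1 << sat) - 1                 # 2^sat - 1
    rd = 0
    saturated = False
    for i in range(2):
        half = (rn >> (16 * i)) & 0xFFFF
        signed = half - 0x10000 if half & 0x8000 else half
        clamped = min(max(signed, 0), upper)
        saturated = saturated or clamped != signed
        rd |= clamped << (16 * i)
    return rd, saturated                   # True means Q would be set
```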
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Example
    USAT16      r0, #7, r5

Related references
13.68 MRS (PSR to general-purpose register) on page 13-437.
7.11 Condition code suffixes on page 7-150.

13.191 USAX
Unsigned parallel subtract and add halfwords with exchange.
Syntax
USAX{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
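Rm, Rn

are the ARM registers holding the operands.
GE flags
It sets the GE flags in the APSR as follows:
GE[1:0]

for bits[15:0] of the result.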
GE[3:2]

for bits[31:16] of the result.
It sets GE[1:0] to 1 to indicate that the addition overflowed, generating a carry. This is equivalent to an
ADDS instruction setting the C condition flag to 1.

It sets GE[3:2] to 1 to indicate that the subtraction gave a result greater than or equal to zero, meaning a
borrow did not occur. This is equivalent to a SUBS instruction setting the C condition flag to 1.
You can use these flags to control a following SEL instruction.
Note
GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.

Related references
13.107 SEL on page 13-487.
7.11 Condition code suffixes on page 7-150.

13.192 USUB8
Unsigned parallel byte-wise subtraction.
Syntax
USUB8{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction subtracts each byte of the second operand from the corresponding byte of the first
operand and writes the results into the corresponding bytes of the destination. The results are modulo 2^8.
It sets the APSR GE flags.
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
GE flags
This instruction does not affect the N, Z, C, V, or Q flags.
It sets the GE flags in the APSR as follows:
GE[0]

for bits[7:0] of the result.
GE[1]

for bits[15:8] of the result.
GE[2]

for bits[23:16] of the result.
GE[3]

for bits[31:24] of the result.
It sets a GE flag to 1 to indicate that the corresponding result is greater than or equal to zero, meaning a
borrow did not occur. This is equivalent to a SUBS instruction setting the C condition flag to 1.
You can use these flags to control a following SEL instruction.
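One use of the GE flags is a per-byte unsigned maximum: USUB8 compares the byte lanes and a following SEL picks the larger byte from each lane. The Python sketch below models that pairing. The names usub8, sel, and byte_max are hypothetical. The sel model assumes that SEL takes each byte from its first operand when the matching GE flag is 1 and from its second operand otherwise.

```python
def usub8(rn, rm):
    """Model USUB8: return (rd, ge), with ge a 4-bit mask of GE[3:0]."""
    rd = 0
    ge = 0
    for i in range(4):
        a = (rn >> (8 * i)) & 0xFF
        b = (rm >> (8 * i)) & 0xFF
        diff = a - b
        rd |= (diff & 0xFF) << (8 * i)    # each result is modulo 2^8
        if diff >= 0:                     # no borrow in this byte lane
            ge |= 1 << i
    return rd, ge

def sel(rn, rm, ge):
    """Assumed SEL behaviour: byte i comes from rn if GE[i] is 1, else rm."""
    rd = 0
    for i in range(4):
        src = rn if (ge >> i) & 1 else rm
        rd |= src & (0xFF << (8 * i))
    return rd

def byte_max(a, b):
    """Per-byte unsigned maximum built from the two models above."""
    _, ge = usub8(a, b)
    return sel(a, b, ge)
```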
Availability
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
13.107 SEL on page 13-487.
7.11 Condition code suffixes on page 7-150.

13.193 USUB16
Unsigned parallel halfword-wise subtraction.
Syntax
USUB16{cond} {Rd}, Rn, Rm

where:
cond

is an optional condition code.
Rd

is the destination register.
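Rm, Rn

are the ARM registers holding the operands.
Operation
This instruction subtracts each halfword of the second operand from the corresponding halfword of the first
operand and writes the results into the corresponding halfwords of the destination. The results are modulo 2^16.
It sets the APSR GE flags.
GE flags
It sets the GE flags in the APSR as follows:
GE[1:0]

for bits[15:0] of the result.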
GE[3:2]

for bits[31:16] of the result.
It sets a pair of GE flags to 1 to indicate that the corresponding result is greater than or equal to zero,
meaning a borrow did not occur. This is equivalent to a SUBS instruction setting the C condition flag to 1.
You can use these flags to control a following SEL instruction.
Note
GE[1:0] are set or cleared together, and GE[3:2] are set or cleared together.
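As a minimal sketch of the paired GE behavior, the following C model treats each halfword as an independent unsigned subtraction. The names usub16_model and usub16_result are hypothetical, and the GE flags are returned as a 4-bit value.

```c
#include <stdint.h>

/* Hypothetical reference model of USUB16 (illustrative names only). */
typedef struct {
    uint32_t rd;  /* value written to the destination register */
    uint32_t ge;  /* modeled APSR.GE[3:0] */
} usub16_result;

static usub16_result usub16_model(uint32_t rn, uint32_t rm)
{
    usub16_result r = {0u, 0u};
    for (int lane = 0; lane < 2; lane++) {
        uint32_t a = (rn >> (16 * lane)) & 0xFFFFu;
        uint32_t b = (rm >> (16 * lane)) & 0xFFFFu;
        r.rd |= ((a - b) & 0xFFFFu) << (16 * lane);  /* modulo 2^16 */
        if (a >= b) {
            r.ge |= 3u << (2 * lane);  /* both flags of the pair are set together */
        }
    }
    return r;
}
```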

Availability
This instruction is available in A32 and T32.
There is no 16-bit version of this instruction in T32.
Related references
13.107 SEL on page 13-487.
7.11 Condition code suffixes on page 7-150.

13.194 UXTAB
Zero extend Byte and Add.
Syntax
UXTAB{cond} {Rd}, Rn, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the number to add.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
UXTAB extends an 8-bit value to a 32-bit value. It does this by:

1. Rotating the value from Rm right by 0, 8, 16 or 24 bits.
2. Extracting bits[7:0] from the value obtained.
3. Zero extending to 32 bits.
4. Adding the value from Rn.
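The four steps can be sketched in C as follows. The helper names ror32 and uxtab_model are hypothetical, and the rotation argument is assumed to be one of 0, 8, 16, or 24.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical model of UXTAB Rd, Rn, Rm, ROR #rotation. */
static uint32_t uxtab_model(uint32_t rn, uint32_t rm, unsigned rotation)
{
    uint32_t rotated = ror32(rm, rotation);  /* step 1 */
    uint32_t byte = rotated & 0xFFu;         /* steps 2 and 3 */
    return rn + byte;                        /* step 4, modulo 2^32 */
}
```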

13.195 UXTAB16
Zero extend two Bytes and Add.
Syntax
UXTAB16{cond} {Rd}, Rn, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the number to add.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
UXTAB16 extends two 8-bit values to two 16-bit values. It does this by:

1. Rotating the value from Rm right by 0, 8, 16 or 24 bits.
2. Extracting bits[23:16] and bits[7:0] from the value obtained.
3. Zero extending them to 16 bits.
4. Adding them to bits[31:16] and bits[15:0] respectively of Rn to form bits[31:16] and bits[15:0] of the
result.
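A minimal C sketch of these steps is shown below. The names ror32 and uxtab16_model are hypothetical. The model assumes that each halfword addition wraps modulo 2^16, so a carry out of the low halfword does not reach the high halfword.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical model of UXTAB16 Rd, Rn, Rm, ROR #rotation. */
static uint32_t uxtab16_model(uint32_t rn, uint32_t rm, unsigned rotation)
{
    uint32_t rotated = ror32(rm, rotation);
    uint32_t lo = ((rn & 0xFFFFu) + (rotated & 0xFFu)) & 0xFFFFu;
    uint32_t hi = ((rn >> 16) + ((rotated >> 16) & 0xFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}
```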

Example
UXTAB16 r0, r0, r4, ROR #16

Related references
7.11 Condition code suffixes on page 7-150.

13.196 UXTAH
Zero extend Halfword and Add.
Syntax
UXTAH{cond} {Rd}, Rn, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rn

is the register holding the number to add.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
UXTAH extends a 16-bit value to a 32-bit value. It does this by:

1. Rotating the value from Rm right by 0, 8, 16 or 24 bits.
2. Extracting bits[15:0] from the value obtained.
3. Zero extending to 32 bits.
4. Adding the value from Rn.
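The same steps can be modeled in C. The names ror32 and uxtah_model are hypothetical, and the rotation is assumed to be 0, 8, 16, or 24.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical model of UXTAH Rd, Rn, Rm, ROR #rotation. */
static uint32_t uxtah_model(uint32_t rn, uint32_t rm, unsigned rotation)
{
    uint32_t halfword = ror32(rm, rotation) & 0xFFFFu;  /* rotate, extract, zero extend */
    return rn + halfword;                               /* add, modulo 2^32 */
}
```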

13.197 UXTB
Zero extend Byte.
Syntax
UXTB{cond} {Rd}, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
UXTB extends an 8-bit value to a 32-bit value. It does this by:

1. Rotating the value from Rm right by 0, 8, 16, or 24 bits.
2. Extracting bits[7:0] from the value obtained.
3. Zero extending to 32 bits.
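As an illustration, the rotate, extract, and zero-extend sequence can be written in C as below. The names ror32 and uxtb_model are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical model of UXTB Rd, Rm, ROR #rotation. */
static uint32_t uxtb_model(uint32_t rm, unsigned rotation)
{
    return ror32(rm, rotation) & 0xFFu;  /* masking also zero extends */
}
```

With this model, a rotation of 24 selects the most significant byte of Rm.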
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
16-bit instruction
The following form of this instruction is available in T32 code, and is a 16-bit instruction:
UXTB Rd, Rm
Rd and Rm must both be Lo registers.

13.198 UXTB16
Zero extend two Bytes.
Syntax
UXTB16{cond} {Rd}, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
UXTB16 extends two 8-bit values to two 16-bit values. It does this by:
1. Rotating the value from Rm right by 0, 8, 16 or 24 bits.
2. Extracting bits[23:16] and bits[7:0] from the value obtained.
3. Zero extending each to 16 bits.
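A short C sketch of these steps follows. The names ror32 and uxtb16_model are hypothetical. A single mask keeps bits[23:16] and bits[7:0] and leaves each one zero extended in its own halfword.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical model of UXTB16 Rd, Rm, ROR #rotation. */
static uint32_t uxtb16_model(uint32_t rm, unsigned rotation)
{
    return ror32(rm, rotation) & 0x00FF00FFu;
}
```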
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
Availability
The 32-bit instruction is available in A32 and T32.
For the ARMv7-M architecture, the 32-bit T32 instruction is only available in an ARMv7E-M
implementation.
There is no 16-bit version of this instruction in T32.
Related references
7.11 Condition code suffixes on page 7-150.

13.199 UXTH
Zero extend Halfword.
Syntax
UXTH{cond} {Rd}, Rm {,rotation}

where:
cond

is an optional condition code.
Rd

is the destination register.
Rm

is the register holding the value to extend.
rotation

is one of:
ROR #8

Value from Rm is rotated right 8 bits.
ROR #16

Value from Rm is rotated right 16 bits.
ROR #24

Value from Rm is rotated right 24 bits.
If rotation is omitted, no rotation is performed.
Operation
UXTH extends a 16-bit value to a 32-bit value. It does this by:

1. Rotating the value from Rm right by 0, 8, 16, or 24 bits.
2. Extracting bits[15:0] from the value obtained.
3. Zero extending to 32 bits.
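The steps above can be sketched in C as follows. The names ror32 and uxth_model are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Hypothetical model of UXTH Rd, Rm, ROR #rotation. */
static uint32_t uxth_model(uint32_t rm, unsigned rotation)
{
    return ror32(rm, rotation) & 0xFFFFu;  /* masking also zero extends */
}
```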
Register restrictions
You cannot use PC for any operand.
You can use SP in A32 instructions but this is deprecated. You cannot use SP in T32 instructions.
Condition flags
This instruction does not change the flags.
16-bit instructions
The following form of this instruction is available in T32 code, and is a 16-bit instruction:
UXTH Rd, Rm
Rd and Rm must both be Lo registers.

13.200 WFE
Wait For Event.
Syntax
WFE{cond}

where:
cond

is an optional condition code.
Operation
This is a hint instruction. It is optional whether this instruction is implemented or not. If this instruction
is not implemented, it executes as a NOP. The assembler produces a diagnostic message if the instruction
executes as a NOP on the target.
If the Event Register is not set, WFE suspends execution until one of the following events occurs:
• An IRQ interrupt, unless masked by the CPSR I-bit.
• An FIQ interrupt, unless masked by the CPSR F-bit.
• An Imprecise Data abort, unless masked by the CPSR A-bit.
• A Debug Entry request, if Debug is enabled.
• An Event signaled by another processor using the SEV instruction, or by the current processor using
the SEVL instruction.
If the Event Register is set, WFE clears it and returns immediately.
If WFE is implemented, SEV must also be implemented.
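The interaction between WFE and the Event Register can be illustrated with a small C model. This is a sketch of the behavior described above, not an implementation: the type core_model and the functions wfe_model and sevl_model are hypothetical, and interrupts, aborts, and debug requests are not modeled.

```c
#include <stdbool.h>

/* Hypothetical per-processor state: only the Event Register is modeled. */
typedef struct {
    bool event_register;
} core_model;

/* Returns true if WFE would suspend execution,
   false if it clears the Event Register and returns immediately. */
static bool wfe_model(core_model *core)
{
    if (core->event_register) {
        core->event_register = false;
        return false;
    }
    return true;
}

/* SEVL signals an event to the current processor only. */
static void sevl_model(core_model *core)
{
    core->event_register = true;
}
```

In this model, a SEVL followed by WFE does not suspend, but a second WFE does, because the first WFE cleared the Event Register.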
Availability
This instruction is available in A32 and T32.
Related references
13.75 NOP on page 13-447.
7.11 Condition code suffixes on page 7-150.
13.110 SEV on page 13-491.
13.111 SEVL on page 13-492.
13.201 WFI on page 13-598.

13.201 WFI
Wait for Interrupt.
Syntax
WFI{cond}

where:
cond

is an optional condition code.
Operation
WFI suspends execution until one of the following events occurs:
• An IRQ interrupt, regardless of the CPSR I-bit.
• An FIQ interrupt, regardless of the CPSR F-bit.
• An Imprecise Data abort, unless masked by the CPSR A-bit.
• A Debug Entry request, regardless of whether Debug is enabled.

Availability
This instruction is available in A32 and T32.
Related references
13.75 NOP on page 13-447.
7.11 Condition code suffixes on page 7-150.
13.200 WFE on page 13-597.

13.202 YIELD
Yield.
Syntax
YIELD{cond}

where:
cond

is an optional condition code.
Availability
This instruction is available in A32 and T32.
Related references
13.75 NOP on page 13-447.
7.11 Condition code suffixes on page 7-150.


Chapter 14
Advanced SIMD Instructions (32-bit)

Describes Advanced SIMD assembly language instructions.
It contains the following sections:
• 14.1 Summary of Advanced SIMD instructions on page 14-604.
• 14.2 Summary of shared Advanced SIMD and floating-point instructions on page 14-607.
• 14.3 Cryptographic instructions on page 14-608.
• 14.4 Interleaving provided by load and store element and structure instructions on page 14-609.
• 14.5 Alignment restrictions in load and store element and structure instructions on page 14-610.
• 14.6 FLDMDBX, FLDMIAX on page 14-611.
• 14.7 FSTMDBX, FSTMIAX on page 14-612.
• 14.8 VABA and VABAL on page 14-613.
• 14.9 VABD and VABDL on page 14-614.
• 14.10 VABS on page 14-615.
• 14.11 VACLE, VACLT, VACGE and VACGT on page 14-616.
• 14.12 VADD on page 14-617.
• 14.13 VADDHN on page 14-618.
• 14.14 VADDL and VADDW on page 14-619.
• 14.15 VAND (immediate) on page 14-620.
• 14.16 VAND (register) on page 14-621.
• 14.17 VBIC (immediate) on page 14-622.
• 14.18 VBIC (register) on page 14-623.
• 14.19 VBIF on page 14-624.
• 14.20 VBIT on page 14-625.
• 14.21 VBSL on page 14-626.
• 14.22 VCADD on page 14-627.
• 14.23 VCEQ (immediate #0) on page 14-628.

• 14.24 VCEQ (register) on page 14-629.
• 14.25 VCGE (immediate #0) on page 14-630.
• 14.26 VCGE (register) on page 14-631.
• 14.27 VCGT (immediate #0) on page 14-632.
• 14.28 VCGT (register) on page 14-633.
• 14.29 VCLE (immediate #0) on page 14-634.
• 14.30 VCLS on page 14-635.
• 14.31 VCLE (register) on page 14-636.
• 14.32 VCLT (immediate #0) on page 14-637.
• 14.33 VCLT (register) on page 14-638.
• 14.34 VCLZ on page 14-639.
• 14.35 VCMLA on page 14-640.
• 14.36 VCMLA (by element) on page 14-641.
• 14.37 VCNT on page 14-642.
• 14.38 VCVT (between fixed-point or integer, and floating-point) on page 14-643.
• 14.39 VCVT (between half-precision and single-precision floating-point) on page 14-644.
• 14.40 VCVT (from floating-point to integer with directed rounding modes) on page 14-645.
• 14.41 VCVTB, VCVTT (between half-precision and double-precision) on page 14-646.
• 14.42 VDUP on page 14-647.
• 14.43 VEOR on page 14-648.
• 14.44 VEXT on page 14-649.
• 14.45 VFMA, VFMS on page 14-650.
• 14.46 VHADD on page 14-651.
• 14.47 VHSUB on page 14-652.
• 14.48 VLDn (single n-element structure to one lane) on page 14-653.
• 14.49 VLDn (single n-element structure to all lanes) on page 14-655.
• 14.50 VLDn (multiple n-element structures) on page 14-657.
• 14.51 VLDM on page 14-659.
• 14.52 VLDR on page 14-660.
• 14.53 VLDR (post-increment and pre-decrement) on page 14-661.
• 14.54 VLDR pseudo-instruction on page 14-662.
• 14.55 VMAX and VMIN on page 14-663.
• 14.56 VMAXNM, VMINNM on page 14-664.
• 14.57 VMLA on page 14-665.
• 14.58 VMLA (by scalar) on page 14-666.
• 14.59 VMLAL (by scalar) on page 14-667.
• 14.60 VMLAL on page 14-668.
• 14.61 VMLS (by scalar) on page 14-669.
• 14.62 VMLS on page 14-670.
• 14.63 VMLSL on page 14-671.
• 14.64 VMLSL (by scalar) on page 14-672.
• 14.65 VMOV (immediate) on page 14-673.
• 14.66 VMOV (register) on page 14-674.
• 14.67 VMOV (between two ARM registers and a 64-bit extension register) on page 14-675.
• 14.68 VMOV (between an ARM register and an Advanced SIMD scalar) on page 14-676.
• 14.69 VMOVL on page 14-677.
• 14.70 VMOVN on page 14-678.
• 14.71 VMOV2 on page 14-679.
• 14.72 VMRS on page 14-680.
• 14.73 VMSR on page 14-681.
• 14.74 VMUL on page 14-682.
• 14.75 VMUL (by scalar) on page 14-683.
• 14.76 VMULL on page 14-684.
• 14.77 VMULL (by scalar) on page 14-685.
• 14.78 VMVN (register) on page 14-686.
• 14.79 VMVN (immediate) on page 14-687.
• 14.80 VNEG on page 14-688.
• 14.81 VORN (register) on page 14-689.
• 14.82 VORN (immediate) on page 14-690.
• 14.83 VORR (register) on page 14-691.
• 14.84 VORR (immediate) on page 14-692.
• 14.85 VPADAL on page 14-693.
• 14.86 VPADD on page 14-694.
• 14.87 VPADDL on page 14-695.
• 14.88 VPMAX and VPMIN on page 14-696.
• 14.89 VPOP on page 14-697.
• 14.90 VPUSH on page 14-698.
• 14.91 VQABS on page 14-699.
• 14.92 VQADD on page 14-700.
• 14.93 VQDMLAL and VQDMLSL (by vector or by scalar) on page 14-701.
• 14.94 VQDMULH (by vector or by scalar) on page 14-702.
• 14.95 VQDMULL (by vector or by scalar) on page 14-703.
• 14.96 VQMOVN and VQMOVUN on page 14-704.
• 14.97 VQNEG on page 14-705.
• 14.98 VQRDMULH (by vector or by scalar) on page 14-706.
• 14.99 VQRSHL (by signed variable) on page 14-707.
• 14.100 VQRSHRN and VQRSHRUN (by immediate) on page 14-708.
• 14.101 VQSHL (by signed variable) on page 14-709.
• 14.102 VQSHL and VQSHLU (by immediate) on page 14-710.
• 14.103 VQSHRN and VQSHRUN (by immediate) on page 14-711.
• 14.104 VQSUB on page 14-712.
• 14.105 VRADDHN on page 14-713.
• 14.106 VRECPE on page 14-714.
• 14.107 VRECPS on page 14-715.
• 14.108 VREV16, VREV32, and VREV64 on page 14-716.
• 14.109 VRHADD on page 14-717.
• 14.110 VRSHL (by signed variable) on page 14-718.
• 14.111 VRSHR (by immediate) on page 14-719.
• 14.112 VRSHRN (by immediate) on page 14-720.
• 14.113 VRINT on page 14-721.
• 14.114 VRSQRTE on page 14-722.
• 14.115 VRSQRTS on page 14-723.
• 14.116 VRSRA (by immediate) on page 14-724.
• 14.117 VRSUBHN on page 14-725.
• 14.118 VSHL (by immediate) on page 14-726.
• 14.119 VSHL (by signed variable) on page 14-727.
• 14.120 VSHLL (by immediate) on page 14-728.
• 14.121 VSHR (by immediate) on page 14-729.
• 14.122 VSHRN (by immediate) on page 14-730.
• 14.123 VSLI on page 14-731.
• 14.124 VSRA (by immediate) on page 14-732.
• 14.125 VSRI on page 14-733.
• 14.126 VSTM on page 14-734.
• 14.127 VSTn (multiple n-element structures) on page 14-735.
• 14.128 VSTn (single n-element structure to one lane) on page 14-737.
• 14.129 VSTR on page 14-739.
• 14.130 VSTR (post-increment and pre-decrement) on page 14-740.
• 14.131 VSUB on page 14-741.
• 14.132 VSUBHN on page 14-742.
• 14.133 VSUBL and VSUBW on page 14-743.
• 14.134 VSWP on page 14-744.
• 14.135 VTBL and VTBX on page 14-745.
• 14.136 VTRN on page 14-746.
• 14.137 VTST on page 14-747.
• 14.138 VUZP on page 14-748.
• 14.139 VZIP on page 14-749.

14.1 Summary of Advanced SIMD instructions
Most Advanced SIMD instructions are not available in floating-point.
The following table shows a summary of Advanced SIMD instructions that are not available as floating-point instructions:
Table 14-1 Summary of Advanced SIMD instructions
Mnemonic            Brief description
FLDMDBX, FLDMIAX    FLDMX
FSTMDBX, FSTMIAX    FSTMX
VABA, VABD          Absolute difference and Accumulate, Absolute Difference
VABS                Absolute value
VACGE, VACGT        Absolute Compare Greater than or Equal, Greater Than
VACLE, VACLT        Absolute Compare Less than or Equal, Less Than (pseudo-instructions)
VADD                Add
VADDHN              Add, select High half
VAND                Bitwise AND
VAND                Bitwise AND (pseudo-instruction)
VBIC                Bitwise Bit Clear (register)
VBIC                Bitwise Bit Clear (immediate)
VBIF, VBIT, VBSL    Bitwise Insert if False, Insert if True, Select
VCADD               Vector Complex Add
VCEQ, VCLE, VCLT    Compare Equal, Less than or Equal, Compare Less Than
VCGE, VCGT          Compare Greater than or Equal, Greater Than
VCLE, VCLT          Compare Less than or Equal, Compare Less Than (pseudo-instruction)
VCLS, VCLZ, VCNT    Count Leading Sign bits, Count Leading Zeros, and Count set bits
VCMLA               Vector Complex Multiply Accumulate
VCMLA (by element)  Vector Complex Multiply Accumulate (by element)
VCVT                Convert fixed-point or integer to floating-point, floating-point to integer or fixed-point
VCVT                Convert floating-point to integer with directed rounding modes
VCVT                Convert between half-precision and single-precision floating-point numbers
VDUP                Duplicate scalar to all lanes of vector
VEOR                Bitwise Exclusive OR
VEXT                Extract
VFMA, VFMS          Fused Multiply Accumulate, Fused Multiply Subtract
VHADD, VHSUB        Halving Add, Halving Subtract
VLD                 Vector Load
VMAX, VMIN          Maximum, Minimum
VMAXNM, VMINNM      Maximum, Minimum, consistent with IEEE 754-2008
VMLA, VMLS          Multiply Accumulate, Multiply Subtract (vector)
VMLA, VMLS          Multiply Accumulate, Multiply Subtract (by scalar)
VMOV                Move (immediate)
VMOV                Move (register)
VMOVL, VMOV{U}N     Move Long, Move Narrow (register)
VMUL                Multiply (vector)
VMUL                Multiply (by scalar)
VMVN                Move Negative (immediate)
VNEG                Negate
VORN                Bitwise OR NOT
VORN                Bitwise OR NOT (pseudo-instruction)
VORR                Bitwise OR (register)
VORR                Bitwise OR (immediate)
VPADD, VPADAL       Pairwise Add, Pairwise Add and Accumulate
VPMAX, VPMIN        Pairwise Maximum, Pairwise Minimum
VQABS               Absolute value, saturate
VQADD               Add, saturate
VQDMLAL, VQDMLSL    Saturating Doubling Multiply Accumulate, and Multiply Subtract
VQDMULL             Saturating Doubling Multiply
VQDMULH             Saturating Doubling Multiply returning High half
VQMOV{U}N           Saturating Move (register)
VQNEG               Negate, saturate
VQRDMULH            Saturating Rounding Doubling Multiply returning High half
VQRSHL              Shift Left, Round, saturate (by signed variable)
VQRSHR{U}N          Shift Right, Round, saturate (by immediate)
VQSHL               Shift Left, saturate (by immediate)
VQSHL               Shift Left, saturate (by signed variable)
VQSHR{U}N           Shift Right, saturate (by immediate)
VQSUB               Subtract, saturate
VRADDHN             Add, select High half, Round
VRECPE              Reciprocal Estimate
VRECPS              Reciprocal Step
VREV                Reverse elements
VRHADD              Halving Add, Round
VRINT               Round to integer
VRSHR               Shift Right and Round (by immediate)
VRSHRN              Shift Right, Round, Narrow (by immediate)
VRSQRTE             Reciprocal Square Root Estimate
VRSQRTS             Reciprocal Square Root Step
VRSRA               Shift Right, Round, and Accumulate (by immediate)
VRSUBHN             Subtract, select High half, Round
VSHL                Shift Left (by immediate)
VSHR                Shift Right (by immediate)
VSHRN               Shift Right, Narrow (by immediate)
VSLI                Shift Left and Insert
VSRA                Shift Right, Accumulate (by immediate)
VSRI                Shift Right and Insert
VST                 Vector Store
VSUB                Subtract
VSUBHN              Subtract, select High half
VSWP                Swap vectors
VTBL, VTBX          Vector table look-up
VTRN                Vector transpose
VTST                Test bits
VUZP, VZIP          Vector de-interleave and interleave

14.2 Summary of shared Advanced SIMD and floating-point instructions
Some instructions are common to Advanced SIMD and floating-point.
The following table shows a summary of instructions that are common to the Advanced SIMD and
floating-point instruction sets.
Table 14-2 Summary of shared Advanced SIMD and floating-point instructions
Mnemonic  Brief description
VLDM      Load multiple
VLDR      Load
          Load (post-increment and pre-decrement)
VMOV      Transfer from one ARM register to a scalar
          Transfer from two ARM registers to either one double-precision or two single-precision registers
          Transfer from a scalar to an ARM register
          Transfer from either one double-precision or two single-precision registers to two ARM registers
VMRS      Transfer from SIMD and floating-point system register to ARM register
VMSR      Transfer from ARM register to SIMD and floating-point system register
VPOP      Pop floating-point or SIMD registers from full-descending stack
VPUSH     Push floating-point or SIMD registers to full-descending stack
VSTM      Store multiple
VSTR      Store
          Store (post-increment and pre-decrement)

Related references
14.51 VLDM on page 14-659.
14.52 VLDR on page 14-660.
14.53 VLDR (post-increment and pre-decrement) on page 14-661.
14.54 VLDR pseudo-instruction on page 14-662.
14.67 VMOV (between two ARM registers and a 64-bit extension register) on page 14-675.
14.68 VMOV (between an ARM register and an Advanced SIMD scalar) on page 14-676.
14.72 VMRS on page 14-680.
14.73 VMSR on page 14-681.
14.89 VPOP on page 14-697.
14.90 VPUSH on page 14-698.
14.126 VSTM on page 14-734.
14.129 VSTR on page 14-739.
14.130 VSTR (post-increment and pre-decrement) on page 14-740.

14.3 Cryptographic instructions
A set of cryptographic instructions is available in some implementations of the ARMv8 architecture.
These instructions use the 128-bit Advanced SIMD registers and support the acceleration of the
following cryptographic and hash algorithms:
• AES.
• SHA1.
• SHA256.

14.4 Interleaving provided by load and store element and structure instructions
Many instructions in this group provide interleaving when structures are stored to memory, and de-interleaving when structures are loaded from memory.
The following figure shows an example of de-interleaving. Interleaving is the inverse process.
Memory, in array order:
A[0].x A[0].y A[0].z A[1].x A[1].y A[1].z A[2].x A[2].y A[2].z A[3].x A[3].y A[3].z

Registers after de-interleaving:
D0: X3 X2 X1 X0
D1: Y3 Y2 Y1 Y0
D2: Z3 Z2 Z1 Z0

Figure 14-1 De-interleaving an array of 3-element structures
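The de-interleaving in the figure can be sketched in C. The function name deinterleave3_model is hypothetical, and the 16-bit element size is an assumption chosen so that four elements fill one 64-bit D register. Lane 0 of each register array corresponds to X0, Y0, or Z0.

```c
#include <stdint.h>

/* Hypothetical model of loading four 3-element structures and
   de-interleaving them into three 4-lane registers. */
static void deinterleave3_model(const uint16_t mem[12],
                                uint16_t d0[4], uint16_t d1[4], uint16_t d2[4])
{
    for (int i = 0; i < 4; i++) {
        d0[i] = mem[3 * i];      /* A[i].x */
        d1[i] = mem[3 * i + 1];  /* A[i].y */
        d2[i] = mem[3 * i + 2];  /* A[i].z */
    }
}
```

Interleaving on a store is the inverse mapping: lane i of D0, D1, and D2 is written to three consecutive elements in memory.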

Related concepts
14.5 Alignment restrictions in load and store element and structure instructions on page 14-610.
Related references
14.48 VLDn (single n-element structure to one lane) on page 14-653.
14.49 VLDn (single n-element structure to all lanes) on page 14-655.
14.50 VLDn (multiple n-element structures) on page 14-657.
14.127 VSTn (multiple n-element structures) on page 14-735.
14.128 VSTn (single n-element structure to one lane) on page 14-737.
Related information
ARM Architecture Reference Manual.

14.5 Alignment restrictions in load and store element and structure instructions
Many of these instructions allow you to specify memory alignment restrictions.
When the alignment is not specified in the instruction, the alignment restriction is controlled by the A bit
(SCTLR bit[1]):
• If the A bit is 0, there are no alignment restrictions (except for strongly-ordered or device memory,
where accesses must be element-aligned).
• If the A bit is 1, accesses must be element-aligned.
If an address is not correctly aligned, an alignment fault occurs.
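The rule for the case where no alignment is specified in the instruction can be expressed as a small C predicate. This is a sketch only: the function name and parameters are hypothetical, and it does not model alignment qualifiers given in the instruction itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does an element access raise an alignment fault
   when the instruction does not specify an alignment? */
static bool element_access_faults(uint32_t address, uint32_t element_bytes,
                                  bool sctlr_a_bit,
                                  bool device_or_strongly_ordered)
{
    /* Element alignment is required if the A bit is 1, or if the
       memory is strongly-ordered or device memory. */
    bool must_be_aligned = sctlr_a_bit || device_or_strongly_ordered;
    return must_be_aligned && (address % element_bytes) != 0u;
}
```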
Related concepts
14.4 Interleaving provided by load and store element and structure instructions on page 14-609.
Related references
14.48 VLDn (single n-element structure to one lane) on page 14-653.
14.49 VLDn (single n-element structure to all lanes) on page 14-655.
14.50 VLDn (multiple n-element structures) on page 14-657.
14.127 VSTn (multiple n-element structures) on page 14-735.
14.128 VSTn (single n-element structure to one lane) on page 14-737.
Related information
ARM Architecture Reference Manual.

14.6 FLDMDBX, FLDMIAX
FLDMX.
Syntax
FLDMDBX{c}{q} Rn!, dreglist ; A1 Decrement Before FP/SIMD registers (A32)
FLDMIAX{c}{q} Rn{!}, dreglist ; A1 Increment After FP/SIMD registers (A32)
FLDMDBX{c}{q} Rn!, dreglist ; T1 Decrement Before FP/SIMD registers (T32)
FLDMIAX{c}{q} Rn{!}, dreglist ; T1 Increment After FP/SIMD registers (T32)

Where:
c

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
Rn

Is the general-purpose base register. If writeback is not specified, the PC can be used.
!

Specifies base register writeback.
dreglist

Is the list of consecutively numbered 64-bit SIMD and FP registers to be transferred. The list
must contain at least one register, all registers must be in the range D0-D15, and must not
contain more than 16 registers.
Usage
FLDMX loads multiple SIMD and FP registers in the Advanced SIMD and floating-point register file from
consecutive memory locations, using an address from a general-purpose register.
ARM deprecates use of FLDMDBX and FLDMIAX, except for disassembly purposes, and reassembly
of disassembled code.
Depending on settings in the CPACR, NSACR, and HCPTR registers, and the security state and mode in which
the instruction is executed, an attempt to execute the instruction might be UNDEFINED, or trapped to
Hyp mode. For more information, see the descriptions of these registers and Enabling Advanced SIMD and
floating-point support in the ARMv8-A Architecture Reference Manual.
Note
For more information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural
Constraints on UNPREDICTABLE behaviors in the ARMv8-A Architecture Reference Manual.

Related references
14.1 Summary of Advanced SIMD instructions on page 14-604.

14.7 FSTMDBX, FSTMIAX
FSTMX.
Syntax
FSTMDBX{c}{q} Rn!, dreglist ; A1 Decrement Before FP/SIMD registers (A32)
FSTMIAX{c}{q} Rn{!}, dreglist ; A1 Increment After FP/SIMD registers (A32)
FSTMDBX{c}{q} Rn!, dreglist ; T1 Decrement Before FP/SIMD registers (T32)
FSTMIAX{c}{q} Rn{!}, dreglist ; T1 Increment After FP/SIMD registers (T32)

Where:
c

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
Rn

Is the general-purpose base register. If writeback is not specified, the PC can be used. However,
ARM deprecates use of the PC.
!

Specifies base register writeback.
dreglist

Is the list of consecutively numbered 64-bit SIMD and FP registers to be transferred. The list
must contain at least one register, all registers must be in the range D0-D15, and must not
contain more than 16 registers.
Usage
FSTMX stores multiple SIMD and FP registers from the Advanced SIMD and floating-point register file
to consecutive memory locations, using an address from a general-purpose register.
ARM deprecates use of FSTMDBX and FSTMIAX, except for disassembly purposes, and reassembly
of disassembled code.
Depending on settings in the CPACR, NSACR, HCPTR, and FPEXC registers, and the security state and mode in
which the instruction is executed, an attempt to execute the instruction might be UNDEFINED, or trapped to
Hyp mode. For more information, see the descriptions of these registers and Enabling Advanced SIMD and
floating-point support in the ARMv8-A Architecture Reference Manual.
Note
For more information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural
Constraints on UNPREDICTABLE behaviors in the ARMv8-A Architecture Reference Manual.

Related references
14.1 Summary of Advanced SIMD instructions on page 14-604.

14.8 VABA and VABAL
Vector Absolute Difference and Accumulate.
Syntax
VABA{cond}.datatype {Qd}, Qn, Qm
VABA{cond}.datatype {Dd}, Dn, Dm
VABAL{cond}.datatype Qd, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Qd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a long
operation.
Operation
VABA subtracts the elements of one vector from the corresponding elements of another vector, and
accumulates the absolute values of the results into the elements of the destination vector.
VABAL is the long version of the VABA instruction.
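As an illustration, the following C sketch models the doubleword form with datatype U8. The function name vaba_u8_model is hypothetical. The model assumes the accumulation wraps modulo 2^8, because VABA is not a saturating instruction.

```c
#include <stdint.h>

/* Hypothetical model of VABA.U8 Dd, Dn, Dm: eight 8-bit lanes. */
static void vaba_u8_model(uint8_t dd[8], const uint8_t dn[8], const uint8_t dm[8])
{
    for (int i = 0; i < 8; i++) {
        uint8_t diff = (dn[i] > dm[i]) ? (uint8_t)(dn[i] - dm[i])
                                       : (uint8_t)(dm[i] - dn[i]);
        dd[i] = (uint8_t)(dd[i] + diff);  /* accumulate, wrapping at 8 bits */
    }
}
```

VABAL accumulates into elements twice as wide as the operand elements, which leaves more headroom before the accumulator wraps.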

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.9 VABD and VABDL
Vector Absolute Difference.
Syntax
VABD{cond}.datatype {Qd}, Qn, Qm
VABD{cond}.datatype {Dd}, Dn, Dm
VABDL{cond}.datatype Qd, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of:
• S8, S16, S32, U8, U16, or U32 for VABDL.
• S8, S16, S32, U8, U16, U32 or F32 for VABD.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Qd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a long
operation.
Operation
VABD subtracts the elements of one vector from the corresponding elements of another vector, and places
the absolute values of the results into the elements of the destination vector.
VABDL is the long version of the VABD instruction.
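A minimal C sketch of the long form with datatype S8 is shown below. The function name vabdl_s8_model is hypothetical. Each pair of signed 8-bit elements produces a 16-bit result, so the largest possible difference, 255, is represented exactly.

```c
#include <stdint.h>

/* Hypothetical model of VABDL.S8 Qd, Dn, Dm: eight 8-bit lanes in,
   eight 16-bit lanes out. */
static void vabdl_s8_model(uint16_t qd[8], const int8_t dn[8], const int8_t dm[8])
{
    for (int i = 0; i < 8; i++) {
        int diff = (int)dn[i] - (int)dm[i];
        qd[i] = (uint16_t)(diff < 0 ? -diff : diff);
    }
}
```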

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.10 VABS
Vector Absolute.
Syntax
VABS{cond}.datatype Qd, Qm
VABS{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, or F32.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
Operation
VABS takes the absolute value of each element in a vector, and places the results in a second vector. (The
floating-point version only clears the sign bit.)
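The per-element behavior can be sketched in C on raw bit patterns. The function names are hypothetical. The integer model assumes the result is truncated to the element size, so the most negative S8 value, 0x80, is returned unchanged. VQABS is the saturating alternative.

```c
#include <stdint.h>

/* Hypothetical model of one VABS.S8 lane, operating on the raw bit pattern. */
static uint8_t vabs_s8_lane(uint8_t bits)
{
    /* Negate when the sign bit is set; the result wraps at 8 bits. */
    return (bits & 0x80u) ? (uint8_t)(0u - bits) : bits;
}

/* Hypothetical model of one VABS.F32 lane: only the sign bit is cleared. */
static uint32_t vabs_f32_lane(uint32_t bits)
{
    return bits & 0x7FFFFFFFu;
}
```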
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.91 VQABS on page 14-699.
7.11 Condition code suffixes on page 7-150.

14.11 VACLE, VACLT, VACGE and VACGT
Vector Absolute Compare.
Syntax
VACop{cond}.F32 {Qd}, Qn, Qm
VACop{cond}.F32 {Dd}, Dn, Dm

where:
op

must be one of:
GE

Absolute Greater than or Equal.
GT

Absolute Greater Than.
LE

Absolute Less than or Equal.
LT

Absolute Less Than.
cond

is an optional condition code.
Qd, Qn, Qm

specify the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specify the destination register, the first operand register, and the second operand register, for a
doubleword operation.
The result datatype is I32.
Operation
These instructions take the absolute value of each element in a vector, and compare it with the absolute
value of the corresponding element of a second vector. If the condition is true, the corresponding element
in the destination vector is set to all ones. Otherwise, it is set to all zeros.
Note
On disassembly, the VACLE and VACLT pseudo-instructions are disassembled to the corresponding VACGE
and VACGT instructions, with the operands reversed.
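A minimal Python sketch of the four comparisons, assuming each F32 lane is held as a Python float and each I32 result lane as an integer mask. The function names are hypothetical. VACLE and VACLT are modelled as the note describes, by calling the VACGE and VACGT models with the operands reversed.

```python
ALL_ONES = 0xFFFFFFFF  # an I32 result lane with every bit set

def vacge_f32(n, m):
    """Per lane: all ones if |n| >= |m|, otherwise all zeros."""
    return [ALL_ONES if abs(a) >= abs(b) else 0 for a, b in zip(n, m)]

def vacgt_f32(n, m):
    """Per lane: all ones if |n| > |m|, otherwise all zeros."""
    return [ALL_ONES if abs(a) > abs(b) else 0 for a, b in zip(n, m)]

def vacle_f32(n, m):
    """Pseudo-instruction: VACGE with the operands reversed."""
    return vacge_f32(m, n)

def vaclt_f32(n, m):
    """Pseudo-instruction: VACGT with the operands reversed."""
    return vacgt_f32(m, n)

n = [-3.0, 1.0]
m = [2.0, -1.5]
print([hex(x) for x in vacge_f32(n, m)])  # ['0xffffffff', '0x0']
print([hex(x) for x in vacle_f32(n, m)])  # ['0x0', '0xffffffff']
```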

Related references
7.11 Condition code suffixes on page 7-150.

14.12 VADD
Vector Add.
Syntax
VADD{cond}.datatype {Qd}, Qn, Qm
VADD{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, I64, or F32.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VADD adds corresponding elements in two vectors, and places the results in the destination vector.
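For the integer datatypes, the addition can be sketched in Python as follows. The function name vadd_int is hypothetical, and lanes are assumed to be unsigned bit patterns of the given width. The sketch also assumes that results wrap around at the element width, which this section does not state explicitly; VQADD is the saturating alternative.

```python
def vadd_int(n, m, width):
    """Model VADD.I<width>: lane-wise addition, wrapping at the element width."""
    mask = (1 << width) - 1
    return [(a + b) & mask for a, b in zip(n, m)]

print(vadd_int([250, 1], [10, 2], 8))  # [4, 3]
```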

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.14 VADDL and VADDW on page 14-619.
14.92 VQADD on page 14-700.
7.11 Condition code suffixes on page 7-150.

14.13 VADDHN
Vector Add and Narrow, selecting High half.
Syntax
VADDHN{cond}.datatype Dd, Qn, Qm

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or I64.
Dd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector.
Operation
VADDHN adds corresponding elements in two vectors, selects the most significant halves of the results, and
places the final results in the destination vector. Results are truncated.
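The narrowing step can be illustrated with a short Python sketch. The function name vaddhn is hypothetical, lanes are assumed to be unsigned bit patterns, and the sum is assumed to wrap at the source element width before the high half is taken.

```python
def vaddhn(n, m, width):
    """Model VADDHN.I<width>: add, then keep the high half of each sum."""
    mask = (1 << width) - 1
    half = width // 2
    # Shifting right discards the low half, so the result is truncated,
    # not rounded.
    return [((a + b) & mask) >> half for a, b in zip(n, m)]

print([hex(x) for x in vaddhn([0x12FF, 0x1280], [0x0001, 0x0000], 16)])
# ['0x13', '0x12']
```

In the second lane the discarded low half is 0x80, and the result stays at 0x12 because nothing is rounded up.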
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.105 VRADDHN on page 14-713.
7.11 Condition code suffixes on page 7-150.

14.14 VADDL and VADDW
Vector Add Long, Vector Add Wide.
Syntax
VADDL{cond}.datatype Qd, Dn, Dm ; Long operation
VADDW{cond}.datatype {Qd,} Qn, Dm ; Wide operation

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a long
operation.
Qd, Qn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a wide
operation.
Operation
VADDL adds corresponding elements in two doubleword vectors, and places the results in the destination
quadword vector.
VADDW adds corresponding elements in one quadword and one doubleword vector, and places the results
in the destination quadword vector.
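The difference between the long and wide forms can be sketched in Python for the U8 datatype. The function names are hypothetical, and the sketch assumes that the destination elements are twice the width of the doubleword source elements (16 bits here).

```python
def vaddl_u8(dn, dm):
    """Model VADDL.U8: both 8-bit operands are widened, so the sum cannot overflow."""
    return [a + b for a, b in zip(dn, dm)]  # at most 255 + 255 = 510

def vaddw_u8(qn, dm):
    """Model VADDW.U8: 16-bit lanes plus widened 8-bit lanes, wrapping at 16 bits."""
    return [(a + b) & 0xFFFF for a, b in zip(qn, dm)]

print(vaddl_u8([255, 1], [255, 2]))     # [510, 3]
print(vaddw_u8([0xFFFF, 100], [1, 200]))  # [0, 300]
```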
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.12 VADD on page 14-617.
7.11 Condition code suffixes on page 7-150.

14.15 VAND (immediate)
Vector bitwise AND immediate pseudo-instruction.
Syntax
VAND{cond}.datatype Qd, #imm
VAND{cond}.datatype Dd, #imm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or I64.
Qd or Dd

is the Advanced SIMD register for the result.
imm

is the immediate value.
Operation
VAND takes each element of the destination vector, performs a bitwise AND with an immediate value, and
places the result in the destination vector.
Note
On disassembly, this pseudo-instruction is disassembled to a corresponding VBIC instruction, with the
complementary immediate value.

Immediate values
If datatype is I16, the immediate value must have one of the following forms:
• 0xFFXY.
• 0xXYFF.

If datatype is I32, the immediate value must have one of the following forms:
• 0xFFFFFFXY.
• 0xFFFFXYFF.
• 0xFFXYFFFF.
• 0xXYFFFFFF.
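The relationship between a VAND immediate and the complementary VBIC immediate can be sketched in Python. The function name is hypothetical, and the check only covers the I16 and I32 forms listed above, in which every byte is 0xFF except one.

```python
def vand_imm_to_vbic_imm(imm, width):
    """Return the VBIC immediate that clears the same bits as VAND #imm."""
    if width not in (16, 32):
        raise ValueError("only the I16 and I32 forms are modelled")
    mask = (1 << width) - 1
    if not 0 <= imm <= mask:
        raise ValueError("immediate does not fit the element width")
    complement = ~imm & mask
    # In the listed forms, at most one byte of the complement is non-zero.
    nonzero = [s for s in range(0, width, 8) if (complement >> s) & 0xFF]
    if len(nonzero) > 1:
        raise ValueError("immediate does not match a listed VAND form")
    return complement

print(hex(vand_imm_to_vbic_imm(0xFF0F, 16)))      # 0xf0
print(hex(vand_imm_to_vbic_imm(0xFFABFFFF, 32)))  # 0x540000
```

ANDing an element with 0xFF0F gives the same result as clearing the bits in 0x00F0, which is why the two immediates are interchangeable.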
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.17 VBIC (immediate) on page 14-622.
7.11 Condition code suffixes on page 7-150.

14.16 VAND (register)
Vector bitwise AND.
Syntax
VAND{cond}{.datatype} {Qd}, Qn, Qm
VAND{cond}{.datatype} {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

is an optional data type. The assembler ignores datatype.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VAND performs a bitwise logical AND between two registers, and places the result in the destination
register.

Related references
7.11 Condition code suffixes on page 7-150.

14.17 VBIC (immediate)
Vector Bit Clear immediate.
Syntax
VBIC{cond}.datatype Qd, #imm
VBIC{cond}.datatype Dd, #imm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or I64.
Qd or Dd

is the Advanced SIMD register for the source and result.
imm

is the immediate value.
Operation
VBIC takes each element of the destination vector, performs a bitwise AND complement with an
immediate value, and returns the result in the destination vector.
Immediate values
You can either specify imm as a pattern which the assembler repeats to fill the destination register, or you
can directly specify the immediate value (that conforms to the pattern) in full. The pattern for imm
depends on datatype as shown in the following table:
Table 14-3 Patterns for immediate value in VBIC (immediate)

I16       I32
0x00XY    0x000000XY
0xXY00    0x0000XY00
          0x00XY0000
          0xXY000000

If you use the I8 or I64 datatypes, the assembler converts it to either the I16 or I32 instruction to match
the pattern of imm. If the immediate value does not match any of the patterns in the preceding table, the
assembler generates an error.
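The per-element patterns in Table 14-3 and the bit-clear operation can be sketched in Python. Both function names are hypothetical. The pattern check covers only a single I16 or I32 element, not a value written out in full for the whole register, and it does not model how the assembler converts the I8 or I64 datatypes.

```python
def matches_vbic_pattern(datatype, imm):
    """Check imm against the I16 or I32 patterns: one byte XY, all others zero."""
    widths = {"I16": 16, "I32": 32}
    if datatype not in widths:
        raise ValueError("only I16 and I32 are modelled")
    width = widths[datatype]
    if not 0 <= imm < (1 << width):
        return False
    # True if all bits outside some single byte position are zero.
    return any(imm & ~(0xFF << shift) == 0 for shift in range(0, width, 8))

def vbic_imm(lanes, imm, width):
    """Model VBIC (immediate): AND each element with the complement of imm."""
    mask = (1 << width) - 1
    return [x & ~imm & mask for x in lanes]

print(matches_vbic_pattern("I16", 0xAB00))   # True
print(matches_vbic_pattern("I16", 0x0AB0))   # False
print([hex(x) for x in vbic_imm([0xFFFF, 0x1234], 0x00FF, 16)])
# ['0xff00', '0x1200']
```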
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.15 VAND (immediate) on page 14-620.
7.11 Condition code suffixes on page 7-150.

14.18 VBIC (register)
Vector Bit Clear.
Syntax
VBIC{cond}{.datatype} {Qd}, Qn, Qm
VBIC{cond}{.datatype} {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

is an optional data type. The assembler ignores datatype.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VBIC performs a bitwise logical AND complement between two registers, and places the result in the
destination register.
Related references
7.11 Condition code suffixes on page 7-150.

14.19 VBIF
Vector Bitwise Insert if False.
Syntax
VBIF{cond}{.datatype} {Qd}, Qn, Qm
VBIF{cond}{.datatype} {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

is an optional datatype. The assembler ignores datatype.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VBIF inserts each bit from the first operand into the destination if the corresponding bit of the second
operand is 0, otherwise it leaves the destination bit unchanged.
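As an informal model, the doubleword form can be written as a single bitwise expression in Python. The function name vbif_d is hypothetical, and each register is represented as a 64-bit unsigned integer.

```python
MASK64 = (1 << 64) - 1

def vbif_d(d, n, m):
    """Model VBIF Dd, Dn, Dm: take bits from n where m is 0, keep d elsewhere."""
    return ((d & m) | (n & ~m)) & MASK64

print(hex(vbif_d(0x0000, 0xFFFF, 0x00FF)))  # 0xff00
```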

Related references
7.11 Condition code suffixes on page 7-150.

14.20 VBIT
Vector Bitwise Insert if True.
Syntax
VBIT{cond}{.datatype} {Qd}, Qn, Qm
VBIT{cond}{.datatype} {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

is an optional datatype. The assembler ignores datatype.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VBIT inserts each bit from the first operand into the destination if the corresponding bit of the second
operand is 1, otherwise it leaves the destination bit unchanged.
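As an informal model, the doubleword form can be written as a single bitwise expression in Python. The function name vbit_d is hypothetical, and each register is represented as a 64-bit unsigned integer.

```python
MASK64 = (1 << 64) - 1

def vbit_d(d, n, m):
    """Model VBIT Dd, Dn, Dm: take bits from n where m is 1, keep d elsewhere."""
    return ((n & m) | (d & ~m)) & MASK64

print(hex(vbit_d(0xAAAA, 0x5555, 0xFF00)))  # 0x55aa
```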

Related references
7.11 Condition code suffixes on page 7-150.

14.21 VBSL
Vector Bitwise Select.
Syntax
VBSL{cond}{.datatype} {Qd}, Qn, Qm
VBSL{cond}{.datatype} {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

is an optional datatype. The assembler ignores datatype.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VBSL selects each bit for the destination from the first operand if the corresponding bit of the destination
is 1, or from the second operand if the corresponding bit of the destination is 0.
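As an informal model, the doubleword form can be written as a single bitwise expression in Python. The function name vbsl_d is hypothetical, and each register is represented as a 64-bit unsigned integer. Unlike VBIT and VBIF, the selection mask is the original content of the destination register.

```python
MASK64 = (1 << 64) - 1

def vbsl_d(d, n, m):
    """Model VBSL Dd, Dn, Dm: d selects between n (bit is 1) and m (bit is 0)."""
    return ((n & d) | (m & ~d)) & MASK64

print(hex(vbsl_d(0xFF00, 0x1234, 0xABCD)))  # 0x12cd
```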

Related references
7.11 Condition code suffixes on page 7-150.

14.22 VCADD
Vector Complex Add.
Syntax
VCADD{q}.dt {Dd,} Dn, Dm, #rotate ; A1 64-bit SIMD vector FP/SIMD registers (A32)
VCADD{q}.dt {Qd,} Qn, Qm, #rotate ; A1 128-bit SIMD vector FP/SIMD registers (A32)
VCADD{q}.dt {Dd,} Dn, Dm, #rotate ; T1 64-bit SIMD vector FP/SIMD registers (T32)
VCADD{q}.dt {Qd,} Qn, Qm, #rotate ; T1 128-bit SIMD vector FP/SIMD registers (T32)

Where:
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
dt

Is the data type for the elements of the vectors. For the 64-bit instructions, can be one of F16 or
F32.
Dd

Is the 64-bit name of the SIMD and FP destination register.
Dn

Is the 64-bit name of the first SIMD and FP source register.
Dm

Is the 64-bit name of the second SIMD and FP source register.
rotate

Is the rotation to be applied to elements in the second SIMD and FP source register. For the
64-bit instruction, can be one of 90 or 270.
Qd

Is the 128-bit name of the SIMD and FP destination register.
Qn

Is the 128-bit name of the first SIMD and FP source register.
Qm

Is the 128-bit name of the second SIMD and FP source register.
Architectures supported
Supported in ARMv8.3.
Usage
Depending on settings in the CPACR in the ARMv8-A Architecture Reference Manual, NSACR in the
ARMv8-A Architecture Reference Manual, and HCPTR in the ARMv8-A Architecture Reference Manual
registers, and the security state and mode in which the instruction is executed, an attempt to execute the
instruction might be UNDEFINED, or trapped to Hyp mode. For more information see Enabling Advanced
SIMD and floating-point support in the ARMv8-A Architecture Reference Manual.
Related references
14.1 Summary of Advanced SIMD instructions on page 14-604.

14.23 VCEQ (immediate #0)
Vector Compare Equal to zero.
Syntax
VCEQ{cond}.datatype {Qd}, Qn, #0
VCEQ{cond}.datatype {Dd}, Dn, #0

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or F32.
The result datatype is:
• I32 for operand datatypes I32 or F32.
• I16 for operand datatype I16.
• I8 for operand datatype I8.
Qd, Qn

specifies the destination register and the operand register, for a quadword operation.
Dd, Dn

specifies the destination register and the operand register, for a doubleword operation.
#0

specifies a comparison with zero.
Operation
VCEQ takes the value of each element in a vector, and compares it with zero. If the condition is true, the
corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.24 VCEQ (register)
Vector Compare Equal.
Syntax
VCEQ{cond}.datatype {Qd}, Qn, Qm
VCEQ{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or F32.
The result datatype is:
• I32 for operand datatypes I32 or F32.
• I16 for operand datatype I16.
• I8 for operand datatype I8.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VCEQ takes the value of each element in a vector, and compares it with the value of the corresponding
element of a second vector. If the condition is true, the corresponding element in the destination vector is
set to all ones. Otherwise, it is set to all zeros.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.25 VCGE (immediate #0)
Vector Compare Greater than or Equal to zero.
Syntax
VCGE{cond}.datatype {Qd}, Qn, #0
VCGE{cond}.datatype {Dd}, Dn, #0

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, or F32.
The result datatype is:
• I32 for operand datatypes S32 or F32.
• I16 for operand datatype S16.
• I8 for operand datatype S8.
Qd, Qn

specifies the destination register and the operand register, for a quadword operation.
Dd, Dn

specifies the destination register and the operand register, for a doubleword operation.
#0

specifies a comparison with zero.
Operation
VCGE takes the value of each element in a vector, and compares it with zero. If the condition is true, the
corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.26 VCGE (register)
Vector Compare Greater than or Equal.
Syntax
VCGE{cond}.datatype {Qd}, Qn, Qm
VCGE{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, U32, or F32.
The result datatype is:
• I32 for operand datatypes S32, U32, or F32.
• I16 for operand datatypes S16 or U16.
• I8 for operand datatypes S8 or U8.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VCGE takes the value of each element in a vector, and compares it with the value of the corresponding
element of a second vector. If the condition is true, the corresponding element in the destination vector is
set to all ones. Otherwise, it is set to all zeros.
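The following Python sketch illustrates why the signed and unsigned datatypes give different masks for the same bit patterns. The function names are hypothetical, and lanes are assumed to be raw bit patterns of the given width. The F32 datatype is not modelled.

```python
def to_signed(bits, width):
    """Interpret a raw bit pattern as a two's complement value."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits

def vcge(n, m, width=8, signed=True):
    """Model VCGE.S<width> or VCGE.U<width>: all ones where n >= m, else zero."""
    ones = (1 << width) - 1
    if signed:
        n = [to_signed(a, width) for a in n]
        m = [to_signed(b, width) for b in m]
    return [ones if a >= b else 0 for a, b in zip(n, m)]

# 0x80 is -128 as S8 but 128 as U8.
print(vcge([0x80], [0x01], signed=True))   # [0]
print(vcge([0x80], [0x01], signed=False))  # [255]
```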
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.27 VCGT (immediate #0)
Vector Compare Greater Than zero.
Syntax
VCGT{cond}.datatype {Qd}, Qn, #0
VCGT{cond}.datatype {Dd}, Dn, #0

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, or F32.
The result datatype is:
• I32 for operand datatypes S32 or F32.
• I16 for operand datatype S16.
• I8 for operand datatype S8.
Qd, Qn

specifies the destination register and the operand register, for a quadword operation.
Dd, Dn

specifies the destination register and the operand register, for a doubleword operation.
#0

specifies a comparison with zero.
Operation
VCGT takes the value of each element in a vector, and compares it with zero. If the condition is true, the
corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.28 VCGT (register)
Vector Compare Greater Than.
Syntax
VCGT{cond}.datatype {Qd}, Qn, Qm
VCGT{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
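datatype

must be one of S8, S16, S32, U8, U16, U32, or F32.
The result datatype is:
• I32 for operand datatypes S32, U32, or F32.
• I16 for operand datatypes S16 or U16.
• I8 for operand datatypes S8 or U8.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VCGT takes the value of each element in a vector, and compares it with the value of the corresponding
element of a second vector. If the condition is true, the corresponding element in the destination vector is
set to all ones. Otherwise, it is set to all zeros.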

14.29 VCLE (immediate #0)
Vector Compare Less than or Equal to zero.
Syntax
VCLE{cond}.datatype {Qd}, Qn, #0
VCLE{cond}.datatype {Dd}, Dn, #0

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, or F32.
The result datatype is:
• I32 for operand datatypes S32 or F32.
• I16 for operand datatype S16.
• I8 for operand datatype S8.
Qd, Qn

specifies the destination register and the operand register, for a quadword operation.
Dd, Dn

specifies the destination register and the operand register, for a doubleword operation.
#0

specifies a comparison with zero.
Operation
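VCLE takes the value of each element in a vector, and compares it with zero. If the condition is true, the
corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.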

14.30 VCLS
Vector Count Leading Sign bits.
Syntax
VCLS{cond}.datatype Qd, Qm
VCLS{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, or S32.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
Operation
VCLS counts the number of consecutive bits following the topmost bit, that are the same as the topmost
bit, in each element in a vector, and places the results in a second vector.
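The count can be sketched for a single element in Python. The function name vcls is hypothetical, and the element is assumed to be a raw bit pattern of the given width. The topmost bit itself is not counted, so the largest possible result is one less than the element width.

```python
def vcls(bits, width):
    """Count the bits below the top bit that match it, stopping at the first mismatch."""
    top = (bits >> (width - 1)) & 1
    count = 0
    for i in range(width - 2, -1, -1):
        if (bits >> i) & 1 != top:
            break
        count += 1
    return count

print([vcls(v, 8) for v in (0x00, 0xFF, 0x01, 0xC0, 0x40)])  # [7, 7, 6, 1, 0]
```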
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.31 VCLE (register)
Vector Compare Less than or Equal pseudo-instruction.
Syntax
VCLE{cond}.datatype {Qd}, Qn, Qm
VCLE{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
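datatype

must be one of S8, S16, S32, U8, U16, U32, or F32.
The result datatype is:
• I32 for operand datatypes S32, U32, or F32.
• I16 for operand datatypes S16 or U16.
• I8 for operand datatypes S8 or U8.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.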
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VCLE takes the value of each element in a vector, and compares it with the value of the corresponding
element of a second vector. If the condition is true, the corresponding element in the destination vector is
set to all ones. Otherwise, it is set to all zeros.
Note
On disassembly, this pseudo-instruction is disassembled to the corresponding VCGE instruction, with the
operands reversed.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.32 VCLT (immediate #0)
Vector Compare Less Than zero.
Syntax
VCLT{cond}.datatype {Qd}, Qn, #0
VCLT{cond}.datatype {Dd}, Dn, #0

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, or F32.
The result datatype is:
• I32 for operand datatypes S32 or F32.
• I16 for operand datatype S16.
• I8 for operand datatype S8.
Qd, Qn

specifies the destination register and the operand register, for a quadword operation.
Dd, Dn

specifies the destination register and the operand register, for a doubleword operation.
#0

specifies a comparison with zero.
Operation
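VCLT takes the value of each element in a vector, and compares it with zero. If the condition is true, the
corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.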

14.33 VCLT (register)
Vector Compare Less Than.
Syntax
VCLT{cond}.datatype {Qd}, Qn, Qm
VCLT{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
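datatype

must be one of S8, S16, S32, U8, U16, U32, or F32.
The result datatype is:
• I32 for operand datatypes S32, U32, or F32.
• I16 for operand datatypes S16 or U16.
• I8 for operand datatypes S8 or U8.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.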
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VCLT takes the value of each element in a vector, and compares it with the value of the corresponding
element of a second vector. If the condition is true, the corresponding element in the destination vector is
set to all ones. Otherwise, it is set to all zeros.
Note
On disassembly, this pseudo-instruction is disassembled to the corresponding VCGT instruction, with the
operands reversed.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.34 VCLZ
Vector Count Leading Zeros.
Syntax
VCLZ{cond}.datatype Qd, Qm
VCLZ{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, or I32.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
Operation
VCLZ counts the number of consecutive zeros, starting from the top bit, in each element in a vector, and
places the results in a second vector.
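For a single element, the count can be sketched in Python using the integer bit length. The function name vclz is hypothetical, and the element is assumed to be an unsigned bit pattern that fits in the given width.

```python
def vclz(bits, width):
    """Count consecutive zero bits starting from the top bit of the element."""
    if not 0 <= bits < (1 << width):
        raise ValueError("value does not fit in the element width")
    return width - bits.bit_length()

print([vclz(v, 8) for v in (0x00, 0x01, 0x80)])  # [8, 7, 0]
```

An all-zero element gives a result equal to the element width.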

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.35 VCMLA
Vector Complex Multiply Accumulate.
Syntax
VCMLA{q}.dt {Dd,} Dn, Dm, #rotate ; A1 64-bit SIMD vector FP/SIMD registers (A32)
VCMLA{q}.dt {Qd,} Qn, Qm, #rotate ; A1 128-bit SIMD vector FP/SIMD registers (A32)
VCMLA{q}.dt {Dd,} Dn, Dm, #rotate ; T1 64-bit SIMD vector FP/SIMD registers (T32)
VCMLA{q}.dt {Qd,} Qn, Qm, #rotate ; T1 128-bit SIMD vector FP/SIMD registers (T32)

Where:
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
dt

Is the data type for the elements of the vectors. For the 64-bit SIMD vector FP/SIMD registers, can be
one of F16 or F32.
Dd

Is the 64-bit name of the SIMD and FP destination register.
Dn

Is the 64-bit name of the first SIMD and FP source register.
Dm

Is the 64-bit name of the second SIMD and FP source register.
rotate

Is the rotation to be applied to elements in the second SIMD and FP source register. For the 64-bit
SIMD vector FP/SIMD registers, can be one of 0, 90, 180 or 270.
Qd

Is the 128-bit name of the SIMD and FP destination register.
Qn

Is the 128-bit name of the first SIMD and FP source register.
Qm

Is the 128-bit name of the second SIMD and FP source register.
Architectures supported
Supported in ARMv8.3.
Usage
Vector Complex Multiply Accumulate.
Depending on settings in the CPACR in the ARMv8-A Architecture Reference Manual, NSACR in the
ARMv8-A Architecture Reference Manual, and HCPTR in the ARMv8-A Architecture Reference Manual
registers, and the security state and mode in which the instruction is executed, an attempt to execute the
instruction might be UNDEFINED, or trapped to Hyp mode. For more information see Enabling Advanced
SIMD and floating-point support in the ARMv8-A Architecture Reference Manual.
Related references
14.1 Summary of Advanced SIMD instructions on page 14-604.

14.36 VCMLA (by element)
Vector Complex Multiply Accumulate (by element).
Syntax
VCMLA{q}.F16 Dd, Dn, Dm[index], #rotate ; A1 Double,halfprec FP/SIMD registers (A32)
VCMLA{q}.F32 Dd, Dn, Dm[0], #rotate ; A1 Double,singleprec FP/SIMD registers (A32)
VCMLA{q}.F32 Qd, Qn, Dm[0], #rotate ; A1 Quad,singleprec FP/SIMD registers (A32)
VCMLA{q}.F16 Qd, Qn, Dm[index], #rotate ; A1 Halfprec,quad FP/SIMD registers (A32)
VCMLA{q}.F16 Dd, Dn, Dm[index], #rotate ; T1 Double,halfprec FP/SIMD registers (T32)
VCMLA{q}.F32 Dd, Dn, Dm[0], #rotate ; T1 Double,singleprec FP/SIMD registers (T32)
VCMLA{q}.F32 Qd, Qn, Dm[0], #rotate ; T1 Quad,singleprec FP/SIMD registers (T32)
VCMLA{q}.F16 Qd, Qn, Dm[index], #rotate ; T1 Halfprec,quad FP/SIMD registers (T32)

Where:
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
Dd

Is the 64-bit name of the SIMD and FP destination register.
Dn

Is the 64-bit name of the first SIMD and FP source register.
Dm

Is the 64-bit name of the second SIMD and FP source register.
index

Is the element index in the range 0 to 1.
rotate

Is the rotation to be applied to elements in the second SIMD and FP source register. For the
Double,halfprec FP/SIMD registers, can be one of 0, 90, 180 or 270.
Qd

Is the 128-bit name of the SIMD and FP destination register.
Qn

Is the 128-bit name of the first SIMD and FP source register.
Architectures supported
Supported in ARMv8.3.
Usage
Depending on settings in the CPACR in the ARMv8-A Architecture Reference Manual, NSACR in the
ARMv8-A Architecture Reference Manual, and HCPTR in the ARMv8-A Architecture Reference Manual
registers, and the security state and mode in which the instruction is executed, an attempt to execute the
instruction might be UNDEFINED, or trapped to Hyp mode. For more information see Enabling Advanced
SIMD and floating-point support in the ARMv8-A Architecture Reference Manual.
Related references
14.1 Summary of Advanced SIMD instructions on page 14-604.

14.37 VCNT
Vector Count set bits.
Syntax
VCNT{cond}.datatype Qd, Qm
VCNT{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be I8.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
Operation
VCNT counts the number of bits that are one in each element in a vector, and places the results in a second
vector.
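Because the only datatype is I8, the operation is a per-byte population count. A minimal Python sketch follows, with a hypothetical function name and lanes assumed to be unsigned 8-bit values.

```python
def vcnt_i8(lanes):
    """Model VCNT.I8: count the bits that are one in each 8-bit element."""
    return [bin(x & 0xFF).count("1") for x in lanes]

print(vcnt_i8([0xFF, 0x00, 0x0F, 0x81]))  # [8, 0, 4, 2]
```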

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.38 VCVT (between fixed-point or integer, and floating-point)
Vector Convert.
Syntax
VCVT{cond}.type Qd, Qm {, #fbits}
VCVT{cond}.type Dd, Dm {, #fbits}

where:
cond

is an optional condition code.
type

specifies the data types for the elements of the vectors. It must be one of:
S32.F32

Floating-point to signed integer or fixed-point.
U32.F32

Floating-point to unsigned integer or fixed-point.
F32.S32

Signed integer or fixed-point to floating-point.
F32.U32

Unsigned integer or fixed-point to floating-point.
Qd, Qm

specifies the destination vector and the operand vector, for a quadword operation.
Dd, Dm

specifies the destination vector and the operand vector, for a doubleword operation.
fbits

if present, specifies the number of fraction bits in the fixed point number. Otherwise, the
conversion is between floating-point and integer. fbits must lie in the range 0-32. If fbits is
omitted, the number of fraction bits is 0.
Operation
VCVT converts each element in a vector in one of the following ways, and places the results in the
destination vector:
• From floating-point to integer.
• From integer to floating-point.
• From floating-point to fixed-point.
• From fixed-point to floating-point.
Rounding
Integer or fixed-point to floating-point conversions use round to nearest.
Floating-point to integer or fixed-point conversions use round towards zero.
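The role of fbits and the round-towards-zero rule can be sketched in Python for the S32.F32 and F32.S32 types. The function names are hypothetical and inputs are assumed to be finite. Clamping out-of-range results to the signed 32-bit limits is an assumption of this sketch that is not stated in this section. The integer-to-float direction is computed in Python double precision, so it does not model single-precision rounding.

```python
def vcvt_s32_f32(x, fbits=0):
    """Float to signed fixed-point with fbits fraction bits (integer if fbits is 0)."""
    if not 0 <= fbits <= 32:
        raise ValueError("fbits must lie in the range 0-32")
    scaled = x * (1 << fbits)
    if scaled >= 2**31:        # assumption: clamp to the largest S32 value
        return 2**31 - 1
    if scaled <= -2**31:       # assumption: clamp to the smallest S32 value
        return -2**31
    return int(scaled)         # int() discards the fraction, rounding towards zero

def vcvt_f32_s32(v, fbits=0):
    """Signed fixed-point with fbits fraction bits to float."""
    return v / (1 << fbits)

print(vcvt_s32_f32(1.75), vcvt_s32_f32(-1.75))  # 1 -1
print(vcvt_s32_f32(1.75, fbits=2))              # 7
print(vcvt_f32_s32(7, fbits=2))                 # 1.75
```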
Related references
7.11 Condition code suffixes on page 7-150.

14.39 VCVT (between half-precision and single-precision floating-point)
Vector Convert.
Syntax
VCVT{cond}.F32.F16 Qd, Dm
VCVT{cond}.F16.F32 Dd, Qm

where:
cond

is an optional condition code.
Qd, Dm

specifies the destination vector for the single-precision results and the half-precision operand
vector.
Dd, Qm

specifies the destination vector for half-precision results and the single-precision operand vector.
Operation
VCVT with the half-precision extension converts each element in a vector in one of the following ways, and
places the results in the destination vector:
• From half-precision floating-point to single-precision floating-point (F32.F16).
• From single-precision floating-point to half-precision floating-point (F16.F32).
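The widening direction (F32.F16) can be sketched in Python with the struct module, whose "e" format is the IEEE 754 half-precision encoding. The function name is hypothetical, and the four half-precision elements of Dm are assumed to be given as raw 16-bit patterns. Only the widening direction is shown, because every half-precision value is exactly representable in single precision, so no rounding is involved.

```python
import struct

def vcvt_f32_f16(dm_halfwords):
    """Model VCVT.F32.F16 Qd, Dm: widen four half-precision bit patterns."""
    if len(dm_halfwords) != 4:
        raise ValueError("a doubleword register holds four 16-bit elements")
    return [struct.unpack("<e", struct.pack("<H", h))[0] for h in dm_halfwords]

print(vcvt_f32_f16([0x3C00, 0xC000, 0x0000, 0x3555]))
# [1.0, -2.0, 0.0, 0.333251953125]
```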
Architectures
This instruction is available in ARMv8. In earlier architectures, it is only available in NEON systems
with the half-precision extension.
Related references
7.11 Condition code suffixes on page 7-150.

14.40 VCVT (from floating-point to integer with directed rounding modes)
VCVT (Vector Convert) converts each element in a vector from floating-point to signed or unsigned
integer, and places the results in the destination vector.
Note
• This instruction is supported only in ARMv8.
• You cannot use VCVT with a directed rounding mode inside an IT block.

Syntax
VCVTmode.type Qd, Qm
VCVTmode.type Dd, Dm

where:
mode

must be one of:
A

meaning round to nearest, ties away from zero
N

meaning round to nearest, ties to even
P

meaning round towards plus infinity
M

meaning round towards minus infinity.
type

specifies the data types for the elements of the vectors. It must be one of:
S32.F32

floating-point to signed integer
U32.F32

floating-point to unsigned integer.
Qd, Qm

specifies the destination and operand vectors, for a quadword operation.
Dd, Dm

specifies the destination and operand vectors, for a doubleword operation.
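The four rounding modes differ only in how the fractional part is resolved, which the following Python sketch illustrates for a single lane. The function name is hypothetical, and the input is assumed to be finite and within the range of the destination type, so saturation is not modelled.

```python
import math

def vcvt_directed(x, mode):
    """Round one floating-point lane to an integer using mode A, N, P, or M."""
    if mode == "A":                      # nearest, ties away from zero
        r = math.floor(abs(x) + 0.5)
        return r if x >= 0 else -r
    if mode == "N":                      # nearest, ties to even
        return round(x)                  # Python's round() uses ties-to-even
    if mode == "P":                      # towards plus infinity
        return math.ceil(x)
    if mode == "M":                      # towards minus infinity
        return math.floor(x)
    raise ValueError("mode must be one of A, N, P, or M")

for value in (2.5, -2.5):
    print(value, [vcvt_directed(value, m) for m in "ANPM"])
# 2.5 [3, 2, 3, 2]
# -2.5 [-3, -2, -2, -3]
```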

14.41 VCVTB, VCVTT (between half-precision and double-precision)
These instructions convert between half-precision and double-precision floating-point numbers.
The conversion can be done in either of the following ways:
• From half-precision floating-point to double-precision floating-point (F64.F16).
• From double-precision floating-point to half-precision floating-point (F16.F64).

VCVTB uses the bottom half (bits[15:0]) of the single word register to obtain or store the half-precision
value.
VCVTT uses the top half (bits[31:16]) of the single word register to obtain or store the half-precision
value.
Note
These instructions are supported only in ARMv8.

Syntax
VCVTB{cond}.F64.F16 Dd, Sm
VCVTB{cond}.F16.F64 Sd, Dm
VCVTT{cond}.F64.F16 Dd, Sm
VCVTT{cond}.F16.F64 Sd, Dm

where:
cond

is an optional condition code.
Dd

is a double-precision register for the result.
Sm

is a single word register holding the operand.
Sd

is a single word register for the result.
Dm

is a double-precision register holding the operand.
Usage
These instructions convert the half-precision value in Sm to double-precision and place the result in Dd, or
the double-precision value in Dm to half-precision and place the result in Sd.
Floating-point exceptions
These instructions can produce Input Denormal, Invalid Operation, Overflow, Underflow, or Inexact
exceptions.
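The choice between the bottom and top half of the single word register can be sketched in Python, using the struct module's half-precision format to stand in for the conversion. The function names are hypothetical. The sketch assumes that the half of Sd that is not written keeps its previous contents, which this section does not state explicitly.

```python
import struct

def half_bits_to_float(h):
    """Interpret a 16-bit pattern as a half-precision number."""
    return struct.unpack("<e", struct.pack("<H", h))[0]

def float_to_half_bits(x):
    """Round a Python float (a double) to half precision and return its bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def vcvt_f64_f16(sm, top):
    """VCVTT (top=True) or VCVTB (top=False) with .F64.F16: Sm bits -> double."""
    h = (sm >> 16) & 0xFFFF if top else sm & 0xFFFF
    return half_bits_to_float(h)

def vcvt_f16_f64(sd, dm, top):
    """VCVTT or VCVTB with .F16.F64: store dm as a half into Sd, return new Sd."""
    h = float_to_half_bits(dm)
    if top:
        return (h << 16) | (sd & 0x0000FFFF)   # assumption: bottom half kept
    return (sd & 0xFFFF0000) | h               # assumption: top half kept
```

With Sm holding 0x40003C00, the bottom half (0x3C00) converts to 1.0 and the top half (0x4000) converts to 2.0.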

14.42 VDUP
Vector Duplicate.
Syntax
VDUP{cond}.size Qd, Dm[x]
VDUP{cond}.size Dd, Dm[x]
VDUP{cond}.size Qd, Rm
VDUP{cond}.size Dd, Rm

where:
cond

is an optional condition code.
size

must be 8, 16, or 32.
Qd

specifies the destination register for a quadword operation.
Dd

specifies the destination register for a doubleword operation.
Dm[x]

specifies the Advanced SIMD scalar.
Rm

specifies the ARM register. Rm must not be PC.
Operation
VDUP duplicates a scalar into every element of the destination vector. The source can be an Advanced
SIMD scalar or an ARM register.
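As a rough illustration of the operation, the following Python sketch returns the lanes of the destination vector, lane 0 first. The function name is hypothetical. Taking the least significant size bits of the source value is an assumption for the ARM register form.

```python
def vdup(value, size, quad=False):
    """Replicate the low `size` bits of value into every lane of a D or Q register."""
    if size not in (8, 16, 32):
        raise ValueError("size must be 8, 16, or 32")
    lanes = (128 if quad else 64) // size
    element = value & ((1 << size) - 1)   # assumption: low bits of the source
    return [element] * lanes
```

For instance, vdup(0x12345678, 8) models VDUP.8 Dd, Rm with Rm = 0x12345678 and yields eight lanes of 0x78.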

Related references
7.11 Condition code suffixes on page 7-150.

14.43 VEOR
Vector Bitwise Exclusive OR.
Syntax
VEOR{cond}{.datatype} {Qd}, Qn, Qm
VEOR{cond}{.datatype} {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

is an optional data type. The assembler ignores datatype.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VEOR performs a logical exclusive OR between two registers, and places the result in the destination
register.

14.44 VEXT
Vector Extract.
Syntax
VEXT{cond}.8 {Qd}, Qn, Qm, #imm
VEXT{cond}.8 {Dd}, Dn, Dm, #imm

where:
cond

is an optional condition code.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
imm

is the number of 8-bit elements to extract from the bottom of the second operand vector, in the
range 0-7 for doubleword operations, or 0-15 for quadword operations.
Operation
VEXT extracts 8-bit elements from the bottom end of the second operand vector and the top end of the
first, concatenates them, and places the result in the destination vector. See the following figure for an
example:
Figure 14-2 Operation of doubleword VEXT for imm = 3

VEXT pseudo-instruction
You can specify a datatype of 16, 32, or 64 instead of 8. In this case, #imm refers to halfwords, words, or
doublewords instead of referring to bytes, and the permitted ranges are correspondingly reduced.
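The extraction can be sketched in Python by treating each vector as a list of bytes, with index 0 as the least significant byte. The function name and the list representation are illustrative only. The size argument models the pseudo-instruction forms, where imm counts halfwords, words, or doublewords.

```python
def vext(vn, vm, imm, size=8):
    """Model VEXT.<size> Vd, Vn, Vm, #imm on byte lists (index 0 = lowest byte)."""
    nbytes = len(vn)
    if len(vm) != nbytes or nbytes not in (8, 16):
        raise ValueError("operands must both be 8 bytes (D) or 16 bytes (Q)")
    shift = imm * (size // 8)            # convert the element count to bytes
    if not 0 <= shift < nbytes:
        raise ValueError("imm out of range for this operand size")
    # Top end of the first operand fills the low lanes of the result,
    # bottom end of the second operand fills the high lanes.
    return vn[shift:] + vm[:shift]
```

With vn holding bytes 0 to 7, vm holding bytes 8 to 15, and imm = 3, the result holds bytes 3 to 10: five bytes from the top of the first operand followed by three from the bottom of the second.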
Related references
7.11 Condition code suffixes on page 7-150.

14.45 VFMA, VFMS
Vector Fused Multiply Accumulate, Vector Fused Multiply Subtract.
Syntax
Vop{cond}.F32 {Qd}, Qn, Qm
Vop{cond}.F32 {Dd}, Dn, Dm

where:
op

is one of FMA or FMS.
cond

is an optional condition code.
Dd, Dn, Dm

are the destination and operand vectors for doubleword operation.
Qd, Qn, Qm

are the destination and operand vectors for quadword operation.
Operation
VFMA multiplies corresponding elements in the two operand vectors, and accumulates the results into the
elements of the destination vector. The result of the multiply is not rounded before the accumulation.
VFMS multiplies corresponding elements in the two operand vectors, then subtracts the products from the
corresponding elements of the destination vector, and places the final results in the destination vector.
The result of the multiply is not rounded before the subtraction.
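The effect of not rounding the product can be seen in a small Python model of one F32 lane. The helper names are hypothetical. The fused functions form the exact result with rational arithmetic and round once. Rounding passes through a double on the way to single precision, which is adequate for this illustration but is not a bit-exact model for every input. The unfused function, included only for comparison, rounds the product before the addition.

```python
import struct
from fractions import Fraction

def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def vfma_lane(d, n, m):
    """Fused multiply accumulate: d + n*m is formed exactly, then rounded."""
    return to_f32(float(Fraction(d) + Fraction(n) * Fraction(m)))

def vfms_lane(d, n, m):
    """Fused multiply subtract: d - n*m is formed exactly, then rounded."""
    return to_f32(float(Fraction(d) - Fraction(n) * Fraction(m)))

def unfused_mla_lane(d, n, m):
    """Comparison only: the product is rounded to F32 before the addition."""
    return to_f32(d + to_f32(n * m))

# Example inputs, all exactly representable in single precision.
n = m = 1.0 + 2.0**-12
d = -(1.0 + 2.0**-11)
fused = vfma_lane(d, n, m)           # keeps the low-order term of the product
unfused = unfused_mla_lane(d, n, m)  # the low-order term is rounded away
```

With these inputs the fused result is 2^-24, while the unfused result is 0.0.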

Related references
14.74 VMUL on page 14-682.
7.11 Condition code suffixes on page 7-150.

14.46 VHADD
Vector Halving Add.
Syntax
VHADD{cond}.datatype {Qd}, Qn, Qm
VHADD{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VHADD adds corresponding elements in two vectors, shifts each result right one bit, and places the results
in the destination vector. Results are truncated.
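A minimal Python sketch of the halving operation follows, with VHSUB included for comparison. The function names are hypothetical. The sketch assumes that the sum or difference is formed without overflow before the shift, and that truncation discards the bit shifted out (an arithmetic shift right).

```python
def _wrap(value, bits, signed):
    """Reduce value to `bits` bits and reinterpret it as signed if requested."""
    value &= (1 << bits) - 1
    if signed and value >> (bits - 1):
        value -= 1 << bits
    return value

def vhadd(a, b, bits=8, signed=True):
    """Per-lane (a + b) >> 1, computed at full precision before the shift."""
    return [_wrap((x + y) >> 1, bits, signed) for x, y in zip(a, b)]

def vhsub(a, b, bits=8, signed=True):
    """Per-lane (a - b) >> 1, computed at full precision before the shift."""
    return [_wrap((x - y) >> 1, bits, signed) for x, y in zip(a, b)]
```

For S8 lanes, vhadd([127, -3], [127, 0]) gives [127, -2]: the first lane does not overflow, and the second lane truncates -1.5 to -2.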

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.47 VHSUB
Vector Halving Subtract.
Syntax
VHSUB{cond}.datatype {Qd}, Qn, Qm
VHSUB{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VHSUB subtracts the elements of one vector from the corresponding elements of another vector, shifts
each result right one bit, and places the results in the destination vector. Results are always truncated.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.48 VLDn (single n-element structure to one lane)
Vector Load single n-element structure to one lane.
Syntax
VLDn{cond}.datatype list, [Rn{@align}]{!}
VLDn{cond}.datatype list, [Rn{@align}], Rm

where:
n

must be one of 1, 2, 3, or 4.
cond

is an optional condition code.
datatype

see the following table.
list

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for
options.
Rn

is the ARM register containing the base address. Rn cannot be PC.
align

specifies an optional alignment. See the following table for options.
!

if ! is present, Rn is updated to (Rn + the number of bytes transferred by the instruction). The
update occurs after all the loads have taken place.
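Rm

is an ARM register containing an offset from the base address. If Rm is present, the instruction
updates Rn to (Rn + Rm) after using the address to access memory. Rm cannot be SP or PC.
Operation
VLDn loads one n-element structure from memory into one or more Advanced SIMD registers. Elements
of the register that are not loaded are unaltered.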
Table 14-4 Permitted combinations of parameters for VLDn (single n-element structure to one lane)

n  datatype  list ag                                    align ah     alignment
1  8         {Dd[x]}                                    -            Standard only
   16        {Dd[x]}                                    @16          2-byte
   32        {Dd[x]}                                    @32          4-byte
2  8         {Dd[x], D(d+1)[x]}                         @16          2-byte
   16        {Dd[x], D(d+1)[x]}                         @32          4-byte
             {Dd[x], D(d+2)[x]}                         @32          4-byte
   32        {Dd[x], D(d+1)[x]}                         @64          8-byte
             {Dd[x], D(d+2)[x]}                         @64          8-byte
3  8         {Dd[x], D(d+1)[x], D(d+2)[x]}              -            Standard only
   16 or 32  {Dd[x], D(d+1)[x], D(d+2)[x]}              -            Standard only
             {Dd[x], D(d+2)[x], D(d+4)[x]}              -            Standard only
4  8         {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]}   @32          4-byte
   16        {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]}   @64          8-byte
             {Dd[x], D(d+2)[x], D(d+4)[x], D(d+6)[x]}   @64          8-byte
   32        {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]}   @64 or @128  8-byte or 16-byte
             {Dd[x], D(d+2)[x], D(d+4)[x], D(d+6)[x]}   @64 or @128  8-byte or 16-byte

ag Every register in the list must be in the range D0-D31.
ah align can be omitted. In this case, standard alignment rules apply.

Related concepts
14.4 Interleaving provided by load and store element and structure instructions on page 14-609.
Related references
7.11 Condition code suffixes on page 7-150.
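A single-lane load can be modeled in Python as follows. This is a sketch under stated assumptions: the function name is hypothetical, memory is a little-endian bytes object, and each D register is represented as a list of lanes. Only lane x of each listed register is written.

```python
def vldn_one_lane(memory, addr, regs, lane, size, n):
    """Load one n-element structure into lane `lane` of the n registers in regs.

    memory: bytes; addr: base address (Rn); size: element size in bits.
    Returns the address after the transfer, the value Rn takes when ! is used.
    """
    ebytes = size // 8
    for i in range(n):
        start = addr + i * ebytes
        regs[i][lane] = int.from_bytes(memory[start:start + ebytes], "little")
    return addr + n * ebytes
```

For example, a model of VLD3.16 {D0[2], D1[2], D2[2]}, [Rn]! reads three consecutive halfwords and advances the address by 6, leaving every other lane of D0 to D2 unchanged.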

14.49 VLDn (single n-element structure to all lanes)
Vector Load single n-element structure to all lanes.
Syntax
VLDn{cond}.datatype list, [Rn{@align}]{!}
VLDn{cond}.datatype list, [Rn{@align}], Rm

where:
n

must be one of 1, 2, 3, or 4.
cond

is an optional condition code.
datatype

see the following table.
list

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for
options.
Rn

is the ARM register containing the base address. Rn cannot be PC.
align

specifies an optional alignment. See the following table for options.
!

if ! is present, Rn is updated to (Rn + the number of bytes transferred by the instruction). The
update occurs after all the loads have taken place.
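Rm

is an ARM register containing an offset from the base address. If Rm is present, the instruction
updates Rn to (Rn + Rm) after using the address to access memory. Rm cannot be SP or PC.
Operation
VLDn loads multiple copies of one n-element structure from memory into one or more Advanced SIMD
registers.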
Table 14-5 Permitted combinations of parameters for VLDn (single n-element structure to all lanes)

n  datatype      list ai                                align aj     alignment
1  8             {Dd[]}                                 -            Standard only
                 {Dd[],D(d+1)[]}                        -            Standard only
   16            {Dd[]}                                 @16          2-byte
                 {Dd[],D(d+1)[]}                        @16          2-byte
   32            {Dd[]}                                 @32          4-byte
                 {Dd[],D(d+1)[]}                        @32          4-byte
2  8             {Dd[], D(d+1)[]}                       -            byte
                 {Dd[], D(d+2)[]}                       -            byte
   16            {Dd[], D(d+1)[]}                       @16          2-byte
                 {Dd[], D(d+2)[]}                       @16          2-byte
   32            {Dd[], D(d+1)[]}                       @32          4-byte
                 {Dd[], D(d+2)[]}                       @32          4-byte
3  8, 16, or 32  {Dd[], D(d+1)[], D(d+2)[]}             -            Standard only
                 {Dd[], D(d+2)[], D(d+4)[]}             -            Standard only
4  8             {Dd[], D(d+1)[], D(d+2)[], D(d+3)[]}   @32          4-byte
                 {Dd[], D(d+2)[], D(d+4)[], D(d+6)[]}   @32          4-byte
   16            {Dd[], D(d+1)[], D(d+2)[], D(d+3)[]}   @64          8-byte
                 {Dd[], D(d+2)[], D(d+4)[], D(d+6)[]}   @64          8-byte
   32            {Dd[], D(d+1)[], D(d+2)[], D(d+3)[]}   @64 or @128  8-byte or 16-byte
                 {Dd[], D(d+2)[], D(d+4)[], D(d+6)[]}   @64 or @128  8-byte or 16-byte

ai Every register in the list must be in the range D0-D31.
aj align can be omitted. In this case, standard alignment rules apply.

Related concepts
14.4 Interleaving provided by load and store element and structure instructions on page 14-609.
Related references
7.11 Condition code suffixes on page 7-150.
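The all-lanes form can be sketched in Python as a load of one structure followed by replication. The function name is hypothetical, memory is assumed to be a little-endian bytes object, and the sketch covers only lists that name exactly n D registers.

```python
def vldn_all_lanes(memory, addr, size, n):
    """Load one n-element structure and copy element i into every lane of register i.

    Returns a list of n registers, each a list of 64 // size lanes.
    """
    ebytes = size // 8
    lanes = 64 // size
    regs = []
    for i in range(n):
        start = addr + i * ebytes
        element = int.from_bytes(memory[start:start + ebytes], "little")
        regs.append([element] * lanes)
    return regs
```

A model of VLD2.8 {D0[], D1[]}, [Rn] with bytes 7 and 9 in memory fills D0 with eight copies of 7 and D1 with eight copies of 9.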

14.50 VLDn (multiple n-element structures)
Vector Load multiple n-element structures.
Syntax
VLDn{cond}.datatype list, [Rn{@align}]{!}
VLDn{cond}.datatype list, [Rn{@align}], Rm

where:
n

must be one of 1, 2, 3, or 4.
cond

is an optional condition code.
datatype

see the following table for options.
list

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for
options.
Rn

is the ARM register containing the base address. Rn cannot be PC.
align

specifies an optional alignment. See the following table for options.
!

if ! is present, Rn is updated to (Rn + the number of bytes transferred by the instruction). The
update occurs after all the loads have taken place.
Rm

is an ARM register containing an offset from the base address. If Rm is present, the instruction
updates Rn to (Rn + Rm) after using the address to access memory. Rm cannot be SP or PC.
Operation
VLDn loads multiple n-element structures from memory into one or more Advanced SIMD registers, with
de-interleaving (unless n == 1). Every element of each register is loaded.
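De-interleaving can be illustrated with a short Python model. It is a sketch only: the function name is hypothetical, memory is assumed to be a little-endian bytes object, and the model covers the case where the list names exactly n D registers, so that element e of every structure goes to register e.

```python
def vldn_multiple(memory, addr, size, n):
    """Load 64 // size structures of n elements each, de-interleaving into n registers."""
    ebytes = size // 8
    lanes = 64 // size
    regs = [[0] * lanes for _ in range(n)]
    for s in range(lanes):              # structure index, becomes the lane number
        for e in range(n):              # element within the structure
            start = addr + (s * n + e) * ebytes
            regs[e][s] = int.from_bytes(memory[start:start + ebytes], "little")
    return regs
```

For 24 bytes of packed three-byte pixels, a model of VLD3.8 {D0, D1, D2}, [Rn] places the first byte of each pixel in D0, the second in D1, and the third in D2.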
Table 14-6 Permitted combinations of parameters for VLDn (multiple n-element structures)

n  datatype           list ak                        align al            alignment
1  8, 16, 32, or 64   {Dd}                           @64                 8-byte
                      {Dd, D(d+1)}                   @64 or @128         8-byte or 16-byte
                      {Dd, D(d+1), D(d+2)}           @64                 8-byte
                      {Dd, D(d+1), D(d+2), D(d+3)}   @64, @128, or @256  8-byte, 16-byte, or 32-byte
2  8, 16, or 32       {Dd, D(d+1)}                   @64 or @128         8-byte or 16-byte
                      {Dd, D(d+2)}                   @64 or @128         8-byte or 16-byte
                      {Dd, D(d+1), D(d+2), D(d+3)}   @64, @128, or @256  8-byte, 16-byte, or 32-byte
3  8, 16, or 32       {Dd, D(d+1), D(d+2)}           @64                 8-byte
                      {Dd, D(d+2), D(d+4)}           @64                 8-byte
4  8, 16, or 32       {Dd, D(d+1), D(d+2), D(d+3)}   @64, @128, or @256  8-byte, 16-byte, or 32-byte
                      {Dd, D(d+2), D(d+4), D(d+6)}   @64, @128, or @256  8-byte, 16-byte, or 32-byte

ak Every register in the list must be in the range D0-D31.
al align can be omitted. In this case, standard alignment rules apply.

Related concepts
14.4 Interleaving provided by load and store element and structure instructions on page 14-609.
Related references
7.11 Condition code suffixes on page 7-150.

14.51 VLDM
Extension register load multiple.
Syntax
VLDMmode{cond} Rn{!}, Registers

where:
mode

must be one of:
IA

meaning Increment address After each transfer. IA is the default, and can be omitted.
DB

meaning Decrement address Before each transfer.
EA

meaning Empty Ascending stack operation. This is the same as DB for loads.
FD

meaning Full Descending stack operation. This is the same as IA for loads.
cond

is an optional condition code.
Rn

is the ARM register holding the base address for the transfer.
!

is optional. ! specifies that the updated base address must be written back to Rn. If ! is not
specified, mode must be IA.
Registers

is a list of consecutive extension registers enclosed in braces, { and }. The list can be
comma-separated, or in range format. There must be at least one register in the list.
You can specify D or Q registers, but they must not be mixed. The number of registers must not
exceed 16 D registers, or 8 Q registers. If Q registers are specified, on disassembly they are shown
as D registers.
Note
VPOP Registers is equivalent to VLDM sp!, Registers.

You can use either form of this instruction. They both disassemble to VPOP.
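The addressing modes can be sketched in Python as a calculation of the addresses that are read and of the final value of Rn. The function name is hypothetical, and the sketch assumes a list of D registers, each occupying 8 bytes, loaded from ascending addresses.

```python
def vldm_addresses(rn, count, mode="IA", writeback=False):
    """Return (addresses read, final Rn) for a VLDM of `count` D registers."""
    nbytes = 8 * count
    if mode in ("IA", "FD"):            # FD is the same as IA for loads
        start, updated = rn, rn + nbytes
    elif mode in ("DB", "EA"):          # EA is the same as DB for loads
        if not writeback:
            raise ValueError("if ! is not specified, mode must be IA")
        start, updated = rn - nbytes, rn - nbytes
    else:
        raise ValueError("mode must be IA, DB, EA, or FD")
    addresses = [start + 8 * i for i in range(count)]
    return addresses, (updated if writeback else rn)
```

With Rn = 0x1000 and two D registers, the IA form with ! reads 0x1000 and 0x1008 and leaves Rn at 0x1010, which matches the behavior of VPOP when Rn is sp.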

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
Related references
7.11 Condition code suffixes on page 7-150.
15.14 VLDM (floating-point) on page 15-766.

14.52 VLDR
Extension register load.
Syntax
VLDR{cond}{.64} Dd, [Rn{, #offset}]
VLDR{cond}{.64} Dd, label

where:
cond

is an optional condition code.
Dd

is the extension register to be loaded.
Rn

is the ARM register holding the base address for the transfer.
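offset

is an optional numeric expression. It must evaluate to a numeric value at assembly time. The
value must be a multiple of 4, and lie in the range –1020 to +1020. The value is added to the
base address to form the address used for the transfer.
label

is a PC-relative expression.
label must be aligned on a word boundary within ±1KB of the current instruction.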

Operation
The VLDR instruction loads an extension register from memory.
Two words are transferred.
There is also a VLDR pseudo-instruction.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
14.54 VLDR pseudo-instruction on page 14-662.
7.11 Condition code suffixes on page 7-150.
15.15 VLDR (floating-point) on page 15-767.

14.53 VLDR (post-increment and pre-decrement)
Pseudo-instruction that loads extension registers, with post-increment and pre-decrement forms.
Note
There are also VLDR and VSTR instructions without post-increment and pre-decrement.

Syntax
VLDR{cond}{.64} Dd, [Rn], #offset ; post-increment
VLDR{cond}{.64} Dd, [Rn, #-offset]! ; pre-decrement

where:
cond

is an optional condition code.
Dd

is the extension register to load.
Rn

is the ARM register holding the base address for the transfer.
offset

is a numeric expression that must evaluate to 8 at assembly time.
Operation
The post-increment instruction increments the base address in the register by the offset value, after the
transfer. The pre-decrement instruction decrements the base address in the register by the offset value,
and then performs the transfer using the new address in the register. This pseudo-instruction assembles to
a VLDM instruction.
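The order of the transfer and the base register update in the two forms can be sketched in Python. The function names are hypothetical, and memory is assumed to be a little-endian bytes object indexed by address.

```python
def vldr_post_increment(memory, rn, offset=8):
    """VLDR Dd, [Rn], #offset: load from Rn, then add offset to Rn."""
    if offset != 8:
        raise ValueError("offset must evaluate to 8")
    value = int.from_bytes(memory[rn:rn + 8], "little")
    return value, rn + offset

def vldr_pre_decrement(memory, rn, offset=8):
    """VLDR Dd, [Rn, #-offset]!: subtract offset from Rn, then load from Rn."""
    if offset != 8:
        raise ValueError("offset must evaluate to 8")
    rn -= offset
    value = int.from_bytes(memory[rn:rn + 8], "little")
    return value, rn
```

Each function returns the loaded 64-bit value together with the updated base address.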
Related references
14.51 VLDM on page 14-659.
14.52 VLDR on page 14-660.
7.11 Condition code suffixes on page 7-150.
15.16 VLDR (post-increment and pre-decrement, floating-point) on page 15-768.

14.54 VLDR pseudo-instruction
The VLDR pseudo-instruction loads a constant value into every element of a 64-bit Advanced SIMD
vector.
Note
This description is for the VLDR pseudo-instruction only.

Syntax
VLDR{cond}.datatype Dd,=constant

where:
cond

is an optional condition code.
datatype

must be one of In, Sn, Un, or F32.
n

must be one of 8, 16, 32, or 64.
Dd

is the extension register to be loaded.
constant

is an immediate value of the appropriate type for datatype.
Usage
If an instruction (for example, VMOV) is available that can generate the constant directly into the register,
the assembler uses it. Otherwise, it generates a doubleword literal pool entry containing the constant and
loads the constant using a VLDR instruction.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.52 VLDR on page 14-660.
7.11 Condition code suffixes on page 7-150.
14.54 VLDR pseudo-instruction on page 14-662.

14.55 VMAX and VMIN
Vector Maximum, Vector Minimum.
Syntax
Vop{cond}.datatype Qd, Qn, Qm
Vop{cond}.datatype Dd, Dn, Dm

where:
op

must be either MAX or MIN.
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, U32, or F32.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VMAX compares corresponding elements in two vectors, and copies the larger of each pair into the
corresponding element in the destination vector.
VMIN compares corresponding elements in two vectors, and copies the smaller of each pair into the
corresponding element in the destination vector.

Floating-point maximum and minimum
max(+0.0, –0.0) = +0.0.
min(+0.0, –0.0) = –0.0.
If any input is a NaN, the corresponding result element is the default NaN.
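The rules for signed zeros and NaN inputs can be expressed as a small Python model of one F32 lane. The function names are hypothetical, and math.nan stands in for the default NaN.

```python
import math

def _is_negative(x):
    """True for negative numbers and for -0.0."""
    return math.copysign(1.0, x) < 0

def vmax_f32_lane(a, b):
    if math.isnan(a) or math.isnan(b):
        return math.nan                  # stands in for the default NaN
    if a == 0.0 and b == 0.0:            # max(+0.0, -0.0) = +0.0
        return -0.0 if _is_negative(a) and _is_negative(b) else 0.0
    return max(a, b)

def vmin_f32_lane(a, b):
    if math.isnan(a) or math.isnan(b):
        return math.nan                  # stands in for the default NaN
    if a == 0.0 and b == 0.0:            # min(+0.0, -0.0) = -0.0
        return -0.0 if _is_negative(a) or _is_negative(b) else 0.0
    return min(a, b)
```

The explicit zero handling is needed because +0.0 and –0.0 compare as equal in an ordinary comparison.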
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.86 VPADD on page 14-694.
7.11 Condition code suffixes on page 7-150.

14.56 VMAXNM, VMINNM
Vector Maximum, Vector Minimum.
Note
• These instructions are supported only in ARMv8.
• You cannot use VMAXNM or VMINNM inside an IT block.

Syntax
Vop.F32 Qd, Qn, Qm
Vop.F32 Dd, Dn, Dm

where:
op

must be either MAXNM or MINNM.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VMAXNM compares corresponding elements in two vectors, and copies the larger of each pair into the
corresponding element in the destination vector.
VMINNM compares corresponding elements in two vectors, and copies the smaller of each pair into the
corresponding element in the destination vector.

If one of the elements in a pair is a number and the other element is NaN, the corresponding result
element is the number. This is consistent with the IEEE 754-2008 standard.
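The NaN handling described above can be sketched in Python for one F32 lane. The function names are hypothetical. The sketch treats every NaN as a quiet NaN and returns NaN only when both inputs are NaN, which is an assumption beyond what this section states. It does not model signed zeros.

```python
import math

def vmaxnm_lane(a, b):
    if math.isnan(a):
        return b          # b if it is a number; NaN only if both inputs are NaN
    if math.isnan(b):
        return a
    return max(a, b)

def vminnm_lane(a, b):
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)
```

In contrast with VMAX and VMIN, a single NaN input does not propagate to the result.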

14.57 VMLA
Vector Multiply Accumulate.
Syntax
VMLA{cond}.datatype {Qd}, Qn, Qm
VMLA{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or F32.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VMLA multiplies corresponding elements in two vectors, and accumulates the results into the elements of
the destination vector.

Related concepts
9.11 Polynomial arithmetic over {0,1} on page 9-195.
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.58 VMLA (by scalar)
Vector Multiply by scalar and Accumulate.
Syntax
VMLA{cond}.datatype {Qd}, Qn, Dm[x]
VMLA{cond}.datatype {Dd}, Dn, Dm[x]

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or F32.
Qd, Qn

are the destination vector and the first operand vector, for a quadword operation.
Dd, Dn

are the destination vector and the first operand vector, for a doubleword operation.
Dm[x]

is the scalar holding the second operand.
Operation
VMLA multiplies each element in a vector by a scalar, and accumulates the results into the corresponding
elements of the destination vector.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.59 VMLAL (by scalar)
Vector Multiply by scalar and Accumulate Long.
Syntax
VMLAL{cond}.datatype Qd, Dn, Dm[x]

where:
cond

is an optional condition code.
datatype

must be one of S16, S32, U16, or U32.
Qd, Dn

are the destination vector and the first operand vector, for a long operation.
Dm[x]

is the scalar holding the second operand.
Operation
VMLAL multiplies each element in a vector by a scalar, and accumulates the results into the corresponding
elements of the destination vector.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.60 VMLAL
Vector Multiply Accumulate Long.
Syntax
VMLAL{cond}.datatype Qd, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a long
operation.
Operation
VMLAL multiplies corresponding elements in two vectors, and accumulates the results into the elements of
the destination vector.
Related concepts
9.11 Polynomial arithmetic over {0,1} on page 9-195.
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.61 VMLS (by scalar)
Vector Multiply by scalar and Subtract.
Syntax
VMLS{cond}.datatype {Qd}, Qn, Dm[x]
VMLS{cond}.datatype {Dd}, Dn, Dm[x]

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or F32.
Qd, Qn

are the destination vector and the first operand vector, for a quadword operation.
Dd, Dn

are the destination vector and the first operand vector, for a doubleword operation.
Dm[x]

is the scalar holding the second operand.
Operation
VMLS multiplies each element in a vector by a scalar, subtracts the results from the corresponding
elements of the destination vector, and places the final results in the destination vector.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.62 VMLS
Vector Multiply Subtract.
Syntax
VMLS{cond}.datatype {Qd}, Qn, Qm
VMLS{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or F32.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VMLS multiplies corresponding elements in two vectors, subtracts the results from corresponding
elements of the destination vector, and places the final results in the destination vector.

14.63 VMLSL
Vector Multiply Subtract Long.
Syntax
VMLSL{cond}.datatype Qd, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a long
operation.
Operation
VMLSL multiplies corresponding elements in two vectors, subtracts the results from corresponding
elements of the destination vector, and places the final results in the destination vector.
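Because this is a long operation, each product is formed at twice the width of the source elements before it is subtracted. A minimal Python sketch follows. The function name is hypothetical, and the sketch assumes that the result wraps at the width of the destination element.

```python
def vmlsl(qd, dn, dm, bits=16, signed=True):
    """Per-lane qd - dn*dm, where qd lanes are 2*bits wide and dn, dm lanes are bits wide."""
    wide = 2 * bits
    result = []
    for acc, x, y in zip(qd, dn, dm):
        r = (acc - x * y) & ((1 << wide) - 1)   # assumption: wrap at 2*bits
        if signed and r >> (wide - 1):
            r -= 1 << wide                       # reinterpret as signed
        result.append(r)
    return result
```

For example, vmlsl([0, 10], [300, -2], [300, 4]) models VMLSL.S16 and gives [-90000, 18]. The first product does not fit in 16 bits, but it does fit in the 32-bit destination lane.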

14.64 VMLSL (by scalar)
Vector Multiply by scalar and Subtract Long.
Syntax
VMLSL{cond}.datatype Qd, Dn, Dm[x]

where:
cond

is an optional condition code.
datatype

must be one of S16, S32, U16, or U32.
Qd, Dn

are the destination vector and the first operand vector, for a long operation.
Dm[x]

is the scalar holding the second operand.
Operation
VMLSL multiplies each element in a vector by a scalar, subtracts the results from the corresponding
elements of the destination vector, and places the final results in the destination vector.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.65 VMOV (immediate)
Vector Move.
Syntax
VMOV{cond}.datatype Qd, #imm
VMOV{cond}.datatype Dd, #imm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, I64, or F32.
Qd or Dd

is the Advanced SIMD register for the result.
imm

is an immediate value of the type specified by datatype. This is replicated to fill the destination
register.
Operation
VMOV replicates an immediate value in every element of the destination register.
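Only a restricted set of immediate values is available, as listed in Table 14-7. The following Python sketch checks whether a value matches the I32 patterns or the F32 rule given for that table. The function names are hypothetical, and the checks reflect only the patterns listed in the table.

```python
import math
from fractions import Fraction

def is_vmov_i32_imm(value):
    """True if value matches 0x000000XY, 0x0000XY00, 0x00XY0000, 0xXY000000,
    0x0000XYFF, or 0x00XYFFFF."""
    value &= 0xFFFFFFFF
    one_byte = any(value == (value & (0xFF << s)) for s in (0, 8, 16, 24))
    ones_low = (value & 0xFFFF00FF) == 0x000000FF or (value & 0xFF00FFFF) == 0x0000FFFF
    return one_byte or ones_low

def is_vmov_f32_imm(x):
    """True if x equals +/-n * 2**-r with integers 16 <= n <= 31 and 0 <= r <= 7."""
    if not math.isfinite(x):
        return False
    q = abs(Fraction(x))
    return any(q == Fraction(n, 2**r) for n in range(16, 32) for r in range(8))
```

For instance, 1.0 is available as an F32 immediate (16 * 2^-4) but 0.1 is not, and 0x00120034 is not available as an I32 immediate because it has two non-zero bytes.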
Table 14-7 Available immediate values in VMOV (immediate)

datatype  imm
I8        0xXY
I16       0x00XY, 0xXY00
I32       0x000000XY, 0x0000XY00, 0x00XY0000, 0xXY000000,
          0x0000XYFF, 0x00XYFFFF
I64       byte masks, 0xGGHHJJKKLLMMNNPP am
F32       floating-point numbers an

am Each of 0xGG, 0xHH, 0xJJ, 0xKK, 0xLL, 0xMM, 0xNN, and 0xPP must be either 0x00 or 0xFF.
an Any number that can be expressed as +/–n * 2^–r, where n and r are integers, 16 <= n <= 31, 0 <= r <= 7.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.66 VMOV (register)
Vector Move.
Syntax
VMOV{cond}{.datatype} Qd, Qm
VMOV{cond}{.datatype} Dd, Dm

where:
cond

is an optional condition code.
datatype

is an optional datatype. The assembler ignores datatype.
Qd, Qm

specifies the destination vector and the source vector, for a quadword operation.
Dd, Dm

specifies the destination vector and the source vector, for a doubleword operation.
Operation
VMOV copies the contents of the source register into the destination register.

Related references
7.11 Condition code suffixes on page 7-150.

14.67 VMOV (between two ARM registers and a 64-bit extension register)
Transfer contents between two ARM registers and a 64-bit extension register.
Syntax
VMOV{cond} Dm, Rd, Rn
VMOV{cond} Rd, Rn, Dm

where:
cond

is an optional condition code.
Dm

is a 64-bit extension register.
Rd, Rn

are the ARM registers. Rd and Rn must not be PC.
Operation
VMOV Dm, Rd, Rn transfers the contents of Rd into the low half of Dm, and the contents of Rn into the
high half of Dm.
VMOV Rd, Rn, Dm transfers the contents of the low half of Dm into Rd, and the contents of the high half of
Dm into Rn.
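The mapping between the two ARM registers and the halves of Dm can be sketched in Python, with register contents represented as unsigned integers. The function names are hypothetical.

```python
def vmov_to_d(rd, rn):
    """VMOV Dm, Rd, Rn: Rd goes to the low half of Dm, Rn to the high half."""
    return ((rn & 0xFFFFFFFF) << 32) | (rd & 0xFFFFFFFF)

def vmov_from_d(dm):
    """VMOV Rd, Rn, Dm: returns (Rd, Rn) = (low half of Dm, high half of Dm)."""
    return dm & 0xFFFFFFFF, (dm >> 32) & 0xFFFFFFFF
```

The two functions are inverses of each other for 32-bit inputs.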

Related references
7.11 Condition code suffixes on page 7-150.

14.68 VMOV (between an ARM register and an Advanced SIMD scalar)
Transfer contents between an ARM register and an Advanced SIMD scalar.
Syntax
VMOV{cond}{.size} Dn[x], Rd
VMOV{cond}{.datatype} Rd, Dn[x]

where:
cond

is an optional condition code.
size

the data size. Can be 8, 16, or 32. If omitted, size is 32.
datatype

the data type. Can be U8, S8, U16, S16, or 32. If omitted, datatype is 32.
Dn[x]

is the Advanced SIMD scalar.
Rd

is the ARM register. Rd must not be PC.
Operation
VMOV Dn[x], Rd transfers the contents of the least significant byte, halfword, or word of Rd into Dn[x].
VMOV Rd, Dn[x] transfers the contents of Dn[x] into the least significant byte, halfword, or word of Rd.
The remaining bits of Rd are either zero or sign extended.
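The extension applied when a scalar is moved into an ARM register can be modeled in Python. The function name is hypothetical. The sketch assumes that the S datatypes sign extend and that the U datatypes zero extend.

```python
def vmov_scalar_to_reg(element, datatype="32"):
    """Model VMOV.<datatype> Rd, Dn[x]; returns the 32-bit value written to Rd."""
    signed = datatype.startswith("S")
    bits = int(datatype.lstrip("SU"))
    if bits not in (8, 16, 32):
        raise ValueError("datatype must be U8, S8, U16, S16, or 32")
    element &= (1 << bits) - 1
    if signed and element >> (bits - 1):
        element |= 0xFFFFFFFF & ~((1 << bits) - 1)   # fill the upper bits with ones
    return element
```

For example, an 8-bit scalar of 0x80 becomes 0xFFFFFF80 with S8 and 0x00000080 with U8.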

Related concepts
9.15 Advanced SIMD scalars on page 9-199.
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.69 VMOVL
Vector Move Long.
Syntax
VMOVL{cond}.datatype Qd, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Dm

specifies the destination vector and the operand vector.
Operation
VMOVL takes each element in a doubleword vector, sign or zero extends them to twice their original
length, and places the results in a quadword vector.
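A minimal Python sketch of the lengthening move follows, with each lane shown as an unsigned bit pattern. The function name is hypothetical.

```python
def vmovl(dm, datatype):
    """Extend each lane of dm to twice its width; datatype is S8, S16, S32, U8, U16, or U32."""
    signed = datatype[0] == "S"
    bits = int(datatype[1:])
    result = []
    for e in dm:
        e &= (1 << bits) - 1
        if signed and e >> (bits - 1):
            e -= 1 << bits                        # interpret the lane as negative
        result.append(e & ((1 << (2 * bits)) - 1))  # bit pattern at double width
    return result
```

For example, the lane 0xFF becomes 0xFFFF with S8 and 0x00FF with U8.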

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.70 VMOVN
Vector Move and Narrow.
Syntax
VMOVN{cond}.datatype Dd, Qm

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or I64.
Dd, Qm

specifies the destination vector and the operand vector.
Operation
VMOVN copies the least significant half of each element of a quadword vector into the corresponding
elements of a doubleword vector.
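The narrowing can be sketched in Python as a mask of each source lane. The function name is hypothetical. The datatype gives the width of the source elements, as in the syntax above.

```python
def vmovn(qm, datatype):
    """Keep the least significant half of each lane; datatype is I16, I32, or I64."""
    bits = int(datatype[1:]) // 2
    return [e & ((1 << bits) - 1) for e in qm]
```

For example, vmovn([0x12345678, 0xFFFF0001], "I32") gives [0x5678, 0x0001].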

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.71 VMOV2
Pseudo-instruction that generates an immediate value and places it in every element of an Advanced
SIMD vector, without loading a value from a literal pool.
Syntax
VMOV2{cond}.datatype Qd, #constant
VMOV2{cond}.datatype Dd, #constant

where:
datatype

must be one of:
• I8, I16, I32, or I64.
• S8, S16, S32, or S64.
• U8, U16, U32, or U64.
• F32.
cond

is an optional condition code.
Qd or Dd

is the extension register to be loaded.
constant

is an immediate value of the appropriate type for datatype.
Operation
VMOV2 can generate any 16-bit immediate value, and a restricted range of 32-bit and 64-bit immediate
values.
VMOV2 is a pseudo-instruction that always assembles to exactly two instructions. It typically assembles to
a VMOV or VMVN instruction, followed by a VBIC or VORR instruction.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.65 VMOV (immediate) on page 14-673.
14.17 VBIC (immediate) on page 14-622.
7.11 Condition code suffixes on page 7-150.

14.72

VMRS
Transfer contents from an Advanced SIMD system register to an ARM register.
Syntax
VMRS{cond} Rd, extsysreg

where:
cond

is an optional condition code.
extsysreg

is the Advanced SIMD and floating-point system register, usually FPSCR, FPSID, or FPEXC.
Rd

is the ARM register. Rd must not be PC.
It can be APSR_nzcv, if extsysreg is FPSCR. In this case, the floating-point status flags are
transferred into the corresponding flags in the ARM APSR.
Usage
The VMRS instruction transfers the contents of extsysreg into Rd.
Note
The instruction stalls the processor until all current Advanced SIMD or floating-point operations
complete.

Example
        VMRS    r2,FPSID
        VMRS    APSR_nzcv, FPSCR     ; transfer FP status register to ARM APSR

Related references
9.17 Advanced SIMD system registers in AArch32 state on page 9-201.
7.11 Condition code suffixes on page 7-150.
15.25 VMRS (floating-point) on page 15-777.

14.73

VMSR
Transfer contents of an ARM register to an Advanced SIMD system register.
Syntax
VMSR{cond} extsysreg, Rd

where:
cond

is an optional condition code.
extsysreg

is the Advanced SIMD and floating-point system register, usually FPSCR, FPSID, or FPEXC.
Rd

is the ARM register. Rd must not be PC.
APSR_nzcv is not a valid operand for VMSR; that form applies only to VMRS.
Usage
The VMSR instruction transfers the contents of Rd into extsysreg.
Note
The instruction stalls the processor until all current Advanced SIMD operations complete.

Example
        VMSR    FPSCR, r4

Related references
9.17 Advanced SIMD system registers in AArch32 state on page 9-201.
7.11 Condition code suffixes on page 7-150.
15.26 VMSR (floating-point) on page 15-778.

14.74

VMUL
Vector Multiply.
Syntax
VMUL{cond}.datatype {Qd}, Qn, Qm
VMUL{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, F32, or P8.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VMUL multiplies corresponding elements in two vectors, and places the results in the destination vector.

14.75

VMUL (by scalar)
Vector Multiply by scalar.
Syntax
VMUL{cond}.datatype {Qd}, Qn, Dm[x]
VMUL{cond}.datatype {Dd}, Dn, Dm[x]

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or F32.
Qd, Qn

are the destination vector and the first operand vector, for a quadword operation.
Dd, Dn

are the destination vector and the first operand vector, for a doubleword operation.
Dm[x]

is the scalar holding the second operand.
Operation
VMUL multiplies each element in a vector by a scalar, and places the results in the destination vector.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.76

VMULL
Vector Multiply Long.
Syntax
VMULL{cond}.datatype Qd, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of U8, U16, U32, S8, S16, S32, or P8.
Qd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a long
operation.
Operation
VMULL multiplies corresponding elements in two vectors, and places the results in the destination vector.
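
The "long" part of the operation means each product is kept at twice the source element width. A minimal Python sketch for the integer datatypes follows; the name vmull and the raw-lane lists are assumptions for illustration, and the P8 polynomial form is not modelled.

```python
def vmull(a, b, bits, signed):
    """Model VMULL.S<bits>/U<bits>: full-width product of each lane pair."""
    def value(raw):
        raw &= (1 << bits) - 1
        return raw - (1 << bits) if signed and raw >> (bits - 1) else raw

    wide_mask = (1 << (2 * bits)) - 1
    return [(value(x) * value(y)) & wide_mask for x, y in zip(a, b)]

# 0xFF is 255 for U8 but -1 for S8, so the 16-bit products differ.
print([hex(x) for x in vmull([0xFF], [0xFF], 8, signed=False)])
print([hex(x) for x in vmull([0xFF], [0xFF], 8, signed=True)])
```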

14.77

VMULL (by scalar)
Vector Multiply Long by scalar.
Syntax
VMULL{cond}.datatype Qd, Dn, Dm[x]

where:
cond

is an optional condition code.
datatype

must be one of S16, S32, U16, or U32.
Qd, Dn

are the destination vector and the first operand vector, for a long operation.
Dm[x]

is the scalar holding the second operand.
Operation
VMULL multiplies each element in a vector by a scalar, and places the results in the destination vector.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.78

VMVN (register)
Vector Move NOT (register).
Syntax
VMVN{cond}{.datatype} Qd, Qm
VMVN{cond}{.datatype} Dd, Dm

where:
cond

is an optional condition code.
datatype

is an optional datatype. The assembler ignores datatype.
Qd, Qm

specifies the destination vector and the source vector, for a quadword operation.
Dd, Dm

specifies the destination vector and the source vector, for a doubleword operation.
Operation
VMVN inverts the value of each bit from the source register and places the results into the destination
register.

Related references
7.11 Condition code suffixes on page 7-150.

14.79

VMVN (immediate)
Vector Move NOT (immediate).
Syntax
VMVN{cond}.datatype Qd, #imm
VMVN{cond}.datatype Dd, #imm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, I64, or F32.
Qd or Dd

is the Advanced SIMD register for the result.
imm

is an immediate value of the type specified by datatype. This is replicated to fill the destination
register.
Operation
VMVN inverts the value of each bit from an immediate value and places the results into each element in the
destination register.
Table 14-8 Available immediate values in VMVN (immediate)
datatype   imm
I8         -
I16        0xFFXY, 0xXYFF
I32        0xFFFFFFXY, 0xFFFFXYFF, 0xFFXYFFFF, 0xXYFFFFFF,
           0xFFFFXY00, 0xFFXY0000
I64        -
F32        -

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.80

VNEG
Vector Negate.
Syntax
VNEG{cond}.datatype Qd, Qm
VNEG{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, or F32.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
Operation
VNEG negates each element in a vector, and places the results in a second vector. (The floating-point
version only inverts the sign bit.)

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
15.28 VNEG (floating-point) on page 15-780.
7.11 Condition code suffixes on page 7-150.

14.81

VORN (register)
Vector bitwise OR NOT (register).
Syntax
VORN{cond}{.datatype} {Qd}, Qn, Qm
VORN{cond}{.datatype} {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

is an optional data type. The assembler ignores datatype.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VORN performs a bitwise logical OR complement between two registers, and places the results in the
destination register.
Related references
7.11 Condition code suffixes on page 7-150.

14.82

VORN (immediate)
Vector bitwise OR NOT (immediate) pseudo-instruction.
Syntax
VORN{cond}.datatype Qd, #imm
VORN{cond}.datatype Dd, #imm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or I64.
Qd or Dd

is the Advanced SIMD register for the result.
imm

is the immediate value.
Operation
VORN takes each element of the destination vector, performs a bitwise OR complement with an immediate
value, and returns the results in the destination vector.
Note
On disassembly, this pseudo-instruction is disassembled to a corresponding VORR instruction, with a
complementary immediate value.

Immediate values
If datatype is I16, the immediate value must have one of the following forms:
• 0xFFXY.
• 0xXYFF.

If datatype is I32, the immediate value must have one of the following forms:
• 0xFFFFFFXY.
• 0xFFFFXYFF.
• 0xFFXYFFFF.
• 0xXYFFFFFF.
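
The relationship between VORN (immediate) and VORR (immediate) can be sketched as follows. The function names and the list-of-lanes representation are assumptions for this example. The sketch also suggests why the permitted VORN forms contain 0xFF bytes: their complements are the single-byte patterns that VORR (immediate) accepts.

```python
def vorr_imm(lanes, imm, bits):
    """Model VORR.I<bits> Dd, #imm on raw lane values."""
    mask = (1 << bits) - 1
    return [(x | imm) & mask for x in lanes]

def vorn_imm(lanes, imm, bits):
    """Model VORN.I<bits> Dd, #imm as VORR with the complemented immediate."""
    mask = (1 << bits) - 1
    return vorr_imm(lanes, ~imm & mask, bits)

# VORN.I16 with #0xFF34 behaves like VORR.I16 with #0x00CB.
print([hex(x) for x in vorn_imm([0x1200], 0xFF34, 16)])
```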
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.84 VORR (immediate) on page 14-692.
7.11 Condition code suffixes on page 7-150.

14.83

VORR (register)
Vector bitwise OR (register).
Syntax
VORR{cond}{.datatype} {Qd}, Qn, Qm
VORR{cond}{.datatype} {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

is an optional data type. The assembler ignores datatype.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Note
VORR with the same register for both operands is a VMOV instruction. You can use VORR in this way, but
disassembly of the resulting code produces the VMOV syntax.

Operation
VORR performs a bitwise logical OR between two registers, and places the result in the destination
register.

Related references
14.66 VMOV (register) on page 14-674.
7.11 Condition code suffixes on page 7-150.

14.84

VORR (immediate)
Vector bitwise OR immediate.
Syntax
VORR{cond}.datatype Qd, #imm
VORR{cond}.datatype Dd, #imm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or I64.
Qd or Dd

is the Advanced SIMD register for the source and result.
imm

is the immediate value.
Operation
VORR takes each element of the destination vector, performs a bitwise logical OR with an immediate
value, and places the results in the destination vector.
Immediate values
You can either specify imm as a pattern which the assembler repeats to fill the destination register, or you
can directly specify the immediate value (that conforms to the pattern) in full. The pattern for imm
depends on the datatype, as shown in the following table:
Table 14-9 Patterns for immediate value in VORR (immediate)
I16      I32
0x00XY   0x000000XY
0xXY00   0x0000XY00
-        0x00XY0000
-        0xXY000000

14.85

VPADAL
Vector Pairwise Add and Accumulate Long.
Syntax
VPADAL{cond}.datatype Qd, Qm
VPADAL{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Qm

are the destination vector and the operand vector, for a quadword instruction.
Dd, Dm

are the destination vector and the operand vector, for a doubleword instruction.
Operation
VPADAL adds adjacent pairs of elements of a vector, and accumulates the results into the elements of
the destination vector.
Figure 14-3 Example of operation of VPADAL (in this case for data type S16)

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.86

VPADD
Vector Pairwise Add.
Syntax
VPADD{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or F32.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector.
Operation
VPADD adds adjacent pairs of elements of two vectors, and places the results in the destination vector.
Figure 14-4 Example of operation of VPADD (in this case, for data type I16)
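
In place of the figure, the pairing can be sketched in Python. The name vpadd and the list-of-lanes representation (element 0 first) are assumptions for illustration. The sketch also assumes that the pair sums from Dn fill the lower half of Dd and those from Dm the upper half, which is this editor's reading of the architecture and is not stated in this section.

```python
def vpadd(dn, dm, bits):
    """Model VPADD.I<bits> Dd, Dn, Dm on lists of raw lane values."""
    mask = (1 << bits) - 1
    src = dn + dm   # assumed order: Dn pairs first, then Dm pairs
    return [(src[i] + src[i + 1]) & mask for i in range(0, len(src), 2)]

# I16 example: four lanes per doubleword register; sums wrap at 16 bits.
print(vpadd([1, 2, 3, 4], [10, 20, 30, 0xFFFF], 16))
```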

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.87

VPADDL
Vector Pairwise Add Long.
Syntax
VPADDL{cond}.datatype Qd, Qm
VPADDL{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Qm

are the destination vector and the operand vector, for a quadword instruction.
Dd, Dm

are the destination vector and the operand vector, for a doubleword instruction.
Operation
VPADDL adds adjacent pairs of elements of a vector, sign or zero extends the results to twice their original
width, and places the final results in the destination vector.
Figure 14-5 Example of operation of doubleword VPADDL (in this case, for data type S16)
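
A minimal Python sketch of the doubleword form follows. The name vpaddl and the raw-lane lists are assumptions for illustration. Because each result lane is twice as wide as the source lanes, a pair sum cannot overflow.

```python
def vpaddl(dm, bits, signed):
    """Model VPADDL.S<bits>/U<bits> Dd, Dm; result lanes are 2*bits wide."""
    def value(raw):
        raw &= (1 << bits) - 1
        return raw - (1 << bits) if signed and raw >> (bits - 1) else raw

    wide_mask = (1 << (2 * bits)) - 1
    return [(value(dm[i]) + value(dm[i + 1])) & wide_mask
            for i in range(0, len(dm), 2)]

# 0xFFFF + 0xFFFF is 0x1FFFE for U16, but -2 (0xFFFFFFFE) for S16.
print([hex(x) for x in vpaddl([0xFFFF, 0xFFFF, 1, 2], 16, signed=False)])
print([hex(x) for x in vpaddl([0xFFFF, 0xFFFF, 1, 2], 16, signed=True)])
```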

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.88

VPMAX and VPMIN
Vector Pairwise Maximum, Vector Pairwise Minimum.
Syntax
VPop{cond}.datatype Dd, Dn, Dm

where:
op

must be either MAX or MIN.
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, U32, or F32.
Dd, Dn, Dm

are the destination doubleword vector, the first operand doubleword vector, and the second
operand doubleword vector.
Operation
VPMAX compares adjacent pairs of elements in two vectors, and copies the larger of each pair into the
corresponding element in the destination vector. Operands and results must be doubleword vectors.
VPMIN compares adjacent pairs of elements in two vectors, and copies the smaller of each pair into the
corresponding element in the destination vector. Operands and results must be doubleword vectors.
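
The pairwise selection can be sketched in Python. The names vpmax and vpmin are assumptions for illustration, lanes are given as already-decoded numeric values, and the placement of Dn results before Dm results is this editor's reading of the architecture rather than something stated in this section.

```python
def _pairwise(select, dn, dm):
    src = dn + dm   # assumed order: Dn pairs first, then Dm pairs
    return [select(src[i], src[i + 1]) for i in range(0, len(src), 2)]

def vpmax(dn, dm):
    """Model VPMAX: larger element of each adjacent pair."""
    return _pairwise(max, dn, dm)

def vpmin(dn, dm):
    """Model VPMIN: smaller element of each adjacent pair."""
    return _pairwise(min, dn, dm)

print(vpmax([1, 5, -3, 2], [7, 7, 0, 9]))
print(vpmin([1, 5, -3, 2], [7, 7, 0, 9]))
```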

14.89

VPOP
Pop extension registers from the stack.
Syntax
VPOP{cond} Registers

where:
cond

is an optional condition code.
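Registers

is a list of consecutive extension registers enclosed in braces, { and }.
Note
VPOP Registers is equivalent to VLDM sp!,Registers.
You can use either form of this instruction. They both disassemble to VPOP.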

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
Related references
7.11 Condition code suffixes on page 7-150.
14.90 VPUSH on page 14-698.
15.32 VPOP (floating-point) on page 15-784.

14.90

VPUSH
Push extension registers onto the stack.
Syntax
VPUSH{cond} Registers

where:
cond

is an optional condition code.
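Registers

is a list of consecutive extension registers enclosed in braces, { and }.
Note
VPUSH Registers is equivalent to VSTMDB sp!,Registers.
You can use either form of this instruction. They both disassemble to VPUSH.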

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
Related references
7.11 Condition code suffixes on page 7-150.
14.89 VPOP on page 14-697.
15.33 VPUSH (floating-point) on page 15-785.

14.91

VQABS
Vector Saturating Absolute.
Syntax
VQABS{cond}.datatype Qd, Qm
VQABS{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, or S32.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
Operation
VQABS takes the absolute value of each element in a vector, and places the results in a second vector.

The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.
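
Saturation matters for exactly one input per datatype: the most negative value, whose magnitude is not representable. A minimal Python sketch follows; the name vqabs, the use of already-decoded signed integers as lanes, and the returned (results, qc) pair are assumptions for illustration.

```python
def vqabs(lanes, bits):
    """Model VQABS.S<bits>; returns (results, qc)."""
    top = (1 << (bits - 1)) - 1          # largest signed value, e.g. 127 for S8
    out, qc = [], False
    for v in lanes:
        r = abs(v)
        if r > top:                      # only the most negative input gets here
            r, qc = top, True
        out.append(r)
    return out, qc

print(vqabs([-128, -5, 127], 8))
```

Here qc stands for the QC flag being set by this instruction. Because QC is sticky, a real FPSCR keeps the bit set until software clears it.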
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.92

VQADD
Vector Saturating Add.
Syntax
VQADD{cond}.datatype {Qd}, Qn, Qm
VQADD{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VQADD adds corresponding elements in two vectors, and places the results in the destination vector.
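
As a sketch of what "saturating" means here, each sum is clamped to the range of the datatype instead of wrapping. The names saturate and vqadd, the decoded-integer lanes, and the returned qc value (standing for the QC flag that the other saturating instructions in this chapter describe) are assumptions for illustration.

```python
def saturate(value, bits, signed):
    """Clamp value to the range of an S<bits> or U<bits> element."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value

def vqadd(a, b, bits, signed):
    """Model VQADD.S<bits>/U<bits>; returns (results, qc)."""
    out, qc = [], False
    for x, y in zip(a, b):
        r, sat = saturate(x + y, bits, signed)
        out.append(r)
        qc = qc or sat
    return out, qc

print(vqadd([100, -100, 1], [100, -100, 2], 8, signed=True))
print(vqadd([200], [100], 8, signed=False))
```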

14.93

VQDMLAL and VQDMLSL (by vector or by scalar)
Vector Saturating Doubling Multiply Accumulate Long, Vector Saturating Doubling Multiply Subtract
Long.
Syntax
VQDopL{cond}.datatype Qd, Dn, Dm
VQDopL{cond}.datatype Qd, Dn, Dm[x]

where:
op

must be one of:
MLA

Multiply Accumulate.
MLS

Multiply Subtract.
cond

is an optional condition code.
datatype

must be either S16 or S32.
Qd, Dn

are the destination vector and the first operand vector.
Dm

is the vector holding the second operand, for a by vector operation.
Dm[x]

is the scalar holding the second operand, for a by scalar operation.
Operation
These instructions multiply their operands and double the results. VQDMLAL adds the results to the values
in the destination register. VQDMLSL subtracts the results from the values in the destination register.
If any of the results overflow, they are saturated. The sticky QC flag (FPSCR bit[27]) is set if saturation
occurs.
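
The following Python sketch models one plausible reading of the operation: the doubled product is saturated to the wide element size first, and the accumulation is then saturated again. That two-stage order is an assumption based on this editor's understanding of the architecture pseudocode and is not stated in this section. The names sat_signed and vqdmlal are also hypothetical.

```python
def sat_signed(value, bits):
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value

def vqdmlal(acc, a, b, bits, subtract=False):
    """Model VQDMLAL (VQDMLSL if subtract=True) for S<bits> operands.

    acc lanes are 2*bits wide; all lanes are decoded signed integers.
    """
    wide = 2 * bits
    out, qc = [], False
    for d, x, y in zip(acc, a, b):
        product, sat1 = sat_signed(2 * x * y, wide)
        total, sat2 = sat_signed(d - product if subtract else d + product, wide)
        out.append(total)
        qc = qc or sat1 or sat2
    return out, qc

print(vqdmlal([10], [3], [4], 16))
print(vqdmlal([0], [-32768], [-32768], 16))   # doubling overflows 32 bits
```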
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.94

VQDMULH (by vector or by scalar)
Vector Saturating Doubling Multiply Returning High Half.
Syntax
VQDMULH{cond}.datatype {Qd}, Qn, Qm
VQDMULH{cond}.datatype {Dd}, Dn, Dm
VQDMULH{cond}.datatype {Qd}, Qn, Dm[x]
VQDMULH{cond}.datatype {Dd}, Dn, Dm[x]

where:
cond

is an optional condition code.
datatype

must be either S16 or S32.
Qd, Qn

are the destination vector and the first operand vector, for a quadword operation.
Dd, Dn

are the destination vector and the first operand vector, for a doubleword operation.
Qm or Dm

is the vector holding the second operand, for a by vector operation.
Dm[x]

is the scalar holding the second operand, for a by scalar operation.
Operation
VQDMULH multiplies corresponding elements in two vectors, doubles the results, and places the most
significant half of the final results in the destination vector.

The second operand can be a scalar instead of a vector.
If any of the results overflow, they are saturated. The sticky QC flag (FPSCR bit[27]) is set if saturation
occurs. Each result is truncated.
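
A minimal Python sketch of the per-element arithmetic follows. The name vqdmulh and the decoded-integer lanes are assumptions for illustration, and truncation is modelled as an arithmetic right shift. For S16 operands treated as Q15 fixed-point values, the result is the Q15 product.

```python
def vqdmulh(a, b, bits):
    """Model VQDMULH.S<bits>; returns (results, qc)."""
    hi = (1 << (bits - 1)) - 1
    out, qc = [], False
    for x, y in zip(a, b):
        r = (2 * x * y) >> bits      # keep the most significant half, truncating
        if r > hi:                   # only when both inputs are the minimum value
            r, qc = hi, True
        out.append(r)
    return out, qc

# 0x4000 is 0.5 in Q15, so the result is 0.25 (0x2000).
print(vqdmulh([0x4000, 3, -32768], [0x4000, 16384, -32768], 16))
```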
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.95

VQDMULL (by vector or by scalar)
Vector Saturating Doubling Multiply Long.
Syntax
VQDMULL{cond}.datatype Qd, Dn, Dm
VQDMULL{cond}.datatype Qd, Dn, Dm[x]

where:
cond

is an optional condition code.
datatype

must be either S16 or S32.
Qd, Dn

are the destination vector and the first operand vector.
Dm

is the vector holding the second operand, for a by vector operation.
Dm[x]

is the scalar holding the second operand, for a by scalar operation.
Operation
VQDMULL multiplies corresponding elements in two vectors, doubles the results and places the results in
the destination register.
The second operand can be a scalar instead of a vector.
If any of the results overflow, they are saturated. The sticky QC flag (FPSCR bit[27]) is set if saturation
occurs.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.96

VQMOVN and VQMOVUN
Vector Saturating Move and Narrow.
Syntax
VQMOVN{cond}.datatype Dd, Qm
VQMOVUN{cond}.datatype Dd, Qm

where:
cond

is an optional condition code.
datatype

must be one of:
S16, S32, S64
for VQMOVN or VQMOVUN.
U16, U32, U64
for VQMOVN.
Dd, Qm

specifies the destination vector and the operand vector.
Operation
VQMOVN copies each element of the operand vector to the corresponding element of the destination vector.
The result element is half the width of the operand element, and values are saturated to the result width.
The results are the same type as the operands.
VQMOVUN copies each element of the operand vector to the corresponding element of the destination
vector. The result element is half the width of the operand element, and values are saturated to the result
width. The elements in the operand are signed and the elements in the result are unsigned.
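
The difference between the two instructions can be sketched in Python. The names vqmovn and vqmovun and the decoded-integer lanes are assumptions for illustration, and the QC flag is not modelled.

```python
def vqmovn(lanes, bits, signed):
    """Model VQMOVN.S<bits>/U<bits>: saturate to bits//2, same signedness."""
    half = bits // 2
    lo = -(1 << (half - 1)) if signed else 0
    hi = (1 << (half - 1)) - 1 if signed else (1 << half) - 1
    return [max(lo, min(hi, v)) for v in lanes]

def vqmovun(lanes, bits):
    """Model VQMOVUN.S<bits>: signed operands, unsigned results of bits//2."""
    hi = (1 << (bits // 2)) - 1
    return [max(0, min(hi, v)) for v in lanes]

print(vqmovn([300, -300, 5], 16, signed=True))
print(vqmovun([300, -300, 5], 16))
```

With S16 operands, 300 saturates to 127 for VQMOVN but to 255 for VQMOVUN, and -300 saturates to -128 and 0 respectively.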

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.97

VQNEG
Vector Saturating Negate.
Syntax
VQNEG{cond}.datatype Qd, Qm
VQNEG{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, or S32.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
Operation
VQNEG negates each element in a vector, and places the results in a second vector.

14.98

VQRDMULH (by vector or by scalar)
Vector Saturating Rounding Doubling Multiply Returning High Half.
Syntax
VQRDMULH{cond}.datatype {Qd}, Qn, Qm
VQRDMULH{cond}.datatype {Dd}, Dn, Dm
VQRDMULH{cond}.datatype {Qd}, Qn, Dm[x]
VQRDMULH{cond}.datatype {Dd}, Dn, Dm[x]

where:
cond

is an optional condition code.
datatype

must be either S16 or S32.
Qd, Qn

are the destination vector and the first operand vector, for a quadword operation.
Dd, Dn

are the destination vector and the first operand vector, for a doubleword operation.
Qm or Dm

is the vector holding the second operand, for a by vector operation.
Dm[x]

is the scalar holding the second operand, for a by scalar operation.
Operation
VQRDMULH multiplies corresponding elements in two vectors, doubles the results, and places the most
significant half of the final results in the destination vector.
The second operand can be a scalar instead of a vector.
If any of the results overflow, they are saturated. The sticky QC flag (FPSCR bit[27]) is set if saturation
occurs. Each result is rounded.
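
A minimal Python sketch of the rounding variant follows. The name vqrdmulh and the decoded-integer lanes are assumptions for illustration. Rounding is modelled by adding half of the discarded range before taking the most significant half.

```python
def vqrdmulh(a, b, bits):
    """Model VQRDMULH.S<bits>; returns (results, qc)."""
    hi = (1 << (bits - 1)) - 1
    out, qc = [], False
    for x, y in zip(a, b):
        r = (2 * x * y + (1 << (bits - 1))) >> bits   # round, then keep top half
        if r > hi:
            r, qc = hi, True
        out.append(r)
    return out, qc

# 2 * 3 * 16384 is 1.5 in units of 2**16, which rounds up to 2.
print(vqrdmulh([3, -32768], [16384, -32768], 16))
```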
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.99

VQRSHL (by signed variable)
Vector Saturating Rounding Shift Left by signed variable.
Syntax
VQRSHL{cond}.datatype {Qd}, Qm, Qn
VQRSHL{cond}.datatype {Dd}, Dm, Dn

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qm, Qn

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dm, Dn

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VQRSHL takes each element in a vector, shifts it by a value from the least significant byte of the
corresponding element of a second vector, and places the results in the destination vector. If the shift
value is positive, the operation is a left shift. Otherwise, it is a rounding right shift.
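
The shift selection can be sketched in Python. The name vqrshl, the decoded-integer data lanes, and the returned qc value are assumptions for illustration. The shift count is taken from the low byte of each lane of the second operand and read as a signed 8-bit value.

```python
def vqrshl(values, shifts, bits, signed):
    """Model VQRSHL.S<bits>/U<bits>; returns (results, qc)."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    out, qc = [], False
    for v, s in zip(values, shifts):
        s &= 0xFF
        if s >= 0x80:
            s -= 0x100                            # low byte, read as signed
        if s >= 0:
            r = v << s                            # left shift
        else:
            r = (v + (1 << (-s - 1))) >> -s       # rounding right shift
        clamped = max(lo, min(hi, r))
        qc = qc or clamped != r
        out.append(clamped)
    return out, qc

# Shift counts of +1, -1, and 0xFF (which also means -1).
print(vqrshl([100, 7, -7], [1, -1, 0xFF], 8, signed=True))
```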

14.100

VQRSHRN and VQRSHRUN (by immediate)
Vector Saturating Shift Right, Narrow, by immediate value, with Rounding.
Syntax
VQRSHR{U}N{cond}.datatype Dd, Qm, #imm

where:
U

if present, indicates that the results are unsigned, although the operands are signed. Otherwise,
the results are the same type as the operands.
cond

is an optional condition code.
datatype

must be one of:
I16, I32, I64
for VQRSHRN or VQRSHRUN. Only a #0 immediate is permitted with these datatypes.
S16, S32, S64
for VQRSHRN or VQRSHRUN.
U16, U32, U64
for VQRSHRN only.
Dd, Qm

are the destination vector and the operand vector.
imm

is the immediate value specifying the size of the shift. The ranges are shown in the following
table:
Table 14-10 Available immediate ranges in VQRSHRN and VQRSHRUN (by immediate)
datatype     imm range
S16 or U16   0 to 8
S32 or U32   0 to 16
S64 or U64   0 to 32

Operation
VQRSHR{U}N takes each element in a quadword vector of integers, right shifts them by an immediate
value, and places the results in a doubleword vector.
The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.
Results are rounded.
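
A minimal Python sketch for signed operands follows. The name vqrshrn, the unsigned_result parameter (standing for the U in VQRSHRUN), and the decoded-integer lanes are assumptions for illustration.

```python
def vqrshrn(lanes, bits, imm, unsigned_result=False):
    """Model VQRSHRN.S<bits> (or VQRSHRUN.S<bits>) Dd, Qm, #imm."""
    half = bits // 2
    lo = 0 if unsigned_result else -(1 << (half - 1))
    hi = (1 << half) - 1 if unsigned_result else (1 << (half - 1)) - 1
    out, qc = [], False
    for v in lanes:
        r = (v + (1 << (imm - 1))) >> imm if imm else v   # rounding shift
        clamped = max(lo, min(hi, r))                     # saturate to half width
        qc = qc or clamped != r
        out.append(clamped)
    return out, qc

print(vqrshrn([1000, -1000, 3], 16, 2))
print(vqrshrn([1000, -1000, 3], 16, 2, unsigned_result=True))
```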
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.101

VQSHL (by signed variable)
Vector Saturating Shift Left by signed variable.
Syntax
VQSHL{cond}.datatype {Qd}, Qm, Qn
VQSHL{cond}.datatype {Dd}, Dm, Dn

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qm, Qn

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dm, Dn

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VQSHL takes each element in a vector, shifts it by a value from the least significant byte of the
corresponding element of a second vector, and places the results in the destination vector. If the shift
value is positive, the operation is a left shift. Otherwise, it is a truncating right shift.
The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.102

VQSHL and VQSHLU (by immediate)
Vector Saturating Shift Left.
Syntax
VQSHL{U}{cond}.datatype {Qd}, Qm, #imm
VQSHL{U}{cond}.datatype {Dd}, Dm, #imm

where:
U

only permitted if Q is also present. Indicates that the results are unsigned even though the
operands are signed.
cond

is an optional condition code.
datatype

must be one of:
S8, S16, S32, S64
for VQSHL or VQSHLU.
U8, U16, U32, U64
for VQSHL only.
Qd, Qm

are the destination and operand vectors, for a quadword operation.
Dd, Dm

are the destination and operand vectors, for a doubleword operation.
imm

is the immediate value specifying the size of the shift, in the range 0 to (size(datatype) – 1).
The ranges are shown in the following table:
Table 14-11 Available immediate ranges in VQSHL and VQSHLU (by immediate)
datatype     imm range
S8 or U8     0 to 7
S16 or U16   0 to 15
S32 or U32   0 to 31
S64 or U64   0 to 63

Operation
VQSHL and VQSHLU instructions take each element in a vector of integers, left shift them by an immediate
value, and place the results in the destination vector.
The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.103

VQSHRN and VQSHRUN (by immediate)
Vector Saturating Shift Right, Narrow, by immediate value.
Syntax
VQSHR{U}N{cond}.datatype Dd, Qm, #imm

where:
U

if present, indicates that the results are unsigned, although the operands are signed. Otherwise,
the results are the same type as the operands.
cond

is an optional condition code.
datatype

must be one of:
I16, I32, I64
for VQSHRN or VQSHRUN. Only a #0 immediate is permitted with these datatypes.
S16, S32, S64
for VQSHRN or VQSHRUN.
U16, U32, U64
for VQSHRN only.
Dd, Qm

are the destination vector and the operand vector.
imm

is the immediate value specifying the size of the shift. The ranges are shown in the following
table:
Table 14-12 Available immediate ranges in VQSHRN and VQSHRUN (by immediate)
datatype     imm range
S16 or U16   0 to 8
S32 or U32   0 to 16
S64 or U64   0 to 32

Operation
VQSHR{U}N takes each element in a quadword vector of integers, right shifts them by an immediate value,
and places the results in a doubleword vector.
The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.
Results are truncated.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.104

VQSUB
Vector Saturating Subtract.
Syntax
VQSUB{cond}.datatype {Qd}, Qn, Qm
VQSUB{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VQSUB subtracts the elements of one vector from the corresponding elements of another vector, and
places the results in the destination vector.
The sticky QC flag (FPSCR bit[27]) is set if saturation occurs.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.105

VRADDHN
Vector Rounding Add and Narrow, selecting High half.
Syntax
VRADDHN{cond}.datatype Dd, Qn, Qm

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or I64.
Dd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector.
Operation
VRADDHN adds corresponding elements in two quadword vectors, selects the most significant halves of the
results, and places the final results in the destination doubleword vector. Results are rounded.
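
The rounding and selection of the high half can be sketched in Python. The name vraddhn and the raw-lane lists are assumptions for illustration. Rounding is modelled by adding half of the discarded low part before the high half is taken, and the sum wraps at the source element width.

```python
def vraddhn(a, b, bits):
    """Model VRADDHN.I<bits>: rounded high half of each lane sum."""
    half = bits // 2
    mask = (1 << bits) - 1
    round_const = 1 << (half - 1)
    return [((x + y + round_const) & mask) >> half for x, y in zip(a, b)]

# 0x1234 + 0x0040 = 0x1274 gives 0x12; 0x1280 rounds up to 0x13.
print([hex(r) for r in vraddhn([0x1234, 0x1280], [0x0040, 0x0000], 16)])
```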
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.106

VRECPE
Vector Reciprocal Estimate.
Syntax
VRECPE{cond}.datatype Qd, Qm
VRECPE{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be either U32 or F32.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
Operation
VRECPE finds an approximate reciprocal of each element in a vector, and places the results in a second
vector.
Results for out-of-range inputs
The following table shows the results where input values are out of range:
Table 14-13 Results for out-of-range inputs in VRECPE
                 Operand element                 Result element
Integer          <= 0x7FFFFFFF                   0xFFFFFFFF
Floating-point   NaN                             Default NaN
                 Negative 0, Negative Denormal   Negative Infinity (ao)
                 Positive 0, Positive Denormal   Positive Infinity (ao)
                 Positive infinity               Positive 0
                 Negative infinity               Negative 0
(ao) The Division by Zero exception bit in the FPSCR (FPSCR[1]) is set.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.107 VRECPS
Vector Reciprocal Step.
Syntax
VRECPS{cond}.F32 {Qd}, Qn, Qm
VRECPS{cond}.F32 {Dd}, Dn, Dm

where:
cond

is an optional condition code.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VRECPS multiplies the elements of one vector by the corresponding elements of another vector, subtracts
each of the results from 2, and places the final results into the elements of the destination vector.

The Newton-Raphson iteration:

$x_{n+1} = x_n (2 - d x_n)$

converges to $1/d$ if $x_0$ is the result of VRECPE applied to $d$.
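The iteration can be tried out numerically with the following Python sketch. The function names vrecps and reciprocal are hypothetical, Python floats are double precision rather than F32, and the starting value 0.3 merely stands in for an estimate of the kind VRECPE produces.

```python
def vrecps(a, b):
    """Per-element result modelled for VRECPS: 2 - a * b."""
    return 2.0 - a * b


def reciprocal(d, x0, steps=5):
    """Refine an initial estimate x0 of 1/d using x = x * (2 - d * x)."""
    x = x0
    for _ in range(steps):
        x = x * vrecps(d, x)   # one multiply after each reciprocal step
    return x


print(reciprocal(3.0, 0.3))
```

Each step roughly squares the relative error, so a few steps are enough once the initial estimate is reasonably close.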
Results for out-of-range inputs
The following table shows the results where input values are out of range:
Table 14-14 Results for out-of-range inputs in VRECPS

1st operand element    2nd operand element    Result element
NaN                    -                      Default NaN
-                      NaN                    Default NaN
+/– 0.0 or denormal    +/– infinity           2.0
+/– infinity           +/– 0.0 or denormal    2.0

Related references
7.11 Condition code suffixes on page 7-150.

14.108 VREV16, VREV32, and VREV64
Vector Reverse within halfwords, words, or doublewords.
Syntax
VREVn{cond}.size Qd, Qm
VREVn{cond}.size Dd, Dm

where:
n

must be one of 16, 32, or 64.
cond

is an optional condition code.
size

must be one of 8, 16, or 32, and must be less than n.
Qd, Qm

specifies the destination vector and the operand vector, for a quadword operation.
Dd, Dm

specifies the destination vector and the operand vector, for a doubleword operation.
Operation
VREV16 reverses the order of 8-bit elements within each halfword of the vector, and places the result in
the corresponding destination vector.
VREV32 reverses the order of 8-bit or 16-bit elements within each word of the vector, and places the result
in the corresponding destination vector.
VREV64 reverses the order of 8-bit, 16-bit, or 32-bit elements within each doubleword of the vector, and
places the result in the corresponding destination vector.
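The three variants differ only in the width of the region within which elements are reversed. The following Python sketch models this; the function name vrev and the use of a list of elements in lane order are assumptions made for the example.

```python
def vrev(elements, n, size):
    """Reverse size-bit elements within each n-bit region of a vector."""
    if size >= n:
        raise ValueError("size must be less than n")
    group = n // size                          # elements per n-bit region
    out = []
    for start in range(0, len(elements), group):
        out.extend(reversed(elements[start:start + group]))
    return out


# Eight 8-bit elements of a doubleword vector, reversed within each word.
print(vrev([0, 1, 2, 3, 4, 5, 6, 7], 32, 8))
```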

Related references
7.11 Condition code suffixes on page 7-150.

14.109 VRHADD
Vector Rounding Halving Add.
Syntax
VRHADD{cond}.datatype {Qd}, Qn, Qm
VRHADD{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VRHADD adds corresponding elements in two vectors, shifts each result right one bit, and places the results
in the destination vector. Results are rounded.
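A minimal Python model of the per-element calculation is shown below. It assumes that rounding means adding one before the right shift, and it relies on Python integers being unbounded so that the intermediate sum cannot overflow; the function name vrhadd is hypothetical.

```python
def vrhadd(dn, dm):
    """Rounding halving add: (a + b + 1) >> 1, with no intermediate overflow."""
    return [(a + b + 1) >> 1 for a, b in zip(dn, dm)]


print(vrhadd([255, 1], [255, 2]))    # values as for a U8 operation
print(vrhadd([-3], [-4]))            # values as for an S8 operation
```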
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.110 VRSHL (by signed variable)
Vector Rounding Shift Left by signed variable.
Syntax
VRSHL{cond}.datatype {Qd}, Qm, Qn
VRSHL{cond}.datatype {Dd}, Dm, Dn

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qm, Qn

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dm, Dn

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VRSHL takes each element in a vector, shifts it by the signed value held in the least significant byte of
the corresponding element of a second vector, and places the results in the destination vector. If the shift
value is positive, the operation is a left shift. Otherwise, it is a rounding right shift.
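The following Python sketch models the unsigned forms (U8 to U64) only. The function name vrshl_unsigned is hypothetical, and the rounding right shift is assumed to add half of the discarded range before shifting.

```python
def vrshl_unsigned(dm, dn, esize):
    """Model VRSHL.U<esize>: shift each element of dm by a signed per-element count."""
    mask = (1 << esize) - 1
    out = []
    for value, control in zip(dm, dn):
        shift = control & 0xFF                 # least significant byte only
        if shift >= 0x80:
            shift -= 0x100                     # interpret the byte as signed
        if shift >= 0:
            out.append((value << shift) & mask)
        else:
            right = -shift
            rounded = (value + (1 << (right - 1))) >> right
            out.append(rounded & mask)
    return out


# A control byte of 0xFF is -1, which gives a rounding right shift by one bit.
print(vrshl_unsigned([8, 7, 1, 0x81], [0xFF, 0xFF, 3, 1], 8))
```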

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.111 VRSHR (by immediate)
Vector Rounding Shift Right by immediate value.
Syntax
VRSHR{cond}.datatype {Qd}, Qm, #imm
VRSHR{cond}.datatype {Dd}, Dm, #imm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
imm

is the immediate value specifying the size of the shift, in the range 0 to (size(datatype)). The
ranges are shown in the following table:
Table 14-15 Available immediate ranges in VRSHR (by immediate)

datatype      imm range
S8 or U8      0 to 8
S16 or U16    0 to 16
S32 or U32    0 to 32
S64 or U64    0 to 64

VRSHR with an immediate value of zero is a pseudo-instruction for VORR.

Operation
VRSHR takes each element in a vector, right shifts them by an immediate value, and places the results in
the destination vector. The results are rounded.
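The difference between a rounding and a truncating right shift can be seen in the following Python sketch. The function names are hypothetical, elements are plain Python integers, and rounding is assumed to add half of the discarded range before shifting.

```python
def vrshr(dm, imm):
    """Rounding right shift of each element by imm (imm >= 1)."""
    return [(x + (1 << (imm - 1))) >> imm for x in dm]


def shr_truncating(dm, imm):
    """Truncating right shift, shown for comparison."""
    return [x >> imm for x in dm]


print(vrshr([5, 4, -5], 1))
print(shr_truncating([5, 4, -5], 1))
```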
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.83 VORR (register) on page 14-691.
7.11 Condition code suffixes on page 7-150.

14.112 VRSHRN (by immediate)
Vector Rounding Shift Right, Narrow, by immediate value.
Syntax
VRSHRN{cond}.datatype Dd, Qm, #imm

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or I64.
Dd, Qm

are the destination vector and the operand vector.
imm

is the immediate value specifying the size of the shift, in the range 0 to (size(datatype)/2). The
ranges are shown in the following table:
Table 14-16 Available immediate ranges in VRSHRN (by immediate)

datatype    imm range
I16         0 to 8
I32         0 to 16
I64         0 to 32


VRSHRN with an immediate value of zero is a pseudo-instruction for VMOVN.

Operation
VRSHRN takes each element in a quadword vector, right shifts them by an immediate value, and places the
results in a doubleword vector. The results are rounded.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.70 VMOVN on page 14-678.
7.11 Condition code suffixes on page 7-150.

14.113 VRINT
VRINT (Vector Round to Integer) rounds each floating-point element in a vector to integer, and places the
results in the destination vector.

The resulting integers are represented in floating-point format.
Note
This instruction is supported only in ARMv8.

Syntax
VRINTmode.F32.F32 Qd, Qm
VRINTmode.F32.F32 Dd, Dm

where:
mode

must be one of:
A

meaning round to nearest, ties away from zero. This cannot generate an Inexact
exception, even if the result is not exact.
N

meaning round to nearest, ties to even. This cannot generate an Inexact exception, even
if the result is not exact.
X

meaning round to nearest, ties to even, generating an Inexact exception if the result is
not exact.
P

meaning round towards plus infinity. This cannot generate an Inexact exception, even if
the result is not exact.
M

meaning round towards minus infinity. This cannot generate an Inexact exception, even
if the result is not exact.
Z

meaning round towards zero. This cannot generate an Inexact exception, even if the
result is not exact.
Qd, Qm

specifies the destination vector and the operand vector, for a quadword operation.
Dd, Dm

specifies the destination and operand vectors, for a doubleword operation.
Notes
You cannot use VRINT inside an IT block.
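The numeric effect of each mode can be compared with the following Python sketch. It is a simplified model under several assumptions: the function name vrint is hypothetical, inputs are assumed to be values representable in F32, NaNs and infinities are passed through unchanged, and neither the Inexact exception nor the sign of a zero result is modelled, so modes N and X return the same value here.

```python
import math


def vrint(x, mode):
    """Round x to an integral value, returned as a float, for a VRINT mode letter."""
    if math.isnan(x) or math.isinf(x):
        return x                                   # simplification: pass through
    if mode in ("N", "X"):
        return float(round(x))                     # Python's round() is ties-to-even
    if mode == "A":
        return math.copysign(math.floor(abs(x) + 0.5), x)   # ties away from zero
    if mode == "P":
        return float(math.ceil(x))                 # towards plus infinity
    if mode == "M":
        return float(math.floor(x))                # towards minus infinity
    if mode == "Z":
        return float(math.trunc(x))                # towards zero
    raise ValueError("mode must be one of A, N, X, P, M, Z")


for mode in "ANXPMZ":
    print(mode, vrint(2.5, mode), vrint(-2.5, mode))
```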

14.114 VRSQRTE
Vector Reciprocal Square Root Estimate.
Syntax
VRSQRTE{cond}.datatype Qd, Qm
VRSQRTE{cond}.datatype Dd, Dm

where:
cond

is an optional condition code.
datatype

must be either U32 or F32.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
Operation
VRSQRTE finds an approximate reciprocal square root of each element in a vector, and places the results in
a second vector.
Results for out-of-range inputs
The following table shows the results where input values are out of range:
Table 14-17 Results for out-of-range inputs in VRSQRTE

                 Operand element                            Result element
Integer          <= 0x3FFFFFFF                              0xFFFFFFFF
Floating-point   NaN, Negative Normal, Negative Infinity    Default NaN
                 Negative 0, Negative Denormal              Negative Infinity ap
                 Positive 0, Positive Denormal              Positive Infinity ap
                 Positive infinity                          Positive 0
                                                            Negative 0

ap  The Division by Zero exception bit in the FPSCR (FPSCR[1]) is set.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.115 VRSQRTS
Vector Reciprocal Square Root Step.
Syntax
VRSQRTS{cond}.F32 {Qd}, Qn, Qm
VRSQRTS{cond}.F32 {Dd}, Dn, Dm

where:
cond

is an optional condition code.
Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VRSQRTS multiplies the elements of one vector by the corresponding elements of another vector, subtracts
each of the results from three, divides these results by two, and places the final results into the elements
of the destination vector.

The Newton-Raphson iteration:

$x_{n+1} = x_n (3 - d x_n^2) / 2$

converges to $1/\sqrt{d}$ if $x_0$ is the result of VRSQRTE applied to $d$.
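The iteration can be checked numerically with the following Python sketch. The function names vrsqrts and reciprocal_sqrt are hypothetical, Python floats are double precision rather than F32, and the starting value 0.45 merely stands in for an estimate of the kind VRSQRTE produces.

```python
def vrsqrts(a, b):
    """Per-element result modelled for VRSQRTS: (3 - a * b) / 2."""
    return (3.0 - a * b) / 2.0


def reciprocal_sqrt(d, x0, steps=6):
    """Refine an initial estimate x0 of 1/sqrt(d) using x = x * (3 - d * x * x) / 2."""
    x = x0
    for _ in range(steps):
        x = x * vrsqrts(d * x, x)   # the operands are d * x and x
    return x


print(reciprocal_sqrt(4.0, 0.45))
```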
Results for out-of-range inputs
The following table shows the results where input values are out of range:
Table 14-18 Results for out-of-range inputs in VRSQRTS

1st operand element    2nd operand element    Result element
NaN                    -                      Default NaN
-                      NaN                    Default NaN
+/– 0.0 or denormal    +/– infinity           1.5
+/– infinity           +/– 0.0 or denormal    1.5

Related references
7.11 Condition code suffixes on page 7-150.

14.116 VRSRA (by immediate)
Vector Rounding Shift Right by immediate value and Accumulate.
Syntax
VRSRA{cond}.datatype {Qd}, Qm, #imm
VRSRA{cond}.datatype {Dd}, Dm, #imm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
imm

is the immediate value specifying the size of the shift, in the range 1 to (size(datatype)). The
ranges are shown in the following table:
Table 14-19 Available immediate ranges in VRSRA (by immediate)

datatype      imm range
S8 or U8      1 to 8
S16 or U16    1 to 16
S32 or U32    1 to 32
S64 or U64    1 to 64

Operation
VRSRA takes each element in a vector, right shifts them by an immediate value, and accumulates the
results into the destination vector. The results are rounded.
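The shift-and-accumulate step can be modelled as follows for the unsigned forms. This is an illustrative Python sketch: the function name vrsra_unsigned is hypothetical, and the accumulated sum is assumed to wrap to the element width.

```python
def vrsra_unsigned(dd, dm, imm, esize):
    """Model VRSRA.U<esize>: dd[i] += rounding_shift_right(dm[i], imm)."""
    mask = (1 << esize) - 1
    out = []
    for acc, x in zip(dd, dm):
        shifted = (x + (1 << (imm - 1))) >> imm   # rounding right shift
        out.append((acc + shifted) & mask)        # accumulate, wrapping to esize bits
    return out


print(vrsra_unsigned([10, 250], [5, 20], 1, 8))
```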

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.117 VRSUBHN
Vector Rounding Subtract and Narrow, selecting High half.
Syntax
VRSUBHN{cond}.datatype Dd, Qn, Qm

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or I64.
Dd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector.
Operation
VRSUBHN subtracts the elements of one quadword vector from the corresponding elements of another
quadword vector, selects the most significant halves of the results, and places the final results in the
destination doubleword vector. Results are rounded.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.118 VSHL (by immediate)
Vector Shift Left by immediate.
Syntax
VSHL{cond}.datatype {Qd}, Qm, #imm
VSHL{cond}.datatype {Dd}, Dm, #imm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, or I64.
Qd, Qm

are the destination and operand vectors, for a quadword operation.
Dd, Dm

are the destination and operand vectors, for a doubleword operation.
imm

is the immediate value specifying the size of the shift. The ranges are shown in the following
table:
Table 14-20 Available immediate ranges in VSHL (by immediate)

datatype    imm range
I8          0 to 7
I16         0 to 15
I32         0 to 31
I64         0 to 63

Operation
VSHL takes each element in a vector of integers, left shifts them by an immediate value, and places the
results in the destination vector.

Bits shifted out of the left of each element are lost.
The following figure shows the operation of VSHL with two elements and a shift value of one. The least
significant bit in each element in the destination vector is set to zero.
Figure 14-6 Operation of quadword VSHL.I64 Qd, Qm, #1

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.119 VSHL (by signed variable)
Vector Shift Left by signed variable.
Syntax
VSHL{cond}.datatype {Qd}, Qm, Qn
VSHL{cond}.datatype {Dd}, Dm, Dn

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qm, Qn

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dm, Dn

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.
Operation
VSHL takes each element in a vector, shifts it by the signed value held in the least significant byte of
the corresponding element of a second vector, and places the results in the destination vector. If the shift
value is positive, the operation is a left shift. Otherwise, it is a truncating right shift.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.120 VSHLL (by immediate)
Vector Shift Left Long.
Syntax
VSHLL{cond}.datatype Qd, Dm, #imm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Dm

are the destination and operand vectors, for a long operation.
imm

is the immediate value specifying the size of the shift. The ranges are shown in the following
table:
Table 14-21 Available immediate ranges in VSHLL (by immediate)

datatype      imm range
S8 or U8      1 to 8
S16 or U16    1 to 16
S32 or U32    1 to 32

0 is permitted, but the resulting code disassembles to VMOVL.
Operation
VSHLL takes each element in a vector of integers, left shifts them by an immediate value, and places the
results in the destination vector. Values are sign or zero extended.
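The widening step can be modelled in Python as shown below. This is a sketch under stated assumptions: the function name vshll and the signed flag are hypothetical, source elements are given as raw bit patterns, and results are returned as bit patterns of twice the source width.

```python
def vshll(dm, imm, esize, signed):
    """Model VSHLL: extend each esize-bit element to 2 * esize bits, then shift left."""
    wide_mask = (1 << (2 * esize)) - 1
    out = []
    for x in dm:
        x &= (1 << esize) - 1
        if signed and x >> (esize - 1):
            x -= 1 << esize                    # sign extend (S8, S16, S32)
        out.append((x << imm) & wide_mask)     # zero extension needs no adjustment
    return out


print([hex(v) for v in vshll([0x80], 1, 8, signed=False)])
print([hex(v) for v in vshll([0x80], 1, 8, signed=True)])
```

Because the destination elements are twice as wide as the source elements, no bits are lost for shifts within the permitted range.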

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.121 VSHR (by immediate)
Vector Shift Right by immediate value.
Syntax
VSHR{cond}.datatype {Qd}, Qm, #imm
VSHR{cond}.datatype {Dd}, Dm, #imm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
imm

is the immediate value specifying the size of the shift. The ranges are shown in the following
table:
Table 14-22 Available immediate ranges in VSHR (by immediate)

datatype      imm range
S8 or U8      0 to 8
S16 or U16    0 to 16
S32 or U32    0 to 32
S64 or U64    0 to 64

VSHR with an immediate value of zero is a pseudo-instruction for VORR.

Operation
VSHR takes each element in a vector, right shifts them by an immediate value, and places the results in the
destination vector. The results are truncated.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.83 VORR (register) on page 14-691.
7.11 Condition code suffixes on page 7-150.

14.122 VSHRN (by immediate)
Vector Shift Right, Narrow, by immediate value.
Syntax
VSHRN{cond}.datatype Dd, Qm, #imm

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or I64.
Dd, Qm

are the destination vector and the operand vector.
imm

is the immediate value specifying the size of the shift. The ranges are shown in the following
table:
Table 14-23 Available immediate ranges in VSHRN (by immediate)

datatype    imm range
I16         0 to 8
I32         0 to 16
I64         0 to 32


VSHRN with an immediate value of zero is a pseudo-instruction for VMOVN.

Operation
VSHRN takes each element in a quadword vector, right shifts them by an immediate value, and places the
results in a doubleword vector. The results are truncated.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
14.70 VMOVN on page 14-678.
7.11 Condition code suffixes on page 7-150.

14.123 VSLI
Vector Shift Left and Insert.
Syntax
VSLI{cond}.size {Qd}, Qm, #imm
VSLI{cond}.size {Dd}, Dm, #imm

where:
cond

is an optional condition code.
size

must be one of 8, 16, 32, or 64.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
imm

is the immediate value specifying the size of the shift, in the range 0 to (size – 1).
Operation
VSLI takes each element in a vector, left shifts them by an immediate value, and inserts the results in the
destination vector. Bits shifted out of the left of each element are lost. The following figure shows the
operation of VSLI with two elements and a shift value of one. The least significant bit in each element in
the destination vector is unchanged.

Figure 14-7 Operation of quadword VSLI.64 Qd, Qm, #1
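The insertion can be expressed as a bit-mask operation, as in the following Python sketch. The function name vsli and the list representation of a vector are assumptions made for the example.

```python
def vsli(dd, dm, imm, size):
    """Model VSLI: insert dm[i] << imm into dd[i], keeping the low imm bits of dd[i]."""
    mask = (1 << size) - 1
    keep = (1 << imm) - 1                      # destination bits left unchanged
    return [(d & keep) | ((m << imm) & mask) for d, m in zip(dd, dm)]


print([hex(v) for v in vsli([0xAB], [0xCD], 4, 8)])
```

With a shift of 4, the low nibble of the destination (0xB) is kept and the low nibble of the operand (0xD) is inserted above it.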

Related references
7.11 Condition code suffixes on page 7-150.

14.124 VSRA (by immediate)
Vector Shift Right by immediate value and Accumulate.
Syntax
VSRA{cond}.datatype {Qd}, Qm, #imm
VSRA{cond}.datatype {Dd}, Dm, #imm

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, S64, U8, U16, U32, or U64.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
imm

is the immediate value specifying the size of the shift. The ranges are shown in the following
table:
Table 14-24 Available immediate ranges in VSRA (by immediate)

datatype      imm range
S8 or U8      1 to 8
S16 or U16    1 to 16
S32 or U32    1 to 32
S64 or U64    1 to 64

Operation
VSRA takes each element in a vector, right shifts them by an immediate value, and accumulates the results
into the destination vector. The results are truncated.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.125 VSRI
Vector Shift Right and Insert.
Syntax
VSRI{cond}.size {Qd}, Qm, #imm
VSRI{cond}.size {Dd}, Dm, #imm

where:
cond

is an optional condition code.
size

must be one of 8, 16, 32, or 64.
Qd, Qm

are the destination vector and the operand vector, for a quadword operation.
Dd, Dm

are the destination vector and the operand vector, for a doubleword operation.
imm

is the immediate value specifying the size of the shift, in the range 1 to size.
Operation
VSRI takes each element in a vector, right shifts them by an immediate value, and inserts the results in the
destination vector. Bits shifted out of the right of each element are lost. The following figure shows the
operation of VSRI with a single element and a shift value of two. The two most significant bits in the
destination vector are unchanged.

Figure 14-8 Operation of doubleword VSRI.64 Dd, Dm, #2
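The insertion can be expressed as a bit-mask operation, as in the following Python sketch. The function name vsri and the list representation of a vector are assumptions made for the example.

```python
def vsri(dd, dm, imm, size):
    """Model VSRI: insert dm[i] >> imm into dd[i], keeping the top imm bits of dd[i]."""
    mask = (1 << size) - 1
    inserted = mask >> imm                     # bit positions that receive new data
    return [(d & ~inserted & mask) | ((m & mask) >> imm) for d, m in zip(dd, dm)]


print([hex(v) for v in vsri([0xAB], [0xCD], 4, 8)])
```

In this model a shift equal to the element size leaves the destination element unchanged, because no bit positions receive new data.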

Related references
7.11 Condition code suffixes on page 7-150.

14.126 VSTM
Extension register store multiple.
Syntax
VSTMmode{cond} Rn{!}, Registers

where:
mode

must be one of:
IA

meaning Increment address After each transfer. IA is the default, and can be omitted.
DB

meaning Decrement address Before each transfer.
EA

meaning Empty Ascending stack operation. This is the same as IA for stores.
FD

meaning Full Descending stack operation. This is the same as DB for stores.
cond

is an optional condition code.
Rn

is the ARM register holding the base address for the transfer.
!

is optional. ! specifies that the updated base address must be written back to Rn. If ! is not
specified, mode must be IA.
Registers

You can use either form of this instruction. They both disassemble to VPUSH.

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
Related references
7.11 Condition code suffixes on page 7-150.
15.37 VSTM (floating-point) on page 15-789.

14.127 VSTn (multiple n-element structures)
Vector Store multiple n-element structures.
Syntax
VSTn{cond}.datatype list, [Rn{@align}]{!}
VSTn{cond}.datatype list, [Rn{@align}], Rm

where:
n

must be one of 1, 2, 3, or 4.
cond

is an optional condition code.
datatype

see the following table for options.
list

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for
options.
Rn

is the ARM register containing the base address. Rn cannot be PC.
align

specifies an optional alignment. See the following table for options.
!

if ! is present, Rn is updated to (Rn + the number of bytes transferred by the instruction). The
update occurs after all the stores have taken place.
Rm

is an ARM register containing an offset from the base address. If Rm is present, the instruction
updates Rn to (Rn + Rm) after using the address to access memory. Rm cannot be SP or PC.
Operation
VSTn stores multiple n-element structures to memory from one or more Advanced SIMD registers, with
interleaving (unless n == 1). Every element of each register is stored.
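The effect of interleaving on the order of elements in memory can be sketched in Python as follows. The model is deliberately limited: the function name vstn is hypothetical, each register is a list of elements in lane order, and for n greater than 1 only register lists containing exactly n registers are handled.

```python
def vstn(n, registers):
    """Return the order in which elements reach memory for a simplified VSTn."""
    if n == 1:
        # VST1: no interleaving, registers are stored one after another.
        return [element for reg in registers for element in reg]
    if len(registers) != n:
        raise ValueError("this sketch only models lists of exactly n registers")
    memory = []
    for lane in range(len(registers[0])):
        for reg in registers:                  # one n-element structure per lane
            memory.append(reg[lane])
    return memory


print(vstn(2, [[1, 2, 3, 4], [10, 20, 30, 40]]))
print(vstn(1, [[1, 2, 3, 4], [10, 20, 30, 40]]))
```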
Table 14-25 Permitted combinations of parameters for VSTn (multiple n-element structures)

n   datatype            list aq                         align ar              alignment
1   8, 16, 32, or 64    {Dd}                            @64                   8-byte
                        {Dd, D(d+1)}                    @64 or @128           8-byte or 16-byte
                        {Dd, D(d+1), D(d+2)}            @64                   8-byte
                        {Dd, D(d+1), D(d+2), D(d+3)}    @64, @128, or @256    8-byte, 16-byte, or 32-byte
2   8, 16, or 32        {Dd, D(d+1)}                    @64, @128             8-byte or 16-byte
                        {Dd, D(d+2)}                    @64, @128             8-byte or 16-byte
                        {Dd, D(d+1), D(d+2), D(d+3)}    @64, @128, or @256    8-byte, 16-byte, or 32-byte
3   8, 16, or 32        {Dd, D(d+1), D(d+2)}            @64                   8-byte
                        {Dd, D(d+2), D(d+4)}            @64                   8-byte
4   8, 16, or 32        {Dd, D(d+1), D(d+2), D(d+3)}    @64, @128, or @256    8-byte, 16-byte, or 32-byte
                        {Dd, D(d+2), D(d+4), D(d+6)}    @64, @128, or @256    8-byte, 16-byte, or 32-byte

aq  Every register in the list must be in the range D0-D31.
ar  align can be omitted. In this case, standard alignment rules apply.

Related concepts
14.4 Interleaving provided by load and store element and structure instructions on page 14-609.
14.5 Alignment restrictions in load and store element and structure instructions on page 14-610.
Related references
7.11 Condition code suffixes on page 7-150.

14.128 VSTn (single n-element structure to one lane)
Vector Store single n-element structure to one lane.
Syntax
VSTn{cond}.datatype list, [Rn{@align}]{!}
VSTn{cond}.datatype list, [Rn{@align}], Rm

where:
n

must be one of 1, 2, 3, or 4.
cond

is an optional condition code.
datatype

see the following table.
list

is the list of Advanced SIMD registers enclosed in braces, { and }. See the following table for
options.
Rn

is the ARM register containing the base address. Rn cannot be PC.
align

specifies an optional alignment. See the following table for options.
!

if ! is present, Rn is updated to (Rn + the number of bytes transferred by the instruction). The
update occurs after all the stores have taken place.
Rm

is an ARM register containing an offset from the base address. If Rm is present, the instruction
updates Rn to (Rn + Rm) after using the address to access memory. Rm cannot be SP or PC.
Operation
VSTn stores one n-element structure into memory from one or more Advanced SIMD registers.
Table 14-26 Permitted combinations of parameters for VSTn (single n-element structure to one lane)

n   datatype    list as                                     align at       alignment
1   8           {Dd[x]}                                                    Standard only
    16          {Dd[x]}                                     @16            2-byte
    32          {Dd[x]}                                     @32            4-byte
2   8           {Dd[x], D(d+1)[x]}                          @16            2-byte
    16          {Dd[x], D(d+1)[x]}                          @32            4-byte
                {Dd[x], D(d+2)[x]}                          @32            4-byte
    32          {Dd[x], D(d+1)[x]}                          @64            8-byte
                {Dd[x], D(d+2)[x]}                          @64            8-byte
3   8           {Dd[x], D(d+1)[x], D(d+2)[x]}                              Standard only
    16 or 32    {Dd[x], D(d+1)[x], D(d+2)[x]}                              Standard only
                {Dd[x], D(d+2)[x], D(d+4)[x]}                              Standard only
4   8           {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]}    @32            4-byte
    16          {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]}    @64            8-byte
                {Dd[x], D(d+2)[x], D(d+4)[x], D(d+6)[x]}    @64            8-byte
    32          {Dd[x], D(d+1)[x], D(d+2)[x], D(d+3)[x]}    @64 or @128    8-byte or 16-byte
                {Dd[x], D(d+2)[x], D(d+4)[x], D(d+6)[x]}    @64 or @128    8-byte or 16-byte

as  Every register in the list must be in the range D0-D31.
at  align can be omitted. In this case, standard alignment rules apply.

14.129 VSTR
Extension register store.
Syntax
VSTR{cond}{.64} Dd, [Rn{, #offset}]

where:
cond

is an optional condition code.
Dd

is the extension register to be saved.
Rn

is the ARM register holding the base address for the transfer.
offset

is an optional numeric expression. It must evaluate to a numeric value at assembly time. The
value must be a multiple of 4, and lie in the range –1020 to +1020. The value is added to the
base address to form the address used for the transfer.
Operation
The VSTR instruction saves the contents of an extension register to memory.
Two words are transferred.
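The constraints on offset and the resulting transfer address can be checked with a small Python sketch. The function name vstr_address is hypothetical, and wrapping the address to 32 bits is an assumption of the model.

```python
def vstr_address(base, offset=0):
    """Return the transfer address for a VSTR-style access, validating the offset."""
    if offset % 4 != 0 or not -1020 <= offset <= 1020:
        raise ValueError("offset must be a multiple of 4 in the range -1020 to +1020")
    return (base + offset) & 0xFFFFFFFF        # assumed 32-bit address space


print(hex(vstr_address(0x1000, 8)))
print(hex(vstr_address(0x1000, -1020)))
```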
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
7.11 Condition code suffixes on page 7-150.
15.38 VSTR (floating-point) on page 15-790.

14.130 VSTR (post-increment and pre-decrement)
Pseudo-instruction that stores extension registers with post-increment and pre-decrement forms.
Note
There are also VLDR and VSTR instructions without post-increment and pre-decrement.

Syntax
VSTR{cond}{.64} Dd, [Rn], #offset ; post-increment
VSTR{cond}{.64} Dd, [Rn, #-offset]! ; pre-decrement

where:
cond

is an optional condition code.
Dd

is the extension register to be saved.
Rn

is the ARM register holding the base address for the transfer.
offset

is a numeric expression that must evaluate to 8 at assembly time.
Operation
The post-increment instruction increments the base address in the register by the offset value, after the
transfer. The pre-decrement instruction decrements the base address in the register by the offset value,
and then performs the transfer using the new address in the register. This pseudo-instruction assembles to
a VSTM instruction.
Related references
14.129 VSTR on page 14-739.
14.126 VSTM on page 14-734.
7.11 Condition code suffixes on page 7-150.
15.39 VSTR (post-increment and pre-decrement, floating-point) on page 15-791.

14.131 VSUB
Vector Subtract.
Syntax
VSUB{cond}.datatype {Qd}, Qn, Qm
VSUB{cond}.datatype {Dd}, Dn, Dm

where:
cond

is an optional condition code.
datatype

must be one of I8, I16, I32, I64, or F32.
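Qd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector, for a
quadword operation.
Dd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a
doubleword operation.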
Operation
VSUB subtracts the elements of one vector from the corresponding elements of another vector, and places
the results in the destination vector.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.132 VSUBHN
Vector Subtract and Narrow, selecting High half.
Syntax
VSUBHN{cond}.datatype Dd, Qn, Qm

where:
cond

is an optional condition code.
datatype

must be one of I16, I32, or I64.
Dd, Qn, Qm

are the destination vector, the first operand vector, and the second operand vector.
Operation
VSUBHN subtracts the elements of one quadword vector from the corresponding elements of another
quadword vector, selects the most significant halves of the results, and places the final results in the
destination doubleword vector. Results are truncated.
Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.133 VSUBL and VSUBW
Vector Subtract Long, Vector Subtract Wide.
Syntax
VSUBL{cond}.datatype Qd, Dn, Dm ; Long operation
VSUBW{cond}.datatype {Qd}, Qn, Dm ; Wide operation

where:
cond

is an optional condition code.
datatype

must be one of S8, S16, S32, U8, U16, or U32.
Qd, Dn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a long
operation.
Qd, Qn, Dm

are the destination vector, the first operand vector, and the second operand vector, for a wide
operation.
Operation
VSUBL subtracts the elements of one doubleword vector from the corresponding elements of another
doubleword vector, and places the results in the destination quadword vector.
VSUBW subtracts the elements of a doubleword vector from the corresponding elements of a quadword
vector, and places the results in the destination quadword vector.

Related concepts
9.10 Advanced SIMD data types in A32/T32 instructions on page 9-194.
Related references
7.11 Condition code suffixes on page 7-150.

14.134 VSWP
Vector Swap.
Syntax
VSWP{cond}{.datatype} Qd, Qm
VSWP{cond}{.datatype} Dd, Dm

where:
cond

is an optional condition code.
datatype

is an optional datatype. The assembler ignores datatype.
Qd, Qm

specifies the vectors for a quadword operation.
Dd, Dm

specifies the vectors for a doubleword operation.
Operation
VSWP exchanges the contents of two vectors. The vectors can be either doubleword or quadword. There is
no distinction between data types.
Related references
7.11 Condition code suffixes on page 7-150.

14.135 VTBL and VTBX
Vector Table Lookup, Vector Table Extension.
Syntax
Vop{cond}.8 Dd, list, Dm

where:
op

must be either TBL or TBX.
cond

is an optional condition code.
Dd

specifies the destination vector.
list

Specifies the vectors containing the table. It must be one of:
• {Dn}.
• {Dn,D(n+1)}.
• {Dn,D(n+1),D(n+2)}.
• {Dn,D(n+1),D(n+2),D(n+3)}.
• {Qn,Q(n+1)}.
All the registers in list must be in the range D0-D31 or Q0-Q15 and must not wrap around the
end of the register bank. For example {D31,D0,D1} is not permitted. If list contains Q registers,
they disassemble to the equivalent D registers.
Dm

specifies the index vector.
Operation
VTBL uses byte indexes in a control vector to look up byte values in a table and generate a new vector.
Indexes out of range return zero.
VTBX works in the same way, except that indexes out of range leave the destination element unchanged.
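The difference between the two lookups can be sketched in Python. The sketch assumes the table is the concatenated bytes of the registers in list and that vectors are lists of byte values. The function names are illustrative.

```python
def vtbl(table, indexes):
    """VTBL: an out-of-range index produces zero."""
    return [table[i] if i < len(table) else 0 for i in indexes]


def vtbx(dest, table, indexes):
    """VTBX: an out-of-range index leaves the destination byte unchanged."""
    return [table[i] if i < len(table) else d for d, i in zip(dest, indexes)]


table = list(range(100, 108))          # one D register: 8 table bytes
looked_up = vtbl(table, [0, 7, 8, 255])
extended = vtbx([1, 2, 3, 4], table, [0, 7, 8, 255])
```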

Related references
7.11 Condition code suffixes on page 7-150.

14.136 VTRN
Vector Transpose.
Syntax
VTRN{cond}.size Qd, Qm
VTRN{cond}.size Dd, Dm

where:
cond

is an optional condition code.
size

must be one of 8, 16, or 32.
Qd, Qm

specifies the vectors, for a quadword operation.
Dd, Dm

specifies the vectors, for a doubleword operation.
Operation
VTRN treats the elements of its operand vectors as elements of 2 x 2 matrices, and transposes the matrices.
The following figures show examples of the operation of VTRN:
[Figure 14-9 Operation of doubleword VTRN.8]
[Figure 14-10 Operation of doubleword VTRN.32]
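The transpose can also be sketched in Python. The sketch assumes each vector is a list with element 0 as the least significant element, and treats each pair of adjacent elements from the two vectors as one 2 x 2 matrix. The function name is illustrative.

```python
def vtrn(d, m):
    """Transpose the 2 x 2 matrices formed by element pairs of d and m."""
    d, m = list(d), list(m)
    for i in range(0, len(d), 2):
        # Swap the odd element of d with the even element of m.
        d[i + 1], m[i] = m[i], d[i + 1]
    return d, m


# Doubleword VTRN.32: two 32-bit elements per vector.
d32, m32 = vtrn(["A0", "A1"], ["B0", "B1"])
```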

Related references
7.11 Condition code suffixes on page 7-150.

14.137 VTST
Vector Test bits.
Syntax
VTST{cond}.size {Qd}, Qn, Qm
VTST{cond}.size {Dd}, Dn, Dm

where:
cond

is an optional condition code.
size

must be one of 8, 16, or 32.
Qd, Qn, Qm

specifies the destination register, the first operand register, and the second operand register, for a
quadword operation.
Dd, Dn, Dm

specifies the destination register, the first operand register, and the second operand register, for a
doubleword operation.
Operation
VTST takes each element in a vector, and bitwise logical ANDs it with the corresponding element of a
second vector. If the result is not zero, the corresponding element in the destination vector is set to all
ones. Otherwise, it is set to all zeros.
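As a minimal Python sketch of this test, assuming vectors are lists of unsigned integers and size is the element width in bits (the function name is illustrative):

```python
def vtst(n, m, size):
    """Set each result element to all ones if (a AND b) is nonzero, else to zero."""
    all_ones = (1 << size) - 1
    return [all_ones if (a & b) != 0 else 0 for a, b in zip(n, m)]


flags = vtst([0x0F, 0xF0, 0x00], [0x01, 0x0F, 0xFF], 8)
```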

Related references
7.11 Condition code suffixes on page 7-150.

14.138 VUZP
Vector Unzip.
Syntax
VUZP{cond}.size Qd, Qm
VUZP{cond}.size Dd, Dm

where:
cond

is an optional condition code.
size

must be one of 8, 16, or 32.
Qd, Qm

specifies the vectors, for a quadword operation.
Dd, Dm

specifies the vectors, for a doubleword operation.
Note
The following are all the same instruction:
• VZIP.32 Dd, Dm.
• VUZP.32 Dd, Dm.
• VTRN.32 Dd, Dm.
The instruction is disassembled as VTRN.32 Dd, Dm.

Operation
VUZP de-interleaves the elements of two vectors.
De-interleaving is the inverse process of interleaving.
Table 14-27 Operation of doubleword VUZP.8
      Register state before operation    Register state after operation
Dd    A7 A6 A5 A4 A3 A2 A1 A0            B6 B4 B2 B0 A6 A4 A2 A0
Dm    B7 B6 B5 B4 B3 B2 B1 B0            B7 B5 B3 B1 A7 A5 A3 A1
Table 14-28 Operation of quadword VUZP.32
      Register state before operation    Register state after operation
Qd    A3 A2 A1 A0                        B2 B0 A2 A0
Qm    B3 B2 B1 B0                        B3 B1 A3 A1
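De-interleaving can be sketched in Python. The sketch assumes each vector is a list with element 0 as the least significant element, which is the reverse of the left-to-right order used in the tables. The function name is illustrative.

```python
def vuzp(d, m):
    """De-interleave: even-numbered elements go to d, odd-numbered elements to m."""
    joined = list(d) + list(m)
    return joined[0::2], joined[1::2]


a = ["A%d" % i for i in range(8)]
b = ["B%d" % i for i in range(8)]
new_d, new_m = vuzp(a, b)   # doubleword VUZP.8
```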

Related concepts
14.4 Interleaving provided by load and store element and structure instructions on page 14-609.
Related references
14.136 VTRN on page 14-746.
7.11 Condition code suffixes on page 7-150.

14.139 VZIP
Vector Zip.
Syntax
VZIP{cond}.size Qd, Qm
VZIP{cond}.size Dd, Dm

where:
cond

is an optional condition code.
size

must be one of 8, 16, or 32.
Qd, Qm

specifies the vectors, for a quadword operation.
Dd, Dm

specifies the vectors, for a doubleword operation.
Operation
VZIP interleaves the elements of two vectors.
Table 14-29 Operation of doubleword VZIP.8
      Register state before operation    Register state after operation
Dd    A7 A6 A5 A4 A3 A2 A1 A0            B3 A3 B2 A2 B1 A1 B0 A0
Dm    B7 B6 B5 B4 B3 B2 B1 B0            B7 A7 B6 A6 B5 A5 B4 A4
Table 14-30 Operation of quadword VZIP.32
      Register state before operation    Register state after operation
Qd    A3 A2 A1 A0                        B1 A1 B0 A0
Qm    B3 B2 B1 B0                        B3 A3 B2 A2
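Interleaving can be sketched in Python. The sketch assumes each vector is a list with element 0 as the least significant element, which is the reverse of the left-to-right order used in the tables. The function name is illustrative.

```python
def vzip(d, m):
    """Interleave d and m; the low half of the result goes to d, the high half to m."""
    interleaved = [x for pair in zip(d, m) for x in pair]
    half = len(d)
    return interleaved[:half], interleaved[half:]


a = ["A%d" % i for i in range(8)]
b = ["B%d" % i for i in range(8)]
new_d, new_m = vzip(a, b)   # doubleword VZIP.8
```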


Chapter 15
Floating-point Instructions (32-bit)

Describes floating-point assembly language instructions.
It contains the following sections:
• 15.1 Summary of floating-point instructions.
• 15.2 VABS (floating-point).
• 15.3 VADD (floating-point).
• 15.4 VCMP, VCMPE.
• 15.5 VCVT (between single-precision and double-precision).
• 15.6 VCVT (between floating-point and integer).
• 15.7 VCVT (from floating-point to integer with directed rounding modes).
• 15.8 VCVT (between floating-point and fixed-point).
• 15.9 VCVTB, VCVTT (half-precision extension).
• 15.10 VCVTB, VCVTT (between half-precision and double-precision).
• 15.11 VDIV.
• 15.12 VFMA, VFMS, VFNMA, VFNMS (floating-point).
• 15.13 VJCVT.
• 15.14 VLDM (floating-point).
• 15.15 VLDR (floating-point).
• 15.16 VLDR (post-increment and pre-decrement, floating-point).
• 15.17 VLDR pseudo-instruction (floating-point).
• 15.18 VMAXNM, VMINNM (floating-point).
• 15.19 VMLA (floating-point).
• 15.20 VMLS (floating-point).
• 15.21 VMOV (floating-point).
• 15.22 VMOV (between one ARM register and single precision floating-point register).
• 15.23 VMOV (between two ARM registers and one or two extension registers).
• 15.24 VMOV (between an ARM register and half a double precision floating-point register).
• 15.25 VMRS (floating-point).
• 15.26 VMSR (floating-point).
• 15.27 VMUL (floating-point).
• 15.28 VNEG (floating-point).
• 15.29 VNMLA (floating-point).
• 15.30 VNMLS (floating-point).
• 15.31 VNMUL (floating-point).
• 15.32 VPOP (floating-point).
• 15.33 VPUSH (floating-point).
• 15.34 VRINT (floating-point).
• 15.35 VSEL.
• 15.36 VSQRT.
• 15.37 VSTM (floating-point).
• 15.38 VSTR (floating-point).
• 15.39 VSTR (post-increment and pre-decrement, floating-point).
• 15.40 VSUB (floating-point).

15.1 Summary of floating-point instructions
A summary of the floating-point instructions. Not all of these instructions are available in all
floating-point versions.
The following table shows a summary of floating-point instructions that are not available in Advanced
SIMD.
Note
Floating-point vector mode is not supported in ARMv8. Use Advanced SIMD instructions for vector
floating-point.

Table 15-1 Summary of floating-point instructions
Mnemonic          Brief description
VABS              Absolute value
VADD              Add
VCMP, VCMPE       Compare
VCVT              Convert between single-precision and double-precision
                  Convert between floating-point and integer
                  Convert between floating-point and fixed-point
                  Convert floating-point to integer with directed rounding modes
VCVTB, VCVTT      Convert between half-precision and single-precision floating-point
                  Convert between half-precision and double-precision
VDIV              Divide
VFMA, VFMS        Fused multiply accumulate, Fused multiply subtract
VFNMA, VFNMS      Fused multiply accumulate with negation, Fused multiply subtract with negation
VJCVT             Javascript Convert to signed fixed-point, rounding toward Zero
VLDM              Extension register load multiple
VLDR              Extension register load
VMAXNM, VMINNM    Maximum, Minimum, consistent with IEEE 754-2008
VMLA              Multiply accumulate
VMLS              Multiply subtract
VMOV              Insert floating-point immediate in single-precision or double-precision register,
                  or copy one FP register into another FP register of the same width
VMRS              Transfer contents from a floating-point system register to an ARM register
VMSR              Transfer contents from an ARM register to a floating-point system register
VMUL              Multiply
VNEG              Negate
VNMLA             Negated multiply accumulate
VNMLS             Negated multiply subtract
VNMUL             Negated multiply
VPOP              Extension register load multiple
VPUSH             Extension register store multiple
VRINT             Round to integer
VSEL              Select
VSQRT             Square Root
VSTM              Extension register store multiple
VSTR              Extension register store
VSUB              Subtract

15.2 VABS (floating-point)
Floating-point absolute value.
Syntax
VABS{cond}.F32 Sd, Sm
VABS{cond}.F64 Dd, Dm

where:
cond

is an optional condition code.
Sd, Sm

are the single-precision registers for the result and operand.
Dd, Dm

are the double-precision registers for the result and operand.
Operation
The VABS instruction takes the contents of Sm or Dm, clears the sign bit, and places the result in Sd or Dd.
This gives the absolute value.
If the operand is a NaN, the sign bit is cleared, but no exception is produced.
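Because only the sign bit changes, the operation can be sketched on raw bit patterns in Python. The helper names below are illustrative, and the struct module is used only to move between floats and their 32-bit encodings.

```python
import struct


def f32_to_bits(x):
    """Return the 32-bit pattern of a single-precision value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits):
    """Return the single-precision value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def vabs_f32(bits):
    """Model VABS.F32 on a raw bit pattern: clear bit 31 and change nothing else."""
    return bits & 0x7FFFFFFF


magnitude = bits_to_f32(vabs_f32(f32_to_bits(-1.5)))
```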
Floating-point exceptions
VABS instructions do not produce any exceptions.

Related references
7.11 Condition code suffixes on page 7-150.

15.3 VADD (floating-point)
Floating-point add.
Syntax
VADD{cond}.F32 {Sd}, Sn, Sm
VADD{cond}.F64 {Dd}, Dn, Dm

where:
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
The VADD instruction adds the values in the operand registers and places the result in the destination
register.
Floating-point exceptions
The VADD instruction can produce Invalid Operation, Overflow, or Inexact exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.4 VCMP, VCMPE
Floating-point compare.
Syntax
VCMP{E}{cond}.F32 Sd, Sm
VCMP{E}{cond}.F32 Sd, #0
VCMP{E}{cond}.F64 Dd, Dm
VCMP{E}{cond}.F64 Dd, #0

where:
E

if present, indicates that the instruction raises an Invalid Operation exception if either operand is
a quiet or signaling NaN. Otherwise, it raises the exception only if either operand is a signaling
NaN.
cond

is an optional condition code.
Sd, Sm

are the single-precision registers holding the operands.
Dd, Dm

are the double-precision registers holding the operands.
Operation
The VCMP{E} instruction subtracts the value in the second operand register (or 0 if the second operand is
#0) from the value in the first operand register, and sets the VFP condition flags based on the result.
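The flag settings themselves are not listed here. The following Python sketch assumes the N, Z, C, V encodings that the ARM architecture defines for floating-point comparisons; check them against the Architecture Reference Manual before relying on them. The function name is illustrative.

```python
import math


def vcmp_flags(a, b):
    """Return an assumed (N, Z, C, V) tuple for comparing a with b."""
    if math.isnan(a) or math.isnan(b):
        return (0, 0, 1, 1)   # unordered: at least one operand is a NaN
    if a == b:
        return (0, 1, 1, 0)   # equal
    if a < b:
        return (1, 0, 0, 0)   # first operand is less than the second
    return (0, 0, 1, 0)       # first operand is greater than the second


less_than = vcmp_flags(1.0, 2.0)
```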
Floating-point exceptions
VCMP{E} instructions can produce Invalid Operation exceptions.

Related references
7.11 Condition code suffixes on page 7-150.

15.5 VCVT (between single-precision and double-precision)
Convert between single-precision and double-precision numbers.
Syntax
VCVT{cond}.F64.F32 Dd, Sm
VCVT{cond}.F32.F64 Sd, Dm

where:
cond

is an optional condition code.
Dd

is a double-precision register for the result.
Sm

is a single-precision register holding the operand.
Sd

is a single-precision register for the result.
Dm

is a double-precision register holding the operand.
Operation
These instructions convert the single-precision value in Sm to double-precision, placing the result in Dd,
or the double-precision value in Dm to single-precision, placing the result in Sd.
Floating-point exceptions
These instructions can produce Invalid Operation, Input Denormal, Overflow, Underflow, or Inexact
exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.6 VCVT (between floating-point and integer)
Convert between floating-point numbers and integers.
Syntax
VCVT{R}{cond}.type.F64 Sd, Dm
VCVT{R}{cond}.type.F32 Sd, Sm
VCVT{cond}.F64.type Dd, Sm
VCVT{cond}.F32.type Sd, Sm

where:
R

makes the operation use the rounding mode specified by the FPSCR. Otherwise, the operation
rounds towards zero.
cond

is an optional condition code.
type

can be either U32 (unsigned 32-bit integer) or S32 (signed 32-bit integer).
Sd

is a single-precision register for the result.
Dd

is a double-precision register for the result.
Sm

is a single-precision register holding the operand.
Dm

is a double-precision register holding the operand.
Operation
The first two forms of this instruction convert from floating-point to integer.
The third and fourth forms convert from integer to floating-point.
Floating-point exceptions
These instructions can produce Input Denormal, Invalid Operation, or Inexact exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.7 VCVT (from floating-point to integer with directed rounding modes)
Convert from floating-point to signed or unsigned integer with directed rounding modes.
Note
This instruction is supported only in ARMv8.

Syntax
VCVTmode.S32.F64 Sd, Dm
VCVTmode.S32.F32 Sd, Sm
VCVTmode.U32.F64 Sd, Dm
VCVTmode.U32.F32 Sd, Sm

where:
mode

must be one of:
A

meaning round to nearest, ties away from zero
N

meaning round to nearest, ties to even
P

meaning round towards plus infinity
M

meaning round towards minus infinity.
Sd, Sm

specifies the single-precision registers for the operand and result.
Sd, Dm

specifies a single-precision register for the result and double-precision register holding the
operand.
Notes
You cannot use VCVT with a directed rounding mode inside an IT block.
Floating-point exceptions
These instructions can produce Input Denormal, Invalid Operation, or Inexact exceptions.
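The four rounding modes can be compared with a short Python sketch. It covers only the rounding step for finite inputs. Handling of NaNs, infinities, and values outside the 32-bit range is not modeled, and the function name is illustrative.

```python
import math


def round_directed(x, mode):
    """Round a finite float to an integer using mode A, N, P, or M."""
    if mode == "P":
        return math.ceil(x)        # towards plus infinity
    if mode == "M":
        return math.floor(x)       # towards minus infinity
    if mode == "N":
        return round(x)            # Python's round() sends ties to even
    if mode == "A":
        whole = math.trunc(x)
        # A fractional part of one half or more moves away from zero.
        if abs(x - whole) >= 0.5:
            whole += 1 if x > 0 else -1
        return whole
    raise ValueError("mode must be one of A, N, P, M")


tie_results = {mode: round_directed(2.5, mode) for mode in "ANPM"}
```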

15.8 VCVT (between floating-point and fixed-point)
Convert between floating-point and fixed-point numbers.
Syntax
VCVT{cond}.type.F64 Dd, Dd, #fbits
VCVT{cond}.type.F32 Sd, Sd, #fbits
VCVT{cond}.F64.type Dd, Dd, #fbits
VCVT{cond}.F32.type Sd, Sd, #fbits

where:
cond

is an optional condition code.
type

can be any one of:
S16

16-bit signed fixed-point number.
U16

16-bit unsigned fixed-point number.
S32

32-bit signed fixed-point number.
U32

32-bit unsigned fixed-point number.
Sd

is a single-precision register for the operand and result.
Dd

is a double-precision register for the operand and result.
fbits

is the number of fraction bits in the fixed-point number, in the range 0-16 if type is S16 or U16,
or in the range 1-32 if type is S32 or U32.
Operation
The first two forms of this instruction convert from floating-point to fixed-point.
The third and fourth forms convert from fixed-point to floating-point.
In all cases the fixed-point number is contained in the least significant 16 or 32 bits of the register.
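The role of fbits can be sketched in Python as a scaling by 2 to the power fbits. The sketch assumes that conversion to fixed-point rounds towards zero, which this section does not state, and it does not model saturation to the 16-bit or 32-bit range. The function names are illustrative.

```python
import math


def float_to_fixed(x, fbits):
    """Scale by 2**fbits and drop the remaining fraction (assumed round towards zero)."""
    return math.trunc(x * (1 << fbits))


def fixed_to_float(value, fbits):
    """Interpret an integer as a fixed-point number with fbits fraction bits."""
    return value / (1 << fbits)


fixed = float_to_fixed(1.75, 8)     # 1.75 with 8 fraction bits
restored = fixed_to_float(fixed, 8)
```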
Floating-point exceptions
These instructions can produce Input Denormal, Invalid Operation, or Inexact exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.9 VCVTB, VCVTT (half-precision extension)
Convert between half-precision and single-precision floating-point numbers.
Syntax
VCVTB{cond}.type Sd, Sm
VCVTT{cond}.type Sd, Sm

where:
cond

is an optional condition code.
type

can be any one of:
F32.F16

Convert from half-precision to single-precision.
F16.F32

Convert from single-precision to half-precision.
Sd

is a single word register for the result.
Sm

is a single word register for the operand.
Operation
VCVTB uses the bottom half (bits[15:0]) of the single word register to obtain or store the half-precision
value.
VCVTT uses the top half (bits[31:16]) of the single word register to obtain or store the half-precision
value.
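The choice of half can be sketched in Python on a 32-bit register image. The sketch assumes that a store into one half leaves the other half of the destination unchanged, which this section does not state. The function names are illustrative, and the struct format "e" supplies the IEEE half-precision encoding.

```python
import struct


def half_to_float(h16):
    """Decode a 16-bit IEEE half-precision pattern."""
    return struct.unpack("<e", struct.pack("<H", h16))[0]


def float_to_half(x):
    """Encode a value as a 16-bit IEEE half-precision pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def vcvtb_f32_f16(sm_bits):
    """VCVTB.F32.F16: read the half-precision value from bits[15:0]."""
    return half_to_float(sm_bits & 0xFFFF)


def vcvtt_f32_f16(sm_bits):
    """VCVTT.F32.F16: read the half-precision value from bits[31:16]."""
    return half_to_float((sm_bits >> 16) & 0xFFFF)


def vcvtb_f16_f32(sd_bits, value):
    """VCVTB.F16.F32: store into bits[15:0] (other half assumed preserved)."""
    return (sd_bits & 0xFFFF0000) | float_to_half(value)


def vcvtt_f16_f32(sd_bits, value):
    """VCVTT.F16.F32: store into bits[31:16] (other half assumed preserved)."""
    return (sd_bits & 0x0000FFFF) | (float_to_half(value) << 16)
```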
Architectures
The instructions are only available in VFPv3 systems with the half-precision extension, and VFPv4.
Floating-point exceptions
These instructions can produce Input Denormal, Invalid Operation, Overflow, Underflow, or Inexact
exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.10 VCVTB, VCVTT (between half-precision and double-precision)
Convert between half-precision and double-precision floating-point numbers, in either of the following
directions:
• From half-precision floating-point to double-precision floating-point (F64.F16).
• From double-precision floating-point to half-precision floating-point (F16.F64).

VCVTB uses the bottom half (bits[15:0]) of the single word register to obtain or store the half-precision
value.
VCVTT uses the top half (bits[31:16]) of the single word register to obtain or store the half-precision
value.
Note
These instructions are supported only in ARMv8.

Syntax
VCVTB{cond}.F64.F16 Dd, Sm
VCVTB{cond}.F16.F64 Sd, Dm
VCVTT{cond}.F64.F16 Dd, Sm
VCVTT{cond}.F16.F64 Sd, Dm

where:
cond

is an optional condition code.
Dd

is a double-precision register for the result.
Sm

is a single word register holding the operand.
Sd

is a single word register for the result.
Dm

is a double-precision register holding the operand.

15.11 VDIV
Floating-point divide.
Syntax
VDIV{cond}.F32 {Sd}, Sn, Sm
VDIV{cond}.F64 {Dd}, Dn, Dm

where:
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
The VDIV instruction divides the value in the first operand register by the value in the second operand
register, and places the result in the destination register.
Floating-point exceptions
VDIV operations can produce Division by Zero, Invalid Operation, Overflow, Underflow, or Inexact
exceptions.
Related concepts
Control of scalar, vector, and mixed operations.
Related references
7.11 Condition code suffixes on page 7-150.

15.12 VFMA, VFMS, VFNMA, VFNMS (floating-point)
Fused floating-point multiply accumulate and fused floating-point multiply subtract, with optional
negation.
Syntax
VF{N}op{cond}.F64 {Dd}, Dn, Dm
VF{N}op{cond}.F32 {Sd}, Sn, Sm

where:
op

is one of MA or MS.
N

negates the final result.
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
VFMA multiplies the values in the operand registers, adds the value in the destination register, and places
the final result in the destination register. The result of the multiply is not rounded before the
accumulation.
VFMS multiplies the values in the operand registers, subtracts the product from the value in the destination
register, and places the final result in the destination register. The result of the multiply is not rounded
before the subtraction.

In each case, the final result is negated if the N option is used.
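The effect of not rounding the product can be sketched in Python for the F64 forms. The sketch uses exact rational arithmetic to stand in for the unrounded intermediate result and rounds once at the end. It handles finite inputs only and does not model signed zeros or exceptions. The function names are illustrative.

```python
from fractions import Fraction


def vfma_f64(d, n, m):
    """d + n*m with a single rounding at the end."""
    return float(Fraction(d) + Fraction(n) * Fraction(m))


def vfms_f64(d, n, m):
    """d - n*m with a single rounding at the end."""
    return float(Fraction(d) - Fraction(n) * Fraction(m))


def vfnma_f64(d, n, m):
    """The VFMA result, negated."""
    return -vfma_f64(d, n, m)


def vfnms_f64(d, n, m):
    """The VFMS result, negated."""
    return -vfms_f64(d, n, m)


n = 1.0 + 2.0 ** -30
m = 1.0 - 2.0 ** -30
fused = vfma_f64(-1.0, n, m)      # keeps the low-order bits of the product
unfused = -1.0 + n * m            # the product is rounded to 1.0 before the add
```

In this example the exact product is 1 - 2^-60, so the fused form returns a small nonzero value where a separate multiply and add return zero.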
Floating-point exceptions
These instructions can produce Input Denormal, Invalid Operation, Overflow, Underflow, or Inexact
exceptions.
Related references
15.27 VMUL (floating-point) on page 15-779.
7.11 Condition code suffixes on page 7-150.

15.13 VJCVT
Javascript Convert to signed fixed-point, rounding toward Zero.
Syntax
VJCVT{q}.S32.F64 Sd, Dm ; A1 FP/SIMD registers (A32)
VJCVT{q}.S32.F64 Sd, Dm ; T1 FP/SIMD registers (T32)

Where:
q

See Standard assembler syntax fields in the ARMv8-A Architecture Reference Manual.
Sd

Is the 32-bit name of the SIMD and FP destination register.
Dm

Is the 64-bit name of the SIMD and FP source register.
Architectures supported
Supported in ARMv8.3.
Usage
Javascript Convert to signed fixed-point, rounding toward Zero. This instruction converts the
double-precision floating-point value in the SIMD and FP source register to a 32-bit signed integer using the
Round towards Zero rounding mode, and writes the result to the SIMD and FP destination register. If
the result is too large to be accommodated as a signed 32-bit integer, then the result is the integer modulo
2^32, as held in a 32-bit signed integer.
Depending on settings in the CPACR, NSACR, HCPTR, and FPEXC registers, and the security state and mode in
which the instruction is executed, an attempt to execute the instruction might be UNDEFINED, or trapped to
Hyp mode. For more information about these registers, and about Enabling Advanced SIMD and floating-point
support, see the ARMv8-A Architecture Reference Manual.
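The wrap-around behavior can be sketched in Python. The sketch assumes that NaN and infinite inputs produce zero, as in the JavaScript ToInt32 conversion, which this section does not state. Exception flags are not modeled, and the function name is illustrative.

```python
import math


def vjcvt(x):
    """Truncate towards zero, then wrap modulo 2**32 into a signed 32-bit integer."""
    if math.isnan(x) or math.isinf(x):
        return 0                            # assumption, not stated in this section
    wrapped = math.trunc(x) % (1 << 32)
    return wrapped - (1 << 32) if wrapped >= (1 << 31) else wrapped


wrapped_result = vjcvt(2.0 ** 32 + 5)       # too large for 32 bits, so it wraps
```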
Related references
15.1 Summary of floating-point instructions on page 15-752.

15.14 VLDM (floating-point)
Extension register load multiple.
Syntax
VLDMmode{cond} Rn{!}, Registers

where:
mode

must be one of:
IA

meaning Increment address After each transfer. IA is the default, and can be omitted.
DB

meaning Decrement address Before each transfer.
EA

meaning Empty Ascending stack operation. This is the same as DB for loads.
FD

meaning Full Descending stack operation. This is the same as IA for loads.
cond

is an optional condition code.
Rn

is the ARM register holding the base address for the transfer.
!

is optional. ! specifies that the updated base address must be written back to Rn. If ! is not
specified, mode must be IA.
Registers

is a list of consecutive extension registers enclosed in braces, { and }. The list can be
comma-separated, or in range format. There must be at least one register in the list.
You can specify S or D registers, but they must not be mixed. The number of registers must not
exceed 16 D registers.
Note
VPOP Registers is equivalent to VLDM sp!, Registers.

You can use either form of this instruction. They both disassemble to VPOP.

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
Related references
7.11 Condition code suffixes on page 7-150.

15.15 VLDR (floating-point)
Extension register load.
Syntax
VLDR{cond}{.size} Fd, [Rn{, #offset}]
VLDR{cond}{.size} Fd, label

where:
cond

is an optional condition code.
size

is an optional data size specifier. Must be 32 if Fd is an S register, or 64 otherwise.
Fd

is the extension register to be loaded, and can be either a D or S register.
Rn

is the ARM register holding the base address for the transfer.
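offset

is an optional numeric expression. It must evaluate to a numeric value at assembly time. The value
must be a multiple of 4, and lie in the range -1020 to +1020. The value is added to the base
address to form the address used for the transfer.
label

is a PC-relative expression.
label must be aligned on a word boundary within ±1KB of the current instruction.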

Operation
The VLDR instruction loads an extension register from memory.
One word is transferred if Fd is an S register. Two words are transferred otherwise.
There is also a VLDR pseudo-instruction.
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
15.17 VLDR pseudo-instruction (floating-point) on page 15-769.
7.11 Condition code suffixes on page 7-150.

15.16 VLDR (post-increment and pre-decrement, floating-point)
Pseudo-instruction that loads extension registers, with post-increment and pre-decrement forms.
Note
There are also VLDR and VSTR instructions without post-increment and pre-decrement.

Syntax
VLDR{cond}{.size} Fd, [Rn], #offset ; post-increment
VLDR{cond}{.size} Fd, [Rn, #-offset]! ; pre-decrement

where:
cond

is an optional condition code.
size

is an optional data size specifier. Must be 32 if Fd is an S register, or 64 if Fd is a D register.
Fd

is the extension register to load. It can be either a double precision (Dd) or a single precision (Sd)
register.
Rn

is the ARM register holding the base address for the transfer.
offset

is a numeric expression that must evaluate to a numeric value at assembly time. The value must
be 4 if Fd is an S register, or 8 if Fd is a D register.
Operation
The post-increment instruction increments the base address in the register by the offset value, after the
transfer. The pre-decrement instruction decrements the base address in the register by the offset value,
and then performs the transfer using the new address in the register. This pseudo-instruction assembles to
a VLDM instruction.
Related references
15.14 VLDM (floating-point) on page 15-766.
15.15 VLDR (floating-point) on page 15-767.
7.11 Condition code suffixes on page 7-150.

15.17 VLDR pseudo-instruction (floating-point)
The VLDR pseudo-instruction loads a constant value into a floating-point single-precision or
double-precision register.
Note
This description is for the VLDR pseudo-instruction only.

Syntax
VLDR{cond}.F64 Dd,=constant
VLDR{cond}.F32 Sd,=constant

where:
cond

is an optional condition code.
Dd or Sd

is the extension register to be loaded.
constant

is an immediate value of the appropriate type for the extension register width.
Usage
If an instruction (for example, VMOV) is available that can generate the constant directly into the register,
the assembler uses it. Otherwise, it generates a doubleword literal pool entry containing the constant and
loads the constant using a VLDR instruction.
Related concepts
10.10 Floating-point data types in A32/T32 instructions on page 10-217.
Related references
15.15 VLDR (floating-point) on page 15-767.
7.11 Condition code suffixes on page 7-150.

15.18 VMAXNM, VMINNM (floating-point)
Floating-point Maximum, Minimum, consistent with IEEE 754-2008.
Note
These instructions are supported only in ARMv8.

Syntax
Vop.F32 Sd, Sn, Sm
Vop.F64 Dd, Dn, Dm

where:
op

must be either MAXNM or MINNM.
Sd, Sn, Sm

are the single-precision destination register, first operand register, and second operand register.
Dd, Dn, Dm

are the double-precision destination register, first operand register, and second operand register.
Operation
VMAXNM compares the values in the operand registers, and copies the larger value into the destination
register.
VMINNM compares the values in the operand registers, and copies the smaller value into the destination
register.
If one of the values being compared is a number and the other value is NaN, the number is copied into
the destination register. This is consistent with the IEEE 754-2008 standard.
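The NaN handling can be sketched in Python. The sketch does not model signaling NaNs, the ordering of signed zeros, or exception flags, and the function names are illustrative.

```python
import math


def vmaxnm(a, b):
    """Return the larger value; a single NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def vminnm(a, b):
    """Return the smaller value; a single NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


number_wins = vmaxnm(1.0, float("nan"))
```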
Notes
You cannot use VMAXNM or VMINNM inside an IT block.
Floating-point exceptions
These instructions can produce Input Denormal, Invalid Operation, Overflow, Underflow, or Inexact
exceptions.

15.19 VMLA (floating-point)
Floating-point multiply accumulate.
Syntax
VMLA{cond}.F32 Sd, Sn, Sm
VMLA{cond}.F64 Dd, Dn, Dm

where:
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
The VMLA instruction multiplies the values in the operand registers, adds the value in the destination
register, and places the final result in the destination register.
Floating-point exceptions
This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal
exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.20 VMLS (floating-point)
Floating-point multiply subtract.
Syntax
VMLS{cond}.F32 Sd, Sn, Sm
VMLS{cond}.F64 Dd, Dn, Dm

where:
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
The VMLS instruction multiplies the values in the operand registers, subtracts the result from the value in
the destination register, and places the final result in the destination register.
Floating-point exceptions
This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal
exceptions.
Related concepts
Control of scalar, vector, and mixed operations.
Related references
7.11 Condition code suffixes on page 7-150.

15.21 VMOV (floating-point)
Insert a floating-point immediate value into a single-precision or double-precision register, or copy one
register into another register. This instruction is always scalar.
Syntax
VMOV{cond}.F32 Sd, #imm
VMOV{cond}.F64 Dd, #imm
VMOV{cond}.F32 Sd, Sm
VMOV{cond}.F64 Dd, Dm

where:
cond

is an optional condition code.
Sd

is the single-precision destination register.
Dd

is the double-precision destination register.
imm

is the floating-point immediate value.
Sm

is the single-precision source register.
Dm

is the double-precision source register.
Immediate values
Any number that can be expressed as +/-n * 2^-r, where n and r are integers, 16 <= n <= 31, 0 <= r <= 7.
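This rule can be checked by enumeration. The following Python sketch tests whether a value has the form n * 2^-r for the stated ranges of n and r. The function name is illustrative and is not an armasm feature.

```python
from fractions import Fraction


def vmov_imm_encodable(value):
    """Return True if abs(value) equals n * 2**-r with 16 <= n <= 31 and 0 <= r <= 7."""
    target = abs(Fraction(value))
    return any(target == Fraction(n, 2 ** r)
               for n in range(16, 32)
               for r in range(0, 8))


one_is_encodable = vmov_imm_encodable(1.0)   # 1.0 is 16 * 2**-4
```

Under this rule the representable magnitudes run from 0.125 (16 * 2^-7) to 31 (31 * 2^0), and zero is not included.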
Architectures
The instructions that copy immediate constants are available in VFPv3 and above.
The instructions that copy from registers are available in all VFP systems.
Related references
7.11 Condition code suffixes on page 7-150.

15.22 VMOV (between one ARM register and single precision floating-point register)
Transfer contents between a single-precision floating-point register and an ARM register.
Syntax
VMOV{cond} Rd, Sn
VMOV{cond} Sn, Rd

where:
cond

is an optional condition code.
Sn

is the floating-point single-precision register.
Rd

is the ARM register. Rd must not be PC.
Operation
VMOV Rd, Sn transfers the contents of Sn into Rd.
VMOV Sn, Rd transfers the contents of Rd into Sn.

Related references
7.11 Condition code suffixes on page 7-150.

15.23 VMOV (between two ARM registers and one or two extension registers)
Transfer contents between two ARM registers and either one 64-bit register or two consecutive 32-bit
registers.
Syntax
VMOV{cond} Dm, Rd, Rn
VMOV{cond} Rd, Rn, Dm
VMOV{cond} Sm, Sm1, Rd, Rn
VMOV{cond} Rd, Rn, Sm, Sm1

where:
cond

is an optional condition code.
Dm

is a 64-bit extension register.
Sm

is a VFP 32-bit register.
Sm1

is the next consecutive VFP 32-bit register after Sm.
Rd, Rn

are the ARM registers. Rd and Rn must not be PC.
Operation
VMOV Dm, Rd, Rn transfers the contents of Rd into the low half of Dm, and the contents of Rn into the
high half of Dm.
VMOV Rd, Rn, Dm transfers the contents of the low half of Dm into Rd, and the contents of the high half of
Dm into Rn.
VMOV Rd, Rn, Sm, Sm1 transfers the contents of Sm into Rd, and the contents of Sm1 into Rn.
VMOV Sm, Sm1, Rd, Rn transfers the contents of Rd into Sm, and the contents of Rn into Sm1.
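The low-half and high-half mapping for the 64-bit forms can be sketched in Python as follows. This is an illustrative model with hypothetical function names. It treats Rd as bits [31:0] and Rn as bits [63:32] of the 64-bit register, as described above.

```python
import struct

def vmov_rr_from_d(value):
    """Model VMOV Rd, Rn, Dm: return (Rd, Rn) = (low half, high half) of Dm."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32

def vmov_d_from_rr(rd, rn):
    """Model VMOV Dm, Rd, Rn: Rd fills the low half, Rn fills the high half."""
    bits = ((rn & 0xFFFFFFFF) << 32) | (rd & 0xFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For the double-precision value 1.0 (0x3FF0000000000000), the model returns Rd = 0x00000000 and Rn = 0x3FF00000.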

Architectures
The instructions are available in VFPv2 and above.
Related references
7.11 Condition code suffixes on page 7-150.

15.24 VMOV (between an ARM register and half a double precision floating-point register)
Transfer contents between an ARM register and half a double precision floating-point register.
Syntax
VMOV{cond}{.size} Dn[x], Rd
VMOV{cond}{.size} Rd, Dn[x]

where:
cond

is an optional condition code.
size

is the data size. Must be either 32 or omitted. If omitted, size is 32.
Dn[x]

is the upper or lower half of a double precision floating-point register.
Rd

is the ARM register. Rd must not be PC.
Operation
VMOV Dn[x], Rd transfers the contents of Rd into Dn[x].
VMOV Rd, Dn[x] transfers the contents of Dn[x] into Rd.

Related concepts
10.10 Floating-point data types in A32/T32 instructions on page 10-217.
Related references
7.11 Condition code suffixes on page 7-150.

15.25 VMRS (floating-point)
Transfer contents from a floating-point system register to an ARM register.
Syntax
VMRS{cond} Rd, extsysreg

where:
cond

is an optional condition code.
extsysreg

is the floating-point system register, usually FPSCR, FPSID, or FPEXC.
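Rd

is the ARM register. Rd must not be PC.
It can be APSR_nzcv, if extsysreg is FPSCR. In this case, the floating-point status flags are
transferred into the corresponding flags in the ARM APSR.
Examples
VMRS    r2, FPSID
VMRS    APSR_nzcv, FPSCR    ; transfer FP status register to ARM APSR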
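The APSR_nzcv form copies only the four condition flags. As an illustration, the Python sketch below extracts them from a 32-bit FPSCR value, assuming the architectural layout in which N, Z, C, and V occupy bits 31 to 28. The function name is hypothetical.

```python
def fpscr_to_nzcv(fpscr):
    """Return the N, Z, C, V flags held in bits [31:28] of an FPSCR value."""
    return {
        "N": (fpscr >> 31) & 1,
        "Z": (fpscr >> 30) & 1,
        "C": (fpscr >> 29) & 1,
        "V": (fpscr >> 28) & 1,
    }
```

For example, an FPSCR value of 0x60000000 yields Z = 1 and C = 1, with N and V clear.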

Related references
7.11 Condition code suffixes on page 7-150.

15.26 VMSR (floating-point)
Transfer contents of an ARM register to a floating-point system register.
Syntax
VMSR{cond} extsysreg, Rd

where:
cond

is an optional condition code.
extsysreg

is the floating-point system register, usually FPSCR, FPSID, or FPEXC.
Rd

is the ARM register. Rd must not be PC.
It can be APSR_nzcv, if extsysreg is FPSCR. In this case, the floating-point status flags are
transferred into the corresponding flags in the ARM APSR.
Usage
The VMSR instruction transfers the contents of Rd into extsysreg.
Note
The instruction stalls the processor until all current floating-point operations complete.

Example
VMSR    FPSCR, r4

Related references
7.11 Condition code suffixes on page 7-150.

15.27 VMUL (floating-point)
Floating-point multiply.
Syntax
VMUL{cond}.F32 {Sd,} Sn, Sm
VMUL{cond}.F64 {Dd,} Dn, Dm

where:
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
The VMUL operation multiplies the values in the operand registers and places the result in the destination
register.
Floating-point exceptions
This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal
exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.28 VNEG (floating-point)
Floating-point negate.
Syntax
VNEG{cond}.F32 Sd, Sm
VNEG{cond}.F64 Dd, Dm

where:
cond

is an optional condition code.
Sd, Sm

are the single-precision registers for the result and operand.
Dd, Dm

are the double-precision registers for the result and operand.
Operation
The VNEG instruction takes the contents of Sm or Dm, changes the sign bit, and places the result in Sd or Dd.
This gives the negation of the value.
If the operand is a NaN, the sign bit is changed, but no exception is produced.
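Because the operation only changes the sign bit, it can be modeled as a single XOR on the raw bit pattern. The Python sketch below does this for the single-precision case. The function name is hypothetical, and the NaN pattern used in the usage note is just one example of a quiet NaN.

```python
SIGN_BIT_F32 = 0x80000000

def vneg_f32_bits(bits):
    """Model VNEG.F32 on a raw 32-bit pattern: flip bit 31, leave the rest alone."""
    return (bits ^ SIGN_BIT_F32) & 0xFFFFFFFF
```

Applied to 0x3F800000 (1.0) the model gives 0xBF800000 (–1.0). Applied to the quiet NaN pattern 0x7FC00000 it gives 0xFFC00000, which is still a NaN with only the sign changed.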
Floating-point exceptions
VNEG instructions do not produce any exceptions.

Related references
7.11 Condition code suffixes on page 7-150.

15.29 VNMLA (floating-point)
Floating-point multiply accumulate with negation.
Syntax
VNMLA{cond}.F32 Sd, Sn, Sm
VNMLA{cond}.F64 Dd, Dn, Dm

where:
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
The VNMLA instruction multiplies the values in the operand registers, adds the product to the value in
the destination register, and places the negated final result in the destination register.
Floating-point exceptions
This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal
exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.30 VNMLS (floating-point)
Floating-point multiply subtract with negation.
Syntax
VNMLS{cond}.F32 Sd, Sn, Sm
VNMLS{cond}.F64 Dd, Dn, Dm

where:
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
The VNMLS instruction multiplies the values in the operand registers, subtracts the result from the value in
the destination register, and places the negated final result in the destination register.
Floating-point exceptions
This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal
exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.31 VNMUL (floating-point)
Floating-point multiply with negation.
Syntax
VNMUL{cond}.F32 {Sd,} Sn, Sm
VNMUL{cond}.F64 {Dd,} Dn, Dm

where:
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
The VNMUL instruction multiplies the values in the operand registers and places the negated result in the
destination register.
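The three negating multiply instructions, VNMLA, VNMLS, and VNMUL, differ only in how the destination value enters the result. The Python sketch below models the arithmetic of the .F64 forms, because Python floats are double precision. It assumes round-to-nearest and ignores exception flags. The function names are hypothetical.

```python
def vnmla(dd, dn, dm):
    """Model VNMLA.F64: negate (destination + product)."""
    return -(dd + dn * dm)

def vnmls(dd, dn, dm):
    """Model VNMLS.F64: negate (destination - product)."""
    return -(dd - dn * dm)

def vnmul(dn, dm):
    """Model VNMUL.F64: negate the product."""
    return -(dn * dm)
```

With a destination value of 1.0 and operands 2.0 and 3.0, the model gives –7.0 for VNMLA, 5.0 for VNMLS, and –6.0 for VNMUL.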
Floating-point exceptions
This instruction can produce Invalid Operation, Overflow, Underflow, Inexact, or Input Denormal
exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

15.32 VPOP (floating-point)
Pop extension registers from the stack.
Syntax
VPOP{cond} Registers

where:
cond

is an optional condition code.
Registers

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.
You can specify S or D registers, but they must not be mixed. The number of registers must not
exceed 16 D registers.
Note
VPOP Registers is equivalent to VLDM sp!, Registers.

You can use either form of this instruction. They both disassemble to VPOP.

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
Related references
7.11 Condition code suffixes on page 7-150.
15.33 VPUSH (floating-point) on page 15-785.

15.33 VPUSH (floating-point)
Push extension registers onto the stack.
Syntax
VPUSH{cond} Registers

where:
cond

is an optional condition code.
Registers

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.
You can specify S or D registers, but they must not be mixed. The number of registers must not
exceed 16 D registers.
Note
VPUSH Registers is equivalent to VSTMDB sp!, Registers.

You can use either form of this instruction. They both disassemble to VPUSH.
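The equivalence with VSTMDB sp!, and with VLDM sp! for VPOP, describes a full descending stack. The Python sketch below models this for D registers only. The class name, the memory size, and the little-endian layout are assumptions made for illustration.

```python
import struct

class FpStack:
    """Toy model of a full descending stack holding D registers."""

    def __init__(self, size=64):
        self.memory = bytearray(size)
        self.sp = size  # stack grows downwards from the top of memory

    def vpush(self, values):
        """Model VPUSH {Dx-Dy}: decrement sp first, then store in ascending order."""
        self.sp -= 8 * len(values)
        for i, value in enumerate(values):
            struct.pack_into("<d", self.memory, self.sp + 8 * i, value)

    def vpop(self, count):
        """Model VPOP {Dx-Dy}: load in ascending order, then increment sp."""
        values = [struct.unpack_from("<d", self.memory, self.sp + 8 * i)[0]
                  for i in range(count)]
        self.sp += 8 * count
        return values
```

Pushing two D registers lowers sp by 16 bytes, and popping the same number of registers restores both the values and the original sp.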

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
Related references
7.11 Condition code suffixes on page 7-150.
15.32 VPOP (floating-point) on page 15-784.

15.34 VRINT (floating-point)
Rounds a floating-point number to integer and places the result in the destination register. The resulting
integer is represented in floating-point format.
Note
This instruction is supported only in ARMv8.

Syntax
VRINTmode{cond}.F64.F64 Dd, Dm
VRINTmode{cond}.F32.F32 Sd, Sm

where:
mode

must be one of:
Z

meaning round towards zero.
R

meaning use the rounding mode specified in the FPSCR.
X

meaning use the rounding mode specified in the FPSCR, generating an Inexact
exception if the result is not exact.
A

meaning round to nearest, ties away from zero.
N

meaning round to nearest, ties to even.
P

meaning round towards plus infinity.
M

meaning round towards minus infinity.
cond

is an optional condition code. This can only be used when mode is Z, R or X.
Sd, Sm

specifies the destination and operand registers, for a word operation.
Dd, Dm

specifies the destination and operand registers, for a doubleword operation.
Notes
You cannot use VRINT with a rounding mode of A, N, P or M inside an IT block.
Floating-point exceptions
These instructions cannot produce any exceptions, except VRINTX which can generate an Inexact
exception.
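The directed rounding modes can be compared with a small Python model. The sketch below covers modes Z, A, N, P, and M only, because R and X depend on the FPSCR rounding mode, which is not modeled here. The function name is hypothetical. The result is returned as a float, mirroring the statement that the resulting integer is represented in floating-point format.

```python
import math

def vrint(x, mode):
    """Round x to an integral value, returned as a float, for modes Z, A, N, P, M."""
    if not math.isfinite(x):
        return x  # infinities and NaNs pass through unchanged in this model
    if mode == "Z":
        r = math.trunc(x)            # towards zero
    elif mode == "N":
        r = round(x)                 # to nearest, ties to even
    elif mode == "A":
        whole = math.floor(abs(x))   # to nearest, ties away from zero
        if abs(x) - whole >= 0.5:
            whole += 1
        r = whole if x >= 0 else -whole
    elif mode == "P":
        r = math.ceil(x)             # towards plus infinity
    elif mode == "M":
        r = math.floor(x)            # towards minus infinity
    else:
        raise ValueError("unsupported mode: " + mode)
    return math.copysign(float(r), x)  # keep the sign of the input, including for zero
```

For an input of 2.5, mode N gives 2.0 and mode A gives 3.0. For an input of –2.5, modes Z and P give –2.0, and mode M gives –3.0.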
Related references
7.11 Condition code suffixes on page 7-150.

15.35 VSEL
Floating-point select.
Note
This instruction is supported only in ARMv8.

Syntax
VSELcond.F32 Sd, Sn, Sm
VSELcond.F64 Dd, Dn, Dm

where:
cond

must be one of GE, GT, EQ, VS.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Usage
The VSEL instruction tests the condition flags in the APSR against cond. If the condition is true, it
copies the value in the first operand register (Sn or Dn) into the destination register. Otherwise, it
copies the value in the second operand register (Sm or Dm).
You cannot use VSEL inside an IT block.
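The selection can be sketched in Python as follows. The sketch evaluates the four permitted conditions from the APSR N, Z, C, and V flags using the standard ARM condition definitions, and then picks one of the two operands. The function names and the tuple representation of the flags are assumptions made for illustration.

```python
def condition_passed(cond, n, z, c, v):
    """Evaluate one of the four conditions that VSEL accepts."""
    if cond == "EQ":
        return z == 1
    if cond == "VS":
        return v == 1
    if cond == "GE":
        return n == v
    if cond == "GT":
        return z == 0 and n == v
    raise ValueError("VSEL accepts only GE, GT, EQ, or VS")

def vsel(cond, flags, first, second):
    """Return the first operand if cond holds for flags (N, Z, C, V), else the second."""
    n, z, c, v = flags
    return first if condition_passed(cond, n, z, c, v) else second
```

With flags N=0, Z=0, C=1, V=0, the GE condition holds and the first operand is selected, while the EQ condition fails and the second operand is selected.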
Floating-point exceptions
VSEL instructions cannot produce any exceptions.

Related references
7.13 Comparison of condition code meanings in integer and floating-point code on page 7-152.
7.11 Condition code suffixes on page 7-150.

15.36 VSQRT
Floating-point square root.
Syntax
VSQRT{cond}.F32 Sd, Sm
VSQRT{cond}.F64 Dd, Dm

where:
cond

is an optional condition code.
Sd, Sm

are the single-precision registers for the result and operand.
Dd, Dm

are the double-precision registers for the result and operand.
Operation
The VSQRT instruction takes the square root of the contents of Sm or Dm, and places the result in Sd or Dd.
Floating-point exceptions
VSQRT instructions can produce Invalid Operation or Inexact exceptions.

Related references
7.11 Condition code suffixes on page 7-150.

15.37 VSTM (floating-point)
Extension register store multiple.
Syntax
VSTMmode{cond} Rn{!}, Registers

where:
mode

must be one of:
IA

meaning Increment address After each transfer. IA is the default, and can be omitted.
DB

meaning Decrement address Before each transfer.
EA

meaning Empty Ascending stack operation. This is the same as IA for stores.
FD

meaning Full Descending stack operation. This is the same as DB for stores.
cond

is an optional condition code.
Rn

is the ARM register holding the base address for the transfer.
!

is optional. ! specifies that the updated base address must be written back to Rn. If ! is not
specified, mode must be IA.
Registers

is a list of consecutive extension registers enclosed in braces, { and }. The list can be comma-separated, or in range format. There must be at least one register in the list.
You can specify S or D registers, but they must not be mixed. The number of registers must not
exceed 16 D registers.
Note
VPUSH Registers is equivalent to VSTMDB sp!, Registers.

You can use either form of this instruction. They both disassemble to VPUSH.
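The effect of the addressing mode on the store addresses and on the written-back base can be sketched in Python as follows. The function name and parameters are hypothetical. reg_bytes is 4 for S registers and 8 for D registers, and the model follows the rule above that a mode other than IA requires writeback.

```python
def vstm_addresses(base, count, reg_bytes, mode="IA", writeback=False):
    """Return (store addresses in register order, value of Rn afterwards)."""
    total = count * reg_bytes
    if mode in ("IA", "EA"):
        start = base
        updated = base + total
    elif mode in ("DB", "FD"):
        if not writeback:
            raise ValueError("DB requires writeback (!)")
        start = base - total
        updated = start
    else:
        raise ValueError("unknown mode: " + mode)
    addresses = [start + i * reg_bytes for i in range(count)]
    return addresses, (updated if writeback else base)
```

For two D registers, a base of 0x1000, and mode DB with writeback, the model stores at 0xFF0 and 0xFF8 and leaves 0xFF0 in Rn.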

Related concepts
6.16 Stack implementation using LDM and STM on page 6-122.
Related references
7.11 Condition code suffixes on page 7-150.

15.38 VSTR (floating-point)
Extension register store.
Syntax
VSTR{cond}{.size} Fd, [Rn{, #offset}]

where:
cond

is an optional condition code.
size

is an optional data size specifier. Must be 32 if Fd is an S register, or 64 otherwise.
Fd

is the extension register to be saved. It can be either a D or S register.
Rn

is the ARM register holding the base address for the transfer.
offset

is an optional numeric expression. It must evaluate to a numeric value at assembly time. The
value must be a multiple of 4, and lie in the range –1020 to +1020. The value is added to the
base address to form the address used for the transfer.
Operation
The VSTR instruction saves the contents of an extension register to memory.
One word is transferred if Fd is an S register. Two words are transferred otherwise.
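The offset constraints can be expressed as a short check. The Python sketch below validates an offset against the stated rules, a multiple of 4 in the range –1020 to +1020, and returns the address used for the transfer. The function name is hypothetical, and the sketch does not model what the assembler reports for an invalid offset.

```python
def vstr_address(base, offset=0):
    """Return base + offset after checking the VSTR offset rules."""
    if offset % 4 != 0:
        raise ValueError("offset must be a multiple of 4")
    if not -1020 <= offset <= 1020:
        raise ValueError("offset must lie in the range -1020 to +1020")
    return base + offset
```

For example, a base of 0x1000 with an offset of –8 gives the address 0xFF8, while offsets of 6 or 1024 are rejected.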
Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related references
15.17 VLDR pseudo-instruction (floating-point) on page 15-769.
7.11 Condition code suffixes on page 7-150.

15.39 VSTR (post-increment and pre-decrement, floating-point)
Pseudo-instruction that stores extension registers with post-increment and pre-decrement forms.
Note
There are also VLDR and VSTR instructions without post-increment and pre-decrement.

Syntax
VSTR{cond}{.size} Fd, [Rn], #offset ; post-increment
VSTR{cond}{.size} Fd, [Rn, #-offset]! ; pre-decrement

where:
cond

is an optional condition code.
size

is an optional data size specifier. Must be 32 if Fd is an S register, or 64 if Fd is a D register.
Fd

is the extension register to be saved. It can be either a double precision (Dd) or a single precision
(Sd) register.
Rn

is the ARM register holding the base address for the transfer.
offset

is a numeric expression that must evaluate to a numeric value at assembly time. The value must
be 4 if Fd is an S register, or 8 if Fd is a D register.
Operation
The post-increment instruction increments the base address in the register by the offset value, after the
transfer. The pre-decrement instruction decrements the base address in the register by the offset value,
and then performs the transfer using the new address in the register. This pseudo-instruction assembles to
a VSTM instruction.
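The difference between the two forms is the order of the store and the base update. The Python sketch below models that order, together with the rule that the offset must be 4 for an S register or 8 for a D register. The function names are hypothetical.

```python
def _check_offset(offset, is_double):
    expected = 8 if is_double else 4
    if offset != expected:
        raise ValueError("offset must be %d for this register size" % expected)

def vstr_post_increment(base, offset, is_double):
    """Model VSTR Fd, [Rn], #offset: store at Rn, then add offset to Rn."""
    _check_offset(offset, is_double)
    return base, base + offset          # (store address, updated Rn)

def vstr_pre_decrement(base, offset, is_double):
    """Model VSTR Fd, [Rn, #-offset]!: subtract offset from Rn, then store there."""
    _check_offset(offset, is_double)
    updated = base - offset
    return updated, updated             # (store address, updated Rn)
```

With Rn holding 0x1000, the post-increment form for a D register stores at 0x1000 and leaves 0x1008 in Rn. The pre-decrement form for an S register stores at 0xFFC and leaves 0xFFC in Rn.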
Related references
15.38 VSTR (floating-point) on page 15-790.
15.37 VSTM (floating-point) on page 15-789.
7.11 Condition code suffixes on page 7-150.

15.40 VSUB (floating-point)
Floating-point subtract.
Syntax
VSUB{cond}.F32 {Sd,} Sn, Sm
VSUB{cond}.F64 {Dd,} Dn, Dm

where:
cond

is an optional condition code.
Sd, Sn, Sm

are the single-precision registers for the result and operands.
Dd, Dn, Dm

are the double-precision registers for the result and operands.
Operation
The VSUB instruction subtracts the value in the second operand register from the value in the first operand
register, and places the result in the destination register.
Floating-point exceptions
The VSUB instruction can produce Invalid Operation, Overflow, or Inexact exceptions.
Related references
7.11 Condition code suffixes on page 7-150.

Chapter 16
A64 General Instructions

Describes the A64 general instructions.
It contains the following sections:
• 16.1 A64 instructions in alphabetical order on page 16-797.
• 16.2 Register restrictions for A64 instructions on page 16-803.
• 16.3 ADC on page 16-804.
• 16.4 ADCS on page 16-805.
• 16.5 ADD (extended register) on page 16-806.
• 16.6 ADD (immediate) on page 16-808.
• 16.7 ADD (shifted register) on page 16-809.
• 16.8 ADDS (extended register) on page 16-810.
• 16.9 ADDS (immediate) on page 16-812.
• 16.10 ADDS (shifted register) on page 16-813.
• 16.11 ADR on page 16-814.
• 16.12 ADRL pseudo-instruction on page 16-815.
• 16.13 ADRP on page 16-816.
• 16.14 AND (immediate) on page 16-817.
• 16.15 AND (shifted register) on page 16-818.
• 16.16 ANDS (immediate) on page 16-819.
• 16.17 ANDS (shifted register) on page 16-820.
• 16.18 ASR (register) on page 16-821.
• 16.19 ASR (immediate) on page 16-822.
• 16.20 ASRV on page 16-823.
• 16.21 AT on page 16-824.
• 16.22 AUTDA, AUTDZA on page 16-825.
• 16.23 AUTDB, AUTDZB on page 16-826.
• 16.24 AUTIA, AUTIZA, AUTIA1716, AUTIASP, AUTIAZ on page 16-827.
• 16.25 AUTIB, AUTIZB, AUTIB1716, AUTIBSP, AUTIBZ on page 16-828.
• 16.26 B.cond on page 16-829.
• 16.27 B on page 16-830.
• 16.28 BFC on page 16-831.
• 16.29 BFI on page 16-832.
• 16.30 BFM on page 16-833.
• 16.31 BFXIL on page 16-834.
• 16.32 BIC (shifted register) on page 16-835.
• 16.33 BICS (shifted register) on page 16-836.
• 16.34 BL on page 16-837.
• 16.35 BLR on page 16-838.
• 16.36 BLRAA, BLRAAZ, BLRAB, BLRABZ on page 16-839.
• 16.37 BR on page 16-840.
• 16.38 BRAA, BRAAZ, BRAB, BRABZ on page 16-841.
• 16.39 BRK on page 16-842.
• 16.40 CBNZ on page 16-843.
• 16.41 CBZ on page 16-844.
• 16.42 CCMN (immediate) on page 16-845.
• 16.43 CCMN (register) on page 16-846.
• 16.44 CCMP (immediate) on page 16-847.
• 16.45 CCMP (register) on page 16-848.
• 16.46 CINC on page 16-849.
• 16.47 CINV on page 16-850.
• 16.48 CLREX on page 16-851.
• 16.49 CLS on page 16-852.
• 16.50 CLZ on page 16-853.
• 16.51 CMN (extended register) on page 16-854.
• 16.52 CMN (immediate) on page 16-856.
• 16.53 CMN (shifted register) on page 16-857.
• 16.54 CMP (extended register) on page 16-858.
• 16.55 CMP (immediate) on page 16-860.
• 16.56 CMP (shifted register) on page 16-861.
• 16.57 CNEG on page 16-862.
• 16.58 CRC32B, CRC32H, CRC32W, CRC32X on page 16-863.
• 16.59 CRC32CB, CRC32CH, CRC32CW, CRC32CX on page 16-864.
• 16.60 CSEL on page 16-865.
• 16.61 CSET on page 16-866.
• 16.62 CSETM on page 16-867.
• 16.63 CSINC on page 16-868.
• 16.64 CSINV on page 16-869.
• 16.65 CSNEG on page 16-870.
• 16.66 DC on page 16-871.
• 16.67 DCPS1 on page 16-872.
• 16.68 DCPS2 on page 16-873.
• 16.69 DCPS3 on page 16-874.
• 16.70 DMB on page 16-875.
• 16.71 DRPS on page 16-876.
• 16.72 DSB on page 16-877.
• 16.73 EON (shifted register) on page 16-878.
• 16.74 EOR (immediate) on page 16-879.
• 16.75 EOR (shifted register) on page 16-880.
• 16.76 ERET on page 16-881.
• 16.77 ERETAA, ERETAB on page 16-882.
• 16.78 ESB on page 16-883.
• 16.79 EXTR on page 16-884.
• 16.80 HINT on page 16-885.
• 16.81 HLT on page 16-886.
• 16.82 HVC on page 16-887.
• 16.83 IC on page 16-888.
• 16.84 ISB on page 16-889.
• 16.85 LSL (register) on page 16-890.
• 16.86 LSL (immediate) on page 16-891.
• 16.87 LSLV on page 16-892.
• 16.88 LSR (register) on page 16-893.
• 16.89 LSR (immediate) on page 16-894.
• 16.90 LSRV on page 16-895.
• 16.91 MADD on page 16-896.
• 16.92 MNEG on page 16-897.
• 16.93 MOV (to or from SP) on page 16-898.
• 16.94 MOV (inverted wide immediate) on page 16-899.
• 16.95 MOV (wide immediate) on page 16-900.
• 16.96 MOV (bitmask immediate) on page 16-901.
• 16.97 MOV (register) on page 16-902.
• 16.98 MOVK on page 16-903.
• 16.99 MOVL pseudo-instruction on page 16-904.
• 16.100 MOVN on page 16-905.
• 16.101 MOVZ on page 16-906.
• 16.102 MRS on page 16-907.
• 16.103 MSR (immediate) on page 16-908.
• 16.104 MSR (register) on page 16-909.
• 16.105 MSUB on page 16-910.
• 16.106 MUL on page 16-911.
• 16.107 MVN on page 16-912.
• 16.108 NEG (shifted register) on page 16-913.
• 16.109 NEGS on page 16-914.
• 16.110 NGC on page 16-915.
• 16.111 NGCS on page 16-916.
• 16.112 NOP on page 16-917.
• 16.113 ORN (shifted register) on page 16-918.
• 16.114 ORR (immediate) on page 16-919.
• 16.115 ORR (shifted register) on page 16-920.
• 16.116 PACDA, PACDZA on page 16-921.
• 16.117 PACDB, PACDZB on page 16-922.
• 16.118 PACGA on page 16-923.
• 16.119 PACIA, PACIZA, PACIA1716, PACIASP, PACIAZ on page 16-924.
• 16.120 PACIB, PACIZB, PACIB1716, PACIBSP, PACIBZ on page 16-925.
• 16.121 PSB on page 16-926.
• 16.122 RBIT on page 16-927.
• 16.123 RET on page 16-928.
• 16.124 RETAA, RETAB on page 16-929.
• 16.125 REV16 on page 16-930.
• 16.126 REV32 on page 16-931.
• 16.127 REV64 on page 16-932.
• 16.128 REV on page 16-933.
• 16.129 ROR (immediate) on page 16-934.
• 16.130 ROR (register) on page 16-935.
• 16.131 RORV on page 16-936.
• 16.132 SBC on page 16-937.
• 16.133 SBCS on page 16-938.
• 16.134 SBFIZ on page 16-939.
• 16.135 SBFM on page 16-940.
• 16.136 SBFX on page 16-941.
• 16.137 SDIV on page 16-942.
• 16.138 SEV on page 16-943.
• 16.139 SEVL on page 16-944.
• 16.140 SMADDL on page 16-945.
• 16.141 SMC on page 16-946.
• 16.142 SMNEGL on page 16-947.
• 16.143 SMSUBL on page 16-948.
• 16.144 SMULH on page 16-949.
• 16.145 SMULL on page 16-950.
• 16.146 SUB (extended register) on page 16-951.
• 16.147 SUB (immediate) on page 16-953.
• 16.148 SUB (shifted register) on page 16-954.
• 16.149 SUBS (extended register) on page 16-955.
• 16.150 SUBS (immediate) on page 16-957.
• 16.151 SUBS (shifted register) on page 16-958.
• 16.152 SVC on page 16-959.
• 16.153 SXTB on page 16-960.
• 16.154 SXTH on page 16-961.
• 16.155 SXTW on page 16-962.
• 16.156 SYS on page 16-963.
• 16.157 SYSL on page 16-964.
• 16.158 TBNZ on page 16-965.
• 16.159 TBZ on page 16-966.
• 16.160 TLBI on page 16-967.
• 16.161 TST (immediate) on page 16-969.
• 16.162 TST (shifted register) on page 16-970.
• 16.163 UBFIZ on page 16-971.
• 16.164 UBFM on page 16-972.
• 16.165 UBFX on page 16-973.
• 16.166 UDIV on page 16-974.
• 16.167 UMADDL on page 16-975.
• 16.168 UMNEGL on page 16-976.
• 16.169 UMSUBL on page 16-977.
• 16.170 UMULH on page 16-978.
• 16.171 UMULL on page 16-979.
• 16.172 UXTB on page 16-980.
• 16.173 UXTH on page 16-981.
• 16.174 WFE on page 16-982.
• 16.175 WFI on page 16-983.
• 16.176 XPACD, XPACI, XPACLRI on page 16-984.
• 16.177 YIELD on page 16-985.

16.1 A64 instructions in alphabetical order
A summary of the A64 instructions and pseudo-instructions that are supported.
Table 16-1 Summary of A64 general instructions

Mnemonic

Brief description

See

ADC

Add with Carry

16.3 ADC on page 16-804

ADCS

Add with Carry, setting flags

16.4 ADCS on page 16-805

ADD (extended register)

Add (extended register)

16.5 ADD (extended register) on page 16-806

ADD (immediate)

Add (immediate)

16.6 ADD (immediate) on page 16-808

ADD (shifted register)

Add (shifted register)

16.7 ADD (shifted register) on page 16-809

ADDS (extended register)

Add (extended register), setting flags

16.8 ADDS (extended register) on page 16-810

ADDS (immediate)

Add (immediate), setting flags

16.9 ADDS (immediate) on page 16-812

ADDS (shifted register)

Add (shifted register), setting flags

16.10 ADDS (shifted register) on page 16-813

ADR

Form PC-relative address

16.11 ADR on page 16-814

ADRL pseudo-instruction

Load a PC-relative address into a register

16.12 ADRL pseudo-instruction on page 16-815

ADRP

Form PC-relative address to 4KB page

16.13 ADRP on page 16-816

AND (immediate)

Bitwise AND (immediate)

16.14 AND (immediate) on page 16-817

AND (shifted register)

Bitwise AND (shifted register)

16.15 AND (shifted register) on page 16-818

ANDS (immediate)

Bitwise AND (immediate), setting flags

16.16 ANDS (immediate) on page 16-819

ANDS (shifted register)

Bitwise AND (shifted register), setting
flags

16.17 ANDS (shifted register) on page 16-820

ASR (register)

Arithmetic Shift Right (register)

16.18 ASR (register) on page 16-821

ASR (immediate)

Arithmetic Shift Right (immediate)

16.19 ASR (immediate) on page 16-822

ASRV

Arithmetic Shift Right Variable

16.20 ASRV on page 16-823

Address Translate

16.21 AT on page 16-824

AUTDA, AUTDZA

Authenticate Data address, using key A

16.22 AUTDA, AUTDZA on page 16-825

AUTDB, AUTDZB

Authenticate Data address, using key B

16.23 AUTDB, AUTDZB on page 16-826

AUTIA, AUTIZA,
AUTIA1716, AUTIASP,
AUTIAZ

Authenticate Instruction address, using
key A

16.24 AUTIA, AUTIZA, AUTIA1716, AUTIASP, AUTIAZ
on page 16-827

AUTIB, AUTIZB,
AUTIB1716, AUTIBSP,
AUTIBZ

Authenticate Instruction address, using
key B

16.25 AUTIB, AUTIZB, AUTIB1716, AUTIBSP, AUTIBZ
on page 16-828

B.cond

Branch conditionally

16.26 B.cond on page 16-829

Branch

16.27 B on page 16-830

BFC

Bitfield Clear, leaving other bits
unchanged

16.28 BFC on page 16-831

BFI

Bitfield Insert

16.29 BFI on page 16-832

ARM DUI0801G

16-797

16 A64 General Instructions
16.1 A64 instructions in alphabetical order

Table 16-1 Summary of A64 general instructions (continued)
Mnemonic

Brief description

See

BFM

Bitfield Move

16.30 BFM on page 16-833

BFXIL

Bitfield extract and insert at low end

16.31 BFXIL on page 16-834

BIC (shifted register)

Bitwise Bit Clear (shifted register)

16.32 BIC (shifted register) on page 16-835

BICS (shifted register)

Bitwise Bit Clear (shifted register), setting 16.33 BICS (shifted register) on page 16-836
flags

Branch with Link

16.34 BL on page 16-837

BLR

Branch with Link to Register

16.35 BLR on page 16-838

BLRAA, BLRAAZ, BLRAB,
BLRABZ

Branch with Link to Register, with pointer 16.36 BLRAA, BLRAAZ, BLRAB, BLRABZ
authentication
on page 16-839

Branch to Register

16.37 BR on page 16-840

BRAA, BRAAZ, BRAB, BRABZ Branch to Register, with pointer
authentication

16.38 BRAA, BRAAZ, BRAB, BRABZ on page 16-841

BRK

Breakpoint instruction

16.39 BRK on page 16-842

CBNZ

Compare and Branch on Nonzero

16.40 CBNZ on page 16-843

CBZ

Compare and Branch on Zero

16.41 CBZ on page 16-844

CCMN (immediate)

Conditional Compare Negative
(immediate)

16.42 CCMN (immediate) on page 16-845

CCMN (register)

Conditional Compare Negative (register)

16.43 CCMN (register) on page 16-846

CCMP (immediate)

Conditional Compare (immediate)

16.44 CCMP (immediate) on page 16-847

CCMP (register)

Conditional Compare (register)

16.45 CCMP (register) on page 16-848

CINC

Conditional Increment

16.46 CINC on page 16-849

CINV

Conditional Invert

16.47 CINV on page 16-850

CLREX

Clear Exclusive

16.48 CLREX on page 16-851

CLS

Count leading sign bits

16.49 CLS on page 16-852

CLZ

Count leading zero bits

16.50 CLZ on page 16-853

CMN (extended register)

Compare Negative (extended register)

16.51 CMN (extended register) on page 16-854

CMN (immediate)

Compare Negative (immediate)

16.52 CMN (immediate) on page 16-856

CMN (shifted register)

Compare Negative (shifted register)

16.53 CMN (shifted register) on page 16-857

CMP (extended register)

Compare (extended register)

16.54 CMP (extended register) on page 16-858

CMP (immediate)

Compare (immediate)

16.55 CMP (immediate) on page 16-860

CMP (shifted register)

Compare (shifted register)

16.56 CMP (shifted register) on page 16-861

CNEG

Conditional Negate

16.57 CNEG on page 16-862

CRC32B, CRC32H, CRC32W,
CRC32X

CRC32 checksum

16.58 CRC32B, CRC32H, CRC32W, CRC32X
on page 16-863

CRC32CB, CRC32CH,
CRC32CW, CRC32CX

CRC32C checksum

16.59 CRC32CB, CRC32CH, CRC32CW, CRC32CX
on page 16-864

ARM DUI0801G

16-798

16 A64 General Instructions
16.1 A64 instructions in alphabetical order

Table 16-1 Summary of A64 general instructions (continued)
Mnemonic

Brief description

See

CSEL

Conditional Select

16.60 CSEL on page 16-865

CSET

Conditional Set

16.61 CSET on page 16-866

CSETM

Conditional Set Mask

16.62 CSETM on page 16-867

CSINC

Conditional Select Increment

16.63 CSINC on page 16-868

CSINV

Conditional Select Invert

16.64 CSINV on page 16-869

CSNEG

Conditional Select Negation

16.65 CSNEG on page 16-870

Data Cache operation

16.66 DC on page 16-871

DCPS1

Debug Change PE State to EL1

16.67 DCPS1 on page 16-872

DCPS2

Debug Change PE State to EL2

16.68 DCPS2 on page 16-873

DCPS3

Debug Change PE State to EL3

16.69 DCPS3 on page 16-874

DMB

Data Memory Barrier

16.70 DMB on page 16-875

DRPS

Debug restore process state

16.71 DRPS on page 16-876

DSB

Data Synchronization Barrier

16.72 DSB on page 16-877

EON (shifted register)

Bitwise Exclusive OR NOT (shifted
register)

16.73 EON (shifted register) on page 16-878

EOR (immediate)

Bitwise Exclusive OR (immediate)

16.74 EOR (immediate) on page 16-879

EOR (shifted register)

Bitwise Exclusive OR (shifted register)

16.75 EOR (shifted register) on page 16-880

ERET

Returns from an exception

16.76 ERET on page 16-881

ERETAA, ERETAB

Exception Return, with pointer
authentication

16.77 ERETAA, ERETAB on page 16-882

ESB

Error Synchronization Barrier

16.78 ESB on page 16-883

EXTR

Extract register

16.79 EXTR on page 16-884

HINT

Hint instruction

16.80 HINT on page 16-885

HLT

Halt instruction

16.81 HLT on page 16-886

HVC

Hypervisor call to allow OS code to call
the Hypervisor

16.82 HVC on page 16-887

Instruction Cache operation

16.83 IC on page 16-888

ISB

Instruction Synchronization Barrier

16.84 ISB on page 16-889

LSL (register)

Logical Shift Left (register)

16.85 LSL (register) on page 16-890

LSL (immediate)

Logical Shift Left (immediate)

16.86 LSL (immediate) on page 16-891

LSLV

Logical Shift Left Variable

16.87 LSLV on page 16-892

LSR (register)

Logical Shift Right (register)

16.88 LSR (register) on page 16-893

LSR (immediate)

Logical Shift Right (immediate)

16.89 LSR (immediate) on page 16-894

LSRV

Logical Shift Right Variable

16.90 LSRV on page 16-895

MADD

Multiply-Add

16.91 MADD on page 16-896

MNEG

Multiply-Negate

16.92 MNEG on page 16-897

ARM DUI0801G

16-799

16 A64 General Instructions
16.1 A64 instructions in alphabetical order

Table 16-1 Summary of A64 general instructions (continued)
Mnemonic

Brief description

See

MOV (to or from SP)

Move between register and stack pointer

16.93 MOV (to or from SP) on page 16-898

MOV (inverted wide
immediate)

Move (inverted wide immediate)

16.94 MOV (inverted wide immediate) on page 16-899

MOV (wide immediate)

Move (wide immediate)

16.95 MOV (wide immediate) on page 16-900

MOV (bitmask immediate)

Move (bitmask immediate)

16.96 MOV (bitmask immediate) on page 16-901

MOV (register)

Move (register)

16.97 MOV (register) on page 16-902

MOVK

Move wide with keep

16.98 MOVK on page 16-903

MOVL pseudo-instruction

Load a register with either a 32-bit or 64bit immediate value or any address

16.99 MOVL pseudo-instruction on page 16-904

MOVN

Move wide with NOT

16.100 MOVN on page 16-905

MOVZ

Move wide with zero

16.101 MOVZ on page 16-906

MRS

Move System Register

16.102 MRS on page 16-907

MSR (immediate)

Move immediate value to Special Register 16.103 MSR (immediate) on page 16-908

MSR (register)

Move general-purpose register to System
Register

16.104 MSR (register) on page 16-909

MSUB

Multiply-Subtract

16.105 MSUB on page 16-910

MUL

Multiply

16.106 MUL on page 16-911

MVN

Bitwise NOT

16.107 MVN on page 16-912

NEG (shifted register)

Negate (shifted register)

16.108 NEG (shifted register) on page 16-913

NEGS

Negate, setting flags

16.109 NEGS on page 16-914

NGC

Negate with Carry

16.110 NGC on page 16-915

NGCS

Negate with Carry, setting flags

16.111 NGCS on page 16-916

NOP

No Operation

16.112 NOP on page 16-917

ORN (shifted register)

Bitwise OR NOT (shifted register)

16.113 ORN (shifted register) on page 16-918

ORR (immediate)

Bitwise OR (immediate)

16.114 ORR (immediate) on page 16-919

ORR (shifted register)

Bitwise OR (shifted register)

16.115 ORR (shifted register) on page 16-920

PACDA, PACDZA

Pointer Authentication Code for Data
address, using key A

16.116 PACDA, PACDZA on page 16-921

PACDB, PACDZB

Pointer Authentication Code for Data
address, using key B

16.117 PACDB, PACDZB on page 16-922

PACGA

Pointer Authentication Code, using
Generic key

16.118 PACGA on page 16-923

PACIA, PACIZA,
PACIA1716, PACIASP,
PACIAZ

Pointer Authentication Code for
Instruction address, using key A

16.119 PACIA, PACIZA, PACIA1716, PACIASP, PACIAZ
on page 16-924

PACIB, PACIZB,
PACIB1716, PACIBSP,
PACIBZ

Pointer Authentication Code for
Instruction address, using key B

16.120 PACIB, PACIZB, PACIB1716, PACIBSP, PACIBZ
on page 16-925

PSB

Profiling Synchronization Barrier

16.121 PSB on page 16-926

RBIT

Reverse Bits

16.122 RBIT on page 16-927

RET

Return from subroutine

16.123 RET on page 16-928

RETAA, RETAB

Return from subroutine, with pointer
authentication

16.124 RETAA, RETAB on page 16-929

REV16

Reverse bytes in 16-bit halfwords

16.125 REV16 on page 16-930

REV32

Reverse bytes in 32-bit words

16.126 REV32 on page 16-931

REV64

Reverse Bytes

16.127 REV64 on page 16-932

REV

Reverse Bytes

16.128 REV on page 16-933

ROR (immediate)

Rotate Right (immediate)

16.129 ROR (immediate) on page 16-934

ROR (register)

Rotate Right (register)

16.130 ROR (register) on page 16-935

RORV

Rotate Right Variable

16.131 RORV on page 16-936

SBC

Subtract with Carry

16.132 SBC on page 16-937

SBCS

Subtract with Carry, setting flags

16.133 SBCS on page 16-938

SBFIZ

Signed Bitfield Insert in Zero

16.134 SBFIZ on page 16-939

SBFM

Signed Bitfield Move

16.135 SBFM on page 16-940

SBFX

Signed Bitfield Extract

16.136 SBFX on page 16-941

SDIV

Signed Divide

16.137 SDIV on page 16-942

SEV

Send Event

16.138 SEV on page 16-943

SEVL

Send Event Local

16.139 SEVL on page 16-944

SMADDL

Signed Multiply-Add Long

16.140 SMADDL on page 16-945

SMC

Supervisor call to allow OS or Hypervisor
code to call the Secure Monitor

16.141 SMC on page 16-946

SMNEGL

Signed Multiply-Negate Long

16.142 SMNEGL on page 16-947

SMSUBL

Signed Multiply-Subtract Long

16.143 SMSUBL on page 16-948

SMULH

Signed Multiply High

16.144 SMULH on page 16-949

SMULL

Signed Multiply Long

16.145 SMULL on page 16-950

SUB (extended register)

Subtract (extended register)

16.146 SUB (extended register) on page 16-951

SUB (immediate)

Subtract (immediate)

16.147 SUB (immediate) on page 16-953

SUB (shifted register)

Subtract (shifted register)

16.148 SUB (shifted register) on page 16-954

SUBS (extended register)

Subtract (extended register), setting flags

16.149 SUBS (extended register) on page 16-955

SUBS (immediate)

Subtract (immediate), setting flags

16.150 SUBS (immediate) on page 16-957

SUBS (shifted register)

Subtract (shifted register), setting flags

16.151 SUBS (shifted register) on page 16-958

SVC

Supervisor call to allow application code
to call the OS

16.152 SVC on page 16-959

SXTB

Signed Extend Byte

16.153 SXTB on page 16-960

SXTH

Sign Extend Halfword

16.154 SXTH on page 16-961

SXTW

Sign Extend Word

16.155 SXTW on page 16-962

SYS

System instruction

16.156 SYS on page 16-963

SYSL

System instruction with result

16.157 SYSL on page 16-964

TBNZ

Test bit and Branch if Nonzero

16.158 TBNZ on page 16-965

TBZ

Test bit and Branch if Zero

16.159 TBZ on page 16-966

TLBI

TLB Invalidate operation

16.160 TLBI on page 16-967

TST (immediate)

Test bits (immediate), setting the condition flags and discarding
the result

16.161 TST (immediate) on page 16-969

TST (shifted register)

Test (shifted register)

16.162 TST (shifted register) on page 16-970

UBFIZ

Unsigned Bitfield Insert in Zero

16.163 UBFIZ on page 16-971

UBFM

Unsigned Bitfield Move

16.164 UBFM on page 16-972

UBFX

Unsigned Bitfield Extract

16.165 UBFX on page 16-973

UDIV

Unsigned Divide

16.166 UDIV on page 16-974

UMADDL

Unsigned Multiply-Add Long

16.167 UMADDL on page 16-975

UMNEGL

Unsigned Multiply-Negate Long

16.168 UMNEGL on page 16-976

UMSUBL

Unsigned Multiply-Subtract Long

16.169 UMSUBL on page 16-977

UMULH

Unsigned Multiply High

16.170 UMULH on page 16-978

UMULL

Unsigned Multiply Long

16.171 UMULL on page 16-979

UXTB

Unsigned Extend Byte

16.172 UXTB on page 16-980

UXTH

Unsigned Extend Halfword

16.173 UXTH on page 16-981

WFE

Wait For Event

16.174 WFE on page 16-982

WFI

Wait For Interrupt

16.175 WFI on page 16-983

XPACD, XPACI, XPACLRI

Strip Pointer Authentication Code

16.176 XPACD, XPACI, XPACLRI on page 16-984

YIELD

16.177 YIELD on page 16-985

16.2 Register restrictions for A64 instructions
In A64 instructions, the general-purpose integer registers are W0-W30 for 32-bit registers and X0-X30
for 64-bit registers.
You cannot refer to register 31 by number. In a few instructions, you can refer to it using one of the
following names:
WSP

the current stack pointer in a 32-bit context.
SP

the current stack pointer in a 64-bit context.
WZR

the zero register in a 32-bit context.
XZR

the zero register in a 64-bit context.
You can only use one of these names if it is mentioned in the Syntax section for the instruction.
You cannot refer to the Program Counter (PC) explicitly by name or by number.

16.3 ADC
Add with Carry.
Syntax
ADC Wd, Wn, Wm ; 32-bit general registers
ADC Xd, Xn, Xm ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
Usage
Add with Carry adds two register values and the Carry flag value, and writes the result to the destination
register.
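The following Python sketch models the behaviour described above. It is an illustration only, not part of the assembler: the function name adc and the bits parameter are hypothetical, and the 128-bit addition is one assumed use case in which a low-half addition produces the carry that ADC consumes.

```python
def adc(rn: int, rm: int, carry: int, bits: int = 64) -> int:
    """Model of ADC: Rd = Rn + Rm + C, truncated to the register width."""
    mask = (1 << bits) - 1
    return ((rn & mask) + (rm & mask) + (carry & 1)) & mask


# 128-bit addition built from two 64-bit halves (low half first).
a_lo, a_hi = 0xFFFF_FFFF_FFFF_FFFF, 0x1
b_lo, b_hi = 0x1, 0x2
lo_sum = a_lo + b_lo
carry_out = lo_sum >> 64            # carry produced by the low-half addition
lo = lo_sum & ((1 << 64) - 1)
hi = adc(a_hi, b_hi, carry_out)     # the high half consumes that carry
print(hex(hi), hex(lo))
```

In assembly, the same pattern is typically written as a flag-setting add of the low halves followed by ADC on the high halves.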
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.4 ADCS
Add with Carry, setting flags.
Syntax
ADCS Wd, Wn, Wm ; 32-bit general registers
ADCS Xd, Xn, Xm ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
Usage
Add with Carry, setting flags, adds two register values and the Carry flag value, and writes the result to
the destination register. It updates the condition flags based on the result.
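As a rough illustration of how the condition flags follow from the result, the Python sketch below models ADCS. The function name adcs and the (N, Z, C, V) tuple layout are hypothetical choices for this example; the flag definitions are the usual ones for an addition (negative, zero, unsigned carry out, signed overflow).

```python
def adcs(rn: int, rm: int, carry: int, bits: int = 64):
    """Model of ADCS: returns (result, (N, Z, C, V))."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    rn &= mask
    rm &= mask
    unsigned_sum = rn + rm + (carry & 1)
    result = unsigned_sum & mask
    n = int(bool(result & sign))          # result is negative
    z = int(result == 0)                  # result is zero
    c = int(unsigned_sum > mask)          # unsigned carry out
    # Signed overflow: both operands share a sign that the result lacks.
    v = int(bool(~(rn ^ rm) & (rn ^ result) & sign))
    return result, (n, z, c, v)


print(adcs(0xFFFFFFFF, 0, 1, bits=32))
print(adcs(0x7FFFFFFF, 0, 1, bits=32))
```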
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.5 ADD (extended register)
Add (extended register).
Syntax
ADD Wd|WSP, Wn|WSP, Wm{, extend {#amount}} ; 32-bit general registers
ADD Xd|SP, Xn|SP, Rm{, extend {#amount}} ; 64-bit general registers

Where:
Wd|WSP

Is the 32-bit name of the destination general-purpose register or stack pointer.
Wn|WSP

Is the 32-bit name of the first source general-purpose register or stack pointer.
Wm

Is the 32-bit name of the second general-purpose source register.
extend

Is the extension to be applied to the second source operand:
32-bit general registers
Can be one of UXTB, UXTH, LSL|UXTW, UXTX, SXTB, SXTH, SXTW or SXTX.
If Rd or Rn is WSP then LSL is preferred rather than UXTW, and can be omitted when
amount is 0. In all other cases extend is required and must be UXTW rather than LSL.
64-bit general registers
Can be one of UXTB, UXTH, UXTW, LSL|UXTX, SXTB, SXTH, SXTW or SXTX.
If Rd or Rn is SP then LSL is preferred rather than UXTX, and can be omitted when
amount is 0. In all other cases extend is required and must be UXTX rather than LSL.
Xd|SP

Is the 64-bit name of the destination general-purpose register or stack pointer.
Xn|SP

Is the 64-bit name of the first source general-purpose register or stack pointer.
R

Is a width specifier, and can be either W or X.
m

Is the number [0-30] of the second general-purpose source register or the name ZR (31).
amount

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must
be absent when extend is absent, is required when extend is LSL, and is optional when extend
is present but not LSL.
Usage
Add (extended register) adds a register value and a sign or zero-extended register value, followed by an
optional left shift amount, and writes the result to the destination register. The argument that is extended
from the Rm register can be a byte, halfword, word, or doubleword.
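The extension and shift steps can be sketched in Python as follows. This is a minimal model under stated assumptions: the helper names extend_operand and add_extended are hypothetical, only the explicit UXTx and SXTx names are handled (the LSL spelling is not), and the stack pointer forms are not modelled.

```python
EXTEND_WIDTH = {"B": 8, "H": 16, "W": 32, "X": 64}
EXTENDS = {"UXTB", "UXTH", "UXTW", "UXTX", "SXTB", "SXTH", "SXTW", "SXTX"}


def extend_operand(value: int, extend: str, amount: int = 0, bits: int = 64) -> int:
    """Extend the low byte/halfword/word/doubleword, then shift left by amount."""
    if extend not in EXTENDS:
        raise ValueError("unsupported extend: " + extend)
    if not 0 <= amount <= 4:
        raise ValueError("amount must be in the range 0 to 4")
    width = EXTEND_WIDTH[extend[-1]]
    field = value & ((1 << width) - 1)
    if extend.startswith("S") and field & (1 << (width - 1)):
        field -= 1 << width               # sign-extend the selected field
    return (field << amount) & ((1 << bits) - 1)


def add_extended(rn: int, rm: int, extend: str, amount: int = 0, bits: int = 64) -> int:
    """Model of ADD Rd, Rn, Rm, extend #amount."""
    mask = (1 << bits) - 1
    return ((rn & mask) + extend_operand(rm, extend, amount, bits)) & mask


# Corresponds to ADD X0, X1, W2, SXTB with X1 = 0x1000 and W2 = 0xFF.
print(hex(add_extended(0x1000, 0xFF, "SXTB")))
```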
Table 16-2 ADD (64-bit general registers) specifier combinations
R    extend
W    SXTB
W    SXTH
W    SXTW
W    UXTB
W    UXTH
W    UXTW
X    LSL|UXTX
X    SXTX

Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.6 ADD (immediate)
Add (immediate).
This instruction is used by the alias MOV (to or from SP).
Syntax
ADD Wd|WSP, Wn|WSP, #imm{, shift} ; 32-bit general registers
ADD Xd|SP, Xn|SP, #imm{, shift} ; 64-bit general registers

Where:
Wd|WSP

Is the 32-bit name of the destination general-purpose register or stack pointer.
Wn|WSP

Is the 32-bit name of the source general-purpose register or stack pointer.
Xd|SP

Is the 64-bit name of the destination general-purpose register or stack pointer.
Xn|SP

Is the 64-bit name of the source general-purpose register or stack pointer.
imm

Is an unsigned immediate, in the range 0 to 4095.
shift

Is the optional left shift to apply to the immediate, defaulting to LSL #0, and can be either LSL
#0 or LSL #12.

Usage
Add (immediate) adds a register value and an optionally-shifted immediate value, and writes the result to
the destination register.
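The immediate rules above (a 12-bit unsigned value, optionally shifted left by 12) determine which constants fit in a single ADD. The Python sketch below checks this and splits a larger constant into two additions. Both helper names are hypothetical, and the two-instruction split is an assumed idiom for illustration rather than something the assembler does for you.

```python
def encode_add_imm(value: int):
    """Return (imm, shift) if value fits one ADD (immediate), else None."""
    if 0 <= value <= 0xFFF:
        return value, 0                   # #imm, LSL #0
    if value & 0xFFF == 0 and 0 < value <= 0xFFF000:
        return value >> 12, 12            # #imm, LSL #12
    return None


def split_add_imm(value: int):
    """Split a constant of up to 24 bits into at most two ADD immediates."""
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("constant does not fit in 24 bits")
    parts = []
    if value >> 12:
        parts.append((value >> 12, 12))
    if value & 0xFFF or not parts:
        parts.append((value & 0xFFF, 0))
    return parts


for imm, shift in split_add_imm(0x12345):
    print(f"ADD X0, X0, #{imm:#x}, LSL #{shift}")
```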
Related references
16.93 MOV (to or from SP) on page 16-898.
16.1 A64 instructions in alphabetical order on page 16-797.

16.7 ADD (shifted register)
Add (shifted register).
Syntax
ADD Wd, Wn, Wm{, shift #amount} ; 32-bit general registers
ADD Xd, Xn, Xm{, shift #amount} ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
amount

Depends on the instruction variant:
32-bit general registers
Is the shift amount, in the range 0 to 31, defaulting to 0.
64-bit general registers
Is the shift amount, in the range 0 to 63, defaulting to 0.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
shift

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can
be one of LSL, LSR, or ASR.
Usage
Add (shifted register) adds a register value and an optionally-shifted register value, and writes the result
to the destination register.
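A minimal Python model of the optional shift is sketched below. The names shift_operand and add_shifted are hypothetical; the shift types and amount ranges are the ones listed in the operand descriptions above.

```python
def shift_operand(value: int, shift: str = "LSL", amount: int = 0, bits: int = 64) -> int:
    """Apply LSL, LSR or ASR to a register value of the given width."""
    mask = (1 << bits) - 1
    if not 0 <= amount < bits:
        raise ValueError("shift amount out of range for this register width")
    value &= mask
    if shift == "LSL":
        return (value << amount) & mask
    if shift == "LSR":
        return value >> amount
    if shift == "ASR":
        if value & (1 << (bits - 1)):
            value -= 1 << bits            # reinterpret as a negative number
        return (value >> amount) & mask
    raise ValueError("shift must be LSL, LSR or ASR")


def add_shifted(rn: int, rm: int, shift: str = "LSL", amount: int = 0, bits: int = 64) -> int:
    """Model of ADD Rd, Rn, Rm, shift #amount."""
    mask = (1 << bits) - 1
    return ((rn & mask) + shift_operand(rm, shift, amount, bits)) & mask


# Corresponds to ADD X0, X1, X2, LSL #2 with X1 = 10 and X2 = 3.
print(add_shifted(10, 3, "LSL", 2))
```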
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.8 ADDS (extended register)
Add (extended register), setting flags.
This instruction is used by the alias CMN (extended register).
Syntax
ADDS Wd, Wn|WSP, Wm{, extend {#amount}} ; 32-bit general registers
ADDS Xd, Xn|SP, Rm{, extend {#amount}} ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn|WSP

Is the 32-bit name of the first source general-purpose register or stack pointer.
Wm

Is the 32-bit name of the second general-purpose source register.
extend

Is the extension to be applied to the second source operand:
32-bit general registers
Can be one of UXTB, UXTH, LSL|UXTW, UXTX, SXTB, SXTH, SXTW or SXTX.
If Rn is WSP then LSL is preferred rather than UXTW, and can be omitted when amount is
0. In all other cases extend is required and must be UXTW rather than LSL.
64-bit general registers
Can be one of UXTB, UXTH, UXTW, LSL|UXTX, SXTB, SXTH, SXTW or SXTX.
If Rn is SP then LSL is preferred rather than UXTX, and can be omitted when amount is 0.
In all other cases extend is required and must be UXTX rather than LSL.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn|SP

Is the 64-bit name of the first source general-purpose register or stack pointer.
R

Is a width specifier, and can be either W or X.
m

Is the number [0-30] of the second general-purpose source register or the name ZR (31).
amount

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must
be absent when extend is absent, is required when extend is LSL, and is optional when extend
is present but not LSL.
Usage
Add (extended register), setting flags, adds a register value and a sign or zero-extended register value,
followed by an optional left shift amount, and writes the result to the destination register. The argument
that is extended from the Rm register can be a byte, halfword, word, or doubleword. It updates the
condition flags based on the result.
Table 16-3 ADDS (64-bit general registers) specifier combinations
R    extend
W    SXTB
W    SXTH
W    SXTW
W    UXTB
W    UXTH
W    UXTW
X    LSL|UXTX
X    SXTX

Related references
16.51 CMN (extended register) on page 16-854.
16.1 A64 instructions in alphabetical order on page 16-797.

16.9 ADDS (immediate)
Add (immediate), setting flags.
This instruction is used by the alias CMN (immediate).
Syntax
ADDS Wd, Wn|WSP, #imm{, shift} ; 32-bit general registers
ADDS Xd, Xn|SP, #imm{, shift} ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn|WSP

Is the 32-bit name of the source general-purpose register or stack pointer.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn|SP

Is the 64-bit name of the source general-purpose register or stack pointer.
imm

Is an unsigned immediate, in the range 0 to 4095.
shift

Is the optional left shift to apply to the immediate, defaulting to LSL #0, and can be either LSL
#0 or LSL #12.
Usage
Add (immediate), setting flags, adds a register value and an optionally-shifted immediate value, and
writes the result to the destination register. It updates the condition flags based on the result.
Related references
16.52 CMN (immediate) on page 16-856.
16.1 A64 instructions in alphabetical order on page 16-797.

16.10 ADDS (shifted register)
Add (shifted register), setting flags.
This instruction is used by the alias CMN (shifted register).
Syntax
ADDS Wd, Wn, Wm{, shift #amount} ; 32-bit general registers
ADDS Xd, Xn, Xm{, shift #amount} ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
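amount

Depends on the instruction variant:
32-bit general registers
Is the shift amount, in the range 0 to 31, defaulting to 0.
64-bit general registers
Is the shift amount, in the range 0 to 63, defaulting to 0.
Xd

Is the 64-bit name of the general-purpose destination register.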
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
shift

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can
be one of LSL, LSR, or ASR.
Usage
Add (shifted register), setting flags, adds a register value and an optionally-shifted register value, and
writes the result to the destination register. It updates the condition flags based on the result.
Related references
16.53 CMN (shifted register) on page 16-857.
16.1 A64 instructions in alphabetical order on page 16-797.

16.11 ADR
Form PC-relative address.
Syntax
ADR Xd, label

Where:
Xd

Is the 64-bit name of the general-purpose destination register.
label

Is the program label whose address is to be calculated. Its offset from the address of this
instruction must be in the range ±1MB.
Usage
Form PC-relative address adds an immediate value to the PC value to form a PC-relative address, and
writes the result to the destination register.
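The range restriction can be pictured with a short Python sketch. The function name adr is hypothetical. The exact bounds used here (-1048576 to +1048575 bytes) assume that the offset is held as a signed 21-bit byte offset, which is how the ±1MB figure arises.

```python
def adr(pc: int, label_address: int) -> int:
    """Model of ADR Xd, label: returns the value written to Xd."""
    offset = label_address - pc
    if not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("label is outside the ±1MB range of ADR")
    return (pc + offset) & ((1 << 64) - 1)


print(hex(adr(0x8000, 0x8010)))
```

A label outside this range needs a wider-reaching form, such as the ADRL pseudo-instruction or ADRP.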
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.12 ADRL pseudo-instruction
Load a PC-relative address into a register. It is similar to the ADR instruction but ADRL can load a wider
range of addresses than ADR because it generates two data processing instructions.
Syntax
ADRL Wd,label
ADRL Xd,label

Where:
Wd

Is the register to load with a 32-bit address.
Xd

Is the register to load with a 64-bit address.
label

Is a PC-relative expression.
Usage
ADRL assembles to two instructions, an ADRP followed by ADD.

If the assembler cannot construct the address in two instructions, it generates a relocation. The linker
then generates the correct offsets.
ADRL produces position-independent code, because the address is calculated relative to PC.
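The ADRP and ADD pair can be sketched in Python as follows. This is an illustration under an assumption: the split used here (ADRP supplies the 4KB page of the target, ADD supplies the low 12 bits) is the conventional way such a pair is formed, and the helper names adrl_parts and adrl_result are hypothetical.

```python
def adrl_parts(pc: int, target: int):
    """Return (page_delta, low12) for an ADRP followed by an ADD."""
    page_delta = (target >> 12) - (pc >> 12)   # signed distance in 4KB pages
    low12 = target & 0xFFF                     # offset of target within its page
    return page_delta, low12


def adrl_result(pc: int, target: int) -> int:
    """Recombine the two parts the way the instruction pair would."""
    page_delta, low12 = adrl_parts(pc, target)
    page = ((pc >> 12) + page_delta) << 12     # ADRP Xd, label
    return page + low12                        # ADD  Xd, Xd, #low12


print(adrl_parts(0x400123, 0x512345))
print(hex(adrl_result(0x400123, 0x512345)))
```

Because both parts depend only on the distance between the instruction and the label, the computed address stays correct when the code is loaded at a different base address that preserves 4KB alignment.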

Example
ADRL x0, mylabel ; loads address of mylabel into x0

Related concepts
12.5 Register-relative and PC-relative expressions on page 12-302.
Related information
ARMv8-A Architecture Reference Manual.

16.13 ADRP
Form PC-relative address to 4KB page.
Syntax
ADRP Xd, label

Where:
Xd

Is the 64-bit name of the general-purpose destination register.
label

Is the program label whose 4KB page address is to be calculated. Its offset from the page
address of this instruction must be in the range ±4GB.
Usage
Form PC-relative address to 4KB page adds an immediate value that is shifted left by 12 bits, to the PC
value to form a PC-relative address, with the bottom 12 bits masked out, and writes the result to the
destination register.
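The calculation described above can be written out as a short Python sketch. The function name adrp is hypothetical, and the bound on imm assumes a signed 21-bit page count, which corresponds to the ±4GB range once shifted left by 12 bits.

```python
def adrp(pc: int, imm: int) -> int:
    """Model of ADRP: page of PC plus a signed page count shifted left by 12."""
    if not -(1 << 20) <= imm < (1 << 20):
        raise ValueError("page offset is outside the ±4GB range of ADRP")
    page_of_pc = pc & ~0xFFF                   # bottom 12 bits masked out
    return (page_of_pc + (imm << 12)) & ((1 << 64) - 1)


print(hex(adrp(0x400123, 0x112)))
print(hex(adrp(0x400123, -1)))
```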
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.14 AND (immediate)
Bitwise AND (immediate).
Syntax
AND Wd|WSP, Wn, #imm ; 32-bit general registers
AND Xd|SP, Xn, #imm ; 64-bit general registers

Where:
Wd|WSP

Is the 32-bit name of the destination general-purpose register or stack pointer.
Wn

Is the 32-bit name of the general-purpose source register.
imm

The bitmask immediate.
Xd|SP

Is the 64-bit name of the destination general-purpose register or stack pointer.
Xn

Is the 64-bit name of the general-purpose source register.
Usage
Bitwise AND (immediate) performs a bitwise AND of a register value and an immediate value, and
writes the result to the destination register.
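Not every constant is a valid bitmask immediate. The Python sketch below enumerates the encodable values, assuming the usual A64 rule that is not spelled out in this section: the value is an element of 2, 4, 8, 16, 32 or 64 bits, containing a rotated run of consecutive ones that is neither empty nor full, replicated across the register. The function name bitmask_immediates is hypothetical.

```python
def bitmask_immediates(bits: int = 64) -> set:
    """Enumerate every value usable as a bitmask immediate at this width."""
    mask = (1 << bits) - 1
    values = set()
    size = 2
    while size <= bits:
        element_mask = (1 << size) - 1
        for ones in range(1, size):                # run of 1 to size-1 ones
            run = (1 << ones) - 1
            for rot in range(size):                # rotate right within the element
                element = ((run >> rot) | (run << (size - rot))) & element_mask
                value = 0
                for pos in range(0, bits, size):   # replicate across the register
                    value |= element << pos
                values.add(value & mask)
        size *= 2
    return values


valid64 = bitmask_immediates(64)
print(0x00FF00FF00FF00FF in valid64)   # replicated 16-bit element
print(0x5 in valid64)                  # two separate runs of ones
```

Under this rule, all-zeros and all-ones are never encodable, so a mask such as 0x5 has to be built in a register first.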
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.15 AND (shifted register)
Bitwise AND (shifted register).
Syntax
AND Wd, Wn, Wm{, shift #amount} ; 32-bit general registers
AND Xd, Xn, Xm{, shift #amount} ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
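amount

Depends on the instruction variant:
32-bit general registers
Is the shift amount, in the range 0 to 31, defaulting to 0.
64-bit general registers
Is the shift amount, in the range 0 to 63, defaulting to 0.
Xd

Is the 64-bit name of the general-purpose destination register.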
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
shift

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL,
LSR, ASR, or ROR.

Usage
Bitwise AND (shifted register) performs a bitwise AND of a register value and an optionally-shifted
register value, and writes the result to the destination register.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.16 ANDS (immediate)
Bitwise AND (immediate), setting flags.
This instruction is used by the alias TST (immediate).
Syntax
ANDS Wd, Wn, #imm ; 32-bit general registers
ANDS Xd, Xn, #imm ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
imm

The bitmask immediate.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the general-purpose source register.
Usage
Bitwise AND (immediate), setting flags, performs a bitwise AND of a register value and an immediate
value, and writes the result to the destination register. It updates the condition flags based on the result.
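A small Python model of the flag update is sketched below. The function name ands and the (N, Z, C, V) tuple are hypothetical. The model assumes the standard behaviour for flag-setting logical operations: N and Z reflect the result, while C and V are cleared.

```python
def ands(rn: int, imm: int, bits: int = 64):
    """Model of ANDS: returns (result, (N, Z, C, V))."""
    mask = (1 << bits) - 1
    result = rn & imm & mask
    n = result >> (bits - 1)       # top bit of the result
    z = int(result == 0)
    return result, (n, z, 0, 0)    # C and V are cleared


# TST W0, #0x0F is ANDS with the result discarded: only the flags matter.
_, flags = ands(0xF0, 0x0F, bits=32)
print(flags)
```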
Related references
16.161 TST (immediate) on page 16-969.
16.1 A64 instructions in alphabetical order on page 16-797.

16.17 ANDS (shifted register)
Bitwise AND (shifted register), setting flags.
This instruction is used by the alias TST (shifted register).
Syntax
ANDS Wd, Wn, Wm{, shift #amount} ; 32-bit general registers
ANDS Xd, Xn, Xm{, shift #amount} ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
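amount

Depends on the instruction variant:
32-bit general registers
Is the shift amount, in the range 0 to 31, defaulting to 0.
64-bit general registers
Is the shift amount, in the range 0 to 63, defaulting to 0.
Xd

Is the 64-bit name of the general-purpose destination register.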
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
shift

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL,
LSR, ASR, or ROR.
Usage
Bitwise AND (shifted register), setting flags, performs a bitwise AND of a register value and an
optionally-shifted register value, and writes the result to the destination register. It updates the condition
flags based on the result.
Related references
16.162 TST (shifted register) on page 16-970.
16.1 A64 instructions in alphabetical order on page 16-797.

16.18 ASR (register)
Arithmetic Shift Right (register).
This instruction is an alias of ASRV.
The equivalent instruction is ASRV Wd, Wn, Wm.
Syntax
ASR Wd, Wn, Wm ; 32-bit general registers
ASR Xd, Xn, Xm ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0
to 31 in its bottom 5 bits.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0
to 63 in its bottom 6 bits.
Usage
Arithmetic Shift Right (register) shifts a register value right by a variable number of bits, shifting in
copies of its sign bit, and writes the result to the destination register. The remainder obtained by dividing
the second source register by the data size defines the number of bits by which the first source register is
right-shifted.
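The modulo behaviour of the shift amount is easy to overlook, so here is a minimal Python model of it. The function name asr_register is hypothetical; the arithmetic follows the description above.

```python
def asr_register(rn: int, rm: int, bits: int = 64) -> int:
    """Model of ASR Rd, Rn, Rm: shift amount is Rm modulo the data size."""
    mask = (1 << bits) - 1
    amount = rm % bits                 # only the bottom 5 or 6 bits matter
    value = rn & mask
    if value & (1 << (bits - 1)):
        value -= 1 << bits             # reinterpret as negative so >> copies the sign
    return (value >> amount) & mask


# A shift amount of 36 behaves like 4 for a 32-bit register.
print(hex(asr_register(0x80000000, 36, bits=32)))
```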
Related references
16.20 ASRV on page 16-823.
16.1 A64 instructions in alphabetical order on page 16-797.

16.19 ASR (immediate)
Arithmetic Shift Right (immediate).
This instruction is an alias of SBFM.
The equivalent instruction is SBFM Wd, Wn, #shift, #31.
Syntax
ASR Wd, Wn, #shift ; 32-bit general registers
ASR Xd, Xn, #shift ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
shift

Depends on the instruction variant:
32-bit general registers
Is the shift amount, in the range 0 to 31.
64-bit general registers
Is the shift amount, in the range 0 to 63.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the general-purpose source register.
Usage
Arithmetic Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting
in copies of the sign bit in the upper bits and zeros in the lower bits, and writes the result to the
destination register.
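The alias relationship stated above can be expressed as a small Python helper that rewrites an ASR (immediate) as its SBFM form. The function name asr_as_sbfm is hypothetical. The section gives only the 32-bit form; the use of #63 as the final operand for X registers is an assumption based on the 64-bit data size.

```python
def asr_as_sbfm(rd: str, rn: str, shift: int) -> str:
    """Return the SBFM instruction equivalent to ASR rd, rn, #shift."""
    width = {"W": 32, "X": 64}[rd[0].upper()]
    if rn[0].upper() != rd[0].upper():
        raise ValueError("source and destination must have the same width")
    if not 0 <= shift < width:
        raise ValueError("shift is out of range for this register width")
    return f"SBFM {rd}, {rn}, #{shift}, #{width - 1}"


print(asr_as_sbfm("W0", "W1", 3))
print(asr_as_sbfm("X2", "X3", 40))
```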
Related references
16.135 SBFM on page 16-940.
16.1 A64 instructions in alphabetical order on page 16-797.

16.20 ASRV
Arithmetic Shift Right Variable.
This instruction is used by the alias ASR (register).
Syntax
ASRV Wd, Wn, Wm ; 32-bit general registers
ASRV Xd, Xn, Xm ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register holding a shift amount from 0
to 31 in its bottom 5 bits.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register holding a shift amount from 0
to 63 in its bottom 6 bits.
Usage
Arithmetic Shift Right Variable shifts a register value right by a variable number of bits, shifting in
copies of its sign bit, and writes the result to the destination register. The remainder obtained by dividing
the second source register by the data size defines the number of bits by which the first source register is
right-shifted.
Related references
16.18 ASR (register) on page 16-821.
16.1 A64 instructions in alphabetical order on page 16-797.

16.21 AT
Address Translate.
This instruction is an alias of SYS.
The equivalent instruction is SYS #op1, C7, Cm, #op2, Xt.
Syntax
AT at_op, Xt

Where:
at_op

Is an AT instruction name, as listed for the AT system instruction group, and can be one of the
values shown in Usage.
op1

Is a 3-bit unsigned immediate, in the range 0 to 7.
Cm

Is a name Cm, with m in the range 0 to 15.
op2

Is a 3-bit unsigned immediate, in the range 0 to 7.
Xt

Is the 64-bit name of the general-purpose source register.
Usage
Address Translate. For more information, see A64 system instructions for address translation in the
ARMv8-A Architecture Reference Manual.
The following table shows the valid specifier combinations:
Table 16-4 SYS parameter values corresponding to AT operations (columns: at_op, op1, op2)

Related references
16.156 SYS on page 16-963.
16.1 A64 instructions in alphabetical order on page 16-797.

16.22 AUTDA, AUTDZA
Authenticate Data address, using key A.
Syntax
AUTDA Xd, Xn|SP ; AUTDA general registers
AUTDZA Xd ; AUTDZA general registers

Where:
Xn|SP

Is the 64-bit name of the general-purpose source register or stack pointer.
Xd

Is the 64-bit name of the general-purpose destination register.
Architectures supported
Supported in ARMv8.3.
Usage
Authenticate Data address, using key A. This instruction authenticates a data address, using a modifier
and key A.
The address is in the general-purpose register that is specified by Xd.
The modifier is:
• In the general-purpose register or stack pointer that is specified by Xn|SP for AUTDA.
• The value zero, for AUTDZA.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the
address. If the authentication fails, the upper bits are corrupted and any subsequent use of the address
results in a Translation fault.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.23 AUTDB, AUTDZB
Authenticate Data address, using key B.
Syntax
AUTDB Xd, Xn|SP ; AUTDB general registers
AUTDZB Xd ; AUTDZB general registers

Where:
Xn|SP

Is the 64-bit name of the general-purpose source register or stack pointer.
Xd

Is the 64-bit name of the general-purpose destination register.
Architectures supported
Supported in ARMv8.3.
Usage
Authenticate Data address, using key B. This instruction authenticates a data address, using a modifier
and key B.
The address is in the general-purpose register that is specified by Xd.
The modifier is:
• In the general-purpose register or stack pointer that is specified by Xn|SP for AUTDB.
• The value zero, for AUTDZB.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the
address. If the authentication fails, the upper bits are corrupted and any subsequent use of the address
results in a Translation fault.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.24 AUTIA, AUTIZA, AUTIA1716, AUTIASP, AUTIAZ
Authenticate Instruction address, using key A.
Syntax
AUTIA Xd, Xn|SP ; AUTIA general registers
AUTIZA Xd ; AUTIZA general registers
AUTIA1716
AUTIASP
AUTIAZ

Where:
Xd

Is the 64-bit name of the general-purpose destination register.
Xn|SP

Is the 64-bit name of the general-purpose source register or stack pointer.
Architectures supported
Supported in ARMv8.3.
Usage
Authenticate Instruction address, using key A. This instruction authenticates an instruction address, using
a modifier and key A.
The address is:
• In the general-purpose register that is specified by Xd for AUTIA and AUTIZA.
• In X17, for AUTIA1716.
• In X30, for AUTIASP and AUTIAZ.

The modifier is:
• In the general-purpose register or stack pointer that is specified by Xn|SP for AUTIA.
• The value zero, for AUTIZA and AUTIAZ.
• In X16, for AUTIA1716.
• In SP, for AUTIASP.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the
address. If the authentication fails, the upper bits are corrupted and any subsequent use of the address
results in a Translation fault.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.25 AUTIB, AUTIZB, AUTIB1716, AUTIBSP, AUTIBZ
Authenticate Instruction address, using key B.
Syntax
AUTIB Xd, Xn|SP ; AUTIB general registers
AUTIZB Xd ; AUTIZB general registers
AUTIB1716
AUTIBSP
AUTIBZ

Where:
Xd

Is the 64-bit name of the general-purpose destination register.
Xn|SP

Is the 64-bit name of the general-purpose source register or stack pointer.
Architectures supported
Supported in ARMv8.3.
Usage
Authenticate Instruction address, using key B. This instruction authenticates an instruction address, using
a modifier and key B.
The address is:
• In the general-purpose register that is specified by Xd for AUTIB and AUTIZB.
• In X17, for AUTIB1716.
• In X30, for AUTIBSP and AUTIBZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by Xn|SP for AUTIB.
• The value zero, for AUTIZB and AUTIBZ.
• In X16, for AUTIB1716.
• In SP, for AUTIBSP.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the
address. If the authentication fails, the upper bits are corrupted and any subsequent use of the address
results in a Translation fault.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.26 B.cond
Branch conditionally.
Syntax
B.cond label

Where:
cond

Is one of the standard conditions.
label

Is the program label to be conditionally branched to. Its offset from the address of this
instruction must be in the range ±1MB.
Usage
Branch conditionally to a label at a PC-relative offset, with a hint that this is not a subroutine call or
return.
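As an illustration of how cond is evaluated, the following Python sketch models the standard A64 condition codes in terms of the N, Z, C, and V flags. The function name condition_holds is hypothetical and is not part of armasm. The flag tests are the usual architectural definitions summarized in the condition code table that the related reference points to.

```python
# Hypothetical model: evaluates an A64 condition mnemonic against the NZCV flags.
def condition_holds(cond, n, z, c, v):
    tests = {
        "EQ": z,                      # equal
        "NE": not z,                  # not equal
        "CS": c, "HS": c,             # carry set / unsigned higher or same
        "CC": not c, "LO": not c,     # carry clear / unsigned lower
        "MI": n,                      # negative
        "PL": not n,                  # positive or zero
        "VS": v,                      # overflow
        "VC": not v,                  # no overflow
        "HI": c and not z,            # unsigned higher
        "LS": not (c and not z),      # unsigned lower or same
        "GE": n == v,                 # signed greater than or equal
        "LT": n != v,                 # signed less than
        "GT": (not z) and n == v,     # signed greater than
        "LE": z or n != v,            # signed less than or equal
        "AL": True, "NV": True,       # always (NV behaves as AL in A64)
    }
    return bool(tests[cond.upper()])
```

For example, condition_holds("EQ", False, True, False, False) is True, so a B.EQ executed with only the Z flag set would take the branch.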
Related references
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.27 B
Branch.
Syntax
B label

Where:
label

Is the program label to be unconditionally branched to. Its offset from the address of this
instruction must be in the range ±128MB, so the branch can be forward or backward within 128MB.
Usage
Branch causes an unconditional branch to a label at a PC-relative offset, with a hint that this is not a
subroutine call or return.
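The range limits on label can be checked with a short Python sketch. It assumes, from the A64 encoding rather than from this guide, that the offset is stored as a signed count of 4-byte words in a 26-bit field for B and BL, and in a 19-bit field for B.cond, CBZ, and CBNZ. The helper name branch_offset_encodable is hypothetical.

```python
MB = 1 << 20

# Hypothetical helper: checks whether a PC-relative byte offset is encodable.
def branch_offset_encodable(offset, imm_bits):
    """imm_bits is assumed to be 26 for B and BL, 19 for B.cond, CBZ and CBNZ."""
    if offset % 4 != 0:               # A64 instructions are word-aligned
        return False
    words = offset // 4               # the field holds a word offset
    limit = 1 << (imm_bits - 1)       # signed field: [-limit, limit - 1]
    return -limit <= words < limit
```

Under these assumptions the furthest forward target of B is 128MB minus 4 bytes from the instruction, and the furthest backward target is exactly 128MB.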
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.28 BFC
Bitfield Clear, leaving other bits unchanged.
This instruction is an alias of BFM.
The equivalent instruction is BFM Wd, WZR, #(-lsb MOD 32), #(width-1).
Syntax
BFC Wd, #lsb, #width ; 32-bit general registers
BFC Xd, #lsb, #width ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
lsb

Depends on the instruction variant:
32-bit general registers
Is the bit number of the lsb of the destination bitfield, in the range 0 to 31.
64-bit general registers
Is the bit number of the lsb of the destination bitfield, in the range 0 to 63.
width

Depends on the instruction variant:
32-bit general registers
Is the width of the bitfield, in the range 1 to 32-lsb.
64-bit general registers
Is the width of the bitfield, in the range 1 to 64-lsb.
Xd

Is the 64-bit name of the general-purpose destination register.
Architectures supported
Supported in ARMv8.2 and later.
Related references
16.30 BFM on page 16-833.
16.1 A64 instructions in alphabetical order on page 16-797.

16.29 BFI
Bitfield Insert.
This instruction is an alias of BFM.
The equivalent instruction is BFM Wd, Wn, #(-lsb MOD 32), #(width-1).
Syntax
BFI Wd, Wn, #lsb, #width ; 32-bit general registers
BFI Xd, Xn, #lsb, #width ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
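lsb

Depends on the instruction variant:
32-bit general registers
Is the bit number of the lsb of the destination bitfield, in the range 0 to 31.
64-bit general registers
Is the bit number of the lsb of the destination bitfield, in the range 0 to 63.
width

Depends on the instruction variant:
32-bit general registers
Is the width of the bitfield, in the range 1 to 32-lsb.
64-bit general registers
Is the width of the bitfield, in the range 1 to 64-lsb.
Xd

Is the 64-bit name of the general-purpose destination register.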
Xn

Is the 64-bit name of the general-purpose source register.
Usage
Bitfield Insert copies any number of low-order bits from a source register into the same number of
adjacent bits at any position in the destination register, leaving other bits unchanged.
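The effect of BFI on the destination value can be sketched in Python as follows. The function bfi and its regsize parameter are illustrative names only, and registers are modelled as plain unsigned integers.

```python
# Hypothetical model of BFI Rd, Rn, #lsb, #width on unsigned integers.
def bfi(dst, src, lsb, width, regsize=32):
    """Insert the low `width` bits of src into dst starting at bit `lsb`."""
    assert 0 <= lsb < regsize and 1 <= width <= regsize - lsb
    reg_mask = (1 << regsize) - 1
    field = ((1 << width) - 1) << lsb          # bits of dst that are replaced
    return (dst & ~field & reg_mask) | ((src << lsb) & field)
```

In this model, bfi(0xFFFFFFFF, 0x5, 8, 4) returns 0xFFFFF5FF: bits 11:8 take the value 0x5 and all other bits of the destination are unchanged.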
Related references
16.30 BFM on page 16-833.
16.1 A64 instructions in alphabetical order on page 16-797.

16.30 BFM
Bitfield Move.
This instruction is used by the aliases:
• BFC.
• BFI.
• BFXIL.
Syntax
BFM Wd, Wn, #immr, #imms ; 32-bit general registers
BFM Xd, Xn, #immr, #imms ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
immr

Depends on the instruction variant:
32-bit general registers
Is the right rotate amount, in the range 0 to 31.
64-bit general registers
Is the right rotate amount, in the range 0 to 63.
imms

Depends on the instruction variant:
32-bit general registers
Is the leftmost bit number to be moved from the source, in the range 0 to 31.
64-bit general registers
Is the leftmost bit number to be moved from the source, in the range 0 to 63.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the general-purpose source register.
Usage
Bitfield Move copies any number of low-order bits from a source register into the same number of
adjacent bits at any position in the destination register, leaving other bits unchanged.
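The relationship between BFM and its aliases can be sketched in Python. This model assumes the two-case behavior described in the architecture reference manual: when the leftmost bit number is greater than or equal to the rotate amount, a field is extracted to the low end of the destination; otherwise the low bits of the source are inserted at bit position regsize minus the rotate amount. All function names are hypothetical, and the alias helpers apply the equivalent-instruction formulas given in the BFC, BFI, and BFXIL sections.

```python
# Hypothetical model of BFM on unsigned integers (immr = rotate, imms = leftmost bit).
def bfm(dst, src, immr, imms, regsize=32):
    assert 0 <= immr < regsize and 0 <= imms < regsize
    if imms >= immr:
        # Extract (imms - immr + 1) bits from bit immr of src to the low end of dst.
        width, src_lsb, dst_lsb = imms - immr + 1, immr, 0
    else:
        # Insert the low (imms + 1) bits of src at bit (regsize - immr) of dst.
        width, src_lsb, dst_lsb = imms + 1, 0, regsize - immr
    field = (1 << width) - 1
    bits = (src >> src_lsb) & field
    reg_mask = (1 << regsize) - 1
    return (dst & ~(field << dst_lsb) & reg_mask) | (bits << dst_lsb)

def bfi_as_bfm(dst, src, lsb, width, regsize=32):
    return bfm(dst, src, -lsb % regsize, width - 1, regsize)

def bfxil_as_bfm(dst, src, lsb, width, regsize=32):
    return bfm(dst, src, lsb, lsb + width - 1, regsize)

def bfc_as_bfm(dst, lsb, width, regsize=32):
    return bfm(dst, 0, -lsb % regsize, width - 1, regsize)  # source is the zero register
```

For instance, BFI W0, W1, #8, #4 corresponds to bfm(dst, src, 24, 3) in this model, because -8 MOD 32 is 24 and width-1 is 3.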
Related references
16.28 BFC on page 16-831.
16.29 BFI on page 16-832.
16.31 BFXIL on page 16-834.
16.1 A64 instructions in alphabetical order on page 16-797.

16.31 BFXIL
Bitfield extract and insert at low end.
This instruction is an alias of BFM.
The equivalent instruction is BFM Wd, Wn, #lsb, #(lsb+width-1).
Syntax
BFXIL Wd, Wn, #lsb, #width ; 32-bit general registers
BFXIL Xd, Xn, #lsb, #width ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
lsb

Depends on the instruction variant:
32-bit general registers
Is the bit number of the lsb of the source bitfield, in the range 0 to 31.
64-bit general registers
Is the bit number of the lsb of the source bitfield, in the range 0 to 63.
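width

Depends on the instruction variant:
32-bit general registers
Is the width of the bitfield, in the range 1 to 32-lsb.
64-bit general registers
Is the width of the bitfield, in the range 1 to 64-lsb.
Xd

Is the 64-bit name of the general-purpose destination register.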
Xn

Is the 64-bit name of the general-purpose source register.
Usage
Bitfield extract and insert at low end copies any number of low-order bits from a source register into the
same number of adjacent bits at the low end in the destination register, leaving other bits unchanged.
Related references
16.30 BFM on page 16-833.
16.1 A64 instructions in alphabetical order on page 16-797.

16.32 BIC (shifted register)
Bitwise Bit Clear (shifted register).
Syntax
BIC Wd, Wn, Wm{, shift #amount} ; 32-bit general registers
BIC Xd, Xn, Xm{, shift #amount} ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
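amount

Depends on the instruction variant:
32-bit general registers
Is the shift amount, in the range 0 to 31, defaulting to 0.
64-bit general registers
Is the shift amount, in the range 0 to 63, defaulting to 0.
Xd

Is the 64-bit name of the general-purpose destination register.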
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
shift

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL,
LSR, ASR, or ROR.
Usage
Bitwise Bit Clear (shifted register) performs a bitwise AND of a register value and the complement of an
optionally-shifted register value, and writes the result to the destination register.
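A minimal Python sketch of this operation, including the optional shift, is shown here. The names shift_operand and bic are hypothetical, and register values are modelled as unsigned integers of regsize bits.

```python
# Hypothetical model of the optional shift applied to the second source operand.
def shift_operand(value, shift, amount, regsize):
    mask = (1 << regsize) - 1
    value &= mask
    if shift == "LSL":
        return (value << amount) & mask
    if shift == "LSR":
        return value >> amount
    if shift == "ASR":
        sign = value >> (regsize - 1)
        signed = value - (sign << regsize)     # reinterpret as two's complement
        return (signed >> amount) & mask
    if shift == "ROR":
        amount %= regsize
        return ((value >> amount) | (value << (regsize - amount))) & mask
    raise ValueError(shift)

# Hypothetical model of BIC Rd, Rn, Rm{, shift #amount}.
def bic(rn, rm, shift="LSL", amount=0, regsize=32):
    mask = (1 << regsize) - 1
    return rn & ~shift_operand(rm, shift, amount, regsize) & mask
```

In this model, bic(0xFFFFFFFF, 0xFF, "LSL", 8) returns 0xFFFF00FF, which clears bits 15:8 of the first operand.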
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.33 BICS (shifted register)
Bitwise Bit Clear (shifted register), setting flags.
Syntax
BICS Wd, Wn, Wm{, shift #amount} ; 32-bit general registers
BICS Xd, Xn, Xm{, shift #amount} ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
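amount

Depends on the instruction variant:
32-bit general registers
Is the shift amount, in the range 0 to 31, defaulting to 0.
64-bit general registers
Is the shift amount, in the range 0 to 63, defaulting to 0.
Xd

Is the 64-bit name of the general-purpose destination register.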
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
shift

Is the optional shift to be applied to the final source, defaulting to LSL, and can be one of LSL,
LSR, ASR, or ROR.
Usage
Bitwise Bit Clear (shifted register), setting flags, performs a bitwise AND of a register value and the
complement of an optionally-shifted register value, and writes the result to the destination register. It
updates the condition flags based on the result.
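The flag update can be sketched in Python as follows. The sketch omits the optional shift for brevity and assumes the usual A64 behavior for logical flag-setting instructions: N and Z reflect the result, and C and V are cleared. The function name bics is hypothetical.

```python
# Hypothetical model of BICS Rd, Rn, Rm with no shift applied.
def bics(rn, rm, regsize=32):
    mask = (1 << regsize) - 1
    result = rn & ~rm & mask
    n = result >> (regsize - 1)       # N: top bit of the result
    z = int(result == 0)              # Z: result is zero
    return result, (n, z, 0, 0)       # C and V are assumed to be cleared
```

For example, bics(0x0F, 0x0F) returns a zero result with only the Z flag set.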
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.34 BL
Branch with Link.
Syntax
BL label

Where:
label

Is the program label to be unconditionally branched to. Its offset from the address of this
instruction must be in the range ±128MB, so the branch can be forward or backward within 128MB.
Usage
Branch with Link branches to a PC-relative offset, setting the register X30 to PC+4. It provides a hint
that this is a subroutine call.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.35 BLR
Branch with Link to Register.
Syntax
BLR Xn

Where:
Xn

Is the 64-bit name of the general-purpose register holding the address to be branched to.
Usage
Branch with Link to Register calls a subroutine at an address in a register, setting register X30 to PC+4.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.36 BLRAA, BLRAAZ, BLRAB, BLRABZ
Branch with Link to Register, with pointer authentication.
Syntax
BLRAA Xn, Xm|SP ; BLRAA general registers
BLRAAZ Xn ; BLRAAZ general registers
BLRAB Xn, Xm|SP ; BLRAB general registers
BLRABZ Xn ; BLRABZ general registers

Where:
Xn

Is the 64-bit name of the general-purpose register holding the address to be branched to.
Xm|SP

Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier.
Architectures supported
Supported in ARMv8.3.
Usage
Branch with Link to Register, with pointer authentication. This instruction authenticates the address in
the general-purpose register that is specified by Xn, using a modifier and the specified key, and calls a
subroutine at the authenticated address, setting register X30 to PC+4.
The modifier is:
• In the general-purpose register or stack pointer that is specified by Xm|SP for BLRAA and BLRAB.
• The value zero, for BLRAAZ and BLRABZ.
Key A is used for BLRAA and BLRAAZ, and key B is used for BLRAB and BLRABZ.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication
fails, a Translation fault is generated.
The authenticated address is not written back to the general-purpose register.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.37 BR
Branch to Register.
Syntax
BR Xn

Where:
Xn

Is the 64-bit name of the general-purpose register holding the address to be branched to.
Usage
Branch to Register branches unconditionally to an address in a register, with a hint that this is not a
subroutine return.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.38 BRAA, BRAAZ, BRAB, BRABZ
Branch to Register, with pointer authentication.
Syntax
BRAA Xn, Xm|SP ; BRAA general registers
BRAAZ Xn ; BRAAZ general registers
BRAB Xn, Xm|SP ; BRAB general registers
BRABZ Xn ; BRABZ general registers

Where:
Xn

Is the 64-bit name of the general-purpose register holding the address to be branched to.
Xm|SP

Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier.
Architectures supported
Supported in ARMv8.3.
Usage
Branch to Register, with pointer authentication. This instruction authenticates the address in the
general-purpose register that is specified by Xn, using a modifier and the specified key, and branches to the
authenticated address.
The modifier is:
• In the general-purpose register or stack pointer that is specified by Xm|SP for BRAA and BRAB.
• The value zero, for BRAAZ and BRABZ.
Key A is used for BRAA and BRAAZ, and key B is used for BRAB and BRABZ.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication
fails, a Translation fault is generated.
The authenticated address is not written back to the general-purpose register.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.39 BRK
Breakpoint instruction.
Syntax
BRK #imm

Where:
imm

Is a 16-bit unsigned immediate, in the range 0 to 65535.
Usage
Breakpoint instruction generates a Breakpoint Instruction exception. The PE records the exception in
ESR_ELx, using the EC value 0x3c, and captures the value of the immediate argument in ESR_ELx.ISS.
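The syndrome value that results can be sketched in Python. The field positions used here are assumptions taken from the general layout of ESR_ELx in the architecture reference manual, not from this guide: EC in bits 31:26, the IL bit in bit 25 (set for a 32-bit instruction), and the immediate in the low 16 bits of ISS. The function names are hypothetical.

```python
EC_BRK_A64 = 0x3C                     # EC value stated above for BRK

# Hypothetical helper: builds the ESR_ELx value expected after BRK #imm16.
def brk_syndrome(imm16):
    assert 0 <= imm16 <= 0xFFFF
    il = 1                            # assumption: IL is set for 32-bit instructions
    return (EC_BRK_A64 << 26) | (il << 25) | imm16

# Hypothetical helper: recovers (EC, immediate) from a syndrome value.
def decode_brk_syndrome(esr):
    return (esr >> 26) & 0x3F, esr & 0xFFFF
```

Under these assumptions, BRK #0x800 would be reported as the syndrome value 0xF2000800, and a handler could recover the immediate with decode_brk_syndrome.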
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.40 CBNZ
Compare and Branch on Nonzero.
Syntax
CBNZ Wt, label ; 32-bit general registers
CBNZ Xt, label ; 64-bit general registers

Where:
Wt

Is the 32-bit name of the general-purpose register to be tested.
Xt

Is the 64-bit name of the general-purpose register to be tested.
label

Is the program label to be conditionally branched to. Its offset from the address of this
instruction must be in the range ±1MB.
Usage
Compare and Branch on Nonzero compares the value in a register with zero, and conditionally branches
to a label at a PC-relative offset if the comparison is not equal. It provides a hint that this is not a
subroutine call or return. This instruction does not affect the condition flags.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.41 CBZ
Compare and Branch on Zero.
Syntax
CBZ Wt, label ; 32-bit general registers
CBZ Xt, label ; 64-bit general registers

Where:
Wt

Is the 32-bit name of the general-purpose register to be tested.
Xt

Is the 64-bit name of the general-purpose register to be tested.
label

Is the program label to be conditionally branched to. Its offset from the address of this
instruction must be in the range ±1MB.
Usage
Compare and Branch on Zero compares the value in a register with zero, and conditionally branches to a
label at a PC-relative offset if the comparison is equal. It provides a hint that this is not a subroutine call
or return. This instruction does not affect condition flags.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.42 CCMN (immediate)
Conditional Compare Negative (immediate).
Syntax
CCMN Wn, #imm, #nzcv, cond ; 32-bit general registers
CCMN Xn, #imm, #nzcv, cond ; 64-bit general registers

Where:
Wn

Is the 32-bit name of the first general-purpose source register.
Xn

Is the 64-bit name of the first general-purpose source register.
imm

Is a 5-bit unsigned immediate, in the range 0 to 31.
nzcv

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the
4-bit NZCV condition flags.
cond

Is one of the standard conditions.
Usage
Conditional Compare Negative (immediate) sets the value of the condition flags to the result of the
comparison of a register value and a negated immediate value if the condition is TRUE, and an
immediate value otherwise.
Related references
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.43 CCMN (register)
Conditional Compare Negative (register).
Syntax
CCMN Wn, Wm, #nzcv, cond ; 32-bit general registers
CCMN Xn, Xm, #nzcv, cond ; 64-bit general registers

Where:
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
nzcv

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the
4-bit NZCV condition flags.
cond

Is one of the standard conditions.
Usage
Conditional Compare Negative (register) sets the value of the condition flags to the result of the
comparison of a register value and the negated value of another register if the condition is TRUE, and an
immediate value otherwise.
Related references
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.44 CCMP (immediate)
Conditional Compare (immediate).
Syntax
CCMP Wn, #imm, #nzcv, cond ; 32-bit general registers
CCMP Xn, #imm, #nzcv, cond ; 64-bit general registers

Where:
Wn

Is the 32-bit name of the first general-purpose source register.
Xn

Is the 64-bit name of the first general-purpose source register.
imm

Is a 5-bit unsigned immediate, in the range 0 to 31.
nzcv

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the
4-bit NZCV condition flags.
cond

Is one of the standard conditions.
Usage
Conditional Compare (immediate) sets the value of the condition flags to the result of the comparison of
a register value and an immediate value if the condition is TRUE, and an immediate value otherwise.
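The selection between the comparison result and the nzcv immediate can be sketched in Python. The names sub_flags and ccmp are hypothetical, the outcome of cond is passed in as a boolean rather than evaluated from the current flags, and the nzcv immediate is assumed to hold N in bit 3, Z in bit 2, C in bit 1, and V in bit 0.

```python
# Hypothetical helper: NZCV flags produced by the subtraction a - b.
def sub_flags(a, b, regsize=32):
    mask = (1 << regsize) - 1
    a &= mask
    b &= mask
    result = (a - b) & mask
    n = result >> (regsize - 1)
    z = int(result == 0)
    c = int(a >= b)                            # carry set means no borrow
    sign_a, sign_b = a >> (regsize - 1), b >> (regsize - 1)
    v = int(sign_a != sign_b and n != sign_a)  # signed overflow
    return (n, z, c, v)

# Hypothetical model of CCMP Rn, #imm, #nzcv, cond.
def ccmp(rn, imm5, nzcv, cond_true, regsize=32):
    assert 0 <= imm5 <= 31 and 0 <= nzcv <= 15
    if cond_true:
        return sub_flags(rn, imm5, regsize)    # flags from the comparison
    return ((nzcv >> 3) & 1, (nzcv >> 2) & 1, (nzcv >> 1) & 1, nzcv & 1)
```

In this model, ccmp(5, 5, 0, True) yields (0, 1, 1, 0), so a following B.EQ would be taken, while ccmp(5, 5, 0, False) yields (0, 0, 0, 0) and it would not.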
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.45 CCMP (register)
Conditional Compare (register).
Syntax
CCMP Wn, Wm, #nzcv, cond ; 32-bit general registers
CCMP Xn, Xm, #nzcv, cond ; 64-bit general registers

Where:
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
nzcv

Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the
4-bit NZCV condition flags.
cond

Is one of the standard conditions.
Usage
Conditional Compare (register) sets the value of the condition flags to the result of the comparison of
two registers if the condition is TRUE, and an immediate value otherwise.
Related references
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.46 CINC
Conditional Increment.
This instruction is an alias of CSINC.
The equivalent instruction is CSINC Wd, Wn, Wn, invert(cond).
Syntax
CINC Wd, Wn, cond ; 32-bit general registers
CINC Xd, Xn, cond ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the general-purpose source register.
cond

Is one of the standard conditions, excluding AL and NV.
Usage
Conditional Increment returns, in the destination register, the value of the source register incremented by
1 if the condition is TRUE, and otherwise returns the value of the source register.
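The alias relationship with CSINC can be sketched in Python. The sketch assumes that CSINC returns its first source when its condition is TRUE and its second source plus 1 otherwise. The function names are hypothetical, and the condition outcome is passed in as a boolean.

```python
# Hypothetical model of CSINC Rd, Rn, Rm, cond.
def csinc(rn, rm, cond_true, regsize=32):
    mask = (1 << regsize) - 1
    return rn & mask if cond_true else (rm + 1) & mask

# Hypothetical model of CINC Rd, Rn, cond, written as CSINC Rd, Rn, Rn, invert(cond).
def cinc(rn, cond_true, regsize=32):
    return csinc(rn, rn, not cond_true, regsize)
```

Here cinc(7, True) returns 8 and cinc(7, False) returns 7. The increment wraps, so cinc(0xFFFFFFFF, True) returns 0.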
Related references
16.63 CSINC on page 16-868.
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.47 CINV
Conditional Invert.
This instruction is an alias of CSINV.
The equivalent instruction is CSINV Wd, Wn, Wn, invert(cond).
Syntax
CINV Wd, Wn, cond ; 32-bit general registers
CINV Xd, Xn, cond ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the general-purpose source register.
cond

Is one of the standard conditions, excluding AL and NV.
Usage
Conditional Invert returns, in the destination register, the bitwise inversion of the value of the source
register if the condition is TRUE, and otherwise returns the value of the source register.
Related references
16.64 CSINV on page 16-869.
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.48 CLREX
Clear Exclusive.
Syntax
CLREX {#imm}

Where:
imm

Is an optional 4-bit unsigned immediate, in the range 0 to 15, defaulting to 15.
Usage
Clear Exclusive clears the local monitor of the executing PE.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.49 CLS
Count leading sign bits.
Syntax
CLS Wd, Wn ; 32-bit general registers
CLS Xd, Xn ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the general-purpose source register.
Operation
Rd = CLS(Rn), where R is either W or X.
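The CLS operation can be modelled in Python as a count of the consecutive bits, below the most significant bit, that have the same value as the most significant bit. The sign bit itself is not counted. The function name cls and the regsize parameter are illustrative only.

```python
# Hypothetical model of CLS Rd, Rn on an unsigned integer of regsize bits.
def cls(value, regsize=32):
    value &= (1 << regsize) - 1
    sign = value >> (regsize - 1)
    count = 0
    for bit in range(regsize - 2, -1, -1):     # scan downward from the bit below the sign
        if ((value >> bit) & 1) != sign:
            break
        count += 1
    return count
```

In this model, cls(0) and cls(0xFFFFFFFF) both return 31 for a 32-bit register, and cls(0x40000000) returns 0.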

Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.50 CLZ
Count leading zero bits.
Syntax
CLZ Wd, Wn ; 32-bit general registers
CLZ Xd, Xn ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the general-purpose source register.
Operation
Rd = CLZ(Rn), where R is either W or X.
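The CLZ operation can be modelled in Python as the register width minus the position of the highest set bit. The function name clz and the regsize parameter are illustrative only.

```python
# Hypothetical model of CLZ Rd, Rn on an unsigned integer of regsize bits.
def clz(value, regsize=32):
    value &= (1 << regsize) - 1
    return regsize - value.bit_length()        # bit_length() is 0 for a zero value
```

In this model, clz(0) returns the full register width, 32 or 64, and clz(1) returns 31 for a 32-bit register.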

Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.51 CMN (extended register)
Compare Negative (extended register).
This instruction is an alias of ADDS (extended register).
The equivalent instruction is ADDS WZR, Wn|WSP, Wm{, extend {#amount}}.
Syntax
CMN Wn|WSP, Wm{, extend {#amount}} ; 32-bit general registers
CMN Xn|SP, Rm{, extend {#amount}} ; 64-bit general registers

Where:
Wn|WSP

Is the 32-bit name of the first source general-purpose register or stack pointer.
Wm

Is the 32-bit name of the second general-purpose source register.
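extend

Is the extension to be applied to the second source operand, and can be one of UXTB, UXTH,
UXTW, UXTX, SXTB, SXTH, SXTW, or SXTX. LSL is a synonym for UXTW in the 32-bit variant and
for UXTX in the 64-bit variant.
Xn|SP

Is the 64-bit name of the first source general-purpose register or stack pointer.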
R

Is a width specifier, and can be either W or X.
m

Is the number [0-30] of the second general-purpose source register or the name ZR (31).
amount

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must
be absent when extend is absent, is required when extend is LSL, and is optional when extend
is present but not LSL.
Usage
Compare Negative (extended register) adds a register value and a sign or zero-extended register value,
followed by an optional left shift amount. The argument that is extended from the Rm register can be a
byte, halfword, word, or doubleword. It updates the condition flags based on the result, and discards the
result.
Table 16-5 CMN (64-bit general registers) specifier combinations
R   extend
W   SXTB
W   SXTH
W   SXTW
W   UXTB
W   UXTH
W   UXTW
X   LSL|UXTX
X   SXTX
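The extension, shift, and flag-setting addition described for the 64-bit variant can be sketched in Python. The names extend_reg, add_flags, and cmn_extended are hypothetical, and LSL is treated as UXTX as in the table.

```python
# Hypothetical helper: extends the low byte, halfword, word, or doubleword of rm, then shifts left.
def extend_reg(value, extend, amount=0, regsize=64):
    widths = {"B": 8, "H": 16, "W": 32, "X": 64}
    width = min(widths[extend[3]], regsize)    # last letter selects the source width
    value &= (1 << width) - 1
    if extend[0] == "S" and value >> (width - 1):
        value -= 1 << width                    # sign-extend
    return (value << amount) & ((1 << regsize) - 1)

# Hypothetical helper: NZCV flags produced by the addition a + b.
def add_flags(a, b, regsize=64):
    mask = (1 << regsize) - 1
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    n = result >> (regsize - 1)
    z = int(result == 0)
    c = int(total > mask)                      # unsigned carry out
    sign_a, sign_b = a >> (regsize - 1), b >> (regsize - 1)
    v = int(sign_a == sign_b and n != sign_a)  # signed overflow
    return (n, z, c, v)

# Hypothetical model of CMN Xn|SP, Rm{, extend {#amount}}.
def cmn_extended(xn, rm, extend="UXTX", amount=0):
    assert 0 <= amount <= 4
    if extend == "LSL":
        extend = "UXTX"
    return add_flags(xn, extend_reg(rm, extend, amount))
```

For example, cmn_extended(1, 0xFF, "SXTB") sign-extends the byte 0xFF to -1, so the sum is zero and the Z and C flags are set.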

Related references
16.8 ADDS (extended register) on page 16-810.
16.1 A64 instructions in alphabetical order on page 16-797.

16.52 CMN (immediate)
Compare Negative (immediate).
This instruction is an alias of ADDS (immediate).
The equivalent instruction is ADDS WZR, Wn|WSP, #imm {, shift}.
Syntax
CMN Wn|WSP, #imm{, shift} ; 32-bit general registers
CMN Xn|SP, #imm{, shift} ; 64-bit general registers

Where:
Wn|WSP

Is the 32-bit name of the source general-purpose register or stack pointer.
Xn|SP

Is the 64-bit name of the source general-purpose register or stack pointer.
imm

Is an unsigned immediate, in the range 0 to 4095.
shift

Is the optional left shift to apply to the immediate, defaulting to LSL #0, and can be either LSL
#0 or LSL #12.
Usage
Compare Negative (immediate) adds a register value and an optionally-shifted immediate value. It
updates the condition flags based on the result, and discards the result.
Related references
16.9 ADDS (immediate) on page 16-812.
16.1 A64 instructions in alphabetical order on page 16-797.

16.53 CMN (shifted register)
Compare Negative (shifted register).
This instruction is an alias of ADDS (shifted register).
The equivalent instruction is ADDS WZR, Wn, Wm {, shift #amount}.
Syntax
CMN Wn, Wm{, shift #amount} ; 32-bit general registers
CMN Xn, Xm{, shift #amount} ; 64-bit general registers

Where:
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
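amount

Depends on the instruction variant:
32-bit general registers
Is the shift amount, in the range 0 to 31, defaulting to 0.
64-bit general registers
Is the shift amount, in the range 0 to 63, defaulting to 0.
Xn

Is the 64-bit name of the first general-purpose source register.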
Xm

Is the 64-bit name of the second general-purpose source register.
shift

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can
be one of LSL, LSR, or ASR.
Usage
Compare Negative (shifted register) adds a register value and an optionally-shifted register value. It
updates the condition flags based on the result, and discards the result.
Related references
16.10 ADDS (shifted register) on page 16-813.
16.1 A64 instructions in alphabetical order on page 16-797.

16.54 CMP (extended register)
Compare (extended register).
This instruction is an alias of SUBS (extended register).
The equivalent instruction is SUBS WZR, Wn|WSP, Wm{, extend {#amount}}.
Syntax
CMP Wn|WSP, Wm{, extend {#amount}} ; 32-bit general registers
CMP Xn|SP, Rm{, extend {#amount}} ; 64-bit general registers

Where:
Wn|WSP

Is the 32-bit name of the first source general-purpose register or stack pointer.
Wm

Is the 32-bit name of the second general-purpose source register.
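extend

Is the extension to be applied to the second source operand, and can be one of UXTB, UXTH,
UXTW, UXTX, SXTB, SXTH, SXTW, or SXTX. LSL is a synonym for UXTW in the 32-bit variant and
for UXTX in the 64-bit variant.
Xn|SP

Is the 64-bit name of the first source general-purpose register or stack pointer.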
R

Is a width specifier, and can be either W or X.
m

Is the number [0-30] of the second general-purpose source register or the name ZR (31).
amount

Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0. It must
be absent when extend is absent, is required when extend is LSL, and is optional when extend
is present but not LSL.
Usage
Compare (extended register) subtracts a sign or zero-extended register value, followed by an optional left
shift amount, from a register value. The argument that is extended from the Rm register can be a byte,
halfword, word, or doubleword. It updates the condition flags based on the result, and discards the result.
Table 16-6 CMP (64-bit general registers) specifier combinations
R   extend
W   SXTB
W   SXTH
W   SXTW
W   UXTB
W   UXTH
W   UXTW
X   LSL|UXTX
X   SXTX

Related references
16.149 SUBS (extended register) on page 16-955.
16.1 A64 instructions in alphabetical order on page 16-797.

16.55 CMP (immediate)
Compare (immediate).
This instruction is an alias of SUBS (immediate).
The equivalent instruction is SUBS WZR, Wn|WSP, #imm {, shift}.
Syntax
CMP Wn|WSP, #imm{, shift} ; 32-bit general registers
CMP Xn|SP, #imm{, shift} ; 64-bit general registers

Where:
Wn|WSP

Is the 32-bit name of the source general-purpose register or stack pointer.
Xn|SP

Is the 64-bit name of the source general-purpose register or stack pointer.
imm

Is an unsigned immediate, in the range 0 to 4095.
shift

Is the optional left shift to apply to the immediate, defaulting to LSL #0, and can be either LSL
#0 or LSL #12.
Usage
Compare (immediate) subtracts an optionally-shifted immediate value from a register value. It updates
the condition flags based on the result, and discards the result.
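Whether a constant fits the imm and shift operands can be checked with a small Python sketch. The helper name encode_cmp_immediate is hypothetical. It follows only the ranges stated above: a value from 0 to 4095, optionally shifted left by 12.

```python
# Hypothetical helper: returns (imm, shift) for CMP Rn, #value, or None if not encodable.
def encode_cmp_immediate(value):
    if 0 <= value <= 0xFFF:
        return value, 0                        # fits directly, LSL #0
    if (value & 0xFFF) == 0 and 0 < (value >> 12) <= 0xFFF:
        return value >> 12, 12                 # fits with LSL #12
    return None                                # needs a register operand instead
```

In this model, 4096 is encodable as (1, 12), whereas 4097 is not encodable and would have to be loaded into a register and compared with a register form of CMP.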
Related references
16.150 SUBS (immediate) on page 16-957.
16.1 A64 instructions in alphabetical order on page 16-797.

16.56 CMP (shifted register)
Compare (shifted register).
This instruction is an alias of SUBS (shifted register).
The equivalent instruction is SUBS WZR, Wn, Wm {, shift #amount}.
Syntax
CMP Wn, Wm{, shift #amount} ; 32-bit general registers
CMP Xn, Xm{, shift #amount} ; 64-bit general registers

Where:
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
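amount

Depends on the instruction variant:
32-bit general registers
Is the shift amount, in the range 0 to 31, defaulting to 0.
64-bit general registers
Is the shift amount, in the range 0 to 63, defaulting to 0.
Xn

Is the 64-bit name of the first general-purpose source register.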
Xm

Is the 64-bit name of the second general-purpose source register.
shift

Is the optional shift type to be applied to the second source operand, defaulting to LSL, and can
be one of LSL, LSR, or ASR.
Usage
Compare (shifted register) subtracts an optionally-shifted register value from a register value. It updates
the condition flags based on the result, and discards the result.
Related references
16.151 SUBS (shifted register) on page 16-958.
16.1 A64 instructions in alphabetical order on page 16-797.

16.57 CNEG
Conditional Negate.
This instruction is an alias of CSNEG.
The equivalent instruction is CSNEG Wd, Wn, Wn, invert(cond).
Syntax
CNEG Wd, Wn, cond ; 32-bit general registers
CNEG Xd, Xn, cond ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the general-purpose source register.
cond

Is one of the standard conditions, excluding AL and NV.
Usage
Conditional Negate returns, in the destination register, the negated value of the source register if the
condition is TRUE, and otherwise returns the value of the source register.
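A common use of this instruction is computing an absolute value. The following Python sketch models CNEG and the two-instruction sequence CMP W0, #0 followed by CNEG W0, W0, MI. The function names are hypothetical, and the MI condition is modelled as the sign bit of the compared value.

```python
# Hypothetical model of CNEG Rd, Rn, cond on unsigned integers.
def cneg(rn, cond_true, regsize=32):
    mask = (1 << regsize) - 1
    return (-rn) & mask if cond_true else rn & mask

# Hypothetical model of: CMP W0, #0 ; CNEG W0, W0, MI
def abs32(w):
    negative = bool((w >> 31) & 1)             # N flag after comparing with zero
    return cneg(w, negative)
```

In this model, abs32(0xFFFFFFFB), which is -5 in two's complement, returns 5. The most negative value 0x80000000 is returned unchanged, because its negation is not representable in 32 bits.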
Related references
16.65 CSNEG on page 16-870.
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.58 CRC32B, CRC32H, CRC32W, CRC32X
CRC32 checksum.
Syntax
CRC32B Wd, Wn, Wm ; Wd = CRC32(Wn, Rm[<7:0>])
CRC32H Wd, Wn, Wm ; Wd = CRC32(Wn, Rm[<15:0>])
CRC32W Wd, Wn, Wm ; Wd = CRC32(Wn, Rm[<31:0>])
CRC32X Wd, Wn, Xm ; Wd = CRC32(Wn, Rm[<63:0>])

Where:
Wm

Is the 32-bit name of the general-purpose data source register.
Xm

Is the 64-bit name of the general-purpose data source register.
Wd

Is the 32-bit name of the general-purpose accumulator output register.
Wn

Is the 32-bit name of the general-purpose accumulator input register.
Usage
CRC32 checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose
register. It takes an input CRC value in the first source operand, performs a CRC on the input
value in the second source operand, and returns the output CRC value. The second source operand can be
8, 16, 32, or 64 bits. To align with common usage, the bit order of the values is reversed as part of the
operation, and the polynomial 0x04C11DB7 is used for the CRC calculation.
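The following C sketch models the byte form of the operation described above. It is illustrative only: crc32b and crc32_buffer are hypothetical names, the constant 0xEDB88320 is the bit-reversed form of 0x04C11DB7, and the initial value 0xFFFFFFFF with a final inversion is a common software convention that is assumed here and is not part of the instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reversed form of the polynomial 0x04C11DB7. */
#define CRC32_POLY_REFLECTED 0xEDB88320u

/* Models CRC32B Wd, Wn, Wm: folds the low byte of Wm into accumulator Wn. */
static uint32_t crc32b(uint32_t wn, uint32_t wm)
{
    uint32_t crc = wn ^ (wm & 0xFFu);
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ ((crc & 1u) ? CRC32_POLY_REFLECTED : 0u);
    return crc;
}

/* Conventional CRC-32 of a buffer, built from repeated byte steps.
   The initial value and final inversion are assumptions of this sketch. */
static uint32_t crc32_buffer(const uint8_t *data, size_t length)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < length; i++)
        crc = crc32b(crc, data[i]);
    return crc ^ 0xFFFFFFFFu;
}
```

Under these assumptions, the nine ASCII bytes of "123456789" give the widely published check value 0xCBF43926.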

In ARMv8-A, this is an OPTIONAL instruction. In ARMv8.1, it is mandatory for all implementations.
Note
ID_AA64ISAR0_EL1.CRC32 indicates whether this instruction is supported. See ID_AA64ISAR0_EL1
in the ARMv8-A Architecture Reference Manual.

Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.59 CRC32CB, CRC32CH, CRC32CW, CRC32CX
CRC32C checksum.
Syntax
CRC32CB Wd, Wn, Wm ; Wd = CRC32C(Wn, Rm[<7:0>])
CRC32CH Wd, Wn, Wm ; Wd = CRC32C(Wn, Rm[<15:0>])
CRC32CW Wd, Wn, Wm ; Wd = CRC32C(Wn, Rm[<31:0>])
CRC32CX Wd, Wn, Xm ; Wd = CRC32C(Wn, Rm[<63:0>])

Where:
Wm

Is the 32-bit name of the general-purpose data source register.
Xm

Is the 64-bit name of the general-purpose data source register.
Wd

Is the 32-bit name of the general-purpose accumulator output register.
Wn

Is the 32-bit name of the general-purpose accumulator input register.
Usage
CRC32C checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose
register. It takes an input CRC value in the first source operand, performs a CRC on the input
value in the second source operand, and returns the output CRC value. The second source operand can be
8, 16, 32, or 64 bits. To align with common usage, the bit order of the values is reversed as part of the
operation, and the polynomial 0x1EDC6F41 is used for the CRC calculation.
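The following C sketch models the operation for any of the four data widths by consuming the data bits least significant bit first. It is illustrative only: crc32c_step is a hypothetical name, and 0x82F63B78 is the bit-reversed form of 0x1EDC6F41.

```c
#include <stdint.h>

/* Bit-reversed form of the polynomial 0x1EDC6F41. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Models one CRC32C step: folds the low width_bits bits of data into the
   accumulator wn, least significant bit first. */
static uint32_t crc32c_step(uint32_t wn, uint64_t data, unsigned width_bits)
{
    uint32_t crc = wn;
    for (unsigned i = 0; i < width_bits; i++) {
        uint32_t in_bit = (uint32_t)((data >> i) & 1u);
        uint32_t feedback = (crc ^ in_bit) & 1u;
        crc = (crc >> 1) ^ (feedback ? CRC32C_POLY_REFLECTED : 0u);
    }
    return crc;
}
```

In this model, CRC32CB, CRC32CH, CRC32CW, and CRC32CX correspond to width_bits values of 8, 16, 32, and 64. One 32-bit step gives the same result as four 8-bit steps applied to the bytes in least significant byte first order.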
In ARMv8-A, this is an OPTIONAL instruction. In ARMv8.1, it is mandatory for all implementations.
Note
ID_AA64ISAR0_EL1.CRC32 indicates whether this instruction is supported. See ID_AA64ISAR0_EL1
in the ARMv8-A Architecture Reference Manual.

Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.60 CSEL
Conditional Select.
Syntax
CSEL Wd, Wn, Wm, cond ; 32-bit general registers
CSEL Xd, Xn, Xm, cond ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
cond

Is one of the standard conditions.
Usage
Conditional Select returns, in the destination register, the value of the first source register if the condition
is TRUE, and otherwise returns the value of the second source register.
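A typical use is a branch-free selection after a comparison. The following C sketch models that pattern, assuming the hypothetical names csel32 and umax32 and a condition that has already been evaluated to a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models CSEL Wd, Wn, Wm, cond: Wn if the condition holds, else Wm. */
static uint32_t csel32(uint32_t wn, uint32_t wm, bool cond_holds)
{
    return cond_holds ? wn : wm;
}

/* Models CMP W0, W1 followed by CSEL W2, W0, W1, HI (unsigned maximum). */
static uint32_t umax32(uint32_t w0, uint32_t w1)
{
    bool hi = w0 > w1;                  /* HI: unsigned higher after CMP W0, W1 */
    return csel32(w0, w1, hi);
}
```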
Related references
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.61 CSET
Conditional Set.
This instruction is an alias of CSINC.
The equivalent instruction is CSINC Wd, WZR, WZR, invert(cond).
Syntax
CSET Wd, cond ; 32-bit general registers
CSET Xd, cond ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Xd

Is the 64-bit name of the general-purpose destination register.
cond

Is one of the standard conditions, excluding AL and NV.
Usage
Conditional Set sets the destination register to 1 if the condition is TRUE, and otherwise sets it to 0.
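The alias form can be modeled in C to show why the inverted condition produces 1 when the original condition is TRUE. This is a sketch with hypothetical names csinc32 and cset32. WZR is modeled as the constant 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models CSINC Wd, Wn, Wm, cond: Wn if the condition holds, else Wm + 1. */
static uint32_t csinc32(uint32_t wn, uint32_t wm, bool cond_holds)
{
    return cond_holds ? wn : wm + 1u;
}

/* Models CSET Wd, cond through CSINC Wd, WZR, WZR, invert(cond). */
static uint32_t cset32(bool cond_holds)
{
    return csinc32(0u, 0u, !cond_holds);   /* WZR reads as zero */
}
```

When the original condition holds, the inverted condition fails, so the model returns WZR + 1, which is 1. Otherwise it returns WZR, which is 0.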
Related references
16.63 CSINC on page 16-868.
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.62 CSETM
Conditional Set Mask.
This instruction is an alias of CSINV.
The equivalent instruction is CSINV Wd, WZR, WZR, invert(cond).
Syntax
CSETM Wd, cond ; 32-bit general registers
CSETM Xd, cond ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Xd

Is the 64-bit name of the general-purpose destination register.
cond

Is one of the standard conditions, excluding AL and NV.
Usage
Conditional Set Mask sets all bits of the destination register to 1 if the condition is TRUE, and otherwise
sets all bits to 0.
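The following C sketch models the alias form and shows one possible use of the resulting mask, a branch-free choice between two values. The names csinv32, csetm32, and select_with_mask are hypothetical, and WZR is modeled as the constant 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models CSINV Wd, Wn, Wm, cond: Wn if the condition holds, else NOT(Wm). */
static uint32_t csinv32(uint32_t wn, uint32_t wm, bool cond_holds)
{
    return cond_holds ? wn : ~wm;
}

/* Models CSETM Wd, cond through CSINV Wd, WZR, WZR, invert(cond). */
static uint32_t csetm32(bool cond_holds)
{
    return csinv32(0u, 0u, !cond_holds);   /* WZR reads as zero */
}

/* Uses the all-ones or all-zeros mask to pick a or b without a branch. */
static uint32_t select_with_mask(uint32_t a, uint32_t b, bool cond_holds)
{
    uint32_t mask = csetm32(cond_holds);
    return (a & mask) | (b & ~mask);
}
```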
Related references
16.64 CSINV on page 16-869.
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.63 CSINC
Conditional Select Increment.
This instruction is used by the aliases:
• CINC.
• CSET.
Syntax
CSINC Wd, Wn, Wm, cond ; 32-bit general registers
CSINC Xd, Xn, Xm, cond ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
cond

Is one of the standard conditions.
Usage
Conditional Select Increment returns, in the destination register, the value of the first source register if
the condition is TRUE, and otherwise returns the value of the second source register incremented by 1.
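The following C sketch models the 64-bit form and uses it as a conditional counter. The names csinc64 and count_nonzero are hypothetical. The loop body corresponds to CSINC X0, X0, X0, EQ after a compare against zero, which is assumed to be the form that the CINC alias expands to with the condition NE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models CSINC Xd, Xn, Xm, cond: Xn if the condition holds, else Xm + 1. */
static uint64_t csinc64(uint64_t xn, uint64_t xm, bool cond_holds)
{
    return cond_holds ? xn : xm + 1u;
}

/* Counts nonzero bytes. Each iteration models:
     CMP   Wbyte, #0
     CSINC X0, X0, X0, EQ   ; keep the count if zero, else count + 1 */
static uint64_t count_nonzero(const uint8_t *bytes, size_t length)
{
    uint64_t count = 0;
    for (size_t i = 0; i < length; i++)
        count = csinc64(count, count, bytes[i] == 0);
    return count;
}
```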
Related references
16.46 CINC on page 16-849.
16.61 CSET on page 16-866.
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.64 CSINV
Conditional Select Invert.
This instruction is used by the aliases:
• CINV.
• CSETM.
Syntax
CSINV Wd, Wn, Wm, cond ; 32-bit general registers
CSINV Xd, Xn, Xm, cond ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
cond

Is one of the standard conditions.
Usage
Conditional Select Invert returns, in the destination register, the value of the first source register if the
condition is TRUE, and otherwise returns the bitwise inversion value of the second source register.
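The following C sketch models the 32-bit form together with the CINV alias. The names csinv32 and cinv32 are hypothetical, and CINV is assumed to expand to CSINV Wd, Wn, Wn, invert(cond), as described for that alias.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models CSINV Wd, Wn, Wm, cond: Wn if the condition holds, else NOT(Wm). */
static uint32_t csinv32(uint32_t wn, uint32_t wm, bool cond_holds)
{
    return cond_holds ? wn : ~wm;
}

/* Models CINV Wd, Wn, cond, assumed to be CSINV Wd, Wn, Wn, invert(cond). */
static uint32_t cinv32(uint32_t wn, bool cond_holds)
{
    return csinv32(wn, wn, !cond_holds);
}
```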
Related references
16.47 CINV on page 16-850.
16.62 CSETM on page 16-867.
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.65 CSNEG
Conditional Select Negation.
This instruction is used by the alias CNEG.
Syntax
CSNEG Wd, Wn, Wm, cond ; 32-bit general registers
CSNEG Xd, Xn, Xm, cond ; 64-bit general registers

Where:
Wd

Is the 32-bit name of the general-purpose destination register.
Wn

Is the 32-bit name of the first general-purpose source register.
Wm

Is the 32-bit name of the second general-purpose source register.
Xd

Is the 64-bit name of the general-purpose destination register.
Xn

Is the 64-bit name of the first general-purpose source register.
Xm

Is the 64-bit name of the second general-purpose source register.
cond

Is one of the standard conditions.
Usage
Conditional Select Negation returns, in the destination register, the value of the first source register if the
condition is TRUE, and otherwise returns the negated value of the second source register.
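One common use is a branch-free absolute value. The following C sketch models CMP W0, #0 followed by CSNEG W0, W0, W0, PL. The names csneg32 and abs_u32 are hypothetical, and the result is returned as an unsigned value so that the most negative input remains representable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models CSNEG Wd, Wn, Wm, cond: Wn if the condition holds, else -Wm. */
static uint32_t csneg32(uint32_t wn, uint32_t wm, bool cond_holds)
{
    return cond_holds ? wn : (uint32_t)(0u - wm);
}

/* Models CMP W0, #0 followed by CSNEG W0, W0, W0, PL. */
static uint32_t abs_u32(int32_t value)
{
    uint32_t bits = (uint32_t)value;
    bool pl = (bits >> 31) == 0u;       /* PL: N flag clear after CMP W0, #0 */
    return csneg32(bits, bits, pl);
}
```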
Related references
16.57 CNEG on page 16-862.
7.12 Condition code suffixes and related flags on page 7-151.
16.1 A64 instructions in alphabetical order on page 16-797.

16.66 DC
Data Cache operation.
This instruction is an alias of SYS.
The equivalent instruction is SYS #op1, C7, Cm, #op2, Xt.
Syntax
DC dc_op, Xt

Where:
dc_op

Is a DC instruction name, as listed for the DC system instruction group, and can be one of the
values shown in Usage.
op1

Is a 3-bit unsigned immediate, in the range 0 to 7.
Cm

Is a name Cm, with m in the range 0 to 15.
op2

Is a 3-bit unsigned immediate, in the range 0 to 7.
Xt

Is the 64-bit name of the general-purpose source register.
Usage
Data Cache operation. For more information, see A64 system instructions for cache maintenance in the
ARMv8-A Architecture Reference Manual.
The following table shows the valid specifier combinations:
Table 16-7 SYS parameter values corresponding to DC operations
Operation  op1  Cm  op2
CISW       0    14  2
CIVAC      3    14  1
CSW        0    10  2
CVAC       3    10  1
CVAP       3    12  1
CVAU       3    11  1
ISW        0    6   2
IVAC       0    6   1
ZVA        3    4   1
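The mapping from a DC operation name to the equivalent SYS instruction can be sketched in C. This is illustrative only: dc_operation and encode_dc are hypothetical names, and both the parameter values and the SYS bit layout (base opcode 0xD5080000 with op1 at bit 16, Cn at bit 12, Cm at bit 8, op2 at bit 5, and Xt at bit 0) are assumed from the ARMv8-A Architecture Reference Manual rather than taken from this guide.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;
    unsigned op1, cm, op2;
} dc_operation;

/* Parameter values assumed from the ARMv8-A Architecture Reference Manual. */
static const dc_operation dc_operations[] = {
    { "CISW", 0, 14, 2 }, { "CIVAC", 3, 14, 1 }, { "CSW",  0, 10, 2 },
    { "CVAC", 3, 10, 1 }, { "CVAP",  3, 12, 1 }, { "CVAU", 3, 11, 1 },
    { "ISW",  0,  6, 2 }, { "IVAC",  0,  6, 1 }, { "ZVA",  3,  4, 1 },
};

/* Returns the encoding of SYS #op1, C7, Cm, #op2, Xt for the named DC
   operation, or 0 if the name is not in the table. */
static uint32_t encode_dc(const char *name, unsigned xt)
{
    size_t count = sizeof dc_operations / sizeof dc_operations[0];
    for (size_t i = 0; i < count; i++) {
        const dc_operation *op = &dc_operations[i];
        if (strcmp(op->name, name) == 0)
            return 0xD5080000u | (op->op1 << 16) | (7u << 12) |
                   (op->cm << 8) | (op->op2 << 5) | (xt & 31u);
    }
    return 0u;
}
```

Under these assumptions, encode_dc("ZVA", 0) gives 0xD50B7420, the encoding of DC ZVA, X0.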

Related references
16.156 SYS on page 16-963.
16.1 A64 instructions in alphabetical order on page 16-797.

16.67 DCPS1
Debug Change PE State to EL1.
Syntax
DCPS1 {#imm}

Where:
imm

Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0.
Usage
Debug Change PE State to EL1, when executed in Debug state:
• If executed at EL0, changes the current Exception level and SP to EL1 using SP_EL1.
• Otherwise, if executed at ELx, selects SP_ELx.

The target exception level of a DCPS1 instruction is:
• EL1 if the instruction is executed at EL0.
• Otherwise, the Exception level at which the instruction is executed.

When the target Exception level of a DCPS1 instruction is ELx, on executing this instruction:
• ELR_ELx becomes UNKNOWN.
• SPSR_ELx becomes UNKNOWN.
• ESR_ELx becomes UNKNOWN.
• DLR_EL0 and DSPSR_EL0 become UNKNOWN.
• The endianness is set according to SCTLR_ELx.EE.
This instruction is UNDEFINED at EL0 in Non-secure state if EL2 is implemented and
HCR_EL2.TGE == 1.
This instruction is always UNDEFINED in Non-debug state.
For more information on the operation of the DCPSn instructions, see DCPS in the ARMv8-A
Architecture Reference Manual.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.68 DCPS2
Debug Change PE State to EL2.
Syntax
DCPS2 {#imm}

Where:
imm

Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0.
Usage
Debug Change PE State to EL2, when executed in Debug state:
• If executed at EL0 or EL1, changes the current Exception level and SP to EL2 using SP_EL2.
• Otherwise, if executed at ELx, selects SP_ELx.

The target exception level of a DCPS2 instruction is:
• EL2 if the instruction is executed at an exception level that is not EL3.
• EL3 if the instruction is executed at EL3.

When the target Exception level of a DCPS2 instruction is ELx, on executing this instruction:
• ELR_ELx becomes UNKNOWN.
• SPSR_ELx becomes UNKNOWN.
• ESR_ELx becomes UNKNOWN.
• DLR_EL0 and DSPSR_EL0 become UNKNOWN.
• The endianness is set according to SCTLR_ELx.EE.

This instruction is UNDEFINED at the following exception levels:
• All exception levels if EL2 is not implemented.
• At EL0 and EL1 in Secure state if EL2 is implemented.
This instruction is always UNDEFINED in Non-debug state.
For more information on the operation of the DCPSn instructions, see DCPS in the ARMv8-A
Architecture Reference Manual.
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.69 DCPS3
Debug Change PE State to EL3.
Syntax
DCPS3 {#imm}

Where:
imm

Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0.
Usage
Debug Change PE State to EL3, when executed in Debug state:
• If executed at EL3, selects SP_EL3.
• Otherwise, changes the current Exception level and SP to EL3 using SP_EL3.

The target exception level of a DCPS3 instruction is EL3.
On executing a DCPS3 instruction:
• ELR_EL3 becomes UNKNOWN.
• SPSR_EL3 becomes UNKNOWN.
• ESR_EL3 becomes UNKNOWN.
• DLR_EL0 and DSPSR_EL0 become UNKNOWN.
• The endianness is set according to SCTLR_EL3.EE.

This instruction is UNDEFINED at all exception levels if either:
• EDSCR.SDD == 1.
• EL3 is not implemented.
This instruction is always UNDEFINED in Non-debug state.
For more information on the operation of the DCPSn instructions, see DCPS in the ARMv8-A
Architecture Reference Manual.
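The target Exception level rules stated for DCPS1, DCPS2, and DCPS3 can be summarized in a short C sketch. The function name dcps_target_el is hypothetical, and the model covers only the target level. It does not model the cases in which the instruction is UNDEFINED.

```c
/* Returns the target Exception level of DCPSn (n = 1, 2, or 3) when it is
   executed at current_el (0 to 3), or -1 for an invalid n. */
static int dcps_target_el(int n, int current_el)
{
    switch (n) {
    case 1:
        return current_el == 0 ? 1 : current_el;   /* DCPS1 */
    case 2:
        return current_el == 3 ? 3 : 2;            /* DCPS2 */
    case 3:
        return 3;                                  /* DCPS3 */
    default:
        return -1;
    }
}
```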
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.70 DMB
Data Memory Barrier.
Syntax
DMB option|#imm

Where:
option

Specifies the limitation on the barrier operation. Values are:
SY

Full system is the required shareability domain, reads and writes are the required access
types in both Group A and Group B. This option is referred to as the full system DMB.
ST

Full system is the required shareability domain, writes are the required access type in
both Group A and Group B.
LD

Full system is the required shareability domain, reads are the required access type in
Group A, and reads and writes are the required access types in Group B.
ISH

Inner Shareable is the required shareability domain, reads and writes are the required
access types in both Group A and Group B.
ISHST

Inner Shareable is the required shareability domain, writes are the required access type
in both Group A and Group B.
ISHLD

Inner Shareable is the required shareability domain, reads are the required access type
in Group A, and reads and writes are the required access types in Group B.
NSH

Non-shareable is the required shareability domain, reads and writes are the required
access types in both Group A and Group B.
NSHST

Non-shareable is the required shareability domain, writes are the required access type in
both Group A and Group B.
NSHLD

Non-shareable is the required shareability domain, reads are the required access type in
Group A, and reads and writes are the required access types in Group B.
OSH

Outer Shareable is the required shareability domain, reads and writes are the required
access types in both Group A and Group B.
OSHST

Outer Shareable is the required shareability domain, writes are the required access type
in both Group A and Group B.
OSHLD

Outer Shareable is the required shareability domain, reads are the required access type
in Group A, and reads and writes are the required access types in Group B.
imm

Is a 4-bit unsigned immediate, in the range 0 to 15.
Usage
Data Memory Barrier is a memory barrier that ensures the ordering of observations of memory accesses,
see Data Memory Barrier in the ARMv8-A Architecture Reference Manual.
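The relationship between the named options and the 4-bit immediate can be sketched in C. This is illustrative only: barrier_imm and encode_dmb are hypothetical names, and both the option values and the DMB bit layout (base opcode 0xD50330BF with the immediate at bit 8) are assumed from the ARMv8-A Architecture Reference Manual rather than taken from this guide.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;
    int imm;
} barrier_option;

/* Option values assumed from the ARMv8-A Architecture Reference Manual. */
static const barrier_option barrier_options[] = {
    { "OSHLD", 1 }, { "OSHST", 2 },  { "OSH", 3 },
    { "NSHLD", 5 }, { "NSHST", 6 },  { "NSH", 7 },
    { "ISHLD", 9 }, { "ISHST", 10 }, { "ISH", 11 },
    { "LD", 13 },   { "ST", 14 },    { "SY", 15 },
};

/* Returns the 4-bit immediate for a named option, or -1 if it is unknown. */
static int barrier_imm(const char *option)
{
    size_t count = sizeof barrier_options / sizeof barrier_options[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(barrier_options[i].name, option) == 0)
            return barrier_options[i].imm;
    return -1;
}

/* Returns the encoding of DMB #imm. */
static uint32_t encode_dmb(unsigned imm)
{
    return 0xD50330BFu | ((imm & 15u) << 8);
}
```

Under these assumptions, DMB SY and DMB #15 produce the same encoding, 0xD5033FBF.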
Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.71 DRPS
Debug restore process state.
Syntax
DRPS

Related references
16.1 A64 instructions in alphabetical order on page 16-797.

16.72 DSB
Data Synchronization Barrier.
Syntax
DSB option|#imm