ARM Architecture Reference Manual ARMv8, for ARMv8-A Profile AArch64

Page Count: 5158

ARM® Architecture Reference Manual

ARMv8, for ARMv8-A architecture profile
Beta

Copyright © 2013 ARM Limited. All rights reserved.
ARM DDI 0487A.a (ID090413)

Release Information
The following releases of this document have been made.
Release history

Date                Issue    Confidentiality           Change
30 April 2013       A.a-1    Confidential-Beta Draft   Beta draft of first issue, limited circulation
12 June 2013        A.a-2    Confidential-Beta Draft   Second beta draft of first issue, limited circulation
04 September 2013   A.a      Non-Confidential Beta     Beta release.

Proprietary Notice
This document is protected by copyright and other related rights and the practice or implementation of the information contained
in this document may be protected by one or more patents or pending patent applications. No part of this document may be
reproduced in any form by any means without the express prior written permission of ARM Limited (“ARM”). No license,
express or implied, by estoppel or otherwise to any intellectual property rights is granted by this document unless
specifically stated.
Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR
PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, ARM makes no representation with respect to,
and has undertaken no analysis to identify or understand the scope and content of, third party patents, copyrights, trade secrets, or
other rights.
This document may include technical inaccuracies or typographical errors.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure
of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof
is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to ARM’s customers
is not intended to create or refer to any partnership relationship with any other company. ARM may make changes to this document
at any time and without notice.
If any of the provisions contained in these terms conflict with any of the provisions of any signed written agreement specifically
covering this document with ARM, then the signed written agreement prevails over and supersedes the conflicting provisions of
these terms.
Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM Limited or its affiliates in the EU and/or
elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of their respective
owners. You must follow the ARM trademark usage guidelines, http://www.arm.com/about/trademark-usage-guidelines.php.
This document is Non-Confidential but any disclosure by you is subject to you providing the recipient the conditions set out in
this notice and procuring the acceptance by the recipient of the conditions set out in this notice.
Copyright © 2013 ARM Limited or its affiliates. All rights reserved.
ARM Limited. Company 02557590 registered in England.
110 Fulbourn Road, Cambridge, England CB1 9NJ.
LES-PRE-20327
In this document, where the term ARM is used to refer to the company it means “ARM or any of its subsidiaries as appropriate”.

Note
The term ARM can refer to versions of the ARM architecture, for example ARMv7 refers to version 7 of the ARM architecture.
The context makes it clear when the term is used in this way.

Web Address
http://www.arm.com


Contents
ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile

Preface
    About this manual .... xvi
    Using this manual .... xviii
    Conventions .... xxiii
    Additional reading .... xxv
    Feedback .... xxvi

Part A    ARMv8 Architecture Introduction and Overview

Chapter A1    Introduction to the ARMv8 Architecture
    A1.1    About the ARM architecture .... A1-30
    A1.2    Architecture profiles .... A1-32
    A1.3    ARMv8 architectural concepts .... A1-33
    A1.4    Supported data types .... A1-36
    A1.5    Floating-point and Advanced SIMD support .... A1-46
    A1.6    Cryptographic Extension .... A1-52
    A1.7    The ARM memory model .... A1-53

Part B    The AArch64 Application Level Architecture

Chapter B1    The AArch64 Application Level Programmers’ Model
    B1.1    About the Application level programmers’ model .... B1-58
    B1.2    Registers in AArch64 Execution state .... B1-59
    B1.3    Software control features and EL0 .... B1-65

Chapter B2    The AArch64 Application Level Memory Model
    B2.1    Address space .... B2-68
    B2.2    Memory type overview .... B2-69
    B2.3    Caches and memory hierarchy .... B2-70
    B2.4    Alignment support .... B2-75
    B2.5    Endian support .... B2-76
    B2.6    Atomicity in the ARM architecture .... B2-79
    B2.7    Memory ordering .... B2-82
    B2.8    Memory types and attributes .... B2-89
    B2.9    Mismatched memory attributes .... B2-98
    B2.10   Synchronization and semaphores .... B2-100

Part C    The AArch64 Instruction Set

Chapter C1    The A64 Instruction Set
    C1.1    Introduction .... C1-112
    C1.2    Structure of the A64 assembler language .... C1-113
    C1.3    Address generation .... C1-118
    C1.4    Instruction aliases .... C1-121

Chapter C2    A64 Instruction Set Overview
    C2.1    Branches, Exception generating, and System instructions .... C2-124
    C2.2    Loads and stores .... C2-129
    C2.3    Data processing - immediate .... C2-140
    C2.4    Data processing - register .... C2-145
    C2.5    Data processing - SIMD and floating-point .... C2-152

Chapter C3    A64 Instruction Set Encoding
    C3.1    A64 instruction index by encoding .... C3-172
    C3.2    Branches, exception generating and system instructions .... C3-173
    C3.3    Loads and stores .... C3-176
    C3.4    Data processing - immediate .... C3-193
    C3.5    Data processing - register .... C3-196
    C3.6    Data processing - SIMD and floating point .... C3-203

Chapter C4    The AArch64 System Instruction Class
    C4.1    About the System instruction and System register descriptions .... C4-230
    C4.2    The System instruction class encoding space .... C4-232
    C4.3    PSTATE and special purpose registers .... C4-251
    C4.4    A64 system instructions for cache maintenance .... C4-306
    C4.5    A64 system instructions for address translation .... C4-322
    C4.6    A64 system instructions for TLB maintenance .... C4-335

Chapter C5    A64 Base Instruction Descriptions
    C5.1    Introduction .... C5-386
    C5.2    Register size .... C5-387
    C5.3    Use of the PC .... C5-388
    C5.4    Use of the stack pointer .... C5-389
    C5.5    Condition flags and related instructions .... C5-390
    C5.6    Alphabetical list of instructions .... C5-391

Chapter C6    A64 SIMD and Floating-point Instruction Descriptions
    C6.1    Introduction .... C6-776
    C6.2    About the SIMD and floating-point instructions .... C6-777
    C6.3    Alphabetical list of floating-point and Advanced SIMD instructions .... C6-779

Part D    The AArch64 System Level Architecture

Chapter D1    The AArch64 System Level Programmers’ Model
    D1.1    Exception levels .... D1-1408
    D1.2    Exception terminology .... D1-1409
    D1.3    Execution state .... D1-1411
    D1.4    Security state .... D1-1412
    D1.5    Virtualization .... D1-1414
    D1.6    Registers for instruction processing and exception handling .... D1-1416
    D1.7    Process state, PSTATE .... D1-1421
    D1.8    Program counter and stack pointer alignment .... D1-1423
    D1.9    Reset .... D1-1426
    D1.10   Exception entry .... D1-1429
    D1.11   Exception return .... D1-1439
    D1.12   The Exception level hierarchy .... D1-1443
    D1.13   Synchronous exception types, routing and priorities .... D1-1450
    D1.14   Asynchronous exception types, routing, masking and priorities .... D1-1456
    D1.15   Trapping functionality to higher Exception levels .... D1-1462
    D1.16   System calls .... D1-1511
    D1.17   Use of the ESR_EL1, ESR_EL2, and ESR_EL3 .... D1-1512
    D1.18   Mechanisms for entering a low-power state .... D1-1533
    D1.19   Self-hosted debug .... D1-1539
    D1.20   Performance Monitors extension .... D1-1541
    D1.21   Interprocessing .... D1-1542
    D1.22   Supported configurations .... D1-1554

Chapter D2    Debug Exceptions
    D2.1    Introduction to debug exceptions .... D2-1560
    D2.2    Legacy debug exceptions .... D2-1564
    D2.3    Understanding the descriptions for AArch64 state and AArch32 state .... D2-1565
    D2.4    Software Breakpoint Instruction exceptions .... D2-1566
    D2.5    Breakpoint exceptions .... D2-1569
    D2.6    Watchpoint exceptions .... D2-1606
    D2.7    Vector Catch exceptions .... D2-1627
    D2.8    Software Step exceptions .... D2-1634
    D2.9    Synchronization and debug exceptions .... D2-1647

Chapter D3    The Debug Exception Model
    D3.1    About debug exceptions .... D3-1650
    D3.2    The debug exceptions enable controls .... D3-1651
    D3.3    Routing debug exceptions .... D3-1652
    D3.4    Enabling debug exceptions from current Exception level and Security state .... D3-1656
    D3.5    The effect of powerdown on debug exceptions .... D3-1661
    D3.6    Summary of permitted routing and enabling of debug exceptions .... D3-1662
    D3.7    Debug exception behavior .... D3-1665
    D3.8    Pseudocode descriptions of debug exceptions .... D3-1669

Chapter D4    The AArch64 System Level Memory Model
    D4.1    About the memory system architecture .... D4-1672
    D4.2    Address space .... D4-1673
    D4.3    Mixed-endian support .... D4-1674
    D4.4    Cache support .... D4-1675
    D4.5    External aborts .... D4-1694
    D4.6    Memory barrier instructions .... D4-1696
    D4.7    Pseudocode details of general memory system instructions .... D4-1697

Chapter D5    The AArch64 Virtual Memory System Architecture
    D5.1    About the Virtual Memory System Architecture (VMSA) .... D5-1708

    D5.2    The VMSAv8-64 address translation system .... D5-1710
    D5.3    Translation table walk examples .... D5-1760
    D5.4    VMSAv8-64 translation table format descriptors .... D5-1772
    D5.5    Access controls and memory region attributes .... D5-1781
    D5.6    MMU faults .... D5-1796
    D5.7    Translation Lookaside Buffers (TLBs) .... D5-1804
    D5.8    Caches in a VMSA implementation .... D5-1818

Chapter D6    The Performance Monitors Extension
    D6.1    About the Performance Monitors .... D6-1822
    D6.2    Accuracy of the Performance Monitors .... D6-1824
    D6.3    Behavior on overflow .... D6-1826
    D6.4    Attributability .... D6-1828
    D6.5    Effect of EL3 and EL2 .... D6-1829
    D6.6    Event filtering .... D6-1831
    D6.7    Performance Monitors and Debug state .... D6-1832
    D6.8    Counter enables .... D6-1833
    D6.9    Counter access .... D6-1834
    D6.10   Event numbers and mnemonics .... D6-1836
    D6.11   Performance Monitors Extension registers .... D6-1851
    D6.12   Pseudocode details .... D6-1854

Chapter D7    The Generic Timer
    D7.1    About the Generic Timer .... D7-1856
    D7.2    About the Generic Timer registers .... D7-1864

Chapter D8    AArch64 System Register Descriptions
    D8.1    About the AArch64 System registers .... D8-1866
    D8.2    General system control registers .... D8-1870
    D8.3    Debug registers .... D8-2077
    D8.4    Performance Monitors registers .... D8-2134
    D8.5    Generic Timer registers .... D8-2170
    D8.6    Generic Interrupt Controller CPU interface registers .... D8-2194

Part E    The AArch32 Application Level Architecture

Chapter E1    The AArch32 Application Level Programmers’ Model
    E1.1    About the Application level programmers’ model .... E1-2288
    E1.2    Additional information about the programmers’ model in AArch32 state .... E1-2289
    E1.3    Advanced SIMD and floating-point instructions .... E1-2303
    E1.4    Coprocessor support .... E1-2331
    E1.5    Exceptions and debug events .... E1-2332

Chapter E2    The AArch32 Application Level Memory Model
    E2.1    Address space .... E2-2334
    E2.2    Memory type overview .... E2-2336
    E2.3    Caches and memory hierarchy .... E2-2337
    E2.4    Alignment support .... E2-2341
    E2.5    Endian support .... E2-2343
    E2.6    Atomicity in the ARM architecture .... E2-2346
    E2.7    Memory ordering .... E2-2350
    E2.8    Memory types and attributes .... E2-2357
    E2.9    Mismatched memory attributes .... E2-2366
    E2.10   Synchronization and semaphores .... E2-2369

Part F    The AArch32 Instruction Sets

Chapter F1    The AArch32 Instruction Sets Overview
    F1.1    Unified Assembler Language .... F1-2380
    F1.2    Branch instructions .... F1-2382
    F1.3    Data-processing instructions .... F1-2383
    F1.4    Status register access instructions .... F1-2391
    F1.5    Load/store instructions .... F1-2392
    F1.6    Load/store multiple instructions .... F1-2394
    F1.7    Miscellaneous instructions .... F1-2395
    F1.8    Exception-generating and exception-handling instructions .... F1-2396
    F1.9    Coprocessor instructions .... F1-2397
    F1.10   Advanced SIMD and floating-point load/store instructions .... F1-2398
    F1.11   Advanced SIMD and floating-point register transfer instructions .... F1-2400
    F1.12   Advanced SIMD data-processing instructions .... F1-2401
    F1.13   Floating-point data-processing instructions .... F1-2408

Chapter F2    About the T32 and A32 Instruction Descriptions
    F2.1    Format of instruction descriptions .... F2-2410
    F2.2    Standard assembler syntax fields .... F2-2415
    F2.3    Conditional execution .... F2-2416
    F2.4    Shifts applied to a register .... F2-2419
    F2.5    Memory accesses .... F2-2422
    F2.6    Integer arithmetic in the T32 and A32 instruction sets .... F2-2423
    F2.7    Encoding of lists of general-purpose registers and the PC .... F2-2426
    F2.8    Additional pseudocode support for instruction descriptions .... F2-2427

Chapter F3    T32 Base Instruction Set Encoding
    F3.1    T32 instruction set encoding .... F3-2432
    F3.2    16-bit T32 instruction encoding .... F3-2435
    F3.3    32-bit T32 instruction encoding .... F3-2442

Chapter F4    A32 Base Instruction Set Encoding
    F4.1    A32 instruction set encoding .... F4-2466
    F4.2    Data-processing and miscellaneous instructions .... F4-2468
    F4.3    Load/store word and unsigned byte .... F4-2480
    F4.4    Media instructions .... F4-2481
    F4.5    Branch, branch with link, and block data transfer .... F4-2486
    F4.6    Coprocessor instructions, and Supervisor Call .... F4-2487
    F4.7    Unconditional instructions .... F4-2488

Chapter F5    T32 and A32 Instruction Sets Advanced SIMD and floating-point Encodings
    F5.1    Overview .... F5-2492
    F5.2    Advanced SIMD and floating-point instruction syntax .... F5-2493
    F5.3    Register encoding .... F5-2497
    F5.4    Advanced SIMD data-processing instructions .... F5-2499
    F5.5    Floating-point data-processing instructions .... F5-2511
    F5.6    Extension register load/store instructions .... F5-2514
    F5.7    Advanced SIMD element or structure load/store instructions .... F5-2515
    F5.8    8, 16, and 32-bit transfer between general-purpose and extension registers .... F5-2518
    F5.9    64-bit transfers between general-purpose and extension registers .... F5-2519

Chapter F6    ARMv8 Changes to the T32 and A32 Instruction Sets
    F6.1    The A32 and T32 instruction sets .... F6-2522
    F6.2    Partial Deprecation of IT .... F6-2523
    F6.3    New A32 and T32 Load-Acquire/Store-Release instructions .... F6-2524

    F6.4    New A32 and T32 scalar floating-point instructions .... F6-2525
    F6.5    New A32 and T32 Advanced SIMD floating-point instructions .... F6-2528
    F6.6    New A32 and T32 cryptography instructions .... F6-2530
    F6.7    New A32 and T32 System instructions .... F6-2531

Chapter F7    T32 and A32 Base Instruction Set Instruction Descriptions
    F7.1    Alphabetical list of T32 and A32 base instruction set instructions .... F7-2534
    F7.2    General restrictions on system instructions .... F7-3028
    F7.3    Encoding and use of Banked register transfer instructions .... F7-3029
    F7.4    Alphabetical list of system instructions .... F7-3033

Chapter F8    T32 and A32 Advanced SIMD and floating-point Instruction Descriptions
    F8.1    Alphabetical list of floating-point and Advanced SIMD instructions .... F8-3076

Part G    The AArch32 System Level Architecture

Chapter G1    The AArch32 System Level Programmers’ Model
    G1.1    About the AArch32 System level programmers’ model .... G1-3400
    G1.2    Exception levels .... G1-3401
    G1.3    Exception terminology .... G1-3402
    G1.4    Execution state .... G1-3404
    G1.5    Instruction Set state .... G1-3406
    G1.6    Debug state .... G1-3406
    G1.7    Security state .... G1-3407
    G1.8    Virtualization .... G1-3410
    G1.9    AArch32 PE modes, general-purpose registers, and the PC .... G1-3412
    G1.10   Instruction set states .... G1-3429
    G1.11   Handling exceptions that are taken to an Exception level using AArch32 .... G1-3431
    G1.12   Asynchronous exception behavior for exceptions taken from AArch32 state .... G1-3465
    G1.13   AArch32 state exception descriptions .... G1-3475
    G1.14   The conceptual coprocessor interface and system control .... G1-3492
    G1.15   Advanced SIMD and floating-point support .... G1-3494
    G1.16   AArch32 control of traps to the hypervisor .... G1-3503

Chapter G2    The AArch32 System Level Memory Model
    G2.1    About the memory system architecture .... G2-3520
    G2.2    Address space .... G2-3521
    G2.3    Mixed-endian support .... G2-3522
    G2.4    Cache support .... G2-3524
    G2.5    ARMv8 CP15 register support for IMPLEMENTATION DEFINED features .... G2-3545
    G2.6    External aborts .... G2-3546
    G2.7    Memory barrier instructions .... G2-3548
    G2.8    Pseudocode details of general memory system instructions .... G2-3549

Chapter G3    The AArch32 Virtual Memory System Architecture
    G3.1    Execution privilege, Exception levels, and AArch32 Privilege levels
    G3.2    About VMSAv8-32
    G3.3    The effects of disabling address translation stages on VMSAv8-32 behavior
    G3.4    Translation tables
    G3.5    The VMSAv8-32 Short-descriptor translation table format
    G3.6    The VMSAv8-32 Long-descriptor translation table format
    G3.7    Memory access control
    G3.8    Memory region attributes
    G3.9    Translation Lookaside Buffers (TLBs)
    G3.10   TLB maintenance requirements
    G3.11   Caches in VMSAv8-32
    G3.12   VMSAv8-32 memory aborts

Copyright © 2013 ARM Limited. All rights reserved.
Non-Confidential - Beta

G3-3560
G3-3562
G3-3569
G3-3573
G3-3578
G3-3591
G3-3609
G3-3618
G3-3630
G3-3633
G3-3644
G3-3647

ARM DDI 0487A.a
ID090413

G3.13
G3.14
G3.15
G3.16
G3.17
G3.18
G3.19

Chapter G4

Chapter H1

Introduction to Halting debug events ............................................................... H3-4364
Halting Step debug event ................................................................................. H3-4366
Halt Instruction debug event ............................................................................ H3-4376
Exception Catch debug event .......................................................................... H3-4377
External Debug Request debug event ............................................................. H3-4380
OS Unlock Catch debug event ......................................................................... H3-4381
Reset Catch debug event ................................................................................ H3-4382
Software Access debug event ......................................................................... H3-4383
Synchronization and Halting debug events ...................................................... H3-4384

The Debug Communication Channel and Instruction Transfer Register
Introduction ...................................................................................................... H4-4388
DCC and ITR registers ..................................................................................... H4-4389
DCC and ITR access modes ........................................................................... H4-4391
Flow-control of the DCC and ITR registers ...................................................... H4-4395
Synchronization of DCC and ITR accesses ..................................................... H4-4398
Interrupt-driven use of the DCC ....................................................................... H4-4402
Pseudocode details for the operation of the DCC and ITR registers ............... H4-4403

The Embedded Cross Trigger Interface
H5.1
H5.2
H5.3
H5.4
H5.5
H5.6

ARM DDI 0487A.a
ID090413

About Debug state ........................................................................................... H2-4328
Halting the PE on debug events ...................................................................... H2-4329
Entering Debug state ....................................................................................... H2-4337
Behavior in Debug state ................................................................................... H2-4341
Exiting Debug state .......................................................................................... H2-4361

Halting Debug Events

H4.1
H4.2
H4.3
H4.4
H4.5
H4.6
H4.7

Chapter H5

Introduction to external debug ......................................................................... H1-4324
External debug ................................................................................................. H1-4325

Debug State

H3.1
H3.2
H3.3
H3.4
H3.5
H3.6
H3.7
H3.8
H3.9

Chapter H4

G4-3772
G4-3773
G4-4101
G4-4170
G4-4208
G4-4230

Introduction to External Debug

H2.1
H2.2
H2.3
H2.4
H2.5

Chapter H3

About the AArch32 System registers ..............................................................
General system control registers ....................................................................
Debug registers ...............................................................................................
Performance Monitors registers ......................................................................
Generic Timer registers ..................................................................................
Generic Interrupt Controller CPU interface registers ......................................

External Debug
H1.1
H1.2

Chapter H2

G3-3659
G3-3685
G3-3691
G3-3713
G3-3716
G3-3735
G3-3755

AArch32 System Register Descriptions
G4.1
G4.2
G4.3
G4.4
G4.5
G4.6

Part H

Exception reporting in a VMSAv8-32 implementation .....................................
Virtual Address to Physical Address translation operations ............................
About the System registers for VMSAv8-32 ...................................................
Organization of the CP14 registers in VMSAv8-32 .........................................
Organization of the CP15 registers in VMSAv8-32 .........................................
Functional grouping of VMSAv8-32 System registers ....................................
Pseudocode details of VMSAv8-32 memory system operations ....................

About the Embedded Cross Trigger (ECT) ...................................................... H5-4408
Basic operation on the ECT ............................................................................. H5-4410
Cross-triggers on a PE in an ARMv8 implementation ...................................... H5-4414
Description and allocation of CTI triggers ........................................................ H5-4415
CTI registers programmers’ model .................................................................. H5-4418
Examples ......................................................................................................... H5-4419

Copyright © 2013 ARM Limited. All rights reserved.
Non-Confidential - Beta

xi

Chapter H6

Debug Reset and Powerdown Support
H6.1
H6.2
H6.3
H6.4
H6.5

Chapter H7

Chapter I1

Appendix A

About the memory-mapped views of the Performance Monitors registers ........ I3-4694

Appendixes
Architectural Constraints on UNPREDICTABLE behaviors
AArch32 CONSTRAINED UNPREDICTABLE behaviors .......................... AppxA-4702
Constraints on AArch64 state UNPREDICTABLE behaviors .................... AppxA-4765

Recommended External Debug Interface
B.1
B.2

xii

About the Generic Timer specification .............................................................. I2-4680
Memory-mapped counter module ..................................................................... I2-4681
Counter module control and status register summary ....................................... I2-4684
About the memory-mapped view of the counter and timer ................................ I2-4686
The CNTBaseN and CNTPL0BaseN frames .................................................... I2-4687
The CNTCTLBase frame ................................................................................... I2-4689
Providing a complete set of counter and timer features .................................... I2-4690
Gray-count scheme for timer distribution scheme ............................................. I2-4692

Recommended Memory-mapped Interfaces to the Performance Monitors

A.1
A.2

Appendix B

Introduction ........................................................................................................ I1-4598
Performance Monitors registers ........................................................................ I1-4599
Generic Timer registers ..................................................................................... I1-4647

System Level Implementation of the Generic Timer

I3.1

Part J

Introduction ...................................................................................................... H9-4468
Debug registers ............................................................................................... H9-4469
Cross-Trigger Interface registers ..................................................................... H9-4554

Memory-Mapped System Register Descriptions

I2.1
I2.2
I2.3
I2.4
I2.5
I2.6
I2.7
I2.8

Chapter I3

H8-4442
H8-4444
H8-4445
H8-4449
H8-4451
H8-4456
H8-4461
H8-4463
H8-4465

Memory-mapped Components of the ARMv8 Architecture
I1.1
I1.2
I1.3

Chapter I2

Relationship between external debug and System registers ..........................
Supported access sizes ..................................................................................
Synchronization of changes to the external debug registers ...........................
Memory-mapped accesses to the external debug interface ............................
External debug interface register access permissions ....................................
External debug interface registers ...................................................................
Cross-trigger interface registers ......................................................................
Reset and debug .............................................................................................
External debug register resets ........................................................................

External Debug Register Descriptions
H9.1
H9.2
H9.3

Part I

Sample-based profiling .................................................................................... H7-4436

About the External Debug Registers
H8.1
H8.2
H8.3
H8.4
H8.5
H8.6
H8.7
H8.8
H8.9

Chapter H9

H6-4424
H6-4425
H6-4426
H6-4428
H6-4430

The Sample-based Profiling Extension
H7.1

Chapter H8

About Debug over powerdown ........................................................................
Power domains and debug ..............................................................................
Core power domain power states ....................................................................
Emulating low-power states ............................................................................
Debug OS Save and Restore sequences .......................................................

About the recommended external debug interface ................................... AppxB-4774
PMUEVENT bus ........................................................................................ AppxB-4777

Copyright © 2013 ARM Limited. All rights reserved.
Non-Confidential - Beta

ARM DDI 0487A.a
ID090413

B.3
B.4
B.5

Appendix C

Recommendations for Performance Monitors Event Numbers for
IMPLEMENTATION DEFINED Events
C.1
C.2

Appendix D

About the ARM pseudocode .....................................................................
Pseudocode for instruction descriptions ...................................................
Data types .................................................................................................
Expressions ..............................................................................................
Operators and built-in functions ................................................................
Statements and program structure ...........................................................
Miscellaneous helper procedures and functions .......................................

AppxH-5060
AppxH-5061
AppxH-5063
AppxH-5067
AppxH-5069
AppxH-5074
AppxH-5078

Pseudocode Index
I.1
I.2

Appendix J

Library pseudocode for AArch64 .............................................................. AppxG-4878
Library pseudocode for AArch32 .............................................................. AppxG-4927
Common library pseudocode .................................................................... AppxG-4986

ARM Pseudocode Definition
H.1
H.2
H.3
H.4
H.5
H.6
H.7

Appendix I

Introduction ................................................................................................ AppxF-4828
Load-Acquire, Store-Release and barriers ................................................ AppxF-4831
Load-Acquire Exclusive, Store-Release Exclusive and barriers ................ AppxF-4839
Using a mailbox to send an interrupt ......................................................... AppxF-4845
Cache and TLB maintenance operations and barriers .............................. AppxF-4846
ARMv7 compatible approaches for ordering, using DMB and DSB barriers .................
AppxF-4859

ARMv8 Pseudocode Library
G.1
G.2
G.3

Appendix H

Implementation guidance for multiple views of Debug registers ................ AppxE-4810
AArch32 equivalent Advanced SIMD Mnemonics ..................................... AppxE-4813
Identifying the cache resources in ARMv8 ................................................. AppxE-4821
Memory access mode in Debug state ........................................................ AppxE-4822

Barrier Litmus Tests
F.1
F.2
F.3
F.4
F.5
F.6

Appendix G

Save Debug registers ............................................................................... AppxD-4804
Restore Debug registers ........................................................................... AppxD-4806

Additional Guidance
E.1
E.2
E.3
E.4

Appendix F

ARM recommendations for IMPLEMENTATION DEFINED event numbers ..................
AppxC-4790
Summary of events taken to an Exception Level using AArch64 ............. AppxC-4801

Example OS Save and Restore sequences
D.1
D.2

Appendix E

DBGCPUDONE ......................................................................................... AppxB-4778
Recommended authentication interface .................................................... AppxB-4779
Management registers and CoreSight compliance .................................... AppxB-4782

Pseudocode operators and keywords ......................................................... AppxI-5082
Pseudocode indexes ................................................................................... AppxI-5085

Registers Index
J.1
J.2
J.3
J.4
J.5
J.6
J.7

Introduction and register disambiguation ...................................................
Alphabetical index of AArch64 registers and system instructions ..............
Functional index of AArch64 registers and system instructions .................
Alphabetical index of AArch32 registers and system instructions ..............
Functional index of AArch32 registers and system instructions .................
Alphabetical index of memory-mapped registers .......................................
Functional index of memory-mapped registers ..........................................

AppxJ-5088
AppxJ-5092
AppxJ-5102
AppxJ-5113
AppxJ-5122
AppxJ-5133
AppxJ-5138

Glossary
ARM DDI 0487A.a
ID090413

Copyright © 2013 ARM Limited. All rights reserved.
Non-Confidential - Beta

xiii

xiv

Copyright © 2013 ARM Limited. All rights reserved.
Non-Confidential - Beta

ARM DDI 0487A.a
ID090413

Preface

This preface introduces the ARM Architecture Reference Manual, ARMv8, for ARMv8-A architecture profile. It
contains the following sections:
• About this manual on page xvi
• Using this manual on page xviii
• Conventions on page xxiii
• Additional reading on page xxv
• Feedback on page xxvi.

About this manual
This manual describes the ARM® architecture v8, ARMv8. The architecture describes the operation of an
ARMv8-A Processing element (PE), and this manual includes descriptions of:
• The two Execution states, AArch64 and AArch32.
• The instruction sets:
  - In AArch32 state, the A32 and T32 instruction sets, that are compatible with earlier versions of the
    ARM architecture.
  - In AArch64 state, the A64 instruction set.
• The states that determine how a PE operates, including the current Exception level and Security state, and in
  AArch32 state the PE mode.
• The Exception model.
• The interprocessing model, that supports transitions between AArch64 state and AArch32 state.
• The memory model, that defines memory ordering and memory management. This manual covers a single
  architecture profile, ARMv8-A, that defines a Virtual Memory System Architecture (VMSA).
• The programmers’ model, and its interfaces to System registers that control most PE and memory system
  features, and provide status information.
• The Advanced SIMD and floating-point instructions, that provide high-performance:
  - Single-precision and double-precision floating-point operations.
  - Conversions between double-precision, single-precision, and half-precision floating-point values.
  - Integer and single-precision floating-point vector operations in all instruction sets.
  - Double-precision floating-point vector operations in the A64 instruction set.
• The security model, that provides two security states to support secure applications.
• The virtualization model, that supports the virtualization of Non-secure operation.
• The Debug architecture, that provides software access to debug features.

This manual gives the assembler syntax for the instructions it describes, meaning that it describes instructions in
textual form. However, this manual is not a tutorial for ARM assembler language, nor does it describe ARM
assembler language, except at a very basic level. To make effective use of ARM assembler language, read the
documentation supplied with the assembler being used.
This manual is organized into parts:

Part A

Provides an introduction to the ARMv8-A architecture, and an overview of the AArch64 and
AArch32 Execution states.

Part B

Describes the application level view of the AArch64 Execution state, meaning the view from EL0.
It describes the application level view of the programmers’ model and the memory model.

Part C

Describes the A64 instruction set, that is available in the AArch64 Execution state. The descriptions
for each instruction also include the precise effects of each instruction when executed at EL0,
described as unprivileged execution, including any restrictions on its use, and how the effects of the
instruction differ at higher Exception levels. This information is of primary importance to authors
and users of compilers, assemblers, and other programs that generate ARM machine code.

Part D

Describes the system level view of the AArch64 Execution state. It includes details of the System
registers, most of which are not accessible from EL0, and the system level view of the programmers’
model and the memory model. This part includes the description of self-hosted debug.

Part E

Describes the application level view of the AArch32 Execution state, meaning the view from EL0.
It describes the application level view of the programmers’ model and the memory model.

Note
In AArch32 state, execution at EL0 is execution in User mode.
Part F

Describes the T32 and A32 instruction sets, that are available in the AArch32 Execution state. These
instruction sets are backwards-compatible with earlier versions of the ARM architecture. This part
describes the precise effects of each instruction when executed in User mode, described as
unprivileged execution or execution at EL0, including any restrictions on its use, and how the effects
of the instruction differ at higher Exception levels. This information is of primary importance to
authors and users of compilers, assemblers, and other programs that generate ARM machine code.

Note
User mode is the only mode where software execution is unprivileged.

Part G

Describes the system level view of the AArch32 Execution state, that is generally compatible with
earlier versions of the ARM architecture. This part includes details of the System registers, most of
which are not accessible from EL0, and the conceptual coprocessor interface to those registers. It
also describes the system level view of the programmers’ model and the memory model.

Part H

Describes the Debug architecture for external debug. This provides configuration, breakpoint and
watchpoint support, and a Debug Communications Channel (DCC) to a debug host.

Part I

Describes additional features of the architecture that are not closely coupled to a processing element
(PE), and therefore are accessed through memory-mapped interfaces. Some of these features are
OPTIONAL.

Appendixes

Provide additional information that is not part of the ARMv8 architectural requirements.

Using this manual
The information in this manual is organized into parts, as described in this section.

Part A, Introduction and Architecture Overview
Part A gives an overview of the ARMv8-A architecture profile, including its relationship to the other ARM PE
architectures. It introduces the terminology used to describe the architecture, and gives an overview of the
Execution states, AArch64 and AArch32. It contains the following chapter:
Chapter A1 Introduction to the ARMv8 Architecture
Read this for an introduction to the ARMv8 architecture.

Part B, The AArch64 Application Level Architecture
Part B describes the application level view of the architecture in AArch64 state. It contains the following chapters:
Chapter B1 The AArch64 Application Level Programmers’ Model
Read this for an application level description of the programmers’ model for software executing in
AArch64 state. It describes execution at EL0 when EL0 is using AArch64 state.
Chapter B2 The AArch64 Application Level Memory Model
Read this for an application level description of the memory model for software executing in
AArch64 state. It describes the memory model for execution in EL0 when EL0 is using AArch64
state. It includes information about ARM memory types, attributes, and memory access controls.

Part C, The A64 Instruction Set
Part C describes the A64 instruction set, that is used in AArch64 state. It contains the following chapters:
Chapter C1 The A64 Instruction Set
Read this for a description of the A64 instruction set and common instruction operation details.
Chapter C2 A64 Instruction Set Overview
Read this for an overview of the individual A64 instructions, that are divided into five functional
groups.
Chapter C3 A64 Instruction Set Encoding
Read this for a description of the A64 instruction set encoding.
Chapter C4 The AArch64 System Instruction Class
Read this for a description of the AArch64 system instructions and register descriptions, and the
system instruction class encoding space.
Chapter C5 A64 Base Instruction Descriptions
Read this for information on key aspects of the A64 base instructions and for descriptions of the
individual instructions, which are listed in alphabetical order.
Chapter C6 A64 SIMD and Floating-point Instruction Descriptions
Read this for information on key aspects of the A64 Advanced SIMD and floating-point instructions
and for descriptions of the individual instructions, which are listed in alphabetical order.

Part D, The AArch64 System Level Architecture
Part D describes the AArch64 system level view of the architecture. It contains the following chapters:
Chapter D1 The AArch64 System Level Programmers’ Model
Read this for a description of the AArch64 system level view of the programmers’ model.

Chapter D2 Debug Exceptions
Read this for an introduction to, and a description of, different software debug events.
Chapter D3 The Debug Exception Model
Read this for a description of debug exceptions.
Chapter D4 The AArch64 System Level Memory Model
Read this for a description of the AArch64 system level view of the general features of the memory
system.
Chapter D5 The AArch64 Virtual Memory System Architecture
Read this for a system level view of the AArch64 Virtual Memory System Architecture (VMSA),
the memory system architecture of an ARMv8 implementation that is executing in AArch64 state.
Chapter D6 The Performance Monitors Extension
Read this for a description of an implementation of the ARM Performance Monitors, that are an
optional non-invasive debug component.
Chapter D7 The Generic Timer
Read this for a description of an implementation of the ARM Generic Timer, that is an extension to
an ARMv8 PE implementation.
Chapter D8 AArch64 System Register Descriptions
Read this for an introduction to, and description of, each of the AArch64 system registers.

Part E, The AArch32 Application Level Architecture
Part E describes the AArch32 application level view of the architecture. It contains the following chapters:
Chapter E1 The AArch32 Application Level Programmers’ Model
Read this for an application level description of the programmers’ model for software executing in
AArch32 state. It describes execution at EL0 when EL0 is using AArch32 state.
Chapter E2 The AArch32 Application Level Memory Model
Read this for an application level description of the memory model for software executing in
AArch32 state. It describes the memory model for execution in EL0 when EL0 is using AArch32
state. It includes information about ARM memory types, attributes, and memory access controls.

Part F, The AArch32 Instruction Sets
Part F describes the T32 and A32 instruction sets, that are used in AArch32 state. It contains the following chapters:
Chapter F1 The AArch32 Instruction Sets Overview
Read this for an overview of the T32 and A32 instruction sets.
Chapter F2 About the T32 and A32 Instruction Descriptions
Read this for a description of the T32 and A32 instructions.
Chapter F3 T32 Base Instruction Set Encoding
Read this for an introduction to the T32 instruction set and a description of how the T32 instruction
set uses the ARM programmers’ model.
Chapter F4 A32 Base Instruction Set Encoding
Read this for a description of the A32 base instruction set encoding.
Chapter F5 T32 and A32 Instruction Sets Advanced SIMD and floating-point Encodings
Read this for an overview of the T32 and A32 Advanced SIMD and floating-point instruction sets.

Chapter F6 ARMv8 Changes to the T32 and A32 Instruction Sets
Read this for a summary of the changes that are introduced to the T32 and A32 instruction sets in
ARMv8.
Chapter F7 T32 and A32 Base Instruction Set Instruction Descriptions
Read this for a description of each T32 and A32 base instruction.
Chapter F8 T32 and A32 Advanced SIMD and floating-point Instruction Descriptions
Read this for a description of each T32 and A32 Advanced SIMD and floating-point instruction.

Part G, The AArch32 System Level Architecture
Part G describes the AArch32 system level view of the architecture. It contains the following chapters:
Chapter G1 The AArch32 System Level Programmers’ Model
Read this for a description of the AArch32 system level view of the programmers’ model for
execution in an Exception level that is using AArch32.
Chapter G2 The AArch32 System Level Memory Model
Read this for a system level view of the general features of the memory system.
Chapter G3 The AArch32 Virtual Memory System Architecture
Read this for a description of the AArch32 Virtual Memory System Architecture (VMSA).
Chapter G4 AArch32 System Register Descriptions
Read this for a description of each of the AArch32 system registers.

Part H, External Debug
Part H describes the architecture for external debug. It contains the following chapters:
Chapter H1 Introduction to External Debug
Read this for an introduction to external debug, and a definition of the scope of this part of the
manual.
Chapter H2 Debug State
Read this for a description of Debug state, which the PE might enter as the result of a Halting debug
event.
Chapter H3 Halting Debug Events
Read this for a description of the external debug events referred to as Halting debug events.
Chapter H4 The Debug Communication Channel and Instruction Transfer Register
Read this for a description of the communication between a debugger and the PE debug logic using
the Debug Communications Channel and the Instruction Transfer register.
Chapter H5 The Embedded Cross Trigger Interface
Read this for a description of the embedded cross-trigger interface.
Chapter H6 Debug Reset and Powerdown Support
Read this for a description of reset and powerdown support in the Debug architecture.
Chapter H7 The Sample-based Profiling Extension
Read this for a description of the Sample-based Profiling extension that is an optional extension to
the ARMv8 architecture.

Chapter H8 About the External Debug Registers
Read this for some additional information about the external debug registers.
Chapter H9 External Debug Register Descriptions
Read this for a description of each external debug register.

Part I, Memory-mapped Components of the ARMv8 Architecture
Part I describes the memory-mapped components in the architecture. It contains the following chapters:
Chapter I1 Memory-Mapped System Register Descriptions
Read this for a description of each memory-mapped system register.
Chapter I2 System Level Implementation of the Generic Timer
Read this for a definition of a system level implementation of the Generic Timer.
Chapter I3 Recommended Memory-mapped Interfaces to the Performance Monitors
Read this for a description of the recommended memory-mapped and external debug interfaces to
the Performance Monitors.

Part J, Appendixes
This manual contains the following appendixes:
Appendix A Architectural Constraints on UNPREDICTABLE behaviors
Read this for a description of the architecturally-required constraints on UNPREDICTABLE behaviors
in the ARMv8 architecture, including AArch32 behaviors that were UNPREDICTABLE in previous
versions of the architecture.
Appendix B Recommended External Debug Interface
Read this for a description of the recommended external debug interface.

Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix C Recommendations for Performance Monitors Event Numbers for IMPLEMENTATION
DEFINED Events
Read this for a description of ARM recommendations for the use of the IMPLEMENTATION DEFINED
event numbers.

Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix D Example OS Save and Restore sequences
Read this for software examples that perform the OS Save and Restore sequences for an ARMv8
debug implementation.

Note
Chapter H6 Debug Reset and Powerdown Support describes the OS Save and Restore mechanism.

Appendix E Additional Guidance
Read this for information about implementing and using the ARM architecture.

Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix F Barrier Litmus Tests
Read this for examples of the use of barrier instructions provided by the ARMv8 architecture.

Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix G ARMv8 Pseudocode Library
Read this for the pseudocode definitions that are shared between AArch32 and AArch64.
Appendix H ARM Pseudocode Definition
Read this for definitions of the AArch32 pseudocode.
Appendix I Pseudocode Index
Read this for an index of the pseudocode.
Appendix J Registers Index
Read this for an alphabetic and functional index of AArch32 and AArch64 registers, and
memory-mapped registers.

Conventions
The following sections describe conventions that this book can use:
•
Typographic conventions.
•
Signals.
•
Numbers.
•
Pseudocode descriptions.
•
Assembler syntax descriptions on page xxiv.

Typographic conventions
The typographical conventions are:
italic

Introduces special terminology, and denotes citations.

bold

Denotes signal names, and is used for terms in descriptive lists, where appropriate.

monospace

Used for assembler syntax descriptions, pseudocode, and source code examples.
Also used in the main text for instruction mnemonics and for references to other items appearing in
assembler syntax descriptions, pseudocode, and source code examples.

SMALL CAPITALS

Used in body text for a few terms that have specific technical meanings, and are defined in the
Glossary.
Colored text

Indicates a link. This can be:
•

A URL, for example, http://infocenter.arm.com.

•

A cross-reference, that includes the page number of the referenced information if it is not on
the current page, for example, Pseudocode descriptions.

•

A link, to a chapter or appendix, or to a glossary entry, or to the section of the document that
defines the colored term, for example Simple sequential execution or SCTLR.

Signals
In general this specification does not define hardware signals, but it does include some signal examples and
recommendations. The signal conventions are:
Signal level

The level of an asserted signal depends on whether the signal is active-HIGH or
active-LOW. Asserted means:
•
HIGH for active-HIGH signals.
•
LOW for active-LOW signals.

Lower-case n

At the start or end of a signal name denotes an active-LOW signal.

Numbers
Numbers are normally written in decimal. Binary numbers are preceded by 0b, and hexadecimal numbers by 0x. In
both cases, the prefix and the associated value are written in a monospace font, for example 0xFFFF0000. To improve
readability, long numbers can be written with an underscore separator between every four characters, for example
0xFFFF_0000_0000_0000. Ignore any underscores when interpreting the value of a number.
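
The number conventions above can be sketched in Python. This is a minimal illustration only: the helper name `parse_manual_number` is hypothetical and is not defined anywhere in this manual.

```python
def parse_manual_number(text: str) -> int:
    """Interpret a number written using the conventions described above.

    Hypothetical helper for illustration: a 0b prefix marks binary, a 0x
    prefix marks hexadecimal, anything else is decimal, and underscores
    are readability separators that are ignored.
    """
    digits = text.replace("_", "")  # underscores carry no value
    if digits.startswith("0b"):
        return int(digits[2:], 2)
    if digits.startswith("0x"):
        return int(digits[2:], 16)
    return int(digits, 10)


if __name__ == "__main__":
    print(hex(parse_manual_number("0xFFFF_0000_0000_0000")))
```

The separator is removed before conversion, so the grouping of digits has no effect on the value.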

Pseudocode descriptions
This manual uses a form of pseudocode to provide precise descriptions of the specified functionality. This
pseudocode is written in monospace font, and is described in Appendix H ARM Pseudocode Definition.

Assembler syntax descriptions
This manual contains numerous syntax descriptions for assembler instructions and for components of assembler
instructions. These are shown in a monospace font, and use the conventions described in Structure of the A64
assembler language on page C1-113, Appendix H ARM Pseudocode Definition, and Pseudocode operators and
keywords on page AppxI-5082.

Additional reading
This section lists relevant publications from ARM and third parties.
See the Infocenter, http://infocenter.arm.com, for access to ARM documentation.

ARM publications
•
ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition (ARM DDI 0406).
•
ARM® Debug Interface Architecture Specification, ADIv5.0 to ADIv5.2 (ARM IHI 0031).
•
CoreSight™ Program Flow Trace Architecture Specification (ARM IHI 0035).
•
ARM® Embedded Trace Macrocell Architecture Specification, ETMv4 (ARM IHI 0064).
•
ARM® Generic Interrupt Controller Architecture Specification, GIC architecture version 3.0
(ARM IHI 0048).
•
CoreSight™ SoC Technical Reference Manual (ARM DDI 0480).
•
ARM® Procedure Call Standard for the ARM 64-bit Architecture (ARM IHI 0055).

Other publications
The following publications are referred to in this manual, or provide more information:
•
Announcing the Advanced Encryption Standard (AES), Federal Information Processing Standards
Publication 197, November 2001.
•
IEEE 754-2008, IEEE Standard for Floating-point Arithmetic, August 2008.
•
Secure Hash Standard (SHA), Federal Information Processing Standards Publication 180-2, August 2002.
•
The Galois/Counter Mode of Operation, McGrew, D. and Viega, J., Submission to NIST Modes of Operation
Process, January 2004.
•
Memory Consistency Models for Shared Memory-Multiprocessors, Gharachorloo, Kourosh, 1995, Stanford
University Technical Report CSL-TR-95-685.

Feedback
ARM welcomes feedback on its documentation.

Feedback on this manual
If you have comments on the content of this manual, send e-mail to errata@arm.com. Give:
•
The title.
•
The number, ARM DDI 0487A.a.
•
The page numbers to which your comments apply.
•
A concise explanation of your comments.
ARM also welcomes general suggestions for additions and improvements.

Part A
ARMv8 Architecture Introduction and Overview

Chapter A1
Introduction to the ARMv8 Architecture

This chapter introduces the ARM architecture and contains the following sections:
•
About the ARM architecture on page A1-30.
•
Architecture profiles on page A1-32.
•
ARMv8 architectural concepts on page A1-33.
•
Supported data types on page A1-36.
•
Floating-point and Advanced SIMD support on page A1-46.
•
Cryptographic Extension on page A1-52.
•
The ARM memory model on page A1-53.

A1.1

About the ARM architecture
The ARM architecture, as described in this Architecture Reference Manual, defines the behavior of an abstract
machine, referred to as a Processing Element, often abbreviated to PE. Implementations compliant with the ARM
architecture must conform to the described behavior of the Processing Element. The architecture is not intended to
describe how to build an implementation of the PE, nor to limit the scope of such implementations beyond the
defined behaviors.
Except where the architecture specifies differently, the programmer-visible behavior of an implementation that is
compliant with the ARM architecture must be the same as a simple sequential execution of the program on the
processing element. This programmer-visible behavior does not include the execution time of the program.
The ARM Architecture Reference Manual also describes rules for software to use the Processing Element.
The ARM architecture includes definitions of:
•

An associated debug architecture, see Debug architecture versions on page A1-32 and Part H of this manual.

•

Associated trace architectures, that define trace macrocells that implementers can implement with the
associated processor hardware. For more information see the Embedded Trace Macrocell Architecture
Specification and the CoreSight Program Flow Trace Architecture Specification.

The ARM architecture is a Reduced Instruction Set Computer (RISC) architecture with the following RISC
architecture features:
•

A large uniform register file.

•

A load/store architecture, where data-processing operations only operate on register contents, not directly on
memory contents.

•

Simple addressing modes, with all load/store addresses determined from register contents and instruction
fields only.

The architecture defines the interaction of the Processing Element with memory, including caches, and includes a
memory translation system. It also describes how multiple Processing Elements interact with each other and with
other observers in a system.
This document defines the ARMv8 version of the A profile. See Architecture profiles on page A1-32 for more
information on architecture profiles.
The ARM architecture supports implementations across a wide range of performance points. Implementation size,
performance, and very low power consumption are key attributes of the ARM architecture.
An important feature of the ARMv8 architecture is backwards compatibility, combined with the freedom for optimal
implementation in a wide range of standard and more specialized use cases. The ARMv8 architecture supports:
•
A 64-bit Execution state, AArch64.
•
A 32-bit Execution state, AArch32, that is compatible with previous versions of the ARM architecture.

Note
The AArch32 Execution state is compatible with the ARMv7-A architecture profile, and enhances that profile to
support some features included in the AArch64 Execution state.
Both Execution states support SIMD and floating-point instructions.

Note
•
AArch32 state provides both:
—
SIMD instructions in the base instruction sets, that operate on the 32-bit general-purpose registers.
—
SIMD instructions that operate on 64-bit SIMD and floating-point registers, and are identified as
Advanced SIMD instructions.
•
AArch64 state provides only SIMD instructions that operate on 128-bit SIMD and floating-point registers.
AArch64 state descriptions use SIMD as a synonym for Advanced SIMD.
•
See Conventions on page xxiii for information about conventions used in this manual, including the use of
SMALL CAPITALS for the terms CONSTRAINED UNPREDICTABLE, IMPLEMENTATION DEFINED,
OPTIONAL, RES0, RES1, UNDEFINED, UNKNOWN, and UNPREDICTABLE, that have ARM-specific
meanings that are defined in the Glossary.

A1.2

Architecture profiles
The ARM architecture has evolved significantly since its introduction, and ARM continues to develop it. Eight
major versions of the architecture have been defined to date, denoted by the version numbers 1 to 8. Of these, the
first three versions are now obsolete.
The generic names AArch64 and AArch32 describe the 64-bit and 32-bit Execution states:
AArch64

Is the 64-bit Execution state, meaning addresses are held in 64-bit registers, and instructions in the
base instruction set can use 64-bit registers for their processing. AArch64 state supports the A64
instruction set.

AArch32

Is the 32-bit Execution state, meaning addresses are held in 32-bit registers, and instructions in the
base instruction sets use 32-bit registers for their processing. AArch32 state supports the T32 and
A32 instruction sets.

See sections Execution state on page A1-33 and The ARM instruction sets on page A1-34 for more information.
ARM defines three architecture profiles:
A
Application profile, described in this manual:
•
Supports a Virtual Memory System Architecture (VMSA) based on a Memory Management
Unit (MMU).

Note
An ARMv8-A implementation can be called an AArchv8-A implementation.
•
Supports the A64, A32 and T32 instruction sets.

R
Real-time profile:
•
Supports a Protected Memory System Architecture (PMSA) based on a Memory Protection
Unit (MPU).
•
Supports the A32 and T32 instruction sets.

M
Microcontroller profile:
•
Implements a programmers' model designed for low-latency interrupt processing, with
hardware stacking of registers and support for writing interrupt handlers in high-level
languages.
•
Implements a variant of the R-profile PMSA.
•
Supports a variant of the T32 instruction set.

Note
This Architecture Reference Manual describes only the ARMv8-A profile.
For information about the R and M architecture profiles, and earlier ARM architecture versions see:
•
The ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition.
•
The ARM®v7-M Architecture Reference Manual.
•
The ARM®v6-M Architecture Reference Manual.

A1.2.1

Debug architecture versions
From ARMv7 the ARM debug architecture is fully integrated with the architecture version.
For information about earlier ARM debug architecture versions, see the ARM® Architecture Reference Manual,
ARMv7-A and ARMv7-R edition.
For more information, see Chapter H1 Introduction to External Debug.

A1.3

ARMv8 architectural concepts
ARMv8 introduces major changes to the ARM architecture, while maintaining a high level of consistency with
previous versions of the architecture. The ARMv8 Architecture Reference Manual includes significant changes in
the terminology used to describe the architecture, and this section introduces both the ARMv8 architectural concepts
and the associated terminology.
The following subsections describe key ARMv8 architectural concepts. Each section introduces the corresponding
terms that are used to describe the architecture:
•
Execution state.
•
The ARM instruction sets on page A1-34.
•
System registers on page A1-34.
•
ARMv8 Debug on page A1-35.

A1.3.1

Execution state
The Execution state defines the PE execution environment, including:
•
The supported register widths.
•
The supported instruction sets.
•
Significant aspects of:
—
The exception model.
—
The Virtual Memory System Architecture (VMSA).
—
The programmers’ model.
The Execution states are:
AArch64
The 64-bit Execution state. This Execution state:
•
Provides 31 64-bit general-purpose registers, of which X30 is used as the procedure link
register.
•
Provides a 64-bit program counter (PC), stack pointers (SPs), and exception link registers
(ELRs).
•
Provides 32 128-bit registers for SIMD vector and scalar floating-point support.
•
Provides a single instruction set, A64. For more information, see The ARM instruction sets
on page A1-34.
•
Defines the ARMv8 Exception model, with up to four Exception levels, EL0 - EL3, that
provide an execution privilege hierarchy, see Exception levels on page D1-1408.
•
Supports 64-bit virtual addressing. For more information, including the limits on address
ranges, see Chapter D5 The AArch64 Virtual Memory System Architecture.
•
Defines a number of PSTATE elements that hold PE state. The A64 instruction set includes
instructions that operate directly on various PSTATE elements.
•
Names each system register using a suffix that indicates the lowest Exception level at which
the register can be accessed.

AArch32
The 32-bit Execution state. This Execution state:
•
Provides 13 32-bit general-purpose registers, and a 32-bit PC, SP, and link register (LR). The
LR is used as both an ELR and a procedure link register.
Some of these registers have multiple banked instances for use in different PE modes.
•
Provides a single ELR, for exception returns from Hyp mode.
•
Provides 32 64-bit registers for Advanced SIMD vector and scalar floating-point support.
•
Provides two instruction sets, A32 and T32. For more information, see The ARM instruction
sets on page A1-34.
•
Supports the ARMv7-A exception model, based on PE modes, and maps this onto the
ARMv8 Exception model, that is based on the Exception levels, see Exception levels on
page G1-3401.
•
Uses 32-bit virtual addresses.
•
Uses a single Current Program State Register (CPSR) to hold the PE state.

Later subsections give more information about the different properties of the Execution states.
Making transitions between the AArch64 and AArch32 execution states is known as interprocessing. The PE can
move between execution states only on a change of Exception level, and subject to the rules given in Interprocessing
on page D1-1542. This means different software layers, such as an application, an operating system kernel, and a
hypervisor, executing at different Exception levels, can execute in different execution states.

A1.3.2

The ARM instruction sets
In ARMv8 the possible instruction sets depend on the execution state:
AArch64

AArch64 state supports only a single instruction set, called A64. This is a fixed-length instruction
set that uses 32-bit instruction encodings.
For information on the A64 instruction set, see Chapter C2 A64 Instruction Set Overview.

AArch32

AArch32 state supports the following instruction sets:
A32

This is a fixed-length instruction set that uses 32-bit instruction encodings. It is
compatible with the ARMv7 ARM instruction set.

T32

This is a variable-length instruction set that uses both 16-bit and 32-bit instruction
encodings. It is compatible with the ARMv7 Thumb® instruction set.

In previous documentation, these instruction sets were called the ARM and Thumb instruction sets.
ARMv8 extends each of these instruction sets. The PE Instruction set state determines the
instruction set that the PE executes.
For information on the A32 and T32 instruction sets, see Chapter F1 The AArch32 Instruction Sets
Overview.
The ARMv8 instruction sets support SIMD and scalar floating-point instructions. See Floating-point and Advanced
SIMD support on page A1-46.

A1.3.3

System registers
System registers provide control and status information of architected features.
The System registers use a standard naming format, <register_name>.<bit_field_name>, to identify specific
registers as well as control and status bits within a register.
Bits can also be described by their numerical position in the form <register_name>[x:y] or the generic form
bits[x:y].
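
The [x:y] notation selects an inclusive range of bit positions, with x as the most significant bit of the range. A minimal Python sketch of that selection follows. The function name `bits` is an illustrative choice for this example, not something the architecture defines.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return value[hi:lo], inclusive at both ends, as an unsigned integer.

    Illustrative helper only. hi is the most significant bit position of
    the field and lo is the least significant.
    """
    if hi < lo or lo < 0:
        raise ValueError("expected hi >= lo >= 0")
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


if __name__ == "__main__":
    # bits[14:10] of 0x7C00 are all ones, so this prints 0x1f.
    print(hex(bits(0x7C00, 14, 10)))
```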
In addition, in AArch64 state, most register names include the lowest Exception level that can access the register as
a suffix to the register name:
•

_ELx, where x is 0, 1, 2, or 3.

For information about Exception levels, see Exception levels on page D1-1408.
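
As a sketch of the suffix convention, the following Python fragment extracts the Exception level from a register name. The helper name is hypothetical, and the register names in the usage lines are examples only. Because only most register names carry the suffix, the helper returns `None` when it is absent.

```python
import re
from typing import Optional

# Matches a trailing _ELx suffix, where x is 0, 1, 2, or 3.
_EL_SUFFIX = re.compile(r"_EL([0-3])$")


def lowest_exception_level(register_name: str) -> Optional[int]:
    """Return the Exception level named in the _ELx suffix, or None.

    Hypothetical helper for illustration. It inspects only the name and
    knows nothing about which registers an implementation provides.
    """
    match = _EL_SUFFIX.search(register_name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for name in ("SCTLR_EL1", "VBAR_EL3", "FPSR"):
        print(name, lowest_exception_level(name))
```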
The System registers comprise:
•
General system control registers.
•
Debug registers.
•
Generic Timer registers.
•
Optionally, Performance Monitor registers.
•
Optionally, Trace registers.
•
Optionally, Generic Interrupt Controller (GIC) CPU interface registers.
The Embedded Trace Macrocell Architecture Specification, ETMv4 defines the Trace registers. This ARMv8
reference manual describes all the other System registers.

For information about the AArch64 System registers, see Chapter D8 AArch64 System Register Descriptions.
For information about the AArch32 System registers, see Chapter G4 AArch32 System Register Descriptions.

The ARM Generic Interrupt Controller CPU interface
Version 3 of the ARM Generic Interrupt Controller architecture, GICv3, defines a system register interface to the
GIC CPU interface. The System register descriptions in this ARMv8 manual include these registers, see Generic
Interrupt Controller CPU interface registers on page D8-2194.

Note
The programmers’ model for earlier versions of the GIC architecture is wholly memory-mapped.
For more information about the ARM Generic Interrupt Controller, see the ARM Generic Interrupt Controller
Architecture Specification, GIC architecture version 3.0.

A1.3.4

ARMv8 Debug
ARMv8 supports the following:
Self-hosted debug
In this model, the PE generates debug exceptions. Debug exceptions are part of the ARMv8
Exception model.
External debug
In this model, debug events cause the PE to enter Debug state. In Debug state the PE is controlled
by an external debugger.
All ARMv8 implementations support both models. The model chosen by a particular user depends on the debug
requirements during different stages of the design and development life cycle of the product. For example, external
debug might be used during debugging of the hardware implementation and OS bring-up, and self-hosted debug
might be used during application development.
For more information about self-hosted debug, see:
•
Chapter D2 Debug Exceptions.
•
Chapter D3 The Debug Exception Model.
For more information about external debug, see Part H External Debug.

A1.4

Supported data types
The ARMv8 architecture supports the following integer data types:
Byte        8 bits.
Halfword    16 bits.
Word        32 bits.
Doubleword  64 bits.
Quadword    128 bits.
The architecture also supports the following floating-point data types:
•
Half-precision, see Half-precision floating-point formats on page A1-40 for details.
•
Single-precision, see Single-precision floating-point format on page A1-42 for details.
•
Double-precision, see Double-precision floating-point format on page A1-43 for details.
It also supports:
•
Fixed-point interpretation of words and doublewords. See Fixed-point format on page A1-44.
•
Vectors, where a register holds multiple elements, each of the same data type. See Vector formats on
page A1-37 for details.
The ARMv8 architecture provides two register files:
•
A general-purpose register file.
•
A SIMD and floating-point register file.
In each of these, the possible register widths depend on the Execution state.
In AArch64 state:
•
A general-purpose register file contains 64-bit registers:
—
Many instructions can access these registers as 64-bit registers or as 32-bit registers, using only the
bottom 32 bits.
•
A SIMD and floating-point register file contains 128-bit registers:
—
The quadword integer data types only apply to the SIMD and floating-point register file.
—
The floating-point data types only apply to the SIMD and floating-point register file.
—
While the AArch64 vector registers support 128-bit vectors, the effective vector length can be 64-bits
or 128-bits depending on the A64 instruction encoding used, see Instruction Mnemonics on
page C1-113.

For more information on the register files in AArch64, see Registers in AArch64 Execution state on page B1-59.
In AArch32 state:
•
A general-purpose register file contains 32-bit registers:
—
Two 32-bit registers can support a doubleword.
—
Vector formatting is supported, see Figure A1-4 on page A1-40.
•
A SIMD and floating-point register file contains 64-bit registers:
—
AArch32 state does not support quadword integer or floating-point data types.

Note
Two consecutive 64-bit registers can be used as a 128-bit register.
For more information on the register files in AArch32, see The general-purpose registers, and the PC, in AArch32
state on page E1-2294.

A1.4.1

Vector formats
In an implementation that includes the SIMD instructions that operate on the SIMD and floating-point register file,
a register can hold one or more packed elements, all of the same size and type. The combination of a register and a
data type describes a vector of elements. The vector is considered to be an array of elements of the data type
specified in the instruction. The number of elements in the vector is implied by the size of the data elements and the
size of the register.
Vector indices are in the range 0 to (number of elements – 1). An index of 0 refers to the least significant end of the
vector.
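
The relationship between register size, element size, and element index can be sketched as follows. This is a minimal Python illustration, and the function name `vector_elements` is an assumption of the example. It treats the register contents as a single unsigned integer.

```python
from typing import List


def vector_elements(register_value: int, register_bits: int,
                    element_bits: int) -> List[int]:
    """Split a register value into packed elements of equal size.

    Illustrative helper only. The element count is implied by the register
    size and the element size, and index 0 is taken from the least
    significant end of the register.
    """
    if register_bits % element_bits != 0:
        raise ValueError("element size must divide the register size")
    count = register_bits // element_bits
    mask = (1 << element_bits) - 1
    return [(register_value >> (i * element_bits)) & mask
            for i in range(count)]


if __name__ == "__main__":
    # A 64-bit register viewed as four 16-bit elements.
    print(vector_elements(0x0004_0003_0002_0001, 64, 16))
```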

Vector formats in AArch64 state
In AArch64 state, the SIMD and floating-point registers are generically known by the name Vn, where n is a value
from 0 to 31 that identifies 1 of 32 registers.
The SIMD and floating-point registers support three data formats for loads, stores and data processing operations:
•
A single, scalar, element in the least significant bits of the register.
•
A 64-bit vector of byte, halfword, or word elements.
•
A 128-bit vector of byte, halfword, word or doubleword elements.
The element sizes are defined in Table A1-1 with the vector format described as:
•
For a 128-bit vector: Vn{.2D, .4S, .8H, .16B}.
•
For a 64-bit vector: Vn{.1D, .2S, .4H, .8B}.
Table A1-1 SIMD elements
Mnemonic  Size
B         8 bits
H         16 bits
S         32 bits
D         64 bits

Figure A1-1 on page A1-38 shows the SIMD vectors in AArch64 state.
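
The vector format suffixes listed above combine an element count with an element-size mnemonic from Table A1-1. The following Python sketch decodes such a suffix and checks that it describes a 64-bit or 128-bit vector. The function name `arrangement` and the returned tuple layout are assumptions of this example.

```python
from typing import Tuple

# Element sizes in bits, keyed by the mnemonics in Table A1-1.
ELEMENT_BITS = {"B": 8, "H": 16, "S": 32, "D": 64}


def arrangement(spec: str) -> Tuple[int, int, int]:
    """Decode a suffix such as '.4S' into (count, element_bits, vector_bits).

    Illustrative helper only. It accepts the eight suffixes listed in the
    text above and rejects anything that is not a 64-bit or 128-bit vector.
    """
    if len(spec) < 3 or not spec.startswith("."):
        raise ValueError("expected a suffix such as '.4S'")
    element_bits = ELEMENT_BITS.get(spec[-1])
    if element_bits is None or not spec[1:-1].isdigit():
        raise ValueError("unrecognized suffix: " + spec)
    count = int(spec[1:-1])
    vector_bits = count * element_bits
    if vector_bits not in (64, 128):
        raise ValueError("not a 64-bit or 128-bit vector: " + spec)
    return count, element_bits, vector_bits


if __name__ == "__main__":
    for suffix in (".2D", ".4S", ".8H", ".16B", ".1D", ".2S", ".4H", ".8B"):
        print(suffix, arrangement(suffix))
```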

[Figure: register Vn shown as a 128-bit vector of 64-bit elements (.2D), 32-bit elements (.4S), 16-bit elements (.8H),
and 8-bit elements (.16B), and as a 64-bit vector of 32-bit elements (.2S), 16-bit elements (.4H), and 8-bit
elements (.8B). In each format, element [0] occupies the least significant bits.]

Figure A1-1 SIMD vectors in AArch64 state

Vector formats in AArch32 state
Table A1-2 shows the available formats. Each instruction description specifies the data types that the instruction
supports.
Table A1-2 Advanced SIMD data types in AArch32
Data type specifier  Meaning
.<size>              Any element of <size> bits
.F<size>             Floating-point number of <size> bits
.I<size>             Signed or unsigned integer of <size> bits
.P<size>             Polynomial over {0, 1} of degree less than <size>
.S<size>             Signed integer of <size> bits
.U<size>             Unsigned integer of <size> bits

Polynomial arithmetic over {0, 1} on page A1-45 describes the polynomial data type.
The .F16 data type is the half-precision data type selected by the FPSCR.AHP bit.
The .F32 data type is the ARM standard single-precision floating-point data type, see Single-precision
floating-point format on page A1-42.
The instruction definitions use a data type specifier to define the data types appropriate to the operation. Figure A1-2
on page A1-39 shows the hierarchy of the Advanced SIMD data types.
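
A data type specifier such as .S16 or .F32 combines an optional type letter with an element size in bits. The following Python sketch splits a specifier into those two parts. The function name and the returned strings are assumptions of this example, and the sketch does not check whether a particular combination is one that the architecture provides.

```python
import re
from typing import Tuple

# Meanings of the type letters, following Table A1-2.
KINDS = {
    "": "any element",
    "F": "floating-point number",
    "I": "signed or unsigned integer",
    "P": "polynomial over {0, 1}",
    "S": "signed integer",
    "U": "unsigned integer",
}

_SPECIFIER = re.compile(r"\.([FIPSU]?)(8|16|32|64)")


def parse_data_type(spec: str) -> Tuple[str, int]:
    """Split a specifier such as '.S16' into (kind, size in bits).

    Illustrative helper only. It checks the syntax of the specifier, not
    whether the resulting type exists in the data type hierarchy.
    """
    match = _SPECIFIER.fullmatch(spec)
    if match is None:
        raise ValueError("unrecognized data type specifier: " + spec)
    return KINDS[match.group(1)], int(match.group(2))


if __name__ == "__main__":
    for spec in (".S16", ".F32", ".32", ".P8"):
        print(spec, parse_data_type(spec))
```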

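[Figure: hierarchy of the data types by element size. Each row gives the generic type, followed by the more
specific types that it includes.]
.8    .I8 (.S8, .U8), .P8
.16   .I16 (.S16, .U16), .P16 †, .F16
.32   .I32 (.S32, .U32), .F32
.64   .I64 (.S64, .U64), .P64 ‡
† Output format only. See VMULL instruction description.
‡ Available only if the Cryptographic Extension is implemented. See VMULL instruction description.

Figure A1-2 Advanced SIMD data type hierarchy in AArch32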
For example, a multiply instruction must distinguish between integer and floating-point data types.
An integer multiply instruction that generates a double-width (long) result must specify the input data types as
signed or unsigned. However, some integer multiply instructions use modulo arithmetic, and therefore do not have
to distinguish between signed and unsigned inputs.
Figure A1-3 on page A1-40 shows the Advanced SIMD vectors in AArch32 state.

Note
In AArch32 state, a pair of even and following odd numbered doubleword registers can be concatenated and treated
as a single quadword register.
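
The concatenation described in this Note can be sketched in Python. Both function names are hypothetical. The sketch assumes that the even-numbered doubleword register supplies the least significant 64 bits of the quadword, which the Note itself does not state.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1


def doubleword_indices(q_index: int) -> Tuple[int, int]:
    """Return the (even, odd) doubleword register numbers for a quadword.

    Assumption of this sketch: quadword n is formed from doubleword
    registers 2n and 2n+1.
    """
    return 2 * q_index, 2 * q_index + 1


def quadword_from_doublewords(d_even: int, d_odd: int) -> int:
    """Concatenate two 64-bit values into one 128-bit value.

    Assumption of this sketch: the even-numbered register is the least
    significant half.
    """
    return ((d_odd & MASK64) << 64) | (d_even & MASK64)


if __name__ == "__main__":
    print(doubleword_indices(3))
    print(hex(quadword_from_doublewords(0x1, 0x2)))
```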

[Figure: register Qn shown as a 128-bit vector of double-precision (64-bit) elements, single-precision (32-bit)
elements, 16-bit elements, and 8-bit elements, and register Dn shown as a 64-bit vector of 32-bit elements, 16-bit
elements, and 8-bit elements. In each format, element [0] occupies the least significant bits.]

Figure A1-3 Advanced SIMD vectors in AArch32
The general-purpose registers support vector formatting in the AArch32 Execution state only, as shown in
Figure A1-4.
This means that a general-purpose register can be treated as either two halfwords or four bytes.
[Figure: 32-bit general-purpose register Rn shown as a set of two halfwords (.16, elements [1] and [0]) and as a
set of four bytes (.8, elements [3] to [0]).]

Figure A1-4 Vector formatting in AArch32

A1.4.2

Half-precision floating-point formats
ARMv8 supports two half-precision floating-point formats:
•
IEEE half-precision, as described in the IEEE 754-2008 standard.
•
Alternative half-precision.

Note
Half-precision floating-point formats can only be converted to and from other floating-point formats. They cannot
be used in any other data processing operations. This applies to both AArch32 state and AArch64 state.

The description of IEEE half-precision includes ARM-specific details that are left open by the standard, and is only
an introduction to the formats and to the values they can contain. For more information, especially on the handling
of infinities, NaNs and signed zeros, see the IEEE 754 standard.
For both half-precision floating-point formats, the layout of the 16-bit format is the same. The format is:
Bits     Field
[15]     S
[14:10]  exponent
[9:0]    fraction

The interpretation of the format depends on the value of the exponent field, bits[14:10] and on which half-precision
format is being used.
0 < exponent < 0x1F
The value is a normalized number and is equal to:
(–1)S × 2(exponent-15) × (1.fraction)
The minimum positive normalized number is 2–14, or approximately 6.104 × 10–5.
The maximum positive normalized number is (2 – 2–10) × 215, or 65504.
Larger normalized numbers can be expressed using the alternative format when the
exponent == 0x1F.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros:
+0
when S==0
–0
when S==1.
fraction != 0
The value is a denormalized number and is equal to:
(-1)^S × 2^-14 × (0.fraction)
The minimum positive denormalized number is 2^-24, or approximately 5.960 × 10^-8.
exponent == 0x1F
The value depends on which half-precision format is being used:
IEEE half-precision
The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:
fraction == 0
The value is an infinity. There are two distinct infinities:
+infinity When S==0. This represents all positive numbers that are too big to be represented accurately as a normalized number.
-infinity When S==1. This represents all negative numbers with an absolute value that is too big to be represented accurately as a normalized number.

fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant fraction
bit, bit[9]:
bit[9] == 0 The NaN is a signaling NaN. The sign bit can take any value,
and the remaining fraction bits can take any value except all
zeros.
bit[9] == 1 The NaN is a quiet NaN. The sign bit and remaining fraction
bits can take any value.


Alternative half-precision
The value is a normalized number and is equal to:
(-1)^S × 2^16 × (1.fraction)
The maximum positive normalized number is (2 - 2^-10) × 2^16, or 131008.
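The interpretation rules for the two half-precision formats can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name decode_half and its alternative parameter are hypothetical, and the sketch returns a generic NaN without preserving the NaN sign or fraction bits.

```python
import math

def decode_half(bits, alternative=False):
    """Decode a 16-bit half-precision pattern to a Python float.

    alternative=True selects the Alternative half-precision format, in which
    exponent == 0x1F encodes a normalized number rather than infinity or NaN.
    """
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF

    if exponent == 0:
        # Zero or denormalized number: (-1)^S x 2^-14 x (0.fraction)
        return sign * 2.0 ** -14 * (fraction / 1024.0)
    if exponent == 0x1F and not alternative:
        # IEEE half-precision: infinity or NaN
        return sign * math.inf if fraction == 0 else math.nan
    # Normalized number: (-1)^S x 2^(exponent-15) x (1.fraction)
    return sign * 2.0 ** (exponent - 15) * (1.0 + fraction / 1024.0)
```

With this sketch, the pattern 0x7BFF decodes to the IEEE maximum of 65504, and 0x7FFF decodes to 131008 when the alternative format is selected.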

A1.4.3

Single-precision floating-point format
The single-precision floating-point format is as defined by the IEEE 754 standard.
This description includes ARM-specific details that are left open by the standard. It is only intended as an
introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities,
NaNs and signed zeros, see the IEEE 754 standard.
A single-precision value is a 32-bit word with the format:
bit[31]        S
bits[30:23]    exponent
bits[22:0]     fraction

The interpretation of the format depends on the value of the exponent field, bits[30:23]:
0 < exponent < 0xFF
The value is a normalized number and is equal to:
(-1)^S × 2^(exponent-127) × (1.fraction)
The minimum positive normalized number is 2^-126, or approximately 1.175 × 10^-38.
The maximum positive normalized number is (2 - 2^-23) × 2^127, or approximately 3.403 × 10^38.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros:
+0
When S==0.
–0
When S==1.
These usually behave identically. In particular, the result is equal if +0 and –0 are
compared as floating-point numbers. However, they yield different results in some
circumstances. For example, the sign of the infinity produced as the result of dividing
by zero depends on the sign of the zero. The two zeros can be distinguished from each
other by performing an integer comparison of the two words.
fraction != 0
The value is a denormalized number and is equal to:
(-1)^S × 2^-126 × (0.fraction)
The minimum positive denormalized number is 2^-149, or approximately 1.401 × 10^-45.
Denormalized numbers are always flushed to zero in AArch32 Advanced SIMD processing. They
are optionally flushed to zero in floating-point processing and AArch64 SIMD. For details see
Flush-to-zero on page A1-49.
exponent == 0xFF
The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:
fraction == 0
The value is an infinity. There are two distinct infinities:

+infinity When S==0. This represents all positive numbers that are too big to be represented accurately as a normalized number.
-infinity When S==1. This represents all negative numbers with an absolute value that is too big to be represented accurately as a normalized number.

fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant fraction bit, bit[22]:
bit[22] == 0
The NaN is a signaling NaN. The sign bit can take any value, and the
remaining fraction bits can take any value except all zeros.
bit[22] == 1
The NaN is a quiet NaN. The sign bit and remaining fraction bits can take
any value.
For details of the default NaN see NaN handling and the Default NaN on page A1-50.

Note
NaNs with different sign or fraction bits are distinct NaNs, but this does not mean software can use floating-point
comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN compares
as unordered with everything, including itself.
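The classification above can be sketched as a small Python routine that works on the raw 32-bit pattern. The names classify_single and float_to_bits are hypothetical helpers introduced for this illustration only.

```python
import struct

def classify_single(bits):
    """Classify a 32-bit single-precision bit pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # bit[22] distinguishes quiet NaNs from signaling NaNs
        return "quiet NaN" if (bits >> 22) & 1 else "signaling NaN"
    return "normalized"

def float_to_bits(value):
    """Return the single-precision bit pattern of a Python float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]
```

The same helpers show the behavior of the two zeros: 0.0 == -0.0 is true as a floating-point comparison, whereas float_to_bits(0.0) and float_to_bits(-0.0) differ as integers.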

A1.4.4

Double-precision floating-point format
The double-precision floating-point format is as defined by the IEEE 754 standard. Double-precision floating-point
is supported by both floating-point and SIMD instructions in AArch64 state, and only by floating-point instructions
in AArch32 state.
This description includes implementation-specific details that are left open by the standard. It is only intended as an
introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities,
NaNs and signed zeros, see the IEEE 754 standard.
A double-precision value is a 64-bit doubleword, with the format:
bit[63]        S
bits[62:52]    exponent
bits[51:0]     fraction, where bits[51:32] are held in the most significant word

Double-precision values represent numbers, infinities and NaNs in a similar way to single-precision values, with
the interpretation of the format depending on the value of the exponent:
0 < exponent < 0x7FF
The value is a normalized number and is equal to:
(-1)^S × 2^(exponent-1023) × (1.fraction)
The minimum positive normalized number is 2^-1022, or approximately 2.225 × 10^-308.
The maximum positive normalized number is (2 - 2^-52) × 2^1023, or approximately 1.798 × 10^308.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros that behave in the same way as the two
single-precision zeros:
+0
when S==0
–0
when S==1.
fraction != 0
The value is a denormalized number and is equal to:
(-1)^S × 2^-1022 × (0.fraction)
The minimum positive denormalized number is 2^-1074, or approximately 4.941 × 10^-324.


Optionally, denormalized numbers are flushed to zero in floating-point calculations. For details see
Flush-to-zero on page A1-49.
exponent == 0x7FF
The value is either an infinity or a NaN, depending on the fraction bits:
fraction == 0
The value is an infinity. As for single-precision, there are two infinities:
+infinity When S==0.
-infinity When S==1.
fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant fraction bit, bit[19] of
the most significant word:
bit[19] == 0
The NaN is a signaling NaN. The sign bit can take any value, and the
remaining fraction bits can take any value except all zeros.
bit[19] == 1
The NaN is a quiet NaN. The sign bit and the remaining fraction bits can
take any value.
For details of the default NaN see NaN handling and the Default NaN on page A1-50.

Note
NaNs with different sign or fraction bits are distinct NaNs, but this does not mean software can use floating-point
comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN compares
as unordered with everything, including itself.
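The following Python sketch illustrates both points for double-precision: the quiet NaN test uses bit[19] of the most significant word, and floating-point comparison cannot tell two distinct NaNs apart. The helper names double_from_bits and is_quiet_nan_bits are hypothetical.

```python
import struct

def double_from_bits(bits):
    """Reinterpret a 64-bit integer pattern as a Python float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def is_quiet_nan_bits(bits):
    """Return True if the 64-bit pattern encodes a quiet NaN."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    ms_word = bits >> 32                  # most significant 32-bit word
    return exponent == 0x7FF and fraction != 0 and (ms_word >> 19) & 1 == 1

# Two NaNs with different fraction bits are distinct bit patterns...
nan_a = double_from_bits(0x7FF8000000000000)
nan_b = double_from_bits(0x7FF8000000000001)
# ...but every ordered comparison between them, or of one with itself, is false.
unordered = not (nan_a < nan_b or nan_a > nan_b or nan_a == nan_b or nan_a == nan_a)
```

Software that needs to distinguish NaNs therefore has to compare the integer bit patterns instead.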

A1.4.5

Fixed-point format
Fixed-point formats are used only for conversions between floating-point and fixed-point values. They apply to
general-purpose registers.
Fixed-point values can be signed or unsigned, and can be 16-bit or 32-bit. Conversion instructions take an argument
that specifies the number of fraction bits in the fixed-point number. That is, it specifies the position of the binary
point.

A1.4.6

Conversion between floating-point and fixed-point values
ARMv8 supports the conversion of a scalar floating-point to or from a signed or unsigned fixed-point value in a
general-purpose register.
The instruction argument #fbits indicates that the general-purpose register holds a fixed-point number with fbits bits
after the binary point, where fbits is in the range 1 to 64 for a 64-bit general-purpose register, or 1 to 32 for a 32-bit
general-purpose register.
More specifically:
• For a 64-bit register Xd:
  - The integer part is Xd[63:#fbits].
  - The fractional part is Xd[(#fbits-1):0].
• For a 32-bit register Wd or Rd:
  - The integer part is Wd[31:#fbits] or Rd[31:#fbits].
  - The fractional part is Wd[(#fbits-1):0] or Rd[(#fbits-1):0].


These instructions might generate the following exceptions:
Invalid Operation    When the floating-point input is NaN or Infinity or when a numerical value cannot be represented within the destination register.
Inexact              When the numeric result differs from the input.
Input Denormal       When flush-to-zero mode is enabled and the denormal input is replaced by a zero.

Note
An out of range fixed-point result is saturated to the destination size.
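A minimal Python sketch of a floating-point to fixed-point conversion with saturation is shown below. It is not the architectural definition: the function names are hypothetical, rounding toward zero and a zero result for a NaN input are assumptions of the sketch, and the exception flags are not modelled.

```python
import math
from fractions import Fraction

def float_to_fixed(value, fbits, width=32, signed=True):
    """Convert a float to a fixed-point integer with fbits fraction bits.

    Assumptions of this sketch: rounding is toward zero, a NaN input gives 0,
    and out-of-range results (including infinities) saturate.
    """
    if not 1 <= fbits <= width:
        raise ValueError("fbits out of range")
    lo = -(1 << (width - 1)) if signed else 0
    hi = (1 << (width - 1)) - 1 if signed else (1 << width) - 1
    if math.isnan(value):
        return 0
    if math.isinf(value):
        return hi if value > 0 else lo
    scaled = math.trunc(Fraction(value) * (1 << fbits))  # exact, toward zero
    return max(lo, min(hi, scaled))                      # saturate

def split_fixed(raw, fbits, width=32):
    """Return (integer part bits[width-1:fbits], fraction bits[fbits-1:0])."""
    raw &= (1 << width) - 1
    return raw >> fbits, raw & ((1 << fbits) - 1)
```

For example, float_to_fixed(1.5, 8) gives 384, and split_fixed(384, 8) separates that into an integer part of 1 and a fraction field of 128, that is 128/256.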

A1.4.7

Polynomial arithmetic over {0, 1}
Some SIMD instructions that operate on SIMD and floating-point registers can operate on polynomials over {0, 1},
see Supported data types on page A1-36. The polynomial data type represents a polynomial in x of the form
b_(n-1)x^(n-1) + … + b_1x + b_0, where b_k is bit[k] of the value.
The coefficients 0 and 1 are manipulated using the rules of Boolean arithmetic:
• 0 + 0 = 1 + 1 = 0
• 0 + 1 = 1 + 0 = 1
• 0 × 0 = 0 × 1 = 1 × 0 = 0
• 1 × 1 = 1.
That is:
• Adding two polynomials over {0, 1} is the same as a bitwise exclusive OR.
• Multiplying two polynomials over {0, 1} is the same as integer multiplication except that partial products are exclusive-ORed instead of being added.

A64, A32 and T32 provide instructions for performing polynomial multiplication of 8-bit values. For AArch32, see
VMUL, VMULL (integer and polynomial) on page F8-3236. For AArch64 see PMUL on page C6-1095 and
PMULL, PMULL2 on page C6-1096.
The Cryptographic Extension adds the ability to perform long polynomial multiplies of 64-bit values. See PMULL,
PMULL2 on page C6-1096.

Pseudocode details of polynomial multiplication
In pseudocode, polynomial addition is described by the EOR operation on bitstrings.
Polynomial multiplication is described by the PolynomialMult() function:
// PolynomialMult()
// ================
bits(M+N) PolynomialMult(bits(M) op1, bits(N) op2)
    result = Zeros(M+N);
    extended_op2 = ZeroExtend(op2, M+N);
    for i=0 to M-1
        if op1<i> == '1' then
            result = result EOR LSL(extended_op2, i);
    return result;
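For readers who want to experiment with the operation, the pseudocode can be transcribed into Python roughly as follows. The function name polynomial_mult and the explicit width parameters m and n are choices made for this sketch, because Python integers do not carry a bit width.

```python
def polynomial_mult(op1, op2, m, n):
    """Multiply two polynomials over {0, 1}.

    op1 is an m-bit value and op2 is an n-bit value; the result fits in
    m+n bits. Bit k of each value is the coefficient of x^k.
    """
    if op1 < 0 or op2 < 0 or op1 >> m or op2 >> n:
        raise ValueError("operand does not fit its declared width")
    result = 0
    for i in range(m):
        if (op1 >> i) & 1:
            result ^= op2 << i   # partial products are exclusive-ORed
    return result
```

For example, polynomial_mult(0b11, 0b11, 2, 2) returns 0b101, because (x + 1)(x + 1) = x^2 + 1 when the coefficients are combined with exclusive OR.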


A1.5

Floating-point and Advanced SIMD support
Note
In AArch32 state, the SIMD instructions that operate on SIMD and floating-point registers are always described as
the Advanced SIMD instructions, to distinguish them from the SIMD instructions in the base instruction sets, that
operate on the 32-bit general-purpose registers. The A64 instruction set does not provide any SIMD instructions that
operate on the general-purpose registers, and therefore some AArch64 state descriptions use SIMD as a synonym
for Advanced SIMD. Unless the context clearly indicates otherwise, this section describes the support for SIMD
instructions that operate on SIMD and floating-point registers.
ARMv8 permits the following levels of support for floating-point and Advanced SIMD instructions:
• Full floating-point and SIMD support without exception trapping.
• Full floating-point and SIMD support with exception trapping.
• No floating-point or SIMD support. This option is licensed only for implementations targeting specialised markets.

Note
All systems that support standard operating systems with rich application environments provide hardware
support for floating-point and Advanced SIMD. It is a requirement of the ARM Procedure Call Standard for
AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.
ARMv8 supports single-precision (32-bit) and double-precision (64-bit) floating-point data types and arithmetic as
defined by the IEEE 754 floating-point standard. It also supports the half-precision (16-bit) floating-point data type
for data storage only, by supporting conversions between single-precision and half-precision data types and
double-precision and half-precision data types.
The SIMD instructions provide packed Single Instruction Multiple Data (SIMD) and single-element scalar
operations, and support:
• Single-precision and double-precision arithmetic in AArch64 state.
• Single-precision arithmetic only in AArch32 state.
Floating-point support in AArch64 state SIMD is IEEE 754-2008 compliant with:
• Configurable rounding modes.
• Configurable Default NaN behavior.
• Configurable Flush-to-zero behavior.
Floating-point computation using AArch32 Advanced SIMD instructions remains unchanged from ARMv7. A32
and T32 Advanced SIMD floating-point always uses ARM standard floating-point arithmetic and performs
IEEE 754 floating-point arithmetic with the following restrictions:
• Denormalized numbers are flushed to zero, see Flush-to-zero on page A1-49.
• Only default NaNs are supported, see NaN handling and the Default NaN on page A1-50.
• The Round to Nearest rounding mode is used.
• Untrapped exception handling is used for all floating-point exceptions.
ARMv8 introduces new instructions for AArch32 state:
• Floating-point selection, see VSEL on page F8-3336.
• Floating-point maximum and minimum numbers, see VMAXNM, VMINNM on page F8-3206.
• Floating-point integer conversions with directed rounding modes, see VCVTA, VCVTN, VCVTP, VCVTM (between floating-point and integer, Advanced SIMD) on page F8-3152 and VCVTA, VCVTN, VCVTP, VCVTM (between floating-point and integer, floating-point) on page F8-3154.
• Floating-point round to integral floating-point, see VRINTA, VRINTN, VRINTP, VRINTM (Advanced SIMD) on page F8-3310, VRINTA, VRINTN, VRINTP, VRINTM (floating-point) on page F8-3312, VRINTX (Advanced SIMD) on page F8-3314, VRINTX (floating-point) on page F8-3316, VRINTZ (Advanced SIMD) on page F8-3318 and VRINTZ, VRINTR (floating-point) on page F8-3320.
• Floating-point conversions between half-precision and double-precision, see VCVTB, VCVTT on page F8-3156.

If trapping is supported, Floating-point exceptions, such as overflow or division by zero, can be handled without
trapping. This applies to both floating-point and SIMD operations. When handled in this way, a Floating-point
exception causes a cumulative status register bit to be set to 1 and a default result to be produced by the operation.
For more information about Floating-point exceptions, see Supported data types on page A1-36.
In AArch64 state, the following registers control floating-point operation and return floating-point status
information:
• The Floating-Point Control Register, FPCR, controls:
  - The half-precision format where applicable, FPCR.AHP bit.
  - Default NaN behavior, FPCR.DN bit.
  - Flush-to-zero behavior, FPCR.FZ bit.
  - Rounding mode support, FPCR.Rmode field.
  - Optional LEN and STRIDE fields associated with AArch32 execution, only supported for a context save and restore in AArch64. These fields are obsolete in ARMv8 and are either RAZ/WI or, when nonzero, cause an UNDEFINED instruction trap when an affected AArch32 instruction is executed.
  - Optional exception trap controls, the FPCR.{IDE, IXE, UFE, OFE, DZE, IOE} bits, see Floating-point Exception traps on page D1-1454.
• The Floating-Point Status Register, FPSR, provides:
  - Cumulative flags, FPSR.{IDC, IXC, UFC, OFC, DZC, IOC, QC}.
  - The AArch32 floating-point comparison flags {N,Z,C,V}. These bits are RES0 if AArch32 floating-point is not supported.

Note
In AArch64, the process state flags, PSTATE.{N,Z,C,V} are used for all data processing compares and
any associated conditional execution.
AArch32 state provides a single Floating-Point Status and Control Register, FPSCR, combining the FPCR and
FPSR fields.
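As an informal aid, the following Python sketch extracts the main FPCR control fields from a 32-bit register value. The bit positions and the rounding mode encoding are not stated in this section; they are taken from the FPCR register description and should be treated as assumptions of the sketch, as should the names decode_fpcr and ROUNDING_MODES.

```python
# Assumed field positions (from the FPCR register description, not from
# this section): AHP bit[26], DN bit[25], FZ bit[24], RMode bits[23:22].
ROUNDING_MODES = {0b00: "RN", 0b01: "RP", 0b10: "RM", 0b11: "RZ"}

def decode_fpcr(fpcr):
    """Extract the main control fields from a 32-bit FPCR value."""
    return {
        "AHP": (fpcr >> 26) & 1,      # alternative half-precision format
        "DN": (fpcr >> 25) & 1,       # Default NaN mode
        "FZ": (fpcr >> 24) & 1,       # Flush-to-zero mode
        "RMode": ROUNDING_MODES[(fpcr >> 22) & 0b11],
    }
```

A value of zero therefore corresponds to IEEE half-precision, Default NaN mode disabled, Flush-to-zero mode disabled, and Round to Nearest.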
For system level information about the SIMD and floating-point support, see Advanced SIMD and floating-point
support on page G1-3494.

A1.5.1

Instruction support
The floating-point and SIMD support includes the following types of instructions:
• Load and store for single elements and vectors of multiple elements.
  Note
  Single elements are also referred to as scalar elements.
• Data processing on single and multiple elements for both integer and floating-point data types.
• Floating-point conversion:
  - Half-precision, single-precision, and double-precision conversions.
  - Single-precision, double-precision, and fixed point integer conversions.
  - Single-precision, double-precision, and integer conversions.
• Floating-point rounding.

For more information on the floating-point and SIMD instructions in AArch64 state, see Chapter C2 A64
Instruction Set Overview.
For more information on the floating-point and Advanced SIMD instructions in AArch32 state, see Chapter F5 T32
and A32 Instruction Sets Advanced SIMD and floating-point Encodings.

A1.5.2

Floating-point standards, and terminology
The ARM architecture includes support for all the required features of ANSI/IEEE Std 754-2008, IEEE Standard for Binary
Floating-Point Arithmetic, referred to as IEEE 754-2008. However, some terms in this manual are based on the
1985 version of this standard, referred to as IEEE 754-1985:
• ARM floating-point terminology generally uses the IEEE 754-1985 terms. This section summarizes how IEEE 754-2008 changes these terms.
• References to IEEE 754 that do not include the issue year apply to either issue of the standard.

Table A1-3 shows how the terminology in this manual differs from that used in IEEE 754-2008.
Table A1-3 Floating-point terminology

This manual                           IEEE 754-2008
------------------------------------  ----------------------------
Normalized (a)                        Normal
Denormal, or denormalized             Subnormal
Round towards Minus Infinity (RM)     roundTowardNegative
Round towards Plus Infinity (RP)      roundTowardPositive
Round towards Zero (RZ)               roundTowardZero
Round to Nearest (RN)                 roundTiesToEven
Round to Nearest with Ties to Away    roundTiesToAway
Rounding mode                         Rounding-direction attribute

a. Normalized number is used in preference to normal number, because of the other specific uses of normal in this manual.
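To make the five rounding modes concrete, the following Python sketch maps each one onto a rounding constant of the standard decimal module and rounds a value to an integer. The mapping and the names ROUNDING and round_to_integral are assumptions made for this illustration. The sketch works on decimal strings, so it shows the direction of rounding only and does not model binary floating-point hardware.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# Hypothetical mapping from the terms used in this manual to decimal modes.
ROUNDING = {
    "RM": ROUND_FLOOR,            # Round towards Minus Infinity
    "RP": ROUND_CEILING,          # Round towards Plus Infinity
    "RZ": ROUND_DOWN,             # Round towards Zero
    "RN": ROUND_HALF_EVEN,        # Round to Nearest, ties to even
    "TiesToAway": ROUND_HALF_UP,  # Round to Nearest with Ties to Away
}

def round_to_integral(text, mode):
    """Round a decimal string to an integer using the named rounding mode."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=ROUNDING[mode]))
```

The tie case separates the two nearest modes: round_to_integral("2.5", "RN") gives 2, whereas round_to_integral("2.5", "TiesToAway") gives 3.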

A1.5.3

ARM standard floating-point input and output values
ARMv8 provides full IEEE 754 floating-point arithmetic support. In AArch32, floating-point operations performed
using Advanced SIMD instructions are limited to ARM standard floating-point operation, regardless of the selected
rounding mode in the FPSCR. Unlike AArch32, AArch64 SIMD floating-point arithmetic is performed using the
rounding mode selected by the FPCR.
ARM standard floating-point arithmetic supports the following input formats defined by the IEEE 754
floating-point standard:
• Zeros.
• Normalized numbers.
• Denormalized numbers are flushed to 0 before floating-point operations, see Flush-to-zero on page A1-49.
• NaNs.
• Infinities.
ARM standard floating-point arithmetic supports the Round to Nearest rounding mode defined by the IEEE 754
standard.

ARM standard floating-point arithmetic supports the following output result formats defined by the IEEE 754
standard:
• Zeros.
• Normalized numbers.
• Results that are less than the minimum normalized number are flushed to zero, see Flush-to-zero.
• NaNs produced in floating-point operations are always the default NaN, see NaN handling and the Default NaN on page A1-50.
• Infinities.

A1.5.4

Flush-to-zero
The performance of floating-point processing can be reduced when doing calculations involving denormalized
numbers and Underflow exceptions. In many algorithms, this performance can be recovered, without significantly
affecting the accuracy of the final result, by replacing the denormalized operands and intermediate results with
zeros. To permit this optimization, ARM floating-point implementations have a special processing mode called
Flush-to-zero mode. AArch32 Advanced SIMD floating-point instructions always use Flush-to-zero mode.
Behavior in Flush-to-zero mode differs from normal IEEE 754 arithmetic in the following ways:
• All inputs to floating-point operations that are double-precision denormalized numbers or single-precision denormalized numbers are treated as though they were zero. This causes an Input Denormal exception, but does not cause an Inexact exception. The Input Denormal exception occurs only in Flush-to-zero mode.
  In AArch32, the FPSCR contains a cumulative exception bit FPSCR.IDC and optional trap enable bit FPSCR.IDE corresponding to the Input Denormal exception.
  In AArch64, the FPSR contains a cumulative exception bit FPSR.IDC, and the FPCR contains an optional trap enable bit FPCR.IDE, corresponding to the Input Denormal exception.
  The occurrence of all exceptions except Input Denormal is determined using the input values after flush-to-zero processing has occurred.
• The result of a floating-point operation is flushed to zero if the result of the operation before rounding satisfies the condition:
  0 < Abs(result) < MinNorm, where:
  - MinNorm is 2^-126 for single-precision.
  - MinNorm is 2^-1022 for double-precision.
  This causes the FPSR.UFC bit to be set to 1, and prevents any Inexact exception from occurring for the operation.
  Underflow exceptions occur only when a result is flushed to zero.
  In all implementations Underflow exceptions that occur in Flush-to-zero mode are always treated as untrapped, even when the Underflow trap enable bit, FPCR.UFE, is set to 1.
• An Inexact exception does not occur if the result is flushed to zero, even though the final result of zero is not equivalent to the value that would be produced if the operation were performed with unbounded precision and exponent range.

When an input or a result is flushed to zero the value of the sign bit of the zero is preserved. That is, the sign bit of
the zero matches the sign bit of the input or result that is being flushed to zero.
Flush-to-zero mode has no effect on half-precision numbers that are inputs to floating-point operations, or results
from floating-point operations.
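The input side of Flush-to-zero mode can be sketched in Python for single-precision bit patterns as follows. The function name flush_input_single is hypothetical, and the sketch only reports whether an Input Denormal exception would be signaled; it does not model the status register or trap handling.

```python
def flush_input_single(bits):
    """Apply flush-to-zero to a single-precision input bit pattern.

    Returns (new_bits, input_denormal), where input_denormal is True when
    the input was a denormalized number and has been replaced by a zero.
    """
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0 and fraction != 0:
        # Denormalized input: keep only the sign bit, giving +0 or -0.
        return bits & 0x80000000, True
    return bits, False
```

A negative denormalized input such as 0x80000001 is therefore replaced by 0x80000000, which is -0, matching the rule that the sign bit is preserved.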


Note
Flush-to-zero mode is incompatible with the IEEE 754 standard, and must not be used when IEEE 754 compatibility
is a requirement. Flush-to-zero mode must be used with care. Although it can improve performance on some
algorithms, there are significant limitations on its use. These are application dependent:

• On many algorithms, it has no noticeable effect, because the algorithm does not normally use denormalized numbers.
• On other algorithms, it can cause exceptions to occur or seriously reduce the accuracy of the results of the algorithm.

A1.5.5

NaN handling and the Default NaN
The IEEE 754 standard specifies that:
• an operation that produces an Invalid Operation floating-point exception generates a quiet NaN as its result if that exception is untrapped
• an operation involving a quiet NaN operand, but not a signaling NaN operand, returns an input NaN as its result.

The floating-point processing behavior when Default NaN mode is disabled adheres to this, with the following
additions:
• If an untrapped Invalid Operation floating-point exception is produced, the quiet NaN result is derived from:
  - the first signaling NaN operand, if the exception was produced because at least one of the operands is a signaling NaN
  - otherwise, the default NaN.
• If an untrapped Invalid Operation floating-point exception is not produced, but at least one of the operands is a quiet NaN, the result is derived from the first quiet NaN operand.

Depending on the operation, the exact value of a derived quiet NaN result may differ in both sign and number of
fraction bits from its source. For a quiet NaN result derived from a signaling NaN operand, the most-significant
fraction bit is set to 1.
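The operand selection rules above can be sketched in Python for single-precision bit patterns. This is an illustration only: the names nan_kind and select_nan_result are hypothetical, the sketch copies the source NaN unchanged apart from setting the quiet bit, and it does not cover Invalid Operation cases in which no operand is a NaN.

```python
QUIET_BIT = 1 << 22          # most significant fraction bit, bit[22]
DEFAULT_NAN = 0x7FC00000     # single-precision default NaN

def nan_kind(bits):
    """Return 'snan', 'qnan', or None for a single-precision bit pattern."""
    if ((bits >> 23) & 0xFF) != 0xFF or (bits & 0x7FFFFF) == 0:
        return None
    return "qnan" if bits & QUIET_BIT else "snan"

def select_nan_result(operands, default_nan_mode=False):
    """Return the NaN result for the operands, or None if none is a NaN."""
    kinds = [nan_kind(op) for op in operands]
    if "snan" in kinds:
        source = operands[kinds.index("snan")]   # first signaling NaN operand
    elif "qnan" in kinds:
        source = operands[kinds.index("qnan")]   # first quiet NaN operand
    else:
        return None
    if default_nan_mode:
        return DEFAULT_NAN
    return source | QUIET_BIT                    # quieten a signaling NaN
```

Here "first" follows the order of the operands list, standing in for the left-to-right ordering of the pseudocode function arguments.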

Note
• In these descriptions, first operand relates to the left-to-right ordering of the arguments to the pseudocode function that describes the operation.
• The IEEE 754 standard specifies that the sign bit of a NaN has no significance.

The floating-point and SIMD processing behavior when Default NaN mode is enabled is that the Default NaN is
the result of all floating-point operations that either:
• generate untrapped Invalid Operation floating-point exceptions
• have one or more quiet NaN inputs, but no signaling NaN inputs.
Table A1-4 on page A1-51 shows the format of the default NaN for ARM floating-point operations.
Default NaN mode is selected for the floating-point processing by setting the FPCR.DN bit to 1.
Other aspects of the functionality of the Invalid Operation exception are not affected by Default NaN mode. These
are that:
• If untrapped, it causes the FPSR.IOC bit to be set to 1.
• If trapped, it causes a user trap handler to be invoked.


Table A1-4 Default NaN encoding

            Half-precision, IEEE Format    Single-precision                 Double-precision
----------  -----------------------------  -------------------------------  -------------------------------
Sign bit    0                              0                                0
Exponent    0x1F                           0xFF                             0x7FF
Fraction    bit[9] == 1, bits[8:0] == 0    bit[22] == 1, bits[21:0] == 0    bit[51] == 1, bits[50:0] == 0
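The encodings in Table A1-4 follow one pattern: sign bit 0, exponent all ones, and only the most significant fraction bit set. A short Python sketch, using the hypothetical helper default_nan, builds the three bit patterns from the field widths.

```python
def default_nan(exponent_bits, fraction_bits):
    """Build a default NaN: sign 0, exponent all ones, top fraction bit set."""
    exponent = (1 << exponent_bits) - 1
    fraction = 1 << (fraction_bits - 1)
    return (exponent << fraction_bits) | fraction

DEFAULT_NAN_HALF = default_nan(5, 10)      # IEEE half-precision
DEFAULT_NAN_SINGLE = default_nan(8, 23)    # single-precision
DEFAULT_NAN_DOUBLE = default_nan(11, 52)   # double-precision
```

The resulting patterns are 0x7E00, 0x7FC00000, and 0x7FF8000000000000 respectively.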


A1.6

Cryptographic Extension
The presence of this extension in an implementation is subject to export license controls. The Cryptographic
Extension is an extension of the SIMD support and operates on the vector register file. It provides instructions for
the acceleration of encryption and decryption to support the following:
• AES
• SHA1
• SHA2-256
Large polynomial multiplies are included as part of the Cryptographic Extension, see PMULL, PMULL2 on
page C6-1096.


A1.7

The ARM memory model
The ARM memory model supports:
• Generating an exception on an unaligned memory access.
• Restricting access by applications to specified areas of memory.
• Translating virtual addresses provided by executing instructions into physical addresses.
• Altering the interpretation of multi-byte data between big-endian and little-endian.
• Controlling the order of accesses to memory.
• Controlling caches and address translation structures.
• Synchronizing access to shared memory by multiple PEs.
Virtual address (VA) support depends on the Execution state, as follows:
AArch64 state
Supports 64-bit virtual addressing, with the Translation Control Register determining the supported
VA range. Execution at EL1 and EL0 supports two independent VA ranges, each with its own
translation controls.
AArch32 state
Supports 32-bit virtual addressing, with the Translation Control Register determining the supported
VA range. For execution at EL1 and EL0, system software can split the VA range into two
subranges, each with its own translation controls.
The supported physical address space is IMPLEMENTATION DEFINED, and can be discovered by system software.
Regardless of the Execution state, the Virtual Memory System Architecture (VMSA) can translate VAs to blocks or
pages of memory anywhere within the supported physical address space.
For more information, see:
For execution in AArch64 state
• Chapter B2 The AArch64 Application Level Memory Model.
• Chapter D4 The AArch64 System Level Memory Model.
• Chapter D5 The AArch64 Virtual Memory System Architecture.
For execution in AArch32 state
• Chapter E2 The AArch32 Application Level Memory Model.
• Chapter G2 The AArch32 System Level Memory Model.
• Chapter G3 The AArch32 Virtual Memory System Architecture.


Part B
The AArch64 Application Level Architecture

Chapter B1
The AArch64 Application Level Programmers’ Model

This chapter gives an application level view of the ARM programmers’ model. It contains the following sections:
• About the Application level programmers’ model on page B1-58.
• Registers in AArch64 Execution state on page B1-59.
• Software control features and EL0 on page B1-65.


B1.1

About the Application level programmers’ model
This chapter contains the programmers’ model information required for application development.
The information in this chapter is distinct from the system information required to service and support application
execution under an operating system, or higher level of system software. However, some knowledge of the system
information is needed to put the Application level programmers' model into context.
Depending on the implementation choices, the architecture supports multiple levels of execution privilege,
indicated by different Exception levels that number upwards from EL0 to EL3. EL0 corresponds to the lowest
privilege level and is often described as unprivileged. The Application level programmers’ model is the
programmers’ model for software executing at EL0. For more information see Exception levels on page D1-1408.
System software determines the Exception level, and therefore the level of privilege, at which software runs. When
an operating system supports execution at both EL1 and EL0, an application usually runs unprivileged at EL0. This:
• Permits the operating system to allocate system resources to an application in a unique or shared manner.
• Provides a degree of protection from other processes, and so helps protect the operating system from malfunctioning software.

This chapter indicates where some system level understanding is necessary, and where relevant it gives a reference
to the system level description.
Execution at any Exception level above EL0 is often referred to as privileged execution.
For more information on the system level view of the architecture refer to Chapter D1 The AArch64 System Level
Programmers’ Model.


B1.2 Registers in AArch64 Execution state
This section describes the registers and process state visible at EL0 when executing in the AArch64 state. It includes
the following:
• Registers in AArch64 state
• Process state, PSTATE on page B1-63
• System registers on page B1-63

B1.2.1 Registers in AArch64 state
In the AArch64 application level view, an ARM Processing element has:
R0-R30  31 general-purpose registers, R0 to R30. Each register can be accessed as:
• A 64-bit general-purpose register named X0 to X30.
• A 32-bit general-purpose register named W0 to W30.
See the register name mapping in Figure B1-1.

[Figure B1-1 General-purpose register naming: Rn is 64 bits wide; Xn names bits[63:0] and Wn names bits[31:0].]
The X30 general-purpose register is used as the procedure call link register.

Note
In instruction encodings, the value 0b11111 (31) is used to indicate the ZR (zero register). This
indicates that the argument takes the value zero, but does not indicate that the ZR is implemented
as a physical register.
SP  A 64-bit dedicated Stack Pointer register. The least significant 32 bits of the stack pointer can be
accessed using the register name WSP.
The use of SP as an operand in an instruction indicates the use of the current stack pointer.

Note
Stack pointer alignment to a 16-byte boundary is configurable at EL1. For more information see the
Procedure Call Standard for the ARM 64-bit Architecture.
PC

A 64-bit Program Counter holding the address of the current instruction.
Software cannot write directly to the PC. It can only be updated on a branch, exception entry or
exception return.

Note
Attempting to execute an A64 instruction that is not word-aligned generates an Alignment fault, see
PC alignment checking on page D1-1423.

V0-V31  32 SIMD and floating-point registers, V0 to V31. Each register can be accessed as:
• A 128-bit register named Q0 to Q31.
• A 64-bit register named D0 to D31.
• A 32-bit register named S0 to S31.
• A 16-bit register named H0 to H31.
• An 8-bit register named B0 to B31.
• A 128-bit vector of elements.
• A 64-bit vector of elements.
Where the number of bits described by a register name does not occupy an entire SIMD and
floating-point register, it refers to the least significant bits. See Figure B1-2.

[Figure B1-2 SIMD and floating-point register naming: Vn is 128 bits wide; Qn names bits[127:0], Dn bits[63:0], Sn bits[31:0], Hn bits[15:0], and Bn bits[7:0].]
For more information about data types and vector formats, see Supported data types on page A1-36.
FPCR, FPSR Two SIMD and floating-point control and status registers, FPCR and FPSR.
See Registers for instruction processing and exception handling on page D1-1416 for more information on the
registers.

Pseudocode details of registers in AArch64 state
In the pseudocode functions that access registers:
• The assignment form is used for register writes.
• The non-assignment form is used for register reads.
The uses of the X[] function are:
• Reading or writing X0-X30, using n to index the required register.
• Reading the zero register ZR, accessed as X[31].

Note
The pseudocode use of X[31] to represent the zero register does not indicate that hardware must implement this
register.
// X[] - assignment form
// =====================
// Write to general-purpose register from either a 32-bit or a 64-bit value.

X[integer n] = bits(width) value
    assert n >= 0 && n <= 31;
    assert width IN {32,64};
    if n != 31 then
        _R[n] = ZeroExtend(value);
    return;

// X[] - non-assignment form
// =========================
// Read from general-purpose register with implicit slice of 8, 16, 32 or 64 bits.

bits(width) X[integer n]
    assert n >= 0 && n <= 31;
    assert width IN {8,16,32,64};
    if n != 31 then
        return _R[n];
    else
        return Zeros(width);
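
The behavior of the two X[] forms can be illustrated with a small software model. The following Python sketch is not part of the architecture pseudocode; the class name GeneralRegisters and its method names are hypothetical and chosen only for this illustration. It shows that a 32-bit write is zero-extended into the 64-bit register, and that index 31 reads as zero and discards writes.

```python
# Illustrative model of the X[] accessors; names are assumptions, not architectural.
class GeneralRegisters:
    """Models _R[0..30]. Index 31 has no storage: it reads as zero, writes are dropped."""

    def __init__(self):
        self._r = [0] * 31  # physical registers R0 to R30 only

    def write(self, n, value, width=64):
        assert 0 <= n <= 31 and width in (32, 64)
        if n != 31:
            # Keep the low `width` bits; the upper bits become zero (ZeroExtend).
            self._r[n] = value & ((1 << width) - 1)

    def read(self, n, width=64):
        assert 0 <= n <= 31 and width in (8, 16, 32, 64)
        if n == 31:
            return 0  # zero register
        return self._r[n] & ((1 << width) - 1)  # implicit slice of the low bits


regs = GeneralRegisters()
regs.write(0, 0xFFFF_FFFF_FFFF_FFFF)   # X0 = all ones
regs.write(0, 0x1234_5678, width=32)   # W0 write clears X0[63:32]
regs.write(31, 0xDEAD_BEEF)            # write to the zero register is discarded
x0 = regs.read(0)
w0 = regs.read(0, width=32)
zr = regs.read(31)
```

After these calls, x0 and w0 both hold 0x12345678 and zr is 0.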

The _R[] array provides a view of the physical general-purpose registers.
array bits(64) _R[0..30];

The use of the SP[] function is reading or writing the current SP. This function has prototypes:
SP[] = bits(width) value;
bits(width) SP[];

The use of the PC[] function is reading the PC. This function has prototype:
bits(64) PC[];

The _V[] array provides a view of the physical SIMD and floating-point registers.
array bits(128) _V[0..31];

The use of the V[] function is reading or writing V0-V31, using n to index the required register.
// V[] - assignment form
// =====================
// Write to SIMD&FP register with implicit extension from
// 8, 16, 32, 64 or 128 bits.

V[integer n] = bits(width) value
    assert n >= 0 && n <= 31;
    assert width IN {8,16,32,64,128};
    _V[n] = ZeroExtend(value);
    return;

// V[] - non-assignment form
// =========================
// Read from SIMD&FP register with implicit slice of 8, 16,
// 32, 64 or 128 bits.

bits(width) V[integer n]
    assert n >= 0 && n <= 31;
    assert width IN {8,16,32,64,128};
    return _V[n];

The use of the Vpart[] function is reading or writing the lower or upper half of V0-V31, using n to index the required
register, and part to indicate the required half.
// Vpart[] - non-assignment form
// =============================
// Read lower half of a SIMD&FP register with implicit slice
// of 8, 16, 32 or 64 bits, or read upper half as 64 bits.

bits(width) Vpart[integer n, integer part]
    assert n >= 0 && n <= 31;
    assert part IN {0, 1};
    if part == 0 then
        assert width IN {8,16,32,64};
        return _V[n];
    else
        assert width == 64;
        return _V[n]<127:64>;

// Vpart[] - assignment form
// =========================
// Write lower half of a SIMD&FP register with implicit extension
// from 8, 16, 32, or 64 bits, or write upper half from 64 bits.

Vpart[integer n, integer part] = bits(width) value
    assert n >= 0 && n <= 31;
    assert part IN {0, 1};
    if part == 0 then
        assert width IN {8,16,32,64};
        _V[n] = ZeroExtend(value);
    else
        assert width == 64;
        _V[n]<127:64> = value<63:0>;
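
As an illustration of the V[] and Vpart[] accessors, the following Python sketch models each SIMD and floating-point register as a 128-bit integer. It is not architecture pseudocode; the class SimdRegisters and its method names are hypothetical. It shows that the B, H, S and D views are the least significant bits of the register, that a narrow write zero-extends, and that writing the upper half leaves the lower half unchanged.

```python
# Illustrative model of V[] and Vpart[]; names are assumptions, not architectural.
MASK64 = (1 << 64) - 1


class SimdRegisters:
    def __init__(self):
        self._v = [0] * 32  # V0 to V31, each held as a 128-bit integer

    def write(self, n, value, width=128):
        assert 0 <= n <= 31 and width in (8, 16, 32, 64, 128)
        self._v[n] = value & ((1 << width) - 1)  # zero-extends to 128 bits

    def read(self, n, width=128):
        assert 0 <= n <= 31 and width in (8, 16, 32, 64, 128)
        return self._v[n] & ((1 << width) - 1)   # least significant bits

    def write_part(self, n, part, value, width=64):
        assert 0 <= n <= 31 and part in (0, 1)
        if part == 0:
            assert width in (8, 16, 32, 64)
            self._v[n] = value & ((1 << width) - 1)
        else:
            assert width == 64
            # Replace bits[127:64] and keep bits[63:0].
            self._v[n] = ((value & MASK64) << 64) | (self._v[n] & MASK64)

    def read_part(self, n, part, width=64):
        assert 0 <= n <= 31 and part in (0, 1)
        if part == 0:
            assert width in (8, 16, 32, 64)
            return self._v[n] & ((1 << width) - 1)
        assert width == 64
        return self._v[n] >> 64


v = SimdRegisters()
v.write(3, 0x0011_2233_4455_6677_8899_AABB_CCDD_EEFF)  # Q3
b3, h3, s3, d3 = v.read(3, 8), v.read(3, 16), v.read(3, 32), v.read(3, 64)
v.write(4, 0x1234, width=16)    # H4 write: the rest of V4 becomes zero
v.write_part(4, 1, 0xCAFE)      # upper half of V4; the lower half is preserved
q4 = v.read(4)
```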


B1.2.2 Process state, PSTATE
For AArch64, PSTATE holds process state related information. The following PSTATE information is accessible
at EL0:
The Data processing flags
N  Negative condition flag. If the result is regarded as a two's complement signed integer,
   then the PE sets N to 1 if the result is negative, and sets N to 0 if it is positive or zero.
Z  Zero condition flag. Set to 1 if the result of the instruction is zero, and to 0 otherwise. A
   result of zero often indicates an equal result from a comparison.
C  Carry condition flag. Set to 1 if the instruction results in a carry condition, for example
   an unsigned overflow that is the result of an addition.
V  Overflow condition flag. Set to 1 if the instruction results in an overflow condition, for
   example a signed overflow that is the result of an addition.
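
As a worked illustration of the four flags, the following Python sketch computes N, Z, C and V for an addition. It is a minimal model under stated assumptions, not the architectural definition: the function name add_with_flags is hypothetical, and only the addition case described above is covered.

```python
# Illustrative flag computation for an addition; not architectural pseudocode.
def add_with_flags(x, y, width=64):
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    x &= mask
    y &= mask
    unsigned_sum = x + y
    result = unsigned_sum & mask
    n = int(bool(result & sign))   # result is negative as a signed value
    z = int(result == 0)           # result is zero
    c = int(unsigned_sum > mask)   # unsigned overflow (carry out)
    # Signed overflow: both operands share a sign that the result does not have.
    v = int(bool(~(x ^ y) & (x ^ result) & sign))
    return result, (n, z, c, v)


carry_case = add_with_flags(0xFFFF_FFFF_FFFF_FFFF, 1)     # wraps to zero
overflow_case = add_with_flags(0x7FFF_FFFF_FFFF_FFFF, 1)  # largest positive + 1
plain_case = add_with_flags(5, 7)
```

In carry_case the flags are Z=1 and C=1. In overflow_case they are N=1 and V=1. In plain_case all four flags are 0.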

The Exception masking bits
D  Debug exception mask bit. When EL0 is enabled to modify the mask bits, this bit is
   visible and can be modified. However, this bit is architecturally ignored at EL0.
A  System error mask bit, referred to as an external asynchronous abort bit in the earlier
   versions of the architecture.
I  IRQ mask bit.
F  FIQ mask bit.
The possible values of each bit are:
0  Exception not masked.
1  Exception masked.
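
The following Python sketch decodes a value holding the four mask bits. It assumes the bit positions of the DAIF system register view (D at bit 9, A at bit 8, I at bit 7, F at bit 6), which are not stated in this section. The function name masked_exceptions is hypothetical.

```python
# Assumed layout: DAIF system register view, D=bit 9, A=bit 8, I=bit 7, F=bit 6.
DAIF_BITS = {"D": 9, "A": 8, "I": 7, "F": 6}


def masked_exceptions(daif):
    """Return the names of the mask bits that are 1 (exception masked)."""
    return [name for name, bit in DAIF_BITS.items() if (daif >> bit) & 1]


all_masked = masked_exceptions(0x3C0)    # bits 9, 8, 7 and 6 set
irq_fiq_masked = masked_exceptions(0xC0)  # bits 7 and 6 set
none_masked = masked_exceptions(0)
```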

See Process state, PSTATE on page D1-1421 for the system level view of PSTATE.

B1.2.3 System registers
System registers provide support for execution control, status and general system configuration. The majority of the
System registers are not accessible at EL0.
However, some system registers can be configured to allow access from software executing at EL0. Any access
from EL0 to a system register with the access right disabled causes the instruction to behave as an UNDEFINED
instruction. The registers that can be accessed from EL0 are:
Cache ID registers
The CTR_EL0 and DCZID_EL0 registers provide implementation parameters for EL0
cache management support.
Debug registers
A debug communications channel is supported by the MDCCSR_EL0, DBGDTR_EL0,
DBGDTRRX_EL0 and DBGDTRTX_EL0 registers.
Performance Monitors registers
See Performance Monitors support on page B1-64.
Thread registers
The TPIDR_EL0 and TPIDRRO_EL0 registers are two thread ID registers with different
access rights.
Timer registers
ARMv8 provides the following:
• Read access to the system counter clock frequency, using CNTFRQ_EL0.
• Physical and virtual timer count registers, CNTPCT_EL0 and CNTVCT_EL0.
• Physical up-count comparison, down-count value and timer control registers,
  CNTP_CVAL_EL0, CNTP_TVAL_EL0, and CNTP_CTL_EL0.
• Virtual up-count comparison, down-count value and timer control registers,
  CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0.


Performance Monitors support
The ARMv8 architecture defines optional Performance Monitors.
The basic form of the Performance Monitors is:
• A 64-bit cycle counter.
• Up to a maximum of 32 IMPLEMENTATION DEFINED event counters, where the number is identified by the
  PMCR_EL0.N field.
• System register access to the cycle counter and event registers, and related controls for:
  — Enabling and resetting counters.
  — Flagging overflows.
  — Generating interrupts on overflow.
  Software can enable the cycle counter independently of the event counters.

Software executing at EL1 or a higher Exception level, for example an operating system, can enable access to the
counters from EL0. This allows an application to monitor its own performance with fine-grained control without
requiring operating system support. For example, an application might implement per-function performance
monitoring.
For details on the features, configuration and control of the Performance Monitors, see Chapter D6 The
Performance Monitors Extension.
EL0 access to Performance Monitors
To allow application code to make use of the Performance Monitors, software executing at a higher Exception level
must set the following bits in the PMUSERENR_EL0 system register:
EN  When set to 1, access to all Performance Monitors registers is allowed at EL0, except for writes to
    PMUSERENR_EL0, and reads/writes of PMINTENSET_EL1 and PMINTENCLR_EL1.
ER  When set to 1, read access to event counters is allowed at EL0. This includes read/write access to
    PMSELR_EL0, so that the event counter to read through PMXEVCNTR_EL0 can be set.
CR  When set to 1, read access to PMCCNTR_EL0 is allowed at EL0.
SW  When set to 1, write access to PMSWINC_EL0 is allowed at EL0.

Note
Register PMUSERENR_EL0 is always read-only at EL0.
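
The effect of the four enable bits can be summarized with a small decoder. The following Python sketch assumes the field positions from the PMUSERENR_EL0 register description (EN at bit 0, SW at bit 1, CR at bit 2, ER at bit 3), which this section does not state. The function name el0_pmu_rights and the keys of the returned dictionary are hypothetical.

```python
# Assumed field positions: EN=bit 0, SW=bit 1, CR=bit 2, ER=bit 3.
PMUSERENR_FIELDS = {"EN": 0, "SW": 1, "CR": 2, "ER": 3}


def el0_pmu_rights(pmuserenr):
    """Summarize which Performance Monitors accesses EL0 is allowed to make."""
    bits = {name: bool((pmuserenr >> pos) & 1)
            for name, pos in PMUSERENR_FIELDS.items()}
    return {
        # EN enables everything that the narrower bits enable individually.
        "read_cycle_counter": bits["EN"] or bits["CR"],
        "read_event_counters": bits["EN"] or bits["ER"],
        "write_software_increment": bits["EN"] or bits["SW"],
        "all_other_el0_accesses": bits["EN"],
    }


cycle_only = el0_pmu_rights(0b0100)   # CR only
everything = el0_pmu_rights(0b0001)   # EN only
nothing = el0_pmu_rights(0)
```

With only CR set, an application can read the cycle counter but cannot read event counters or write PMSWINC_EL0.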


B1.3 Software control features and EL0
The following sections describe the EL0 view of the ARMv8 software control features:
• Exception handling
• Wait for Interrupt and Wait for Event
• The YIELD instruction
• Application level cache management
• Debug events on page B1-66

B1.3.1 Exception handling
In the ARM architecture, an exception causes a change of program flow. Execution of an exception handler starts,
at an Exception level higher than EL0, from a defined vector that relates to the exception taken.
Exceptions include:
• Interrupts.
• Memory system aborts.
• Undefined instructions.
• System calls.
• Secure monitor or Hypervisor traps.
Most details of exception handling are not visible to application level software, and are described in Chapter D1 The
AArch64 System Level Programmers’ Model.
The SVC instruction causes a Supervisor Call exception. This provides a mechanism for unprivileged software to
make a system call to an operating system.

B1.3.2 Wait for Interrupt and Wait for Event
Issuing a WFI instruction indicates that no further execution is required until a WFI wake-up event occurs, see Wait
For Interrupt on page D1-1536. This permits entry to a low-power state.
Issuing a WFE instruction indicates that no further execution is required until a WFE wake-up event occurs, see Wait
for Event mechanism and Send event on page D1-1533. This permits entry to a low-power state.

B1.3.3 The YIELD instruction
The YIELD instruction provides a hint that the task performed by a thread is of low importance so that it could yield,
see YIELD on page C5-773. This mechanism can be used to improve overall performance in a Symmetric
Multi-Threading (SMT) or Symmetric Multi-Processing (SMP) system.
Examples of when the YIELD instruction might be used include a thread that is sitting in a spin-lock, or where the
arbitration priority of the snoop bit in an SMP system is modified. The YIELD instruction permits binary
compatibility between SMT and SMP systems.
The YIELD instruction is a NOP (No Operation) hint instruction.
The YIELD instruction has no effect in a single-threaded system, but developers of such systems can use the
instruction to flag its intended use for future migration to a multiprocessor or multithreading system. Operating
systems can use YIELD in places where a yield hint is wanted, knowing that it will be treated as a NOP if there is no
implementation benefit.

B1.3.4 Application level cache management
A small number of cache management instructions can be enabled at EL0 from higher levels of privilege using the
SCTLR_EL1 system register. Any access from EL0 to an operation with the access right disabled causes the
instruction to behave as an UNDEFINED instruction.
For information about the available operations, see Application level cache instructions on page B2-72.


B1.3.5 Debug events
The debug logic is responsible for generating debug events. Most aspects of debug events are not visible to
application level software, and are described in Chapter H1 Introduction to External Debug. Aspects that are visible
to application level software include:
• The BKPT instruction, which causes a BKPT instruction debug event to occur.
• The DBG instruction, which provides a hint to the debug system.
• The HLT instruction, which causes entry to Debug state.


Chapter B2
The AArch64 Application Level Memory Model

This chapter gives an application level view of the memory model. It contains the following sections:
• Address space on page B2-68.
• Memory type overview on page B2-69.
• Caches and memory hierarchy on page B2-70.
• Alignment support on page B2-75.
• Endian support on page B2-76.
• Atomicity in the ARM architecture on page B2-79.
• Memory ordering on page B2-82.
• Memory types and attributes on page B2-89.
• Mismatched memory attributes on page B2-98.
• Synchronization and semaphores on page B2-100.

Note
In this chapter, system register names usually link to the description of the register in Chapter D8 AArch64 System
Register Descriptions, for example SCTLR_EL1.


B2.1 Address space
Address calculations are performed using 64-bit registers. However, supervisory software can configure the top
eight address bits for use as a tag, as described in Address tagging in AArch64 state on page D5-1708. If this is done,
address bits[63:56]:
• Are not considered when determining whether the address is valid.
• Are never propagated to the program counter.
Supervisory software determines the valid address range. Attempting to access an address that is not valid generates
an MMU fault.
Address calculations are performed modulo 2^64.
The result of an address calculation is UNKNOWN if it overflows or underflows:
• The 64-bit address range A[63:0], where tagged addressing is not used.
• The 56-bit address range A[55:0], where tagged addressing is used.
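
The following Python sketch illustrates the arithmetic described above. It is a simplified model, not architecture pseudocode: the function names add_address and split_tag are hypothetical, and the overflow check only flags the cases that the text describes as producing an UNKNOWN result.

```python
# Illustrative address arithmetic; names are assumptions, not architectural.
MASK64 = (1 << 64) - 1
TAG_SHIFT = 56  # with tagging enabled, address bits[63:56] hold the tag


def add_address(base, offset, tagged=False):
    """Return (result modulo 2^64, True if the calculation left the address range)."""
    bits = 56 if tagged else 64
    limit = 1 << bits
    total = (base & (limit - 1)) + offset
    out_of_range = not (0 <= total < limit)   # architecturally, the result is UNKNOWN
    return (base + offset) & MASK64, out_of_range


def split_tag(address):
    """Split a tagged address into its tag and its bits[55:0]."""
    return address >> TAG_SHIFT, address & ((1 << TAG_SHIFT) - 1)


in_range = add_address(0x1000, -0x10)
wrapped = add_address(0xFFFF_FFFF_FFFF_FFF8, 0x10)
tag, untagged = split_tag(0xA500_7FFF_1234_5678)
tagged_wrap = add_address(0xA5FF_FFFF_FFFF_FFF8, 0x10, tagged=True)
```

In the last call the sum carries out of bits[55:0], so the calculation is flagged even though it does not overflow 64 bits.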
Memory accesses use the Mem[] function.
The Mem[] function makes an access of the required type. If supervisory software configures the top eight address
bits for use as a tag, the top eight address bits are ignored.
bits(size*8) Mem[bits(64) address, integer size, AccType acctype]
    assert size IN {1, 2, 4, 8, 16};

Mem[bits(64) address, integer size, AccType acctype] = bits(size*8) value;

The AccType enumeration defines the different access types:
enumeration AccType {AccType_NORMAL, AccType_VEC,        // Normal loads and stores
                     AccType_STREAM, AccType_VECSTREAM,  // Streaming loads and stores
                     AccType_ATOMIC,                     // Atomic loads and stores
                     AccType_ORDERED,                    // Load-Acquire and Store-Release
                     AccType_UNPRIV,                     // Load and store unprivileged
                     AccType_IFETCH,                     // Instruction fetch
                     AccType_PTW,                        // Page table walk
                     // Other operations
                     AccType_DC,                         // Data cache maintenance
                     AccType_IC,                         // Instruction cache maintenance
                     AccType_AT};                        // Address translation

Note
• Chapter D4 The AArch64 System Level Memory Model and Chapter D5 The AArch64 Virtual Memory System
  Architecture include descriptions of memory system features that are transparent to the application, including
  memory access, address translation, memory maintenance instructions, and alignment checking and the
  associated fault handling. These chapters also include pseudocode descriptions of these operations.
• For information on the pseudocode that relates to memory accesses, see Basic memory access on
  page D4-1698, Unaligned memory access on page D4-1699, and Aligned memory access on page D4-1698.


B2.2 Memory type overview
ARMv8 provides the following mutually-exclusive memory types:
Normal
This is generally used for bulk memory operations, both read-write and read-only operations.
Device
The ARM architecture forbids speculative reads of any type of Device memory. This means Device
memory types are suitable attributes for read-sensitive locations.
Locations of the memory map that are assigned to peripherals are usually assigned the Device
memory attribute.
Device memory has additional attributes that have the following effects:
• They prevent aggregation of reads and writes, maintaining the number and size of the
  specified memory accesses. See Gathering on page B2-94.
• They preserve the access order and synchronization requirements, both for accesses to a
  single peripheral and where there is a synchronization requirement on the observability of
  one or more memory write and read accesses. See Reordering on page B2-95.
• They indicate whether a write can be acknowledged other than at the end point. See Early
  Write Acknowledgement on page B2-96.

For more information on Normal memory and Device memory, see Memory types and attributes on page B2-89.

Note
Earlier versions of the ARM architecture defined a single Device memory type and a Strongly-Ordered memory
type. A Note in Device memory on page B2-92 describes how these memory types map onto the ARMv8 memory
types.


B2.3 Caches and memory hierarchy
The implementation of a memory system depends heavily on the microarchitecture and therefore many details of
the memory system are IMPLEMENTATION DEFINED. ARMv8 defines the application level interface to the memory
system, including a hierarchical memory system with multiple levels of cache. This section describes an application
level view of this system. It contains the subsections:
• Introduction to caches.
• Memory hierarchy.
• Application level cache instructions on page B2-72.
• Implication of caches for the application programmer on page B2-72.
• Preloading caches on page B2-73.

B2.3.1 Introduction to caches
A cache is a block of high-speed memory that contains a number of entries, each consisting of:
• Main memory address information, commonly known as a tag.
• The associated data.
Caches increase the average speed of a memory access. Caching takes account of two principles of locality:
Spatial locality
An access to one location is likely to be followed by accesses to adjacent locations. Examples of this
principle are:
• Sequential instruction execution.
• Accessing a data structure.
Temporal locality
An access to an area of memory is likely to be repeated in a short time period. An example of this
principle is the execution of a software loop.
To minimize the quantity of control information stored, the spatial locality property groups several locations
together under the same tag. This logical block is commonly known as a cache line. When data is loaded into a
cache, access times for subsequent loads and stores are reduced, resulting in overall performance benefits. An access
to information already in a cache is known as a cache hit, and other accesses are called cache misses.
Normally, caches are self-managing, with the updates occurring automatically. Whenever the PE accesses a
cacheable memory location, the cache is checked. If the access is a cache hit, the access occurs in the cache.
Otherwise, the access is made to memory. Typically, when making this access, a cache location is allocated and the
cache line loaded from memory. ARMv8 permits different cache topologies and access policies, provided they
comply with the memory coherency model described in this manual.
Caches introduce a number of potential problems, mainly because:
• Memory accesses can occur at times other than when the programmer would expect them.
• A data item can be held in multiple physical locations.
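
The terms tag, cache line, cache hit and cache miss can be illustrated with a toy model. The following Python sketch assumes a direct-mapped cache with four 64-byte lines. These parameters, and the class name DirectMappedCache, are chosen only for the example, because ARMv8 does not mandate any particular cache topology.

```python
# Toy direct-mapped cache; the line size and line count are illustrative assumptions.
LINE_BYTES = 64
NUM_LINES = 4


class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES  # one tag per cache line, None when empty
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up an address and allocate the line on a miss. Returns True on a hit."""
        line_number = address // LINE_BYTES   # all bytes in a line share one tag
        index = line_number % NUM_LINES
        tag = line_number // NUM_LINES
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag                # line fill from memory
        self.misses += 1
        return False


cache = DirectMappedCache()
for address in range(0, 256, 8):   # sequential 8-byte accesses: spatial locality
    cache.access(address)
after_first_pass = (cache.hits, cache.misses)
for address in range(0, 256, 8):   # the same loop again: temporal locality
    cache.access(address)
after_second_pass = (cache.hits, cache.misses)
```

The first pass misses once per line and hits on the other seven accesses to that line. The second pass hits on every access because all four lines are already allocated.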

B2.3.2 Memory hierarchy
Typically memory close to a PE has very low latency, but is limited in size and expensive to implement. Further
from the PE it is common to implement larger blocks of memory but these have increased latency. To optimize
overall performance, an ARMv8 memory system can include multiple levels of cache in a hierarchical memory
system that exploits this trade-off between size and latency. Figure B2-1 on page B2-71 shows an example of such
a system in an ARMv8-A system that supports virtual addressing.


[Figure B2-1 Multiple levels of cache in a memory hierarchy. Labels recoverable from the figure: Processing Element (PE, AArch64 state, registers X0 to X30, system configuration and control); instruction fetch and data accesses; virtual address; address translation; physical address; Level 1 cache; Level 2 cache; Level 3 (DRAM, SRAM, Flash, ROM); Level 4 (for example, memory card, disk).]

Note
In this manual, in a hierarchical memory system, Level 1 refers to the level closest to the Processing Element, as
shown in Figure B2-1.
Instructions and data can be held in separate caches or in a unified cache. A cache hierarchy can have one or more
levels of separate instruction and data caches, with one or more unified caches located at the levels closest to the
main memory. Memory coherency for cache topologies can be defined by two conceptual points:
Point of Unification (PoU)
The point at which the instruction cache, data cache, and translation table walks of a particular PE
are guaranteed to see the same copy of a memory location. In many cases, the point of unification
is the point in a uniprocessor memory system by which the instruction and data caches and the
translation table walks have merged. The point of unification might coincide with the point of
coherency.
Point of Coherency (PoC)
The point at which all agents that can access memory are guaranteed to see the same copy of a
memory location. In many cases this is effectively the main system memory, although the
architecture does not prohibit the implementation of caches beyond the PoC that have no effect on
the coherency between memory system agents.
See also Cache maintenance operations on page D4-1680.

The cacheability and shareability memory attributes
Cacheability and shareability are two attributes that describe the memory hierarchy in a multiprocessing system:
Cacheability
This term defines whether memory locations are allowed to be allocated into a cache or not.
Cacheability can be defined independently for Inner and Outer cacheability locations.
Shareability
This term defines whether memory locations are shareable between different agents in a system.
Marking a memory location as shareable for a particular domain requires hardware to ensure that
the location is coherent for all agents in that domain. Shareability can be defined independently for
Inner and Outer shareability domains.

For more information about cacheability and shareability see Memory types and attributes on page B2-89.


B2.3.3 Application level cache instructions
In the ARM architecture, the application level is defined as Exception level 0 (EL0). The architecture defines a set
of cache maintenance instructions that software can use to manage cache coherency. Software executing at a higher
Exception level can enable EL0 access to the following:
• The data cache maintenance instructions, DC CVAU, DC CVAC, and DC CIVAC. See Data cache maintenance
  instructions (DC*) on page D4-1685.
• The instruction cache maintenance instruction, IC IVAU. See Instruction cache maintenance instructions
  (IC*) on page D4-1684.
• The cache type register. See CTR_EL0.
• The data cache zero instruction, DC ZVA. See Data cache zero instruction on page D4-1690.

These instructions are UNDEFINED from EL0 unless software executing at a higher Exception level has enabled
them. See Cache maintenance instructions on page D4-1684.
For all of these instructions, if the addresses do not have read access permission at EL0, executing these instructions
at EL0 generates a Permission fault.
For more information about the system controls, see Cache support on page D4-1675.

B2.3.4 Implication of caches for the application programmer
In normal operation, the caches are largely invisible to the application programmer. However they can become
visible when there is a breakdown in the coherency of the caches. Such a breakdown can occur:
• When memory locations are updated by other agents in the system that do not use hardware management of
  coherency.
• When memory updates made from the application software must be made visible to other agents in the
  system, without the use of hardware management of coherency.
For example:
• In the absence of hardware management of coherency of DMA accesses, in a system with a DMA controller
  that reads memory locations that are held in the data cache of a PE, a breakdown of coherency occurs when
  the PE has written new data in the data cache, but the DMA controller reads the old data held in memory.
• In a Harvard cache implementation, where there are separate instruction and data caches, a breakdown of
  coherency occurs when new instruction data has been written into the data cache, but the instruction cache
  still contains the old instruction data.
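
The DMA example above can be reproduced with a toy write-back cache. The following Python sketch is only an illustration. The class WriteBackCache, the dictionary used as main memory, and the clean method standing in for a cache clean operation are all assumptions of the example.

```python
# Toy write-back data cache in front of a dictionary that stands in for main memory.
class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> (value, dirty)

    def write(self, address, value):
        # The write is held in the cache and the line is marked dirty.
        self.lines[address] = (value, True)

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = (self.memory[address], False)  # line fill
        return self.lines[address][0]

    def clean(self, address):
        # Write a dirty line back to memory, as a cache clean operation would.
        value, dirty = self.lines.get(address, (None, False))
        if dirty:
            self.memory[address] = value
            self.lines[address] = (value, False)


memory = {0x1000: "old"}
data_cache = WriteBackCache(memory)
data_cache.write(0x1000, "new")        # the PE writes new data into its data cache
pe_view = data_cache.read(0x1000)      # the PE sees the new data
dma_before_clean = memory[0x1000]      # a non-coherent DMA read goes to memory
data_cache.clean(0x1000)
dma_after_clean = memory[0x1000]
```

Before the clean, the DMA controller observes the old data. After the clean, memory holds the new data and the two views agree.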

Data coherency issues
Software can ensure the data coherency of caches in the following ways:

• By not using the caches in situations where coherency issues can arise. This can be achieved by:
  — Using Non-cacheable or, in some cases, Write-Through Cacheable memory.
  — Not enabling caches in the system.
• By using cache maintenance instructions to manage the coherency issues in software. See Application level
  cache instructions.
• By using hardware coherency mechanisms to ensure the coherency of data accesses to memory for cacheable
  locations by observers within the different shareability domains, see Non-shareable Normal memory on
  page B2-91 and Shareable, Inner Shareable, and Outer Shareable Normal memory on page B2-90.

  Note
  The performance of these hardware coherency mechanisms is highly implementation-specific. In some
  implementations the mechanism suppresses the ability to cache shareable locations. In other
  implementations, cache coherency hardware can hold data in caches while managing coherency between
  observers within the shareability domains.

Note
Not all these mechanisms are directly available to software operating at EL0 and might involve interaction with
software operating at a higher Exception level.

Synchronization and coherency issues between data and instruction accesses
How far ahead of the current point of execution instructions are fetched from is IMPLEMENTATION DEFINED. Such
prefetching can be either a fixed or a dynamically varying number of instructions, and can follow any or all possible
future execution paths. For all types of memory:
• The PE might have fetched the instructions from memory at any time since the last Context synchronization
  operation on that PE.
• Any instructions fetched in this way might be executed multiple times, if this is required by the execution of
  the program, without being re-fetched from memory.

The ARM architecture does not require the hardware to ensure coherency between instruction caches and memory,
even for locations of shared memory. This means that for cacheable locations of memory, an instruction cache can
hold instructions that were fetched from memory before any Context synchronization operation.
If software requires coherency between instruction execution and memory, it must manage this coherency using the
ISB and DSB memory barriers and cache maintenance instructions. The following code sequence can be used for this
purpose:
; Coherency example for data and instruction accesses within the same Inner Shareable domain.
; Enter this code with <Wt> containing a new 32-bit instruction,
; to be held in non-cacheable space at a location pointed to by Xn.
    STR <Wt>, [Xn]
    DC CVAU, Xn     ; Clean data cache by VA to point of unification (PoU)
    DSB ISH         ; Ensure visibility of the data cleaned from cache
    IC IVAU, Xn     ; Invalidate instruction cache by VA to PoU
    DSB ISH         ; Ensure completion of the invalidations
    ISB             ; Synchronize the fetched instruction stream

B2.3.5 Preloading caches
The ARM architecture provides memory system hints PRFM, LDNP, and STNP that software can use to communicate
the expected use of memory locations to the hardware. The memory system can respond by taking actions that are
expected to speed up the memory accesses if they occur. The effect of these memory system hints is
IMPLEMENTATION DEFINED. Typically, implementations use this information to bring the data or instruction
locations into caches.
The Preload instructions are hints, and so implementations can treat them as NOPs without affecting the functional
behavior of the device. The instructions cannot generate synchronous Data Abort exceptions, but the resulting
memory system operations might, under exceptional circumstances, generate an asynchronous external abort, which
is taken using an SError interrupt exception. For more information, see Exception from a Data abort on
page D1-1525.

PrefetchHint{} defines the prefetch hint types:
enumeration PrefetchHint {Prefetch_READ, Prefetch_WRITE, Prefetch_EXEC};

The Hint_Prefetch() function signals to the memory system that memory accesses of the type hint to or from the
specified address are likely to occur in the near future. The memory system might take some action to speed-up the
memory accesses when they do occur, such as preloading the specified address into one or more caches as indicated
by the innermost cache level target and non-temporal hint stream.
Hint_Prefetch(bits(64) address, PrefetchHint hint, integer target, boolean stream);

For more information on PRFM and Load/Store instructions that provide hints to the memory system, see Prefetch
memory on page C2-138 and Load/Store SIMD and Floating-point Non-temporal pair on page C2-136.

B2.4

Alignment support
This section describes alignment support. It contains the following subsections:
•
Instruction alignment.
•
Alignment of data accesses.
•
Unaligned data access restrictions.

B2.4.1

Instruction alignment
A64 instructions must be word-aligned.
Attempting to fetch an instruction from a misaligned location results in a Misaligned PC fault. See PC alignment
checking on page D1-1423.

B2.4.2

Alignment of data accesses
An unaligned access to any type of Device memory causes an Alignment fault.
The alignment requirements for accesses to Normal memory are as follows:
•

For all instructions that load or store a single or multiple registers, other than
Load-Exclusive/Store-Exclusive and Load-Acquire/Store-Release, if the address that is accessed is not
aligned to the size of the data element being accessed, then one of the following occurs:
—

An Alignment fault is generated.

—

An unaligned access is performed.

SCTLR_ELx.A at the current Exception level can be configured to enable an alignment check, and thereby
determine which of these two options is used.

Note
—

The SCTLR_EL1.A bit that is applicable to software running at EL0 can only be accessed from EL1
or above.

—

Alignment checks are based on element size, not overall access size. This affects SIMD element and
structure loads and stores, and also Load/Store pair instructions.

•

For all Load-Exclusive/Store-Exclusive and Load-Acquire/Store-Release memory accesses that access a
single element or a pair of elements, an Alignment fault is generated if the address being accessed is not
aligned to the size of the data structure being accessed.

A failed alignment check results in an Alignment fault, which is taken as a Data Abort exception. These exceptions
are taken at the lowest Exception level that can handle the exception, consistent with the basic requirement that the
Exception level never decreases on an exception. Therefore:
•
Alignment faults at EL0 or EL1 are taken at EL1 unless redirected by HCR_EL2.TGE.
•
Alignment faults at EL2 are taken at EL2.
•
Alignment faults at EL3 are taken at EL3.
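The alignment rules in this section can be summarized as a small decision function. The following Python sketch is illustrative only and is not part of the architecture pseudocode. The names Outcome and data_access_outcome, and all parameter names, are hypothetical. The caller is assumed to pass the element size for ordinary loads and stores, and the size of the whole data structure for Load-Exclusive/Store-Exclusive and Load-Acquire/Store-Release accesses.

```python
from enum import Enum


class Outcome(Enum):
    ALIGNED_ACCESS = "aligned access"
    UNALIGNED_ACCESS = "unaligned access"
    ALIGNMENT_FAULT = "Alignment fault"


def is_aligned(address: int, size: int) -> bool:
    """Return True if address is a multiple of size (size in bytes)."""
    return address % size == 0


def data_access_outcome(address: int, size: int, *,
                        device_memory: bool = False,
                        exclusive_or_acquire_release: bool = False,
                        sctlr_a: bool = False) -> Outcome:
    """Model the outcome of a data access under the rules in this section.

    size is the element size, or the data structure size for exclusive and
    acquire/release accesses (an assumption about how the caller uses this).
    sctlr_a models SCTLR_ELx.A for the current Exception level.
    """
    if is_aligned(address, size):
        return Outcome.ALIGNED_ACCESS
    # Unaligned Device accesses, and unaligned exclusive or acquire/release
    # accesses, fault regardless of the alignment check enable.
    if device_memory or exclusive_or_acquire_release:
        return Outcome.ALIGNMENT_FAULT
    # For other accesses to Normal memory, the alignment check enable decides.
    if sctlr_a:
        return Outcome.ALIGNMENT_FAULT
    return Outcome.UNALIGNED_ACCESS
```

For example, a Load Pair of two 64-bit registers from an address that is 8-byte aligned but not 16-byte aligned is treated as aligned in this model, because the check uses the 8-byte element size and not the 16-byte overall access size.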

B2.4.3

Unaligned data access restrictions
The following points apply to unaligned data accesses in ARMv8:

•

Accesses are not guaranteed to be single-copy atomic except at the byte access level, see Atomicity in the
ARM architecture on page B2-79.

•

Unaligned accesses typically take a number of additional cycles to complete compared to a naturally-aligned
access.

•

An operation that performs an unaligned access can abort on any memory access that it makes, and can abort
on more than one access. This means that an unaligned access that occurs across a page boundary can
generate an abort on either side of the boundary.
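To see when the last point applies, the following Python sketch lists the pages that an access touches. It is a hypothetical helper, not architecture pseudocode, and it assumes a 4KB translation granule purely for illustration.

```python
from typing import List


def pages_touched(address: int, size: int, page_size: int = 4096) -> List[int]:
    """Return the base address of every page covered by the access.

    page_size defaults to 4KB, which is an assumption made for this example.
    """
    first_page = address // page_size
    last_page = (address + size - 1) // page_size
    return [page * page_size for page in range(first_page, last_page + 1)]
```

In this model a 4-byte access at 0x1FFE returns two pages, 0x1000 and 0x2000, so either half of the access might abort. The same access at 0x1FFC stays within one page.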

B2.5

Endian support
General description of endianness in the ARM architecture describes the relationship between endianness and
memory addressing in the ARM architecture.
The following subsections then describe the endianness schemes supported by the architecture:
•
Instruction endianness on page B2-77.
•
Data endianness on page B2-77.

B2.5.1

General description of endianness in the ARM architecture
This section only describes memory addressing and the effects of endianness for data elements up to quadwords of
128 bits. However, this description can be extended to apply to larger data elements.
For an address A, Figure B2-2 shows, for big-endian and little-endian memory systems, the relationship between:
•

The quadword at address A.

•

The doublewords at addresses A and A+8.

•

The words at addresses A, A+4, A+8, and A+12.

•

The halfwords at addresses A, A+2, A+4, A+6, A+8, A+10, A+12, and A+14.

•

The bytes at addresses A, A+1, A+2, A+3, A+4, A+5, A+6, A+7, A+8, A+9, A+10, A+11, A+12, A+13,
A+14, and A+15.

The terms in Figure B2-2 have the following definitions:
B_A
Byte at address A.
HW_A
Halfword at address A.
MSByte
Most-significant byte.
LSByte
Least-significant byte.
[Figure not reproduced. Each panel shows the quadword at address A drawn from MSByte to LSByte and
subdivided into its doublewords, words, halfwords, and bytes.]

Big-endian memory system, MSByte to LSByte:
B_A B_A+1 B_A+2 B_A+3 B_A+4 B_A+5 B_A+6 B_A+7 B_A+8 B_A+9 B_A+10 B_A+11 B_A+12 B_A+13 B_A+14 B_A+15

Little-endian memory system, MSByte to LSByte:
B_A+15 B_A+14 B_A+13 B_A+12 B_A+11 B_A+10 B_A+9 B_A+8 B_A+7 B_A+6 B_A+5 B_A+4 B_A+3 B_A+2 B_A+1 B_A

Figure B2-2 Endianness relationships

The big-endian and little-endian mapping schemes determine the order in which the bytes of a quadword,
doubleword, word or halfword are interpreted. For example, a load of a word from address 0x1000 always results in
an access to the bytes at memory locations 0x1000, 0x1001, 0x1002, and 0x1003. The endianness mapping scheme
determines the significance of these four bytes.
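The word load example can be sketched in Python as follows. The memory contents and the load() helper are hypothetical and exist only to show that the same four bytes are accessed in both schemes, while the resulting value differs.

```python
# Hypothetical memory contents at addresses 0x1000 to 0x1003.
memory = {0x1000: 0x11, 0x1001: 0x22, 0x1002: 0x33, 0x1003: 0x44}


def load(memory: dict, address: int, size: int, big_endian: bool) -> int:
    """Read size bytes starting at address and interpret them as one value."""
    data = bytes(memory[address + offset] for offset in range(size))
    return int.from_bytes(data, "big" if big_endian else "little")


little = load(memory, 0x1000, 4, big_endian=False)  # 0x44332211
big = load(memory, 0x1000, 4, big_endian=True)      # 0x11223344
```

In the little-endian scheme the byte at the lowest address, 0x1000, is the least significant byte of the word. In the big-endian scheme it is the most significant byte.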

B2.5.2

Instruction endianness
In ARMv8-A, A64 instructions have a fixed length of 32 bits and are always little-endian.

B2.5.3

Data endianness
SCTLR_EL1.E0E, configurable at EL1 or higher, determines the data endianness for execution at EL0.
The data size used for endianness conversions:
•

Is the size of the data value that is loaded or stored for SIMD and floating-point register and general-purpose
register loads and stores.

•

Is the size of the data element that is loaded or stored for SIMD element and data structure loads and stores.
For more information see Endianness in SIMD operations.

Instructions to reverse bytes in a general-purpose register or a SIMD and floating-point register
An application or device driver might have to interface to memory-mapped peripheral registers or shared memory
structures that are not the same endianness as the internal data structures. Similarly, the endianness of the operating
system might not match that of the peripheral registers or shared memory. In these cases, the PE requires an efficient
method to transform explicitly the endianness of the data.
Table B2-1 shows the instructions that provide this functionality:
Table B2-1 Byte reversal instructions

Function                                   Instructions   Notes
-----------------------------------------  -------------  ----------------------------------------------
Reverse bytes in 32-bit word or words (a)  REV32          For use with general-purpose registers
Reverse bytes in whole register            REV            For use with general-purpose registers
Reverse bytes in 16-bit halfwords          REV16          For use with general-purpose registers
Reverse elements in doublewords, vector    REV64          For use with SIMD and floating-point registers
Reverse elements in words, vector          REV32          For use with SIMD and floating-point registers
Reverse elements in halfwords, vector      REV16          For use with SIMD and floating-point registers

a. Can operate on multiple words.
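The effect of the general-purpose register forms can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the instruction pseudocode. The function names are hypothetical, registers are modeled as unsigned integers, and only the general-purpose forms in Table B2-1 are covered.

```python
def _reverse_bytes_in_containers(value: int, register_bits: int,
                                 container_bits: int) -> int:
    """Reverse the bytes inside each container_bits-wide slice of value."""
    result = 0
    mask = (1 << container_bits) - 1
    for offset in range(0, register_bits, container_bits):
        container = (value >> offset) & mask
        raw = container.to_bytes(container_bits // 8, "little")
        result |= int.from_bytes(raw, "big") << offset
    return result


def rev(value: int, register_bits: int = 64) -> int:
    """Model of REV: reverse all bytes in the register."""
    return _reverse_bytes_in_containers(value, register_bits, register_bits)


def rev32(value: int) -> int:
    """Model of REV32 on a 64-bit register: reverse bytes in each word."""
    return _reverse_bytes_in_containers(value, 64, 32)


def rev16(value: int, register_bits: int = 64) -> int:
    """Model of REV16: reverse bytes in each 16-bit halfword."""
    return _reverse_bytes_in_containers(value, register_bits, 16)
```

With the 64-bit input 0x0102030405060708, this model gives 0x0807060504030201 for rev, 0x0403020108070605 for rev32, and 0x0201040306050807 for rev16.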

Endianness in SIMD operations
SIMD element Load/Store instructions transfer vectors of elements between memory and the SIMD and
floating-point register file. An instruction specifies both the length of the transfer and the size of the data elements
being transferred. This information is used to load and store data correctly in both big-endian and little-endian
systems.
For example:
LD1 {V0.4H}, [X1]

This loads a 64-bit register with four 16-bit values. The four elements appear in the register in array order, with the
lowest indexed element fetched from the lowest address. The order of bytes in the elements depends on the
endianness configuration, as shown in Figure B2-3. Therefore, the order of the elements in the registers is the same
regardless of the endianness configuration.
64-bit register containing four 16-bit elements, most significant byte first:

D[15:8]  D[7:0]  C[15:8]  C[7:0]  B[15:8]  B[7:0]  A[15:8]  A[7:0]

Memory contents accessed by LD1 {V0.4H}, [X1]:

Byte   Memory system with               Memory system with
       little-endian addressing (LE)    big-endian addressing (BE)
0      A[7:0]                           A[15:8]
1      A[15:8]                          A[7:0]
2      B[7:0]                           B[15:8]
3      B[15:8]                          B[7:0]
4      C[7:0]                           C[15:8]
5      C[15:8]                          C[7:0]
6      D[7:0]                           D[15:8]
7      D[15:8]                          D[7:0]

Figure B2-3 SIMD byte order example
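The element ordering for LD1 {V0.4H}, [X1] can be checked with a short Python model. The helpers st1_4h() and ld1_4h() are hypothetical names for this sketch. They model only the byte layout of four 16-bit elements in memory, and nothing else about the instructions.

```python
from typing import List


def st1_4h(elements: List[int], big_endian: bool) -> bytes:
    """Lay out four 16-bit elements in memory, lowest index at lowest address."""
    order = "big" if big_endian else "little"
    return b"".join(element.to_bytes(2, order) for element in elements)


def ld1_4h(memory_bytes: bytes, big_endian: bool) -> List[int]:
    """Read four 16-bit elements, lowest address into the lowest index."""
    order = "big" if big_endian else "little"
    return [int.from_bytes(memory_bytes[2 * i:2 * i + 2], order)
            for i in range(4)]


# Elements A, B, C, D with recognizable high and low bytes.
elements = [0xA1A0, 0xB1B0, 0xC1C0, 0xD1D0]
le_memory = st1_4h(elements, big_endian=False)
be_memory = st1_4h(elements, big_endian=True)
```

The byte order inside each element differs between le_memory and be_memory, but loading either image with the matching endianness returns the elements in the same array order.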
The BigEndian() function determines the current endianness of the data:
boolean BigEndian();

The pseudocode function for BigEndianReverse() is as follows:
// BigEndianReverse()
// ==================

bits(width) BigEndianReverse (bits(width) value)
    assert width IN {8, 16, 32, 64, 128};
    integer half = width DIV 2;
    if width == 8 then return value;
    return BigEndianReverse(value<half-1:0>) : BigEndianReverse(value<width-1:half>);
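For readers who want to experiment, the recursion can be transliterated into Python. This is a sketch, not the architecture pseudocode: bit vectors are modeled as unsigned integers with an explicit width argument, and the name big_endian_reverse is an assumption of this example. The function reverses each half recursively and swaps the two halves.

```python
def big_endian_reverse(value: int, width: int) -> int:
    """Reverse the byte order of a width-bit value."""
    assert width in (8, 16, 32, 64, 128)
    assert 0 <= value < (1 << width)
    if width == 8:
        return value
    half = width // 2
    low = value & ((1 << half) - 1)   # bits half-1 down to 0
    high = value >> half              # bits width-1 down to half
    # Concatenation: the reversed low half becomes the more significant half.
    return (big_endian_reverse(low, half) << half) | big_endian_reverse(high, half)
```

For example, big_endian_reverse(0x11223344, 32) evaluates to 0x44332211.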

B2.6

Atomicity in the ARM architecture
Atomicity is a feature of memory accesses, described as atomic accesses. The ARM architecture description refers
to two types of atomicity, defined in:
•
Single-copy atomicity.
•
Multi-copy atomicity.
In the ARMv8 architecture, the atomicity requirements for memory accesses depend on the memory type, and
whether the access is explicit or implicit. For more information, see:
•
Memory type overview on page B2-69.
•
Requirements for single-copy atomicity.
•
Requirements for multi-copy atomicity on page B2-80.

B2.6.1

Single-copy atomicity
A read or write operation is single-copy atomic if the following conditions are both true:
•

After any number of write operations to a memory location, the value of the memory location is the value
written by one of the write operations. It is impossible for part of the value of the memory location to come
from one write operation and another part of the value to come from a different write operation.

•

When a read operation and a write operation are made to the same memory location, the value obtained by
the read operation is either:
—
The value of the memory location before the write operation.
—
The value of the memory location after the write operation.
It is never the case that the value of the read operation is partly the value of the memory location before the
write operation and partly the value of the memory location after the write operation.

B2.6.2

Multi-copy atomicity
In a multiprocessing system, writes to a memory location are multi-copy atomic if the following conditions are both
true:
•

All writes to the same location are serialized, meaning they are observed in the same order by all observers,
although some observers might not observe all of the writes.

•

A read of a location does not return the value of a write until all observers observe that write.

Note
Writes that are not coherent are not multi-copy atomic.

B2.6.3

Requirements for single-copy atomicity
For explicit memory accesses generated from an Exception level the following rules apply:

•

All reads generated by load instructions that load a single general-purpose register and that are aligned to the
size of the read in that instruction are single-copy atomic.

•

All writes generated by store instructions that store a single general-purpose register and that are aligned to
the size of the write in that instruction are single-copy atomic.

•

Reads of general-purpose registers generated by Load Pair instructions that are aligned to the size of the load
to each register are treated as two single-copy atomic reads, one for each register being loaded.

•

Writes of general-purpose registers generated by Store pair instructions that are aligned to the size of the store
of each register are treated as two single-copy atomic writes, one for each register being stored.

•

Load-Exclusive Pair instructions of two 32-bit quantities and Store-Exclusive Pair instructions of 32-bit
quantities are single-copy atomic.

•

When the Store-Exclusive of a Load-Exclusive/Store-Exclusive pair instruction using two 64-bit quantities
succeeds, it causes a single-copy atomic update of the entire memory location being updated.

Note
To atomically load two 64-bit quantities, perform a Load-Exclusive pair/Store-Exclusive pair sequence of
reading and writing the same value for which the Store-Exclusive pair succeeds, and use the read values from
the Load-Exclusive pair.
•

Where translation table walks generate a read of a translation table entry, this read is single-copy atomic.

•

When a store that, by the rules given in this section, would be single-copy atomic is made to a memory
location at a time when there is at least one store to the same memory location that has not completed, and
that would be single-copy atomic at a different size, then the architecture does not give any assurance of
atomicity between accesses to the bytes of that location.

•

For the atomicity of instruction fetches, see Concurrent modification and execution of instructions on
page B2-91.

All other memory accesses are regarded as streams of accesses to bytes, and no atomicity between accesses to
different bytes is ensured by the architecture.
All accesses to any byte are single-copy atomic.

Note
No memory accesses involving SIMD and floating-point registers, or memory accesses from Data cache zero
instructions, have single-copy atomicity of any quantity greater than individual bytes.
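The difference between a single-copy atomic access and an access that is only byte-atomic can be illustrated with a simplified Python model. Both function names are hypothetical, and the model considers only which values a read might return when it races with a set of writes. It ignores ordering, observers, and every other aspect of the memory model.

```python
from typing import Iterable


def permitted_if_single_copy_atomic(observed: int,
                                    candidates: Iterable[int]) -> bool:
    """The observed value must be exactly one of the candidate values."""
    return observed in set(candidates)


def permitted_if_byte_stream(observed: int, candidates: Iterable[int],
                             size: int) -> bool:
    """Each byte is atomic on its own, so each observed byte only needs to
    match the corresponding byte of some candidate value."""
    candidates = list(candidates)
    for index in range(size):
        shift = 8 * index
        byte = (observed >> shift) & 0xFF
        if all(((value >> shift) & 0xFF) != byte for value in candidates):
            return False
    return True


# Candidate values: the old contents and the value written by a racing store.
candidates = [0x11111111, 0x22222222]
torn = 0x11112222
```

In this model the torn value 0x11112222 is rejected when the access is single-copy atomic, but is a possible result when the access is treated as a stream of byte accesses.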

B2.6.4

Requirements for multi-copy atomicity
In a multiprocessing system, coherent writes to a memory location are multi-copy atomic if the read of a location
returns the value of a write only when all observers have observed that write.
For Normal memory, writes are not required to be multi-copy atomic.
For Device memory with the non-Gathering attribute, writes that are single-copy atomic are also multi-copy atomic.
For Device memory with the Gathering attribute, writes are not required to be multi-copy atomic.

B2.6.5

Concurrent modification and execution of instructions
The ARMv8 architecture limits the set of instructions that can be executed by one thread of execution as they are
being modified by another thread of execution without requiring explicit synchronization.
Concurrent modification and execution of instructions can lead to the resulting instruction performing any behavior
that can be achieved by executing any sequence of instructions that can be executed from the same Exception level,
except where the instruction before modification and the instruction after modification are each a B, BL, NOP, BKPT,
SVC, HVC, or SMC instruction.
For the B, BL, NOP, BKPT, SVC, HVC, and SMC instructions the architecture guarantees that, after modification of the
instruction, behavior is consistent with execution of either:
•
The instruction originally fetched.
•
A fetch of the modified instruction.
For more information about the required synchronization operation, see Synchronization and coherency issues
between data and instruction accesses on page B2-73.
If one thread of execution changes a conditional branch instruction, such as B or BL, to another conditional instruction
and the change affects both the condition field and the branch target, execution of the changed instruction by another
thread of execution before the change is synchronized can lead to either:
•
The old condition being associated with the new target address.

•

The new condition being associated with the old target address.

These possibilities apply regardless of whether the condition, either before or after the change to the branch
instruction, is the always condition.

Note
For information about memory accesses caused by instruction fetches, see Ordering requirements on page B2-83.

B2.7

Memory ordering
This section describes observation ordering. It contains the following subsections:
•
Observability and completion.
•
Ordering requirements on page B2-83.
•
Memory barriers on page B2-85.
For information on endpoint ordering of memory accesses, see Reordering on page B2-95.
In the ARMv8 memory model, the shareability memory attribute indicates whether hardware must ensure memory
coherency.
The ARMv8 memory system architecture defines additional attributes and associated behaviors, defined in the
system level section of this manual. See:
•
Chapter D4 The AArch64 System Level Memory Model.
•
Chapter D5 The AArch64 Virtual Memory System Architecture.
See also Mismatched memory attributes on page B2-98.

B2.7.1

Observability and completion
An observer is a master in the system that is capable of observing memory accesses. For a PE, the following
mechanisms must be treated as independent observers:
•

The mechanism that performs reads or writes to memory.

•

A mechanism that causes an instruction cache to be filled from memory or that fetches instructions to be
executed directly from memory. These are treated as reads.

•

A mechanism that performs translation table walks. These are treated as reads.

The set of observers that can observe a memory access is defined by the system.
In the definitions in this subsection, subsequent means whichever of the following is appropriate to the context:
•
After the point in time where the location is observed by that observer.
•
After the point in time where the location is globally observed.
For all memory:
•

A write to a location in memory is said to be observed by an observer when:
—

A subsequent read of the location by the same observer returns the value written by the observed write,
or written by a write to that location by any observer that is sequenced in the Coherence order of the
location after the observed write.

—

A subsequent write of the location by the same observer is sequenced in the Coherence order of the
location after the observed write.

•

A write to a location in memory is said to be globally observed for a shareability domain or set of observers
when:
—

A subsequent read of the location by any observer in that shareability domain returns the value written
by the globally observed write, or written by a write to that location by any observer that is sequenced
in the Coherence order of the location after the globally observed write.

—

A subsequent write of the location by any observer in that shareability domain is sequenced in the
Coherence order of the location after the globally observed write.

•

A read of a location in memory is said to be observed by an observer when a subsequent write to the location
by the same observer has no effect on the value returned by the read.

•

A read of a location in memory is said to be globally observed for a shareability domain when a subsequent
write to the location by any observer in that shareability domain has no effect on the value returned by the
read.

Additionally, for Device-nGnRnE memory:
•

A read or write of a memory-mapped location in a peripheral that exhibits side-effects is said to be observed,
and globally observed, only when the read or write:
—

Meets the general conditions listed.

—

Can begin to affect the state of the memory-mapped peripheral.

—

Can trigger all associated side-effects, whether they affect other peripheral devices, PEs, or memory.

Note
This definition is consistent with the memory access having reached the peripheral.
For all memory, the completion rules are defined as:
•

A read or write is complete for a shareability domain when all of the following are true:
—

The read or write is globally observed for that shareability domain.

—

Any translation table walks associated with the read or write are complete for that shareability domain.

•

A translation table walk is complete for a shareability domain when the memory accesses associated with the
translation table walk are globally observed for that shareability domain, and the TLB is updated.

•

A cache or TLB maintenance instruction is complete for a shareability domain when the effects of the
instruction are globally observed for that shareability domain, and any translation table walks that arise from
the instruction are complete for that shareability domain.
The completion of any cache or TLB maintenance instruction includes its completion on all PEs that are
affected by both the instruction and the DSB operation that is required to guarantee visibility of the
maintenance instruction.

Completion of side-effects of accesses to Device memory
The completion of a memory access to Device memory other than Device-nGnRnE is not guaranteed to be sufficient
to determine that the side-effects of the memory access are visible to all observers. The mechanism that ensures the
visibility of side-effects of a memory access is IMPLEMENTATION DEFINED.

B2.7.2

Ordering requirements
ARMv8 defines restrictions for the permitted ordering of memory accesses. These restrictions depend on the
memory locations that are being accessed. See Memory types and attributes on page B2-89.
The following additional restrictions apply to the order in which accesses to Normal memory are observed:
•

Reads and writes can be observed in any order provided the following constraints are met:
—

If an address dependency exists between two reads or between a read and a write, then those memory
accesses are observed in program order by all observers within the shareability domain of the memory
address being accessed.
The ARMv8 architecture relaxes this rule for execution where the second read is generated by a Load
Non-Temporal Pair instruction. See Load/Store Non-temporal Pair on page C2-132 and Load/Store
SIMD and Floating-point Non-temporal pair on page C2-136.

—

Writes that would not occur in a simple sequential execution of the program cannot be observed by
other observers. This implies that where a control, address or data dependency exists between a read
and a write, those memory accesses are observed in program order by all observers within the
shareability domain of the memory addresses being accessed.

—

Ordering can be achieved by using a DMB or DSB barrier. For more information on DMB and DSB
instructions, see Memory barriers on page B2-85.

•

Reads and writes to the same location are coherent within the shareability domain of the memory address
being accessed.

•

Two reads of the same location by the same observer are observed in program order by all observers within
the shareability domain of the memory address being accessed.

•

Writes are not required to be multi-copy atomic. This means that in the absence of barriers, the observation
of a store by one observer does not imply the observation of the store by another observer.

•

Instructions that access multiple elements have no defined ordering requirements for the memory accesses
relative to each other.

Memory accesses caused by instruction fetches are not required to be observed in program order, unless they are
separated by an ISB or other context synchronization event.

Address dependencies and order
In the ARMv8 architecture, a register data dependency creates order between a load instruction and a subsequent
memory transaction, that is between the data value returned from the load and the address used by the subsequent
memory transaction.
A register data dependency exists between a first data value and a second data value when either:
•

The register, excluding the zero register (XZR or WZR), used to hold the first data value is used in the
calculation of the second data value, and the calculation between the first data value and the second data value
does not consist of either:
—

A conditional branch whose condition is determined by the first data value.

—

A conditional selection, move, or computation whose condition is determined by the first data value,
where the input data values for the selection, move, or computation do not have a data dependency on
the first data value.

•

There is a register data dependency between the first data value and a third data value, and between the third
data value and the second data value.

Note
A register data dependency can exist even if the value of the first data value is discarded as part of the calculation,
as might be the case if it is ANDed with 0x0 or if arithmetic using the first data value cancels out its contribution.
For example, each of the following code snippets exhibits order between the memory transactions:
1.  LDR X1, [X2]
    AND X1, X1, XZR
    LDR X4, [X3, X1]

2.  LDR X1, [X2]
    ADD X3, X3, X1
    SUB X3, X3, X1
    STR X4, [X3]
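The recursive definition of a register data dependency can be approximated by propagating a dependency mark through the instructions that follow the load. The Python sketch below is a simplified model with hypothetical names. Each instruction is reduced to a destination register and a list of source registers, and the exclusions for conditional branches and conditional selections are not modeled.

```python
from typing import List, Set, Tuple

ZERO_REGISTERS = {"XZR", "WZR"}  # excluded by the definition


def dependent_registers(load_dest: str,
                        instructions: List[Tuple[str, List[str]]]) -> Set[str]:
    """Return the registers whose values depend on the loaded value.

    instructions lists (destination, sources) for each register operation
    between the load and the later memory access, in program order.
    """
    dependent = {load_dest}
    for dest, sources in instructions:
        used = set(sources) - ZERO_REGISTERS
        if used & dependent:
            dependent.add(dest)      # dependency carried into dest
        else:
            dependent.discard(dest)  # dest overwritten by an unrelated value
    return dependent


def address_depends_on_load(load_dest: str,
                            instructions: List[Tuple[str, List[str]]],
                            address_registers: List[str]) -> bool:
    """True if any address register of the later access depends on the load."""
    return bool(dependent_registers(load_dest, instructions)
                & set(address_registers))
```

Applied to the two snippets, the model reports a dependency in both cases, even though the AND with XZR and the ADD followed by SUB cancel the numeric contribution of X1.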

Address dependencies of Load Non-temporal Pair instructions
Where an address dependency exists between two reads, and the second read was generated by a Load
Non-temporal Pair instruction, then in the absence of any other barrier mechanism to achieve order, those memory
accesses can be observed in any order by other observers within the shareability domain of the memory addresses
being accessed.
This affects the following instruction:
•
LDNP on page C5-509.

B2.7.3

Memory barriers
The ARM architecture is a weakly ordered memory architecture that supports out of order completion. Memory
barrier is the general term applied to an instruction, or sequence of instructions, that forces synchronization events
by a PE with respect to retiring Load/Store instructions. The memory barriers defined by the ARMv8 architecture
provide a range of functionality, including:
•
Ordering of Load/Store instructions.
•
Completion of Load/Store instructions.
•
Context synchronization.
The following subsections describe the ARMv8 memory barrier instructions:
•
Instruction Synchronization Barrier (ISB).
•
Data Memory Barrier (DMB).
•
Data Synchronization Barrier (DSB) on page B2-86.
•
Shareability and access limitations on the data barrier operations on page B2-87.
•
Load-Acquire, Store-Release on page B2-87.

Note
Depending on the required synchronization, a program might use memory barriers on their own, or it might use them
in conjunction with cache maintenance and memory management instructions that in general are only available
when software execution is at EL1 or higher.
The DMB and DSB memory barriers affect reads and writes to the memory system generated by Load/Store instructions
and data or unified cache maintenance instructions being executed by the PE. Instruction fetches or accesses caused
by a hardware translation table access are not explicit accesses.

Instruction Synchronization Barrier (ISB)
An ISB instruction flushes the pipeline in the PE, so that all instructions that come after the ISB instruction in
program order are fetched from the cache or memory only after the ISB instruction has completed. Using an ISB
ensures that the effects of context-changing operations executed before the ISB are visible to the instructions fetched
after the ISB instruction. Examples of context-changing operations that require the insertion of an ISB instruction to
ensure the effects of the operation are visible to instructions fetched after the ISB instruction are:
•
Completed cache and TLB maintenance instructions.
•
Changes to system control registers.
Any context-changing operations appearing in program order after the ISB instruction only take effect after the ISB
has been executed.
InstructionSynchronizationBarrier();

See also Memory barriers on page D4-1705.

Data Memory Barrier (DMB)
The DMB instruction is a data memory barrier. The PE that executes the DMB instruction is referred to as the executing
PE, PEe. The DMB instruction takes the required shareability domain and required access types as arguments:
DataMemoryBarrier(MBReqDomain domain, MBReqTypes types);

See Shareability and access limitations on the data barrier operations on page B2-87.
If the required shareability is Full system then the operation applies to all observers within the system.
A DMB creates two groups of memory accesses, Group A and Group B:
Group A

Contains:
•

All explicit memory accesses of the required access types from observers in the same
required shareability domain as PEe that are observed by PEe before the DMB instruction.
These accesses include any accesses of the required access types performed by PEe.

•

All loads of required access types from an observer PEx in the same required shareability
domain as PEe that have been observed by any given different observer, PEy, in the same
required shareability domain as PEe before PEy has performed a memory access that is a
member of Group A.

Group B

Contains:
•

All explicit memory accesses of the required access types by PEe that occur in program order
after the DMB instruction.

•

All explicit memory accesses of the required access types by any given observer PEx in the
same required shareability domain as PEe that can only occur after a load by PEx has returned
the result of a store that is a member of Group B.

Any observer with the same required shareability domain as PEe observes all members of Group A before it
observes any member of Group B to the extent that those group members are required to be observed, as determined
by the shareability and cacheability of the memory locations accessed by the group members.
If members of Group A and members of Group B access the same memory-mapped peripheral of arbitrary
system-defined size, then members of Group A that are accessing Device or Normal Non-cacheable memory arrive
at that peripheral before members of Group B that are accessing Device or Normal Non-cacheable memory. Where
the members of Group A and Group B that must be ordered are from the same PE, a DMB NSH is sufficient for this
guarantee.

Note
•

A memory access might be in neither Group A nor Group B. The DMB does not affect the order of observation
of such a memory access.

•

The second part of the definition of Group A is recursive. Ultimately, membership of Group A derives from
the observation by PEy of a load before PEy performs an access that is a member of Group A as a result of
the first part of the definition of Group A.

•

The second part of the definition of Group B is recursive. Ultimately, membership of Group B derives from
the observation by any observer of an access by PEe that is a member of Group B as a result of the first part
of the definition of Group B.

DMB only affects memory accesses and the operation of data cache and unified cache maintenance instructions, see
Cache maintenance instructions on page D4-1684. It has no effect on the ordering of any other instructions
executing on the PE.
See also Memory barriers on page D4-1705.

Data Synchronization Barrier (DSB)
The DSB instruction is a special memory barrier that synchronizes the execution stream with memory accesses.
The DSB instruction takes the required shareability domain and required access types as arguments:
DataSynchronizationBarrier(MBReqDomain domain, MBReqTypes types);

See Shareability and access limitations on the data barrier operations on page B2-87.
If the required shareability is Full system then the operation applies to all observers within the system.
A DSB behaves as a DMB with the same arguments, and also has the additional properties defined in this section. The
PE that executes the DSB instruction is referred to as the executing PE, PEe.
A DSB completes when all of the following apply:
•

All explicit memory accesses that are observed by PEe before the DSB is executed and are of the required
access types, and are from observers in the same required shareability domain as PEe, are complete for the
set of observers in the required shareability domain.

•

All cache maintenance instructions issued by PEe before the DSB are complete for the required shareability
domain.

•

If the required access types of the DSB are reads and writes, all TLB maintenance instructions issued by PEe
before the DSB are complete for the required shareability domain.

In addition, no instruction that appears in program order after the DSB instruction can execute until the DSB completes.
See also Memory barriers on page D4-1705.

Shareability and access limitations on the data barrier operations
The DMB and DSB instructions can each take an optional limitation argument that specifies:
•
The shareability domain over which the instruction must operate. This is one of:
—
Full system.
—
Outer Shareable.
—
Inner Shareable.
—
Non-shareable.
•
The accesses for which the instruction operates. This is one of:
—
Read and write accesses in Group A and Group B.
—
Write accesses only in Group A and Group B.
—
Read access only in Group A, and read and write accesses in Group B.

Note
This is occasionally referred to as a Load-Load/Store barrier.
If no specifiers are used then each instruction operates for read and write accesses, over the full system. See the
instruction descriptions for more information about these arguments.

Note
ISB also supports an optional limitation argument that can only contain one value that corresponds to full system
operation.
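
The limitation argument described above can be pictured as a lookup from an option name to a shareability domain and a set of access types. The following Python sketch is illustrative only. The option mnemonics are taken from the DMB and DSB instruction descriptions rather than from this section, and the names Domain, Types, BARRIER_OPTIONS, and barrier_scope are hypothetical.

```python
from enum import Enum


class Domain(Enum):
    FULL_SYSTEM = "Full system"
    OUTER_SHAREABLE = "Outer Shareable"
    INNER_SHAREABLE = "Inner Shareable"
    NON_SHAREABLE = "Non-shareable"


class Types(Enum):
    ALL = "reads and writes in Group A and Group B"
    WRITES = "writes only in Group A and Group B"
    LOADS = "reads only in Group A, reads and writes in Group B"


# Option mnemonics assumed from the DMB and DSB instruction descriptions.
BARRIER_OPTIONS = {
    "SY": (Domain.FULL_SYSTEM, Types.ALL),
    "ST": (Domain.FULL_SYSTEM, Types.WRITES),
    "LD": (Domain.FULL_SYSTEM, Types.LOADS),
    "OSH": (Domain.OUTER_SHAREABLE, Types.ALL),
    "OSHST": (Domain.OUTER_SHAREABLE, Types.WRITES),
    "OSHLD": (Domain.OUTER_SHAREABLE, Types.LOADS),
    "ISH": (Domain.INNER_SHAREABLE, Types.ALL),
    "ISHST": (Domain.INNER_SHAREABLE, Types.WRITES),
    "ISHLD": (Domain.INNER_SHAREABLE, Types.LOADS),
    "NSH": (Domain.NON_SHAREABLE, Types.ALL),
    "NSHST": (Domain.NON_SHAREABLE, Types.WRITES),
    "NSHLD": (Domain.NON_SHAREABLE, Types.LOADS),
}


def barrier_scope(option=None):
    """Return (domain, types) for a DMB or DSB limitation argument.

    With no specifier, the barrier operates for read and write accesses
    over the full system, as the text above states.
    """
    if option is None:
        return BARRIER_OPTIONS["SY"]
    return BARRIER_OPTIONS[option.upper()]
```

For example, barrier_scope("ISHST") yields the Inner Shareable domain with write accesses only, which is the scope that a DMB ISHST instruction requests.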

Load-Acquire, Store-Release
ARMv8 provides a set of instructions with Acquire semantics for loads, and Release semantics for stores. See
Load-Acquire/Store-Release on page C2-134.
For all memory types, these instructions have the following ordering requirements:

•

A Store-Release followed by a Load-Acquire is observed in program order by each observer within the
shareability domain of the memory address being accessed by the Store-Release and the memory address
being accessed by the Load-Acquire.

•

A Load-Acquire is a read that must be observed by all observers in the shareability domain of the accessed
memory location before any other read or write that both:
—
Is caused by an instruction that appears in program order after the Load-Acquire.
—
Accesses memory in the shareability domain accessed by the Load-Acquire.

•

A Load-Acquire places no additional ordering constraints on any loads or stores appearing before the
Load-Acquire.

•

Store-Release is a write:
—

Where the reads and writes generated by loads and stores appearing in program order before the
Store-Release are observed as required by the shareability domains of the memory addresses being
accessed by those loads and stores by each observer within the shareability domain of the memory
address being accessed by the Store-Release, before that observer observes the write generated by the
Store-Release.

—

Where any writes that have been observed before the Store-Release by the processing element
executing the Store-Release are observed as required by the shareability domains of the memory
addresses being accessed by those writes by each observer within the shareability domain of
the memory address being accessed by the Store-Release, before that observer observes the write
generated by the Store-Release.

•

The Store-Release places no additional ordering constraints on any loads or stores appearing after the
Store-Release instruction.

•

All Store-Release instructions must be multi-copy atomic when they are observed with Load-Acquire
instructions. This means that if one observer has seen the Store-Release, then all observers have seen the
Store-Release.
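
The ordering requirements listed above can be summarized as a small predicate over pairs of accesses from one PE. The following Python sketch is a deliberately simplified model, not a definition of the architecture. It assumes both accesses are in the relevant shareability domain, and it ignores barriers, dependencies, and same-address ordering. The function name and the access labels are hypothetical.

```python
PLAIN_LOAD = "load"
PLAIN_STORE = "store"
LOAD_ACQUIRE = "load-acquire"
STORE_RELEASE = "store-release"


def order_is_enforced(earlier, later):
    """Return True if the Acquire/Release rules alone require that the
    access appearing earlier in program order is observed before the
    access appearing later in program order."""
    # A Store-Release followed by a Load-Acquire is observed in program order.
    if earlier == STORE_RELEASE and later == LOAD_ACQUIRE:
        return True
    # Every read or write after a Load-Acquire is observed after it.
    if earlier == LOAD_ACQUIRE:
        return True
    # Every load or store before a Store-Release is observed before it.
    if later == STORE_RELEASE:
        return True
    # Load-Acquire adds no constraint on earlier accesses, and
    # Store-Release adds no constraint on later accesses.
    return False
```

In this model a plain store followed by a Load-Acquire, or a Store-Release followed by a plain load, is not ordered by the Acquire/Release rules.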

Load-Acquire and Store-Release, other than Load-Acquire Exclusive Pair and Store-Release Exclusive Pair, access
only a single data element. This access is single-copy atomic. The address of the data object must be aligned to the
size of the data element being accessed, otherwise the access generates an Alignment fault.
Load-Acquire Exclusive Pair and Store-Release Exclusive Pair access two data elements. The address supplied to
the instructions must be aligned to twice the size of the element being loaded, otherwise the access generates an
Alignment fault.
A Store-Release Exclusive instruction only has the release semantics if the store is successful.
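
The alignment rules above reduce to a simple calculation. The following Python sketch shows one way to express them. The function names are hypothetical, and element sizes are given in bytes.

```python
def required_alignment(element_bytes, pair=False):
    """Alignment, in bytes, that the rules above require for the address.

    Single-element Load-Acquire and Store-Release accesses must be aligned
    to the element size. The Exclusive Pair forms must be aligned to twice
    the element size.
    """
    return 2 * element_bytes if pair else element_bytes


def generates_alignment_fault(address, element_bytes, pair=False):
    """Return True if the access would generate an Alignment fault."""
    return address % required_alignment(element_bytes, pair) != 0
```

For example, address 0x1008 is acceptable for a single 8-byte element, but not for an Exclusive Pair of 8-byte elements, which requires 16-byte alignment.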

Note

•

Each Load-Acquire Exclusive and Store-Release Exclusive instruction is essentially a variant of the
equivalent Load-Exclusive or Store-Exclusive instruction. All usage restrictions and single-copy atomicity
properties that apply to the Load-Exclusive or Store-Exclusive instructions also apply to the Load-Acquire
Exclusive or Store-Release Exclusive instructions.

•

The Load-Acquire/Store-Release instructions can remove the requirement to use the explicit DMB memory
barrier instruction.


B2.8 Memory types and attributes
In ARMv8 the ordering of accesses for locations of memory, referred to as the memory order model, is defined by
the memory attributes. The following sections describe this model:
•
Normal memory.
•
Device memory on page B2-92.

B2.8.1 Normal memory
The Normal memory type attribute applies to most memory in a system. It indicates that the hardware might perform
speculative data read accesses to these locations.
The Normal memory type has the following properties:
•

A write to a memory location with the Normal attribute completes in finite time. This means that it is globally
observed for the shareability domain of the memory location in finite time. For a Non-cacheable location, the
location is observed by all observers in finite time.

•

A completed write to a memory location with the Normal attribute is globally observed for the shareability
domain of the memory location in finite time without the need for explicit cache maintenance instructions or
barriers. For a Non-cacheable location, the completed write is globally observed for all observers in finite
time without the need for explicit cache maintenance instructions or barriers.

•

Writes to a memory location with the Normal memory attribute that are Non-cacheable must reach the
endpoint for that location in the memory system in finite time.

•

Unaligned memory accesses can access Normal memory if the system is configured to generate such
accesses.

•

There is no requirement for the memory system beyond the PE to be able to identify the elements accessed
by multi-register Load/Store instructions. See Multi-register loads and stores that access Normal memory on
page B2-92.

Note
•
The Normal memory attribute is appropriate for locations of memory that are idempotent, meaning that they
exhibit all of the following properties:
—
Read accesses can be repeated with no side-effects.
—
Repeated read accesses return the last value written to the resource being read.
—
Read accesses can fetch additional memory locations with no side-effects.
—
Write accesses can be repeated with no side-effects if the contents of the location accessed are
unchanged between the repeated writes or as the result of an exception, as described in this section.
—
Unaligned accesses can be supported.
—
Accesses can be merged before accessing the target memory system.
•
An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture on
page B2-79 might be abandoned as a result of an exception being taken during the sequence of accesses. On
return from the exception the instruction is restarted, and therefore one or more of the memory locations
might be accessed multiple times. This can result in repeated write accesses to a location that has been
changed between the write accesses.

The following sections describe the other attributes for Normal memory:
•
Shareable Normal memory on page B2-90.
•
Non-shareable Normal memory on page B2-91.


See also:
•

Atomicity in the ARM architecture on page B2-79.

•

Memory barriers on page B2-85. For accesses to Normal memory, a DMB instruction is required to ensure the
required ordering.

•

Concurrent modification and execution of instructions on page B2-80.

Shareable Normal memory
A Normal memory location has a Shareability attribute that is:
•
Defined independently for the Inner Shareable and Outer Shareable shareability domains.
•
Defined, for each shareability domain, as being either Shareable or Non-shareable.
The shareability attributes define the data coherency requirements of the location that hardware must enforce. They
do not affect the coherency requirements of instruction fetches, see Synchronization and coherency issues between
data and instruction accesses on page B2-73.

Note
•

System designers can use the shareability attribute to specify the locations in Normal memory for which
coherency must be maintained. However, software developers must not assume that specifying a memory
location as Non-shareable permits software to make assumptions about the incoherency of the location
between different PEs in a shared memory system. Such assumptions are not portable between different
multiprocessing implementations that might use the shareability attribute. Any multiprocessing
implementation might implement caches that are shared, inherently, between different processing elements.

•

This architecture assumes that all PEs that use the same operating system or hypervisor are in the same Inner
Shareable shareability domain.

Shareable, Inner Shareable, and Outer Shareable Normal memory
The ARM architecture abstracts the system as a series of Inner and Outer Shareability domains.
Each Inner Shareability domain contains a set of observers that are data coherent for each member of that set for
data accesses with the Inner Shareable attribute made by any member of that set.
Each Outer Shareability domain contains a set of observers that are data coherent for each member of that set for
data accesses with the Outer Shareable attribute made by any member of that set.
The following properties also hold:
•

Each observer is only a member of a single Inner Shareability domain.

•

Each observer is only a member of a single Outer Shareability domain.

•

All observers in an Inner Shareability domain are always members of the same Outer Shareability domain.
This means that an Inner Shareability domain is a subset of an Outer Shareability domain, although it is not
required to be a proper subset.

Note

•

Because all data accesses to Non-cacheable locations are data coherent to all observers, Non-cacheable
locations are always treated as Outer Shareable.

•

The Inner Shareable domain is expected to be the set of PEs controlled by a single hypervisor or operating
system.


The details of the use of the shareability attributes are system-specific. Example B2-1 shows how they might be
used.
Example B2-1 Use of shareability attributes

In an implementation, a particular subsystem with two clusters of PEs has the requirement that:
•

In each cluster, the data caches or unified caches of the PEs in the cluster are transparent for all data accesses
to memory locations with the Inner Shareable attribute.

•

However, between the two clusters, the caches:
—
Are not required to be coherent for data accesses that have only the Inner Shareable attribute.
—
Are coherent for data accesses that have the Outer Shareable attribute.

In this system, each cluster is in a different shareability domain for the Inner Shareable attribute, but all components
of the subsystem are in the same shareability domain for the Outer Shareable attribute.
A system might implement two such subsystems. If the data caches or unified caches of one subsystem are not
transparent to the accesses from the other subsystem, this system has two Outer Shareable shareability domains.
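
The arrangement in Example B2-1 can be modelled as a table that assigns each observer to one Inner Shareability domain and one Outer Shareability domain. The following Python sketch is illustrative only. The observer names, the domain labels, and both function names are hypothetical.

```python
# Hypothetical topology for one subsystem of Example B2-1:
# observer -> (Inner Shareability domain, Outer Shareability domain)
SUBSYSTEM = {
    "cluster0.pe0": ("inner0", "outer0"),
    "cluster0.pe1": ("inner0", "outer0"),
    "cluster1.pe0": ("inner1", "outer0"),
    "cluster1.pe1": ("inner1", "outer0"),
}


def topology_is_valid(domains):
    """Check that all observers in one Inner Shareability domain are
    members of the same Outer Shareability domain."""
    outer_of = {}
    for inner, outer in domains.values():
        if outer_of.setdefault(inner, outer) != outer:
            return False
    return True


def coherency_guaranteed(domains, a, b, attribute):
    """Return True if hardware must keep data accesses by observers a and b
    coherent for a location with the given attribute, which is one of
    'inner', 'outer', or 'non-shareable'."""
    inner_a, outer_a = domains[a]
    inner_b, outer_b = domains[b]
    if attribute == "outer":
        return outer_a == outer_b
    if attribute == "inner":
        return inner_a == inner_b
    # Non-shareable (cacheable) locations carry no cross-observer guarantee.
    return a == b
```

In this model the two clusters are coherent with each other only for accesses that have the Outer Shareable attribute, which matches the requirement stated in the example.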

Having two levels of shareability means system designers can reduce the performance and power overhead for
shared memory locations that do not need to be part of the Outer Shareable shareability domain.
For Shareable Normal memory, the Load-Exclusive and Store-Exclusive synchronization primitives take account
of the possibility of accesses by more than one observer in the same Shareability domain.

Non-shareable Normal memory
For Normal memory locations, the Non-shareable attribute identifies Normal memory that is likely to be accessed
only by a single PE.
A location in Normal memory with the Non-shareable attribute does not require the hardware to make data accesses
by different observers coherent, unless the memory is Non-cacheable. For a Non-shareable location, if other
observers share the memory system, software must use cache maintenance instructions, if the presence of caches
might lead to coherency issues when communicating between the observers. This cache maintenance requirement
is in addition to the barrier operations that are required to ensure memory ordering.
For Non-shareable Normal memory, it is IMPLEMENTATION DEFINED whether the Load-Exclusive and
Store-Exclusive synchronization primitives take account of the possibility of accesses by more than one observer.

Concurrent modification and execution of instructions
The ARMv8 architecture limits the set of instructions that can be executed by one thread of execution as they are
being modified by another thread of execution without requiring explicit synchronization.
Except where the instruction before modification or the instruction after modification is explicitly identified in this
section, concurrent modification and execution of instructions can lead to the resulting instruction performing any
behavior that can be achieved by executing any sequence of instructions that can be executed from the same
Exception level.
For the instructions explicitly identified in this section, the architecture guarantees that, after modification of the
instruction, behavior is consistent with execution of either:
•
The instruction originally fetched.
•
A fetch of the modified instruction.
The instructions to which this applies are the B, BL, NOP, BKPT, SVC, HVC, and SMC instructions.


For all other instructions, to avoid UNPREDICTABLE behavior, instruction modifications must be explicitly
synchronized before they are executed. The required synchronization is as follows:
1.
To ensure that the modified instructions are observable, the thread of execution that is modifying the
instructions must issue the following sequence of instructions and operations:
; Coherency example for self-modifying code
; Enter this code with Wt containing a new 32-bit instruction,
; to be held in non-cacheable space at a location pointed to by Xn.
STR Wt, [Xn]
DSB ISH          ; Ensure visibility of the data stored
IC IVAU, Xn      ; Invalidate instruction cache by VA to PoU
DSB ISH          ; Ensure completion of the invalidations

2.
Once the modified instructions are observable, the thread of execution that is executing the modified
instructions must issue the following instruction to ensure execution of the modified instructions:
ISB SY           ; Synchronize fetched instruction stream

For both instruction sets, if one thread of execution changes a conditional branch instruction to another conditional
branch instruction, and the change affects both the condition field and the branch target, execution of the changed
instruction by another thread of execution before the change is synchronized can lead to either:
•
The old condition being associated with the new target address.
•
The new condition being associated with the old target address.
These possibilities apply regardless of whether the condition, either before or after the change to the branch
instruction, is the always condition.

Multi-register loads and stores that access Normal memory
For all instructions that load or store more than one general-purpose register from an Exception level there is no
requirement for the memory system beyond the PE to be able to identify the size of the elements accessed by these
load or store instructions.
For all instructions that load or store more than one general-purpose register from an Exception level the order in
which the registers are accessed is not defined by the architecture.
For all instructions that load or store one or more SIMD and floating-point registers from an Exception level there is
no requirement for the memory system beyond the PE to be able to identify the size of the element accessed by these
load or store instructions.

B2.8.2 Device memory
The Device memory type attributes define memory locations where an access to the location can cause side-effects,
or where the value returned for a load can vary depending on the number of loads performed. Typically, the Device
memory attributes are used for memory-mapped peripherals and similar locations.
The attributes for ARMv8 Device memory are:
Gathering

Identified as G or nG, see Gathering on page B2-94.

Reordering

Identified as R or nR, see Reordering on page B2-95.

Early Write Acknowledgement hint
Identified as E or nE, see Early Write Acknowledgement on page B2-96.
The ARMv8 Device memory types are:
Device-nGnRnE

Device non-Gathering, non-Reordering, No Early write acknowledgement.
Equivalent to the Strongly-ordered memory type in earlier versions of the architecture.


Device-nGnRE

Device non-Gathering, non-Reordering, Early Write Acknowledgement.
Equivalent to the Device memory type in earlier versions of the architecture.

Device-nGRE

Device non-Gathering, Reordering, Early Write Acknowledgement.
ARMv8 adds this memory type to the translation table formats found in earlier versions of
the architecture. The use of barriers is required to order accesses to Device-nGRE memory.

Device-GRE

Device Gathering, Reordering, Early Write Acknowledgement.
ARMv8 adds this memory type to the translation table formats found in earlier versions of
the architecture. Device-GRE memory has the fewest constraints. It behaves similarly to
Normal memory, with the restriction that speculative accesses to Device-GRE memory are
forbidden.

Collectively these are referred to as any Device memory type. Going down the list, the memory types are described
as getting weaker; conversely, going up the list, the memory types are described as getting stronger.

Note
•

As the list of types shows, these additional attributes are hierarchical. For example, a memory location that
permits Gathering must also permit Reordering and Early Write Acknowledgement.

•

The architecture does not require an implementation to distinguish between each of these memory types and
ARM recognizes that not all implementations will do so. The subsection that describes each of the attributes,
describes the implementation rules for the attribute.

•

Earlier versions of the ARM architecture defined the following memory types:
—
Strongly-ordered memory. This is the equivalent of the Device-nGnRnE memory type.
—
Device memory. This is the equivalent of the Device-nGnRE memory type.
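
The hierarchy of the Device memory attributes can be checked mechanically from the type name. The following Python sketch is illustrative only. It parses a name such as Device-nGRE into its three attributes, rejects combinations that break the hierarchy, and compares two types by their position in the list above. The function names are hypothetical.

```python
import re

# The four ARMv8 Device memory types, strongest first, as listed above.
DEVICE_TYPES = ["Device-nGnRnE", "Device-nGnRE", "Device-nGRE", "Device-GRE"]


def parse_device_type(name):
    """Return (gathering, reordering, early_write_ack) for a Device type name.

    Raises ValueError if the name is malformed or if the attributes are not
    hierarchical, that is, if Gathering is permitted without Reordering, or
    Reordering without Early Write Acknowledgement.
    """
    match = re.fullmatch(r"Device-(n?)G(n?)R(n?)E", name)
    if match is None:
        raise ValueError("not a Device memory type name: " + name)
    # An empty prefix means the attribute is present, 'n' means it is not.
    gathering, reordering, early_ack = (flag == "" for flag in match.groups())
    if (gathering and not reordering) or (reordering and not early_ack):
        raise ValueError("attributes are not hierarchical: " + name)
    return gathering, reordering, early_ack


def is_stronger(first, second):
    """Return True if the first type is stronger than the second."""
    return DEVICE_TYPES.index(first) < DEVICE_TYPES.index(second)
```

Only the four names in DEVICE_TYPES pass both checks. A name such as Device-GnRE is rejected because Gathering would be permitted without Reordering.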

All of these memory types have the following properties:
•

Speculative data accesses are not permitted to any memory location with any Device memory attribute. This
means that each memory access to any Device memory type must be one that would be generated by a simple
sequential execution of the program.
Three exceptions to this apply:
—

Reads generated by the SIMD and floating-point instructions can access bytes that are not explicitly
accessed by the instruction if the bytes accessed are in a 16-byte window, aligned to 16-bytes, that
contains at least one byte that is explicitly accessed by the instruction.

—

For Device memory with the Gathering attribute, reads generated by the LDNP instructions are
permitted to access bytes that are not explicitly accessed by the instruction, provided that the bytes
accessed are in a 128-byte window, aligned to 128-bytes, that contains at least one byte that is
explicitly accessed by the instruction.

—

Where a load or store instruction performs a sequence of memory accesses, as opposed to one
single-copy atomic access as defined in the rules for single-copy atomicity, these accesses might occur
multiple times as a result of executing the load or store instruction. See Single-copy atomicity on
page B2-79.

Note

—

An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture
on page B2-79 might be abandoned as a result of an exception being taken during the sequence of
accesses. On return from the exception the instruction is restarted, and therefore one or more of the
memory locations might be accessed multiple times. This can result in repeated accesses to a location
where the program only defines a single access. For this reason, ARM strongly recommends that no
accesses to Device memory are performed from a single instruction that spans the boundary of a
translation granule or which in some other way could lead to some of the accesses being aborted.

—

Write speculation that is visible to other observers is prohibited for all memory types.


•

A write to a memory location with any Device memory attribute completes in finite time. This means that it
is globally observed for all observers in the system in finite time.

•

If a location with any Device memory attribute changes without an explicit write by an observer, this change
must also be globally observed for all observers in the system in finite time. Such a change might occur in a
peripheral location that holds status information.

•

A completed write to a memory location with any Device memory attribute is globally observed for all
observers in finite time without the need for explicit maintenance.

•

Data accesses to memory locations are coherent for all observers in the system, and correspondingly are
treated as being Outer Shareable.

•

A memory location with any Device memory attribute cannot be allocated into a cache.

•

Writes to a memory location with any Device memory attribute must reach the endpoint for that address in
the memory system in finite time. Typically, the endpoint is a peripheral or some physical memory.

•

All accesses to memory with any Device memory attribute must be aligned. Any unaligned access generates
an Alignment fault at the first stage of translation that defined the location as being Device.

Note
In the Non-secure EL1 translation regime in systems where HCR_EL2.TGE == 1 and HCR_EL2.DC == 0,
any Alignment fault that results from the fact that all locations are treated as Device is a fault at the first stage
of translation. This causes ESR_EL2.ISS.[24] to be 0.
•

Hardware does not prevent speculative instruction fetches from a memory location with any of the Device
memory attributes unless the memory location is also marked as Execute-never for all Exception levels.

Note
This means that to prevent speculative instruction fetches from memory locations with Device memory
attributes, any location that is assigned any Device memory type must also be marked as Execute-never for
all Exception levels. Failure to mark a memory location with any Device memory attribute as Execute-never
for all Exception levels is a programming error.

For instruction fetches, if branches cause the program counter to point to an area of memory with the Device
attribute which is not marked as Execute-never for the current Exception level, an implementation can either:
•
Treat the instruction fetch as if it were to a memory location with the Normal Non-cacheable attribute.
•
Take a Permission fault.
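
The aligned windows described earlier in this section, which bound the additional bytes that reads from SIMD and floating-point instructions and from LDNP instructions can access, can be computed from the address and size of the explicit access. The following Python sketch is illustrative only. The function and constant names are hypothetical, and the sketch assumes that the permitted bytes are the union of every aligned window that contains at least one explicitly accessed byte.

```python
SIMD_FP_WINDOW = 16          # bytes, for SIMD and floating-point reads
LDNP_GATHERING_WINDOW = 128  # bytes, for LDNP reads of Gathering Device memory


def permitted_read_bytes(address, size, window):
    """Return the range of byte addresses that the read may access.

    The range covers every window of the given size, aligned to that size,
    that contains at least one byte explicitly accessed by the instruction.
    """
    first = (address // window) * window
    last = ((address + size - 1) // window + 1) * window
    return range(first, last)
```

For example, an 8-byte SIMD read at 0x1008 may access any byte in 0x1000 to 0x100F, and a 16-byte LDNP read at 0x1040 from Device memory with the Gathering attribute may access any byte in 0x1000 to 0x107F.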

Gathering
In the Device memory attribute:
G
Indicates that the location has the Gathering attribute.
nG
Indicates that the location does not have the Gathering attribute, meaning it is non-Gathering.
The Gathering attribute determines whether it is permissible for either:
•

Multiple memory accesses of the same type, read or write, to the same memory location to be merged into a
single transaction.

•

Multiple memory accesses of the same type, read or write, to different memory locations to be merged into
a single memory transaction on an interconnect.

Note
This also applies to writebacks from the cache, whether caused by a Natural eviction or as a result of a cache
maintenance instruction.


For memory types with the Gathering attribute, either of these behaviors is permitted, provided that the ordering and
coherency rules of the memory location are followed.
For memory types with the non-Gathering attribute, neither of these behaviors is permitted. As a result:
•

The number of memory accesses that are made corresponds to the number that would be generated by a
simple sequential execution of the program.

•

All accesses occur at their programmed size, except that there is no requirement for the memory system beyond
the PE to be able to identify the elements accessed by multi-register Load/Store instructions. See
Multi-register loads and stores that access Device memory on page B2-97.

Gathering between memory accesses separated by a memory barrier that affects those memory accesses is not
permitted. This applies if one memory access is in Group A and one memory access is in Group B. That is, gathering
is not permitted between a memory access in Group A and a memory access in Group B if the two accesses are
separated by a barrier that affects at least one of the accesses.
Gathering between two memory accesses generated by a Load-Acquire/Store-Release is not permitted.
A read from a memory location with the non-Gathering attribute cannot come from a cache or a buffer, but must
come from the endpoint for that address in the memory system. Typically this is a peripheral or physical memory.

Note
•
A read from a memory location with the Gathering attribute can come from intermediate buffering of a
previous write, provided that:
—
The accesses are not separated by a DMB or DSB barrier that affects both of the accesses, for example if
one access is in Group A and the other is in Group B.
—
The accesses are not separated by other ordering constructions that require that the accesses are in
order. Such a construction might be a combination of Load-Acquire and Store-Release.
—
The accesses are not generated by a Store-Release instruction.
•
The ARM architecture only defines programmer visible behavior. Therefore, gathering can be performed if
a programmer cannot tell whether gathering has occurred.

An implementation is permitted to perform an access with the Gathering attribute in a manner consistent with the
requirements specified by the non-Gathering attribute.
An implementation is not permitted to perform an access with the non-Gathering attribute in a manner consistent
with the relaxations allowed by the Gathering attribute.
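
The conditions under which two accesses of the same type can be merged can be collected into a single predicate. The following Python sketch is a simplified summary of the rules in this subsection, not an architectural definition. The function name and parameter names are hypothetical.

```python
def gathering_permitted(has_gathering_attribute,
                        separated_by_affecting_barrier=False,
                        from_acquire_release=False):
    """Return True if two memory accesses of the same type may be merged.

    has_gathering_attribute: the location has the G attribute, not nG.
    separated_by_affecting_barrier: a memory barrier that affects the
        accesses lies between them, so one is in Group A and one in Group B.
    from_acquire_release: the two accesses are generated by a
        Load-Acquire/Store-Release.
    """
    if not has_gathering_attribute:
        return False
    if separated_by_affecting_barrier or from_acquire_release:
        return False
    return True
```

A True result means only that merging is permitted. An implementation can still perform the accesses as if the location were non-Gathering.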

Reordering
In the Device memory attribute:
R
Indicates that the location has the Reordering attribute.
nR
Indicates that the location does not have the Reordering attribute, meaning it is non-Reordering.
For all memory types with the non-Reordering attribute, the order of memory accesses arriving at a single peripheral
of IMPLEMENTATION DEFINED size, as defined by the peripheral, must be the same order that occurs in a simple
sequential execution of the program. That is, the accesses appear in program order. This ordering applies to all
accesses using any of the memory types with the non-Reordering attribute. As a result, if there is a mixture of
Device-nGnRE and Device-nGnRnE accesses to the same peripheral, these occur in program order. If the memory
accesses are not to a peripheral, then this attribute imposes no restrictions.

Note

•

The IMPLEMENTATION DEFINED size of the single peripheral is the same as applies for the ordering guarantee
provided by the DMB instruction.

•

The ARM architecture only defines programmer visible behavior. Therefore, reordering can be performed if
a programmer cannot tell whether reordering has occurred.

An implementation is permitted to perform an access with the Reordering attribute in a manner consistent with the
requirements specified by the non-Reordering attribute.
An implementation is not permitted to perform an access with the non-Reordering attribute in a manner
consistent with the relaxations allowed by the Reordering attribute.
The non-Reordering attribute does not require any additional ordering, other than that which applies to Normal
memory, between:
•

Accesses with the non-Reordering attribute and accesses with the Reordering attribute.

•

Accesses with the non-Reordering attribute and accesses to Normal memory.

•

Accesses with the non-Reordering attribute and accesses to different peripherals of IMPLEMENTATION
DEFINED size.

The non-Reordering attribute has no effect on the ordering of cache maintenance instructions, even if the memory
location specified in the instruction has the non-Reordering attribute.

Early Write Acknowledgement
In the Device memory attribute:
E
Indicates that the location has the Early Write Acknowledgement attribute.
nE
Indicates that the location has the No Early Write Acknowledgement attribute.
Early Write Acknowledgement is a hint to the platform memory system. Assigning the No Early Write
Acknowledgement attribute to a Device memory location recommends that only the endpoint of the write access
returns a write acknowledgement of the access, and that no earlier point in the memory system returns a write
acknowledge. This means that a DSB barrier, executed by the PE that performed the write to the No Early Write
Acknowledgement location, completes only after the write has reached its endpoint in the memory system.
Typically, this endpoint is a peripheral or physical memory.
When the Early Write Acknowledgement attribute is assigned to a Device memory location, there is no such
recommendation for the handling of accesses to that location.

Note
•

The Early Write Acknowledgement hint has no effect on the ordering rules. The purpose of signalling no
Early Write Acknowledgement is to signal to the interconnect that the peripheral requires the ability to signal
the acknowledgement. The No Early Write Acknowledgement signal also provides an additional semantic that can
be interpreted by the driver that is accessing the peripheral.

•

This attribute is treated as a hint, as the exact nature of the interconnects accessed by a PE is outside the scope
of the ARM architecture definition, and not all interconnects provide a mechanism to ensure that a write has
reached the physical endpoint of the memory system.

•

ARM recommends that writes with the No Early Write Acknowledgement hint are used for PCIe
configuration writes. However, the mechanisms by which PCIe configuration writes are identified are
IMPLEMENTATION DEFINED.

•

ARM strongly recommends that the Early Write Acknowledgement hint is not ignored by a PE, but is made
available for use by the system.

Because the No Early Write Acknowledgement attribute is a hint:

•

An implementation is permitted to perform an access with the Early Write Acknowledgement attribute in a
manner consistent with the requirements specified by the No Early Write Acknowledgement attribute.

•

An implementation is permitted to perform an access with the No Early Write Acknowledgement attribute
in a manner consistent with the relaxations allowed by the Early Write Acknowledgement attribute.
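
The substitution rules for the three Device attributes can be summarized in a small model. The following is a minimal sketch, not an architectural definition: the type device_attr and the function device_attr_may_perform_as() are hypothetical names introduced here for illustration. It encodes that Gathering and Reordering accesses may be performed in the stricter manner but not the reverse, and that Early Write Acknowledgement, being a hint, may be substituted in either direction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three Device memory attributes. */
typedef struct {
    bool gathering;   /* true = G, false = nG */
    bool reordering;  /* true = R, false = nR */
    bool early_ack;   /* true = E, false = nE */
} device_attr;

/*
 * Returns true if an access requested with attributes `req` may be
 * performed in the manner described by `actual`.
 */
bool device_attr_may_perform_as(device_attr req, device_attr actual)
{
    /* nG must not be relaxed to G; G may be tightened to nG. */
    if (!req.gathering && actual.gathering)
        return false;
    /* nR must not be relaxed to R; R may be tightened to nR. */
    if (!req.reordering && actual.reordering)
        return false;
    /* E and nE are hints, so neither direction is prohibited. */
    (void)req.early_ack;
    (void)actual.early_ack;
    return true;
}
```

For example, under this model a Device-GRE access may be performed as Device-nGnRnE, but a Device-nGnRnE access may not be performed as Device-GRE.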

Multi-register loads and stores that access Device memory
For all instructions that load or store more than one general-purpose register from an Exception level there is no
requirement for the memory system beyond the PE to be able to identify the size of the elements accessed by these
load or store instructions.
For all instructions that load or store more than one general-purpose register from an Exception level the order in
which the registers are accessed is not defined by the architecture. This applies even to accesses to any type of
Device memory.
For all instructions that load or store one or more floating-point and SIMD registers from an Exception level there is
no requirement for the memory system beyond the PE to be able to identify the size of the element accessed by these
load or store instructions, even for access to any type of Device memory.

B2.9

Mismatched memory attributes
Memory attributes are controlled by privileged software. For more information, see Chapter D5 The AArch64
Virtual Memory System Architecture.
Physical memory locations are accessed with mismatched attributes if all accesses to the location do not use a
common definition of all of the following attributes of that location:
•
Memory type, Device or Normal.
•
Shareability.
•
Cacheability, for the same level of the inner or outer cache, but excluding any cache allocation hints.
Collectively these are referred to as memory attributes.

Note
The terms location and memory location refer to any byte within the current coherency granule and are used
interchangeably.
The following rules apply when a physical memory location is accessed with mismatched attributes:
1.

When a memory location is accessed with mismatched attributes the only software visible effects are one or
more of the following:
•

Uniprocessor semantics for reads and writes to that memory location might be lost. This means:
—

A read of the memory location by one agent might not return the value most recently written to
that memory location by the same agent.

—

Multiple writes to the memory location by one agent with different memory attributes might not
be ordered in program order.

•

There might be a loss of coherency when multiple agents attempt to access a memory location.

•

There might be a loss of properties derived from the memory type, as described in later bullets in this
section.

•

If all Load-Exclusive/Store-Exclusive instructions executed across all threads to access a given
memory location do not use consistent memory attributes, the exclusive monitor state becomes
UNKNOWN.

•

Bytes written without the Write-Back cacheable attribute within the same Write-Back granule as bytes
written with the Write-Back cacheable attribute might have their values reverted to the old values as
a result of cache Write-Back.

2.

The loss of properties associated with mismatched memory type attributes refers only to the following
properties of Device memory that are additional to the properties of Normal memory:
•
Prohibition of speculative read accesses.
•
Prohibition on Gathering.
•
Prohibition on Re-ordering.
•
The Write Acknowledgement guarantee with respect to the endpoint of the access.
If the only memory type mismatch associated with a memory location across all users of the memory location
is between different types of Device memory, then all accesses might take the properties of the weakest
Device memory type.

3.

If all aliases of a memory location that permit write access to the location assign the same shareability and
cacheability attributes to that location, and all these aliases use a definition of the shareability attribute that
includes all the threads of execution that can access the location, then any agent that reads the memory
location using these shareability and cacheability attributes accesses it coherently, to the extent required by
that common definition of the memory attributes.

4.

The possible loss of software-visible effects caused by mismatched attributes for a memory location are
defined more precisely if all of the mismatched attributes define the memory location as one of:
•
Any Device memory type.
•
Normal Inner Non-cacheable, Outer Non-cacheable memory.

In these cases, the only permitted software-visible effects of the mismatched attributes are one or more of the
following:
•

Possible loss of properties described in rule 2 on page B2-98, derived from the memory type when
multiple agents attempt to access the memory location.

•

Possible reordering of memory transactions to the memory location with different memory attributes,
potentially leading to a loss of coherency or uniprocessor semantics. Any possible loss of coherency
or uniprocessor semantics can be avoided by inserting DMB barrier instructions between accesses to the
same memory location that might use different attributes.

5.

If the mismatched attributes for a memory location all assign the same shareability attribute to the location,
any loss of uniprocessor semantics or coherency within a shareability domain can be avoided by use of
software cache management. To do so, software must use the techniques that are required for the software
management of the coherency of cacheable locations between agents in different shareability domains. This
means:
•

Before writing to a location not using the Write-Back attribute, software must invalidate, or clean, a
location from the caches if any agent might have written to the location with the Write-Back attribute.
This avoids the possibility of overwriting the location with stale data.

•

After writing to a location with the Write-Back attribute, software must clean the location from the
caches, to make the write visible to external memory.

•

Before reading the location with a cacheable attribute, software must invalidate the location from the
caches, to ensure that any value held in the caches reflects the last value made visible in external
memory.

In all cases:
•

Location refers to any byte within the current coherency granule.

•

A clean and invalidate instruction can be used instead of a clean instruction, or instead of an invalidate
instruction.

•

In the sequences outlined in this section, all cache maintenance instructions and memory transactions
must be completed, or ordered by the use of barrier operations, if they are not naturally ordered by the
use of a common address, see Ordering and completion of data and instruction cache instructions on
page D4-1689.

Note
With software management of coherency, race conditions can cause loss of data. A race condition occurs
when different agents write simultaneously to bytes that are in the same location, and the invalidate, write,
clean sequence of one agent overlaps with the equivalent sequence of another agent. A race condition also
occurs if the first operation of either sequence is a clean, rather than an invalidate.
6.

If the mismatched attributes for a location mean that multiple cacheable accesses to the location might be
made with different shareability attributes, then coherency is guaranteed only if each processing element that
accesses the location with a cacheable attribute performs a clean and invalidate of the location before and
after accessing that location.

Note
The Note in rule 5 on page B2-99 about possible race conditions also applies to this rule.
In addition, if multiple agents attempt to use Load-Exclusive or Store-Exclusive instructions to access a location,
and the accesses from the different agents have different memory attributes associated with the location, the
exclusive monitor state becomes UNKNOWN.
ARM strongly recommends that software does not use mismatched attributes for aliases of the same location. An
implementation might not optimize the performance of a system that uses mismatched aliases.
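
Rule 2 above says that a mismatch between different Device memory types can cause all accesses to take the properties of the weakest Device memory type. The following minimal sketch shows one way to compute that type. It assumes that "weakest" means the type that permits every relaxation permitted by either alias, which is an interpretation and not a definition from this manual. The names device_attr and device_attr_weakest() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a Device memory type. */
typedef struct {
    bool gathering;   /* true = G, false = nG */
    bool reordering;  /* true = R, false = nR */
    bool early_ack;   /* true = E, false = nE */
} device_attr;

/*
 * Assumed meaning of "weakest": a relaxation applies to the result
 * if either of the mismatched aliases permits it.
 */
device_attr device_attr_weakest(device_attr a, device_attr b)
{
    device_attr w = {
        .gathering  = a.gathering  || b.gathering,
        .reordering = a.reordering || b.reordering,
        .early_ack  = a.early_ack  || b.early_ack,
    };
    return w;
}
```

Under this assumption, a location aliased as both Device-nGnRnE and Device-nGRE might be treated as Device-nGRE for all accesses.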

B2.10

Synchronization and semaphores
ARMv8 provides non-blocking synchronization of shared memory, using synchronization primitives. The
information in this section about memory accesses by synchronization primitives applies to accesses to both Normal
and Device memory.

Note
Use of the ARMv8 synchronization primitives scales for multiprocessing system designs.
Table B2-2 shows the synchronization primitives and the associated CLREX instruction.
Table B2-2 Synchronization primitives and associated instruction

Function              Instruction
Load-Exclusive
    Pair (a)          LDXP, LDAXP
    Register (a)      LDXR, LDAXR
    Halfword          LDXRH, LDAXRH
    Byte              LDXRB, LDAXRB
Store-Exclusive
    Pair (a)          STXP, STLXP
    Register (a)      STXR, STLXR
    Halfword          STXRH, STLXRH
    Byte              STXRB, STLXRB
Clear-Exclusive       CLREX

a. The instruction operates on a doubleword if accessing an X register, or on a word if accessing a W register.

The model for the use of a Load-Exclusive/Store-Exclusive instruction pair accessing a non-aborting memory
address x is:
•

The Load-Exclusive instruction reads a value from memory address x.

•

The corresponding Store-Exclusive instruction succeeds in writing back to memory address x only if no other
observer, process, or thread has performed a more recent store to address x. The Store-Exclusive instruction
returns a status bit that indicates whether the memory write succeeded.

A Load-Exclusive instruction marks a small block of memory for exclusive access. The size of the marked block is
IMPLEMENTATION DEFINED, see Marking and the size of the marked memory block on page B2-105. A
Store-Exclusive instruction to any address in the marked block clears the marking.
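
The read, modify, conditional write-back model described above has the same shape as a compare-and-exchange retry loop. The following is a minimal sketch in portable C11, not an instruction sequence from this manual. The function name atomic_add_retry() is hypothetical. Whether a toolchain implements the loop with a Load-Exclusive/Store-Exclusive pair is an assumption about the compiler and target, and is not specified here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Atomically adds `delta` to *addr and returns the previous value.
 * The weak compare-exchange may fail even when no other writer has
 * intervened, in the same way that a Store-Exclusive can return a
 * failure status, so the operation is retried until it succeeds.
 */
uint32_t atomic_add_retry(_Atomic uint32_t *addr, uint32_t delta)
{
    uint32_t old = atomic_load_explicit(addr, memory_order_relaxed);

    /* On failure, `old` is refreshed with the current value of *addr. */
    while (!atomic_compare_exchange_weak_explicit(
               addr, &old, old + delta,
               memory_order_relaxed, memory_order_relaxed)) {
        /* Retry with the updated value of `old`. */
    }
    return old;
}
```

The loop body is deliberately empty: keeping the work between the read and the conditional write small reduces the chance that the write fails.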

Note
In this section, the term PE includes any observer that can generate a Load-Exclusive or a Store-Exclusive
instruction.

B2.10.1

Exclusive access instructions and Non-shareable memory locations
For memory locations that do not have the Shareable attribute, the exclusive access instructions rely on a local
monitor that marks any address from which the PE executes a Load-Exclusive instruction. Any non-aborted attempt
by the same PE to use a Store-Exclusive instruction to modify any address is guaranteed to clear the marking.
A Load-Exclusive instruction performs a load from memory, and:
•
The executing PE marks the physical memory address for exclusive access.
•
The local monitor of the executing PE transitions to the Exclusive Access state.
A Store-Exclusive instruction performs a conditional store to memory that depends on the state of the local monitor:
If the local monitor is in the Exclusive Access state
•

If the address of the Store-Exclusive instruction is the same as the address that has been
marked in the monitor by an earlier Load-Exclusive instruction, then the store occurs.
Otherwise, it is IMPLEMENTATION DEFINED whether the store occurs.

•

A status value is returned to a register:
—
If the store took place the status value is 0.
—
Otherwise, the status value is 1.

•

The local monitor of the executing PE transitions to the Open Access state.

If the local monitor is in the Open Access state
•
No store takes place.
•
A status value of 1 is returned to a register.
•
The local monitor remains in the Open Access state.
The Store-Exclusive instruction defines the register to which the status value is returned.
When a PE writes using any instruction other than a Store-Exclusive instruction:
•

If the write is to a physical address that is not tagged by its local monitor it is IMPLEMENTATION DEFINED
whether the write affects the state of the local monitor.

•

If the write is to a physical address that is tagged by its local monitor it is IMPLEMENTATION DEFINED whether
the write affects the state of the local monitor.

It is IMPLEMENTATION DEFINED whether a store to a marked physical address causes a mark in the local monitor to
be cleared if that store is by an observer other than the one that caused the physical address to be marked.
Figure B2-4 shows the state machine for the local monitor and the effect of each of the operations shown in the
figure.
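[Figure B2-4 is a state diagram with two states, Open Access and Exclusive Access. Its transitions are:]
In the Open Access state:
•
LoadExcl(x) causes a transition to the Exclusive Access state.
•
CLREX, StoreExcl(x), and Store(x) leave the monitor in the Open Access state.
In the Exclusive Access state:
•
LoadExcl(x) leaves the monitor in the Exclusive Access state.
•
CLREX, Store(Marked_address)*, Store(!Marked_address)*, StoreExcl(Marked_address), and
StoreExcl(!Marked_address) cause a transition to the Open Access state.
•
Store(!Marked_address) and Store(Marked_address)* leave the monitor in the Exclusive Access state.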

Operations marked * are possible alternative IMPLEMENTATION DEFINED options.
In the diagram: LoadExcl represents any Load-Exclusive instruction
StoreExcl represents any Store-Exclusive instruction
Store represents any other store instruction.
Any LoadExcl operation updates the marked address to the most significant bits of the address x used for the operation.

Figure B2-4 Local monitor state machine diagram
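
The local monitor behavior can be explored with a small software model. The following is a minimal sketch under stated assumptions, not a description of any implementation. All names are hypothetical. Where the architecture leaves behavior IMPLEMENTATION DEFINED, the model makes one choice: a Store-Exclusive to an address outside the marked block does not store and returns 1. Ordinary stores are not modeled.

```c
#include <assert.h>
#include <stdint.h>

enum monitor_state { OPEN_ACCESS, EXCLUSIVE_ACCESS };

/* Hypothetical local monitor model for one PE. */
typedef struct {
    enum monitor_state state;
    uint64_t marked_block;  /* marked address with the low bits dropped */
    unsigned erg_bits;      /* number of ignored low-order address bits */
} local_monitor;

/* Load-Exclusive: mark the block and enter the Exclusive Access state. */
void lm_load_excl(local_monitor *m, uint64_t addr)
{
    assert(m->erg_bits < 64);
    m->marked_block = addr >> m->erg_bits;
    m->state = EXCLUSIVE_ACCESS;
}

/* Store-Exclusive: returns 0 if the store takes place, otherwise 1. */
int lm_store_excl(local_monitor *m, uint64_t addr)
{
    if (m->state == OPEN_ACCESS)
        return 1;                      /* no store, remains Open Access */

    /* Model choice: only a store to the marked block takes place. */
    int status = ((addr >> m->erg_bits) == m->marked_block) ? 0 : 1;

    m->state = OPEN_ACCESS;            /* always returns to Open Access */
    return status;
}

/* CLREX: return to the Open Access state. */
void lm_clrex(local_monitor *m)
{
    m->state = OPEN_ACCESS;
}
```

In this model a second Store-Exclusive without an intervening Load-Exclusive always returns 1, because the first one leaves the monitor in the Open Access state.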

For more information about marking see Marking and the size of the marked memory block on page B2-105.

Note
For the local monitor state machine, as shown in Figure B2-4 on page B2-101:
•

The IMPLEMENTATION DEFINED options for the local monitor are consistent with the local monitor being
constructed so that it does not hold any physical address, but instead treats any access as matching the address
of the previous Load-Exclusive instruction.

•

A local monitor implementation can be unaware of Load-Exclusive and Store-Exclusive instructions from
other PEs.

•

The architecture does not require a load instruction by another PE, that is not a Load-Exclusive instruction,
to have any effect on the local monitor.

•

It is IMPLEMENTATION DEFINED whether the transition from Exclusive Access to Open Access state occurs
when the Store or StoreExcl is from another observer.

Changes to the local monitor state resulting from speculative execution
The architecture permits a local monitor to transition to the Open Access state as a result of speculation, or from
some other cause. This is in addition to the transitions to Open Access state caused by the architectural execution
of an operation shown in Figure B2-4 on page B2-101.
An implementation must ensure that:
•

The local monitor cannot be seen to transition to the Exclusive Access state except as a result of the
architectural execution of one of the operations shown in Figure B2-4 on page B2-101.

•

Any transition of the local monitor to the Open Access state not caused by the architectural execution of an
operation shown in Figure B2-4 on page B2-101 must not indefinitely delay forward progress of execution.

B2.10.2

Exclusive access instructions and Shareable memory locations
For memory locations that have the Shareable attribute, exclusive access instructions rely on:
•

A local monitor for each PE in the system, that marks any address from which the PE executes a
Load-Exclusive. The local monitor operates as described in Exclusive access instructions and Non-shareable
memory locations on page B2-101, except that for Shareable memory any Store-Exclusive is then subject to
checking by the global monitor if it is described in that section as doing at least one of the following:
—
Updating memory.
—
Returning a status value of 0.
The local monitor can ignore accesses from other PEs in the system.

•

A global monitor that marks a physical address as exclusive access for a particular PE. This marking is used
later to determine whether a Store-Exclusive to that address that has not been failed by the local monitor can
occur. Any successful write to the marked block by any other observer in the shareability domain of the
memory location is guaranteed to clear the marking. For each PE in the system, the global monitor:
—
Can hold one marked block.
—
Maintains a state machine for each marked block it can hold.

Note
For each PE, the architecture only requires global monitor support for a single marked address. Any situation
that might benefit from the use of multiple marked addresses on a single PE is UNPREDICTABLE, see
Load-Exclusive and Store-Exclusive instruction usage restrictions on page B2-106.

Note
The global monitor can either reside within the PE, or exist as a secondary monitor at the memory interfaces. The
IMPLEMENTATION DEFINED aspects of the monitors mean that the global monitor and local monitor can be combined
into a single unit, provided that the unit performs the global monitor and local monitor functions defined in this
manual.
For Shareable locations of memory, in some implementations and for some memory types, the properties of the
global monitor require functionality outside the PE. Some system implementations might not implement this
functionality for all locations of memory. In particular, this can apply to:
•
Any type of memory in the system implementation that does not support hardware cache coherency.
•
Non-cacheable memory, or memory treated as Non-cacheable, in an implementation that does support
hardware cache coherency.
In such a system, it is defined by the system:
•
Whether the global monitor is implemented.
•
If the global monitor is implemented, which address ranges or memory types it monitors.

Note
To support the use of the Load-Exclusive/Store-Exclusive mechanism when address translation is disabled, a
system might define at least one location of memory, of at least the size of the translation granule, in the system
memory map to support the global monitor for all ARM PEs within a common Inner Shareable domain. However,
this is not an architectural requirement. Therefore, architecturally-compliant software that requires mutual
exclusion must not rely on using the Load-Exclusive/Store-Exclusive mechanism, and must instead use a software
algorithm such as Lamport’s Bakery algorithm to achieve mutual exclusion.
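
As an illustration of the kind of software algorithm the Note above refers to, the following is a minimal sketch of Lamport's Bakery algorithm in C11. It uses only sequentially consistent atomic loads and stores, with no read-modify-write operations. All names, and the fixed participant count BAKERY_THREADS, are hypothetical. How the atomic accesses map onto instructions and memory types when address translation is disabled depends on the toolchain and platform and is not addressed here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BAKERY_THREADS 4u   /* hypothetical number of participants */

typedef struct {
    atomic_bool choosing[BAKERY_THREADS];
    atomic_uint number[BAKERY_THREADS];   /* 0 means "not waiting" */
} bakery_lock;

void bakery_init(bakery_lock *l)
{
    for (unsigned i = 0; i < BAKERY_THREADS; i++) {
        atomic_init(&l->choosing[i], false);
        atomic_init(&l->number[i], 0);
    }
}

void bakery_acquire(bakery_lock *l, unsigned id)
{
    assert(id < BAKERY_THREADS);

    /* Take a ticket one larger than any ticket currently held. */
    atomic_store(&l->choosing[id], true);
    unsigned max = 0;
    for (unsigned j = 0; j < BAKERY_THREADS; j++) {
        unsigned n = atomic_load(&l->number[j]);
        if (n > max)
            max = n;
    }
    atomic_store(&l->number[id], max + 1);
    atomic_store(&l->choosing[id], false);

    /* Wait for every participant with a smaller (ticket, id) pair. */
    for (unsigned j = 0; j < BAKERY_THREADS; j++) {
        if (j == id)
            continue;
        while (atomic_load(&l->choosing[j])) {
            /* j is still choosing its ticket */
        }
        for (;;) {
            unsigned nj = atomic_load(&l->number[j]);
            unsigned ni = atomic_load(&l->number[id]);
            if (nj == 0 || nj > ni || (nj == ni && j > id))
                break;
        }
    }
}

void bakery_release(bakery_lock *l, unsigned id)
{
    assert(id < BAKERY_THREADS);
    atomic_store(&l->number[id], 0);
}
```

Each participant calls bakery_acquire() with its own fixed index before the critical section and bakery_release() with the same index afterwards. The sketch does not bound ticket growth under sustained contention, which a production implementation would need to consider.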
If the global monitor is not implemented for an address range or memory type, then performing a
Load-Exclusive/Store-Exclusive instruction to such a location has one or more of the following effects:
•
The instruction generates an external abort.
•
The instruction generates an IMPLEMENTATION DEFINED MMU fault. This is reported using the Fault Status
code of ESR_ELx.DFSC = 110101.
•
The instruction is treated as a NOP.
•
The Load-Exclusive instruction is treated as if it were accessing a Non-shareable location, but the state of the
local monitor becomes UNKNOWN.
•
The Store-Exclusive instruction is treated as if it were accessing a Non-shareable location, but the state of the
local monitor becomes UNKNOWN.
•
The value held in the result register of the Store-Exclusive instruction becomes UNKNOWN.
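
An exception handler can recognize the IMPLEMENTATION DEFINED fault listed above by testing the fault status code. The following is a minimal sketch. It assumes that DFSC occupies bits[5:0] of the ESR_ELx value, as given in the ESR_ELx register description, and that the caller has already established that the exception is a Data Abort. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: DFSC is held in bits[5:0] of ESR_ELx for a Data Abort. */
#define ESR_DFSC_MASK            UINT64_C(0x3F)
/* Fault Status code 0b110101, as quoted in the list above. */
#define DFSC_IMPDEF_EXCLUSIVE    UINT64_C(0x35)

/*
 * Returns true if the DFSC field of `esr` reports the fault that can
 * be generated by an exclusive access to an unsupported location.
 * The exception class is not checked here.
 */
bool esr_reports_exclusive_fault(uint64_t esr)
{
    return (esr & ESR_DFSC_MASK) == DFSC_IMPDEF_EXCLUSIVE;
}
```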
In addition, for write transactions generated by non-PE observers that do not implement exclusive accesses or other
atomic access mechanisms, the effect that writes have on the global and local monitors used by ARM PEs is
IMPLEMENTATION DEFINED. The writes might not clear the global monitors of other PEs for:
•
Some address ranges.
•
Some memory types.

Operation of the global monitor
A Load-Exclusive instruction from Shareable memory performs a load from memory, and causes the physical
address of the access to be marked as exclusive access for the requesting PE. This access also causes the exclusive
access mark to be removed from any other physical address that has been marked by the requesting PE.

Note
The global monitor only supports a single outstanding exclusive access to Shareable memory per PE.
A Load-Exclusive instruction by one PE has no effect on the global monitor state for any other PE.

A Store-Exclusive instruction performs a conditional store to memory:
•

The store is guaranteed to succeed only if the physical address accessed is marked as exclusive access for the
requesting PE and both the local monitor and the global monitor state machines for the requesting PE are in
the Exclusive Access state. In this case:
—

A status value of 0 is returned to a register to acknowledge the successful store.

—

The final state of the global monitor state machine for the requesting PE is IMPLEMENTATION DEFINED.

—

If the address accessed is marked for exclusive access in the global monitor state machine for any other
PE then that state machine transitions to Open Access state.

•

If no address is marked as exclusive access for the requesting PE, the store does not succeed:
—
A status value of 1 is returned to a register to indicate that the store failed.
—
The global monitor is not affected and remains in Open Access state for the requesting PE.

•

If a different physical address is marked as exclusive access for the requesting PE, it is IMPLEMENTATION
DEFINED whether the store succeeds or not:
—

If the store succeeds a status value of 0 is returned to a register, otherwise a value of 1 is returned.

—

If the global monitor state machine for the PE was in the Exclusive Access state before the
Store-Exclusive instruction it is IMPLEMENTATION DEFINED whether that state machine transitions to
the Open Access state.

The Store-Exclusive instruction defines the register to which the status value is returned.
In a shared memory system, the global monitor implements a separate state machine for each PE in the system. The
state machine for accesses to Shareable memory by PE(n) can respond to all the Shareable memory accesses visible
to it. This means it responds to:
•
Accesses generated by PE(n).
•
Accesses generated by the other observers in the shareability domain of the memory location. These accesses
are identified as (!n).
In a shared memory system, the global monitor implements a separate state machine for each observer that can
generate a Load-Exclusive or a Store-Exclusive instruction in the system.
Clear global monitor event
Whenever the global monitor state for a PE changes from Exclusive Access to Open Access, an event is generated
and held in the Event register for that PE. This register is used by the Wait for Event mechanism, see Mechanisms
for entering a low-power state on page D1-1533.
Figure B2-5 on page B2-105 shows the state machine for PE(n) in a global monitor.

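[Figure B2-5 is a state diagram with two states, Open Access and Exclusive Access. Its transitions are:]
In the Open Access state:
•
LoadExcl(x,n) causes a transition to the Exclusive Access state.
•
CLREX(n), CLREX(!n), LoadExcl(x,!n), StoreExcl(x,n), StoreExcl(x,!n), Store(x,n), and Store(x,!n) leave
the monitor in the Open Access state.
In the Exclusive Access state:
•
LoadExcl(x,n) leaves the monitor in the Exclusive Access state.
•
StoreExcl(Marked_address,!n)‡, Store(Marked_address,!n), StoreExcl(Marked_address,n)*,
StoreExcl(!Marked_address,n)*, Store(Marked_address,n)*, and CLREX(n)* cause a transition to the
Open Access state.
•
StoreExcl(Marked_address,!n)‡, Store(!Marked_address,n), StoreExcl(Marked_address,n)*,
StoreExcl(!Marked_address,n)*, Store(Marked_address,n)*, CLREX(n)*, StoreExcl(!Marked_address,!n),
Store(!Marked_address,!n), and CLREX(!n) leave the monitor in the Exclusive Access state.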

‡StoreExcl(Marked_address,!n) clears the monitor only if the StoreExcl updates memory
Operations marked * are possible alternative IMPLEMENTATION DEFINED options.
In the diagram: LoadExcl represents any Load-Exclusive instruction
StoreExcl represents any Store-Exclusive instruction
Store represents any other store instruction.
Any LoadExcl operation updates the marked address to the most significant bits of the address x used for the operation.

Figure B2-5 Global monitor state machine diagram for PE(n) in a multiprocessor system
For more information about marking see Marking and the size of the marked memory block.
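
The interaction between PEs in the global monitor can be illustrated with a small software model. The following is a minimal sketch, not a description of any implementation. All names and the system size NUM_PES are hypothetical. It models only the global monitor and ignores the local monitor. Where behavior is IMPLEMENTATION DEFINED, the model chooses the following: a successful Store-Exclusive returns the requesting PE to Open Access, a Store-Exclusive to a different block fails and also returns the PE to Open Access, and an ordinary store does not affect the requesting PE's own marking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PES 2u   /* hypothetical number of PEs in the system */

typedef struct {
    bool     exclusive[NUM_PES];     /* true = Exclusive Access state */
    uint64_t marked_block[NUM_PES];  /* marked address, low bits dropped */
    unsigned erg_bits;               /* number of ignored address bits */
} global_monitor;

static uint64_t gm_block(const global_monitor *g, uint64_t pa)
{
    return pa >> g->erg_bits;
}

/* A write that updates memory clears matching marks of other PEs. */
static void gm_clear_others(global_monitor *g, unsigned pe, uint64_t pa)
{
    for (unsigned i = 0; i < NUM_PES; i++)
        if (i != pe && g->exclusive[i] &&
            g->marked_block[i] == gm_block(g, pa))
            g->exclusive[i] = false;
}

/* Load-Exclusive by `pe`: replaces any earlier mark held by that PE. */
void gm_load_excl(global_monitor *g, unsigned pe, uint64_t pa)
{
    assert(pe < NUM_PES && g->erg_bits < 64);
    g->marked_block[pe] = gm_block(g, pa);
    g->exclusive[pe] = true;
}

/* Store-Exclusive by `pe`: returns 0 on success, 1 on failure. */
int gm_store_excl(global_monitor *g, unsigned pe, uint64_t pa)
{
    assert(pe < NUM_PES);
    bool ok = g->exclusive[pe] && g->marked_block[pe] == gm_block(g, pa);

    g->exclusive[pe] = false;        /* model choice, see lead-in */
    if (!ok)
        return 1;
    gm_clear_others(g, pe, pa);      /* memory was updated */
    return 0;
}

/* Any other store by `pe`: always updates memory. */
void gm_store(global_monitor *g, unsigned pe, uint64_t pa)
{
    assert(pe < NUM_PES);
    gm_clear_others(g, pe, pa);
}
```

If two PEs both perform a Load-Exclusive to the same marked block, the model lets only the first Store-Exclusive succeed. The second returns 1 and that PE must repeat its Load-Exclusive.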

Note
For the global monitor state machine, as shown in Figure B2-5:
•

The architecture does not require a load instruction by another PE, that is not a Load-Exclusive instruction,
to have any effect on the global monitor.

•

Whether a Store-Exclusive instruction successfully updates memory or not depends on whether the address
accessed matches the marked Shareable memory address for the PE issuing the Store-Exclusive instruction,
and whether the local and global monitors are in the exclusive state. For this reason, Figure B2-5 only shows
how the operations by (!n) cause state transitions of the state machine for PE(n).

•

A Load-Exclusive instruction can only update the marked Shareable memory address for the PE issuing the
Load-Exclusive instruction.

•

When the global monitor is in the Exclusive Access state, it is IMPLEMENTATION DEFINED whether a CLREX
instruction causes the global monitor to transition from Exclusive Access to Open Access state.

•

It is IMPLEMENTATION DEFINED:
—

Whether a modification to a Non-shareable memory location can cause a global monitor to transition
from Exclusive Access to Open Access state.

—

Whether a Load-Exclusive instruction to a Non-shareable memory location can cause a global monitor
to transition from Open Access to Exclusive Access state.

B2.10.3

Marking and the size of the marked memory block
When a Load-Exclusive instruction is executed, the resulting marked block ignores the least significant bits of the
64-bit memory address.
When a LDXR instruction is executed, a marked block of size 2^a bytes is created by ignoring the least significant
a bits of the memory address. A marked address is any address within this marked block. For example, in an implementation
where a is 4, a successful LDXRB of address 0x341B4 defines a marked block using bits[47:4] of the address. This
means that the four words of memory from 0x341B0 to 0x341BF are marked for exclusive access.

The size of the marked memory block is called the Exclusives Reservation Granule. The Exclusives Reservation
Granule is IMPLEMENTATION DEFINED in the range 4 - 512 words:
•
4 words in an implementation where a is 4.
•
512 words in an implementation where a is 11.
In some implementations the CTR identifies the Exclusives Reservation Granule, see CTR_EL0. Otherwise,
software must assume that the maximum Exclusives Reservation Granule, 512 words, is implemented.
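
The arithmetic behind the marked block and the Exclusives Reservation Granule can be checked with a few helper functions. The following is a minimal sketch with hypothetical names. The first three functions follow directly from the description above, where a is the number of ignored low-order address bits. The last function assumes the CTR_EL0 layout given in the CTR_EL0 register description: an ERG field in bits[23:20] holding log2 of the granule size in words, with 0 meaning that the information is not provided.

```c
#include <assert.h>
#include <stdint.h>

/* First byte address of the marked block containing `addr`. */
uint64_t marked_block_base(uint64_t addr, unsigned a)
{
    assert(a >= 4 && a <= 11);
    return addr & ~((UINT64_C(1) << a) - 1);
}

/* Last byte address of the marked block containing `addr`. */
uint64_t marked_block_last(uint64_t addr, unsigned a)
{
    return marked_block_base(addr, a) | ((UINT64_C(1) << a) - 1);
}

/* Granule size in 4-byte words for a given value of a. */
unsigned erg_words(unsigned a)
{
    assert(a >= 4 && a <= 11);
    return (1u << a) / 4u;
}

/*
 * Granule size in words from a CTR_EL0 value supplied by the caller.
 * Assumption: ERG is bits[23:20]; 0 means "not provided", in which
 * case the maximum of 512 words must be assumed.
 */
unsigned erg_words_from_ctr(uint64_t ctr)
{
    unsigned erg = (unsigned)((ctr >> 20) & 0xF);
    return erg == 0 ? 512u : (1u << erg);
}
```

With a equal to 4, address 0x341B4 falls in the block from 0x341B0 to 0x341BF, which is four words, matching the example in the text.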

B2.10.4

Context switch support
An exception return clears the local monitor. As a result, performing a CLREX instruction as part of a context switch
is not required in most situations.

Note
Context switching is not an application level operation. However, this information is included here to complete the
description of the exclusive operations.

B2.10.5

Load-Exclusive and Store-Exclusive instruction usage restrictions
The Load-Exclusive and Store-Exclusive instructions are intended to work together as a pair, for example a
LDXP/STXP pair or a LDXR/STXR pair. To support different implementations of these functions, software must follow
the notes and restrictions given here.
The following notes describe use of a Load-Exclusive/Store-Exclusive pair, LoadExcl/StoreExcl, to indicate the use
of any of the Load-Exclusive/Store-Exclusive instruction pairs shown in Table B2-2 on page B2-100:
•

The exclusives support a single outstanding exclusive access for each PE thread that is executed. The
architecture makes use of this by not requiring an address or size check as part of the IsExclusiveLocal()
function. If the target virtual address of a StoreExcl is different from the virtual address of the preceding
LoadExcl instruction in the same thread of execution, behavior can be UNPREDICTABLE. As a result, a
LoadExcl/StoreExcl pair can only be relied upon to eventually succeed if the LoadExcl and the StoreExcl are
executed with the same address.

•

If two StoreExcl instructions are executed without an intervening LoadExcl instruction the second StoreExcl
instruction returns a status value of 1. This means that:
— ARM recommends that, in a given thread of execution, every StoreExcl instruction has a preceding
LoadExcl instruction associated with it.
— It is not necessary for every LoadExcl instruction to have a subsequent StoreExcl instruction.

•

An implementation of the Load-Exclusive and Store-Exclusive instructions can require that, in any thread of
execution, the transaction size of a Store-Exclusive instruction is the same as the transaction size of the
preceding Load-Exclusive instruction executed in that thread. If the transaction size of a Store-Exclusive
instruction is different from the preceding Load-Exclusive instruction in the same thread of execution,
behavior can be UNPREDICTABLE. As a result, software can rely on a LoadExcl/StoreExcl pair to eventually
succeed only if they have the same size.

•

An implementation might clear an exclusive monitor between the LoadExcl instruction and the StoreExcl
instruction without any application-related cause. For example, this might happen because of cache evictions.
Software must, in any single thread of execution, avoid having any explicit memory accesses or cache
maintenance instructions between the LoadExcl instruction and the associated StoreExcl instruction.

•

Implementations can benefit from keeping the LoadExcl and StoreExcl operations close together in a single
thread of execution. This minimizes the likelihood of the exclusive monitor state being cleared between the
LoadExcl instruction and the StoreExcl instruction. Therefore, for best performance, ARM strongly
recommends a limit of 128 bytes between LoadExcl and StoreExcl instructions in a single thread of execution.

•

The architecture sets an upper limit of 2048 bytes on the exclusive reservation granule that can be marked as
exclusive. For performance reasons, ARM recommends that objects that are accessed by exclusive accesses
are separated by the size of the exclusive reservations granule. This is a performance guideline rather than a
functional requirement.

•

After taking a Data Abort exception, the state of the exclusive monitors is UNKNOWN.

•

If the memory attributes for the memory being accessed by a LoadExcl/StoreExcl pair are changed between
the LoadExcl instruction and the StoreExcl instruction, behavior is UNPREDICTABLE.

•

The effect of a cache invalidation instruction on a local or global exclusive monitor that is in the Exclusive
Access state is UNPREDICTABLE. The instruction might clear the monitor, or it might leave it in the Exclusive
Access state. For address-based invalidation this also applies to the monitors of other PEs in the same
shareability domain as the PE executing the cache invalidation instruction, as determined by the shareability
domain of the address being invalidated.

Note
ARM strongly recommends that implementations ensure that the use of such maintenance instructions by a
PE in the Non-secure state cannot cause a denial of service on a PE in the Secure state.

Note
In the event of repeatedly-contending Load-Exclusive/Store-Exclusive instruction sequences from multiple PEs, an
implementation must ensure that forward progress is made by at least one PE.
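The following Python sketch models some of the restrictions above for a single thread of execution. It is a deliberately simplified toy, not the architectural monitor: the class LocalMonitor and the function atomic_increment are hypothetical names, the global monitor and other PEs are not modelled, and, in line with the first note above, the model performs no address or size check.

```python
class LocalMonitor:
    """Toy model of one PE's local exclusive monitor (hypothetical, simplified)."""

    def __init__(self):
        # False models the Open Access state, True the Exclusive Access state.
        self.exclusive = False

    def load_excl(self, memory, addr):
        """Model a LoadExcl: mark the monitor and return the loaded value."""
        self.exclusive = True
        return memory[addr]

    def store_excl(self, memory, addr, value):
        """Model a StoreExcl: return status 0 on success, 1 on failure."""
        if not self.exclusive:
            return 1                   # no outstanding exclusive access
        memory[addr] = value
        self.exclusive = False         # the monitor returns to Open Access
        return 0

    def clrex(self):
        """Model CLREX, or any event that clears the monitor."""
        self.exclusive = False


def atomic_increment(monitor, memory, addr):
    """Retry a LoadExcl/StoreExcl pair until the StoreExcl reports status 0."""
    while True:
        old = monitor.load_excl(memory, addr)
        if monitor.store_excl(memory, addr, old + 1) == 0:
            return old + 1


memory = {0x1000: 5}
monitor = LocalMonitor()
print(atomic_increment(monitor, memory, 0x1000))
print(monitor.store_excl(memory, 0x1000, 9))   # second StoreExcl, no LoadExcl
```

In this model the second StoreExcl, issued without an intervening LoadExcl, returns a status value of 1 and leaves memory unchanged, which is the behavior the second note describes.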

B2.10.6

Use of WFE and SEV instructions by spin-locks
ARMv8 provides Wait For Event, Send Event, and Send Event Local instructions, WFE, SEV, and SEVL, that can assist
with reducing power consumption and bus contention caused by PEs repeatedly attempting to obtain a spin-lock.
These instructions can be used at the application level, but a complete understanding of what they do depends on a
system level understanding of exceptions. They are described in Wait for Event mechanism and Send event on
page D1-1533. However, in ARMv8, when the global monitor for a PE changes from Exclusive Access state to
Open Access state, an event is generated.

Note
This is equivalent to issuing an SEV instruction on the PE for which the monitor state has changed. It removes the
need for spinlock code to include an SEV instruction after clearing a spinlock.

Part C
The AArch64 Instruction Set

Chapter C1
The A64 Instruction Set

This chapter describes the A64 instruction set. It contains the following sections:
•
Introduction on page C1-112.
•
Structure of the A64 assembler language on page C1-113.
•
Address generation on page C1-118.
•
Instruction aliases on page C1-121.

C1.1

Introduction
The instruction set supported in the AArch64 execution state is known as A64.
All A64 instructions have a width of 32 bits. The A64 encoding structure breaks down into the following functional
groups:
•

A miscellaneous group of branch instructions, exception generating instructions, and system instructions.

•

Data processing instructions associated with general-purpose registers. These instructions are supported by
two functional groups, depending on whether the operands:
—

Are all held in registers.

—

Include an operand with a constant immediate value.

•

Load and store instructions associated with the general-purpose register file and the SIMD and floating-point
register file.

•

SIMD and scalar floating-point data processing instructions that operate on the SIMD and floating-point
registers.

The encoding hierarchy within a functional group breaks down as follows:
•

A functional group consists of a set of related instruction classes. A64 instruction index by encoding on
page C3-172 provides an overview of the instruction encodings in the form of a list of instruction classes
within their functional groups.

•

An instruction class consists of a set of related instruction forms. Instruction forms are documented in one of
two alphabetic lists:
— The load, store, and data processing instructions associated with the general-purpose registers,
together with those in the other instruction classes. See Chapter C5 A64 Base Instruction Descriptions.
— The load, store, and data processing instructions associated with the SIMD and floating-point support.
See Chapter C6 A64 SIMD and Floating-point Instruction Descriptions.

•

An instruction form might support a single instruction syntax. Where an instruction supports more than one
syntax, each syntax is an instruction variant. Instruction variants can occur because of differences in:
—

The size or format of the operands.

—

The register file used for the operands.

—

The addressing mode used for load/store memory operands.

Instruction variants might also arise as the result of other factors.
Instruction variants are described in the instruction description for the individual instructions.
A64 instructions have a regular bit encoding structure:
•

5-bit register operand fields at fixed positions within the instruction. For general-purpose register operands,
the values 0-30 select one of 31 registers. The value 31 is used as a special case that can:
—

Indicate use of the current stack pointer, when identifying a load/store base register or in a limited set
of data processing instructions. See The stack pointer registers on page D1-1416.

—

Indicate the value zero when used as a source register operand.

—

Indicate discarding the result when used as a destination register operand.

For SIMD and floating-point register access, the value used selects one of 32 registers.
•

Immediate bits that provide constant data processing values or address offsets are placed in contiguous bit
fields. Some computed values in instruction variants use one or more immediate bit fields together with the
secondary encoding bit fields.

All encodings that are not fully defined are described as UNALLOCATED. An attempt to execute an UNALLOCATED
instruction results in an Undefined Instruction exception, unless otherwise defined in the Exception model.

C1.2

Structure of the A64 assembler language
The letter W denotes a general-purpose register holding a 32-bit word, and X denotes a general-purpose register
holding a 64-bit doubleword.
An A64 assembler recognizes both upper-case and lower-case variants of the instruction mnemonics and register
names, but not mixed case variants. An A64 disassembler can output either upper-case or lower-case mnemonics
and register names. Program and data labels are case-sensitive.
The A64 assembly language does not require the # character to introduce constant immediate operands, but an
assembler must allow immediate values introduced with or without the # character. ARM recommends that an A64
disassembler outputs a # before an immediate operand.
In Example C1-1 on page C1-114 the sequence // is used as a comment leader and A64 assemblers are encouraged
to accept this syntax.

C1.2.1

Common syntax terms
The following syntax terms are used frequently throughout the A64 instruction set description.

UPPER
    Text in upper-case letters is fixed. Text in lower-case letters is variable. This means that register
    name Xn indicates that the X is required, followed by a variable register number, for example X29.

< >
    Any text enclosed by angle braces, < >, is a value that the user supplies. Subsequent text might
    supply additional information.

{ }
    Any item enclosed by curly braces, { }, is optional. A description of the item and how its presence
    or absence affects the instruction is normally supplied by subsequent text. In some cases curly
    braces are actual symbols in the syntax, for example when they surround a register list. These cases
    are called out in the surrounding text.

[ ]
    Any items enclosed by square brackets, [ ], constitute a list of alternative characters. A single one
    of the characters can be used in that position and the subsequent text describes the meaning of the
    alternatives. In some cases the square brackets are part of the syntax itself, such as addressing modes
    or vector elements. These cases are called out in the surrounding text.

a|b
    Alternative words are separated by a vertical bar, |, and can be surrounded by parentheses to delimit
    them. For example, U(ADD|SUB)W represents UADDW or USUBW.

±
    This indicates an optional + or - sign. If neither is used then + is assumed.

uimmn
    An n-bit unsigned, positive, immediate value.

simmn
    An n-bit two’s complement, signed immediate value, where n includes the sign bit.

SP
    See Register names on page C1-114.

Wn
    See Register names on page C1-114.

WSP
    See Register names on page C1-114.

WZR
    See Register names on page C1-114.

Xn
    See Register names on page C1-114.

XZR
    See Register names on page C1-114.

C1.2.2

Instruction Mnemonics
The A64 assembly language overloads instruction mnemonics and distinguishes between the different forms of an
instruction based on the operand types. For example, the following ADD instructions all have different opcodes.
However, the programmer needs to remember only one mnemonic, as the assembler automatically chooses the correct
opcode based on the operands. The disassembler follows the same procedure in reverse.

Example C1-1 ADD instructions with different opcodes

ADD W0, W1, W2          // add 32-bit register
ADD X0, X1, X2          // add 64-bit register
ADD X0, X1, W2, SXTW    // add 64-bit extended register
ADD X0, X1, #42         // add 64-bit immediate

C1.2.3

Condition Code
The A64 ISA has some instructions that set condition flags or test condition codes or both. For information about
instructions that set the condition flags or use the condition mnemonics, see Condition flags and related instructions
on page C5-390.
Table C1-1 shows the available condition codes.
Table C1-1 Condition codes

cond  Mnemonic  Meaning (integer)             Meaning (floating-point) (a)       Condition flags
0000  EQ        Equal                         Equal                              Z == 1
0001  NE        Not equal                     Not equal or unordered             Z == 0
0010  CS or HS  Carry set                     Greater than, equal, or unordered  C == 1
0011  CC or LO  Carry clear                   Less than                          C == 0
0100  MI        Minus, negative               Less than                          N == 1
0101  PL        Plus, positive or zero        Greater than, equal, or unordered  N == 0
0110  VS        Overflow                      Unordered                          V == 1
0111  VC        No overflow                   Ordered                            V == 0
1000  HI        Unsigned higher               Greater than, or unordered         C == 1 && Z == 0
1001  LS        Unsigned lower or same        Less than or equal                 !(C == 1 && Z == 0)
1010  GE        Signed greater than or equal  Greater than or equal              N == V
1011  LT        Signed less than              Less than, or unordered            N != V
1100  GT        Signed greater than           Greater than                       Z == 0 && N == V
1101  LE        Signed less than or equal     Less than, equal, or unordered     !(Z == 0 && N == V)
1110  AL        Always                        Always                             Any
1111  NV (b)    Always                        Always                             Any

a. Unordered means at least one NaN operand.
b. The condition code NV exists only to provide a valid disassembly of the 0b1111 encoding, otherwise its behavior is identical to AL.
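The flag expressions in Table C1-1 pair up: each odd encoding tests the inverse of the preceding even encoding, except that 0b1111 behaves like AL. The Python sketch below evaluates a condition from the four flags on that basis. The function name condition_holds and its argument order are choices made for this illustration only.

```python
def condition_holds(cond, n, z, c, v):
    """Evaluate a 4-bit condition code against the N, Z, C, V flags (each 0 or 1)."""
    base = cond >> 1                   # cond<3:1> selects the flag test
    if base == 0b000:
        result = z == 1                # EQ / NE
    elif base == 0b001:
        result = c == 1                # CS / CC
    elif base == 0b010:
        result = n == 1                # MI / PL
    elif base == 0b011:
        result = v == 1                # VS / VC
    elif base == 0b100:
        result = c == 1 and z == 0     # HI / LS
    elif base == 0b101:
        result = n == v                # GE / LT
    elif base == 0b110:
        result = z == 0 and n == v     # GT / LE
    else:
        result = True                  # AL / NV
    # cond<0> inverts the test, except for 0b1111, which behaves as AL.
    if (cond & 1) and cond != 0b1111:
        result = not result
    return result


# After comparing two equal values, Z and C are set: EQ holds, HI does not.
print(condition_holds(0b0000, n=0, z=1, c=1, v=0))
print(condition_holds(0b1000, n=0, z=1, c=1, v=0))
```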

C1.2.4

Register names
This section describes the AArch64 registers. It contains the following subsections:
• General-purpose register file and the stack pointer on page C1-115.
• SIMD and floating-point register file on page C1-115.
• SIMD and floating-point scalar register names on page C1-116.
• SIMD vector register names on page C1-116.
• SIMD vector element names on page C1-116.

General-purpose register file and the stack pointer
The 31 general-purpose registers in the general-purpose register file are named R0-R30 and encoded in the
instruction register fields with values 0-30. A general-purpose register field that encodes the value 31
represents either the current stack pointer or the zero register, depending on the instruction and the operand position.
When the registers are used in a specific instruction variant, they must be qualified to indicate the operand data size,
32 bits or 64 bits, and the data size of the instruction.
When the data size is 32 bits, the lower 32 bits of the register are used and the upper 32 bits are ignored on a read
and cleared to zero on a write.
Table C1-2 shows the qualified names for registers, where n is a register number 0-30.
Table C1-2 General-purpose register names

Name  Size     Encoding  Description
Wn    32 bits  0-30      General-purpose register 0-30
Xn    64 bits  0-30      General-purpose register 0-30
WZR   32 bits  31        Zero register
XZR   64 bits  31        Zero register
WSP   32 bits  31        Current stack pointer
SP    64 bits  31        Current stack pointer

The following list provides further details relating to Table C1-2.
•

The names Xn and Wn both refer to the same general-purpose register, Rn.

•

There is no register named W31 or X31.

•

The name SP represents the stack pointer for 64-bit operands where an encoding of the value 31 in the
corresponding register field is interpreted as a read or write of the current stack pointer. When instructions
do not interpret this operand encoding as the stack pointer, use of the name SP is an error.

•

The name WSP represents the current stack pointer in a 32-bit context.

•

The name XZR represents the zero register for 64-bit operands where an encoding of the value 31 in the
corresponding register field is interpreted as returning zero when read or discarding the result when written.
When instructions do not interpret this operand encoding as the zero register, use of the name XZR is an error.

•

The name WZR represents the zero register in a 32-bit context.

•

The architecture does not define a special name for general-purpose register R30 that reflects its special role
as the link register on procedure calls. An A64 assembler must always use W30 and X30. Additional software
names might be defined as part of the Procedure Call Standard, see Procedure Call Standard for the ARM
64-bit Architecture.
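The naming rules in Table C1-2 can be summarized as a small lookup, sketched here in Python. The function gp_register_name and its reg31_is_sp parameter are hypothetical: whether an encoding of 31 means the stack pointer or the zero register depends on the instruction and the operand position, so the caller is assumed to know which interpretation applies.

```python
def gp_register_name(field, is_64bit, reg31_is_sp=False):
    """Return the qualified name for a 5-bit general-purpose register field.

    field        -- register field value, 0 to 31
    is_64bit     -- True for a 64-bit operand, False for a 32-bit operand
    reg31_is_sp  -- assumption supplied by the caller: True when this operand
                    position interprets the value 31 as the stack pointer
    """
    if not 0 <= field <= 31:
        raise ValueError("register fields are 5 bits wide")
    if field < 31:
        return ("X" if is_64bit else "W") + str(field)
    if reg31_is_sp:
        return "SP" if is_64bit else "WSP"
    return "XZR" if is_64bit else "WZR"


print(gp_register_name(30, True))          # link register, no special name
print(gp_register_name(31, False))
print(gp_register_name(31, True, reg31_is_sp=True))
```

The sketch never produces W31 or X31, because no register has either name.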

SIMD and floating-point register file
The 32 registers in the SIMD and floating-point register file, V0-V31, hold floating-point operands for the scalar
floating-point instructions, and both scalar and vector operands for the SIMD instructions. When they are used in a
specific instruction form, the names must be further qualified to indicate the data shape, that is the data element size
and the number of elements or lanes within the register. A similar requirement is placed on the general-purpose
registers. See General-purpose register file and the stack pointer.

Note
The data type is described by the instruction mnemonics that operate on the data. The data type is not described by
the register name. The data type is the interpretation of bits within each register or vector element, whether these
are integers, floating-point values, polynomials or cryptographic hashes.

SIMD and floating-point scalar register names
SIMD and floating-point instructions that operate on scalar data only access the lower bits of a SIMD and
floating-point register. The unused high bits are ignored on a read and cleared to 0 on a write.
Table C1-3 shows the qualified names for accessing scalar SIMD and floating-point registers. The letter n denotes
a register number between 0 and 31.
Table C1-3 SIMD and floating-point scalar register names

Size      Name
8 bits    Bn
16 bits   Hn
32 bits   Sn
64 bits   Dn
128 bits  Qn

SIMD vector register names
If a register holds multiple data elements on which arithmetic is performed in a parallel, SIMD, manner, then a
qualifier describes the vector shape. The vector shape is the element size and the number of elements or lanes. If the
element size in bits multiplied by the number of lanes does not equal 128, then the upper 64 bits of the register are
ignored on a read and cleared to zero on a write.
Table C1-4 shows the SIMD vector register names. The letter n denotes a register number between 0 and 31.
Table C1-4 SIMD vector register names

Shape              Name
8 bits × 8 lanes   Vn.8B
8 bits × 16 lanes  Vn.16B
16 bits × 4 lanes  Vn.4H
16 bits × 8 lanes  Vn.8H
32 bits × 2 lanes  Vn.2S
32 bits × 4 lanes  Vn.4S
64 bits × 1 lane   Vn.1D
64 bits × 2 lanes  Vn.2D
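As an illustration of the vector shape qualifier, the Python sketch below splits a name such as V3.4S into its register number, element size, and lane count, and reports the total number of bits used. The function name vector_shape is hypothetical, and for brevity the sketch normalizes case rather than rejecting mixed-case names.

```python
ELEMENT_BITS = {"B": 8, "H": 16, "S": 32, "D": 64}
VALID_SHAPES = {"8B", "16B", "4H", "8H", "2S", "4S", "1D", "2D"}


def vector_shape(name):
    """Return (register, element_bits, lanes, total_bits) for a name like 'V3.4S'."""
    reg, _, shape = name.upper().partition(".")
    if not reg.startswith("V") or not reg[1:].isdigit() or int(reg[1:]) > 31:
        raise ValueError("expected a register V0 to V31")
    if shape not in VALID_SHAPES:
        raise ValueError("not a shape listed in Table C1-4")
    lanes = int(shape[:-1])
    bits = ELEMENT_BITS[shape[-1]]
    return int(reg[1:]), bits, lanes, bits * lanes


print(vector_shape("V3.4S"))
print(vector_shape("V0.8B"))
```

A total of 64 bits, as for V0.8B, corresponds to the case in which the upper 64 bits of the register are ignored on a read and cleared to zero on a write.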

SIMD vector element names
Appending a constant, zero-based element index to the register name inside square brackets indicates that a single
element from a SIMD and floating-point register is used as a scalar operand. The number of lanes is not represented,
as it is not encoded in the instruction and can only be inferred from the index value.
Table C1-5 shows the vector register names and the element index. The letter i denotes the element index.
Table C1-5 Vector register names with element index

Size     Name
8 bits   Vn.B[i]
16 bits  Vn.H[i]
32 bits  Vn.S[i]
64 bits  Vn.D[i]

An assembler must accept a fully qualified SIMD register name, if the number of lanes is greater than the index
value. See SIMD vector register names on page C1-116. For example, an assembler must accept all of the following
forms as the name for the 32-bit element in bits [63:32] of the SIMD and floating-point register V9:
V9.S[1]     //standard disassembly
V9.2S[1]    //optional number of lanes
V9.4S[1]    //optional number of lanes

Note
The SIMD and floating-point register element name Vn.S[0] is not equivalent to the scalar SIMD and floating-point
register name Sn. Although they represent the same bits in the register, they select different instruction encoding
forms, either the vector element or the scalar form.
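The mapping from an element name to register bits can be sketched in Python as follows. The function element_bit_range is a hypothetical helper written for this illustration. It assumes a 128-bit register, accepts the optional number of lanes only when that number is greater than the index, and accepts upper-case names only.

```python
import re

ELEMENT_BITS = {"B": 8, "H": 16, "S": 32, "D": 64}
VALID_SHAPES = {(8, "B"), (16, "B"), (4, "H"), (8, "H"),
                (2, "S"), (4, "S"), (1, "D"), (2, "D")}


def element_bit_range(name):
    """Return (register, high_bit, low_bit) for a name like 'V9.S[1]' or 'V9.4S[1]'."""
    match = re.fullmatch(r"V(\d+)\.(\d*)([BHSD])\[(\d+)\]", name)
    if match is None:
        raise ValueError("not a vector element name")
    reg = int(match.group(1))
    lanes = match.group(2)             # empty string when the lanes are omitted
    size = match.group(3)
    index = int(match.group(4))
    width = ELEMENT_BITS[size]
    if reg > 31 or index >= 128 // width:
        raise ValueError("register or index out of range")
    if lanes and ((int(lanes), size) not in VALID_SHAPES or int(lanes) <= index):
        raise ValueError("number of lanes must be valid and greater than the index")
    low = index * width
    return reg, low + width - 1, low


for form in ("V9.S[1]", "V9.2S[1]", "V9.4S[1]"):
    print(form, element_bit_range(form))
```

All three forms name the same 32-bit element, bits [63:32] of V9.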

SIMD vector register list
Where an instruction operates on multiple SIMD and floating-point registers, for example vector Load/Store
structure and table lookup operations, the registers are specified as a list enclosed by curly braces. This list consists
of either a sequence of registers separated by commas, or a register range separated by a hyphen. The registers must
be numbered in increasing order, modulo 32, in increments of one. The hyphenated form is preferred for
disassembly if there are more than two registers in the list and the register numbers are increasing. The following
examples are equivalent representations of a set of four registers V4 to V7, each holding four lanes of 32-bit elements:
{ V4.4S - V7.4S }                 //standard disassembly
{ V4.4S, V5.4S, V6.4S, V7.4S }    //alternative representation

SIMD vector element list
Registers in a list can also have a vector element form. For example, the LD4 instruction can load one element into
each of four registers, and in this case the index is appended to the list as follows:
{ V4.S - V7.S }[3]                   //standard disassembly
{ V4.4S, V5.4S, V6.4S, V7.4S }[3]    //alternative with optional number of lanes

C1.3

Address generation
The A64 instruction set supports 64-bit addresses. The valid address range is determined by the following factors:
•
The size of the implemented virtual address space.
•
Memory Management Unit (MMU) configuration settings.
The top 8 bits of the 64-bit address can be used as a tag, see Address tagging in AArch64 state on page D5-1708.
For more information on memory management and address translation, see Chapter D5 The AArch64 Virtual
Memory System Architecture.

C1.3.1

Register indexed addressing
The A64 instruction set allows a 64-bit index register to be added to the 64-bit base register, with optional scaling
of the index by the access size. Additionally it allows for sign-extension or zero-extension of a 32-bit value within
an index register, followed by optional scaling.

C1.3.2

PC-relative addressing
The A64 instruction set has support for position-independent code and data addressing:
•

PC-relative literal loads have an offset range of ± 1MB.

•

Process state flag and compare based conditional branches have a range of ± 1MB. Test bit conditional
branches have a restricted range of ± 32KB.

•

Unconditional branches, including branch and link, have a range of ± 128MB.

PC-relative Load/Store operations, and address generation with a range of ± 4GB can be performed using two
instructions.
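The ranges quoted above follow from the width of the signed immediate field and the unit in which it is counted. The Python sketch below reproduces the arithmetic. The field widths used for the branch and ADRP cases (19, 14, 26, and 21 bits, with ADRP counting 4KB pages) are not stated in this section and are taken here as assumptions about the A64 encodings; ADRP is assumed to be the first instruction of the two-instruction ±4GB sequence.

```python
def signed_range_bytes(imm_bits, unit_bytes):
    """Return (lowest, highest) byte offset of a signed immediate counted in units."""
    lowest = -(1 << (imm_bits - 1)) * unit_bytes
    highest = ((1 << (imm_bits - 1)) - 1) * unit_bytes
    return lowest, highest


MB = 1 << 20
RANGES = {
    "literal load, B.cond, CBZ, CBNZ": signed_range_bytes(19, 4),   # ±1MB
    "TBZ, TBNZ": signed_range_bytes(14, 4),                         # ±32KB
    "B, BL": signed_range_bytes(26, 4),                             # ±128MB
    "ADRP (first of two instructions)": signed_range_bytes(21, 4096),  # ±4GB
}

for name, (low, high) in RANGES.items():
    print(f"{name}: {low} to {high} bytes")
```

Because the offsets are two's complement, the positive limit is one unit short of the negative limit, for example -1MB to 1MB minus 4 bytes for a 19-bit word offset.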

C1.3.3

Load/Store addressing modes
Load/Store addressing modes in the A64 instruction set require a 64-bit base address from a general-purpose register
X0-X30 or the current stack pointer, SP, with an optional immediate or register offset. Table C1-6 shows the
assembler syntax for the complete set of Load/Store addressing modes.

Table C1-6 A64 Load/Store addressing modes

                                Offset
Addressing Mode                 Immediate       Register                Extended Register
Base register only (no offset)  [base{, #0}]    -                       -
Base plus offset                [base{, #imm}]  [base, Xm{, LSL #imm}]  [base, Wm, (S|U)XTW {#imm}]
Pre-indexed                     [base, #imm]!   -                       -
Post-indexed                    [base], #imm    [base], Xm (a)          -
Literal (PC-relative)           label           -                       -

a. The post-indexed by register offset mode can be used with the SIMD Load/Store structure instructions described in
Load/Store Vector on page C2-137. Otherwise the post-indexed by register offset mode is not available.

Some types of Load/Store instruction support only a subset of the Load/Store addressing modes listed in Table C1-6
on page C1-118. Details of the supported modes are as follows:
•

Base plus offset addressing means that the address is the value in the 64-bit base register plus an offset.

•

Pre-indexed addressing means that the address is the sum of the value in the 64-bit base register and an offset,
and the address is then written back to the base register.

•

Post-indexed addressing means that the address is the value in the 64-bit base register, and the sum of the
address and the offset is then written back to the base register.

•

Literal addressing means that the address is the value of the 64-bit program counter for this instruction plus
a 19-bit signed word offset. This means that it is a 4 byte aligned address within ±1MB of the address of this
instruction with no offset. Literal addressing can only be used for loads of at least 32 bits and for prefetch
instructions. The PC cannot be referenced using any other addressing modes. The syntax for labels is specific
to individual toolchains.

•

An immediate offset can be unsigned or signed, and scaled or unscaled, depending on the type of Load/Store
instruction. When the immediate offset is scaled it is encoded as a multiple of the transfer size, although the
assembly language always uses a byte offset, and the assembler or disassembler performs the necessary
conversion. The usable byte offsets therefore depend on the type of Load/Store instruction and the transfer
size.
Table C1-7 shows the offset and the type of Load/Store instruction.
Table C1-7 Immediate offsets and the type of Load/Store instruction

Offset bits  Sign      Scaling   Write-Back  Load/Store type
0            -         -         -           Exclusive/acquire/release
7            Signed    Scaled    Optional    Register pair
9            Signed    Unscaled  Optional    Single register
12           Unsigned  Scaled    No          Single register

•

A register offset means that the offset is the 64 bits from a general-purpose register, Xm, optionally scaled
by the transfer size, in bytes, if LSL #imm is present and where imm must be equal to log2(transfer_size).

•

An extended register offset means that the offset is the bottom 32 bits from a general-purpose register Wm,
sign-extended or zero-extended to 64 bits, and then scaled by the transfer size if so indicated by #imm, where
imm must be equal to log2(transfer_size). An assembler must accept Wm or Xm as an extended register
offset, but Wm is preferred for disassembly.

•

Generating an address lower than the value in the base register requires a negative signed immediate offset
or a register offset holding a negative value.

•

When stack alignment checking is enabled by system software and the base register is the SP, the current
stack pointer must be initially quadword aligned, that is aligned to 16 bytes. Misalignment generates a Stack
Alignment fault. The offset does not have to be a multiple of 16 bytes unless the specific Load/Store
instruction requires this. SP cannot be used as a register offset.
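The addressing modes described in this list can be modelled with a few arithmetic helpers, sketched below in Python. All function names are hypothetical and the sketch covers address arithmetic only: it performs no memory access, alignment check, or encoding validation. Each addressing helper returns a pair (address used for the access, value left in the base register).

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low 'bits' bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def base_plus_offset(base, offset):
    return (base + offset) & MASK64, base          # base register unchanged


def pre_indexed(base, offset):
    address = (base + offset) & MASK64
    return address, address                        # address written back


def post_indexed(base, offset):
    return base, (base + offset) & MASK64          # sum written back


def extended_register_offset(wm, signed, shift):
    """Offset from the bottom 32 bits of a register, extended, then scaled."""
    low = wm & 0xFFFFFFFF
    if signed:
        low = sign_extend(low, 32)                 # SXTW
    return low << shift                            # UXTW leaves low unsigned


def immediate_byte_offsets(offset_bits, signed, scaled, transfer_size):
    """Return (lowest, highest) usable byte offset for a row of Table C1-7."""
    unit = transfer_size if scaled else 1
    if signed:
        half = 1 << (offset_bits - 1)
        return -half * unit, (half - 1) * unit
    return 0, ((1 << offset_bits) - 1) * unit


print(pre_indexed(0x1000, -16), post_indexed(0x1000, 16))
print(immediate_byte_offsets(12, signed=False, scaled=True, transfer_size=8))
```

For an 8-byte transfer, the 12-bit unsigned scaled form reaches byte offsets 0 to 32760, while the 9-bit signed unscaled form covers -256 to 255 regardless of the transfer size.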

Address calculation
General-purpose arithmetic instructions can calculate the result of most addressing modes and write the address to
a general-purpose register or, in most cases, to the current stack pointer.

Table C1-8 shows the arithmetic instructions that can compute addressing modes.
Table C1-8 Arithmetic instructions to compute addressing modes

                           Offset
Addressing Form            Immediate                 Register                  Extended Register
Base register (no offset)  MOV Xd|SP, base           -                         -
Base plus offset           ADD Xd|SP, base, #imm     ADD , base, Xm{,LSL#imm}  ADD , base, Wm,(S|U)XT(W|H|B|) {#imm}
                           or SUB Xd|SP, base, #imm
Pre-indexed                -                         -                         -
Post-indexed               -                         -                         -
Literal (PC-relative)      ADR Xd, label             -                         -

Note
• To calculate a base plus immediate offset the ADD instructions defined in Arithmetic (immediate) on
page C2-140 accept an unsigned 12-bit immediate offset, with an optional left shift by 12. This means that a
single ADD instruction cannot support the full range of byte offsets available to a single register Load/Store
with a scaled 12-bit immediate offset. For example, a quadword LDR effectively has a 16-bit byte offset. To
calculate an address with a byte offset that requires more than 12 bits it is necessary to use two ADD
instructions. The following example shows this:
ADD Xd, base, #(imm & 0xFFF)
ADD Xd, Xd, #(imm>>12), LSL #12

• To calculate a base plus extended register offset, the ADD instructions defined in Arithmetic (extended register)
on page C2-145 provide a superset of the addressing mode that also supports sign-extension or
zero-extension of a byte or halfword value with any shift amount between 0 and 4, for example:
ADD Xd, base, Wm, SXTW #3    // Xd = base + (SignExtend(Wm) LSL 3)
ADD Xd, base, Wm, UXTH #4    // Xd = base + (ZeroExtend(Wm<15:0>) LSL 4)

• If the same extended register offset is used by more than one Load/Store instruction, then, depending on the
implementation, it might be more efficient to calculate the extended and scaled intermediate result just once,
and then re-use it as a simple register offset. The extend and scale calculation can be performed using the
SBFIZ and UBFIZ bitfield instructions defined in Bitfield move on page C2-142, for example:
SBFIZ Xd, Xm, #3, #32        //Xd = "Wm, SXTW #3"
UBFIZ Xd, Xm, #4, #16        //Xd = "Wm, UXTH #4"
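The arithmetic behind the instruction sequences in this Note can be checked with a short Python sketch. The helper names below are hypothetical, registers are modelled as 64-bit unsigned integers, and the sbfiz and ubfiz functions model only the two forms used in the Note (insert a field at bit lsb with zeros below it), not the full bitfield instructions.

```python
MASK64 = (1 << 64) - 1


def split_add_immediate(imm):
    """Split a byte offset of up to 24 bits into the two 12-bit ADD immediates."""
    if not 0 <= imm < (1 << 24):
        raise ValueError("two ADD instructions cover offsets of up to 24 bits")
    return imm & 0xFFF, imm >> 12


def add_two_step(base, imm):
    """Model ADD Xd, base, #low followed by ADD Xd, Xd, #high, LSL #12."""
    low, high = split_add_immediate(imm)
    xd = (base + low) & MASK64
    return (xd + (high << 12)) & MASK64


def sbfiz(xm, lsb, width):
    """Sign-extend the low 'width' bits of xm, then shift left by lsb."""
    field = xm & ((1 << width) - 1)
    if field & (1 << (width - 1)):
        field -= 1 << width
    return (field << lsb) & MASK64


def ubfiz(xm, lsb, width):
    """Zero-extend the low 'width' bits of xm, then shift left by lsb."""
    return ((xm & ((1 << width) - 1)) << lsb) & MASK64


print(hex(add_two_step(0x10000, 0x12345)))
print(hex(sbfiz(0xFFFFFFFF, 3, 32)), hex(ubfiz(0x1FFFF, 4, 16)))
```

The result of sbfiz(xm, 3, 32) equals the offset that "Wm, SXTW #3" would contribute, so it can be reused as a simple register offset by several Load/Store instructions.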

C1.4

Instruction aliases
Some instructions have an associated architecture alias that is used for disassembly of the encoding when the
associated conditions are met. Architecture alias instructions are included in the alphabetic lists of instruction types
and clearly presented as an alias form in descriptions for the individual instructions.

Chapter C2
A64 Instruction Set Overview

This chapter provides an overview of the A64 instruction set. It contains the following sections:
•
Branches, Exception generating, and System instructions on page C2-124.
•
Loads and stores on page C2-129.
•
Data processing - immediate on page C2-140.
•
Data processing - register on page C2-145.
•
Data processing - SIMD and floating-point on page C2-152.
For a structured breakdown of instruction groups by encoding, see Chapter C3 A64 Instruction Set Encoding.

C2.1

Branches, Exception generating, and System instructions
This section describes the branch, exception generating, and system instructions. It contains the following
subsections:
•
Conditional branch.
•
Unconditional branch (immediate).
•
Unconditional branch (register) on page C2-125.
•
Exception generation and return on page C2-125.
•
System register instructions on page C2-126.
•
System instructions on page C2-126.
•
Hint instructions on page C2-127.
•
Barriers and CLREX instructions on page C2-127.
For information about the encoding structure of the instructions in this instruction group, see Branches, exception
generating and system instructions on page C3-173.

Note
Software must:
•

Use only BLR or BL to perform a nested subroutine call when that subroutine is expected to return to the
immediately following instruction, that is, the instruction with the address of the BLR or BL instruction
incremented by four.

•

Use only RET to perform a subroutine return, when that subroutine is expected to have been entered by a BL
or BLR instruction.

•

Use only B, BR, or the instructions listed in Table C2-1 to perform a control transfer that is not a subroutine
call or subroutine return described in this Note.

C2.1.1

Conditional branch
Conditional branches change the flow of execution depending on the current state of the condition flags or the value
in a general-purpose register. See Table C1-1 on page C1-114 for a list of the condition codes that can be used for
cond.
Table C2-1 shows the Conditional branch instructions.
Table C2-1 Conditional branch instructions

Mnemonic  Instruction                     Branch offset range from the PC  See
B.cond    Branch conditionally            ±1MB                             B.cond on page C5-420
CBNZ      Compare and branch if nonzero   ±1MB                             CBNZ on page C5-434
CBZ       Compare and branch if zero      ±1MB                             CBZ on page C5-435
TBNZ      Test bit and branch if nonzero  ±32KB                            TBNZ on page C5-754
TBZ       Test bit and branch if zero     ±32KB                            TBZ on page C5-755

C2.1.2

Unconditional branch (immediate)
Unconditional branch (immediate) instructions change the flow of execution unconditionally by adding an
immediate offset with a range of ±128MB to the value of the program counter that fetched the instruction. The BL
instruction also writes the address of the sequentially following instruction to general-purpose register X30.

Table C2-2 shows the Unconditional branch instructions with an immediate branch offset.
Table C2-2 Unconditional branch instructions (immediate)

Mnemonic  Instruction             Immediate branch offset range from the PC  See
B         Branch unconditionally  ±128MB                                     B on page C5-421
BL        Branch with link        ±128MB                                     BL on page C5-430
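As a rough cross-check of the ranges in Table C2-1 and Table C2-2, the sketch below derives each range from the width of a signed immediate that is scaled by the 4-byte instruction size. The field widths (19, 14, and 26 bits) come from the instruction encodings and are not stated in this section; the function and dictionary names are illustrative only.

```python
# Signed immediate widths, in bits, as used by the branch encodings (assumed here).
IMM_BITS = {"B.cond": 19, "CBZ": 19, "CBNZ": 19, "TBZ": 14, "TBNZ": 14, "B": 26, "BL": 26}

def branch_range_bytes(imm_bits):
    """Return (lowest, highest) byte offset of a signed immediate scaled by 4."""
    lowest = -(1 << (imm_bits - 1)) * 4
    highest = ((1 << (imm_bits - 1)) - 1) * 4
    return lowest, highest

def offset_in_range(mnemonic, offset):
    """True if `offset` (bytes from the PC) is word aligned and encodable."""
    lowest, highest = branch_range_bytes(IMM_BITS[mnemonic])
    return offset % 4 == 0 and lowest <= offset <= highest

for name in ("B.cond", "TBZ", "B"):
    print(name, branch_range_bytes(IMM_BITS[name]))
```

Because the immediate is a two's complement value, the largest forward offset is one instruction short of the nominal ±1MB, ±32KB, or ±128MB figure.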

C2.1.3

Unconditional branch (register)
Unconditional branch (register) instructions change the flow of execution unconditionally by setting the program
counter to the value in a general-purpose register. The BLR instruction also writes the address of the sequentially
following instruction to general-purpose register X30. The RET instruction behaves identically to BR, but provides an
additional hint to the PE that this is a return from a subroutine.
Table C2-3 shows the Unconditional branch instructions that jump directly to an address held in a general-purpose
register.
Table C2-3 Unconditional branch instructions (register)

Mnemonic  Instruction                   See
BLR       Branch with link to register  BLR on page C5-431
BR        Branch to register            BR on page C5-432
RET       Return from subroutine        RET on page C5-642
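The link behavior of BL, BLR, and RET can be illustrated with a toy model. This is a minimal sketch, assuming a hypothetical `PE` class that tracks only the program counter and X0 to X30; it ignores the prediction hint that distinguishes RET from BR.

```python
class PE:
    """Toy processing element: a program counter and registers X0..X30."""

    def __init__(self, pc=0):
        self.pc = pc
        self.x = [0] * 31

    def bl(self, offset):
        self.x[30] = self.pc + 4      # address of the sequentially following instruction
        self.pc += offset

    def blr(self, n):
        target = self.x[n]            # read the target before X30 is overwritten
        self.x[30] = self.pc + 4
        self.pc = target

    def br(self, n):
        self.pc = self.x[n]

    def ret(self, n=30):
        self.pc = self.x[n]           # same transfer as BR; X30 is the default register

pe = PE(pc=0x1000)
pe.bl(0x200)      # call: PC = 0x1200, X30 = 0x1004
pe.ret()          # return to the instruction after the BL
print(hex(pe.pc))
```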

C2.1.4

Exception generation and return
This section describes the following exceptions:
•
Exception generating.
•
Exception return on page C2-126.
•
Debug state on page C2-126.

Exception generating
Table C2-4 shows the Exception generating instructions.
Table C2-4 Exception generating instructions

Mnemonic  Instruction                                     See
BRK       Software breakpoint instruction                 BRK on page C5-433
HLT       Halting software breakpoint instruction         HLT on page C5-484
HVC       Generate exception targeting Exception level 2  HVC on page C5-485
SMC       Generate exception targeting Exception level 3  SMC on page C5-663
SVC       Generate exception targeting Exception level 1  SVC on page C5-748

Exception return
Table C2-5 shows the Exception return instructions.
Table C2-5 Exception return instructions

Mnemonic  Instruction                                  See
ERET      Exception return using current ELR and SPSR  ERET on page C5-479

Debug state
Table C2-6 shows the Debug state instructions.
Table C2-6 Debug state instructions

Mnemonic  Instruction                        See
DCPS1     Debug switch to Exception level 1  DCPS1 on page C5-466
DCPS2     Debug switch to Exception level 2  DCPS2 on page C5-467
DCPS3     Debug switch to Exception level 3  DCPS3 on page C5-468
DRPS      Debug restore PE state             DRPS on page C5-471

C2.1.5

System register instructions
For detailed information about the System register instructions, see Chapter C4 The AArch64 System Instruction
Class. Table C2-7 shows the System register instructions.
Table C2-7 System register instructions

Mnemonic  Instruction                                       See
MRS       Move system register to general-purpose register  MRS on page C5-610
MSR       Move general-purpose register to system register  MSR (register) on page C5-613
          Move immediate to PE state field                  MSR (immediate) on page C5-611

C2.1.6

System instructions
For detailed information about the System instructions, see Chapter C4 The AArch64 System Instruction Class.
Table C2-8 shows the System instructions.
Table C2-8 System instructions

Mnemonic  Instruction                     See
SYS       System instruction              SYS on page C5-752
SYSL      System instruction with result  SYSL on page C5-753
IC        Instruction cache maintenance   IC on page C5-486 and Table C4-2 on page C4-237
DC        Data cache maintenance          DC on page C5-465 and Table C4-2 on page C4-237
AT        Address translation             AT on page C5-419 and Table C4-3 on page C4-238
TLBI      TLB Invalidate                  TLBI on page C5-756 and Table C4-4 on page C4-239

C2.1.7

Hint instructions
Table C2-9 shows the Hint instructions.
Table C2-9 Hint instructions

Mnemonic  Instruction         See
NOP       No operation        NOP on page C5-622
YIELD     Yield hint          YIELD on page C5-773
WFE       Wait for event      WFE on page C5-771
WFI       Wait for interrupt  WFI on page C5-772
SEV       Send event          SEV on page C5-660
SEVL      Send event local    SEVL on page C5-661
HINT      Unallocated hint    HINT on page C5-482

C2.1.8

Barriers and CLREX instructions
Table C2-10 shows the barrier and CLREX instructions.
Table C2-10 Barriers and CLREX instructions

Mnemonic  Instruction                          See
CLREX     Clear exclusive monitor              CLREX on page C5-442
DSB       Data synchronization barrier         DSB on page C5-472
DMB       Data memory barrier                  DMB on page C5-469
ISB       Instruction synchronization barrier  ISB on page C5-487

Table C2-11 shows the allocated options for the data barriers. UNALLOCATED values behave as SY, but might be
allocated to other barrier functionality in future revisions of the architecture.
Table C2-11 Allocated values for the data barriers

Option  Shareability Domain  Ordered-accesses (before-after)
OSHLD   Outer Shareable      Load-Load/Store
OSHST   Outer Shareable      Store-Store
OSH     Outer Shareable      Any-Any
NSHLD   Non-shareable        Load-Load/Store
NSHST   Non-shareable        Store-Store
NSH     Non-shareable        Any-Any
ISHLD   Inner Shareable      Load-Load/Store
ISHST   Inner Shareable      Store-Store
ISH     Inner Shareable      Any-Any
LD      Full System          Load-Load/Store
ST      Full System          Store-Store
SY      Full System          Any-Any
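The rule that UNALLOCATED values behave as SY can be sketched as a lookup keyed by the 4-bit barrier option field. The numeric CRm values below are taken from the DMB and DSB encodings rather than from Table C2-11, and the names `BARRIER_OPTIONS` and `decode_barrier` are illustrative only.

```python
# CRm value -> (option, shareability domain, ordered accesses before-after)
BARRIER_OPTIONS = {
    0b0001: ("OSHLD", "Outer Shareable", "Load-Load/Store"),
    0b0010: ("OSHST", "Outer Shareable", "Store-Store"),
    0b0011: ("OSH",   "Outer Shareable", "Any-Any"),
    0b0101: ("NSHLD", "Non-shareable",   "Load-Load/Store"),
    0b0110: ("NSHST", "Non-shareable",   "Store-Store"),
    0b0111: ("NSH",   "Non-shareable",   "Any-Any"),
    0b1001: ("ISHLD", "Inner Shareable", "Load-Load/Store"),
    0b1010: ("ISHST", "Inner Shareable", "Store-Store"),
    0b1011: ("ISH",   "Inner Shareable", "Any-Any"),
    0b1101: ("LD",    "Full System",     "Load-Load/Store"),
    0b1110: ("ST",    "Full System",     "Store-Store"),
    0b1111: ("SY",    "Full System",     "Any-Any"),
}

def decode_barrier(crm):
    """Return the barrier behavior for a 4-bit CRm value; unallocated values act as SY."""
    return BARRIER_OPTIONS.get(crm & 0xF, BARRIER_OPTIONS[0b1111])

print(decode_barrier(0b1011))   # ISH
print(decode_barrier(0b0000))   # unallocated, behaves as SY
```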

C2.2

Loads and stores
This section describes the Load/Store instructions. It contains the following subsections:
•
Load/Store register.
•
Load/Store register (unscaled offset) on page C2-130.
•
Load/Store Pair on page C2-131.
•
Load/Store Non-temporal Pair on page C2-132.
•
Load/Store Unprivileged on page C2-132.
•
Load-Exclusive/Store-Exclusive on page C2-133.
•
Load-Acquire/Store-Release on page C2-134.
•
Load/Store scalar SIMD and floating-point on page C2-134.
•
Load/Store Vector on page C2-137.
•
Prefetch memory on page C2-138.
Apart from Load-Exclusive, Store-Exclusive, Load-Acquire, and Store-Release, addresses can have any alignment
unless strict alignment checking is enabled, that is, if SCTLR_ELx.A == 1.
The additional control bits SCTLR_ELx.SA and SCTLR_EL1.SA0 control whether the stack pointer must be
quadword aligned when used as a base register. See Stack pointer alignment checking on page D1-1424. Using a
misaligned stack pointer generates a Stack Alignment exception.
For information about the encoding structure of the instructions in this instruction group, see Loads and stores on
page C3-176.
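The two alignment checks described above can be summarized in a few lines. This is a simplified sketch, assuming the control bits are passed in as booleans and that a quadword is 16 bytes; the function names are hypothetical, and the model does not cover the Load-Exclusive, Store-Exclusive, Load-Acquire, and Store-Release cases, which have their own alignment requirements.

```python
def access_is_misaligned(address, size, strict_alignment):
    """strict_alignment models SCTLR_ELx.A == 1; size is the access size in bytes."""
    return strict_alignment and address % size != 0

def sp_is_misaligned(sp, sp_check_enabled):
    """sp_check_enabled models the relevant SCTLR_ELx.SA or SCTLR_EL1.SA0 bit."""
    return sp_check_enabled and sp % 16 != 0

print(access_is_misaligned(0x1002, 4, strict_alignment=False))  # any alignment permitted
print(access_is_misaligned(0x1002, 4, strict_alignment=True))   # would fail the check
print(sp_is_misaligned(0x7FF8, sp_check_enabled=True))          # not quadword aligned
```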

Note
In some cases, Load/Store instructions can lead to CONSTRAINED UNPREDICTABLE behavior. See Appendix A
Constraints on AArch64 UNPREDICTABLE behavior.

C2.2.1

Load/Store register
The Load/Store register instructions support the following addressing modes:
•
Base plus a scaled 12-bit unsigned immediate offset or base plus an unscaled 9-bit signed immediate offset.
•
Base plus a 64-bit register offset, optionally scaled.
•
Base plus a 32-bit extended register offset, optionally scaled.
•
Pre-indexed by an unscaled 9-bit signed immediate offset.
•
Post-indexed by an unscaled 9-bit signed immediate offset.
•
PC-relative literal for loads of 32 bits or more.
See also Load/Store addressing modes on page C1-118.
If a Load instruction specifies writeback and the register being loaded is also the base register, then one of the
following behaviors occurs:
•

The instruction is UNALLOCATED.

•

The instruction is treated as a NOP.

•

The instruction performs the load using the specified addressing mode and the base register becomes
UNKNOWN. In addition, if an exception occurs during the execution of such an instruction, the base address
might be corrupted so that the instruction cannot be repeated.

If a Store instruction performs a writeback and the register that is stored is also the base register, then one of the
following behaviors occurs:

•

The instruction is UNALLOCATED.

•

The instruction is treated as a NOP.

•

The instruction performs the store to the designated register using the specified addressing mode, but the
value stored is UNKNOWN.

Table C2-12 shows the Load/Store Register instructions.

Table C2-12 Load/Store register instructions

Mnemonic  Instruction                              See
LDR       Load register (register offset)          LDR (register) on page C5-521
          Load register (immediate offset)         LDR (immediate) on page C5-517
          Load register (PC-relative literal)      LDR (literal) on page C5-520
LDRB      Load byte (register offset)              LDRB (register) on page C5-527
          Load byte (immediate offset)             LDRB (immediate) on page C5-524
LDRSB     Load signed byte (register offset)       LDRSB (register) on page C5-539
          Load signed byte (immediate offset)      LDRSB (immediate) on page C5-536
LDRH      Load halfword (register offset)          LDRH (register) on page C5-533
          Load halfword (immediate offset)         LDRH (immediate) on page C5-530
LDRSH     Load signed halfword (register offset)   LDRSH (register) on page C5-545
          Load signed halfword (immediate offset)  LDRSH (immediate) on page C5-542
LDRSW     Load signed word (register offset)       LDRSW (register) on page C5-552
          Load signed word (immediate offset)      LDRSW (immediate) on page C5-548
          Load signed word (PC-relative literal)   LDRSW (literal) on page C5-551
STR       Store register (register offset)         STR (register) on page C5-697
          Store register (immediate offset)        STR (immediate) on page C5-694
STRB      Store byte (register offset)             STRB (register) on page C5-703
          Store byte (immediate offset)            STRB (immediate) on page C5-700
STRH      Store halfword (register offset)         STRH (register) on page C5-709
          Store halfword (immediate offset)        STRH (immediate) on page C5-706

C2.2.2

Load/Store register (unscaled offset)
The Load/Store register instructions with an unscaled offset support only one addressing mode:
•

Base plus an unscaled 9-bit signed immediate offset.

See Load/Store addressing modes on page C1-118.
The Load/Store register (unscaled offset) instructions are required to disambiguate this instruction class from the
Load/Store register instruction forms that support an addressing mode of base plus a scaled, unsigned 12-bit
immediate offset, because both forms can represent some of the same offset values.
The ambiguous immediate offsets are byte offsets that are both:
•
In the range 0-255, inclusive.
•
Naturally aligned to the access size.
Other byte offsets in the range -256 to 255 inclusive are unambiguous. An assembler program translating a
Load/Store instruction, for example LDR, is required to encode an unambiguous offset using the unscaled 9-bit offset
form, and to encode an ambiguous offset using the scaled 12-bit offset form. A programmer might force the
generation of the unscaled 9-bit form by using one of the mnemonics in Table C2-13. ARM recommends that a
disassembler outputs all unscaled 9-bit offset forms using one of these mnemonics, but unambiguous offsets can be
output using a Load/Store single register mnemonic, for example, LDR.
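The assembler rule above can be expressed as a small selection function. This is a minimal sketch, assuming `size` is the access size in bytes and that the function name `choose_offset_form` is hypothetical; it covers only the two immediate-offset forms discussed here, not the register-offset or indexed forms.

```python
def choose_offset_form(offset, size):
    """Pick the encoding for a base-plus-immediate access such as LDR.

    Returns "scaled" for the 12-bit unsigned scaled form, "unscaled" for the
    9-bit signed unscaled form, and raises ValueError if neither can encode it.
    """
    scaled_ok = offset >= 0 and offset % size == 0 and offset // size <= 0xFFF
    unscaled_ok = -256 <= offset <= 255
    if scaled_ok:
        return "scaled"       # includes every ambiguous offset
    if unscaled_ok:
        return "unscaled"     # negative or not naturally aligned
    raise ValueError("offset %d not encodable for %d-byte access" % (offset, size))

print(choose_offset_form(8, 8))    # ambiguous, so the scaled form is used
print(choose_offset_form(-8, 8))   # only the unscaled form can encode it
print(choose_offset_form(1, 8))    # misaligned, so only the unscaled form applies
```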
Table C2-13 shows the Load/Store register instructions with an unscaled offset.
Table C2-13 Load/Store register (unscaled offset) instructions

Mnemonic  Instruction                             See
LDUR      Load register (unscaled offset)         LDUR on page C5-567
LDURB     Load byte (unscaled offset)             LDURB on page C5-569
LDURSB    Load signed byte (unscaled offset)      LDURSB on page C5-573
LDURH     Load halfword (unscaled offset)         LDURH on page C5-571
LDURSH    Load signed halfword (unscaled offset)  LDURSH on page C5-575
LDURSW    Load signed word (unscaled offset)      LDURSW on page C5-577
STUR      Store register (unscaled offset)        STUR on page C5-718
STURB     Store byte (unscaled offset)            STURB on page C5-720
STURH     Store halfword (unscaled offset)        STURH on page C5-722

C2.2.3

Load/Store Pair
The Load/Store Pair instructions support the following addressing modes:
•
Base plus a scaled 7-bit signed immediate offset.
•
Pre-indexed by a scaled 7-bit signed immediate offset.
•
Post-indexed by a scaled 7-bit signed immediate offset.
See also Load/Store addressing modes on page C1-118.
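The reach of a scaled 7-bit signed immediate depends on the size of each register in the pair. The following sketch, with hypothetical helper names, computes the byte range and the 7-bit field value, assuming `size` is the size in bytes of one transferred register.

```python
def pair_offset_range(size):
    """Return (lowest, highest) byte offset for a scaled 7-bit signed immediate."""
    return -64 * size, 63 * size

def encode_pair_imm7(offset, size):
    """Return the 7-bit field for a byte offset, or raise ValueError if not encodable."""
    if offset % size != 0:
        raise ValueError("offset must be a multiple of the register size")
    scaled = offset // size
    if not -64 <= scaled <= 63:
        raise ValueError("offset out of range")
    return scaled & 0x7F          # two's complement in 7 bits

print(pair_offset_range(8))             # range for a pair of 64-bit registers
print(hex(encode_pair_imm7(-16, 8)))    # field value for an offset of -16 bytes
```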
If a Load Pair instruction specifies the same register for the two registers that are being loaded, then one of the
following behaviors occurs:
•

The instruction is UNALLOCATED.

•

The instruction is treated as a NOP.

•

The instruction performs all the loads using the specified addressing mode and the register that is loaded takes
an UNKNOWN value.

If a Load Pair instruction specifies writeback and one of the registers being loaded is also the base register, then one
of the following behaviors occurs:
•

The instruction is UNALLOCATED.

•

The instruction is treated as a NOP.

•

The instruction performs all of the loads using the specified addressing mode, and the base register becomes
UNKNOWN. In addition, if an exception occurs during the instruction, the base address might be corrupted so
that the instruction cannot be repeated.

If a Store Pair instruction performs a writeback and one of the registers being stored is also the base register, then
one of the following behaviors occurs:

•

The instruction is UNALLOCATED.

•

The instruction is treated as a NOP.

•

The instruction performs all the stores of the registers indicated by the specified addressing mode, but the
value stored for the base register is UNKNOWN.
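An assembler or code generator might want to reject the register combinations described above before emitting a Load/Store Pair instruction. The check below is a minimal sketch under that assumption; registers are identified by name strings such as "x0" or "sp", and the function name `pair_hazards` is hypothetical.

```python
def pair_hazards(is_load, rt, rt2, rn, writeback):
    """List the problematic register combinations for a Load/Store Pair.

    rt and rt2 are the transferred registers, rn is the base register.
    An empty list means none of the cases described in this section applies.
    """
    hazards = []
    if is_load and rt == rt2:
        hazards.append("load pair names the same register twice")
    if writeback and rn in (rt, rt2):
        hazards.append("writeback base register is also transferred")
    return hazards

print(pair_hazards(True, "x0", "x0", "sp", writeback=False))
print(pair_hazards(False, "x1", "x2", "x1", writeback=True))
print(pair_hazards(True, "x0", "x1", "x2", writeback=True))
```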

Table C2-14 shows the Load/Store Pair instructions.
Table C2-14 Load/Store Pair instructions

Mnemonic  Instruction             See
LDP       Load Pair               LDP on page C5-511
LDPSW     Load Pair signed words  LDPSW on page C5-514
STP       Store Pair              STP on page C5-691

C2.2.4

Load/Store Non-temporal Pair
The Load/Store Non-temporal Pair instructions support only one addressing mode:
•

Base plus a scaled 7-bit signed immediate offset.

See Load/Store addressing modes on page C1-118.
The Load/Store Non-temporal Pair instructions provide a hint to the memory system that an access is non-temporal
or streaming, and unlikely to be repeated in the near future. This means that data caching is not required. However,
depending on the memory type, the instructions might permit memory reads to be preloaded and memory writes to
be gathered to accelerate bulk memory transfers.
In addition there is a special exception to the normal memory ordering rules. If an address dependency exists
between two memory reads, and a Load Non-temporal Pair instruction generated the second read, then in the
absence of any other barrier mechanism to achieve order, the memory accesses can be observed in any order by the
other observers within the shareability domain of the memory addresses being accessed.
If a Load Non-Temporal Pair instruction specifies the same register for the two registers that are being loaded, then
one of the following can occur:
•

The instruction is UNALLOCATED.

•

The instruction is treated as a NOP.

•

The instruction performs all the loads using the specified addressing mode and the register that is loaded takes
an UNKNOWN value.

Table C2-15 shows the Load/Store Non-temporal Pair instructions.
Table C2-15 Load/Store Non-temporal Pair instructions

Mnemonic  Instruction              See
LDNP      Load Non-temporal Pair   LDNP on page C5-509
STNP      Store Non-temporal Pair  STNP on page C5-689

C2.2.5

Load/Store Unprivileged
The Load/Store Unprivileged instructions support only one addressing mode:
•

Base plus an unscaled 9-bit signed immediate offset.

See Load/Store addressing modes on page C1-118.
The Load/Store Unprivileged instructions can be used when the PE is at EL1 to perform unprivileged memory
accesses. If the PE is executing in any other Exception level, then a normal memory access for that level is
performed.

Table C2-16 shows the Load/Store Unprivileged instructions.
Table C2-16 Load/Store Unprivileged instructions

Mnemonic  Instruction                        See
LDTR      Load Unprivileged register         LDTR on page C5-555
LDTRB     Load Unprivileged byte             LDTRB on page C5-557
LDTRSB    Load Unprivileged signed byte      LDTRSB on page C5-561
LDTRH     Load Unprivileged halfword         LDTRH on page C5-559
LDTRSH    Load Unprivileged signed halfword  LDTRSH on page C5-563
LDTRSW    Load Unprivileged signed word      LDTRSW on page C5-565
STTR      Store Unprivileged register        STTR on page C5-712
STTRB     Store Unprivileged byte            STTRB on page C5-714
STTRH     Store Unprivileged halfword        STTRH on page C5-716

C2.2.6

Load-Exclusive/Store-Exclusive
The Load-Exclusive/Store-Exclusive instructions support only one addressing mode:
•

Base register with no offset.

See Load/Store addressing modes on page C1-118.
The Load-Exclusive instructions mark the physical address being accessed as an exclusive access. This exclusive
access mark is checked by the Store-Exclusive instruction, permitting the construction of atomic read-modify-write
operations on shared memory variables, semaphores, mutexes, and spinlocks. See Load-Acquire Exclusive,
Store-Release Exclusive and barriers on page AppxF-4839.
Natural alignment is required and an unaligned address generates an Alignment fault. Memory accesses generated
by Load-Exclusive pair or Store-Exclusive pair instructions must be aligned to the size of the pair. When a
Store-Exclusive pair succeeds, it causes a single-copy atomic update of the entire memory location.
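The read-modify-write pattern that these instructions enable can be illustrated with a toy model. The sketch below assumes a single PE with one monitor that tracks one address; the class and method names are hypothetical, and a real exclusive monitor tracks physical address regions and can be cleared for other reasons. The status convention (0 for success, 1 for failure) follows the Store-Exclusive instruction descriptions.

```python
class ExclusiveMemory:
    """Toy memory with a single exclusive monitor."""

    def __init__(self):
        self.mem = {}
        self.monitor = None

    def ldxr(self, addr):
        self.monitor = addr               # mark the address as an exclusive access
        return self.mem.get(addr, 0)

    def stxr(self, addr, value):
        ok = self.monitor == addr
        self.monitor = None               # the monitor is cleared either way
        if ok:
            self.mem[addr] = value
        return 0 if ok else 1             # status: 0 = stored, 1 = not stored

    def other_store(self, addr, value):
        """A store from another observer clears a matching mark."""
        self.mem[addr] = value
        if self.monitor == addr:
            self.monitor = None

def atomic_add(memory, addr, delta):
    """Retry loop in the style of LDXR / ADD / STXR / CBNZ."""
    while True:
        old = memory.ldxr(addr)
        if memory.stxr(addr, old + delta) == 0:
            return old

m = ExclusiveMemory()
atomic_add(m, 0x100, 5)
print(m.mem[0x100])
```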
Table C2-17 shows the Load-Exclusive/Store-Exclusive instructions.
Table C2-17 Load-Exclusive/Store-Exclusive instructions

Mnemonic  Instruction               See
LDXR      Load Exclusive register   LDXR on page C5-582
LDXRB     Load Exclusive byte       LDXRB on page C5-585
LDXRH     Load Exclusive halfword   LDXRH on page C5-588
LDXP      Load Exclusive pair       LDXP on page C5-579
STXR      Store Exclusive register  STXR on page C5-727
STXRB     Store Exclusive byte      STXRB on page C5-730
STXRH     Store Exclusive halfword  STXRH on page C5-733
STXP      Store Exclusive pair      STXP on page C5-724

C2.2.7

Load-Acquire/Store-Release
The Load-Acquire/Store-Release instructions support only one addressing mode:
•

Base register with no offset.

See Load/Store addressing modes on page C1-118.
The Load-Acquire/Store-Release instructions can remove the requirement to use the explicit DMB memory barrier
instruction. For more information about the ordering of Load-Acquire/Store-Release, see Load-Acquire,
Store-Release on page B2-87.
Table C2-18 shows the Non-exclusive Load-Acquire/Store-Release instructions.
Table C2-18 Non-exclusive Load-Acquire and Store-Release instructions

Mnemonic  Instruction             See
LDAR      Load-Acquire register   LDAR on page C5-488
LDARB     Load-Acquire byte       LDARB on page C5-491
LDARH     Load-Acquire halfword   LDARH on page C5-494
STLR      Store-Release register  STLR on page C5-668
STLRB     Store-Release byte      STLRB on page C5-671
STLRH     Store-Release halfword  STLRH on page C5-674

Table C2-19 shows the Exclusive Load-Acquire/Store-Release instructions.
Table C2-19 Exclusive Load-Acquire and Store-Release instructions

Mnemonic  Instruction                       See
LDAXR     Load-Acquire Exclusive register   LDAXR on page C5-500
LDAXRB    Load-Acquire Exclusive byte       LDAXRB on page C5-503
LDAXRH    Load-Acquire Exclusive halfword   LDAXRH on page C5-506
LDAXP     Load-Acquire Exclusive pair       LDAXP on page C5-497
STLXR     Store-Release Exclusive register  STLXR on page C5-680
STLXRB    Store-Release Exclusive byte      STLXRB on page C5-683
STLXRH    Store-Release Exclusive halfword  STLXRH on page C5-686
STLXP     Store-Release Exclusive pair      STLXP on page C5-677

C2.2.8

Load/Store scalar SIMD and floating-point
The Load/Store scalar SIMD and floating-point instructions operate on scalar values in the SIMD and floating-point
register file as described in SIMD and floating-point scalar register names on page C1-116. The memory addressing
modes available, described in Load/Store addressing modes on page C1-118, are identical to the general-purpose
register Load/Store instructions, and like those instructions permit arbitrary address alignment unless strict
alignment checking is enabled. However, unlike the Load/Store instructions that transfer general-purpose registers,
Load/Store scalar SIMD and floating-point instructions make no guarantee of atomicity, even when the address is
naturally aligned to the size of the data.

Load/Store scalar SIMD and floating-point register
The Load/Store scalar SIMD and floating-point register instructions support the following addressing modes:
•
Base plus a scaled 12-bit unsigned immediate offset or base plus unscaled 9-bit signed immediate offset.
•
Base plus 64-bit register offset, optionally scaled.
•
Base plus 32-bit extended register offset, optionally scaled.
•
Pre-indexed by an unscaled 9-bit signed immediate offset.
•
Post-indexed by an unscaled 9-bit signed immediate offset.
•
PC-relative literal for loads of 32 bits or more.
For more information on the addressing modes, see Load/Store addressing modes on page C1-118.

Note
The unscaled 9-bit signed immediate offset address mode requires its own instruction form, see Load/Store scalar
SIMD and floating-point register (unscaled offset).
Table C2-20 shows the Load/Store instructions for a single SIMD and floating-point register.
Table C2-20 Load/Store single SIMD and floating-point register instructions

Mnemonic  Instruction                                         See
LDR       Load scalar SIMD&FP register (register offset)      LDR (register, SIMD&FP) on page C6-1061
          Load scalar SIMD&FP register (immediate offset)     LDR (immediate, SIMD&FP) on page C6-1057
          Load scalar SIMD&FP register (PC-relative literal)  LDR (literal, SIMD&FP) on page C6-1060
STR       Store scalar SIMD&FP register (register offset)     STR (register, SIMD&FP) on page C6-1290
          Store scalar SIMD&FP register (immediate offset)    STR (immediate, SIMD&FP) on page C6-1287

Load/Store scalar SIMD and floating-point register (unscaled offset)
The Load/Store scalar SIMD and floating-point register (unscaled offset) instructions support only one addressing mode:
•

Base plus an unscaled 9-bit signed immediate offset.

See also Load/Store addressing modes on page C1-118.
The Load/Store scalar SIMD and floating-point register (unscaled offset) instructions are required to disambiguate
this instruction class from the Load/Store single SIMD and floating-point instruction forms that support an
addressing mode of base plus a scaled, unsigned 12-bit immediate offset. This is similar to the way that the
Load/Store register (unscaled offset) instructions disambiguate that instruction class from the Load/Store register
instructions, see Load/Store register (unscaled offset) on page C2-130.
Table C2-21 shows the Load/Store SIMD and floating-point register instructions with an unscaled offset.

Table C2-21 Load/Store SIMD and floating-point register instructions

Mnemonic  Instruction                                      See
LDUR      Load scalar SIMD&FP register (unscaled offset)   LDUR (SIMD&FP) on page C6-1064
STUR      Store scalar SIMD&FP register (unscaled offset)  STUR (SIMD&FP) on page C6-1293

Load/Store SIMD and Floating-point register pair
The Load/Store SIMD and floating-point register pair instructions support the following addressing modes:
•
Base plus a scaled 7-bit signed immediate offset.
•
Pre-indexed by a scaled 7-bit signed immediate offset.
•
Post-indexed by a scaled 7-bit signed immediate offset.
See also Load/Store addressing modes on page C1-118.
If a Load pair instruction specifies the same register for the two registers that are being loaded, then one of the
following occurs:
•

The instruction is UNALLOCATED.

•

The instruction is treated as a NOP.

•

The instruction performs all of the loads using the specified addressing mode and the register being loaded
takes an UNKNOWN value.

Table C2-22 shows the Load/Store SIMD and floating-point register pair instructions.
Table C2-22 Load/Store SIMD and floating-point register pair instructions

Mnemonic  Instruction                             See
LDP       Load pair of scalar SIMD&FP registers   LDP (SIMD&FP) on page C6-1054
STP       Store pair of scalar SIMD&FP registers  STP (SIMD&FP) on page C6-1284

Load/Store SIMD and Floating-point Non-temporal pair
The Load/Store SIMD and Floating-point Non-temporal pair instructions support only one addressing mode:
•

Base plus a scaled 7-bit signed immediate offset.

See also Load/Store addressing modes on page C1-118.
The Load/Store Non-temporal pair instructions provide a hint to the memory system that an access is non-temporal
or streaming, and unlikely to be repeated in the near future. This means that data caching is not required. However,
depending on the memory type, the instructions might permit memory reads to be preloaded and memory writes to
be gathered to accelerate bulk memory transfers.
In addition there is a special exception to the normal memory ordering rules. If an address dependency exists
between two memory reads, and a Load non-temporal pair instruction generated the second read, then in the absence
of any other barrier mechanism to achieve order, those memory accesses can be observed in any order by the other
observers within the shareability domain of the memory addresses being accessed.
If a Load Non-temporal pair instruction specifies the same register for the two registers that are being loaded, then
one of the following occurs:

•

The instruction is UNALLOCATED.

•

The instruction is treated as a NOP.

•

The instruction performs all the loads using the specified addressing mode and the register that is loaded takes
an UNKNOWN value.

Table C2-23 shows the Load/Store SIMD and floating-point Non-temporal pair instructions.
Table C2-23 Load/Store SIMD and floating-point Non-temporal pair instructions

Mnemonic  Instruction                             See
LDNP      Load pair of scalar SIMD&FP registers   LDNP (SIMD&FP) on page C6-1052
STNP      Store pair of scalar SIMD&FP registers  STNP (SIMD&FP) on page C6-1282

C2.2.9

Load/Store Vector
The Vector Load/Store structure instructions support the following addressing modes:
•
Base register only.
•
Post-indexed by a 64-bit register.
•
Post-indexed by an immediate, equal to the number of bytes transferred.
Load/Store vector instructions, like other Load/Store instructions, allow any address alignment, unless strict
alignment checking is enabled. If strict alignment checking is enabled, then alignment checking to the size of the
element is performed. However, unlike the Load/Store instructions that transfer general-purpose registers, the
Load/Store vector instructions do not guarantee atomicity, even when the address is naturally aligned to the size of
the element.

Load/Store structures
Table C2-24 shows the Load/Store structure instructions. A post-increment immediate offset, if present, must be 8,
16, 24, 32, 48, or 64, depending on the number of elements transferred.
Table C2-24 Load/Store multiple structures instructions

Mnemonic  Instruction                                                                                         See
LD1       Load single 1-element structure to one lane of one register                                         LD1 (single structure) on page C6-1019
          Load multiple 1-element structures to one register or to two, three or four consecutive registers   LD1 (multiple structures) on page C6-1016
LD2       Load single 2-element structure to one lane of two consecutive registers                            LD2 (single structure) on page C6-1028
          Load multiple 2-element structures to two consecutive registers                                     LD2 (multiple structures) on page C6-1025
LD3       Load single 3-element structure to one lane of three consecutive registers                          LD3 (single structure) on page C6-1037
          Load multiple 3-element structures to three consecutive registers                                   LD3 (multiple structures) on page C6-1034
LD4       Load single 4-element structure to one lane of four consecutive registers                           LD4 (single structure) on page C6-1046
          Load multiple 4-element structures to four consecutive registers                                    LD4 (multiple structures) on page C6-1043
ST1       Store single 1-element structure from one lane of one register                                      ST1 (single structure) on page C6-1261
          Store multiple 1-element structures from one register, or from two, three or four consecutive       ST1 (multiple structures) on page C6-1258
          registers
ST2       Store single 2-element structure from one lane of two consecutive registers                         ST2 (single structure) on page C6-1267
          Store multiple 2-element structures from two consecutive registers                                  ST2 (multiple structures) on page C6-1264
ST3       Store single 3-element structure from one lane of three consecutive registers                       ST3 (single structure) on page C6-1273
          Store multiple 3-element structures from three consecutive registers                                ST3 (multiple structures) on page C6-1270
ST4       Store single 4-element structure from one lane of four consecutive registers                        ST4 (single structure) on page C6-1279
          Store multiple 4-element structures from four consecutive registers                                 ST4 (multiple structures) on page C6-1276

Load single structure and replicate
Table C2-25 shows the Load single structure and replicate instructions. A post-increment immediate offset, if
present, must be 1, 2, 3, 4, 6, 8, 12, 16, 24, or 32, depending on the number of elements transferred.
Table C2-25 Load single structure and replicate instructions
Mnemonic

Instruction

See

LD1R

Load single 1-element structure and replicate to all lanes of one register

LD1R on page C6-1022

LD2R

Load single 2-element structure and replicate to all lanes of two registers

LD2R on page C6-1031

LD3R

Load single 3-element structure and replicate to all lanes of three registers

LD3R on page C6-1040

LD4R

Load single 4-element structure and replicate to all lanes of four registers

LD4R on page C6-1049
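
The permitted post-increment immediates quoted for the two tables can be reproduced with a small calculation. The following Python sketch is illustrative only. It assumes that a multiple-structure transfer moves one to four whole registers of 8 or 16 bytes, and that a replicate load moves one structure of one to four elements of 1, 2, 4, or 8 bytes. The function names are hypothetical.

```python
# Illustrative model only: the register counts and sizes below are
# assumptions chosen to reproduce the immediate values listed in the text.

def multiple_structure_immediates():
    """Bytes moved by 1 to 4 registers of 8 bytes (64-bit) or 16 bytes (128-bit)."""
    return sorted({regs * reg_bytes
                   for regs in range(1, 5)
                   for reg_bytes in (8, 16)})


def replicate_immediates():
    """Bytes moved by one structure of 1 to 4 elements of 1, 2, 4, or 8 bytes."""
    return sorted({elems * elem_bytes
                   for elems in range(1, 5)
                   for elem_bytes in (1, 2, 4, 8)})


print(multiple_structure_immediates())  # [8, 16, 24, 32, 48, 64]
print(replicate_immediates())           # [1, 2, 3, 4, 6, 8, 12, 16, 24, 32]
```

Both sets match the values given for Table C2-24 and Table C2-25, which suggests that the immediate is simply the total number of bytes transferred.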

C2.2.10

Prefetch memory
The Prefetch memory instructions support the following addressing modes:
•
Base plus a scaled 12-bit unsigned immediate offset or base plus an unscaled 9-bit signed immediate offset.
•
Base plus a 64-bit register offset. The offset can optionally be scaled by 8, for example with LSL #3.
•
Base plus a 32-bit extended register offset. The offset can optionally be scaled by 8.
•
PC-relative literal.
The prefetch memory instructions signal to the memory system that memory accesses from a specified address are
likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up
the memory accesses when they do occur, such as pre-loading the specified address into one or more caches. Because
these signals are only hints, it is valid for the PE to treat any or all prefetch instructions as a NOP.
Because they are hints to the memory system, the operation of a PRFM instruction cannot cause a synchronous
exception. However, a memory operation performed as a result of one of these memory system hints might in
exceptional cases trigger an asynchronous event, and thereby influence the execution of the PE. An example of an
asynchronous event that might be triggered is a System error interrupt.
A PRFM instruction can only have an effect on software visible structures, such as caches and translation lookaside
buffers associated with memory locations that can be accessed by reads, writes, or execution as defined in the
translation regime of the current Exception level.
A PRFM instruction is guaranteed not to access Device memory.

A PRFM instruction using a PLI hint must not result in any access that could not be performed by the PE speculatively
fetching an instruction. Therefore, if all associated MMUs are disabled, a PLI hint cannot access any memory
location that cannot be accessed by instruction fetches.
The PRFM instructions require an additional <prfop> operand to be specified, which must be one of the following:
PLDL1KEEP, PLDL1STRM, PLDL2KEEP, PLDL2STRM, PLDL3KEEP, PLDL3STRM
PSTL1KEEP, PSTL1STRM, PSTL2KEEP, PSTL2STRM, PSTL3KEEP, PSTL3STRM
PLIL1KEEP, PLIL1STRM, PLIL2KEEP, PLIL2STRM, PLIL3KEEP, PLIL3STRM
<prfop> is defined as <type><target><policy>.

Here:
<type> is one of:
PLD | Prefetch for load.
PST | Prefetch for store.
PLI | Preload instructions.

<target> is one of:
L1 | Level 1 cache.
L2 | Level 2 cache.
L3 | Level 3 cache.

<policy> is one of:
KEEP | Retained or temporal prefetch, allocated in the cache normally.
STRM | Streaming or non-temporal prefetch, for data that is used only once.
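
The way the PRFM operand names are composed from a type, a target, and a policy can be sketched in a few lines of Python. This is a minimal illustration. The dictionary and function names are hypothetical, and the sketch handles only the assembler names, not the instruction encoding.

```python
# Illustrative sketch: composes and splits the prefetch operation names
# listed above. It does not model the instruction encoding.
TYPES = {"PLD": "prefetch for load",
         "PST": "prefetch for store",
         "PLI": "preload instructions"}
TARGETS = {"L1": 1, "L2": 2, "L3": 3}
POLICIES = {"KEEP": "retained or temporal",
            "STRM": "streaming or non-temporal"}


def all_prfops():
    """Every name formed by concatenating a type, a target, and a policy."""
    return [t + c + p for t in TYPES for c in TARGETS for p in POLICIES]


def parse_prfop(name):
    """Split a name such as 'PSTL2STRM' into (type, cache level, policy)."""
    kind, target, policy = name[:3], name[3:5], name[5:]
    if kind not in TYPES or target not in TARGETS or policy not in POLICIES:
        raise ValueError("not a valid prefetch operation: " + name)
    return kind, TARGETS[target], policy


print(len(all_prfops()))         # 18
print(parse_prfop("PSTL2STRM"))  # ('PST', 2, 'STRM')
```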

PRFUM explicitly uses the unscaled 9-bit signed immediate offset addressing mode, as described in Load/Store
register (unscaled offset) on page C2-130.
Table C2-26 shows the Prefetch memory instructions.
Table C2-26 Prefetch memory instructions
Mnemonic | Instruction | See
PRFM | Prefetch memory (register offset) | PRFM (register) on page C5-634
PRFM | Prefetch memory (immediate offset) | PRFM (immediate) on page C5-629
PRFM | Prefetch memory (PC-relative offset) | PRFM (literal) on page C5-632
PRFUM | Prefetch memory (unscaled offset) | PRFUM on page C5-637

C2.3

Data processing - immediate
This section describes the instruction groups for data processing with immediate operands. It contains the following
subsections:
•
Arithmetic (immediate).
•
Logical (immediate).
•
Move (wide immediate) on page C2-141.
•
Move (immediate) on page C2-141.
•
PC-relative address calculation on page C2-142.
•
Bitfield move on page C2-142.
•
Bitfield insert and extract on page C2-143.
•
Extract register on page C2-143.
•
Shift (immediate) on page C2-143.
•
Sign-extend and Zero-extend on page C2-143.
For information about the encoding structure of the instructions in this instruction group, see Data processing - immediate on page C3-193.

C2.3.1

Arithmetic (immediate)
The Arithmetic (immediate) instructions accept a 12-bit unsigned immediate value, optionally shifted left by 12 bits.
The Arithmetic (immediate) instructions that do not set condition flags can read from and write to the current stack
pointer. The flag setting instructions can read from the stack pointer, but they cannot write to it.
Table C2-27 shows the Arithmetic (immediate) instructions.
Table C2-27 Arithmetic instructions with an immediate

Mnemonic

Instruction

See

ADD

Add

ADD (immediate) on page C5-396

ADDS

Add and set flags

ADDS (immediate) on page C5-402

SUB

Subtract

SUB (immediate) on page C5-738

SUBS

Subtract and set flags

SUBS (immediate) on page C5-744

CMP

Compare

CMP (immediate) on page C5-451

CMN

Compare negative

CMN (immediate) on page C5-447

C2.3.2

Logical (immediate)
The Logical (immediate) instructions accept a bitmask immediate value that is a 32-bit pattern or a 64-bit pattern
viewed as a vector of identical elements of size e = 2, 4, 8, 16, 32, or 64 bits. Each element contains the same
sub-pattern, that is a single run of 1 to (e - 1) nonzero bits from bit 0 followed by zero bits, then rotated by 0 to
(e - 1) bits. This mechanism can generate 5334 unique 64-bit patterns as 2667 pairs of a pattern and its bitwise inverse.

Note
Values that consist of only zeros or only ones cannot be described in this way.
The Logical (immediate) instructions that do not set the condition flags can write to the current stack pointer, for
example to align the stack pointer in a function prologue.
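
The bitmask immediate construction described in this section can be checked by enumeration. The following Python sketch is illustrative and is not an encoder. It builds every pattern directly from the description (element size, run length, rotation, replication), and the function name is hypothetical.

```python
def bitmask_immediates(width=64):
    """Return the set of all values that fit the bitmask immediate description."""
    patterns = set()
    e = 2
    while e <= width:
        elem_mask = (1 << e) - 1
        for ones in range(1, e):                # run of 1 to (e - 1) set bits from bit 0
            run = (1 << ones) - 1
            for rot in range(e):                # rotate right by 0 to (e - 1) bits
                elem = ((run >> rot) | (run << (e - rot))) & elem_mask
                value = 0
                for lane in range(width // e):  # replicate the element across the register
                    value |= elem << (lane * e)
                patterns.add(value)
        e *= 2
    return patterns


patterns = bitmask_immediates(64)
print(len(patterns))                   # 5334
print(0 in patterns)                   # False
print(0xFF00FF00FF00FF00 in patterns)  # True
```

The count agrees with the figure quoted in the text, and the set contains neither the all-zeros nor the all-ones value.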

Note
Apart from ANDS, and its TST alias, Logical (immediate) instructions do not set the condition flags. However, the final
results of a bitwise operation can be tested by a CBZ, CBNZ, TBZ, or TBNZ conditional branch.
Table C2-28 shows the Logical immediate instructions.
Table C2-28 Logical immediate instructions
Mnemonic

Instruction

See

AND

Bitwise AND

AND (immediate) on page C5-408

ANDS

Bitwise AND and set flags

ANDS (immediate) on page C5-412

EOR

Bitwise exclusive OR

EOR (immediate) on page C5-476

ORR

Bitwise inclusive OR

ORR (immediate) on page C5-625

TST

Test bits

TST (immediate) on page C5-757

C2.3.3

Move (wide immediate)
The Move (wide immediate) instructions insert a 16-bit immediate, or inverted immediate, into a 16-bit aligned
position in the destination register. The value of the other bits in the destination register depends on the variant used.
The optional shift amount can be any multiple of 16 that is smaller than the register size.
Table C2-29 shows the Move (wide immediate) instructions.
Table C2-29 Move (wide immediate) instructions

Mnemonic

Instruction

See

MOVZ

Move wide with zero

MOVZ on page C5-608

MOVN

Move wide with NOT

MOVN on page C5-606

MOVK

Move wide with keep

MOVK on page C5-605
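
As an illustration of how the three variants treat the other bits of the destination register, the following Python sketch models MOVZ, MOVN, and MOVK on plain integers. It is a simplified model, and the function names and argument order are chosen for this example only.

```python
def _check(imm16, shift, width):
    assert 0 <= imm16 <= 0xFFFF
    assert shift % 16 == 0 and 0 <= shift < width


def movz(imm16, shift=0, width=64):
    """MOVZ: place imm16 at the shifted position and zero every other bit."""
    _check(imm16, shift, width)
    return (imm16 << shift) & ((1 << width) - 1)


def movn(imm16, shift=0, width=64):
    """MOVN: place imm16 at the shifted position, then invert the whole value."""
    _check(imm16, shift, width)
    return ~(imm16 << shift) & ((1 << width) - 1)


def movk(old, imm16, shift=0, width=64):
    """MOVK: replace one 16-bit field of old and keep the other bits."""
    _check(imm16, shift, width)
    mask = (1 << width) - 1
    return ((old & ~(0xFFFF << shift)) | (imm16 << shift)) & mask


# Build a full 64-bit constant with one MOVZ followed by three MOVK steps.
x = movz(0xDEF0)
x = movk(x, 0x9ABC, 16)
x = movk(x, 0x5678, 32)
x = movk(x, 0x1234, 48)
print(hex(x))  # 0x123456789abcdef0
```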

C2.3.4

Move (immediate)
The Move (immediate) instructions are aliases for a single MOVZ, MOVN, or ORR (immediate with zero register)
instruction to load an immediate value into the destination register. An assembler must permit a signed or unsigned
immediate, as long as its binary representation can be generated using one of these instructions, and an assembler
error results if the immediate cannot be generated in this way. On disassembly it is unspecified whether the
immediate is output as a signed or an unsigned value.
If there is a choice between the MOVZ, MOVN, and ORR instructions to encode the immediate, then an assembler must
prefer MOVZ to MOVN, and MOVZ or MOVN to ORR, to ensure reversibility. A disassembler must output ORR (immediate with
zero register), MOVZ, and MOVN as a MOV mnemonic, except that the underlying instruction must be used when:
•
ORR has an immediate that can be generated by a MOVZ or MOVN instruction.
•
A MOVN instruction has an immediate that can be encoded by MOVZ.
•
MOVZ #0 or MOVN #0 have a shift amount other than LSL #0.
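
The assembler preference order can be sketched as a small selection routine. The following Python model is an illustration under stated assumptions, not the algorithm of any particular assembler. The helper names are hypothetical, and the bitmask test follows the description in Logical (immediate).

```python
def fits_wide(value, width):
    """True if at most one 16-bit aligned field of value is nonzero."""
    fields = [(value >> s) & 0xFFFF for s in range(0, width, 16)]
    return sum(1 for f in fields if f) <= 1


def is_bitmask(value, width):
    """True if value is a replicated, rotated run of ones (a bitmask immediate)."""
    e = 2
    while e <= width:
        emask = (1 << e) - 1
        elem = value & emask
        replicated = all(((value >> s) & emask) == elem
                         for s in range(0, width, e))
        if replicated and elem not in (0, emask):
            for rot in range(e):
                r = ((elem >> rot) | (elem << (e - rot))) & emask
                if r & (r + 1) == 0:  # r is a run of ones starting at bit 0
                    return True
        e *= 2
    return False


def choose_mov(value, width=64):
    """Pick the underlying instruction for MOV, preferring MOVZ, then MOVN, then ORR."""
    mask = (1 << width) - 1
    value &= mask                      # accept signed or unsigned input
    if fits_wide(value, width):
        return "MOVZ"
    if fits_wide(~value & mask, width):
        return "MOVN"
    if is_bitmask(value, width):
        return "ORR"
    return None                        # an assembler would report an error


print(choose_mov(0x10000))             # MOVZ
print(choose_mov(-2))                  # MOVN
print(choose_mov(0x5555555555555555))  # ORR
print(choose_mov(0x12345678))          # None
```

Testing MOVZ first also covers the case in the second bullet: a 32-bit value such as 0xFFFF0000 could be produced by MOVN, but the routine selects MOVZ.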

Table C2-30 shows the Move (immediate) instructions.
Table C2-30 Move (immediate) instructions
Mnemonic | Instruction | See
MOV | Move (inverted wide immediate) | MOV (inverted wide immediate) on page C5-601
MOV | Move (wide immediate) | MOV (wide immediate) on page C5-602
MOV | Move (bitmask immediate) | MOV (bitmask immediate) on page C5-603

C2.3.5

PC-relative address calculation
The ADR instruction adds a signed, 21-bit immediate to the value of the program counter that fetched this instruction,
and then writes the result to a general-purpose register. This permits the calculation of any byte address within
±1MB of the current PC.
The ADRP instruction shifts a signed, 21-bit immediate left by 12 bits, adds it to the value of the program counter with
the bottom 12 bits cleared to zero, and then writes the result to a general-purpose register. This permits the
calculation of the address at a 4KB aligned memory region. In conjunction with an ADD (immediate) instruction, or
a Load/Store instruction with a 12-bit immediate offset, this allows for the calculation of, or access to, any address
within ±4GB of the current PC.

Note
The term page used in the ADRP description is short-hand for the 4KB memory region, and is not related to the virtual
memory translation granule size.
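
The two address calculations can be modeled directly from the description above. The following Python sketch is illustrative. It treats the immediate as the raw 21-bit field, and the function names are chosen for this example.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low 'bits' bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def adr(pc, imm21):
    """ADR: PC plus a signed 21-bit byte offset, so a range of +/-1MB."""
    return (pc + sign_extend(imm21, 21)) & MASK64


def adrp(pc, imm21):
    """ADRP: 4KB-aligned PC plus a signed 21-bit offset counted in 4KB units, so +/-4GB."""
    page = pc & ~0xFFF
    return (page + (sign_extend(imm21, 21) << 12)) & MASK64


print(hex(adr(0x1000, 0x1FFFFF)))     # 0xfff     (offset of -1 byte)
print(hex(adrp(0x400123, 1)))         # 0x401000  (next 4KB region)
print(hex(adrp(0x400123, 0x1FFFFF)))  # 0x3ff000  (previous 4KB region)
```

The low 12 bits of the target are then supplied by a following ADD (immediate) or by the immediate offset of a Load/Store instruction.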
Table C2-31 shows the instructions used for PC-relative address calculations.
Table C2-31 PC-relative address calculation instructions
Mnemonic

Instruction

See

ADRP

Compute address of 4KB page at a PC-relative offset

ADRP on page C5-407

ADR

Compute address of label at a PC-relative offset.

ADR on page C5-406

C2.3.6

Bitfield move
The Bitfield move instructions copy a bitfield of constant width from bit 0 in the source register to a constant bit
position in the destination register, or from a constant bit position in the source register to bit 0 in the destination
register. The remaining bits in the destination register are set as follows:
•

For BFM the remaining bits are unchanged.

•

For UBFM the lower bits, if any, and upper bits, if any, are set to zero.

•

For SBFM the lower bits, if any, are set to zero, and the upper bits, if any, are set to a copy of the
most-significant bit in the copied bitfield.
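
The three behaviors can be compared in a small model. The following Python sketch is an illustration, assuming that the operands are given as the rotate and most-significant-bit immediates (named immr and imms here, after the encoding fields). When imms is greater than or equal to immr the field is extracted to bit 0. Otherwise the field is taken from bit 0 and inserted at bit (width - immr).

```python
def _field(src, immr, imms, width):
    """Return (field value, destination lsb, field size) for a bitfield move."""
    if imms >= immr:                  # source bits [imms:immr] move to bit 0
        size = imms - immr + 1
        return (src >> immr) & ((1 << size) - 1), 0, size
    size = imms + 1                   # source bits [imms:0] move to bit (width - immr)
    return src & ((1 << size) - 1), width - immr, size


def bfm(dst, src, immr, imms, width=64):
    """BFM: the remaining destination bits are unchanged."""
    field, lsb, size = _field(src, immr, imms, width)
    hole = ((1 << size) - 1) << lsb
    return ((dst & ~hole) | (field << lsb)) & ((1 << width) - 1)


def ubfm(src, immr, imms, width=64):
    """UBFM: lower and upper bits are set to zero."""
    field, lsb, size = _field(src, immr, imms, width)
    return field << lsb


def sbfm(src, immr, imms, width=64):
    """SBFM: lower bits are zero, upper bits copy the top bit of the field."""
    field, lsb, size = _field(src, immr, imms, width)
    result = field << lsb
    if field >> (size - 1):
        result |= ((1 << width) - 1) & ~((1 << (lsb + size)) - 1)
    return result


print(hex(ubfm(0xABCD, 8, 11)))      # 0xb    (4-bit field from bit 8)
print(hex(ubfm(0x1, 60, 59)))        # 0x10   (field from bit 0 placed at bit 4)
print(hex(sbfm(0x80, 0, 7)))         # 0xffffffffffffff80
print(hex(bfm(0xFFFF, 0x0, 56, 3)))  # 0xf0ff (4-bit field inserted at bit 8)
```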

Table C2-32 shows the Bitfield move instructions.
Table C2-32 Bitfield move instructions
Mnemonic

Instruction

See

BFM

Bitfield move

BFM on page C5-423

SBFM

Signed bitfield move

SBFM on page C5-656

UBFM

Unsigned bitfield move

UBFM on page C5-760

C2.3.7

Bitfield insert and extract
The Bitfield insert and extract instructions are implemented as aliases of the Bitfield move instructions. Table C2-33
shows the Bitfield insert and extract aliases.
Table C2-33 Bitfield insert and extract instructions

Mnemonic

Instruction

See

BFI

Bitfield insert

BFI on page C5-422

BFXIL

Bitfield extract and insert low

BFXIL on page C5-425

SBFIZ

Signed bitfield insert in zero

SBFIZ on page C5-655

SBFX

Signed bitfield extract

SBFX on page C5-658

UBFIZ

Unsigned bitfield insert in zero

UBFIZ on page C5-759

UBFX

Unsigned bitfield extract

UBFX on page C5-762

C2.3.8

Extract register
Depending on the register width of the operands, the Extract register instruction copies a 32-bit or 64-bit field from
a constant bit position within a double-width value formed by the concatenation of a pair of source registers to a
destination register.
Table C2-34 shows the Extract register instructions.
Table C2-34 Extract register instructions

Mnemonic

Instruction

See

EXTR

Extract register from pair

EXTR on page C5-480
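
A minimal Python model of the extraction, given for illustration, is shown below. The first source register forms the upper half of the double-width value and the second forms the lower half. The function name and argument order are chosen for this example.

```python
def extr(rn, rm, lsb, width=64):
    """EXTR: take 'width' bits starting at bit 'lsb' of the value Rn:Rm."""
    assert width in (32, 64) and 0 <= lsb < width
    mask = (1 << width) - 1
    concat = ((rn & mask) << width) | (rm & mask)
    return (concat >> lsb) & mask


print(hex(extr(0x1, 0x0, 60)))                    # 0x10
print(hex(extr(0x12345678, 0x12345678, 8, 32)))   # 0x78123456
```

When both source registers are the same, the operation is a rotate right by a constant amount, as the second call shows.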

C2.3.9

Shift (immediate)
Shifts and rotates by a constant amount are implemented as aliases of the Bitfield move or Extract register
instructions. The shift or rotate amount must be in the range 0 to one less than the register width of the instruction,
inclusive.
Table C2-35 shows the aliases that can be used as immediate shift and rotate instructions.
Table C2-35 Aliases for immediate shift and rotate instructions

Mnemonic

Instruction

See

ASR

Arithmetic shift right

ASR (immediate) on page C5-417

LSL

Logical shift left

LSL (immediate) on page C5-592

LSR

Logical shift right

LSR (immediate) on page C5-595

ROR

Rotate right

ROR (immediate) on page C5-648
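
The mapping from each alias to its underlying instruction can be written out as follows. This Python sketch is illustrative. It returns the underlying mnemonic together with the two immediates that the alias implies, and the function name and tuple layout are hypothetical.

```python
def shift_alias(op, amount, width=64):
    """Return (underlying mnemonic, first immediate, second immediate)."""
    assert width in (32, 64) and 0 <= amount < width
    if op == "LSL":
        return ("UBFM", (-amount) % width, width - 1 - amount)
    if op == "LSR":
        return ("UBFM", amount, width - 1)
    if op == "ASR":
        return ("SBFM", amount, width - 1)
    if op == "ROR":
        # EXTR Rd, Rs, Rs, #amount: both source registers are the same.
        return ("EXTR", amount, None)
    raise ValueError("unknown shift: " + op)


print(shift_alias("LSL", 4))      # ('UBFM', 60, 59)
print(shift_alias("ASR", 4, 32))  # ('SBFM', 4, 31)
```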

C2.3.10

Sign-extend and Zero-extend
The Sign-extend and Zero-extend instructions are implemented as aliases of the Bitfield move instructions.
Table C2-36 on page C2-144 shows the aliases that can be used as zero-extend and sign-extend instructions.

Table C2-36 Zero-extend and sign-extend instructions
Mnemonic

Instruction

See

SXTB

Sign-extend byte

SXTB on page C5-749

SXTH

Sign-extend halfword

SXTH on page C5-750

SXTW

Sign-extend word

SXTW on page C5-751

UXTB

Unsigned extend byte

UXTB on page C5-769

UXTH

Unsigned extend halfword

UXTH on page C5-770

C2.4

Data processing - register
This section describes the instruction groups for data processing with all register operands. It contains the following
subsections:
•
Arithmetic (shifted register).
•
Arithmetic (extended register).
•
Arithmetic with carry on page C2-146.
•
Logical (shifted register) on page C2-147.
•
Move (register) on page C2-148.
•
Shift (register) on page C2-148.
•
Multiply and divide on page C2-148.
•
CRC32 on page C2-150.
•
Bit operation on page C2-150.
•
Conditional select on page C2-150.
•
Conditional comparison on page C2-151.
For information about the encoding structure of the instructions in this instruction group, see Data processing - register on page C3-196.

C2.4.1

Arithmetic (shifted register)
The Arithmetic (shifted register) instructions apply an optional shift operator to the second source register value
before performing the arithmetic operation. The register width of the instruction controls whether the new bits are
fed into the intermediate result on a right shift or rotate at bit[63] or bit[31].
The shift operators LSL, ASR and LSR accept an immediate shift amount in the range 0 to one less than the register
width of the instruction, inclusive.
Omitting the shift operator implies LSL #0, which means that there is no shift. A disassembler must not output LSL
#0. However, a disassembler must output all other shifts by zero.
The current stack pointer, SP or WSP, cannot be used with this class of instructions. See Arithmetic (extended
register) for arithmetic instructions that can operate on the current stack pointer.
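
The effect of the register width on a right shift can be seen in a short model. The following Python sketch is illustrative. It applies the shift operator to the second operand and then adds, and the function names are chosen for this example.

```python
def shift_operand(value, op, amount, width=64):
    """Apply LSL, LSR, or ASR to the second source operand."""
    assert width in (32, 64) and 0 <= amount < width
    mask = (1 << width) - 1
    value &= mask
    if op == "LSL":
        return (value << amount) & mask
    if op == "LSR":
        return value >> amount
    if op == "ASR":
        sign = value >> (width - 1)   # bit[31] or bit[63], depending on width
        fill = (mask << (width - amount)) & mask if sign else 0
        return (value >> amount) | fill
    raise ValueError("unknown shift: " + op)


def add_shifted(rn, rm, op="LSL", amount=0, width=64):
    """ADD (shifted register): Rn plus the shifted Rm. Omitted shift means LSL #0."""
    mask = (1 << width) - 1
    return (rn + shift_operand(rm, op, amount, width)) & mask


print(hex(shift_operand(0x80000000, "ASR", 4, 32)))  # 0xf8000000
print(hex(shift_operand(0x80000000, "ASR", 4, 64)))  # 0x8000000
print(add_shifted(1, 1, "LSL", 3))                   # 9
```

The same input produces different results for the 32-bit and 64-bit forms because the sign bit is taken from bit[31] in one case and bit[63] in the other.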
Table C2-37 shows the Arithmetic (shifted register) instructions.
Table C2-37 Arithmetic (shifted register) instructions

Mnemonic

Instruction

See

ADD

Add

ADD (shifted register) on page C5-398

ADDS

Add and set flags

ADDS (shifted register) on page C5-404

SUB

Subtract

SUB (shifted register) on page C5-740

SUBS

Subtract and set flags

SUBS (shifted register) on page C5-746

CMN

Compare negative

CMN (shifted register) on page C5-448

CMP

Compare

CMP (shifted register) on page C5-452

NEG

Negate

NEG on page C5-618

NEGS

Negate and set flags

NEGS on page C5-619

C2.4.2

Arithmetic (extended register)
The extended register instructions provide an optional sign-extension or zero-extension of a portion of the second
source register value, followed by an optional left shift by a constant amount of 1-4, inclusive.

The extended shift is described by the mandatory extend operator SXTB, SXTH, SXTW, UXTB, UXTH, or UXTW. This is
followed by an optional left shift amount. If the shift amount is not specified, the default shift amount is zero. A
disassembler must not output a shift amount of zero.
For 64-bit instruction forms the additional operators UXTX and SXTX use all 64 bits of the second source register with
an optional shift. In that case ARM recommends UXTX as the operator. If and only if at least one register is SP, ARM
recommends use of the LSL operator name, rather than UXTX, and when the shift amount is also zero then both the
operator and the shift amount can be omitted.
For 32-bit instruction forms the operators UXTW and SXTW both use all 32 bits of the second source register with an
optional shift. In that case ARM recommends UXTW as the operator. If and only if at least one register is WSP, ARM
recommends use of the LSL operator name, rather than UXTW, and when the shift amount is also zero then both the
operator and the shift amount can be omitted.
The non-flag setting variants of the extended register instruction permit the use of the current stack pointer as either
or both of the destination register and the first source register. The flag setting variants only permit the stack
pointer to be used as the first source register.
In the 64-bit form of these instructions the final register operand is written as Wm for all except the UXTX/LSL and SXTX
extend operators. For example:
CMP X4, W5, SXTW
ADD X1, X2, W3, UXTB #2
SUB SP, SP, X1              // SUB SP, SP, X1, UXTX #0
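
The extend-then-shift step applied to the second source register can be modeled as follows. This Python sketch is illustrative. The operator is passed as a string, and the table and function names are chosen for this example.

```python
EXTEND_BITS = {"B": 8, "H": 16, "W": 32, "X": 64}


def extend_operand(value, op, shift=0, width=64):
    """Sign-extend or zero-extend part of the register, then shift left by 0 to 4."""
    assert 0 <= shift <= 4
    bits = min(EXTEND_BITS[op[3]], width)   # op is one of UXTB ... SXTX
    field = value & ((1 << bits) - 1)
    if op[0] == "S" and field >> (bits - 1):
        field -= 1 << bits                  # negative after sign extension
    return (field << shift) & ((1 << width) - 1)


# Second operand of: ADD X1, X2, W3, UXTB #2 with W3 = 0x1FF
print(hex(extend_operand(0x1FF, "UXTB", 2)))    # 0x3fc
# Second operand of: CMP X4, W5, SXTW with W5 = 0xFFFFFFFF
print(hex(extend_operand(0xFFFFFFFF, "SXTW")))  # 0xffffffffffffffff
```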

Table C2-38 shows the Arithmetic (extended register) instructions.
Table C2-38 Arithmetic (extended register) instructions
Mnemonic

Instruction

See

ADD

Add

ADD (extended register) on page C5-394

ADDS

Add and set flags

ADDS (extended register) on page C5-400

SUB

Subtract

SUB (extended register) on page C5-736

SUBS

Subtract and set flags

SUBS (extended register) on page C5-742

CMN

Compare negative

CMN (extended register) on page C5-445

CMP

Compare

CMP (extended register) on page C5-449

C2.4.3

Arithmetic with carry
The Arithmetic with carry instructions accept two source registers, with the carry flag as an additional input to the
calculation. They do not support shifting of the second source register.
Table C2-39 shows the Arithmetic with carry instructions.
Table C2-39 Arithmetic with carry instructions

Mnemonic

Instruction

See

ADC

Add with carry

ADC on page C5-392

ADCS

Add with carry and set flags

ADCS on page C5-393

SBC

Subtract with carry

SBC on page C5-651

SBCS

Subtract with carry and set flags

SBCS on page C5-653

NGC

Negate with carry

NGC on page C5-620

NGCS

Negate with carry and set flags

NGCS on page C5-621
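
The role of the carry flag as an additional input is easiest to see in a multi-word addition. The following Python sketch is a simplified model. It assumes the usual formulation in which subtract with carry is computed as Rn + NOT(Rm) + C, and the function names are chosen for this example.

```python
M64 = (1 << 64) - 1


def adc(rn, rm, carry, width=64):
    """Return (result, carry out) of Rn + Rm + C."""
    mask = (1 << width) - 1
    total = (rn & mask) + (rm & mask) + carry
    return total & mask, total >> width


def sbc(rn, rm, carry, width=64):
    """Return (result, carry out) of Rn + NOT(Rm) + C."""
    mask = (1 << width) - 1
    return adc(rn, ~rm & mask, carry, width)


def add128(a, b):
    """128-bit addition from two 64-bit steps: a flag-setting add, then ADC."""
    lo, c = adc(a & M64, b & M64, 0)   # low halves, carry in is 0
    hi, _ = adc(a >> 64, b >> 64, c)   # high halves plus carry from the low step
    return (hi << 64) | lo


print(hex(add128(M64, 1)))  # 0x10000000000000000
print(sbc(5, 3, 1))         # (2, 1)
print(sbc(5, 3, 0))         # (1, 1)
```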

C2.4.4

Logical (shifted register)
The Logical (shifted register) instructions apply an optional shift operator to the second source register value before
performing the main operation. The register width of the instruction controls whether the new bits are fed into the
intermediate result on a right shift or rotate at bit[63] or bit[31].
The shift operators LSL, ASR, LSR and ROR accept a constant immediate shift amount in the range 0 to one less than the
register width of the instruction, inclusive.
Omitting the shift operator and amount implies LSL #0, which means that there is no shift. A disassembler must not
output LSL #0. However, a disassembler must output all other shifts by zero.

Note
Apart from ANDS, TST, and BICS, the logical instructions do not set the condition flags, but the final result of a bit
operation can usually directly control a CBZ, CBNZ, TBZ, or TBNZ conditional branch.
Table C2-40 shows the Logical (shifted register) instructions.
Table C2-40 Logical (shifted register) instructions
Mnemonic

Instruction

See

AND

Bitwise AND

AND (shifted register) on page C5-410

ANDS

Bitwise AND and set flags

ANDS (shifted register) on page C5-414

BIC

Bitwise bit clear

BIC (shifted register) on page C5-426

BICS

Bitwise bit clear and set flags

BICS (shifted register) on page C5-428

EON

Bitwise exclusive OR NOT

EON (shifted register) on page C5-474

EOR

Bitwise exclusive OR

EOR (shifted register) on page C5-477

ORR

Bitwise inclusive OR

ORR (shifted register) on page C5-627

MVN

Bitwise NOT

MVN on page C5-617

ORN

Bitwise inclusive OR NOT

ORN (shifted register) on page C5-623

TST

Test bits

TST (shifted register) on page C5-758

C2.4.5

Move (register)
The Move (register) instructions are aliases for other data processing instructions. They copy a value from a
general-purpose register to another general-purpose register or the current stack pointer, or from the current stack
pointer to a general-purpose register.
Table C2-41 MOV register instructions
Mnemonic | Instruction | See
MOV | Move register | MOV (register) on page C5-604
MOV | Move register to SP or move SP to register | MOV (to/from SP) on page C5-600

C2.4.6

Shift (register)
In the Shift (register) instructions, the shift amount is the positive value in the second source register modulo the
register size. The register width of the instruction controls whether the new bits are fed into the result on a right shift
or rotate at bit[63] or bit[31].
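
The modulo behavior of the shift amount can be sketched as follows. This Python model is illustrative, and the function names mirror the mnemonics only for readability.

```python
def lslv(rn, rm, width=64):
    """Logical shift left by (Rm modulo the register size)."""
    mask = (1 << width) - 1
    return ((rn & mask) << (rm % width)) & mask


def lsrv(rn, rm, width=64):
    """Logical shift right by (Rm modulo the register size)."""
    mask = (1 << width) - 1
    return (rn & mask) >> (rm % width)


def rorv(rn, rm, width=64):
    """Rotate right by (Rm modulo the register size)."""
    mask = (1 << width) - 1
    n = rm % width
    value = rn & mask
    return ((value >> n) | (value << (width - n))) & mask


print(lslv(1, 65))           # 2, because 65 modulo 64 is 1
print(lslv(1, 33, 32))       # 2, because 33 modulo 32 is 1
print(hex(rorv(1, 1, 32)))   # 0x80000000
```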
Table C2-42 shows the Shift (register) instructions.
Table C2-42 Shift (register) instructions

Mnemonic

Instruction

See

ASRV

Arithmetic shift right variable

ASRV on page C5-418

LSLV

Logical shift left variable

LSLV on page C5-593

LSRV

Logical shift right variable

LSRV on page C5-596

RORV

Rotate right variable

RORV on page C5-650

However, the Shift (register) instructions have a preferred set of aliases that match the shift immediate aliases
described in Shift (immediate) on page C2-143.
Table C2-43 shows the aliases for Shift (register) instructions.
Table C2-43 Aliases for Variable shift instructions
Mnemonic

Instruction

See

ASR

Arithmetic shift right

ASR (register) on page C5-416

LSL

Logical shift left

LSL (register) on page C5-591

LSR

Logical shift right

LSR (register) on page C5-594

ROR

Rotate right

ROR (register) on page C5-649

C2.4.7

Multiply and divide
This section describes the instructions used for integer multiplication and division. It contains the following
subsections:
•
Multiply on page C2-149.
•
Divide on page C2-149.

Multiply
The Multiply instructions write to a single 32-bit or 64-bit destination register, and are built around the fundamental
four operand multiply-add and multiply-subtract operation, together with 32-bit to 64-bit widening variants. A
64-bit to 128-bit widening multiply can be constructed with two instructions, using SMULH or UMULH to generate the
upper 64 bits. Table C2-44 shows the Multiply instructions.
Table C2-44 Multiply integer instructions
Mnemonic

Instruction

See

MADD

Multiply-add

MADD on page C5-597

MSUB

Multiply-subtract

MSUB on page C5-614

MNEG

Multiply-negate

MNEG on page C5-599

MUL

Multiply

MUL on page C5-616

SMADDL

Signed multiply-add long

SMADDL on page C5-662

SMSUBL

Signed multiply-subtract long

SMSUBL on page C5-665

SMNEGL

Signed multiply-negate long

SMNEGL on page C5-664

SMULL

Signed multiply long

SMULL on page C5-667

SMULH

Signed multiply high

SMULH on page C5-666

UMADDL

Unsigned multiply-add long

UMADDL on page C5-764

UMSUBL

Unsigned multiply-subtract long

UMSUBL on page C5-766

UMNEGL

Unsigned multiply-negate long

UMNEGL on page C5-765

UMULL

Unsigned multiply long

UMULL on page C5-768

UMULH

Unsigned multiply high

UMULH on page C5-767
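
The two-instruction 64-bit to 128-bit widening multiply mentioned above can be modeled as a low-half multiply combined with a high-half multiply. The following Python sketch is illustrative, with function names chosen to mirror the mnemonics.

```python
M64 = (1 << 64) - 1


def to_signed64(value):
    value &= M64
    return value - (1 << 64) if value >> 63 else value


def mul(rn, rm):
    """MUL: low 64 bits of the product."""
    return (rn * rm) & M64


def umulh(rn, rm):
    """UMULH: upper 64 bits of the unsigned 128-bit product."""
    return ((rn & M64) * (rm & M64)) >> 64


def smulh(rn, rm):
    """SMULH: upper 64 bits of the signed 128-bit product."""
    return ((to_signed64(rn) * to_signed64(rm)) >> 64) & M64


def madd(rn, rm, ra):
    """MADD: Ra + Rn * Rm."""
    return (ra + rn * rm) & M64


def msub(rn, rm, ra):
    """MSUB: Ra - Rn * Rm."""
    return (ra - rn * rm) & M64


a = M64
wide = (umulh(a, a) << 64) | mul(a, a)
print(wide == a * a)  # True
```

MUL is MADD with a zero addend, and MNEG is MSUB with a zero addend, which is why the text calls multiply-add and multiply-subtract the fundamental operations.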

Divide
The Divide instructions compute the quotient of a division, rounded towards zero. The remainder can then be
computed as (numerator - (quotient × denominator)), using the MSUB instruction.
If a signed integer division (INT_MIN / -1) is performed where INT_MIN is the most negative integer value
representable in the selected register size, then the result overflows the signed integer range. No indication of this
overflow is produced and the result that is written to the destination register is INT_MIN.
A division by zero results in a zero being written to the destination register, without any indication that the division
by zero occurred.
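
The rounding, overflow, and divide-by-zero rules above can be collected in a short model. The following Python sketch is illustrative. Python's floor division rounds towards negative infinity, so the signed case is built from absolute values to obtain rounding towards zero. The function names are chosen for this example.

```python
def to_signed(value, width=64):
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def udiv(n, d, width=64):
    """Unsigned divide: a zero divisor gives a zero result."""
    mask = (1 << width) - 1
    n, d = n & mask, d & mask
    return 0 if d == 0 else n // d


def sdiv(n, d, width=64):
    """Signed divide, rounded towards zero: a zero divisor gives a zero result."""
    mask = (1 << width) - 1
    n, d = to_signed(n, width), to_signed(d, width)
    if d == 0:
        return 0
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return q & mask   # INT_MIN / -1 wraps back to INT_MIN here


def remainder(n, d, width=64):
    """numerator - (quotient * denominator), as an MSUB would compute it."""
    mask = (1 << width) - 1
    return (n - sdiv(n, d, width) * d) & mask


INT_MIN = 1 << 63
print(to_signed(sdiv(-7, 2)))         # -3
print(to_signed(remainder(-7, 2)))    # -1
print(sdiv(INT_MIN, -1) == INT_MIN)   # True
print(sdiv(5, 0))                     # 0
```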
Table C2-45 shows the Divide instructions.
Table C2-45 Divide instructions
Mnemonic

Instruction

See

SDIV

Signed divide

SDIV on page C5-659

UDIV

Unsigned divide

UDIV on page C5-763

C2.4.8

CRC32
The optional CRC32 instructions operate on the general-purpose register file to update a 32-bit CRC value from an
input value comprising 1, 2, 4, or 8 bytes. There are two different classes of CRC instructions, CRC32 and CRC32C, that
support two commonly used 32-bit polynomials, known as CRC-32 and CRC-32C.
To fit with common usage, the bit order of the values is reversed as part of the operation.
When bits[19:16] of ID_AA64ISAR0_EL1 are set to 0b0001 the CRC instructions are implemented.
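
A bitwise software model shows what the accumulate step computes. The following Python sketch is an illustration under assumptions. It uses the bit-reversed forms of the CRC-32 and CRC-32C polynomials, processes multi-byte values least significant byte first, and leaves the initial value and final inversion to the caller, as is conventional in software CRC routines. The function names are hypothetical.

```python
CRC32_POLY_REFLECTED = 0xEDB88320    # bit-reversed form of 0x04C11DB7
CRC32C_POLY_REFLECTED = 0x82F63B78   # bit-reversed form of 0x1EDC6F41


def crc_update(acc, value, nbytes, poly):
    """Fold nbytes of value, least significant byte first, into a 32-bit CRC."""
    for i in range(nbytes):
        acc ^= (value >> (8 * i)) & 0xFF
        for _ in range(8):
            acc = (acc >> 1) ^ (poly if acc & 1 else 0)
    return acc & 0xFFFFFFFF


def crc32b(acc, byte):
    return crc_update(acc, byte, 1, CRC32_POLY_REFLECTED)


def crc32cb(acc, byte):
    return crc_update(acc, byte, 1, CRC32C_POLY_REFLECTED)


def has_crc32(id_aa64isar0_el1):
    """True when bits[19:16] of the ID register value read as 0b0001."""
    return (id_aa64isar0_el1 >> 16) & 0xF == 0b0001


acc = 0xFFFFFFFF                      # conventional initial value (an assumption)
for b in b"123456789":
    acc = crc32b(acc, b)
print(hex(acc ^ 0xFFFFFFFF))          # 0xcbf43926
```

The printed value is the widely published check value for CRC-32 over the ASCII string "123456789".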
Table C2-46 shows the CRC instructions.
Table C2-46 CRC32 instructions

Mnemonic

Instruction

See

CRC32B

CRC-32 sum from byte

CRC32B, CRC32H, CRC32W, CRC32X on page C5-454

CRC32H

CRC-32 sum from halfword

CRC32B, CRC32H, CRC32W, CRC32X on page C5-454

CRC32W

CRC-32 sum from word

CRC32B, CRC32H, CRC32W, CRC32X on page C5-454

CRC32X

CRC-32 sum from doubleword

CRC32B, CRC32H, CRC32W, CRC32X on page C5-454

CRC32CB

CRC-32C sum from byte

CRC32CB, CRC32CH, CRC32CW, CRC32CX on page C5-455

CRC32CH

CRC-32C sum from halfword

CRC32CB, CRC32CH, CRC32CW, CRC32CX on page C5-455

CRC32CW

CRC-32C sum from word

CRC32CB, CRC32CH, CRC32CW, CRC32CX on page C5-455

CRC32CX

CRC-32C sum from doubleword

CRC32CB, CRC32CH, CRC32CW, CRC32CX on page C5-455

C2.4.9

Bit operation
Table C2-47 shows the Bit operation instructions.
Table C2-47 Bit operation instructions

Mnemonic

Instruction

See

CLS

Count leading sign bits

CLS on page C5-443

CLZ

Count leading zero bits

CLZ on page C5-444

RBIT

Reverse bit order

RBIT on page C5-640

REV

Reverse bytes in register

REV on page C5-643

REV16

Reverse bytes in halfwords

REV16 on page C5-645

REV32

Reverse bytes in words

REV32 on page C5-647

C2.4.10

Conditional select
The Conditional select instructions select between the first or second source register, depending on the current state
of the condition flags. When the named condition is true, the first source register is selected and its value is copied
without modification to the destination register. When the condition is false the second source register is selected
and its value might be optionally inverted, negated, or incremented by one, before writing to the destination
register.
Other useful conditional set and conditional unary operations are implemented as aliases of the four Conditional
select instructions.
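
The four Conditional select instructions, and the way some aliases are derived from them, can be modeled as follows. This Python sketch is illustrative. The condition is passed as an already-evaluated boolean, and the alias definitions use the zero register (modeled as 0) with the inverted condition.

```python
M64 = (1 << 64) - 1


def csel(cond, rn, rm):
    """CSEL: Rn if the condition is true, otherwise Rm."""
    return (rn if cond else rm) & M64


def csinc(cond, rn, rm):
    """CSINC: Rn if the condition is true, otherwise Rm + 1."""
    return (rn if cond else rm + 1) & M64


def csinv(cond, rn, rm):
    """CSINV: Rn if the condition is true, otherwise NOT(Rm)."""
    return (rn if cond else ~rm) & M64


def csneg(cond, rn, rm):
    """CSNEG: Rn if the condition is true, otherwise -Rm."""
    return (rn if cond else -rm) & M64


# Aliases: each uses the inverted condition on one of the base instructions.
def cset(cond):
    return csinc(not cond, 0, 0)    # 1 if cond, else 0


def csetm(cond):
    return csinv(not cond, 0, 0)    # all ones if cond, else 0


def cneg(cond, rn):
    return csneg(not cond, rn, rn)  # -Rn if cond, else Rn


print(cset(True), cset(False))  # 1 0
print(hex(csetm(True)))         # 0xffffffffffffffff
print(cneg(False, 5))           # 5
```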

Table C2-48 shows the Conditional select instructions.
Table C2-48 Conditional select instructions
Mnemonic

Instruction

See

CSEL

Conditional select

CSEL on page C5-456

CSINC

Conditional select increment

CSINC on page C5-459

CSINV

Conditional select inversion

CSINV on page C5-461

CSNEG

Conditional select negation

CSNEG on page C5-463

CSET

Conditional set

CSET on page C5-457

CSETM

Conditional set mask

CSETM on page C5-458

CINC

Conditional increment

CINC on page C5-440

CINV

Conditional invert

CINV on page C5-441

CNEG

Conditional negate

CNEG on page C5-453

C2.4.11

Conditional comparison
The Conditional comparison instructions provide a conditional select for the NZCV condition flags, setting the flags
to the result of an arithmetic comparison of its two source register values if the named input condition is true, or to
an immediate value if the input condition is false. There are register and immediate forms. The immediate form
compares the source register to a small 5-bit unsigned value.
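
The flag selection can be sketched as a function that returns either the flags of the comparison or the immediate flag value. The following Python model is illustrative. The flags are packed as a 4-bit NZCV value, the condition is passed as an already-evaluated boolean, and the function names are chosen for this example.

```python
def _pack(n, z, c, v):
    return (n << 3) | (z << 2) | (c << 1) | v


def add_flags(a, b, carry_in, width=64):
    """NZCV of a + b + carry_in, as produced by a flag-setting addition."""
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    total = a + b + carry_in
    result = total & mask
    sa, sb, sr = a >> (width - 1), b >> (width - 1), result >> (width - 1)
    overflow = int(sa == sb and sr != sa)
    return _pack(sr, int(result == 0), total >> width, overflow)


def ccmp(cond, rn, op2, nzcv, width=64):
    """CCMP: flags of Rn - op2 if cond is true, otherwise the immediate nzcv."""
    if not cond:
        return nzcv & 0xF
    mask = (1 << width) - 1
    return add_flags(rn, ~op2 & mask, 1, width)   # Rn - op2 is Rn + NOT(op2) + 1


def ccmn(cond, rn, op2, nzcv, width=64):
    """CCMN: flags of Rn + op2 if cond is true, otherwise the immediate nzcv."""
    if not cond:
        return nzcv & 0xF
    return add_flags(rn, op2, 0, width)


print(bin(ccmp(True, 5, 5, 0)))        # 0b110  (Z and C set: equal)
print(bin(ccmp(False, 5, 5, 0b1000)))  # 0b1000 (immediate flags used)
print(bin(ccmp(True, 0, 1, 0)))        # 0b1000 (N set: negative result)
```

In the immediate forms, op2 would be restricted to the 5-bit unsigned range 0 to 31.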
Table C2-49 shows the Conditional comparison instructions.
Table C2-49 Conditional comparison instructions

Mnemonic

Instruction

See

CCMN

Conditional compare negative (register)

CCMN (register) on page C5-437

CCMN

Conditional compare negative (immediate)

CCMN (immediate) on page C5-436

CCMP

Conditional compare (register)

CCMP (register) on page C5-439

CCMP

Conditional compare (immediate)

CCMP (immediate) on page C5-438

C2.5

Data processing - SIMD and floating-point
This section describes the instruction groups for data processing with SIMD and floating-point register operands.
It contains the following subsections that describe the scalar floating-point data processing instructions:
•
Floating-point move (register) on page C2-153.
•
Floating-point move (immediate) on page C2-153.
•
Floating-point conversion on page C2-154.
•
Floating-point round to integral on page C2-155.
•
Floating-point multiply-add on page C2-156.
•
Floating-point arithmetic (one source) on page C2-156.
•
Floating-point arithmetic (two sources) on page C2-156.
•
Floating-point minimum and maximum on page C2-156.
•
Floating-point comparison on page C2-157.
•
Floating-point conditional select on page C2-157.
It also contains the following subsections that describe the SIMD data processing instructions:
•
SIMD move on page C2-158.
•
SIMD arithmetic on page C2-158.
•
SIMD compare on page C2-160.
•
SIMD widening and narrowing arithmetic on page C2-161.
•
SIMD unary arithmetic on page C2-162.
•
SIMD by element arithmetic on page C2-164.
•
SIMD permute on page C2-165.
•
SIMD immediate on page C2-165.
•
SIMD shift (immediate) on page C2-166.
•
SIMD floating-point and integer conversion on page C2-167.
•
SIMD reduce (across vector lanes) on page C2-168.
•
SIMD pairwise arithmetic on page C2-168.
•
SIMD table lookup on page C2-169.
•
Cryptography extensions on page C2-169.
For information about the encoding structure of the instructions in this instruction group, see Data processing SIMD and floating point on page C3-203.
For information about the Floating-point exceptions, see Floating-point Exception traps on page D1-1454.

C2.5.1

Common features of SIMD instructions
A number of SIMD instructions come in three forms:
•
Wide: This is indicated by the suffix W. The element width of the destination register and the first source
operand is double that of the second source operand.
•
Long: This is indicated by the suffix L. The element width of the destination register is double that of both
source operands.
•
Narrow: This is indicated by the suffix N. The element width of the destination register is half that of both
source operands.

In addition, each vector form of these instructions is one of a pair. The second, upper-half variant of the
instruction is identified by the suffix 2:
•
Where a SIMD operation widens or lengthens a 64-bit vector to a 128-bit vector, the instruction provides a
second-part operation that can extract the source from the upper 64 bits of the source registers.
•
Where a SIMD operation narrows a 128-bit vector to a 64-bit vector, the instruction provides a second-part
operation that can pack the result of a second operation into the upper part of the same destination register.

Note
This is referred to as a lane set specifier.
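As a rough illustration of the long and narrow forms and their second-part variants, the following Python sketch models vectors as lists of unsigned lane values, lowest lane first. The helper names `uaddl` and `addhn` are borrowed from the mnemonics for readability, and the `upper` and `dest` parameters are assumptions of this sketch rather than anything defined by the architecture.

```python
def uaddl(a, b, upper=False):
    """Long add: sixteen 8-bit source lanes, of which eight are used, give eight 16-bit lanes.

    upper=False models the base form (lower 64 bits of each source);
    upper=True models the second-part form (upper 64 bits of each source).
    """
    if len(a) != 16 or len(b) != 16:
        raise ValueError("expected sixteen 8-bit lanes per source")
    half = slice(8, 16) if upper else slice(0, 8)
    # Each sum is at most 510, so it always fits in a 16-bit destination lane.
    return [x + y for x, y in zip(a[half], b[half])]


def addhn(a, b, dest=None):
    """Narrowing add: eight 16-bit lanes in, high byte of each sum out.

    dest=None models the base form, which fills the lower eight byte lanes.
    Passing dest models the second-part form, which keeps the lower eight
    lanes of dest and packs the new results into the upper eight lanes.
    """
    if len(a) != 8 or len(b) != 8:
        raise ValueError("expected eight 16-bit lanes per source")
    narrow = [((x + y) & 0xFFFF) >> 8 for x, y in zip(a, b)]
    if dest is None:
        return narrow + [0] * 8
    return list(dest[:8]) + narrow
```

Calling `addhn` once without `dest` and then again with the first result as `dest` produces a full 128-bit vector of narrowed results, which is the pairing that the suffix 2 expresses.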

C2.5.2

Floating-point move (register)
The Floating-point move (register) instructions copy a scalar floating-point value from one register to another
register without performing any conversion.
Some of the Floating-point move (register) instructions overlap with the functionality provided by the Advanced
SIMD instructions DUP, INS, and UMOV. However, ARM recommends using the FMOV instructions when operating on
scalar floating-point data to avoid the creation of scalar floating-point code that depends on the availability of the
Advanced SIMD instruction set.
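The phrase "without performing any conversion" means that the bit pattern is copied unchanged. A minimal Python sketch of that idea for a 32-bit transfer between a floating-point register and a general-purpose register follows. The function names are hypothetical, and because Python floats are double-precision, the input to `fmov_w_from_s` is assumed to be exactly representable in single precision.

```python
import struct


def fmov_w_from_s(value: float) -> int:
    """Return the 32-bit pattern of a single-precision value, as a bit copy to a W register would."""
    # Assumption: value is exactly representable as an IEEE 754 single.
    return struct.unpack("<I", struct.pack("<f", value))[0]


def fmov_s_from_w(bits: int) -> float:
    """Reinterpret a 32-bit integer pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]
```

Note that an all-zero bit pattern maps to +0.0, so copying from a zero register yields a floating-point zero.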
Table C2-50 shows the Floating-point move (register) instructions.
Table C2-50 Floating-point move (register) instructions

Mnemonic

Instruction

See

FMOV

•
Floating-point move register without conversion
•
Floating-point move to or from general-purpose register without
conversion

•
FMOV (register) on page C6-962
•
FMOV (general) on page C6-963

C2.5.3

Floating-point move (immediate)
The Floating-point move (immediate) instructions convert a small constant immediate floating-point value into a
single-precision or double-precision scalar floating-point value in a SIMD and floating-point register.
The floating-point constant can be specified either in decimal notation, such as 12.0 or -1.2e1, or as a string
beginning with 0x followed by a hexadecimal representation of the IEEE 754 single-precision or double-precision
encoding. ARM recommends that a disassembler uses the decimal notation, provided that this displays the value
precisely.
The floating-point value must be expressible as ±n/16 × 2^r, where n is an integer in the range 16 ≤ n ≤ 31 and r is
an integer in the range of -3 ≤ r ≤ 4, that is a normalized binary floating-point encoding with one sign bit, four bits
of fraction, and a 3-bit exponent.
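The set of encodable constants is small enough to enumerate. The following Python sketch builds it directly from the ranges of n and r given above. The function names are hypothetical and the sketch says nothing about how the 8-bit immediate field is laid out.

```python
from fractions import Fraction


def fmov_immediates():
    """Return every value of the form +/- n/16 * 2**r with 16 <= n <= 31 and -3 <= r <= 4."""
    values = set()
    for n in range(16, 32):
        for r in range(-3, 5):
            magnitude = float(Fraction(n, 16) * Fraction(2) ** r)
            values.add(magnitude)
            values.add(-magnitude)
    return values


def is_fmov_immediate(x: float) -> bool:
    """True if x can be written in the n/16 * 2**r form described above."""
    return x in fmov_immediates()
```

Because n is never less than 16, the magnitude is never less than 0.125, which is why 0.0 is not among the encodable constants. The value 12.0 used as an example above corresponds to n = 24 and r = 3.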

Note
This encoding does not include the floating-point constant 0.0. There are several instructions that can store zero in
a SIMD and floating-point register, but ARM recommends that software uses FMOV Sd, WZR or FMOV Dd, XZR to
provide consistency across a range of microarchitectures.
Table C2-51 shows the Floating-point move (immediate) instruction:
Table C2-51 Floating-point move (immediate) instruction
Mnemonic

Instruction

See

FMOV

Floating-point move immediate

FMOV (scalar, immediate) on page C6-965

C2.5.4

Floating-point conversion
The following subsections describe the conversion of floating-point values:
•
Convert floating-point precision.
•
Convert between floating-point and integer or fixed-point.

Convert floating-point precision
These instructions convert a floating-point scalar with one precision to a floating-point scalar with a different
precision, using the current rounding mode as specified by FPCR.RMode.
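To illustrate what a narrowing precision conversion does to a value, the sketch below round-trips a Python double through the single-precision and half-precision formats using the standard `struct` module. It is only an approximation of the instruction: it assumes round-to-nearest behavior rather than consulting any rounding-mode control, it raises `OverflowError` for values that are too large instead of producing an infinity, and the function names are hypothetical.

```python
import struct


def fcvt_d_to_s(x: float) -> float:
    """Round a double to the nearest single-precision value, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fcvt_d_to_h(x: float) -> float:
    """Round a double to the nearest half-precision value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

Values such as 0.5 survive the conversion unchanged, whereas 0.1 is not exactly representable in any binary format and changes slightly at each narrower precision.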
Table C2-52 shows the Floating-point precision conversion instruction.
Table C2-52 Floating-point precision conversion instruction
Mnemonic

Instruction

See

FCVT

Floating-point convert precision (scalar)

FCVT on page C6-868

Convert between floating-point and integer or fixed-point
These instructions convert a floating-point scalar in a SIMD and floating-point register to or from a signed or
unsigned integer or fixed-point in a general-purpose register. For a fixed-point value, a final immediate operand
indicates that the general-purpose register holds a fixed-point number and fbits indicates the number of bits after
the binary point. fbits is in the range 1-32 inclusive for a 32-bit general-purpose register name, and 1-64 inclusive
for a 64-bit general-purpose register name.
These instructions generate the Invalid Operation exception in response to a floating-point input of NaN, infinity,
or a numerical value that cannot be represented within the destination register. An out-of-range integer or
fixed-point result is saturated to the size of the destination register. A numeric result that differs from the input
generates an Inexact exception. When flush-to-zero mode is enabled, zero replaces a denormal input and generates
an Input Denormal exception.
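The following Python sketch models a round-toward-zero conversion from floating-point to signed fixed-point with saturation, following the rules in the paragraph above. It is a simplified illustration: the function name, the representation of exceptions as a set of strings, and the choice to return zero for a NaN input are assumptions of this sketch, and flush-to-zero handling is not modeled.

```python
import math
from fractions import Fraction


def fcvtzs_fixed(x: float, fbits: int, width: int = 32):
    """Convert x to a signed fixed-point integer with fbits fractional bits.

    Returns (result, flags), where flags is a set of exception names.
    """
    if width not in (32, 64) or not 1 <= fbits <= width:
        raise ValueError("fbits must be in the range 1..width")
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if math.isnan(x):
        return 0, {"InvalidOp"}          # assumption: NaN converts to zero
    if math.isinf(x):
        return (hi if x > 0 else lo), {"InvalidOp"}
    exact = Fraction(x) * (1 << fbits)   # exact scaled value, no rounding error
    result = math.trunc(exact)           # round toward zero
    if result > hi:
        return hi, {"InvalidOp"}         # saturate on positive overflow
    if result < lo:
        return lo, {"InvalidOp"}         # saturate on negative overflow
    flags = {"Inexact"} if result != exact else set()
    return result, flags
```

For instance, converting 1.5 with 16 fractional bits gives 98304 with no exception, while converting -0.3 with 4 fractional bits gives -4 and signals Inexact.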
Table C2-53 shows the Floating-point and integer or fixed-point conversion instructions.
Table C2-53 Floating-point and integer or fixed-point conversion instructions
Mnemonic

Instruction

See

FCVTAS

Floating-point scalar convert to signed integer, rounding to
nearest with ties to away (scalar form)

FCVTAS (scalar) on page C6-872

FCVTAU

Floating-point scalar convert to unsigned integer, rounding
to nearest with ties to away (scalar form)

FCVTAU (scalar) on page C6-876

FCVTMS

Floating-point scalar convert to signed integer, rounding
toward minus infinity (scalar form)

FCVTMS (scalar) on page C6-881

FCVTMU

Floating-point scalar convert to unsigned integer, rounding
toward minus infinity (scalar form)

FCVTMU (scalar) on page C6-885

FCVTNS

Floating-point scalar convert to signed integer, rounding to
nearest with ties to even (scalar form)

FCVTNS (scalar) on page C6-890

FCVTNU

Floating-point scalar convert to unsigned integer, rounding
to nearest with ties to even (scalar form)

FCVTNU (scalar) on page C6-894

FCVTPS

Floating-point scalar convert to signed integer, rounding
toward positive infinity (scalar form)

FCVTPS (scalar) on page C6-898

FCVTPU

Floating-point scalar convert to unsigned integer, rounding
toward positive infinity (scalar form)

FCVTPU (scalar) on page C6-902

FCVTZS

•
Floating-point scalar convert to signed integer,
rounding toward zero (scalar form)
•
Floating-point convert to signed fixed-point,
rounding toward zero (scalar form)

•
FCVTZS (scalar, integer) on page C6-912
•
FCVTZS (scalar, fixed-point) on page C6-910

FCVTZU

•
Floating-point scalar convert to unsigned integer,
rounding toward zero (scalar form)
•
Floating-point scalar convert to unsigned fixed-point,
rounding toward zero (scalar form)

•
FCVTZU (scalar, integer) on page C6-920
•
FCVTZU (scalar, fixed-point) on page C6-918

SCVTF

•
Signed integer scalar convert to floating-point, using
the current rounding mode (scalar form)
•
Signed fixed-point convert to floating-point, using
the current rounding mode (scalar form)

•
SCVTF (vector, integer) on page C6-1128
•
SCVTF (scalar, fixed-point) on page C6-1130

UCVTF

•
Unsigned integer scalar convert to floating-point,
using the current rounding mode (scalar form)
•
Unsigned fixed-point convert to floating-point, using
the current rounding mode (scalar form)

•
UCVTF (vector, integer) on page C6-1325
•
UCVTF (scalar, fixed-point) on page C6-1327

C2.5.5

Floating-point round to integral
The Floating-point round to integral instructions round a floating-point value to an integral floating-point value of
the same size.
These instructions generate the Invalid Operation exception in response to a signaling NaN input, or the Input
Denormal exception in response to a denormal input when flush-to-zero mode is enabled. The FRINTX instruction
can also generate the Inexact exception if the result is numeric and does not have the same numerical value as the
input. A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign,
and a NaN is propagated as in normal floating-point arithmetic.
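The explicit rounding directions can be modeled in a few lines of Python. The sketch below covers only the variants whose rounding is fixed by the mnemonic (A, N, M, P, and Z); the single-letter `mode` argument and the function name are assumptions of this example, and exception signaling and flush-to-zero are not modeled.

```python
import math
from fractions import Fraction


def frint(x: float, mode: str) -> float:
    """Round x to an integral floating-point value.

    mode: "A" nearest, ties away; "N" nearest, ties to even;
          "M" toward minus infinity; "P" toward plus infinity; "Z" toward zero.
    """
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x                          # NaN propagates; infinities and zeros keep their sign
    if mode == "A":
        # Exact arithmetic avoids a rounding error when adding one half.
        r = math.floor(Fraction(abs(x)) + Fraction(1, 2))
        r = -r if x < 0 else r
    elif mode == "N":
        r = round(x)                      # Python rounds ties to even
    elif mode == "M":
        r = math.floor(x)
    elif mode == "P":
        r = math.ceil(x)
    elif mode == "Z":
        r = math.trunc(x)
    else:
        raise ValueError("unknown rounding mode")
    # copysign keeps the sign of the input when the result is zero.
    return math.copysign(float(r), x)
```

A tie such as 2.5 shows the difference between the two nearest modes: mode "A" gives 3.0 and mode "N" gives 2.0.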
Table C2-54 shows the Floating-point round to integral instructions.
Table C2-54 Floating-point round to integral instructions

Mnemonic

Instruction

See

FRINTA

Floating-point round to integral, to nearest with ties to away

FRINTA (scalar) on page C6-991

FRINTI

Floating-point round to integral, using current rounding mode

FRINTI (scalar) on page C6-993

FRINTM

Floating-point round to integral, toward minus infinity

FRINTM (scalar) on page C6-995

FRINTN

Floating-point round to integral, to nearest with ties to even

FRINTN (scalar) on page C6-997

FRINTP

Floating-point round to integral, toward positive infinity

FRINTP (scalar) on page C6-999

FRINTX

Floating-point round to integral exact, using current rounding mode

FRINTX (scalar) on page C6-1001

FRINTZ

Floating-point round to integral, toward zero

FRINTZ (scalar) on page C6-1003

C2.5.6

Floating-point multiply-add
Table C2-55 shows the Floating-point multiply-add instructions that require three source register operands.
Table C2-55 Floating-point multiply-add instructions

Mnemonic

Instruction

See

FMADD

Floating-point scalar fused multiply-add

FMADD on page C6-924

FMSUB

Floating-point scalar fused multiply-subtract

FMSUB on page C6-966

FNMADD

Floating-point scalar negated fused multiply-add

FNMADD on page C6-980

FNMSUB

Floating-point scalar negated fused multiply-subtract

FNMSUB on page C6-982

C2.5.7

Floating-point arithmetic (one source)
Table C2-56 shows the Floating-point arithmetic instructions that require a single source register operand.
Table C2-56 Floating-point arithmetic instructions with one source register

Mnemonic

Instruction

See

FABS

Floating-point scalar absolute value

FABS (scalar) on page C6-838

FNEG

Floating-point scalar negate

FNEG (scalar) on page C6-979

FSQRT

Floating-point scalar square root

FSQRT (scalar) on page C6-1009

C2.5.8

Floating-point arithmetic (two sources)
Table C2-57 shows the Floating-point arithmetic instructions that require two source register operands.
Table C2-57 Floating-point arithmetic instructions with two source registers

Mnemonic

Instruction

See

FADD

Floating-point scalar add

FADD (scalar) on page C6-844

FDIV

Floating-point scalar divide

FDIV (scalar) on page C6-923

FMUL

Floating-point scalar multiply

FMUL (scalar) on page C6-972

FNMUL

Floating-point scalar multiply-negate

FNMUL on page C6-984

FSUB

Floating-point scalar subtract

FSUB (scalar) on page C6-1011

C2.5.9

Floating-point minimum and maximum
The min(x,y) and max(x,y) operations return a quiet NaN when either x or y is NaN. In flush-to-zero mode
subnormal operands are flushed to zero before comparison, and if the result of the comparison is the flushed value,
then a zero value is returned. Where both x and y are zero, or subnormal values flushed to zero, with different signs,
then +0.0 is returned by max() and -0.0 by min().
The minNum(x,y) and maxNum(x,y) operations follow the IEEE 754-2008 standard and return the numerical operand
when one operand is numerical and the other a quiet NaN. Apart from this additional handling of a single quiet NaN
the result is then identical to min(x,y) and max(x,y).
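The difference between the two families is easiest to see side by side. The Python sketch below models the rules stated above for NaN operands and signed zeros. Python does not distinguish quiet from signaling NaNs, so every NaN is treated as quiet here, flush-to-zero is not modeled, and the function names are simply borrowed from the mnemonics.

```python
import math


def _is_negative(x: float) -> bool:
    """True if the sign bit of x is set, including for -0.0."""
    return math.copysign(1.0, x) < 0


def fmax(x: float, y: float) -> float:
    if math.isnan(x) or math.isnan(y):
        return math.nan                   # any NaN operand gives a NaN result
    if x == 0.0 and y == 0.0:
        # Zeros of different sign: max() returns +0.0.
        return -0.0 if (_is_negative(x) and _is_negative(y)) else 0.0
    return x if x > y else y


def fmin(x: float, y: float) -> float:
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == 0.0 and y == 0.0:
        # Zeros of different sign: min() returns -0.0.
        return -0.0 if (_is_negative(x) or _is_negative(y)) else 0.0
    return x if x < y else y


def fmaxnm(x: float, y: float) -> float:
    """maxNum: a single quiet NaN operand is ignored in favor of the number."""
    if math.isnan(x) and not math.isnan(y):
        return y
    if math.isnan(y) and not math.isnan(x):
        return x
    return fmax(x, y)


def fminnm(x: float, y: float) -> float:
    """minNum: a single quiet NaN operand is ignored in favor of the number."""
    if math.isnan(x) and not math.isnan(y):
        return y
    if math.isnan(y) and not math.isnan(x):
        return x
    return fmin(x, y)
```

With one numeric operand and one NaN, `fmax` returns NaN while `fmaxnm` returns the numeric operand. When both operands are NaN, both functions return NaN.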

Table C2-58 shows the Floating-point instructions that can perform floating-point minimum and maximum
operations.
Table C2-58 Floating-point minimum and maximum instructions
Mnemonic

Instruction

See

FMAX

Floating-point scalar maximum

FMAX (scalar) on page C6-928

FMAXNM

Floating-point scalar maximum number

FMAXNM (scalar) on page C6-931

FMIN

Floating-point scalar minimum

FMIN (scalar) on page C6-942

FMINNM

Floating-point scalar minimum number

FMINNM (scalar) on page C6-945

C2.5.10

Floating-point comparison
These instructions set the NZCV condition flags in PSTATE, based on the result of a comparison of two operands.
If the floating-point comparisons are unordered, where one or both operands are a form of NaN, the C and V bits
are set to 1 and the N and Z bits are cleared to 0.

Note
The NZCV flags in the FPSR are associated with AArch32 state. The A64 floating-point comparison instructions
do not change the condition flags in the FPSR.
For the conditional Floating-point comparison instructions, if the condition is TRUE, the flags are updated to the
result of the comparison, otherwise the flags are updated to the immediate value that is defined in the instruction
encoding.
The quiet compare instructions generate an Invalid Operation exception if either of the source operands is a
signaling NaN. The signaling compare instructions generate an Invalid Operation exception if either of the source
operands is any type of NaN.
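A small Python model may help to make the flag results concrete. The unordered result below follows the text above (C and V set, N and Z clear). The three ordered results follow the usual A64 convention as understood by the editor and should be treated as an assumption of this sketch, as should the function names and the packing of NZCV into a 4-bit integer. Exception signaling is not modeled.

```python
import math


def fcmp(a: float, b: float) -> int:
    """Return NZCV, packed as a 4-bit integer, for a floating-point comparison of a with b."""
    if math.isnan(a) or math.isnan(b):
        return 0b0011                     # unordered: C and V set, N and Z clear
    if a == b:
        return 0b0110                     # equal (assumed encoding): Z and C set
    if a < b:
        return 0b1000                     # less than (assumed encoding): N set
    return 0b0010                         # greater than (assumed encoding): C set


def fccmp(cond_holds: bool, a: float, b: float, nzcv_imm: int) -> int:
    """Conditional compare: compare if the condition holds, else use the immediate flags."""
    if not 0 <= nzcv_imm <= 0xF:
        raise ValueError("nzcv_imm must be a 4-bit value")
    return fcmp(a, b) if cond_holds else nzcv_imm
```

Chaining a comparison with a conditional comparison in this way lets a compound condition be evaluated without a branch between the two tests.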
Table C2-59 shows the Floating-point comparison instructions.
Table C2-59 Floating-point comparison instructions
Mnemonic

Instruction

See

FCMP

Floating-point quiet compare

FCMP on page C6-865

FCMPE

Floating-point signaling compare

FCMPE on page C6-866

FCCMP

Floating-point conditional quiet
compare

FCCMP on page C6-847

FCCMPE

Floating-point conditional
signaling compare

FCCMPE on page C6-848

C2.5.11

Floating-point conditional select
Table C2-60 shows the Floating-point conditional select instructions.
Table C2-60 Floating-point conditional select instructions

Mnemonic

Instruction

See

FCSEL

Floating-point scalar conditional select

FCSEL on page C6-867

C2.5.12

SIMD move
The functionality of some data movement instructions overlaps with that provided by the scalar floating-point FMOV
instructions described in Floating-point move (register) on page C2-153.
Table C2-61 shows the SIMD move instructions.
Table C2-61 SIMD move instructions

Mnemonic

Instruction

See

DUP

•
Duplicate vector element to vector or scalar
•
Duplicate general-purpose register to vector

•
DUP (element) on page C6-828
•
DUP (general) on page C6-830

INS

•
Insert vector element from another vector element
•
Insert vector element from general-purpose register
Note
Normally disassembled as MOV.

•
INS (element) on page C6-1012
•
INS (general) on page C6-1014

MOV

•
Move vector element to vector element
•
Move general-purpose register to vector element
•
Move vector element to scalar
•
Move vector element to general-purpose register

•
MOV (element) on page C6-1075
•
MOV (from general) on page C6-1076
•
MOV (scalar) on page C6-1074
•
MOV (to general) on page C6-1078

UMOV

Unsigned move vector element to general-purpose register

UMOV on page C6-1349

SMOV

Signed move vector element to general-purpose register

SMOV on page C6-1170

C2.5.13

SIMD arithmetic
Table C2-62 shows the SIMD arithmetic instructions.
Table C2-62 SIMD arithmetic instructions

Mnemonic

Instruction

See

ADD

Add (vector and scalar form)

ADD (vector) on page C6-782

AND

Bitwise AND (vector form)

AND (vector) on page C6-793

BIC

Bitwise bit clear (register) (vector form)

BIC (vector, register) on page C6-796

BIF

Bitwise insert if false (vector form)

BIF on page C6-797

BIT

Bitwise insert if true (vector form)

BIT on page C6-799

BSL

Bitwise select (vector form)

BSL on page C6-801

EOR

Bitwise exclusive OR (vector form)

EOR (vector) on page C6-832

FABD

Floating-point absolute difference (vector and scalar form)

FABD on page C6-835

FADD

Floating-point add (vector form)

FADD (scalar) on page C6-844

FDIV

Floating-point divide (vector form)

FDIV (vector) on page C6-922

FMAX

Floating-point maximum (vector form)

FMAXP (vector) on page C6-937

FMAXNM

Floating-point maximum number (vector form)

FMAXNM (vector) on page C6-929

FMIN

Floating-point minimum (vector form)

FMIN (vector) on page C6-940

FMINNM

Floating-point minimum number (vector form)

FMINNM (vector) on page C6-943

FMLA

Floating-point fused multiply-add (vector form)

FMLA (vector) on page C6-956

FMLS

Floating-point fused multiply-subtract (vector form)

FMLS (vector) on page C6-959

FMUL

Floating-point multiply (vector form)

FMUL (vector) on page C6-971

FMULX

Floating-point multiply extended (vector and scalar form)

FMULX on page C6-976

FRECPS

Floating-point reciprocal step (vector and scalar form)

FRECPS on page C6-987

FRSQRTS

Floating-point reciprocal square root step (vector and scalar form)

FRSQRTS on page C6-1006

FSUB

Floating-point subtract (vector form)

FSUB (vector) on page C6-1010

MLA

Multiply-add (vector form)

MLA (vector) on page C6-1068

MLS

Multiply-subtract (vector form)

MLS (vector) on page C6-1072

MUL

Multiply (vector form)

MUL (vector) on page C6-1083

MOV

Move vector register (vector form)

MOV (vector) on page C6-1077

ORN

Bitwise inclusive OR NOT (vector form)

ORN (vector) on page C6-1091

ORR

Bitwise inclusive OR (register) (vector form)

ORR (vector, register) on page C6-1094

PMUL

Polynomial multiply (vector form)

PMUL on page C6-1095

SABA

Signed absolute difference and accumulate (vector form)

SABA on page C6-1111

SABD

Signed absolute difference (vector form)

SABD on page C6-1114

SHADD

Signed halving add (vector form)

SHADD on page C6-1144

SHSUB

Signed halving subtract (vector form)

SHSUB on page C6-1151

SMAX

Signed maximum (vector form)

SMAX on page C6-1154

SMIN

Signed minimum (vector form)

SMIN on page C6-1158

SQADD

Signed saturating add (vector and scalar form)

SQADD on page C6-1178

SQDMULH

Signed saturating doubling multiply returning high half (vector and
scalar form)

SQDMULH (vector) on page C6-1195

SQRSHL

Signed saturating rounding shift left (register) (vector and scalar form)

SQRSHL on page C6-1209

SQRDMULH

Signed saturating rounding doubling multiply returning high half
(vector and scalar form)

SQRDMULH (vector) on page C6-1207

SQSHL

Signed saturating shift left (register) (vector and scalar form)

SQSHL (register) on page C6-1220

SQSUB

Signed saturating subtract (vector and scalar form)

SQSUB on page C6-1231

SRHADD

Signed rounding halving add (vector form)

SRHADD on page C6-1237

SRSHL

Signed rounding shift left (register) (vector and scalar form)

SRSHL on page C6-1240

SSHL

Signed shift left (register) (vector and scalar form)

SSHL on page C6-1246

SUB

Subtract (vector and scalar form)

SUB (vector) on page C6-1295

UABA

Unsigned absolute difference and accumulate (vector form)

UABA on page C6-1308

UABD

Unsigned absolute difference (vector form)

UABD on page C6-1311

UHADD

Unsigned halving add (vector form)

UHADD on page C6-1331

UHSUB

Unsigned halving subtract (vector form)

UHSUB on page C6-1332

UMAX

Unsigned maximum (vector form)

UMAX on page C6-1333

UMIN

Unsigned minimum (vector form)

UMIN on page C6-1337

UQADD

Unsigned saturating add (vector and scalar form)

UQADD on page C6-1355

UQRSHL

Unsigned saturating rounding shift left (register) (vector and scalar
form)

UQRSHL on page C6-1357

UQSHL

Unsigned saturating shift left (register) (vector and scalar form)

UQSHL (register) on page C6-1365

UQSUB

Unsigned saturating subtract (vector and scalar form)

UQSUB on page C6-1370

URHADD

Unsigned rounding halving add (vector form)

URHADD on page C6-1375

URSHL

Unsigned rounding shift left (register) (vector and scalar form)

URSHL on page C6-1376

USHL

Unsigned shift left (register) (vector and scalar form)

USHL on page C6-1383

C2.5.14

SIMD compare
The SIMD compare instructions compare vector or scalar elements according to the specified condition and set the
destination vector element to all ones if the condition holds, or to zero if the condition does not hold.

Note
Some of the comparisons, such as LS, LE, LO, and LT, can be made by reversing the operands and using the
opposite comparison, HS, GE, HI, or GT.
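The per-element mask result and the operand-reversal trick from the Note can be sketched in Python as follows. Vectors are modeled as lists of unsigned lane values. The names `cmhi` and `cmhs` are borrowed from the mnemonics, while `cmlo` and `cmls` are hypothetical helpers that do not correspond to real instructions and exist only to show the reversal.

```python
def cmhi(a, b, bits=8):
    """Unsigned higher: each lane is all ones if a > b, otherwise zero."""
    ones = (1 << bits) - 1
    return [ones if x > y else 0 for x, y in zip(a, b)]


def cmhs(a, b, bits=8):
    """Unsigned higher or same: each lane is all ones if a >= b, otherwise zero."""
    ones = (1 << bits) - 1
    return [ones if x >= y else 0 for x, y in zip(a, b)]


def cmlo(a, b, bits=8):
    """Unsigned lower, obtained by reversing the operands of cmhi (hypothetical helper)."""
    return cmhi(b, a, bits)


def cmls(a, b, bits=8):
    """Unsigned lower or same, obtained by reversing the operands of cmhs (hypothetical helper)."""
    return cmhs(b, a, bits)
```

Because each result lane is either all ones or all zeros, the result can be used directly as a mask for a bitwise select.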
Table C2-63 shows the SIMD compare instructions.
Table C2-63 SIMD compare instructions
Mnemonic

Instruction

See

CMEQ

•
Compare bitwise equal (vector and scalar form)
•
Compare bitwise equal to zero (vector and scalar form)

•
CMEQ (register) on page C6-805
•
CMEQ (zero) on page C6-807

CMHS

Compare unsigned higher or same (vector and scalar form)

CMHS (register) on page C6-819

CMGE

•
Compare signed greater than or equal (vector and scalar
form)
•
Compare signed greater than or equal to zero (vector and
scalar form)

•
CMGE (register) on page C6-809
•
CMGE (zero) on page C6-811

CMHI

Compare unsigned higher (vector and scalar form)

CMHI (register) on page C6-817

CMGT

•
Compare signed greater than (vector and scalar form)
•
Compare signed greater than zero (vector and scalar form)

•
CMGT (register) on page C6-813
•
CMGT (zero) on page C6-815

CMLE

Compare signed less than or equal to zero (vector and scalar form)

CMLE (zero) on page C6-821

CMLT

Compare signed less than zero (vector and scalar form)

CMLT (zero) on page C6-823

CMTST

Compare bitwise test bits nonzero (vector and scalar form)

CMTST on page C6-825

FCMEQ

•
•

Floating-point compare equal (vector and scalar form)
Floating-point compare equal to zero (vector and scalar
form)

•
•

FCMEQ (register) on page C6-849
FCMEQ (zero) on page C6-851

FCMGE

•
Floating-point compare greater than or equal (vector and
scalar form)
•
Floating-point compare greater than or equal to zero (vector
and scalar form)

•
FCMGE (register) on page C6-853
•
FCMGE (zero) on page C6-855

FCMGT

•
Floating-point compare greater than (vector and scalar form)
•
Floating-point compare greater than zero (vector and scalar
form)

•
FCMGT (register) on page C6-857
•
FCMGT (zero) on page C6-859

FCMLE

Floating-point compare less than or equal to zero (vector and scalar
form)

FCMLE (zero) on page C6-861

FCMLT

Floating-point compare less than zero (vector and scalar form)

FCMLT (zero) on page C6-863

FACGE

Floating-point absolute compare greater than or equal (vector and
scalar form)

FACGE on page C6-839

FACGT

Floating-point absolute compare greater than (vector and scalar
form)

FACGT on page C6-841

C2.5.15

SIMD widening and narrowing arithmetic
For information about the variants of these instructions, see Common features of SIMD instructions on page C2-152.
Table C2-64 shows the SIMD widening and narrowing arithmetic instructions.
Table C2-64 SIMD widening and narrowing arithmetic instructions

Mnemonic

Instruction

See

ADDHN, ADDHN2

Add returning high, narrow (vector form)

ADDHN, ADDHN2 on page C6-784

PMULL, PMULL2

Polynomial multiply long (vector form)

PMULL, PMULL2 on page C6-1096
See also Cryptography extensions on
page C2-169

RADDHN, RADDHN2

Rounding add returning high, narrow (vector form)

RADDHN, RADDHN2 on page C6-1098

RSUBHN, RSUBHN2

Rounding subtract returning high, narrow (vector form)

RSUBHN, RSUBHN2 on page C6-1109

SABAL, SABAL2

Signed absolute difference and accumulate long (vector form)

SABAL, SABAL2 on page C6-1112

SABDL, SABDL2

Signed absolute difference long (vector form)

SABDL, SABDL2 on page C6-1115

SADDL, SADDL2

Signed add long (vector form)

SADDL, SADDL2 on page C6-1119

SADDW, SADDW2

Signed add wide (vector form)

SADDW, SADDW2 on page C6-1124

SMLAL, SMLAL2

Signed multiply-add long (vector form)

SMLAL, SMLAL2 (vector) on page C6-1164

SMLSL, SMLSL2

Signed multiply-subtract long (vector form)

SMLSL, SMLSL2 (vector) on page C6-1168

SMULL, SMULL2

Signed multiply long (vector form)

SMULL, SMULL2 (vector) on page C6-1174

SQDMLAL, SQDMLAL2

Signed saturating doubling multiply-add long (vector and
scalar form)

SQDMLAL, SQDMLAL2 (vector) on
page C6-1183

SQDMLSL, SQDMLSL2

Signed saturating doubling multiply-subtract long (vector and
scalar form)

SQDMLSL, SQDMLSL2 (vector) on
page C6-1189

SQDMULL, SQDMULL2

Signed saturating doubling multiply long (vector and scalar
form)

SQDMULL, SQDMULL2 (vector) on
page C6-1200

SSUBL, SSUBL2

Signed subtract long (vector form)

SSUBL, SSUBL2 on page C6-1254

SSUBW, SSUBW2

Signed subtract wide (vector form)

SSUBW, SSUBW2 on page C6-1256

SUBHN, SUBHN2

Subtract returning high, narrow (vector form)

SUBHN, SUBHN2 on page C6-1297

UABAL, UABAL2

Unsigned absolute difference and accumulate long (vector
form)

UABAL, UABAL2 on page C6-1309

UABDL, UABDL2

Unsigned absolute difference long (vector form)

UABDL, UABDL2 on page C6-1312

UADDL, UADDL2

Unsigned add long (vector form)

UADDL, UADDL2 on page C6-1316

UADDW, UADDW2

Unsigned add wide (vector form)

UADDW, UADDW2 on page C6-1321

UMLAL, UMLAL2

Unsigned multiply-add long (vector form)

UMLAL, UMLAL2 (vector) on page C6-1343

UMLSL, UMLSL2

Unsigned multiply-subtract long (vector form)

UMLSL, UMLSL2 (vector) on page C6-1347

UMULL, UMULL2

Unsigned multiply long (vector form)

UMULL, UMULL2 (vector) on page C6-1353

USUBL, USUBL2

Unsigned subtract long (vector form)

USUBL, USUBL2 on page C6-1393

USUBW, USUBW2

Unsigned subtract wide (vector form)

USUBW, USUBW2 on page C6-1395

C2.5.16

SIMD unary arithmetic
For information about the variants of these instructions, see Common features of SIMD instructions on page C2-152.
Table C2-65 shows the SIMD unary arithmetic instructions.
Table C2-65 SIMD unary arithmetic instructions

Mnemonic

Instruction

See

ABS

Absolute value (vector and scalar form)

ABS on page C6-780

CLS

Count leading sign bits (vector form)

CLS (vector) on page C6-803

CLZ

Count leading zero bits (vector form)

CLZ (vector) on page C6-804

CNT

Population count per byte (vector form)

CNT on page C6-827

FABS

Floating-point absolute (vector form)

FABS (vector) on page C6-837

FCVTL, FCVTL2

Floating-point convert to higher precision long (vector form)

FCVTL, FCVTL2 on page C6-878

FCVTN, FCVTN2

Floating-point convert to lower precision narrow (vector form)

FCVTN, FCVTN2 on page C6-887

FCVTXN, FCVTXN2

Floating-point convert to lower precision narrow, rounding to odd
(vector and scalar form)

FCVTXN, FCVTXN2 on
page C6-904

FNEG

Floating-point negate (vector form)

FNEG (vector) on page C6-978

FRECPE

Floating-point reciprocal estimate (vector and scalar form)

FRECPE on page C6-985

FRECPX

Floating-point reciprocal exponent (scalar form)

FRECPX on page C6-989

FRINTA

Floating-point round to integral, to nearest with ties to away (vector
form)

FRINTA (scalar) on page C6-991

FRINTI

Floating-point round to integral, using current rounding mode (vector
form)

FRINTI (vector) on page C6-992

FRINTM

Floating-point round to integral, toward minus infinity (vector form)

FRINTM (vector) on page C6-994

FRINTN

Floating-point round to integral, to nearest with ties to even (vector
form)

FRINTN (vector) on page C6-996

FRINTP

Floating-point round to integral, toward positive infinity (vector form)

FRINTP (vector) on page C6-998

FRINTX

Floating-point round to integral exact, using current rounding mode
(vector form)

FRINTX (vector) on page C6-1000

FRINTZ

Floating-point round to integral, toward zero (vector form)

FRINTZ (vector) on page C6-1002

FRSQRTE

Floating-point reciprocal square root estimate (vector and scalar form)

FRSQRTE on page C6-1004

FSQRT

Floating-point square root (vector form)

FSQRT (vector) on page C6-1008

MVN

Bitwise NOT (vector form)

MVN on page C6-1085

NEG

Negate (vector and scalar form)

NEG (vector) on page C6-1088

NOT

Bitwise NOT (vector form)

NOT on page C6-1090

RBIT

Bitwise reverse (vector form)

RBIT (vector) on page C6-1100

REV16

Reverse elements in 16-bit halfwords (vector form)

REV16 (vector) on page C6-1101

REV32

Reverse elements in 32-bit words (vector form)

REV32 (vector) on page C6-1103

REV64

Reverse elements in 64-bit doublewords (vector form)

REV64 on page C6-1105

SADALP

Signed add and accumulate long pairwise (vector form)

SADALP on page C6-1117

SADDLP

Signed add long pairwise (vector form)

SADDLP on page C6-1121

SQABS

Signed saturating absolute value (vector and scalar form)

SQABS on page C6-1176

SQNEG

Signed saturating negate (vector and scalar form)

SQNEG on page C6-1202

SQXTN, SQXTN2

Signed saturating extract narrow (vector form)

SQXTN, SQXTN2 on
page C6-1233

SQXTUN, SQXTUN2

Signed saturating extract unsigned narrow (vector and scalar form)

SQXTUN, SQXTUN2 on
page C6-1235

SUQADD

Signed saturating accumulate of unsigned value (vector and scalar
form)

SUQADD on page C6-1299

Table C2-65 SIMD unary arithmetic instructions (continued)
Mnemonic

Instruction

See

SXTL, SXTL2

Signed extend long

SXTL on page C6-1301

UADALP

Unsigned add and accumulate long pairwise (vector form)

UADALP on page C6-1314

UADDLP

Unsigned add long pairwise (vector form)

UADDLP on page C6-1318

UQXTN, UQXTN2

Unsigned saturating extract narrow (vector form)

UQXTN, UQXTN2 on
page C6-1372

URECPE

Unsigned reciprocal estimate (vector form)

URECPE on page C6-1374

URSQRTE

Unsigned reciprocal square root estimate (vector form)

URSQRTE on page C6-1380

USQADD

Unsigned saturating accumulate of signed value (vector and scalar
form)

USQADD on page C6-1389

UXTL, UXTL2

Unsigned extend long

UXTL on page C6-1397

XTN, XTN2

Extract narrow (vector form)

XTN, XTN2 on page C6-1400

C2.5.17

SIMD by element arithmetic
For information about the variants of these instructions, see Common features of SIMD instructions on page C2-152.
Table C2-66 shows the SIMD by element arithmetic instructions.
Table C2-66 SIMD by element arithmetic instructions

Mnemonic

Instruction

See

FMLA

Floating-point fused multiply-add (vector and scalar form)

FMLA (by element) on page C6-954

FMLS

Floating-point fused multiply-subtract (vector and scalar form)

FMLS (by element) on page C6-957

FMUL

Floating-point multiply (vector and scalar form)

FMUL (by element) on page C6-968

FMULX

Floating-point multiply extended (vector and scalar form)

FMULX (by element) on page C6-973

MLA

Multiply-add (vector form)

MLA (by element) on page C6-1066

MLS

Multiply-subtract (vector form)

MLS (by element) on page C6-1070

MUL

Multiply (vector form)

MUL (by element) on page C6-1081

SMLAL, SMLAL2

Signed multiply-add long (vector form)

SMLAL, SMLAL2 (by element) on
page C6-1162

SMLSL, SMLSL2

Signed multiply-subtract long (vector form)

SMLSL, SMLSL2 (by element) on
page C6-1166

SMULL, SMULL2

Signed multiply long (vector form)

SMULL, SMULL2 (by element) on
page C6-1172

SQDMLAL, SQDMLAL2

Signed saturating doubling multiply-add long (vector and scalar
form)

SQDMLAL, SQDMLAL2 (by element) on
page C6-1180

SQDMLSL, SQDMLSL2

Signed saturating doubling multiply-subtract long (vector form)

SQDMLSL, SQDMLSL2 (by element) on
page C6-1186

SQDMULH

Signed saturating doubling multiply returning high half (vector
and scalar form)

SQDMULH (by element) on page C6-1192

SQDMULL, SQDMULL2

Signed saturating doubling multiply long (vector and scalar form)

SQDMULL, SQDMULL2 (by element) on
page C6-1197

SQRDMULH

Signed saturating rounding doubling multiply returning high half
(vector and scalar form)

SQRDMULH (by element) on page C6-1204

UMLAL, UMLAL2

Unsigned multiply-add long (vector form)

UMLAL, UMLAL2 (by element) on
page C6-1341

UMLSL, UMLSL2

Unsigned multiply-subtract long (vector form)

UMLSL, UMLSL2 (by element) on
page C6-1345

UMULL, UMULL2

Unsigned multiply long (vector form)

UMULL, UMULL2 (by element) on
page C6-1351

C2.5.18

SIMD permute
Table C2-67 shows the SIMD permute instructions.
Table C2-67 SIMD permute instructions

Mnemonic

Instruction

See

EXT

Extract vector from a pair of vectors

EXT on page C6-834

TRN1

Transpose vectors (primary)

TRN1 on page C6-1306

TRN2

Transpose vectors (secondary)

TRN2 on page C6-1307

UZP1

Unzip vectors (primary)

UZP1 on page C6-1398

UZP2

Unzip vectors (secondary)

UZP2 on page C6-1399

ZIP1

Zip vectors (primary)

ZIP1 on page C6-1402

ZIP2

Zip vectors (secondary)

ZIP2 on page C6-1403
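
Table C2-67 names the permute operations without showing how elements move. As an illustration only, the following Python sketch models ZIP, UZP and TRN on lists of lane values, lowest-numbered lane first. The function names and the list representation are assumptions made for this example, and the element orderings follow the usual description of these instructions rather than anything stated in this section. The instruction pages referenced in the table remain authoritative.

```python
def zip1(vn, vm):
    """Interleave the lower halves of vn and vm (ZIP1-style)."""
    half = len(vn) // 2
    return [x for pair in zip(vn[:half], vm[:half]) for x in pair]


def zip2(vn, vm):
    """Interleave the upper halves of vn and vm (ZIP2-style)."""
    half = len(vn) // 2
    return [x for pair in zip(vn[half:], vm[half:]) for x in pair]


def uzp1(vn, vm):
    """Even-numbered lanes of vn, then even-numbered lanes of vm (UZP1-style)."""
    return vn[0::2] + vm[0::2]


def uzp2(vn, vm):
    """Odd-numbered lanes of vn, then odd-numbered lanes of vm (UZP2-style)."""
    return vn[1::2] + vm[1::2]


def trn1(vn, vm):
    """Pair up the even-numbered lanes of vn and vm (TRN1-style)."""
    return [x for i in range(0, len(vn), 2) for x in (vn[i], vm[i])]


def trn2(vn, vm):
    """Pair up the odd-numbered lanes of vn and vm (TRN2-style)."""
    return [x for i in range(0, len(vn), 2) for x in (vn[i + 1], vm[i + 1])]
```

In this model ZIP and UZP are inverses of each other: applying `uzp1` and `uzp2` to the outputs of `zip1` and `zip2` recovers the original two lists.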

C2.5.19

SIMD immediate
Table C2-68 shows the SIMD immediate instructions.
Table C2-68 SIMD immediate instructions

Mnemonic

Instruction

See

BIC

Bitwise bit clear immediate

BIC (vector, immediate) on page C6-794

FMOV

Floating-point move immediate

FMOV (vector, immediate) on page C6-960

MOVI

Move immediate

MOVI on page C6-1079

MVNI

Move inverted immediate

MVNI on page C6-1086

ORR

Bitwise inclusive OR immediate

ORR (vector, immediate) on page C6-1092

C2.5.20

SIMD shift (immediate)
For information about the variants of these instructions, see Common features of SIMD instructions on page C2-152.

Table C2-69 shows the SIMD shift (immediate) instructions.
Table C2-69 SIMD shift (immediate) instructions
Mnemonic

Instruction

See

RSHRN, RSHRN2

Rounding shift right narrow immediate (vector form)

RSHRN, RSHRN2 on page C6-1107

SHL

Shift left immediate (vector and scalar form)

SHL on page C6-1145

SHLL, SHLL2

Shift left long (by element size) (vector form)

SHLL, SHLL2 on page C6-1147

SHRN, SHRN2

Shift right narrow immediate (vector form)

SHRN, SHRN2 on page C6-1149

SLI

Shift left and insert immediate (vector and scalar form)

SLI on page C6-1152

SQRSHRN, SQRSHRN2

Signed saturating rounded shift right narrow immediate (vector
and scalar form)

SQRSHRN, SQRSHRN2 on
page C6-1211

SQRSHRUN, SQRSHRUN2

Signed saturating rounded shift right unsigned narrow immediate (vector
and scalar form)

SQRSHRUN, SQRSHRUN2 on
page C6-1214

SQSHL

Signed saturating shift left immediate (vector and scalar form)

SQSHL (immediate) on page C6-1217

SQSHLU

Signed saturating shift left unsigned immediate (vector and scalar
form)

SQSHLU on page C6-1222

SQSHRN, SQSHRN2

Signed saturating shift right narrow immediate (vector and scalar
form)

SQSHRN, SQSHRN2 on page C6-1225

SQSHRUN, SQSHRUN2

Signed saturating shift right unsigned narrow immediate (vector
and scalar form)

SQSHRUN, SQSHRUN2 on
page C6-1228

SRI

Shift right and insert immediate (vector and scalar form)

SRI on page C6-1238

SRSHR

Signed rounding shift right immediate (vector and scalar form)

SRSHR on page C6-1242

SRSRA

Signed rounding shift right and accumulate immediate (vector and
scalar form)

SRSRA on page C6-1244

SSHLL, SSHLL2

Signed shift left long immediate (vector form)

SSHLL, SSHLL2 on page C6-1248

SSHR

Signed shift right immediate (vector and scalar form)

SSHR on page C6-1250

SSRA

Signed integer shift right and accumulate immediate (vector and
scalar form)

SSRA on page C6-1252

SXTL, SXTL2

Signed integer extend (vector only)

SXTL on page C6-1301

UQRSHRN, UQRSHRN2

Unsigned saturating rounded shift right narrow immediate (vector
and scalar form)

UQRSHRN, UQRSHRN2 on
page C6-1359

UQSHL

Unsigned saturating shift left immediate (vector and scalar form)

UQSHL (immediate) on page C6-1362

UQSHRN, UQSHRN2

Unsigned saturating shift right narrow immediate (vector and
scalar form)

UQSHRN on page C6-1367

URSHR

Unsigned rounding shift right immediate (vector and scalar form)

URSHR on page C6-1378

URSRA

Unsigned integer rounding shift right and accumulate immediate
(vector and scalar form)

URSRA on page C6-1381

USHLL, USHLL2

Unsigned shift left long immediate (vector form)

USHLL, USHLL2 on page C6-1385

USHR

Unsigned shift right immediate (vector and scalar form)

USHR on page C6-1387

USRA

Unsigned shift right and accumulate immediate (vector and scalar
form)

USRA on page C6-1391

UXTL, UXTL2

Unsigned integer extend (vector only)

UXTL on page C6-1397

C2.5.21

SIMD floating-point and integer conversion
The SIMD floating-point and integer conversion instructions generate the Invalid Operation exception in response
to a floating-point input of NaN, infinity, or a numerical value that cannot be represented within the destination
register. An out-of-range integer or a fixed-point result is saturated to the size of the destination register. A numeric
result that differs from the input raises the Inexact exception.
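
The saturation and exception behaviour described above can be illustrated with a small model. The following Python sketch is not the architectural pseudocode. It assumes a hypothetical helper, `fcvtzs_model`, that converts a Python float to a signed integer of a given width with rounding toward zero, and reports which exceptions the paragraph above says would be generated. Returning zero for a NaN input is an assumption of this sketch, not something stated in this section.

```python
import math


def fcvtzs_model(value, width=32):
    """Model a round-toward-zero float to signed integer conversion.

    Returns (result, flags), where flags is a set drawn from
    {"InvalidOp", "Inexact"}. Hypothetical helper for illustration only.
    """
    lo = -(1 << (width - 1))
    hi = (1 << (width - 1)) - 1
    if math.isnan(value):
        # Assumption: a NaN input gives zero and Invalid Operation.
        return 0, {"InvalidOp"}
    if math.isinf(value):
        # Infinity cannot be represented: saturate and flag Invalid Operation.
        return (hi if value > 0 else lo), {"InvalidOp"}
    truncated = math.trunc(value)  # round toward zero
    if truncated > hi or truncated < lo:
        # Out-of-range results saturate to the destination size.
        return (hi if truncated > hi else lo), {"InvalidOp"}
    flags = set()
    if truncated != value:
        # The numeric result differs from the input.
        flags.add("Inexact")
    return truncated, flags
```

For example, under these assumptions `fcvtzs_model(1.75)` gives the value 1 with Inexact set, while `fcvtzs_model(3e9)` saturates to the largest 32-bit signed value with Invalid Operation set.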
Table C2-70 shows the SIMD floating-point and integer conversion instructions.
Table C2-70 SIMD floating-point and integer conversion instructions

Mnemonic

Instruction

See

FCVTAS

Floating-point convert to signed integer, rounding to nearest with ties
to away (vector and scalar form)

FCVTAS (vector) on page C6-870

FCVTAU

Floating-point convert to unsigned integer, rounding to nearest with ties
to away (vector and scalar form)

FCVTAU (vector) on page C6-874

FCVTMS

Floating-point convert to signed integer, rounding toward minus
infinity (vector and scalar form)

FCVTMS (vector) on page C6-879

FCVTMU

Floating-point convert to unsigned integer, rounding toward minus
infinity (vector and scalar form)

FCVTMU (vector) on page C6-883

FCVTNS

Floating-point convert to signed integer, rounding to nearest with ties
to even (vector and scalar form)

FCVTNS (vector) on page C6-888

FCVTNU

Floating-point convert to unsigned integer, rounding to nearest with ties
to even (vector and scalar form)

FCVTNU (vector) on page C6-892

FCVTPS

Floating-point convert to signed integer, rounding toward positive
infinity (vector and scalar form)

FCVTPS (vector) on page C6-896

FCVTPU

Floating-point convert to unsigned integer, rounding toward positive
infinity (vector and scalar form)

FCVTPU (vector) on page C6-900

FCVTZS

•   Floating-point convert to signed integer, rounding toward zero
    (vector and scalar form)
•   Floating-point convert to signed fixed-point, rounding toward
    zero (vector and scalar form)

•   FCVTZS (vector, integer) on page C6-908
•   FCVTZS (vector, fixed-point) on page C6-906

FCVTZU

•   Floating-point convert to unsigned integer, rounding toward zero
    (vector and scalar form)
•   Floating-point convert to unsigned fixed-point, rounding toward
    zero (vector and scalar form)

•   FCVTZU (vector, integer) on page C6-916
•   FCVTZU (vector, fixed-point) on page C6-914

SCVTF

•   Signed integer convert to floating-point (vector and scalar form)
•   Signed fixed-point convert to floating-point (vector and scalar
    form)

•   SCVTF (vector, integer) on page C6-1128
•   SCVTF (vector, fixed-point) on page C6-1126

UCVTF

•   Unsigned integer convert to floating-point (vector and scalar
    form)
•   Unsigned fixed-point convert to floating-point (vector and scalar
    form)

•   UCVTF (vector, integer) on page C6-1325
•   UCVTF (vector, fixed-point) on page C6-1323

C2.5.22

SIMD reduce (across vector lanes)
The SIMD reduce (across vector lanes) instructions perform arithmetic operations horizontally, that is across all
lanes of the input vector. They deliver a single scalar result.
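
As a rough illustration of what operating across all lanes means, the following Python sketch reduces a list of lane values to one scalar. The helper names mirror the mnemonics but are hypothetical. The wrap-around of ADDV to the element size and the doubled result width of the long forms are assumptions taken from the usual description of these instructions, not from this section.

```python
def addv(lanes, esize):
    """ADDV-style reduction: sum every lane, keeping esize result bits."""
    return sum(lanes) & ((1 << esize) - 1)


def uaddlv(lanes, esize):
    """UADDLV-style reduction: unsigned sum into a 2*esize-bit result."""
    return sum(lanes) & ((1 << (2 * esize)) - 1)


def saddlv(lanes, esize):
    """SADDLV-style reduction: sign-extend each lane, sum into 2*esize bits."""
    def sext(x):
        return x - (1 << esize) if x & (1 << (esize - 1)) else x
    return sum(sext(x) for x in lanes) & ((1 << (2 * esize)) - 1)


def umaxv(lanes):
    """UMAXV-style reduction: largest lane, treating lanes as unsigned."""
    return max(lanes)
```

Each helper takes the raw (unsigned) bit pattern of every lane, so a signed lane of -1 in an 8-bit vector is passed as 0xFF.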
Table C2-71 shows the SIMD reduce (across vector lanes) instructions.
Table C2-71 SIMD reduce (across vector lanes) instructions

Mnemonic

Instruction

See

ADDV

Add (across vector)

ADDV on page C6-788

FMAXNMV

Floating-point maximum number (across vector)

FMAXNMV on page C6-935

FMAXV

Floating-point maximum (across vector)

FMAXV on page C6-939

FMINNMV

Floating-point minimum number (across vector)

FMINNMV on page C6-949

FMINV

Floating-point minimum (across vector)

FMINV on page C6-953

SADDLV

Signed add long (across vector)

SADDLV on page C6-1123

SMAXV

Signed maximum (across vector)

SMAXV on page C6-1156

SMINV

Signed minimum (across vector)

SMINV on page C6-1160

UADDLV

Unsigned add long (across vector)

UADDLV on page C6-1320

UMAXV

Unsigned maximum (across vector)

UMAXV on page C6-1335

UMINV

Unsigned minimum (across vector)

UMINV on page C6-1339

C2.5.23

SIMD pairwise arithmetic
The SIMD pairwise arithmetic instructions perform operations on pairs of adjacent elements and deliver a vector
result.
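
The pairing of adjacent elements can be sketched in a few lines of Python. The helper names below are hypothetical. The ordering, in which the pairs from the first source fill the lower half of the result and the pairs from the second source fill the upper half, is an assumption based on the usual description of ADDP (vector); the instruction pages listed in the table are the definitive reference.

```python
def addp_vector(vn, vm, esize):
    """ADDP (vector) sketch: add adjacent pairs of vn, then of vm.

    Lists hold lane values, lowest-numbered lane first.
    """
    mask = (1 << esize) - 1
    concat = list(vn) + list(vm)
    return [(concat[i] + concat[i + 1]) & mask
            for i in range(0, len(concat), 2)]


def addp_scalar(vn):
    """ADDP (scalar) sketch: add the two 64-bit lanes of one vector."""
    return (vn[0] + vn[1]) & ((1 << 64) - 1)
```

The result of `addp_vector` has the same number of lanes as each source, which is why the section describes these instructions as delivering a vector result.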

Table C2-72 shows the SIMD pairwise arithmetic instructions.
Table C2-72 SIMD pairwise arithmetic instructions
Mnemonic

Instruction

See

ADDP

Add pairwise (vector and scalar form)

•   ADDP (vector) on page C6-787
•   ADDP (scalar) on page C6-786

FADDP

Floating-point add pairwise (vector and scalar form)

•   FADDP (vector) on page C6-846
•   FADDP (scalar) on page C6-845

FMAXNMP

Floating-point maximum number pairwise (vector and scalar
form)

•   FMAXNMP (vector) on page C6-933
•   FMAXNMP (scalar) on page C6-932

FMAXP

Floating-point maximum pairwise (vector and scalar form)

•   FMAXP (vector) on page C6-937
•   FMAXP (scalar) on page C6-936

FMINNMP

Floating-point minimum number pairwise (vector and scalar form)

•   FMINNMP (vector) on page C6-947
•   FMINNMP (scalar) on page C6-946

FMINP

Floating-point minimum pairwise (vector and scalar form)

•   FMINP (vector) on page C6-951
•   FMINP (scalar) on page C6-950

SMAXP

Signed maximum pairwise

SMAXP on page C6-1155

SMINP

Signed minimum pairwise

SMINP on page C6-1159

UMAXP

Unsigned maximum pairwise

UMAXP on page C6-1334

UMINP

Unsigned minimum pairwise

UMINP on page C6-1338

C2.5.24

SIMD table lookup
Table C2-73 shows the SIMD table lookup instructions.
Table C2-73 SIMD table lookup instructions

Mnemonic

Instruction

See

TBL

Table vector lookup

TBL on page C6-1302

TBX

Table vector lookup extension

TBX on page C6-1304
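
The difference between the two lookup instructions is easiest to see in a small model. In the Python sketch below, `table` stands for the bytes of the table registers concatenated in order and `indices` for the bytes of the index vector. Both helper names are hypothetical, and the out-of-range behaviour shown (zero for TBL, destination left unchanged for TBX) is an assumption based on the usual description of these instructions rather than on this section.

```python
def tbl(table, indices):
    """TBL-style lookup: an out-of-range index produces 0."""
    return [table[i] if i < len(table) else 0 for i in indices]


def tbx(table, indices, dest):
    """TBX-style lookup: an out-of-range index keeps the old destination byte."""
    return [table[i] if i < len(table) else old
            for i, old in zip(indices, dest)]
```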

C2.5.25

Cryptography extensions
The optional Cryptography extension instructions share the SIMD and floating-point register file. For more
information see:
•   Announcing the Advanced Encryption Standard.
•   The Galois/Counter Mode of Operation.
•   Announcing the Secure Hash Standard.

Table C2-74 shows the Cryptography extension instructions.
Table C2-74 Cryptography extension instructions
Mnemonic

Instruction

See

AESD

AES single round decryption

AESD on page C6-789

AESE

AES single round encryption

AESE on page C6-790

AESIMC

AES inverse mix columns

AESIMC on page C6-791

AESMC

AES mix columns

AESMC on page C6-792

PMULL

Polynomial multiply long

PMULL, PMULL2 on page C6-1096

SHA1C

SHA1 hash update (choose)

SHA1C on page C6-1134

SHA1H

SHA1 fixed rotate

SHA1H on page C6-1135

SHA1M

SHA1 hash update (majority)

SHA1M on page C6-1136

SHA1P

SHA1 hash update (parity)

SHA1P on page C6-1137

SHA1SU0

SHA1 schedule update 0

SHA1SU0 on page C6-1138

SHA1SU1

SHA1 schedule update 1

SHA1SU1 on page C6-1139

SHA256H

SHA256 hash update (part 1)

SHA256H on page C6-1141

SHA256H2

SHA256 hash update (part 2)

SHA256H2 on page C6-1140

SHA256SU0

SHA256 schedule update 0

SHA256SU0 on page C6-1142

SHA256SU1

SHA256 schedule update 1

SHA256SU1 on page C6-1143

Chapter C3
A64 Instruction Set Encoding

This chapter describes the A64 instruction set encoding. It contains an encoding index followed by a set of
functional groups. Each group contains an alphabetical list of instructions that have similar function within the
instruction set.
It contains the following sections:
•   A64 instruction index by encoding on page C3-172.
•   Branches, exception generating and system instructions on page C3-173.
•   Loads and stores on page C3-176.
•   Data processing - immediate on page C3-193.
•   Data processing - register on page C3-196.
•   Data processing - SIMD and floating point on page C3-203.

C3.1

A64 instruction index by encoding
Table C3-1 A64 main encoding table

Instruction bits
28 27 26 25   Encoding Group
 0  0  -  -   UNALLOCATED
 1  0  0  -   Data processing - immediate
 1  0  1  -   Branch, exception generation and system instructions
 -  1  -  0   Loads and stores
 -  1  0  1   Data processing - register
 0  1  1  1   Data processing - SIMD and floating point
 1  1  1  1   Data processing - SIMD and floating point

Bits 31 to 29 and bits 24 to 10 are shown as - in every row.
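
The main encoding table depends only on instruction bits 28 to 25, so it can be expressed as a short classifier. The following Python sketch is one possible reading of Table C3-1. The function name `encoding_group` is hypothetical, and the sketch takes a 32-bit instruction word as an integer.

```python
def encoding_group(insn):
    """Return the Table C3-1 encoding group for a 32-bit A64 instruction word."""
    op = (insn >> 25) & 0xF  # instruction bits [28:25]
    if op & 0b1100 == 0b0000:   # 0 0 - -
        return "UNALLOCATED"
    if op & 0b1110 == 0b1000:   # 1 0 0 -
        return "Data processing - immediate"
    if op & 0b1110 == 0b1010:   # 1 0 1 -
        return "Branch, exception generation and system instructions"
    if op & 0b0101 == 0b0100:   # - 1 - 0
        return "Loads and stores"
    if op & 0b0111 == 0b0101:   # - 1 0 1
        return "Data processing - register"
    # Only 0 1 1 1 and 1 1 1 1 remain.
    return "Data processing - SIMD and floating point"
```

The seven rows of the table cover all sixteen values of bits 28 to 25, so the final return is reached only for the two SIMD and floating-point patterns.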

C3.2

Branches, exception generating and system instructions
This section describes the encoding of the instruction classes in the Branch, exception generation and system
instruction group, and shows how each instruction class encodes the different instruction forms. For additional
information on this functional group of instructions, see Branches, Exception generating, and System instructions
on page C2-124.
Table C3-2 Encoding table for the Branches, Exception Generating and System instructions functional group

Instruction bits
31 30 29 28 27 26 25 24 23 22   Instruction class
 -  0  0  1  0  1  -  -  -  -   Unconditional branch (immediate)
 -  0  1  1  0  1  0  -  -  -   Compare & branch (immediate)
 -  0  1  1  0  1  1  -  -  -   Test & branch (immediate)
 0  1  0  1  0  1  0  -  -  -   Conditional branch (immediate)
 1  1  0  1  0  1  0  0  -  -   Exception generation
 1  1  0  1  0  1  0  1  0  0   System
 1  1  0  1  0  1  1  -  -  -   Unconditional branch (register)

Bits 21 to 10 are shown as - in every row.
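
Table C3-2 can be read as a list of bit patterns over instruction bits 31 to 22, where - matches either value. The Python sketch below encodes each row as a ten-character pattern string and returns the first matching instruction class. The names `PATTERNS` and `branch_class` are hypothetical, and the sketch assumes the caller has already established that the word belongs to this functional group.

```python
# One pattern per row of Table C3-2, bits 31 (left) down to 22 (right).
PATTERNS = [
    ("-00101----", "Unconditional branch (immediate)"),
    ("-011010---", "Compare & branch (immediate)"),
    ("-011011---", "Test & branch (immediate)"),
    ("0101010---", "Conditional branch (immediate)"),
    ("11010100--", "Exception generation"),
    ("1101010100", "System"),
    ("1101011---", "Unconditional branch (register)"),
]


def branch_class(insn):
    """Return the Table C3-2 instruction class, or None if no row matches."""
    top = format((insn >> 22) & 0x3FF, "010b")  # bits [31:22] as a bit string
    for pattern, name in PATTERNS:
        if all(p == "-" or p == b for p, b in zip(pattern, top)):
            return name
    return None
```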

C3.2.1

Compare & branch (immediate)
31   30 29 28 27 26 25   24   23      5   4    0
sf    0  1  1  0  1  0   op     imm19       Rt

Decode fields
sf   op   Instruction Page   Variant
0    0    CBZ                32-bit
0    1    CBNZ               32-bit
1    0    CBZ                64-bit
1    1    CBNZ               64-bit

C3.2.2

Conditional branch (immediate)
31 30 29 28 27 26 25   24   23      5   4    3    0
 0  1  0  1  0  1  0   o1     imm19     o0    cond

Decode fields
o1   o0   Instruction Page   Variant
0    0    B.cond             -
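
The two encodings above can be unpacked with plain shifts and masks. The Python sketch below extracts the fields of the compare and branch and conditional branch classes from a 32-bit instruction word. The function names and the dictionary layout are hypothetical. Treating imm19 as a signed word offset that is multiplied by 4 is an assumption taken from the instruction descriptions, not from the encoding tables in this section.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_compare_branch(insn):
    """Decode CBZ/CBNZ, or return None if bits [30:25] are not 011010."""
    if (insn >> 25) & 0x3F != 0b011010:
        return None
    sf = (insn >> 31) & 1
    op = (insn >> 24) & 1
    imm19 = (insn >> 5) & 0x7FFFF
    return {
        "mnemonic": "CBNZ" if op else "CBZ",
        "variant": "64-bit" if sf else "32-bit",
        "rt": insn & 0x1F,
        "offset": sign_extend(imm19, 19) * 4,  # assumed byte offset
    }


def decode_conditional_branch(insn):
    """Decode B.cond, or return None if the fixed bits or o1/o0 do not match."""
    if (insn >> 25) & 0x7F != 0b0101010:
        return None
    o1 = (insn >> 24) & 1
    o0 = (insn >> 4) & 1
    if (o1, o0) != (0, 0):
        return None
    imm19 = (insn >> 5) & 0x7FFFF
    return {
        "mnemonic": "B.cond",
        "cond": insn & 0xF,
        "offset": sign_extend(imm19, 19) * 4,  # assumed byte offset
    }
```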

C3.2.3

Exception generation
31 30 29 28 27 26 25 24   23   21   20      5   4    2   1  0
 1  1  0  1  0  1  0  0     opc       imm16       op2     LL

Decode fields
opc   op2   LL   Instruction Page   Variant
000   000   01   SVC                -
000   000   10   HVC                -
000   000   11   SMC                -
001   000   00   BRK                -
010   000   00   HLT                -
101   000   01   DCPS1              -
101   000   10   DCPS2              -
101   000   11   DCPS3              -

C3.2.4

System
31 30 29 28 27 26 25 24 23 22   21   20 19   18   16   15   12   11    8   7    5   4    0
 1  1  0  1  0  1  0  1  0  0    L    op0      op1       CRn       CRm      op2      Rt

Decode fields
L   op0   op1   CRn    op2   Rt      Instruction Page   Variant
0   00    -     0100   -     11111   MSR (immediate)    -
0   00    011   0010   -     11111   HINT               -
0   00    011   0011   010   11111   CLREX              -
0   00    011   0011   100   11111   DSB                -
0   00    011   0011   101   11111   DMB                -
0   00    011   0011   110   11111   ISB                -
0   01    -     -      -     -       SYS                -
0   1x    -     -      -     -       MSR (register)     -
1   01    -     -      -     -       SYSL               -
1   1x    -     -      -     -       MRS                -
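
The exception generation decode table maps directly onto a dictionary keyed by the opc, op2 and LL fields. The Python sketch below is one way to write that lookup. The names `EXCEPTION_GENERATION` and `decode_exception_generation` are hypothetical, and combinations that do not appear in the table simply return None here.

```python
# (opc, op2, LL) -> mnemonic, one entry per row of the decode table.
EXCEPTION_GENERATION = {
    (0b000, 0b000, 0b01): "SVC",
    (0b000, 0b000, 0b10): "HVC",
    (0b000, 0b000, 0b11): "SMC",
    (0b001, 0b000, 0b00): "BRK",
    (0b010, 0b000, 0b00): "HLT",
    (0b101, 0b000, 0b01): "DCPS1",
    (0b101, 0b000, 0b10): "DCPS2",
    (0b101, 0b000, 0b11): "DCPS3",
}


def decode_exception_generation(insn):
    """Return (mnemonic, imm16), or None if the word is not in the table."""
    if (insn >> 24) & 0xFF != 0b11010100:  # fixed bits [31:24]
        return None
    opc = (insn >> 21) & 0x7
    imm16 = (insn >> 5) & 0xFFFF
    op2 = (insn >> 2) & 0x7
    ll = insn & 0x3
    name = EXCEPTION_GENERATION.get((opc, op2, ll))
    return (name, imm16) if name else None
```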

C3.2.5

Test & branch (immediate)
31   30 29 28 27 26 25   24   23   19   18      5   4    0
b5    0  1  1  0  1  1   op     b40       imm14       Rt

Decode fields
op   Instruction Page   Variant
0    TBZ                -
1    TBNZ               -

C3.2.6

Unconditional branch (immediate)
31   30 29 28 27 26   25         0
op    0  0  1  0  1      imm26

Decode fields
op   Instruction Page   Variant
0    B                  -
1    BL                 -

C3.2.7

Unconditional branch (register)
31 30 29 28 27 26 25   24   21   20   16   15   10   9    5   4    0
 1  1  0  1  0  1  1     opc       op2       op3      Rn      op4

Decode fields
opc    op2     op3      Rn      op4     Instruction Page   Variant
0000   11111   000000   -       00000   BR                 -
0001   11111   000000   -       00000   BLR                -
0010   11111   000000   -       00000   RET                -
0100   11111   000000   11111   00000   ERET               -
0101   11111   000000   11111   00000   DRPS               -

C3.3

Loads and stores
This section describes the encoding of the instruction classes in the Loads and stores instruction group, and shows
how each instruction class encodes the different instruction forms. For additional information on this functional
group of instructions, see Loads and stores on page C2-129.
Table C3-3 Encoding table for the Loads and Stores functional group

Instruction bits
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10   Instruction class
 -  -  0  0  1  0  0  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -   Load/store exclusive
 -  -  0  1  1  -  0  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -   Load register (literal)
 -  -  1  0  1  -  0  0  0  -  -  -  -  -  -  -  -  -  -  -  -  -   Load/store no-allocate pair (offset)
 -  -  1  0  1  -  0  0  1  -  -  -  -  -  -  -  -  -  -  -  -  -   Load/store register pair (post-indexed)
 -  -  1  0  1  -  0  1  0  -  -  -  -  -  -  -  -  -  -  -  -  -   Load/store register pair (offset)
 -  -  1  0  1  -  0  1  1  -  -  -  -  -  -  -  -  -  -  -  -  -   Load/store register pair (pre-indexed)
 -  -  1  1  1  -  0  0  -  -  0  -  -  -  -  -  -  -  -  -  0  0   Load/store register (unscaled immediate)
 -  -  1  1  1  -  0  0  -  -  0  -  -  -  -  -  -  -  -  -  0  1   Load/store register (immediate post-indexed)
 -  -  1  1  1  -  0  0  -  -  0  -  -  -  -  -  -  -  -  -  1  0   Load/store register (unprivileged)
 -  -  1  1  1  -  0  0  -  -  0  -  -  -  -  -  -  -  -  -  1  1   Load/store register (immediate pre-indexed)
 -  -  1  1  1  -  0  0  -  -  1  -  -  -  -  -  -  -  -  -  1  0   Load/store register (register offset)
 -  -  1  1  1  -  0  1  -  -  -  -  -  -  -  -  -  -  -  -  -  -   Load/store register (unsigned immediate)
 0  -  0  0  1  1  0  0  0  -  0  0  0  0  0  0  -  -  -  -  -  -   AdvSIMD load/store multiple structures
 0  -  0  0  1  1  0  0  1  -  0  -  -  -  -  -  -  -  -  -  -  -   AdvSIMD load/store multiple structures (post-indexed)
 0  -  0  0  1  1  0  1  0  -  -  0  0  0  0  0  -  -  -  -  -  -   AdvSIMD load/store single structure
 0  -  0  0  1  1  0  1  1  -  -  -  -  -  -  -  -  -  -  -  -  -   AdvSIMD load/store single structure (post-indexed)

C3.3.1

AdvSIMD load/store multiple structures
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16   15    12   11 10   9    5   4    0
 0  Q  0  0  1  1  0  0  0  L  0  0  0  0  0  0     opcode    size     Rn      Rt

Decode fields
L   opcode   Instruction Page            Variant
0   0000     ST4 (multiple structures)   No offset
0   0010     ST1 (multiple structures)   Four registers
0   0100     ST3 (multiple structures)   No offset
0   0110     ST1 (multiple structures)   Three registers
0   0111     ST1 (multiple structures)   One register
0   1000     ST2 (multiple structures)   No offset
0   1010     ST1 (multiple structures)   Two registers
1   0000     LD4 (multiple structures)   No offset
1   0010     LD1 (multiple structures)   Four registers
1   0100     LD3 (multiple structures)   No offset
1   0110     LD1 (multiple structures)   Three registers
1   0111     LD1 (multiple structures)   One register
1   1000     LD2 (multiple structures)   No offset
1   1010     LD1 (multiple structures)   Two registers

C3.3.2

AdvSIMD load/store multiple structures (post-indexed)
31 30 29 28 27 26 25 24 23 22 21   20   16   15    12   11 10   9    5   4    0
 0  Q  0  0  1  1  0  0  1  L  0     Rm        opcode    size     Rn      Rt

Decode fields
L   Rm         opcode   Instruction Page            Variant
0   != 11111   0000     ST4 (multiple structures)   Register offset
0   != 11111   0010     ST1 (multiple structures)   Four registers, register offset
0   != 11111   0100     ST3 (multiple structures)   Register offset
0   != 11111   0110     ST1 (multiple structures)   Three registers, register offset
0   != 11111   0111     ST1 (multiple structures)   One register, register offset
0   != 11111   1000     ST2 (multiple structures)   Register offset
0   != 11111   1010     ST1 (multiple structures)   Two registers, register offset
0   11111      0000     ST4 (multiple structures)   Immediate offset
0   11111      0010     ST1 (multiple structures)   Four registers, immediate offset
0   11111      0100     ST3 (multiple structures)   Immediate offset
0   11111      0110     ST1 (multiple structures)   Three registers, immediate offset
0   11111      0111     ST1 (multiple structures)   One register, immediate offset
0   11111      1000     ST2 (multiple structures)   Immediate offset
0   11111      1010     ST1 (multiple structures)   Two registers, immediate offset
1   != 11111   0000     LD4 (multiple structures)   Register offset
1   != 11111   0010     LD1 (multiple structures)   Four registers, register offset
1   != 11111   0100     LD3 (multiple structures)   Register offset
1   != 11111   0110     LD1 (multiple structures)   Three registers, register offset
1   != 11111   0111     LD1 (multiple structures)   One register, register offset
1   != 11111   1000     LD2 (multiple structures)   Register offset
1   != 11111   1010     LD1 (multiple structures)   Two registers, register offset
1   11111      0000     LD4 (multiple structures)   Immediate offset
1   11111      0010     LD1 (multiple structures)   Four registers, immediate offset
1   11111      0100     LD3 (multiple structures)   Immediate offset
1   11111      0110     LD1 (multiple structures)   Three registers, immediate offset
1   11111      0111     LD1 (multiple structures)   One register, immediate offset
1   11111      1000     LD2 (multiple structures)   Immediate offset
1   11111      1010     LD1 (multiple structures)   Two registers, immediate offset

C3.3.3

AdvSIMD load/store single structure
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16   15    13   12   11 10   9    5   4    0
 0  Q  0  0  1  1  0  1  0  L  R  0  0  0  0  0     opcode    S    size     Rn      Rt

Decode fields
L   R   opcode   S   size   Instruction Page         Variant
0   0   000      -   -      ST1 (single structure)   8-bit
0   0   001      -   -      ST3 (single structure)   8-bit
0   0   010      -   x0     ST1 (single structure)   16-bit
0   0   011      -   x0     ST3 (single structure)   16-bit
0   0   100      -   00     ST1 (single structure)   32-bit
0   0   100      0   01     ST1 (single structure)   64-bit
0   0   101      -   00     ST3 (single structure)   32-bit
0   0   101      0   01     ST3 (single structure)   64-bit
0   1   000      -   -      ST2 (single structure)   8-bit
0   1   001      -   -      ST4 (single structure)   8-bit
0   1   010      -   x0     ST2 (single structure)   16-bit
0   1   011      -   x0     ST4 (single structure)   16-bit
0   1   100      -   00     ST2 (single structure)   32-bit
0   1   100      0   01     ST2 (single structure)   64-bit
0   1   101      -   00     ST4 (single structure)   32-bit
0   1   101      0   01     ST4 (single structure)   64-bit
1   0   000      -   -      LD1 (single structure)   8-bit
1   0   001      -   -      LD3 (single structure)   8-bit
1   0   010      -   x0     LD1 (single structure)   16-bit
1   0   011      -   x0     LD3 (single structure)   16-bit
1   0   100      -   00     LD1 (single structure)   32-bit
1   0   100      0   01     LD1 (single structure)   64-bit
1   0   101      -   00     LD3 (single structure)   32-bit
1   0   101      0   01     LD3 (single structure)   64-bit
1   0   110      0   -      LD1R                     No offset
1   0   111      0   -      LD3R                     No offset
1   1   000      -   -      LD2 (single structure)   8-bit
1   1   001      -   -      LD4 (single structure)   8-bit
1   1   010      -   x0     LD2 (single structure)   16-bit
1   1   011      -   x0     LD4 (single structure)   16-bit
1   1   100      -   00     LD2 (single structure)   32-bit
1   1   100      0   01     LD2 (single structure)   64-bit
1   1   101      -   00     LD4 (single structure)   32-bit
1   1   101      0   01     LD4 (single structure)   64-bit
1   1   110      0   -      LD2R                     No offset
1   1   111      0   -      LD4R                     No offset
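
The single structure decode table has a regular shape that a short function can capture. The Python sketch below is one reading of it, for the no-offset class only. The function name `decode_single_structure` is hypothetical. The observations that the register count follows from the low bit of opcode together with R, and that the upper two bits of opcode select the element size, are read off the rows of the table and are not stated as rules in this section.

```python
def decode_single_structure(insn):
    """Return (instruction page, variant) for the no-offset class, else None."""
    # Fixed bits: 31 = 0, 29:23 = 0011010, 20:16 = 00000.
    if insn & 0xBF9F0000 != 0x0D000000:
        return None
    load = (insn >> 22) & 1     # L
    r = (insn >> 21) & 1        # R
    opcode = (insn >> 13) & 0x7
    s = (insn >> 12) & 1        # S
    size = (insn >> 10) & 0x3
    # Number of structure registers (1 to 4), as it appears in the table rows.
    count = (((opcode & 1) << 1) | r) + 1
    kind = opcode >> 1
    if kind == 0:
        variant = "8-bit"
    elif kind == 1:
        if size & 1:            # table requires size = x0
            return None
        variant = "16-bit"
    elif kind == 2:
        if size == 0b00:
            variant = "32-bit"
        elif size == 0b01 and s == 0:
            variant = "64-bit"
        else:
            return None
    else:
        # opcode 11x: load and replicate, listed only for L = 1 and S = 0.
        if not load or s:
            return None
        return ("LD%dR" % count, "No offset")
    prefix = "LD" if load else "ST"
    return ("%s%d (single structure)" % (prefix, count), variant)
```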

C3.3.4

AdvSIMD load/store single structure (post-indexed)
31 30 29 28 27 26 25 24 23 22 21   20   16   15    13   12   11 10   9    5   4    0
 0  Q  0  0  1  1  0  1  1  L  R     Rm        opcode    S    size     Rn      Rt

Decode fields
L   R   Rm         opcode   S   size   Instruction Page         Variant
0   0   != 11111   000      -   -      ST1 (single structure)   8-bit, register offset
0   0   != 11111   001      -   -      ST3 (single structure)   8-bit, register offset
0   0   != 11111   010      -   x0     ST1 (single structure)   16-bit, register offset
0   0   != 11111   011      -   x0     ST3 (single structure)   16-bit, register offset
0   0   != 11111   100      -   00     ST1 (single structure)   32-bit, register offset
0   0   != 11111   100      0   01     ST1 (single structure)   64-bit, register offset
0   0   != 11111   101      -   00     ST3 (single structure)   32-bit, register offset
0   0   != 11111   101      0   01     ST3 (single structure)   64-bit, register offset
0   0   11111      000      -   -      ST1 (single structure)   8-bit, immediate offset
0   0   11111      001      -   -      ST3 (single structure)   8-bit, immediate offset
0   0   11111      010      -   x0     ST1 (single structure)   16-bit, immediate offset
0   0   11111      011      -   x0     ST3 (single structure)   16-bit, immediate offset
0   0   11111      100      -   00     ST1 (single structure)   32-bit, immediate offset
0   0   11111      100      0   01     ST1 (single structure)   64-bit, immediate offset
0   0   11111      101      -   00     ST3 (single structure)   32-bit, immediate offset
0   0   11111      101      0   01     ST3 (single structure)   64-bit, immediate offset
0   1   != 11111   000      -   -      ST2 (single structure)   8-bit, register offset
0   1   != 11111   001      -   -      ST4 (single structure)   8-bit, register offset
0   1   != 11111   010      -   x0     ST2 (single structure)   16-bit, register offset
0   1   != 11111   011      -   x0     ST4 (single structure)   16-bit, register offset
0   1   != 11111   100      -   00     ST2 (single structure)   32-bit, register offset
0   1   != 11111   100      0   01     ST2 (single structure)   64-bit, register offset
0   1   != 11111   101      -   00     ST4 (single structure)   32-bit, register offset
0   1   != 11111   101      0   01     ST4 (single structure)   64-bit, register offset
0   1   11111      000      -   -      ST2 (single structure)   8-bit, immediate offset
0   1   11111      001      -   -      ST4 (single structure)   8-bit, immediate offset
0   1   11111      010      -   x0     ST2 (single structure)   16-bit, immediate offset

Decode fields

ARM DDI 0487A.a
ID090413

Instruction Page

Variant

x0

ST4 (single structure)

16-bit, immediate offset

-

00

ST2 (single structure)

32-bit, immediate offset

100

0

01

ST2 (single structure)

64-bit, immediate offset

11111

101

-

00

ST4 (single structure)

32-bit, immediate offset

1

11111

101

0

01

ST4 (single structure)

64-bit, immediate offset

1

0

!= 11111

000

-

-

LD1 (single structure)

8-bit, register offset

1

0

!= 11111

001

-

-

LD3 (single structure)

8-bit, register offset

1

0

!= 11111

010

-

x0

LD1 (single structure)

16-bit, register offset

1

0

!= 11111

011

-

x0

LD3 (single structure)

16-bit, register offset

1

0

!= 11111

100

-

00

LD1 (single structure)

32-bit, register offset

1

0

!= 11111

100

0

01

LD1 (single structure)

64-bit, register offset

1

0

!= 11111

101

-

00

LD3 (single structure)

32-bit, register offset

1

0

!= 11111

101

0

01

LD3 (single structure)

64-bit, register offset

1

0

!= 11111

110

0

-

LD1R

Register offset

1

0

!= 11111

111

0

-

LD3R

Register offset

1

0

11111

000

-

-

LD1 (single structure)

8-bit, immediate offset

1

0

11111

001

-

-

LD3 (single structure)

8-bit, immediate offset

1

0

11111

010

-

x0

LD1 (single structure)

16-bit, immediate offset

1

0

11111

011

-

x0

LD3 (single structure)

16-bit, immediate offset

1

0

11111

100

-

00

LD1 (single structure)

32-bit, immediate offset

1

0

11111

100

0

01

LD1 (single structure)

64-bit, immediate offset

1

0

11111

101

-

00

LD3 (single structure)

32-bit, immediate offset

1

0

11111

101

0

01

LD3 (single structure)

64-bit, immediate offset

1

0

11111

110

0

-

LD1R

Immediate offset

1

0

11111

111

0

-

LD3R

Immediate offset

1

1

!= 11111

000

-

-

LD2 (single structure)

8-bit, register offset

1

1

!= 11111

001

-

-

LD4 (single structure)

8-bit, register offset

1

1

!= 11111

010

-

x0

LD2 (single structure)

16-bit, register offset

1

1

!= 11111

011

-

x0

LD4 (single structure)

16-bit, register offset

1

1

!= 11111

100

-

00

LD2 (single structure)

32-bit, register offset

1

1

!= 11111

100

0

01

LD2 (single structure)

64-bit, register offset

1

1

!= 11111

101

-

00

LD4 (single structure)

32-bit, register offset

L R Rm

opcode

S size

0

1

11111

011

-

0

1

11111

100

0

1

11111

0

1

0


Decode fields

C3.3.5

Instruction Page

Variant

01

LD4 (single structure)

64-bit, register offset

0

-

LD2R

Register offset

111

0

-

LD4R

Register offset

11111

000

-

-

LD2 (single structure)

8-bit, immediate offset

1

11111

001

-

-

LD4 (single structure)

8-bit, immediate offset

1

1

11111

010

-

x0

LD2 (single structure)

16-bit, immediate offset

1

1

11111

011

-

x0

LD4 (single structure)

16-bit, immediate offset

1

1

11111

100

-

00

LD2 (single structure)

32-bit, immediate offset

1

1

11111

100

0

01

LD2 (single structure)

64-bit, immediate offset

1

1

11111

101

-

00

LD4 (single structure)

32-bit, immediate offset

1

1

11111

101

0

01

LD4 (single structure)

64-bit, immediate offset

1

1

11111

110

0

-

LD2R

Immediate offset

1

1

11111

111

0

-

LD4R

Immediate offset

L R Rm

opcode

S size

1

1

!= 11111

101

0

1

1

!= 11111

110

1

1

!= 11111

1

1

1

Load register (literal)

Bits:    31:30  29:27  26  25:24  23:5   4:0
Fields:  opc    011    V   00     imm19  Rt

Decode fields
opc  V    Instruction Page            Variant
00   0    LDR (literal)               32-bit
00   1    LDR (literal, SIMD&FP)      32-bit
01   0    LDR (literal)               64-bit
01   1    LDR (literal, SIMD&FP)      64-bit
10   0    LDRSW (literal)             -
10   1    LDR (literal, SIMD&FP)      128-bit
11   0    PRFM (literal)              -
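The Load register (literal) decode table maps the pair (opc, V) directly to an instruction. As a rough illustration, that lookup can be sketched in Python. The function name `decode_load_literal` and the shape of the returned dictionary are illustrative choices, not part of the architecture, and the sketch returns the raw imm19 field without interpreting it.

```python
# Rows of the Load register (literal) decode table, keyed by (opc, V).
LOAD_LITERAL_ROWS = {
    (0b00, 0): ("LDR (literal)", "32-bit"),
    (0b00, 1): ("LDR (literal, SIMD&FP)", "32-bit"),
    (0b01, 0): ("LDR (literal)", "64-bit"),
    (0b01, 1): ("LDR (literal, SIMD&FP)", "64-bit"),
    (0b10, 0): ("LDRSW (literal)", "-"),
    (0b10, 1): ("LDR (literal, SIMD&FP)", "128-bit"),
    (0b11, 0): ("PRFM (literal)", "-"),
}


def decode_load_literal(word: int):
    """Decode a 32-bit word as Load register (literal), or return None."""
    # Fixed bits of this class: bits 29:27 == 011 and bits 25:24 == 00.
    if (word >> 27) & 0b111 != 0b011 or (word >> 24) & 0b11 != 0b00:
        return None
    opc = (word >> 30) & 0b11
    v = (word >> 26) & 1
    row = LOAD_LITERAL_ROWS.get((opc, v))
    if row is None:
        return None  # (opc, V) = (11, 1) has no row in the table
    name, variant = row
    return {
        "instruction": name,
        "variant": variant,
        "imm19": (word >> 5) & 0x7FFFF,  # raw field, not sign-extended
        "Rt": word & 0x1F,
    }
```

For example, the word 0x58000041 has opc = 01 and V = 0, so the sketch reports LDR (literal), 64-bit, with imm19 = 2 and Rt = 1.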


C3.3.6

Load/store exclusive

Bits:    31:30  29:24   23  22  21  20:16  15  14:10  9:5  4:0
Fields:  size   001000  o2  L   o1  Rs     o0  Rt2    Rn   Rt

Decode fields
size  o2  L  o1  o0    Instruction Page    Variant
00    0   0  0   0     STXRB               -
00    0   0  0   1     STLXRB              -
00    0   1  0   0     LDXRB               -
00    0   1  0   1     LDAXRB              -
00    1   0  0   1     STLRB               -
00    1   1  0   1     LDARB               -
01    0   0  0   0     STXRH               -
01    0   0  0   1     STLXRH              -
01    0   1  0   0     LDXRH               -
01    0   1  0   1     LDAXRH              -
01    1   0  0   1     STLRH               -
01    1   1  0   1     LDARH               -
10    0   0  0   0     STXR                32-bit
10    0   0  0   1     STLXR               32-bit
10    0   0  1   0     STXP                32-bit
10    0   0  1   1     STLXP               32-bit
10    0   1  0   0     LDXR                32-bit
10    0   1  0   1     LDAXR               32-bit
10    0   1  1   0     LDXP                32-bit
10    0   1  1   1     LDAXP               32-bit
10    1   0  0   1     STLR                32-bit
10    1   1  0   1     LDAR                32-bit
11    0   0  0   0     STXR                64-bit
11    0   0  0   1     STLXR               64-bit
11    0   0  1   0     STXP                64-bit
11    0   0  1   1     STLXP               64-bit
11    0   1  0   0     LDXR                64-bit
11    0   1  0   1     LDAXR               64-bit
11    0   1  1   0     LDXP                64-bit
11    0   1  1   1     LDAXP               64-bit
11    1   0  0   1     STLR                64-bit
11    1   1  0   1     LDAR                64-bit

C3.3.7

Load/store no-allocate pair (offset)

Bits:    31:30  29:27  26  25:23  22  21:15  14:10  9:5  4:0
Fields:  opc    101    V   000    L   imm7   Rt2    Rn   Rt

Decode fields
opc  V  L    Instruction Page    Variant
00   0  0    STNP                32-bit
00   0  1    LDNP                32-bit
00   1  0    STNP (SIMD&FP)      32-bit
00   1  1    LDNP (SIMD&FP)      32-bit
01   1  0    STNP (SIMD&FP)      64-bit
01   1  1    LDNP (SIMD&FP)      64-bit
10   0  0    STNP                64-bit
10   0  1    LDNP                64-bit
10   1  0    STNP (SIMD&FP)      128-bit
10   1  1    LDNP (SIMD&FP)      128-bit


C3.3.8

Load/store register (immediate post-indexed)

Bits:    31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Fields:  size   111    V   00     opc    0   imm9   01     Rn   Rt

Decode fields
size  V  opc    Instruction Page            Variant
00    0  00     STRB (immediate)            Post-index
00    0  01     LDRB (immediate)            Post-index
00    0  10     LDRSB (immediate)           64-bit
00    0  11     LDRSB (immediate)           32-bit
00    1  00     STR (immediate, SIMD&FP)    8-bit
00    1  01     LDR (immediate, SIMD&FP)    8-bit
00    1  10     STR (immediate, SIMD&FP)    128-bit
00    1  11     LDR (immediate, SIMD&FP)    128-bit
01    0  00     STRH (immediate)            Post-index
01    0  01     LDRH (immediate)            Post-index
01    0  10     LDRSH (immediate)           64-bit
01    0  11     LDRSH (immediate)           32-bit
01    1  00     STR (immediate, SIMD&FP)    16-bit
01    1  01     LDR (immediate, SIMD&FP)    16-bit
10    0  00     STR (immediate)             32-bit
10    0  01     LDR (immediate)             32-bit
10    0  10     LDRSW (immediate)           Post-index
10    1  00     STR (immediate, SIMD&FP)    32-bit
10    1  01     LDR (immediate, SIMD&FP)    32-bit
11    0  00     STR (immediate)             64-bit
11    0  01     LDR (immediate)             64-bit
11    1  00     STR (immediate, SIMD&FP)    64-bit
11    1  01     LDR (immediate, SIMD&FP)    64-bit
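In the SIMD&FP rows (V = 1) of the post-indexed decode table, the transfer size follows a regular pattern: placing opc<1> above the two size bits gives a value from 0 to 4, which selects 8, 16, 32, 64 or 128 bits, while opc<0> distinguishes store from load. The Python sketch below captures that observation. The helper name is hypothetical, and the rule is read off the table rows rather than quoted from the instruction descriptions.

```python
def simd_fp_access(size: int, opc: int):
    """Return (instruction, width in bits) for a V = 1 row, or None.

    Pattern read off the decode table: scale = opc<1>:size, width = 8 << scale.
    """
    scale = ((opc >> 1) << 2) | size
    if scale > 4:
        return None  # the table has no SIMD&FP row for these field values
    name = "LDR" if opc & 1 else "STR"
    return (name + " (immediate, SIMD&FP)", 8 << scale)
```

With size = 00 and opc = 11 the sketch yields the 128-bit LDR row, and with size = 01 and opc = 10 it returns None because the table lists no such row.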


C3.3.9

Load/store register (immediate pre-indexed)

Bits:    31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Fields:  size   111    V   00     opc    0   imm9   11     Rn   Rt

Decode fields
size  V  opc    Instruction Page            Variant
00    0  00     STRB (immediate)            Pre-index
00    0  01     LDRB (immediate)            Pre-index
00    0  10     LDRSB (immediate)           64-bit
00    0  11     LDRSB (immediate)           32-bit
00    1  00     STR (immediate, SIMD&FP)    8-bit
00    1  01     LDR (immediate, SIMD&FP)    8-bit
00    1  10     STR (immediate, SIMD&FP)    128-bit
00    1  11     LDR (immediate, SIMD&FP)    128-bit
01    0  00     STRH (immediate)            Pre-index
01    0  01     LDRH (immediate)            Pre-index
01    0  10     LDRSH (immediate)           64-bit
01    0  11     LDRSH (immediate)           32-bit
01    1  00     STR (immediate, SIMD&FP)    16-bit
01    1  01     LDR (immediate, SIMD&FP)    16-bit
10    0  00     STR (immediate)             32-bit
10    0  01     LDR (immediate)             32-bit
10    0  10     LDRSW (immediate)           Pre-index
10    1  00     STR (immediate, SIMD&FP)    32-bit
10    1  01     LDR (immediate, SIMD&FP)    32-bit
11    0  00     STR (immediate)             64-bit
11    0  01     LDR (immediate)             64-bit
11    1  00     STR (immediate, SIMD&FP)    64-bit
11    1  01     LDR (immediate, SIMD&FP)    64-bit


C3.3.10

Load/store register (register offset)

Bits:    31:30  29:27  26  25:24  23:22  21  20:16  15:13   12  11:10  9:5  4:0
Fields:  size   111    V   00     opc    1   Rm     option  S   10     Rn   Rt

Decode fields
size  V  opc  option    Instruction Page           Variant
00    0  00   -         STRB (register)            -
00    0  01   -         LDRB (register)            -
00    0  10   -         LDRSB (register)           64-bit
00    0  11   -         LDRSB (register)           32-bit
00    1  00   -         STR (register, SIMD&FP)    8-bit
00    1  01   -         LDR (register, SIMD&FP)    8-bit
00    1  10   -         STR (register, SIMD&FP)    128-bit
00    1  11   -         LDR (register, SIMD&FP)    128-bit
01    0  00   -         STRH (register)            -
01    0  01   -         LDRH (register)            -
01    0  10   -         LDRSH (register)           64-bit
01    0  11   -         LDRSH (register)           32-bit
01    1  00   -         STR (register, SIMD&FP)    16-bit
01    1  01   -         LDR (register, SIMD&FP)    16-bit
10    0  00   -         STR (register)             32-bit
10    0  01   -         LDR (register)             32-bit
10    0  10   -         LDRSW (register)           -
10    1  00   -         STR (register, SIMD&FP)    32-bit
10    1  01   -         LDR (register, SIMD&FP)    32-bit
11    0  00   -         STR (register)             64-bit
11    0  01   -         LDR (register)             64-bit
11    0  10   -         PRFM (register)            -
11    1  00   -         STR (register, SIMD&FP)    64-bit
11    1  01   -         LDR (register, SIMD&FP)    64-bit


C3.3.11

Load/store register (unprivileged)

Bits:    31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Fields:  size   111    V   00     opc    0   imm9   10     Rn   Rt

Decode fields
size  V  opc    Instruction Page    Variant
00    0  00     STTRB               -
00    0  01     LDTRB               -
00    0  10     LDTRSB              64-bit
00    0  11     LDTRSB              32-bit
01    0  00     STTRH               -
01    0  01     LDTRH               -
01    0  10     LDTRSH              64-bit
01    0  11     LDTRSH              32-bit
10    0  00     STTR                32-bit
10    0  01     LDTR                32-bit
10    0  10     LDTRSW              -
11    0  00     STTR                64-bit
11    0  01     LDTR                64-bit

C3.3.12

Load/store register (unscaled immediate)

Bits:    31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Fields:  size   111    V   00     opc    0   imm9   00     Rn   Rt

Decode fields
size  V  opc    Instruction Page    Variant
00    0  00     STURB               -
00    0  01     LDURB               -
00    0  10     LDURSB              64-bit
00    0  11     LDURSB              32-bit
00    1  00     STUR (SIMD&FP)      8-bit
00    1  01     LDUR (SIMD&FP)      8-bit
00    1  10     STUR (SIMD&FP)      128-bit
00    1  11     LDUR (SIMD&FP)      128-bit
01    0  00     STURH               -
01    0  01     LDURH               -
01    0  10     LDURSH              64-bit
01    0  11     LDURSH              32-bit
01    1  00     STUR (SIMD&FP)      16-bit
01    1  01     LDUR (SIMD&FP)      16-bit
10    0  00     STUR                32-bit
10    0  01     LDUR                32-bit
10    0  10     LDURSW              -
10    1  00     STUR (SIMD&FP)      32-bit
10    1  01     LDUR (SIMD&FP)      32-bit
11    0  00     STUR                64-bit
11    0  01     LDUR                64-bit
11    0  10     PRFUM               -
11    1  00     STUR (SIMD&FP)      64-bit
11    1  01     LDUR (SIMD&FP)      64-bit

C3.3.13

Load/store register (unsigned immediate)

Bits:    31:30  29:27  26  25:24  23:22  21:10  9:5  4:0
Fields:  size   111    V   01     opc    imm12  Rn   Rt

Decode fields
size  V  opc    Instruction Page            Variant
00    0  00     STRB (immediate)            Unsigned offset
00    0  01     LDRB (immediate)            Unsigned offset
00    0  10     LDRSB (immediate)           64-bit
00    0  11     LDRSB (immediate)           32-bit
00    1  00     STR (immediate, SIMD&FP)    8-bit
00    1  01     LDR (immediate, SIMD&FP)    8-bit
00    1  10     STR (immediate, SIMD&FP)    128-bit
00    1  11     LDR (immediate, SIMD&FP)    128-bit
01    0  00     STRH (immediate)            Unsigned offset
01    0  01     LDRH (immediate)            Unsigned offset
01    0  10     LDRSH (immediate)           64-bit
01    0  11     LDRSH (immediate)           32-bit
01    1  00     STR (immediate, SIMD&FP)    16-bit
01    1  01     LDR (immediate, SIMD&FP)    16-bit
10    0  00     STR (immediate)             32-bit
10    0  01     LDR (immediate)             32-bit
10    0  10     LDRSW (immediate)           Unsigned offset
10    1  00     STR (immediate, SIMD&FP)    32-bit
10    1  01     LDR (immediate, SIMD&FP)    32-bit
11    0  00     STR (immediate)             64-bit
11    0  01     LDR (immediate)             64-bit
11    0  10     PRFM (immediate)            -
11    1  00     STR (immediate, SIMD&FP)    64-bit
11    1  01     LDR (immediate, SIMD&FP)    64-bit

C3.3.14

Load/store register pair (offset)

Bits:    31:30  29:27  26  25:23  22  21:15  14:10  9:5  4:0
Fields:  opc    101    V   010    L   imm7   Rt2    Rn   Rt

Decode fields
opc  V  L    Instruction Page    Variant
00   0  0    STP                 32-bit
00   0  1    LDP                 32-bit
00   1  0    STP (SIMD&FP)       32-bit
00   1  1    LDP (SIMD&FP)       32-bit
01   0  1    LDPSW               Signed offset
01   1  0    STP (SIMD&FP)       64-bit
01   1  1    LDP (SIMD&FP)       64-bit
10   0  0    STP                 64-bit
10   0  1    LDP                 64-bit
10   1  0    STP (SIMD&FP)       128-bit
10   1  1    LDP (SIMD&FP)       128-bit

C3.3.15

Load/store register pair (post-indexed)

Bits:    31:30  29:27  26  25:23  22  21:15  14:10  9:5  4:0
Fields:  opc    101    V   001    L   imm7   Rt2    Rn   Rt

Decode fields
opc  V  L    Instruction Page    Variant
00   0  0    STP                 32-bit
00   0  1    LDP                 32-bit
00   1  0    STP (SIMD&FP)       32-bit
00   1  1    LDP (SIMD&FP)       32-bit
01   0  1    LDPSW               Post-index
01   1  0    STP (SIMD&FP)       64-bit
01   1  1    LDP (SIMD&FP)       64-bit
10   0  0    STP                 64-bit
10   0  1    LDP                 64-bit
10   1  0    STP (SIMD&FP)       128-bit
10   1  1    LDP (SIMD&FP)       128-bit


C3.3.16

Load/store register pair (pre-indexed)

Bits:    31:30  29:27  26  25:23  22  21:15  14:10  9:5  4:0
Fields:  opc    101    V   011    L   imm7   Rt2    Rn   Rt

Decode fields
opc  V  L    Instruction Page    Variant
00   0  0    STP                 32-bit
00   0  1    LDP                 32-bit
00   1  0    STP (SIMD&FP)       32-bit
00   1  1    LDP (SIMD&FP)       32-bit
01   0  1    LDPSW               Pre-index
01   1  0    STP (SIMD&FP)       64-bit
01   1  1    LDP (SIMD&FP)       64-bit
10   0  0    STP                 64-bit
10   0  1    LDP                 64-bit
10   1  0    STP (SIMD&FP)       128-bit
10   1  1    LDP (SIMD&FP)       128-bit
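The four pair classes (no-allocate, offset, post-indexed and pre-indexed) share one field layout and differ only in bits 25:23, so a single routine can decode all of them. The Python sketch below illustrates this under stated assumptions: `decode_pair`, `PAIR_MODES` and `PAIR_ROWS` are hypothetical names, imm7 is returned as the raw field value, and the no-allocate class is assumed to rename STP/LDP to STNP/LDNP and to have no LDPSW row, as its decode table shows.

```python
# Bits 25:23 select the pair class.
PAIR_MODES = {
    0b000: "no-allocate pair (offset)",
    0b001: "post-indexed",
    0b010: "offset",
    0b011: "pre-indexed",
}

# Rows keyed by (opc, V, L), as listed in the pair decode tables.
PAIR_ROWS = {
    (0b00, 0, 0): ("STP", "32-bit"),
    (0b00, 0, 1): ("LDP", "32-bit"),
    (0b00, 1, 0): ("STP (SIMD&FP)", "32-bit"),
    (0b00, 1, 1): ("LDP (SIMD&FP)", "32-bit"),
    (0b01, 0, 1): ("LDPSW", "-"),
    (0b01, 1, 0): ("STP (SIMD&FP)", "64-bit"),
    (0b01, 1, 1): ("LDP (SIMD&FP)", "64-bit"),
    (0b10, 0, 0): ("STP", "64-bit"),
    (0b10, 0, 1): ("LDP", "64-bit"),
    (0b10, 1, 0): ("STP (SIMD&FP)", "128-bit"),
    (0b10, 1, 1): ("LDP (SIMD&FP)", "128-bit"),
}


def decode_pair(word: int):
    """Decode a load/store pair word into its fields, or return None."""
    if (word >> 27) & 0b111 != 0b101:
        return None
    mode = PAIR_MODES.get((word >> 23) & 0b111)
    row = PAIR_ROWS.get(((word >> 30) & 0b11, (word >> 26) & 1, (word >> 22) & 1))
    if mode is None or row is None:
        return None
    name, variant = row
    if mode.startswith("no-allocate"):
        if name == "LDPSW":
            return None  # not listed in the no-allocate table
        name = name.replace("STP", "STNP").replace("LDP", "LDNP")
    return {
        "instruction": name,
        "variant": variant,
        "mode": mode,
        "imm7": (word >> 15) & 0x7F,  # raw field value
        "Rt2": (word >> 10) & 0x1F,
        "Rn": (word >> 5) & 0x1F,
        "Rt": word & 0x1F,
    }
```

Decoding 0xA9BF7BFD with this sketch gives a 64-bit STP in the pre-indexed class with Rt = 29, Rt2 = 30 and Rn = 31.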


C3.4

Data processing - immediate
This section describes the encoding of the instruction classes in the Data processing (immediate) instruction group,
and shows how each instruction class encodes the different instruction forms. For additional information on this
functional group of instructions, see Data processing - immediate on page C2-140.
Table C3-4 Encoding table for the Data Processing - Immediate functional group

Instruction bits 31 to 10, one character per bit (- = not decoded at this level)

Bits 31..10               Instruction class
---10000--------------    PC-rel. addressing
---10001--------------    Add/subtract (immediate)
---100100-------------    Logical (immediate)
---100101-------------    Move wide (immediate)
---100110-------------    Bitfield
---100111-------------    Extract
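Each row of Table C3-4 fixes a few of bits 31:10 and leaves the rest undecoded, so selecting an instruction class amounts to a pattern match on those bits. A minimal Python sketch follows; the pattern strings transcribe the table with one character per bit (bit 31 first, `-` for a bit not decoded at this level), and the names `DP_IMMEDIATE`, `matches` and `classify_dp_immediate` are illustrative only.

```python
# One 22-character pattern per row of Table C3-4, covering bits 31 down to 10.
DP_IMMEDIATE = [
    ("---10000--------------", "PC-rel. addressing"),
    ("---10001--------------", "Add/subtract (immediate)"),
    ("---100100-------------", "Logical (immediate)"),
    ("---100101-------------", "Move wide (immediate)"),
    ("---100110-------------", "Bitfield"),
    ("---100111-------------", "Extract"),
]


def matches(word: int, pattern: str) -> bool:
    """True if every 0/1 in pattern agrees with the corresponding bit of word."""
    for i, ch in enumerate(pattern):
        if ch != "-" and ((word >> (31 - i)) & 1) != int(ch):
            return False
    return True


def classify_dp_immediate(word: int):
    """Return the Table C3-4 class name for word, or None if no row matches."""
    for pattern, name in DP_IMMEDIATE:
        if matches(word, pattern):
            return name
    return None
```

The same `matches` helper would work unchanged for the other encoding tables in this chapter, given their rows in the same one-character-per-bit form.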

C3.4.1

Add/subtract (immediate)

Bits:    31  30  29  28:24  23:22  21:10  9:5  4:0
Fields:  sf  op  S   10001  shift  imm12  Rn   Rd

Decode fields
sf  op  S  shift    Instruction Page    Variant
0   0   0  -        ADD (immediate)     32-bit
0   0   1  -        ADDS (immediate)    32-bit
0   1   0  -        SUB (immediate)     32-bit
0   1   1  -        SUBS (immediate)    32-bit
1   0   0  -        ADD (immediate)     64-bit
1   0   1  -        ADDS (immediate)    64-bit
1   1   0  -        SUB (immediate)     64-bit
1   1   1  -        SUBS (immediate)    64-bit
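Reading the Add/subtract (immediate) layout in the other direction, an instruction word can be assembled by shifting each field to its bit position and OR-ing in the fixed pattern 10001 at bits 28:24. The Python sketch below is one way to do that; the function name and the range check are assumptions for illustration, and no constraint on the shift field beyond its two-bit width is applied.

```python
def encode_add_sub_immediate(sf, op, s, shift, imm12, rn, rd):
    """Assemble an Add/subtract (immediate) word from its fields."""
    # Each field must fit in the width given by the layout.
    for value, width in ((sf, 1), (op, 1), (s, 1), (shift, 2),
                         (imm12, 12), (rn, 5), (rd, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError("field value out of range")
    return ((sf << 31) | (op << 30) | (s << 29) | (0b10001 << 24)
            | (shift << 22) | (imm12 << 10) | (rn << 5) | rd)
```

With sf = 1, op = 0, S = 0 (the 64-bit ADD row), imm12 = 1 and Rn = Rd = 1, the sketch produces 0x91000421.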


C3.4.2

Bitfield

Bits:    31  30:29  28:23   22  21:16  15:10  9:5  4:0
Fields:  sf  opc    100110  N   immr   imms   Rn   Rd

Decode fields
sf  opc  N    Instruction Page    Variant
0   00   0    SBFM                32-bit
0   01   0    BFM                 32-bit
0   10   0    UBFM                32-bit
1   00   1    SBFM                64-bit
1   01   1    BFM                 64-bit
1   10   1    UBFM                64-bit

C3.4.3

Extract

Bits:    31  30:29  28:23   22  21  20:16  15:10  9:5  4:0
Fields:  sf  op21   100111  N   o0  Rm     imms   Rn   Rd

Decode fields
sf  op21  N  o0  imms      Instruction Page    Variant
0   00    0  0   0xxxxx    EXTR                32-bit
1   00    1  0   -         EXTR                64-bit

C3.4.4

Logical (immediate)

Bits:    31  30:29  28:23   22  21:16  15:10  9:5  4:0
Fields:  sf  opc    100100  N   immr   imms   Rn   Rd

Decode fields
sf  opc  N    Instruction Page    Variant
0   00   0    AND (immediate)     32-bit
0   01   0    ORR (immediate)     32-bit
0   10   0    EOR (immediate)     32-bit
0   11   0    ANDS (immediate)    32-bit
1   00   -    AND (immediate)     64-bit
1   01   -    ORR (immediate)     64-bit
1   10   -    EOR (immediate)     64-bit
1   11   -    ANDS (immediate)    64-bit

C3.4.5

Move wide (immediate)

Bits:    31  30:29  28:23   22:21  20:5   4:0
Fields:  sf  opc    100101  hw     imm16  Rd

Decode fields
sf  opc  hw    Instruction Page    Variant
0   00   -     MOVN                32-bit
0   10   -     MOVZ                32-bit
0   11   -     MOVK                32-bit
1   00   -     MOVN                64-bit
1   10   -     MOVZ                64-bit
1   11   -     MOVK                64-bit

C3.4.6

PC-rel. addressing

Bits:    31  30:29  28:24  23:5   4:0
Fields:  op  immlo  10000  immhi  Rd

Decode fields
op    Instruction Page    Variant
0     ADR                 -
1     ADRP                -
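The PC-rel. addressing layout splits the immediate into a 2-bit immlo field and a 19-bit immhi field. The Python sketch below extracts both and joins them into one signed 21-bit value. Note the assumptions: the concatenation order immhi:immlo and the signed interpretation are taken from the ADR and ADRP instruction descriptions rather than from this encoding section, the function name is hypothetical, and any further scaling that ADRP applies to the value is deliberately left out.

```python
def decode_pc_rel(word: int):
    """Decode a PC-rel. addressing word, or return None if bits 28:24 != 10000."""
    if (word >> 24) & 0b11111 != 0b10000:
        return None
    op = (word >> 31) & 1
    immlo = (word >> 29) & 0b11
    immhi = (word >> 5) & 0x7FFFF
    # Assumption: the immediate is immhi:immlo, treated as a signed 21-bit value.
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):
        imm -= 1 << 21
    return {"instruction": "ADRP" if op else "ADR", "imm": imm, "Rd": word & 0x1F}
```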


C3.5

Data processing - register
This section describes the encoding of the instruction classes in the Data processing (register) instruction group, and
shows how each instruction class encodes the different instruction forms. For additional information on this
functional group of instructions, see Data processing - register on page C2-145.
Table C3-5 Encoding table for the Data Processing - Register functional group

Instruction bits 31 to 10, one character per bit (- = not decoded at this level)

Bits 31..10               Instruction class
---01010--------------    Logical (shifted register)
---01011--0-----------    Add/subtract (shifted register)
---01011--1-----------    Add/subtract (extended register)
---11010000-----------    Add/subtract (with carry)
---11010010---------0-    Conditional compare (register)
---11010010---------1-    Conditional compare (immediate)
---11010100-----------    Conditional select
---11011--------------    Data-processing (3 source)
-0-11010110-----------    Data-processing (2 source)
-1-11010110-----------    Data-processing (1 source)

C3.5.1

Add/subtract (extended register)

Bits:    31  30  29  28:24  23:22  21  20:16  15:13   12:10  9:5  4:0
Fields:  sf  op  S   01011  opt    1   Rm     option  imm3   Rn   Rd

Decode fields
sf  op  S  opt  imm3    Instruction Page            Variant
0   0   0  00   -       ADD (extended register)     32-bit
0   0   1  00   -       ADDS (extended register)    32-bit
0   1   0  00   -       SUB (extended register)     32-bit
0   1   1  00   -       SUBS (extended register)    32-bit
1   0   0  00   -       ADD (extended register)     64-bit
1   0   1  00   -       ADDS (extended register)    64-bit
1   1   0  00   -       SUB (extended register)     64-bit
1   1   1  00   -       SUBS (extended register)    64-bit


C3.5.2

Add/subtract (shifted register)

Bits:    31  30  29  28:24  23:22  21  20:16  15:10  9:5  4:0
Fields:  sf  op  S   01011  shift  0   Rm     imm6   Rn   Rd

Decode fields
sf  op  S  shift  imm6    Instruction Page           Variant
0   0   0  -      -       ADD (shifted register)     32-bit
0   0   1  -      -       ADDS (shifted register)    32-bit
0   1   0  -      -       SUB (shifted register)     32-bit
0   1   1  -      -       SUBS (shifted register)    32-bit
1   0   0  -      -       ADD (shifted register)     64-bit
1   0   1  -      -       ADDS (shifted register)    64-bit
1   1   0  -      -       SUB (shifted register)     64-bit
1   1   1  -      -       SUBS (shifted register)    64-bit

C3.5.3

Add/subtract (with carry)

Bits:    31  30  29  28:21     20:16  15:10    9:5  4:0
Fields:  sf  op  S   11010000  Rm     opcode2  Rn   Rd

Decode fields
sf  op  S  opcode2    Instruction Page    Variant
0   0   0  000000     ADC                 32-bit
0   0   1  000000     ADCS                32-bit
0   1   0  000000     SBC                 32-bit
0   1   1  000000     SBCS                32-bit
1   0   0  000000     ADC                 64-bit
1   0   1  000000     ADCS                64-bit
1   1   0  000000     SBC                 64-bit
1   1   1  000000     SBCS                64-bit


C3.5.4

Conditional compare (immediate)

Bits:    31  30  29  28:21     20:16  15:12  11  10  9:5  4   3:0
Fields:  sf  op  S   11010010  imm5   cond   1   o2  Rn   o3  nzcv

Decode fields
sf  op  S  o2  o3    Instruction Page    Variant
0   0   1  0   0     CCMN (immediate)    32-bit
0   1   1  0   0     CCMP (immediate)    32-bit
1   0   1  0   0     CCMN (immediate)    64-bit
1   1   1  0   0     CCMP (immediate)    64-bit

C3.5.5

Conditional compare (register)

Bits:    31  30  29  28:21     20:16  15:12  11  10  9:5  4   3:0
Fields:  sf  op  S   11010010  Rm     cond   0   o2  Rn   o3  nzcv

Decode fields
sf  op  S  o2  o3    Instruction Page    Variant
0   0   1  0   0     CCMN (register)     32-bit
0   1   1  0   0     CCMP (register)     32-bit
1   0   1  0   0     CCMN (register)     64-bit
1   1   1  0   0     CCMP (register)     64-bit

C3.5.6

Conditional select

Bits:    31  30  29  28:21     20:16  15:12  11:10  9:5  4:0
Fields:  sf  op  S   11010100  Rm     cond   op2    Rn   Rd

Decode fields
sf  op  S  op2    Instruction Page    Variant
0   0   0  00     CSEL                32-bit
0   0   0  01     CSINC               32-bit
0   1   0  00     CSINV               32-bit
0   1   0  01     CSNEG               32-bit
1   0   0  00     CSEL                64-bit
1   0   0  01     CSINC               64-bit
1   1   0  00     CSINV               64-bit
1   1   0  01     CSNEG               64-bit

C3.5.7

Data-processing (1 source)

Bits:    31  30  29  28:21     20:16    15:10   9:5  4:0
Fields:  sf  1   S   11010110  opcode2  opcode  Rn   Rd

Decode fields
sf  S  opcode2  opcode    Instruction Page    Variant
0   0  00000    000000    RBIT                32-bit
0   0  00000    000001    REV16               32-bit
0   0  00000    000010    REV                 32-bit
0   0  00000    000100    CLZ                 32-bit
0   0  00000    000101    CLS                 32-bit
1   0  00000    000000    RBIT                64-bit
1   0  00000    000001    REV16               64-bit
1   0  00000    000010    REV32               -
1   0  00000    000011    REV                 64-bit
1   0  00000    000100    CLZ                 64-bit
1   0  00000    000101    CLS                 64-bit


C3.5.8

Data-processing (2 source)

Bits:    31  30  29  28:21     20:16  15:10   9:5  4:0
Fields:  sf  0   S   11010110  Rm     opcode  Rn   Rd

Decode fields
sf  S  opcode    Instruction Page                      Variant
0   0  000010    UDIV                                  32-bit
0   0  000011    SDIV                                  32-bit
0   0  001000    LSLV                                  32-bit
0   0  001001    LSRV                                  32-bit
0   0  001010    ASRV                                  32-bit
0   0  001011    RORV                                  32-bit
0   0  010000    CRC32B, CRC32H, CRC32W, CRC32X        CRC32B
0   0  010001    CRC32B, CRC32H, CRC32W, CRC32X        CRC32H
0   0  010010    CRC32B, CRC32H, CRC32W, CRC32X        CRC32W
0   0  010100    CRC32CB, CRC32CH, CRC32CW, CRC32CX    CRC32CB
0   0  010101    CRC32CB, CRC32CH, CRC32CW, CRC32CX    CRC32CH
0   0  010110    CRC32CB, CRC32CH, CRC32CW, CRC32CX    CRC32CW
1   0  000010    UDIV                                  64-bit
1   0  000011    SDIV                                  64-bit
1   0  001000    LSLV                                  64-bit
1   0  001001    LSRV                                  64-bit
1   0  001010    ASRV                                  64-bit
1   0  001011    RORV                                  64-bit
1   0  010011    CRC32B, CRC32H, CRC32W, CRC32X        CRC32X
1   0  010111    CRC32CB, CRC32CH, CRC32CW, CRC32CX    CRC32CX
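The CRC rows of the Data-processing (2 source) table follow a compact pattern: opcode<5:3> is 010, opcode<2> selects the CRC32C family, opcode<1:0> selects the B, H, W or X variant, and only the X variants are listed with sf = 1. The Python sketch below restates that pattern. The helper name is hypothetical, and the rule is inferred from the table rows rather than taken from the instruction descriptions.

```python
def crc32_variant(sf: int, opcode: int):
    """Return the CRC32 variant name for (sf, opcode), or None if not a CRC row."""
    if (opcode >> 3) != 0b010:
        return None  # not one of the 010xxx opcodes
    size = opcode & 0b11
    # In the table, only the X variants (size == 11) appear with sf = 1.
    if sf != (1 if size == 0b11 else 0):
        return None
    family = "CRC32C" if opcode & 0b100 else "CRC32"
    return family + "BHWX"[size]
```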


C3.5.9

Data-processing (3 source)

Bits:    31  30:29  28:24  23:21  20:16  15  14:10  9:5  4:0
Fields:  sf  op54   11011  op31   Rm     o0  Ra     Rn   Rd

Decode fields
sf  op54  op31  o0    Instruction Page    Variant
0   00    000   0     MADD                32-bit
0   00    000   1     MSUB                32-bit
1   00    000   0     MADD                64-bit
1   00    000   1     MSUB                64-bit
1   00    001   0     SMADDL              -
1   00    001   1     SMSUBL              -
1   00    010   0     SMULH               -
1   00    101   0     UMADDL              -
1   00    101   1     UMSUBL              -
1   00    110   0     UMULH               -

C3.5.10

Logical (shifted register)

Bits:    31  30:29  28:24  23:22  21  20:16  15:10  9:5  4:0
Fields:  sf  opc    01010  shift  N   Rm     imm6   Rn   Rd

Decode fields
sf  opc  N  imm6    Instruction Page           Variant
0   00   0  -       AND (shifted register)     32-bit
0   00   1  -       BIC (shifted register)     32-bit
0   01   0  -       ORR (shifted register)     32-bit
0   01   1  -       ORN (shifted register)     32-bit
0   10   0  -       EOR (shifted register)     32-bit
0   10   1  -       EON (shifted register)     32-bit
0   11   0  -       ANDS (shifted register)    32-bit
0   11   1  -       BICS (shifted register)    32-bit
1   00   0  -       AND (shifted register)     64-bit
1   00   1  -       BIC (shifted register)     64-bit
1   01   0  -       ORR (shifted register)     64-bit
1   01   1  -       ORN (shifted register)     64-bit
1   10   0  -       EOR (shifted register)     64-bit
1   10   1  -       EON (shifted register)     64-bit
1   11   0  -       ANDS (shifted register)    64-bit
1   11   1  -       BICS (shifted register)    64-bit
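In the Logical (shifted register) table, opc selects one of four operations and N selects between the plain form and its second listed form (AND/BIC, ORR/ORN, EOR/EON, ANDS/BICS), while sf selects the 32-bit or 64-bit variant. A small Python sketch of that decode is given below; the function name and result format are illustrative assumptions, and the shift and imm6 fields are returned as raw values.

```python
# Mnemonics indexed by opc, for N = 0 and N = 1 respectively.
LOGICAL_N0 = ["AND", "ORR", "EOR", "ANDS"]
LOGICAL_N1 = ["BIC", "ORN", "EON", "BICS"]


def decode_logical_shifted(word: int):
    """Decode a Logical (shifted register) word, or return None."""
    if (word >> 24) & 0b11111 != 0b01010:
        return None
    sf = (word >> 31) & 1
    opc = (word >> 29) & 0b11
    n = (word >> 21) & 1
    mnemonic = (LOGICAL_N1 if n else LOGICAL_N0)[opc]
    return {
        "instruction": mnemonic + " (shifted register)",
        "variant": "64-bit" if sf else "32-bit",
        "shift": (word >> 22) & 0b11,
        "Rm": (word >> 16) & 0x1F,
        "imm6": (word >> 10) & 0x3F,
        "Rn": (word >> 5) & 0x1F,
        "Rd": word & 0x1F,
    }
```

For instance, 0xAA0103E0 decodes as the 64-bit ORR row with Rm = 1, Rn = 31 and Rd = 0, and setting bit 21 (0xAA2103E0) turns it into the ORN row.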


C3.6

Data processing - SIMD and floating point
This section describes the encoding of the instruction classes in the Data processing (SIMD and floating-point)
instruction group, and shows how each instruction class encodes the different instruction forms. For additional
information on this functional group of instructions, see Data processing - SIMD and floating-point on
page C2-152.

Table C3-6 Encoding table for the Data Processing - Scalar Floating-Point and Advanced SIMD functional group

Instruction bits 31 to 10, one character per bit (- = not decoded at this level,
**** = bits 22:19 != 0000)

Bits 31..10               Instruction class
-0-11110--0-----------    Floating-point<->fixed-point conversions
-0-11110--1---------01    Floating-point conditional compare
-0-11110--1---------10    Floating-point data-processing (2 source)
-0-11110--1---------11    Floating-point conditional select
-0-11110--1--------100    Floating-point immediate
-0-11110--1-------1000    Floating-point compare
-0-11110--1------10000    Floating-point data-processing (1 source)
-0-11110--1-----000000    Floating-point<->integer conversions
-0-11111--------------    Floating-point data-processing (3 source)
0--01110--1----------1    AdvSIMD three same
0--01110--1---------00    AdvSIMD three different
0--01110--10000-----10    AdvSIMD two-reg misc
0--01110--11000-----10    AdvSIMD across lanes
0--01110000-----0----1    AdvSIMD copy
0--01111-------------0    AdvSIMD vector x indexed element
0--0111100000--------1    AdvSIMD modified immediate
0--011110****--------1    AdvSIMD shift by immediate
0-001110--0-----0---00    AdvSIMD TBL/TBX
0-001110--0-----0---10    AdvSIMD ZIP/UZP/TRN
0-101110--0-----0----0    AdvSIMD EXT
01-11110--1----------1    AdvSIMD scalar three same
01-11110--1---------00    AdvSIMD scalar three different
01-11110--10000-----10    AdvSIMD scalar two-reg misc
01-11110--11000-----10    AdvSIMD scalar pairwise
01-11110000-----0----1    AdvSIMD scalar copy
01-11111-------------0    AdvSIMD scalar x indexed element
01-111110------------1    AdvSIMD scalar shift by immediate
01001110--10100-----10    Crypto AES
01011110--0-----0---00    Crypto three-reg SHA
01011110--10100-----10    Crypto two-reg SHA


C3.6.1

AdvSIMD EXT

Bits:    31  30  29:24   23:22  21  20:16  15  14:11  10  9:5  4:0
Fields:  0   Q   101110  op2    0   Rm     0   imm4   0   Rn   Rd

Decode fields
op2    Instruction Page    Variant
00     EXT                 -

C3.6.2

AdvSIMD TBL/TBX

Bits:    31  30  29:24   23:22  21  20:16  15  14:13  12  11:10  9:5  4:0
Fields:  0   Q   001110  op2    0   Rm     0   len    op  00     Rn   Rd

Decode fields
op2  len  op    Instruction Page    Variant
00   00   0     TBL                 Single register table
00   00   1     TBX                 Single register table
00   01   0     TBL                 Two register table
00   01   1     TBX                 Two register table
00   10   0     TBL                 Three register table
00   10   1     TBX                 Three register table
00   11   0     TBL                 Four register table
00   11   1     TBX                 Four register table

C3.6.3

AdvSIMD ZIP/UZP/TRN

Bits:    31  30  29:24   23:22  21  20:16  15  14:12   11:10  9:5  4:0
Fields:  0   Q   001110  size   0   Rm     0   opcode  10     Rn   Rd

Decode fields
opcode    Instruction Page    Variant
001       UZP1                -
010       TRN1                -
011       ZIP1                -
101       UZP2                -
110       TRN2                -
111       ZIP2                -

C3.6.4

AdvSIMD across lanes
31 30 29 28 27 26 25 24  23-22  21 20 19 18 17  16-12   11 10  9-5  4-0
0  Q  U  0  1  1  1  0   size   1  1  0  0  0   opcode  1  0   Rn   Rd

Decode fields
U  size  opcode  Instruction Page  Variant
0  -     00011   SADDLV            -
0  -     01010   SMAXV             -
0  -     11010   SMINV             -
0  -     11011   ADDV              -
1  -     00011   UADDLV            -
1  -     01010   UMAXV             -
1  -     11010   UMINV             -
1  0x    01100   FMAXNMV           -
1  0x    01111   FMAXV             -
1  1x    01100   FMINNMV           -
1  1x    01111   FMINV             -

C3.6.5

AdvSIMD copy
31 30 29 28 27 26 25 24 23 22 21  20-16  15  14-11  10  9-5  4-0
0  Q  op 0  1  1  1  0  0  0  0   imm5   0   imm4   1   Rn   Rd

Decode fields
Q  op  imm5  imm4  Instruction Page  Variant
-  0   -     0000  DUP (element)     Vector
-  0   -     0001  DUP (general)     -
0  0   -     0101  SMOV              32-bit
0  0   -     0111  UMOV              32-bit
1  0   -     0011  INS (general)     -
1  0   -     0101  SMOV              64-bit
1  0   -     0111  UMOV              64-bit
1  1   -     -     INS (element)     -

C3.6.6

AdvSIMD modified immediate
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16  15-12  11  10  9  8  7  6  5  4-0
0  Q  op 0  1  1  1  1  0  0  0  0  0  a  b  c   cmode  o2  1   d  e  f  g  h  Rd

Decode fields
Q  op  cmode  o2  Instruction Page          Variant
-  0   0xx0   0   MOVI                      32-bit shifted immediate
-  0   0xx1   0   ORR (vector, immediate)   32-bit
-  0   10x0   0   MOVI                      16-bit shifted immediate
-  0   10x1   0   ORR (vector, immediate)   16-bit
-  0   110x   0   MOVI                      32-bit shifting ones
-  0   1110   0   MOVI                      8-bit
-  0   1111   0   FMOV (vector, immediate)  Single-precision
-  1   0xx0   0   MVNI                      32-bit shifted immediate
-  1   0xx1   0   BIC (vector, immediate)   32-bit
-  1   10x0   0   MVNI                      16-bit shifted immediate
-  1   10x1   0   BIC (vector, immediate)   16-bit
-  1   110x   0   MVNI                      32-bit shifting ones
0  1   1110   0   MOVI                      64-bit scalar
1  1   1110   0   MOVI                      64-bit vector
1  1   1111   0   FMOV (vector, immediate)  Double-precision
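The decode tables in this section use - for a field that does not affect the decode and x for an individual bit that can take either value. As an illustration only, the sketch below applies that convention to the AdvSIMD modified immediate rows. The names `MODIFIED_IMMEDIATE_ROWS`, `field_matches` and `decode_modified_immediate` are hypothetical, and returning None for an unmatched combination is an assumption of this sketch, not a statement about how such encodings behave.

```python
# Rows transcribed from the AdvSIMD modified immediate decode table.
# Columns: Q, op, cmode, o2, instruction, variant.
# "-" matches any value; "x" inside a pattern matches either bit value.
MODIFIED_IMMEDIATE_ROWS = [
    ("-", "0", "0xx0", "0", "MOVI", "32-bit shifted immediate"),
    ("-", "0", "0xx1", "0", "ORR (vector, immediate)", "32-bit"),
    ("-", "0", "10x0", "0", "MOVI", "16-bit shifted immediate"),
    ("-", "0", "10x1", "0", "ORR (vector, immediate)", "16-bit"),
    ("-", "0", "110x", "0", "MOVI", "32-bit shifting ones"),
    ("-", "0", "1110", "0", "MOVI", "8-bit"),
    ("-", "0", "1111", "0", "FMOV (vector, immediate)", "Single-precision"),
    ("-", "1", "0xx0", "0", "MVNI", "32-bit shifted immediate"),
    ("-", "1", "0xx1", "0", "BIC (vector, immediate)", "32-bit"),
    ("-", "1", "10x0", "0", "MVNI", "16-bit shifted immediate"),
    ("-", "1", "10x1", "0", "BIC (vector, immediate)", "16-bit"),
    ("-", "1", "110x", "0", "MVNI", "32-bit shifting ones"),
    ("0", "1", "1110", "0", "MOVI", "64-bit scalar"),
    ("1", "1", "1110", "0", "MOVI", "64-bit vector"),
    ("1", "1", "1111", "0", "FMOV (vector, immediate)", "Double-precision"),
]


def field_matches(pattern, value):
    """Return True if an integer field value matches a table pattern."""
    if pattern == "-":
        return True
    actual = format(value, "0{}b".format(len(pattern)))
    if len(actual) != len(pattern):  # value is wider than the field
        return False
    return all(p in ("x", a) for p, a in zip(pattern, actual))


def decode_modified_immediate(q, op, cmode, o2):
    """Return (instruction, variant) for the first matching row, else None."""
    for row in MODIFIED_IMMEDIATE_ROWS:
        patterns, result = row[:4], row[4:]
        if all(field_matches(p, v) for p, v in zip(patterns, (q, op, cmode, o2))):
            return result
    return None
```

Because no two rows of this table overlap, the order of the rows does not affect the result of the lookup.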


C3.6.7

AdvSIMD scalar copy
31 30 29 28 27 26 25 24 23 22 21  20-16  15  14-11  10  9-5  4-0
0  1  op 1  1  1  1  0  0  0  0   imm5   0   imm4   1   Rn   Rd

Decode fields
op  imm5  imm4  Instruction Page  Variant
0   -     0000  DUP (element)     Scalar

C3.6.8

AdvSIMD scalar pairwise
31 30 29 28 27 26 25 24  23-22  21 20 19 18 17  16-12   11 10  9-5  4-0
0  1  U  1  1  1  1  0   size   1  1  0  0  0   opcode  1  0   Rn   Rd

Decode fields
U  size  opcode  Instruction Page  Variant
0  -     11011   ADDP (scalar)     -
1  0x    01100   FMAXNMP (scalar)  -
1  0x    01101   FADDP (scalar)    -
1  0x    01111   FMAXP (scalar)    -
1  1x    01100   FMINNMP (scalar)  -
1  1x    01111   FMINP (scalar)    -

C3.6.9

AdvSIMD scalar shift by immediate
31 30 29 28 27 26 25 24 23  22-19  18-16  15-11   10  9-5  4-0
0  1  U  1  1  1  1  1  0   immh   immb   opcode  1   Rn   Rd

Decode fields
U  immh     opcode  Instruction Page              Variant
0  != 0000  00000   SSHR                          Scalar
0  != 0000  00010   SSRA                          Scalar
0  != 0000  00100   SRSHR                         Scalar
0  != 0000  00110   SRSRA                         Scalar
0  != 0000  01010   SHL                           Scalar
0  != 0000  01110   SQSHL (immediate)             Scalar
0  != 0000  10010   SQSHRN, SQSHRN2               Scalar
0  != 0000  10011   SQRSHRN, SQRSHRN2             Scalar
0  != 0000  11100   SCVTF (vector, fixed-point)   Scalar
0  != 0000  11111   FCVTZS (vector, fixed-point)  Scalar
1  != 0000  00000   USHR                          Scalar
1  != 0000  00010   USRA                          Scalar
1  != 0000  00100   URSHR                         Scalar
1  != 0000  00110   URSRA                         Scalar
1  != 0000  01000   SRI                           Scalar
1  != 0000  01010   SLI                           Scalar
1  != 0000  01100   SQSHLU                        Scalar
1  != 0000  01110   UQSHL (immediate)             Scalar
1  != 0000  10000   SQSHRUN, SQSHRUN2             Scalar
1  != 0000  10001   SQRSHRUN, SQRSHRUN2           Scalar
1  != 0000  10010   UQSHRN                        Scalar
1  != 0000  10011   UQRSHRN, UQRSHRN2             Scalar
1  != 0000  11100   UCVTF (vector, fixed-point)   Scalar
1  != 0000  11111   FCVTZU (vector, fixed-point)  Scalar

C3.6.10

AdvSIMD scalar three different
31 30 29 28 27 26 25 24  23-22  21  20-16  15-12   11 10  9-5  4-0
0  1  U  1  1  1  1  0   size   1   Rm     opcode  0  0   Rn   Rd

Decode fields
U  opcode  Instruction Page            Variant
0  1001    SQDMLAL, SQDMLAL2 (vector)  Scalar
0  1011    SQDMLSL, SQDMLSL2 (vector)  Scalar
0  1101    SQDMULL, SQDMULL2 (vector)  Scalar


C3.6.11

AdvSIMD scalar three same
31 30 29 28 27 26 25 24  23-22  21  20-16  15-11   10  9-5  4-0
0  1  U  1  1  1  1  0   size   1   Rm     opcode  1   Rn   Rd

Decode fields
U  size  opcode  Instruction Page   Variant
0  -     00001   SQADD              Scalar
0  -     00101   SQSUB              Scalar
0  -     00110   CMGT (register)    Scalar
0  -     00111   CMGE (register)    Scalar
0  -     01000   SSHL               Scalar
0  -     01001   SQSHL (register)   Scalar
0  -     01010   SRSHL              Scalar
0  -     01011   SQRSHL             Scalar
0  -     10000   ADD (vector)       Scalar
0  -     10001   CMTST              Scalar
0  -     10110   SQDMULH (vector)   Scalar
0  0x    11011   FMULX              Scalar
0  0x    11100   FCMEQ (register)   Scalar
0  0x    11111   FRECPS             Scalar
0  1x    11111   FRSQRTS            Scalar
1  -     00001   UQADD              Scalar
1  -     00101   UQSUB              Scalar
1  -     00110   CMHI (register)    Scalar
1  -     00111   CMHS (register)    Scalar
1  -     01000   USHL               Scalar
1  -     01001   UQSHL (register)   Scalar
1  -     01010   URSHL              Scalar
1  -     01011   UQRSHL             Scalar
1  -     10000   SUB (vector)       Scalar
1  -     10001   CMEQ (register)    Scalar
1  -     10110   SQRDMULH (vector)  Scalar
1  0x    11100   FCMGE (register)   Scalar
1  0x    11101   FACGE              Scalar
1  1x    11010   FABD               Scalar
1  1x    11100   FCMGT (register)   Scalar
1  1x    11101   FACGT              Scalar

C3.6.12

AdvSIMD scalar two-reg misc
31 30 29 28 27 26 25 24  23-22  21 20 19 18 17  16-12   11 10  9-5  4-0
0  1  U  1  1  1  1  0   size   1  0  0  0  0   opcode  1  0   Rn   Rd

Decode fields
U  size  opcode  Instruction Page          Variant
0  -     00011   SUQADD                    Scalar
0  -     00111   SQABS                     Scalar
0  -     01000   CMGT (zero)               Scalar
0  -     01001   CMEQ (zero)               Scalar
0  -     01010   CMLT (zero)               Scalar
0  -     01011   ABS                       Scalar
0  -     10100   SQXTN, SQXTN2             Scalar
0  0x    11010   FCVTNS (vector)           Scalar
0  0x    11011   FCVTMS (vector)           Scalar
0  0x    11100   FCVTAS (vector)           Scalar
0  0x    11101   SCVTF (vector, integer)   Scalar
0  1x    01100   FCMGT (zero)              Scalar
0  1x    01101   FCMEQ (zero)              Scalar
0  1x    01110   FCMLT (zero)              Scalar
0  1x    11010   FCVTPS (vector)           Scalar
0  1x    11011   FCVTZS (vector, integer)  Scalar
0  1x    11101   FRECPE                    Scalar
0  1x    11111   FRECPX                    -
1  -     00011   USQADD                    Scalar
1  -     00111   SQNEG                     Scalar
1  -     01000   CMGE (zero)               Scalar
1  -     01001   CMLE (zero)               Scalar
1  -     01011   NEG (vector)              Scalar
1  -     10010   SQXTUN, SQXTUN2           Scalar
1  -     10100   UQXTN, UQXTN2             Scalar
1  0x    10110   FCVTXN, FCVTXN2           Scalar
1  0x    11010   FCVTNU (vector)           Scalar
1  0x    11011   FCVTMU (vector)           Scalar
1  0x    11100   FCVTAU (vector)           Scalar
1  0x    11101   UCVTF (vector, integer)   Scalar
1  1x    01100   FCMGE (zero)              Scalar
1  1x    01101   FCMLE (zero)              Scalar
1  1x    11010   FCVTPU (vector)           Scalar
1  1x    11011   FCVTZU (vector, integer)  Scalar
1  1x    11101   FRSQRTE                   Scalar

C3.6.13

AdvSIMD scalar x indexed element
31 30 29 28 27 26 25 24  23-22  21  20  19-16  15-12   11  10  9-5  4-0
0  1  U  1  1  1  1  1   size   L   M   Rm     opcode  H   0   Rn   Rd

Decode fields
U  size  opcode  Instruction Page                Variant
0  -     0011    SQDMLAL, SQDMLAL2 (by element)  Scalar
0  -     0111    SQDMLSL, SQDMLSL2 (by element)  Scalar
0  -     1011    SQDMULL, SQDMULL2 (by element)  Scalar
0  -     1100    SQDMULH (by element)            Scalar
0  -     1101    SQRDMULH (by element)           Scalar
0  1x    0001    FMLA (by element)               Scalar
0  1x    0101    FMLS (by element)               Scalar
0  1x    1001    FMUL (by element)               Scalar
1  1x    1001    FMULX (by element)              Scalar


C3.6.14

AdvSIMD shift by immediate
31 30 29 28 27 26 25 24 23  22-19  18-16  15-11   10  9-5  4-0
0  Q  U  0  1  1  1  1  0   immh   immb   opcode  1   Rn   Rd

Decode fields
U  opcode  Instruction Page              Variant
0  00000   SSHR                          Vector
0  00010   SSRA                          Vector
0  00100   SRSHR                         Vector
0  00110   SRSRA                         Vector
0  01010   SHL                           Vector
0  01110   SQSHL (immediate)             Vector
0  10000   SHRN, SHRN2                   -
0  10001   RSHRN, RSHRN2                 -
0  10010   SQSHRN, SQSHRN2               Vector
0  10011   SQRSHRN, SQRSHRN2             Vector
0  10100   SSHLL, SSHLL2                 -
0  11100   SCVTF (vector, fixed-point)   Vector
0  11111   FCVTZS (vector, fixed-point)  Vector
1  00000   USHR                          Vector
1  00010   USRA                          Vector
1  00100   URSHR                         Vector
1  00110   URSRA                         Vector
1  01000   SRI                           Vector
1  01010   SLI                           Vector
1  01100   SQSHLU                        Vector
1  01110   UQSHL (immediate)             Vector
1  10000   SQSHRUN, SQSHRUN2             Vector
1  10001   SQRSHRUN, SQRSHRUN2           Vector
1  10010   UQSHRN                        Vector
1  10011   UQRSHRN, UQRSHRN2             Vector
1  10100   USHLL, USHLL2                 -
1  11100   UCVTF (vector, fixed-point)   Vector
1  11111   FCVTZU (vector, fixed-point)  Vector

C3.6.15

AdvSIMD three different
31 30 29 28 27 26 25 24  23-22  21  20-16  15-12   11 10  9-5  4-0
0  Q  U  0  1  1  1  0   size   1   Rm     opcode  0  0   Rn   Rd

Decode fields
U  opcode  Instruction Page            Variant
0  0000    SADDL, SADDL2               -
0  0001    SADDW, SADDW2               -
0  0010    SSUBL, SSUBL2               -
0  0011    SSUBW, SSUBW2               -
0  0100    ADDHN, ADDHN2               -
0  0101    SABAL, SABAL2               -
0  0110    SUBHN, SUBHN2               -
0  0111    SABDL, SABDL2               -
0  1000    SMLAL, SMLAL2 (vector)      -
0  1001    SQDMLAL, SQDMLAL2 (vector)  Vector
0  1010    SMLSL, SMLSL2 (vector)      -
0  1011    SQDMLSL, SQDMLSL2 (vector)  Vector
0  1100    SMULL, SMULL2 (vector)      -
0  1101    SQDMULL, SQDMULL2 (vector)  Vector
0  1110    PMULL, PMULL2               -
1  0000    UADDL, UADDL2               -
1  0001    UADDW, UADDW2               -
1  0010    USUBL, USUBL2               -
1  0011    USUBW, USUBW2               -
1  0100    RADDHN, RADDHN2             -
1  0101    UABAL, UABAL2               -
1  0110    RSUBHN, RSUBHN2             -
1  0111    UABDL, UABDL2               -
1  1000    UMLAL, UMLAL2 (vector)      -
1  1010    UMLSL, UMLSL2 (vector)      -
1  1100    UMULL, UMULL2 (vector)      -

C3.6.16

AdvSIMD three same
31 30 29 28 27 26 25 24  23-22  21  20-16  15-11   10  9-5  4-0
0  Q  U  0  1  1  1  0   size   1   Rm     opcode  1   Rn   Rd

Decode fields
U  size  opcode  Instruction Page        Variant
0  -     00000   SHADD                   -
0  -     00001   SQADD                   Vector
0  -     00010   SRHADD                  -
0  -     00100   SHSUB                   -
0  -     00101   SQSUB                   Vector
0  -     00110   CMGT (register)         Vector
0  -     00111   CMGE (register)         Vector
0  -     01000   SSHL                    Vector
0  -     01001   SQSHL (register)        Vector
0  -     01010   SRSHL                   Vector
0  -     01011   SQRSHL                  Vector
0  -     01100   SMAX                    -
0  -     01101   SMIN                    -
0  -     01110   SABD                    -
0  -     01111   SABA                    -
0  -     10000   ADD (vector)            Vector
0  -     10001   CMTST                   Vector
0  -     10010   MLA (vector)            -
0  -     10011   MUL (vector)            -
0  -     10100   SMAXP                   -
0  -     10101   SMINP                   -
0  -     10110   SQDMULH (vector)        Vector
0  -     10111   ADDP (vector)           -
0  0x    11000   FMAXNM (vector)         -
0  0x    11001   FMLA (vector)           -
0  0x    11010   FADD (vector)           -
0  0x    11011   FMULX                   Vector
0  0x    11100   FCMEQ (register)        Vector
0  0x    11110   FMAX (vector)           -
0  0x    11111   FRECPS                  Vector
0  00    00011   AND (vector)            -
0  01    00011   BIC (vector, register)  -
0  1x    11000   FMINNM (vector)         -
0  1x    11001   FMLS (vector)           -
0  1x    11010   FSUB (vector)           -
0  1x    11110   FMIN (vector)           -
0  1x    11111   FRSQRTS                 Vector
0  10    00011   ORR (vector, register)  -
0  11    00011   ORN (vector)            -
1  -     00000   UHADD                   -
1  -     00001   UQADD                   Vector
1  -     00010   URHADD                  -
1  -     00100   UHSUB                   -
1  -     00101   UQSUB                   Vector
1  -     00110   CMHI (register)         Vector
1  -     00111   CMHS (register)         Vector
1  -     01000   USHL                    Vector
1  -     01001   UQSHL (register)        Vector
1  -     01010   URSHL                   Vector
1  -     01011   UQRSHL                  Vector
1  -     01100   UMAX                    -
1  -     01101   UMIN                    -
1  -     01110   UABD                    -
1  -     01111   UABA                    -
1  -     10000   SUB (vector)            Vector
1  -     10001   CMEQ (register)         Vector
1  -     10010   MLS (vector)            -
1  -     10011   PMUL                    -
1  -     10100   UMAXP                   -
1  -     10101   UMINP                   -
1  -     10110   SQRDMULH (vector)       Vector
1  0x    11000   FMAXNMP (vector)        -
1  0x    11010   FADDP (vector)          -
1  0x    11011   FMUL (vector)           -
1  0x    11100   FCMGE (register)        Vector
1  0x    11101   FACGE                   Vector
1  0x    11110   FMAXP (vector)          -
1  0x    11111   FDIV (vector)           -
1  00    00011   EOR (vector)            -
1  01    00011   BSL                     -
1  1x    11000   FMINNMP (vector)        -
1  1x    11010   FABD                    Vector
1  1x    11100   FCMGT (register)        Vector
1  1x    11101   FACGT                   Vector
1  1x    11110   FMINP (vector)          -
1  10    00011   BIT                     -
1  11    00011   BIF                     -


C3.6.17

AdvSIMD two-reg misc
31 30 29 28 27 26 25 24  23-22  21 20 19 18 17  16-12   11 10  9-5  4-0
0  Q  U  0  1  1  1  0   size   1  0  0  0  0   opcode  1  0   Rn   Rd

Decode fields
U  size  opcode  Instruction Page          Variant
0  -     00000   REV64                     -
0  -     00001   REV16 (vector)            -
0  -     00010   SADDLP                    -
0  -     00011   SUQADD                    Vector
0  -     00100   CLS (vector)              -
0  -     00101   CNT                       -
0  -     00110   SADALP                    -
0  -     00111   SQABS                     Vector
0  -     01000   CMGT (zero)               Vector
0  -     01001   CMEQ (zero)               Vector
0  -     01010   CMLT (zero)               Vector
0  -     01011   ABS                       Vector
0  -     10010   XTN, XTN2                 -
0  -     10100   SQXTN, SQXTN2             Vector
0  0x    10110   FCVTN, FCVTN2             -
0  0x    10111   FCVTL, FCVTL2             -
0  0x    11000   FRINTN (vector)           -
0  0x    11001   FRINTM (vector)           -
0  0x    11010   FCVTNS (vector)           Vector
0  0x    11011   FCVTMS (vector)           Vector
0  0x    11100   FCVTAS (vector)           Vector
0  0x    11101   SCVTF (vector, integer)   Vector
0  1x    01100   FCMGT (zero)              Vector
0  1x    01101   FCMEQ (zero)              Vector
0  1x    01110   FCMLT (zero)              Vector
0  1x    01111   FABS (vector)             -
0  1x    11000   FRINTP (vector)           -
0  1x    11001   FRINTZ (vector)           -
0  1x    11010   FCVTPS (vector)           Vector
0  1x    11011   FCVTZS (vector, integer)  Vector
0  1x    11100   URECPE                    -
0  1x    11101   FRECPE                    Vector
1  -     00000   REV32 (vector)            -
1  -     00010   UADDLP                    -
1  -     00011   USQADD                    Vector
1  -     00100   CLZ (vector)              -
1  -     00110   UADALP                    -
1  -     00111   SQNEG                     Vector
1  -     01000   CMGE (zero)               Vector
1  -     01001   CMLE (zero)               Vector
1  -     01011   NEG (vector)              Vector
1  -     10010   SQXTUN, SQXTUN2           Vector
1  -     10011   SHLL, SHLL2               -
1  -     10100   UQXTN, UQXTN2             Vector
1  0x    10110   FCVTXN, FCVTXN2           Vector
1  0x    11000   FRINTA (vector)           -
1  0x    11001   FRINTX (vector)           -
1  0x    11010   FCVTNU (vector)           Vector
1  0x    11011   FCVTMU (vector)           Vector
1  0x    11100   FCVTAU (vector)           Vector
1  0x    11101   UCVTF (vector, integer)   Vector
1  00    00101   NOT                       -
1  01    00101   RBIT (vector)             -
1  1x    01100   FCMGE (zero)              Vector
1  1x    01101   FCMLE (zero)              Vector
1  1x    01111   FNEG (vector)             -
1  1x    11001   FRINTI (vector)           -
1  1x    11010   FCVTPU (vector)           Vector
1  1x    11011   FCVTZU (vector, integer)  Vector
1  1x    11100   URSQRTE                   -
1  1x    11101   FRSQRTE                   Vector
1  1x    11111   FSQRT (vector)            -

C3.6.18

AdvSIMD vector x indexed element
31 30 29 28 27 26 25 24  23-22  21  20  19-16  15-12   11  10  9-5  4-0
0  Q  U  0  1  1  1  1   size   L   M   Rm     opcode  H   0   Rn   Rd

Decode fields
U  size  opcode  Instruction Page                Variant
0  -     0010    SMLAL, SMLAL2 (by element)      -
0  -     0011    SQDMLAL, SQDMLAL2 (by element)  Vector
0  -     0110    SMLSL, SMLSL2 (by element)      -
0  -     0111    SQDMLSL, SQDMLSL2 (by element)  Vector
0  -     1000    MUL (by element)                -
0  -     1010    SMULL, SMULL2 (by element)      -
0  -     1011    SQDMULL, SQDMULL2 (by element)  Vector
0  -     1100    SQDMULH (by element)            Vector
0  -     1101    SQRDMULH (by element)           Vector
0  1x    0001    FMLA (by element)               Vector
0  1x    0101    FMLS (by element)               Vector
0  1x    1001    FMUL (by element)               Vector
1  -     0000    MLA (by element)                -
1  -     0010    UMLAL, UMLAL2 (by element)      -
1  -     0100    MLS (by element)                -
1  -     0110    UMLSL, UMLSL2 (by element)      -
1  -     1010    UMULL, UMULL2 (by element)      -
1  1x    1001    FMULX (by element)              Vector


C3.6.19

Crypto AES
31 30 29 28 27 26 25 24  23-22  21 20 19 18 17  16-12   11 10  9-5  4-0
0  1  0  0  1  1  1  0   size   1  0  1  0  0   opcode  1  0   Rn   Rd

Decode fields
size  opcode  Instruction Page  Variant
00    00100   AESE              -
00    00101   AESD              -
00    00110   AESMC             -
00    00111   AESIMC            -

C3.6.20

Crypto three-reg SHA
31 30 29 28 27 26 25 24  23-22  21  20-16  15  14-12   11 10  9-5  4-0
0  1  0  1  1  1  1  0   size   0   Rm     0   opcode  0  0   Rn   Rd

Decode fields
size  opcode  Instruction Page  Variant
00    000     SHA1C             -
00    001     SHA1P             -
00    010     SHA1M             -
00    011     SHA1SU0           -
00    100     SHA256H           -
00    101     SHA256H2          -
00    110     SHA256SU1         -


C3.6.21

Crypto two-reg SHA
31 30 29 28 27 26 25 24  23-22  21 20 19 18 17  16-12   11 10  9-5  4-0
0  1  0  1  1  1  1  0   size   1  0  1  0  0   opcode  1  0   Rn   Rd

Decode fields
size  opcode  Instruction Page  Variant
00    00000   SHA1H             -
00    00001   SHA1SU1           -
00    00010   SHA256SU0         -

C3.6.22

Floating-point compare
31 30 29 28 27 26 25 24  23-22  21  20-16  15-14  13 12 11 10  9-5  4-0
M  0  S  1  1  1  1  0   type   1   Rm     op     1  0  0  0   Rn   opcode2

Decode fields
M  S  type  op  opcode2  Instruction Page  Variant
0  0  00    00  00000    FCMP              Single-precision
0  0  00    00  01000    FCMP              Single-precision, zero
0  0  00    00  10000    FCMPE             Single-precision
0  0  00    00  11000    FCMPE             Single-precision, zero
0  0  01    00  00000    FCMP              Double-precision
0  0  01    00  01000    FCMP              Double-precision, zero
0  0  01    00  10000    FCMPE             Double-precision
0  0  01    00  11000    FCMPE             Double-precision, zero


C3.6.23

Floating-point conditional compare
31 30 29 28 27 26 25 24  23-22  21  20-16  15-12  11 10  9-5  4   3-0
M  0  S  1  1  1  1  0   type   1   Rm     cond   0  1   Rn   op  nzcv

Decode fields
M  S  type  op  Instruction Page  Variant
0  0  00    0   FCCMP             Single-precision
0  0  00    1   FCCMPE            Single-precision
0  0  01    0   FCCMP             Double-precision
0  0  01    1   FCCMPE            Double-precision

C3.6.24

Floating-point conditional select
31 30 29 28 27 26 25 24  23-22  21  20-16  15-12  11 10  9-5  4-0
M  0  S  1  1  1  1  0   type   1   Rm     cond   1  1   Rn   Rd

Decode fields
M  S  type  Instruction Page  Variant
0  0  00    FCSEL             Single-precision
0  0  01    FCSEL             Double-precision

C3.6.25

Floating-point data-processing (1 source)
31 30 29 28 27 26 25 24  23-22  21  20-15   14 13 12 11 10  9-5  4-0
M  0  S  1  1  1  1  0   type   1   opcode  1  0  0  0  0   Rn   Rd

Decode fields
M  S  type  opcode  Instruction Page  Variant
0  0  00    000000  FMOV (register)   Single-precision
0  0  00    000001  FABS (scalar)     Single-precision
0  0  00    000010  FNEG (scalar)     Single-precision
0  0  00    000011  FSQRT (scalar)    Single-precision
0  0  00    000101  FCVT              Single-precision to double-precision
0  0  00    000111  FCVT              Single-precision to half-precision
0  0  00    001000  FRINTN (scalar)   Single-precision
0  0  00    001001  FRINTP (scalar)   Single-precision
0  0  00    001010  FRINTM (scalar)   Single-precision
0  0  00    001011  FRINTZ (scalar)   Single-precision
0  0  00    001100  FRINTA (scalar)   Single-precision
0  0  00    001110  FRINTX (scalar)   Single-precision
0  0  00    001111  FRINTI (scalar)   Single-precision
0  0  01    000000  FMOV (register)   Double-precision
0  0  01    000001  FABS (scalar)     Double-precision
0  0  01    000010  FNEG (scalar)     Double-precision
0  0  01    000011  FSQRT (scalar)    Double-precision
0  0  01    000100  FCVT              Double-precision to single-precision
0  0  01    000111  FCVT              Double-precision to half-precision
0  0  01    001000  FRINTN (scalar)   Double-precision
0  0  01    001001  FRINTP (scalar)   Double-precision
0  0  01    001010  FRINTM (scalar)   Double-precision
0  0  01    001011  FRINTZ (scalar)   Double-precision
0  0  01    001100  FRINTA (scalar)   Double-precision
0  0  01    001110  FRINTX (scalar)   Double-precision
0  0  01    001111  FRINTI (scalar)   Double-precision
0  0  11    000100  FCVT              Half-precision to single-precision
0  0  11    000101  FCVT              Half-precision to double-precision


C3.6.26

Floating-point data-processing (2 source)
31 30 29 28 27 26 25 24  23-22  21  20-16  15-12   11 10  9-5  4-0
M  0  S  1  1  1  1  0   type   1   Rm     opcode  1  0   Rn   Rd

Decode fields
M  S  type  opcode  Instruction Page  Variant
0  0  00    0000    FMUL (scalar)     Single-precision
0  0  00    0001    FDIV (scalar)     Single-precision
0  0  00    0010    FADD (scalar)     Single-precision
0  0  00    0011    FSUB (scalar)     Single-precision
0  0  00    0100    FMAX (scalar)     Single-precision
0  0  00    0101    FMIN (scalar)     Single-precision
0  0  00    0110    FMAXNM (scalar)   Single-precision
0  0  00    0111    FMINNM (scalar)   Single-precision
0  0  00    1000    FNMUL             Single-precision
0  0  01    0000    FMUL (scalar)     Double-precision
0  0  01    0001    FDIV (scalar)     Double-precision
0  0  01    0010    FADD (scalar)     Double-precision
0  0  01    0011    FSUB (scalar)     Double-precision
0  0  01    0100    FMAX (scalar)     Double-precision
0  0  01    0101    FMIN (scalar)     Double-precision
0  0  01    0110    FMAXNM (scalar)   Double-precision
0  0  01    0111    FMINNM (scalar)   Double-precision
0  0  01    1000    FNMUL             Double-precision
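The same layout can be used in the other direction, to assemble an instruction word from its fields. The sketch below is illustrative only: the function name `encode_fp_2src`, the mnemonic spellings used as dictionary keys, and the "single"/"double" precision labels are assumptions of this example. It places each field at the bit position given for the floating-point data-processing (2 source) group, with M and S both 0 as in every row of the decode table.

```python
# opcode values for the floating-point data-processing (2 source) group.
FP_2SRC_OPCODES = {
    "FMUL": 0b0000, "FDIV": 0b0001, "FADD": 0b0010, "FSUB": 0b0011,
    "FMAX": 0b0100, "FMIN": 0b0101, "FMAXNM": 0b0110, "FMINNM": 0b0111,
    "FNMUL": 0b1000,
}

# type field values; the labels are this sketch's own naming.
FP_TYPES = {"single": 0b00, "double": 0b01}


def encode_fp_2src(mnemonic, precision, rd, rn, rm):
    """Assemble a 2-source floating-point data-processing instruction word."""
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 31:
            raise ValueError("register number out of range: {}".format(reg))
    word = 0b11110 << 24                     # bits 28-24; M, bit 30 and S are 0
    word |= FP_TYPES[precision] << 22        # type, bits 23-22
    word |= 1 << 21                          # fixed 1, bit 21
    word |= rm << 16                         # Rm, bits 20-16
    word |= FP_2SRC_OPCODES[mnemonic] << 12  # opcode, bits 15-12
    word |= 0b10 << 10                       # fixed 1 0, bits 11-10
    word |= rn << 5                          # Rn, bits 9-5
    word |= rd                               # Rd, bits 4-0
    return word
```

For example, `encode_fp_2src("FADD", "double", 0, 1, 2)` sets type to 01, Rm to 2, opcode to 0010, Rn to 1 and Rd to 0.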


C3.6.27

Floating-point data-processing (3 source)
31 30 29 28 27 26 25 24  23-22  21  20-16  15  14-10  9-5  4-0
M  0  S  1  1  1  1  1   type   o1  Rm     o0  Ra     Rn   Rd

Decode fields
M  S  type  o1  o0  Instruction Page  Variant
0  0  00    0   0   FMADD             Single-precision
0  0  00    0   1   FMSUB             Single-precision
0  0  00    1   0   FNMADD            Single-precision
0  0  00    1   1   FNMSUB            Single-precision
0  0  01    0   0   FMADD             Double-precision
0  0  01    0   1   FMSUB             Double-precision
0  0  01    1   0   FNMADD            Double-precision
0  0  01    1   1   FNMSUB            Double-precision

C3.6.28

Floating-point immediate
31 30 29 28 27 26 25 24  23-22  21  20-13  12 11 10  9-5   4-0
M  0  S  1  1  1  1  0   type   1   imm8   1  0  0   imm5  Rd

Decode fields
M  S  type  imm5   Instruction Page          Variant
0  0  00    00000  FMOV (scalar, immediate)  Single-precision
0  0  01    00000  FMOV (scalar, immediate)  Double-precision


C3.6.29

Floating-point<->fixed-point conversions
31 30 29 28 27 26 25 24  23-22  21  20-19  18-16   15-10  9-5  4-0
sf 0  S  1  1  1  1  0   type   0   rmode  opcode  scale  Rn   Rd

Decode fields
sf  S  type  rmode  opcode  scale  Instruction Page              Variant
0   0  00    00     010     -      SCVTF (scalar, fixed-point)   32-bit to single-precision
0   0  00    00     011     -      UCVTF (scalar, fixed-point)   32-bit to single-precision
0   0  00    11     000     -      FCVTZS (scalar, fixed-point)  Single-precision to 32-bit
0   0  00    11     001     -      FCVTZU (scalar, fixed-point)  Single-precision to 32-bit
0   0  01    00     010     -      SCVTF (scalar, fixed-point)   32-bit to double-precision
0   0  01    00     011     -      UCVTF (scalar, fixed-point)   32-bit to double-precision
0   0  01    11     000     -      FCVTZS (scalar, fixed-point)  Double-precision to 32-bit
0   0  01    11     001     -      FCVTZU (scalar, fixed-point)  Double-precision to 32-bit
1   0  00    00     010     -      SCVTF (scalar, fixed-point)   64-bit to single-precision
1   0  00    00     011     -      UCVTF (scalar, fixed-point)   64-bit to single-precision
1   0  00    11     000     -      FCVTZS (scalar, fixed-point)  Single-precision to 64-bit
1   0  00    11     001     -      FCVTZU (scalar, fixed-point)  Single-precision to 64-bit
1   0  01    00     010     -      SCVTF (scalar, fixed-point)   64-bit to double-precision
1   0  01    00     011     -      UCVTF (scalar, fixed-point)   64-bit to double-precision
1   0  01    11     000     -      FCVTZS (scalar, fixed-point)  Double-precision to 64-bit
1   0  01    11     001     -      FCVTZU (scalar, fixed-point)  Double-precision to 64-bit

C3.6.30

Floating-point<->integer conversions
31 30 29 28 27 26 25 24  23-22  21  20-19  18-16   15 14 13 12 11 10  9-5  4-0
sf 0  S  1  1  1  1  0   type   1   rmode  opcode  0  0  0  0  0  0   Rn   Rd

Decode fields
sf  S  type  rmode  opcode  Instruction Page          Variant
0   0  00    00     000     FCVTNS (scalar)           Single-precision to 32-bit
0   0  00    00     001     FCVTNU (scalar)           Single-precision to 32-bit
0   0  00    00     010     SCVTF (scalar, integer)   32-bit to single-precision
0   0  00    00     011     UCVTF (scalar, integer)   32-bit to single-precision
0   0  00    00     100     FCVTAS (scalar)           Single-precision to 32-bit
0   0  00    00     101     FCVTAU (scalar)           Single-precision to 32-bit
0   0  00    00     110     FMOV (general)            Single-precision to 32-bit
0   0  00    00     111     FMOV (general)            32-bit to single-precision
0   0  00    01     000     FCVTPS (scalar)           Single-precision to 32-bit
0   0  00    01     001     FCVTPU (scalar)           Single-precision to 32-bit
0   0  00    10     000     FCVTMS (scalar)           Single-precision to 32-bit
0   0  00    10     001     FCVTMU (scalar)           Single-precision to 32-bit
0   0  00    11     000     FCVTZS (scalar, integer)  Single-precision to 32-bit
0   0  00    11     001     FCVTZU (scalar, integer)  Single-precision to 32-bit
0   0  01    00     000     FCVTNS (scalar)           Double-precision to 32-bit
0   0  01    00     001     FCVTNU (scalar)           Double-precision to 32-bit
0   0  01    00     010     SCVTF (scalar, integer)   32-bit to double-precision
0   0  01    00     011     UCVTF (scalar, integer)   32-bit to double-precision
0   0  01    00     100     FCVTAS (scalar)           Double-precision to 32-bit
0   0  01    00     101     FCVTAU (scalar)           Double-precision to 32-bit
0   0  01    01     000     FCVTPS (scalar)           Double-precision to 32-bit
0   0  01    01     001     FCVTPU (scalar)           Double-precision to 32-bit
0   0  01    10     000     FCVTMS (scalar)           Double-precision to 32-bit
0   0  01    10     001     FCVTMU (scalar)           Double-precision to 32-bit
0   0  01    11     000     FCVTZS (scalar, integer)  Double-precision to 32-bit
0   0  01    11     001     FCVTZU (scalar, integer)  Double-precision to 32-bit
1   0  00    00     000     FCVTNS (scalar)           Single-precision to 64-bit
1   0  00    00     001     FCVTNU (scalar)           Single-precision to 64-bit
1   0  00    00     010     SCVTF (scalar, integer)   64-bit to single-precision
1   0  00    00     011     UCVTF (scalar, integer)   64-bit to single-precision
1   0  00    00     100     FCVTAS (scalar)           Single-precision to 64-bit
1   0  00    00     101     FCVTAU (scalar)           Single-precision to 64-bit
1   0  00    01     000     FCVTPS (scalar)           Single-precision to 64-bit
1   0  00    01     001     FCVTPU (scalar)           Single-precision to 64-bit
1   0  00    10     000     FCVTMS (scalar)           Single-precision to 64-bit
1   0  00    10     001     FCVTMU (scalar)           Single-precision to 64-bit
1   0  00    11     000     FCVTZS (scalar, integer)  Single-precision to 64-bit
1   0  00    11     001     FCVTZU (scalar, integer)  Single-precision to 64-bit
1   0  01    00     000     FCVTNS (scalar)           Double-precision to 64-bit
1   0  01    00     001     FCVTNU (scalar)           Double-precision to 64-bit
1   0  01    00     010     SCVTF (scalar, integer)   64-bit to double-precision
1   0  01    00     011     UCVTF (scalar, integer)   64-bit to double-precision
1   0  01    00     100     FCVTAS (scalar)           Double-precision to 64-bit
1   0  01    00     101     FCVTAU (scalar)           Double-precision to 64-bit
1   0  01    00     110     FMOV (general)            Double-precision to 64-bit
1   0  01    00     111     FMOV (general)            64-bit to double-precision
1   0  01    01     000     FCVTPS (scalar)           Double-precision to 64-bit
1   0  01    01     001     FCVTPU (scalar)           Double-precision to 64-bit
1   0  01    10     000     FCVTMS (scalar)           Double-precision to 64-bit
1   0  01    10     001     FCVTMU (scalar)           Double-precision to 64-bit
1   0  01    11     000     FCVTZS (scalar, integer)  Double-precision to 64-bit
1   0  01    11     001     FCVTZU (scalar, integer)  Double-precision to 64-bit
1   0  10    01     110     FMOV (general)            Top half of 128-bit to 64-bit
1   0  10    01     111     FMOV (general)            64-bit to top half of 128-bit
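The rows of the floating-point<->integer conversions table follow a regular pattern that can be expressed as a few rules rather than a full lookup table. The sketch below is one possible reading of that pattern, derived only from the rows listed above; the names `decode_fp_int`, `ROUNDING` and `FMOV_FORMS` are hypothetical, and returning None for combinations not listed in the table is an assumption of this example.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


# rmode selects the letter in FCVT{N,P,M,Z}{S,U}, as observed in the table rows.
ROUNDING = {0b00: "N", 0b01: "P", 0b10: "M", 0b11: "Z"}
FP_NAME = {0b00: "Single-precision", 0b01: "Double-precision"}
# (sf, type, rmode) combinations for which the table lists FMOV (general).
FMOV_FORMS = {
    (0, 0b00, 0b00): "Single-precision",
    (1, 0b01, 0b00): "Double-precision",
    (1, 0b10, 0b01): "Top half of 128-bit",
}


def decode_fp_int(word):
    """Return (instruction, variant) for a listed encoding, else None."""
    if (bits(word, 30, 30) != 0 or bits(word, 28, 24) != 0b11110
            or bits(word, 21, 21) != 1 or bits(word, 15, 10) != 0):
        return None
    sf, s = bits(word, 31, 31), bits(word, 29, 29)
    ftype, rmode, opcode = bits(word, 23, 22), bits(word, 20, 19), bits(word, 18, 16)
    if s != 0:
        return None
    int_name = "64-bit" if sf else "32-bit"
    if opcode >= 0b110:  # FMOV (general): 110 is FP to integer, 111 the reverse
        fp = FMOV_FORMS.get((sf, ftype, rmode))
        if fp is None:
            return None
        if opcode == 0b110:
            return ("FMOV (general)", "{} to {}".format(fp, int_name))
        return ("FMOV (general)", "{} to {}".format(int_name, fp.lower()))
    if ftype not in FP_NAME:
        return None
    fp = FP_NAME[ftype]
    sign = "SU"[opcode & 1]  # even opcode: signed, odd opcode: unsigned
    if opcode in (0b000, 0b001):
        suffix = "(scalar, integer)" if rmode == 0b11 else "(scalar)"
        name = "FCVT{}{} {}".format(ROUNDING[rmode], sign, suffix)
        return (name, "{} to {}".format(fp, int_name))
    if rmode != 0b00:  # remaining opcodes are only listed with rmode == 00
        return None
    if opcode in (0b010, 0b011):
        return ("{}CVTF (scalar, integer)".format(sign),
                "{} to {}".format(int_name, fp.lower()))
    return ("FCVTA{} (scalar)".format(sign), "{} to {}".format(fp, int_name))
```

The rule-based form makes the structure of the table visible: opcode bit 0 selects signed or unsigned, and rmode is only varied for the 000 and 001 opcodes and for the top-half FMOV forms.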


Chapter C4
The AArch64 System Instruction Class

This chapter describes the AArch64 system instructions and registers, and the system instruction class encoding
space. It contains the following sections:
• About the System instruction and System register descriptions on page C4-230.
• The System instruction class encoding space on page C4-232.
• PSTATE and special purpose registers on page C4-251.
• A64 system instructions for cache maintenance on page C4-306.
• A64 system instructions for address translation on page C4-322.
• A64 system instructions for TLB maintenance on page C4-335.

C4.1 About the System instruction and System register descriptions
This section provides general information about the System instructions and the System register descriptions.
The terms defined in Fixed values in instruction and register descriptions apply throughout this manual. That is,
they are not restricted to the System instruction and the System register descriptions.

C4.1.1 Fixed values in instruction and register descriptions
In register descriptions, the meaning of some bits depends on the PE state. This affects the definitions of RES0 and
RES1 given in this section. These definitions are more detailed than those given in the Glossary.
The following terms are used to describe bits or fields with fixed values:
RAZ
    In any implementation:
    • The field must read as zero.
    • Writes to the field must be ignored.
    • Software:
      - Can rely on the field reading as zero.
      - Must use an SBZP policy to write to the field.
    In diagrams, a RAZ bit can be shown as 0.
RES0
    In diagrams, and sometimes in other descriptions, a RES0 bit can be shown as (0). This notation can be expanded for bitfields, so a three-bit RES0 field can be shown as either (0)(0)(0) or as (000).
    Within the architecture, there are a small number of cases where a register bit or bitfield:
    • Is RES0 in some defined architectural context.
    • Has different defined behavior in a different architectural context.
    The definition of RES0 is modified for those bits.
    This means the definition of RES0 is:
    If a bit is RES0 in all contexts
        • The bit must read as 0.
        • Writes to the bit must be ignored.
        • Software:
          - Must not rely on the bit reading as 0.
          - Must use an SBZP policy to write to the bit.
    If a register bit is RES0 only in some contexts, when that bit is described as RES0
        • A read of the bit must return the value last successfully written to the bit, regardless of the use of the register when the bit was written.
          If the bit has not been successfully written since reset, then the read of the bit returns the reset value if there is one, or otherwise returns an UNKNOWN value.
        • A write to the bit must update a storage location associated with the bit.
        • While the use of the register is such that the bit is described as RES0, the value of the bit must have no effect on the operation of the PE, other than determining the value read back from that bit.
        • Software:
          - Must not rely on the bit reading as 0.
          - Must use an SBZP policy to write to the bit.
    The RES0 description can be applied to bits or bitfields that are read-only, or are write-only:
    • For a read-only bit, RES0 indicates that the bit reads as 0, but software must treat the bit as UNKNOWN.
    • For a write-only bit, RES0 indicates that software must treat the bit as SBZ.
RAO
    In any implementation:
    • The field must read as all 1s.
    • Writes to the field must be ignored.
    • Software:
      - Can rely on the field reading as all 1s.
      - Must use an SBOP policy to write to the field.
    In diagrams, a RAO bit can be shown as 1.
RES1
    In diagrams, and sometimes in other descriptions, a RES1 bit can be shown as (1). This notation can be expanded for bitfields, so a three-bit RES1 field can be shown as either (1)(1)(1) or as (111).
    Within the architecture, there are a small number of cases where a register bit or bitfield:
    • Is RES1 in some defined architectural context.
    • Has different defined behavior in a different architectural context.
    The definition of RES1 is modified for those bits.
    This means the definition of RES1 is:
    If a bit is RES1 in all contexts
        • The bit must read as 1.
        • Writes to the bit must be ignored.
        • Software:
          - Must not rely on the bit reading as 1.
          - Must use an SBOP policy to write to the bit.
    If a register bit is RES1 only in some contexts, when that bit is described as RES1
        • A read of the bit must return the value last successfully written to the bit, regardless of the use of the register when the bit was written.
          If the bit has not been successfully written since reset, then the read of the bit returns the reset value if there is one, or otherwise returns an UNKNOWN value.
        • A write to the bit must update a storage location associated with the bit.
        • While the use of the register is such that the bit is described as RES1, the value of the bit must have no effect on the operation of the PE, other than determining the value read back from that bit.
        • Software:
          - Must not rely on the bit reading as 1.
          - Must use an SBOP policy to write to the bit.
    The RES1 description can be applied to bits or bitfields that are read-only, or are write-only:
    • For a read-only bit, RES1 indicates that the bit reads as 1, but software must treat the bit as UNKNOWN.
    • For a write-only bit, RES1 indicates that software must treat the bit as SBO.

C4.2 The System instruction class encoding space
Part of the A64 instruction encoding space is assigned to instructions that access the system register space. These
instructions provide:
• Access to System registers, including the debug registers, that provide system control, and system status information.
• Access to special-purpose registers such as SPSR_ELx on page AppxJ-5091, ELR_ELx on page AppxJ-5091, and the equivalent fields of the Process State.
• The cache and TLB maintenance instructions and address translation instructions.
• Barriers and the CLREX instruction.
• Architectural hint instructions.

This section describes the general model for accessing this functionality.

Note
In ARMv7 and earlier versions of the ARM architecture, this functionality is provided through conceptual
coprocessors CP14 and CP15, and in part through CP10 and CP11. These are accessed through a generic
coprocessor interface. For compatibility:
• ARMv8 AArch32 state retains this conceptual coprocessor model. However, it adds register and operation aliases, to simplify access to this functionality.
• In the instruction encoding descriptions, AArch64 state retains the naming of the instruction arguments as Op1, CRn, CRm, and Op2. However, there is no functional distinction between the Opn arguments and the CRx arguments.

Principles of the System instruction class encoding describes some general properties of these encodings. System
instruction class encoding overview on page C4-233 then describes the top-level encoding of these instructions, and
the following sections then describe the next level of the encoding hierarchy:
• Op0==0b00, architectural hints, barriers and CLREX, and PSTATE access on page C4-234.
• Op0==0b01, cache maintenance, TLB maintenance, and address translation instructions on page C4-237.
• Op0==0b10, Moves to and from debug, trace, and Execution environment System registers on page C4-240.
• Op0==0b11, Moves to and from non-debug System registers and special-purpose registers on page C4-242.
• Reserved control space for IMPLEMENTATION DEFINED functionality on page C4-250.

C4.2.1 Principles of the System instruction class encoding
In ARMv8, an encoding in the System instruction space is identified by a set of arguments, Op0, Op1, CRn, CRm, and
Op2. These form an encoding hierarchy, where:
Op0
    Defines the top-level division of the encoding space, see System instruction class encoding overview on page C4-233.
Op1
    Identifies the lowest Exception level at which the encoding is accessible, as follows:
    Accessible at EL0    Op1 has the value 3.
    Accessible at EL1    Op1 has the value 0, 1, or 2. The value is the same as the Op1 value used to access the equivalent AArch32 register.
    Accessible at EL2    Op1 has the value 4.
    Accessible at EL3    Op1 has the value 6.

ARM strongly recommends that implementers adopt this use of Op1 when using the IMPLEMENTATION DEFINED
regions of the encoding space described in Reserved control space for IMPLEMENTATION DEFINED functionality
on page C4-250.
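
The Op1 convention above can be sketched as a small lookup. This is an illustrative sketch only: the function names are hypothetical, Op1 values 5 and 7 are rejected because this section does not assign them a level, and real accessibility can also depend on control bits described elsewhere in the manual.

```python
# Op1 -> lowest Exception level, as listed in the text above.
# Op1 values 5 and 7 are not assigned a level in this section, so the
# sketch rejects them rather than guessing.
_OP1_TO_LOWEST_EL = {0: 1, 1: 1, 2: 1, 3: 0, 4: 2, 6: 3}


def lowest_exception_level(op1: int) -> int:
    """Return the lowest Exception level implied by an Op1 value."""
    if op1 not in _OP1_TO_LOWEST_EL:
        raise ValueError(f"Op1 value {op1} is not covered by the Op1 convention")
    return _OP1_TO_LOWEST_EL[op1]


def meets_minimum_level(op1: int, current_el: int) -> bool:
    """True if current_el is at or above the level implied by Op1."""
    return current_el >= lowest_exception_level(op1)
```

For example, an encoding with Op1 == 4 fails the check at EL1 and passes it at EL2 or EL3.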

C4.2.2 System instruction class encoding overview
The encoding of the System instruction class describes each instruction as being either:
• A transfer to a System register. This is a System instruction with the semantics of a write.
• A transfer from a System register. This is a System instruction with the semantics of a read.
A System instruction that initiates an operation operates as if it were making a transfer to a register.
In the AArch64 instruction set, the decode structure for the System instruction class is:
Bits   31-22       21  20-19  18-16  15-12  11-8  7-5  4-0
Field  1101010100  L   Op0    Op1    CRn    CRm   Op2  Rt

The value of L indicates the transfer direction:
0    Transfer to system register.
1    Transfer from system register.
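
As a minimal sketch of the decode structure above, the following packs and unpacks the variable fields of a System instruction class word. The names `pack_sys`, `unpack_sys`, and `SYS_CLASS_FIXED` are hypothetical helpers for illustration, not part of the architecture.

```python
SYS_CLASS_FIXED = 0b1101010100 << 22  # bits[31:22] are fixed for this class

# (name, least significant bit, width) for each variable field.
_FIELDS = (("L", 21, 1), ("Op0", 19, 2), ("Op1", 16, 3),
           ("CRn", 12, 4), ("CRm", 8, 4), ("Op2", 5, 3), ("Rt", 0, 5))


def pack_sys(**values: int) -> int:
    """Build a 32-bit instruction word from the seven named fields."""
    word = SYS_CLASS_FIXED
    for name, lsb, width in _FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


def unpack_sys(word: int) -> dict:
    """Split a 32-bit instruction word back into its named fields."""
    if (word >> 22) != 0b1101010100:
        raise ValueError("not a System instruction class encoding")
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, lsb, width in _FIELDS}
```

Packing L=0, Op0=0, Op1=3, CRn=2, CRm=0, Op2=0, Rt=31 gives 0xD503201F, and unpacking that word returns the same field values.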
The Op0 field is the top level encoding of the System instruction type. Its possible values are:
0b00
    These encodings provide:
    • Instructions with an immediate field for accessing PSTATE, the current PE state.
    • The architectural hint instructions.
    • Barriers and the CLREX instruction.
    For more information about these encodings, see Op0==0b00, architectural hints, barriers and CLREX, and PSTATE access on page C4-234.
0b01
    These encodings provide the cache maintenance, TLB maintenance, and address translation operations.
    Note
    These are equivalent to operations in the AArch32 CP15 space.
    For more information, see Op0==0b01, cache maintenance, TLB maintenance, and address translation instructions on page C4-237.
0b10
    These encodings provide moves to and from:
    • Legacy AArch32 System registers for execution environments, to provide access to these registers from higher exception levels that are using AArch64.
    • Debug and trace registers.
    Note
    These are equivalent to the registers in the AArch32 CP14 space.
    For more information, see Op0==0b10, Moves to and from debug, trace, and Execution environment System registers on page C4-240.
0b11
    These encodings provide:
    • Moves to and from System registers for software execution in Non-debug state. These registers provide Non-debug state system control, and system status information.
      Note
      These are equivalent to the registers in the AArch32 CP15 space.
    • Instructions for accessing special-purpose registers.
    For more information, see Instructions for accessing special-purpose registers on page C4-248 and Instructions for accessing non-debug System registers on page C4-242.

UNDEFINED behaviors
In the System register instruction encoding space, the following principles apply:
• All unallocated encodings are treated as UNDEFINED.
• All encodings with L==1 and Op0==0b0x are UNDEFINED, except for encodings in the area reserved for IMPLEMENTATION DEFINED use, see Reserved control space for IMPLEMENTATION DEFINED functionality on page C4-250.
For registers and operations that are accessible from a particular Exception level, any attempt to access those registers from a lower Exception level is UNDEFINED.
If a particular Exception level:
• Defines a register to be RO then any attempt to write to that register, at that Exception level, is UNDEFINED. This means that any access to that register with L==0 is UNDEFINED.
• Defines a register to be WO then any attempt to read from that register, at that Exception level, is UNDEFINED. This means that any access to that register with L==1 is UNDEFINED.
For IMPLEMENTATION DEFINED encoding spaces, the treatment of the encodings is IMPLEMENTATION DEFINED, but see the recommendation in Principles of the System instruction class encoding on page C4-232.

C4.2.3 Op0==0b00, architectural hints, barriers and CLREX, and PSTATE access
The different groups of System register instructions with Op0==0b00:
• Are identified by the value of CRn.
• Are always encoded with a value of 0b11111 in the Rt field.
The encoding of these instructions is:

Bits   31-22       21  20-19  18-16  15-12  11-8  7-5  4-0
Value  1101010100  L   00     Op1    CRn    CRm   Op2  11111
Field                  Op0                             Rt

The encoding of the CRn field is as follows:
0b0010    See Architectural hint instructions.
0b0011    See Barriers and CLREX on page C4-235.
0b0100    See Instructions for accessing the PSTATE fields on page C4-236.

Architectural hint instructions
The architectural hint instructions are identified by CRn having the value 0b0010. The encoding of these instructions
is:
Bits   31-22       21  20-19  18-16  15-12  11-8  7-5  4-0
Value  1101010100  0   00     011    0010   CRm   Op2  11111
Field                  Op0    Op1    CRn    Op<6:0>    Rt

The value of Op<6:0>, formed by concatenating the CRm and Op2 fields, determines the hint instruction as follows:
0b0000000                NOP instruction. This has no effect on architectural state other than to advance the PC.
0b0000001                YIELD instruction.
0b0000010                WFE instruction.
0b0000011                WFI instruction.
0b0000100                SEV instruction.
0b0000101                SEVL instruction.
0b0000110 - 0b1111111    Unallocated values. These behave as a NOP.

Note
• Instruction encodings with bits[4:0] not set to 0b11111 are UNDEFINED.
• The operation of the A64 instructions for architectural hints is identical to the corresponding A32 and T32 instructions.

For more information about:
• The WFE, WFI, SEV, and SEVL instructions, see Mechanisms for entering a low-power state on page D1-1533.
• The YIELD instruction, see Software control features and EL0 on page B1-65.
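
The hint encodings can be worked through numerically. The sketch below assumes nothing beyond the fixed bits and Op<6:0> values listed above; `encode_hint` and `decode_hint` are hypothetical helper names.

```python
# L=0, Op0=0b00, Op1=0b011, CRn=0b0010, Op<6:0>=0, Rt=0b11111
HINT_BASE = 0xD503201F

HINTS = {"NOP": 0b0000000, "YIELD": 0b0000001, "WFE": 0b0000010,
         "WFI": 0b0000011, "SEV": 0b0000100, "SEVL": 0b0000101}


def encode_hint(name: str) -> int:
    """Place Op<6:0> (CRm:Op2) in bits[11:5] of the hint template."""
    return HINT_BASE | (HINTS[name] << 5)


def decode_hint(word: int) -> str:
    """Name the hint; unallocated Op<6:0> values behave as a NOP."""
    if (word & ~(0x7F << 5)) != HINT_BASE:
        raise ValueError("not an architectural hint encoding")
    op = (word >> 5) & 0x7F
    for name, value in HINTS.items():
        if value == op:
            return name
    return "NOP"
```

With these definitions, `encode_hint("NOP")` is 0xD503201F and `encode_hint("WFI")` is 0xD503207F.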

Barriers and CLREX
The barriers and CLREX instructions are identified by CRn having the value 0b0011. The encoding of these instructions
is:
Bits   31-22       21  20-19  18-16  15-12  11-8  7-5  4-0
Value  1101010100  0   00     011    0011   CRm   Op2  11111
Field                  Op0    Op1    CRn    CRm   Op2  Rt

The value of Op2 determines the instruction, as follows. For the DSB and DMB instructions, CRm controls the instruction options.
0b010                         CLREX instruction. The value of CRm is ignored.
0b100                         DSB instruction. The value of CRm sets the option type, see Table C4-1.
0b101                         DMB instruction. The value of CRm sets the option type, see Table C4-1.
0b110                         ISB instruction. The value of CRm is ignored.
0b000, 0b001, 0b011, 0b111    UNDEFINED.

Note
Instruction encodings with bits[4:0] not set to 0b11111 are UNDEFINED.
Table C4-1 shows the CRm encodings for the data barrier option types.
Table C4-1 CRm encoding for DMB and DSB instructions

CRm value                 Option, for DMB and DSB    Meaning
0001                      OSHLD                      Outer Shareable, load
0010                      OSHST                      Outer Shareable, store
0011                      OSH                        Outer Shareable, all
0101                      NSHLD                      Non-shareable, load
0110                      NSHST                      Non-shareable, store
0111                      NSH                        Non-shareable, all
1001                      ISHLD                      Inner Shareable, load
1010                      ISHST                      Inner Shareable, store
1011                      ISH                        Inner Shareable, all
1101                      LD                         Full system, load
1110                      ST                         Full system, store
0000, 0100, 1000, 1111    SYS                        Full system, all

Note
The operation of the A64 instructions for barriers and CLREX is identical to the corresponding A32 and T32 instructions.
For more information about:
• The barrier instructions, see Memory barriers on page B2-85.
• The CLREX instruction, see Synchronization and semaphores on page B2-100.
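
A minimal sketch of how the Op2 and CRm values above combine into barrier encodings. The helper name is hypothetical. Two assumptions are made: the full-system option is spelled `SY` here, as assemblers commonly write it, and mapped to CRm 0b1111; and for CLREX and ISB, where CRm is ignored, the sketch emits 0b1111.

```python
# L=0, Op0=0b00, Op1=0b011, CRn=0b0011, Rt=0b11111
BARRIER_BASE = 0xD503301F

OP2 = {"CLREX": 0b010, "DSB": 0b100, "DMB": 0b101, "ISB": 0b110}

# CRm option values from Table C4-1. "SY" is an assumed spelling for the
# "Full system, all" row, using CRm 0b1111.
OPTIONS = {"OSHLD": 0b0001, "OSHST": 0b0010, "OSH": 0b0011,
           "NSHLD": 0b0101, "NSHST": 0b0110, "NSH": 0b0111,
           "ISHLD": 0b1001, "ISHST": 0b1010, "ISH": 0b1011,
           "LD": 0b1101, "ST": 0b1110, "SY": 0b1111}


def encode_barrier(mnemonic: str, option: str = "SY") -> int:
    """Encode CLREX, DSB, DMB or ISB; option only matters for DSB and DMB."""
    if mnemonic in ("DSB", "DMB"):
        crm = OPTIONS[option]
    else:
        crm = 0b1111  # CRm is ignored for CLREX and ISB (assumed filler)
    return BARRIER_BASE | (crm << 8) | (OP2[mnemonic] << 5)
```

Under these assumptions, `encode_barrier("DSB", "SY")` gives 0xD5033F9F and `encode_barrier("DMB", "ISH")` gives 0xD5033BBF.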

Instructions for accessing the PSTATE fields
The A64 instruction set provides instructions that can be used to modify PSTATE fields directly. These instructions
are:
MSR DAIFSet, #Imm4    ; Used to set any or all of DAIF to 1
MSR DAIFClr, #Imm4    ; Used to clear any or all of DAIF to 0
MSR SPSel, #Imm1      ; Used to select the Stack Pointer, between SP_EL0 and SP_ELx

The PSTATE field update instructions are identified by CRn having the value 0b0100. The encoding of these
instructions is:
Bits   31-22       21  20-19  18-16  15-12  11-8  7-5  4-0
Value  1101010100  0   00     Op1    0100   Imm4  Op2  11111
Field                  Op0    Op1    CRn    CRm   Op2  Rt

The value of Op2 selects the instruction form, which defines the constraints on the values of the Op1 and Imm4
arguments, as follows:
Op2 == 0b101
    Selects the MSR SPSel instruction.
    Op1 must be 0b000.
    This instruction is accessible at EL1 or higher.
    Imm4<0> selects the accessed stack pointer, as follows:
    0    Selects SP_EL0.
    1    Selects SP_ELx on page AppxJ-5091, where x is the number of the current Exception level, 1, 2, or 3.
    Imm4<3:1> are RES0.
Op2 == 0b110
    Selects the MSR DAIFSet instruction, that sets the specified PSTATE.{D, A, I, F} bits to 1.
    Op1 must be 0b011.
    This instruction is accessible at EL1 or higher, and when the value of the SCTLR_EL1.UMA bit is 1 it is also accessible at EL0.
    Imm4 determines which of the PSTATE.{D, A, I, F} bits are set to 1, as follows:
    Imm4<3>    If this bit is set to 1 then the D bit is set to 1, otherwise the D bit is not changed.
    Imm4<2>    If this bit is set to 1 then the A bit is set to 1, otherwise the A bit is not changed.
    Imm4<1>    If this bit is set to 1 then the I bit is set to 1, otherwise the I bit is not changed.
    Imm4<0>    If this bit is set to 1 then the F bit is set to 1, otherwise the F bit is not changed.
Op2 == 0b111
    Selects the MSR DAIFClr instruction, that clears the specified PSTATE.{D, A, I, F} bits to 0.
    Op1 must be 0b011.
    This instruction is accessible at EL1 or higher, and when the value of the SCTLR_EL1.UMA bit is 1 it is also accessible at EL0.
    Imm4 determines which of the PSTATE.{D, A, I, F} bits are cleared to 0, as follows:
    Imm4<3>    If this bit is set to 1 then the D bit is cleared to 0, otherwise the D bit is not changed.
    Imm4<2>    If this bit is set to 1 then the A bit is cleared to 0, otherwise the A bit is not changed.
    Imm4<1>    If this bit is set to 1 then the I bit is cleared to 0, otherwise the I bit is not changed.
    Imm4<0>    If this bit is set to 1 then the F bit is cleared to 0, otherwise the F bit is not changed.

All other combinations of Op1 and Op2 are reserved, and the corresponding instructions are UNDEFINED.

Note
For PSTATE updates, instruction encodings with bits[4:0] not set to 0b11111 are UNDEFINED.
Writes to PSTATE.{D, A, I, F} occur in program order without the need for additional synchronization. Changing
PSTATE.SPSel to use EL0 synchronizes any updates to SP_EL0 that have been written by an MSR to SP_EL0,
without the need for additional synchronization.
For more information about PSTATE, see Process state, PSTATE on page D1-1421.
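
The PSTATE field update encodings can be illustrated with a short sketch built only from the Op1, Op2, and Imm4 rules in this subsection. `daif_mask` and `encode_msr_pstate` are hypothetical names.

```python
# L=0, Op0=0b00, CRn=0b0100, Rt=0b11111; Op1, Imm4 (CRm) and Op2 vary.
PSTATE_BASE = 0xD500401F

# form: (Op1, Op2)
_FORMS = {"SPSel": (0b000, 0b101),
          "DAIFSet": (0b011, 0b110),
          "DAIFClr": (0b011, 0b111)}


def daif_mask(d: bool = False, a: bool = False,
              i: bool = False, f: bool = False) -> int:
    """Build Imm4: D is Imm4<3>, A is Imm4<2>, I is Imm4<1>, F is Imm4<0>."""
    return (int(d) << 3) | (int(a) << 2) | (int(i) << 1) | int(f)


def encode_msr_pstate(form: str, imm4: int) -> int:
    """Encode MSR SPSel/DAIFSet/DAIFClr, #imm4."""
    if not 0 <= imm4 <= 0xF:
        raise ValueError("Imm4 is a 4-bit field")
    if form == "SPSel" and imm4 > 1:
        raise ValueError("Imm4<3:1> are RES0 for MSR SPSel")
    op1, op2 = _FORMS[form]
    return PSTATE_BASE | (op1 << 16) | (imm4 << 8) | (op2 << 5)
```

For instance, setting only the I bit with `encode_msr_pstate("DAIFSet", daif_mask(i=True))` yields 0xD50342DF, and `encode_msr_pstate("SPSel", 1)` yields 0xD50041BF.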

C4.2.4 Op0==0b01, cache maintenance, TLB maintenance, and address translation instructions
The System instructions are encoded with Op0== 0b01. The different groups of System instructions are identified by
the values of CRn and CRm, except that some of this encoding space is reserved for IMPLEMENTATION DEFINED
functionality. The encoding of these instructions is:
Bits   31-22       21  20-19  18-16  15-12  11-8  7-5  4-0
Value  1101010100  0   01     Op1    CRn    CRm   Op2  Xt
Field                  Op0

The grouping of these instructions, depending on the CRn and CRm fields, is as follows:
CRn == 7
    The instruction group is determined by the value of CRm, as follows:
    CRm == {1, 5}             Instruction cache maintenance operations.
    CRm == 4                  Data cache zero operation.
    CRm == {6, 10, 11, 14}    Data cache maintenance operations.
    See Cache maintenance instructions, and data cache zero.
    CRm == 8                  See Address translation instructions on page C4-238.
CRn == 8
    See TLB maintenance instructions on page C4-239.
CRn == {11, 15}
    See Reserved control space for IMPLEMENTATION DEFINED functionality on page C4-250.
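
The CRn and CRm grouping for Op0==0b01 can be expressed as a small classifier. This is a sketch of the grouping only; the function name and the returned labels are illustrative, and it does not check whether a particular Op1 and Op2 combination is allocated.

```python
def classify_op0_01(crn: int, crm: int) -> str:
    """Classify an Op0==0b01 encoding by its CRn and CRm fields."""
    if crn == 7:
        if crm in (1, 5):
            return "instruction cache maintenance"
        if crm == 4:
            return "data cache zero"
        if crm in (6, 10, 11, 14):
            return "data cache maintenance"
        if crm == 8:
            return "address translation"
    elif crn == 8:
        return "TLB maintenance"
    elif crn in (11, 15):
        return "IMPLEMENTATION DEFINED"
    # Anything not listed in the grouping is treated here as unallocated.
    return "unallocated"
```

A decoder would typically call this after confirming Op0 == 0b01, then consult the relevant table for the exact instruction.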

Cache maintenance instructions, and data cache zero
Table C4-2 lists the Cache maintenance instructions and their encodings. Instructions that take an argument include
Xt in the instruction syntax. For instructions that do not take an argument, the Xt field is encoded as 0b11111.
Table C4-2 Cache maintenance instructions

Instruction      Op1  CRn  CRm  Op2  Notes
Instruction cache maintenance operations
IC IALLUIS       0    7    1    0    Accessible from EL1 or higher.
IC IALLU         0    7    5    0    Accessible from EL1 or higher.
IC IVAU, Xt      3    7    5    1    When SCTLR_EL1.UCI == 1, accessible from EL0 or higher. Otherwise, accessible from EL1 or higher.
Data cache maintenance operations
DC IVAC, Xt      0    7    6    1    Accessible from EL1 or higher.
DC ISW, Xt       0    7    6    2    Accessible from EL1 or higher.
DC CSW, Xt       0    7    10   2    Accessible from EL1 or higher.
DC CISW, Xt      0    7    14   2    Accessible from EL1 or higher.
DC CVAC, Xt      3    7    10   1    When SCTLR_EL1.UCI == 1, accessible from EL0 or higher. Otherwise, accessible from EL1 or higher.
DC CVAU, Xt      3    7    11   1    When SCTLR_EL1.UCI == 1, accessible from EL0 or higher. Otherwise, accessible from EL1 or higher.
DC CIVAC, Xt     3    7    14   1    When SCTLR_EL1.UCI == 1, accessible from EL0 or higher. Otherwise, accessible from EL1 or higher.
Data cache zero operation
DC ZVA, Xt       3    7    4    1    When SCTLR_EL1.UCI == 1, accessible from EL0 or higher. Otherwise, accessible from EL1 or higher.

For more information about these instructions, see Cache maintenance operations on page D4-1680 and Cache
maintenance instructions on page D4-1684.
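
As an illustration of how a row of Table C4-2 becomes an instruction word, the sketch below encodes a few of the operations. The dictionary holds only a subset of the table, the helper name is hypothetical, and restricting Xt to X0-X30 is a simplification made for this example.

```python
SYS_BASE = 0xD5080000  # fixed bits[31:22], L=0, Op0=0b01

# name: (Op1, CRn, CRm, Op2, takes_xt), copied from Table C4-2.
CACHE_OPS = {
    "IC IALLUIS": (0, 7, 1, 0, False),
    "IC IALLU": (0, 7, 5, 0, False),
    "IC IVAU": (3, 7, 5, 1, True),
    "DC IVAC": (0, 7, 6, 1, True),
    "DC CVAC": (3, 7, 10, 1, True),
    "DC CIVAC": (3, 7, 14, 1, True),
    "DC ZVA": (3, 7, 4, 1, True),
}


def encode_cache_op(name: str, xt=None) -> int:
    """Encode a cache maintenance operation; xt is the X register number."""
    op1, crn, crm, op2, takes_xt = CACHE_OPS[name]
    if takes_xt:
        if xt is None or not 0 <= xt <= 30:
            raise ValueError(f"{name} needs a register number in 0..30")
        rt = xt
    else:
        if xt is not None:
            raise ValueError(f"{name} does not take an argument")
        rt = 0b11111  # Xt field is encoded as 0b11111 when there is no argument
    return SYS_BASE | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5) | rt
```

For example, `encode_cache_op("DC CIVAC", 0)` gives 0xD50B7E20 and `encode_cache_op("IC IALLU")` gives 0xD508751F.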

Address translation instructions
Table C4-3 lists the Address translation instructions and their encodings. The syntax of the instructions includes Xt,
that provides the address to be translated.
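Table C4-3 Address translation instructions

Instruction       Op1  CRn  CRm  Op2  Notes
AT S1E1R, Xt      0    7    8    0    Accessible from EL1 or higher.
AT S1E1W, Xt      0    7    8    1    Accessible from EL1 or higher.
AT S1E0R, Xt      0    7    8    2    Accessible from EL1 or higher.
AT S1E0W, Xt      0    7    8    3    Accessible from EL1 or higher.
AT S1E2R, Xt      4    7    8    0    Accessible from EL2 or higher.
AT S1E2W, Xt      4    7    8    1    Accessible from EL2 or higher.
AT S12E1R, Xt     4    7    8    4    Accessible from EL2 or higher.
AT S12E1W, Xt     4    7    8    5    Accessible from EL2 or higher.
AT S12E0R, Xt     4    7    8    6    Accessible from EL2 or higher.
AT S12E0W, Xt     4    7    8    7    Accessible from EL2 or higher.
AT S1E3R, Xt      6    7    8    0    Accessible only from EL3.
AT S1E3W, Xt      6    7    8    1    Accessible only from EL3.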

For more information about these instructions, see Address translation operations on page D5-1756.


TLB maintenance instructions
Table C4-4 lists the TLB maintenance instructions and their encodings. Instructions that take an argument include
Xt in the instruction syntax. For instructions that do not take an argument, the Xt field is encoded as 0b11111.
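Table C4-4 TLB maintenance instructions

Instruction            Op1  CRn  CRm  Op2  Notes
TLBI VMALLE1IS         0    8    3    0    Accessible from EL1 or higher.
TLBI VAE1IS, Xt        0    8    3    1    Accessible from EL1 or higher.
TLBI ASIDE1IS, Xt      0    8    3    2    Accessible from EL1 or higher.
TLBI VAAE1IS, Xt       0    8    3    3    Accessible from EL1 or higher.
TLBI VALE1IS, Xt       0    8    3    5    Accessible from EL1 or higher.
TLBI VAALE1IS, Xt      0    8    3    7    Accessible from EL1 or higher.
TLBI VMALLE1           0    8    7    0    Accessible from EL1 or higher.
TLBI VAE1, Xt          0    8    7    1    Accessible from EL1 or higher.
TLBI ASIDE1, Xt        0    8    7    2    Accessible from EL1 or higher.
TLBI VAAE1, Xt         0    8    7    3    Accessible from EL1 or higher.
TLBI VALE1, Xt         0    8    7    5    Accessible from EL1 or higher.
TLBI VAALE1, Xt        0    8    7    7    Accessible from EL1 or higher.
TLBI IPAS2E1IS, Xt     4    8    0    1    Accessible from EL2 or higher.
TLBI IPAS2LE1IS, Xt    4    8    0    5    Accessible from EL2 or higher.
TLBI ALLE2IS           4    8    3    0    Accessible from EL2 or higher.
TLBI VAE2IS, Xt        4    8    3    1    Accessible from EL2 or higher.
TLBI ALLE1IS           4    8    3    4    Accessible from EL2 or higher.
TLBI VALE2IS, Xt       4    8    3    5    Accessible from EL2 or higher.
TLBI VMALLS12E1IS      4    8    3    6    Accessible from EL2 or higher.
TLBI IPAS2E1, Xt       4    8    4    1    Accessible from EL2 or higher.
TLBI IPAS2LE1, Xt      4    8    4    5    Accessible from EL2 or higher.
TLBI ALLE2             4    8    7    0    Accessible from EL2 or higher.
TLBI VAE2, Xt          4    8    7    1    Accessible from EL2 or higher.
TLBI ALLE1             4    8    7    4    Accessible from EL2 or higher.
TLBI VALE2, Xt         4    8    7    5    Accessible from EL2 or higher.
TLBI VMALLS12E1        4    8    7    6    Accessible from EL2 or higher.
TLBI ALLE3IS           6    8    3    0    Accessible only from EL3.
TLBI VAE3IS, Xt        6    8    3    1    Accessible only from EL3.
TLBI VALE3IS, Xt       6    8    3    5    Accessible only from EL3.
TLBI ALLE3             6    8    7    0    Accessible only from EL3.
TLBI VAE3, Xt          6    8    7    1    Accessible only from EL3.
TLBI VALE3, Xt         6    8    7    5    Accessible only from EL3.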

For more information about these instructions, see TLB maintenance instructions on page D5-1808.

C4.2.5 Op0==0b10, Moves to and from debug, trace, and Execution environment System registers
The instructions that move data to and from the debug, Execution environment, and trace system registers are
encoded with Op0== 0b10. This means the encoding of these instructions is:
Bits   31-22       21  20-19  18-16  15-12  11-8  7-5  4-0
Value  1101010100  L   10     Op1    CRn    CRm   Op2  Rt
Field                  Op0

Note
These encodings access the registers that are equivalent to the AArch32 CP14 registers.
The value of Op1 provides the next level of decode of these instructions, as follows:
Op1 == {0, 3, 4}
    Debug. See Instructions for accessing debug System registers.
Op1 == 1
    Trace. See the appropriate trace architecture specification.
Op1 == 2
    Execution environment. See Instructions for accessing AArch32 Execution environment registers on page C4-241.

Instructions for accessing debug System registers
The instructions for accessing debug System registers are:
MSR <System register>, Xt    ; Write to System register
MRS Xt, <System register>    ; Read from System register

Where <System register> is the register name, for example MDCCSR_EL0.
This section includes only the System register access encodings for which both:
• Op0 is 0b10.
• The value of Op1 is one of {0, 3, 4}.

Note
These encodings access the registers that are equivalent to the AArch32 CP14 registers.


Table C4-5 shows the mapping of the System register encodings for debug System register access.
Table C4-5 System instruction encodings for debug System register access

Register              Op1  CRn  CRm     Op2  Permitted accesses
OSDTRRX_EL1           0    0    0       2    RW
MDCCINT_EL1           0    0    2       0    RW
MDSCR_EL1             0    0    2       2    RW
OSDTRTX_EL1           0    0    3       2    RW
OSECCR_EL1            0    0    6       2    RW
DBGBVR<n>_EL1         0    0    0-15 a  4    RW
DBGBCR<n>_EL1         0    0    0-15 a  5    RW
DBGWVR<n>_EL1         0    0    0-15 a  6    RW
DBGWCR<n>_EL1         0    0    0-15 a  7    RW
MDRAR_EL1             0    1    0       0    RO
OSLAR_EL1             0    1    0       4    WO
OSLSR_EL1             0    1    1       4    RO
OSDLR_EL1             0    1    3       4    RW
DBGPRCR_EL1           0    1    4       4    RW
DBGCLAIMSET_EL1       0    7    8       6    RW
DBGCLAIMCLR_EL1       0    7    9       6    RW
DBGAUTHSTATUS_EL1     0    7    14      6    RO
MDCCSR_EL0            3    0    1       0    RO
DBGDTR_EL0            3    0    4       0    RW
DBGDTRRX_EL0          3    0    5       0    RO
DBGDTRTX_EL0          3    0    5       0    WO
DBGVCR32_EL2          4    0    7       0    RW

a. Unimplemented breakpoint and watchpoint register access instructions are UNALLOCATED. If EL2 is not implemented or breakpoint n is not context-aware, DBGBXVRn_EL1 is unallocated. CRm encodes n, the breakpoint or watchpoint number.

For more information see Mapping of the System registers between the Execution states on page D1-1545.
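
The footnote to Table C4-5 states that CRm encodes n for the numbered breakpoint and watchpoint registers. A minimal sketch of that rule follows, using hypothetical helper names and the Op2 values shown in the table.

```python
def encode_sysreg_access(read: bool, op0: int, op1: int, crn: int,
                         crm: int, op2: int, rt: int) -> int:
    """Encode MRS (read=True, L=1) or MSR (read=False, L=0)."""
    return (0xD5000000 | (int(read) << 21) | (op0 << 19) | (op1 << 16)
            | (crn << 12) | (crm << 8) | (op2 << 5) | rt)


# Op2 values for the numbered debug registers; Op0=0b10, Op1=0, CRn=0.
_NUMBERED_OP2 = {"DBGBVR": 4, "DBGBCR": 5, "DBGWVR": 6, "DBGWCR": 7}


def encode_numbered_debug_access(kind: str, n: int, rt: int,
                                 read: bool) -> int:
    """Encode an access to DBG{B,W}{V,C}R<n>_EL1, where CRm encodes n."""
    if not 0 <= n <= 15:
        raise ValueError("n must be in the range 0-15")
    if not 0 <= rt <= 31:
        raise ValueError("Rt is a 5-bit field")
    return encode_sysreg_access(read, 0b10, 0, 0, n, _NUMBERED_OP2[kind], rt)
```

Under these assumptions, a read of breakpoint value register 0 into X0 encodes as 0xD5300080. Whether register n is actually implemented is a separate question that this sketch does not address.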

Instructions for accessing AArch32 Execution environment registers
The instructions for accessing the deprecated and OPTIONAL AArch32 Execution environment registers are:
MSR <System register>, Xt    ; Write to System register
MRS Xt, <System register>    ; Read from System register

Where <System register> is the register name, for example TEECR32_EL1.
This section includes only the System register access encodings for which both:
• Op0 is 0b10.
• The value of Op1 is 2.

Note
These encodings access the registers that are equivalent to the AArch32 CP14 registers.
Table C4-6 shows the mapping of the System register encodings for AArch32 Execution environment register
access.
Table C4-6 System instruction encodings for AArch32 Execution environment register access

The following registers are defined to allow access from AArch64 state to registers that are only used in AArch32 state.

Register        Op1  CRn  CRm  Op2  Notes
TEECR32_EL1     2    0    0    0    RW. If EL0 cannot use AArch32, this register is UNDEFINED.
TEEHBR32_EL1    2    1    0    0    RW. If EL0 cannot use AArch32, this register is UNDEFINED.

C4.2.6 Op0==0b11, Moves to and from non-debug System registers and special-purpose registers
The instructions that move data to and from non-debug system registers are encoded with Op0== 0b11, except that
some of this encoding space is reserved for IMPLEMENTATION DEFINED functionality. The encoding of these
instructions is:
Bits   31-22       21  20-19  18-16  15-12  11-8  7-5  4-0
Value  1101010100  L   11     Op1    CRn    CRm   Op2  Rt
Field                  Op0

The value of CRn provides the next level of decode of these instructions, as follows:
CRn == {0, 1, 2, 3, 5, 6, 7, 9, 10, 12, 13, 14}
    See Instructions for accessing non-debug System registers.
CRn == 4
    See Instructions for accessing special-purpose registers on page C4-248.
CRn == {11, 15}
    See Reserved control space for IMPLEMENTATION DEFINED functionality on page C4-250.

Instructions for accessing non-debug System registers
The A64 instructions for accessing System registers are:
MSR <System register>, Xt    ; Write to System register
MRS Xt, <System register>    ; Read from System register

Where <System register> is the register name, for example MIDR_EL1.
This section includes only the System register access encodings for which both:
• Op0 is 0b11.
• The value of CRn is one of {0, 1, 2, 3, 5, 6, 7, 9, 10, 12, 13, 14}.

Note
These encodings access the registers that are equivalent to the AArch32 CP15 registers.
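
To make the MRS and MSR forms concrete, the sketch below encodes accesses to two registers with Op0==0b11. The field values used in the example come from Table C4-7 (MIDR_EL1 and SCTLR_EL1). The helper names are hypothetical, and the generic `S<op0>_<op1>_C<n>_C<m>_<op2>` spelling is a toolchain naming convention assumed here for illustration, not something this section defines.

```python
def encode_mrs(rt: int, op0: int, op1: int, crn: int, crm: int, op2: int) -> int:
    """MRS Xt, <System register>: a transfer from the register, so L=1."""
    return (0xD5200000 | (op0 << 19) | (op1 << 16) | (crn << 12)
            | (crm << 8) | (op2 << 5) | rt)


def encode_msr(rt: int, op0: int, op1: int, crn: int, crm: int, op2: int) -> int:
    """MSR <System register>, Xt: a transfer to the register, so L=0."""
    return (0xD5000000 | (op0 << 19) | (op1 << 16) | (crn << 12)
            | (crm << 8) | (op2 << 5) | rt)


def generic_sysreg_name(op0: int, op1: int, crn: int, crm: int, op2: int) -> str:
    """Spell the encoding in the assumed generic S<op0>_<op1>_C<n>_C<m>_<op2> form."""
    return f"S{op0}_{op1}_C{crn}_C{crm}_{op2}"
```

With MIDR_EL1 taken as Op0=3, Op1=0, CRn=0, CRm=0, Op2=0, `encode_mrs(0, 3, 0, 0, 0, 0)` gives 0xD5380000, and a write of X0 to SCTLR_EL1 (CRn=1) gives 0xD5181000.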

The instruction encoding for these accesses is:

Bits   31-22       21  20-19  18-16  15-12  11-8  7-5  4-0
Value  1101010100  L   11     Op1    CRn    CRm   Op2  Rt
Field                  Op0

See the text for the permitted values of CRn.

Table C4-7 shows the encodings of the register access instructions. For these registers, CRn often indicates register
grouping, and therefore CRn is given as the first column of the encoding. Registers appended with [63:0] are 64-bit
registers. All other registers are 32-bit registers for which bits [63:32] of the 64-bit register value are RES0.
Table C4-7 System instruction encodings for System register accesses
Access instruction encoding
Register accessed

Notes
CRn

Op1

CRm

Op2

0

0

0

0

RO.

MPIDR_EL1[63:0]

5

RO.

REVIDR_EL1

6

RO.

0

RO, but RAZ if AArch32 is not implemented.

MIDR_EL1

ID_PFR0_EL1

1

ID_PFR1_EL1

1

ID_DFR0_EL1

2

ID_AFR0_EL1

3

ID_MMFR0_EL1

4

ID_MMFR1_EL1

5

ID_MMFR2_EL1

6

ID_MMFR3_EL1

7

ID_ISAR0_EL1

0

0

2

0

ID_ISAR1_EL1

1

ID_ISAR2_EL1

2

ID_ISAR3_EL1

3

ID_ISAR4_EL1

4

ID_ISAR5_EL1

5

ARM DDI 0487A.a
ID090413

RO, but RAZ if AArch32 is not implemented.

Copyright © 2013 ARM Limited. All rights reserved.
Non-Confidential - Beta

C4-243

C4 The AArch64 System Instruction Class
C4.2 The System instruction class encoding space

Table C4-7 System instruction encodings for System register accesses (continued)
Access instruction encoding
Register accessed

MVFR0_EL1

Notes
CRn

Op1

CRm

Op2

0

0

3

0

RO.

MVFR1_EL1

1

MVFR2_EL1

2

Reserved, RAZ

n

For n=3-7.

0

RO.

ID_AA64PFR1_EL1

1

RO.

Reserved, RAZ

n

For n=2-7.

0

RO.

ID_AA64DFR1_EL1

1

RO.

ID_AA64AFR0_EL1

4

RO.

ID_AA64AFR1_EL1

5

RO.

Reserved, RAZ

n

For n={2, 3, 6, 7}.

0

RO.

ID_AA64ISAR1_EL1

1

RO.

Reserved, RAZ

n

For n=2-7.

0

RO.

ID_AA64MMFR1_EL1

1

RO.

Reserved, RAZ

n

For n=2-7.

0

RO.

ID_AA64PFR0_EL1

4

ID_AA64DFR0_EL1

5

ID_AA64ISAR0_EL1

6

ID_AA64MMFR0_EL1

CCSIDR_EL1

7

0

1

0

CLIDR_EL1

1

AIDR_EL1

7

CSSELR_EL1

0

2

0

0

RW.

CTR_EL0

0

3

0

1

RO and configurable to enable access at EL0.

7

RO.

0

RW.

DCZID_EL0
VPIDR_EL2

0

4

0

VMPIDR_EL2[63:0]
SCTLR_EL1

5
0

RW.

ACTLR_EL1

1

IMPLEMENTATION DEFINED.

CPACR_EL1

2

Floating-point and Advanced SIMD only.

1

0

0

Access instruction encoding
Register accessed

SCTLR_EL2

Notes
CRn

Op1

CRm

Op2

1

4

0

0

RW.

1

IMPLEMENTATION DEFINED.

0

RW.

ACTLR_EL2
HCR_EL2[63:0]

1

MDCR_EL2

1

CPTR_EL2

2

Floating-point and Advanced SIMD only.

HSTR_EL2

3

RW.

HACR_EL2

7

IMPLEMENTATION DEFINED.

0

RW.

1

IMPLEMENTATION DEFINED.

0

RW.

2

Floating-point and Advanced SIMD only.

3

1

RW.

0

0

RW.

SCTLR_EL3

1

6

0

ACTLR_EL3
SCR_EL3

1

CPTR_EL3
MDCR_EL3
TTBR0_EL1[63:0]

2

0

TTBR1_EL1[63:0]

1

TCR_EL1[63:0]

2

TTBR0_EL2[63:0]

2

4

0

TCR_EL2
1

VTCR_EL2
2

6

0

5

0

1

5

4

0

IMPLEMENTATION DEFINED.

0

RW.

1

0

IMPLEMENTATION DEFINED.

1

ESR_EL2
5

6

2

0

RW.

1

0

IMPLEMENTATION DEFINED.

AFSR1_EL3

1

ESR_EL3

RW.

2

AFSR1_EL2

FAR_EL1[63:0]

0

1

ESR_EL1

AFSR0_EL3

RW.

2

AFSR1_EL1

AFSR0_EL2

0
2

TCR_EL3
AFSR0_EL1

RW.

2

VTTBR_EL2[63:0]

TTBR0_EL3[63:0]

0

6

0

2

0

RW.

0

0

RW.

Access instruction encoding
Register accessed

FAR_EL2[63:0]

Notes
CRn

Op1

CRm

Op2

6

4

0

0

HPFAR_EL2[63:0]

RW.

4

FAR_EL3[63:0]

6

6

0

0

RW.

PAR_EL1[63:0]

7

0

4

0

RW.

PMINTENSET_EL1

9

0

14

1

RW

2

RW

0

Configurable whether accesses at EL0 are permitted.

PMINTENCLR_EL1
PMCR_EL0

3

12

PMCNTENSET_EL0

1

PMCNTENCLR_EL0

2

PMOVSCLR_EL0

3

PMSWINC_EL0

4

WO. Configurable whether accesses at EL0 are permitted.

PMSELR_EL0

5

Configurable whether accesses at EL0 are permitted.

PMCEID0_EL0

6

RO. Configurable whether accesses at EL0 are permitted.

PMCEID1_EL0

7

PMCCNTR_EL0

13

0

PMXEVTYPER_EL0

1

PMXEVCNTR_EL0

2

PMUSERENR_EL0

14

0

RO at EL0 but can be written at other Exception levels

3

Configurable whether accesses at EL0 are permitted.

{8-10}

{0-7}

11

{0-6}

CRm and op2 encode n, the counter number. Configurable
whether accesses at EL0 are permitted.

{12-14}

{0-7}

15

{0-6}

PMOVSSET_EL0
PMEVCNTR_EL0

14

3

PMEVTYPER_EL0

PMCCFILTR_EL0
MAIR_EL1[63:0]

10

0

AMAIR_EL1[63:0]
MAIR_EL2[63:0]

10

4

AMAIR_EL2[63:0]
MAIR_EL3[63:0]
AMAIR_EL3[63:0]

10

6

Configurable whether accesses at EL0 are permitted.

7

Configurable whether accesses at EL0 are permitted.

2

0

RW.

3

0

IMPLEMENTATION DEFINED.

2

0

RW.

3

0

IMPLEMENTATION DEFINED.

2

0

RW.

3

0

IMPLEMENTATION DEFINED.

Access instruction encoding
Register accessed

Notes
CRn

Op1

CRm

Op2

12

0

0

0

RW.

RVBAR_EL1[63:0]

1

RO. Implemented only if EL2 and EL3 are not
implemented.

RMR_EL1[63:0]

2

Implemented only if both of the following conditions
apply:
•
EL1 is capable of using AArch32 and AArch64
•
EL2 and EL3 are not implemented.

1

0

RO.

0

0

RW.

RVBAR_EL2[63:0]

1

RO. Implemented only if EL3 is not implemented.

RMR_EL2[63:0]

2

Implemented only if both of the following conditions
apply:
•
EL2 is capable of using AArch32 and AArch64
•
EL3 is not implemented.

0

RW.

RVBAR_EL3[63:0]

1

RO.

RMR_EL3[63:0]

2

Implemented only if EL3 can use both AArch32 and
AArch64.

1

RW.

VBAR_EL1[63:0]

ISR_EL1
VBAR_EL2[63:0]

VBAR_EL3[63:0]

CONTEXTIDR_EL1

12

12

13

4

6

0

0

0

TPIDR_EL1[63:0]
TPIDR_EL0[63:0]

4
13

3

0

TPIDRRO_EL0[63:0]

2

RW.

3

TPIDR_EL2[63:0]

13

4

0

2

RW.

TPIDR_EL3[63:0]

13

6

0

2

RW.

14

0

1

0

RW.

Timer registers
CNTKCTL_EL1

Access instruction encoding
Register accessed

Notes
CRn

Op1

CRm

Op2

14

3

0

0

RO at EL1 but can be written at the highest Exception
Level implemented. Configurable to enable access at EL0.

CNTPCT_EL0[63:0]

1

RO. Configurable whether accesses at EL0 are permitted.

CNTVCT_EL0[63:0]

2

CNTFRQ_EL0

CNTP_TVAL_EL0

2

0

CNTP_CTL_EL0

1

CNTP_CVAL_EL0[63:0]

2

CNTV_TVAL_EL0

3

0

CNTV_CTL_EL0

1

CNTV_CVAL_EL0[63:0]

2

Configurable whether accesses at EL0 are permitted.

Configurable whether accesses at EL0 are permitted.

CNTHCTL_EL2

14

4

1

0

RW.

CNTHP_TVAL_EL2

14

4

2

0

RW.

CNTHP_CTL_EL2

1

CNTHP_CVAL_EL2[63:0]

2

CNTPS_TVAL_EL1

14

7

2

0

CNTPS_CTL_EL1

1

CNTPS_CVAL_EL1[63:0]

2

Accessible at EL3. Configurable whether Secure accesses
at EL1 are permitted.

The following registers are defined to allow access from AArch64 state to registers that are only used in AArch32 state
SDER32_EL3

1

6

1

1

If EL1 cannot use AArch32, this register is UNDEFINED.

DACR32_EL2

3

4

0

0

If EL1 cannot use AArch32, this register is UNDEFINED.

IFSR32_EL2

5

4

0

1

If EL1 cannot use AArch32, this register is UNDEFINED.

3

0

If EL1 cannot use AArch32, this register is UNDEFINED.

FPEXC32_EL2

Instructions for accessing special-purpose registers
The A64 instructions for accessing special-purpose registers are:
MSR <special-purpose register>, Xt ; Write to special-purpose register
MRS Xt, <special-purpose register> ; Read from special-purpose register

For these accesses, CRn has the value 4. The encoding for special-purpose register accesses is:
Bits     31-22        21   20-19      18-16   15-12        11-8   7-5   4-0
Field    1101010100   L    Op0 = 11   Op1     CRn = 0100   CRm    Op2   Rt


Table C4-8 lists the encodings for Op1, CRm, and Op2 fields for accesses to the special-purpose registers in AArch64.
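Table C4-8 Special-purpose register accesses
             Access instruction encoding
Register     Op1   CRm   Op2   Notes
SPSR_EL1     0     0     0     Accessible from EL1 or higher.
ELR_EL1      0     0     1     Accessible from EL1 or higher.
SP_EL0       0     1     0     Accessible from EL1 or higher. If SP_EL0 is the current stack
                               pointer then the access is UNDEFINED.
SPSel        0     2     0     Accessible from EL1 or higher.
CurrentEL    0     2     2     RO. Accessible from EL1 or higher.
DAIF         3     2     1     Configurable whether accesses at EL0 are permitted.
NZCV         3     2     0     Accessible from EL0 or higher.
FPCR         3     4     0     Accessible from EL0 or higher.
FPSR         3     4     1     Accessible from EL0 or higher.
DSPSR_EL0    3     5     0     Accessible only in Debug state, from EL0 or higher.
DLR_EL0      3     5     1     Accessible only in Debug state, from EL0 or higher.
SPSR_EL2     4     0     0     Accessible from EL2 or higher.
ELR_EL2      4     0     1     Accessible from EL2 or higher.
SP_EL1       4     1     0     Accessible from EL2 or higher.
SPSR_irq     4     3     0     Accessible from EL2 or higher.
SPSR_abt     4     3     1     Accessible from EL2 or higher.
SPSR_und     4     3     2     Accessible from EL2 or higher.
SPSR_fiq     4     3     3     Accessible from EL2 or higher.
SPSR_EL3     6     0     0     Accessible from EL3 or higher.
ELR_EL3      6     0     1     Accessible from EL3 or higher.
SP_EL2       6     1     0     Accessible from EL3 or higher.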
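As a rough illustration of how the Op1, CRm, and Op2 fields select a special-purpose register, the sketch below decodes an instruction word back into a register name. It is a hedged example rather than a reference decoder: the function name decode_special_purpose and the SPECIAL_PURPOSE dictionary are hypothetical, and the dictionary lists only the registers whose op1, CRm, and op2 values are also stated in the per-register descriptions in section C4.3.

```python
# (Op1, CRm, Op2) -> register name, for accesses with Op0 = 0b11 and CRn = 0b0100.
# Illustrative subset only; values are taken from the per-register encodings in C4.3.
SPECIAL_PURPOSE = {
    (0b000, 0b0000, 0b001): "ELR_EL1",
    (0b000, 0b0001, 0b000): "SP_EL0",
    (0b000, 0b0010, 0b010): "CurrentEL",
    (0b011, 0b0010, 0b000): "NZCV",
    (0b011, 0b0010, 0b001): "DAIF",
    (0b011, 0b0100, 0b000): "FPCR",
    (0b011, 0b0100, 0b001): "FPSR",
    (0b100, 0b0000, 0b001): "ELR_EL2",
    (0b110, 0b0000, 0b001): "ELR_EL3",
}


def decode_special_purpose(word):
    """Return (mnemonic, register, Rt) or None if the word is not a known access."""
    if (word >> 22) != 0b1101010100:      # fixed bits [31:22]
        return None
    if ((word >> 19) & 0b11) != 0b11:     # Op0 must be 0b11
        return None
    if ((word >> 12) & 0xF) != 0b0100:    # CRn must be 4 for special-purpose registers
        return None
    key = ((word >> 16) & 0b111, (word >> 8) & 0xF, (word >> 5) & 0b111)
    name = SPECIAL_PURPOSE.get(key)
    if name is None:
        return None
    mnemonic = "MRS" if (word >> 21) & 1 else "MSR"   # L = 1 is a read
    return mnemonic, name, word & 0x1F


print(decode_special_purpose(0xD5384240))
```

A dictionary keyed on the three variable fields keeps the lookup close to the shape of Table C4-8, so extending the sketch to the remaining rows is a matter of adding entries.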

For the accesses to the special-purpose registers shown in Table C4-8:
•
Any write to the FPCR must be synchronized, by a Context synchronization operation, before its effect on
subsequent instructions can be relied upon.
•
All other reads and writes to the registers appear to occur in program order relative to other instructions.

C4.2.7

Reserved control space for IMPLEMENTATION DEFINED functionality
The A64 instruction set reserves the following space for IMPLEMENTATION DEFINED instructions:
Bits     31-22        21   20-19   18-16                    15-12   11-5                     4-0
Value    1101010100   L    01      IMPLEMENTATION DEFINED   1x11    IMPLEMENTATION DEFINED   Rt

The value of L defines the use of Rt as follows:
0

Rt is an argument supplied to the instruction.

1

Rt is a result returned by the instruction.

The A64 instruction set reserves the following space for IMPLEMENTATION DEFINED registers:
Bits     31-22        21   20-19   18-16                    15-12   11-5                     4-0
Value    1101010100   L    11      IMPLEMENTATION DEFINED   1x11    IMPLEMENTATION DEFINED   Rt

The value of L defines the access type and the use of Rt as follows:
0
Write the value in Rt to the IMPLEMENTATION DEFINED register.
1
Read the value of the IMPLEMENTATION DEFINED register to Rt.
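The two reserved spaces differ only in bits [20:19], and both require bits [15:12] to match 1x11. A minimal sketch of that test follows. The function name classify_impdef and the returned strings are hypothetical, chosen for this example; the bit positions are those shown in the two encodings above.

```python
def classify_impdef(word):
    """Classify a word against the reserved IMPLEMENTATION DEFINED spaces.

    Returns a short description, or None if the word lies outside both spaces.
    """
    if (word >> 22) != 0b1101010100:          # fixed bits [31:22]
        return None
    l_bit = (word >> 21) & 1
    bits_20_19 = (word >> 19) & 0b11
    bits_15_12 = (word >> 12) & 0xF
    if (bits_15_12 & 0b1011) != 0b1011:       # pattern 1x11: only 0b1011 or 0b1111
        return None
    if bits_20_19 == 0b01:                    # instruction space
        return "instruction, Rt is a result" if l_bit else "instruction, Rt is an argument"
    if bits_20_19 == 0b11:                    # register space
        return "register read" if l_bit else "register write"
    return None


print(classify_impdef(0xD538F000))
```

Masking with 0b1011 before comparing is one way to express the "x" in 1x11, because it ignores bit 14 while still requiring the other three bits to be 1.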


C4.3

PSTATE and special purpose registers
This section describes the following registers:
•
CurrentEL, that software can read to determine the current Exception level.
•
DAIF, that specifies the current interrupt mask bits.
•
DLR_EL0, that holds the address to return to for a return from Debug state.
•
DSPSR_EL0, that holds process state on entry to Debug state.
•
ELR_EL1, that holds the address to return to for an exception return from EL1.
•
ELR_EL2, that holds the address to return to for an exception return from EL2.
•
ELR_EL3, that holds the address to return to for an exception return from EL3.
•
FPCR, that provides control of floating-point operation.
•
FPSR, that provides floating-point status information.
•
NZCV, that holds the condition flags.
•
SP_EL0, that holds the stack pointer for EL0.
•
SP_EL1, that holds the stack pointer for EL1.
•
SP_EL2, that holds the stack pointer for EL2.
•
SP_EL3, that holds the stack pointer for EL3.
•
SPSel, that at EL1 or higher selects between the SP for the current Exception level and SP_EL0.
•
SPSR_abt, that holds process state on taking an exception to AArch32 Abort mode.
•
SPSR_EL1, that holds process state on taking an exception to AArch64 EL1.
•
SPSR_EL2, that holds process state on taking an exception to AArch64 EL2.
•
SPSR_EL3, that holds process state on taking an exception to AArch64 EL3.
•
SPSR_fiq, that holds process state on taking an exception to AArch32 FIQ mode.
•
SPSR_irq, that holds process state on taking an exception to AArch32 IRQ mode.
•
SPSR_und, that holds process state on taking an exception to AArch32 Undefined mode.
The PSRs hold the PE state from immediately before taking the exception or entering Debug state. This means they
hold the state required for the return from Debug state, or for the exception return.


C4.3.1

CurrentEL, Current Exception Level
The CurrentEL characteristics are:
Purpose
Holds the current exception level.
This register is part of the Process state registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      RO          RO         RO     RO                RO

A write to the CurrentEL register is UNDEFINED.
Configurations
There are no configuration notes.
Attributes
CurrentEL is a 32-bit register.
The CurrentEL bit assignments are:

[31:4] RES0 | [3:2] EL | [1:0] RES0

Bits [31:4]
Reserved, RES0.
EL, bits [3:2]
Current exception level. Possible values of this field are:
00

EL0

01

EL1

10

EL2

11

EL3

Resets to an IMPLEMENTATION DEFINED value.
Bits [1:0]
Reserved, RES0.
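Because the EL field occupies bits [3:2], software that has read CurrentEL into a general-purpose register must shift the value right by two bits before comparing it with an Exception level number. A minimal sketch of that extraction, using a hypothetical helper name current_el:

```python
def current_el(value):
    """Extract the EL field, bits [3:2], from a CurrentEL register value."""
    return (value >> 2) & 0b11


# A raw register value of 0b1000 has EL = 0b10.
print(current_el(0b1000))
```

Masking after the shift discards the RES0 bits on either side of the field, so the helper does not depend on those bits reading as zero.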

Accessing the CurrentEL:
To access the CurrentEL:
MRS <Xt>, CurrentEL ; Read CurrentEL into Xt

Register access is encoded as follows:
op0    op1    CRn     CRm     op2
11     000    0100    0010    010

C4.3.2

DAIF, Interrupt Mask Bits
The DAIF characteristics are:
Purpose
Allows access to the interrupt mask bits.
This register is part of the Process state registers functional group.
Usage constraints
This register is accessible as shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
Config-RW    RW          RW         RW     RW                RW

This register is accessible at EL0 when SCTLR_EL1.UMA is set to 1.
Configurations
There are no configuration notes.
Attributes
DAIF is a 32-bit register.
The DAIF bit assignments are:

[31:10] RES0 | [9] D | [8] A | [7] I | [6] F | [5:0] RES0

Bits [31:10]
Reserved, RES0.
D, bit [9]
Process state D mask. The possible values of this bit are:
0

Debug exceptions from Watchpoint, Breakpoint, and Software step debug events
targeted at the current exception level are not masked.

1

Debug exceptions from Watchpoint, Breakpoint, and Software step debug events
targeted at the current exception level are masked.

When the target exception level of the debug exception is higher than the current exception level, the
exception is not masked by this bit.
Resets to 1.
A, bit [8]
SError (System Error) mask bit. The possible values of this bit are:
0

Exception not masked.

1

Exception masked.

Resets to 1.
I, bit [7]
IRQ mask bit. The possible values of this bit are:

0

Exception not masked.

1

Exception masked.

Resets to 1.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0

Exception not masked.

1

Exception masked.

Resets to 1.
Bits [5:0]
Reserved, RES0.
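The four mask bits sit in bits [9:6] of the register value, so a value read from DAIF can be unpacked as sketched below. The names DAIF_BITS and decode_daif are hypothetical and used only for this illustration; the bit positions are those given in the field descriptions above.

```python
# Bit position of each mask bit in the DAIF register value.
DAIF_BITS = {"D": 9, "A": 8, "I": 7, "F": 6}


def decode_daif(value):
    """Return a dict mapping each mask name to True if that exception class is masked."""
    return {name: bool((value >> bit) & 1) for name, bit in DAIF_BITS.items()}


# All four bits reset to 1, which corresponds to the value 0x3C0.
print(decode_daif(0x3C0))
```

For example, a value of 0x80 has only bit [7] set, so only IRQ is reported as masked.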

Accessing the DAIF:
To access the DAIF:
MRS <Xt>, DAIF ; Read DAIF into Xt
MSR DAIF, <Xt> ; Write Xt to DAIF

Register access is encoded as follows:
op0    op1    CRn     CRm     op2
11     011    0100    0010    001

C4.3.3

DLR_EL0, Debug Link Register
The DLR_EL0 characteristics are:
Purpose
In Debug state, holds the address to restart from.
This register is part of:
•
the Debug registers functional group
•
the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
RW     RW          RW         RW     RW                RW

Access to this register is from Debug state only. During normal execution this register is
UNALLOCATED.
Configurations
DLR_EL0[31:0] is architecturally mapped to AArch32 register DLR.
Attributes
DLR_EL0 is a 64-bit register.
DLR_EL0 is a member of multiple register groups and is defined elsewhere. For the full definition, see DLR_EL0,
Debug Link Register on page D8-2103.


C4.3.4

DSPSR_EL0, Debug Saved Program Status Register
The DSPSR_EL0 characteristics are:
Purpose
Holds the saved processor state on entry to Debug state.
This register is part of:
•
the Debug registers functional group
•
the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
RW     RW          RW         RW     RW                RW

Access to this register is from Debug state only. During normal execution this register is
UNALLOCATED.
Configurations
DSPSR_EL0 is architecturally mapped to AArch32 register DSPSR.
Attributes
DSPSR_EL0 is a 32-bit register.
DSPSR_EL0 is a member of multiple register groups and is defined elsewhere. For the full definition, see
DSPSR_EL0, Debug Saved Program Status Register on page D8-2104.


C4.3.5

ELR_EL1, Exception Link Register (EL1)
The ELR_EL1 characteristics are:
Purpose
When taking an exception to EL1, holds the address to return to.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      RW          RW         RW     RW                RW

Configurations
There are no configuration notes.
Attributes
ELR_EL1 is a 64-bit register.
The ELR_EL1 bit assignments are:

63

0
Return address

Bits [63:0]
Return address.

Accessing the ELR_EL1:
To access the ELR_EL1:
MRS <Xt>, ELR_EL1 ; Read ELR_EL1 into Xt
MSR ELR_EL1, <Xt> ; Write Xt to ELR_EL1

Register access is encoded as follows:
op0    op1    CRn     CRm     op2
11     000    0100    0000    001

C4.3.6

ELR_EL2, Exception Link Register (EL2)
The ELR_EL2 characteristics are:
Purpose
When taking an exception to EL2, holds the address to return to.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          RW     RW                RW

When EL2 is using AArch32 and an exception is taken from EL0, EL1, or EL2 to EL3 and AArch64
execution, the upper 32-bits of ELR_EL2 are either set to 0 or hold the same value that they did
before AArch32 execution. The choice between these two options is determined by an
implementation, and might vary dynamically within an implementation. Correspondingly software
must regard the value as being an UNKNOWN choice between the two values.
Configurations
ELR_EL2 is architecturally mapped to AArch32 register ELR_hyp.
Attributes
ELR_EL2 is a 64-bit register.
The ELR_EL2 bit assignments are:

63

0
Return address

Bits [63:0]
Return address.

Accessing the ELR_EL2:
To access the ELR_EL2:
MRS <Xt>, ELR_EL2 ; Read ELR_EL2 into Xt
MSR ELR_EL2, <Xt> ; Write Xt to ELR_EL2

Register access is encoded as follows:
op0    op1    CRn     CRm     op2
11     100    0100    0000    001

C4.3.7

ELR_EL3, Exception Link Register (EL3)
The ELR_EL3 characteristics are:
Purpose
When taking an exception to EL3, holds the address to return to.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          -      RW                RW

Configurations
There are no configuration notes.
Attributes
ELR_EL3 is a 64-bit register.
The ELR_EL3 bit assignments are:

63

0
Return address

Bits [63:0]
Return address.

Accessing the ELR_EL3:
To access the ELR_EL3:
MRS <Xt>, ELR_EL3 ; Read ELR_EL3 into Xt
MSR ELR_EL3, <Xt> ; Write Xt to ELR_EL3

Register access is encoded as follows:
op0    op1    CRn     CRm     op2
11     110    0100    0000    001

C4.3.8

FPCR, Floating-point Control Register
The FPCR characteristics are:
Purpose
Controls floating-point extension behavior.
This register is part of:
•
the Special purpose registers functional group
•
the Floating-point registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
RW     RW          RW         RW     RW                RW

Configurations
The named fields in this register map to the equivalent fields in the AArch32 FPSCR.
It is IMPLEMENTATION DEFINED whether the Len and Stride fields can be programmed to non-zero
values, which will cause some AArch32 floating-point instruction encodings to be UNDEFINED, or
whether these fields are RAZ.
Attributes
FPCR is a 32-bit register.
The FPCR bit assignments are:

[31:27] RES0 | [26] AHP | [25] DN | [24] FZ | [23:22] RMode | [21:20] Stride | [19] RES0 | [18:16] Len
[15] IDE | [14:13] RES0 | [12] IXE | [11] UFE | [10] OFE | [9] DZE | [8] IOE | [7:0] RES0

Bits [31:27]
Reserved, RES0.
AHP, bit [26]
Alternative half-precision control bit:
0

IEEE half-precision format selected.

1

Alternative half-precision format selected.

DN, bit [25]
Default NaN mode control bit:
0

NaN operands propagate through to the output of a floating-point operation.

1

Any operation involving one or more NaNs returns the Default NaN.

The value of this bit controls both scalar and Advanced SIMD floating-point arithmetic.

FZ, bit [24]
Flush-to-zero mode control bit:
0

Flush-to-zero mode disabled. Behavior of the floating-point system is fully compliant
with the IEEE 754 standard.

1

Flush-to-zero mode enabled.

The value of this bit controls both scalar and Advanced SIMD floating-point arithmetic.
RMode, bits [23:22]
Rounding Mode control field. The encoding of this field is:
00

Round to Nearest (RN) mode

01

Round towards Plus Infinity (RP) mode

10

Round towards Minus Infinity (RM) mode

11

Round towards Zero (RZ) mode.

The specified rounding mode is used by both scalar and Advanced SIMD floating-point
instructions.
Stride, bits [21:20]
This field is ignored during AArch64 execution.
Bit [19]
Reserved, RES0.
Len, bits [18:16]
This field is ignored during AArch64 execution.
IDE, bit [15]
Input Denormal exception trap enable. Possible values are:
0

Untrapped exception handling selected. If the floating-point exception occurs then the
FPSR.IDC bit is set to 1.

1

Trapped exception handling selected. If the floating-point exception occurs, the PE does
not update the FPSR.IDC bit. The trap handling software can decide whether to set the
FPSR.IDC bit to 1.

The value of this bit controls both scalar and Advanced SIMD floating-point arithmetic.
If the implementation does not support this exception, this bit is RES0.
Bits [14:13]
Reserved, RES0.
IXE, bit [12]
Inexact exception trap enable. Possible values are:
0

Untrapped exception handling selected. If the floating-point exception occurs then the
FPSR.IXC bit is set to 1.

1

Trapped exception handling selected. If the floating-point exception occurs, the PE does
not update the FPSR.IXC bit. The trap handling software can decide whether to set the
FPSR.IXC bit to 1.

The value of this bit controls both scalar and Advanced SIMD floating-point arithmetic.
If the implementation does not support this exception, this bit is RES0.
UFE, bit [11]
Underflow exception trap enable. Possible values are:
0

Untrapped exception handling selected. If the floating-point exception occurs then the
FPSR.UFC bit is set to 1.

1

Trapped exception handling selected. If the floating-point exception occurs, the PE does
not update the FPSR.UFC bit. The trap handling software can decide whether to set the
FPSR.UFC bit to 1.

The value of this bit controls both scalar and Advanced SIMD floating-point arithmetic.
If the implementation does not support this exception, this bit is RES0.
OFE, bit [10]
Overflow exception trap enable. Possible values are:
0

Untrapped exception handling selected. If the floating-point exception occurs then the
FPSR.OFC bit is set to 1.

1

Trapped exception handling selected. If the floating-point exception occurs, the PE does
not update the FPSR.OFC bit. The trap handling software can decide whether to set the
FPSR.OFC bit to 1.

The value of this bit controls both scalar and Advanced SIMD floating-point arithmetic.
If the implementation does not support this exception, this bit is RES0.
DZE, bit [9]
Division by Zero exception trap enable. Possible values are:
0

Untrapped exception handling selected. If the floating-point exception occurs then the
FPSR.DZC bit is set to 1.

1

Trapped exception handling selected. If the floating-point exception occurs, the PE does
not update the FPSR.DZC bit. The trap handling software can decide whether to set the
FPSR.DZC bit to 1.

The value of this bit controls both scalar and Advanced SIMD floating-point arithmetic.
If the implementation does not support this exception, this bit is RES0.
IOE, bit [8]
Invalid Operation exception trap enable. Possible values are:
0

Untrapped exception handling selected. If the floating-point exception occurs then the
FPSR.IOC bit is set to 1.

1

Trapped exception handling selected. If the floating-point exception occurs, the PE does
not update the FPSR.IOC bit. The trap handling software can decide whether to set the
FPSR.IOC bit to 1.

The value of this bit controls both scalar and Advanced SIMD floating-point arithmetic.
If the implementation does not support this exception, this bit is RES0.
Bits [7:0]
Reserved, RES0.
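The field positions described above can be collected into a small decoder. The sketch below is illustrative only: decode_fpcr, FPCR_FLAGS, and RMODE_NAMES are hypothetical names, and the Stride and Len fields are omitted because they are ignored during AArch64 execution.

```python
# Single-bit control and trap-enable fields, keyed by name, with their bit positions.
FPCR_FLAGS = {
    "AHP": 26, "DN": 25, "FZ": 24,
    "IDE": 15, "IXE": 12, "UFE": 11, "OFE": 10, "DZE": 9, "IOE": 8,
}

# RMode encodings 00, 01, 10, 11 in order.
RMODE_NAMES = ("RN", "RP", "RM", "RZ")


def decode_fpcr(value):
    """Return a dict of the single-bit FPCR fields plus the rounding mode name."""
    fields = {name: (value >> bit) & 1 for name, bit in FPCR_FLAGS.items()}
    fields["RMode"] = RMODE_NAMES[(value >> 22) & 0b11]
    return fields


# Round towards Zero with flush-to-zero enabled.
print(decode_fpcr((0b11 << 22) | (1 << 24)))
```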

Accessing the FPCR:
To access the FPCR:
MRS <Xt>, FPCR ; Read FPCR into Xt
MSR FPCR, <Xt> ; Write Xt to FPCR

Register access is encoded as follows:
op0    op1    CRn     CRm     op2
11     011    0100    0100    000

C4.3.9

FPSR, Floating-point Status Register
The FPSR characteristics are:
Purpose
Provides floating-point system status information.
This register is part of:
•
the Special purpose registers functional group
•
the Floating-point registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
RW     RW          RW         RW     RW                RW

Configurations
The named fields in this register map to the equivalent fields in the AArch32 FPSCR.
Attributes
FPSR is a 32-bit register.
The FPSR bit assignments are:

[31] N | [30] Z | [29] C | [28] V | [27] QC | [26:8] RES0
[7] IDC | [6:5] RES0 | [4] IXC | [3] UFC | [2] OFC | [1] DZC | [0] IOC

N, bit [31]
Negative condition flag for AArch32 floating-point comparison operations. AArch64 floating-point
comparisons set the PSTATE.N flag instead.
Z, bit [30]
Zero condition flag for AArch32 floating-point comparison operations. AArch64 floating-point
comparisons set the PSTATE.Z flag instead.
C, bit [29]
Carry condition flag for AArch32 floating-point comparison operations. AArch64 floating-point
comparisons set the PSTATE.C flag instead.
V, bit [28]
Overflow condition flag for AArch32 floating-point comparison operations. AArch64
floating-point comparisons set the PSTATE.V flag instead.


QC, bit [27]
Cumulative saturation bit, Advanced SIMD only. This bit is set to 1 to indicate that an Advanced
SIMD integer operation has saturated since 0 was last written to this bit.
Bits [26:8]
Reserved, RES0.
IDC, bit [7]
Input Denormal cumulative exception bit. This bit is set to 1 to indicate that the Input Denormal
exception has occurred since 0 was last written to this bit.
How scalar and Advanced SIMD floating-point instructions update this bit depends on the value of
the FPCR.IDE bit. This bit is only set to 1 to indicate an exception if FPCR.IDE is 0, or if trapping
software sets it.
Bits [6:5]
Reserved, RES0.
IXC, bit [4]
Inexact cumulative exception bit. This bit is set to 1 to indicate that the Inexact exception has
occurred since 0 was last written to this bit.
How scalar and Advanced SIMD floating-point instructions update this bit depends on the value of
the FPCR.IXE bit. This bit is only set to 1 to indicate an exception if FPCR.IXE is 0, or if trapping
software sets it.
UFC, bit [3]
Underflow cumulative exception bit. This bit is set to 1 to indicate that the Underflow exception has
occurred since 0 was last written to this bit.
How scalar and Advanced SIMD floating-point instructions update this bit depends on the value of
the FPCR.UFE bit. This bit is only set to 1 to indicate an exception if FPCR.UFE is 0, or if trapping
software sets it.
OFC, bit [2]
Overflow cumulative exception bit. This bit is set to 1 to indicate that the Overflow exception has
occurred since 0 was last written to this bit.
How scalar and Advanced SIMD floating-point instructions update this bit depends on the value of
the FPCR.OFE bit. This bit is only set to 1 to indicate an exception if FPCR.OFE is 0, or if trapping
software sets it.
DZC, bit [1]
Division by Zero cumulative exception bit. This bit is set to 1 to indicate that the Division by Zero
exception has occurred since 0 was last written to this bit.
How scalar and Advanced SIMD floating-point instructions update this bit depends on the value of
the FPCR.DZE bit. This bit is only set to 1 to indicate an exception if FPCR.DZE is 0, or if trapping
software sets it.
IOC, bit [0]
Invalid Operation cumulative exception bit. This bit is set to 1 to indicate that the Invalid Operation
exception has occurred since 0 was last written to this bit.
How scalar and Advanced SIMD floating-point instructions update this bit depends on the value of
the FPCR.IOE bit. This bit is only set to 1 to indicate an exception if FPCR.IOE is 0, or if trapping
software sets it.
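The interaction between each FPCR trap enable bit and its FPSR cumulative bit can be modelled in a few lines. This is a simplified sketch, not a description of any implementation: record_exception and the EXCEPTIONS table are hypothetical, and the model covers only the rule stated above, that the PE sets the cumulative bit only when the matching trap enable is 0.

```python
# Exception name -> (FPCR trap enable bit, FPSR cumulative bit).
EXCEPTIONS = {
    "IO": (8, 0),    # Invalid Operation: FPCR.IOE, FPSR.IOC
    "DZ": (9, 1),    # Division by Zero:  FPCR.DZE, FPSR.DZC
    "OF": (10, 2),   # Overflow:          FPCR.OFE, FPSR.OFC
    "UF": (11, 3),   # Underflow:         FPCR.UFE, FPSR.UFC
    "IX": (12, 4),   # Inexact:           FPCR.IXE, FPSR.IXC
    "ID": (15, 7),   # Input Denormal:    FPCR.IDE, FPSR.IDC
}


def record_exception(fpcr, fpsr, exc):
    """Return (new_fpsr, trapped) for a floating-point exception named exc."""
    enable_bit, cumulative_bit = EXCEPTIONS[exc]
    if (fpcr >> enable_bit) & 1:
        # Trapped handling: the cumulative bit is left for trap handling software.
        return fpsr, True
    # Untrapped handling: the cumulative bit is set and stays set until 0 is written.
    return fpsr | (1 << cumulative_bit), False


print(record_exception(0, 0, "DZ"))
```

Returning the trapped indication alongside the new FPSR value keeps the model free of side effects, which makes the two cases easy to compare.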

Accessing the FPSR:
To access the FPSR:
MRS <Xt>, FPSR ; Read FPSR into Xt
MSR FPSR, <Xt> ; Write Xt to FPSR

Register access is encoded as follows:
op0    op1    CRn     CRm     op2
11     011    0100    0100    001

C4.3.10

NZCV, Condition Flags
The NZCV characteristics are:
Purpose
Allows access to the condition flags.
This register is part of the Process state registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
RW     RW          RW         RW     RW                RW

Configurations
There are no configuration notes.
Attributes
NZCV is a 32-bit register.
The NZCV bit assignments are:

[31] N | [30] Z | [29] C | [28] V | [27:0] RES0

N, bit [31]
Negative condition flag. Set to bit[31] of the result of the last flag-setting instruction. If the result is
regarded as a two's complement signed integer, then the processor sets N to 1 if the result was
negative, and sets N to 0 if it was positive or zero.
Z, bit [30]
Zero condition flag. Set to 1 if the result of the last flag-setting instruction was zero, and to 0
otherwise. A result of zero often indicates an equal result from a comparison.
C, bit [29]
Carry condition flag. Set to 1 if the last flag-setting instruction resulted in a carry condition, for
example an unsigned overflow on an addition.
V, bit [28]
Overflow condition flag. Set to 1 if the last flag-setting instruction resulted in an overflow condition,
for example a signed overflow on an addition.
Bits [27:0]
Reserved, RES0.
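The flag definitions above can be sketched for one concrete case, a 64-bit flag-setting addition. This is a minimal model, not architectural pseudocode: it assumes 64-bit operands, so the sign bit of the result is bit 63, and the names adds_nzcv and decode_nzcv are hypothetical.

```python
MASK64 = (1 << 64) - 1


def adds_nzcv(a, b):
    """Return an NZCV register value for a 64-bit flag-setting add of a and b."""
    a &= MASK64
    b &= MASK64
    full = a + b
    result = full & MASK64
    n = (result >> 63) & 1               # result negative as a signed integer
    z = 1 if result == 0 else 0          # result is zero
    c = 1 if full > MASK64 else 0        # unsigned overflow (carry out)
    # Signed overflow: operands have the same sign, result has the other sign.
    v = ((~(a ^ b) & (a ^ result)) >> 63) & 1
    return (n << 31) | (z << 30) | (c << 29) | (v << 28)


def decode_nzcv(value):
    """Split an NZCV register value into its four flags; bits [27:0] are ignored."""
    return {
        "N": (value >> 31) & 1,
        "Z": (value >> 30) & 1,
        "C": (value >> 29) & 1,
        "V": (value >> 28) & 1,
    }


print(decode_nzcv(adds_nzcv(0xFFFFFFFFFFFFFFFF, 1)))
print(decode_nzcv(adds_nzcv(0x7FFFFFFFFFFFFFFF, 1)))
```

The first call wraps to zero with a carry out, so Z and C are set. The second overflows the signed range, so N and V are set.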

Accessing the NZCV:
To access the NZCV:
MRS <Xt>, NZCV ; Read NZCV into Xt
MSR NZCV, <Xt> ; Write Xt to NZCV


Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     011    0100    0010    000


C4.3.11 SP_EL0, Stack Pointer (EL0)
The SP_EL0 characteristics are:
Purpose
Holds the stack pointer if SPSel.SP is 0, or the stack pointer for EL0 if SPSel.SP is 1.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      RW          RW         RW     RW                RW

This register is also accessible at EL0 as the current stack pointer, and at any exception level as the
current stack pointer when SPSel.SP is 0.
If SPSel.SP is 0 (the stack pointer selected is SP_EL0) then any access to SP_EL0 using the MSR
or MRS instructions is UNDEFINED.
Configurations
There are no configuration notes.
Attributes
SP_EL0 is a 64-bit register.
The SP_EL0 bit assignments are:

Bits       Field
[63:0]     Stack pointer

Bits [63:0]
Stack pointer.

Accessing the SP_EL0:
To access the SP_EL0:
MRS <Xt>, SP_EL0 ; Read SP_EL0 into Xt
MSR SP_EL0, <Xt> ; Write Xt to SP_EL0

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     000    0100    0001    000


C4.3.12 SP_EL1, Stack Pointer (EL1)
The SP_EL1 characteristics are:
Purpose
Holds the stack pointer for EL1 if SPSel.SP is 1 (the stack pointer selected is SP_ELx).
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          RW     RW                RW

This register is also accessible at EL1 as the current stack pointer when SPSel.SP is 1.
Configurations
There are no configuration notes.
Attributes
SP_EL1 is a 64-bit register.
The SP_EL1 bit assignments are:

Bits       Field
[63:0]     Stack pointer

Bits [63:0]
Stack pointer.

Accessing the SP_EL1:
To access the SP_EL1:
MRS <Xt>, SP_EL1 ; Read SP_EL1 into Xt
MSR SP_EL1, <Xt> ; Write Xt to SP_EL1

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     100    0100    0001    000


C4.3.13 SP_EL2, Stack Pointer (EL2)
The SP_EL2 characteristics are:
Purpose
Holds the stack pointer for EL2 if SPSel.SP is 1 (the stack pointer selected is SP_ELx).
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          -      RW                RW

This register is also accessible at EL2 as the current stack pointer when SPSel.SP is 1.
Configurations
There are no configuration notes.
Attributes
SP_EL2 is a 64-bit register.
The SP_EL2 bit assignments are:

Bits       Field
[63:0]     Stack pointer

Bits [63:0]
Stack pointer.

Accessing the SP_EL2:
To access the SP_EL2:
MRS <Xt>, SP_EL2 ; Read SP_EL2 into Xt
MSR SP_EL2, <Xt> ; Write Xt to SP_EL2

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     110    0100    0001    000


C4.3.14 SP_EL3, Stack Pointer (EL3)
The SP_EL3 characteristics are:
Purpose
Holds the stack pointer for EL3 if SPSel.SP is 1 (the stack pointer selected is SP_ELx).
This register is part of the Special purpose registers functional group.
Usage constraints
Accessing this register depends on which field is being accessed; see the register field descriptions
for the states that they are accessible in.
This register is only accessible at EL3 as the current stack pointer when SPSel.SP is 1.
Configurations
There are no configuration notes.
Attributes
SP_EL3 is a 64-bit register.
The SP_EL3 bit assignments are:

Bits       Field
[63:0]     Stack pointer

Bits [63:0]
Stack pointer.


C4.3.15 SPSel, Stack Pointer Select
The SPSel characteristics are:
Purpose
Allows the Stack Pointer to be selected between SP_EL0 and SP_ELx.
This register is part of the Process state registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      RW          RW         RW     RW                RW

Configurations
There are no configuration notes.
Attributes
SPSel is a 32-bit register.
The SPSel bit assignments are:

Bits       Field
[31:1]     RES0
[0]        SP

Bits [31:1]
Reserved, RES0.
SP, bit [0]
Stack pointer to use. Possible values of this bit are:
0    Use SP_EL0 at all exception levels.
1    Use SP_ELx for exception level ELx.

Resets to 1.
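The selection rule can be summarized in a short sketch. It is a simplified model that assumes, as the SP_EL0 description states, that EL0 always uses SP_EL0 as its current stack pointer. The function name current_stack_pointer is hypothetical.

```python
def current_stack_pointer(spsel_sp, el):
    """Name the stack pointer register in use for a given SPSel.SP value and EL."""
    if el not in (0, 1, 2, 3):
        raise ValueError("exception level must be 0, 1, 2, or 3")
    if spsel_sp not in (0, 1):
        raise ValueError("SPSel.SP is a single bit")
    # EL0 only ever uses SP_EL0; higher levels use it when SPSel.SP is 0.
    if el == 0 or spsel_sp == 0:
        return "SP_EL0"
    # SPSel.SP == 1: use the dedicated stack pointer of the current level.
    return "SP_EL{}".format(el)


for level in range(4):
    print(level, current_stack_pointer(0, level), current_stack_pointer(1, level))
```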

Accessing the SPSel:
To access the SPSel:
MRS <Xt>, SPSel ; Read SPSel into Xt
MSR SPSel, <Xt> ; Write Xt to SPSel

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     000    0100    0010    000


C4.3.16 SPSR_abt, Saved Program Status Register (Abort mode)
The SPSR_abt characteristics are:
Purpose
Holds the saved processor state when an exception is taken to Abort mode.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          RW     RW                RW

Configurations
SPSR_abt is architecturally mapped to AArch32 register SPSR_abt.
If EL1 does not support execution in AArch32, this register is RES0.
Attributes
SPSR_abt is a 32-bit register.
The SPSR_abt bit assignments are:

Bits       Field
[31]       N
[30]       Z
[29]       C
[28]       V
[27]       Q
[26:25]    IT[1:0]
[24]       J
[23:21]    RES0
[20]       IL
[19:16]    GE
[15:10]    IT[7:2]
[9]        E
[8]        A
[7]        I
[6]        F
[5]        T
[4]        M[4]
[3:0]      M[3:0]

N, bit [31]
Set to the value of CPSR.N on taking an exception to Abort mode, and copied to CPSR.N on
executing an exception return operation in Abort mode.
Z, bit [30]
Set to the value of CPSR.Z on taking an exception to Abort mode, and copied to CPSR.Z on
executing an exception return operation in Abort mode.
C, bit [29]
Set to the value of CPSR.C on taking an exception to Abort mode, and copied to CPSR.C on
executing an exception return operation in Abort mode.
V, bit [28]
Set to the value of CPSR.V on taking an exception to Abort mode, and copied to CPSR.V on
executing an exception return operation in Abort mode.
Q, bit [27]
Cumulative saturation bit. Set to 1 to indicate that overflow or saturation occurred in some
instructions.
IT[1:0], bits [26:25]
If-Then execution state bits for the T32 IT (If-Then) instruction. See IT[7:2] for explanation of this
field.


J, bit [24]
Jazelle bit. Along with the T bit, determines the AArch32 instruction set state that the exception was
taken from. Possible values of this bit are:
0    Processor in A32 state if T is 0, or T32 state if T is 1.
1    Processor in an invalid state (Jazelle state before ARMv8) if T is 0, or T32EE state if T
     is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, this bit is RES0, so the possible values of the T bit signify either A32
or T32 state.
Bits [23:21]
Reserved, RES0.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
GE, bits [19:16]
Greater than or Equal flags, for parallel addition and subtraction.
IT[7:2], bits [15:10]
If-Then execution state bits for the T32 IT (If-Then) instruction. This field must be interpreted in
two parts.
•    IT[7:5] holds the base condition for the IT block. The base condition is the top 3 bits of the
     condition code specified by the first condition field of the IT instruction.
•    IT[4:0] encodes the size of the IT block, which is the number of instructions that are to be
     conditionally executed, by the position of the least significant 1 in this field. It also encodes
     the value of the least significant bit of the condition code for each instruction in the block.

The IT field is 0b00000000 when no IT block is active.
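Because the IT field is stored in two separate parts of the register, software that inspects a saved value has to reassemble it first. The following sketch shows one way to do this and to split the result into the two parts described above. The function names it_field and it_block_info are hypothetical, and the sketch reports only the position of the least significant 1 rather than interpreting it further.

```python
def it_field(spsr):
    """Reassemble IT[7:0] from SPSR bits [15:10] (IT[7:2]) and [26:25] (IT[1:0])."""
    it_hi = (spsr >> 10) & 0x3F
    it_lo = (spsr >> 25) & 0x3
    return (it_hi << 2) | it_lo


def it_block_info(spsr):
    """Return the base condition and size encoding, or None if no IT block is active."""
    it = it_field(spsr)
    if it == 0:
        return None
    size_bits = it & 0x1F                       # IT[4:0]
    info = {"base_cond": it >> 5, "lowest_one": None}
    if size_bits:
        # Position of the least significant 1 in IT[4:0].
        info["lowest_one"] = (size_bits & -size_bits).bit_length() - 1
    return info


# IT[7:0] = 0b00000101: IT[7:2] = 0b000001 at bits [15:10], IT[1:0] = 0b01 at bits [26:25]
example = (0b000001 << 10) | (0b01 << 25)
print(bin(it_field(example)), it_block_info(example))
```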
E, bit [9]
Endianness execution state bit. Controls the load and store endianness for data accesses:
0    Little-endian operation.
1    Big-endian operation.

Instruction fetches ignore this bit.
When the reset value of the SCTLR.EE bit is defined by a configuration input signal, that value also
applies to the CPSR.E bit on reset, and therefore applies to software execution from reset.
If an implementation does not provide Big-endian support, this bit is RES0. If it does not provide
Little-endian support, this bit is RES1.
If an implementation provides Big-endian support but only at EL0, this bit is RES0 for an exception
return to any exception level other than EL0.
Likewise, if it provides Little-endian support only at EL0, this bit is RES1 for an exception return
to any exception level other than EL0.
A, bit [8]
Asynchronous data abort mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.


I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

T, bit [5]
Thumb execution state bit. Along with the J bit, determines the AArch32 instruction set state that
the exception was taken from. Possible values of this bit are:
0    Processor in A32 state if J is 0, or an invalid state (Jazelle state before ARMv8) if J is 1.
1    Processor in T32 state if J is 0, or T32EE state if J is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, the J bit is RES0, so the possible values of this bit signify either A32
or T32 state.
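The combined meaning of the J and T bits can be written as a small lookup. This is only a sketch of the four combinations described for these two bits; the names INSTRUCTION_SET_STATE and aarch32_instruction_set are hypothetical, and the invalid combination is represented here as None.

```python
# Keys are (J, T), taken from SPSR bits [24] and [5].
INSTRUCTION_SET_STATE = {
    (0, 0): "A32",
    (0, 1): "T32",
    (1, 1): "T32EE",
    (1, 0): None,  # invalid in ARMv8 (Jazelle state before ARMv8)
}


def aarch32_instruction_set(spsr):
    """Return the AArch32 instruction set state recorded in a saved SPSR value."""
    j = (spsr >> 24) & 1
    t = (spsr >> 5) & 1
    return INSTRUCTION_SET_STATE[(j, t)]


print(aarch32_instruction_set(0x00000020))  # T set, J clear
```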
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
1    Exception taken from AArch32.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch32, the possible values
are:
M[3:0]    Mode
0b0000    User
0b0001    FIQ
0b0010    IRQ
0b0011    Supervisor
0b0110    Monitor
0b0111    Abort
0b1010    Hyp
0b1011    Undefined
0b1111    System

Other values are reserved.
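The M[3:0] encodings listed for SPSR_abt can be applied to a saved value as follows. This is an illustrative lookup only: the names AARCH32_MODES and aarch32_mode are hypothetical, and reserved encodings are reported as None.

```python
# M[3:0] encodings listed for SPSR_abt.
AARCH32_MODES = {
    0b0000: "User",
    0b0001: "FIQ",
    0b0010: "IRQ",
    0b0011: "Supervisor",
    0b0110: "Monitor",
    0b0111: "Abort",
    0b1010: "Hyp",
    0b1011: "Undefined",
    0b1111: "System",
}


def aarch32_mode(spsr):
    """Return the AArch32 mode name from M[3:0], or None for a reserved value."""
    if (spsr >> 4) & 1 != 1:
        raise ValueError("M[4] is 0: exception was not taken from AArch32")
    return AARCH32_MODES.get(spsr & 0xF)


print(aarch32_mode(0b10111))  # M[4] = 1, M[3:0] = 0b0111
```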

Accessing the SPSR_abt:
To access the SPSR_abt:
MRS <Xt>, SPSR_abt ; Read SPSR_abt into Xt
MSR SPSR_abt, <Xt> ; Write Xt to SPSR_abt


Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     100    0100    0011    001


C4.3.17 SPSR_EL1, Saved Program Status Register (EL1)
The SPSR_EL1 characteristics are:
Purpose
Holds the saved processor state when an exception is taken to EL1.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      RW          RW         RW     RW                RW

Configurations
SPSR_EL1 is architecturally mapped to AArch32 register SPSR_svc.
Attributes
SPSR_EL1 is a 32-bit register.
The SPSR_EL1 bit assignments are:

When exception taken from AArch32:

Bits       Field
[31]       N
[30]       Z
[29]       C
[28]       V
[27]       Q
[26:25]    IT[1:0]
[24]       J
[23:21]    RES0
[20]       IL
[19:16]    GE
[15:10]    IT[7:2]
[9]        E
[8]        A
[7]        I
[6]        F
[5]        T
[4]        M[4]
[3:0]      M[3:0]

N, bit [31]
Set to the value of CPSR.N on taking an exception to Supervisor mode, and copied to CPSR.N on
executing an exception return operation in Supervisor mode.
Z, bit [30]
Set to the value of CPSR.Z on taking an exception to Supervisor mode, and copied to CPSR.Z on
executing an exception return operation in Supervisor mode.
C, bit [29]
Set to the value of CPSR.C on taking an exception to Supervisor mode, and copied to CPSR.C on
executing an exception return operation in Supervisor mode.
V, bit [28]
Set to the value of CPSR.V on taking an exception to Supervisor mode, and copied to CPSR.V on
executing an exception return operation in Supervisor mode.
Q, bit [27]
Cumulative saturation bit. Set to 1 to indicate that overflow or saturation occurred in some
instructions.
IT[1:0], bits [26:25]
If-Then execution state bits for the T32 IT (If-Then) instruction. See IT[7:2] for explanation of this
field.


J, bit [24]
Jazelle bit. Along with the T bit, determines the AArch32 instruction set state that the exception was
taken from. Possible values of this bit are:
0    Processor in A32 state if T is 0, or T32 state if T is 1.
1    Processor in an invalid state (Jazelle state before ARMv8) if T is 0, or T32EE state if T
     is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, this bit is RES0, so the possible values of the T bit signify either A32
or T32 state.
Bits [23:21]
Reserved, RES0.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
GE, bits [19:16]
Greater than or Equal flags, for parallel addition and subtraction.
IT[7:2], bits [15:10]
If-Then execution state bits for the T32 IT (If-Then) instruction. This field must be interpreted in
two parts.
•    IT[7:5] holds the base condition for the IT block. The base condition is the top 3 bits of the
     condition code specified by the first condition field of the IT instruction.
•    IT[4:0] encodes the size of the IT block, which is the number of instructions that are to be
     conditionally executed, by the position of the least significant 1 in this field. It also encodes
     the value of the least significant bit of the condition code for each instruction in the block.

The IT field is 0b00000000 when no IT block is active.
E, bit [9]
Endianness execution state bit. Controls the load and store endianness for data accesses:
0    Little-endian operation.
1    Big-endian operation.

Instruction fetches ignore this bit.
When the reset value of the SCTLR.EE bit is defined by a configuration input signal, that value also
applies to the CPSR.E bit on reset, and therefore applies to software execution from reset.
If an implementation does not provide Big-endian support, this bit is RES0. If it does not provide
Little-endian support, this bit is RES1.
If an implementation provides Big-endian support but only at EL0, this bit is RES0 for an exception
return to any exception level other than EL0.
Likewise, if it provides Little-endian support only at EL0, this bit is RES1 for an exception return
to any exception level other than EL0.
A, bit [8]
Asynchronous data abort mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.


I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

T, bit [5]
Thumb execution state bit. Along with the J bit, determines the AArch32 instruction set state that
the exception was taken from. Possible values of this bit are:
0    Processor in A32 state if J is 0, or an invalid state (Jazelle state before ARMv8) if J is 1.
1    Processor in T32 state if J is 0, or T32EE state if J is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, the J bit is RES0, so the possible values of this bit signify either A32
or T32 state.
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
1    Exception taken from AArch32.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch32, the possible values
are:
M[3:0]    Mode
0b0000    User
0b0001    FIQ
0b0010    IRQ
0b0011    Supervisor
0b0111    Abort
0b1011    Undefined
0b1111    System

Other values are reserved.


When exception taken from AArch64:

Bits       Field
[31]       N
[30]       Z
[29]       C
[28]       V
[27:22]    RES0
[21]       SS
[20]       IL
[19:10]    RES0
[9]        D
[8]        A
[7]        I
[6]        F
[5]        RES0
[4]        M[4]
[3:0]      M[3:0]

N, bit [31]
Set to the value of the N condition flag on taking an exception to EL1, and copied to the N condition
flag on executing an exception return operation in EL1.
Z, bit [30]
Set to the value of the Z condition flag on taking an exception to EL1, and copied to the Z condition
flag on executing an exception return operation in EL1.
C, bit [29]
Set to the value of the C condition flag on taking an exception to EL1, and copied to the C condition
flag on executing an exception return operation in EL1.
V, bit [28]
Set to the value of the V condition flag on taking an exception to EL1, and copied to the V condition
flag on executing an exception return operation in EL1.
Bits [27:22]
Reserved, RES0.
SS, bit [21]
Software step. Indicates whether software step was enabled when an exception was taken.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
Bits [19:10]
Reserved, RES0.
D, bit [9]
Process state D mask. The possible values of this bit are:
0    Debug exceptions from Watchpoint, Breakpoint, and Software step debug events
     targeted at the current exception level are not masked.
1    Debug exceptions from Watchpoint, Breakpoint, and Software step debug events
     targeted at the current exception level are masked.

When the target exception level of the debug exception is higher than the current exception level, the
exception is not masked by this bit.
A, bit [8]
SError (System Error) mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.


I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

Bit [5]
Reserved, RES0.
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
0    Exception taken from AArch64.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch64, the possible values
are:
M[3:0]    Mode
0b0000    EL0t
0b0100    EL1t
0b0101    EL1h

Other values are reserved.
For exceptions from AArch64:
•    M[3:2] holds the Exception Level.
•    M[1] is unused, and returning to an exception level that is using AArch64 with this bit set is
     treated as an illegal exception return.
•    M[0] is used to select the SP:
     -    0 means the SP is always SP0.
     -    1 means the exception SP is determined by the EL.
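The way M[3:0] breaks down for exceptions taken from AArch64 can be sketched as follows. The function name decode_aarch64_mode and its max_el parameter are hypothetical: max_el stands for the highest exception level that the register being decoded can record (1 for SPSR_EL1), and any encoding not listed in the table is reported as None.

```python
def decode_aarch64_mode(spsr, max_el=1):
    """Return a mode name such as "EL1h" from M[3:0], or None if the value is reserved."""
    if (spsr >> 4) & 1 != 0:
        raise ValueError("M[4] is 1: exception was not taken from AArch64")
    m = spsr & 0xF
    el = m >> 2            # M[3:2]: exception level
    unused = (m >> 1) & 1  # M[1]: must be 0
    sp_sel = m & 1         # M[0]: 0 selects SP_EL0 (t), 1 selects SP_ELx (h)
    if unused or el > max_el:
        return None
    if el == 0 and sp_sel:
        return None        # only EL0t is listed for EL0
    return "EL{}{}".format(el, "h" if sp_sel else "t")


for value in (0b0000, 0b0100, 0b0101):
    print(bin(value), decode_aarch64_mode(value))
```

Passing a larger max_el lets the same sketch decode the additional encodings listed for registers that hold state saved at higher exception levels.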

Accessing the SPSR_EL1:
To access the SPSR_EL1:
MRS <Xt>, SPSR_EL1 ; Read SPSR_EL1 into Xt
MSR SPSR_EL1, <Xt> ; Write Xt to SPSR_EL1

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     000    0100    0000    000


C4.3.18 SPSR_EL2, Saved Program Status Register (EL2)
The SPSR_EL2 characteristics are:
Purpose
Holds the saved processor state when an exception is taken to EL2.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          RW     RW                RW

Configurations
SPSR_EL2 is architecturally mapped to AArch32 register SPSR_hyp.
Attributes
SPSR_EL2 is a 32-bit register.
The SPSR_EL2 bit assignments are:

When exception taken from AArch32:

Bits       Field
[31]       N
[30]       Z
[29]       C
[28]       V
[27]       Q
[26:25]    IT[1:0]
[24]       J
[23:21]    RES0
[20]       IL
[19:16]    GE
[15:10]    IT[7:2]
[9]        E
[8]        A
[7]        I
[6]        F
[5]        T
[4]        M[4]
[3:0]      M[3:0]

N, bit [31]
Set to the value of CPSR.N on taking an exception to Hyp mode, and copied to CPSR.N on
executing an exception return operation in Hyp mode.
Z, bit [30]
Set to the value of CPSR.Z on taking an exception to Hyp mode, and copied to CPSR.Z on executing
an exception return operation in Hyp mode.
C, bit [29]
Set to the value of CPSR.C on taking an exception to Hyp mode, and copied to CPSR.C on
executing an exception return operation in Hyp mode.
V, bit [28]
Set to the value of CPSR.V on taking an exception to Hyp mode, and copied to CPSR.V on
executing an exception return operation in Hyp mode.
Q, bit [27]
Cumulative saturation bit. Set to 1 to indicate that overflow or saturation occurred in some
instructions.
IT[1:0], bits [26:25]
If-Then execution state bits for the T32 IT (If-Then) instruction. See IT[7:2] for explanation of this
field.


J, bit [24]
Jazelle bit. Along with the T bit, determines the AArch32 instruction set state that the exception was
taken from. Possible values of this bit are:
0    Processor in A32 state if T is 0, or T32 state if T is 1.
1    Processor in an invalid state (Jazelle state before ARMv8) if T is 0, or T32EE state if T
     is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, this bit is RES0, so the possible values of the T bit signify either A32
or T32 state.
Bits [23:21]
Reserved, RES0.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
GE, bits [19:16]
Greater than or Equal flags, for parallel addition and subtraction.
IT[7:2], bits [15:10]
If-Then execution state bits for the T32 IT (If-Then) instruction. This field must be interpreted in
two parts.
•    IT[7:5] holds the base condition for the IT block. The base condition is the top 3 bits of the
     condition code specified by the first condition field of the IT instruction.
•    IT[4:0] encodes the size of the IT block, which is the number of instructions that are to be
     conditionally executed, by the position of the least significant 1 in this field. It also encodes
     the value of the least significant bit of the condition code for each instruction in the block.

The IT field is 0b00000000 when no IT block is active.
E, bit [9]
Endianness execution state bit. Controls the load and store endianness for data accesses:
0    Little-endian operation.
1    Big-endian operation.

Instruction fetches ignore this bit.
When the reset value of the SCTLR.EE bit is defined by a configuration input signal, that value also
applies to the CPSR.E bit on reset, and therefore applies to software execution from reset.
If an implementation does not provide Big-endian support, this bit is RES0. If it does not provide
Little-endian support, this bit is RES1.
If an implementation provides Big-endian support but only at EL0, this bit is RES0 for an exception
return to any exception level other than EL0.
Likewise, if it provides Little-endian support only at EL0, this bit is RES1 for an exception return
to any exception level other than EL0.
A, bit [8]
Asynchronous data abort mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.


I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

T, bit [5]
Thumb execution state bit. Along with the J bit, determines the AArch32 instruction set state that
the exception was taken from. Possible values of this bit are:
0    Processor in A32 state if J is 0, or an invalid state (Jazelle state before ARMv8) if J is 1.
1    Processor in T32 state if J is 0, or T32EE state if J is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, the J bit is RES0, so the possible values of this bit signify either A32
or T32 state.
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
1    Exception taken from AArch32.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch32, the possible values
are:
M[3:0]    Mode
0b0000    User
0b0001    FIQ
0b0010    IRQ
0b0011    Supervisor
0b0111    Abort
0b1010    Hyp
0b1011    Undefined
0b1111    System

Other values are reserved.


When exception taken from AArch64:

Bits       Field
[31]       N
[30]       Z
[29]       C
[28]       V
[27:22]    RES0
[21]       SS
[20]       IL
[19:10]    RES0
[9]        D
[8]        A
[7]        I
[6]        F
[5]        RES0
[4]        M[4]
[3:0]      M[3:0]

N, bit [31]
Set to the value of the N condition flag on taking an exception to EL2, and copied to the N condition
flag on executing an exception return operation in EL2.
Z, bit [30]
Set to the value of the Z condition flag on taking an exception to EL2, and copied to the Z condition
flag on executing an exception return operation in EL2.
C, bit [29]
Set to the value of the C condition flag on taking an exception to EL2, and copied to the C condition
flag on executing an exception return operation in EL2.
V, bit [28]
Set to the value of the V condition flag on taking an exception to EL2, and copied to the V condition
flag on executing an exception return operation in EL2.
Bits [27:22]
Reserved, RES0.
SS, bit [21]
Software step. Indicates whether software step was enabled when an exception was taken.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
Bits [19:10]
Reserved, RES0.
D, bit [9]
Process state D mask. The possible values of this bit are:
0    Debug exceptions from Watchpoint, Breakpoint, and Software step debug events
     targeted at the current exception level are not masked.
1    Debug exceptions from Watchpoint, Breakpoint, and Software step debug events
     targeted at the current exception level are masked.

When the target exception level of the debug exception is higher than the current exception level, the
exception is not masked by this bit.
A, bit [8]
SError (System Error) mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.


I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

Bit [5]
Reserved, RES0.
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
0    Exception taken from AArch64.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch64, the possible values
are:
M[3:0]    Mode
0b0000    EL0t
0b0100    EL1t
0b0101    EL1h
0b1000    EL2t
0b1001    EL2h

Other values are reserved.
For exceptions from AArch64:
•    M[3:2] holds the Exception Level.
•    M[1] is unused, and returning to an exception level that is using AArch64 with this bit set is
     treated as an illegal exception return.
•    M[0] is used to select the SP:
     -    0 means the SP is always SP0.
     -    1 means the exception SP is determined by the EL.

Accessing the SPSR_EL2:
To access the SPSR_EL2:
MRS <Xt>, SPSR_EL2 ; Read SPSR_EL2 into Xt
MSR SPSR_EL2, <Xt> ; Write Xt to SPSR_EL2

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     100    0100    0000    000


C4.3.19 SPSR_EL3, Saved Program Status Register (EL3)
The SPSR_EL3 characteristics are:
Purpose
Holds the saved processor state when an exception is taken to EL3.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          -      RW                RW

Configurations
SPSR_EL3 can be mapped to AArch32 register SPSR_mon, but this is not architecturally
mandated.
Attributes
SPSR_EL3 is a 32-bit register.
The SPSR_EL3 bit assignments are:

When exception taken from AArch32:

Bits       Field
[31]       N
[30]       Z
[29]       C
[28]       V
[27]       Q
[26:25]    IT[1:0]
[24]       J
[23:21]    RES0
[20]       IL
[19:16]    GE
[15:10]    IT[7:2]
[9]        E
[8]        A
[7]        I
[6]        F
[5]        T
[4]        M[4]
[3:0]      M[3:0]

N, bit [31]
Set to the value of CPSR.N on taking an exception to Monitor mode, and copied to CPSR.N on
executing an exception return operation in Monitor mode.
Z, bit [30]
Set to the value of CPSR.Z on taking an exception to Monitor mode, and copied to CPSR.Z on
executing an exception return operation in Monitor mode.
C, bit [29]
Set to the value of CPSR.C on taking an exception to Monitor mode, and copied to CPSR.C on
executing an exception return operation in Monitor mode.
V, bit [28]
Set to the value of CPSR.V on taking an exception to Monitor mode, and copied to CPSR.V on
executing an exception return operation in Monitor mode.
Q, bit [27]
Cumulative saturation bit. Set to 1 to indicate that overflow or saturation occurred in some
instructions.


IT[1:0], bits [26:25]
If-Then execution state bits for the T32 IT (If-Then) instruction. See IT[7:2] for explanation of this
field.
J, bit [24]
Jazelle bit. Along with the T bit, determines the AArch32 instruction set state that the exception was
taken from. Possible values of this bit are:
0    Processor in A32 state if T is 0, or T32 state if T is 1.
1    Processor in an invalid state (Jazelle state before ARMv8) if T is 0, or T32EE state if T
     is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, this bit is RES0, so the possible values of the T bit signify either A32
or T32 state.
Bits [23:21]
Reserved, RES0.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
GE, bits [19:16]
Greater than or Equal flags, for parallel addition and subtraction.
IT[7:2], bits [15:10]
If-Then execution state bits for the T32 IT (If-Then) instruction. This field must be interpreted in
two parts.
•   IT[7:5] holds the base condition for the IT block. The base condition is the top 3 bits of the
    condition code specified by the first condition field of the IT instruction.
•   IT[4:0] encodes the size of the IT block, which is the number of instructions that are to be
    conditionally executed, by the position of the least significant 1 in this field. It also encodes
    the value of the least significant bit of the condition code for each instruction in the block.

The IT field is 0b00000000 when no IT block is active.
E, bit [9]
Endianness execution state bit. Controls the load and store endianness for data accesses:
0    Little-endian operation.
1    Big-endian operation.

Instruction fetches ignore this bit.
When the reset value of the SCTLR.EE bit is defined by a configuration input signal, that value also
applies to the CPSR.E bit on reset, and therefore applies to software execution from reset.
If an implementation does not provide Big-endian support, this bit is RES0. If it does not provide
Little-endian support, this bit is RES1.
If an implementation provides Big-endian support but only at EL0, this bit is RES0 for an exception
return to any exception level other than EL0.
Likewise, if it provides Little-endian support only at EL0, this bit is RES1 for an exception return
to any exception level other than EL0.
A, bit [8]
Asynchronous data abort mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.
I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

T, bit [5]
Thumb execution state bit. Along with the J bit, determines the AArch32 instruction set state that
the exception was taken from. Possible values of this bit are:
0    Processor in A32 state if J is 0, or an invalid state (Jazelle state before ARMv8) if J is 1.
1    Processor in T32 state if J is 0, or T32EE state if J is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, the J bit is RES0, so the possible values of this bit signify either A32
or T32 state.
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
1    Exception taken from AArch32.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch32, the possible values
are:
M[3:0]    Mode
0b0000    User
0b0001    FIQ
0b0010    IRQ
0b0011    Supervisor
0b0110    Monitor
0b0111    Abort
0b1010    Hyp
0b1011    Undefined
0b1111    System

Other values are reserved.
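
The following Python sketch is an illustration only and is not part of the architecture definition. It shows how software might summarise an SPSR value that uses the AArch32 layout described above, combining the J and T bits into an instruction set state and looking up M[3:0] in the mode table. The names decode_spsr_aarch32, AARCH32_MODES, and ISA_STATES are hypothetical.

```python
# Mode names keyed by M[3:0], copied from the table above (valid when M[4] == 1).
AARCH32_MODES = {
    0b0000: "User", 0b0001: "FIQ", 0b0010: "IRQ", 0b0011: "Supervisor",
    0b0110: "Monitor", 0b0111: "Abort", 0b1010: "Hyp",
    0b1011: "Undefined", 0b1111: "System",
}

# Instruction set state keyed by (J, T), as described for the J and T bits.
ISA_STATES = {(0, 0): "A32", (0, 1): "T32", (1, 0): "invalid", (1, 1): "T32EE"}


def decode_spsr_aarch32(spsr):
    """Hypothetical helper: summarise an SPSR value saved from AArch32."""
    if (spsr >> 4) & 1 != 1:
        raise ValueError("M[4] is 0, so this is not the AArch32 layout")
    j = (spsr >> 24) & 1
    t = (spsr >> 5) & 1
    return {
        "isa": ISA_STATES[(j, t)],
        "mode": AARCH32_MODES.get(spsr & 0xF, "reserved"),
        "big_endian": bool((spsr >> 9) & 1),
        # A, I, F mask bits: True means the exception is masked.
        "masked": {name: bool((spsr >> bit) & 1)
                   for name, bit in (("A", 8), ("I", 7), ("F", 6))},
    }
```

For example, a value with bits [8:6] and M[4] set and M[3:0] equal to 0b0011 decodes as Supervisor mode in A32 state with A, I, and F masked.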


When exception taken from AArch64:

31  30  29  28  27:22  21  20  19:10  9  8  7  6  5     4     3:0
N   Z   C   V   RES0   SS  IL  RES0   D  A  I  F  RES0  M[4]  M[3:0]

N, bit [31]
Set to the value of the N condition flag on taking an exception to EL3, and copied to the N condition
flag on executing an exception return operation in EL3.
Z, bit [30]
Set to the value of the Z condition flag on taking an exception to EL3, and copied to the Z condition
flag on executing an exception return operation in EL3.
C, bit [29]
Set to the value of the C condition flag on taking an exception to EL3, and copied to the C condition
flag on executing an exception return operation in EL3.
V, bit [28]
Set to the value of the V condition flag on taking an exception to EL3, and copied to the V condition
flag on executing an exception return operation in EL3.
Bits [27:22]
Reserved, RES0.
SS, bit [21]
Software step. Indicates whether software step was enabled when an exception was taken.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
Bits [19:10]
Reserved, RES0.
D, bit [9]
Process state D mask. The possible values of this bit are:
0    Debug exceptions from Watchpoint, Breakpoint, and Software step debug events
     targeted at the current exception level are not masked.
1    Debug exceptions from Watchpoint, Breakpoint, and Software step debug events
     targeted at the current exception level are masked.

When the target exception level of the debug exception is higher than the current exception level, the
exception is not masked by this bit.
A, bit [8]
SError (System Error) mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.
I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

Bit [5]
Reserved, RES0.
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
0    Exception taken from AArch64.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch64, the possible values
are:
M[3:0]    Mode
0b0000    EL0t
0b0100    EL1t
0b0101    EL1h
0b1000    EL2t
0b1001    EL2h
0b1100    EL3t
0b1101    EL3h

Other values are reserved.
For exceptions from AArch64:
•   M[3:2] holds the Exception Level.
•   M[1] is unused, and returning to an exception level that is using AArch64 with this bit set is
    treated as an illegal exception return.
•   M[0] is used to select the SP:
        0 means the SP is always SP0.
        1 means the exception SP is determined by the EL.
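
The split of M[3:0] into an Exception Level and an SP selector can be sketched in Python as follows. This is an illustration only, under the assumption that any value not listed in the table above is treated as reserved. The function name decode_aarch64_mode is hypothetical.

```python
def decode_aarch64_mode(spsr):
    """Hypothetical helper: interpret M[4:0] of an SPSR saved from AArch64.

    Returns a name such as "EL1h", or None for a reserved encoding.
    """
    if (spsr >> 4) & 1:
        raise ValueError("M[4] is 1, so the exception was taken from AArch32")
    m = spsr & 0xF
    if m & 0b0010:
        # M[1] set: returning with this value is an illegal exception return.
        return None
    el = m >> 2        # M[3:2] holds the Exception Level
    sp_sel = m & 1     # M[0]: 0 selects SP0, 1 selects the SP of that EL
    if el == 0 and sp_sel:
        return None    # 0b0001 is not listed in the table, so it is reserved
    return "EL%d%s" % (el, "h" if sp_sel else "t")
```

The "t" and "h" suffixes follow the mode names in the table: "t" when M[0] is 0 and "h" when M[0] is 1.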

Accessing the SPSR_EL3:
To access the SPSR_EL3:
MRS <Xt>, SPSR_EL3 ; Read SPSR_EL3 into Xt
MSR SPSR_EL3, <Xt> ; Write Xt to SPSR_EL3

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     110    0100    0000    000
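
As an illustration of how the five encoding fields select a register, the following Python sketch packs them into a 32-bit MRS or MSR (register) instruction word. It assumes the general A64 system instruction layout, with a fixed value in bits [31:22], L in bit [21], op0 in bits [20:19], op1 in bits [18:16], CRn in bits [15:12], CRm in bits [11:8], op2 in bits [7:5], and Rt in bits [4:0]. That layout is described elsewhere, not in this section. The function name encode_sysreg_access is hypothetical.

```python
def encode_sysreg_access(op0, op1, crn, crm, op2, rt, read):
    """Hypothetical helper: build the instruction word for MRS (read) or MSR (write).

    Field positions are an assumption based on the general A64 system
    instruction layout, not on this section.
    """
    for value, width in ((op0, 2), (op1, 3), (crn, 4), (crm, 4), (op2, 3), (rt, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit in %d bits" % width)
    l_bit = 1 if read else 0
    return (0xD5000000 | (l_bit << 21) | (op0 << 19) | (op1 << 16)
            | (crn << 12) | (crm << 8) | (op2 << 5) | rt)


# SPSR_EL3 uses op0=0b11, op1=0b110, CRn=0b0100, CRm=0b0000, op2=0b000.
mrs_x0_spsr_el3 = encode_sysreg_access(0b11, 0b110, 0b0100, 0b0000, 0b000, 0, read=True)
```

Changing only op1 to 0b100 gives the corresponding access to SPSR_EL2.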


C4.3.20   SPSR_fiq, Saved Program Status Register (FIQ mode)
The SPSR_fiq characteristics are:
Purpose
Holds the saved processor state when an exception is taken to FIQ mode.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          RW     RW                RW

Configurations
SPSR_fiq is architecturally mapped to AArch32 register SPSR_fiq.
If EL1 does not support execution in AArch32, this register is RES0.
Attributes
SPSR_fiq is a 32-bit register.
The SPSR_fiq bit assignments are:

31  30  29  28  27  26:25    24  23:21  20  19:16  15:10    9  8  7  6  5  4     3:0
N   Z   C   V   Q   IT[1:0]  J   RES0   IL  GE     IT[7:2]  E  A  I  F  T  M[4]  M[3:0]

N, bit [31]
Set to the value of CPSR.N on taking an exception to FIQ mode, and copied to CPSR.N on
executing an exception return operation in FIQ mode.
Z, bit [30]
Set to the value of CPSR.Z on taking an exception to FIQ mode, and copied to CPSR.Z on executing
an exception return operation in FIQ mode.
C, bit [29]
Set to the value of CPSR.C on taking an exception to FIQ mode, and copied to CPSR.C on executing
an exception return operation in FIQ mode.
V, bit [28]
Set to the value of CPSR.V on taking an exception to FIQ mode, and copied to CPSR.V on
executing an exception return operation in FIQ mode.
Q, bit [27]
Cumulative saturation bit. Set to 1 to indicate that overflow or saturation occurred in some
instructions.
IT[1:0], bits [26:25]
If-Then execution state bits for the T32 IT (If-Then) instruction. See IT[7:2] for explanation of this
field.


J, bit [24]
Jazelle bit. Along with the T bit, determines the AArch32 instruction set state that the exception was
taken from. Possible values of this bit are:
0    Processor in A32 state if T is 0, or T32 state if T is 1.
1    Processor in an invalid state (Jazelle state before ARMv8) if T is 0, or T32EE state if T
     is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, this bit is RES0, so the possible values of the T bit signify either A32
or T32 state.
Bits [23:21]
Reserved, RES0.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
GE, bits [19:16]
Greater than or Equal flags, for parallel addition and subtraction.
IT[7:2], bits [15:10]
If-Then execution state bits for the T32 IT (If-Then) instruction. This field must be interpreted in
two parts.
•   IT[7:5] holds the base condition for the IT block. The base condition is the top 3 bits of the
    condition code specified by the first condition field of the IT instruction.
•   IT[4:0] encodes the size of the IT block, which is the number of instructions that are to be
    conditionally executed, by the position of the least significant 1 in this field. It also encodes
    the value of the least significant bit of the condition code for each instruction in the block.

The IT field is 0b00000000 when no IT block is active.
E, bit [9]
Endianness execution state bit. Controls the load and store endianness for data accesses:
0    Little-endian operation.
1    Big-endian operation.

Instruction fetches ignore this bit.
When the reset value of the SCTLR.EE bit is defined by a configuration input signal, that value also
applies to the CPSR.E bit on reset, and therefore applies to software execution from reset.
If an implementation does not provide Big-endian support, this bit is RES0. If it does not provide
Little-endian support, this bit is RES1.
If an implementation provides Big-endian support but only at EL0, this bit is RES0 for an exception
return to any exception level other than EL0.
Likewise, if it provides Little-endian support only at EL0, this bit is RES1 for an exception return
to any exception level other than EL0.
A, bit [8]
Asynchronous data abort mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.
I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

T, bit [5]
Thumb execution state bit. Along with the J bit, determines the AArch32 instruction set state that
the exception was taken from. Possible values of this bit are:
0    Processor in A32 state if J is 0, or an invalid state (Jazelle state before ARMv8) if J is 1.
1    Processor in T32 state if J is 0, or T32EE state if J is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, the J bit is RES0, so the possible values of this bit signify either A32
or T32 state.
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
1    Exception taken from AArch32.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch32, the possible values
are:
M[3:0]    Mode
0b0000    User
0b0001    FIQ
0b0010    IRQ
0b0011    Supervisor
0b0110    Monitor
0b0111    Abort
0b1010    Hyp
0b1011    Undefined
0b1111    System

Other values are reserved.
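
Because the IT field is stored in two separate parts of the register, software that inspects it must first reassemble IT[7:0]. The following Python sketch is an illustration only. It assumes that the condition for the next instruction is IT[7:4], and that the number of instructions left in the block is 4 minus the bit position of the least significant 1 in IT[4:0]. Both are an interpretation of the field description above rather than a statement taken from it. The function names it_field and describe_it are hypothetical.

```python
def it_field(spsr):
    """Reassemble IT[7:0] from SPSR bits [15:10] (IT[7:2]) and [26:25] (IT[1:0])."""
    return (((spsr >> 10) & 0x3F) << 2) | ((spsr >> 25) & 0x3)


def describe_it(spsr):
    """Hypothetical helper: return (condition, instructions_left), or None
    when no IT block is active."""
    it = it_field(spsr)
    if it == 0:
        return None                           # 0b00000000: no IT block is active
    low = it & 0x1F                           # IT[4:0]
    lsb_pos = (low & -low).bit_length() - 1   # position of the least significant 1
    if not 0 <= lsb_pos <= 3:
        raise ValueError("IT[4:0] is not a valid block-size encoding "
                         "under the interpretation assumed here")
    condition = it >> 4                       # assumed: IT[7:5] plus IT[4] as bit 0
    return condition, 4 - lsb_pos
```

Under these assumptions, an IT value of 0b00001100 describes a block with base condition 0b000 and two instructions still to execute.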

Accessing the SPSR_fiq:
To access the SPSR_fiq:
MRS <Xt>, SPSR_fiq ; Read SPSR_fiq into Xt
MSR SPSR_fiq, <Xt> ; Write Xt to SPSR_fiq

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     100    0100    0011    011


C4.3.21   SPSR_irq, Saved Program Status Register (IRQ mode)
The SPSR_irq characteristics are:
Purpose
Holds the saved processor state when an exception is taken to IRQ mode.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          RW     RW                RW

Configurations
SPSR_irq is architecturally mapped to AArch32 register SPSR_irq.
If EL1 does not support execution in AArch32, this register is RES0.
Attributes
SPSR_irq is a 32-bit register.
The SPSR_irq bit assignments are:

31  30  29  28  27  26:25    24  23:21  20  19:16  15:10    9  8  7  6  5  4     3:0
N   Z   C   V   Q   IT[1:0]  J   RES0   IL  GE     IT[7:2]  E  A  I  F  T  M[4]  M[3:0]

N, bit [31]
Set to the value of CPSR.N on taking an exception to IRQ mode, and copied to CPSR.N on
executing an exception return operation in IRQ mode.
Z, bit [30]
Set to the value of CPSR.Z on taking an exception to IRQ mode, and copied to CPSR.Z on executing
an exception return operation in IRQ mode.
C, bit [29]
Set to the value of CPSR.C on taking an exception to IRQ mode, and copied to CPSR.C on
executing an exception return operation in IRQ mode.
V, bit [28]
Set to the value of CPSR.V on taking an exception to IRQ mode, and copied to CPSR.V on
executing an exception return operation in IRQ mode.
Q, bit [27]
Cumulative saturation bit. Set to 1 to indicate that overflow or saturation occurred in some
instructions.
IT[1:0], bits [26:25]
If-Then execution state bits for the T32 IT (If-Then) instruction. See IT[7:2] for explanation of this
field.


J, bit [24]
Jazelle bit. Along with the T bit, determines the AArch32 instruction set state that the exception was
taken from. Possible values of this bit are:
0    Processor in A32 state if T is 0, or T32 state if T is 1.
1    Processor in an invalid state (Jazelle state before ARMv8) if T is 0, or T32EE state if T
     is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, this bit is RES0, so the possible values of the T bit signify either A32
or T32 state.
Bits [23:21]
Reserved, RES0.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
GE, bits [19:16]
Greater than or Equal flags, for parallel addition and subtraction.
IT[7:2], bits [15:10]
If-Then execution state bits for the T32 IT (If-Then) instruction. This field must be interpreted in
two parts.
•   IT[7:5] holds the base condition for the IT block. The base condition is the top 3 bits of the
    condition code specified by the first condition field of the IT instruction.
•   IT[4:0] encodes the size of the IT block, which is the number of instructions that are to be
    conditionally executed, by the position of the least significant 1 in this field. It also encodes
    the value of the least significant bit of the condition code for each instruction in the block.

The IT field is 0b00000000 when no IT block is active.
E, bit [9]
Endianness execution state bit. Controls the load and store endianness for data accesses:
0    Little-endian operation.
1    Big-endian operation.

Instruction fetches ignore this bit.
When the reset value of the SCTLR.EE bit is defined by a configuration input signal, that value also
applies to the CPSR.E bit on reset, and therefore applies to software execution from reset.
If an implementation does not provide Big-endian support, this bit is RES0. If it does not provide
Little-endian support, this bit is RES1.
If an implementation provides Big-endian support but only at EL0, this bit is RES0 for an exception
return to any exception level other than EL0.
Likewise, if it provides Little-endian support only at EL0, this bit is RES1 for an exception return
to any exception level other than EL0.
A, bit [8]
Asynchronous data abort mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.
I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

T, bit [5]
Thumb execution state bit. Along with the J bit, determines the AArch32 instruction set state that
the exception was taken from. Possible values of this bit are:
0    Processor in A32 state if J is 0, or an invalid state (Jazelle state before ARMv8) if J is 1.
1    Processor in T32 state if J is 0, or T32EE state if J is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, the J bit is RES0, so the possible values of this bit signify either A32
or T32 state.
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
1    Exception taken from AArch32.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch32, the possible values
are:
M[3:0]    Mode
0b0000    User
0b0001    FIQ
0b0010    IRQ
0b0011    Supervisor
0b0110    Monitor
0b0111    Abort
0b1010    Hyp
0b1011    Undefined
0b1111    System

Other values are reserved.

Accessing the SPSR_irq:
To access the SPSR_irq:
MRS <Xt>, SPSR_irq ; Read SPSR_irq into Xt
MSR SPSR_irq, <Xt> ; Write Xt to SPSR_irq

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     100    0100    0011    000


C4.3.22   SPSR_und, Saved Program Status Register (Undefined mode)
The SPSR_und characteristics are:
Purpose
Holds the saved processor state when an exception is taken to Undefined mode.
This register is part of the Special purpose registers functional group.
Usage constraints
This register is accessible as shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      -           -          RW     RW                RW

Configurations
SPSR_und is architecturally mapped to AArch32 register SPSR_und.
If EL1 does not support execution in AArch32, this register is RES0.
Attributes
SPSR_und is a 32-bit register.
The SPSR_und bit assignments are:

31  30  29  28  27  26:25    24  23:21  20  19:16  15:10    9  8  7  6  5  4     3:0
N   Z   C   V   Q   IT[1:0]  J   RES0   IL  GE     IT[7:2]  E  A  I  F  T  M[4]  M[3:0]

N, bit [31]
Set to the value of CPSR.N on taking an exception to Undefined mode, and copied to CPSR.N on
executing an exception return operation in Undefined mode.
Z, bit [30]
Set to the value of CPSR.Z on taking an exception to Undefined mode, and copied to CPSR.Z on
executing an exception return operation in Undefined mode.
C, bit [29]
Set to the value of CPSR.C on taking an exception to Undefined mode, and copied to CPSR.C on
executing an exception return operation in Undefined mode.
V, bit [28]
Set to the value of CPSR.V on taking an exception to Undefined mode, and copied to CPSR.V on
executing an exception return operation in Undefined mode.
Q, bit [27]
Cumulative saturation bit. Set to 1 to indicate that overflow or saturation occurred in some
instructions.
IT[1:0], bits [26:25]
If-Then execution state bits for the T32 IT (If-Then) instruction. See IT[7:2] for explanation of this
field.


J, bit [24]
Jazelle bit. Along with the T bit, determines the AArch32 instruction set state that the exception was
taken from. Possible values of this bit are:
0    Processor in A32 state if T is 0, or T32 state if T is 1.
1    Processor in an invalid state (Jazelle state before ARMv8) if T is 0, or T32EE state if T
     is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, this bit is RES0, so the possible values of the T bit signify either A32
or T32 state.
Bits [23:21]
Reserved, RES0.
IL, bit [20]
Illegal Execution State bit. Shows the value of PSTATE.IL immediately before the exception was
taken.
GE, bits [19:16]
Greater than or Equal flags, for parallel addition and subtraction.
IT[7:2], bits [15:10]
If-Then execution state bits for the T32 IT (If-Then) instruction. This field must be interpreted in
two parts.
•   IT[7:5] holds the base condition for the IT block. The base condition is the top 3 bits of the
    condition code specified by the first condition field of the IT instruction.
•   IT[4:0] encodes the size of the IT block, which is the number of instructions that are to be
    conditionally executed, by the position of the least significant 1 in this field. It also encodes
    the value of the least significant bit of the condition code for each instruction in the block.

The IT field is 0b00000000 when no IT block is active.
E, bit [9]
Endianness execution state bit. Controls the load and store endianness for data accesses:
0    Little-endian operation.
1    Big-endian operation.

Instruction fetches ignore this bit.
When the reset value of the SCTLR.EE bit is defined by a configuration input signal, that value also
applies to the CPSR.E bit on reset, and therefore applies to software execution from reset.
If an implementation does not provide Big-endian support, this bit is RES0. If it does not provide
Little-endian support, this bit is RES1.
If an implementation provides Big-endian support but only at EL0, this bit is RES0 for an exception
return to any exception level other than EL0.
Likewise, if it provides Little-endian support only at EL0, this bit is RES1 for an exception return
to any exception level other than EL0.
A, bit [8]
Asynchronous data abort mask bit. The possible values of this bit are:

0    Exception not masked.
1    Exception masked.
I, bit [7]
IRQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.
F, bit [6]
FIQ mask bit. The possible values of this bit are:
0    Exception not masked.
1    Exception masked.

T, bit [5]
Thumb execution state bit. Along with the J bit, determines the AArch32 instruction set state that
the exception was taken from. Possible values of this bit are:
0    Processor in A32 state if J is 0, or an invalid state (Jazelle state before ARMv8) if J is 1.
1    Processor in T32 state if J is 0, or T32EE state if J is 1.

Since the Jazelle state is obsolete in ARMv8, J==1 and T==0 is an invalid combination, and
attempting to perform an exception return with those values is an illegal exception return.
If T32EE is not implemented, the J bit is RES0, so the possible values of this bit signify either A32
or T32 state.
M[4], bit [4]
Register width that the exception was taken from. Possible values of this bit are:
1    Exception taken from AArch32.

M[3:0], bits [3:0]
Mode that an exception was taken from. For exceptions taken from AArch32, the possible values
are:
M[3:0]    Mode
0b0000    User
0b0001    FIQ
0b0010    IRQ
0b0011    Supervisor
0b0110    Monitor
0b0111    Abort
0b1010    Hyp
0b1011    Undefined
0b1111    System

Other values are reserved.

Accessing the SPSR_und:
To access the SPSR_und:
MRS <Xt>, SPSR_und ; Read SPSR_und into Xt
MSR SPSR_und, <Xt> ; Write Xt to SPSR_und

Register access is encoded as follows:

op0    op1    CRn     CRm     op2
11     100    0100    0011    010


C4.4   A64 system instructions for cache maintenance
The following sections describe the cache maintenance system instructions in A64.
•   DC CISW, Data or unified Cache line Clean and Invalidate by Set/Way
•   DC CIVAC, Data or unified Cache line Clean and Invalidate by VA to PoC
•   DC CSW, Data or unified Cache line Clean by Set/Way
•   DC CVAC, Data or unified Cache line Clean by VA to PoC
•   DC CVAU, Data or unified Cache line Clean by VA to PoU
•   DC ISW, Data or unified Cache line Invalidate by Set/Way
•   DC IVAC, Data or unified Cache line Invalidate by VA to PoC
•   DC ZVA, Data Cache Zero by VA
•   IC IALLU, Instruction Cache Invalidate All to PoU
•   IC IALLUIS, Instruction Cache Invalidate All to PoU, Inner Shareable
•   IC IVAU, Instruction Cache line Invalidate by VA to PoU


C4.4.1   DC CISW, Data or unified Cache line Clean and Invalidate by Set/Way
The DC CISW characteristics are:
Purpose
Clean and Invalidate data cache by set/way.
This operation is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      WO          WO         WO     WO                WO

Configurations
DC CISW performs the same function as AArch32 operation DCCISW.
Attributes
DC CISW is a 64-bit system operation.
The DC CISW input value bit assignments are:

63:32  31:4    3:1    0
RES0   SetWay  Level  RES0

Bits [63:32]
Reserved, RES0.
SetWay, bits [31:4]
Contains two fields:
•   Way, bits[31:32-A], the number of the way to operate on.
•   Set, bits[B-1:L], the number of the set to operate on.

Bits[L-1:4] are RES0.
A = Log2(ASSOCIATIVITY), L = Log2(LINELEN), B = (L + S), S = Log2(NSETS).
ASSOCIATIVITY, LINELEN (line length, in bytes), and NSETS (number of sets) have their usual
meanings and are the values for the cache level being operated on. The values of A and S are
rounded up to the next integer.
Level, bits [3:1]
Cache level to operate on, minus 1. For example, this field is 0 for operations on L1 cache, or 1 for
operations on L2 cache.
Bit [0]
Reserved, RES0.
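
The following Python sketch shows one way to compute the 64-bit input value for a set/way operation from the definitions of A, L, S, and B given above. It is an illustration only. The function names ceil_log2 and setway_operand, and the range checks, are assumptions made for the example.

```python
def ceil_log2(n):
    """Log2(n) rounded up to the next integer, for n >= 1."""
    return (n - 1).bit_length()


def setway_operand(way, set_index, level, associativity, line_len, nsets):
    """Hypothetical helper: build the input value for a set/way operation.

    level is the cache level (1 for L1, 2 for L2); the Level field holds level - 1.
    line_len is the line length in bytes.
    """
    a = ceil_log2(associativity)   # A = Log2(ASSOCIATIVITY), rounded up
    l = ceil_log2(line_len)        # L = Log2(LINELEN)
    s = ceil_log2(nsets)           # S = Log2(NSETS), rounded up
    b = l + s                      # B = L + S
    if not (0 <= way < associativity and 0 <= set_index < nsets):
        raise ValueError("way or set out of range for this cache")
    if not 1 <= level <= 8:
        raise ValueError("Level is a 3-bit field holding level - 1")
    if b > 32 - a:
        raise ValueError("Set and Way fields would overlap")
    # Way occupies bits [31:32-A], Set occupies bits [B-1:L], Level bits [3:1].
    return (way << (32 - a)) | (set_index << l) | ((level - 1) << 1)
```

For a 4-way cache with 64-byte lines and 128 sets, A is 2 and L is 6, so the way number is placed at bit 30 and the set number at bit 6.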

Performing the DC CISW operation:
To perform the DC CISW operation:
DC CISW, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    1110    010


C4.4.2   DC CIVAC, Data or unified Cache line Clean and Invalidate by VA to PoC
The DC CIVAC characteristics are:
Purpose
Clean and Invalidate data cache by address to Point of Coherency.
This operation is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
Config-WO    WO          WO         WO     WO                WO

EL0 access is enabled when SCTLR_EL1.UCI is set to 1. When it is set to 0, this operation is
UNDEFINED at EL0.
If EL0 access is enabled, this operation is available at EL0 when the VA has read access permission,
otherwise it causes a Permission Fault.
Configurations
DC CIVAC performs the same function as AArch32 operation DCCIMVAC.
Attributes
DC CIVAC is a 64-bit system operation.
The DC CIVAC input value bit assignments are:

63:0
Virtual address to use

Bits [63:0]
Virtual address to use.

Performing the DC CIVAC operation:
To perform the DC CIVAC operation:
DC CIVAC, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     011    0111    1110    001


C4.4.3   DC CSW, Data or unified Cache line Clean by Set/Way
The DC CSW characteristics are:
Purpose
Clean data cache by set/way.
This operation is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0    EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-      WO          WO         WO     WO                WO

Configurations
DC CSW performs the same function as AArch32 operation DCCSW.
Attributes
DC CSW is a 64-bit system operation.
The DC CSW input value bit assignments are:

63:32  31:4    3:1    0
RES0   SetWay  Level  RES0

Bits [63:32]
Reserved, RES0.
SetWay, bits [31:4]
Contains two fields:
•

Way, bits[31:32-A], the number of the way to operate on.

•

Set, bits[B-1:L], the number of the set to operate on.

Bits[L-1:4] are RES0.
A = Log2(ASSOCIATIVITY), L = Log2(LINELEN), B = (L + S), S = Log2(NSETS).
ASSOCIATIVITY, LINELEN (line length, in bytes), and NSETS (number of sets) have their usual
meanings and are the values for the cache level being operated on. The values of A and S are
rounded up to the next integer.
Level, bits [3:1]
Cache level to operate on, minus 1. For example, this field is 0 for operations on L1 cache, or 1 for
operations on L2 cache.
Bit [0]
Reserved, RES0.
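
The field arithmetic above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, the cache geometry is passed in by the caller rather than read from any ID register, and the level argument is the conventional level number (1 for L1), from which the Level field is derived by subtracting 1.

```python
def ceil_log2(x: int) -> int:
    """Log2(x) rounded up to the next integer, for x >= 1."""
    return (x - 1).bit_length()


def setway_operand(associativity: int, line_len: int, num_sets: int,
                   way: int, set_index: int, level: int) -> int:
    """Build the 64-bit input value for a set/way operation."""
    a_bits = ceil_log2(associativity)   # A
    l_bits = ceil_log2(line_len)        # L (line_len is a power of two)
    s_bits = ceil_log2(num_sets)        # S
    b_bits = l_bits + s_bits            # B
    if not (0 <= way < associativity and 0 <= set_index < num_sets):
        raise ValueError("way or set out of range")
    if not 1 <= level <= 8:
        raise ValueError("level does not fit in the 3-bit Level field")
    if b_bits > 32 - a_bits:
        raise ValueError("Set and Way fields would overlap")
    value = way << (32 - a_bits)        # Way, bits[31:32-A]; way is 0 when A == 0
    value |= set_index << l_bits        # Set, bits[B-1:L]
    value |= (level - 1) << 1           # Level, bits[3:1]
    return value
```

For example, for a 4-way L1 cache with 64-byte lines and 128 sets (A = 2, L = 6, S = 7, B = 13), way 3 and set 5 give the value 0xC0000140.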

Performing the DC CSW operation:
To perform the DC CSW operation:
DC CSW, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    1010    010

C4.4.4

DC CVAC, Data or unified Cache line Clean by VA to PoC
The DC CVAC characteristics are:
Purpose
Clean data cache by address to Point of Coherency.
This register is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
Config-WO    WO          WO         WO     WO                WO

EL0 access is enabled when SCTLR_EL1.UCI is set to 1. When it is set to 0, this operation is
UNDEFINED at EL0.
If EL0 access is enabled, this operation is available at EL0 when the VA has read access permission,
otherwise it causes a Permission Fault.
Configurations
DC CVAC performs the same function as AArch32 operation DCCMVAC.
Attributes
DC CVAC is a 64-bit system operation.
The DC CVAC input value bit assignments are:

Bits [63:0]
Virtual address to use.

Performing the DC CVAC operation:
To perform the DC CVAC operation:
DC CVAC, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     011    0111    1010    001

C4.4.5

DC CVAU, Data or unified Cache line Clean by VA to PoU
The DC CVAU characteristics are:
Purpose
Clean data cache by address to Point of Unification.
This register is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
Config-WO    WO          WO         WO     WO                WO

EL0 access is enabled when SCTLR_EL1.UCI is set to 1. When it is set to 0, this operation is
UNDEFINED at EL0.
If EL0 access is enabled, this operation is available at EL0 when the VA has read access permission,
otherwise it causes a Permission Fault.
Configurations
DC CVAU performs the same function as AArch32 operation DCCMVAU.
Attributes
DC CVAU is a 64-bit system operation.
The DC CVAU input value bit assignments are:

Bits [63:0]
Virtual address to use.

Performing the DC CVAU operation:
To perform the DC CVAU operation:
DC CVAU, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     011    0111    1011    001

C4.4.6

DC ISW, Data or unified Cache line Invalidate by Set/Way
The DC ISW characteristics are:
Purpose
Invalidate data cache by set/way.
This register is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            WO          WO         WO     WO                WO

At EL1, this operation must be performed as DC CISW if all of the following apply:
• EL2 is implemented.
• HCR_EL2.VM is set to 1.
• SCR_EL3.NS is set to 1 or EL3 is not implemented.
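
The three conditions above combine as a simple conjunction. The following Python sketch, with a hypothetical function name and with the implementation and register state passed in as plain booleans, shows one way to express the rule:

```python
def dc_isw_performed_as_cisw(el2_implemented: bool, hcr_el2_vm: bool,
                             el3_implemented: bool, scr_el3_ns: bool) -> bool:
    """True when DC ISW executed at EL1 must be performed as DC CISW."""
    return (el2_implemented and hcr_el2_vm
            and (scr_el3_ns or not el3_implemented))
```

The scr_el3_ns argument is only consulted when EL3 is implemented.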

Configurations
DC ISW performs the same function as AArch32 operation DCISW.
Attributes
DC ISW is a 64-bit system operation.
The DC ISW input value bit assignments are:

Bits     [63:32]    [31:4]    [3:1]    [0]
Field    RES0       SetWay    Level    RES0

Bits [63:32]
Reserved, RES0.
SetWay, bits [31:4]
Contains two fields:
• Way, bits[31:32-A], the number of the way to operate on.
• Set, bits[B-1:L], the number of the set to operate on.

Bits[L-1:4] are RES0.
A = Log2(ASSOCIATIVITY), L = Log2(LINELEN), B = (L + S), S = Log2(NSETS).
ASSOCIATIVITY, LINELEN (line length, in bytes), and NSETS (number of sets) have their usual
meanings and are the values for the cache level being operated on. The values of A and S are
rounded up to the next integer.
Level, bits [3:1]
Cache level to operate on, minus 1. For example, this field is 0 for operations on L1 cache, or 1 for
operations on L2 cache.
Bit [0]
Reserved, RES0.

Performing the DC ISW operation:
To perform the DC ISW operation:
DC ISW, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    0110    010

C4.4.7

DC IVAC, Data or unified Cache line Invalidate by VA to PoC
The DC IVAC characteristics are:
Purpose
Invalidate data cache by address to Point of Coherency.
This register is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            WO          WO         WO     WO                WO

This operation requires write access permission to the VA, otherwise it causes a Permission Fault.
At EL1, this operation must be performed as DC CIVAC if all of the following apply:
• EL2 is implemented.
• HCR_EL2.VM is set to 1.
• SCR_EL3.NS is set to 1 or EL3 is not implemented.

Configurations
DC IVAC performs the same function as AArch32 operation DCIMVAC.
Attributes
DC IVAC is a 64-bit system operation.
The DC IVAC input value bit assignments are:

Bits [63:0]
Virtual address to use.

Performing the DC IVAC operation:
To perform the DC IVAC operation:
DC IVAC, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    0110    001

C4.4.8

DC ZVA, Data Cache Zero by VA
The DC ZVA characteristics are:
Purpose
Zero data cache by address. Zeroes a naturally aligned block of N bytes, where the size of N is
identified in DCZID_EL0.
This register is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
Config-WO    WO          WO         WO     WO                WO

There are two control bits associated with this instruction: SCTLR_EL1.DZE and HCR_EL2.TDZ.
• If execution at EL0 is disabled by the SCTLR_EL1.DZE bit being set to 0, the instruction is
  UNDEFINED at EL0.
• In the Non-secure state, HCR_EL2.TDZ controls whether the instruction executes at EL0 or
  EL1, or traps to EL2.

When UNDEFINED, the instruction always takes the EL1 UNDEFINED exception.
When the instruction is executed, it can generate memory faults or watchpoints which are prioritized
in the same way as other memory related faults or watchpoints. If a synchronous data abort fault or
a watchpoint is generated, the CM bit in the syndrome field is not set.
If the memory region being zeroed is any type of Device memory, this instruction generates an
Alignment fault, which is prioritized in the same way as other Alignment faults that are determined
by the memory type.
This instruction applies to Normal memory regardless of cacheability attributes.
The instruction behaves as a set of Stores to each byte within the block being accessed, and so it:
• Will cause a Permission Fault if the translation system does not permit writes to the locations.
• Requires the same considerations for ordering and the management of coherency as any other
  store instructions.

Configurations
There are no configuration notes.
Attributes
DC ZVA is a 64-bit system operation.
The DC ZVA input value bit assignments are:

Bits [63:0]
Virtual address to use. There is no alignment restriction on the address within the block of N bytes
that is used.
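
Because the address need not be aligned, the block that is zeroed is the naturally aligned block of N bytes containing it. A minimal Python sketch of that rounding follows. The function name is hypothetical, and N is passed in directly rather than decoded from DCZID_EL0, whose layout is not given in this section.

```python
def dc_zva_block(va: int, block_bytes: int) -> range:
    """Return the byte addresses zeroed by DC ZVA for the given VA.

    block_bytes is N, the block size identified by DCZID_EL0,
    assumed here to be a power of two.
    """
    if block_bytes <= 0 or block_bytes & (block_bytes - 1):
        raise ValueError("block size must be a power of two")
    base = va & ~(block_bytes - 1)   # round down to the naturally aligned block
    return range(base, base + block_bytes)
```

With N = 64, any address from 0x1200 to 0x123F selects the same block, starting at 0x1200.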


Performing the DC ZVA operation:
To perform the DC ZVA operation:
DC ZVA, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     011    0111    0100    001

C4.4.9

IC IALLU, Instruction Cache Invalidate All to PoU
The IC IALLU characteristics are:
Purpose
Invalidate all instruction caches to Point of Unification.
This register is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            WO          WO         WO     WO                WO

Configurations
IC IALLU performs the same function as AArch32 operation ICIALLU.
Attributes
IC IALLU is a 64-bit system operation.
The IC IALLU operation ignores the value in the register specified by the instruction used to perform this operation.
Software does not have to write a value to the register before issuing this instruction.

Performing the IC IALLU operation:
To perform the IC IALLU operation:
IC IALLU

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    0101    000

C4.4.10

IC IALLUIS, Instruction Cache Invalidate All to PoU, Inner Shareable
The IC IALLUIS characteristics are:
Purpose
Invalidate all instruction caches in Inner Shareable domain to Point of Unification.
This register is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            WO          WO         WO     WO                WO

Configurations
IC IALLUIS performs the same function as AArch32 operation ICIALLUIS.
Attributes
IC IALLUIS is a 64-bit system operation.
The IC IALLUIS operation ignores the value in the register specified by the instruction used to perform this
operation. Software does not have to write a value to the register before issuing this instruction.

Performing the IC IALLUIS operation:
To perform the IC IALLUIS operation:
IC IALLUIS

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    0001    000

C4.4.11

IC IVAU, Instruction Cache line Invalidate by VA to PoU
The IC IVAU characteristics are:
Purpose
Invalidate instruction cache by address to Point of Unification.
This register is part of the Cache maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
Config-WO    WO          WO         WO     WO                WO

EL0 access is enabled when SCTLR_EL1.UCI is set to 1. When it is set to 0, this operation is
UNDEFINED at EL0.
If EL0 access is enabled, this operation is available at EL0 when the VA has read access permission,
otherwise it causes a Permission Fault.
Configurations
IC IVAU performs the same function as AArch32 operation ICIMVAU.
Attributes
IC IVAU is a 64-bit system operation.
The IC IVAU input value bit assignments are:

Bits [63:0]
Virtual address to use.

Performing the IC IVAU operation:
To perform the IC IVAU operation:
IC IVAU, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     011    0111    0101    001
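
The encodings given in C4.4.2 to C4.4.11 can be collected into a lookup table. The following Python sketch is one possible way to do this; the names CACHE_OPS and cache_op_name are hypothetical, and the table covers only the ten operations described in those sections.

```python
# (op1, CRm, op2) -> mnemonic, for the operations in C4.4.2 to C4.4.11.
# All of them use op0 = 0b01 and CRn = 0b0111.
CACHE_OPS = {
    (0b011, 0b1110, 0b001): "DC CIVAC",
    (0b000, 0b1010, 0b010): "DC CSW",
    (0b011, 0b1010, 0b001): "DC CVAC",
    (0b011, 0b1011, 0b001): "DC CVAU",
    (0b000, 0b0110, 0b010): "DC ISW",
    (0b000, 0b0110, 0b001): "DC IVAC",
    (0b011, 0b0100, 0b001): "DC ZVA",
    (0b000, 0b0101, 0b000): "IC IALLU",
    (0b000, 0b0001, 0b000): "IC IALLUIS",
    (0b011, 0b0101, 0b001): "IC IVAU",
}


def cache_op_name(op0: int, op1: int, crn: int, crm: int, op2: int):
    """Return the mnemonic for the given fields, or None if not in the table."""
    if op0 != 0b01 or crn != 0b0111:
        return None
    return CACHE_OPS.get((op1, crm, op2))
```

In these ten descriptions, the operations encoded with op1 = 0b011 are exactly the ones whose EL0 column reads Config-WO.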

C4.5

A64 system instructions for address translation
The following sections describe the address translation instructions in A64.
• AT S12E0R, Address Translate Stages 1 and 2 EL0 Read
• AT S12E0W, Address Translate Stages 1 and 2 EL0 Write
• AT S12E1R, Address Translate Stages 1 and 2 EL1 Read
• AT S12E1W, Address Translate Stages 1 and 2 EL1 Write
• AT S1E0R, Address Translate Stage 1 EL0 Read
• AT S1E0W, Address Translate Stage 1 EL0 Write
• AT S1E1R, Address Translate Stage 1 EL1 Read
• AT S1E1W, Address Translate Stage 1 EL1 Write
• AT S1E2R, Address Translate Stage 1 EL2 Read
• AT S1E2W, Address Translate Stage 1 EL2 Write
• AT S1E3R, Address Translate Stage 1 EL3 Read
• AT S1E3W, Address Translate Stage 1 EL3 Write


C4.5.1

AT S12E0R, Address Translate Stages 1 and 2 EL0 Read
The AT S12E0R characteristics are:
Purpose
Performs stage 1 and 2 address translations as defined for EL0, with permissions as if reading from
the given virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            -           -          WO     WO                WO

If EL2 does not exist, or stage 2 translation is disabled, this operation executes as AT S1E0R.
Configurations
There are no configuration notes.
Attributes
AT S12E0R is a 64-bit system operation.
The AT S12E0R input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.
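
To illustrate how software might use the result, the following Python sketch combines a PAR_EL1 value with the original virtual address. It rests on assumptions taken from the PAR_EL1 register description rather than from this section: bit[0] (F) is 1 when the translation aborted, and on success bits[47:12] hold the output address of a 4KB-aligned frame. The names used are hypothetical.

```python
PAGE_OFFSET_MASK = 0xFFF                 # VA bits[11:0] pass through unchanged
PAR_PA_MASK = 0x0000_FFFF_FFFF_F000      # assumed PA field, bits[47:12]


def par_to_pa(par_el1: int, va: int):
    """Return the physical address for va, or None if the translation failed."""
    if par_el1 & 1:                      # assumed F bit: translation aborted
        return None
    return (par_el1 & PAR_PA_MASK) | (va & PAGE_OFFSET_MASK)
```

Masking with PAR_PA_MASK discards the attribute bits that the sketch assumes sit outside bits[47:12].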

Performing the AT S12E0R operation:
To perform the AT S12E0R operation:
AT S12E0R, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     100    0111    1000    110

C4.5.2

AT S12E0W, Address Translate Stages 1 and 2 EL0 Write
The AT S12E0W characteristics are:
Purpose
Performs stage 1 and 2 address translations as defined for EL0, with permissions as if writing to the
given virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            -           -          WO     WO                WO

If EL2 does not exist, or stage 2 translation is disabled, this operation executes as AT S1E0W.
Configurations
There are no configuration notes.
Attributes
AT S12E0W is a 64-bit system operation.
The AT S12E0W input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S12E0W operation:
To perform the AT S12E0W operation:
AT S12E0W, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     100    0111    1000    111

C4.5.3

AT S12E1R, Address Translate Stages 1 and 2 EL1 Read
The AT S12E1R characteristics are:
Purpose
Performs stage 1 and 2 address translations as defined for EL1, with permissions as if reading from
the given virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            -           -          WO     WO                WO

If EL2 does not exist, or stage 2 translation is disabled, this operation executes as AT S1E1R.
Configurations
There are no configuration notes.
Attributes
AT S12E1R is a 64-bit system operation.
The AT S12E1R input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S12E1R operation:
To perform the AT S12E1R operation:
AT S12E1R, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     100    0111    1000    100

C4.5.4

AT S12E1W, Address Translate Stages 1 and 2 EL1 Write
The AT S12E1W characteristics are:
Purpose
Performs stage 1 and 2 address translations as defined for EL1, with permissions as if writing to the
given virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            -           -          WO     WO                WO

If EL2 does not exist, or stage 2 translation is disabled, this operation executes as AT S1E1W.
Configurations
There are no configuration notes.
Attributes
AT S12E1W is a 64-bit system operation.
The AT S12E1W input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S12E1W operation:
To perform the AT S12E1W operation:
AT S12E1W, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     100    0111    1000    101

C4.5.5

AT S1E0R, Address Translate Stage 1 EL0 Read
The AT S1E0R characteristics are:
Purpose
Performs stage 1 address translation as defined for EL0, with permissions as if reading from the
given virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            WO          WO         WO     WO                WO

Configurations
There are no configuration notes.
Attributes
AT S1E0R is a 64-bit system operation.
The AT S1E0R input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S1E0R operation:
To perform the AT S1E0R operation:
AT S1E0R, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    1000    010

C4.5.6

AT S1E0W, Address Translate Stage 1 EL0 Write
The AT S1E0W characteristics are:
Purpose
Performs stage 1 address translation as defined for EL0, with permissions as if writing to the given
virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            WO          WO         WO     WO                WO

Configurations
There are no configuration notes.
Attributes
AT S1E0W is a 64-bit system operation.
The AT S1E0W input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S1E0W operation:
To perform the AT S1E0W operation:
AT S1E0W, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    1000    011

C4.5.7

AT S1E1R, Address Translate Stage 1 EL1 Read
The AT S1E1R characteristics are:
Purpose
Performs stage 1 address translation as defined for EL1, with permissions as if reading from the
given virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            WO          WO         WO     WO                WO

Configurations
There are no configuration notes.
Attributes
AT S1E1R is a 64-bit system operation.
The AT S1E1R input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S1E1R operation:
To perform the AT S1E1R operation:
AT S1E1R, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    1000    000

C4.5.8

AT S1E1W, Address Translate Stage 1 EL1 Write
The AT S1E1W characteristics are:
Purpose
Performs stage 1 address translation as defined for EL1, with permissions as if writing to the given
virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            WO          WO         WO     WO                WO

Configurations
There are no configuration notes.
Attributes
AT S1E1W is a 64-bit system operation.
The AT S1E1W input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S1E1W operation:
To perform the AT S1E1W operation:
AT S1E1W, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     000    0111    1000    001

C4.5.9

AT S1E2R, Address Translate Stage 1 EL2 Read
The AT S1E2R characteristics are:
Purpose
Performs stage 1 address translation as defined for EL2, with permissions as if reading from the
given virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            -           -          WO     WO                -

Performing this operation from EL3 is UNDEFINED if EL2 does not exist.
Configurations
There are no configuration notes.
Attributes
AT S1E2R is a 64-bit system operation.
The AT S1E2R input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S1E2R operation:
To perform the AT S1E2R operation:
AT S1E2R, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     100    0111    1000    000

C4.5.10

AT S1E2W, Address Translate Stage 1 EL2 Write
The AT S1E2W characteristics are:
Purpose
Performs stage 1 address translation as defined for EL2, with permissions as if writing to the given
virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            -           -          WO     WO                -

Performing this operation from EL3 is UNDEFINED if EL2 does not exist.
Configurations
There are no configuration notes.
Attributes
AT S1E2W is a 64-bit system operation.
The AT S1E2W input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S1E2W operation:
To perform the AT S1E2W operation:
AT S1E2W, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     100    0111    1000    001

C4.5.11

AT S1E3R, Address Translate Stage 1 EL3 Read
The AT S1E3R characteristics are:
Purpose
Performs stage 1 address translation as defined for EL3, with permissions as if reading from the
given virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            -           -          -      WO                WO

Configurations
There are no configuration notes.
Attributes
AT S1E3R is a 64-bit system operation.
The AT S1E3R input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S1E3R operation:
To perform the AT S1E3R operation:
AT S1E3R, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     110    0111    1000    000

C4.5.12

AT S1E3W, Address Translate Stage 1 EL3 Write
The AT S1E3W characteristics are:
Purpose
Performs stage 1 address translation as defined for EL3, with permissions as if writing to the given
virtual address.
This register is part of the Address translation operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0          EL1 (NS)    EL1 (S)    EL2    EL3 (SCR.NS=1)    EL3 (SCR.NS=0)
-            -           -          -      WO                WO

Configurations
There are no configuration notes.
Attributes
AT S1E3W is a 64-bit system operation.
The AT S1E3W input value bit assignments are:

Bits [63:0]
Virtual address to translate to a physical address. The resulting physical address can be read from
the PAR_EL1.
If the address translation instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then VA[63:32] is RES0.

Performing the AT S1E3W operation:
To perform the AT S1E3W operation:
AT S1E3W, <Xt>

The operation is encoded as follows:

op0    op1    CRn     CRm     op2
01     110    0111    1000    001

C4.6

A64 system instructions for TLB maintenance
The following sections describe the TLB maintenance instructions in A64.
• TLBI ALLE1, TLB Invalidate All entries, EL1 on page C4-336
• TLBI ALLE1IS, TLB Invalidate All entries, EL1, Inner Shareable on page C4-337
• TLBI ALLE2, TLB Invalidate All entries, EL2 on page C4-338
• TLBI ALLE2IS, TLB Invalidate All entries, EL2, Inner Shareable on page C4-339
• TLBI ALLE3, TLB Invalidate All entries, EL3 on page C4-340
• TLBI ALLE3IS, TLB Invalidate All entries, EL3, Inner Shareable on page C4-341
• TLBI ASIDE1, TLB Invalidate by ASID, EL1 on page C4-342
• TLBI ASIDE1IS, TLB Invalidate by ASID, EL1, Inner Shareable on page C4-343
• TLBI IPAS2E1, TLB Invalidate by Intermediate Physical Address, Stage 2, EL1 on page C4-344
• TLBI IPAS2E1IS, TLB Invalidate by Intermediate Physical Address, Stage 2, EL1, Inner Shareable on
  page C4-345
• TLBI IPAS2LE1, TLB Invalidate by Intermediate Physical Address, Stage 2, Last level, EL1 on page C4-346
• TLBI IPAS2LE1IS, TLB Invalidate by Intermediate Physical Address, Stage 2, Last level, EL1, Inner
  Shareable on page C4-347
• TLBI VAAE1, TLB Invalidate by VA, All ASID, EL1 on page C4-348
• TLBI VAAE1IS, TLB Invalidate by VA, All ASID, EL1, Inner Shareable on page C4-350
• TLBI VAALE1, TLB Invalidate by VA, All ASID, Last level, EL1 on page C4-352
• TLBI VAALE1IS, TLB Invalidate by VA, All ASID, Last level, EL1, Inner Shareable on page C4-354
• TLBI VAE1, TLB Invalidate by VA, EL1 on page C4-356
• TLBI VAE1IS, TLB Invalidate by VA, EL1, Inner Shareable on page C4-358
• TLBI VAE2, TLB Invalidate by VA, EL2 on page C4-360
• TLBI VAE2IS, TLB Invalidate by VA, EL2, Inner Shareable on page C4-362
• TLBI VAE3, TLB Invalidate by VA, EL3 on page C4-364
• TLBI VAE3IS, TLB Invalidate by VA, EL3, Inner Shareable on page C4-366
• TLBI VALE1, TLB Invalidate by VA, Last level, EL1 on page C4-368
• TLBI VALE1IS, TLB Invalidate by VA, Last level, EL1, Inner Shareable on page C4-370
• TLBI VALE2, TLB Invalidate by VA, Last level, EL2 on page C4-372
• TLBI VALE2IS, TLB Invalidate by VA, Last level, EL2, Inner Shareable on page C4-374
• TLBI VALE3, TLB Invalidate by VA, Last level, EL3 on page C4-376
• TLBI VALE3IS, TLB Invalidate by VA, Last level, EL3, Inner Shareable on page C4-378
• TLBI VMALLE1, TLB Invalidate by VMID, All entries at stage 1, EL1 on page C4-380
• TLBI VMALLE1IS, TLB Invalidate by VMID, All entries at stage 1, EL1, Inner Shareable on page C4-381
• TLBI VMALLS12E1, TLB Invalidate by VMID, All entries at Stage 1 and 2, EL1 on page C4-382
• TLBI VMALLS12E1IS, TLB Invalidate by VMID, All entries at Stage 1 and 2, EL1, Inner Shareable on
  page C4-383
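
The operation names in this list appear to follow a regular pattern: a scope (ALL, ASID, IPAS2, VAA, VA, VMALL, or VMALLS12), an optional L for Last level, the target Exception level (E1, E2, or E3), and an optional IS suffix for Inner Shareable. The following Python sketch splits a name into those parts. The regular expression and the describe_tlbi helper are illustrative assumptions inferred from the names listed here; they are not defined by the architecture.

```python
import re

# Pattern inferred from the operation names in the list; an assumption,
# not an architectural definition.
_TLBI_NAME = re.compile(
    r"^(ALL|ASID|IPAS2|VAA|VA|VMALLS12|VMALL)(L?)E([123])(IS)?$"
)

_SCOPE = {
    "ALL": "all entries",
    "ASID": "by ASID",
    "IPAS2": "by IPA, stage 2",
    "VAA": "by VA, all ASIDs",
    "VA": "by VA",
    "VMALL": "by VMID, all stage 1 entries",
    "VMALLS12": "by VMID, all stage 1 and 2 entries",
}


def describe_tlbi(name):
    """Split a TLBI operation name such as 'VAALE1IS' into its parts."""
    m = _TLBI_NAME.match(name)
    if m is None:
        raise ValueError("unrecognized TLBI operation name: %r" % name)
    scope, last, el, inner = m.groups()
    return {
        "scope": _SCOPE[scope],
        "last_level": last == "L",
        "el": int(el),
        "inner_shareable": inner is not None,
    }


if __name__ == "__main__":
    for op in ("ALLE1", "IPAS2LE1IS", "VAALE1IS", "VMALLS12E1"):
        print(op, describe_tlbi(op))
```

The alternation lists VAA before VA and VMALLS12 before VMALL so that the longer scope is tried first.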

C4.6.1

TLBI ALLE1, TLB Invalidate All entries, EL1
The TLBI ALLE1 characteristics are:
Purpose
Invalidate all EL1&0 regime stage 1 and 2 TLB entries.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI ALLE1 is a 64-bit system operation.
The TLBI ALLE1 operation ignores the value in the register specified by the instruction used to perform this
operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI ALLE1 operation:
To perform the TLBI ALLE1 operation:
TLBI ALLE1

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0111   100
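
As an illustration of how these fields form an instruction word, the following Python sketch packs op1, CRn, CRm, and op2 into a 32-bit value. It assumes the general A64 SYS instruction layout (op0 = 0b01, with op1 at bits [18:16], CRn at [15:12], CRm at [11:8], op2 at [7:5], and Rt at [4:0]), which is described elsewhere in the manual and not in this section. The function name sys_instruction_word is hypothetical, and Rt is assumed to be 31 for operations that ignore the register.

```python
def sys_instruction_word(op1, crn, crm, op2, rt=31):
    """Pack system-operation fields into a 32-bit A64 SYS instruction word.

    Assumes op0 == 0b01, which holds for every operation in this section.
    """
    for value, width in ((op1, 3), (crn, 4), (crm, 4), (op2, 3), (rt, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError("field value out of range")
    # 0xD5080000 is the fixed part of a SYS instruction with op0 = 0b01.
    return 0xD5080000 | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5) | rt


if __name__ == "__main__":
    # TLBI ALLE1: op1 = 0b100, CRn = 0b1000, CRm = 0b0111, op2 = 0b100
    print(hex(sys_instruction_word(0b100, 0b1000, 0b0111, 0b100)))
```

With the TLBI ALLE1 field values the sketch prints 0xd50c879f.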

C4.6.2

TLBI ALLE1IS, TLB Invalidate All entries, EL1, Inner Shareable
The TLBI ALLE1IS characteristics are:
Purpose
Invalidate all EL1&0 regime stage 1 and 2 TLB entries on all PEs in the same Inner Shareable
domain.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI ALLE1IS is a 64-bit system operation.
The TLBI ALLE1IS operation ignores the value in the register specified by the instruction used to perform this
operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI ALLE1IS operation:
To perform the TLBI ALLE1IS operation:
TLBI ALLE1IS

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0011   100

C4.6.3

TLBI ALLE2, TLB Invalidate All entries, EL2
The TLBI ALLE2 characteristics are:
Purpose
Invalidate all EL2 regime stage 1 TLB entries.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               -

Performing this operation from EL3 is UNDEFINED if EL2 does not exist.
Configurations
There are no configuration notes.
Attributes
TLBI ALLE2 is a 64-bit system operation.
The TLBI ALLE2 operation ignores the value in the register specified by the instruction used to perform this
operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI ALLE2 operation:
To perform the TLBI ALLE2 operation:
TLBI ALLE2

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0111   000

C4.6.4

TLBI ALLE2IS, TLB Invalidate All entries, EL2, Inner Shareable
The TLBI ALLE2IS characteristics are:
Purpose
Invalidate all EL2 regime stage 1 TLB entries on all PEs in the same Inner Shareable domain.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               -

Performing this operation from EL3 is UNDEFINED if EL2 does not exist.
Configurations
There are no configuration notes.
Attributes
TLBI ALLE2IS is a 64-bit system operation.
The TLBI ALLE2IS operation ignores the value in the register specified by the instruction used to perform this
operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI ALLE2IS operation:
To perform the TLBI ALLE2IS operation:
TLBI ALLE2IS

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0011   000

C4.6.5

TLBI ALLE3, TLB Invalidate All entries, EL3
The TLBI ALLE3 characteristics are:
Purpose
Invalidate all EL3 regime stage 1 TLB entries.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         -     WO               WO

Configurations
There are no configuration notes.
Attributes
TLBI ALLE3 is a 64-bit system operation.
The TLBI ALLE3 operation ignores the value in the register specified by the instruction used to perform this
operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI ALLE3 operation:
To perform the TLBI ALLE3 operation:
TLBI ALLE3

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    110   1000   0111   000

C4.6.6

TLBI ALLE3IS, TLB Invalidate All entries, EL3, Inner Shareable
The TLBI ALLE3IS characteristics are:
Purpose
Invalidate all EL3 regime stage 1 TLB entries on all PEs in the same Inner Shareable domain.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         -     WO               WO

Configurations
There are no configuration notes.
Attributes
TLBI ALLE3IS is a 64-bit system operation.
The TLBI ALLE3IS operation ignores the value in the register specified by the instruction used to perform this
operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI ALLE3IS operation:
To perform the TLBI ALLE3IS operation:
TLBI ALLE3IS

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    110   1000   0011   000

C4.6.7

TLBI ASIDE1, TLB Invalidate by ASID, EL1
The TLBI ASIDE1 characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the given ASID and the current VMID.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

Configurations
There are no configuration notes.
Attributes
TLBI ASIDE1 is a 64-bit system operation.
The TLBI ASIDE1 input value bit assignments are:

[63:48]   [47:0]
ASID      RES0

ASID, bits [63:48]
ASID value to match. Any appropriate TLB entries that match the ASID values will be affected by
this operation.
If the implementation supports 16 bits of ASID, but only 8 bits are being used in the context being
invalidated, the upper bits are considered RES0 and must be written to 0 by software performing the
TLB maintenance.
Bits [47:0]
Reserved, RES0.
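
The field layout above can be turned into a register value as in the following Python sketch. The function name asid_operand and its asid_bits parameter are hypothetical. The sketch rejects an ASID that does not fit the number of bits in use, so that the upper ASID bits are written as 0 when only 8 bits are used.

```python
def asid_operand(asid, asid_bits=16):
    """Build the 64-bit input value for an invalidate-by-ASID operation.

    asid_bits is the ASID size used by the context being invalidated
    (8 or 16). Bits [47:0] of the result are left as 0 because they are RES0.
    """
    if asid_bits not in (8, 16):
        raise ValueError("asid_bits must be 8 or 16")
    if not 0 <= asid < (1 << asid_bits):
        raise ValueError("ASID does not fit in %d bits" % asid_bits)
    return asid << 48  # ASID occupies bits [63:48]


if __name__ == "__main__":
    print(hex(asid_operand(0x42, asid_bits=8)))
```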

Performing the TLBI ASIDE1 operation:
To perform the TLBI ASIDE1 operation:
TLBI ASIDE1, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0111   010

C4.6.8

TLBI ASIDE1IS, TLB Invalidate by ASID, EL1, Inner Shareable
The TLBI ASIDE1IS characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the given ASID and the current VMID on all PEs
in the same Inner Shareable domain.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

Configurations
There are no configuration notes.
Attributes
TLBI ASIDE1IS is a 64-bit system operation.
The TLBI ASIDE1IS input value bit assignments are:

[63:48]   [47:0]
ASID      RES0

ASID, bits [63:48]
ASID value to match. Any appropriate TLB entries that match the ASID values will be affected by
this operation.
If the implementation supports 16 bits of ASID, but only 8 bits are being used in the context being
invalidated, the upper bits are considered RES0 and must be written to 0 by software performing the
TLB maintenance.
Bits [47:0]
Reserved, RES0.

Performing the TLBI ASIDE1IS operation:
To perform the TLBI ASIDE1IS operation:
TLBI ASIDE1IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0011   010

C4.6.9

TLBI IPAS2E1, TLB Invalidate by Intermediate Physical Address, Stage 2, EL1
The TLBI IPAS2E1 characteristics are:
Purpose
Invalidate EL1&0 regime stage 2 TLB entries for the given IPA and the current VMID.
This register is part of:
• the TLB maintenance operations functional group
• the Virtualization registers functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               WO

If SCR_EL3.NS==0, or EL2 is not implemented, this instruction is a NOP.
This instruction must apply to structures that contain only stage 2 translation information, but does
not need to apply to structures that contain combined stage 1 and stage 2 translation information.
Configurations
There are no configuration notes.
Attributes
TLBI IPAS2E1 is a 64-bit system operation.
The TLBI IPAS2E1 input value bit assignments are:

[63:36]   [35:0]
RES0      IPA[47:12]

Bits [63:36]
Reserved, RES0.
IPA[47:12], bits [35:0]
Bits[47:12] of the intermediate physical address to match.
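
The mapping from an intermediate physical address to this input value can be sketched in Python as follows. The function name ipa_operand is hypothetical, and the 48-bit range check is an assumption based on the field holding IPA[47:12].

```python
def ipa_operand(ipa):
    """Build the 64-bit input value for an invalidate-by-IPA operation.

    IPA[47:12] is placed in bits [35:0]; bits [63:36] are RES0 and left as 0.
    """
    if not 0 <= ipa < (1 << 48):
        raise ValueError("IPA must fit in 48 bits")
    return (ipa >> 12) & ((1 << 36) - 1)


if __name__ == "__main__":
    print(hex(ipa_operand(0x80001234)))
```

The low 12 bits of the address are discarded by the shift, so any address within the same 4KB-aligned region produces the same input value.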

Performing the TLBI IPAS2E1 operation:
To perform the TLBI IPAS2E1 operation:
TLBI IPAS2E1, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0100   001

C4.6.10
TLBI IPAS2E1IS, TLB Invalidate by Intermediate Physical Address, Stage 2, EL1, Inner
Shareable
The TLBI IPAS2E1IS characteristics are:
Purpose
Invalidate EL1&0 regime stage 2 TLB entries for the given IPA and the current VMID on all PEs
in the same Inner Shareable domain.
This register is part of:
• the TLB maintenance operations functional group
• the Virtualization registers functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               WO

If SCR_EL3.NS==0, or EL2 is not implemented, this instruction is a NOP.
This instruction must apply to structures that contain only stage 2 translation information, but does
not need to apply to structures that contain combined stage 1 and stage 2 translation information.
Configurations
There are no configuration notes.
Attributes
TLBI IPAS2E1IS is a 64-bit system operation.
The TLBI IPAS2E1IS input value bit assignments are:

[63:36]   [35:0]
RES0      IPA[47:12]

Bits [63:36]
Reserved, RES0.
IPA[47:12], bits [35:0]
Bits[47:12] of the intermediate physical address to match.

Performing the TLBI IPAS2E1IS operation:
To perform the TLBI IPAS2E1IS operation:
TLBI IPAS2E1IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0000   001

C4.6.11

TLBI IPAS2LE1, TLB Invalidate by Intermediate Physical Address, Stage 2, Last level, EL1
The TLBI IPAS2LE1 characteristics are:
Purpose
Invalidate EL1&0 regime stage 2 TLB entries for the last level of translation, the given IPA, and the
current VMID.
This register is part of:
• the TLB maintenance operations functional group
• the Virtualization registers functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               WO

If SCR_EL3.NS==0, or EL2 is not implemented, this instruction is a NOP.
This instruction must apply to structures that contain only stage 2 translation information, but does
not need to apply to structures that contain combined stage 1 and stage 2 translation information.
Configurations
There are no configuration notes.
Attributes
TLBI IPAS2LE1 is a 64-bit system operation.
The TLBI IPAS2LE1 input value bit assignments are:

[63:36]   [35:0]
RES0      IPA[47:12]

Bits [63:36]
Reserved, RES0.
IPA[47:12], bits [35:0]
Bits[47:12] of the intermediate physical address to match.

Performing the TLBI IPAS2LE1 operation:
To perform the TLBI IPAS2LE1 operation:
TLBI IPAS2LE1, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0100   101

C4.6.12
TLBI IPAS2LE1IS, TLB Invalidate by Intermediate Physical Address, Stage 2, Last level, EL1,
Inner Shareable
The TLBI IPAS2LE1IS characteristics are:
Purpose
Invalidate EL1&0 regime stage 2 TLB entries for the last level of translation, the given IPA, and the
current VMID, on all PEs in the same Inner Shareable domain.
This register is part of:
• the TLB maintenance operations functional group
• the Virtualization registers functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               WO

If SCR_EL3.NS==0, or EL2 is not implemented, this instruction is a NOP.
This instruction must apply to structures that contain only stage 2 translation information, but does
not need to apply to structures that contain combined stage 1 and stage 2 translation information.
Configurations
There are no configuration notes.
Attributes
TLBI IPAS2LE1IS is a 64-bit system operation.
The TLBI IPAS2LE1IS input value bit assignments are:

[63:36]   [35:0]
RES0      IPA[47:12]

Bits [63:36]
Reserved, RES0.
IPA[47:12], bits [35:0]
Bits[47:12] of the intermediate physical address to match.

Performing the TLBI IPAS2LE1IS operation:
To perform the TLBI IPAS2LE1IS operation:
TLBI IPAS2LE1IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0000   101

C4.6.13

TLBI VAAE1, TLB Invalidate by VA, All ASID, EL1
The TLBI VAAE1 characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the given VA and the current VMID.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VAAE1 is a 64-bit system operation.
The TLBI VAAE1 input value bit assignments are:

[63:44]   [43:0]
RES0      VA[55:12]

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the VA will be
affected by this operation, regardless of the ASID.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.
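
The granule-dependent treatment of the VA field can be sketched in Python as follows. The function name va_operand and its parameters are hypothetical. Clearing the low-order field bits for the 16KB and 64KB granules is a choice made in this sketch so that the RES0 bits are written as 0; the text above only states that those bits are ignored.

```python
# Number of low-order field bits that are RES0 for each granule size in KB.
GRANULE_RES0_BITS = {4: 0, 16: 2, 64: 4}


def va_operand(va, granule_kb=4, aarch32=False):
    """Build the VA[55:12] field, bits [43:0], of a TLBI-by-VA input value."""
    if granule_kb not in GRANULE_RES0_BITS:
        raise ValueError("granule must be 4, 16, or 64 (KB)")
    if aarch32 and va >> 32:
        # An AArch32 regime has a 32-bit VA, so VA[55:32] must be 0.
        raise ValueError("AArch32 VA must fit in 32 bits")
    field = (va >> 12) & ((1 << 44) - 1)
    res0_bits = GRANULE_RES0_BITS[granule_kb]
    return field & ~((1 << res0_bits) - 1)


if __name__ == "__main__":
    va = 0x00007FFF12345678
    for kb in (4, 16, 64):
        print(kb, hex(va_operand(va, granule_kb=kb)))
```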

Performing the TLBI VAAE1 operation:
To perform the TLBI VAAE1 operation:
TLBI VAAE1, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0111   011

C4.6.14

TLBI VAAE1IS, TLB Invalidate by VA, All ASID, EL1, Inner Shareable
The TLBI VAAE1IS characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the given VA and the current VMID on all PEs in
the same Inner Shareable domain.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VAAE1IS is a 64-bit system operation.
The TLBI VAAE1IS input value bit assignments are:

[63:44]   [43:0]
RES0      VA[55:12]

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the VA will be
affected by this operation, regardless of the ASID.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VAAE1IS operation:
To perform the TLBI VAAE1IS operation:
TLBI VAAE1IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0011   011

C4.6.15

TLBI VAALE1, TLB Invalidate by VA, All ASID, Last level, EL1
The TLBI VAALE1 characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the last level of translation table walk, the given
VA, and the current VMID.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VAALE1 is a 64-bit system operation.
The TLBI VAALE1 input value bit assignments are:

[63:44]   [43:0]
RES0      VA[55:12]

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the VA will be
affected by this operation, regardless of the ASID.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VAALE1 operation:
To perform the TLBI VAALE1 operation:
TLBI VAALE1, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0111   111

C4.6.16

TLBI VAALE1IS, TLB Invalidate by VA, All ASID, Last level, EL1, Inner Shareable
The TLBI VAALE1IS characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the last level of translation table walk, the given
VA, and the current VMID, on all PEs in the same Inner Shareable domain.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VAALE1IS is a 64-bit system operation.
The TLBI VAALE1IS input value bit assignments are:

[63:44]   [43:0]
RES0      VA[55:12]

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the VA will be
affected by this operation, regardless of the ASID.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VAALE1IS operation:
To perform the TLBI VAALE1IS operation:
TLBI VAALE1IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0011   111

C4.6.17

TLBI VAE1, TLB Invalidate by VA, EL1
The TLBI VAE1 characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the given VA and ASID and the current VMID.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VAE1 is a 64-bit system operation.
The TLBI VAE1 input value bit assignments are:

[63:48]   [47:44]   [43:0]
ASID      RES0      VA[55:12]

ASID, bits [63:48]
ASID value to match. Any TLB entries that match the ASID value and VA value will be affected
by this operation.
Global TLB entries that match the VA value will be affected by this operation, regardless of the
value of the ASID field.
If the implementation supports 16 bits of ASID, but only 8 bits are being used in the context being
invalidated, the upper bits are considered RES0 and must be written to 0 by software performing the
TLB maintenance.
Bits [47:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:

• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.
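
For TLBI VAE1 the ASID and VA fields share one 64-bit input value. The following Python sketch combines them and splits a value back into its fields. The names vae1_operand and split_vae1_operand are hypothetical, and the sketch does not apply the granule-dependent handling of the low-order VA field bits.

```python
VA_FIELD_MASK = (1 << 44) - 1  # VA[55:12] occupies bits [43:0]


def vae1_operand(asid, va):
    """Combine an ASID and a virtual address into a TLBI VAE1 input value."""
    if not 0 <= asid < (1 << 16):
        raise ValueError("ASID must fit in 16 bits")
    va_field = (va >> 12) & VA_FIELD_MASK
    # Bits [47:44] are RES0 and are left as 0.
    return (asid << 48) | va_field


def split_vae1_operand(value):
    """Return the ASID, RES0, and VA[55:12] fields of an input value."""
    return {
        "asid": (value >> 48) & 0xFFFF,
        "res0": (value >> 44) & 0xF,
        "va_55_12": value & VA_FIELD_MASK,
    }


if __name__ == "__main__":
    value = vae1_operand(0x12, 0x40002000)
    print(hex(value), split_vae1_operand(value))
```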

Performing the TLBI VAE1 operation:
To perform the TLBI VAE1 operation:
TLBI VAE1, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0111   001

C4.6.18

TLBI VAE1IS, TLB Invalidate by VA, EL1, Inner Shareable
The TLBI VAE1IS characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the given VA and ASID, and the current VMID,
on all PEs in the same Inner Shareable domain.
This register is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VAE1IS is a 64-bit system operation.
The TLBI VAE1IS input value bit assignments are:

[63:48]   [47:44]   [43:0]
ASID      RES0      VA[55:12]

ASID, bits [63:48]
ASID value to match. Any TLB entries that match the ASID value and VA value will be affected
by this operation.
Global TLB entries that match the VA value will be affected by this operation, regardless of the
value of the ASID field.
If the implementation supports 16 bits of ASID, but only 8 bits are being used in the context being
invalidated, the upper bits are considered RES0 and must be written to 0 by software performing the
TLB maintenance.
Bits [47:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:

• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.
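
The field layout described above can be illustrated with a short C sketch. The helper name tlbi_vae1is_operand is hypothetical and is not defined by the architecture. The sketch only composes the 64-bit input value, it does not issue the instruction, and it assumes a 4KB translation granule so that every bit of the VA field is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compose the 64-bit input value for TLBI VAE1IS.
 * The ASID occupies bits [63:48], bits [47:44] are RES0 and stay zero,
 * and VA[55:12] occupies bits [43:0]. */
uint64_t tlbi_vae1is_operand(uint16_t asid, uint64_t va)
{
    const uint64_t va_field_mask = (UINT64_C(1) << 44) - 1; /* bits [43:0] */
    uint64_t va_field = (va >> 12) & va_field_mask;         /* VA[55:12]   */

    uint64_t operand = ((uint64_t)asid << 48) | va_field;

    /* Bits [47:44] are RES0, so they must still be zero here. */
    assert(((operand >> 44) & 0xF) == 0);
    return operand;
}
```

For example, an ASID of 0x0042 and a VA of 0x0000007FDEADB000 give the input value 0x0042000007FDEADB.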

Performing the TLBI VAE1IS operation:
To perform the TLBI VAE1IS operation:
TLBI VAE1IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0011   001
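
As an illustration only, the field values in the encoding table can be packed into a 32-bit instruction word. The sketch assumes the general A64 system instruction layout, which is not restated in this section: bits [31:19] hold the fixed pattern 0b1101010100001 (this includes op0 = 0b01), followed by op1 in bits [18:16], CRn in bits [15:12], CRm in bits [11:8], op2 in bits [7:5], and Rt in bits [4:0]. The function name a64_sys_word is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: pack op1, CRn, CRm, op2, and Rt into an A64 SYS instruction word.
 * The base value 0xD5080000 is the assumed fixed pattern with op0 = 0b01. */
uint32_t a64_sys_word(unsigned op1, unsigned crn, unsigned crm,
                      unsigned op2, unsigned rt)
{
    /* Reject values that do not fit their fields. */
    assert(op1 < 8 && crn < 16 && crm < 16 && op2 < 8 && rt < 32);

    return UINT32_C(0xD5080000) | (op1 << 16) | (crn << 12) |
           (crm << 8) | (op2 << 5) | rt;
}
```

With the TLBI VAE1IS values (op1 = 0b000, CRn = 0b1000, CRm = 0b0011, op2 = 0b001) and Rt = 0, the sketch yields 0xD5088320.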

C4.6.19 TLBI VAE2, TLB Invalidate by VA, EL2
The TLBI VAE2 characteristics are:
Purpose
Invalidate EL2 regime stage 1 TLB entries for the given VA.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               -

Performing this operation from EL3 is UNDEFINED if EL2 does not exist.
Configurations
There are no configuration notes.
Attributes
TLBI VAE2 is a 64-bit system operation.
The TLBI VAE2 input value bit assignments are:

 63            44 43              0
|      RES0      |    VA[55:12]    |

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.
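
The granule-dependent treatment of the low-order bits can be sketched in C as follows. The enumeration and function names are hypothetical. Because the affected bits are RES0 for the larger granules, the sketch writes them as zero, which is one way for software to respect the rule above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the three translation granule sizes. */
enum granule { GRANULE_4KB, GRANULE_16KB, GRANULE_64KB };

/* Build the VA[55:12] field (bits [43:0] of the input value) and clear
 * the low-order bits that are RES0 for the selected granule. */
uint64_t tlbi_va_field(uint64_t va, enum granule g)
{
    uint64_t field = (va >> 12) & ((UINT64_C(1) << 44) - 1);

    switch (g) {
    case GRANULE_4KB:                            /* all bits are used   */
        break;
    case GRANULE_16KB:
        field &= ~UINT64_C(0x3);                 /* bits [1:0] are RES0 */
        break;
    case GRANULE_64KB:
        field &= ~UINT64_C(0xF);                 /* bits [3:0] are RES0 */
        break;
    default:
        assert(!"unknown granule");
    }
    return field;
}
```

For a VA of 0x12347000 the field is 0x12347 with a 4KB granule, 0x12344 with a 16KB granule, and 0x12340 with a 64KB granule.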

Performing the TLBI VAE2 operation:
To perform the TLBI VAE2 operation:
TLBI VAE2, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0111   001

C4.6.20 TLBI VAE2IS, TLB Invalidate by VA, EL2, Inner Shareable
The TLBI VAE2IS characteristics are:
Purpose
Invalidate EL2 regime stage 1 TLB entries for the given VA on all PEs in the same Inner Shareable
domain.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               -

Performing this operation from EL3 is UNDEFINED if EL2 does not exist.
Configurations
There are no configuration notes.
Attributes
TLBI VAE2IS is a 64-bit system operation.
The TLBI VAE2IS input value bit assignments are:

 63            44 43              0
|      RES0      |    VA[55:12]    |

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VAE2IS operation:
To perform the TLBI VAE2IS operation:
TLBI VAE2IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0011   001

C4.6.21 TLBI VAE3, TLB Invalidate by VA, EL3
The TLBI VAE3 characteristics are:
Purpose
Invalidate EL3 regime stage 1 TLB entries for the given VA.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         -     WO               WO

Configurations
There are no configuration notes.
Attributes
TLBI VAE3 is a 64-bit system operation.
The TLBI VAE3 input value bit assignments are:

 63            44 43              0
|      RES0      |    VA[55:12]    |

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VAE3 operation:
To perform the TLBI VAE3 operation:
TLBI VAE3, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    110   1000   0111   001

C4.6.22 TLBI VAE3IS, TLB Invalidate by VA, EL3, Inner Shareable
The TLBI VAE3IS characteristics are:
Purpose
Invalidate EL3 regime stage 1 TLB entries for the given VA on all PEs in the same Inner Shareable
domain.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         -     WO               WO

Configurations
There are no configuration notes.
Attributes
TLBI VAE3IS is a 64-bit system operation.
The TLBI VAE3IS input value bit assignments are:

 63            44 43              0
|      RES0      |    VA[55:12]    |

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VAE3IS operation:
To perform the TLBI VAE3IS operation:
TLBI VAE3IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    110   1000   0011   001

C4.6.23 TLBI VALE1, TLB Invalidate by VA, Last level, EL1
The TLBI VALE1 characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the last level of translation table walk, the given
VA and ASID, and the current VMID.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VALE1 is a 64-bit system operation.
The TLBI VALE1 input value bit assignments are:

 63      48 47    44 43              0
|   ASID   |  RES0  |    VA[55:12]    |

ASID, bits [63:48]
ASID value to match. Any TLB entries that match the ASID value and VA value will be affected
by this operation.
Global TLB entries that match the VA value will be affected by this operation, regardless of the
value of the ASID field.
If the implementation supports 16 bits of ASID, but only 8 bits are being used in the context being
invalidated, the upper bits are considered RES0 and must be written to 0 by software performing the
TLB maintenance.
Bits [47:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:

• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.
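
The rule for contexts that use only 8 bits of ASID can be sketched as follows. The function name and the asid_bits_in_use parameter are hypothetical. The sketch clears the upper eight ASID bits when only 8 bits are in use, so that the bits treated as RES0 are written as zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compose the TLBI VALE1 input value for a context
 * that uses either 8 or 16 bits of ASID. */
uint64_t tlbi_vale1_operand(uint16_t asid, unsigned asid_bits_in_use,
                            uint64_t va)
{
    assert(asid_bits_in_use == 8 || asid_bits_in_use == 16);

    if (asid_bits_in_use == 8) {
        /* The upper ASID bits are treated as RES0 and written as zero. */
        asid = (uint16_t)(asid & 0x00FFu);
    }

    uint64_t va_field = (va >> 12) & ((UINT64_C(1) << 44) - 1); /* VA[55:12] */
    return ((uint64_t)asid << 48) | va_field;
}
```

Silently masking the ASID is a design choice made for this sketch. Software that regards a non-zero upper byte as a programming error might prefer to reject the value instead.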

Performing the TLBI VALE1 operation:
To perform the TLBI VALE1 operation:
TLBI VALE1, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0111   101

C4.6.24 TLBI VALE1IS, TLB Invalidate by VA, Last level, EL1, Inner Shareable
The TLBI VALE1IS characteristics are:
Purpose
Invalidate EL1&0 regime stage 1 TLB entries for the last level of translation table walk, the given
VA and ASID, and the current VMID, on all PEs in the same Inner Shareable domain.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VALE1IS is a 64-bit system operation.
The TLBI VALE1IS input value bit assignments are:

 63      48 47    44 43              0
|   ASID   |  RES0  |    VA[55:12]    |

ASID, bits [63:48]
ASID value to match. Any TLB entries that match the ASID value and VA value will be affected
by this operation.
Global TLB entries that match the VA value will be affected by this operation, regardless of the
value of the ASID field.
If the implementation supports 16 bits of ASID, but only 8 bits are being used in the context being
invalidated, the upper bits are considered RES0 and must be written to 0 by software performing the
TLB maintenance.
Bits [47:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:

• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.
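
The treatment of a translation regime that is using AArch32 can be sketched as follows. The function name and the regime_is_aarch32 parameter are hypothetical. The sketch assumes that software knows the Execution state of the targeted regime, and it zeroes VA[55:32] in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compose the TLBI VALE1IS input value, treating
 * VA[55:32] as RES0 when the targeted regime is using AArch32. */
uint64_t tlbi_vale1is_operand(uint16_t asid, uint64_t va,
                              bool regime_is_aarch32)
{
    if (regime_is_aarch32) {
        /* The VA is only 32 bits wide, so the higher bits are cleared. */
        va &= UINT64_C(0xFFFFFFFF);
    }

    uint64_t va_field = (va >> 12) & ((UINT64_C(1) << 44) - 1); /* VA[55:12] */
    uint64_t operand = ((uint64_t)asid << 48) | va_field;

    assert(((operand >> 44) & 0xF) == 0); /* bits [47:44] are RES0 */
    return operand;
}
```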

Performing the TLBI VALE1IS operation:
To perform the TLBI VALE1IS operation:
TLBI VALE1IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0011   101

C4.6.25 TLBI VALE2, TLB Invalidate by VA, Last level, EL2
The TLBI VALE2 characteristics are:
Purpose
Invalidate EL2 regime stage 1 TLB entries for the last level of translation table walk and the given
VA.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               -

Performing this operation from EL3 is UNDEFINED if EL2 does not exist.
Configurations
There are no configuration notes.
Attributes
TLBI VALE2 is a 64-bit system operation.
The TLBI VALE2 input value bit assignments are:

 63            44 43              0
|      RES0      |    VA[55:12]    |

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VALE2 operation:
To perform the TLBI VALE2 operation:
TLBI VALE2, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0111   101

C4.6.26 TLBI VALE2IS, TLB Invalidate by VA, Last level, EL2, Inner Shareable
The TLBI VALE2IS characteristics are:
Purpose
Invalidate EL2 regime stage 1 TLB entries for the last level of translation table walk and the given
VA on all PEs in the same Inner Shareable domain.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               -

Performing this operation from EL3 is UNDEFINED if EL2 does not exist.
Configurations
There are no configuration notes.
Attributes
TLBI VALE2IS is a 64-bit system operation.
The TLBI VALE2IS input value bit assignments are:

 63            44 43              0
|      RES0      |    VA[55:12]    |

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VALE2IS operation:
To perform the TLBI VALE2IS operation:
TLBI VALE2IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0011   101

C4.6.27 TLBI VALE3, TLB Invalidate by VA, Last level, EL3
The TLBI VALE3 characteristics are:
Purpose
Invalidate EL3 regime stage 1 TLB entries for the last level of translation table walk and the given
VA.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         -     WO               WO

Configurations
There are no configuration notes.
Attributes
TLBI VALE3 is a 64-bit system operation.
The TLBI VALE3 input value bit assignments are:

 63            44 43              0
|      RES0      |    VA[55:12]    |

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VALE3 operation:
To perform the TLBI VALE3 operation:
TLBI VALE3, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    110   1000   0111   101

C4.6.28 TLBI VALE3IS, TLB Invalidate by VA, Last level, EL3, Inner Shareable
The TLBI VALE3IS characteristics are:
Purpose
Invalidate EL3 regime stage 1 TLB entries for the last level of translation table walk and the given
VA on all PEs in the same Inner Shareable domain.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         -     WO               WO

Configurations
There are no configuration notes.
Attributes
TLBI VALE3IS is a 64-bit system operation.
The TLBI VALE3IS input value bit assignments are:

 63            44 43              0
|      RES0      |    VA[55:12]    |

Bits [63:44]
Reserved, RES0.
VA[55:12], bits [43:0]
Bits[55:12] of the virtual address to match. Any appropriate TLB entries that match the ASID value
(if appropriate) and VA will be affected by this operation.
If the TLB maintenance instructions are targeting a translation regime that is using AArch32, and
so has a VA of only 32 bits, then the software must treat bits[55:32] as RES0.
The treatment of the low-order bits of this field depends on the translation granule size, as follows:
• Where a 4KB translation granule is being used, all bits are valid and used for the invalidation.
• Where a 16KB translation granule is being used, bits [1:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[13:12] have no effect on the operation of the
  instruction.
• Where a 64KB translation granule is being used, bits [3:0] of this field are RES0 and ignored
  when the instruction is executed, because VA[15:12] have no effect on the operation of the
  instruction.

Performing the TLBI VALE3IS operation:
To perform the TLBI VALE3IS operation:
TLBI VALE3IS, <Xt>

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    110   1000   0011   101
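
The encodings of the by-VA operations in C4.6.18 to C4.6.28 differ only in op1, CRm, and op2. In those tables, op1 is 0b000, 0b100, or 0b110 for the EL1, EL2, and EL3 operations, CRm is 0b0011 for the Inner Shareable variants and 0b0111 otherwise, and op2 is 0b101 for the last-level variants and 0b001 otherwise. The following C sketch decodes the three fields on that basis. It is an observation drawn from these tables rather than an architectural decode rule, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a by-VA TLBI operation. */
struct tlbi_by_va {
    int  el;               /* 1, 2, or 3, as named by the operation  */
    bool inner_shareable;  /* true for the Inner Shareable variants  */
    bool last_level;       /* true for the VALE* variants            */
};

/* Returns true and fills *out if (op1, CRm, op2) matches one of the
 * by-VA encodings tabulated in C4.6.18 to C4.6.28. The caller is assumed
 * to have checked op0 = 0b01 and CRn = 0b1000. On failure the contents
 * of *out are unspecified. */
bool decode_tlbi_by_va(unsigned op1, unsigned crm, unsigned op2,
                       struct tlbi_by_va *out)
{
    assert(out != NULL);

    switch (op1) {
    case 0x0: out->el = 1; break;
    case 0x4: out->el = 2; break;
    case 0x6: out->el = 3; break;
    default:  return false;
    }
    switch (crm) {
    case 0x7: out->inner_shareable = false; break;
    case 0x3: out->inner_shareable = true;  break;
    default:  return false;
    }
    switch (op2) {
    case 0x1: out->last_level = false; break;
    case 0x5: out->last_level = true;  break;
    default:  return false;
    }
    return true;
}
```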

C4.6.29 TLBI VMALLE1, TLB Invalidate by VMID, All entries at stage 1, EL1
The TLBI VMALLE1 characteristics are:
Purpose
Invalidate all EL1&0 regime stage 1 TLB entries for the current VMID.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VMALLE1 is a 64-bit system operation.
The TLBI VMALLE1 operation ignores the value in the register specified by the instruction used to perform this
operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI VMALLE1 operation:
To perform the TLBI VMALLE1 operation:
TLBI VMALLE1

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0111   000
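
Because TLBI VMALLE1 ignores the register value, its assembler syntax names no register. The following C sketch shows how the instruction word might be formed in that case. It rests on two assumptions that this section does not state: the general A64 system instruction layout (fixed bits 0xD5080000 with op0 = 0b01, op1 in bits [18:16], CRn in bits [15:12], CRm in bits [11:8], op2 in bits [7:5], Rt in bits [4:0]), and the use of Rt = 0b11111 when no register is named. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: instruction word for a TLBI operation that takes no register.
 * Rt = 0b11111 and the base value 0xD5080000 are assumptions. */
uint32_t tlbi_no_operand_word(unsigned op1, unsigned crm, unsigned op2)
{
    const unsigned crn = 0x8;  /* CRn = 0b1000, as in the encoding table */
    const unsigned rt  = 0x1F; /* assumed Rt when no register is named   */

    assert(op1 < 8 && crm < 16 && op2 < 8);

    return UINT32_C(0xD5080000) | (op1 << 16) | (crn << 12) |
           (crm << 8) | (op2 << 5) | rt;
}
```

With the TLBI VMALLE1 values (op1 = 0b000, CRm = 0b0111, op2 = 0b000) the sketch yields 0xD508871F.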

C4.6.30 TLBI VMALLE1IS, TLB Invalidate by VMID, All entries at stage 1, EL1, Inner Shareable
The TLBI VMALLE1IS characteristics are:
Purpose
Invalidate all EL1&0 regime stage 1 TLB entries for the current VMID on all PEs in the same Inner
Shareable domain.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     WO         WO        WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VMALLE1IS is a 64-bit system operation.
The TLBI VMALLE1IS operation ignores the value in the register specified by the instruction used to perform this
operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI VMALLE1IS operation:
To perform the TLBI VMALLE1IS operation:
TLBI VMALLE1IS

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    000   1000   0011   000

C4.6.31 TLBI VMALLS12E1, TLB Invalidate by VMID, All entries at Stage 1 and 2, EL1
The TLBI VMALLS12E1 characteristics are:
Purpose
Invalidate all EL1&0 regime stage 1 and 2 TLB entries for the current VMID.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VMALLS12E1 is a 64-bit system operation.
The TLBI VMALLS12E1 operation ignores the value in the register specified by the instruction used to perform
this operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI VMALLS12E1 operation:
To perform the TLBI VMALLS12E1 operation:
TLBI VMALLS12E1

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0111   110

C4.6.32 TLBI VMALLS12E1IS, TLB Invalidate by VMID, All entries at Stage 1 and 2, EL1, Inner Shareable
The TLBI VMALLS12E1IS characteristics are:
Purpose
Invalidate all EL1&0 regime stage 1 and 2 TLB entries for the current VMID on all PEs in the same
Inner Shareable domain.
This operation is part of the TLB maintenance operations functional group.
Usage constraints
This operation can be performed at the exception levels shown below:
EL0   EL1 (NS)   EL1 (S)   EL2   EL3 (SCR.NS=1)   EL3 (SCR.NS=0)
-     -          -         WO    WO               WO

If EL3 is implemented, the translations that are invalidated are those associated with either the
Secure or Non-secure address space, depending on the value of SCR_EL3.NS.
Configurations
There are no configuration notes.
Attributes
TLBI VMALLS12E1IS is a 64-bit system operation.
The TLBI VMALLS12E1IS operation ignores the value in the register specified by the instruction used to perform
this operation. Software does not have to write a value to the register before issuing this instruction.

Performing the TLBI VMALLS12E1IS operation:
To perform the TLBI VMALLS12E1IS operation:
TLBI VMALLS12E1IS

The operation is encoded as follows:

op0   op1   CRn    CRm    op2
01    100   1000   0011   110

Chapter C5
A64 Base Instruction Descriptions

This chapter describes the A64 base instructions.
It contains the following sections:
• Introduction.
• Register size.
• Use of the PC.
• Use of the stack pointer.
• Condition flags and related instructions.
• Alphabetical list of instructions.

C5.1 Introduction
This chapter provides information on key aspects of the base instructions, and an alphabetic list of instructions from
the following functional groups:
• Branch, Exception generation, and system instructions.
• Loads and stores associated with the general-purpose registers.
• Data processing (immediate).
• Data processing (register).
A64 instruction index by encoding on page C3-172 provides an overview of the instruction encodings as well as of
the instruction classes within their functional groups.
The base instruction descriptions include:
• Register size.
• Use of the PC.
• Use of the stack pointer.
• Condition flags and related instructions.

C5.2 Register size
Most data processing, comparison, and conversion instructions that use the general-purpose registers as the source
or destination operand have two instruction variants that operate on either a 32-bit or a 64-bit value.
Where a 32-bit instruction form is selected, the following holds:
• The upper 32 bits of the source registers are ignored.
• The upper 32 bits of the destination register are set to zero.
• Right shifts and right rotates inject at bit[31], not at bit[63].
• The condition flags, where set by the instruction, are computed from the lower 32 bits.
This distinction applies even when the results of a 32-bit instruction form are indistinguishable from the lower 32
bits computed by the equivalent 64-bit instruction form. For example, a 32-bit bitwise ORR could be performed using
a 64-bit ORR and simply ignoring the top 32 bits of the result. However, the A64 instruction set includes separate
32-bit and 64-bit forms of the ORR instruction.
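The 32-bit rules above can be illustrated with a small model. This is a minimal sketch, not architectural pseudocode: the helper names (orr32, orr64, lsr32, lsr64) are hypothetical, registers are modelled as plain Python integers, and shift amounts are assumed to be already in range.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def orr32(xn, xm):
    """Model of a 32-bit ORR: upper source bits ignored, upper result bits zero."""
    return (xn & MASK32) | (xm & MASK32)

def orr64(xn, xm):
    """Model of a 64-bit ORR."""
    return (xn | xm) & MASK64

def lsr32(xn, amount):
    """Model of a 32-bit logical shift right: zeros enter at bit[31]."""
    assert 0 <= amount < 32
    return (xn & MASK32) >> amount

def lsr64(xn, amount):
    """Model of a 64-bit logical shift right: zeros enter at bit[63]."""
    assert 0 <= amount < 64
    return (xn & MASK64) >> amount

a = 0xFFFF_FFFF_0000_00F0
b = 0x1234_5678_0000_000F

# For ORR, the low 32 bits of the 64-bit result match the 32-bit result.
assert orr64(a, b) & MASK32 == orr32(a, b) == 0xFF

# For a right shift they differ, because bits 63:32 leak into the low half.
assert lsr32(a, 4) == 0xF
assert lsr64(a, 4) & MASK32 == 0xF000000F
```

The ORR case shows why the distinction can look redundant, and the shift case shows where the two forms genuinely produce different low halves.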
As well as distinct sign-extend or zero-extend instructions, the A64 instruction set also provides the ability to extend
and shift the final source register of an ADD, SUB, ADDS, or SUBS instruction and the index register of a Load/Store
instruction. This enables array index calculations involving a 64-bit array pointer and a 32-bit array index to be
implemented efficiently.
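As a sketch of the array index case, the following models the address arithmetic of an add that sign-extends a 32-bit index and shifts it by the element size, as in an instruction of the form ADD Xd, Xn, Wm, SXTW #3. The function names are hypothetical and the model assumes a signed index.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, from_bits):
    """Interpret the low from_bits bits of value as a two's complement integer."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign

def element_address(base, index32, log2_size):
    """Return base + (SXTW(index32) << log2_size), wrapped to 64 bits."""
    offset = sign_extend(index32, 32) << log2_size
    return (base + offset) & MASK64

# Element 3 of an array of 8-byte items starting at 0x1000.
assert element_address(0x1000, 3, 3) == 0x1018
# A 32-bit index of -1 (0xFFFFFFFF) addresses the item before the base.
assert element_address(0x1000, 0xFFFFFFFF, 3) == 0xFF8
```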
The assembly language notation enables the distinct identification of registers holding 32-bit values and registers
holding 64-bit values. See Register names on page C1-114 and Register indexed addressing on page C1-118.

C5.3 Use of the PC
A64 instructions have limited access to the PC. The only instructions that can read the PC are those that generate a
PC-relative address:
• ADR and ADRP.
• The Load register (literal) instruction class.
• Direct branches that use an immediate offset.
• The unconditional branch with link instructions, BL and BLR, that use the PC to create the return link
  address.
Only explicit control flow instructions can modify the PC:
• Conditional and unconditional branch and return instructions.
• Exception generation and exception return instructions.
For more details on instructions that can modify the PC, see Branches, Exception generating, and System
instructions on page C2-124.

C5.4 Use of the stack pointer
A64 instructions can use the stack pointer only in a limited number of cases:
• Load/Store instructions use the current stack pointer as the base address:
  — When stack alignment checking is enabled by system software and the base register is SP, the current
    stack pointer must be initially quadword aligned, that is, it must be aligned to 16 bytes. Misalignment
    generates a Stack Alignment fault. See Stack pointer alignment checking on page D1-1424 for more
    information.
• Add and subtract data processing instructions, in their immediate and extended register forms, use the current
  stack pointer as a source register or the destination register or both.
• Logical data processing instructions in their immediate form use the current stack pointer as the destination
  register.
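The quadword alignment condition for SP-based Load/Store accesses amounts to a test on the low four bits of the stack pointer. The sketch below is illustrative only: the function names and the exception type are hypothetical, and the fault is modelled as a Python exception rather than an architectural exception.

```python
class StackAlignmentFault(Exception):
    """Hypothetical stand-in for the Stack Alignment fault."""

def sp_is_quadword_aligned(sp):
    """True when sp is aligned to 16 bytes."""
    return (sp & 0xF) == 0

def check_sp_base(sp, alignment_checking_enabled):
    """Model the check applied when SP is the base register of a load or store."""
    if alignment_checking_enabled and not sp_is_quadword_aligned(sp):
        raise StackAlignmentFault(hex(sp))
    return sp

check_sp_base(0x7FFF_FFF0, True)    # aligned, passes
check_sp_base(0x7FFF_FFF8, False)   # misaligned, but checking is disabled
```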

C5.5 Condition flags and related instructions
The A64 base instructions that use the condition flags as an input are:
• Conditional branch. The conditional branch instruction is B.cond.
• Add or subtract with carry. These instruction types include instructions to perform multi-precision arithmetic
  and calculate checksums. The add or subtract with carry instructions are ADC, ADCS, SBC, and SBCS, or an
  architectural alias for these instructions.
• Conditional select with increment, negate, or invert. This instruction type conditionally selects between one
  source register and a second, incremented, negated, inverted, or unmodified source register. The conditional
  select with increment, negate, or invert instructions are CSINC, CSINV, and CSNEG.
  These instructions also implement:
  — Conditional select or move. The condition flags select one of two source registers as the destination
    register. Short conditional sequences can be replaced by unconditional instructions followed by a
    conditional select, CSEL.
  — Conditional set. Conditionally selects between 0 and 1, or 0 and -1. This can be used to convert the
    condition flags to a Boolean value or mask in a general-purpose register, for example. These
    instructions include CSET and CSETM.
• Conditional compare. This instruction type sets the condition flags to the result of a comparison if the original
  condition is true, otherwise it sets the condition flags to an immediate value. It permits the flattening of nested
  conditional expressions without using conditional branches or performing Boolean arithmetic within the
  general-purpose registers. The conditional compare instructions are CCMP and CCMN.
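The conditional select family described in the list above can be modelled in a few lines. This is a sketch for the 64-bit forms only: the function names mirror the mnemonics, and cond is assumed to be a Boolean that has already been evaluated from the condition flags.

```python
MASK64 = (1 << 64) - 1

def csel(cond, xn, xm):
    """Select xn when cond holds, otherwise xm unmodified."""
    return xn if cond else xm

def csinc(cond, xn, xm):
    """Select xn when cond holds, otherwise xm + 1."""
    return xn if cond else (xm + 1) & MASK64

def csinv(cond, xn, xm):
    """Select xn when cond holds, otherwise the bitwise inverse of xm."""
    return xn if cond else ~xm & MASK64

def csneg(cond, xn, xm):
    """Select xn when cond holds, otherwise the two's complement negation of xm."""
    return xn if cond else -xm & MASK64

def cset(cond):
    """Convert a condition to the Boolean value 1 or 0."""
    return 1 if cond else 0

def csetm(cond):
    """Convert a condition to an all-ones or all-zeros mask (-1 or 0)."""
    return MASK64 if cond else 0

assert csinc(False, 5, 7) == 8
assert csneg(False, 0, 1) == MASK64   # -1 as a 64-bit value
assert csetm(True) == MASK64
```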

The A64 base instructions that update the condition flags as an output are:
• Flag-setting data processing instructions, such as ADCS, ADDS, ANDS, BICS, SBCS, and SUBS, and the aliases CMN,
  CMP, and TST.
• Conditional compare instructions such as CCMN and CCMP.
The flags can be read or written directly using the NZCV register, see NZCV, Condition Flags on page C4-267.
The A64 base instructions also include conditional branch instructions that do not use the condition flags as an input:
• Compare and branch if a register is zero or nonzero, CBZ and CBNZ.
• Test a single bit in a register and branch if the bit is zero or nonzero, TBZ and TBNZ.

C5.6 Alphabetical list of instructions
This section lists every instruction in the base category of the A64 instruction set. For details of the format used, see
Structure of the A64 assembler language on page C1-113.

C5.6.1 ADC
Add with carry: Rd = Rn + Rm + C

31   30   29   28-21      20-16   15-10    9-5   4-0
sf   0    0    11010000   Rm      000000   Rn    Rd
     op   S

32-bit variant (sf = 0)
ADC <Wd>, <Wn>, <Wm>

64-bit variant (sf = 1)
ADC <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
boolean sub_op = (op == '1');
boolean setflags = (S == '1');

Assembler Symbols
<Wd>    Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn>    Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm>    Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd>    Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn>    Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm>    Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.

Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
bits(4) nzcv;
if sub_op then
    operand2 = NOT(operand2);
(result, nzcv) = AddWithCarry(operand1, operand2, PSTATE.C);
if setflags then
    PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
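The Operation pseudocode relies on AddWithCarry, which is not defined in this section. The following Python model is a sketch of the behavior it is assumed to have: it returns the truncated sum together with the N, Z, C, and V flags. The names add_with_carry and adc are hypothetical, and registers are modelled as Python integers.

```python
def add_with_carry(x, y, carry_in, datasize):
    """Return (result, (n, z, c, v)) for x + y + carry_in at the given width."""
    mask = (1 << datasize) - 1
    x &= mask
    y &= mask

    def sint(value):
        # Two's complement interpretation of a datasize-bit value.
        return value - (1 << datasize) if value >> (datasize - 1) else value

    unsigned_sum = x + y + carry_in
    signed_sum = sint(x) + sint(y) + carry_in
    result = unsigned_sum & mask

    n = result >> (datasize - 1)                 # sign bit of the result
    z = 1 if result == 0 else 0                  # result is zero
    c = 0 if result == unsigned_sum else 1       # unsigned overflow (carry out)
    v = 0 if sint(result) == signed_sum else 1   # signed overflow
    return result, (n, z, c, v)

def adc(xn, xm, carry, datasize=64):
    """Model of ADC: the flags are computed but discarded."""
    result, _ = add_with_carry(xn, xm, carry, datasize)
    return result

assert adc(1, 2, 1) == 4
assert add_with_carry(0xFFFFFFFF, 1, 0, 32) == (0, (0, 1, 1, 0))
assert add_with_carry(0x7FFFFFFF, 0, 1, 32) == (0x80000000, (1, 0, 0, 1))
```

The second assertion shows an unsigned wrap (carry set, result zero), and the third a signed overflow caused only by the incoming carry.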

C5.6.2 ADCS
Add with carry, setting the condition flags: Rd = Rn + Rm + C

31   30   29   28-21      20-16   15-10    9-5   4-0
sf   0    1    11010000   Rm      000000   Rn    Rd
     op   S

32-bit variant (sf = 0)
ADCS <Wd>, <Wn>, <Wm>

64-bit variant (sf = 1)
ADCS <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
boolean sub_op = (op == '1');
boolean setflags = (S == '1');

Assembler Symbols
<Wd>    Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn>    Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm>    Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd>    Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn>    Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm>    Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.

Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
bits(4) nzcv;
if sub_op then
    operand2 = NOT(operand2);
(result, nzcv) = AddWithCarry(operand1, operand2, PSTATE.C);
if setflags then
    PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
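Because ADCS both consumes and produces the carry flag, it can be chained to add values wider than one register. The sketch below models a 128-bit addition as an ADDS on the low halves followed by an ADCS on the high halves. The helper names are hypothetical and only the carry flag is modelled.

```python
MASK64 = (1 << 64) - 1

def add64_carry(a, b, carry_in=0):
    """Return (low 64 bits, carry out) of a + b + carry_in."""
    total = (a & MASK64) + (b & MASK64) + carry_in
    return total & MASK64, total >> 64

def add128(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit values held as (lo, hi) pairs of 64-bit halves."""
    lo, carry = add64_carry(a_lo, b_lo)           # ADDS: produces the carry
    hi, carry = add64_carry(a_hi, b_hi, carry)    # ADCS: consumes and produces it
    return lo, hi, carry

# A carry out of the low half propagates into the high half.
assert add128(MASK64, 0, 1, 0) == (0, 1, 0)
# A carry out of the high half is left in the final carry flag.
assert add128(MASK64, MASK64, 1, 0) == (0, 0, 1)
```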

C5.6.3 ADD (extended register)
Add (extended register): Rd = Rn + LSL(extend(Rm), amount)

31   30   29   28-24   23-22   21   20-16   15-13    12-10   9-5   4-0
sf   0    0    01011   00      1    Rm      option   imm3    Rn    Rd
     op   S

32-bit variant (sf = 0)
ADD <Wd|WSP>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

64-bit variant (sf = 1)
ADD <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
boolean sub_op = (op == '1');
boolean setflags = (S == '1');
ExtendType extend_type = DecodeRegExtend(option);
integer shift = UInt(imm3);
if shift > 4 then ReservedValue();

Assembler Symbols
<Wd|WSP>    Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
            field.
<Wn|WSP>    Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
            field.
<Wm>        Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd|SP>     Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
            field.
<Xn|SP>     Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
            field.
<R>         Is a width specifier:
            W           when option = 00x
            W           when option = 010
            X           when option = x11
            W           when option = 10x
            W           when option = 110
<m>         Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in
            the "Rm" field.
<extend>    For the 32-bit variant: is the extension to be applied to the second source operand:
            UXTB        when option = 000
            UXTH        when option = 001
            LSL|UXTW    when option = 010
            UXTX        when option = 011
            SXTB        when option = 100
            SXTH        when option = 101
            SXTW        when option = 110
            SXTX        when option = 111
            The LSL form can only be used when at least one of "Rd" or "Rn" is '11111' (i.e. WSP) and in that
            case is also the default. In all other cases <extend> must be present.
<extend>    For the 64-bit variant: is the extension to be applied to the second source operand:
            UXTB        when option = 000
            UXTH        when option = 001
            UXTW        when option = 010
            LSL|UXTX    when option = 011
            SXTB        when option = 100
            SXTH        when option = 101
            SXTW        when option = 110
            SXTX        when option = 111
            The LSL form can only be used when at least one of "Rd" or "Rn" is '11111' (i.e. SP) and in that case
            is also the default. In all other cases <extend> must be present.
<amount>    Is the left shift amount in the range 0 to 4, which is optional with a default of 0 when <extend> is
            not LSL, encoded in the "imm3" field.

Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);
bits(4) nzcv;
bit carry_in;
if sub_op then
    operand2 = NOT(operand2);
    carry_in = '1';
else
    carry_in = '0';
(result, nzcv) = AddWithCarry(operand1, operand2, carry_in);
if setflags then
    PSTATE.<N,Z,C,V> = nzcv;
if d == 31 && !setflags then
    SP[] = result;
else
    X[d] = result;
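ExtendReg is used in the Operation pseudocode but is not defined in this section. The sketch below models the behavior it is assumed to have: extract the low byte, halfword, word, or doubleword of the source value, zero-extend or sign-extend it, and shift it left by 0 to 4 bits. The function name is hypothetical, and it takes the register value directly rather than a register number.

```python
EXTEND_WIDTHS = {"B": 8, "H": 16, "W": 32, "X": 64}

def extend_reg(value, extend_type, shift, datasize):
    """Model of extend-then-shift for extend_type in UXTB..UXTX, SXTB..SXTX."""
    assert 0 <= shift <= 4
    unsigned = extend_type.startswith("U")
    length = EXTEND_WIDTHS[extend_type[-1]]
    # The extracted field never exceeds what still fits after the shift.
    length = min(length, datasize - shift)
    field = value & ((1 << length) - 1)
    if not unsigned and field >> (length - 1):
        field -= 1 << length          # sign-extend a negative field
    return (field << shift) & ((1 << datasize) - 1)

assert extend_reg(0x1FF, "UXTB", 0, 64) == 0xFF
assert extend_reg(0x80, "SXTB", 2, 64) == 0xFFFF_FFFF_FFFF_FE00
assert extend_reg(0xFFFFFFFF, "UXTW", 3, 64) == 0x7_FFFF_FFF8
assert extend_reg(0xFFFFFFFF, "SXTW", 3, 64) == 0xFFFF_FFFF_FFFF_FFF8
```

The last two assertions contrast UXTW and SXTW on the same 32-bit input, which is the choice a compiler makes between unsigned and signed array indices.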

C5.6.4 ADD (immediate)
Add (immediate): Rd = Rn + shift(imm)
This instruction is used by the alias MOV (to/from SP). See the Alias conditions table for details of when each alias
is preferred.

31   30   29   28-24   23-22   21-10   9-5   4-0
sf   0    0    10001   shift   imm12   Rn    Rd
     op   S

32-bit variant (sf = 0)
ADD <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}

64-bit variant (sf = 1)
ADD <Xd|SP>, <Xn|SP>, #<imm>{, <shift>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
boolean sub_op = (op == '1');
boolean setflags = (S == '1');
bits(datasize) imm;
case shift of
    when '00' imm = ZeroExtend(imm12, datasize);
    when '01' imm = ZeroExtend(imm12 : Zeros(12), datasize);
    when '1x' ReservedValue();

Alias conditions
Alias               is preferred when
MOV (to/from SP)    (Rd == '11111' || Rn == '11111') && IsZero(shift:imm12)

Assembler Symbols
<Wd|WSP>    Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
            field.
<Wn|WSP>    Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xd|SP>     Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
            field.
<Xn|SP>     Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm>       Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.
<shift>     Is the optional left shift to apply to the immediate, defaulting to LSL #0:
            LSL #0      when shift = 00
            LSL #12     when shift = 01
            RESERVED    when shift = 1x

Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = imm;
bits(4) nzcv;
bit carry_in;
if sub_op then
    operand2 = NOT(operand2);
    carry_in = '1';
else
    carry_in = '0';
(result, nzcv) = AddWithCarry(operand1, operand2, carry_in);
if setflags then
    PSTATE.<N,Z,C,V> = nzcv;
if d == 31 && !setflags then
    SP[] = result;
else
    X[d] = result;
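The immediate decode and the MOV (to/from SP) alias condition can be written out as a short sketch. The function names are hypothetical, and the field values are passed as plain integers rather than bit strings.

```python
def decode_add_imm(shift, imm12):
    """Return the immediate operand selected by the shift and imm12 fields."""
    if not 0 <= imm12 <= 0xFFF:
        raise ValueError("imm12 must be in the range 0 to 4095")
    if shift == 0b00:
        return imm12              # LSL #0
    if shift == 0b01:
        return imm12 << 12        # LSL #12
    raise ValueError("shift = 1x is reserved")

def mov_sp_alias_preferred(rd, rn, shift, imm12):
    """True when the MOV (to/from SP) alias condition holds."""
    return (rd == 31 or rn == 31) and shift == 0 and imm12 == 0

assert decode_add_imm(0b00, 4095) == 4095
assert decode_add_imm(0b01, 1) == 4096
assert mov_sp_alias_preferred(31, 0, 0, 0)       # ADD SP, X0, #0
assert not mov_sp_alias_preferred(1, 2, 0, 0)    # ADD X1, X2, #0
```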

C5.6.5 ADD (shifted register)
Add (shifted register): Rd = Rn + shift(Rm, amount)

31   30   29   28-24   23-22   21   20-16   15-10   9-5   4-0
sf   0    0    01011   shift   0    Rm      imm6    Rn    Rd
     op   S

32-bit variant (sf = 0)
ADD <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit variant (sf = 1)
ADD <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
boolean sub_op = (op == '1');
boolean setflags = (S == '1');

if shift == '11' then ReservedValue();
if sf == '0' && imm6<5> == '1' then ReservedValue();
ShiftType shift_type = DecodeShift(shift);
integer shift_amount = UInt(imm6);

Assembler Symbols
<Wd>        Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn>        Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm>        Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd>        Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn>        Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm>        Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift>     Is the optional shift type to be applied to the second source operand, defaulting to LSL:
            LSL         when shift = 00
            LSR         when shift = 01
            ASR         when shift = 10
            RESERVED    when shift = 11
<amount>    For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
            "imm6" field.
<amount>    For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
            "imm6" field.

Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(4) nzcv;
bit carry_in;
if sub_op then
    operand2 = NOT(operand2);
    carry_in = '1';
else
    carry_in = '0';
(result, nzcv) = AddWithCarry(operand1, operand2, carry_in);
if setflags then
    PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;
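As a sketch of how the fields in the encoding diagram combine, the following packs an ADD (shifted register) instruction word and applies the two reserved-value checks from the decode pseudocode. The function name and its argument conventions are hypothetical and do not correspond to any assembler API.

```python
SHIFT_CODES = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10}   # 0b11 is reserved

def encode_add_shifted(sf, rd, rn, rm, shift="LSL", amount=0, setflags=False):
    """Pack the fields of ADD/ADDS (shifted register) into a 32-bit word."""
    if shift not in SHIFT_CODES:
        raise ValueError("reserved shift type")
    if sf not in (0, 1):
        raise ValueError("sf must be 0 or 1")
    # For the 32-bit variant imm6<5> must be 0, so the amount is below 32.
    if not 0 <= amount < (64 if sf else 32):
        raise ValueError("shift amount out of range")
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 31:
            raise ValueError("register number out of range")
    word = (sf << 31) | (int(setflags) << 29) | (0b01011 << 24)   # op = 0
    word |= SHIFT_CODES[shift] << 22                              # bit 21 = 0
    word |= (rm << 16) | (amount << 10) | (rn << 5) | rd
    return word

# ADD X0, X1, X2
assert encode_add_shifted(1, 0, 1, 2) == 0x8B020020
# ADD W3, W4, W5, LSL #2
assert encode_add_shifted(0, 3, 4, 5, "LSL", 2) == 0x0B050883
```

Setting setflags=True yields the corresponding ADDS (shifted register) word, which differs only in bit 29.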

C5.6.6 ADDS (extended register)
Add (extended register), setting the condition flags: Rd = Rn + LSL(extend(Rm), amount)
This instruction is used by the alias CMN (extended register). See the Alias conditions table for details of when each
alias is preferred.

31   30   29   28-24   23-22   21   20-16   15-13    12-10   9-5   4-0
sf   0    1    01011   00      1    Rm      option   imm3    Rn    Rd
     op   S

32-bit variant (sf = 0)
ADDS <Wd>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

64-bit variant (sf = 1)
ADDS <Xd>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
boolean sub_op = (op == '1');
boolean setflags = (S == '1');
ExtendType extend_type = DecodeRegExtend(option);
integer shift = UInt(imm3);
if shift > 4 then ReservedValue();

Alias conditions
Alias                      is preferred when
CMN (extended register)    Rd == '11111'

Assembler Symbols
<Wd>        Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn|WSP>    Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
            field.
<Wm>        Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd>        Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP>     Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
            field.
<R>         Is a width specifier:
            W           when option = 00x
            W           when option = 010
            X           when option = x11
            W           when option = 10x
            W           when option = 110
<m>         Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in
            the "Rm" field.
<extend>    For the 32-bit variant: is the extension to be applied to the second source operand:
            UXTB        when option = 000
            UXTH        when option = 001
            LSL|UXTW    when option = 010
            UXTX        when option = 011
            SXTB        when option = 100
            SXTH        when option = 101
            SXTW        when option = 110
            SXTX        when option = 111
            The LSL form can only be used when "Rn" is '11111' (i.e. WSP) and in that case is also the default.
            In all other cases <extend> must be present.
<extend>    For the 64-bit variant: is the extension to be applied to the second source operand:
            UXTB        when option = 000
            UXTH        when option = 001
            UXTW        when option = 010
            LSL|UXTX    when option = 011
            SXTB        when option = 100
            SXTH        when option = 101
            SXTW        when option = 110
            SXTX        when option = 111
            The LSL form can only be used when "Rn" is '11111' (i.e. SP) and in that case is also the default. In
            all other cases <extend> must be present.
<amount>    Is the left shift amount in the range 0 to 4, which is optional with a default of 0 when <extend> is
            not LSL, encoded in the "imm3" field.

Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);
bits(4) nzcv;
bit carry_in;
if sub_op then
    operand2 = NOT(operand2);
    carry_in = '1';
else
    carry_in = '0';
(result, nzcv) = AddWithCarry(operand1, operand2, carry_in);
if setflags then
    PSTATE.<N,Z,C,V> = nzcv;
if d == 31 && !setflags then
    SP[] = result;
else
    X[d] = result;

C5.6.7 ADDS (immediate)
Add (immediate), setting the condition flags: Rd = Rn + shift(imm)
This instruction is used by the alias CMN (immediate). See the Alias conditions table for details of when each alias
is preferred.

31   30   29   28-24   23-22   21-10   9-5   4-0
sf   0    1    10001   shift   imm12   Rn    Rd
     op   S

32-bit variant (sf = 0)
ADDS <Wd>, <Wn|WSP>, #<imm>{, <shift>}

64-bit variant (sf = 1)
ADDS <Xd>, <Xn|SP>, #<imm>{, <shift>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
boolean sub_op = (op == '1');
boolean setflags = (S == '1');
bits(datasize) imm;
case shift of
    when '00' imm = ZeroExtend(imm12, datasize);
    when '01' imm = ZeroExtend(imm12 : Zeros(12), datasize);
    when '1x' ReservedValue();

Alias conditions
Alias              is preferred when
CMN (immediate)    Rd == '11111'

Assembler Symbols
<Wd>        Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn|WSP>    Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xd>        Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP>     Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm>       Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.
<shift>     Is the optional left shift to apply to the immediate, defaulting to LSL #0:
            LSL #0      when shift = 00
            LSL #12     when shift = 01
            RESERVED    when shift = 1x

Operation
bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = imm;
bits(4) nzcv;
bit carry_in;
if sub_op then
    operand2 = NOT(operand2);
    carry_in = '1';
else
    carry_in = '0';
(result, nzcv) = AddWithCarry(operand1, operand2, carry_in);
if setflags then
    PSTATE.<N,Z,C,V> = nzcv;
if d == 31 && !setflags then
    SP[] = result;
else
    X[d] = result;
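The CMN (immediate) alias is an ADDS whose result is discarded, so only the flags are of interest. The following sketch models that case: the names adds_flags and cmn_imm are hypothetical, and the flag computation is a compact equivalent of an add with a zero carry-in rather than the architectural pseudocode.

```python
def adds_flags(x, imm, datasize=64):
    """Return (result, (n, z, c, v)) for x + imm at the given width."""
    mask = (1 << datasize) - 1
    sign = 1 << (datasize - 1)
    x &= mask
    total = x + imm
    result = total & mask
    n = int(bool(result & sign))
    z = int(result == 0)
    c = int(total > mask)
    # Signed overflow: both operands share a sign that the result does not.
    v = int(bool(~(x ^ imm) & (x ^ result) & sign))
    return result, (n, z, c, v)

def cmn_imm(x, imm, datasize=64):
    """Model of CMN: compare x against -imm by adding and keeping only the flags."""
    _, nzcv = adds_flags(x, imm, datasize)
    return nzcv

MASK64 = (1 << 64) - 1
# x == -5, so x + 5 is zero: Z is set, which an EQ condition would test.
assert cmn_imm(-5 & MASK64, 5) == (0, 1, 1, 0)
# 0x7FFFFFFF + 1 overflows the signed 32-bit range: N and V are set.
assert cmn_imm(0x7FFFFFFF, 1, 32) == (1, 0, 0, 1)
```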

C5.6.8 ADDS (shifted register)
Add (shifted register), setting the condition flags: Rd = Rn + shift(Rm, amount)
This instruction is used by the alias CMN (shifted register). See the Alias conditions table for details of when each
alias is preferred.

31   30   29   28-24   23-22   21   20-16   15-10   9-5   4-0
sf   0    1    01011   shift   0    Rm      imm6    Rn    Rd
     op   S

32-bit variant (sf = 0)
ADDS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit variant (sf = 1)
ADDS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
boolean sub_op = (op == '1');
boolean setflags = (S == '1');

if shift == '11' then ReservedValue();
if sf == '0' && imm6<5> == '1' then ReservedValue();
ShiftType shift_type = DecodeShift(shift);
integer shift_amount = UInt(imm6);

Alias conditions
Alias                     is preferred when
CMN (shifted register)    Rd == '11111'

Assembler Symbols
<Wd>        Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn>        Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm>        Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd>        Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn>        Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm>        Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<shift>     Is the optional shift type to be applied to the second source operand, defaulting to LSL:
            LSL         when shift = 00
            LSR         when shift = 01
            ASR         when shift = 10
            RESERVED    when shift = 11
<amount>    For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
            "imm6" field.
<amount>    For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
            "imm6" field.

Operation
bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(4) nzcv;
bit carry_in;
if sub_op then
    operand2 = NOT(operand2);
    carry_in = '1';
else
    carry_in = '0';
(result, nzcv) = AddWithCarry(operand1, operand2, carry_in);
if setflags then
    PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;

C5.6.9 ADR
Address of label at a PC-relative offset

31   30-29   28-24   23-5    4-0
0    immlo   10000   immhi   Rd
op

Literal variant
ADR <Xd>, <label>

Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.7
Linearized                      : Yes
Page Mode                       : UseOutlines
XMP Toolkit                     : Adobe XMP Core 4.0-c321 44.398116, Tue Aug 04 2009 14:24:39
Format                          : application/pdf
Title                           : ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile
Creator                         : ARM Limited
Description                     : Defines the ARMv8 architecture for the ARMv7-A architecture profiles, including the A32 (ARM), T32 (Thumb), and A64 instruction sets. The A (Application) profile defines a Virtual Memory System Architecture (VMSA) with support for the AArch64 and AArch32 Execution states, Virtualization and TrustZone Security. Includes the Debug Architecture, the GIC CPU interface, and the Generic Timer and Performance Monitors Extensions.
Create Date                     : 2013:09:04 14:48:27Z
Creator Tool                    : FrameMaker 8.0
Modify Date                     : 2013:09:04 15:56:25+01:00
Metadata Date                   : 2013:09:04 15:56:25+01:00
Producer                        : Acrobat Distiller 8.3.1 (Windows)
Copyright                       : Copyright © 2013 ARM Limited. All rights reserved.
Document ID                     : uuid:98637ab5-a130-45da-96cf-321f253d778f
Instance ID                     : uuid:9624232c-ef45-433c-8625-f860c663d220
Page Count                      : 5158
Subject                         : Defines the ARMv8 architecture for the ARMv7-A architecture profiles, including the A32 (ARM), T32 (Thumb), and A64 instruction sets. The A (Application) profile defines a Virtual Memory System Architecture (VMSA) with support for the AArch64 and AArch32 Execution states, Virtualization and TrustZone Security. Includes the Debug Architecture, the GIC CPU interface, and the Generic Timer and Performance Monitors Extensions.
Author                          : ARM Limited
Keywords                        : ARMv8, Cortex-A50, Cortex-A53, Cortex-A57, NEON, SecurCore