ARM® Architecture Reference Manual
ARMv8, for ARMv8-A architecture profile

Printed on: December 19, 2017

Copyright © 2013-2017 ARM Limited or its affiliates. All rights reserved.
ARM DDI 0487C.a (ID121917)

Release Information
The following releases of this document have been made.
Release history
Date               Issue  Confidentiality                       Change
30 April 2013      A.a-1  Confidential-Beta Draft               Beta draft of first issue, limited circulation
12 June 2013       A.a-2  Confidential-Beta Draft               Second beta draft of first issue, limited circulation
04 September 2013  A.a    Non-Confidential Beta                 Beta release.
24 December 2013   A.b    Non-Confidential Beta                 Second beta release.
18 July 2014       A.c    Non-Confidential Beta                 Third beta release.
09 October 2014    A.d    Non-Confidential Beta                 Fourth beta release.
17 December 2014   A.e    Non-Confidential Beta                 Fifth beta release.
25 March 2015      A.f    Non-Confidential Beta                 Sixth beta release.
10 July 2015       A.g    Non-Confidential Beta                 Seventh beta release.
30 September 2015  A.h    Non-Confidential Beta                 Eighth beta release.
28 January 2016    A.i    Non-Confidential Beta                 Ninth beta release.
03 June 2016       A.j    Non-Confidential EAC                  EAC release.
30 September 2016  A.k    Non-Confidential v8.0 EAC             Updated EAC release.
31 March 2017      B.a    Non-Confidential v8.1 EAC, v8.2 Beta  Initial release incorporating ARMv8.1 and ARMv8.2.
26 September 2017  B.b    Non-Confidential v8.2 EAC             Initial v8.2 EAC release, incorporating SPE.
20 December 2017   C.a    Non-Confidential v8.3 EAC             Initial v8.3 EAC release.

Proprietary Notice
This document is protected by copyright and other related rights and the practice or implementation of the information contained
in this document may be protected by one or more patents or pending patent applications. No part of this document may be
reproduced in any form by any means without the express prior written permission of Arm. No license, express or implied, by
estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.
Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR
PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to,
and has undertaken no analysis to identify or understand the scope and content of, patents, copyrights, trade secrets, or other rights.
This document may include technical inaccuracies or typographical errors.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.

This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure
of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof
is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to Arm’s customers
is not intended to create or refer to any partnership relationship with any other company. Arm may make changes to this document
at any time and without notice.
If any of the provisions contained in these terms conflict with any of the provisions of any click through or signed written
agreement covering this document with Arm, then the click through or signed written agreement prevails over and supersedes the
conflicting provisions of these terms. This document may be translated into other languages for convenience, and you agree that
if there is any conflict between the English version of this document and any translation, the terms of the English version of the
Agreement shall prevail.
The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks of Arm Limited (or its subsidiaries)
in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of
their respective owners. You must follow Arm’s trademark usage guidelines at
http://www.arm.com/company/policies/trademarks.
Copyright © 2013-2017 Arm Limited (or its affiliates). All rights reserved.
Arm Limited. Company 02557590 registered in England.
110 Fulbourn Road, Cambridge, England CB1 9NJ.
LES-PRE-20349
In this document, where the term ARM is used to refer to the company it means “Arm or any of its subsidiaries as appropriate”.

Note
• The term ARM can refer to versions of the ARM architecture, for example ARMv8 refers to version 8 of the ARM
  architecture. The context makes it clear when the term is used in this way.
• This document describes only the ARMv8-A architecture profile. For the behaviors required by the previous version of
  this architecture profile, ARMv7-A, see the ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition.

Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by ARM and the party that ARM delivered this document to.
Product Status
The information in this document is final, that is for a developed product.
Web Address
http://www.arm.com

Limitations of this issue
This issue of the ARMv8 Architecture Reference Manual contains many improvements and corrections. Validation of this
document has identified the following issues that ARM will address in future issues:
• Appendix K12 ARM Pseudocode Definition requires further review and update. Since this appendix is informative, rather
  than being part of the architecture specification, this does not affect the quality status of this release.

Contents
ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile

Preface
    About this manual
    Using this manual
    Conventions
    Additional reading
    Feedback

Part A  ARMv8 Architecture Introduction and Overview

Chapter A1  Introduction to the ARMv8 Architecture
    A1.1  About the ARM architecture
    A1.2  Architecture profiles
    A1.3  ARMv8 architectural concepts
    A1.4  Supported data types
    A1.5  Advanced SIMD and floating-point support
    A1.6  The ARM memory model
    A1.7  ARMv8 architecture extensions

Part B  The AArch64 Application Level Architecture

Chapter B1  The AArch64 Application Level Programmers’ Model
    B1.1  About the Application level programmers’ model
    B1.2  Registers in AArch64 Execution state
    B1.3  Software control features and EL0

Chapter B2  The AArch64 Application Level Memory Model
    B2.1  About the ARM memory model
    B2.2  Atomicity in the ARM architecture
    B2.3  Definition of the ARMv8 memory model
    B2.4  Caches and memory hierarchy
    B2.5  Alignment support
    B2.6  Endian support
    B2.7  Memory types and attributes
    B2.8  Mismatched memory attributes
    B2.9  Synchronization and semaphores

Part C  The AArch64 Instruction Set

Chapter C1  The A64 Instruction Set
    C1.1  About the A64 instruction set
    C1.2  Structure of the A64 assembler language
    C1.3  Address generation
    C1.4  Instruction aliases

Chapter C2  About the A64 Instruction Descriptions
    C2.1  Understanding the A64 instruction descriptions
    C2.2  General information about the A64 instruction descriptions

Chapter C3  A64 Instruction Set Overview
    C3.1  Branches, Exception generating, and System instructions
    C3.2  Loads and stores
    C3.3  Data processing - immediate
    C3.4  Data processing - register
    C3.5  Data processing - SIMD and floating-point

Chapter C4  A64 Instruction Set Encoding
    C4.1  A64 instruction set encoding

Chapter C5  The A64 System Instruction Class
    C5.1  The System instruction class encoding space
    C5.2  Special-purpose registers
    C5.3  A64 System instructions for cache maintenance
    C5.4  A64 System instructions for address translation
    C5.5  A64 System instructions for TLB maintenance

Chapter C6  A64 Base Instruction Descriptions
    C6.1  About the A64 base instructions
    C6.2  Alphabetical list of A64 base instructions

Chapter C7  A64 Advanced SIMD and Floating-point Instruction Descriptions
    C7.1  About the A64 SIMD and floating-point instructions
    C7.2  Alphabetical list of A64 Advanced SIMD and floating-point instructions

Part D  The AArch64 System Level Architecture

Chapter D1  The AArch64 System Level Programmers’ Model
    D1.1  Exception levels
    D1.2  Exception terminology
    D1.3  Execution state
    D1.4  Security state

    D1.5  Virtualization
    D1.6  Registers for instruction processing and exception handling
    D1.7  Process state, PSTATE
    D1.8  Program counter and stack pointer alignment
    D1.9  Reset
    D1.10  Exception entry
    D1.11  Exception return
    D1.12  The Exception level hierarchy
    D1.13  Synchronous exception types, routing and priorities
    D1.14  Asynchronous exception types, routing, masking and priorities
    D1.15  Configurable instruction enables and disables, and trap controls
    D1.16  System calls
    D1.17  Mechanisms for entering a low-power state
    D1.18  Self-hosted debug
    D1.19  The Performance Monitors Extension
    D1.20  Interprocessing
    D1.21  The effect of implementation choices on the programmers’ model

Chapter D2  AArch64 Self-hosted Debug
    D2.1  About self-hosted debug
    D2.2  The debug exception enable controls
    D2.3  Routing debug exceptions
    D2.4  Enabling debug exceptions from the current Exception level and Security state
    D2.5  The effect of powerdown on debug exceptions
    D2.6  Summary of the routing and enabling of debug exceptions
    D2.7  Pseudocode description of debug exceptions
    D2.8  Breakpoint Instruction exceptions
    D2.9  Breakpoint exceptions
    D2.10  Watchpoint exceptions
    D2.11  Vector Catch exceptions
    D2.12  Software Step exceptions
    D2.13  Synchronization and debug exceptions

Chapter D3  The AArch64 System Level Memory Model
    D3.1  About the memory system architecture
    D3.2  Address space
    D3.3  Mixed-endian support
    D3.4  Cache support
    D3.5  External aborts
    D3.6  Memory barrier instructions
    D3.7  Pseudocode description of general memory system instructions

Chapter D4  The AArch64 Virtual Memory System Architecture
    D4.1  About the Virtual Memory System Architecture (VMSA)
    D4.2  The VMSAv8-64 address translation system
    D4.3  VMSAv8-64 translation table format descriptors
    D4.4  Memory access control
    D4.5  Memory region attributes
    D4.6  Virtualization Host Extensions
    D4.7  Nested virtualization
    D4.8  VMSAv8-64 memory aborts
    D4.9  Translation Lookaside Buffers (TLBs)
    D4.10  TLB maintenance requirements and the TLB maintenance instructions
    D4.11  Caches in a VMSAv8-64 implementation

Chapter D5  The Performance Monitors Extension
    D5.1  About the Performance Monitors

    D5.2  Accuracy of the Performance Monitors
    D5.3  Behavior on overflow
    D5.4  Attributability
    D5.5  Effect of EL3 and EL2
    D5.6  Event filtering
    D5.7  Performance Monitors and Debug state
    D5.8  Counter enables
    D5.9  Counter access
    D5.10  PMU events and event numbers
    D5.11  Performance Monitors Extension registers

Chapter D6  The Statistical Profiling Extension
    D6.1  Statistical Profiling
    D6.2  Programmers’ Model
    D6.3  Enable and Filtering controls
    D6.4  Profiling Buffer management interrupt

Chapter D7  Statistical Profiling Extension Sample Record Specification
    D7.1  About the Statistical Profiling Extension Sample Records
    D7.2  Alphabetical list of Statistical Profiling Extension packets

Chapter D8  The Generic Timer in AArch64 state
    D8.1  About the Generic Timer
    D8.2  The AArch64 view of the Generic Timer

Chapter D9  AArch64 System Register Encoding
    D9.1  The System register encoding space
    D9.2  op0==0b10, Moves to and from debug and trace System registers
    D9.3  op0==0b11, Moves to and from non-debug System registers, Special-purpose registers

Chapter D10  AArch64 System Register Descriptions
    D10.1  About the AArch64 System registers
    D10.2  General system control registers
    D10.3  Debug registers
    D10.4  Performance Monitors registers
    D10.5  Statistical Profiling Extension registers
    D10.6  Generic Timer registers

Part E  The AArch32 Application Level Architecture

Chapter E1  The AArch32 Application Level Programmers’ Model
    E1.1  About the Application level programmers’ model
    E1.2  The Application level programmers’ model in AArch32 state
    E1.3  Advanced SIMD and floating-point instructions
    E1.4  About the AArch32 System register interface
    E1.5  Exceptions

Chapter E2  The AArch32 Application Level Memory Model
    E2.1  About the ARM memory model
    E2.2  Atomicity in the ARM architecture
    E2.3  Definition of the ARMv8 memory model
    E2.4  Caches and memory hierarchy
    E2.5  Alignment support
    E2.6  Endian support
    E2.7  Memory types and attributes
    E2.8  Mismatched memory attributes

    E2.9  Synchronization and semaphores

Part F  The AArch32 Instruction Sets

Chapter F1  The AArch32 Instruction Sets Overview
    F1.1  Support for instructions in different versions of the ARM architecture
    F1.2  Unified Assembler Language
    F1.3  Branch instructions
    F1.4  Data-processing instructions
    F1.5  PSTATE and banked register access instructions
    F1.6  Load/store instructions
    F1.7  Load/store multiple instructions
    F1.8  Miscellaneous instructions
    F1.9  Exception-generating and exception-handling instructions
    F1.10  System register access instructions
    F1.11  Advanced SIMD and floating-point load/store instructions
    F1.12  Advanced SIMD and floating-point register transfer instructions
    F1.13  Advanced SIMD data-processing instructions
    F1.14  Floating-point data-processing instructions

Chapter F2  About the T32 and A32 Instruction Descriptions
    F2.1  Format of instruction descriptions
    F2.2  Standard assembler syntax fields
    F2.3  Conditional execution
    F2.4  Shifts applied to a register
    F2.5  Memory accesses
    F2.6  Encoding of lists of general-purpose registers and the PC
    F2.7  General information about the T32 and A32 instruction descriptions
    F2.8  Additional pseudocode support for instruction descriptions
    F2.9  Additional information about Advanced SIMD and floating-point instructions

Chapter F3  T32 Instruction Set Encoding
    F3.1  T32 instruction set encoding
    F3.2  About the T32 Advanced SIMD and floating-point instructions and their encoding

Chapter F4  A32 Instruction Set Encoding
    F4.1  A32 instruction set encoding
    F4.2  About the A32 Advanced SIMD and floating-point instructions and their encoding

Chapter F5  T32 and A32 Base Instruction Set Instruction Descriptions
    F5.1  Alphabetical list of T32 and A32 base instruction set instructions
    F5.2  Encoding and use of banked register transfer instructions

Chapter F6  T32 and A32 Advanced SIMD and Floating-point Instruction Descriptions
    F6.1  Alphabetical list of Advanced SIMD and floating-point instructions

Part G  The AArch32 System Level Architecture

Chapter G1  The AArch32 System Level Programmers’ Model
    G1.1  About the AArch32 System level programmers’ model
    G1.2  Exception levels
    G1.3  Exception terminology
    G1.4  Execution state
    G1.5  Instruction Set state

Contents

G1.6
G1.7
G1.8
G1.9
G1.10
G1.11
G1.12
G1.13
G1.14
G1.15
G1.16
G1.17
G1.18
G1.19
G1.20
G1.21

Chapter G2

G2.5
G2.6
G2.7
G2.8
G2.9
G2.10
G2.11
G2.12

About the memory system architecture ...........................................................
Address space ................................................................................................
Mixed-endian support ......................................................................................
AArch32 cache and branch predictor support .................................................
System register support for IMPLEMENTATION DEFINED memory features
External aborts ................................................................................................
Memory barrier instructions .............................................................................
Pseudocode description of general memory system instructions ...................

G3-4806
G3-4807
G3-4808
G3-4809
G3-4834
G3-4835
G3-4837
G3-4838

The AArch32 Virtual Memory System Architecture
G4.1
G4.2
G4.3
G4.4
G4.5
G4.6
G4.7
G4.8
G4.9
G4.10
G4.11
G4.12
G4.13
G4.14
G4.15
G4.16

x

About self-hosted debug ................................................................................. G2-4738
The debug exception enable controls ............................................................. G2-4742
Routing debug exceptions ............................................................................... G2-4743
Enabling debug exceptions from the current Privilege level and Security state .............
G2-4745
The effect of powerdown on debug exceptions ............................................... G2-4747
Summary of permitted routing and enabling of debug exceptions .................. G2-4748
Pseudocode description of debug exceptions ................................................. G2-4750
Breakpoint Instruction exceptions ................................................................... G2-4751
Breakpoint exceptions ..................................................................................... G2-4754
Watchpoint exceptions .................................................................................... G2-4781
Vector Catch exceptions ................................................................................. G2-4795
Synchronization and debug exceptions .......................................................... G2-4803

The AArch32 System Level Memory Model
G3.1
G3.2
G3.3
G3.4
G3.5
G3.6
G3.7
G3.8

Chapter G4

Security state .......... G1-4597
Security state, Exception levels, and AArch32 execution privilege .......... G1-4600
Virtualization .......... G1-4602
AArch32 PE modes, and general-purpose and Special-purpose registers .......... G1-4604
Process state, PSTATE .......... G1-4614
Instruction set states .......... G1-4620
Handling exceptions that are taken to an Exception level using AArch32 .......... G1-4622
Routing of aborts taken to AArch32 state .......... G1-4642
Exception return to an Exception level using AArch32 .......... G1-4645
Asynchronous exception behavior for exceptions taken from AArch32 state .......... G1-4650
AArch32 state exception descriptions .......... G1-4661
Reset into AArch32 state .......... G1-4684
Mechanisms for entering a low-power state .......... G1-4688
The AArch32 System register interface .......... G1-4693
Advanced SIMD and floating-point support .......... G1-4696
Configurable instruction enables and disables, and trap controls .......... G1-4702

AArch32 Self-hosted Debug
G2.1
G2.2
G2.3
G2.4

Chapter G3

About VMSAv8-32 .......... G4-4842
The effects of disabling address translation stages on VMSAv8-32 behavior .......... G4-4850
Translation tables .......... G4-4854
The VMSAv8-32 Short-descriptor translation table format .......... G4-4859
The VMSAv8-32 Long-descriptor translation table format .......... G4-4868
Memory access control .......... G4-4888
Memory region attributes .......... G4-4899
Translation Lookaside Buffers (TLBs) .......... G4-4911
TLB maintenance requirements .......... G4-4915
Caches in VMSAv8-32 .......... G4-4929
VMSAv8-32 memory aborts .......... G4-4932
Exception reporting in a VMSAv8-32 implementation .......... G4-4944
Address translation instructions .......... G4-4963
Pseudocode description of VMSAv8-32 memory system operations .......... G4-4970
About the System registers for VMSAv8-32 .......... G4-4972
Functional grouping of VMSAv8-32 System registers .......... G4-4977

Chapter G5 The Generic Timer in AArch32 state
G5.1 About the Generic Timer in AArch32 state .......... G5-4980
G5.2 The AArch32 view of the Generic Timer .......... G5-4984

Chapter G6 AArch32 System Register Encoding
G6.1 The AArch32 System register encoding space .......... G6-4992
G6.2 VMSAv8-32 organization of registers in the (coproc==0b1110) encoding space .......... G6-4993
G6.3 VMSAv8-32 organization of registers in the (coproc==0b1111) encoding space .......... G6-4996

Chapter G7 AArch32 System Register Descriptions
G7.1 About the AArch32 System registers .......... G7-5012
G7.2 General system control registers .......... G7-5027
G7.3 Debug registers .......... G7-5502
G7.4 Performance Monitors registers .......... G7-5594
G7.5 Generic Timer registers .......... G7-5646

Part H External Debug

Chapter H1 About External Debug
H1.1 Introduction to external debug .......... H1-5692
H1.2 External debug .......... H1-5693
H1.3 Required debug authentication .......... H1-5694

Chapter H2 Debug State
H2.1 About Debug state .......... H2-5696
H2.2 Halting the PE on debug events .......... H2-5697
H2.3 Entering Debug state .......... H2-5704
H2.4 Behavior in Debug state .......... H2-5707
H2.5 Exiting Debug state .......... H2-5733

Chapter H3 Halting Debug Events
H3.1 Introduction to Halting debug events .......... H3-5736
H3.2 Halting Step debug events .......... H3-5738
H3.3 Halt Instruction debug event .......... H3-5748
H3.4 Exception Catch debug event .......... H3-5749
H3.5 External Debug Request debug event .......... H3-5752
H3.6 OS Unlock Catch debug event .......... H3-5753
H3.7 Reset Catch debug events .......... H3-5754
H3.8 Software Access debug event .......... H3-5755
H3.9 Synchronization and Halting debug events .......... H3-5756

Chapter H4 The Debug Communication Channel and Instruction Transfer Register
H4.1 Introduction .......... H4-5760
H4.2 DCC and ITR registers .......... H4-5761
H4.3 DCC and ITR access modes .......... H4-5764
H4.4 Flow control of the DCC and ITR registers .......... H4-5768
H4.5 Synchronization of DCC and ITR accesses .......... H4-5772
H4.6 Interrupt-driven use of the DCC .......... H4-5778
H4.7 Pseudocode description of the operation of the DCC and ITR registers .......... H4-5779

Chapter H5 The Embedded Cross-Trigger Interface
H5.1 About the Embedded Cross-Trigger (ECT) .......... H5-5782
H5.2 Basic operation on the ECT .......... H5-5784
H5.3 Cross-triggers on a PE in an ARMv8 implementation .......... H5-5788

H5.4 Description and allocation of CTI triggers .......... H5-5789
H5.5 CTI registers programmers’ model .......... H5-5793
H5.6 Examples .......... H5-5794

Chapter H6 Debug Reset and Powerdown Support
H6.1 About Debug over powerdown .......... H6-5798
H6.2 Power domains and debug .......... H6-5799
H6.3 Core power domain power states .......... H6-5800
H6.4 Emulating low-power states .......... H6-5803
H6.5 Debug OS Save and Restore sequences .......... H6-5805
H6.6 Reset and debug .......... H6-5809

Chapter H7 The PC Sample-based Profiling Extension
H7.1 About the PC Sample-based Profiling Extension .......... H7-5812

Chapter H8 About the External Debug Registers
H8.1 Relationship between external debug and System registers .......... H8-5816
H8.2 Endianness and supported access sizes .......... H8-5817
H8.3 Synchronization of changes to the external debug registers .......... H8-5818
H8.4 Memory-mapped accesses to the external debug interface .......... H8-5822
H8.5 External debug interface register access permissions .......... H8-5824
H8.6 External debug interface registers .......... H8-5828
H8.7 Cross-trigger interface registers .......... H8-5833
H8.8 External debug register resets .......... H8-5835

Chapter H9 External Debug Register Descriptions
H9.1 About the debug registers .......... H9-5840
H9.2 External debug registers .......... H9-5841
H9.3 Cross-Trigger Interface registers .......... H9-5942

Part I Memory-mapped Components of the ARMv8 Architecture

Chapter I1 Requirements for Memory-mapped Components
I1.1 Supported access sizes .......... I1-5988
I1.2 Synchronization of memory-mapped registers .......... I1-5990

Chapter I2 System Level Implementation of the Generic Timer
I2.1 About the Generic Timer specification .......... I2-5994
I2.2 Memory-mapped counter module .......... I2-5996
I2.3 Memory-mapped timer components .......... I2-6000

Chapter I3 Recommended External Interface to the Performance Monitors
I3.1 About the external interface to the Performance Monitors registers .......... I3-6006

Chapter I4 External System Control Register Descriptions
I4.1 About the external system control register descriptions .......... I4-6014
I4.2 External Performance Monitors registers summary .......... I4-6015
I4.3 Performance Monitors external register descriptions .......... I4-6018
I4.4 Generic Timer memory-mapped registers overview .......... I4-6086
I4.5 Generic Timer memory-mapped register descriptions .......... I4-6087

Part J Architectural Pseudocode

Chapter J1 ARMv8 Pseudocode
J1.1 Pseudocode for AArch64 operations .......... J1-6132

J1.2 Pseudocode for AArch32 operation .......... J1-6218
J1.3 Shared pseudocode .......... J1-6295

Part K Appendixes

Appendix K1 Architectural Constraints on UNPREDICTABLE behaviors
K1.1 AArch32 CONSTRAINED UNPREDICTABLE behaviors .......... K1-6390
K1.2 AArch64 CONSTRAINED UNPREDICTABLE behaviors .......... K1-6413

Appendix K2 Recommended External Debug Interface
K2.1 About the recommended external debug interface .......... K2-6430
K2.2 PMUEVENT bus .......... K2-6434
K2.3 Recommended authentication interface .......... K2-6435
K2.4 Management registers and CoreSight compliance .......... K2-6436

Appendix K3 Recommendations for Performance Monitors Event Numbers for IMPLEMENTATION DEFINED Events
K3.1 ARM recommendations for IMPLEMENTATION DEFINED event numbers .......... K3-6448
K3.2 Summary of events for exceptions taken to an Exception level using AArch64 .......... K3-6462

Appendix K4 Recommendations for reporting memory attributes on an interconnect
K4.1 ARM recommendations for reporting memory attributes on an interconnect .......... K4-6466

Appendix K5 Additional Information for Implementations of the Generic Timer
K5.1 Providing a complete set of features in a system level implementation .......... K5-6468
K5.2 Gray-count scheme for timer distribution scheme .......... K5-6470

Appendix K6 Legacy Instruction Syntax for AArch32 Instruction Sets
K6.1 Legacy Instruction Syntax .......... K6-6472

Appendix K7 Address translation examples
K7.1 AArch64 Address translation examples .......... K7-6480
K7.2 AArch32 Address translation examples .......... K7-6493

Appendix K8 Example OS Save and Restore Sequences
K8.1 Save Debug registers .......... K8-6504
K8.2 Restore Debug registers .......... K8-6506

Appendix K9 Recommended Upload and Download Processes for External Debug
K9.1 Using memory access mode in AArch64 state .......... K9-6510

Appendix K10 Software Usage Examples
K10.1 Use of the Advanced SIMD complex number instructions .......... K10-6514
K10.2 Use of the ARMv8.2 extensions to the Cryptographic Extension .......... K10-6516

Appendix K11 Barrier Litmus Tests
K11.1 Introduction .......... K11-6524
K11.2 Load-Acquire, Store-Release and barriers .......... K11-6527
K11.3 Load-Acquire Exclusive, Store-Release Exclusive and barriers .......... K11-6531
K11.4 Using a mailbox to send an interrupt .......... K11-6536
K11.5 Cache and TLB maintenance instructions and barriers .......... K11-6537
K11.6 ARMv7 compatible approaches for ordering, using DMB and DSB barriers .......... K11-6549

Appendix K12 ARM Pseudocode Definition
K12.1 About the ARM pseudocode .......... K12-6564
K12.2 Pseudocode for instruction descriptions .......... K12-6565
K12.3 Data types .......... K12-6567
K12.4 Operators .......... K12-6572
K12.5 Statements and control structures .......... K12-6578
K12.6 Built-in functions .......... K12-6583
K12.7 Miscellaneous helper procedures and functions .......... K12-6586
K12.8 ARM pseudocode definition index .......... K12-6588

Appendix K13 Registers Index
K13.1 Introduction and register disambiguation .......... K13-6592
K13.2 Alphabetical index of AArch64 registers and system instructions .......... K13-6597
K13.3 Functional index of AArch64 registers and system instructions .......... K13-6607
K13.4 Alphabetical index of AArch32 registers and system instructions .......... K13-6618
K13.5 Functional index of AArch32 registers and system instructions .......... K13-6626
K13.6 Alphabetical index of memory-mapped registers .......... K13-6636
K13.7 Functional index of memory-mapped registers .......... K13-6641

Glossary

Preface

This preface introduces the ARM Architecture Reference Manual, ARMv8, for ARMv8-A architecture profile. It
contains the following sections:
• About this manual on page xvi.
• Using this manual on page xviii.
• Conventions on page xxiv.
• Additional reading on page xxvi.
• Feedback on page xxvii.

Note
This document describes only the ARMv8-A architecture profile. For the behaviors required by the ARMv7-A and
ARMv7-R architecture profiles, see the ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition.


About this manual
This manual describes the ARM® architecture v8, ARMv8. The architecture describes the operation of an
ARMv8-A Processing element (PE), and this manual includes descriptions of:
• The two Execution states, AArch64 and AArch32.
• The instruction sets:
  - In AArch32 state, the A32 and T32 instruction sets, that are compatible with earlier versions of the
    ARM architecture.
  - In AArch64 state, the A64 instruction set.
• The states that determine how a PE operates, including the current Exception level and Security state, and in
  AArch32 state the PE mode.
• The Exception model.
• The interprocessing model, that supports transitioning between AArch64 state and AArch32 state.
• The memory model, that defines memory ordering and memory management. This manual covers a single
  architecture profile, ARMv8-A, that defines a Virtual Memory System Architecture (VMSA).
• The programmers’ model, and its interfaces to System registers that control most PE and memory system
  features, and provide status information.
• The Advanced SIMD and floating-point instructions, that provide high-performance:
  - Single-precision, half-precision, and double-precision floating-point operations.
  - Conversions between double-precision, single-precision, and half-precision floating-point values.
  - Integer, single-precision floating-point, half-precision floating-point, and in A64, double-precision
    vector operations in all instruction sets.
  - Single-precision, half-precision, and double-precision floating-point vector operations in the A64
    instruction set.
• The security model, that provides two security states to support secure applications.
• The virtualization model, that supports the virtualization of Non-secure operation.
• The Debug architecture, that provides software access to debug features.

This manual gives the assembler syntax for the instructions it describes, meaning that it describes instructions in
textual form. However, this manual is not a tutorial for ARM assembler language, nor does it describe ARM
assembler language, except at a very basic level. To make effective use of ARM assembler language, read the
documentation supplied with the assembler being used.
This manual is organized into parts:

Part A

Provides an introduction to the ARMv8-A architecture, and an overview of the AArch64 and
AArch32 Execution states.

Part B

Describes the application level view of the AArch64 Execution state, meaning the view from EL0.
It describes the application level view of the programmers’ model and the memory model.

Part C

Describes the A64 instruction set, that is available in the AArch64 Execution state. The descriptions
for each instruction also include the precise effects of each instruction when executed at EL0,
described as unprivileged execution, including any restrictions on its use, and how the effects of the
instruction differ at higher Exception levels. This information is of primary importance to authors
and users of compilers, assemblers, and other programs that generate ARM machine code.

Part D

Describes the system level view of the AArch64 Execution state. It includes details of the System
registers, most of which are not accessible from EL0, and the system level view of the programmers’
model and the memory model. This part includes the description of self-hosted debug.

Part E

Describes the application level view of the AArch32 Execution state, meaning the view from EL0.
It describes the application level view of the programmers’ model and the memory model.

Note
In AArch32 state, execution at EL0 is execution in User mode.

Part F

Describes the T32 and A32 instruction sets, that are available in the AArch32 Execution state. These
instruction sets are backwards-compatible with earlier versions of the ARM architecture. This part
describes the precise effects of each instruction when executed in User mode, described as
unprivileged execution or execution at EL0, including any restrictions on its use, and how the effects
of the instruction differ at higher Exception levels. This information is of primary importance to
authors and users of compilers, assemblers, and other programs that generate ARM machine code.

Note
User mode is the only mode where software execution is unprivileged.

Part G

Describes the system level view of the AArch32 Execution state, that is generally compatible with
earlier versions of the ARM architecture. This part includes details of the System registers, most of
which are not accessible from EL0, and the instruction interface to those registers. It also describes
the system level view of the programmers’ model and the memory model.

Part H

Describes the Debug architecture for external debug. This provides configuration, breakpoint and
watchpoint support, and a Debug Communications Channel (DCC) to a debug host.

Part I

Describes additional features of the architecture that are not closely coupled to a processing element
(PE), and therefore are accessed through memory-mapped interfaces. Some of these features are
OPTIONAL.

Part J

Provides pseudocode that describes various features of the ARMv8 architecture.

Part K, Appendixes

Provide additional information. Some appendixes give information that is not part of the ARMv8
architectural requirements. The cover page of each appendix indicates its status.

Glossary

Defines terms used in this document that have a specialized meaning.

Note
Terms that are generally well understood in the microelectronics industry are not included in the
Glossary.
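
The notes under Part E and Part F earlier in this section state that, in AArch32 state, execution at EL0 is
execution in User mode, and that User mode is the only unprivileged mode. As a rough illustration, the
following Python sketch classifies a program status register value by its mode field. The mode encodings and
the position of the mode field in bits [4:0] are assumptions taken from the architecture's definition of the
AArch32 PE modes, not from this preface, and the names AARCH32_MODES, mode_name and is_unprivileged are
hypothetical.

```python
# Assumed AArch32 PE mode encodings, as held in bits [4:0] of a
# program status register value. These are not given in this preface.
AARCH32_MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10110: "Monitor",
    0b10111: "Abort",
    0b11010: "Hyp",
    0b11011: "Undefined",
    0b11111: "System",
}


def mode_name(psr: int) -> str:
    """Return the PE mode name for a status register value, or 'reserved'."""
    return AARCH32_MODES.get(psr & 0x1F, "reserved")


def is_unprivileged(psr: int) -> bool:
    """User mode is the only mode in which execution is unprivileged (EL0)."""
    return mode_name(psr) == "User"


if __name__ == "__main__":
    for value in (0x10, 0x13, 0x1A):
        print(f"{value:#04x}: {mode_name(value)}, unprivileged={is_unprivileged(value)}")
```

Only the low five bits are examined, so the other fields of a full status register value do not affect the
result.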

Using this manual
The information in this manual is organized into parts, as described in this section.

Part A, Introduction and Architecture Overview
Part A gives an overview of the ARMv8-A architecture profile, including its relationship to the other ARM PE
architectures. It introduces the terminology used to describe the architecture, and gives an overview of the
Execution states, AArch64 and AArch32. It contains the following chapter:
Chapter A1 Introduction to the ARMv8 Architecture
Read this for an introduction to the ARMv8 architecture.

Part B, The AArch64 Application Level Architecture
Part B describes the AArch64 state application level view of the architecture. It contains the following chapters:
Chapter B1 The AArch64 Application Level Programmers’ Model
Read this for an application level description of the programmers’ model for software executing in
AArch64 state. It describes execution at EL0 when EL0 is using AArch64 state.
Chapter B2 The AArch64 Application Level Memory Model
Read this for an application level description of the memory model for software executing in
AArch64 state. It describes the memory model for execution in EL0 when EL0 is using AArch64
state. It includes information about ARM memory types, attributes, and memory access controls.

Part C, The A64 Instruction Set
Part C describes the A64 instruction set, that is used in AArch64 state. It contains the following chapters:
Chapter C1 The A64 Instruction Set
Read this for a description of the A64 instruction set and common instruction operation details.
Chapter C2 About the A64 Instruction Descriptions
Read this to understand the format of the A64 instruction descriptions.
Chapter C3 A64 Instruction Set Overview
Read this for an overview of the individual A64 instructions, that are divided into five functional
groups.
Chapter C4 A64 Instruction Set Encoding
Read this for a description of the A64 instruction set encoding.
Chapter C5 The A64 System Instruction Class
Read this for a description of the AArch64 system instructions and register descriptions, and the
system instruction class encoding space.
Chapter C6 A64 Base Instruction Descriptions
Read this for information on key aspects of the A64 base instructions and for descriptions of the
individual instructions, which are listed in alphabetical order.
Chapter C7 A64 Advanced SIMD and Floating-point Instruction Descriptions
Read this for information on key aspects of the A64 Advanced SIMD and floating-point instructions
and for descriptions of the individual instructions, which are listed in alphabetical order.

Part D, The AArch64 System Level Architecture
Part D describes the AArch64 state system level view of the architecture. It contains the following chapters:
Chapter D1 The AArch64 System Level Programmers’ Model
Read this for a description of the AArch64 state system level view of the programmers’ model.
Chapter D2 AArch64 Self-hosted Debug
Read this for an introduction to, and a description of, self-hosted debug in AArch64 state.
Chapter D3 The AArch64 System Level Memory Model
Read this for a description of the AArch64 state system level view of the general features of the
memory system.
Chapter D4 The AArch64 Virtual Memory System Architecture
Read this for a system level view of the AArch64 Virtual Memory System Architecture (VMSA),
the memory system architecture of an ARMv8 implementation that is executing in AArch64 state.
Chapter D5 The Performance Monitors Extension
Read this for a description of an implementation of the ARM Performance Monitors, that are an
optional non-invasive debug component.
Chapter D6 The Statistical Profiling Extension
Read this for a description of an implementation of the Statistical Profiling Extension, that is an
optional AArch64 state non-invasive debug component.
Chapter D7 Statistical Profiling Extension Sample Record Specification
Read this for a description of the sample records generated by the Statistical Profiling Extension.
Chapter D8 The Generic Timer in AArch64 state
Read this for a description of the AArch64 view of an implementation of the ARM Generic Timer.
Chapter D9 AArch64 System Register Encoding
Read this for a description of the encoding of the AArch64 System registers, and
the other uses of the AArch64 System registers encoding space.
Chapter D10 AArch64 System Register Descriptions
Read this for an introduction to, and description of, each of the AArch64 System registers.

Part E, The AArch32 Application Level Architecture
Part E describes the AArch32 state application level view of the architecture. It contains the following chapters:
Chapter E1 The AArch32 Application Level Programmers’ Model
Read this for an application level description of the programmers’ model for software executing in
AArch32 state. It describes execution at EL0 when EL0 is using AArch32 state.
Chapter E2 The AArch32 Application Level Memory Model
Read this for an application level description of the memory model for software executing in
AArch32 state. It describes the memory model for execution in EL0 when EL0 is using AArch32
state. It includes information about ARM memory types, attributes, and memory access controls.

Part F, The AArch32 Instruction Sets
Part F describes the T32 and A32 instruction sets, that are used in AArch32 state. It contains the following chapters:
Chapter F1 The AArch32 Instruction Sets Overview
Read this for an overview of the T32 and A32 instruction sets.
Chapter F2 About the T32 and A32 Instruction Descriptions
Read this to understand the format of the T32 and A32 instruction descriptions.
Chapter F3 T32 Instruction Set Encoding
Read this for a description of the T32 instruction set encoding. This includes the T32 encoding of
the Advanced SIMD and floating-point instructions.
Chapter F4 A32 Instruction Set Encoding
Read this for a description of the A32 instruction set encoding. This includes the A32 encoding of
the Advanced SIMD and floating-point instructions.
Chapter F5 T32 and A32 Base Instruction Set Instruction Descriptions
Read this for a description of each of the T32 and A32 base instructions.
Chapter F6 T32 and A32 Advanced SIMD and Floating-point Instruction Descriptions
Read this for a description of each of the T32 and A32 Advanced SIMD and floating-point
instructions.

Part G, The AArch32 System Level Architecture
Part G describes the AArch32 state system level view of the architecture. It contains the following chapters:
Chapter G1 The AArch32 System Level Programmers’ Model
Read this for a description of the AArch32 state system level view of the programmers’ model for
execution in an Exception level that is using AArch32.
Chapter G2 AArch32 Self-hosted Debug
Read this for an introduction to, and a description of, self-hosted debug in AArch32 state.
Chapter G3 The AArch32 System Level Memory Model
Read this for a system level view of the general features of the memory system.
Chapter G4 The AArch32 Virtual Memory System Architecture
Read this for a description of the AArch32 Virtual Memory System Architecture (VMSA).
Chapter G5 The Generic Timer in AArch32 state
Read this for a description of the AArch32 view of an implementation of the ARM Generic Timer.
Chapter G6 AArch32 System Register Encoding
Read this for a description of the encoding of the AArch32 System registers,
including the System instructions that are part of the AArch32 System registers encoding space.
Chapter G7 AArch32 System Register Descriptions
Read this for a description of each of the AArch32 System registers.

Part H, External Debug
Part H describes the architecture for external debug. It contains the following chapters:
Chapter H1 About External Debug
Read this for an introduction to external debug, and a definition of the scope of this part of the
manual.
Chapter H2 Debug State
Read this for a description of debug state, which the PE might enter as the result of a Halting debug
event.
Chapter H3 Halting Debug Events
Read this for a description of the external debug events referred to as Halting debug events.
Chapter H4 The Debug Communication Channel and Instruction Transfer Register
Read this for a description of the communication between a debugger and the PE debug logic using
the Debug Communication Channel and the Instruction Transfer Register.
Chapter H5 The Embedded Cross-Trigger Interface
Read this for a description of the embedded cross-trigger interface.
Chapter H6 Debug Reset and Powerdown Support
Read this for a description of reset and powerdown support in the Debug architecture.
Chapter H7 The PC Sample-based Profiling Extension
Read this for a description of the PC Sample-based Profiling Extension that is an OPTIONAL
extension to an ARMv8 implementation.
Chapter H8 About the External Debug Registers
Read this for some additional information about the external debug registers.
Chapter H9 External Debug Register Descriptions
Read this for a description of each external debug register.

Part I, Memory-mapped Components of the ARMv8 Architecture
Part I describes the memory-mapped components in the architecture. It contains the following chapters:
Chapter I1 Requirements for Memory-mapped Components
Read this for descriptions of some general requirements for memory-mapped components within a
system that complies with the ARMv8 Architecture.
Chapter I2 System Level Implementation of the Generic Timer
Read this for a definition of a system level implementation of the Generic Timer.
Chapter I3 Recommended External Interface to the Performance Monitors
Read this for a description of the recommended memory-mapped and external debug interfaces to
the Performance Monitors.
Chapter I4 External System Control Register Descriptions
Read this for a description of each memory-mapped system control register.

Part J, Architectural Pseudocode
Part J contains pseudocode that describes various features of the ARM architecture. It contains the following
chapter:
Chapter J1 ARMv8 Pseudocode
Read this for the pseudocode definitions that describe various features of the ARMv8 architecture,
for operation in AArch64 state and in AArch32 state.

Part K, Appendixes
This manual contains the following appendixes:
Appendix K1 Architectural Constraints on UNPREDICTABLE behaviors
Read this for a description of the architecturally-required constraints on UNPREDICTABLE behaviors
in the ARMv8 architecture, including AArch32 behaviors that were UNPREDICTABLE in previous
versions of the architecture.
Appendix K2 Recommended External Debug Interface
Read this for a description of the recommended external debug interface.

Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K3 Recommendations for Performance Monitors Event Numbers for IMPLEMENTATION
DEFINED Events
Read this for a description of ARM recommendations for the use of the IMPLEMENTATION DEFINED
event numbers.

Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K4 Recommendations for reporting memory attributes on an interconnect
Read this for the ARM recommendations about how the architectural memory attributes are
reported on an interconnect.
Appendix K5 Additional Information for Implementations of the Generic Timer
Read this for additional information about implementations of the ARM Generic Timer. This
information does not form part of the architectural definition of the Generic Timer.
Appendix K6 Legacy Instruction Syntax for AArch32 Instruction Sets
Read this for information about the pre-UAL syntax of the AArch32 instruction sets, which can still
be valid for the A32 instruction set.
Appendix K7 Address translation examples
Read this for examples of translation table lookups using the translation regimes described in
Chapter D4 The AArch64 Virtual Memory System Architecture and Chapter G4 The AArch32 Virtual
Memory System Architecture.

Appendix K8 Example OS Save and Restore Sequences
Read this for software examples that perform the OS Save and Restore sequences for an ARMv8
debug implementation.

Note
Chapter H6 Debug Reset and Powerdown Support describes the OS Save and Restore mechanism.
Appendix K9 Recommended Upload and Download Processes for External Debug
Read this for information about implementing and using the ARM architecture.

Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K10 Software Usage Examples
Read this for software examples that help understanding of some aspects of the ARM architecture.

Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K11 Barrier Litmus Tests
Read this for examples of the use of barrier instructions provided by the ARMv8 architecture.

Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K12 ARM Pseudocode Definition
Read this for definitions of the AArch32 pseudocode.
Appendix K13 Registers Index
Read this for an alphabetic and functional index of AArch32 and AArch64 registers, and
memory-mapped registers.

Glossary
Defines terms used in this document that have a specialized meaning.

Note
Terms that are generally well understood in the microelectronics industry are not included in the Glossary.

Conventions
The following sections describe conventions that this book can use:
•
Typographic conventions.
•
Signals on page xxv.
•
Numbers on page xxv.
•
Pseudocode descriptions on page xxv.
•
Assembler syntax descriptions on page xxv.

Typographic conventions
The typographical conventions are:
italic

Introduces special terminology, and denotes citations.

bold

Denotes signal names, and is used for terms in descriptive lists, where appropriate.

monospace

Used for assembler syntax descriptions, pseudocode, and source code examples.
Also used in the main text for instruction mnemonics and for references to other items appearing in
assembler syntax descriptions, pseudocode, and source code examples.

SMALL CAPITALS

Used in body text for a few terms that have specific technical meanings, and are defined in the
Glossary.
Colored text

Indicates a link. This can be:
•
A URL, for example http://infocenter.arm.com.
•
A cross-reference, that includes the page number of the referenced information if it is not on
the current page, for example, Assembler syntax descriptions on page xxv.
•
A link, to a chapter or appendix, or to a glossary entry, or to the section of the document that
defines the colored term, for example Simple sequential execution or SCTLR.

{ and }

Braces, { and }, have two distinct uses:
Optional items
In syntax descriptions braces enclose optional items. In the following example they
indicate that the <shift> parameter is optional:
ADD <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}

Similarly they can be used in generalized field descriptions, for example
TCR_ELx.{I}PS refers to a field in the TCR_ELx registers that is called either IPS or
PS.
Sets of items
Braces can be used to enclose sets. For example, HCR_EL2.{E2H, TGE} refers to a set
of two register fields, HCR_EL2.E2H and HCR_EL2.TGE.
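The two uses of braces can be illustrated with a short Python sketch. The function names expand_field_set and expand_optional are hypothetical and are not part of the architecture. The notation itself does not say which of the two uses is intended, so the sketch assumes that the reader chooses the appropriate function.

```python
import re


def expand_field_set(name: str) -> list[str]:
    """Expand a set such as 'HCR_EL2.{E2H, TGE}' into full field names.

    Hypothetical helper, for illustration only.
    """
    match = re.fullmatch(r"(\w+)\.\{([^}]*)\}", name)
    if match is None:
        raise ValueError(f"not a set of fields: {name!r}")
    register, fields = match.groups()
    return [f"{register}.{field.strip()}" for field in fields.split(",")]


def expand_optional(name: str) -> list[str]:
    """Expand one optional item, such as 'TCR_ELx.{I}PS', into both readings.

    Hypothetical helper, for illustration only.
    """
    match = re.fullmatch(r"([^{}]*)\{([^{}]*)\}([^{}]*)", name)
    if match is None:
        raise ValueError(f"no optional item in: {name!r}")
    before, optional, after = match.groups()
    # First the name with the optional item present, then without it.
    return [before + optional + after, before + after]
```

With these definitions, expand_field_set("HCR_EL2.{E2H, TGE}") gives the two field names HCR_EL2.E2H and HCR_EL2.TGE, and expand_optional("TCR_ELx.{I}PS") gives TCR_ELx.IPS and TCR_ELx.PS.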
Notes

Notes are formatted as:

Note
This is a Note.
In this Manual, Notes are used only to provide additional information, usually to help understanding
of the text. While a Note may repeat architectural information given elsewhere in the Manual, a
Note never provides any part of the definition of the architecture.

Signals
In general this specification does not define hardware signals, but it does include some signal examples and
recommendations. The signal conventions are:
Signal level

The level of an asserted signal depends on whether the signal is active-HIGH or
active-LOW. Asserted means:
•
HIGH for active-HIGH signals.
•
LOW for active-LOW signals.

Lower-case n

At the start or end of a signal name denotes an active-LOW signal.
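The signal conventions can be sketched in Python as follows. The function names, and the signal names nEXAMPLE, EXAMPLEn, and EXAMPLE used in the usage note, are hypothetical. The sketch assumes that a lower-case n at the start or end of the name is the only indication of an active-LOW signal.

```python
def is_active_low(signal_name: str) -> bool:
    """Return True if the name follows the lower-case n convention."""
    # A lower-case n at the start or end of the name denotes active-LOW.
    return signal_name.startswith("n") or signal_name.endswith("n")


def is_asserted(signal_name: str, level_is_high: bool) -> bool:
    """Return True if the given signal level means the signal is asserted."""
    if is_active_low(signal_name):
        return not level_is_high  # asserted means LOW
    return level_is_high          # asserted means HIGH
```

For example, is_asserted("nEXAMPLE", False) and is_asserted("EXAMPLE", True) both return True, because each signal is at its active level.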

Numbers
Numbers are normally written in decimal. Binary numbers are preceded by 0b, and hexadecimal numbers by 0x. In
both cases, the prefix and the associated value are written in a monospace font, for example 0xFFFF0000. To improve
readability, long numbers can be written with an underscore separator between every four characters, for example
0xFFFF_0000_0000_0000. Ignore any underscores when interpreting the value of a number.
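As an illustration of these conventions, the following Python sketch interprets a number written in the style of this manual. The function name parse_manual_number is hypothetical. The sketch assumes only the 0b and 0x prefixes and the underscore separator described above.

```python
def parse_manual_number(text: str) -> int:
    """Interpret a number written using the conventions of this manual.

    Hypothetical helper, for illustration only.
    """
    # Underscores only improve readability, so they are ignored.
    digits = text.replace("_", "")
    if digits.startswith("0b"):
        return int(digits[2:], 2)   # binary
    if digits.startswith("0x"):
        return int(digits[2:], 16)  # hexadecimal
    return int(digits, 10)          # numbers are normally decimal
```

Under these assumptions, parse_manual_number("0xFFFF_0000_0000_0000") and parse_manual_number("0xFFFF000000000000") return the same value.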

Pseudocode descriptions
This manual uses a form of pseudocode to provide precise descriptions of the specified functionality. This
pseudocode is written in monospace font, and is described in Appendix K12 ARM Pseudocode Definition.

Assembler syntax descriptions
This manual contains numerous syntax descriptions for assembler instructions and for components of assembler
instructions. These are shown in a monospace font, and use the conventions described in Structure of the A64
assembler language on page C1-143, Appendix K12 ARM Pseudocode Definition, and Pseudocode operators and
keywords on page K12-5648.

Additional reading
This section lists relevant publications from ARM and third parties.
See the Infocenter, http://infocenter.arm.com, for access to ARM documentation.

ARM publications
•

ARM® AMBA® 4 ATB Protocol Specification, ATBv1.0 and ATBv1.1, (ARM IHI 0032B).

•

ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition (ARM DDI 0406).

•

ARM® Architecture Reference Manual Supplement, ARMv8, for the ARMv8-R AArch32 architecture profile
(ARM DDI 0568).

•

ARM® Debug Interface Architecture Specification, ADIv6.0 (ARM IHI 0074).

•

ARM® Debug Interface Architecture Specification, ADIv5.0 to ADIv5.2 (ARM IHI 0031).

•

ARM® Embedded Trace Macrocell Architecture Specification, ETMv4 (ARM IHI 0064).

•

ARM® Generic Interrupt Controller Architecture Specification, GIC architecture version 3.0 and version 4.0
(ARM IHI 0069).

•

ARM® CoreSight™ SoC Technical Reference Manual (ARM DDI 0480).

•

ARM® CoreSight™ Architecture Specification (ARM IHI 0029).

•

ARM® Procedure Call Standard for the ARM 64-bit Architecture (ARM IHI 0055).

•

ARM® Reliability, Availability, and Serviceability (RAS) Specification, ARMv8, for the ARMv8-A architecture
profile (ARM DDI 0587).

•

ARM® Architecture Reference Manual Supplement, The Scalable Vector Extension (SVE), for ARMv8-A
(ARM DDI 0584).

Other publications
The following publications are referred to in this manual, or provide more information:
•

Announcing the Advanced Encryption Standard (AES), Federal Information Processing Standards
Publication 197, November 2001.

•

IEEE Std 754-2008, IEEE Standard for Floating-point Arithmetic, August 2008.

•

IEEE Std 754-1985, IEEE Standard for Floating-point Arithmetic, March 1985.

•

Secure Hash Standard (SHA), Federal Information Processing Standards Publication 180-2, August 2002.

•

The Galois/Counter Mode of Operation, McGrew, D. and Viega, J., Submission to NIST Modes of Operation
Process, January 2004.

•

Memory Consistency Models for Shared-Memory Multiprocessors, Gharachorloo, Kourosh, 1995, Stanford
University Technical Report CSL-TR-95-685.

•

Standard Manufacturer’s Identification Code, JEP106, JEDEC Solid State Technology Association.

•

SM3 Cryptographic Hash Algorithm, China Internet Network Information Center (CNNIC).

•

SM4 Block Cipher Algorithm, China Internet Network Information Center (CNNIC).

•

The QARMA Block Cipher Family, Roberto Avanzi, Qualcomm Product Security Initiative.
Available from https://eprint.iacr.org/2016/444.

Feedback
ARM welcomes feedback on its documentation.

Feedback on this manual
If you have comments on the content of this manual, send e-mail to errata@arm.com. Give:
•
The title.
•
The number, ARM DDI 0487C.a.
•
The page numbers to which your comments apply.
•
A concise explanation of your comments.
ARM also welcomes general suggestions for additions and improvements.

Part A
ARMv8 Architecture Introduction and Overview

Chapter A1
Introduction to the ARMv8 Architecture

This chapter introduces the ARM architecture. It contains the following sections:
•
About the ARM architecture on page A1-32.
•
Architecture profiles on page A1-34.
•
ARMv8 architectural concepts on page A1-36.
•
Supported data types on page A1-40.
•
Advanced SIMD and floating-point support on page A1-50.
•
The ARM memory model on page A1-56.
•
ARMv8 architecture extensions on page A1-57.

A1.1 About the ARM architecture
The ARM architecture described in this Architecture Reference Manual defines the behavior of an abstract machine,
referred to as a processing element, often abbreviated to PE. Implementations compliant with the ARM architecture
must conform to the described behavior of the processing element. It is not intended to describe how to build an
implementation of the PE, nor to limit the scope of such implementations beyond the defined behaviors.
Except where the architecture specifies differently, the programmer-visible behavior of an implementation that is
compliant with the ARM architecture must be the same as a simple sequential execution of the program on the
processing element. This programmer-visible behavior does not include the execution time of the program.
The ARM Architecture Reference Manual also describes rules for software to use the processing element.
The ARM architecture includes definitions of:
•

An associated debug architecture, see:
—
Chapter D2 AArch64 Self-hosted Debug.
—
Chapter G2 AArch32 Self-hosted Debug.
—
Part H of this manual, External Debug on page 5689.

•

Associated trace architectures that define trace macrocells that implementers can implement with the
associated processor hardware. For more information, see the Embedded Trace Macrocell Architecture
Specification.

The ARM architecture is a Reduced Instruction Set Computer (RISC) architecture with the following RISC
architecture features:
•

A large uniform register file.

•

A load/store architecture, where data-processing operations only operate on register contents, not directly on
memory contents.

•

Simple addressing modes, with all load/store addresses determined from register contents and instruction
fields only.

The architecture defines the interaction of the PE with memory, including caches, and includes a memory translation
system. It also describes how multiple PEs interact with each other and with other observers in a system.
This document defines the ARMv8-A architecture profile. See Architecture profiles on page A1-34 for more
information.
The ARM architecture supports implementations across a wide range of performance points. Implementation size,
performance, and very low power consumption are key attributes of the ARM architecture.
An important feature of the ARMv8 architecture is backwards compatibility, combined with the freedom for optimal
implementation in a wide range of standard and more specialized use cases. The ARMv8 architecture supports:
•
A 64-bit Execution state, AArch64.
•
A 32-bit Execution state, AArch32, that is compatible with previous versions of the ARM architecture.

Note
•

The AArch32 Execution state is compatible with the ARMv7-A architecture profile, and enhances that
profile to support some features included in the AArch64 Execution state.

•

This document describes only the ARMv8-A architecture profile. For the behaviors required by the
ARMv7-A and ARMv7-R architecture profiles, see the ARM® Architecture Reference Manual, ARMv7-A and
ARMv7-R edition.

Features that are optional are explicitly defined as such in this Manual.

Note
The presence of an ID register field for a feature does not imply that the feature is optional.

Both Execution states support SIMD and floating-point instructions:
•

AArch32 state provides:
—
SIMD instructions in the base instruction sets that operate on the 32-bit general-purpose registers.
—
Advanced SIMD instructions that operate on registers in the SIMD and floating-point register
(SIMD&FP register) file.
—
Floating-point instructions that operate on registers in the SIMD&FP register file.

•

AArch64 state provides:
—
Advanced SIMD instructions that operate on registers in the SIMD&FP register file.
—
Floating-point instructions that operate on registers in the SIMD&FP register file.

Note
See Conventions on page xxiv for information about conventions used in this manual, including the use of SMALL
CAPITALS for particular terms that have ARM-specific meanings that are defined in the Glossary.

A1.2 Architecture profiles
The ARM architecture has evolved significantly since its introduction, and ARM continues to develop it. Eight
major versions of the architecture have been defined to date, denoted by the version numbers 1 to 8. Of these, the
first three versions are now obsolete.
The generic names AArch64 and AArch32 describe the 64-bit and 32-bit Execution states:
AArch64

Is the 64-bit Execution state, meaning addresses are held in 64-bit registers, and instructions in the
base instruction set can use 64-bit registers for their processing. AArch64 state supports the A64
instruction set.

AArch32

Is the 32-bit Execution state, meaning addresses are held in 32-bit registers, and instructions in the
base instruction sets use 32-bit registers for their processing. AArch32 state supports the T32 and
A32 instruction sets.

Note
The Base instruction set comprises the supported instructions other than the Advanced SIMD and floating-point
instructions.
See sections Execution state on page A1-36 and The ARM instruction sets on page A1-37 for more information.
ARM defines three architecture profiles:
A

Application profile, described in this manual:
•
Supports a Virtual Memory System Architecture (VMSA) based on a Memory Management
Unit (MMU).

Note
An ARMv8-A implementation can be called an AArchv8-A implementation.
•
Supports the A64, A32, and T32 instruction sets.

R

Real-time profile:
•
Supports a Protected Memory System Architecture (PMSA) based on a Memory Protection
Unit (MPU).
•
Supports the A32 and T32 instruction sets.

M

Microcontroller profile:
•
Implements a programmers' model designed for low-latency interrupt processing, with
hardware stacking of registers and support for writing interrupt handlers in high-level
languages.
•
Implements a variant of the R-profile PMSA.
•
Supports a variant of the T32 instruction set.

Note
This Architecture Reference Manual describes only the ARMv8-A profile.
For information about the R and M architecture profiles, and earlier ARM architecture versions see:
•
The ARM® Architecture Reference Manual Supplement, ARMv8, for the ARMv8-R AArch32 architecture
profile.
•
The ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition.
•
The ARM®v8-M Architecture Reference Manual.
•
The ARM®v7-M Architecture Reference Manual.
•
The ARM®v6-M Architecture Reference Manual.

A1.2.1 Debug architecture version
The ARM Debug architecture is fully integrated with the architecture, and does not have a separate version number.

A1.3 ARMv8 architectural concepts
ARMv8 introduces major changes to the ARM architecture, while maintaining a high level of consistency with
previous versions of the architecture. The ARMv8 Architecture Reference Manual includes significant changes in
the terminology used to describe the architecture, and this section introduces both the ARMv8 architectural concepts
and the associated terminology.
The following subsections describe key ARMv8 architectural concepts. Each section introduces the corresponding
terms that are used to describe the architecture:
•
Execution state.
•
The ARM instruction sets on page A1-37.
•
System registers on page A1-37.
•
ARMv8 Debug on page A1-38.

A1.3.1 Execution state
The Execution state defines the PE execution environment, including:
•
The supported register widths.
•
The supported instruction sets.
•
Significant aspects of:
—
The exception model.
—
The Virtual Memory System Architecture (VMSA).
—
The programmers’ model.
The Execution states are:
AArch64

The 64-bit Execution state. This Execution state:
•
Provides 31 64-bit general-purpose registers, of which X30 is used as the procedure link
register.
•
Provides a 64-bit program counter (PC), stack pointers (SPs), and exception link registers
(ELRs).
•
Provides 32 128-bit registers for SIMD vector and scalar floating-point support.
•
Provides a single instruction set, A64. For more information, see The ARM instruction sets
on page A1-37.
•
Defines the ARMv8 Exception model, with up to four Exception levels, EL0 - EL3, that
provide an execution privilege hierarchy, see Exception levels on page D1-1850.
•
Provides support for 64-bit virtual addressing. For more information, including the limits on
address ranges, see Chapter D4 The AArch64 Virtual Memory System Architecture.
•
Defines a number of Process state (PSTATE) elements that hold PE state. The A64
instruction set includes instructions that operate directly on various PSTATE elements.
•
Names each System register using a suffix that indicates the lowest Exception level at which
the register can be accessed.

AArch32

The 32-bit Execution state. This Execution state:
•

Provides 13 32-bit general-purpose registers, and a 32-bit PC, SP, and link register (LR). The
LR is used as both an ELR and a procedure link register.
Some of these registers have multiple banked instances for use in different PE modes.

•

Provides a single ELR, for exception returns from Hyp mode.

•

Provides 32 64-bit registers for Advanced SIMD vector and scalar floating-point support.

•

Provides two instruction sets, A32 and T32. For more information, see The ARM instruction
sets on page A1-37.

•

Supports the ARMv7-A exception model, based on PE modes, and maps this onto the
ARMv8 Exception model, that is based on the Exception levels.

•

Provides support for 32-bit virtual addressing.

•

Defines a number of Process state (PSTATE) elements that hold PE state. The A32 and T32
instruction sets include instructions that operate directly on various PSTATE elements, and
instructions that access PSTATE by using the Application Program Status Register (APSR)
or the Current Program Status Register (CPSR).
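The register and instruction set properties listed for the two Execution states can be collected into a small data structure. The following Python sketch is only an illustration. The ExecutionState type and its field names are hypothetical, and the values are the ones stated in the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionState:
    """Summary of one Execution state (hypothetical illustration type)."""
    name: str
    gp_registers: int           # number of general-purpose registers
    gp_register_bits: int       # width of each general-purpose register
    simd_fp_registers: int      # number of SIMD and floating-point registers
    simd_fp_register_bits: int  # width of each SIMD and floating-point register
    instruction_sets: tuple     # supported instruction sets
    virtual_address_bits: int   # nominal width only, see the VMSA chapters for limits

    def simd_fp_file_bytes(self) -> int:
        """Total size of the SIMD and floating-point registers, in bytes."""
        return self.simd_fp_registers * self.simd_fp_register_bits // 8


AARCH64 = ExecutionState("AArch64", 31, 64, 32, 128, ("A64",), 64)
AARCH32 = ExecutionState("AArch32", 13, 32, 32, 64, ("A32", "T32"), 32)
```

The AArch32 entry counts only the 13 general-purpose registers named in the list. The PC, SP, and LR, and the banked register instances, are not modeled.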

Later subsections give more information about the different properties of the Execution states.
Transferring control between the AArch64 and AArch32 Execution states is known as interprocessing. The PE can
move between Execution states only on a change of Exception level, and subject to the rules given in
Interprocessing on page D1-1962. This means different software layers, such as an application, an operating system
kernel, and a hypervisor, executing at different Exception levels, can execute in different Execution states.

A1.3.2 The ARM instruction sets
In ARMv8 the possible instruction sets depend on the Execution state:
AArch64

AArch64 state supports only a single instruction set, called A64. This is a fixed-length instruction
set that uses 32-bit instruction encodings.
For information on the A64 instruction set, see Chapter C3 A64 Instruction Set Overview.

AArch32

AArch32 state supports the following instruction sets:
A32

This is a fixed-length instruction set that uses 32-bit instruction encodings.

T32

This is a variable-length instruction set that uses both 16-bit and 32-bit instruction
encodings.

In previous documentation, these instruction sets were called the ARM and Thumb instruction sets.
ARMv8 extends each of these instruction sets. In AArch32 state, the Instruction set state determines
the instruction set that the PE executes.
For information on the A32 and T32 instruction sets, see Chapter F1 The AArch32 Instruction Sets
Overview.
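The difference between the fixed-length and variable-length instruction sets can be sketched in Python. The function name t32_instruction_length is hypothetical. The rule it uses, that a first halfword with bits[15:11] equal to 0b11101, 0b11110, or 0b11111 starts a 32-bit instruction, comes from the T32 encoding description and is not stated in this section.

```python
# A64 and A32 are fixed-length: every instruction is 4 bytes.
FIXED_INSTRUCTION_BYTES = {"A64": 4, "A32": 4}


def t32_instruction_length(first_halfword: int) -> int:
    """Return the length, in bytes, of the T32 instruction that starts
    with the given halfword.

    Hypothetical helper, for illustration only.
    """
    if not 0 <= first_halfword <= 0xFFFF:
        raise ValueError("a halfword is a 16-bit value")
    top_bits = first_halfword >> 11  # bits[15:11]
    if top_bits in (0b11101, 0b11110, 0b11111):
        return 4  # first halfword of a 32-bit encoding
    return 2      # complete 16-bit encoding
```

For example, 0xBF00 is a complete 16-bit encoding, while 0xF3AF is the first halfword of a 32-bit encoding.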
The ARMv8 instruction sets support SIMD and scalar floating-point instructions. See Advanced SIMD and
floating-point support on page A1-50.

A1.3.3 System registers
System registers provide control and status information of architected features.
The System registers use a standard naming format: <register_name>.<bit_field_name> to identify specific
registers as well as control and status bits within a register.
Bits can also be described by their numerical position in the form <register_name>[x:y] or the generic form
bits[x:y].
In addition, in AArch64 state, most register names include the lowest Exception level that can access the register as
a suffix to the register name:
•

<register_name>_ELx, where x is 0, 1, 2, or 3.

For information about Exception levels, see Exception levels on page D1-1850.
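The naming conventions can be illustrated with a short Python sketch. The function names lowest_exception_level and bits are hypothetical. The sketch assumes that x is the more significant bit position in [x:y], and it reports None for any name that does not end in one of the four suffixes, because only most register names carry one.

```python
import re


def lowest_exception_level(register_name: str):
    """Return the Exception level named by an _ELx suffix, or None.

    Hypothetical helper, for illustration only.
    """
    match = re.search(r"_EL([0-3])$", register_name)
    return int(match.group(1)) if match else None


def bits(value: int, x: int, y: int) -> int:
    """Return the field bits[x:y] of value, where x >= y >= 0."""
    if x < y or y < 0:
        raise ValueError("expected x >= y >= 0")
    width = x - y + 1
    return (value >> y) & ((1 << width) - 1)
```

For example, lowest_exception_level("HCR_EL2") returns 2, and bits(0xABCD, 15, 12) returns 0xA.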
The System registers comprise:
•
The following registers that are described in this manual:
—
General system control registers.
—
Debug registers.
—
Generic Timer registers.
—
Optionally, Performance Monitor registers.
•
Optionally, one or more of the following groups of registers that are defined in other ARM architecture
specifications:
—
Trace System registers, as defined in the Embedded Trace Macrocell Architecture Specification,
ETMv4.
—
Scalable Vector Extension System registers, as defined in the ARM® Architecture Reference Manual
Supplement, The Scalable Vector Extension (SVE), for ARMv8-A.
—
Statistical Profiling Extension System registers, as defined in the ARM® Architecture Reference
Manual Supplement, The Statistical Profiling Extension, for ARMv8-A.
—
Generic Interrupt Controller (GIC) System registers, see The ARM Generic Interrupt Controller
System registers.
•
RAS Extension System registers, as defined in the ARM® Reliability, Availability, and Serviceability (RAS)
Specification, ARMv8, for the ARMv8-A architecture profile. The RAS Extension is a mandatory extension
to the ARMv8.2 architecture, and an optional extension to the ARMv8.0 and the ARMv8.1 architectures.

For information about the AArch64 System registers, see Chapter D10 AArch64 System Register Descriptions.
For information about the AArch32 System registers, see Chapter G7 AArch32 System Register Descriptions.

The ARM Generic Interrupt Controller System registers
From version 3 of the ARM Generic Interrupt Controller architecture, GICv3, the GIC architecture specification
defines a System register interface to some of its functionality. The System register summaries in this manual
include these registers, see:
•

About the GIC System registers on page D9-2345, for more information about the AArch64 GIC System
registers.

•

About the GIC System registers on page G6-5009, for more information about the AArch32 GIC System
registers.

These sections give only short overviews of the GIC System registers. For more information, including descriptions
of the registers, see the ARM® Generic Interrupt Controller Architecture Specification, GIC architecture version 3.0
and version 4.0 (ARM IHI 0069).

Note
The programmers’ model for earlier versions of the GIC architecture is wholly memory-mapped.

A1.3.4 ARMv8 Debug
ARMv8 supports the following:
Self-hosted debug
In this model, the PE generates debug exceptions. Debug exceptions are part of the ARMv8
Exception model.
External debug
In this model, debug events cause the PE to enter Debug state. In Debug state, the PE is controlled
by an external debugger.
All ARMv8 implementations support both models. The model chosen by a particular user depends on the debug
requirements during different stages of the design and development life cycle of the product. For example, external
debug might be used during debugging of the hardware implementation and OS bring-up, and self-hosted debug
might be used during application development.
For more information about self-hosted debug:
•
In AArch64 state, see Chapter D2 AArch64 Self-hosted Debug.
•
In AArch32 state, see Chapter G2 AArch32 Self-hosted Debug.

For more information about external debug, see Part H External Debug on page 5689.

A1.4

Supported data types
The ARMv8 architecture supports the following integer data types:
Byte
8 bits.
Halfword
16 bits.
Word
32 bits.
Doubleword
64 bits.
Quadword
128 bits.
The architecture also supports the following floating-point data types:
•
Half-precision, see Half-precision floating-point formats on page A1-44 for details.
•
Single-precision, see Single-precision floating-point format on page A1-46 for details.
•
Double-precision, see Double-precision floating-point format on page A1-47 for details.
It also supports:
•
Fixed-point interpretation of words and doublewords. See Fixed-point format on page A1-48.
•
Vectors, where a register holds multiple elements, each of the same data type. See Vector formats on
page A1-41 for details.
The ARMv8 architecture provides two register files:
•
A general-purpose register file.
•
A SIMD&FP register file.
In each of these, the possible register widths depend on the Execution state.
In AArch64 state:
•
A general-purpose register file contains 64-bit registers:
—
Many instructions can access these registers as 64-bit registers or as 32-bit registers, using only the
bottom 32 bits.
•
A SIMD&FP register file contains 128-bit registers:
—
The quadword integer data types only apply to the SIMD&FP register file.
—
The floating-point data types only apply to the SIMD&FP register file.
—
While the AArch64 vector registers support 128-bit vectors, the effective vector length can be 64 bits
or 128 bits depending on the A64 instruction encoding used, see Instruction Mnemonics on
page C1-144.

For more information on the register files in AArch64 state, see Registers in AArch64 Execution state on
page B1-77.
In AArch32 state:
•
A general-purpose register file contains 32-bit registers:
—
Two 32-bit registers can support a doubleword.
—
Vector formatting is supported, see Figure A1-4 on page A1-44.
•
A SIMD&FP register file contains 64-bit registers:
—
AArch32 state does not support quadword integer or floating-point data types.

Note
Two consecutive 64-bit registers can be used as a 128-bit register.
For more information on the register files in AArch32 state, see The general-purpose registers, and the PC, in
AArch32 state on page E1-2981.

A1.4.1

Vector formats
In an implementation that includes the SIMD instructions that operate on the SIMD&FP register file, a register can
hold one or more packed elements, all of the same size and type. The combination of a register and a data type
describes a vector of elements. The vector is considered to be an array of elements of the data type specified in the
instruction. The number of elements in the vector is implied by the size of the data elements and the size of the
register.
Vector indices are in the range 0 to (number of elements – 1). An index of 0 refers to the least significant end of the
vector.
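The relationship between register size, element size, and vector index can be sketched as follows. This is an illustrative model only: the function names and the use of a Python integer to hold the register value are assumptions made for the example, not part of the architecture.

```python
def element_count(register_bits: int, element_bits: int) -> int:
    """Number of elements implied by the register size and the element size."""
    if register_bits % element_bits != 0:
        raise ValueError("element size must divide the register size")
    return register_bits // element_bits


def get_element(register_value: int, register_bits: int,
                element_bits: int, index: int) -> int:
    """Return element `index`; index 0 is the least significant element."""
    count = element_count(register_bits, element_bits)
    if not 0 <= index < count:
        raise IndexError(f"index must be in the range 0 to {count - 1}")
    mask = (1 << element_bits) - 1
    return (register_value >> (index * element_bits)) & mask


# A 64-bit register value viewed as a vector of four 16-bit elements.
value = 0x4444_3333_2222_1111
lanes = [get_element(value, 64, 16, i) for i in range(element_count(64, 16))]
print([hex(lane) for lane in lanes])
```

The same register value yields a different vector when a different element size is specified, which is why the data type is part of the vector description.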

Vector formats in AArch64 state
In AArch64 state, the SIMD&FP registers can be referred to as Vn, where n is a value from 0 to 31.
The SIMD&FP registers support three data formats for loads, stores, and data-processing operations:
•
A single, scalar, element in the least significant bits of the register.
•
A 64-bit vector of byte, halfword, or word elements.
•
A 128-bit vector of byte, halfword, word, or doubleword elements.
The element sizes are defined in Table A1-1 with the vector format described as:
•
For a 128-bit vector: Vn{.2D, .4S, .8H, .16B}.
•
For a 64-bit vector: Vn{.1D, .2S, .4H, .8B}.
Table A1-1 SIMD elements in AArch64 state
Mnemonic    Size
B           8 bits
H           16 bits
S           32 bits
D           64 bits

Figure A1-1 on page A1-42 shows the SIMD vectors in AArch64 state.
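As a quick check of the arrangement specifiers, the following sketch multiplies the element count by the element size from Table A1-1. The helper name vector_bits and the string form of the specifier (for example "4S") are assumptions made for this example.

```python
# Element sizes in bits, as listed in Table A1-1.
ELEMENT_BITS = {"B": 8, "H": 16, "S": 32, "D": 64}


def vector_bits(arrangement: str) -> int:
    """Total vector width for an arrangement such as '4S' or '8B'."""
    count = int(arrangement[:-1])
    mnemonic = arrangement[-1].upper()
    return count * ELEMENT_BITS[mnemonic]


for arrangement in ("2D", "4S", "8H", "16B", "1D", "2S", "4H", "8B"):
    print(f"Vn.{arrangement}: {vector_bits(arrangement)}-bit vector")
```

Every specifier in the first group describes a 128-bit vector and every specifier in the second group describes a 64-bit vector.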

[Figure: register Vn shown as a 128-bit vector, bits [127:0], of 64-bit (.2D), 32-bit (.4S), 16-bit (.8H), or
8-bit (.16B) elements, and as a 64-bit vector, bits [63:0], of 32-bit (.2S), 16-bit (.4H), or 8-bit (.8B)
elements. In each format, element [0] occupies the least significant bits.]

Figure A1-1 SIMD vectors in AArch64 state

Vector formats in AArch32 state
Table A1-2 shows the available formats. Each instruction description specifies the data types that the instruction
supports.
Table A1-2 Advanced SIMD data types in AArch32 state
Data type specifier    Meaning
.<size>                Any element of <size> bits
.F<size>               Floating-point number of <size> bits
.I<size>               Signed or unsigned integer of <size> bits
.P<size>               Polynomial over {0, 1} of degree less than <size>
.S<size>               Signed integer of <size> bits
.U<size>               Unsigned integer of <size> bits

Polynomial arithmetic over {0, 1} on page A1-49 describes the polynomial data type.
The .F16 data type is the half-precision data type selected by the FPSCR.AHP bit, see Half-precision floating-point
formats on page A1-44.
The .F32 data type is the ARM standard single-precision floating-point data type, see Single-precision
floating-point format on page A1-46.

The instruction definitions use a data type specifier to define the data types appropriate to the operation. Figure A1-2
shows the hierarchy of the Advanced SIMD data types.
.8
    .I8
        .S8
        .U8
    .P8
.16
    .I16
        .S16
        .U16
    .P16 †
    .F16
.32
    .I32
        .S32
        .U32
    .F32
.64
    .I64
        .S64
        .U64
    .P64 ‡

† Output format only. See VMULL instruction description.
‡ Available only if the Cryptographic Extension is implemented. See VMULL instruction description.

Figure A1-2 Advanced SIMD data type hierarchy in AArch32 state
For example, a multiply instruction must distinguish between integer and floating-point data types.
An integer multiply instruction that generates a double-width (long) result must specify the input data types as
signed or unsigned. However, some integer multiply instructions use modulo arithmetic, and therefore do not have
to distinguish between signed and unsigned inputs.
Figure A1-3 on page A1-44 shows the Advanced SIMD vectors in AArch32 state.

Note
In AArch32 state, a pair of even and following odd numbered doubleword registers can be concatenated and treated
as a single quadword register.
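The concatenation described in this Note can be sketched as follows. The sketch assumes that quadword register n is formed from doubleword registers 2n and 2n+1, and that the even-numbered register supplies the least significant 64 bits. Neither detail is stated in this section, so treat both as assumptions of the example.

```python
def doubleword_indices(q_index: int) -> tuple[int, int]:
    """Even and following odd doubleword register numbers for quadword n.

    Assumption: quadword n maps to doublewords 2n and 2n+1.
    """
    return 2 * q_index, 2 * q_index + 1


def quadword_from_doublewords(d_even: int, d_odd: int) -> int:
    """Concatenate a doubleword pair into one 128-bit value.

    Assumption: the even-numbered register holds bits [63:0].
    """
    mask64 = (1 << 64) - 1
    return ((d_odd & mask64) << 64) | (d_even & mask64)


even, odd = doubleword_indices(3)
print(f"D{even} and D{odd} form a 128-bit register")
print(hex(quadword_from_doublewords(0x1111, 0x2222)))
```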

[Figure: register Qn shown as a 128-bit vector, bits [127:0], of double-precision (64-bit), single-precision
(32-bit), 16-bit, or 8-bit elements, and register Dn shown as a 64-bit vector, bits [63:0], of 32-bit, 16-bit,
or 8-bit elements. In each format, element [0] occupies the least significant bits.]

Figure A1-3 Advanced SIMD vectors in AArch32 state
The AArch32 general-purpose registers support vector formats for use by the SIMD instructions in the Base
instruction set. Figure A1-4 shows these formats, in which a general-purpose register can be treated as either
two halfwords or four bytes.
[Figure: 32-bit general-purpose register Rn, bits [31:0], shown as a set of two halfwords (.16, elements [1]
and [0]) and as a set of four bytes (.8, elements [3] to [0]).]

Figure A1-4 Vector formatting in AArch32 state

A1.4.2

Half-precision floating-point formats
ARMv8 supports two half-precision floating-point formats:
•
IEEE half-precision, as described in the IEEE 754-2008 standard.
•
ARM alternative half-precision format.
Both formats can be used for conversions to and from other floating-point formats. FPCR.AHP controls the format
in AArch64 state and FPSCR.AHP controls the format in AArch32 state. ARMv8.2-FP16 adds half-precision
data-processing instructions, which always use the IEEE format. These instructions ignore the value of the relevant
AHP field, and behave as if it has an Effective value of 0.

The description of IEEE half-precision includes ARM-specific details that are left open by the standard, and is only
an introduction to the formats and to the values they can contain. For more information, especially on the handling
of infinities, NaNs, and signed zeros, see the IEEE 754 standard.
For both half-precision floating-point formats, the layout of the 16-bit format is the same. The format is:
Bit [15]    Bits [14:10]    Bits [9:0]
S           exponent        fraction

The interpretation of the format depends on the value of the exponent field, bits[14:10] and on which half-precision
format is being used.
0 < exponent < 0x1F
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{(\text{exponent} - 15)} \times (1.\text{fraction})$
The minimum positive normalized number is $2^{-14}$, or approximately $6.104 \times 10^{-5}$.
The maximum positive normalized number is $(2 - 2^{-10}) \times 2^{15}$, or 65504.
Larger normalized numbers can be expressed using the alternative format when the
exponent == 0x1F.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros:
+0
when S==0
–0
when S==1.
fraction != 0
The value is a denormalized number and is equal to:
$(-1)^{S} \times 2^{-14} \times (0.\text{fraction})$
The minimum positive denormalized number is $2^{-24}$, or approximately $5.960 \times 10^{-8}$.
Half-precision denormalized numbers are not flushed to zero by default. When ARMv8.2-FP16 is
implemented, the FPCR.FZ16 bit controls whether Flush-to-Zero mode is enabled for half-precision
data-processing instructions. For details, see Flush-to-zero on page A1-53.
exponent == 0x1F
The value depends on which half-precision format is being used:
IEEE half-precision
The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:
fraction == 0
The value is an infinity. There are two distinct infinities:
+infinity

When S==0. This represents all positive numbers that are too
big to be represented accurately as a normalized number.

-infinity

When S==1. This represents all negative numbers with an
absolute value that is too big to be represented accurately as a
normalized number.

fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant fraction
bit, bit[9]:
bit[9] == 0 The NaN is a signaling NaN. The sign bit can take any value,
and the remaining fraction bits can take any value except all
zeros.
bit[9] == 1 The NaN is a quiet NaN. The sign bit and remaining fraction
bits can take any value.
Alternative half-precision
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{16} \times (1.\text{fraction})$
The maximum positive normalized number is $(2 - 2^{-10}) \times 2^{16}$, or 131008.
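The interpretation rules for the two half-precision formats can be sketched as a decoder for a 16-bit pattern. This is a minimal illustration: the function name decode_half and the alternative parameter are assumptions of the example, and all NaN encodings are returned as a single Python NaN without distinguishing quiet from signaling NaNs.

```python
import math


def decode_half(bits: int, alternative: bool = False) -> float:
    """Decode a 16-bit pattern as IEEE or ARM alternative half-precision."""
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF

    if exponent == 0:
        # Zero or denormalized number: (-1)^S x 2^-14 x (0.fraction).
        return sign * 2.0 ** -14 * (fraction / 1024)
    if exponent == 0x1F and not alternative:
        # IEEE half-precision: infinity or NaN.
        return sign * math.inf if fraction == 0 else math.nan
    # Normalized number. In the alternative format this includes
    # exponent == 0x1F, which gives 2^16 x (1.fraction).
    return sign * 2.0 ** (exponent - 15) * (1 + fraction / 1024)


print(decode_half(0x7BFF))                    # largest IEEE normalized value
print(decode_half(0x7FFF, alternative=True))  # largest alternative-format value
```

The same bit pattern 0x7C00 is positive infinity in the IEEE format and 65536 in the alternative format, which is the only range of encodings where the two formats differ.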

A1.4.3

Single-precision floating-point format
The single-precision floating-point format is as defined by the IEEE 754 standard.
This description includes ARM-specific details that are left open by the standard. It is only intended as an
introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities,
NaNs, and signed zeros, see the IEEE 754 standard.
A single-precision value is a 32-bit word with the format:
Bit [31]    Bits [30:23]    Bits [22:0]
S           exponent        fraction

The interpretation of the format depends on the value of the exponent field, bits[30:23]:
0 < exponent < 0xFF
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{(\text{exponent} - 127)} \times (1.\text{fraction})$
The minimum positive normalized number is $2^{-126}$, or approximately $1.175 \times 10^{-38}$.
The maximum positive normalized number is $(2 - 2^{-23}) \times 2^{127}$, or approximately $3.403 \times 10^{38}$.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros:
+0
When S==0.
–0
When S==1.
These usually behave identically. In particular, the result is equal if +0 and –0 are
compared as floating-point numbers. However, they yield different results in some
circumstances. For example, the sign of the infinity produced as the result of dividing
by zero depends on the sign of the zero. The two zeros can be distinguished from each
other by performing an integer comparison of the two words.
fraction != 0
The value is a denormalized number and is equal to:
$(-1)^{S} \times 2^{-126} \times (0.\text{fraction})$
The minimum positive denormalized number is $2^{-149}$, or approximately $1.401 \times 10^{-45}$.
Denormalized numbers are always flushed to zero in Advanced SIMD processing in AArch32 state.
They are optionally flushed to zero in floating-point processing and in Advanced SIMD processing
in AArch64 state. For details, see Flush-to-zero on page A1-53.
exponent == 0xFF
The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:
fraction == 0
The value is an infinity. There are two distinct infinities:
+infinity

When S==0. This represents all positive numbers that are too big to be
represented accurately as a normalized number.

-infinity

When S==1. This represents all negative numbers with an absolute value
that is too big to be represented accurately as a normalized number.

fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant fraction bit, bit[22]:
bit[22] == 0
The NaN is a signaling NaN. The sign bit can take any value, and the
remaining fraction bits can take any value except all zeros.
bit[22] == 1
The NaN is a quiet NaN. The sign bit and remaining fraction bits can take
any value.
For details of the default NaN, see NaN handling and the Default NaN on page A1-54.

Note
NaNs with different sign or fraction bits are distinct NaNs, but this does not mean software can use floating-point
comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN compares
as unordered with everything, including itself.
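The single-precision encoding rules can be sketched as a classifier over the 32-bit word. The function names classify_single and single_bits are assumptions of this example. The last lines illustrate the earlier statement that the two zeros compare equal as floating-point numbers but can be distinguished by an integer comparison of the words.

```python
import struct


def classify_single(bits: int) -> str:
    """Classify a 32-bit word by its exponent, bits[30:23], and fraction."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # bit[22] is the most significant fraction bit.
        return "quiet NaN" if (fraction >> 22) & 1 else "signaling NaN"
    return "normalized"


def single_bits(value: float) -> int:
    """Return the 32-bit word that encodes `value` in single precision."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


for word in (0x3F800000, 0x00000001, 0x7F800000, 0x7FC00000, 0x7F800001):
    print(f"{word:#010x}: {classify_single(word)}")

# +0 and -0 are equal as floating-point values but differ as integers.
print(0.0 == -0.0, single_bits(0.0) == single_bits(-0.0))
```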

A1.4.4

Double-precision floating-point format
The double-precision floating-point format is as defined by the IEEE 754 standard. Double-precision floating-point
is supported by both SIMD and floating-point instructions in AArch64 state, and only by floating-point instructions
in AArch32 state.
This description includes implementation-specific details that are left open by the standard. It is only intended as an
introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities,
NaNs, and signed zeros, see the IEEE 754 standard.
A double-precision value is a 64-bit doubleword, with the format:
Bit [63]    Bits [62:52]    Bits [51:0]
S           exponent        fraction

Double-precision values represent numbers, infinities, and NaNs in a similar way to single-precision values, with
the interpretation of the format depending on the value of the exponent:
0 < exponent < 0x7FF
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{(\text{exponent} - 1023)} \times (1.\text{fraction})$
The minimum positive normalized number is $2^{-1022}$, or approximately $2.225 \times 10^{-308}$.
The maximum positive normalized number is $(2 - 2^{-52}) \times 2^{1023}$, or approximately $1.798 \times 10^{308}$.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros that behave in the same way as the two
single-precision zeros:
+0
when S==0
–0
when S==1.
fraction != 0
The value is a denormalized number and is equal to:
$(-1)^{S} \times 2^{-1022} \times (0.\text{fraction})$
The minimum positive denormalized number is $2^{-1074}$, or approximately $4.941 \times 10^{-324}$.
Optionally, denormalized numbers are flushed to zero in floating-point calculations. For details, see
Flush-to-zero on page A1-53.
exponent == 0x7FF
The value is either an infinity or a NaN, depending on the fraction bits:
fraction == 0
The value is an infinity. As for single-precision, there are two infinities:
+infinity When S==0.
-infinity When S==1.
fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant fraction bit, bit[51] of
the doubleword:
bit[51] == 0
The NaN is a signaling NaN. The sign bit can take any value, and the
remaining fraction bits can take any value except all zeros.
bit[51] == 1
The NaN is a quiet NaN. The sign bit and the remaining fraction bits can
take any value.
For details of the default NaN, see NaN handling and the Default NaN on page A1-54.

Note
NaNs with different sign or fraction bits are distinct NaNs, but this does not mean software can use floating-point
comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN compares
as unordered with everything, including itself.
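The double-precision limits and the NaN comparison behavior described above can be checked with a short sketch. It assumes a host where Python floats are IEEE 754 double-precision values, which is the usual case. The helper names are illustrative only.

```python
import struct
import sys

# Limits computed from the formulas in this section.
min_normal = 2.0 ** -1022
max_normal = (2.0 - 2.0 ** -52) * 2.0 ** 1023
min_denormal = 2.0 ** -1074


def double_bits(value: float) -> int:
    """Return the 64-bit doubleword that encodes `value`."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def double_from_bits(bits: int) -> float:
    """Return the double-precision value encoded by a 64-bit doubleword."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


print(min_normal == sys.float_info.min, max_normal == sys.float_info.max)

# Two quiet NaNs (bit[51] == 1) with different fraction bits.
nan_a = double_from_bits(0x7FF8000000000000)
nan_b = double_from_bits(0x7FF8000000000001)
# A floating-point comparison is unordered, even for a NaN against itself.
print(nan_a == nan_a, nan_a == nan_b)
# Comparing the encodings as integers does distinguish them.
print(double_bits(nan_a) == double_bits(nan_b))
```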

A1.4.5

Fixed-point format
Fixed-point formats are used only for conversions between floating-point and fixed-point values. They apply to
general-purpose registers.
Fixed-point values can be signed or unsigned, and can be 16-bit or 32-bit. Conversion instructions take an argument
that specifies the number of fraction bits in the fixed-point number. That is, it specifies the position of the binary
point.

A1.4.6

Conversion between floating-point and fixed-point values
ARMv8 supports the conversion of a scalar floating-point value to or from a signed or unsigned fixed-point value
in a general-purpose register.
The instruction argument #fbits indicates that the general-purpose register holds a fixed-point number with fbits bits
after the binary point, where fbits is in the range 1 to 64 for a 64-bit general-purpose register, or 1 to 32 for a 32-bit
general-purpose register.
More specifically:
•
For a 64-bit register Xd:
—
The integer part is Xd[63:#fbits].
—
The fractional part is Xd[(#fbits-1):0].
•
For a 32-bit register Wd or Rd:
—
The integer part is Wd[31:#fbits] or Rd[31:#fbits].
—
The fractional part is Wd[(#fbits-1):0] or Rd[(#fbits-1):0].

These instructions can cause the following floating-point exceptions:
Invalid Operation
When the floating-point input is NaN or Infinity or when a numerical value cannot be
represented within the destination register.
Inexact
When the numeric result differs from the input value.
Input Denormal
When Flush-to-zero mode is enabled and the denormal input is replaced by a zero.

Note
An out of range fixed-point result is saturated to the destination size.
For more information, see Floating-point exceptions and exception traps on page D1-1899.
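The role of #fbits and the saturation of out-of-range results can be sketched as follows. This is a simplified model, not the architectural conversion: it assumes rounding towards zero, does not model the floating-point exceptions, and rejects NaN inputs rather than defining a result for them. The function names are hypothetical.

```python
import math
from fractions import Fraction


def float_to_fixed(value: float, fbits: int, width: int = 32,
                   signed: bool = True) -> int:
    """Convert to a fixed-point integer with `fbits` fraction bits."""
    if not 1 <= fbits <= width:
        raise ValueError("fbits must be in the range 1 to the register width")
    if math.isnan(value):
        raise ValueError("NaN input is not modelled in this sketch")
    if signed:
        low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        low, high = 0, (1 << width) - 1
    if math.isinf(value):
        return high if value > 0 else low
    # Exact scaling by 2^fbits, then rounding towards zero (an assumption).
    scaled = math.trunc(Fraction(value) * (1 << fbits))
    # An out-of-range result is saturated to the destination size.
    return max(low, min(high, scaled))


def split_fixed(register: int, fbits: int, width: int = 32) -> tuple[int, int]:
    """Return (register[width-1:fbits], register[fbits-1:0])."""
    register &= (1 << width) - 1
    return register >> fbits, register & ((1 << fbits) - 1)


fixed = float_to_fixed(2.75, fbits=8)
print(hex(fixed), split_fixed(fixed, fbits=8))
print(hex(float_to_fixed(1e10, fbits=16)))
```

With 8 fraction bits, 2.75 becomes 0x2C0: the integer part is 2 and the fractional part 0xC0 represents 192/256, that is 0.75.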

A1.4.7

Polynomial arithmetic over {0, 1}
Some SIMD instructions that operate on SIMD&FP registers can operate on polynomials over {0, 1}, see Supported
data types on page A1-40. The polynomial data type represents a polynomial in x of the form
$b_{n-1}x^{n-1} + \dots + b_{1}x + b_{0}$ where $b_{k}$ is bit[k] of the value.
The coefficients 0 and 1 are manipulated using the rules of Boolean arithmetic:
•
0+0=1+1=0
•
0+1=1+0=1
•
0×0=0×1=1×0=0
•
1 × 1 = 1.
That is:
•

Adding two polynomials over {0, 1} is the same as a bitwise exclusive OR.

•

Multiplying two polynomials over {0, 1} is the same as integer multiplication except that partial products are
exclusive-ORed instead of being added.

A64, A32, and T32 provide instructions for performing polynomial multiplication of 8-bit values.
•

For AArch32, see VMUL (integer and polynomial) on page F6-4300 and VMULL (integer and polynomial)
on page F6-4305.

•

For AArch64, see PMUL on page C7-1449 and PMULL, PMULL2 on page C7-1451.

The Cryptographic Extension adds the ability to perform long polynomial multiplies of 64-bit values. See PMULL,
PMULL2 on page C7-1451.

Pseudocode description of polynomial multiplication
In pseudocode, polynomial addition is described by the EOR operation on bitstrings.
Polynomial multiplication is described by the PolynomialMult() function defined in Chapter J1 ARMv8 Pseudocode.
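The rules above can be sketched in a few lines. This is not the PolynomialMult() pseudocode from Chapter J1. It is an independent illustration, with hypothetical function names, of addition as exclusive OR and of multiplication with exclusive-ORed partial products.

```python
def poly_add(a: int, b: int) -> int:
    """Add two polynomials over {0, 1}: a bitwise exclusive OR."""
    return a ^ b


def poly_mul(a: int, b: int) -> int:
    """Multiply two polynomials over {0, 1}.

    Works like shift-and-add integer multiplication, except that the
    partial products are exclusive-ORed instead of being added.
    """
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift
        shift += 1
    return result


# (x + 1) * (x + 1) = x^2 + 1 over {0, 1}, whereas integer 3 * 3 = 9.
print(bin(poly_mul(0b11, 0b11)), 0b11 * 0b11)
```

Multiplying two 8-bit polynomials gives a result of degree at most 14, so the product of 8-bit inputs always fits in 16 bits.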

A1.5

Advanced SIMD and floating-point support
Note
In AArch32 state, the SIMD instructions that operate on SIMD&FP registers are always described as the Advanced
SIMD instructions, to distinguish them from the SIMD instructions in the base instruction sets, that operate on the
32-bit general-purpose registers. The A64 instruction set does not provide any SIMD instructions that operate on
the general-purpose registers, and therefore some AArch64 state descriptions use SIMD as a synonym for Advanced
SIMD. Unless the context clearly indicates otherwise, this section describes the support for SIMD instructions that
operate on SIMD&FP registers.
ARMv8 permits the following levels of support for Advanced SIMD and floating-point instructions:
•

Full SIMD and floating-point support without exception trapping.

•

Full SIMD and floating-point support with exception trapping.

•

No floating-point or SIMD support. This option is licensed only for implementations targeting specialized
markets.

Note
All systems that support standard operating systems with rich application environments provide hardware
support for Advanced SIMD and floating-point. It is a requirement of the ARM Procedure Call Standard for
AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.
ARMv8 supports single-precision (32-bit) and double-precision (64-bit) floating-point data types and arithmetic as
defined by the IEEE 754 floating-point standard. It also supports the half-precision (16-bit) floating-point data type
for data storage, by supporting conversions between single-precision and half-precision data types and
double-precision and half-precision data types. When ARMv8.2-FP16 is implemented, it also supports the
half-precision floating-point data type for data-processing operations.
The SIMD instructions provide packed Single Instruction Multiple Data (SIMD) and single-element scalar
operations, and support:
•
Single-precision and double-precision arithmetic in AArch64 state.
•
Single-precision arithmetic only in AArch32 state.
•
When ARMv8.2-FP16 is implemented, half-precision arithmetic is supported in AArch64 and AArch32
states.
Floating-point support in AArch64 state SIMD is IEEE 754-2008 compliant with:
•
Configurable rounding modes.
•
Configurable Default NaN behavior.
•
Configurable Flush-to-zero behavior.
Floating-point computation using AArch32 Advanced SIMD instructions remains unchanged from ARMv7. A32
and T32 Advanced SIMD floating-point always uses ARM standard floating-point arithmetic and performs
IEEE 754 floating-point arithmetic with the following restrictions:
•
Denormalized numbers are flushed to zero, see Flush-to-zero on page A1-53.
•
Only default NaNs are supported, see NaN handling and the Default NaN on page A1-54.
•
The Round to Nearest rounding mode is used.
•
Untrapped floating-point exception handling is used for all floating-point exceptions.
If floating-point exception trapping is supported, floating-point exceptions, such as Overflow or Divide by Zero,
can be handled without trapping. This applies to both SIMD and floating-point operations. When handled in this
way, a floating-point exception causes a cumulative status register bit to be set to 1 and a default result to be
produced by the operation. For more information about floating-point exceptions, see Floating-point exceptions and
exception traps on page D1-1899.

In AArch64 state, the following registers control floating-point operation and return floating-point status
information:
•
The Floating-Point Control Register, FPCR, controls:
—
The half-precision format where applicable, FPCR.AHP bit.
—
Default NaN behavior, FPCR.DN bit.
—
Flush-to-zero behavior, FPCR.{FZ, FZ16} bits. If ARMv8.2-FP16 is not implemented, FPCR.FZ16
is RES0.
—
Rounding mode support, FPCR.Rmode field.
—
Len and Stride fields associated with execution in AArch32 state, and only supported for a context
save and restore from AArch64 state. These fields are obsolete in ARMv8 and can be implemented as
RAZ/WI. If they are implemented as RW and are programmed to a nonzero value, they make some
AArch32 floating-point instructions UNDEFINED.
—
Floating-point exception trap controls, the FPCR.{IDE, IXE, UFE, OFE, DZE, IOE} bits, see
Floating-point exceptions and exception traps on page D1-1899.
•
The Floating-Point Status Register, FPSR, provides:
—
Cumulative floating-point exceptions flags, FPSR.{IDC, IXC, UFC, OFC, DZC, IOC and QC}.
—
The AArch32 floating-point comparison flags {N,Z,C,V}. These bits are RES0 if AArch32
floating-point is not implemented.

Note
In AArch64 state, the process state flags, PSTATE.{N,Z,C,V} are used for all data-processing
compares and any associated conditional execution.
AArch32 state provides a single Floating-Point Status and Control Register, FPSCR, combining the FPCR and
FPSR fields.
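The split of the combined FPSCR into control and status fields can be sketched with bit masks. This section does not give the bit positions. The positions in the tables below are assumptions made for the example and must be checked against the FPCR, FPSR, and FPSCR register descriptions before being relied on.

```python
# Field name -> (least significant bit, width). The positions are
# assumptions for illustration and are not defined in this section.
FPCR_FIELDS = {
    "AHP": (26, 1), "DN": (25, 1), "FZ": (24, 1), "RMode": (22, 2),
    "Stride": (20, 2), "FZ16": (19, 1), "Len": (16, 3),
    "IDE": (15, 1), "IXE": (12, 1), "UFE": (11, 1),
    "OFE": (10, 1), "DZE": (9, 1), "IOE": (8, 1),
}
FPSR_FIELDS = {
    "N": (31, 1), "Z": (30, 1), "C": (29, 1), "V": (28, 1), "QC": (27, 1),
    "IDC": (7, 1), "IXC": (4, 1), "UFC": (3, 1),
    "OFC": (2, 1), "DZC": (1, 1), "IOC": (0, 1),
}


def field_mask(fields: dict) -> int:
    """Build a mask that covers every field in the table."""
    mask = 0
    for lsb, width in fields.values():
        mask |= ((1 << width) - 1) << lsb
    return mask


def split_fpscr(fpscr: int) -> tuple[int, int]:
    """Separate a 32-bit FPSCR value into (control bits, status bits)."""
    return fpscr & field_mask(FPCR_FIELDS), fpscr & field_mask(FPSR_FIELDS)


def get_field(value: int, fields: dict, name: str) -> int:
    """Extract one named field from a register value."""
    lsb, width = fields[name]
    return (value >> lsb) & ((1 << width) - 1)


control, status = split_fpscr(0x03C0_0011)
print(hex(control), hex(status), get_field(control, FPCR_FIELDS, "RMode"))
```

Under these assumed positions the control and status fields do not overlap, so the two parts can be recombined with a bitwise OR.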
For system level information about the SIMD and floating-point support, see Advanced SIMD and floating-point
support on page G1-4696.

A1.5.1

Instruction support
The Advanced SIMD and floating-point instructions support:
•
Load and store for single elements and vectors of multiple elements.

Note
Single elements are also referred to as scalar elements.
•
Data processing on single and multiple elements for both integer and floating-point data types.
•
When ARMv8.3-CompNum is implemented, complex number arithmetic.
•
Floating-point conversion between different levels of precision.
•
Conversion between floating-point, fixed-point integer, and integer data types.
•
Floating-point rounding.

For more information on the SIMD and floating-point instructions in AArch64 state, see Chapter C3 A64
Instruction Set Overview.
For more information on the Advanced SIMD and floating-point instructions in AArch32 state, see Chapter F1 The
AArch32 Instruction Sets Overview.

A1.5.2

Floating-point standards, and terminology
The ARM architecture includes support for all the required features of ANSI/IEEE Std 754-2008, IEEE Standard for Binary
Floating-Point Arithmetic, referred to as IEEE 754-2008. However, some terms in this manual are based on the
1985 version of this standard, referred to as IEEE 754-1985:
•

ARM floating-point terminology generally uses the IEEE 754-1985 terms. This section summarizes how
IEEE 754-2008 changes these terms.

•

References to IEEE 754 that do not include the issue year apply to either issue of the standard.

Table A1-3 shows how the terminology in this manual differs from that used in IEEE 754-2008.
Table A1-3 Floating-point terminology
This manual                           IEEE 754-2008
Normalized (a)                        Normal
Denormal, or denormalized             Subnormal
Round towards Minus Infinity (RM)     roundTowardNegative
Round towards Plus Infinity (RP)      roundTowardPositive
Round towards Zero (RZ)               roundTowardZero
Round to Nearest (RN)                 roundTiesToEven
Round to Nearest with Ties to Away    roundTiesToAway
Rounding mode                         Rounding-direction attribute

a. Normalized number is used in preference to normal number,
because of the other specific uses of normal in this manual.
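The difference between the rounding modes in Table A1-3 is easiest to see on a tie. The sketch below uses the rounding constants of the Python decimal module to round decimal values to integers. This is only an analogy chosen for the example: it illustrates the rounding directions, not binary floating-point rounding, and the mapping from the mode names to decimal constants is an assumption of the sketch.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# Mode names from Table A1-3 mapped to decimal rounding constants.
ROUNDING = {
    "RM": ROUND_FLOOR,               # Round towards Minus Infinity
    "RP": ROUND_CEILING,             # Round towards Plus Infinity
    "RZ": ROUND_DOWN,                # Round towards Zero
    "RN": ROUND_HALF_EVEN,           # Round to Nearest, ties to even
    "Ties to Away": ROUND_HALF_UP,   # Round to Nearest with Ties to Away
}


def round_to_integer(value: str, mode: str) -> int:
    """Round a decimal string to an integer using the named mode."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUNDING[mode]))


for mode in ROUNDING:
    print(f"{mode:>12}: 2.5 -> {round_to_integer('2.5', mode)}, "
          f"-2.5 -> {round_to_integer('-2.5', mode)}")
```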

A1.5.3

ARM standard floating-point input and output values
ARMv8 provides full IEEE 754 floating-point arithmetic support. In AArch32 state, floating-point operations
performed using Advanced SIMD instructions are limited to ARM standard floating-point operation, regardless of
the selected rounding mode in the FPSCR. Unlike AArch32, AArch64 SIMD floating-point arithmetic is performed
using the rounding mode selected by the FPCR.
ARM standard floating-point arithmetic supports the following input formats defined by the IEEE 754
floating-point standard:
•
Zeros.
•
Normalized numbers.
•
Denormalized numbers are flushed to 0 before floating-point operations, see Flush-to-zero on page A1-53.
•
NaNs.
•
Infinities.
ARM standard floating-point arithmetic supports the Round to Nearest (roundTiesToEven) rounding mode defined
by the IEEE 754 standard.
ARM standard floating-point arithmetic supports the following output result formats defined by the IEEE 754
standard:

•
Zeros.
•
Normalized numbers.
•
Results that are less than the minimum normalized number are flushed to zero, see Flush-to-zero on
page A1-53.
•
NaNs produced in floating-point operations are always the default NaN, see NaN handling and the Default
NaN on page A1-54.
•
Infinities.

A1.5.4

Flush-to-zero
The performance of floating-point processing can be reduced when doing calculations involving denormalized
numbers and Underflow exceptions. In many algorithms, this performance can be recovered, without significantly
affecting the accuracy of the final result, by replacing the denormalized operands and intermediate results with
zeros. To permit this optimization, ARM floating-point implementations have a processing mode called
Flush-to-zero mode. Single-precision and double-precision AArch32 Advanced SIMD floating-point instructions
always use Flush-to-zero mode. When ARMv8.2-FP16 is implemented, Flush-to-Zero mode can be enabled for all
half-precision data-processing instructions using the FPCR.FZ16 bit in AArch64 state and the FPSCR.FZ16 bit in
AArch32 state.
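The replacement of denormalized operands with zeros can be sketched for single-precision inputs as follows. This is a minimal model with a hypothetical function name. It assumes that the sign bit of the operand is kept when the operand is replaced by zero, which this section does not state, and it reports the replacement with a flag instead of modelling any exception.

```python
def flush_input_single(bits: int) -> tuple[int, bool]:
    """Replace a denormalized single-precision operand with zero.

    Returns the operand to use and a flag that records whether the
    input was flushed.
    """
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0 and fraction != 0:
        # Assumption: only the sign bit survives, giving +0 or -0.
        return bits & 0x80000000, True
    return bits, False


for word in (0x00000001, 0x80000001, 0x00800000, 0x3F800000):
    flushed, was_denormal = flush_input_single(word)
    print(f"{word:#010x} -> {flushed:#010x} flushed={was_denormal}")
```

The smallest normalized number, 0x00800000, passes through unchanged because its exponent field is nonzero.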
Behavior in Flush-to-zero mode differs from standard IEEE 754 arithmetic in the following ways:
•

All inputs to floating-point operations that are double-precision denormalized numbers or single-precision
denormalized numbers are treated as though they were zero. This causes an Input Denormal floating-point
exception, but does not cause an Inexact floating-point exception.
The Input Denormal floating-point exception occurs only in Flush-to-zero mode.

Note
When ARMv8.2-FP16 is implemented and Flush-to-zero mode is enabled, a half-precision floating-point number
that is flushed to zero does not generate an Input Denormal floating-point exception. This is because this
situation is much less exceptional than for double-precision or single-precision denormalized numbers.
In AArch32 state, the FPSCR contains a cumulative exception bit FPSCR.IDC and an optional trap enable bit
FPSCR.IDE corresponding to the Input Denormal floating-point exception.
In AArch64 state, the FPSR contains a cumulative exception bit FPSR.IDC, and the FPCR contains an optional
trap enable bit FPCR.IDE, corresponding to the Input Denormal floating-point exception.
The occurrence of all floating-point exceptions except Input Denormal is determined using the input values
after flush-to-zero processing has occurred.
•

The result of a floating-point operation is flushed to zero if the result of the operation before rounding
satisfies the condition:
0 < Abs(result) < MinNorm, where:
—
MinNorm is 2^-14 for half-precision.
—
MinNorm is 2^-126 for single-precision.
—
MinNorm is 2^-1022 for double-precision.

This causes the FPSR.UFC bit to be set to 1, and prevents any Inexact floating-point exception from
occurring for the operation.
Underflow floating-point exceptions occur only when a result is flushed to zero.
In all implementations Underflow floating-point exceptions that occur in Flush-to-zero mode are always
treated as untrapped, even when the Underflow trap enable bit, FPCR.UFE, is set to 1.
•

An Inexact floating-point exception does not occur if the result is flushed to zero, even though the final result
of zero is not equivalent to the value that would be produced if the operation were performed with unbounded
precision and exponent range.

When an input or a result is flushed to zero, the value of the sign bit of the zero is preserved. That is, the sign bit of
the zero matches the sign bit of the input or result that is being flushed to zero.
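
As an illustration only, the result-flushing rule described above can be modelled in a few lines of Python. This is a sketch, not the architectural pseudocode: the name flush_result_to_zero and the MIN_NORM table are invented for this example, the returned flag merely stands in for FPSR.UFC, and input flushing and the Input Denormal exception are not modelled.

```python
import math

# Minimum normalized magnitude for each format, as listed in this section.
MIN_NORM = {
    "half": 2.0 ** -14,
    "single": 2.0 ** -126,
    "double": 2.0 ** -1022,
}


def flush_result_to_zero(result, fmt):
    """Model flushing of a pre-rounding result in Flush-to-zero mode.

    Returns (value, underflow). The underflow flag stands in for FPSR.UFC.
    Hypothetical helper, not part of the architecture pseudocode.
    """
    if 0.0 < abs(result) < MIN_NORM[fmt]:
        # The zero keeps the sign of the value that was flushed.
        return math.copysign(0.0, result), True
    return result, False


# A tiny negative single-precision result becomes -0.0 and reports underflow.
value, underflow = flush_result_to_zero(-1e-40, "single")
print(value, underflow)
```

The same magnitude is well inside the normalized range for double-precision, so flush_result_to_zero(-1e-40, "double") returns the value unchanged with the flag clear.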

Flush-to-zero mode has no effect on half-precision numbers that are inputs to floating-point operations, or results
from floating-point operations. When ARMv8.2-FP16 is implemented and when Flush-to-zero mode is enabled,
Flush-to-zero mode affects half-precision data-processing instructions, but has no effect on conversions between
half-precision and single-precision or double-precision.

Note
Flush-to-zero mode is incompatible with the IEEE 754 standard, and must not be used when IEEE 754 compatibility
is a requirement. Flush-to-zero mode must be used with care. Although it can improve performance on some
algorithms, there are significant limitations on its use. These are application dependent:

•
On many algorithms, it has no noticeable effect, because the algorithm does not normally use denormalized
numbers.
•
On other algorithms, it can cause exceptions to occur or seriously reduce the accuracy of the results of the
algorithm.

A1.5.5

NaN handling and the Default NaN
The IEEE 754 standard specifies that:
•

An operation that causes an Invalid Operation floating-point exception generates a quiet NaN as its result if
that exception is untrapped.

•

An operation involving a quiet NaN operand, but not a signaling NaN operand, returns an input NaN as its
result.

The floating-point processing behavior when Default NaN mode is disabled adheres to this, with the following
additions:
•
If an untrapped Invalid Operation floating-point exception occurs, the quiet NaN result is derived from:
—
The first signaling NaN operand, if the exception occurs because at least one of the operands is a
signaling NaN.
—
Otherwise, the default NaN.
•
If an untrapped Invalid Operation floating-point exception does not occur, but at least one of the operands is
a quiet NaN, the result is derived from the first quiet NaN operand.

Depending on the operation, the exact value of a derived quiet NaN result may differ in both sign and number of
fraction bits from its source. For a quiet NaN result derived from a signaling NaN operand, the most-significant
fraction bit is set to 1.

Note
•

In these descriptions, first operand relates to the left-to-right ordering of the arguments to the pseudocode
function that describes the operation.

•

The IEEE 754 standard specifies that the sign bit of a NaN has no significance.

The SIMD and floating-point processing behavior when Default NaN mode is enabled is that the Default NaN is
the result of all floating-point operations that either:
•
Cause untrapped Invalid Operation floating-point exceptions.
•
Have one or more quiet NaN inputs, but no signaling NaN inputs.
Table A1-4 on page A1-55 shows the format of the default NaN for ARM floating-point operations.
Default NaN mode is selected for the floating-point processing by setting the FPCR.DN bit to 1.
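
The selection rules for the NaN result, with Default NaN mode disabled and enabled, can be summarized in a small decision function. The following Python sketch is an informal restatement of the rules in this section, assuming that all exceptions are untrapped. The function name, the string labels for operand kinds, and the invalid_op parameter are hypothetical and exist only for this illustration.

```python
def nan_result_source(operands, default_nan_mode, invalid_op=False):
    """Return where the NaN result is derived from, or None.

    operands: list of "snan", "qnan" or "num", in the left-to-right order of
        the arguments to the pseudocode function for the operation.
    default_nan_mode: models FPCR.DN == 1.
    invalid_op: True if the operation is invalid for a reason other than a
        signaling NaN input.
    All exceptions are assumed to be untrapped.
    """
    has_snan = "snan" in operands
    has_qnan = "qnan" in operands
    raises_invalid = has_snan or invalid_op

    if not raises_invalid and not has_qnan:
        return None  # These rules do not produce a NaN result.
    if default_nan_mode:
        return "default"
    if raises_invalid:
        if has_snan:
            return ("operand", operands.index("snan"))
        return "default"
    return ("operand", operands.index("qnan"))


# With Default NaN mode disabled, the first signaling NaN takes priority
# over an earlier quiet NaN.
print(nan_result_source(["qnan", "snan"], default_nan_mode=False))
```

The sketch returns only the source of the result. It does not model the changes to the sign or fraction bits that can occur when the quiet NaN is derived from its source.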
Other aspects of the functionality of the Invalid Operation floating-point exception are not affected by Default NaN
mode. These are that:
•
If untrapped, it causes the FPSR.IOC bit to be set to 1.
•
If trapped, it causes a user trap handler to be invoked.

Table A1-4 Default NaN encoding

            Half-precision, IEEE Format    Single-precision                 Double-precision
Sign bit    0                              0                                0
Exponent    0x1F                           0xFF                             0x7FF
Fraction    Bit[9] == 1, bits[8:0] == 0    Bit[22] == 1, bits[21:0] == 0    Bit[51] == 1, bits[50:0] == 0
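
The encodings in Table A1-4 follow one pattern: sign 0, exponent all ones, most-significant fraction bit 1, and all other fraction bits 0. As a quick cross-check, the Python sketch below builds the three bit patterns from the field widths of each format. The helper name default_nan_bits and the constant names are invented for this example.

```python
import math
import struct


def default_nan_bits(exp_bits, frac_bits):
    """Build a default NaN pattern: sign 0, exponent all ones,
    top fraction bit 1, remaining fraction bits 0."""
    exponent = (1 << exp_bits) - 1
    fraction = 1 << (frac_bits - 1)
    return (exponent << frac_bits) | fraction


HALF_DEFAULT_NAN = default_nan_bits(5, 10)     # 0x7E00
SINGLE_DEFAULT_NAN = default_nan_bits(8, 23)   # 0x7FC00000
DOUBLE_DEFAULT_NAN = default_nan_bits(11, 52)  # 0x7FF8000000000000

# Reinterpret the single-precision pattern as a float to confirm it is a NaN.
as_float = struct.unpack("<f", struct.pack("<I", SINGLE_DEFAULT_NAN))[0]
print(hex(SINGLE_DEFAULT_NAN), math.isnan(as_float))
```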

A1.6

The ARM memory model
The ARM memory model supports:
•
Generating an exception on an unaligned memory access.
•
Restricting access by applications to specified areas of memory.
•
Translating virtual addresses (VAs) provided by executing instructions to physical addresses (PAs).
•
Altering the interpretation of multi-byte data between big-endian and little-endian.
•
Controlling the order of accesses to memory.
•
Controlling caches and address translation structures.
•
Synchronizing access to shared memory by multiple PEs.
VA support depends on the Execution state, as follows:
AArch64 state
Supports 64-bit virtual addressing, with the Translation Control Register determining the supported
VA range. Execution at EL1 and EL0 supports two independent VA ranges, each with its own
translation controls.
AArch32 state
Supports 32-bit virtual addressing, with the Translation Control Register determining the supported
VA range. For execution at EL1 and EL0, system software can split the VA range into two
subranges, each with its own translation controls.
The supported PA space is IMPLEMENTATION DEFINED, and can be discovered by system software.
Regardless of the Execution state, the Virtual Memory System Architecture (VMSA) can translate VAs to blocks or
pages of memory anywhere within the supported PA space.
For more information, see:
For execution in AArch64 state
•
Chapter B2 The AArch64 Application Level Memory Model.
•
Chapter D3 The AArch64 System Level Memory Model.
•
Chapter D4 The AArch64 Virtual Memory System Architecture.
For execution in AArch32 state
•
Chapter E2 The AArch32 Application Level Memory Model.
•
Chapter G3 The AArch32 System Level Memory Model.
•
Chapter G4 The AArch32 Virtual Memory System Architecture.

A1.7

ARMv8 architecture extensions
The original ARMv8-A architecture is called ARMv8.0. The following sections of this manual describe or
summarize permitted extensions to ARMv8.0:
•
The ARMv8 Cryptographic Extension on page A1-58.
•
The Reliability, Availability, and Serviceability (RAS) Extension on page A1-70.
•
The Performance Monitors Extension on page D1-1961.
•
The IVIPT Extension on page D4-2223.
•
Chapter H7 The PC Sample-based Profiling Extension.
In addition to describing ARMv8.0, this manual describes the following architectural extensions:
The ARMv8.1 architectural extension
The ARMv8.1 architecture extension adds both:
•

Architectural features. Some of these are mandatory, others are optional. Some features must
be implemented together.

•

Architectural requirements. These are mandatory.

An implementation is ARMv8.1 compliant when all of the following apply:
•

It includes all of the ARMv8.1 architectural features that are mandatory. See Architectural
features added by ARMv8.1 on page A1-59 for all of the ARMv8.1 architectural features.

•

It includes all of the ARMv8.1 architectural requirements. Additional requirements of
ARMv8.1 on page A1-61 lists these requirements.

For more information, see The ARMv8.1 architecture extension on page A1-59.
The ARMv8.2 architectural extension
The ARMv8.2 architecture extension is an extension to ARMv8.1. It adds both:
•

Architectural features. Some of these are mandatory, others are optional. Some features must
be implemented together.

•

Architectural requirements. These are mandatory.

An implementation is ARMv8.2 compliant if all of the following apply:
•

It is ARMv8.1 compliant.

•

It includes all of the ARMv8.2 architectural features that are mandatory. See Architectural
features added by ARMv8.2 on page A1-62 for all of the ARMv8.2 architectural features.

•

It includes all of the ARMv8.2 architectural requirements. Additional requirements of
ARMv8.2 on page A1-68 lists these requirements.

For more information, see The ARMv8.2 architecture extension on page A1-62.
The ARMv8.3 architectural extension
The ARMv8.3 architecture extension is an extension to ARMv8.2. It adds architectural features.
Some of these are mandatory, others are optional. Some features must be implemented together.
An implementation is ARMv8.3 compliant if all of the following apply:
•

It is ARMv8.2 compliant.

•

It includes all of the ARMv8.3 architectural features that are mandatory.

For more information, see The ARMv8.3 architecture extension on page A1-68.
The Statistical Profiling Extension
SPE is an optional extension to ARMv8.2. That is, SPE requires the implementation of ARMv8.2.
For more information see The Statistical Profiling Extension on page A1-72.
The Scalable Vector Extension (SVE)
SVE is an optional extension to ARMv8.2. That is, SVE requires the implementation of ARMv8.2.
For more information see The Scalable Vector Extension (SVE) on page A1-72.

See also Permitted implementation of subsets of ARMv8.x and ARMv8.(x+1) architectural features.

A1.7.1

Permitted implementation of subsets of ARMv8.x and ARMv8.(x+1) architectural features
An ARMv8.x compliant implementation can include any arbitrary subset of the architectural features of
ARMv8.(x+1), subject only to those constraints that require that certain features be implemented together.
An ARMv8.x compliant implementation cannot include any features of ARMv8.(x+2).

Note
The addition of ARMv8.(x+1) features to an ARMv8.x compliant implementation is only permitted if the
implementer has a licence to ARMv8.(x+1) in addition to the licence to ARMv8.x.
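
The version rule in this section can be expressed as a one-line comparison. The following Python sketch is illustrative only: it assumes feature names of the form ARMv8.x-Name, as used throughout this chapter, and the function names are hypothetical. It checks the version rule alone and ignores both licensing and the constraints that require certain features to be implemented together.

```python
import re


def feature_minor_version(feature):
    """Extract x from a feature name of the form 'ARMv8.x-Name'."""
    match = re.fullmatch(r"ARMv8\.(\d+)-\w+", feature)
    if match is None:
        raise ValueError(f"unrecognized feature name: {feature}")
    return int(match.group(1))


def version_rule_allows(base_minor, feature):
    """True if an ARMv8.<base_minor> compliant implementation may include
    the feature, considering only the ARMv8.(x+1) version rule."""
    return feature_minor_version(feature) <= base_minor + 1


# An ARMv8.1 implementation may add ARMv8.2-FP16, an ARMv8.0 one may not.
print(version_rule_allows(1, "ARMv8.2-FP16"), version_rule_allows(0, "ARMv8.2-FP16"))
```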

A1.7.2

The ARMv8 Cryptographic Extension
The presence of the ARMv8.0 Cryptographic Extension in an implementation is subject to export license controls.
The Cryptographic Extension is an extension of the SIMD support and operates on the vector register file. The
ARMv8.0 implementation of the Extension provides instructions for the acceleration of encryption and decryption
to support the following:
•
AES.
•
SHA1.
•
SHA2-256 (SHA256).
The Cryptographic Extension also provides multiply instructions that operate on long polynomials.
The ARMv8.0 Cryptographic Extension provides this functionality in AArch64 state and AArch32 state, and an
implementation that supports both AArch64 state and AArch32 state provides the same ARMv8.0 Cryptographic
Extension functionality in both states. For more information see The Cryptographic Extension on page C3-218 or
The Cryptographic Extension in AArch32 state on page F1-3091.

ARMv8.2 extensions to the Cryptographic Extension
From ARMv8.2, an implementation of the ARMv8.0 Cryptographic Extension can include either or both of:
•

The AES functionality, including support for multiplication of 64-bit polynomials. The
ID_AA64ISAR0_EL1.AES field indicates whether this functionality is supported.

•

The SHA1 and SHA2-256 functionality. The ID_AA64ISAR0_EL1.{SHA2, SHA1} fields indicate whether
this functionality is supported.

In addition, ARMv8.2 adds two optional extensions to the ARMv8 Cryptographic Extension, that provide
cryptographic functionality in AArch64 state only. These two optional features are:
ARMv8.2-SHA, SHA2-512 and SHA3 functionality
In the A64 instruction set only, ARMv8.2-SHA adds Advanced SIMD instructions that support:
•
SHA2-512 (SHA512).
•
SHA3.
Implementation of ARMv8.2-SHA requires implementation of the ARMv8.0 Cryptographic
Extension SHA1 and SHA2-256 functionality.
The ID_AA64ISAR0_EL1.{SHA2, SHA3} fields identify the presence of ARMv8.2-SHA.
For more information see ARMv8.2-SHA, SHA2-512 and SHA3 on page C3-219.
ARMv8.2-SM, SM3 and SM4 functionality
In the A64 instruction set only, ARMv8.2-SM adds Advanced SIMD instructions that support the
Chinese cryptography algorithms SM3 and SM4.
Implementation of ARMv8.2-SM is independent of the implementation of any SHA functionality.
The ID_AA64ISAR0_EL1.{SM3, SM4} fields identify the presence of ARMv8.2-SM.
Note
This means ARMv8.2-SM can be implemented without any other Cryptographic Extension
features.
For more information see ARMv8.2-SM, SM3 and SM4 on page C3-220.
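
System software typically discovers these features by reading ID_AA64ISAR0_EL1 and extracting the relevant fields. The Python sketch below shows the field extraction on a raw 64-bit value. It rests on assumptions that this section does not state: each field is taken to be 4 bits wide at the bit positions given in the ID_AA64ISAR0_EL1 register description, and a nonzero field value is taken to mean that the functionality is present. Both should be checked against that description. The names CRYPTO_FIELDS, decode_crypto_fields, and has_sm are invented for this example.

```python
# Field name -> least significant bit of the 4-bit field.
# Assumption: positions follow the ID_AA64ISAR0_EL1 register description.
CRYPTO_FIELDS = {
    "AES": 4,
    "SHA1": 8,
    "SHA2": 12,
    "SHA3": 32,
    "SM3": 36,
    "SM4": 40,
}


def decode_crypto_fields(isar0):
    """Split a raw 64-bit ID_AA64ISAR0_EL1 value into its crypto ID fields."""
    return {name: (isar0 >> lsb) & 0xF for name, lsb in CRYPTO_FIELDS.items()}


def has_sm(fields):
    """Assumed test for ARMv8.2-SM: the SM3 and SM4 fields are both nonzero."""
    return fields["SM3"] != 0 and fields["SM4"] != 0


# Example value with AES == 2, SHA1 == 1, SHA2 == 1 and no SM3 or SM4.
raw = (2 << 4) | (1 << 8) | (1 << 12)
fields = decode_crypto_fields(raw)
print(fields, has_sm(fields))
```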

A1.7.3

The ARMv8.1 architecture extension
The ARMv8.1 architecture extension adds both architectural features and architectural requirements.

Architectural features added by ARMv8.1
An implementation of the ARMv8.1 extension must include all of the features that this section describes as
mandatory. Such an implementation, when combined with the additional requirements of ARMv8.1, is also called
an implementation of the ARMv8.1 architecture.
The ARMv8.1 architecture extension adds the following architectural features, which are identified by the
architectural feature name and a short description of the feature:
ARMv8.1-Atomics, ARMv8.1 Atomic instructions
ARMv8.1-Atomics introduces a set of atomic instructions:
•

Compare and Swap instructions, CAS and CASP.

•

Atomic memory operation instructions, LD<op> and ST<op>, where <op> is one of ADD, CLR, EOR,
SET, SMAX, SMIN, UMAX, and UMIN.

•

Swap instruction, SWP.

These instructions are only added to the A64 instruction set.
This feature is mandatory in ARMv8.1 implementations.
Implementations of ARMv8.1-VHE require the implementation of ARMv8.1-Atomics.
The ID_AA64ISAR0_EL1.Atomic field identifies the presence of ARMv8.1-Atomics.
For more information, see:
•
Compare and Swap on page C3-181.
•
Atomic memory operations on page C3-182.
•
Swap on page C3-184.
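
This overview names the atomic memory operations without defining them. As a rough guide, the Python sketch below models the load form on 32-bit values: the location is updated with the result of the operation and the previous value is returned. The semantics shown (CLR as bit clear, SET as bitwise OR, signed and unsigned minimum and maximum) are an informal summary and must be checked against the instruction descriptions. The model is sequential and does not demonstrate atomicity, and all names in it are invented for this example.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# Assumed meaning of each operation: (old memory value, operand) -> new value.
OPS = {
    "ADD": lambda old, v: (old + v) & MASK32,
    "CLR": lambda old, v: old & ~v & MASK32,
    "EOR": lambda old, v: old ^ v,
    "SET": lambda old, v: old | v,
    "SMAX": lambda old, v: max(old, v, key=to_signed32),
    "SMIN": lambda old, v: min(old, v, key=to_signed32),
    "UMAX": lambda old, v: max(old, v),
    "UMIN": lambda old, v: min(old, v),
}


def ld_op(memory, addr, op, value):
    """Model the load form: update memory[addr], return the value held before.

    On hardware the read-modify-write is a single atomic access.
    """
    old = memory[addr]
    memory[addr] = OPS[op](old, value & MASK32)
    return old


memory = {0x1000: 0xF0}
previous = ld_op(memory, 0x1000, "CLR", 0x30)
print(hex(previous), hex(memory[0x1000]))
```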
ARMv8.1-SIMD, ARMv8.1 Advanced SIMD instructions
ARMv8.1-SIMD introduces Rounding Double Multiply Add/Subtract Advanced SIMD
instructions. For more information, see:
For the A64 instruction set
•
SQRDMLAH (by element) on page C7-1598.
•
SQRDMLAH (vector) on page C7-1601.
•
SQRDMLSH (by element) on page C7-1603.
•
SQRDMLSH (vector) on page C7-1606.
For the T32 and A32 instruction sets
•
VQRDMLAH on page F6-4378.
•
VQRDMLSH on page F6-4382.
This feature is mandatory in ARMv8.1 implementations.
The following fields identify the presence of ARMv8.1-SIMD:
•
ID_AA64ISAR0_EL1.RDM.
•
ID_ISAR5_EL1.RDM.
•
ID_ISAR5.RDM.

ARMv8.1-LOR, Limited ordering regions
Limited ordering regions allow large systems to perform special load-acquire and store-release
instructions that provide order between the memory accesses to a region of the PA map as observed
by a limited set of observers.
This feature is mandatory in ARMv8.1 implementations.
This feature is supported in AArch64 state only.
The ID_AA64MMFR1_EL1.LO field identifies the support for ARMv8.1-LOR.
For more information, see:
•

Limited ordering regions on page B2-103.

ARMv8.1-HPD, Hierarchical permission disables
ARMv8.1-HPD introduces the facility to disable the hierarchical attributes, APTable, PXNTable,
and UXNTable, in the translation tables. This disable has no effect on the NSTable bit.
This feature is mandatory in ARMv8.1 implementations.
This feature is added only to the VMSAv8-64 translation regimes. ARMv8.2 extends this to the
AArch32 translation regimes, see ARMv8.2-AA32HPD.
The ID_AA64MMFR1_EL1.HPDS field identifies the support for ARMv8.1-HPD.
ARMv8.1-TTHM, Hardware management of the Access flag and dirty state
In ARMv8.0, all updates to the translation tables are performed by software. From ARMv8.1, for
the VMSAv8-64 translation regimes only, hardware can perform updates to the translation tables in
two contexts:
•
Hardware management of the Access flag.
•
Hardware management of dirty state, with updates to a dirty state bit in the translation tables.
The dirty state bit is introduced in ARMv8.1.
Hardware management of dirty state can only be enabled when hardware management of the Access
flag is also enabled.
This feature is optional in ARMv8.1 implementations. It is IMPLEMENTATION DEFINED whether this
is implemented.
The ID_AA64MMFR1_EL1.HAFDBS field identifies the support for ARMv8.1-TTHM.
For more information, see:
•

The dirty state bit on page D4-2165.

•

Hardware management of the Access flag and dirty state on page D4-2165.

ARMv8.1-PAN, Privileged access never
ARMv8.1-PAN adds a new bit to PSTATE. When the value of this PAN state bit is 1, any privileged
data access from EL1 or EL2 to a virtual memory address that is accessible at EL0 generates a
Permission fault.
This feature is mandatory in ARMv8.1 implementations.
This feature is supported in AArch64 and AArch32 states.
The following fields identify the support for ARMv8.1-PAN:
•
ID_AA64MMFR1_EL1.PAN.
•
ID_MMFR3_EL1.PAN.
•
ID_MMFR3.PAN.
For more information, see:
•
About PSTATE.PAN on page D4-2156.
•
About the PAN bit on page G4-4891.

ARMv8.1-VMID16, 16-bit VMID
In an ARMv8.1 implementation, when EL2 is using AArch64, the VMID size is an
IMPLEMENTATION DEFINED choice of 8 bits or 16 bits.
This feature is optional in ARMv8.1 implementations. It is IMPLEMENTATION DEFINED whether this
is implemented.
When implemented, this feature is supported only when EL2 is using AArch64.
The ID_AA64MMFR1_EL1.VMIDBits field identifies the supported VMID size.
For more information, see:
•

VMID size on page D4-2203.

ARMv8.1-VHE, Virtualization Host Extensions
ARMv8.1 introduces the Virtualization Host Extensions (VHE) that provide enhanced support for
Type 2 hypervisors in Non-secure state.
This feature is mandatory in ARMv8.1 implementations.
An implementation that includes ARMv8.1-VHE requires ARMv8.1-Atomics to be implemented.
The ID_AA64MMFR1_EL1.VH field identifies the support for ARMv8.1-VHE.
The following fields indicate the presence of the Virtualization Host Extensions for debug,
including the changes for the PC Sample-based Profiling Extension and the Performance Monitors
Extension:
•
ID_AA64DFR0_EL1.DebugVer.
•
ID_DFR0_EL1.{CopSDbg, CopDbg}.
For more information, see:
•

Virtualization Host Extensions on page D4-2183.

ARMv8.1-PMU, ARMv8.1 PMU Extension
ARMv8.1 makes the following enhancements to the Performance Monitors Extension:
•

The event number space is extended to 16 bits to allow additional IMPLEMENTATION DEFINED
event types, and the reserved space for future additions to the architecturally-defined event
types is extended.

•

The HPMD bit is added to MDCR_EL2. This bit disables event counting at EL2.

•

The STALL_FRONTEND and STALL_BACKEND events are required to be implemented.
For more information, see Required events on page D5-2266.

The Performance Monitors Extension is an OPTIONAL feature of an implementation, but ARM
strongly recommends that ARMv8.1 implementations include either:
•
Version 3 of the Performance Monitors Extension, PMUv3, with a 16-bit evtCount field.
•
An IMPLEMENTATION DEFINED form of performance monitors.
The following fields identify the ARMv8.1-PMU:
•
ID_AA64DFR0_EL1.PMUVer.
•
ID_DFR0_EL1.PerfMon.
•
ID_DFR0.PerfMon.

Additional requirements of ARMv8.1
The ARMv8.1 architecture includes some mandatory changes that are not associated with a feature. These are:
Changes to CRC32 instructions
All implementations of the ARMv8.1 architecture are required to implement the CRC32* instructions.
These are optional in ARMv8.0.
The following fields identify the support for the CRC32* instructions:
•
ID_AA64ISAR0_EL1.CRC32.
•
ID_ISAR5_EL1.CRC32.
•
ID_ISAR5.CRC32.

An implementation of the ARMv8.1 extension must comply with all of the additional requirements. Such an
implementation, when combined with the mandatory architectural features of ARMv8.1, is also called an
implementation of the ARMv8.1 architecture.

A1.7.4

The ARMv8.2 architecture extension
The ARMv8.2 architecture extension adds both architectural features and architectural requirements.

Architectural features added by ARMv8.2
An implementation of the ARMv8.2 extension must include all of the features that this section describes as
mandatory. Such an implementation, when combined with the additional requirements of ARMv8.2, is also called
an implementation of the ARMv8.2 architecture.
The ARMv8.2 architecture extension adds the following architectural features, which are identified by the
architectural feature name and a short description of the feature:
ARMv8.2-A64ISA, ARMv8.2 changes to the A64 ISA
ARMv8.2-A64ISA adds the BFC instruction to the A64 instruction set as an alias of BFM. It also
requires that the new BFC instruction and the A64 pseudo-instruction REV64 are implemented by
assemblers.

Note
•

In ARMv8.0 and ARMv8.1, the A64 pseudo-instruction REV64 is optional.

•

Because this feature relates to support for an instruction alias and for a pseudo-instruction
there are no corresponding feature ID register fields.

This change to the instruction set and assembler requirements is mandatory in an ARMv8.2
implementation.
For more information, see:
•
BFC on page C6-563.
•
REV64 on page C6-848.
ARMv8.2-ATS1E1, AT S1E1R and AT S1E1W instruction variants, taking account of PSTATE.PAN
ARMv8.2-ATS1E1 adds new variants of the AArch64 AT S1E1R and AT S1E1W instructions and the
AArch32 ATS1CPR and ATS1CPW instructions. The new variants take the PSTATE.PAN bit into account
when determining whether a privileged access to the location would generate a Permission fault,
as reported in the PAR. For more information, see:
For the AArch64 System instructions
•
AT S1E1RP, Address Translate Stage 1 EL1 Read PAN on page C5-447.
•
AT S1E1WP, Address Translate Stage 1 EL1 Write PAN on page C5-451.
For the AArch32 System instructions
•
ATS1CPRP, Address Translate Stage 1 Current state PL1 Read PAN on
page G7-5054.
•
ATS1CPWP, Address Translate Stage 1 Current state PL1 Write PAN on
page G7-5058.
This feature is mandatory in ARMv8.2 implementations.
These instructions are added to the A64 and A32/T32 instruction sets.
The following fields identify the presence of ARMv8.2-ATS1E1:
•
ID_AA64MMFR1_EL1.PAN.
•
ID_MMFR3_EL1.PAN.
•
ID_MMFR3.PAN.

For more information, see:
•
Address translation instructions on page D4-2139.
•
ATS1C**, Address translation stage 1, current security state on page G4-4964.
•
Encoding and availability of the address translation instructions on page G4-4965.
ARMv8.2-FP16, Half-precision floating-point data processing
ARMv8.2-FP16 supports:
•

Half-precision data-processing instructions for Advanced SIMD and floating-point in both
AArch64 and AArch32 states.

•

The FPCR.FZ16 and FPSCR.FZ16 bits, that enable a Flush-to-zero mode for half-precision
data-processing instructions.

This feature is optional in ARMv8.2 implementations, unless SVE is implemented, in which case
ARMv8.2-FP16 is mandatory. When this feature is implemented it is implemented in both
Advanced SIMD and floating-point, and in AArch64 and AArch32 states.
The following fields identify the presence of ARMv8.2-FP16:
•
ID_AA64PFR0_EL1.{FP, AdvSIMD}.
•
MVFR1_EL1.{FPHP, SIMDHP}.
•
MVFR1.{FPHP, SIMDHP}.
For more information, see:
•
Half-precision floating-point formats on page A1-44.
•
Flush-to-zero on page A1-53.
•
Modified immediate constants in A64 instructions on page C2-158.
ARMv8.2-DotProd, SIMD Dot Product
ARMv8.2-DotProd provides instructions to perform the dot product of two 32-bit vectors,
accumulating the result in a third 32-bit vector. This can be performed using signed or unsigned
arithmetic.
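
The following Python sketch illustrates the accumulate step for one such operation. It assumes, as the instruction descriptions do but this summary does not state, that each 32-bit element of the two source vectors holds four 8-bit values, and that the sum of the four products is added to the corresponding 32-bit accumulator element with wraparound. The function name dot_accumulate and its parameters are invented for this example.

```python
def dot_accumulate(acc, a, b, signed=False):
    """For each 32-bit lane, add the dot product of the four 8-bit values
    held in a and b to the accumulator lane, wrapping to 32 bits.

    acc, a, b: lists of 32-bit unsigned lane values of equal length.
    """
    def byte(word, index):
        value = (word >> (8 * index)) & 0xFF
        # Sign-extend the 8-bit value for the signed form.
        return value - 256 if signed and value >= 128 else value

    result = []
    for acc_lane, a_lane, b_lane in zip(acc, a, b):
        total = sum(byte(a_lane, i) * byte(b_lane, i) for i in range(4))
        result.append((acc_lane + total) & 0xFFFFFFFF)
    return result


# One lane: bytes (1, 2, 3, 4) dot (1, 1, 1, 1), added to an accumulator of 10.
print(dot_accumulate([10], [0x04030201], [0x01010101]))
```

The signed parameter switches the interpretation of each 8-bit value, which corresponds to the choice between signed and unsigned arithmetic.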
This feature is optional in ARMv8.2 implementations.
These instructions are added to the A64 and A32/T32 instruction sets.
The following fields identify the presence of ARMv8.2-DotProd:
•
ID_AA64ISAR0_EL1.DP.
•
ID_ISAR6_EL1.DP.
•
ID_ISAR6.DP.
For more information, see:
•
SIMD dot product on page C3-217.
•
Advanced SIMD dot product instructions on page F1-3089.
ARMv8.2-FHM, Floating-point multiplication variant
ARMv8.2-FHM adds new floating-point multiplication instructions.
These instructions are added to the A64 and A32/T32 instruction sets.
This feature is optional in ARMv8.2 implementations, and can only be implemented when
ARMv8.2-FP16 is implemented.
The following fields identify the presence of ARMv8.2-FHM:
•
ID_AA64ISAR0_EL1.FHM.
•
ID_ISAR6_EL1.FHM.
•
ID_ISAR6.FHM.
For more information, see:
•
SIMD arithmetic on page C3-205.
•
SIMD by element arithmetic on page C3-212.
•
Advanced SIMD multiply instructions on page F1-3088.

ARMv8.2-LSMAOC, Load/Store Multiple atomicity and ordering controls
ARMv8.2-LSMAOC adds controls that disable legacy behavior of AArch32 Load Multiple and
Store Multiple instructions, and provide a trap of one aspect of this legacy behavior.
Implementation of ARMv8.2-LSMAOC is optional. When implemented it provides:
•
LSMAOE fields in the SCTLR_EL1, SCTLR_EL2, HSCTLR, and SCTLR registers. These
fields can have the following effects on the behavior of AArch32 Load Multiple and Store
Multiple instructions:
—
An interrupt can be taken between two memory accesses made by a single Load
Multiple or Store Multiple instruction.
—
The memory accesses made by a single Load Multiple or Store Multiple instruction to
Device memory with the non-Reordering attribute can be reordered.
•
nTLSMD fields in the SCTLR_EL1, SCTLR_EL2, HSCTLR, and SCTLR registers. These
fields can cause an access to Device-nGRE, Device-nGnRE, or Device-nGnRnE memory by
an AArch32 Load Multiple or Store Multiple instruction to generate an Alignment fault.

Note
ARMv8.2 deprecates software dependence on the legacy behavior of AArch32 Load Multiple and
Store Multiple instructions, and these fields disable this behavior.
The following fields identify the support for ARMv8.2-LSMAOC:
•
ID_AA64MMFR2_EL1.LSM.
•
ID_MMFR4_EL1.LSM.
•
ID_MMFR4.LSM.
For more information, see the register field descriptions and:
•

Generation of Alignment faults by Load/store multiple accesses to Device memory on
page E2-3028.

•

Multi-register loads and stores that access Device memory on page E2-3040.

•

Taking an interrupt or other exception during a multiple-register load or store on
page G1-4660.

ARMv8.2-UAO, PSTATE override of Unprivileged Load/Store
ARMv8.2 adds a new bit to PSTATE. When the value of PSTATE.UAO is 1, and when executed at
EL1 or at EL2 with HCR_EL2.{E2H, TGE} == {1, 1}, the memory accesses made by the
Load/Store unprivileged instructions behave as if they were made by the Load/Store register
instructions. See Load/Store unprivileged on page C3-173 and Load/Store register on page C3-169.
This feature is mandatory in ARMv8.2 implementations.
This feature is supported in AArch64 state only.
The ID_AA64MMFR2_EL1.UAO field identifies the support for ARMv8.2-UAO.
For more information, see:
•
About PSTATE.UAO on page D4-2157.
ARMv8.2-DCPoP, Data cache clean to Point of Persistence
ARMv8.2-DCPoP introduces a mechanism to identify and manage persistent memory locations in
a shared memory hierarchy, including adding the DC CVAP instruction.
This feature is mandatory in ARMv8.2 implementations.
This feature is supported in AArch64 state only.
The ID_AA64ISAR1_EL1.DPB field identifies the support for ARMv8.2-DCPoP.
For more information about ARMv8.2-DCPoP, see:
•
Memory hierarchy on page B2-105.

ARMv8.2-VPIPT, VMID-aware PIPT instruction cache
ARMv8.2-VPIPT supports a new instruction cache type, described as the VMID-aware PIPT
(VPIPT) instruction cache.

Note
ARMv8.2 adds VPIPT to the set of supported cache types, meaning an ARMv8.2 implementation
is permitted to implement VPIPT caches, but is not required to do so.
This feature is supported in AArch64 and AArch32 states.
The CTR_EL0.L1Ip and CTR.L1Ip fields identify the support for ARMv8.2-VPIPT.
For more information, see:
•
VPIPT (VMID-aware PIPT) instruction caches on page D4-2222.
•
VPIPT (VMID-aware PIPT) instruction caches on page G4-4930.
ARMv8.2-AA32HPD, AArch32 Hierarchical permission disables
ARMv8.1-HPD introduced the ability to disable the hierarchical attributes, APTable, PXNTable,
and UXNTable, in the VMSAv8-64 translation regimes. ARMv8.2-AA32HPD extends this
functionality to the VMSAv8-32 translation regimes when those regimes are using the Long
descriptor translation table format.
This feature is optional in ARMv8.2 implementations. It is IMPLEMENTATION DEFINED whether this
is implemented.
The ID_MMFR4_EL1.HPDS and ID_MMFR4.HPDS fields identify the support for
ARMv8.2-AA32HPD.
For more information, see:
•

Attribute fields in VMSAv8-32 Long-descriptor translation table format descriptors on
page G4-4872.

ARMv8.2-TTPBHA, Translation table page-based hardware attributes
ARMv8.2 provides a mechanism to allow operating systems or hypervisors to make up to four bits
of translation table final-level descriptors available for IMPLEMENTATION DEFINED hardware use.
This functionality is available for all translation regimes in AArch64 state and for stages of
translation in AArch32 state that use the Long descriptor translation table format.
ARMv8.2-TTPBHA is optional in ARMv8.2 implementations, but implementation of
ARMv8.2-TTPBHA requires implementation of both:
•
ARMv8.1-HPD.
•
ARMv8.2-AA32HPD, if any Exception level higher than EL0 can use AArch32.

Note
For stage 1 translations, page-based hardware attributes can only be used for a stage of translation
for which the Hierarchical permission disables field has a value of 1.
The following fields identify the support for ARMv8.2-TTPBHA:
•
ID_AA64MMFR1_EL1.HPDS
•
ID_MMFR4_EL1.HPDS
•
ID_MMFR4.HPDS.
For more information, see:


•

Memory attribute fields in the VMSAv8-64 translation table format descriptors on
page D4-2148.

•

Attribute fields in VMSAv8-32 Long-descriptor translation table format descriptors on
page G4-4872.


ARMv8.2-LPA, Large PA and IPA support
ARMv8.2-LPA:
•

Allows a larger intermediate physical address (IPA) and PA space of up to 52 bits when using
the 64KB translation granule.

•

Allows a level 1 block size where the block covers a 4TB address range for the 64KB
translation granule if the implementation supports 52 bits of PA.

This is an optional feature in ARMv8.2 implementations. It is IMPLEMENTATION DEFINED whether
it is implemented.
This feature is supported in AArch64 state only.
The ID_AA64MMFR0_EL1.PARange field identifies the support for ARMv8.2-LPA.
For more information about ARMv8.2-LPA, see:
•

VMSA address types and address spaces on page D4-2083.

•

Address size configuration on page D4-2097.

•

Extending addressing above 48 bits on page D4-2101.

•

VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats on
page D4-2143.

•

ARMv8 translation table level 3 descriptor formats on page D4-2147.
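The 4TB level 1 block size follows from the granule arithmetic. The following C sketch is illustrative only and is not part of the architecture. It assumes 8-byte descriptors, so that a 64KB table holds 8192 entries and each lookup level resolves 13 address bits, and it assumes that PARange occupies bits [3:0] of ID_AA64MMFR0_EL1 with the encodings listed in the table. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: log2 of the address range mapped by one descriptor
 * at the given lookup level (0-3), for a granule of 2^granule_log2 bytes.
 * Assumes 8-byte descriptors, so each level resolves (granule_log2 - 3) bits. */
unsigned block_size_log2(unsigned granule_log2, unsigned level)
{
    unsigned bits_per_level = granule_log2 - 3;   /* 13 for the 64KB granule */
    return granule_log2 + (3 - level) * bits_per_level;
}

/* Decode an ID_AA64MMFR0_EL1 value into a PA size in bits.
 * Assumes PARange is bits [3:0]; returns 0 for encodings not listed here. */
unsigned pa_range_bits(uint64_t id_aa64mmfr0)
{
    static const unsigned sizes[] = { 32, 36, 40, 42, 44, 48, 52 };
    unsigned field = (unsigned)(id_aa64mmfr0 & 0xF);
    return field < 7 ? sizes[field] : 0;
}
```

For the 64KB granule (granule_log2 of 16), level 3 maps 2^16 bytes (64KB), level 2 maps 2^29 bytes (512MB), and level 1 maps 2^42 bytes (4TB).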

ARMv8.2-LVA, Large VA support
ARMv8.2-LVA supports a larger VA space for each translation table base register of up to 52 bits
when using the 64KB translation granule.
This feature is supported in AArch64 state only.
This is an optional feature in ARMv8.2 implementations. It is IMPLEMENTATION DEFINED whether
it is implemented.
If ARMv8.2-LVA is implemented, then any implemented trace macrocell must be at least ETMv4.2.
The ID_AA64MMFR2_EL1.VARange field identifies the support for ARMv8.2-LVA.
For more information about ARMv8.2-LVA, see:
•

VMSA address types and address spaces on page D4-2083.

•

Address size configuration on page D4-2097.

•

Extending addressing above 48 bits on page D4-2101.

•

VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats on
page D4-2143.

•

ARMv8 translation table level 3 descriptor formats on page D4-2147.

ARMv8.2-TTCNP, Translation table Common not private translations
ARMv8.2-TTCNP permits multiple PEs in the same Inner Shareable domain to use the same
translation tables for a given stage of address translation.
This feature is mandatory in ARMv8.2 implementations.
This facility is available for all VMSAv8-64 translation regimes and for VMSAv8-32 translation
stages that use the Long descriptor translation table format.
The following fields identify the support for ARMv8.2-TTCNP:
•
ID_AA64MMFR2_EL1.CnP.
•
ID_MMFR4_EL1.CnP.
•
ID_MMFR4.CnP.
For more information, see:
•
Common not private translations on page D4-2202.
•
Common not private translations in VMSAv8-32 on page G4-4919.


ARMv8.2-TTS2UXN, Translation table stage 2 Unprivileged Execute-never
ARMv8.2-TTS2UXN extends the stage 2 translation table access permissions to provide control of
whether memory is executable at EL0 independent of whether it is executable at EL1.
This feature is mandatory in ARMv8.2 implementations.
This facility is available for stage 2 translation stages in VMSAv8-64 and VMSAv8-32.
The following fields identify the support for ARMv8.2-TTS2UXN:
•
ID_AA64MMFR1_EL1.XNX.
•
ID_MMFR4_EL1.XNX.
•
ID_MMFR4.XNX.
For more information, see:
•
Access permissions for instruction execution on page D4-2159.
•
Access permissions for instruction execution on page G4-4892.
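The independent EL1 and EL0 control can be pictured as a two-bit decode. The following C sketch is illustrative only. It assumes that the stage 2 XN[1:0] field occupies descriptor bits [54:53] and uses the encoding shown in the comments; confirm both against the sections referenced above before relying on them. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool el1_exec;   /* execution permitted at EL1 */
    bool el0_exec;   /* execution permitted at EL0 */
} s2_exec_t;

/* Decode the assumed XN[1:0] field, bits [54:53], of a stage 2 descriptor. */
s2_exec_t s2_exec_permissions(uint64_t desc)
{
    unsigned xn = (unsigned)((desc >> 53) & 0x3);
    s2_exec_t p;

    switch (xn) {
    case 0x0: p.el1_exec = true;  p.el0_exec = true;  break; /* both executable */
    case 0x1: p.el1_exec = false; p.el0_exec = true;  break; /* EL0 only */
    case 0x2: p.el1_exec = false; p.el0_exec = false; break; /* neither */
    default:  p.el1_exec = true;  p.el0_exec = false; break; /* EL1 only */
    }
    return p;
}
```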
ARMv8.2-Debug, ARMv8.2 Debug
ARMv8.2-Debug covers a selection of mandatory changes, including:
•

If the core power domain is powered up and DoubleLockStatus() == TRUE,
EDPRSR.{DLK,SPD,PU} is only permitted to read {UNKNOWN, 0, 0}.

•

The definition of External Catch debug events is extended to include reset entry.

•

All CONSTRAINED UNPREDICTABLE cases that generate External Catch debug events are
removed.

•

Controls are added to EDECCR to control Exception Catch debug event generation on
exception return.

•

All IMPLEMENTATION DEFINED control of external debug accesses to OSLAR_EL1 is
removed.

•

ExternalSecureNoninvasiveDebugEnabled() cannot override software controls of counting
attributable events in Secure state.
The fields that identify the support for ARMv8.2-Debug are:
•
ID_AA64DFR0_EL1.DebugVer and DBGDIDR.Version.
•
ID_DFR0_EL1.{CopSDbg, CopDbg} and ID_DFR0.{CopSDbg, CopDbg}.
•
EDDEVARCH.ARCHID.
For more information, see:
•
Exception Catch debug events from ARMv8.2 on page H3-5751.
•
EDPRSR.{DLK, SPD, PU} and the Core power domain on page H6-5801.
•
Interaction with EL3 on page D5-2233.
•
External access disabled on page H8-5824.
ARMv8.2-PCSample, PC Sample-based Profiling
In ARMv8.2, the control and implementation of the OPTIONAL PC Sample-based Profiling extension
is moved from ED*SR Debug registers to PM*SR registers in the Performance Monitors address
space. See Chapter H7 The PC Sample-based Profiling Extension.
This is an optional feature in ARMv8.2 implementations. It is IMPLEMENTATION DEFINED whether
it is implemented.
The following fields identify the support for ARMv8.2-PCSample:
•
EDDEVID.PCSample.
•
DBGDEVID.PCSample.
•
EDDEVID1.PCSROffset.
•
DBGDEVID1.PCSROffset.
•
PMDEVID.PCSample.


ARMv8.2-IESB, Implicit error synchronization event
ARMv8.2-IESB adds an implicit error synchronization event at exception entry and return,
controlled by the added SCTLR_ELx.IESB fields. An IESB field is added to the ESR_ELx
syndrome registers.
The implicit error synchronization events affect the same synchronizable asynchronous events that
are synchronized by the ESB instruction, see The Reliability, Availability, and Serviceability (RAS)
Extension on page A1-70.
This feature is mandatory in ARMv8.2 implementations.
This feature is supported in AArch64 state only.
The ID_AA64MMFR2_EL1.IESB field identifies the support for ARMv8.2-IESB.
For more information, see:
•

The ARM® Reliability, Availability, and Serviceability (RAS) Specification, ARMv8, for the
ARMv8-A architecture profile.

Extensions to the ARM Cryptographic Extensions
See the description of the ARMv8.2-SHA and ARMv8.2-SM features in ARMv8.2 extensions to the
Cryptographic Extension on page A1-58.

Additional requirements of ARMv8.2
The ARMv8.2 architecture includes some mandatory changes that are not associated with a feature. These are:
Changes to ACTLR2 and HACTLR2 registers
In AArch32 state, the ACTLR2 and HACTLR2 registers become mandatory.
Implementation of RAS Extension
The RAS Extension must be implemented, see The Reliability, Availability, and Serviceability
(RAS) Extension on page A1-70.
An implementation of the ARMv8.2 extension must comply with all of the additional requirements. Such an
implementation, when combined with the mandatory architectural features of ARMv8.2, is also called an
implementation of the ARMv8.2 architecture.

A1.7.5

The ARMv8.3 architecture extension
The ARMv8.3 architecture extension adds architectural features.

Architectural features added by ARMv8.3
An implementation of the ARMv8.3 extension must include all of the features that this section describes as
mandatory. Such an implementation is also called an implementation of the ARMv8.3 architecture.
The ARMv8.3 architecture extension adds the following architectural features, which are identified by the
architectural feature name and a short description of the feature:
ARMv8.3-CompNum, SIMD complex number support
ARMv8.3-CompNum introduces instructions for floating-point multiplication and addition of
complex numbers.
These instructions are added to the A64 and A32/T32 instruction sets.
This feature is mandatory in ARMv8.3 implementations.
The half-precision versions of these instructions are implemented only if ARMv8.2-FP16 is
implemented. Otherwise they are UNDEFINED.
The fields that identify the presence of ARMv8.3-CompNum are:
•
ID_AA64ISAR1_EL1.FCMA.
•
ID_ISAR5_EL1.VCMA.


•

ID_ISAR5.VCMA.

For more information, see:
•
SIMD complex number arithmetic on page C3-218.
•
Advanced SIMD complex number arithmetic instructions on page F1-3089.
ARMv8.3-JSConv, JavaScript conversion instructions
ARMv8.3-JSConv introduces instructions that perform a conversion from a double-precision
floating-point value to a signed 32-bit integer, with rounding toward zero. For more information, see:
For the A64 instruction set
•

FJCVTZS on page C7-1206.

For the A32/T32 instruction set
•

VJCVT on page F6-4153.

These instructions are added to the A64 and A32/T32 instruction sets.
The feature is mandatory in ARMv8.3 implementations.
The fields that identify the presence of ARMv8.3-JSConv are:
•
ID_AA64ISAR1_EL1.JSCVT
•
ID_ISAR6_EL1.JSCVT
•
ID_ISAR6.JSCVT.
For more information, see:
•
Floating-point conversion on page C3-200.
•
About the A64 SIMD and floating-point instructions on page C7-1006.
•
Advanced SIMD and floating-point instructions on page E1-2990.
•
Floating-point data-processing instructions on page F1-3093.
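The conversion can be modelled in portable software. The following C sketch is illustrative only. It assumes the ECMAScript ToInt32 behaviour for out-of-range inputs, that is, the result wraps modulo 2^32 and NaN and infinities convert to zero, and it assumes that double uses the IEEE 754 binary64 format. It does not model any flag-setting behaviour of the instructions. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Software model: double -> signed 32-bit, round toward zero, wrap modulo 2^32. */
int32_t js_to_int32(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);                 /* binary64 bit image */

    unsigned biased = (unsigned)((bits >> 52) & 0x7FF);
    int negative = (int)(bits >> 63);

    if (biased == 0x7FF || biased == 0)             /* NaN, infinity, zero, subnormal */
        return 0;

    uint64_t mant = (bits & 0x000FFFFFFFFFFFFFull) | (1ull << 52);
    int exp = (int)biased - 1075;                   /* |d| = mant * 2^exp */

    uint32_t low;
    if (exp <= -53)
        low = 0;                                    /* |d| < 1 truncates to 0 */
    else if (exp < 0)
        low = (uint32_t)(mant >> -exp);             /* discard the fraction */
    else if (exp < 32)
        low = (uint32_t)(mant << exp);              /* keep the low 32 bits */
    else
        low = 0;                                    /* low 32 bits are all zero */

    if (negative)
        low = 0u - low;                             /* two's complement negate */

    int32_t out;
    memcpy(&out, &low, sizeof out);                 /* reinterpret as signed */
    return out;
}
```

Working on the bit image avoids any dependence on the floating-point rounding mode and on the undefined behaviour of an out-of-range cast from double to int32_t.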
ARMv8.3-RCPC, Weaker release consistency
ARMv8.3-RCPC introduces three instructions to support the weaker Release Consistency processor
consistent (RCpc) model that enables the reordering of a Store-Release followed by a Load-Acquire
to a different address:
•
LDAPR on page C6-655.
•
LDAPRB on page C6-657.
•
LDAPRH on page C6-658.
These instructions are added to the A64 instruction set.
The feature is mandatory in ARMv8.3 implementations.
The ID_AA64ISAR1_EL1.LRCPC field identifies the presence of ARMv8.3-RCPC.
For more information, see:
•
Load-Acquire, Load-AcquirePC, and Store-Release on page B2-101.
•
Load-Acquire/Store-Release on page C3-174.
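In C, the acquire and release pattern that these instructions serve is written using <stdatomic.h>. The following sketch is illustrative only. Whether a compiler selects LDAPR rather than LDAR for the acquire load depends on the toolchain and on the target options, and the type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    int payload;          /* ordinary data, written before the flag */
    atomic_bool ready;    /* flag published with release semantics */
} mailbox_t;

/* Producer: write the payload, then publish it with a store-release. */
void mailbox_publish(mailbox_t *m, int value)
{
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Consumer: a load-acquire of the flag orders the later payload read. */
bool mailbox_try_consume(mailbox_t *m, int *value)
{
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *value = m->payload;
    return true;
}
```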
ARMv8.3-NV, Nested Virtualization
ARMv8.3-NV provides support for a Guest Hypervisor to run in Non-secure EL1 and ensures that
the Guest Hypervisor is unaware that it is running at that Exception level. A Guest Hypervisor is
supported regardless of the value of HCR_EL2.E2H.
This feature is supported in AArch64 state only.
The feature is mandatory in ARMv8.3 implementations.
The ID_AA64MMFR2_EL1.NV field identifies the support for ARMv8.3-NV.
For more information, see Nested virtualization on page D4-2188.
ARMv8.3-CCIDX, Cache extended number of sets

ARMv8.3-CCIDX introduces the following registers to allow caches to be described with greater
numbers of sets and greater associativity:
•
A 64-bit format of CCSIDR_EL1.
•
CCSIDR2_EL1.
•
CCSIDR2.
This feature is supported in AArch64 and AArch32 states.
This feature is optional in ARMv8.3 implementations.
The following fields identify the support for ARMv8.3-CCIDX:
•
ID_AA64MMFR2_EL1.CCIDX
•
ID_MMFR4_EL1.CCIDX.
•
ID_MMFR4.CCIDX.
For more information, see:
•

Possible formats of the Cache Size Identification Register, CCSIDR_EL1 on page D3-2053.

•

Possible formats of the Cache Size Identification Registers, CCSIDR and CCSIDR2 on
page G3-4812.
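As an illustration of how the wider fields are used, the following C sketch computes a cache size from a CCSIDR_EL1 value in the 64-bit format. It is illustrative only. It assumes that LineSize is bits [2:0], Associativity is bits [23:3], and NumSets is bits [55:32], that LineSize holds log2 of the line length in bytes minus 4, and that the other two fields hold the value minus 1. Confirm these positions against the sections referenced above. The function name is hypothetical.

```c
#include <stdint.h>

/* Cache size in bytes from a 64-bit format CCSIDR_EL1 value (assumed layout). */
uint64_t cache_size_bytes_ccidx(uint64_t ccsidr)
{
    uint64_t line_bytes = 1ull << ((ccsidr & 0x7) + 4);      /* LineSize */
    uint64_t ways = ((ccsidr >> 3) & 0x1FFFFF) + 1;          /* 21-bit Associativity */
    uint64_t sets = ((ccsidr >> 32) & 0xFFFFFF) + 1;         /* 24-bit NumSets */
    return line_bytes * ways * sets;
}
```

For example, a value describing 64-byte lines, 16 ways, and 1024 sets gives 64 x 16 x 1024 bytes, that is, 1MB.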

ARMv8.3-PAuth, Pointer Authentication
ARMv8.3-PAuth adds functionality that supports address authentication of the contents of a register
before that register is used as the target of an indirect branch, or as the address for a load.
This feature is supported only in AArch64 state.
This feature is mandatory in ARMv8.3 implementations.
The fields that identify the support for ARMv8.3-PAuth are ID_AA64ISAR1_EL1.{GPI, GPA,
API, APA}.
For more information, see Pointer authentication in AArch64 state on page D4-2086.
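Several of the identification fields named in this section are 4-bit fields of ID_AA64ISAR1_EL1. The following C sketch shows how software might test them. It is illustrative only: the field positions in the enumeration are stated as assumptions and should be checked against the register description, and the function names are hypothetical. Because reading the register requires an MRS at EL1 or higher, or an operating system service, the sketch takes the register value as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed least significant bit positions of ID_AA64ISAR1_EL1 fields. */
enum {
    ISAR1_DPB = 0, ISAR1_APA = 4, ISAR1_API = 8, ISAR1_JSCVT = 12,
    ISAR1_FCMA = 16, ISAR1_LRCPC = 20, ISAR1_GPA = 24, ISAR1_GPI = 28
};

/* Extract the 4-bit ID field whose least significant bit is at lsb. */
unsigned id_field(uint64_t reg, unsigned lsb)
{
    return (unsigned)((reg >> lsb) & 0xF);
}

/* Address authentication is reported by either APA or API being nonzero. */
bool has_address_auth(uint64_t isar1)
{
    return id_field(isar1, ISAR1_APA) != 0 || id_field(isar1, ISAR1_API) != 0;
}

/* Generic authentication is reported by either GPA or GPI being nonzero. */
bool has_generic_auth(uint64_t isar1)
{
    return id_field(isar1, ISAR1_GPA) != 0 || id_field(isar1, ISAR1_GPI) != 0;
}
```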

A1.7.6

The Reliability, Availability, and Serviceability (RAS) Extension
The RAS Extension is a mandatory extension to the ARMv8.2 architecture, and an optional extension to the
ARMv8.0 and the ARMv8.1 architectures.
The RAS Extension improves the dependability of a system by providing:
•
Reliability, that is, the continuity of correct service.
•
Availability, that is, the readiness for correct service.
•
Serviceability, that is, the ability to undergo modifications and repairs.
ID_AA64PFR0_EL1.RAS in AArch64 state, and ID_PFR0.RAS in AArch32 state, indicate whether the RAS
Extension is implemented.
The RAS Extension introduces a new barrier instruction, the Error Synchronization Barrier (ESB), to the A32, T32,
and A64 instruction sets.
The RAS Extension introduces the following System registers:
•

In AArch64 state:
—

DISR_EL1, Deferred Interrupt Status Register.

—

ERRIDR_EL1, Error Record ID Register.

—

ERRSELR_EL1, Error Record Select Register.

—

ERXADDR_EL1, Selected Error Record Address Register.

—

ERXCTLR_EL1, Selected Error Record Control Register.

—

ERXFR_EL1, Selected Error Record Feature Register.

—

ERXMISC0_EL1, Selected Error Record Miscellaneous Register 0.

—

ERXMISC1_EL1, Selected Error Record Miscellaneous Register 1.

—

ERXSTATUS_EL1, Selected Error Record Primary Status Register.

—

VDISR_EL2, Virtual Deferred Interrupt Status Register.

—

VSESR_EL2, Virtual SError Exception Syndrome Register.

•

In AArch32 state:
—

DISR, Deferred Interrupt Status Register.

—

ERRIDR, Error Record ID Register.

—

ERRSELR, Error Record Select Register.

—

ERXADDR, Selected Error Record Address Register.

—

ERXADDR2, Selected Error Record Address Register 2.

—

ERXCTLR, Selected Error Record Control Register.

—

ERXCTLR2, Selected Error Record Control Register 2.

—

ERXFR, Selected Error Record Feature Register.

—

ERXFR2, Selected Error Record Feature Register 2.

—

ERXMISC0, Selected Error Record Miscellaneous Register 0.

—

ERXMISC1, Selected Error Record Miscellaneous Register 1.

—

ERXMISC2, Selected Error Record Miscellaneous Register 2.

—

ERXMISC3, Selected Error Record Miscellaneous Register 3.

—

ERXSTATUS, Selected Error Record Primary Status Register.

—

VDFSR, Virtual SError Exception Syndrome Register.

—

VDISR, Virtual Deferred Interrupt Status Register.

In addition, the RAS Extension introduces a number of memory-mapped registers.
All registers that are exclusively associated with the RAS architecture are described in the ARM® Reliability,
Availability, and Serviceability (RAS) Specification, ARMv8, for the ARMv8-A architecture profile.
The RAS Extension modifies the following existing System registers:
•

In AArch64 state:
—

HCR_EL2.

—

ID_AA64MMFR1_EL1.

—

ID_AA64PFR0_EL1.

—

ID_MMFR4_EL1.

—

ID_PFR0_EL1.

—

SCR_EL3.

—

ESR_ELx.

—

IFSR32_EL2.

•

In AArch32 state:
—

HCR2.

—

ID_MMFR4.

—

ID_PFR0.

—

SCR.

—

DFSR.

—

IFSR.

—

HSR.


A1.7.7

The Statistical Profiling Extension
The Statistical Profiling Extension is an optional extension introduced by the ARMv8.2 architecture.
Implementation of the Statistical Profiling Extension requires implementation of at least ARMv8.1 of the
ARMv8-A architecture profile. The Statistical Profiling Extension is only supported in AArch64 state.
The Statistical Profiling Extension provides a non-invasive method of sampling software and hardware using
randomized sampling of either architectural instructions, as defined by the instruction set architecture, or
microarchitectural operations.
ID_AA64DFR0_EL1.PMSVer indicates whether the Statistical Profiling Extension is implemented.
For more information see Chapter D6 The Statistical Profiling Extension.

A1.7.8

The Scalable Vector Extension (SVE)
The Scalable Vector Extension is an optional extension introduced by the ARMv8.2 architecture. SVE is supported
in AArch64 state only.
The Scalable Vector Extension provides vector instructions that, primarily, support wider vectors than the ARM
Advanced SIMD instruction set. The ARM® Architecture Reference Manual Supplement, The Scalable Vector
Extension (SVE), for ARMv8-A describes the SVE.
ID_AA64PFR0_EL1.SVE indicates whether the Scalable Vector Extension is implemented.
The Scalable Vector Extension affects some AArch64 System registers, and those register changes are included in
this issue of this Manual, where they are identified as SVE features. SVE also introduces new AArch64 System
registers; however, these are not described in this Manual. For more information about the new System registers
introduced by SVE, see the ARM® Architecture Reference Manual Supplement, The Scalable Vector
Extension (SVE), for ARMv8-A.
The Scalable Vector Extension introduces the following System registers:
•
ID_AA64ZFR0_EL1.
•
ZCR_EL1, and an EL2 alias of this register, ZCR_EL12.
•
ZCR_EL2.
•
ZCR_EL3.
The Scalable Vector Extension modifies the following existing System registers:
•
CPACR_EL1.
•
CPTR_EL2.
•
CPTR_EL3.
•
ESR_ELx.
•
ID_AA64PFR0_EL1.
•
TCR_EL1.
•
TCR_EL2.


Part B
The AArch64 Application Level Architecture

Chapter B1
The AArch64 Application Level Programmers’ Model

This chapter gives an application level view of the ARM programmers’ model. It contains the following sections:
•
About the Application level programmers’ model on page B1-76.
•
Registers in AArch64 Execution state on page B1-77.
•
Software control features and EL0 on page B1-82.


B1.1

About the Application level programmers’ model
This chapter contains the programmers’ model information required for application development.
The information in this chapter is distinct from the system information required to service and support application
execution under an operating system, or higher level of system software. However, some knowledge of the system
information is needed to put the Application level programmers' model into context.
Depending on the implementation choices, the architecture supports multiple levels of execution privilege,
indicated by different Exception levels that number upwards from EL0 to EL3. EL0 corresponds to the lowest
privilege level and is often described as unprivileged. The Application level programmers’ model is the
programmers’ model for software executing at EL0. For more information see Exception levels on page D1-1850.
System software determines the Exception level, and therefore the level of privilege, at which software runs. When
an operating system supports execution at both EL1 and EL0, an application usually runs unprivileged at EL0. This:
•

Permits the operating system to allocate system resources to an application in a unique or shared manner.

•

Provides a degree of protection from other processes, and so helps protect the operating system from
malfunctioning software.

This chapter indicates where some system level understanding is necessary, and where relevant it gives a reference
to the system level description.
Execution at any Exception level above EL0 is often referred to as privileged execution.
For more information on the system level view of the architecture refer to Chapter D1 The AArch64 System Level
Programmers’ Model.


B1.2

Registers in AArch64 Execution state
This section describes the registers and process state visible at EL0 when executing in the AArch64 state. It includes
the following:
•
Registers in AArch64 state
•
Process state, PSTATE on page B1-79
•
System registers on page B1-80

B1.2.1

Registers in AArch64 state
In the AArch64 application level view, an ARM processing element has:
R0-R30

31 general-purpose registers, R0 to R30. Each register can be accessed as:
•

A 64-bit general-purpose register named X0 to X30.

•

A 32-bit general-purpose register named W0 to W30.

See the register name mapping in Figure B1-1.
Bits [63:0]    Xn (the full register Rn)
Bits [31:0]    Wn (the least significant half of Rn)

Figure B1-1 General-purpose register naming
The X30 general-purpose register is used as the procedure call link register.

Note
In instruction encodings, the value 0b11111 (31) is used to indicate the ZR (zero register). This
indicates that the argument takes the value zero, but does not indicate that the ZR is implemented
as a physical register.
SP

A 64-bit dedicated Stack Pointer register. The least significant 32 bits of the stack-pointer can be
accessed via the register name WSP.
The use of SP as an operand in an instruction indicates the use of the current stack pointer.

Note
Stack pointer alignment to a 16-byte boundary is configurable at EL1. For more information see the
Procedure Call Standard for the ARM 64-bit Architecture.
PC

A 64-bit Program Counter holding the address of the current instruction.
Software cannot write directly to the PC. It can only be updated on a branch, exception entry or
exception return.

Note
Attempting to execute an A64 instruction that is not word-aligned generates a PC alignment fault,
see PC alignment checking on page D1-1868.
V0-V31

32 SIMD&FP registers, V0 to V31. Each register can be accessed as:
•

A 128-bit register named Q0 to Q31.

•

A 64-bit register named D0 to D31.

•

A 32-bit register named S0 to S31.

•

A 16-bit register named H0 to H31.

•

An 8-bit register named B0 to B31.


•

A 128-bit vector of elements.

•

A 64-bit vector of elements.

Where the number of bits described by a register name does not occupy an entire SIMD&FP
register, it refers to the least significant bits. See Figure B1-2.

Bits [127:0]   Qn (the full register Vn)
Bits [63:0]    Dn
Bits [31:0]    Sn
Bits [15:0]    Hn
Bits [7:0]     Bn

Figure B1-2 SIMD and floating-point register naming
For more information about data types and vector formats, see Supported data types on page A1-40.
FPCR, FPSR

Two SIMD and floating-point control and status registers, FPCR and FPSR.
See Registers for instruction processing and exception handling on page D1-1859 for more information on the
registers.
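The rule that a narrower SIMD&FP register name refers to the least significant bits of the register can be modelled as follows. This C sketch is illustrative only; the type and function names are hypothetical, and the 128-bit register is represented as two 64-bit halves purely for portability.

```c
#include <stdint.h>

/* Model of one 128-bit SIMD&FP register Vn. */
typedef struct {
    uint64_t lo;   /* bits [63:0] */
    uint64_t hi;   /* bits [127:64] */
} vreg_t;

/* Each narrower view is the least significant part of the register. */
uint8_t  view_b(vreg_t v) { return (uint8_t)v.lo; }    /* Bn: bits [7:0]  */
uint16_t view_h(vreg_t v) { return (uint16_t)v.lo; }   /* Hn: bits [15:0] */
uint32_t view_s(vreg_t v) { return (uint32_t)v.lo; }   /* Sn: bits [31:0] */
uint64_t view_d(vreg_t v) { return v.lo; }             /* Dn: bits [63:0] */
```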

Pseudocode description of registers in AArch64 state
In the pseudocode functions that access registers:
•
The assignment form is used for register writes.
•
The non-assignment form is used for register reads.
The uses of the X[] function are:
•
Reading or writing X0-X30, using n to index the required register.
•
Reading the zero register ZR, accessed as X[31].

Note
The pseudocode use of X[31] to represent the zero register does not indicate that hardware must implement this
register.
The AArch64 SP[] function is used to read or write the current SP.
The AArch64 PC[] function is used to read the PC.
The AArch64 V[] function is used to read or write the Advanced SIMD and floating-point registers V0-V31, using
a parameter n to index the required register.
The AArch64 Vpart[] function is used to read or write a part of one of V0-V31, using a parameter n to index the
required register, and a parameter part to indicate the required part of the register, see the function description for
more information.
The SP[], PC[], V[], and Vpart[] functions are defined in Chapter J1 ARMv8 Pseudocode.
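The behaviour of the X[] accessor can be sketched in C. This model is illustrative only and is not the pseudocode definition. It assumes that a write to index 31 is discarded and that a 32-bit write through a W name zero-extends into the 64-bit register. The type and function names are hypothetical, and callers are expected to pass an index from 0 to 31.

```c
#include <stdint.h>

/* Storage exists for X0-X30 only; the zero register has no storage. */
typedef struct {
    uint64_t r[31];
} gpr_file_t;

/* Read Xn; index 31 reads as zero. */
uint64_t x_read(const gpr_file_t *f, unsigned n)
{
    return n == 31 ? 0 : f->r[n];
}

/* Write Xn; a write to index 31 is discarded (assumption of this model). */
void x_write(gpr_file_t *f, unsigned n, uint64_t value)
{
    if (n != 31)
        f->r[n] = value;
}

/* Read Wn: the least significant 32 bits of Xn. */
uint32_t w_read(const gpr_file_t *f, unsigned n)
{
    return (uint32_t)x_read(f, n);
}

/* Write Wn: assumed to zero the upper 32 bits of Xn. */
void w_write(gpr_file_t *f, unsigned n, uint32_t value)
{
    x_write(f, n, (uint64_t)value);
}
```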


B1.2.2

Process state, PSTATE
Process state or PSTATE is an abstraction of process state information. All of the instruction sets provide
instructions that operate on elements of PSTATE.
The following PSTATE information is accessible at EL0:
The Condition flags
Flag-setting instructions set these. They are:
N

Negative Condition flag. If the result of the instruction is regarded as a two's
complement signed integer, the PE sets this to:
•
1 if the result is negative.
•
0 if the result is positive or zero.

Z

Zero Condition flag. Set to:
•
1 if the result of the instruction is zero.
•
0 otherwise.
A result of zero often indicates an equal result from a comparison.

C

Carry Condition flag. Set to:
•

1 if the instruction results in a carry condition, for example an unsigned overflow
that is the result of an addition.

•

0 otherwise.

V

Overflow Condition flag. Set to:
•

1 if the instruction results in an overflow condition, for example a signed
overflow that is the result of an addition.

•

0 otherwise.

Conditional instructions test the N, Z, C and V Condition flags, combining them with the Condition
code for the instruction to determine whether the instruction must be executed. In this way,
execution of the instruction is conditional on the result of a previous operation. For more
information about conditional execution, see Condition flags and related instructions on
page C6-525.
The exception masking bits
D

Debug exception mask bit. When EL0 is enabled to modify the mask bits, this bit is
visible and can be modified. However, this bit is architecturally ignored at EL0.

A

SError interrupt mask bit.

I

IRQ interrupt mask bit.

F

FIQ interrupt mask bit.

For each bit, the values are:
0
Exception not masked.
1
Exception masked.
Access at EL0 using AArch64 state depends on SCTLR_EL1.UMA. See Traps to EL1 of EL0
accesses to the PSTATE.{D, A, I, F} interrupt masks on page D1-1914.
See Process state, PSTATE on page D1-1865 for the system level view of PSTATE.
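The Condition flag definitions given in this section can be illustrated for a 64-bit addition. The following C sketch is a software model only, not a description of any particular instruction, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n, z, c, v;
} flags_t;

/* Add two 64-bit values and derive N, Z, C, and V from the result. */
flags_t add_with_flags(uint64_t a, uint64_t b, uint64_t *sum)
{
    uint64_t r = a + b;                        /* wraps modulo 2^64 */
    flags_t f;

    f.n = (r >> 63) != 0;                      /* negative as two's complement */
    f.z = (r == 0);                            /* result is zero */
    f.c = (r < a);                             /* unsigned overflow (carry out) */
    f.v = (((a ^ r) & (b ^ r)) >> 63) != 0;    /* signed overflow */
    *sum = r;
    return f;
}
```

Signed overflow occurs only when both operands have the same sign and the result has the opposite sign, which is what the expression for V tests.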


Accessing PSTATE fields at EL0
At EL0 using AArch64 state, PSTATE fields can be accessed using Special-purpose registers that can be directly
read using the MRS instruction and directly written using the MSR (register) instructions. Table B1-1 shows the
Special-purpose registers that access the PSTATE fields that hold AArch64 state when the PE is at EL0 using
AArch64. All other PSTATE fields do not have direct read and write access at EL0.
Table B1-1 Accessing PSTATE fields at EL0 using MRS and MSR (register)

Special-purpose register    PSTATE fields
NZCV                        N, Z, C, V
DAIF                        D, A, I, F

Software can also use the MSR (immediate) instruction to directly write to PSTATE.{D, A, I, F}. Table B1-2 shows
the MSR (immediate) operands that can directly write to PSTATE.{D, A, I, F} when the PE is at EL0 using AArch64
state.
Table B1-2 Accessing PSTATE.{D, A, I, F} at EL0 using MSR (immediate)

Operand    PSTATE fields    Notes
DAIFSet    D, A, I, F       Directly sets any of the PSTATE.{D, A, I, F} bits to 1
DAIFClr    D, A, I, F       Directly clears any of the PSTATE.{D, A, I, F} bits to 0

However, access to the PSTATE.{D, A, I, F} fields at EL0 using AArch64 state depends on SCTLR_EL1.UMA. See
Traps to EL1 of EL0 accesses to the PSTATE.{D, A, I, F} interrupt masks on page D1-1914.
Writes to the PSTATE fields have side-effects on various aspects of the PE operation. All of these side-effects are
guaranteed:
•
Not to be visible to earlier instructions in the execution stream.
•
To be visible to later instructions in the execution stream.
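The effect of the DAIFSet and DAIFClr operands can be modelled on a register value. The following C sketch is illustrative only. It assumes that the NZCV register holds N, Z, C, and V in bits [31:28], that the DAIF register holds D, A, I, and F in bits [9:6], and that the 4-bit immediate selects D, A, I, and F with bits 3 to 0 respectively. The function names are hypothetical.

```c
#include <stdint.h>

/* Extract {N, Z, C, V} from an NZCV register value (assumed bits [31:28]). */
unsigned nzcv_bits(uint64_t nzcv)
{
    return (unsigned)((nzcv >> 28) & 0xF);
}

/* Extract {D, A, I, F} from a DAIF register value (assumed bits [9:6]). */
unsigned daif_bits(uint64_t daif)
{
    return (unsigned)((daif >> 6) & 0xF);
}

/* Model of a write using DAIFSet: set every mask bit selected by imm4. */
uint64_t daif_set(uint64_t daif, unsigned imm4)
{
    return daif | ((uint64_t)(imm4 & 0xF) << 6);
}

/* Model of a write using DAIFClr: clear every mask bit selected by imm4. */
uint64_t daif_clr(uint64_t daif, unsigned imm4)
{
    return daif & ~((uint64_t)(imm4 & 0xF) << 6);
}
```

Under these assumptions, an immediate of 0x2 selects only the I bit, so a DAIFSet with that immediate masks IRQ interrupts and leaves the other three mask bits unchanged.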

B1.2.3

System registers
System registers provide support for execution control, status and general system configuration. The majority of the
System registers are not accessible at EL0.
However, some System registers can be configured to allow access from software executing at EL0. Any access
from EL0 to a System register with the access right disabled causes the instruction to behave as UNDEFINED. The
registers that can be accessed from EL0 are:
Cache ID registers

The CTR_EL0 and DCZID_EL0 registers provide implementation parameters for EL0
cache management support.

Debug registers

A debug communications channel is supported by the MDCCSR_EL0, DBGDTR_EL0,
DBGDTRRX_EL0 and DBGDTRTX_EL0 registers.

Performance Monitors registers
See Performance Monitors support on page B1-81.

Thread ID registers

The TPIDR_EL0 and TPIDRRO_EL0 registers are two thread ID registers with different
access rights.

Timer registers

In ARMv8 the following operations are performed:
•

Read access to the system counter clock frequency using CNTFRQ_EL0.

•

Physical and virtual timer count registers, CNTPCT_EL0 and CNTVCT_EL0.


•

Physical up-count comparison, down-count value and timer control registers,
CNTP_CVAL_EL0, CNTP_TVAL_EL0, and CNTP_CTL_EL0.

•

Virtual up-count comparison, down-count value and timer control registers,
CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0.
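A typical application use of these registers is to convert a difference between two counter reads into elapsed time, using the frequency reported by CNTFRQ_EL0. The following C sketch is illustrative only. It takes the count difference and the frequency as parameters rather than reading the registers, it assumes that the frequency is in Hz and fits in 32 bits, and the function name is hypothetical.

```c
#include <stdint.h>

/* Convert a counter difference to nanoseconds.
 * Splitting into whole seconds and a remainder avoids overflow in the
 * multiplication, provided freq_hz fits in 32 bits. Returns 0 if freq_hz is 0. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    if (freq_hz == 0)
        return 0;

    uint64_t secs = ticks / freq_hz;
    uint64_t rem = ticks % freq_hz;            /* less than freq_hz */

    return secs * 1000000000ull + (rem * 1000000000ull) / freq_hz;
}
```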

Performance Monitors support
The ARMv8 architecture defines optional Performance Monitors.
The basic form of the Performance Monitors is:
•

A 64-bit cycle counter.

•

Up to a maximum of 32 IMPLEMENTATION DEFINED event counters, where the number is identified by the
PMCR_EL0.N field.

•

System register access to the cycle counter and event registers, and related controls for:
—
Enabling and resetting counters.
—
Flagging overflows.
—
Generating interrupts on overflow.
Software can enable the cycle counter independently of the event counters.

Software executing at EL1 or a higher Exception level, for example an operating system, can enable access to the
counters from EL0. This allows an application to monitor its own performance with fine-grained control without
requiring operating system support. For example, an application might implement per-function performance
monitoring.
For details on the features, configuration and control of the Performance Monitors, see Chapter D5 The
Performance Monitors Extension.
EL0 access to Performance Monitors
To allow application code to make use of the Performance Monitors, software executing at a higher Exception level
must set the following bits in the PMUSERENR_EL0 System register:
EN

When set to 1, access to all Performance Monitors registers is allowed at EL0, except for writes to
PMUSERENR_EL0, and reads/writes of PMINTENSET_EL1 and PMINTENCLR_EL1.

ER

When set to 1, read access to event counters is allowed at EL0. This includes read/write access to
PMSELR_EL0, so that the event counter to read through PMXEVCNTR_EL0 can be set.

CR

When set to 1, read access to PMCCNTR_EL0 is allowed at EL0.

SW

When set to 1, write access to PMSWINC_EL0 is allowed at EL0.

Note
Register PMUSERENR_EL0 is always read-only at EL0.
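
The effect of the four control bits described above can be summarized with a small decoding sketch. The bit positions used here (EN at bit 0, SW at bit 1, CR at bit 2, ER at bit 3) are taken from the register description as an assumption, because this section does not state them. The function name el0_access and the dictionary keys are hypothetical.

```python
# Assumed bit positions of the PMUSERENR_EL0 fields.
EN, SW, CR, ER = 1 << 0, 1 << 1, 1 << 2, 1 << 3


def el0_access(pmuserenr):
    """Return which Performance Monitors accesses EL0 is permitted to make."""
    en = bool(pmuserenr & EN)
    return {
        # CR permits reads of PMCCNTR_EL0.
        "cycle_counter_read": en or bool(pmuserenr & CR),
        # ER permits reads of event counters and read/write of PMSELR_EL0.
        "event_counter_read": en or bool(pmuserenr & ER),
        "pmselr_read_write": en or bool(pmuserenr & ER),
        # SW permits writes to PMSWINC_EL0.
        "pmswinc_write": en or bool(pmuserenr & SW),
        # EN permits the remaining registers, other than writes to
        # PMUSERENR_EL0 and any access to PMINTENSET_EL1 or PMINTENCLR_EL1.
        "other_pmu_registers": en,
    }
```

For example, a value with only CR set permits an application to read the cycle counter but nothing else.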

B1.3

Software control features and EL0
The following sections describe the EL0 view of the ARMv8 software control features:
•
Exception handling
•
Wait for Interrupt and Wait for Event
•
The YIELD instruction
•
Application level cache management
•
Instructions relating to Debug

B1.3.1

Exception handling
In the ARM architecture, an exception causes a change of program flow. Execution of an exception handler starts,
at an Exception level higher than EL0, from a defined vector that relates to the exception taken.
Exceptions include:
•
Interrupts.
•
Memory system aborts.
•
Exceptions generated by attempting to execute an instruction that is UNDEFINED.
•
System calls.
•
Secure monitor or Hypervisor traps.
•
Debug exceptions.
Most details of exception handling are not visible to application level software, and are described in Chapter D1 The
AArch64 System Level Programmers’ Model.
The SVC instruction causes a Supervisor Call exception. This provides a mechanism for unprivileged software to
make a system call to an operating system.
The BRK instruction generates a Breakpoint Instruction exception. This provides a mechanism for debugging
software using a debugger executing on the same PE, see Breakpoint Instruction exceptions on page D2-1993.

Note
The BRK instruction is supported only in the A64 instruction set. The equivalent instruction in the T32 and A32
instruction sets is BKPT.

B1.3.2

Wait for Interrupt and Wait for Event
Issuing a WFI instruction indicates that no further execution is required until a WFI wake-up event occurs, see Wait
For Interrupt on page D1-1957. This permits entry to a low-power state.
Issuing a WFE instruction indicates that no further execution is required until a WFE wake-up event occurs, see Wait
for Event mechanism and Send event on page D1-1954. This permits entry to a low-power state.

B1.3.3

The YIELD instruction
The YIELD instruction provides a hint that the task performed by a thread is of low importance so that it could yield,
see YIELD on page C6-1003. This mechanism can be used to improve overall performance in a Symmetric
Multithreading (SMT) or Symmetric Multiprocessing (SMP) system.
Examples of when the YIELD instruction might be used include a thread that is sitting in a spin-lock, or where the
arbitration priority of the snoop bit in an SMP system is modified. The YIELD instruction permits binary
compatibility between SMT and SMP systems.
The YIELD instruction is a NOP (No Operation) hint instruction.
The YIELD instruction has no effect in a single-threaded system, but developers of such systems can use the
instruction to flag its intended use for future migration to a multiprocessor or multithreading system. Operating
systems can use YIELD in places where a yield hint is wanted, knowing that it will be treated as a NOP if there is no
implementation benefit.

B1.3.4

Application level cache management
A small number of cache management instructions can be enabled at EL0 from higher levels of privilege using the
SCTLR_EL1 System register. Any access from EL0 to an operation with the access right disabled causes the
instruction to behave as UNDEFINED.
For information about the available operations, see Application level access to functionality related to caches on page B2-106.

B1.3.5

Instructions relating to Debug
Exception handling on page B1-82 refers to the BRK instruction, which generates a Breakpoint Instruction exception.
In addition, in both AArch64 state and AArch32 state, the HLT instruction causes the PE to halt execution and enter
Debug state. This provides a mechanism for debugging software using a debugger that is external to the PE, see
Chapter H1 About External Debug.

Note
In AArch32 state, previous versions of the architecture defined the DBG instruction, that could provide a hint to the
debug system. In ARMv8, this instruction executes as a NOP. ARM deprecates the use of the DBG instruction.

Chapter B2
The AArch64 Application Level Memory Model

This chapter gives an application level view of the memory model. It contains the following sections:
•
About the ARM memory model.
•
Atomicity in the ARM architecture.
•
Definition of the ARMv8 memory model.
•
Caches and memory hierarchy.
•
Alignment support.
•
Endian support.
•
Memory types and attributes.
•
Mismatched memory attributes.
•
Synchronization and semaphores.

Note
In this chapter, System register names usually link to the description of the register in Chapter D10 AArch64 System
Register Descriptions, for example SCTLR_EL1.

B2.1

About the ARM memory model
The ARM architecture is a weakly ordered memory architecture that permits the observation and completion of
memory accesses in a different order from the program order. The following sections of this chapter provide the
complete definition of the ARMv8 memory model. This introduction is not intended to contradict the definition
found in those sections. In general, the basic principles of the ARMv8 memory model are:
•

To provide a memory model that has similar weaknesses to those found in the memory models used by
high-level programming languages such as C or Java. For example, by permitting independent memory
accesses to be re-ordered as seen by other observers.

•

To avoid the requirement for multi-copy atomicity in the majority of memory types.

•

The provision of instructions and memory barriers to compensate for the lack of multi-copy atomicity in the
cases where it would be needed.

•

The use of address, data, and control dependencies in the creation of order so as to avoid having excessive
numbers of barriers or other explicit instructions in common situations where some order is required by the
programmer or the compiler.

This section contains:
•
Address space.
•
Memory type overview.

B2.1.1

Address space
Address calculations are performed using 64-bit registers. However, supervisory software can configure the top
eight address bits for use as a tag, as described in Address tagging in AArch64 state on page D4-2084. If this is done,
address bits[63:56]:
•
Are not considered when determining whether the address is valid.
•
Are never propagated to the program counter.
Supervisory software determines the valid address range. Attempting to access an address that is not valid generates
an MMU fault.
Simple sequential execution of instructions might overflow the valid address range. For more information, see
Virtual address space overflow on page D3-2049.
Memory accesses use the Mem[] function. This function makes an access of the required type. If supervisory software
configures the top eight address bits for use as a tag, the top eight address bits are ignored.
The AccType{} enumeration defines the different access types.
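
The handling of a tagged address can be sketched as follows. This is a minimal model that assumes the tag occupies address bits[63:56] and that, when the tag is ignored, those bits are treated as copies of bit 55. The second assumption comes from the address tagging description and is not stated in this section. The function names split_tag and with_tag are hypothetical.

```python
MASK64 = (1 << 64) - 1
TAG_SHIFT = 56
LOW_MASK = (1 << TAG_SHIFT) - 1


def split_tag(address):
    """Return (tag, address with the tag bits replaced) for a 64-bit value."""
    address &= MASK64
    tag = address >> TAG_SHIFT
    low = address & LOW_MASK
    # Assumption: bits[63:56] are treated as copies of bit 55.
    top = 0xFF if (low >> 55) & 1 else 0x00
    return tag, (top << TAG_SHIFT) | low


def with_tag(address, tag):
    """Place an 8-bit tag in bits[63:56] of address."""
    return ((tag & 0xFF) << TAG_SHIFT) | (address & LOW_MASK)
```

In this model, two addresses that differ only in bits[63:56] refer to the same location.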

Note
•

Chapter D3 The AArch64 System Level Memory Model and Chapter D4 The AArch64 Virtual Memory System
Architecture include descriptions of memory system features that are transparent to the application, including
memory access, address translation, memory maintenance instructions, and alignment checking and the
associated fault handling. These chapters also include pseudocode descriptions of these operations.

•

For information on the pseudocode that relates to memory accesses, see Basic memory access on
page D3-2077, Unaligned memory access on page D3-2078, and Aligned memory access on page D3-2078.

B2.1.2

Memory type overview
ARMv8 provides the following mutually-exclusive memory types:

Normal

This is generally used for bulk memory operations, both read/write and read-only operations.

Device

The ARM architecture forbids Speculative reads of any type of Device memory. This means Device
memory types are suitable attributes for read-sensitive Locations.

Locations of the memory map that are assigned to peripherals are usually assigned the Device
memory attribute.
Device memory has additional attributes that have the following effects:
•

They prevent aggregation of reads and writes, maintaining the number and size of the
specified memory accesses. See Gathering on page B2-121.

•

They preserve the access order and synchronization requirements, both for accesses to a
single peripheral and where there is a synchronization requirement on the observability of
one or more memory write and read accesses. See Reordering on page B2-122.

•

They indicate whether a write can be acknowledged other than at the end point. See Early
Write Acknowledgement on page B2-123.

For more information on Normal memory and Device memory, see Memory types and attributes on page B2-114.

Note
Earlier versions of the ARM architecture defined a single Device memory type and a Strongly-ordered memory
type. A Note in Device memory on page B2-118 describes how these memory types map onto the ARMv8 memory
types.

B2.2

Atomicity in the ARM architecture
Atomicity is a feature of memory accesses, described as atomic accesses. The ARM architecture description refers
to two types of atomicity, single-copy atomicity and multi-copy atomicity. In the ARMv8 architecture, the atomicity
requirements for memory accesses depend on the memory type, and whether the access is explicit or implicit. For
more information, see:
•
Requirements for single-copy atomicity.
•
Properties of single-copy atomic accesses.
•
Multi-copy atomicity.
•
Requirements for multi-copy atomicity.
•
Concurrent modification and execution of instructions.
For more information about the memory types, see Memory type overview on page B2-86.

B2.2.1

Requirements for single-copy atomicity
For explicit memory accesses generated from an Exception level the following rules apply:
•

A read that is generated by a load instruction that loads a single general-purpose register and is aligned to the
size of the read in the instruction is single-copy atomic.

•

A write that is generated by a store instruction that stores a single general-purpose register and is aligned to
the size of the write in the instruction is single-copy atomic.

•

Reads that are generated by a Load Pair instruction that loads two general-purpose registers and are aligned
to the size of the load to each register are treated as two single-copy atomic reads, one for each register being
loaded.

•

Writes that are generated by a Store Pair instruction that stores two general-purpose registers and are aligned
to the size of the store of each register are treated as two single-copy atomic writes, one for each register being
stored.

•

Load-Exclusive Pair instructions of two 32-bit quantities and Store-Exclusive Pair instructions of two 32-bit
quantities are single-copy atomic.

•

When the Store-Exclusive of a Load-Exclusive/Store-Exclusive pair instruction using two 64-bit quantities
succeeds, it causes a single-copy atomic update of the entire memory location being updated.

Note
To atomically load two 64-bit quantities, perform a Load-Exclusive Pair/Store-Exclusive Pair sequence of
reading and writing the same value for which the Store-Exclusive Pair succeeds, and use the read values from
the Load-Exclusive Pair.

•

Where translation table walks generate a read of a translation table entry, this read is single-copy atomic.

•

For the atomicity of instruction fetches, see Concurrent modification and execution of instructions on
page B2-90.

•

Reads to SIMD and floating-point registers of a single 64-bit or smaller quantity that is aligned to the size of
the quantity being loaded are treated as single-copy atomic reads.

•

Writes from SIMD and floating-point registers of a single 64-bit or smaller quantity that is aligned to the size
of the quantity being stored are treated as single-copy atomic writes.

•

Element or Structure Reads to SIMD and floating-point registers of 64-bit or smaller elements, where each
element is aligned to the size of the element being loaded, have each element treated as a single-copy atomic
read.

•

Element or Structure Writes from SIMD and floating-point registers of 64-bit or smaller elements, where
each element is aligned to the size of the element being stored, have each element treated as a single-copy
atomic write.

•

Reads to SIMD and floating-point registers of a 128-bit value that is 64-bit aligned in memory are treated as
a pair of single-copy atomic 64-bit reads.

•

Writes from SIMD and floating-point registers of a 128-bit value that is 64-bit aligned in memory are treated
as a pair of single-copy atomic 64-bit writes.

All other memory accesses are regarded as streams of accesses to bytes, and no atomicity between accesses to
different bytes is ensured by the architecture.
All accesses to any byte are single-copy atomic.
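
A subset of the rules above can be expressed as a small classification sketch. It covers only single general-purpose register accesses, Load Pair and Store Pair accesses, and single SIMD and floating-point register accesses. It deliberately omits the exclusive, translation table walk, instruction fetch, and Element or Structure cases. The function name atomic_units and the kind strings are hypothetical.

```python
def atomic_units(kind, size, address):
    """Return the sizes, in bytes, of the single-copy atomic accesses.

    kind is "gpr" (one general-purpose register), "gpr_pair" (Load Pair or
    Store Pair, where size is the per-register size), or "simd" (one SIMD
    and floating-point register). size is in bytes.
    """
    if kind == "gpr" and address % size == 0:
        return [size]
    if kind == "gpr_pair" and address % size == 0:
        return [size, size]
    if kind == "simd":
        if size <= 8 and address % size == 0:
            return [size]
        if size == 16 and address % 8 == 0:
            return [8, 8]
    # All other accesses are treated as a stream of byte accesses.
    total = 2 * size if kind == "gpr_pair" else size
    return [1] * total
```

For example, a 128-bit SIMD access that is 64-bit aligned is reported as two 8-byte atomic accesses, whereas the same access at a 4-byte aligned address is reported as 16 byte accesses.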

Note
In AArch64 state, no memory accesses from a DC ZVA have single-copy atomicity of any quantity greater than
individual bytes.
If, according to these rules, an instruction is executed as a sequence of accesses, exceptions, including interrupts,
can be taken during that sequence, regardless of the memory type being accessed. If any of these exceptions are
returned from using their preferred return address, the instruction that generated the sequence of accesses is
re-executed, and so any access performed before the exception was taken is repeated. See also Taking an interrupt
or other exception during a multiple-register load or store on page D1-1909.

Note
The exception behavior for these multiple access instructions means that they are not suitable for use for writes to
memory for the purpose of software synchronization.

B2.2.2

Properties of single-copy atomic accesses
A memory access instruction that is single-copy atomic has the following properties:
1.

For a pair of overlapping single-copy atomic store instructions, all of the overlapping writes generated by one
of the stores are Coherence-after the corresponding overlapping writes generated by the other store.

2.

For a single-copy atomic load instruction L1 that overlaps a single-copy atomic store instruction S2, if one of
the overlapping reads generated by L1 Reads-from one of the overlapping writes generated by S2, then none
of the overlapping writes generated by S2 are Coherence-after the corresponding overlapping reads generated
by L1.

For more information, see Definition of the ARMv8 memory model on page B2-92.

B2.2.3

Multi-copy atomicity
In a multiprocessing system, writes to a memory location are multi-copy atomic if the following conditions are both
true:
•

All writes to the same location are serialized, meaning they are observed in the same order by all observers,
although some observers might not observe all of the writes.

•

A read of a location does not return the value of a write until all observers observe that write.

Note
Writes that are not coherent are not multi-copy atomic.
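
The first condition above, that all observers see the writes to a location in the same order although some might miss writes, can be checked with a short sketch. It assumes that a candidate serialization order is already known and that each observer's view is recorded as a list of write names. Both are modeling conveniences, and the function names are hypothetical. The second condition is not modeled.

```python
def is_subsequence(observed, order):
    """True if observed keeps the relative order of the items in order."""
    remaining = iter(order)
    # Each membership test consumes the iterator up to the matching item,
    # so later items must appear later in order.
    return all(write in remaining for write in observed)


def writes_serialized(order, observations):
    """Check that every observer's view is consistent with one order.

    observations maps an observer name to the writes it observed, in the
    order in which it observed them.
    """
    return all(is_subsequence(seen, order) for seen in observations.values())
```

Two observers that each miss a different write are still consistent with a single serialization, but two observers that see the same two writes in opposite orders are not.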

B2.2.4

Requirements for multi-copy atomicity
For Normal memory, writes are not required to be multi-copy atomic.
For Device memory, writes are not required to be multi-copy atomic.
The ARMv8 memory model is Other-multi-copy atomic. For more information, see Ordering constraints on
page B2-96.

B2.2.5

Concurrent modification and execution of instructions
The ARMv8 architecture limits the set of instructions that can be executed by one thread of execution as they are
being modified by another thread of execution without requiring explicit synchronization.
Concurrent modification and execution of instructions can lead to the resulting instruction performing any behavior
that can be achieved by executing any sequence of instructions that can be executed from the same Exception level,
except where each of the instruction before modification and the instruction after modification is one of a B, BL, BRK,
HVC, ISB, NOP, SMC, or SVC instruction.
For the B, BL, BRK, HVC, ISB, NOP, SMC, and SVC instructions the architecture guarantees that, after modification of the
instruction, behavior is consistent with execution of either:
•
The instruction originally fetched.
•
A fetch of the modified instruction.
If one thread of execution changes a conditional branch instruction, such as B or BL, to another conditional instruction
and the change affects both the condition field and the branch target, execution of the changed instruction by another
thread of execution before the change is synchronized can lead to either:
•
The old condition being associated with the new target address.
•
The new condition being associated with the old target address.
These possibilities apply regardless of whether the condition, either before or after the change to the branch
instruction, is the always condition.
For all other instructions, to avoid UNPREDICTABLE or CONSTRAINED UNPREDICTABLE behavior, instruction
modifications must be explicitly synchronized before they are executed. The required synchronization is as follows:
1.

No PE must be executing an instruction when another PE is modifying that instruction.

2.

To ensure that the modified instructions are observable, a PE that is writing the instructions must issue the
following sequence of instructions and operations:
; Coherency example for data and instruction accesses within the same Inner Shareable domain.
; Enter this code with Wt containing a new 32-bit instruction,
; to be held in Cacheable space at a location pointed to by Xn.
STR Wt, [Xn]
DC CVAU, Xn    ; Clean data cache by VA to point of unification (PoU)
DSB ISH        ; Ensure visibility of the data cleaned from cache
IC IVAU, Xn    ; Invalidate instruction cache by VA to PoU
DSB ISH        ; Ensure completion of the invalidations

Note

•

The DC CVAU operation is not required if the area of memory is either Non-cacheable or Write-Through
Cacheable.

•

If the contents of physical memory differ between the mappings, changing the mapping of VAs to PAs
can cause the instructions to be concurrently modified by one PE and executed by another PE. If the
modifications affect instructions other than those listed as being acceptable for modification,
synchronization must be used to avoid UNPREDICTABLE or CONSTRAINED UNPREDICTABLE behavior.

3.

In a multiprocessor system, the IC IVAU is broadcast to all PEs within the Inner Shareable domain of the PE
running this sequence. However, when the modified instructions are observable, each PE that is executing
the modified instructions must issue the following instruction to ensure execution of the modified
instructions:
ISB ; Synchronize fetched instruction stream

For more information about the required synchronization operation, see Synchronization and coherency issues
between data and instruction accesses on page B2-107.
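
The reason that the sequence above both cleans the data cache and invalidates the instruction cache can be illustrated with a toy model. This sketch assumes one write-back data cache, one instruction cache, and a memory that stands in for the point of unification. It does not model barriers, the ISB, timing, or any real cache organization. The class name ToySystem and its methods are hypothetical.

```python
class ToySystem:
    """Toy model of a data cache, an instruction cache, and memory."""

    def __init__(self, memory):
        self.memory = dict(memory)  # stands in for the point of unification
        self.dcache = {}            # dirty lines held by the writing PE
        self.icache = {}            # lines held by the executing PE

    def store(self, addr, value):
        # Models STR Wt, [Xn]: the new instruction lands in the data cache.
        self.dcache[addr] = value

    def dc_cvau(self, addr):
        # Models DC CVAU, Xn: clean the line so that memory holds the data.
        if addr in self.dcache:
            self.memory[addr] = self.dcache[addr]

    def ic_ivau(self, addr):
        # Models IC IVAU, Xn: discard any stale instruction cache line.
        self.icache.pop(addr, None)

    def fetch(self, addr):
        # Instruction fetch: hit in the instruction cache, else fill from memory.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]
```

In this model a fetch returns the new instruction only after both maintenance steps. Omitting the clean leaves memory stale, and omitting the invalidate leaves the instruction cache stale.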

Note
For information about memory accesses caused by instruction fetches, see Ordering relations on page B2-95.

B2.2.6

Possible implementation restrictions on using atomic instructions
In some implementations, and for some memory types, the properties of atomicity can be met only by functionality
outside the PE. Some system implementations might not support atomic instructions for all regions of the memory.
In particular, this can apply to:
•

Any type of memory in the system that does not support hardware cache coherency.

•

Device, Non-cacheable memory, or memory that is treated as Non-cacheable, in an implementation that does
support hardware cache coherency.

In such implementations, it is defined by the system:
•

Whether the atomic instructions are atomic in regard to other agents that access memory.

•

If the atomic instructions are atomic in regard to other agents that access memory, which address ranges or
memory types this applies to.

An implementation can choose which memory type is treated as Non-cacheable.
The memory types for which it is architecturally guaranteed that the atomic instructions will be atomic are:
•

Inner Shareable, Inner Write-Back, Outer Write-Back Normal memory with Read allocation hints and Write
allocation hints and not transient.

•

Outer Shareable, Inner Write-Back, Outer Write-Back Normal memory with Read allocation hints and Write
allocation hints and not transient.

If the atomic instructions are not atomic in regard to other agents that access memory, then performing an atomic
instruction to such a location can have one or more of the following effects:
•

The instruction generates a synchronous External abort.

•

The instruction generates a System Error interrupt.

•

The instruction generates an IMPLEMENTATION DEFINED MMU fault reported using the Data Abort Fault
status code of ESR_ELx.DFSC = 110101.
For the Non-secure EL1&0 translation regime, if the atomic instruction is not supported because of the
memory type that is defined in the first stage of translation, or the second stage of translation is not enabled,
then this exception is a first stage abort and is taken to EL1. Otherwise, the exception is a second stage abort
and is taken to EL2.

•

The instruction is treated as a NOP.

•

The instructions are performed, but there is no guarantee that the memory accesses were performed
atomically in regard to other agents that access memory. In this case, the instruction might also generate a
System Error interrupt.

B2.3

Definition of the ARMv8 memory model
This section describes observation and ordering in the ARMv8 memory model. It contains the following
subsections:
•
Locations.
•
Ordering and observability.
•
Ordering constraints.
•
Completion and endpoint ordering.
•
Memory barriers.
•
Limited ordering regions.
For more information about endpoint ordering of memory accesses, see Reordering on page B2-122.
In the ARMv8 memory model, the Shareability memory attribute indicates the degree to which hardware must
ensure memory coherency between a set of observers, see Memory types and attributes on page B2-114.
The ARMv8 architecture defines additional memory attributes and associated behaviors, which are defined in the
system level section of this manual. See:
•
Chapter D3 The AArch64 System Level Memory Model.
•
Chapter D4 The AArch64 Virtual Memory System Architecture.
See also Mismatched memory attributes on page B2-125.

B2.3.1

Locations
The ARMv8 memory model provides a set of definitions that are used to constrain the permitted sequences of
accesses to memory. The ARMv8 memory model defines:
•

The ordering of observation of memory accesses between different observers.

•

The ordering of arrival of memory accesses arriving at an endpoint.

•

The mechanisms to control the ordering of observation of memory accesses and the arrival of memory
accesses at an endpoint.

Locations, Memory effects, and Observers
The ARMv8 memory model provides the following definition of a Location in memory:
Location
A Location refers to a single byte in memory.
As part of its execution an instruction might generate a Memory effect. Observers in the system might observe the
Memory effects of that instruction on a Location. The ARMv8 memory model provides the following definitions
of a Memory effect and an Observer:
Memory effect
The Memory effects of an instruction are the read, write, or barrier effects of that instruction. For an
instruction that accesses memory:
•

A read effect is generated for each Location that is read by the instruction.

•

A write effect is generated for each Location that is written by the instruction.

An instruction can generate both read and write effects.
The Memory effects of an instruction I1 are said to appear in program order before the Memory
effects of instruction I2 if and only if I1 occurs before I2 in program order.
For the purposes of describing the ARMv8 memory model, all read and write effects access only
Normal memory locations in a Common Shareability Domain. Where this section refers to a read,
write, or memory barrier without any qualification, then it is referring to the corresponding Memory
effect.

Observer
An Observer refers to either a processing element, or some other memory accessing agent that can
generate reads from or writes to memory.
Common Shareability Domain
A Common Shareability Domain for a program is the smallest Shareability domain that contains all
of the active Observers of the Memory effects generated by a program.

B2.3.2

Ordering and observability
The ARMv8 memory model permits reordering of memory accesses. This section defines the constraints placed on
the reordering of memory accesses using the following:
•

Register value dependencies to establish order between instructions on a PE.

•

Ordering constraints to establish order between accesses to a Location.

Register value dependencies
The ARMv8 memory model defines the following dependencies between instructions:
Register dependency
A Register dependency from a first data value V1 to a second data value V2 exists within a PE if and
only if either:
•

The register, excluding the AArch64 zero register (XZR or WZR), that is used to hold V1 is
used in the calculation of V2.

•

There is a Register dependency from V1 to a third data value V3 and there is a Register
dependency from V3 to V2.

Register data dependency
A Register data dependency from a first data value V1 to a second data value V2 exists within a PE
if and only if either:
•

The register, excluding the AArch64 zero register (XZR or WZR) and the AArch32 PC, that
is used to hold V1 is used in the calculation of V2, and the calculation between V1 and
V2 does not consist of either:
—

A conditional branch whose condition is determined by V1.

—

A conditional selection, move, or computation whose condition is determined by V1,
where the input data values for the selection, move, or computation do not have a data
dependency on V1.

•

There is a Register data dependency from V1 to a third data value V3, and there is a Register
data dependency from V3 to V2.

Address dependency
An Address dependency from a read R1 to a subsequent read R2 exists if and only if there is a
Register data dependency from the data value that is returned by R1 to the address used by R2.
An Address dependency from a read R1 to a subsequent write W2 exists if and only if there is a
Register dependency from the data value that is returned by R1 to the address used by W2.
Data dependency
A Data dependency from a read R1 to a subsequent write W2 exists if and only if there is a Register
dependency from the data value returned by R1 to the data value written by W2.

Control dependency
A Control dependency from a read R1 to a subsequent instruction I2 exists if and only if there is a
Register dependency from the data value returned by R1 to the data value used in the evaluation of
a conditional branch or the determination of a synchronous exception on an instruction and I2 is only
executed as a result of one of the possible outcomes of that conditional branch or synchronous
exception.
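
The transitive nature of a Register dependency, on which the Address, Data, and Control dependencies above are built, can be sketched as a simple propagation over a straight-line instruction sequence. The sketch represents each instruction as a destination register and a list of source registers, ignores the aliasing of W and X register names, and does not model the additional exclusions that apply to a Register data dependency. The function name has_register_dependency is hypothetical.

```python
ZERO_REGS = {"XZR", "WZR"}


def has_register_dependency(instructions, source_reg, target_reg):
    """True if the final value in target_reg depends on source_reg.

    instructions is a list of (dest, [sources]) tuples in program order.
    source_reg holds the first data value V1 before the sequence starts.
    """
    dependent = {source_reg} - ZERO_REGS
    for dest, sources in instructions:
        if dest in ZERO_REGS:
            continue  # a write to the zero register carries no dependency
        if any(src in dependent for src in sources):
            dependent.add(dest)
        else:
            dependent.discard(dest)  # dest now holds an unrelated value
    return target_reg in dependent
```

For example, if X1 holds the value returned by a load, the sequence EOR X2, X1, X1 leaves a Register dependency from X1 to X2 even though the result is always zero, so a later access that uses X2 in its address calculation depends on the load.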

Ordering and observability at a Location
Memory effects on a Location are related by the following relations:
Reads-from
A Reads-from relation that couples reads and writes to the same Location such that each read is
paired with a single write in the program. A read R2 of a Location Reads-from a write W1 to the
same Location if and only if R2 takes its data from W1.

Note
The Reads-from relation represents a read being satisfied by a write and then returning the written
data.
Coherence order
A Coherence order relation for each Location in the program that provides a total order on all writes
from all coherent Observers to that Location, starting with a notional write of the initial value.

Note
The Coherence order of a Location represents the order in which writes to the Location arrive at
memory.
Coherence-after
A write W2 to a Location is Coherence-after another write W1 to the same Location if and only if
W2 is sequenced after W1 in the Coherence order of the Location.
A write W2 to a Location is Coherence-after a read R1 of the same Location if and only if R1
Reads-from a write W3 to the same Location and W2 is Coherence-after W3.
Overlapping accesses
Two Memory effects overlap if and only if they access the same Location. Two instructions overlap
if and only if one or more of their generated Memory effects overlap.
Observed-by
A read or a write RW1 from an Observer is Observed-by a write W2 from a different Observer if and
only if W2 is Coherence-after RW1.
A write W1 from an Observer is Observed-by a read R2 from a different Observer if and only if R2
Reads-from W1.

Note
The Observed-by relation only relates accesses generated by different Observers.
DMB FULL

A DMB FULL is a DMB with neither the LD nor the ST qualifier.
Where this section refers to DMB without any qualification, then it is referring to all types of DMB.
Unless a specific shareability domain is defined, a DMB applies to the Common Shareability Domain.
All properties that apply to DMB also apply to the corresponding DSB.

Ordering relations
In addition to the ordering relations for a single Location, the ARMv8 memory model also provides ordering
relations to describe the ordering of Memory effects to multiple Locations. These are as follows:
Dependency-ordered-before
A dependency creates externally-visible order between a read and another Memory effect generated
by the same Observer. A read R1 is Dependency-ordered-before a read or write RW2 from the same
Observer if and only if R1 appears in program order before RW2 and any of the following cases
apply:
• There is an Address dependency or a Data dependency from R1 to RW2.
• RW2 is a write W2 and there is a Control dependency from R1 to W2.
• RW2 is a read R2 generated by an instruction appearing in program order after an instruction
  I3 that generates a Context synchronization event, and there is a Control dependency from R1
  to I3.
• RW2 is a write W2 appearing in program order after a read or a write RW3 and there is an
  Address dependency from R1 to RW3.
• RW2 is a write W2 that is Coherence-after a write W3 and there is a Control dependency or a
  Data dependency from R1 to W3.
• RW2 is a read R2 that Reads-from a write W3 and there is an Address dependency or a Data
  dependency from R1 to W3.

Atomic-ordered-before
Load-Exclusive and Store-Exclusive instructions provide some ordering guarantees, even in the
absence of dependencies. A read or a write RW1 is Atomic-ordered-before a read or a write RW2
from the same Observer if and only if RW1 appears in program order before RW2 and either of the
following cases apply:
• RW1 is a read R1 and RW2 is a write W2 such that R1 and W2 are generated by an atomic
  instruction or a successful Load-Exclusive/Store-Exclusive instruction pair to the same
  Location.
• RW1 is a write W1 generated by an atomic instruction or a successful Store-Exclusive
  instruction and RW2 is a read R2 generated by an instruction with Acquire or AcquirePC
  semantics such that R2 Reads-from W1.

For more information, see Synchronization and semaphores on page B2-128.
Barrier-ordered-before
Barrier instructions order prior Memory effects before subsequent Memory effects generated by the
same Observer. A read or a write RW1 is Barrier-ordered-before a read or a write RW2 from the
same Observer if and only if RW1 appears in program order before RW2 and any of the following
cases apply:
• RW1 appears in program order before a DMB FULL or an atomic instruction with both Acquire
  and Release semantics that appears in program order before RW2.
• RW1 is a write W1 generated by an instruction with Release semantics and RW2 is a read R2
  generated by an instruction with Acquire semantics.
• RW1 is a read R1 and either:
  — R1 appears in program order before a DMB LD that appears in program order before RW2.
  — R1 is generated by an instruction with Acquire or AcquirePC semantics.
• RW2 is a write W2 and either:
  — RW1 is a write W1 appearing in program order before a DMB ST that appears in program
    order before W2.
  — W2 is generated by an instruction with Release semantics.
  — RW1 appears in program order before a write W3 generated by an instruction with
    Release semantics and W2 is Coherence-after W3.
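
The cases above can be sketched as a predicate over a program-order list of events from one Observer. This is a simplified illustration, not a complete model: it omits the atomic instruction with both Acquire and Release semantics and the case that depends on Coherence-after. The event representation (dicts with a "kind" and optional "acquire", "acquire_pc", and "release" flags) is an assumption made for this example.

```python
def barrier_ordered_before(program, i, j):
    """Check whether program[i] is Barrier-ordered-before program[j].

    program is a program-order list of dicts from one Observer. Each dict has
    a "kind" ("R", "W", "DMB_FULL", "DMB_LD" or "DMB_ST") and optional
    boolean flags "acquire", "acquire_pc" and "release".
    """
    rw1, rw2 = program[i], program[j]
    accesses = ("R", "W")
    if i >= j or rw1["kind"] not in accesses or rw2["kind"] not in accesses:
        return False
    # Barriers that sit between the two accesses in program order.
    between = {event["kind"] for event in program[i + 1:j]}
    if "DMB_FULL" in between:
        return True
    # Store-Release followed by Load-Acquire (AcquirePC does not qualify).
    if (rw1["kind"] == "W" and rw1.get("release")
            and rw2["kind"] == "R" and rw2.get("acquire")):
        return True
    # A read followed by DMB LD, or a read with Acquire/AcquirePC semantics.
    if rw1["kind"] == "R" and (
            "DMB_LD" in between or rw1.get("acquire") or rw1.get("acquire_pc")):
        return True
    # A write after DMB ST (write-to-write only), or a write with Release semantics.
    if rw2["kind"] == "W" and (
            (rw1["kind"] == "W" and "DMB_ST" in between) or rw2.get("release")):
        return True
    return False


plain_store = {"kind": "W"}
plain_load = {"kind": "R"}
dmb_st = {"kind": "DMB_ST"}
store_release = {"kind": "W", "release": True}
load_acquire = {"kind": "R", "acquire": True}
load_acquire_pc = {"kind": "R", "acquire_pc": True}

print(barrier_ordered_before([plain_store, dmb_st, plain_store], 0, 2))
print(barrier_ordered_before([plain_store, dmb_st, plain_load], 0, 2))
print(barrier_ordered_before([store_release, load_acquire], 0, 1))
print(barrier_ordered_before([store_release, load_acquire_pc], 0, 1))
```

The second and fourth calls show the two asymmetries in the definition: a DMB ST does not order a write before a later read, and a Store-Release is not ordered before a later load that has only AcquirePC semantics.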

Ordered-before
An arbitrary pair of Memory effects is ordered if it can be linked by a chain of ordered accesses
consistent with external observation. A read or a write RW1 is Ordered-before a read or a write RW2
if and only if any of the following cases apply:

• RW1 is Observed-by RW2.
• RW1 is Dependency-ordered-before RW2.
• RW1 is Atomic-ordered-before RW2.
• RW1 is Barrier-ordered-before RW2.
• RW1 is Ordered-before a read or a write that is Ordered-before RW2.

B2.3.3

Ordering constraints
The ARMv8 memory model is described as being Other-multi-copy atomic. The definition of Other-multi-copy
atomic is as follows:
Other-multi-copy atomic
In an Other-multi-copy atomic system, it is required that a write from an Observer, if observed by a
different Observer, is then observed by all other Observers that access the Location coherently. It is,
however, permitted for an Observer to observe its own writes prior to making them visible to other
observers in the system.
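
One informal way to picture this property is a system in which each Observer has a private store buffer in front of a single shared memory. The Python sketch below is an assumption made purely for illustration: the architecture does not mandate store buffers, and the class and method names are hypothetical.

```python
class ToySystem:
    """Toy model: per-Observer store buffers in front of one shared memory."""

    def __init__(self, num_observers):
        self.memory = {}
        self.buffers = [{} for _ in range(num_observers)]

    def write(self, obs, location, value):
        # The write is initially visible only to the Observer that issued it.
        self.buffers[obs][location] = value

    def drain(self, obs):
        # Draining makes the buffered writes visible to all Observers at once.
        self.memory.update(self.buffers[obs])
        self.buffers[obs].clear()

    def read(self, obs, location):
        # An Observer sees its own buffered write before anyone else does.
        if location in self.buffers[obs]:
            return self.buffers[obs][location]
        return self.memory.get(location, 0)


system = ToySystem(num_observers=3)
system.write(0, "x", 1)
early = [system.read(obs, "x") for obs in range(3)]
system.drain(0)
late = [system.read(obs, "x") for obs in range(3)]
print(early, late)
```

Before the drain only Observer 0 reads the new value. After the drain every Observer reads it, so no Observer other than the writer can see the write ahead of the rest.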
The Other-multi-copy atomic property of the ARMv8 memory model is enforced by placing constraints on the
possible executions of a program. Those executions that meet the constraints given by the ordering model are said
to be architecturally well-formed. An implementation that is executing a program is only permitted to exhibit
behavior consistent with an architecturally well-formed execution:
Architecturally well-formed
An architecturally well-formed execution must satisfy both of the following requirements:
Internal visibility requirement
For a read or a write RW1 that appears in program order before a read or a write RW2 to
the same Location, the internal visibility requirement requires that exactly one of the
following statements is true:
• RW2 is a write W2 that is Coherence-after RW1.
• RW1 is a write W1 and RW2 is a read R2 such that either:
  — R2 Reads-from W1.
  — R2 Reads-from another write that is Coherence-after W1.
• RW1 and RW2 are both reads R1 and R2 such that R1 Reads-from a write W3 and
  either:
  — R2 Reads-from W3.
  — R2 Reads-from another write that is Coherence-after W3.


Note
If a Memory effect M1 from an Observer appears in program order before a Memory
effect M2 from the same Observer, then M1 will be seen to occur before M2 by that
Observer.
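
As a minimal sketch, the three cases of the internal visibility requirement collapse into one comparison of positions in the Coherence order, once each read is replaced by the write it Reads-from. The function name, its parameters, and the event names are hypothetical.

```python
def internally_visible(rw1, rw2, coherence_order, reads_from):
    """Check the internal visibility requirement for rw1 before rw2.

    rw1 appears in program order before rw2 and both access the same Location.
    coherence_order lists the writes to that Location in Coherence order;
    reads_from maps each read to the write it takes its data from.
    """
    position = coherence_order.index
    # A read stands in for the write it Reads-from; a write stands for itself.
    first = reads_from.get(rw1, rw1)
    if rw2 in reads_from:
        # rw2 is a read: it must read the same write or a Coherence-later one.
        return position(reads_from[rw2]) >= position(first)
    # rw2 is a write: it must be Coherence-after rw1.
    return position(rw2) > position(first)


order = ["W_init", "W1", "W2"]
rf = {"R_old": "W1", "R_new": "W2"}
print(internally_visible("R_old", "R_new", order, rf))  # reads move forward
print(internally_visible("R_new", "R_old", order, rf))  # reads move backward
print(internally_visible("W2", "R_old", order, rf))     # read misses own write
```

The second and third calls describe executions that are not architecturally well-formed: a later read returning an older write, and a read returning a write that precedes the Observer's own earlier write.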

External visibility requirement
For a read or a write RW1 from an Observer that is Ordered-before a read or a write RW2
from a different Observer, the external visibility requirement requires that RW2 is not
Observed-by RW1. This means that an architecturally well-formed execution must not
exhibit a cycle in the Ordered-before relation.

Note
If a Memory effect M1 from an Observer is Ordered-before another Memory effect M2,
from a different Observer, then M1 will be seen to occur before M2 by all Observers in
the system.
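
The following sketch applies the external visibility requirement to a message-passing pattern. It assumes that Ordered-before is computed as the transitive closure of the base relations, as in the definition of Ordered-before. The event names, Observer labels, and function names are hypothetical. P0 writes data and then a flag with a barrier between them, and P1 reads the flag and then, through a dependency, the data.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(edges)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure
                 if b == c} - closure
        if not extra:
            return closure
        closure |= extra


def externally_visible_ok(observed_by, other_order, observer):
    """Check the external visibility requirement.

    observed_by: Observed-by pairs. other_order: Dependency-, Atomic- and
    Barrier-ordered-before pairs. observer: Memory effect -> Observer.
    """
    ordered_before = transitive_closure(observed_by | other_order)
    return not any(
        observer[rw1] != observer[rw2] and (rw2, rw1) in observed_by
        for (rw1, rw2) in ordered_before
    )


observer = {"W_data": "P0", "W_flag": "P0", "R_flag": "P1", "R_data": "P1"}
# Barrier on P0, dependency on P1.
other_order = {("W_data", "W_flag"), ("R_flag", "R_data")}
# P1 sees the flag but reads stale data, so W_data is Coherence-after R_data.
stale = {("W_flag", "R_flag"), ("R_data", "W_data")}
# P1 sees the flag and reads the new data.
fresh = {("W_flag", "R_flag"), ("W_data", "R_data")}

print(externally_visible_ok(stale, other_order, observer))
print(externally_visible_ok(fresh, other_order, observer))
```

In the stale case W_data is Ordered-before R_data while R_data is Observed-by W_data, which is the cycle that the requirement forbids.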

B2.3.4

Completion and endpoint ordering
Interaction between Observers in a system is not restricted to communication via shared variables in coherent
memory. For example, an Observer could configure an interrupt controller to raise an interrupt on another Observer
as a form of message passing. These interactions typically involve an additional agent, which defines the instruction
sequence that is required to establish communication links between different Observers. When these forms of
interaction are used in conjunction with shared variables, a DSB instruction can be used to enforce ordering between
them.
For all memory, the completion rules are defined as:
• A read R1 to a Location is complete for a shareability domain when all of the following are true:
  — Any write to the same Location by an Observer within the shareability domain will be Coherence-after
    R1.
  — Any translation table walks associated with R1 are complete for that shareability domain.
• A write W1 to a Location is complete for a shareability domain when all of the following are true:
  — Any write to the same Location by an Observer within the shareability domain will be Coherence-after
    W1.
  — Any read to the same Location by an Observer within the shareability domain will either Reads-from
    W1 or Reads-from a write that is Coherence-after W1.
  — Any translation table walks associated with the write are complete for that shareability domain.
• A translation table walk is complete for a shareability domain when the memory accesses, including the
  updates to translation table entries, associated with the translation table walk are complete for that
  shareability domain, and the TLB is updated.
• A cache maintenance instruction is complete for a shareability domain when the memory effects of the
  instruction are complete for that shareability domain, and any translation table walks that arise from the
  instruction are complete for that shareability domain.
• A TLB invalidate instruction is complete when all memory accesses using the TLB entries that have been
  invalidated are complete.

The completion of any cache or TLB maintenance instruction includes its completion on all PEs that are affected
by both the instruction and the DSB operation that is required to guarantee visibility of the maintenance instruction.

Note
These completion rules mean that, for example, a cache maintenance instruction that operates by VA to the PoC
completes only after memory at the PoC has been updated.
Additionally, for Device-nGnRnE memory, a read or write of a Location in a Memory-mapped peripheral that
exhibits side-effects is complete only when the read or write both:
• Can begin to affect the state of the Memory-mapped peripheral.
• Can trigger all associated side-effects, whether they affect other peripheral devices, PEs, or memory.

Note
This requirement for Device-nGnRnE memory is consistent with the memory access having reached the peripheral
endpoint.

Peripherals
This section defines a Memory-mapped peripheral and the total order of reads and writes to a peripheral, which is
defined as the Peripheral coherence order:
Memory-mapped peripheral
A Memory-mapped peripheral occupies a memory region of IMPLEMENTATION DEFINED size and
can be accessed using load and store instructions. Memory effects to a Memory-mapped peripheral
can have side-effects, such as causing the peripheral to perform an action. Values that are read from
addresses within a Memory-mapped peripheral might not correspond to the last data value written
to those addresses. As such, Memory effects to a Memory-mapped peripheral might not appear in
the Reads-from or Coherence order relations.
Peripheral coherence order
The Peripheral coherence order of a Memory-mapped peripheral is a total order on all reads and
writes to that peripheral.

Note
The Peripheral coherence order for a Memory-mapped peripheral signifies the order in which
accesses arrive at the endpoint.
For a read or a write RW1 and a read or a write RW2 to the same peripheral, RW1 will appear
in the Peripheral coherence order for the peripheral before RW2 if either of the following cases
apply:
• RW1 and RW2 are accesses using Non-cacheable or Device attributes and RW1 is
  Ordered-before RW2.
• RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE attributes and RW1
  appears in program order before RW2.

Out-of-band-ordered-before
A read or a write RW1 is Out-of-band-ordered-before a read or a write RW2 if and only if either of
the following cases apply:
• RW1 appears in program order before a DSB instruction that begins an IMPLEMENTATION
  DEFINED instruction sequence indirectly leading to the generation of RW2.
• RW1 is Ordered-before a read or a write RW3 and RW3 is Out-of-band-ordered-before RW2.

If a Memory effect M1 is Out-of-band-ordered-before a read or a write M2, then M1 is seen to occur
before M2 by all Observers.

B2.3.5

Memory barriers
Memory barrier is the general term applied to an instruction, or sequence of instructions, that forces synchronization
events by a PE with respect to retiring Load/Store instructions. The memory barriers defined by the ARMv8
architecture provide a range of functionality, including:
• Ordering of Load/Store instructions.
• Completion of Load/Store instructions.
• Context synchronization.
The following subsections describe the ARMv8 memory barrier instructions:
• Instruction Synchronization Barrier (ISB) on page B2-99.
• Data Memory Barrier (DMB) on page B2-99.
• Data Synchronization Barrier (DSB) on page B2-100.
• Shareability and access limitations on the data barrier operations on page B2-100.
• Load-Acquire, Load-AcquirePC, and Store-Release on page B2-101.
• LoadLOAcquire, StoreLORelease on page B2-102.

Note
Depending on the required synchronization, a program might use memory barriers on their own, or it might use them
in conjunction with cache maintenance and memory management instructions that in general are only available
when software execution is at EL1 or higher.
DMB and DSB instructions affect reads and writes to the memory system generated by Load/Store instructions and data
or unified cache maintenance instructions being executed by the PE. Instruction fetches or accesses caused by a
hardware translation table access are not explicit accesses.

Instruction Synchronization Barrier (ISB)
An ISB instruction ensures that all instructions that come after the ISB instruction in program order are fetched from
the cache or memory after the ISB instruction has completed. Using an ISB ensures that the effects of
context-changing operations executed before the ISB are visible to the instructions fetched after the ISB instruction.
Examples of context-changing operations that require the insertion of an ISB instruction to ensure the effects of the
operation are visible to instructions fetched after the ISB instruction are:
• Completed cache and TLB maintenance instructions.
• Changes to System registers.
Any context-changing operations appearing in program order after the ISB instruction only take effect after the ISB
has been executed.
The pseudocode function for the operation of an ISB is InstructionSynchronizationBarrier().
See also Memory barriers on page D3-2079.

Data Memory Barrier (DMB)
The DMB instruction is a memory barrier instruction that ensures the relative order of memory accesses before the
barrier with memory accesses after the barrier. The DMB instruction does not ensure the completion of any of the
memory accesses for which it ensures relative order.
The full definition of the DMB is covered formally in the Definition of the ARMv8 memory model on page B2-92 and
this introduction to the DMB instruction is not intended to contradict that section.
The basic principle of a DMB instruction is to introduce order between memory accesses that are specified to be
affected by the DMB options supplied as arguments to the DMB instruction. The DMB instruction applies to two groups
of affected memory accesses:
• Those by the PE executing the DMB that appear in program order before the DMB.
• Those that originate from a different PE, to the extent required by the DMB options, and that have been
  Observed-by the PE executing the DMB before the DMB is executed.
The DMB instruction ensures that all accesses in both groups are Observed-by each PE, to the extent required by the
DMB options, before any affected memory accesses that appear in program order after the DMB are Observed-by that PE.
The use of a DMB creates order between the Memory effects of instructions as described in the definition of
Barrier-ordered-before.
DMB only affects memory accesses and the operation of data cache and unified cache maintenance instructions, see
A64 Cache maintenance instructions on page D3-2062. It has no effect on the ordering of any other instructions
executing on the PE. A DMB instruction intended to ensure the completion of cache maintenance instructions must
have an access type of both loads and stores.
The pseudocode function for the operation of a DMB is DataMemoryBarrier().

Data Synchronization Barrier (DSB)
A DSB is a memory barrier that ensures that memory accesses that occur before the DSB have completed before the
completion of the DSB instruction. In doing this, it acts as a stronger barrier than a DMB and all ordering that is created
by a DMB with specific options is also generated by a DSB with the same options.
Execution of a DSB:
• At EL2 ensures that any memory accesses caused by speculative translation table walks from the Non-secure
  EL1&0 translation regime have been observed.
• At EL3 ensures that any memory accesses caused by speculative translation table walks from any of the
  following translation regimes have been observed:
  — The EL2 or EL2&0 translation regime.
  — The Secure EL1&0 translation regime.
  — The Non-secure EL1&0 translation regime.


For more information, see Use of out-of-context translation regimes on page D4-2103.
A DSB completes when all of the following apply:
• All explicit memory accesses that are observed by PEe, the PE executing the DSB, before the DSB is executed,
  that are of the required access types, and that are from observers in the same required shareability domain
  as PEe, are complete for the set of observers in the required shareability domain.
• If the required access types of the DSB are reads and writes, then all cache maintenance instructions and all TLB
  maintenance instructions issued by PEe before the DSB are complete for the required shareability domain.

In addition, until the DSB completes, no instruction that appears in program order after the DSB instruction can alter
any state of the system or perform any part of its functionality other than:
• Being fetched from memory and decoded.
• Reading the general-purpose, SIMD and floating-point, Special-purpose, or System registers that are directly
  or indirectly read without causing side-effects.

The pseudocode function for the operation of a DSB is DataSynchronizationBarrier().
See also Memory barriers on page D3-2079.

Shareability and access limitations on the data barrier operations
The DMB and DSB instructions take an argument that specifies:
• The shareability domain over which the instruction must operate. This is one of:
  — Full system.
  — Outer Shareable.
  — Inner Shareable.
  — Non-shareable.
• The accesses for which the instruction operates. This is one of:
  — Read and write accesses, both before and after the barrier instruction.
  — Write accesses only, before and after the barrier instruction.
  — Read accesses before the barrier instruction, and read and write accesses after the barrier instruction.

Note
This form of a DMB or DSB instruction can be described as a Load-Load/Store barrier.
For more information on whether an access is before or after a barrier instruction, see Data Memory Barrier (DMB)
on page B2-99 or Data Synchronization Barrier (DSB).
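
As a sketch of how the two parts of the argument combine, the following Python builds the option name and 4-bit CRm field value for a given shareability domain and access type. It assumes the standard A64 option names (SY, ISH, OSH, NSH, with LD and ST suffixes) and CRm encodings, which should be checked against Table B2-1 and the DMB and DSB instruction descriptions. The dictionary and function names are hypothetical.

```python
# Assumed CRm<3:2> values and name prefixes for each shareability domain.
DOMAINS = {
    "Full system": (0b11, ""),
    "Inner Shareable": (0b10, "ISH"),
    "Non-shareable": (0b01, "NSH"),
    "Outer Shareable": (0b00, "OSH"),
}
# Assumed CRm<1:0> values and name suffixes for each access type.
ACCESSES = {
    "reads and writes": (0b11, ""),
    "writes": (0b10, "ST"),
    "reads": (0b01, "LD"),  # the Load-Load/Store form
}


def barrier_option(domain, access):
    """Return (option name, CRm value) for a DMB or DSB argument."""
    domain_bits, prefix = DOMAINS[domain]
    access_bits, suffix = ACCESSES[access]
    # Full system with reads and writes has no prefix or suffix: it is SY.
    name = (prefix + suffix) or "SY"
    return name, (domain_bits << 2) | access_bits


for domain in DOMAINS:
    for access in ACCESSES:
        name, crm = barrier_option(domain, access)
        print(f"{name:<6} CRm=0b{crm:04b}  ({domain}, {access})")
```

For example, a barrier over the Inner Shareable domain that orders only writes is written as DMB ISHST under these assumptions.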

Table B2-1 shows how these options are encoded in the 
