Page Count: 7476
ARM® Architecture Reference Manual
ARMv8, for ARMv8-A architecture profile
Copyright © 2013-2018 ARM Limited or its affiliates. All rights reserved.
ARM DDI 0487D.a (ID103018)
Release Information
The following releases of this document have been made.
Release history
Date               Issue  Confidentiality                       Change
30 April 2013      A.a-1  Confidential-Beta Draft               Beta draft of first issue, limited circulation.
12 June 2013       A.a-2  Confidential-Beta Draft               Second beta draft of first issue, limited circulation.
04 September 2013  A.a    Non-Confidential Beta                 Beta release.
24 December 2013   A.b    Non-Confidential Beta                 Second beta release.
18 July 2014       A.c    Non-Confidential Beta                 Third beta release.
09 October 2014    A.d    Non-Confidential Beta                 Fourth beta release.
17 December 2014   A.e    Non-Confidential Beta                 Fifth beta release.
25 March 2015      A.f    Non-Confidential Beta                 Sixth beta release.
10 July 2015       A.g    Non-Confidential Beta                 Seventh beta release.
30 September 2015  A.h    Non-Confidential Beta                 Eighth beta release.
28 January 2016    A.i    Non-Confidential Beta                 Ninth beta release.
03 June 2016       A.j    Non-Confidential EAC                  EAC release.
30 September 2016  A.k    Non-Confidential v8.0 EAC             Updated EAC release.
31 March 2017      B.a    Non-Confidential v8.1 EAC, v8.2 Beta  Initial release incorporating ARMv8.1 and ARMv8.2.
26 September 2017  B.b    Non-Confidential v8.2 EAC             Initial v8.2 EAC release, incorporating SPE.
20 December 2017   C.a    Non-Confidential v8.3 EAC             Initial v8.3 EAC release.
31 October 2018    D.a    Non-Confidential v8.4 EAC             Initial v8.4 EAC release.
Proprietary Notice
This document is protected by copyright and other related rights and the practice or implementation of the information contained
in this document may be protected by one or more patents or pending patent applications. No part of this document may be
reproduced in any form by any means without the express prior written permission of Arm. No license, express or implied, by
estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.
Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.
THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR
PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to,
and has undertaken no analysis to identify or understand the scope and content of, patents, copyrights, trade secrets, or other rights.
This document may include technical inaccuracies or typographical errors.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure
of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof
is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to Arm’s customers
is not intended to create or refer to any partnership relationship with any other company. Arm may make changes to this document
at any time and without notice.
If any of the provisions contained in these terms conflict with any of the provisions of any click through or signed written
agreement covering this document with Arm, then the click through or signed written agreement prevails over and supersedes the
conflicting provisions of these terms. This document may be translated into other languages for convenience, and you agree that
if there is any conflict between the English version of this document and any translation, the terms of the English version of the
Agreement shall prevail.
The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks of Arm Limited (or its subsidiaries)
in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of
their respective owners. You must follow Arm’s trademark usage guidelines at
http://www.arm.com/company/policies/trademarks.
Copyright © 2013-2018 Arm Limited (or its affiliates). All rights reserved.
Arm Limited. Company 02557590 registered in England.
110 Fulbourn Road, Cambridge, England CB1 9NJ.
LES-PRE-20349
In this document, where the term ARM is used to refer to the company it means “Arm or any of its subsidiaries as appropriate”.
Note
• The term ARM can refer to versions of the ARM architecture, for example, ARMv8 refers to version 8 of the ARM architecture. The context makes it clear when the term is used in this way.
• This document describes only the ARMv8-A architecture profile. For the behaviors required by the previous version of this architecture profile, ARMv7-A, see the ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition.
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by Arm and the party that Arm delivered this document to.
Product Status
The information in this document is final, that is, for a developed product.
Web Address
http://www.arm.com
Limitations of this issue
This issue of the ARMv8 Architecture Reference Manual contains many improvements and corrections. Validation of this
document has identified the following issues that ARM will address in future issues:
• PE state on reset to AArch64 state on page D1-2167 and PE state on reset into AArch32 state on page G1-5297 require further update. Since the reset information is present in the register descriptions, this does not affect the quality status of the release.
• ARMv8.4-NV, Nested Virtualization: although the descriptions of the effects of this feature on the accessibility tables and on traps are technically correct, it is recognised that they are very difficult to read. This usability issue will be addressed in a future release.
• Appendix K12 ARM Pseudocode Definition requires further review and update. Since this appendix is informative, rather than being part of the architecture specification, this does not affect the quality status of this release.
Contents
Preface
About this manual .... xvi
Using this manual .... xviii
Conventions .... xxiv
Additional reading .... xxvi
Feedback .... xxviii
Part A ARMv8 Architecture Introduction and Overview
Chapter A1 Introduction to the ARMv8 Architecture
A1.1 About the ARM architecture .... A1-32
A1.2 Architecture profiles .... A1-34
A1.3 ARMv8 architectural concepts .... A1-36
A1.4 Supported data types .... A1-39
A1.5 Advanced SIMD and floating-point support .... A1-49
A1.6 The ARM memory model .... A1-55
A1.7 ARMv8 architecture extensions .... A1-56
Part B The AArch64 Application Level Architecture
Chapter B1 The AArch64 Application Level Programmers’ Model
B1.1 About the Application level programmers’ model .... B1-80
B1.2 Registers in AArch64 Execution state .... B1-81
B1.3 Software control features and EL0 .... B1-86
Chapter B2 The AArch64 Application Level Memory Model
B2.1 About the Arm memory model .... B2-90
B2.2 Atomicity in the Arm architecture .... B2-92
B2.3 Definition of the ARMv8 memory model .... B2-97
B2.4 Caches and memory hierarchy .... B2-111
B2.5 Alignment support .... B2-116
B2.6 Endian support .... B2-119
B2.7 Memory types and attributes .... B2-122
B2.8 Mismatched memory attributes .... B2-132
B2.9 Synchronization and semaphores .... B2-135
Part C The AArch64 Instruction Set
Chapter C1 The A64 Instruction Set
C1.1 About the A64 instruction set .... C1-150
C1.2 Structure of the A64 assembler language .... C1-151
C1.3 Address generation .... C1-157
C1.4 Instruction aliases .... C1-160
Chapter C2 About the A64 Instruction Descriptions
C2.1 Understanding the A64 instruction descriptions .... C2-162
C2.2 General information about the A64 instruction descriptions .... C2-165
Chapter C3 A64 Instruction Set Overview
C3.1 Branches, Exception generating, and System instructions .... C3-170
C3.2 Loads and stores .... C3-177
C3.3 Data processing - immediate .... C3-193
C3.4 Data processing - register .... C3-198
C3.5 Data processing - SIMD and floating-point .... C3-206
Chapter C4 A64 Instruction Set Encoding
C4.1 A64 instruction set encoding .... C4-232
Chapter C5 The A64 System Instruction Class
C5.1 The System instruction class encoding space .... C5-338
C5.2 Special-purpose registers .... C5-350
C5.3 A64 System instructions for cache maintenance .... C5-429
C5.4 A64 System instructions for address translation .... C5-452
C5.5 A64 System instructions for TLB maintenance .... C5-479
Chapter C6 A64 Base Instruction Descriptions
C6.1 About the A64 base instructions .... C6-688
C6.2 Alphabetical list of A64 base instructions .... C6-690
Chapter C7 A64 Advanced SIMD and Floating-point Instruction Descriptions
C7.1 About the A64 SIMD and floating-point instructions .... C7-1268
C7.2 Alphabetical list of A64 Advanced SIMD and floating-point instructions .... C7-1270
Part D The AArch64 System Level Architecture
Chapter D1 The AArch64 System Level Programmers’ Model
D1.1 Exception levels .... D1-2146
D1.2 Exception terminology .... D1-2147
D1.3 Execution state .... D1-2149
D1.4 Security state .... D1-2150
D1.5 Virtualization .... D1-2152
D1.6 Registers for instruction processing and exception handling .... D1-2155
D1.7 Process state, PSTATE .... D1-2161
D1.8 Program counter and stack pointer alignment .... D1-2164
D1.9 Reset .... D1-2166
D1.10 Exception entry .... D1-2170
D1.11 Exception return .... D1-2179
D1.12 The Exception level hierarchy .... D1-2183
D1.13 Synchronous exception types, routing and priorities .... D1-2190
D1.14 Asynchronous exception types, routing, masking and priorities .... D1-2198
D1.15 Configurable instruction enables and disables, and trap controls .... D1-2208
D1.16 System calls .... D1-2254
D1.17 Mechanisms for entering a low-power state .... D1-2255
D1.18 Self-hosted debug .... D1-2260
D1.19 Event monitors .... D1-2262
D1.20 Interprocessing .... D1-2263
D1.21 The effect of implementation choices on the programmers’ model .... D1-2276
Chapter D2 AArch64 Self-hosted Debug
D2.1 About self-hosted debug .... D2-2282
D2.2 The debug exception enable controls .... D2-2286
D2.3 Routing debug exceptions .... D2-2287
D2.4 Enabling debug exceptions from the current Exception level .... D2-2289
D2.5 The effect of powerdown on debug exceptions .... D2-2291
D2.6 Summary of the routing and enabling of debug exceptions .... D2-2292
D2.7 Pseudocode description of debug exceptions .... D2-2293
D2.8 Breakpoint Instruction exceptions .... D2-2294
D2.9 Breakpoint exceptions .... D2-2296
D2.10 Watchpoint exceptions .... D2-2314
D2.11 Vector Catch exceptions .... D2-2328
D2.12 Software Step exceptions .... D2-2329
D2.13 Synchronization and debug exceptions .... D2-2342
Chapter D3 AArch64 Self-hosted Trace
D3.1 About self-hosted trace .... D3-2344
D3.2 Prohibited regions in self-hosted trace .... D3-2345
D3.3 Self-hosted trace timestamps .... D3-2346
D3.4 Synchronization in self-hosted trace .... D3-2347
Chapter D4 The AArch64 System Level Memory Model
D4.1 About the memory system architecture .... D4-2350
D4.2 Address space .... D4-2351
D4.3 Mixed-endian support .... D4-2352
D4.4 Cache support .... D4-2353
D4.5 External aborts .... D4-2377
D4.6 Memory barrier instructions .... D4-2379
D4.7 Pseudocode description of general memory System instructions .... D4-2380
Chapter D5 The AArch64 Virtual Memory System Architecture
D5.1 About the Virtual Memory System Architecture (VMSA) .... D5-2384
D5.2 The VMSAv8-64 address translation system .... D5-2392
D5.3 VMSAv8-64 translation table format descriptors .... D5-2444
D5.4 Memory access control .... D5-2456
D5.5 Memory region attributes .... D5-2476
D5.6 Virtualization Host Extensions .... D5-2486
D5.7 Nested virtualization .... D5-2492
D5.8 VMSAv8-64 memory aborts .... D5-2499
D5.9 Translation Lookaside Buffers (TLBs) .... D5-2509
D5.10 TLB maintenance requirements and the TLB maintenance instructions .... D5-2515
D5.11 Caches in a VMSAv8-64 implementation .... D5-2533
Chapter D6 The Performance Monitors Extension
D6.1 About the Performance Monitors .... D6-2538
D6.2 Accuracy of the Performance Monitors .... D6-2540
D6.3 Behavior on overflow .... D6-2542
D6.4 Attributability .... D6-2544
D6.5 Effect of EL3 and EL2 .... D6-2545
D6.6 Event filtering .... D6-2547
D6.7 Performance Monitors and Debug state .... D6-2549
D6.8 Counter enables .... D6-2550
D6.9 Counter access .... D6-2551
D6.10 PMU events and event numbers .... D6-2553
D6.11 Performance Monitors Extension registers .... D6-2585
Chapter D7 The Activity Monitors Extension
D7.1 About the Activity Monitors Extension .... D7-2588
D7.2 Properties and behaviour of the activity monitors .... D7-2589
D7.3 AMU events and event numbers .... D7-2591
Chapter D8 The Statistical Profiling Extension
D8.1 About the Statistical Profiling Extension .... D8-2594
D8.2 Defining the sample population .... D8-2596
D8.3 Controlling when an operation is sampled .... D8-2598
D8.4 Enabling profiling .... D8-2601
D8.5 Filtering sample records .... D8-2602
D8.6 The profiling data .... D8-2604
D8.7 The Profiling Buffer .... D8-2609
D8.8 Profiling Buffer management .... D8-2613
D8.9 Synchronization and Statistical Profiling .... D8-2617
Chapter D9 Statistical Profiling Extension Sample Record Specification
D9.1 About the Statistical Profiling Extension Sample Records .... D9-2620
D9.2 Alphabetical list of Statistical Profiling Extension packets .... D9-2623
Chapter D10 The Generic Timer in AArch64 state
D10.1 About the Generic Timer .... D10-2646
D10.2 The AArch64 view of the Generic Timer .... D10-2650
Chapter D11 AArch64 System Register Encoding
D11.1 The System register encoding space .... D11-2656
D11.2 op0==0b10, Moves to and from debug and trace System registers .... D11-2657
D11.3 op0==0b11, Moves to and from non-debug System registers, Special-purpose registers .... D11-2659
Chapter D12 AArch64 System Register Descriptions
D12.1 About the AArch64 System registers .... D12-2674
D12.2 General system control registers .... D12-2683
D12.3 Debug registers .... D12-3210
D12.4 Performance Monitors registers .... D12-3299
D12.5 Activity Monitors registers .... D12-3343
D12.6 Statistical Profiling Extension registers .... D12-3368
D12.7 RAS registers .... D12-3404
D12.8 Generic Timer registers .... D12-3441
Part E The AArch32 Application Level Architecture
Chapter E1 The AArch32 Application Level Programmers’ Model
E1.1 About the Application level programmers’ model .... E1-3530
E1.2 The Application level programmers’ model in AArch32 state .... E1-3531
E1.3 Advanced SIMD and floating-point instructions .... E1-3542
E1.4 About the AArch32 System register interface .... E1-3553
E1.5 Exceptions .... E1-3554
Chapter E2 The AArch32 Application Level Memory Model
E2.1 About the ARM memory model .... E2-3556
E2.2 Atomicity in the ARM architecture .... E2-3558
E2.3 Definition of the ARMv8 memory model .... E2-3562
E2.4 Caches and memory hierarchy .... E2-3575
E2.5 Alignment support .... E2-3580
E2.6 Endian support .... E2-3582
E2.7 Memory types and attributes .... E2-3586
E2.8 Mismatched memory attributes .... E2-3596
E2.9 Synchronization and semaphores .... E2-3599
Part F The AArch32 Instruction Sets
Chapter F1 The AArch32 Instruction Sets Overview
F1.1 Support for instructions in different versions of the ARM architecture .... F1-3612
F1.2 Unified Assembler Language .... F1-3613
F1.3 Branch instructions .... F1-3615
F1.4 Data-processing instructions .... F1-3616
F1.5 PSTATE and banked register access instructions .... F1-3624
F1.6 Load/store instructions .... F1-3625
F1.7 Load/store multiple instructions .... F1-3628
F1.8 Miscellaneous instructions .... F1-3629
F1.9 Exception-generating and exception-handling instructions .... F1-3631
F1.10 System register access instructions .... F1-3633
F1.11 Advanced SIMD and floating-point load/store instructions .... F1-3634
F1.12 Advanced SIMD and floating-point register transfer instructions .... F1-3636
F1.13 Advanced SIMD data-processing instructions .... F1-3637
F1.14 Floating-point data-processing instructions .... F1-3647
Chapter F2 About the T32 and A32 Instruction Descriptions
F2.1 Format of instruction descriptions .... F2-3650
F2.2 Standard assembler syntax fields .... F2-3654
F2.3 Conditional execution .... F2-3655
F2.4 Shifts applied to a register .... F2-3657
F2.5 Memory accesses .... F2-3659
F2.6 Encoding of lists of general-purpose registers and the PC .... F2-3660
F2.7 General information about the T32 and A32 instruction descriptions .... F2-3661
F2.8 Additional pseudocode support for instruction descriptions .... F2-3674
F2.9 Additional information about Advanced SIMD and floating-point instructions .... F2-3675
Chapter F3 T32 Instruction Set Encoding
F3.1 T32 instruction set encoding .... F3-3682
F3.2 About the T32 Advanced SIMD and floating-point instructions and their encoding .... F3-3748
Chapter F4 A32 Instruction Set Encoding
F4.1 A32 instruction set encoding .... F4-3750
F4.2
Chapter F5 T32 and A32 Base Instruction Set Instruction Descriptions
F5.1
F5.2
Chapter F6
Chapter G1 The AArch32 System Level Programmers’ Model
G1.1 About the AArch32 System level programmers’ model .... G1-5208
G1.2 Exception levels .... G1-5209
G1.3 Exception terminology .... G1-5210
G1.4 Execution state .... G1-5212
G1.5 Instruction Set state .... G1-5214
G1.6 Security state .... G1-5215
G1.7 Security state, Exception levels, and AArch32 execution privilege .... G1-5218
G1.8 Virtualization .... G1-5220
G1.9 AArch32 state PE modes, and general-purpose and Special-purpose registers .... G1-5222
G1.10 Process state, PSTATE .... G1-5231
G1.11 Instruction set states .... G1-5237
G1.12 Handling exceptions that are taken to an Exception level using AArch32 .... G1-5239
G1.13 Routing of aborts taken to AArch32 state .... G1-5258
G1.14 Exception return to an Exception level using AArch32 .... G1-5261
G1.15 Asynchronous exception behavior for exceptions taken from AArch32 state .... G1-5266
G1.16 AArch32 state exception descriptions .... G1-5274
G1.17 Reset into AArch32 state .... G1-5296
G1.18 Mechanisms for entering a low-power state .... G1-5300
G1.19 The AArch32 System register interface .... G1-5305
G1.20 Advanced SIMD and floating-point support .... G1-5308
G1.21 Configurable instruction enables and disables, and trap controls .... G1-5314
G2.1 About self-hosted debug .... G2-5350
G2.2 The debug exception enable controls .... G2-5354
G2.3 Routing debug exceptions .... G2-5355
G2.4 Enabling debug exceptions from the current Privilege level and Security state .... G2-5357
G2.5 The effect of powerdown on debug exceptions .... G2-5359
G2.6 Summary of permitted routing and enabling of debug exceptions .... G2-5360
G2.7 Pseudocode description of debug exceptions .... G2-5362
G2.8 Breakpoint Instruction exceptions .... G2-5363
G2.9 Breakpoint exceptions .... G2-5366
G2.10 Watchpoint exceptions .... G2-5391
G2.11 Vector Catch exceptions .... G2-5405
G2.12 Synchronization and debug exceptions .... G2-5412
AArch32 Self-hosted Trace
G3.1
G3.2
G3.3
G3.4
AArch32 Self-hosted Debug
G2.1
G2.2
G2.3
G2.4
Chapter G3
Alphabetical list of Advanced SIMD and floating-point instructions ................. F6-4520
The AArch32 System Level Architecture
G1.1
G1.2
G1.3
G1.4
G1.5
G1.6
G1.7
G1.8
G1.9
Chapter G2
Alphabetical list of T32 and A32 base instruction set instructions ................... F5-3810
Encoding and use of banked register transfer instructions .............................. F5-4514
T32 and A32 Advanced SIMD and Floating-point Instruction Descriptions
F6.1
Part G
About the A32 Advanced SIMD and floating-point instructions and their encoding .......
F4-3808
About self-hosted trace ...................................................................................
Prohibited regions in self-hosted trace ............................................................
Self-hosted trace timestamps ..........................................................................
Synchronization in self-hosted trace ...............................................................
Copyright © 2013-2018 ARM Limited or its affiliates. All rights reserved.
Non-Confidential
G3-5416
G3-5417
G3-5418
G3-5419
ARM DDI 0487D.a
ID103018
Chapter G4 The AArch32 System Level Memory Model
G4.1 About the memory system architecture ... G4-5422
G4.2 Address space ... G4-5423
G4.3 Mixed-endian support ... G4-5424
G4.4 AArch32 cache and branch predictor support ... G4-5425
G4.5 System register support for IMPLEMENTATION DEFINED memory features ... G4-5448
G4.6 External aborts ... G4-5449
G4.7 Memory barrier instructions ... G4-5451
G4.8 Pseudocode description of general memory System instructions ... G4-5452
Chapter G5 The AArch32 Virtual Memory System Architecture
G5.1 About VMSAv8-32 ... G5-5456
G5.2 The effects of disabling address translation stages on VMSAv8-32 behavior ... G5-5464
G5.3 Translation tables ... G5-5468
G5.4 The VMSAv8-32 Short-descriptor translation table format ... G5-5473
G5.5 The VMSAv8-32 Long-descriptor translation table format ... G5-5482
G5.6 Memory access control ... G5-5502
G5.7 Memory region attributes ... G5-5513
G5.8 Translation Lookaside Buffers (TLBs) ... G5-5525
G5.9 TLB maintenance requirements ... G5-5529
G5.10 Caches in VMSAv8-32 ... G5-5543
G5.11 VMSAv8-32 memory aborts ... G5-5546
G5.12 Exception reporting in a VMSAv8-32 implementation ... G5-5558
G5.13 Address translation instructions ... G5-5577
G5.14 Pseudocode description of VMSAv8-32 memory system operations ... G5-5584
G5.15 About the System registers for VMSAv8-32 ... G5-5586
G5.16 Functional grouping of VMSAv8-32 System registers ... G5-5591
Chapter G6 The Generic Timer in AArch32 state
G6.1 About the Generic Timer in AArch32 state ... G6-5594
G6.2 The AArch32 view of the Generic Timer ... G6-5598
Chapter G7 AArch32 System Register Encoding
G7.1 The AArch32 System register encoding space ... G7-5606
G7.2 VMSAv8-32 organization of registers in the (coproc==0b1110) encoding space ... G7-5607
G7.3 VMSAv8-32 organization of registers in the (coproc==0b1111) encoding space ... G7-5610
Chapter G8 AArch32 System Register Descriptions
G8.1 About the AArch32 System registers ... G8-5628
G8.2 General system control registers ... G8-5643
G8.3 Debug registers ... G8-6130
G8.4 Performance Monitors registers ... G8-6231
G8.5 Activity Monitors registers ... G8-6283
G8.6 RAS registers ... G8-6311
G8.7 Generic Timer registers ... G8-6356
Part H External Debug
Chapter H1 About External Debug
H1.1 Introduction to external debug ... H1-6412
H1.2 External debug ... H1-6413
H1.3 Required debug authentication ... H1-6414
Chapter H2 Debug State
H2.1 About Debug state ... H2-6416
H2.2 Halting the PE on debug events ... H2-6417
H2.3 Entering Debug state ... H2-6424
H2.4 Behavior in Debug state ... H2-6427
H2.5 Exiting Debug state ... H2-6452
Chapter H3 Halting Debug Events
H3.1 Introduction to Halting debug events ... H3-6456
H3.2 Halting Step debug events ... H3-6458
H3.3 Halt Instruction debug event ... H3-6468
H3.4 Exception Catch debug event ... H3-6469
H3.5 External Debug Request debug event ... H3-6473
H3.6 OS Unlock Catch debug event ... H3-6474
H3.7 Reset Catch debug events ... H3-6475
H3.8 Software Access debug event ... H3-6476
H3.9 Synchronization and Halting debug events ... H3-6477
Chapter H4 The Debug Communication Channel and Instruction Transfer Register
H4.1 Introduction ... H4-6480
H4.2 DCC and ITR registers ... H4-6481
H4.3 DCC and ITR access modes ... H4-6484
H4.4 Flow control of the DCC and ITR registers ... H4-6488
H4.5 Synchronization of DCC and ITR accesses ... H4-6492
H4.6 Interrupt-driven use of the DCC ... H4-6498
H4.7 Pseudocode description of the operation of the DCC and ITR registers ... H4-6499
Chapter H5 The Embedded Cross-Trigger Interface
H5.1 About the Embedded Cross-Trigger (ECT) ... H5-6502
H5.2 Basic operation on the ECT ... H5-6504
H5.3 Cross-triggers on a PE in an ARMv8 implementation ... H5-6508
H5.4 Description and allocation of CTI triggers ... H5-6509
H5.5 CTI registers programmers’ model ... H5-6513
H5.6 Examples ... H5-6514
Chapter H6 Debug Reset and Powerdown Support
H6.1 About Debug over powerdown ... H6-6518
H6.2 Power domains and debug ... H6-6519
H6.3 Core power domain power states ... H6-6520
H6.4 Emulating low-power states ... H6-6523
H6.5 Debug OS Save and Restore sequences ... H6-6525
H6.6 Reset and debug ... H6-6529
Chapter H7 The PC Sample-based Profiling Extension
H7.1 About the PC Sample-based Profiling Extension ... H7-6532
Chapter H8 About the External Debug Registers
H8.1 Relationship between external debug and System registers ... H8-6536
H8.2 Endianness and supported access sizes ... H8-6537
H8.3 Synchronization of changes to the external debug registers ... H8-6538
H8.4 Memory-mapped accesses to the external debug interface ... H8-6542
H8.5 External debug interface register access permissions ... H8-6545
H8.6 External debug interface registers ... H8-6549
H8.7 Cross-trigger interface registers ... H8-6554
H8.8 External debug register resets ... H8-6556
Chapter H9 External Debug Register Descriptions
H9.1 About the debug registers ... H9-6560
H9.2 External debug registers ... H9-6561
H9.3 Cross-Trigger Interface registers ... H9-6666
Part I Memory-mapped Components of the ARMv8 Architecture
Chapter I1 Requirements for Memory-mapped Components
I1.1 Supported access sizes ... I1-6714
I1.2 Synchronization of memory-mapped registers ... I1-6716
I1.3 Access requirements for reserved and unallocated registers ... I1-6718
Chapter I2 System Level Implementation of the Generic Timer
I2.1 About the Generic Timer specification ... I2-6720
I2.2 Memory-mapped counter module ... I2-6722
I2.3 Memory-mapped timer components ... I2-6726
Chapter I3 Recommended External Interface to the Performance Monitors
I3.1 About the external interface to the Performance Monitors registers ... I3-6732
Chapter I4 Recommended External Interface to the Activity Monitors
I4.1 About the external interface to the Activity Monitors Extension registers ... I4-6738
Chapter I5 External System Control Register Descriptions
I5.1 About the external system control register descriptions ... I5-6740
I5.2 External Performance Monitors registers summary ... I5-6742
I5.3 Performance Monitors external register descriptions ... I5-6745
I5.4 External Activity Monitors Extension registers summary ... I5-6816
I5.5 Activity Monitors external register descriptions ... I5-6818
I5.6 Generic Timer memory-mapped registers overview ... I5-6854
I5.7 Generic Timer memory-mapped register descriptions ... I5-6855
Part J ARMv8 Pseudocode
Chapter J1 Architectural Pseudocode
J1.1 Pseudocode for AArch64 operations ... J1-6902
J1.2 Pseudocode for AArch32 operation ... J1-7008
J1.3 Shared pseudocode ... J1-7086
Part K Appendixes
Appendix K1 Architectural Constraints on UNPREDICTABLE behaviors
K1.1 AArch32 CONSTRAINED UNPREDICTABLE behaviors ... K1-7194
K1.2 AArch64 CONSTRAINED UNPREDICTABLE behaviors ... K1-7218
Appendix K2 Recommended External Debug Interface
K2.1 About the recommended external debug interface ... K2-7234
K2.2 PMUEVENT bus ... K2-7238
K2.3 Recommended authentication interface ... K2-7239
K2.4 Management registers and CoreSight compliance ... K2-7241
Appendix K3 Recommendations for Performance Monitors Event Numbers for IMPLEMENTATION DEFINED Events
K3.1 ARM recommendations for IMPLEMENTATION DEFINED event numbers ... K3-7252
K3.2 Summary of events for exceptions taken to an Exception level using AArch64 ... K3-7267
Appendix K4 Recommendations for reporting memory attributes on an interconnect
K4.1 ARM recommendations for reporting memory attributes on an interconnect ... K4-7270
Appendix K5 Additional Information for Implementations of the Generic Timer
K5.1 Providing a complete set of features in a system level implementation ... K5-7272
K5.2 Gray-count scheme for timer distribution scheme ... K5-7274
Appendix K6 Legacy Instruction Syntax for AArch32 Instruction Sets
K6.1 Legacy Instruction Syntax ... K6-7276
Appendix K7 Address translation examples
K7.1 AArch64 Address translation examples ... K7-7284
K7.2 AArch32 Address translation examples ... K7-7296
Appendix K8 Example OS Save and Restore Sequences
K8.1 Save Debug registers ... K8-7306
K8.2 Restore Debug registers ... K8-7308
Appendix K9 Recommended Upload and Download Processes for External Debug
K9.1 Using memory access mode in AArch64 state ... K9-7312
Appendix K10 Software Usage Examples
K10.1 Use of the Advanced SIMD complex number instructions ... K10-7316
K10.2 Use of the ARMv8.2 extensions to the Cryptographic Extension ... K10-7318
Appendix K11 Barrier Litmus Tests
K11.1 Introduction ... K11-7326
K11.2 Load-Acquire, Store-Release and barriers ... K11-7329
K11.3 Load-Acquire Exclusive, Store-Release Exclusive and barriers ... K11-7333
K11.4 Using a mailbox to send an interrupt ... K11-7338
K11.5 Cache and TLB maintenance instructions and barriers ... K11-7339
K11.6 ARMv7 compatible approaches for ordering, using DMB and DSB barriers ... K11-7351
Appendix K12 ARM Pseudocode Definition
K12.1 About the ARM pseudocode ... K12-7366
K12.2 Pseudocode for instruction descriptions ... K12-7367
K12.3 Data types ... K12-7369
K12.4 Operators ... K12-7374
K12.5 Statements and control structures ... K12-7380
K12.6 Built-in functions ... K12-7385
K12.7 Miscellaneous helper procedures and functions ... K12-7388
K12.8 ARM pseudocode definition index ... K12-7390
Appendix K13 Registers Index
K13.1 Introduction and register disambiguation ... K13-7394
K13.2 Alphabetical index of AArch64 registers and System instructions ... K13-7399
K13.3 Functional index of AArch64 registers and System instructions ... K13-7412
K13.4 Alphabetical index of AArch32 registers and System instructions ... K13-7426
K13.5 Functional index of AArch32 registers and System instructions ... K13-7435
K13.6 Alphabetical index of memory-mapped registers ... K13-7446
K13.7 Functional index of memory-mapped registers ... K13-7452
Glossary
Preface
This preface introduces the ARM Architecture Reference Manual, ARMv8, for ARMv8-A architecture profile. It
contains the following sections:
• About this manual on page xvi.
• Using this manual on page xviii.
• Conventions on page xxiv.
• Additional reading on page xxvi.
• Feedback on page xxviii.
Note
This document describes only the ARMv8-A architecture profile. For the behaviors required by the ARMv7-A and
ARMv7-R architecture profiles, see the ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition.
About this manual
This manual describes the ARM® architecture v8, ARMv8. The architecture describes the operation of an
ARMv8-A Processing element (PE), and this manual includes descriptions of:
• The two Execution states, AArch64 and AArch32.
• The instruction sets:
  — In AArch32 state, the A32 and T32 instruction sets, that are compatible with earlier versions of the ARM architecture.
  — In AArch64 state, the A64 instruction set.
• The states that determine how a PE operates, including the current Exception level and Security state, and in AArch32 state the PE mode.
• The Exception model.
• The interprocessing model, that supports transitioning between AArch64 state and AArch32 state.
• The memory model, that defines memory ordering and memory management. This manual covers a single architecture profile, ARMv8-A, that defines a Virtual Memory System Architecture (VMSA).
• The programmers’ model, and its interfaces to System registers that control most PE and memory system features, and provide status information.
• The Advanced SIMD and floating-point instructions, that provide high-performance:
  — Single-precision, half-precision, and double-precision floating-point operations.
  — Conversions between double-precision, single-precision, and half-precision floating-point values.
  — Integer, single-precision floating-point, and half-precision floating-point vector operations in all instruction sets, and double-precision vector operations in A64.
  — Single-precision, half-precision, and double-precision floating-point vector operations in the A64 instruction set.
• The security model, that provides two Security states to support secure applications.
• The virtualization model, that supports the virtualization of Non-secure operation.
• The Debug architecture, that provides software access to debug features.
This manual gives the assembler syntax for the instructions it describes, meaning that it describes instructions in
textual form. However, this manual is not a tutorial for ARM assembler language, nor does it describe ARM
assembler language, except at a very basic level. To make effective use of ARM assembler language, read the
documentation supplied with the assembler being used.
This manual is organized into parts:
Part A
Provides an introduction to the ARMv8-A architecture, and an overview of the AArch64 and
AArch32 Execution states.
Part B
Describes the application level view of the AArch64 Execution state, meaning the view from EL0.
It describes the application level view of the programmers’ model and the memory model.
Part C
Describes the A64 instruction set, that is available in the AArch64 Execution state. The descriptions
for each instruction also include the precise effects of each instruction when executed at EL0,
described as unprivileged execution, including any restrictions on its use, and how the effects of the
instruction differ at higher Exception levels. This information is of primary importance to authors
and users of compilers, assemblers, and other programs that generate ARM machine code.
Part D
Describes the system level view of the AArch64 Execution state. It includes details of the System
registers, most of which are not accessible from EL0, and the system level view of the programmers’
model and the memory model. This part includes the description of self-hosted debug.
Part E
Describes the application level view of the AArch32 Execution state, meaning the view from
EL0. It describes the application level view of the programmers’ model and the memory model.
Note
In AArch32 state, execution at EL0 is execution in User mode.
Part F
Describes the T32 and A32 instruction sets, that are available in the AArch32 Execution state. These
instruction sets are backwards-compatible with earlier versions of the ARM architecture. This part
describes the precise effects of each instruction when executed in User mode, described as
unprivileged execution or execution at EL0, including any restrictions on its use, and how the effects
of the instruction differ at higher Exception levels. This information is of primary importance to
authors and users of compilers, assemblers, and other programs that generate ARM machine code.
Note
User mode is the only mode where software execution is unprivileged.
Part G
Describes the system level view of the AArch32 Execution state, that is generally compatible with
earlier versions of the ARM architecture. This part includes details of the System registers, most of
which are not accessible from EL0, and the instruction interface to those registers. It also describes
the system level view of the programmers’ model and the memory model.
Part H
Describes the Debug architecture for external debug. This provides configuration, breakpoint and
watchpoint support, and a Debug Communications Channel (DCC) to a debug host.
Part I
Describes additional features of the architecture that are not closely coupled to a processing element
(PE), and therefore are accessed through memory-mapped interfaces. Some of these features are
OPTIONAL.
Part J
Provides pseudocode that describes various features of the ARMv8 architecture.
Part K, Appendixes
Provide additional information. Some appendixes give information that is not part of the ARMv8
architectural requirements. The cover page of each appendix indicates its status.
Glossary
Defines terms used in this document that have a specialized meaning.
Note
Terms that are generally well understood in the microelectronics industry are not included in the
Glossary.
Using this manual
The information in this manual is organized into parts, as described in this section.
Part A, Introduction and Architecture Overview
Part A gives an overview of the ARMv8-A architecture profile, including its relationship to the other ARM PE
architectures. It introduces the terminology used to describe the architecture, and gives an overview of the
Execution states, AArch64 and AArch32. It contains the following chapter:
Chapter A1 Introduction to the ARMv8 Architecture
Read this for an introduction to the ARMv8 architecture.
Part B, The AArch64 Application Level Architecture
Part B describes the AArch64 state application level view of the architecture. It contains the following chapters:
Chapter B1 The AArch64 Application Level Programmers’ Model
Read this for an application level description of the programmers’ model for software executing in
AArch64 state. It describes execution at EL0 when EL0 is using AArch64 state.
Chapter B2 The AArch64 Application Level Memory Model
Read this for an application level description of the memory model for software executing in
AArch64 state. It describes the memory model for execution in EL0 when EL0 is using AArch64
state. It includes information about ARM memory types, attributes, and memory access controls.
Part C, The A64 Instruction Set
Part C describes the A64 instruction set, that is used in AArch64 state. It contains the following chapters:
Chapter C1 The A64 Instruction Set
Read this for a description of the A64 instruction set and common instruction operation details.
Chapter C2 About the A64 Instruction Descriptions
Read this to understand the format of the A64 instruction descriptions.
Chapter C3 A64 Instruction Set Overview
Read this for an overview of the individual A64 instructions, that are divided into five functional
groups.
Chapter C4 A64 Instruction Set Encoding
Read this for a description of the A64 instruction set encoding.
Chapter C5 The A64 System Instruction Class
Read this for a description of the AArch64 System instructions and register descriptions, and the
System instruction class encoding space.
Chapter C6 A64 Base Instruction Descriptions
Read this for information on key aspects of the A64 base instructions and for descriptions of the
individual instructions, which are listed in alphabetical order.
Chapter C7 A64 Advanced SIMD and Floating-point Instruction Descriptions
Read this for information on key aspects of the A64 Advanced SIMD and floating-point instructions
and for descriptions of the individual instructions, which are listed in alphabetical order.
Part D, The AArch64 System Level Architecture
Part D describes the AArch64 state system level view of the architecture. It contains the following chapters:
Chapter D1 The AArch64 System Level Programmers’ Model
Read this for a description of the AArch64 state system level view of the programmers’ model.
Chapter D2 AArch64 Self-hosted Debug
Read this for an introduction to, and a description of, self-hosted debug in AArch64 state.
Chapter D3 AArch64 Self-hosted Trace
Read this for an introduction to, and a description of, self-hosted trace in AArch64 state.
Chapter D4 The AArch64 System Level Memory Model
Read this for a description of the AArch64 state system level view of the general features of the
memory system.
Chapter D5 The AArch64 Virtual Memory System Architecture
Read this for a system level view of the AArch64 Virtual Memory System Architecture (VMSA),
the memory system architecture of an ARMv8 implementation that is executing in AArch64 state.
Chapter D6 The Performance Monitors Extension
Read this for a description of an implementation of the ARM Performance Monitors, that are an
optional non-invasive debug component.
Chapter D7 The Activity Monitors Extension
Read this for a description of an implementation of the ARM Activity Monitors, an optional
non-invasive component.
Chapter D8 The Statistical Profiling Extension
Read this for a description of an implementation of the Statistical Profiling Extension, that is an
optional AArch64 state non-invasive debug component.
Chapter D9 Statistical Profiling Extension Sample Record Specification
Read this for a description of the sample records generated by the Statistical Profiling Extension.
Chapter D10 The Generic Timer in AArch64 state
Read this for a description of the AArch64 view of an implementation of the ARM Generic Timer.
Chapter D11 AArch64 System Register Encoding
Read this for a description of the encoding of the AArch64 System registers, and
the other uses of the AArch64 System registers encoding space.
Chapter D12 AArch64 System Register Descriptions
Read this for an introduction to, and description of, each of the AArch64 System registers.
Part E, The AArch32 Application Level Architecture
Part E describes the AArch32 state application level view of the architecture. It contains the following chapters:
Chapter E1 The AArch32 Application Level Programmers’ Model
Read this for an application level description of the programmers’ model for software executing in
AArch32 state. It describes execution at EL0 when EL0 is using AArch32 state.
Chapter E2 The AArch32 Application Level Memory Model
Read this for an application level description of the memory model for software executing in
AArch32 state. It describes the memory model for execution in EL0 when EL0 is using AArch32
state. It includes information about ARM memory types, attributes, and memory access controls.
Part F, The AArch32 Instruction Sets
Part F describes the T32 and A32 instruction sets, which are used in AArch32 state. It contains the following chapters:
Chapter F1 The AArch32 Instruction Sets Overview
Read this for an overview of the T32 and A32 instruction sets.
Chapter F2 About the T32 and A32 Instruction Descriptions
Read this to understand the format of the T32 and A32 instruction descriptions.
Chapter F3 T32 Instruction Set Encoding
Read this for a description of the T32 instruction set encoding. This includes the T32 encoding of
the Advanced SIMD and floating-point instructions.
Chapter F4 A32 Instruction Set Encoding
Read this for a description of the A32 instruction set encoding. This includes the A32 encoding of
the Advanced SIMD and floating-point instructions.
Chapter F5 T32 and A32 Base Instruction Set Instruction Descriptions
Read this for a description of each of the T32 and A32 base instructions.
Chapter F6 T32 and A32 Advanced SIMD and Floating-point Instruction Descriptions
Read this for a description of each of the T32 and A32 Advanced SIMD and floating-point
instructions.
Part G, The AArch32 System Level Architecture
Part G describes the AArch32 state system level view of the architecture. It contains the following chapters:
Chapter G1 The AArch32 System Level Programmers’ Model
Read this for a description of the AArch32 state system level view of the programmers’ model for
execution in an Exception level that is using AArch32.
Chapter G2 AArch32 Self-hosted Debug
Read this for an introduction to, and a description of, self-hosted debug in AArch32 state.
Chapter G3 AArch32 Self-hosted Trace
Read this for an introduction to, and a description of, self-hosted trace in AArch32 state.
Chapter G4 The AArch32 System Level Memory Model
Read this for a system level view of the general features of the memory system.
Chapter G5 The AArch32 Virtual Memory System Architecture
Read this for a description of the AArch32 Virtual Memory System Architecture (VMSA).
Chapter G6 The Generic Timer in AArch32 state
Read this for a description of the AArch32 view of an implementation of the ARM Generic Timer.
Chapter G7 AArch32 System Register Encoding
Read this for a description of the encoding of the AArch32 System registers,
including the System instructions that are part of the AArch32 System registers encoding space.
Chapter G8 AArch32 System Register Descriptions
Read this for a description of each of the AArch32 System registers.
Part H, External Debug
Part H describes the architecture for external debug. It contains the following chapters:
Chapter H1 About External Debug
Read this for an introduction to external debug, and a definition of the scope of this part of the
manual.
Chapter H2 Debug State
Read this for a description of debug state, which the PE might enter as the result of a Halting debug
event.
Chapter H3 Halting Debug Events
Read this for a description of the external debug events referred to as Halting debug events.
Chapter H4 The Debug Communication Channel and Instruction Transfer Register
Read this for a description of the communication between a debugger and the PE debug logic using
the Debug Communications Channel and the Instruction Transfer register.
Chapter H5 The Embedded Cross-Trigger Interface
Read this for a description of the embedded cross-trigger interface.
Chapter H6 Debug Reset and Powerdown Support
Read this for a description of reset and powerdown support in the Debug architecture.
Chapter H7 The PC Sample-based Profiling Extension
Read this for a description of the PC Sample-based Profiling Extension that is an OPTIONAL
extension to an ARMv8 implementation.
Chapter H8 About the External Debug Registers
Read this for some additional information about the external debug registers.
Chapter H9 External Debug Register Descriptions
Read this for a description of each external debug register.
Part I, Memory-mapped Components of the ARMv8 Architecture
Part I describes the memory-mapped components in the architecture. It contains the following chapters:
Chapter I1 Requirements for Memory-mapped Components
Read this for descriptions of some general requirements for memory-mapped components within a
system that complies with the ARMv8 Architecture.
Chapter I2 System Level Implementation of the Generic Timer
Read this for a definition of a system level implementation of the Generic Timer.
Chapter I3 Recommended External Interface to the Performance Monitors
Read this for a description of the recommended memory-mapped and external debug interfaces to
the Performance Monitors.
Chapter I4 Recommended External Interface to the Activity Monitors
Read this for a description of the recommended memory-mapped interface to the Activity Monitors.
Chapter I5 External System Control Register Descriptions
Read this for a description of each memory-mapped system control register.
Part J, Architectural Pseudocode
Part J contains pseudocode that describes various features of the ARM architecture. It contains the following
chapter:
Chapter J1 ARMv8 Pseudocode
Read this for the pseudocode definitions that describe various features of the ARMv8 architecture,
for operation in AArch64 state and in AArch32 state.
Part K, Appendixes
This manual contains the following appendixes:
Appendix K1 Architectural Constraints on UNPREDICTABLE behaviors
Read this for a description of the architecturally-required constraints on UNPREDICTABLE behaviors
in the ARMv8 architecture, including AArch32 behaviors that were UNPREDICTABLE in previous
versions of the architecture.
Appendix K2 Recommended External Debug Interface
Read this for a description of the recommended external debug interface.
Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K3 Recommendations for Performance Monitors Event Numbers for IMPLEMENTATION
DEFINED Events
Read this for a description of ARM recommendations for the use of the IMPLEMENTATION DEFINED
event numbers.
Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K4 Recommendations for reporting memory attributes on an interconnect
Read this for the ARM recommendations about how the architectural memory attributes are
reported on an interconnect.
Appendix K5 Additional Information for Implementations of the Generic Timer
Read this for additional information about implementations of the ARM Generic Timer. This
information does not form part of the architectural definition of the Generic Timer.
Appendix K6 Legacy Instruction Syntax for AArch32 Instruction Sets
Read this for information about the pre-UAL syntax of the AArch32 instruction sets, which can still
be valid for the A32 instruction set.
Appendix K7 Address translation examples
Read this for examples of translation table lookups using the translation regimes described in
Chapter D5 The AArch64 Virtual Memory System Architecture and Chapter G5 The AArch32 Virtual
Memory System Architecture.
Appendix K8 Example OS Save and Restore Sequences
Read this for software examples that perform the OS Save and Restore sequences for an ARMv8
debug implementation.
Note
Chapter H6 Debug Reset and Powerdown Support describes the OS Save and Restore mechanism.
Appendix K9 Recommended Upload and Download Processes for External Debug
Read this for the recommended upload and download processes for external debug.
Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K10 Software Usage Examples
Read this for software examples that help in understanding some aspects of the ARM architecture.
Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K11 Barrier Litmus Tests
Read this for examples of the use of barrier instructions provided by the ARMv8 architecture.
Note
This description is not part of the ARM architecture specification. It is included here as
supplementary information, for the convenience of developers and users who might require this
information.
Appendix K12 ARM Pseudocode Definition
Read this for definitions of the pseudocode used in this manual.
Appendix K13 Registers Index
Read this for an alphabetic and functional index of AArch32 and AArch64 registers, and
memory-mapped registers.
Glossary
Defines terms used in this document that have a specialized meaning.
Note
Terms that are generally well understood in the microelectronics industry are not included in the Glossary.
Conventions
The following sections describe conventions that this book can use:
•
Typographic conventions.
•
Signals on page xxv.
•
Numbers on page xxv.
•
Pseudocode descriptions on page xxv.
•
Assembler syntax descriptions on page xxv.
Typographic conventions
The typographical conventions are:
italic
Introduces special terminology, and denotes citations.
bold
Denotes signal names, and is used for terms in descriptive lists, where appropriate.
monospace
Used for assembler syntax descriptions, pseudocode, and source code examples.
Also used in the main text for instruction mnemonics and for references to other items appearing in
assembler syntax descriptions, pseudocode, and source code examples.
SMALL CAPITALS
Used in body text for a few terms that have specific technical meanings, and are defined in the
Glossary.
Colored text
Indicates a link. This can be:
•
A URL, for example http://infocenter.arm.com.
•
A cross-reference, which includes the page number of the referenced information if it is not on
the current page, for example, Assembler syntax descriptions on page xxv.
•
A link to a chapter or appendix, or to a glossary entry, or to the section of the document that
defines the colored term, for example Simple sequential execution or SCTLR.
{ and }
Braces, { and }, have two distinct uses:
Optional items
In syntax descriptions, braces enclose optional items. In the following example, they
indicate that the <shift> parameter is optional:
ADD <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}
Similarly, they can be used in generalized field descriptions. For example,
TCR_ELx.{I}PS refers to a field in the TCR_ELx registers that is called either IPS or
PS.
Sets of items
Braces can be used to enclose sets. For example, HCR_EL2.{E2H, TGE} refers to a set
of two register fields, HCR_EL2.E2H and HCR_EL2.TGE.
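The following Python sketch is not part of the architecture specification. It shows how a tool might expand one brace group in a generalized field description, assuming that a group containing a single item marks an optional part of the name, and that a group containing several comma-separated items marks a set. The function name expand_field_braces is hypothetical. The sketch handles field descriptions only, not assembler syntax descriptions, where an optional group can begin with a comma.

```python
import re
from typing import List

# Matches the first {...} group in a field description.
_BRACE_GROUP = re.compile(r"\{([^}]*)\}")


def expand_field_braces(description: str) -> List[str]:
    """Expand the first brace group in a generalized field description.

    Assumption: a group with one item, such as {I}, marks an optional part of
    the name; a group with several comma-separated items, such as {E2H, TGE},
    marks a set of fields.
    """
    match = _BRACE_GROUP.search(description)
    if match is None:
        return [description]
    prefix = description[:match.start()]
    suffix = description[match.end():]
    items = [item.strip() for item in match.group(1).split(",")]
    if len(items) == 1:
        # Optional item: the field is named with or without it.
        return [prefix + items[0] + suffix, prefix + suffix]
    # Set of items: one field name per item.
    return [prefix + item + suffix for item in items]


print(expand_field_braces("TCR_ELx.{I}PS"))       # ['TCR_ELx.IPS', 'TCR_ELx.PS']
print(expand_field_braces("HCR_EL2.{E2H, TGE}"))  # ['HCR_EL2.E2H', 'HCR_EL2.TGE']
```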
Notes
Notes are formatted as:
Note
This is a Note.
In this Manual, Notes are used only to provide additional information, usually to help understanding
of the text. While a Note may repeat architectural information given elsewhere in the Manual, a
Note never provides any part of the definition of the architecture.
Signals
In general this specification does not define hardware signals, but it does include some signal examples and
recommendations. The signal conventions are:
Signal level
The level of an asserted signal depends on whether the signal is active-HIGH or
active-LOW. Asserted means:
•
HIGH for active-HIGH signals.
•
LOW for active-LOW signals.
Lower-case n
At the start or end of a signal name denotes an active-LOW signal.
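The following Python sketch is not part of the architecture specification. It applies the signal conventions to decide whether a signal is asserted at a given level. It assumes that the rest of a signal name is upper-case, so that a lower-case n at either end can only be the active-LOW marker. The function names and the signal names nRESET, RESETn, and IRQREQ are hypothetical, used only for illustration.

```python
def is_active_low(signal_name: str) -> bool:
    """A lower-case n at the start or end of a signal name denotes active-LOW.

    Assumes the rest of the name is upper-case, so a trailing lower-case n
    can only be the active-LOW marker.
    """
    return signal_name.startswith("n") or signal_name.endswith("n")


def is_asserted(signal_name: str, level_is_high: bool) -> bool:
    """Asserted means HIGH for active-HIGH signals and LOW for active-LOW signals."""
    if is_active_low(signal_name):
        return not level_is_high
    return level_is_high


# Hypothetical signal names, used only to illustrate the convention.
print(is_asserted("nRESET", level_is_high=False))  # True: active-LOW, driven LOW
print(is_asserted("RESETn", level_is_high=True))   # False: active-LOW, driven HIGH
print(is_asserted("IRQREQ", level_is_high=True))   # True: active-HIGH, driven HIGH
```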
Numbers
Numbers are normally written in decimal. Binary numbers are preceded by 0b, and hexadecimal numbers by 0x. In
both cases, the prefix and the associated value are written in a monospace font, for example 0xFFFF0000. To improve
readability, long numbers can be written with an underscore separator between every four characters, for example
0xFFFF_0000_0000_0000. Ignore any underscores when interpreting the value of a number.
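The following Python sketch is not part of the architecture specification. It interprets a number written using these conventions: decimal by default, binary after a 0b prefix, hexadecimal after a 0x prefix, with any underscore separators ignored. The function name parse_manual_number is hypothetical.

```python
def parse_manual_number(text: str) -> int:
    """Interpret a number written using the Numbers convention of this manual.

    Decimal is the default, a 0b prefix denotes binary, and a 0x prefix
    denotes hexadecimal. Underscore separators are ignored.
    """
    digits = text.replace("_", "")
    if digits.startswith("0b"):
        return int(digits[2:], 2)
    if digits.startswith("0x"):
        return int(digits[2:], 16)
    return int(digits, 10)


print(hex(parse_manual_number("0xFFFF_0000_0000_0000")))  # 0xffff000000000000
print(parse_manual_number("0b1010"))                     # 10
```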
Pseudocode descriptions
This manual uses a form of pseudocode to provide precise descriptions of the specified functionality. This
pseudocode is written in monospace font, and is described in Appendix K12 ARM Pseudocode Definition.
Assembler syntax descriptions
This manual contains numerous syntax descriptions for assembler instructions and for components of assembler
instructions. These are shown in a monospace font, and use the conventions described in Structure of the A64
assembler language on page C1-151, Appendix K12 ARM Pseudocode Definition, and Pseudocode operators and
keywords on page K12-5648.
Additional reading
This section lists relevant publications from ARM and third parties.
See the Infocenter, http://infocenter.arm.com, for access to ARM documentation.
ARM publications
•
ARM® AMBA® 4 ATB Protocol Specification, ATBv1.0 and ATBv1.1 (ARM IHI 0032B).
•
ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition (ARM DDI 0406).
•
ARM® Architecture Reference Manual Supplement, ARMv8, for the ARMv8-R AArch32 architecture profile
(ARM DDI 0568).
•
ARM® Debug Interface Architecture Specification, ADIv6.0 (ARM IHI 0074).
•
ARM® Debug Interface Architecture Specification, ADIv5.0 to ADIv5.2 (ARM IHI 0031).
•
ARM® Embedded Trace Macrocell Architecture Specification, ETMv4 (ARM IHI 0064).
•
ARM® Generic Interrupt Controller Architecture Specification, GIC architecture version 3.0 and version 4.0
(ARM IHI 0069).
•
ARM® CoreSight™ SoC Technical Reference Manual (ARM DDI 0480).
•
ARM® CoreSight™ Architecture Specification (ARM IHI 0029).
•
ARM® Procedure Call Standard for the ARM 64-bit Architecture (ARM IHI 0055).
•
ARM® Reliability, Availability, and Serviceability (RAS) Specification, ARMv8, for the ARMv8-A architecture
profile (ARM DDI 0587).
•
ARM® Architecture Reference Manual Supplement, The Scalable Vector Extension (SVE), for ARMv8-A
(ARM DDI 0584).
•
ARM® Architecture Reference Manual Supplement, Memory System Resource Partitioning and Monitoring
(MPAM), for Armv8-A (ARM DDI 0598).
Other publications
The following publications are referred to in this manual, or provide more information:
•
Announcing the Advanced Encryption Standard (AES), Federal Information Processing Standards
Publication 197, November 2001.
•
IEEE Std 754-2008, IEEE Standard for Floating-point Arithmetic, August 2008.
•
IEEE Std 754-1985, IEEE Standard for Floating-point Arithmetic, March 1985.
•
Secure Hash Standard (SHA), Federal Information Processing Standards Publication 180-2, August 2002.
•
The Galois/Counter Mode of Operation, McGraw, D. and Viega, J., Submission to NIST Modes of Operation
Process, January 2004.
•
Memory Consistency Models for Shared-Memory Multiprocessors, Gharachorloo, Kourosh, 1995, Stanford
University Technical Report CSL-TR-95-685.
•
Standard Manufacturer’s Identification Code, JEP106, JEDEC Solid State Technology Association.
•
SM3 Cryptographic Hash Algorithm, China Internet Network Information Center (CNNIC).
•
SM4 Block Cipher Algorithm, China Internet Network Information Center (CNNIC).
•
The QARMA Block Cipher Family, Roberto Avanzi, Qualcomm Product Security Initiative.
Available from https://eprint.iacr.org/2016/444.
Feedback
ARM welcomes feedback on its documentation.
Feedback on this manual
If you have comments on the content of this manual, send e-mail to errata@arm.com. Give:
•
The title.
•
The number, ARM DDI 0487D.a.
•
The page numbers to which your comments apply.
•
A concise explanation of your comments.
ARM also welcomes general suggestions for additions and improvements.
Part A
ARMv8 Architecture Introduction and Overview
Chapter A1
Introduction to the ARMv8 Architecture
This chapter introduces the ARM architecture. It contains the following sections:
•
About the ARM architecture on page A1-32.
•
Architecture profiles on page A1-34.
•
ARMv8 architectural concepts on page A1-36.
•
Supported data types on page A1-39.
•
Advanced SIMD and floating-point support on page A1-49.
•
The ARM memory model on page A1-55.
•
ARMv8 architecture extensions on page A1-56.
A1.1 About the ARM architecture
The ARM architecture described in this Architecture Reference Manual defines the behavior of an abstract machine,
referred to as a processing element, often abbreviated to PE. Implementations compliant with the ARM architecture
must conform to the described behavior of the processing element. It is not intended to describe how to build an
implementation of the PE, nor to limit the scope of such implementations beyond the defined behaviors.
Except where the architecture specifies differently, the programmer-visible behavior of an implementation that is
compliant with the ARM architecture must be the same as a simple sequential execution of the program on the
processing element. This programmer-visible behavior does not include the execution time of the program.
The ARM Architecture Reference Manual also describes rules for software to use the processing element.
The ARM architecture includes definitions of:
•
An associated debug architecture, see:
—
Chapter D2 AArch64 Self-hosted Debug.
—
Chapter G2 AArch32 Self-hosted Debug.
—
Part H of this manual, External Debug on page 6409.
•
Associated trace architectures that define PE Trace Units that implementers can implement with the
associated processor hardware. For more information, see:
—
The Embedded Trace Macrocell Architecture Specification.
—
Chapter D3 AArch64 Self-hosted Trace.
—
Chapter G3 AArch32 Self-hosted Trace.
Note
A PE Trace Unit may be named a trace macrocell in other documentation.
The ARM architecture is a Reduced Instruction Set Computer (RISC) architecture with the following RISC
architecture features:
•
A large uniform register file.
•
A load/store architecture, where data-processing operations only operate on register contents, not directly on
memory contents.
•
Simple addressing modes, with all load/store addresses determined from register contents and instruction
fields only.
The architecture defines the interaction of the PE with memory, including caches, and includes a memory translation
system. It also describes how multiple PEs interact with each other and with other observers in a system.
This document defines the ARMv8-A architecture profile. See Architecture profiles on page A1-34 for more
information.
The ARM architecture supports implementations across a wide range of performance points. Implementation size,
performance, and very low power consumption are key attributes of the ARM architecture.
An important feature of the ARMv8 architecture is backwards compatibility, combined with the freedom for optimal
implementation in a wide range of standard and more specialized use cases. The ARMv8 architecture supports:
•
A 64-bit Execution state, AArch64.
•
A 32-bit Execution state, AArch32, that is compatible with previous versions of the ARM architecture.
Note
•
The AArch32 Execution state is compatible with the ARMv7-A architecture profile, and enhances that
profile to support some features included in the AArch64 Execution state.
•
This document describes only the ARMv8-A architecture profile. For the behaviors required by the
ARMv7-A and ARMv7-R architecture profiles, see the ARM® Architecture Reference Manual, ARMv7-A and
ARMv7-R edition.
Features that are optional are explicitly defined as such in this Manual.
Note
The presence of an ID register field for a feature does not imply that the feature is optional.
Both Execution states support SIMD and floating-point instructions:
•
AArch32 state provides:
—
SIMD instructions in the base instruction sets that operate on the 32-bit general-purpose registers.
—
Advanced SIMD instructions that operate on registers in the SIMD and floating-point register
(SIMD&FP register) file.
—
Floating-point instructions that operate on registers in the SIMD&FP register file.
•
AArch64 state provides:
—
Advanced SIMD instructions that operate on registers in the SIMD&FP register file.
—
Floating-point instructions that operate on registers in the SIMD&FP register file.
Note
See Conventions on page xxiv for information about conventions used in this manual, including the use of SMALL
CAPITALS for particular terms that have ARM-specific meanings that are defined in the Glossary.
A1.2 Architecture profiles
The ARM architecture has evolved significantly since its introduction, and ARM continues to develop it. Eight
major versions of the architecture have been defined to date, denoted by the version numbers 1 to 8. Of these, the
first three versions are now obsolete.
The generic names AArch64 and AArch32 describe the 64-bit and 32-bit Execution states:
AArch64
Is the 64-bit Execution state, meaning addresses are held in 64-bit registers, and instructions in the
base instruction set can use 64-bit registers for their processing. AArch64 state supports the A64
instruction set.
AArch32
Is the 32-bit Execution state, meaning addresses are held in 32-bit registers, and instructions in the
base instruction sets use 32-bit registers for their processing. AArch32 state supports the T32 and
A32 instruction sets.
Note
The Base instruction set comprises the supported instructions other than the Advanced SIMD and floating-point
instructions.
See sections Execution state on page A1-36 and The ARMv8 instruction sets on page A1-37 for more information.
ARM defines three architecture profiles:
A
Application profile, described in this manual:
•
Supports a Virtual Memory System Architecture (VMSA) based on a Memory Management
Unit (MMU).
•
Supports the A64, A32, and T32 instruction sets.
Note
An ARMv8-A implementation can be called an AArchv8-A implementation.
R
Real-time profile:
•
Supports a Protected Memory System Architecture (PMSA) based on a Memory Protection
Unit (MPU).
•
Supports the A32 and T32 instruction sets.
M
Microcontroller profile:
•
Implements a programmers' model designed for low-latency interrupt processing, with
hardware stacking of registers and support for writing interrupt handlers in high-level
languages.
•
Implements a variant of the R-profile PMSA.
•
Supports a variant of the T32 instruction set.
Note
This Architecture Reference Manual describes only the ARMv8-A profile.
For information about the R and M architecture profiles, and about earlier ARM architecture versions, see:
•
The ARM® Architecture Reference Manual Supplement, ARMv8, for the ARMv8-R AArch32 architecture
profile.
•
The ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition.
•
The ARM®v8-M Architecture Reference Manual.
•
The ARM®v7-M Architecture Reference Manual.
•
The ARM®v6-M Architecture Reference Manual.
A1.2.1 Debug architecture version
The ARM Debug architecture is fully integrated with the architecture, and does not have a separate version number.
A1.3 ARMv8 architectural concepts
ARMv8 introduces major changes to the ARM architecture, while maintaining a high level of consistency with
previous versions of the architecture. The ARMv8 Architecture Reference Manual includes significant changes in
the terminology used to describe the architecture, and this section introduces both the ARMv8 architectural concepts
and the associated terminology.
The following subsections describe key ARMv8 architectural concepts. Each section introduces the corresponding
terms that are used to describe the architecture:
•
Execution state.
•
The ARMv8 instruction sets on page A1-37.
•
System registers on page A1-37.
•
ARMv8 Debug on page A1-38.
A1.3.1 Execution state
The Execution state defines the PE execution environment, including:
•
The supported register widths.
•
The supported instruction sets.
•
Significant aspects of:
—
The exception model.
—
The Virtual Memory System Architecture (VMSA).
—
The programmers’ model.
The Execution states are:
AArch64
The 64-bit Execution state. This Execution state:
•
Provides 31 64-bit general-purpose registers, of which X30 is used as the procedure link
register.
•
Provides a 64-bit program counter (PC), stack pointers (SPs), and exception link registers
(ELRs).
•
Provides 32 128-bit registers for SIMD vector and scalar floating-point support.
•
Provides a single instruction set, A64. For more information, see The ARMv8 instruction sets
on page A1-37.
•
Defines the ARMv8 Exception model, with up to four Exception levels, EL0 - EL3, that
provide an execution privilege hierarchy, see Exception levels on page D1-2146.
•
Provides support for 64-bit virtual addressing. For more information, including the limits on
address ranges, see Chapter D5 The AArch64 Virtual Memory System Architecture.
•
Defines a number of Process state (PSTATE) elements that hold PE state. The A64
instruction set includes instructions that operate directly on various PSTATE elements.
•
Names each System register using a suffix that indicates the lowest Exception level at which
the register can be accessed.
AArch32
The 32-bit Execution state. This Execution state:
•
Provides 13 32-bit general-purpose registers, and a 32-bit PC, SP, and link register (LR). The
LR is used as both an ELR and a procedure link register.
Some of these registers have multiple banked instances for use in different PE modes.
•
Provides a single ELR, for exception returns from Hyp mode.
•
Provides 32 64-bit registers for Advanced SIMD vector and scalar floating-point support.
•
Provides two instruction sets, A32 and T32. For more information, see The ARMv8
instruction sets on page A1-37.
•
Supports the ARMv7-A exception model, based on PE modes, and maps this onto the
ARMv8 Exception model, that is based on the Exception levels.
•
Provides support for 32-bit virtual addressing.
•
Defines a number of Process state (PSTATE) elements that hold PE state. The A32 and T32
instruction sets include instructions that operate directly on various PSTATE elements, and
instructions that access PSTATE by using the Application Program Status Register (APSR)
or the Current Program Status Register (CPSR).
Later subsections give more information about the different properties of the Execution states.
Transferring control between the AArch64 and AArch32 Execution states is known as interprocessing. The PE can
move between Execution states only on a change of Exception level, and subject to the rules given in
Interprocessing on page D1-2263. This means different software layers, such as an application, an operating system
kernel, and a hypervisor, executing at different Exception levels, can execute in different Execution states.
A1.3.2 The ARMv8 instruction sets
In ARMv8 the possible instruction sets depend on the Execution state:
AArch64
AArch64 state supports only a single instruction set, called A64. This is a fixed-length instruction
set that uses 32-bit instruction encodings.
For information on the A64 instruction set, see Chapter C3 A64 Instruction Set Overview.
AArch32
AArch32 state supports the following instruction sets:
A32
This is a fixed-length instruction set that uses 32-bit instruction encodings.
T32
This is a variable-length instruction set that uses both 16-bit and 32-bit instruction
encodings.
In previous documentation, these instruction sets were called the ARM and Thumb instruction sets.
ARMv8 extends each of these instruction sets. In AArch32 state, the Instruction set state determines
the instruction set that the PE executes.
For information on the A32 and T32 instruction sets, see Chapter F1 The AArch32 Instruction Sets
Overview.
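Because T32 is a variable-length instruction set, the length of each T32 instruction is determined from its first halfword. The following Python sketch is not part of the architecture specification. It applies the rule used by the T32 instruction set encoding (see Chapter F3): if bits[15:11] of the first halfword are 0b11101, 0b11110, or 0b11111, the instruction is 32 bits long; otherwise, it is a 16-bit instruction. The function name t32_instruction_size is hypothetical, and the halfword values are chosen only for illustration.

```python
def t32_instruction_size(first_halfword: int) -> int:
    """Return the size in bytes of a T32 instruction, given its first halfword."""
    if not 0 <= first_halfword <= 0xFFFF:
        raise ValueError("a halfword must be in the range 0x0000-0xFFFF")
    top_bits = (first_halfword >> 11) & 0b11111  # bits[15:11]
    if top_bits in (0b11101, 0b11110, 0b11111):
        return 4  # first halfword of a 32-bit instruction
    return 2      # a complete 16-bit instruction


print(t32_instruction_size(0x4770))  # 2
print(t32_instruction_size(0xE7FE))  # 2: bits[15:11] are 0b11100
print(t32_instruction_size(0xF000))  # 4: bits[15:11] are 0b11110
```

A decoder that walks a T32 instruction stream can use this check to advance by 2 or 4 bytes before decoding the next instruction. The A32 and A64 instruction sets do not need such a check, because all of their encodings are 32 bits.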
The ARMv8 instruction sets support SIMD and scalar floating-point instructions. See Advanced SIMD and
floating-point support on page A1-49.
A1.3.3 System registers
System registers provide control and status information for architected features.
The System registers use a standard naming format: <register>.<field> to identify specific
registers as well as control and status bits within a register.
Bits can also be described by their numerical position in the form <register>[x:y] or the generic form
bits[x:y].
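The following Python sketch is not part of the architecture specification. It shows what the bits[x:y] notation selects: the field from bit x down to bit y, inclusive, returned as an unsigned value. The function name bits and the register value are illustrative only, and are not taken from any real register.

```python
def bits(value: int, x: int, y: int) -> int:
    """Return bits[x:y] of value: the field from bit x down to bit y, inclusive."""
    if x < y or y < 0:
        raise ValueError("expected x >= y >= 0")
    width = x - y + 1
    return (value >> y) & ((1 << width) - 1)


# Illustrative 32-bit register value, not taken from any real register.
reg = 0x0000_A5C3
print(hex(bits(reg, 15, 8)))  # 0xa5
print(hex(bits(reg, 3, 0)))   # 0x3
print(bits(reg, 7, 7))        # 1
```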
In addition, in AArch64 state, most register names include the lowest Exception level that can access the register as
a suffix to the register name:
•
_ELx, where x is 0, 1, 2, or 3.
For information about Exception levels, see Exception levels on page D1-2146.
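The following Python sketch is not part of the architecture specification. It splits an AArch64 System register name into its base name and the Exception level given by its _ELx suffix, and checks only the naming rule that the register is not accessible at an Exception level lower than that suffix. The function names are hypothetical. Traps and other controls, described elsewhere in this manual, can still prevent an access that this check allows, and the sketch does not handle names that lack a single _ELx suffix.

```python
import re
from typing import Optional, Tuple

# A name such as SCTLR_EL1: a base name followed by _EL0, _EL1, _EL2, or _EL3.
_EL_SUFFIX = re.compile(r"^(?P<base>[A-Z0-9_]+?)_EL(?P<el>[0-3])$")


def split_el_suffix(name: str) -> Optional[Tuple[str, int]]:
    """Split a System register name into (base name, lowest Exception level).

    Returns None if the name does not end in a single _ELx suffix.
    """
    match = _EL_SUFFIX.match(name)
    if match is None:
        return None
    return match.group("base"), int(match.group("el"))


def suffix_permits(name: str, current_el: int) -> bool:
    """Check only the naming rule: no access below the suffix Exception level.

    Traps and other controls can still prevent an access that this check allows.
    """
    parsed = split_el_suffix(name)
    if parsed is None:
        raise ValueError(f"no _ELx suffix in {name!r}")
    return current_el >= parsed[1]


print(split_el_suffix("SCTLR_EL1"))   # ('SCTLR', 1)
print(suffix_permits("HCR_EL2", 1))   # False
```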
The System registers comprise:
•
The following registers that are described in this manual:
—
General system control registers.
—
Debug registers.
—
Generic Timer registers.
—
Optionally, Performance Monitor registers.
—
Optionally, the Activity Monitors registers.
•
Optionally, one or more of the following groups of registers that are defined in other ARM architecture
specifications:
—
Trace System registers, as defined in the Embedded Trace Macrocell Architecture Specification,
ETMv4.
—
Scalable Vector Extension System registers, as defined in the ARM® Architecture Reference Manual
Supplement, The Scalable Vector Extension (SVE), for ARMv8-A.
—
Statistical Profiling Extension System registers, as defined in the ARM® Architecture Reference
Manual Supplement, The Statistical Profiling Extension, for ARMv8-A.
—
Generic Interrupt Controller (GIC) System registers, see The ARM Generic Interrupt Controller
System registers.
•
RAS Extension System registers, as defined in the ARM® Reliability, Availability, and Serviceability (RAS)
Specification, ARMv8, for the ARMv8-A architecture profile. The RAS Extension is a mandatory extension
to the ARMv8.2 architecture, and an optional extension to the ARMv8.0 and the ARMv8.1 architectures.
For information about the AArch64 System registers, see Chapter D12 AArch64 System Register Descriptions.
For information about the AArch32 System registers, see Chapter G8 AArch32 System Register Descriptions.
The ARM Generic Interrupt Controller System registers
From version 3 of the ARM Generic Interrupt Controller architecture, GICv3, the GIC architecture specification
defines a System register interface to some of its functionality. The System register summaries in this manual
include these registers, see:
•
About the GIC System registers on page D11-2671, for more information about the AArch64 GIC System
registers.
•
About the GIC System registers on page G7-5624, for more information about the AArch32 GIC System
registers.
These sections give only short overviews of the GIC System registers. For more information, including descriptions
of the registers, see the ARM® Generic Interrupt Controller Architecture Specification, GIC architecture version 3.0
and version 4.0 (ARM IHI 0069).
Note
The programmers’ model for earlier versions of the GIC architecture is wholly memory-mapped.
A1.3.4
ARMv8 Debug
ARMv8 supports the following debug models:
Self-hosted debug
In this model, the PE generates debug exceptions. Debug exceptions are part of the ARMv8
Exception model.
External debug
In this model, debug events cause the PE to enter Debug state. In Debug state, the PE is controlled
by an external debugger.
All ARMv8 implementations support both models. The model chosen by a particular user depends on the debug
requirements during different stages of the design and development life cycle of the product. For example, external
debug might be used during debugging of the hardware implementation and OS bring-up, and self-hosted debug
might be used during application development.
For more information about self-hosted debug:
•
In AArch64 state, see Chapter D2 AArch64 Self-hosted Debug.
•
In AArch32 state, see Chapter G2 AArch32 Self-hosted Debug.
For more information about external debug, see Part H External Debug on page 6409.
A1.4
Supported data types
The ARMv8 architecture supports the following integer data types:
Byte
8 bits.
Halfword
16 bits.
Word
32 bits.
Doubleword
64 bits.
Quadword
128 bits.
The architecture also supports the following floating-point data types:
•
Half-precision, see Half-precision floating-point formats on page A1-43 for details.
•
Single-precision, see Single-precision floating-point format on page A1-45 for details.
•
Double-precision, see Double-precision floating-point format on page A1-46 for details.
It also supports:
•
Fixed-point interpretation of words and doublewords. See Fixed-point format on page A1-47.
•
Vectors, where a register holds multiple elements, each of the same data type. See Vector formats on
page A1-40 for details.
The ARMv8 architecture provides two register files:
•
A general-purpose register file.
•
A SIMD&FP register file.
In each of these, the possible register widths depend on the Execution state.
In AArch64 state:
•
A general-purpose register file contains 64-bit registers:
—
Many instructions can access these registers as 64-bit registers or as 32-bit registers, using only the
bottom 32 bits.
•
A SIMD&FP register file contains 128-bit registers:
—
The quadword integer data types only apply to the SIMD&FP register file.
—
The floating-point data types only apply to the SIMD&FP register file.
—
While the AArch64 vector registers support 128-bit vectors, the effective vector length can be 64 bits
or 128 bits depending on the A64 instruction encoding used, see Instruction Mnemonics on
page C1-153.
For more information on the register files in AArch64 state, see Registers in AArch64 Execution state on
page B1-81.
In AArch32 state:
•
A general-purpose register file contains 32-bit registers:
—
Two 32-bit registers can support a doubleword.
—
Vector formatting is supported, see Figure A1-4 on page A1-43.
•
A SIMD&FP register file contains 64-bit registers:
—
AArch32 state does not support quadword integer or floating-point data types.
Note
Two consecutive 64-bit registers can be used as a 128-bit register.
For more information on the register files in AArch32 state, see The general-purpose registers, and the PC, in
AArch32 state on page E1-3533.
A1.4.1
Vector formats
In an implementation that includes the SIMD instructions that operate on the SIMD&FP register file, a register can
hold one or more packed elements, all of the same size and type. The combination of a register and a data type
describes a vector of elements. The vector is considered to be an array of elements of the data type specified in the
instruction. The number of elements in the vector is implied by the size of the data elements and the size of the
register.
Vector indices are in the range 0 to (number of elements – 1). An index of 0 refers to the least significant end of the
vector.
Vector formats in AArch64 state
In AArch64 state, the SIMD&FP registers can be referred to as Vn, where n is a value from 0 to 31.
The SIMD&FP registers support three data formats for loads, stores, and data-processing operations:
•
A single, scalar, element in the least significant bits of the register.
•
A 64-bit vector of byte, halfword, or word elements.
•
A 128-bit vector of byte, halfword, word, or doubleword elements.
The element sizes are defined in Table A1-1 with the vector format described as:
•
For a 128-bit vector: Vn{.2D, .4S, .8H, .16B}.
•
For a 64-bit vector: Vn{.1D, .2S, .4H, .8B}.
Table A1-1 SIMD elements in AArch64 state
Mnemonic    Size
B           8 bits
H           16 bits
S           32 bits
D           64 bits
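The following Python sketch shows how an arrangement suffix maps a register value onto elements, with element [0] taken from the least significant end. It models a register as a Python integer, and the lanes() helper is a hypothetical name that is not part of any ARM tooling.

```python
# Element sizes from Table A1-1, in bits.
ELEMENT_BITS = {"B": 8, "H": 16, "S": 32, "D": 64}


def lanes(value: int, arrangement: str) -> list:
    """Split a register value into elements, returning element [0] first.

    arrangement is a suffix such as "2D", "4S", "8H" or "16B" (128-bit vector)
    or "1D", "2S", "4H" or "8B" (64-bit vector). For a 64-bit vector only
    bits[63:0] of value are used.
    """
    count, size = int(arrangement[:-1]), ELEMENT_BITS[arrangement[-1]]
    if count * size not in (64, 128):
        raise ValueError("arrangement must describe a 64-bit or 128-bit vector")
    mask = (1 << size) - 1
    # Element [i] occupies bits[(i+1)*size-1 : i*size] of the register.
    return [(value >> (i * size)) & mask for i in range(count)]


# Usage:
#   v = 0x0F0E0D0C0B0A09080706050403020100
#   lanes(v, "4S") -> [0x03020100, 0x07060504, 0x0B0A0908, 0x0F0E0D0C]
```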
Figure A1-1 on page A1-41 shows the SIMD vectors in AArch64 state.
[Figure: Vn as a 128-bit vector of 64-bit elements (.2D, [1] to [0]), 32-bit elements (.4S, [3] to [0]), 16-bit elements (.8H, [7] to [0]), or 8-bit elements (.16B, [15] to [0]); and Vn as a 64-bit vector, bits[63:0], of 32-bit elements (.2S, [1] to [0]), 16-bit elements (.4H, [3] to [0]), or 8-bit elements (.8B, [7] to [0]). Element [0] is at the least significant end.]
Figure A1-1 SIMD vectors in AArch64 state
Vector formats in AArch32 state
Table A1-2 shows the available formats. Each instruction description specifies the data types that the instruction
supports.
Table A1-2 Advanced SIMD data types in AArch32 state
Data type specifier    Meaning
.<size>                Any element of <size> bits
.F<size>               Floating-point number of <size> bits
.I<size>               Signed or unsigned integer of <size> bits
.P<size>               Polynomial over {0, 1} of degree less than <size>
.S<size>               Signed integer of <size> bits
.U<size>               Unsigned integer of <size> bits
Polynomial arithmetic over {0, 1} on page A1-48 describes the polynomial data type.
The .F16 data type is the half-precision data type selected by the FPSCR.AHP bit, see Half-precision floating-point
formats on page A1-43.
The .F32 data type is the ARM standard single-precision floating-point data type, see Single-precision
floating-point format on page A1-45.
The instruction definitions use a data type specifier to define the data types appropriate to the operation. Figure A1-2
shows the hierarchy of the Advanced SIMD data types.
[Figure: each size specifier contains the specifiers listed after it, and each .I<size> contains .S<size> and .U<size>.]
.8    : .I8 (.S8, .U8), .P8
.16   : .I16 (.S16, .U16), .P16 †, .F16
.32   : .I32 (.S32, .U32), .F32
.64   : .I64 (.S64, .U64), .P64 ‡
† Output format only. See VMULL instruction description.
‡ Available only if the Cryptographic Extension is implemented. See VMULL instruction description.
Figure A1-2 Advanced SIMD data type hierarchy in AArch32 state
For example, a multiply instruction must distinguish between integer and floating-point data types.
An integer multiply instruction that generates a double-width (long) result must specify the input data types as
signed or unsigned. However, some integer multiply instructions use modulo arithmetic, and therefore do not have
to distinguish between signed and unsigned inputs.
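The following Python sketch uses 8-bit elements as an assumed example width. It shows that a same-width product, which is the product modulo 256, gives the same bit pattern whichever way the inputs are interpreted, while a double-width product does not. The helper names are hypothetical.

```python
def to_signed(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mul_modulo(a: int, b: int, bits: int = 8) -> int:
    """Same-width product: keep only the low `bits` bits of a * b."""
    return (a * b) & ((1 << bits) - 1)


def mul_long(a: int, b: int, bits: int = 8, signed: bool = False) -> int:
    """Double-width (long) product, returned as a (2 * bits)-bit pattern."""
    if signed:
        a, b = to_signed(a, bits), to_signed(b, bits)
    else:
        a, b = a & ((1 << bits) - 1), b & ((1 << bits) - 1)
    return (a * b) & ((1 << (2 * bits)) - 1)


# 0xFF is 255 when unsigned and -1 when signed:
#   mul_modulo(0xFF, 0x02) and mul_modulo(-1, 2) both give 0xFE.
#   mul_long(0xFF, 0x02) gives 0x01FE, but mul_long(0xFF, 0x02, signed=True) gives 0xFFFE.
```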
Figure A1-3 on page A1-43 shows the Advanced SIMD vectors in AArch32 state.
Note
In AArch32 state, a pair of even and following odd numbered doubleword registers can be concatenated and treated
as a single quadword register.
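As a sketch of this pairing, assume the doubleword registers are modeled as a Python list of 64-bit integers. This is a hypothetical model, not an architectural interface. Qn is then formed from D(2n) as its low half and D(2n+1) as its high half.

```python
MASK64 = (1 << 64) - 1


def q_value(d_regs: list, n: int) -> int:
    """Return Qn as a 128-bit integer: D(2n+1) is the high half, D(2n) the low half."""
    low, high = d_regs[2 * n], d_regs[2 * n + 1]
    return ((high & MASK64) << 64) | (low & MASK64)


def d_values(q: int) -> tuple:
    """Split a 128-bit Qn value back into (D(2n), D(2n+1))."""
    return q & MASK64, (q >> 64) & MASK64
```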
[Figure: Qn as a 128-bit vector of double-precision (64-bit) elements (.64, [1] to [0]), single-precision (32-bit) elements (.32, [3] to [0]), 16-bit elements (.16, [7] to [0]), or 8-bit elements (.8, [15] to [0]); and Dn as a 64-bit vector of 32-bit elements (.32, [1] to [0]), 16-bit elements (.16, [3] to [0]), or 8-bit elements (.8, [7] to [0]). Element [0] is at the least significant end.]
Figure A1-3 Advanced SIMD vectors in AArch32 state
The AArch32 general-purpose registers support vector formats for use by the SIMD instructions in the base
instruction set. Figure A1-4 shows these formats: a general-purpose register can be treated as either
two halfwords or four bytes.
[Figure: 32-bit general-purpose register Rn as a set of two halfwords (.16, [1] to [0]) or as a set of four bytes (.8, [3] to [0]), with element [0] at the least significant end.]
Figure A1-4 Vector formatting in AArch32 state
A1.4.2
Half-precision floating-point formats
ARMv8 supports two half-precision floating-point formats:
•
IEEE half-precision, as described in the IEEE 754-2008 standard.
•
ARM alternative half-precision format.
Both formats can be used for conversions to and from other floating-point formats. FPCR.AHP controls the format
in AArch64 state and FPSCR.AHP controls the format in AArch32 state. ARMv8.2-FP16 adds half-precision
data-processing instructions, which always use the IEEE format. These instructions ignore the value of the relevant
AHP field, and behave as if it has an Effective value of 0.
The description of IEEE half-precision includes ARM-specific details that are left open by the standard, and is only
an introduction to the formats and to the values they can contain. For more information, especially on the handling
of infinities, NaNs, and signed zeros, see the IEEE 754 standard.
For both half-precision floating-point formats, the layout of the 16-bit format is the same. The format is:
S: bit[15]; exponent: bits[14:10]; fraction: bits[9:0].
The interpretation of the format depends on the value of the exponent field, bits[14:10], and on which half-precision
format is being used.
0 < exponent < 0x1F
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{(\mathrm{exponent}-15)} \times (1.\mathrm{fraction})$
The minimum positive normalized number is $2^{-14}$, or approximately $6.104 \times 10^{-5}$.
The maximum positive normalized number is $(2 - 2^{-10}) \times 2^{15}$, or 65504.
Larger normalized numbers can be expressed using the alternative format when the
exponent == 0x1F.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros:
+0
when S==0
–0
when S==1.
fraction != 0
The value is a denormalized number and is equal to:
$(-1)^{S} \times 2^{-14} \times (0.\mathrm{fraction})$
The minimum positive denormalized number is $2^{-24}$, or approximately $5.960 \times 10^{-8}$.
Half-precision denormalized numbers are not flushed to zero by default. When ARMv8.2-FP16 is
implemented, the FPCR.FZ16 bit controls whether Flush-to-Zero mode is enabled for half-precision
data-processing instructions. For details, see Flush-to-zero on page A1-52.
exponent == 0x1F
The value depends on which half-precision format is being used:
IEEE half-precision
The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:
fraction == 0
The value is an infinity. There are two distinct infinities:
+infinity
When S==0. This represents all positive numbers that are too
big to be represented accurately as a normalized number.
-infinity
When S==1. This represents all negative numbers with an
absolute value that is too big to be represented accurately as a
normalized number.
fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant fraction
bit, bit[9]:
bit[9] == 0 The NaN is a signaling NaN. The sign bit can take any value,
and the remaining fraction bits can take any value except all
zeros.
bit[9] == 1 The NaN is a quiet NaN. The sign bit and remaining fraction
bits can take any value.
Alternative half-precision
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{16} \times (1.\mathrm{fraction})$
The maximum positive normalized number is $(2 - 2^{-10}) \times 2^{16}$, or 131008.
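The rules above can be checked with a small decoder. This Python sketch evaluates a 16-bit pattern in either format. The alternative flag stands in for an AHP value of 1. NaNs are returned as a plain Python NaN, without their sign or payload. The function name is hypothetical.

```python
import math


def decode_half(bits: int, alternative: bool = False) -> float:
    """Return the value of a 16-bit half-precision pattern.

    alternative=False: IEEE half-precision.
    alternative=True:  ARM alternative half-precision, where
                       exponent == 0x1F encodes normalized numbers.
    """
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    sign = -1.0 if bits & 0x8000 else 1.0
    exponent = (bits >> 10) & 0x1F      # bits[14:10]
    fraction = bits & 0x3FF             # bits[9:0]
    if exponent == 0:
        # Zero or denormalized: (-1)^S * 2^-14 * (0.fraction)
        return sign * math.ldexp(fraction, -14 - 10)
    if exponent == 0x1F and not alternative:
        # Infinity, or a NaN (quiet if bit[9] == 1, signaling otherwise).
        return sign * math.inf if fraction == 0 else math.nan
    # Normalized: (-1)^S * 2^(exponent-15) * (1.fraction)
    return sign * math.ldexp(0x400 | fraction, exponent - 15 - 10)


# Usage: the largest finite value in each format.
#   decode_half(0x7BFF)                   -> 65504.0
#   decode_half(0x7FFF, alternative=True) -> 131008.0
```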
A1.4.3
Single-precision floating-point format
The single-precision floating-point format is as defined by the IEEE 754 standard.
This description includes ARM-specific details that are left open by the standard. It is only intended as an
introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities,
NaNs, and signed zeros, see the IEEE 754 standard.
A single-precision value is a 32-bit word with the format:
S: bit[31]; exponent: bits[30:23]; fraction: bits[22:0].
The interpretation of the format depends on the value of the exponent field, bits[30:23]:
0 < exponent < 0xFF
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{(\mathrm{exponent}-127)} \times (1.\mathrm{fraction})$
The minimum positive normalized number is $2^{-126}$, or approximately $1.175 \times 10^{-38}$.
The maximum positive normalized number is $(2 - 2^{-23}) \times 2^{127}$, or approximately $3.403 \times 10^{38}$.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros:
+0
When S==0.
–0
When S==1.
These usually behave identically. In particular, the result is equal if +0 and –0 are
compared as floating-point numbers. However, they yield different results in some
circumstances. For example, the sign of the infinity produced as the result of dividing
by zero depends on the sign of the zero. The two zeros can be distinguished from each
other by performing an integer comparison of the two words.
fraction != 0
The value is a denormalized number and is equal to:
$(-1)^{S} \times 2^{-126} \times (0.\mathrm{fraction})$
The minimum positive denormalized number is $2^{-149}$, or approximately $1.401 \times 10^{-45}$.
Denormalized numbers are always flushed to zero in Advanced SIMD processing in AArch32 state.
They are optionally flushed to zero in floating-point processing and in Advanced SIMD processing
in AArch64 state. For details, see Flush-to-zero on page A1-52.
exponent == 0xFF
The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:
fraction == 0
The value is an infinity. There are two distinct infinities:
+infinity
When S==0. This represents all positive numbers that are too big to be
represented accurately as a normalized number.
-infinity
When S==1. This represents all negative numbers with an absolute value
that is too big to be represented accurately as a normalized number.
fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant fraction bit, bit[22]:
bit[22] == 0
The NaN is a signaling NaN. The sign bit can take any value, and the
remaining fraction bits can take any value except all zeros.
bit[22] == 1
The NaN is a quiet NaN. The sign bit and remaining fraction bits can take
any value.
For details of the default NaN, see NaN handling and the Default NaN on page A1-53.
Note
NaNs with different sign or fraction bits are distinct NaNs, but this does not mean software can use floating-point
comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN compares
as unordered with everything, including itself.
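The following Python sketch applies the classification rules. It assumes that the host uses IEEE 754 formats, so that struct can produce single-precision encodings. The helper names are hypothetical. The last two lines illustrate the earlier point about signed zeros: a floating-point comparison treats +0 and –0 as equal, but an integer comparison of their encodings tells them apart.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit single-precision encoding of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def classify_single(bits: int) -> str:
    """Classify a 32-bit pattern using the exponent and fraction rules above."""
    exponent = (bits >> 23) & 0xFF       # bits[30:23]
    fraction = bits & 0x7FFFFF           # bits[22:0]
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # The most significant fraction bit, bit[22], selects quiet or signaling.
        return "quiet NaN" if fraction & (1 << 22) else "signaling NaN"
    return "normal"


print(0.0 == -0.0)                               # True
print(float32_bits(0.0) == float32_bits(-0.0))   # False
```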
A1.4.4
Double-precision floating-point format
The double-precision floating-point format is as defined by the IEEE 754 standard. Double-precision floating-point
is supported by both SIMD and floating-point instructions in AArch64 state, and only by floating-point instructions
in AArch32 state.
This description includes implementation-specific details that are left open by the standard. It is only intended as an
introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities,
NaNs, and signed zeros, see the IEEE 754 standard.
A double-precision value is a 64-bit doubleword, with the format:
S: bit[63]; exponent: bits[62:52]; fraction: bits[51:0].
Double-precision values represent numbers, infinities, and NaNs in a similar way to single-precision values, with
the interpretation of the format depending on the value of the exponent:
0 < exponent < 0x7FF
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{(\mathrm{exponent}-1023)} \times (1.\mathrm{fraction})$
The minimum positive normalized number is $2^{-1022}$, or approximately $2.225 \times 10^{-308}$.
The maximum positive normalized number is $(2 - 2^{-52}) \times 2^{1023}$, or approximately $1.798 \times 10^{308}$.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros that behave in the same way as the two
single-precision zeros:
+0
when S==0
–0
when S==1.
fraction != 0
The value is a denormalized number and is equal to:
$(-1)^{S} \times 2^{-1022} \times (0.\mathrm{fraction})$
The minimum positive denormalized number is $2^{-1074}$, or approximately $4.941 \times 10^{-324}$.
Optionally, denormalized numbers are flushed to zero in floating-point calculations. For details, see
Flush-to-zero on page A1-52.
exponent == 0x7FF
The value is either an infinity or a NaN, depending on the fraction bits:
fraction == 0
The value is an infinity. As for single-precision, there are two infinities:
+infinity When S==0.
-infinity When S==1.
fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant fraction bit, bit[51] of
the doubleword:
bit[51] == 0
The NaN is a signaling NaN. The sign bit can take any value, and the
remaining fraction bits can take any value except all zeros.
bit[51] == 1
The NaN is a quiet NaN. The sign bit and the remaining fraction bits can
take any value.
For details of the default NaN, see NaN handling and the Default NaN on page A1-53.
Note
NaNs with different sign or fraction bits are distinct NaNs, but this does not mean software can use floating-point
comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN compares
as unordered with everything, including itself.
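To cross-check the formulas, this Python sketch evaluates a 64-bit pattern directly and compares the result with the host's own binary64 interpretation through struct. It assumes an IEEE 754 host, which is the case for CPython on common platforms. The helper names are hypothetical. The final lines illustrate the Note: two NaNs with different fraction bits have distinct encodings, but a floating-point comparison reports them as unequal, and reports each one as unequal to itself.

```python
import math
import struct


def decode_double(bits: int) -> float:
    """Evaluate a 64-bit pattern with the formulas above (NaNs lose their payload)."""
    sign = -1.0 if (bits >> 63) & 1 else 1.0
    exponent = (bits >> 52) & 0x7FF          # bits[62:52]
    fraction = bits & ((1 << 52) - 1)        # bits[51:0]
    if exponent == 0:                        # zero or denormalized
        return sign * math.ldexp(fraction, -1022 - 52)
    if exponent == 0x7FF:                    # infinity or NaN
        return sign * math.inf if fraction == 0 else math.nan
    # Normalized: (-1)^S * 2^(exponent-1023) * (1.fraction)
    return sign * math.ldexp((1 << 52) | fraction, exponent - 1023 - 52)


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern using the host's binary64 format."""
    return struct.unpack("<d", bits.to_bytes(8, "little"))[0]


# Two quiet NaNs (bit[51] set) with different fraction bits.
nan_a = bits_to_double(0x7FF8000000000001)
nan_b = bits_to_double(0x7FF8000000000002)
print(nan_a == nan_b, nan_a == nan_a)   # False False
```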
A1.4.5
Fixed-point format
Fixed-point formats are used only for conversions between floating-point and fixed-point values. They apply to
general-purpose registers.
Fixed-point values can be signed or unsigned, and can be 16-bit or 32-bit. Conversion instructions take an argument
that specifies the number of fraction bits in the fixed-point number. That is, it specifies the position of the binary
point.
A1.4.6
Conversion between floating-point and fixed-point values
ARMv8 supports the conversion of a scalar floating-point to or from a signed or unsigned fixed-point value in a
general-purpose register.
The instruction argument #fbits indicates that the general-purpose register holds a fixed-point number with fbits bits
after the binary point, where fbits is in the range 1 to 64 for a 64-bit general-purpose register, or 1 to 32 for a 32-bit
general-purpose register.
More specifically:
•
For a 64-bit register Xd:
—
The integer part is Xd[63:#fbits].
—
The fractional part is Xd[(#fbits-1):0].
•
For a 32-bit register Wd or Rd:
—
The integer part is Wd[31:#fbits] or Rd[31:#fbits].
—
The fractional part is Wd[(#fbits-1):0] or Rd[(#fbits-1):0].
These instructions can cause the following floating-point exceptions:
Invalid Operation
When the floating-point input is NaN or Infinity or when a numerical value cannot be
represented within the destination register.
Inexact
When the numeric result differs from the input value.
Input Denormal
When Flush-to-zero mode is enabled and the denormal input is replaced by a zero.
Note
An out-of-range fixed-point result is saturated to the destination size.
For more information, see Floating-point exceptions and exception traps on page D1-2196.
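The following Python sketch converts in both directions under stated assumptions. Conversion to fixed point rounds toward zero, as the A64 FCVTZS and FCVTZU fixed-point forms do. A NaN input is assumed to produce 0. Exceptions are reported as a set of flag names rather than through FPSR, and Input Denormal handling is not modeled. The function names and flag strings are hypothetical.

```python
import math
from fractions import Fraction


def float_to_fixed(value: float, fbits: int, width: int = 32, signed: bool = True):
    """Convert value to a fixed-point integer with fbits bits after the binary point.

    Returns (integer, flags). Rounds toward zero and saturates out-of-range results.
    """
    if not 1 <= fbits <= width:
        raise ValueError("fbits must be in the range 1 to width")
    lo = -(1 << (width - 1)) if signed else 0
    hi = (1 << (width - 1)) - 1 if signed else (1 << width) - 1
    if math.isnan(value):
        return 0, {"InvalidOp"}                 # assumption: NaN converts to 0
    if math.isinf(value):
        return (hi if value > 0 else lo), {"InvalidOp"}
    scaled = Fraction(value) * (1 << fbits)     # exact value * 2**fbits
    result = math.trunc(scaled)                 # round toward zero
    if result > hi or result < lo:
        return (hi if result > hi else lo), {"InvalidOp"}   # saturate
    return result, ({"Inexact"} if result != scaled else set())


def fixed_to_float(pattern: int, fbits: int, width: int = 32, signed: bool = True) -> float:
    """Interpret a width-bit register pattern as fixed point with fbits fraction bits."""
    pattern &= (1 << width) - 1
    if signed and pattern >> (width - 1):
        pattern -= 1 << width                   # two's complement integer part
    return pattern / (1 << fbits)               # correctly rounded int/int division
```

For example, 1.5 with 16 fraction bits becomes 98304 (0x18000): the integer part, bits[31:16], is 1, and the fractional part, bits[15:0], is 0x8000.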
A1.4.7
Polynomial arithmetic over {0, 1}
Some SIMD instructions that operate on SIMD&FP registers can operate on polynomials over {0, 1}, see Supported
data types on page A1-39. The polynomial data type represents a polynomial in x of the form
$b_{n-1}x^{n-1} + \cdots + b_1 x + b_0$, where $b_k$ is bit[k] of the value.
The coefficients 0 and 1 are manipulated using the rules of Boolean arithmetic, that is, arithmetic modulo 2:
•
0 + 0 = 1 + 1 = 0
•
0 + 1 = 1 + 0 = 1
•
0 × 0 = 0 × 1 = 1 × 0 = 0
•
1 × 1 = 1.
That is:
•
Adding two polynomials over {0, 1} is the same as a bitwise exclusive OR.
•
Multiplying two polynomials over {0, 1} is the same as integer multiplication except that partial products are
exclusive-ORed instead of being added.
A64, A32, and T32 provide instructions for performing polynomial multiplication of 8-bit values.
•
For AArch32, see VMUL (integer and polynomial) on page F6-4903 and VMULL (integer and polynomial)
on page F6-4909.
•
For AArch64, see PMUL on page C7-1728 and PMULL, PMULL2 on page C7-1730.
The Cryptographic Extension adds the ability to perform long polynomial multiplies of 64-bit values. See PMULL,
PMULL2 on page C7-1730.
Pseudocode description of polynomial multiplication
In pseudocode, polynomial addition is described by the EOR operation on bitstrings.
Polynomial multiplication is described by the PolynomialMult() function defined in Chapter J1 ARMv8 Pseudocode.
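The two rules can be sketched in Python as follows. This is an illustrative model, not the PolynomialMult() pseudocode function. pmul8() and pmull8() are hypothetical names for the truncated 8-bit result and the full 16-bit (long) result of multiplying two 8-bit polynomials.

```python
def poly_add(a: int, b: int) -> int:
    """Polynomial addition over {0, 1}: a bitwise exclusive OR."""
    return a ^ b


def poly_mult(a: int, b: int) -> int:
    """Polynomial multiplication over {0, 1} (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # partial products are exclusive-ORed, not added
        a <<= 1
        b >>= 1
    return result


def pmul8(a: int, b: int) -> int:
    """Low 8 bits of the product of two 8-bit polynomials."""
    return poly_mult(a & 0xFF, b & 0xFF) & 0xFF


def pmull8(a: int, b: int) -> int:
    """Full 16-bit (long) product of two 8-bit polynomials."""
    return poly_mult(a & 0xFF, b & 0xFF)
```

For example, $(x + 1)^2 = x^2 + 1$ because the two middle terms cancel, so poly_mult(0b11, 0b11) is 0b101, whereas integer multiplication gives 0b1001.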
A1.5
Advanced SIMD and floating-point support
Note
In AArch32 state, the SIMD instructions that operate on SIMD&FP registers are always described as the Advanced
SIMD instructions, to distinguish them from the SIMD instructions in the base instruction sets, that operate on the
32-bit general-purpose registers. The A64 instruction set does not provide any SIMD instructions that operate on
the general-purpose registers, and therefore some AArch64 state descriptions use SIMD as a synonym for Advanced
SIMD. Unless the context clearly indicates otherwise, this section describes the support for SIMD instructions that
operate on SIMD&FP registers.
An ARMv8 implementation can provide one of the following levels of support for Advanced SIMD and floating-point instructions:
•
Full SIMD and floating-point support without exception trapping.
•
Full SIMD and floating-point support with exception trapping.
•
No floating-point or SIMD support. This option is licensed only for implementations targeting specialized
markets.
Note
All systems that support standard operating systems with rich application environments provide hardware
support for Advanced SIMD and floating-point. It is a requirement of the ARM Procedure Call Standard for
AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.
ARMv8 supports single-precision (32-bit) and double-precision (64-bit) floating-point data types and arithmetic as
defined by the IEEE 754 floating-point standard. It also supports the half-precision (16-bit) floating-point data type
for data storage, by supporting conversions between single-precision and half-precision data types and
double-precision and half-precision data types. When ARMv8.2-FP16 is implemented, it also supports the
half-precision floating-point data type for data-processing operations.
The SIMD instructions provide packed Single Instruction Multiple Data (SIMD) and single-element scalar
operations, and support:
•
Single-precision and double-precision arithmetic in AArch64 state.
•
Single-precision arithmetic only in AArch32 state.
•
When ARMv8.2-FP16 is implemented, half-precision arithmetic is supported in AArch64 and AArch32
states.
Floating-point support in AArch64 state SIMD is IEEE 754-2008 compliant with:
•
Configurable rounding modes.
•
Configurable Default NaN behavior.
•
Configurable Flush-to-zero behavior.
Floating-point computation using AArch32 Advanced SIMD instructions remains unchanged from ARMv7. A32
and T32 Advanced SIMD floating-point always uses ARM standard floating-point arithmetic and performs
IEEE 754 floating-point arithmetic with the following restrictions:
•
Denormalized numbers are flushed to zero, see Flush-to-zero on page A1-52.
•
Only default NaNs are supported, see NaN handling and the Default NaN on page A1-53.
•
The Round to Nearest rounding mode is used.
•
Untrapped floating-point exception handling is used for all floating-point exceptions.
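The first restriction can be sketched at the bit level for single-precision inputs. This Python sketch uses a hypothetical function name. It assumes that a flushed input is replaced by a zero of the same sign, and it returns a second value that indicates whether the input was denormalized.

```python
def flush_single_to_zero(bits: int):
    """Apply flush-to-zero to a 32-bit single-precision input pattern.

    Returns (new_bits, input_denormal). A denormalized input (exponent == 0,
    fraction != 0) is replaced by a zero that keeps the original sign bit.
    """
    exponent = (bits >> 23) & 0xFF      # bits[30:23]
    fraction = bits & 0x7FFFFF          # bits[22:0]
    if exponent == 0 and fraction != 0:
        return bits & 0x80000000, True  # keep S, clear exponent and fraction
    return bits, False
```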
If floating-point exception trapping is supported, floating-point exceptions, such as Overflow or Divide by Zero,
can be handled without trapping. This applies to both SIMD and floating-point operations. When handled in this
way, a floating-point exception causes a cumulative status register bit to be set to 1 and a default result to be
produced by the operation. For more information about floating-point exceptions, see Floating-point exceptions and
exception traps on page D1-2196.
In AArch64 state, the following registers control floating-point operation and return floating-point status
information:
•
The Floating-Point Control Register, FPCR, controls:
—
The half-precision format where applicable, FPCR.AHP bit.
—
Default NaN behavior, FPCR.DN bit.
—
Flush-to-zero behavior, FPCR.{FZ, FZ16} bits. If ARMv8.2-FP16 is not implemented, FPCR.FZ16
is RES0.
—
Rounding mode support, FPCR.Rmode field.
—
Len and Stride fields associated with execution in AArch32 state, and only supported for a context
save and restore from AArch64 state. These fields are obsolete in ARMv8 and can be implemented as
RAZ/WI. If they are implemented as RW and are programmed to a nonzero value, they make some
AArch32 floating-point instructions UNDEFINED.
—
Floating-point exception trap controls, the FPCR.{IDE, IXE, UFE, OFE, DZE, IOE} bits, see
Floating-point exceptions and exception traps on page D1-2196.
•
The Floating-Point Status Register, FPSR, provides:
—
Cumulative floating-point exception flags, FPSR.{IDC, IXC, UFC, OFC, DZC, IOC}, and the cumulative saturation flag, FPSR.QC.
—
The AArch32 floating-point comparison flags {N,Z,C,V}. These bits are RES0 if AArch32
floating-point is not implemented.
Note
In AArch64 state, the process state flags, PSTATE.{N,Z,C,V} are used for all data-processing
compares and any associated conditional execution.
AArch32 state provides a single Floating-Point Status and Control Register, FPSCR, combining the FPCR and
FPSR fields.
For system level information about the SIMD and floating-point support, see Advanced SIMD and floating-point
support on page G1-5308.
A1.5.1
Instruction support
The Advanced SIMD and floating-point instructions support:
•
Load and store for single elements and vectors of multiple elements.
Note
Single elements are also referred to as scalar elements.
•
Data processing on single and multiple elements for both integer and floating-point data types.
•
When ARMv8.3-CompNum is implemented, complex number arithmetic.
•
Floating-point conversion between different levels of precision.
•
Conversion between floating-point, fixed-point integer, and integer data types.
•
Floating-point rounding.
For more information on the SIMD and floating-point instructions in AArch64 state, see Chapter C3 A64
Instruction Set Overview.
For more information on the Advanced SIMD and floating-point instructions in AArch32 state, see Chapter F1 The
AArch32 Instruction Sets Overview.
A1.5.2
Floating-point standards, and terminology
The ARM architecture includes support for all the required features of ANSI/IEEE Std 754-2008, IEEE Standard for
Floating-Point Arithmetic, referred to as IEEE 754-2008. However, some terms in this manual are based on the
1985 version of this standard, referred to as IEEE 754-1985:
•
ARM floating-point terminology generally uses the IEEE 754-1985 terms. This section summarizes how
IEEE 754-2008 changes these terms.
•
References to IEEE 754 that do not include the issue year apply to either issue of the standard.
Table A1-3 shows how the terminology in this manual differs from that used in IEEE 754-2008.
Table A1-3 Floating-point terminology
This manual                           IEEE 754-2008
Normalized (a)                        Normal
Denormal, or denormalized             Subnormal
Round towards Minus Infinity (RM)     roundTowardNegative
Round towards Plus Infinity (RP)      roundTowardPositive
Round towards Zero (RZ)               roundTowardZero
Round to Nearest (RN)                 roundTiesToEven
Round to Nearest with Ties to Away    roundTiesToAway
Rounding mode                         Rounding-direction attribute
(a) Normalized number is used in preference to normal number, because of the other specific uses of normal in
this manual.
A1.5.3
ARM standard floating-point input and output values
ARMv8 provides full IEEE 754 floating-point arithmetic support. In AArch32 state, floating-point operations
performed using Advanced SIMD instructions are limited to ARM standard floating-point operation, regardless of
the rounding mode selected in the FPSCR. In AArch64 state, by contrast, Advanced SIMD floating-point arithmetic
is performed using the rounding mode selected by the FPCR.
ARM standard floating-point arithmetic supports the following input formats defined by the IEEE 754
floating-point standard:
•
Zeros.
•
Normalized numbers.
•
Denormalized numbers, although these are flushed to zero before floating-point operations, see Flush-to-zero on page A1-52.
•
NaNs.
•
Infinities.
ARM standard floating-point arithmetic supports the Round to Nearest (roundTiesToEven) rounding mode defined
by the IEEE 754 standard.
ARM standard floating-point arithmetic supports the following output result formats defined by the IEEE 754
standard:
•
Zeros.
•
Normalized numbers.
•
Results whose magnitude is less than the minimum normalized number are flushed to zero, see Flush-to-zero on
page A1-52.
•
NaNs produced in floating-point operations are always the default NaN, see NaN handling and the Default
NaN on page A1-53.
•
Infinities.
A1.5.4
Flush-to-zero
The performance of floating-point processing can be reduced when doing calculations involving denormalized
numbers and Underflow exceptions. In many algorithms, this performance can be recovered, without significantly
affecting the accuracy of the final result, by replacing the denormalized operands and intermediate results with
zeros. To permit this optimization, ARM floating-point implementations allow a Flush-to-zero mode to be used for
different floating-point formats as follows:
For AArch64:
•
If FPCR.FZ==1, then Flush-to-Zero mode is used for all Single-Precision and Double-Precision inputs and
outputs of all instructions.
•
If FPCR.FZ16==1, then Flush-to-Zero mode is used for all Half-Precision inputs and outputs of
floating-point instructions, other than:
—
Conversions between Half-Precision and Single-Precision numbers.
—
Conversions between Half-Precision and Double-Precision numbers.
For AArch32:
•
If FPSCR.FZ==0, then Flush-to-Zero mode is used for all Single-Precision and Double-Precision inputs and
outputs of all Advanced SIMD floating-point instructions.
•
If FPSCR.FZ==1, then Flush-to-Zero mode is used for all Single-Precision and Double-Precision inputs and
outputs of all instructions.
•
If FPSCR.FZ16==1, then Flush-to-Zero mode is used for all Half-Precision inputs and outputs of
floating-point instructions, other than:
—
Conversions between Half-Precision and Single-Precision numbers.
—
Conversions between Half-Precision and Double-Precision numbers.
If Flush-To-Zero mode is used on a Single-precision or Double-precision input:
•
All inputs to floating-point operations that are denormalized numbers in their represented precision are
treated as though they were zero with the same sign as the input, and an Input Denormal floating-point
exception is generated.
Note
The Input Denormal floating-point exception occurs only in Flush-to-zero mode.
•
In AArch32 state, the FPSCR contains a cumulative exception bit FPSCR.IDC and optional trap enable bit
FPSCR.IDE corresponding to the Input Denormal floating-point exception.
•
In AArch64 state, the FPSR contains a cumulative exception bit FPSR.IDC and optional trap enable bit
FPCR.IDE corresponding to the Input Denormal floating-point exception.
•
The occurrence of all floating-point exceptions except Input Denormal is determined using the input values
that are treated as zero by this mechanism.
If Flush-To-Zero mode is used on a Half-precision input:
•
All inputs to floating-point operations that are denormalized numbers in their represented precision are
treated as though they were zero with the same sign as the input.
Note
When ARMv8.2-FP16 is implemented and Flush-to-zero mode is in use, a half-precision floating-point number
that is flushed to zero does not generate an Input Denormal floating-point exception. This is because this
situation is much less exceptional for half-precision numbers than for single-precision or double-precision
denormalized numbers.
•
The occurrence of all floating-point exceptions is determined using the input values that are treated as zero
by this mechanism.
If Flush-To-Zero mode is used on any output of an instruction:
•
The output is returned as zero, with the same sign bit as the result, if the result before rounding of the
operation specified by the instruction satisfies the condition:
0 < Abs(result) < MinNorm, where:
—
MinNorm is 2^-14 for half-precision.
—
MinNorm is 2^-126 for single-precision.
—
MinNorm is 2^-1022 for double-precision.
If this occurs, then:
—
An Underflow Exception is generated, but in all implementations, the Underflow Exception is not
trapped even if the AArch32 FPSCR.UFE==1 or the AArch64 FPCR.UFE==1.
—
An Inexact Exception is not generated.
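The input and output rules above can be traced with a small model. The following is a minimal sketch, assuming single-precision operands passed as raw 32-bit patterns and Flush-to-zero mode enabled. The function names (flush_input, ftz_fmul) and the flag labels are hypothetical, only Input Denormal and Underflow are modeled, and only multiplication is shown because the exact product of two single-precision values fits in a Python float, which makes the "result before rounding" directly available.

```python
import math
import struct

MIN_NORM_SP = 2.0 ** -126  # MinNorm for single precision


def to_bits(x: float) -> int:
    """Round x to single precision (round to nearest, ties to even) and return its bits."""
    try:
        return struct.unpack("<I", struct.pack("<f", x))[0]
    except OverflowError:
        # struct.pack rejects finite values that round to infinity.
        return 0xFF800000 if x < 0 else 0x7F800000


def from_bits(b: int) -> float:
    """Return the value of a single-precision bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def flush_input(b: int) -> tuple[int, bool]:
    """Replace a denormalized input with a zero of the same sign.

    Returns (bits, denormal_seen); denormal_seen models an Input Denormal exception.
    """
    exponent, fraction = (b >> 23) & 0xFF, b & 0x7FFFFF
    if exponent == 0 and fraction != 0:
        return b & 0x80000000, True  # keep only the sign bit
    return b, False


def ftz_fmul(a: int, b: int) -> tuple[int, set[str]]:
    """Single-precision multiply with Flush-to-zero applied to inputs and output."""
    flags: set[str] = set()
    a, a_den = flush_input(a)
    b, b_den = flush_input(b)
    if a_den or b_den:
        flags.add("InputDenormal")
    # Exact in double precision: two 24-bit significands need at most 48 bits.
    exact = from_bits(a) * from_bits(b)
    if 0 < abs(exact) < MIN_NORM_SP:
        flags.add("Underflow")  # untrapped Underflow; no Inexact is generated
        return to_bits(math.copysign(0.0, exact)), flags
    # Inexact and the other exceptions are not modeled on this path.
    return to_bits(exact), flags


# 2^-100 * 2^-30 = 2^-130 is below MinNorm, so the result becomes +0.
print(hex(ftz_fmul(0x0D800000, 0x30800000)[0]))  # prints 0x0
```

Note that the flush decision uses the unrounded product, so a result just below MinNorm is flushed even if rounding would have carried it up to MinNorm.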
Note
Flush-to-zero mode is incompatible with the IEEE 754 standard, and must not be used when IEEE 754 compatibility
is a requirement. Flush-to-zero mode must be used with care. Although it can improve performance on some
algorithms, there are significant limitations on its use. These are application dependent:
•
On many algorithms, it has no noticeable effect, because the algorithm does not normally use denormalized
numbers.
•
On other algorithms, it can cause exceptions to occur or seriously reduce the accuracy of the results of the
algorithm.
A1.5.5
NaN handling and the Default NaN
The IEEE 754 standard specifies that:
•
An operation that causes an Invalid Operation floating-point exception generates a quiet NaN as its result if
that exception is untrapped.
•
An operation involving a quiet NaN operand, but not a signaling NaN operand, returns an input NaN as its
result.
The floating-point processing behavior when Default NaN mode is disabled adheres to this, with the following
additions:
•
If an untrapped Invalid Operation floating-point exception occurs, the quiet NaN result is derived from:
—
The first signaling NaN operand, if the exception occurs because at least one of the operands is a
signaling NaN.
—
Otherwise, the default NaN.
•
If an untrapped Invalid Operation floating-point exception does not occur, but at least one of the operands is
a quiet NaN, the result is derived from the first quiet NaN operand.
Depending on the operation, the exact value of a derived quiet NaN result might differ in both sign and number of
fraction bits from its source. For a quiet NaN result derived from a signaling NaN operand, the most-significant
fraction bit is set to 1.
Note
•
In these descriptions, first operand relates to the left-to-right ordering of the arguments to the pseudocode
function that describes the operation.
•
The IEEE 754 standard specifies that the sign bit of a NaN has no significance.
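The selection rules above, together with Default NaN mode described in the following paragraphs, can be sketched for a single-precision operation. This is a minimal sketch under stated assumptions: operands are raw 32-bit patterns in the left-to-right argument order, the Invalid Operation exception is untrapped, and a NaN derived from an operand keeps that operand's sign and payload with the quiet bit set, which holds for same-precision arithmetic but, as noted above, not necessarily for every operation. Invalid Operation exceptions caused by non-NaN operands, such as infinity minus infinity, are not modeled. The function names are hypothetical.

```python
from typing import Optional

DEFAULT_NAN_SP = 0x7FC00000  # single-precision default NaN, see Table A1-4
QUIET_BIT_SP = 1 << 22       # most-significant fraction bit


def is_nan(b: int) -> bool:
    return ((b >> 23) & 0xFF) == 0xFF and (b & 0x7FFFFF) != 0


def is_snan(b: int) -> bool:
    return is_nan(b) and not (b & QUIET_BIT_SP)


def nan_result(operands: list[int], default_nan_mode: bool = False) -> tuple[Optional[int], bool]:
    """Return (result_bits, invalid_operation) for an operation on these operands.

    Returns (None, False) when no operand is a NaN.
    """
    snans = [b for b in operands if is_snan(b)]
    qnans = [b for b in operands if is_nan(b) and not is_snan(b)]
    if snans:
        # Any signaling NaN raises Invalid Operation and takes priority over
        # quiet NaNs, wherever they appear in the operand list.
        return (DEFAULT_NAN_SP if default_nan_mode else snans[0] | QUIET_BIT_SP), True
    if qnans:
        return (DEFAULT_NAN_SP if default_nan_mode else qnans[0]), False
    return None, False


print(hex(nan_result([0x3F800000, 0x7F800001])[0]))  # prints 0x7fc00001
```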
The SIMD and floating-point processing behavior when Default NaN mode is enabled is that the Default NaN is
the result of all floating-point operations that either:
•
Cause untrapped Invalid Operation floating-point exceptions.
•
Have one or more quiet NaN inputs, but no signaling NaN inputs.
Table A1-4 shows the format of the default NaN for ARM floating-point operations.
Default NaN mode is selected for the floating-point processing by setting the FPCR.DN bit to 1.
Other aspects of the functionality of the Invalid Operation floating-point exception are not affected by Default NaN
mode. These are that:
•
If untrapped, it causes the FPSR.IOC bit to be set to 1.
•
If trapped, it causes a user trap handler to be invoked.
Table A1-4 Default NaN encoding
            Half-precision, IEEE Format    Single-precision                 Double-precision
Sign bit    0                              0                                0
Exponent    0x1F                           0xFF                             0x7FF
Fraction    Bit[9] == 1, bits[8:0] == 0    Bit[22] == 1, bits[21:0] == 0    Bit[51] == 1, bits[50:0] == 0
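Each default NaN in Table A1-4 follows one pattern: sign 0, an all-ones exponent, and only the most-significant fraction bit set. A minimal sketch that assembles the bit patterns from the exponent and fraction widths of each format (the function name is hypothetical):

```python
def default_nan(exponent_bits: int, fraction_bits: int) -> int:
    """Bit pattern with sign 0, an all-ones exponent, and only the top fraction bit set."""
    exponent = (1 << exponent_bits) - 1
    return (exponent << fraction_bits) | (1 << (fraction_bits - 1))


# (exponent width, fraction width) for each format in Table A1-4.
FORMATS = {"half": (5, 10), "single": (8, 23), "double": (11, 52)}

for name, (e, f) in FORMATS.items():
    print(f"{name:>6}: {default_nan(e, f):#x}")
```

Running it prints 0x7e00, 0x7fc00000, and 0x7ff8000000000000 for half, single, and double precision respectively.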
A1.6
The ARM memory model
The ARM memory model supports:
•
Generating an exception on an unaligned memory access.
•
Restricting access by applications to specified areas of memory.
•
Translating virtual addresses (VAs) provided by executing instructions to physical addresses (PAs).
•
Altering the interpretation of multi-byte data between big-endian and little-endian.
•
Controlling the order of accesses to memory.
•
Controlling caches and address translation structures.
•
Synchronizing access to shared memory by multiple PEs.
•
Barriers that control and prevent speculative access to memory.
VA support depends on the Execution state, as follows:
AArch64 state
Supports 64-bit virtual addressing, with the Translation Control Register determining the supported
VA range. Execution at EL1 and EL0 supports two independent VA ranges, each with its own
translation controls.
AArch32 state
Supports 32-bit virtual addressing, with the Translation Control Register determining the supported
VA range. For execution at EL1 and EL0, system software can split the VA range into two
subranges, each with its own translation controls.
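As a rough illustration of the two AArch64 VA ranges at EL1 and EL0, the sketch below assumes the TCR_EL1.T0SZ and TCR_EL1.T1SZ size fields, where a value n gives a region of 2^(64-n) bytes, with the lower range translated using TTBR0_EL1 and the upper range using TTBR1_EL1. It ignores tagged addresses, the permitted limits on TnSZ, and any larger-VA extensions, and the function names are hypothetical.

```python
def va_ranges(t0sz: int, t1sz: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return the (first, last) VA of the lower and upper EL1&0 ranges."""
    lower = (0, (1 << (64 - t0sz)) - 1)
    upper = ((1 << 64) - (1 << (64 - t1sz)), (1 << 64) - 1)
    return lower, upper


def select_range(va: int, t0sz: int, t1sz: int):
    """Name the translation table base register used for va, or None if va is in neither range."""
    (_, lower_last), (upper_first, _) = va_ranges(t0sz, t1sz)
    if va <= lower_last:
        return "TTBR0_EL1"
    if va >= upper_first:
        return "TTBR1_EL1"
    return None  # outside both ranges, so the access faults


# With T0SZ = T1SZ = 16, each range covers 2^48 bytes.
lower, upper = va_ranges(16, 16)
print([hex(x) for x in lower + upper])
```

Because each range has its own size field and base register, system software can size and manage the two ranges independently.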
The supported PA space is IMPLEMENTATION DEFINED, and can be discovered by system software.
Regardless of the Execution state, the Virtual Memory System Architecture (VMSA) can translate VAs to blocks or
pages of memory anywhere within the supported PA space.
For more information, see:
For execution in AArch64 state
•
Chapter B2 The AArch64 Application Level Memory Model.
•
Chapter D4 The AArch64 System Level Memory Model.
•
Chapter D5 The AArch64 Virtual Memory System Architecture.
For execution in AArch32 state
•
Chapter E2 The AArch32 Application Level Memory Model.
•
Chapter G4 The AArch32 System Level Memory Model.
•
Chapter G5 The AArch32 Virtual Memory System Architecture.
A1.7
ARMv8 architecture extensions
The original ARMv8-A architecture is called ARMv8.0. The following sections of this manual describe or
summarize permitted extensions to ARMv8.0:
•
The ARMv8 Cryptographic Extension on page A1-57.
•
The Reliability, Availability, and Serviceability (RAS) Extension on page A1-74.
•
Event monitors on page D1-2262.
•
The IVIPT Extension on page D5-2535.
•
Chapter H7 The PC Sample-based Profiling Extension.
In addition to describing ARMv8.0, this manual describes the following architectural extensions:
The ARMv8.1 architectural extension
The ARMv8.1 architecture extension adds both:
•
Architectural features. Some of these are mandatory, others are optional. Some features must
be implemented together.
•
Architectural requirements. These are mandatory.
An implementation is ARMv8.1 compliant when all of the following apply:
•
It includes all of the ARMv8.1 architectural features that are mandatory. See Architectural
features added by ARMv8.1 on page A1-58 for all of the ARMv8.1 architectural features.
•
It includes all of the ARMv8.1 architectural requirements. Additional requirements of
ARMv8.1 on page A1-61 lists these requirements.
For more information, see The ARMv8.1 architecture extension on page A1-58.
The ARMv8.2 architectural extension
The ARMv8.2 architecture extension is an extension to ARMv8.1. It adds both:
•
Architectural features. Some of these are mandatory, others are optional. Some features must
be implemented together.
•
Architectural requirements. These are mandatory.
An implementation is ARMv8.2 compliant if all of the following apply:
•
It is ARMv8.1 compliant.
•
It includes all of the ARMv8.2 architectural features that are mandatory. See Architectural
features added by ARMv8.2 on page A1-61 for all of the ARMv8.2 architectural features.
•
It includes all of the ARMv8.2 architectural requirements. Additional requirements of
ARMv8.2 on page A1-67 lists these requirements.
For more information, see The ARMv8.2 architecture extension on page A1-61.
The ARMv8.3 architectural extension
The ARMv8.3 architecture extension is an extension to ARMv8.2. It adds architectural features.
Some of these are mandatory, others are optional. Some features must be implemented together.
An implementation is ARMv8.3 compliant if all of the following apply:
•
It is ARMv8.2 compliant.
•
It includes all of the ARMv8.3 architectural features that are mandatory.
For more information, see The ARMv8.3 architecture extension on page A1-67.
The ARMv8.4 architectural extension
The ARMv8.4 architecture extension is an extension to ARMv8.3. It adds architectural features.
Some of these are mandatory, others are optional. Some features must be implemented together.
An implementation is ARMv8.4 compliant if all of the following apply:
•
It is ARMv8.3 compliant.
•
It includes all of the ARMv8.4 architectural features that are mandatory.
For more information, see The ARMv8.4 architecture extension on page A1-69.
The Statistical Profiling Extension (SPE)
SPE is an optional extension to ARMv8.2. That is, SPE requires the implementation of ARMv8.2.
For more information see The Statistical Profiling Extension (SPE) on page A1-75.
The Scalable Vector Extension (SVE)
SVE is an optional extension to ARMv8.2. That is, SVE requires the implementation of ARMv8.2.
For more information see The Scalable Vector Extension (SVE) on page A1-75.
The Activity Monitors Extension (AMU)
AMU is an optional extension to ARMv8.4. That is, AMU requires the implementation of
ARMv8.4.
For more information see The Activity Monitors Extension on page A1-75.
The Memory Partitioning and Monitoring Extension (MPAM)
MPAM is an optional extension to ARMv8.2. That is, MPAM requires the implementation of
ARMv8.2.
For more information see The Memory Partitioning and Monitoring Extension (MPAM) on
page A1-76.
See also Permitted implementation of subsets of ARMv8.x and ARMv8.(x+1) architectural features.
A1.7.1
Permitted implementation of subsets of ARMv8.x and ARMv8.(x+1) architectural features
An ARMv8.x compliant implementation can include any arbitrary subset of the architectural features of
ARMv8.(x+1), subject only to those constraints that require that certain features be implemented together.
An ARMv8.x compliant implementation cannot include any features of ARMv8.(x+2).
Note
The addition of ARMv8.(x+1) features to an ARMv8.x compliant implementation is only permitted if the
implementer has a licence to ARMv8.(x+1) in addition to the licence to ARMv8.x.
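The subset rule can be checked mechanically. The following is a minimal sketch, assuming a small illustrative feature table and only the two "implemented together" constraints stated in this chapter (ARMv8.1-VHE requires ARMv8.1-LSE, and ARMv8.2-FHM requires ARMv8.2-FP16). The table, the function name, and the problem messages are hypothetical. The sketch does not check that the mandatory ARMv8.x features are present, and it does not model the licensing condition in the note.

```python
# Illustrative subset of features: name -> ARMv8 minor version that introduces it.
FEATURE_VERSION = {
    "ARMv8.1-LSE": 1,
    "ARMv8.1-VHE": 1,
    "ARMv8.1-PAN": 1,
    "ARMv8.2-FP16": 2,
    "ARMv8.2-FHM": 2,
    "ARMv8.3-CompNum": 3,
}

# "Must be implemented together" constraints taken from this chapter.
REQUIRES = {
    "ARMv8.1-VHE": {"ARMv8.1-LSE"},
    "ARMv8.2-FHM": {"ARMv8.2-FP16"},
}


def subset_problems(x: int, implemented: set[str]) -> list[str]:
    """List violations for an ARMv8.x implementation with these features.

    `implemented` must name every feature present, including mandatory ones.
    """
    problems = []
    for feature in sorted(implemented):
        version = FEATURE_VERSION[feature]
        if version > x + 1:  # ARMv8.(x+2) or later is never permitted
            problems.append(f"{feature} is from ARMv8.{version}, beyond ARMv8.{x + 1}")
    for feature, needs in REQUIRES.items():
        missing = needs - implemented
        if feature in implemented and missing:
            problems.append(f"{feature} requires {', '.join(sorted(missing))}")
    return problems


print(subset_problems(0, {"ARMv8.1-VHE"}))  # ['ARMv8.1-VHE requires ARMv8.1-LSE']
```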
A1.7.2
The ARMv8 Cryptographic Extension
The ARMv8 Cryptographic Extension provides instructions for the acceleration of encryption and decryption, and
includes the following features:
•
ARMv8.0-AES, which includes AESD and AESE instructions.
•
ARMv8.0-SHA, which includes the SHA1* and SHA256* instructions.
The presence of the Cryptographic Extension in an implementation is subject to export license controls. The
Cryptographic Extension is an extension of the SIMD support and operates on the vector register file.
The Cryptographic Extension also provides multiply instructions that operate on long polynomials.
The Cryptographic Extension provides this functionality in AArch64 state and AArch32 state, and an
implementation that supports both AArch64 state and AArch32 state provides the same Cryptographic Extension
functionality in both states.
For more information see The Cryptographic Extension on page C3-226 or The Cryptographic Extension in
AArch32 state on page F1-3645.
ARMv8.2 extensions to the Cryptographic Extension
From ARMv8.2, an implementation of the ARMv8.0 Cryptographic Extension can include either or both of:
•
The AES functionality, including support for multiplication of 64-bit polynomials. The
ID_AA64ISAR0_EL1.AES field indicates whether this functionality is supported.
•
The SHA1 and SHA2-256 functionality. The ID_AA64ISAR0_EL1.{SHA2, SHA1} fields indicate whether
this functionality is supported.
In addition, ARMv8.2 adds two optional extensions to the ARMv8 Cryptographic Extension, which provide
cryptographic functionality in AArch64 state only. These two optional features are:
ARMv8.2-SHA, SHA2-512 and SHA3 functionality
In the A64 instruction set only, ARMv8.2-SHA adds Advanced SIMD instructions that support:
•
SHA2-512 (SHA512).
•
SHA3.
Implementation of ARMv8.2-SHA requires implementation of the ARMv8.0 Cryptographic
Extension SHA1 and SHA2-256 functionality.
The ID_AA64ISAR0_EL1.{SHA2, SHA3} fields identify the presence of ARMv8.2-SHA.
For more information see ARMv8.2-SHA, SHA2-512 and SHA3 on page C3-227.
ARMv8.2-SM, SM3 and SM4 functionality
In the A64 instruction set only, ARMv8.2-SM adds Advanced SIMD instructions that support the
Chinese cryptography algorithms SM3 and SM4.
Implementation of ARMv8.2-SM is independent of the implementation of any SHA functionality.
The ID_AA64ISAR0_EL1.{SM3, SM4} fields identify the presence of ARMv8.2-SM.
Note
This means ARMv8.2-SM can be implemented without any other Cryptographic Extension
features.
For more information see ARMv8.2-SM, SM3 and SM4 on page C3-228.
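Software normally tests these ID fields before using the instructions. The following is a minimal sketch that decodes a raw ID_AA64ISAR0_EL1 value. The 4-bit field positions and the meaning of the values 0b0001 and 0b0010 are assumptions taken from the register description rather than from this section, so check them against that description before relying on them. How the raw value is obtained, for example through an operating system interface, is outside the sketch, and the function names are hypothetical.

```python
# Assumed bit position of each 4-bit field in ID_AA64ISAR0_EL1.
FIELD_SHIFT = {"AES": 4, "SHA1": 8, "SHA2": 12, "SHA3": 32, "SM3": 36, "SM4": 40}


def id_field(isar0: int, name: str) -> int:
    """Extract one unsigned 4-bit ID field."""
    return (isar0 >> FIELD_SHIFT[name]) & 0xF


def crypto_features(isar0: int) -> dict[str, bool]:
    """Summarize Cryptographic Extension support from a raw ID_AA64ISAR0_EL1 value."""
    return {
        "AES": id_field(isar0, "AES") >= 1,
        "PMULL": id_field(isar0, "AES") >= 2,     # 64-bit polynomial multiply
        "SHA1": id_field(isar0, "SHA1") >= 1,
        "SHA256": id_field(isar0, "SHA2") >= 1,
        "SHA512": id_field(isar0, "SHA2") >= 2,   # part of ARMv8.2-SHA
        "SHA3": id_field(isar0, "SHA3") >= 1,     # part of ARMv8.2-SHA
        "SM3": id_field(isar0, "SM3") >= 1,       # ARMv8.2-SM
        "SM4": id_field(isar0, "SM4") >= 1,       # ARMv8.2-SM
    }


# Hypothetical register value: AES with PMULL, SHA1, and SHA2-256 only.
example = (2 << 4) | (1 << 8) | (1 << 12)
print(sorted(name for name, present in crypto_features(example).items() if present))
```

Testing each field separately matches the independence described above: for example, a value with only the SM3 and SM4 fields set reports ARMv8.2-SM without any AES or SHA support.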
A1.7.3
The ARMv8.1 architecture extension
The ARMv8.1 architecture extension adds both architectural features and architectural requirements.
Architectural features added by ARMv8.1
An implementation of the ARMv8.1 extension must include all of the features that this section describes as
mandatory. Such an implementation, when combined with the additional requirements of ARMv8.1, is also called
an implementation of the ARMv8.1 architecture.
The ARMv8.1 architecture extension adds the following architectural features, which are identified by the
architectural feature name and a short description of the feature:
ARMv8.1-LSE, ARMv8.1 Large System Extensions
ARMv8.1-LSE introduces a set of atomic instructions:
•
Compare and Swap instructions, CAS and CASP.
•
Atomic memory operation instructions, LD<OP> and ST<OP>, where <OP> is one of ADD, CLR, EOR,
SET, SMAX, SMIN, UMAX, and UMIN.
•
Swap instruction, SWP.
These instructions are only added to the A64 instruction set.
This feature is mandatory in ARMv8.1 implementations.
Implementations of ARMv8.1-VHE require the implementation of ARMv8.1-LSE.
The ID_AA64ISAR0_EL1.Atomic field identifies the presence of ARMv8.1-LSE.
For more information, see:
•
Compare and Swap on page C3-189.
•
Atomic memory operations on page C3-190.
•
Swap on page C3-192.
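The value semantics of these instructions can be modeled in software. The following is a minimal sketch, assuming 64-bit operands and a toy word-addressed memory in which one lock stands in for the single-copy atomicity the hardware provides; memory ordering variants (acquire and release forms) are not modeled. The class, method, and table names are hypothetical.

```python
import threading

MASK64 = (1 << 64) - 1


def signed64(v: int) -> int:
    """Interpret a 64-bit pattern as a two's complement integer."""
    return v - (1 << 64) if v >> 63 else v


# Combining functions for LD<OP>/ST<OP>; each returns the new memory value.
OPS = {
    "ADD": lambda old, x: (old + x) & MASK64,
    "CLR": lambda old, x: old & ~x & MASK64,  # bit clear
    "EOR": lambda old, x: old ^ x,
    "SET": lambda old, x: old | x,
    "SMAX": lambda old, x: old if signed64(old) >= signed64(x) else x,
    "SMIN": lambda old, x: old if signed64(old) <= signed64(x) else x,
    "UMAX": max,
    "UMIN": min,
}


class ToyMemory:
    """Word-addressed memory; one lock serializes every atomic operation."""

    def __init__(self) -> None:
        self._words: dict[int, int] = {}
        self._lock = threading.Lock()

    def load(self, addr: int) -> int:
        return self._words.get(addr, 0)

    def ld_op(self, op: str, addr: int, operand: int) -> int:
        """LD<OP>: store OPS[op](old, operand) and return the old value.

        ST<OP> is the same operation with the returned value discarded.
        """
        with self._lock:
            old = self._words.get(addr, 0)
            self._words[addr] = OPS[op](old, operand)
            return old

    def swp(self, addr: int, new: int) -> int:
        """SWP: store new and return the old value."""
        with self._lock:
            old = self._words.get(addr, 0)
            self._words[addr] = new
            return old

    def cas(self, addr: int, expected: int, new: int) -> int:
        """CAS: store new only if memory equals expected; always return the old value."""
        with self._lock:
            old = self._words.get(addr, 0)
            if old == expected:
                self._words[addr] = new
            return old


mem = ToyMemory()
mem.ld_op("ADD", 0x40, 5)  # like LDADD: memory goes from 0 to 5, returns 0
print(mem.cas(0x40, 5, 9), mem.load(0x40))  # prints 5 9
```

Returning the old value from every operation is what lets software build a retry loop around CAS, or use LDADD directly as a fetch-and-add counter without a loop.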
ARMv8.1-RDMA, ARMv8.1 Advanced SIMD instructions
ARMv8.1-RDMA introduces Rounding Double Multiply Add/Subtract Advanced SIMD
instructions. For more information, see:
For the A64 instruction set
•
SQRDMLAH (by element) on page C7-1888.
•
SQRDMLAH (vector) on page C7-1891.
•
SQRDMLSH (by element) on page C7-1893.
•
SQRDMLSH (vector) on page C7-1896.
For the T32 and A32 instruction sets
•
VQRDMLAH on page F6-4985.
•
VQRDMLSH on page F6-4989.
This feature is mandatory in ARMv8.1 implementations.
The following fields identify the presence of ARMv8.1-RDMA:
•
ID_AA64ISAR0_EL1.RDM.
•
ID_ISAR5_EL1.RDM.
•
ID_ISAR5.RDM.
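The arithmetic of one element of these instructions can be sketched as follows, assuming the usual formulation: the accumulator is widened by esize bits, twice the product is added (SQRDMLAH) or subtracted (SQRDMLSH), a rounding constant of 2^(esize-1) is added, and the high half is taken with signed saturation. Treat this as an assumption to check against the instruction pseudocode. The function names are hypothetical, and operands are signed esize-bit integers, so with esize = 16 the value 0x4000 represents 0.5 in Q15 format.

```python
def saturate(value: int, esize: int) -> tuple[int, bool]:
    """Clamp value to the signed esize-bit range and report whether it saturated."""
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    if value > hi:
        return hi, True
    if value < lo:
        return lo, True
    return value, False


def sqrdml_h(acc: int, a: int, b: int, esize: int = 16, subtract: bool = False) -> tuple[int, bool]:
    """One element of SQRDMLAH (subtract=False) or SQRDMLSH (subtract=True).

    Returns (result, saturated).
    """
    product = 2 * a * b
    accum = (acc << esize) + (-product if subtract else product) + (1 << (esize - 1))
    return saturate(accum >> esize, esize)  # >> rounds toward minus infinity


print(sqrdml_h(0, 0x4000, 0x4000))  # prints (8192, False): 0.5 * 0.5 = 0.25 in Q15
```

Because the accumulation happens before the single rounding and saturation step, this can differ from a SQRDMULH followed by a separate saturating add, which rounds and saturates twice.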
ARMv8.1-LOR, Limited ordering regions
Limited ordering regions allow large systems to perform special load-acquire and store-release
instructions that provide order between the memory accesses to a region of the PA map as observed
by a limited set of observers.
This feature is supported in AArch64 state only.
This feature is mandatory in ARMv8.1 implementations.
The ID_AA64MMFR1_EL1.LO field identifies the support for ARMv8.1-LOR.
For more information, see:
•
Limited ordering regions on page B2-109.
ARMv8.1-HPD, Hierarchical permission disables
ARMv8.1-HPD introduces the facility to disable the hierarchical attributes, APTable, PXNTable,
and UXNTable, in the translation tables. This disable has no effect on the NSTable bit.
This feature is mandatory in ARMv8.1 implementations.
This feature is added only to the VMSAv8-64 translation regimes. ARMv8.2 extends this to the
AArch32 translation regimes, see ARMv8.2-AA32HPD.
The ID_AA64MMFR1_EL1.HPDS field identifies the support for ARMv8.1-HPD.
ARMv8.1-TTHM, Hardware management of the Access flag and dirty state
In ARMv8.0, all updates to the translation tables are performed by software. From ARMv8.1, for
the VMSAv8-64 translation regimes only, hardware can perform updates to the translation tables in
two contexts:
•
Hardware management of the Access flag.
•
Hardware management of dirty state, with updates to the dirty state in the translation tables.
The dirty state is introduced in ARMv8.1.
Hardware management of dirty state can only be enabled when hardware management of the Access
flag is also enabled.
This feature is optional in ARMv8.1 implementations. It is IMPLEMENTATION DEFINED whether this
is implemented.
The ID_AA64MMFR1_EL1.HAFDBS field identifies the support for ARMv8.1-TTHM.
For more information, see:
•
The dirty state on page D5-2466.
•
Hardware management of the Access flag and dirty state on page D5-2467.
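The two hardware-managed updates can be pictured as edits to a stage 1 descriptor. The following is a simplified sketch, assuming the Access flag is descriptor bit 10, AP[2] (read-only when 1) is bit 7, and the Dirty Bit Modifier (DBM) is bit 51, with ha and hd standing for the TCR_ELx.HA and TCR_ELx.HD enables. These bit positions are assumptions to check against the descriptor formats in Chapter D5. The sketch checks the Access flag before permissions and does not model the architecture's fault prioritization or the atomicity of the update. The function name is hypothetical.

```python
AF = 1 << 10   # Access flag
AP2 = 1 << 7   # AP[2]: 1 means read-only at stage 1
DBM = 1 << 51  # Dirty Bit Modifier


def hw_update(desc: int, is_write: bool, ha: bool, hd: bool):
    """Return (descriptor after the access, fault name or None)."""
    hd = hd and ha  # dirty state management needs Access flag management
    new = desc
    if not (new & AF):
        if not ha:
            return desc, "Access flag fault"
        new |= AF  # hardware sets the Access flag instead of faulting
    if is_write and (new & AP2):
        if not (hd and (new & DBM)):
            return desc, "Permission fault"
        new &= ~AP2  # writable-clean becomes writable-dirty
    return new, None


new, fault = hw_update(DBM | AP2 | AF, is_write=True, ha=True, hd=True)
print(hex(new), fault)
```

In this model a page marked with DBM starts out read-only (clean), and the first write clears AP[2], so software can later detect dirty pages by finding DBM set with AP[2] clear.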
ARMv8.1-PAN, Privileged access never
ARMv8.1-PAN adds a new bit to PSTATE. When the value of this PAN state bit is 1, any privileged
data access from EL1 or EL2 to a virtual memory address that is accessible at EL0 generates a
Permission fault.
This feature is mandatory in ARMv8.1 implementations.
This feature is supported in AArch64 and AArch32 states.
The following fields identify the support for ARMv8.1-PAN:
•
ID_AA64MMFR1_EL1.PAN.
•
ID_MMFR3_EL1.PAN.
•
ID_MMFR3.PAN.
For more information, see:
•
About PSTATE.PAN on page D5-2457.
•
About the PAN bit on page G5-5505.
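The PAN rule can be expressed as a predicate. The following sketch covers only data accesses made at EL1, and the function name and parameters are hypothetical. It also assumes, beyond what this section states, that unprivileged load/store instructions such as LDTR and STTR are checked with EL0 permissions and so are not affected by PAN, unless PSTATE.UAO is 1 and makes them behave like ordinary loads and stores (see ARMv8.2-UAO).

```python
def pan_permission_fault(pan: int, el0_accessible: bool,
                         unprivileged_instr: bool = False, uao: int = 0) -> bool:
    """Does PSTATE.PAN cause a Permission fault for this data access made at EL1?

    pan, uao: values of PSTATE.PAN and PSTATE.UAO.
    el0_accessible: the target VA is accessible from EL0.
    unprivileged_instr: the access is made by an LDTR/STTR-style instruction.
    """
    # Unprivileged instructions are checked as EL0 accesses, so PAN does not
    # apply to them unless PSTATE.UAO == 1 makes them privileged accesses.
    privileged = not unprivileged_instr or uao == 1
    return pan == 1 and privileged and el0_accessible


print(pan_permission_fault(1, True), pan_permission_fault(1, True, unprivileged_instr=True))  # prints True False
```

This is why kernels that set PAN can still reach user memory deliberately, through unprivileged instructions or by clearing PAN around the access.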
ARMv8.1-VMID16, 16-bit VMID
In an ARMv8.1 implementation, when EL2 is using AArch64, the VMID size is an IMPLEMENTATION DEFINED
choice of 8 bits or 16 bits.
This feature is optional in ARMv8.1 implementations. It is IMPLEMENTATION DEFINED whether this
is implemented.
When implemented, this feature is supported only when EL2 is using AArch64.
The ID_AA64MMFR1_EL1.VMIDBits field identifies the supported VMID size.
For more information, see:
•
VMID size on page D5-2511.
ARMv8.1-VHE, Virtualization Host Extensions
ARMv8.1 introduces the Virtualization Host Extensions (VHE) that provide enhanced support for
Type 2 hypervisors in Non-secure state.
This feature is mandatory in ARMv8.1 implementations.
An implementation that includes ARMv8.1-VHE requires ARMv8.1-LSE to be implemented.
The ID_AA64MMFR1_EL1.VH field identifies the support for ARMv8.1-VHE.
The following fields indicate the presence of the Virtualization Host Extensions for debug,
including the changes for the PC Sample-based Profiling Extension and the Performance Monitors
Extension:
•
ID_AA64DFR0_EL1.DebugVer.
•
ID_DFR0_EL1.{CopSDbg, CopDbg}.
For more information, see:
•
Virtualization Host Extensions on page D5-2486.
ARMv8.1-PMU, ARMv8.1 PMU Extension
ARMv8.1 makes the following enhancements to the Performance Monitors Extension:
•
The event number space is extended to 16 bits to allow additional IMPLEMENTATION DEFINED
event types, and the reserved space for future additions to the architecturally-defined event
types is extended.
•
The HPMD bit is added to MDCR_EL2. This bit disables event counting at EL2.
•
The STALL_FRONTEND and STALL_BACKEND events are required to be implemented.
For more information, see Required events on page D6-2582.
The Performance Monitors Extension is an OPTIONAL feature of an implementation, but ARM
strongly recommends that ARMv8.1 implementations include either:
•
ARMv8.1-PMU.
•
An IMPLEMENTATION DEFINED form of performance monitors.
The following fields identify the ARMv8.1-PMU:
•
ID_AA64DFR0_EL1.PMUVer.
•
ID_DFR0_EL1.PerfMon.
•
ID_DFR0.PerfMon.
Additional requirements of ARMv8.1
The ARMv8.1 architecture includes some mandatory changes that are not associated with a feature. These are:
Changes to CRC32 instructions
All implementations of the ARMv8.1 architecture are required to implement the CRC32* instructions.
These are optional in ARMv8.0.
The following fields identify the support for the CRC32* instructions:
•
ID_AA64ISAR0_EL1.CRC32.
•
ID_ISAR5_EL1.CRC32.
•
ID_ISAR5.CRC32.
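A software model helps to show how the CRC32* instructions are normally combined. The following sketch assumes that the CRC32B/H/W/X forms use the polynomial 0x04C11DB7 and the CRC32CB/CH/CW/CX forms use 0x1EDC6F41, both in bit-reflected form, and that the instructions perform no pre-inversion or post-inversion, leaving that to software. Check these assumptions against the instruction descriptions. The function names are hypothetical, and only the byte-wide step is modeled.

```python
import zlib

CRC32_POLY = 0xEDB88320   # 0x04C11DB7 bit-reversed (CRC32B/H/W/X)
CRC32C_POLY = 0x82F63B78  # 0x1EDC6F41 bit-reversed (CRC32CB/CH/CW/CX)


def crc32_step_byte(acc: int, byte: int, poly: int = CRC32_POLY) -> int:
    """Fold one byte into a 32-bit CRC accumulator, as a byte-wide CRC step would.

    No inversion is applied here.
    """
    acc ^= byte
    for _ in range(8):
        acc = (acc >> 1) ^ (poly if acc & 1 else 0)
    return acc


def crc32(data: bytes, poly: int = CRC32_POLY) -> int:
    """Standard CRC-32 (or CRC-32C) built from byte steps plus software inversion."""
    acc = 0xFFFFFFFF
    for byte in data:
        acc = crc32_step_byte(acc, byte, poly)
    return acc ^ 0xFFFFFFFF


# The CRC-32 result can be cross-checked against zlib.
print(hex(crc32(b"123456789")), hex(zlib.crc32(b"123456789")))  # prints 0xcbf43926 0xcbf43926
```

Hardware implementations fold 1, 2, 4, or 8 bytes per instruction, but the result for a given byte stream is the same as applying the byte step repeatedly.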
An implementation of the ARMv8.1 extension must comply with all of the additional requirements. Such an
implementation, when combined with the mandatory architectural features of ARMv8.1, is also called an
implementation of the ARMv8.1 architecture.
A1.7.4
The ARMv8.2 architecture extension
The ARMv8.2 architecture extension adds both architectural features and architectural requirements.
Architectural features added by ARMv8.2
An implementation of the ARMv8.2 extension must include all of the features that this section describes as
mandatory. Such an implementation, when combined with the additional requirements of ARMv8.2, is also called
an implementation of the ARMv8.2 architecture.
The ARMv8.2 architecture extension adds the following architectural features, which are identified by the
architectural feature name and a short description of the feature:
ARMv8.2-A64ISA, ARMv8.2 changes to the A64 ISA
ARMv8.2-A64ISA adds the BFC instruction to the A64 instruction set as an alias of BFM. It also
requires that the new BFC instruction and the A64 pseudo-instruction REV64 are implemented by
assemblers.
Note
•
In ARMv8.0 and ARMv8.1, the A64 pseudo-instruction REV64 is optional.
•
Because this feature relates to support for an instruction alias and for a pseudo-instruction
there are no corresponding feature ID register fields.
This change to the instruction set and assembler requirements is mandatory in an ARMv8.2
implementation.
For more information, see:
•
BFC on page C6-735.
•
REV64 on page C6-1074.
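The alias relationship between BFC and BFM can be checked with a small model. The sketch assumes that BFC Xd, #lsb, #width is encoded as BFM Xd, XZR, #((-lsb) MOD 64), #(width-1), and uses a simplified 64-bit BFM that covers only the two behaviors those encodings produce: inserting a low field from the source (imms < immr) and extracting a field into the low bits (imms >= immr). The function names are hypothetical, and this is not the architectural BFM pseudocode.

```python
MASK64 = (1 << 64) - 1


def bfm(dst: int, src: int, immr: int, imms: int) -> int:
    """Simplified 64-bit BFM: copy a bitfield of src into dst, leaving other bits unchanged."""
    if imms >= immr:
        # BFXIL-style: src bits [imms:immr] go to dst bits [width-1:0].
        width, lsb = imms - immr + 1, 0
        field = (src >> immr) & ((1 << width) - 1)
    else:
        # BFI-style: src bits [imms:0] go to dst starting at bit 64 - immr.
        width, lsb = imms + 1, 64 - immr
        field = src & ((1 << width) - 1)
    mask = ((1 << width) - 1) << lsb
    return (dst & ~mask & MASK64) | (field << lsb)


def bfc(dst: int, lsb: int, width: int) -> int:
    """BFC Xd, #lsb, #width, modeled through its BFM alias with XZR as the source."""
    return bfm(dst, 0, (-lsb) % 64, width - 1)


print(hex(bfc(MASK64, 8, 8)))  # prints 0xffffffffffff00ff
```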
ARMv8.2-ATS1E1, AT S1E1R and AT S1E1W instruction variants, taking account of PSTATE.PAN
ARMv8.2-ATS1E1 adds new variants of the AArch64 AT S1E1R and AT S1E1W instructions and the
AArch32 ATS1CPR and ATS1CPW instructions. These new instructions factor in the PSTATE.PAN bit
when determining whether or not the location will generate a permission fault for a privileged
access, as is reported in the PAR. For more information, see:
For the AArch64 System instructions
•
AT S1E1RP, Address Translate Stage 1 EL1 Read PAN on page C5-467.
•
AT S1E1WP, Address Translate Stage 1 EL1 Write PAN on page C5-471.
For the AArch32 System instructions
•
ATS1CPRP, Address Translate Stage 1 Current state PL1 Read PAN on
page G8-5672.
•
ATS1CPWP, Address Translate Stage 1 Current state PL1 Write PAN on
page G8-5676.
This feature is mandatory in ARMv8.2 implementations.
These instructions are added to the A64 and A32/T32 instruction sets.
The following fields identify the presence of ARMv8.2-ATS1E1:
•
ID_AA64MMFR1_EL1.PAN.
•
ID_MMFR3_EL1.PAN.
•
ID_MMFR3.PAN.
For more information, see:
•
Address translation instructions on page D5-2440.
•
ATS1C**, Address translation stage 1, current security state on page G5-5578.
•
Encoding and availability of the address translation instructions on page G5-5579.
ARMv8.2-FP16, Half-precision floating-point data processing
ARMv8.2-FP16 supports:
•
Half-precision data-processing instructions for Advanced SIMD and floating-point in both
AArch64 and AArch32 states.
•
The FPCR.FZ16 and FPSCR.FZ16 bits, that enable a Flush-to-zero mode for half-precision
data-processing instructions.
This feature is optional in ARMv8.2 implementations, unless SVE is implemented, in which case
ARMv8.2-FP16 is mandatory. When this feature is implemented it is implemented in both
Advanced SIMD and floating-point, and in AArch64 and AArch32 states.
The following fields identify the presence of ARMv8.2-FP16:
•
ID_AA64PFR0_EL1.{FP, AdvSIMD}.
•
MVFR1_EL1.{FPHP, SIMDHP}.
•
MVFR1.{FPHP, SIMDHP}.
For more information, see:
•
Half-precision floating-point formats on page A1-43.
•
Flush-to-zero on page A1-52.
•
Modified immediate constants in A64 instructions on page C2-166.
ARMv8.2-DotProd, SIMD Dot Product
ARMv8.2-DotProd provides instructions that, for each 32-bit element, compute the dot product of four
8-bit elements from each of two source vectors and accumulate the result into the corresponding 32-bit
element of a third vector. This can be performed using signed or unsigned arithmetic.
This feature is optional in ARMv8.2 implementations, and mandatory in ARMv8.4
implementations.
These instructions are added to the A64 and A32/T32 instruction sets.
The following fields identify the presence of ARMv8.2-DotProd:
•
ID_AA64ISAR0_EL1.DP.
•
ID_ISAR6_EL1.DP.
•
ID_ISAR6.DP.
For more information, see:
•
SIMD dot product on page C3-225.
•
Advanced SIMD dot product instructions on page F1-3643.
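One 32-bit lane of a dot product instruction can be modeled as below. This is a minimal sketch, assuming the lane layout of SDOT/UDOT style instructions: byte i of each source word is multiplied with byte i of the other, the four products are added to the 32-bit accumulator, and the sum wraps modulo 2^32 without saturation. The function names are hypothetical.

```python
def to_signed8(b: int) -> int:
    """Interpret an 8-bit pattern as a two's complement integer."""
    return b - 256 if b & 0x80 else b


def dot_lane(acc: int, a_word: int, b_word: int, signed: bool) -> int:
    """One 32-bit lane: acc plus the sum of four 8-bit products, modulo 2^32."""
    total = acc
    for i in range(4):
        x = (a_word >> (8 * i)) & 0xFF
        y = (b_word >> (8 * i)) & 0xFF
        if signed:
            x, y = to_signed8(x), to_signed8(y)
        total += x * y
    return total & 0xFFFFFFFF


def dot(acc_lanes: list[int], a_lanes: list[int], b_lanes: list[int], signed: bool) -> list[int]:
    """Apply dot_lane to each 32-bit lane of the vectors."""
    return [dot_lane(c, a, b, signed) for c, a, b in zip(acc_lanes, a_lanes, b_lanes)]


print(dot_lane(0, 0x04030201, 0x08070605, signed=False))  # prints 70
```

Packing four 8-bit products into each 32-bit accumulation is what makes these instructions useful for low-precision integer workloads such as quantized matrix multiplication.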
ARMv8.2-FHM, Floating-point multiplication variant
ARMv8.2-FHM adds new floating-point multiplication instructions.
These instructions are added to the A64 and A32/T32 instruction sets.
This feature is optional in ARMv8.2 implementations, and can only be implemented when
ARMv8.2-FP16 is implemented. This feature is mandatory in ARMv8.4 implementations.
The following fields identify the presence of ARMv8.2-FHM:
•
ID_AA64ISAR0_EL1.FHM.
•
ID_ISAR6_EL1.FHM.
•
ID_ISAR6.FHM.
For more information, see:
•
SIMD arithmetic on page C3-213.
•
SIMD by element arithmetic on page C3-219.
•
Advanced SIMD multiply instructions on page F1-3642.
ARMv8.2-LSMAOC, Load/Store Multiple atomicity and ordering controls
ARMv8.2-LSMAOC adds controls that disable legacy behavior of AArch32 Load Multiple and
Store Multiple instructions, and provide a trap of one aspect of this legacy behavior.
Implementation of ARMv8.2-LSMAOC is optional. When implemented it provides:
•
LSMAOE fields in the SCTLR_EL1, SCTLR_EL2, HSCTLR, and SCTLR registers. These
fields can have the following effects on the behavior of AArch32 Load Multiple and Store
Multiple instructions:
—
An interrupt can be taken between two memory accesses made by a single Load
Multiple or Store Multiple instruction.
—
The memory accesses made by a single Load Multiple or Store Multiple instruction to
Device memory with the non-Reordering attribute can be reordered.
•
nTLSMD fields in the SCTLR_EL1, SCTLR_EL2, HSCTLR, and SCTLR registers. These
fields can cause an access to Device-nGRE, Device-nGnRE, or Device-nGnRnE memory by
an AArch32 Load Multiple or Store Multiple instruction to generate an Alignment fault.
Note
ARMv8.2 deprecates software dependence on the legacy behavior of AArch32 Load Multiple and
Store Multiple instructions, and these fields disable this behavior.
The following fields identify the support for ARMv8.2-LSMAOC:
•
ID_AA64MMFR2_EL1.LSM.
•
ID_MMFR4_EL1.LSM.
•
ID_MMFR4.LSM.
For more information, see the register field descriptions and:
•
Generation of Alignment faults by Load/store multiple accesses to Device memory on
page E2-3581.
•
Multi-register loads and stores that access Device memory on page E2-3594.
•
Taking an interrupt or other exception during a multiple-register load or store on
page G1-5273.
ARMv8.2-UAO, PSTATE override of Unprivileged Load/Store
ARMv8.2 adds a new bit to PSTATE. When the value of PSTATE.UAO is 1, and when executed at
EL1 or at EL2 with HCR_EL2.{E2H, TGE} == {1, 1}, the memory accesses made by the
Load/Store unprivileged instructions behave as if they were made by the Load/Store register
instructions. See Load/Store unprivileged on page C3-181 and Load/Store register on page C3-177.
This feature is mandatory in ARMv8.2 implementations.
This feature is supported in AArch64 state only.
The ID_AA64MMFR2_EL1.UAO field identifies the support for ARMv8.2-UAO.
For more information, see:
•
About PSTATE.UAO on page D5-2458.
ARMv8.2-DCPoP, Data cache clean to Point of Persistence
ARMv8.2-DCPoP introduces a mechanism to identify and manage persistent memory locations in
a shared memory hierarchy, including adding the DC CVAP instruction.
This feature is mandatory in ARMv8.2 implementations.
This feature is supported in AArch64 state only.
The ID_AA64ISAR1_EL1.DPB field identifies the support for ARMv8.2-DCPoP.
For more information about ARMv8.2-DCPoP, see:
•
Memory hierarchy on page B2-111.
ARMv8.2-VPIPT, VMID-aware PIPT instruction cache
ARMv8.2-VPIPT supports a new instruction cache type, described as the VMID-aware PIPT
(VPIPT) instruction cache.
Note
ARMv8.2 adds VPIPT to the set of supported cache types, meaning an ARMv8.2 implementation
is permitted to implement VPIPT caches, but is not required to do so.
This feature is supported in AArch64 and AArch32 states.
The CTR_EL0.L1Ip and CTR.L1Ip fields identify the support for ARMv8.2-VPIPT.
For more information, see:
•
VPIPT (VMID-aware PIPT) instruction caches on page D5-2534.
•
VPIPT (VMID-aware PIPT) instruction caches on page G5-5544.
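As a sketch, software might classify the level 1 instruction cache policy by extracting the L1Ip field. This example assumes that the field occupies bits [15:14] of CTR_EL0 (and of CTR), as described in the register descriptions. The enum and function names are hypothetical. The raw register value is passed in rather than read with MRS, so the example runs on any host.

```c
#include <stdint.h>

/* CTR_EL0.L1Ip / CTR.L1Ip, bits [15:14]: level 1 instruction cache indexing and tagging. */
enum l1ip_policy {
    L1IP_VPIPT  = 0, /* VMID-aware PIPT (ARMv8.2-VPIPT) */
    L1IP_AIVIVT = 1, /* ASID-tagged VIVT; see the CTR_EL0 description for its status */
    L1IP_VIPT   = 2, /* Virtual index, physical tag */
    L1IP_PIPT   = 3  /* Physical index, physical tag */
};

enum l1ip_policy ctr_l1ip(uint64_t ctr)
{
    return (enum l1ip_policy)((ctr >> 14) & 0x3);
}
```

A VPIPT result tells software that instruction cache maintenance has to take account of the VMID, as described in the sections referenced above.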
ARMv8.2-AA32HPD, AArch32 Hierarchical permission disables
ARMv8.1-HPD introduced the ability to disable the hierarchical attributes, APTable, PXNTable,
and UXNTable, in the VMSAv8-64 translation regimes. ARMv8.2-AA32HPD extends this
functionality to the VMSAv8-32 translation regimes when those regimes are using the Long
descriptor translation table format.
This feature is optional in ARMv8.2 implementations. It is IMPLEMENTATION DEFINED whether this
is implemented.
The ID_MMFR4_EL1.HPDS and ID_MMFR4.HPDS fields identify the support for
ARMv8.2-AA32HPD.
For more information, see:
•
Attribute fields in VMSAv8-32 Long-descriptor translation table format descriptors on
page G5-5486.
ARMv8.2-TTPBHA, Translation table page-based hardware attributes
ARMv8.2 provides a mechanism to allow operating systems or hypervisors to make up to four bits
of translation table final-level descriptors available for IMPLEMENTATION DEFINED hardware use.
This functionality is available for all translation regimes in AArch64 state and for stages of
translation in AArch32 state that use the Long descriptor translation table format.
ARMv8.2-TTPBHA is optional in ARMv8.2 implementations, but implementation of
ARMv8.2-TTPBHA requires implementation of both:
•
ARMv8.1-HPD.
•
ARMv8.2-AA32HPD, if any Exception level higher than EL0 can use AArch32.
Note
For stage 1 translations, page-based hardware attributes can only be used for a stage of translation
for which the Hierarchical permission disables field has a value of 1.
The following fields identify the support for ARMv8.2-TTPBHA:
•
ID_AA64MMFR1_EL1.HPDS.
•
ID_MMFR4_EL1.HPDS.
•
ID_MMFR4.HPDS.
For more information, see:
•
Memory attribute fields in the VMSAv8-64 translation table format descriptors on
page D5-2449.
•
Attribute fields in VMSAv8-32 Long-descriptor translation table format descriptors on
page G5-5486.
ARMv8.2-LPA, Large PA and IPA support
ARMv8.2-LPA:
•
Allows a larger intermediate physical address (IPA) and PA space of up to 52 bits when using
the 64KB translation granule.
•
Allows a level 1 block size where the block covers a 4TB address range for the 64KB
translation granule if the implementation supports 52 bits of PA.
This is an optional feature in ARMv8.2 implementations. It is IMPLEMENTATION DEFINED whether
it is implemented.
This feature is supported in AArch64 state only.
The ID_AA64MMFR0_EL1.PARange field identifies the support for ARMv8.2-LPA.
For more information about ARMv8.2-LPA, see:
•
VMSA address types and address spaces on page D5-2385.
•
Address size configuration on page D5-2399.
•
Extending addressing above 48 bits on page D5-2404.
•
VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats on
page D5-2444.
•
ARMv8 translation table level 3 descriptor formats on page D5-2447.
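The 4TB figure follows from the 64KB granule. A 64KB table holds 2^13 eight-byte descriptors, so each level of lookup above the 64KB page multiplies the mapped size by 2^13, and a level 1 block covers 2^(16 + 2 * 13) = 2^42 bytes. The following C sketch computes this and decodes PARange. It assumes that the ID_AA64MMFR0_EL1.PARange encodings 0b0000 to 0b0110 in bits [3:0] correspond to 32, 36, 40, 42, 44, 48, and 52 bits. The function names are hypothetical.

```c
#include <stdint.h>

/* ID_AA64MMFR0_EL1.PARange, bits [3:0]. Returns the PA size in bits, or -1 for an
 * encoding this sketch does not know. 0b0110 (52 bits) requires ARMv8.2-LPA. */
int parange_bits(uint64_t id_aa64mmfr0)
{
    static const int bits[] = { 32, 36, 40, 42, 44, 48, 52 };
    unsigned enc = (unsigned)(id_aa64mmfr0 & 0xF);
    return enc < sizeof bits / sizeof bits[0] ? bits[enc] : -1;
}

/* Bytes mapped by one descriptor at `level` (1 to 3) with the 64KB granule:
 * 64KB at level 3, then a factor of 2^13 per level. */
uint64_t block_size_64k(int level)
{
    return (uint64_t)1 << (16 + 13 * (3 - level));
}
```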
ARMv8.2-LVA, Large VA support
ARMv8.2-LVA supports a larger VA space for each translation table base register of up to 52 bits
when using the 64KB translation granule.
This feature is supported in AArch64 state only.
This is an optional feature in ARMv8.2 implementations. It is IMPLEMENTATION DEFINED whether
it is implemented.
If ARMv8.2-LVA is implemented, then any implemented trace macrocell must be at least ETMv4.2.
The ID_AA64MMFR2_EL1.VARange field identifies the support for ARMv8.2-LVA.
For more information about ARMv8.2-LVA, see:
•
VMSA address types and address spaces on page D5-2385.
•
Address size configuration on page D5-2399.
•
Extending addressing above 48 bits on page D5-2404.
•
VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats on
page D5-2444.
•
ARMv8 translation table level 3 descriptor formats on page D5-2447.
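With the 64KB granule, level 3 resolves VA[28:16] and level 2 resolves VA[41:29]. The extra 4 bits of a 52-bit VA therefore all land in the level 1 index, which grows from 6 bits (64 entries) to 10 bits (1024 entries). The following sketch shows that arithmetic. It assumes that ID_AA64MMFR2_EL1.VARange occupies bits [19:16], with 0b0001 indicating 52-bit VA support. The function names are hypothetical.

```c
#include <stdint.h>

/* ID_AA64MMFR2_EL1.VARange, bits [19:16]: 0b0001 means 52-bit VAs are supported with the
 * 64KB granule. Any other value is treated as 48 bits in this sketch. */
int max_va_bits(uint64_t id_aa64mmfr2)
{
    return ((id_aa64mmfr2 >> 16) & 0xF) == 1 ? 52 : 48;
}

/* With the 64KB granule, levels 3 and 2 each resolve 13 bits above the 16-bit page offset
 * (VA[28:16] and VA[41:29]); level 1 resolves the remaining bits from bit 42 upward. */
int level1_index_bits_64k(int va_bits)
{
    return va_bits - (16 + 13 + 13);
}
```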
ARMv8.2-TTCNP, Translation table Common not private translations
ARMv8.2-TTCNP permits multiple PEs in the same Inner Shareable domain to use the same
translation tables for a given stage of address translation.
This feature is mandatory in ARMv8.2 implementations.
This facility is available for all VMSAv8-64 translation regimes and for VMSAv8-32 translation
stages that use the Long descriptor translation table format.
The following fields identify the support for ARMv8.2-TTCNP:
•
ID_AA64MMFR2_EL1.CnP.
•
ID_MMFR4_EL1.CnP.
•
ID_MMFR4.CnP.
For more information, see:
•
Common not private translations on page D5-2510.
•
Common not private translations in VMSAv8-32 on page G5-5533.
ARMv8.2-TTS2UXN, Translation table stage 2 Unprivileged Execute-never
ARMv8.2-TTS2UXN extends the stage 2 translation table access permissions to provide control of
whether memory is executable at EL0 independent of whether it is executable at EL1.
This feature is mandatory in ARMv8.2 implementations.
This facility is available for stage 2 translation stages in VMSAv8-64 and VMSAv8-32.
The following fields identify the support for ARMv8.2-TTS2UXN:
•
ID_AA64MMFR1_EL1.XNX.
•
ID_MMFR4_EL1.XNX.
•
ID_MMFR4.XNX.
For more information, see:
•
Access permissions for instruction execution on page D5-2461.
•
Access permissions for instruction execution on page G5-5506.
ARMv8.2-Debug, ARMv8.2 Debug
ARMv8.2-Debug covers a selection of mandatory changes, including:
•
If the core power domain is powered up and DoubleLockStatus() == TRUE,
EDPRSR.{DLK,SPD,PU} is only permitted to read {UNKNOWN, 0, 0}.
•
The definition of Exception Catch debug events is extended to include reset entry.
•
All CONSTRAINED UNPREDICTABLE cases that generate Exception Catch debug events are
removed.
•
Controls are added to EDECCR to control Exception Catch debug event generation on
exception return.
•
All IMPLEMENTATION DEFINED control of external debug accesses to OSLAR_EL1 is
removed.
•
ExternalSecureNoninvasiveDebugEnabled() cannot override software controls of counting
attributable events in Secure state.
The fields that identify the support for ARMv8.2-Debug are:
•
ID_AA64DFR0_EL1.DebugVer and DBGDIDR.Version.
•
ID_DFR0_EL1.{CopSDbg, CopDbg} and ID_DFR0.{CopSDbg, CopDbg}.
•
EDDEVARCH.ARCHID.
For more information, see:
•
Exception Catch debug events from ARMv8.2 on page H3-6471.
•
EDPRSR.{DLK, SPD, PU} and the Core power domain on page H6-6521.
•
Interaction with EL3 on page D6-2545.
•
External access disabled on page H8-6546.
ARMv8.2-PCSample, PC Sample-based Profiling
In ARMv8.2, the control and implementation of the OPTIONAL PC Sample-based Profiling extension
are moved from ED*SR Debug registers to PM*SR registers in the Performance Monitors address
space. See Chapter H7 The PC Sample-based Profiling Extension.
This is an optional feature in ARMv8.2 implementations. It is IMPLEMENTATION DEFINED whether
it is implemented.
The following fields identify the support for ARMv8.2-PCSample:
•
EDDEVID.PCSample.
•
DBGDEVID.PCSample.
•
EDDEVID1.PCSROffset.
•
DBGDEVID1.PCSROffset.
•
PMDEVID.PCSample.
ARMv8.2-IESB, Implicit error synchronization event
ARMv8.2-IESB adds an implicit error synchronization event at exception entry and return,
controlled by the added SCTLR_ELx.IESB fields. An IESB field is added to the ESR_ELx
syndrome registers.
The implicit error synchronization events affect the same synchronizable asynchronous events that
are synchronized by the ESB instruction, see The Reliability, Availability, and Serviceability (RAS)
Extension on page A1-74.
This feature is mandatory in ARMv8.2 implementations.
This feature is supported in AArch64 state only.
The ID_AA64MMFR2_EL1.IESB field identifies the support for ARMv8.2-IESB.
For more information, see:
•
The ARM® Reliability, Availability, and Serviceability (RAS) Specification, ARMv8, for the
ARMv8-A architecture profile.
Extensions to the ARM Cryptographic Extensions
See the description of the ARMv8.2-SHA and ARMv8.2-SM features in ARMv8.2 extensions to the
Cryptographic Extension on page A1-58.
Additional requirements of ARMv8.2
The ARMv8.2 architecture includes some mandatory changes that are not associated with a feature. These are:
Changes to ACTLR2 and HACTLR2 registers
In AArch32 state, the ACTLR2 and HACTLR2 registers become mandatory.
Implementation of RAS Extension
The RAS Extension must be implemented, see The Reliability, Availability, and Serviceability
(RAS) Extension on page A1-74.
An implementation of the ARMv8.2 extension must comply with all of the additional requirements. Such an
implementation, when combined with the mandatory architectural features of ARMv8.2, is also called an
implementation of the ARMv8.2 architecture.
A1.7.5 The ARMv8.3 architecture extension
The ARMv8.3 architecture extension adds architectural features.
Architectural features added by ARMv8.3
An implementation of the ARMv8.3 extension must include all of the features that this section describes as
mandatory. Such an implementation is also called an implementation of the ARMv8.3 architecture.
The ARMv8.3 architecture extension adds the following architectural features, which are identified by the
architectural feature name and a short description of the feature:
ARMv8.3-CompNum, SIMD complex number support
ARMv8.3-CompNum introduces instructions for floating-point multiplication and addition of
complex numbers.
These instructions are added to the A64 and A32/T32 instruction sets.
This feature is mandatory in ARMv8.3 implementations.
The half-precision versions of these instructions are implemented only if ARMv8.2-FP16 is
implemented. Otherwise they are UNDEFINED.
The fields that identify the presence of ARMv8.3-CompNum are:
•
ID_AA64ISAR1_EL1.FCMA.
•
ID_ISAR5_EL1.VCMA.
•
ID_ISAR5.VCMA.
For more information, see:
•
SIMD complex number arithmetic on page C3-225.
•
Advanced SIMD complex number arithmetic instructions on page F1-3643.
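A full complex multiply-accumulate is typically formed from two FCMLA instructions, one with rotation #0 and one with rotation #90. The following scalar C sketch models that pairing for one complex element, with the real part held in the even element and the imaginary part in the odd element. It is an illustration, not a definition. The architecture performs each step as a fused multiply-add at the instruction's element size, whereas this sketch uses separate double-precision multiplies and adds, so rounding can differ for inexact values. The type and function names are hypothetical.

```c
/* One complex element: real part (even element) and imaginary part (odd element). */
typedef struct { double re, im; } cplx_t;

/* FCMLA #0: accumulate the products with the real part of a. */
cplx_t fcmla_rot0(cplx_t acc, cplx_t a, cplx_t b)
{
    acc.re += a.re * b.re;
    acc.im += a.re * b.im;
    return acc;
}

/* FCMLA #90: accumulate the products with the imaginary part of a, rotated by 90 degrees. */
cplx_t fcmla_rot90(cplx_t acc, cplx_t a, cplx_t b)
{
    acc.re -= a.im * b.im;
    acc.im += a.im * b.re;
    return acc;
}
```

Applying both steps to the same accumulator adds a * b. For example, (1 + 2i)(3 + 4i) gives -5 + 10i.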
ARMv8.3-JSConv, Javascript conversion instructions
ARMv8.3-JSConv introduces instructions that perform a conversion from a double-precision
floating-point value to a signed 32-bit integer, with rounding toward zero. For more information, see:
For the A64 instruction set
•
FJCVTZS on page C7-1480.
For the A32/T32 instruction set
•
VJCVT on page F6-4753.
These instructions are added to the A64 and A32/T32 instruction sets.
The feature is mandatory in ARMv8.3 implementations.
The fields that identify the presence of ARMv8.3-JSConv are:
•
ID_AA64ISAR1_EL1.JSCVT.
•
ID_ISAR6_EL1.JSCVT.
•
ID_ISAR6.JSCVT.
For more information, see:
•
Floating-point conversion on page C3-208.
•
About the A64 SIMD and floating-point instructions on page C7-1268.
•
Advanced SIMD and floating-point instructions on page E1-3542.
•
Floating-point data-processing instructions on page F1-3647.
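The conversion matches the JavaScript ToInt32 operation. The value is rounded toward zero and then reduced modulo 2^32 into the signed 32-bit range, and NaNs and infinities convert to 0. The following C sketch models only that result value, not the condition flags that FJCVTZS also sets. It works directly on the IEEE 754 bit pattern, so it needs no floating-point library calls. The function name js_to_int32() is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the FJCVTZS/VJCVT result: round toward zero, then reduce
 * modulo 2^32 into a signed 32-bit integer. NaN and infinities convert to 0. */
int32_t js_to_int32(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);

    unsigned exp = (unsigned)((bits >> 52) & 0x7FF);
    if (exp == 0x7FF || exp == 0)
        return 0;                          /* NaN/infinity, or zero/subnormal (|d| < 1) */

    uint64_t mant = (bits & ((1ull << 52) - 1)) | (1ull << 52); /* restore implicit 1 */
    int e = (int)exp - 1075;               /* |d| == mant * 2^e */

    uint32_t low;                          /* low 32 bits of trunc(|d|) */
    if (e >= 32)
        low = 0;                           /* |d| is a multiple of 2^32 */
    else if (e >= 0)
        low = (uint32_t)(mant << e);       /* bits lost above bit 63 cannot affect the low 32 */
    else if (e > -53)
        low = (uint32_t)(mant >> -e);      /* discard fractional bits: truncation */
    else
        low = 0;                           /* |d| < 1 */

    if (bits >> 63)
        low = 0u - low;                    /* negate modulo 2^32 */

    /* Reinterpret as two's complement without an implementation-defined conversion. */
    return low <= INT32_MAX ? (int32_t)low : (int32_t)(low - 0x80000000u) + INT32_MIN;
}
```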
ARMv8.3-RCpc, Weaker release consistency
ARMv8.3-RCpc introduces three instructions to support the weaker Release Consistency processor
consistent (RCpc) model that enables the reordering of a Store-Release followed by a Load-Acquire
to a different address:
•
LDAPR on page C6-847.
•
LDAPRB on page C6-849.
•
LDAPRH on page C6-850.
These instructions are added to the A64 instruction set.
The feature is mandatory in ARMv8.3 implementations.
The ID_AA64ISAR1_EL1.LRCPC field identifies the presence of ARMv8.3-RCpc.
For more information, see:
•
Load-Acquire, Load-AcquirePC, and Store-Release on page B2-108.
•
Load-Acquire/Store-Release on page C3-182.
ARMv8.3-NV, Nested Virtualization
ARMv8.3-NV provides support for a Guest Hypervisor to run in Non-secure EL1 and ensures that
the Guest Hypervisor is unaware that it is running at that Exception level. A Guest Hypervisor is
supported regardless of the value of HCR_EL2.E2H.
This feature is supported in AArch64 state only.
The feature is mandatory in ARMv8.3 implementations.
The ID_AA64MMFR2_EL1.NV field identifies the support for ARMv8.3-NV.
For more information, see Nested virtualization on page D5-2492.
ARMv8.3-CCIDX, Cache extended number of sets
ARMv8.3-CCIDX introduces the following registers to allow caches to be described with greater
numbers of sets and greater associativity:
•
A 64-bit format of CCSIDR_EL1.
•
CCSIDR2_EL1.
•
CCSIDR2.
This feature is supported in AArch64 and AArch32 states.
This feature is optional in ARMv8.3 implementations.
The following fields identify the support for ARMv8.3-CCIDX:
•
ID_AA64MMFR2_EL1.CCIDX.
•
ID_MMFR4_EL1.CCIDX.
•
ID_MMFR4.CCIDX.
For more information, see:
•
Possible formats of the Cache Size Identification Register, CCSIDR_EL1 on page D4-2355.
•
Possible formats of the Cache Size Identification Registers, CCSIDR and CCSIDR2 on
page G4-5427.
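The following is a sketch of decoding cache geometry from CCSIDR_EL1. It assumes the two layouts given in the CCSIDR_EL1 register description. Without ARMv8.3-CCIDX, NumSets is bits [27:13] and Associativity is bits [12:3]. With ARMv8.3-CCIDX, the 64-bit format widens NumSets to bits [55:32] and Associativity to bits [23:3]. In both formats, LineSize in bits [2:0] is log2(line length in bytes) - 4, and NumSets and Associativity each hold one less than the true count. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t line_bytes, ways, sets; } cache_geom;

/* ccidx selects the 64-bit ARMv8.3-CCIDX layout. */
cache_geom ccsidr_decode(uint64_t ccsidr, int ccidx)
{
    cache_geom g;
    g.line_bytes = 1u << ((ccsidr & 0x7) + 4);               /* LineSize [2:0] */
    if (ccidx) {
        g.ways = (uint32_t)((ccsidr >> 3) & 0x1FFFFF) + 1;   /* Associativity [23:3] */
        g.sets = (uint32_t)((ccsidr >> 32) & 0xFFFFFF) + 1;  /* NumSets [55:32] */
    } else {
        g.ways = (uint32_t)((ccsidr >> 3) & 0x3FF) + 1;      /* Associativity [12:3] */
        g.sets = (uint32_t)((ccsidr >> 13) & 0x7FFF) + 1;    /* NumSets [27:13] */
    }
    return g;
}
```

For example, a 32KB, 4-way cache with 64-byte lines has 128 sets, so its register value encodes LineSize = 2, Associativity = 3, and NumSets = 127.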
ARMv8.3-PAuth, Pointer Authentication
ARMv8.3-PAuth adds functionality that supports address authentication of the contents of a register
before that register is used as the target of an indirect branch, or as the address of a load.
This feature is supported only in AArch64 state.
This feature is mandatory in ARMv8.3 implementations.
The fields that identify the support for ARMv8.3-PAuth are ID_AA64ISAR1_EL1.{GPI, GPA,
API, APA}.
For more information, see Pointer authentication in AArch64 state on page D5-2388.
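As a hedged sketch, software might test for the two kinds of authentication that the listed fields describe. This example assumes that ID_AA64ISAR1_EL1.APA is bits [7:4], API is bits [11:8], GPA is bits [27:24], and GPI is bits [31:28]. APA and GPA report the architected algorithm, and API and GPI report an IMPLEMENTATION DEFINED algorithm. Address authentication is therefore present if either of the first pair is nonzero, and generic authentication is present if either of the second pair is nonzero. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract a 4-bit ID register field starting at bit `shift`. */
static unsigned id_field(uint64_t reg, unsigned shift)
{
    return (unsigned)((reg >> shift) & 0xF);
}

/* Address authentication: APA [7:4] or API [11:8] nonzero. */
bool pauth_address_auth(uint64_t isar1)
{
    return id_field(isar1, 4) != 0 || id_field(isar1, 8) != 0;
}

/* Generic authentication: GPA [27:24] or GPI [31:28] nonzero. */
bool pauth_generic_auth(uint64_t isar1)
{
    return id_field(isar1, 24) != 0 || id_field(isar1, 28) != 0;
}
```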
A1.7.6 The ARMv8.4 architecture extension
The ARMv8.4 architecture extension adds architectural features.
Architectural features added by ARMv8.4
An implementation of the ARMv8.4 extension must include all of the features that this section describes as
mandatory. Such an implementation is also called an implementation of the ARMv8.4 architecture.
The ARMv8.4 architecture extension adds the following architectural features, which are identified by the
architectural feature name and a short description of the feature:
ARMv8.4-DIT, Data Independent Timing instructions
ARMv8.4-DIT adds the PSTATE.DIT and CPSR.DIT fields. When DIT is 1, the timing of a defined set of
data-processing instructions is independent of the data values they operate on.
This feature is supported in AArch64 and AArch32 states.
This feature is mandatory in ARMv8.4 implementations.
The following fields identify the support for ARMv8.4-DIT:
•
ID_AA64PFR0_EL1.DIT.
•
ID_PFR0_EL1.DIT.
•
ID_PFR0.DIT.
For more information, see:
•
About PSTATE.DIT on page B1-87.
•
About the DIT bit on page E1-3540.
ARMv8.4-CondM, Condition flag Manipulation
ARMv8.4-CondM provides instructions that manipulate the PSTATE.{N,Z,C,V} flags.
These instructions are added to the A64 instruction set only.
This feature is mandatory in ARMv8.4 implementations.
The ID_AA64ISAR0_EL1.TS field identifies the presence of ARMv8.4-CondM.
For more information, see Flag manipulation instructions on page C3-200.
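As an illustration, the following C sketch models three of these instructions, CFINV, SETF8, and SETF16, on a flags structure. It assumes the semantics given in their instruction descriptions. SETF8 and SETF16 set N to the top bit of the 8-bit or 16-bit value, set Z if that value is zero, and set V to the exclusive OR of the top bit and the bit above it, leaving C unchanged. CFINV inverts C. The type and function names are hypothetical, and RMIF is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv_t;

/* Shared model of SETF8 (msb = 7) and SETF16 (msb = 15). C is not changed. */
static nzcv_t setf_model(nzcv_t f, uint32_t wn, unsigned msb)
{
    uint32_t low_mask = (1u << (msb + 1)) - 1;               /* bits [msb:0] */
    f.n = ((wn >> msb) & 1u) != 0;
    f.z = (wn & low_mask) == 0;
    f.v = (((wn >> (msb + 1)) ^ (wn >> msb)) & 1u) != 0;  /* bit msb+1 XOR bit msb */
    return f;
}

nzcv_t setf8_model(nzcv_t f, uint32_t wn)  { return setf_model(f, wn, 7); }
nzcv_t setf16_model(nzcv_t f, uint32_t wn) { return setf_model(f, wn, 15); }

/* CFINV: invert the carry flag, leaving N, Z, and V unchanged. */
nzcv_t cfinv_model(nzcv_t f) { f.c = !f.c; return f; }
```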
ARMv8.4-RCpc, ARMv8.4 enhancements to weaker release consistency
ARMv8.4-RCpc provides versions of the LDAPR and STLR instructions with a 9-bit unscaled signed
immediate offset.
These instructions are added to the A64 instruction set only.
This feature is mandatory in ARMv8.4 implementations.
The ID_AA64ISAR1_EL1.LRCPC field identifies the presence of ARMv8.4-RCpc.
For more information, see:
•
Changes to single-copy atomicity in ARMv8.4 on page B2-93.
•
Non-exclusive Load-Acquire and Store-Release instructions on page C3-183.
•
A64 instructions that are changed in Debug state on page H2-6428.
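The following is a hedged sketch of how software might choose between the ARMv8.3 and ARMv8.4 forms. It assumes that ID_AA64ISAR1_EL1.LRCPC occupies bits [23:20], with 0b0001 indicating ARMv8.3-RCpc and 0b0010 indicating ARMv8.4-RCpc. It also assumes that the unscaled forms accept a 9-bit signed byte offset in the range -256 to 255. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64ISAR1_EL1.LRCPC, bits [23:20]. */
enum rcpc_level {
    RCPC_NONE = 0, /* no LDAPR family */
    RCPC_V83  = 1, /* ARMv8.3-RCpc: LDAPR, LDAPRB, LDAPRH */
    RCPC_V84  = 2  /* ARMv8.4-RCpc: adds the unscaled-offset forms */
};

enum rcpc_level rcpc_level_of(uint64_t isar1)
{
    unsigned f = (unsigned)((isar1 >> 20) & 0xF);
    return f >= 2 ? RCPC_V84 : (enum rcpc_level)f; /* higher values treated as at least v8.4 */
}

/* True if a byte offset fits the 9-bit signed immediate of the unscaled forms. */
bool rcpc_offset_encodable(int64_t offset)
{
    return offset >= -256 && offset <= 255;
}
```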
ARMv8.4-LSE, Large System Extensions
ARMv8.4-LSE introduces changes to single-copy atomicity requirements for loads and stores, and
changes to alignment requirements for loads and stores.
This feature is supported in AArch64 state only.
This feature is mandatory in ARMv8.4 implementations.
The ID_AA64MMFR2_EL1.AT field identifies the support for ARMv8.4-LSE.
For more information, see:
•
Requirements for single-copy atomicity on page B2-92.
•
Unaligned data access restrictions on page B2-117.
ARMv8.4-TLBI, TLB maintenance and TLB range instructions
ARMv8.4-TLBI provides TLBI maintenance instructions that extend to the Outer Shareable domain
and TLBI invalidation instructions that apply to a range of input addresses.
This feature is supported in AArch64 state only.
This feature is mandatory in ARMv8.4 implementations.
The field ID_AA64ISAR0_EL1.TLB identifies the presence of ARMv8.4-TLBI.
For more information, see:
•
TLB maintenance instruction syntax on page D5-2518.
•
TLB range maintenance instructions on page D5-2526.
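The range instructions encode the size of the range rather than its end address. This sketch assumes the encoding given in the TLB range maintenance description, where a 5-bit NUM field and a 2-bit SCALE field select (NUM + 1) * 2^(5 * SCALE + 1) pages of the translation granule in use. The following C sketch computes the number of bytes covered. The function name is hypothetical.

```c
#include <stdint.h>

/* Bytes covered by a TLBI range operation: (NUM + 1) * 2^(5 * SCALE + 1) granules.
 * NUM is masked to 5 bits and SCALE to 2 bits. */
uint64_t tlbi_range_bytes(unsigned num, unsigned scale, uint64_t granule_bytes)
{
    uint64_t pages = (uint64_t)(num & 0x1F) + 1;
    pages <<= 5 * (scale & 0x3) + 1;
    return pages * granule_bytes;
}
```

With the 4KB granule, the smallest range is 2 pages (8KB), and the largest is 32 * 2^16 pages (8GB).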
ARMv8.4-TTL, Translation Table Level
ARMv8.4-TTL provides the TTL field to indicate the level of translation table walk holding the leaf
entry for the address that is being invalidated. This field is provided in all TLB maintenance
instructions that take a VA or an IPA argument.
This feature is supported in AArch64 state only.
This feature is mandatory in ARMv8.4 implementations.
The field ID_AA64MMFR2_EL1.TTL identifies the presence of ARMv8.4-TTL.
For more information, see:
•
TLB maintenance instruction syntax on page D5-2518.
•
TLB range maintenance instructions on page D5-2526.
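For a TLB maintenance instruction that takes a VA, the TTL hint is carried in the register operand alongside the address. The following C sketch builds such an operand. It assumes the layout used by the TLBI VA-based instructions: VA[55:12] in bits [43:0], TTL in bits [47:44], and ASID in bits [63:48]. It also assumes that TTL[3:2] encodes the granule (0b01 for 4KB, 0b10 for 16KB, 0b11 for 64KB, and 0b00 for no level information) and that TTL[1:0] encodes the level. The enum and function names are hypothetical. Check the encodings against TLB maintenance instruction syntax before relying on them.

```c
#include <stdint.h>

enum ttl_granule { TTL_TG_NONE = 0, TTL_TG_4K = 1, TTL_TG_16K = 2, TTL_TG_64K = 3 };

/* Build a VA-based TLBI operand with a TTL hint. */
uint64_t tlbi_va_operand(uint64_t va, uint16_t asid, enum ttl_granule tg, unsigned level)
{
    uint64_t op = (va >> 12) & ((1ull << 44) - 1);          /* VA[55:12] -> bits [43:0] */
    uint64_t ttl = ((uint64_t)tg << 2) | (level & 0x3);     /* TTL[3:2] granule, TTL[1:0] level */
    if (tg == TTL_TG_NONE)
        ttl = 0;                                            /* no level information supplied */
    op |= ttl << 44;                                        /* TTL -> bits [47:44] */
    op |= (uint64_t)asid << 48;                             /* ASID -> bits [63:48] */
    return op;
}
```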
ARMv8.4-S2FWB, Stage 2 forced Write-Back
ARMv8.4-S2FWB reduces the need for additional cache maintenance instructions in systems
where the data Cacheability attributes used by the Guest operating system are different from those
expected by the Hypervisor.
This feature is supported in AArch64 state.
This feature is mandatory in ARMv8.4 implementations.
The ID_AA64MMFR2_EL1.FWB field identifies the support for ARMv8.4-S2FWB.
For more information, see:
•
Memory region attributes on page D5-2476.
•
The stage 2 memory region attributes, EL1&0 translation regime on page D5-2478.
ARMv8.4-TTST, Small Translation tables
ARMv8.4-TTST relaxes the lower limit on the size of translation tables by increasing the maximum
permitted value of the T1SZ and T0SZ fields in TCR_EL1, TCR_EL2, TCR_EL3, VTCR_EL2 and
VSTCR_EL2.
This feature is supported in AArch64 state only.
This feature is mandatory in ARMv8.4 implementations or if ARMv8.4-SecEL2 is implemented.
This feature is optional if ARMv8.4-SecEL2 is not implemented.
The ID_AA64MMFR2_EL1.ST field identifies the support for ARMv8.4-TTST.
For more information, see:
•
Input address size on page D5-2401.
•
Overview of the VMSAv8-64 address translation stages on page D5-2415.
ARMv8.4-TTRem, Change in size of translation table mappings
ARMv8.4-TTRem provides a mechanism to identify the hardware requirements for
break-before-make sequences when changing the block size used for a translation.
This feature is supported in AArch64 state only.
This feature is mandatory in ARMv8.4 implementations.
The ID_AA64MMFR2_EL1.BBM field identifies the support for ARMv8.4-TTRem.
For more information, see:
•
Memory attribute fields in the VMSAv8-64 translation table format descriptors on
page D5-2449.
•
Support levels for changing block size on page D5-2517.
ARMv8.4-SecEL2, Secure EL2
ARMv8.4-SecEL2 permits EL2 to be implemented in Secure state. When Secure EL2 is enabled, a
new translation regime is introduced that follows the same format as the other Secure translation
regimes.
This feature is not supported if EL2 is using AArch32.
This feature is mandatory in ARMv8.4 implementations.
The ID_AA64PFR0_EL1.SEL2 field identifies the support for ARMv8.4-SecEL2.
For more information, see:
•
Virtualization on page D1-2152.
•
The VMSAv8-64 address translation system on page D5-2392.
ARMv8.4-NV, Enhanced support for Nested Virtualization
ARMv8.4 supports nested virtualization by redirecting register accesses that would be trapped to
EL1 and EL2 to access memory instead. The address of the memory access depends on information
held in VNCR_EL2.
This feature is supported in AArch64 state only.
This feature is mandatory in ARMv8.4 implementations.
The ID_AA64MMFR2_EL1.NV field identifies the support for ARMv8.4-NV.
For more information, see Enhanced support for nested virtualization on page D5-2494.
ARMv8.4-IDST, ID Space Trap handling
ARMv8.4-IDST allows exceptions generated by read accesses to the ID register space to be
reported in ESR_ELx using the EC value 0x18.
This feature is supported in AArch64 state only.
This feature is mandatory in ARMv8.4 implementations.
The ID_AA64MMFR2_EL1.IDS field identifies the support for ARMv8.4-IDST.
ARMv8.4-CNTSC, Generic Counter Scaling
ARMv8.4-CNTSC adds a scaling register to the memory-mapped counter module that allows the
frequency of the counter that is generated to be scaled from the basic frequency reported in the
counter ID mechanisms.
This feature is supported in AArch64 and AArch32 states.
This feature is optional in ARMv8.4 implementations.
The CNTID.CNTSC field identifies the support for ARMv8.4-CNTSC.
For more information, see:
•
CNTCR, Counter Control Register on page I5-6858.
ARMv8.4-Debug, ARMv8.4 Debug relaxations and extensions
ARMv8.4-Debug covers a selection of mandatory changes, including:
•
The fields MDCR_EL3.{EPMAD, EDAD} control Non-secure access to the debug and
PMU registers. The bus master is responsible for other debug authentication.
•
The OS Double Lock function is OPTIONAL from ARMv8.2 onwards.
•
The feature ARMv8.0-DoubleLock has been introduced. See Additional changes in
ARMv8.4 on page A1-74.
•
The Software Lock is obsolete.
•
Non-invasive Debug controls are relaxed.
•
Secure and Non-secure views of the debug registers are enabled.
The fields that identify the support for ARMv8.4-Debug are:
•
ID_AA64DFR0_EL1.DebugVer.
•
DBGDIDR.Version.
•
ID_DFR0_EL1.{CopSDbg, CopDbg}.
•
ID_DFR0.{CopSDbg, CopDbg}.
•
EDDEVARCH.ARCHID.
For more information, see:
•
Definition and constraints of a debugger in the context of external debug on page H1-6412.
•
External debug interface register access permissions on page H8-6545.
ARMv8.4-Trace, ARMv8.4 Self-hosted Trace Extensions
ARMv8.4-Trace adds controls of trace in a self-hosted system through System registers.
The feature provides:
•
Control of Exception levels and Security states where trace generation is prohibited.
•
Control of whether an offset is used for the timestamp recorded with trace information.
•
A Trace Synchronization Barrier instruction, TSB CSYNC, which can be used to prevent reordering of
trace operation accesses with respect to other accesses of the same System registers.
If an ETM Architecture PE Trace Unit is implemented, this feature is mandatory, and the ETM PE
Trace Unit must implement System register access to its control registers. If a different PE Trace
Unit is implemented, this feature is optional.
The reset state of the PE has prohibited regions controlled by the feature and not the external
authentication signals. An external trace controller must override the internal controls before
enabling trace, including trace from reset. This is a change from previous trace architectures and is
not backwards-compatible.
The fields that identify the support for ARMv8.4-Trace are:
•
ID_AA64DFR0_EL1.TraceFilt.
•
ID_DFR0_EL1.TraceFilt.
•
ID_DFR0.TraceFilt.
•
EDDFR.TraceVer.
•
ID_AA64DFR0_EL1.TraceVer.
For more information, see:
•
Chapter D3 AArch64 Self-hosted Trace.
•
Chapter G3 AArch32 Self-hosted Trace.
ARMv8.4-PMU, ARMv8.4 PMU Extensions
ARMv8.4-PMU extends the number of events that are counted to allow for a top-down view of the
utilization of a PE’s resources in addition to the cycles being utilized. This permits the counting of
events in a multithreaded environment. It also introduces the PMMIR_EL1 and PMMIR registers.
This feature is supported in AArch64 and AArch32 states.
This feature is mandatory in ARMv8.4 implementations.
The fields that identify the support for ARMv8.4-PMU are:
•
ID_AA64DFR0_EL1.PMUVer.
•
ID_DFR0_EL1.Perfmon.
•
ID_DFR0.Perfmon.
•
EDDFR.PMUVer.
For more information, see PMU events and event numbers on page D6-2553.
ARMv8.4-RAS, ARMv8.4 RAS Extension
ARMv8.4-RAS implements RAS System Architecture v1.1 and adds support for:
•
ARMv8.4-DFE.
•
Simplifications to ERRSTATUS.
•
Additional ERRMISC registers.
•
The optional RAS Common Fault Injection Model Extension.
This feature is supported in AArch64 and AArch32 states.
This feature is mandatory in ARMv8.4 implementations.
The following fields identify the support or partial support for ARMv8.4-RAS:
•
ID_AA64PFR0_EL1.RAS.
•
ID_AA64PFR1_EL1.RAS_frac.
•
ID_PFR0_EL1.RAS.
•
ID_PFR2_EL1.RAS_frac.
•
ID_PFR0.RAS.
•
ID_PFR2.RAS_frac.
When ARMv8.4-DFE is not implemented, and ERRIDR_EL1.NUM is zero, the values of
ID_AA64PFR0_EL1.RAS and ID_PFR0.RAS are IMPLEMENTATION DEFINED 0b0001 or 0b0010.
For more information, see:
•
The Reliability, Availability, and Serviceability (RAS) Extension.
•
ARM® Reliability, Availability, and Serviceability (RAS) Specification, ARMv8, for the
ARMv8-A architecture profile.
ARMv8.4-DFE, ARMv8.4 Double Fault Extension
ARMv8.4-DFE provides two controls:
•
SCR_EL3.EASE.
•
SCR_EL3.NMEA.
This feature is supported in AArch64 state only.
This feature is mandatory in ARMv8.4 implementations if EL3 is implemented and EL3 uses
AArch64. Otherwise, it is not implemented.
This feature is implemented if ID_AA64PFR0_EL1.RAS >= 0b0010 and the implementation
includes EL3 using AArch64.
For more information, see:
•
The Reliability, Availability, and Serviceability (RAS) Extension.
•
ARM® Reliability, Availability, and Serviceability (RAS) Specification, ARMv8, for the
ARMv8-A architecture profile.
Additional changes in ARMv8.4
The ARMv8.4 architecture includes some changes that are not associated with an ARMv8.4 feature:
The feature ARMv8.0-DoubleLock is introduced. It is mandatory in ARMv8.0 and ARMv8.1 implementations, and from ARMv8.2 the OS Double Lock is OPTIONAL.
The ID_AA64DFR0_EL1.DoubleLock field identifies that the OS Double Lock has been implemented.
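The sense of this field is inverted relative to most ID fields. This sketch assumes the ID_AA64DFR0_EL1.DoubleLock encoding in bits [39:36], where 0b0000 indicates that the OS Double Lock is implemented and 0b1111 indicates that it is not. The following is a minimal C sketch with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64DFR0_EL1.DoubleLock, bits [39:36]: 0b0000 = implemented, 0b1111 = not implemented. */
bool os_double_lock_implemented(uint64_t dfr0)
{
    return ((dfr0 >> 36) & 0xF) == 0;
}
```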
A1.7.7 The Reliability, Availability, and Serviceability (RAS) Extension
The RAS Extension is a mandatory extension to the ARMv8.2 architecture, and an optional extension to the
ARMv8.0 and the ARMv8.1 architectures.
The RAS Extension improves the dependability of a system by providing:
•
Reliability, that is, the continuity of correct service.
•
Availability, that is, the readiness for correct service.
•
Serviceability, that is, the ability to undergo modifications and repairs.
ID_AA64PFR0_EL1.RAS in AArch64 state, and ID_PFR0.RAS in AArch32 state, indicate whether the RAS
Extension is implemented.
The RAS Extension introduces a new barrier instruction, the Error Synchronization Barrier (ESB), to the A32, T32,
and A64 instruction sets.
System registers introduced by the RAS Extension are described in:
•
For AArch64, RAS registers on page D12-3404.
•
For AArch32, RAS registers on page G8-6311.
In addition, the RAS Extension introduces a number of memory-mapped registers. These are described in the ARM®
Reliability, Availability, and Serviceability (RAS) Specification, ARMv8, for the ARMv8-A architecture profile.
ARMv8.2 introduces the following architectural features to the RAS Extension:
•
ARMv8.2-IESB.
ARMv8.4 introduces the following architectural features to the RAS Extension:
•
ARMv8.4-RAS.
•
ARMv8.4-DFE.
A1.7.8 The Statistical Profiling Extension (SPE)
The Statistical Profiling Extension is an optional extension introduced by the ARMv8.2 architecture.
Implementation of the Statistical Profiling Extension requires implementation of at least ARMv8.1 of the
ARMv8-A architecture profile. The Statistical Profiling Extension is only supported in AArch64 state.
The Statistical Profiling Extension provides a non-invasive method of sampling software and hardware, using
randomized sampling of either architectural instructions, as defined by the instruction set architecture, or
microarchitectural operations.
ID_AA64DFR0_EL1.PMSVer indicates whether the Statistical Profiling Extension is implemented.
For more information, see Chapter D8 The Statistical Profiling Extension.
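As a minimal sketch of how software might interpret the two ID_AA64DFR0_EL1 fields named in this section (DoubleLock and PMSVer), the following Python decodes a register value that has already been read, for example by privileged software using MRS. The field positions (PMSVer at bits[35:32], DoubleLock at bits[39:36]) and encodings (DoubleLock 0b0000 for implemented and 0b1111 for not implemented; PMSVer 0b0000 for not implemented) are assumptions taken from the register description rather than from this section, so check them against Chapter D12. The helper names are hypothetical.

```python
def id_field(reg: int, lsb: int, width: int = 4) -> int:
    """Extract the unsigned field occupying bits[lsb+width-1:lsb] of an ID register value."""
    return (reg >> lsb) & ((1 << width) - 1)


# Assumed ID_AA64DFR0_EL1 field positions; verify against the register description.
PMSVER_LSB = 32      # PMSVer, bits[35:32]
DOUBLELOCK_LSB = 36  # DoubleLock, bits[39:36]


def has_spe(dfr0: int) -> bool:
    # PMSVer == 0b0000 is assumed to mean the Statistical Profiling Extension is absent.
    return id_field(dfr0, PMSVER_LSB) != 0b0000


def has_double_lock(dfr0: int) -> bool:
    # DoubleLock == 0b0000 is assumed to mean implemented; 0b1111 means not implemented.
    return id_field(dfr0, DOUBLELOCK_LSB) == 0b0000
```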
A1.7.9
The Scalable Vector Extension (SVE)
The Scalable Vector Extension is an optional extension introduced by the ARMv8.2 architecture. SVE is supported
in AArch64 state only.
The Scalable Vector Extension provides vector instructions that, primarily, support wider vectors than the ARM
Advanced SIMD instruction set. The ARM® Architecture Reference Manual Supplement, The Scalable Vector
Extension (SVE), for ARMv8-A describes the SVE.
ID_AA64PFR0_EL1.SVE indicates whether the Scalable Vector Extension is implemented.
The Scalable Vector Extension affects some AArch64 System registers, and those register changes are included in
this issue of this Manual, where they are identified as SVE features. SVE also introduces new AArch64 System
registers; however, these are not described in this Manual. For more information about the new System registers
introduced by SVE, see the ARM® Architecture Reference Manual Supplement, The Scalable Vector
Extension (SVE), for ARMv8-A.
The Scalable Vector Extension introduces the following System registers:
•
ID_AA64ZFR0_EL1.
•
ZCR_EL1, and an EL2 alias of this register, ZCR_EL12.
•
ZCR_EL2.
•
ZCR_EL3.
The Scalable Vector Extension modifies the following existing System registers:
•
CPACR_EL1.
•
CPTR_EL2.
•
CPTR_EL3.
•
ESR_ELx.
•
ID_AA64PFR0_EL1.
•
TCR_EL1.
•
TCR_EL2.
A1.7.10
The Activity Monitors Extension
The Activity Monitors Extension is an optional extension introduced by the ARMv8.4 architecture. It is
supported in both AArch64 and AArch32 states.
The Activity Monitors Extension implements version 1 of the Activity Monitors architecture, AMUv1, which
provides a function similar to a subset of the existing Performance Monitors Extension functionality, intended for
system management use rather than debugging and profiling.
The Activity Monitors Extension implements a System register interface to the Activity Monitors registers, and also
supports an optional external memory-mapped interface.
The fields that identify the presence of the Activity Monitors Extension are:
•
ID_AA64PFR0_EL1.AMU.
•
ID_PFR0_EL1.AMU.
•
ID_PFR0.AMU.
•
EDPFR.AMU.
For more information, see Chapter D7 The Activity Monitors Extension.
A1.7.11
The Memory Partitioning and Monitoring Extension (MPAM)
The Memory Partitioning and Monitoring Extension is an optional extension introduced by the ARMv8.4
architecture and requires implementation of at least ARMv8.2 of the ARMv8-A architecture profile. MPAM is
supported in AArch64 state only.
The MPAM Extension provides a framework for memory-system component controls that partition one or more of
the performance resources of the component.
The fields that identify the presence of the MPAM Extension are:
•
ID_AA64PFR0_EL1.MPAM.
•
EDPFR.MPAM.
For more information, see ARM® Architecture Reference Manual Supplement, Memory System Resource
Partitioning and Monitoring (MPAM), for ARMv8-A.
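A hedged sketch of checking the ID_AA64PFR0_EL1 fields that this section uses to identify the RAS Extension, SVE, the Activity Monitors Extension, and MPAM. The bit positions (RAS bits[31:28], SVE bits[35:32], MPAM bits[43:40], AMU bits[47:44]) are assumptions to verify against the register description. The sketch treats any nonzero field as "implemented" without interpreting version numbers, and it does not model the AArch32 ID_PFR0 or external EDPFR views.

```python
# Assumed ID_AA64PFR0_EL1 field positions; verify against the register description.
PFR0_FIELDS = {
    "RAS": 28,   # bits[31:28]
    "SVE": 32,   # bits[35:32]
    "MPAM": 40,  # bits[43:40]
    "AMU": 44,   # bits[47:44]
}


def pfr0_fields(pfr0: int) -> dict:
    """Return the raw 4-bit value of each field of interest."""
    return {name: (pfr0 >> lsb) & 0xF for name, lsb in PFR0_FIELDS.items()}


def implemented(pfr0: int) -> set:
    """Names of the extensions whose field is nonzero (0b0000 means not implemented)."""
    return {name for name, value in pfr0_fields(pfr0).items() if value != 0}
```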
Part B
The AArch64 Application Level Architecture
Chapter B1
The AArch64 Application Level Programmers’ Model
This chapter contains the following sections:
•
About the Application level programmers’ model on page B1-80.
•
Registers in AArch64 Execution state on page B1-81.
•
Software control features and EL0 on page B1-86.
B1.1
About the Application level programmers’ model
This chapter contains the programmers’ model information required for application development.
The information in this chapter is distinct from the system information required to service and support application
execution under an operating system, or higher level of system software. However, some knowledge of the system
information is needed to put the Application level programmers' model into context.
Depending on the implementation choices, the architecture supports multiple levels of execution privilege,
indicated by different Exception levels that number upwards from EL0 to EL3. EL0 corresponds to the lowest
privilege level and is often described as unprivileged. The Application level programmers’ model is the
programmers’ model for software executing at EL0. For more information see Exception levels on page D1-2146.
System software determines the Exception level, and therefore the level of privilege, at which software runs. When
an operating system supports execution at both EL1 and EL0, an application usually runs unprivileged at EL0. This:
•
Permits the operating system to allocate system resources to an application in a unique or shared manner.
•
Provides a degree of protection from other processes, and so helps protect the operating system from
malfunctioning software.
This chapter indicates where some system level understanding is necessary, and where relevant it gives a reference
to the system level description.
Execution at any Exception level above EL0 is often referred to as privileged execution.
For more information on the system level view of the architecture refer to Chapter D1 The AArch64 System Level
Programmers’ Model.
B1.2
Registers in AArch64 Execution state
This section describes the registers and process state visible at EL0 when executing in the AArch64 state. It includes
the following:
•
Registers in AArch64 state.
•
Process state, PSTATE on page B1-82.
•
System registers on page B1-84.
B1.2.1
Registers in AArch64 state
In the AArch64 application level view, an Arm processing element has:
R0-R30
31 general-purpose registers, R0 to R30. Each register can be accessed as:
•
A 64-bit general-purpose register named X0 to X30.
•
A 32-bit general-purpose register named W0 to W30.
See the register name mapping in Figure B1-1.
Figure B1-1 General-purpose register naming (within Rn, Xn is bits[63:0] and Wn is bits[31:0])
The X30 general-purpose register is used as the procedure call link register.
Note
In instruction encodings, the value 0b11111 (31) is used to indicate the ZR (zero register). This
indicates that the argument takes the value zero, but does not indicate that the ZR is implemented
as a physical register.
SP
A 64-bit dedicated Stack Pointer register. The least significant 32 bits of the stack pointer can be
accessed via the register name WSP.
The use of SP as an operand in an instruction indicates the use of the current stack pointer.
Note
Checking of stack pointer alignment to a 16-byte boundary is configurable at EL1. For more information, see the
Procedure Call Standard for the Arm 64-bit Architecture.
PC
A 64-bit Program Counter holding the address of the current instruction.
Software cannot write directly to the PC. It can only be updated on a branch, exception entry or
exception return.
Note
Attempting to execute an A64 instruction that is not word-aligned generates a PC alignment fault,
see PC alignment checking on page D1-2164.
V0-V31
32 SIMD&FP registers, V0 to V31. Each register can be accessed as:
•
A 128-bit register named Q0 to Q31.
•
A 64-bit register named D0 to D31.
•
A 32-bit register named S0 to S31.
•
A 16-bit register named H0 to H31.
•
An 8-bit register named B0 to B31.
•
A 128-bit vector of elements.
•
A 64-bit vector of elements.
Where the number of bits described by a register name does not occupy an entire SIMD&FP
register, it refers to the least significant bits. See Figure B1-2.
Figure B1-2 SIMD and floating-point register naming (within Vn, Qn is bits[127:0], Dn is bits[63:0], Sn is bits[31:0], Hn is bits[15:0], and Bn is bits[7:0])
For more information about data types and vector formats, see Supported data types on page A1-39.
FPCR, FPSR
Two SIMD and floating-point control and status registers, FPCR and FPSR.
See Registers for instruction processing and exception handling on page D1-2155 for more information on the
registers.
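The alignment rules for SP and PC can be expressed as simple predicates. In this sketch, the 4-byte size of every A64 instruction is why a valid PC has bits[1:0] equal to zero. `checking_enabled` is a hypothetical stand-in for the EL1 configuration of stack pointer alignment checking; the sketch does not model which control bit provides it, nor which instructions perform the check.

```python
def pc_alignment_fault(pc: int) -> bool:
    """A64 instructions are 4 bytes long, so a PC with bits[1:0] != 0 is misaligned."""
    return (pc & 0b11) != 0


def sp_alignment_fault(sp: int, checking_enabled: bool) -> bool:
    """A misaligned SP is only reported when 16-byte alignment checking is enabled.

    checking_enabled is a hypothetical stand-in for the EL1 configuration.
    """
    return checking_enabled and (sp & 0xF) != 0


def wsp(sp: int) -> int:
    """WSP names the least significant 32 bits of SP."""
    return sp & 0xFFFF_FFFF
```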
Pseudocode description of registers in AArch64 state
In the pseudocode functions that access registers:
•
The assignment form is used for register writes.
•
The non-assignment form is used for register reads.
The uses of the X[] function are:
•
Reading or writing X0-X30, using n to index the required register.
•
Reading the zero register ZR, accessed as X[31].
Note
The pseudocode use of X[31] to represent the zero register does not indicate that hardware must implement this
register.
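A toy Python model of the X[] accessor makes the zero-register rule concrete. It assumes, as the architecture's pseudocode does, that a 32-bit (Wn) write zero-extends into bits[63:32] and that writes to register 31 are discarded. The class and method names are hypothetical, and encodings in which 31 selects SP rather than ZR are not modeled.

```python
MASK64 = (1 << 64) - 1


class GPRFile:
    """Toy model of R0-R30 accessed through an X[]-style read/write interface."""

    def __init__(self) -> None:
        self._r = [0] * 31  # R0-R30; register 31 has no storage

    def read(self, n: int, width: int = 64) -> int:
        # X[31] reads as zero; a 32-bit read returns Wn, the low 32 bits of Xn.
        assert 0 <= n <= 31 and width in (32, 64)
        value = 0 if n == 31 else self._r[n]
        return value & ((1 << width) - 1)

    def write(self, n: int, value: int, width: int = 64) -> None:
        # Assumption: a 32-bit (Wn) write zero-extends into bits[63:32];
        # writes to register 31 are discarded.
        assert 0 <= n <= 31 and width in (32, 64)
        if n != 31:
            self._r[n] = value & ((1 << width) - 1)
```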
The AArch64 SP[] function is used to read or write the current SP.
The AArch64 PC[] function is used to read the PC.
The AArch64 V[] function is used to read or write the Advanced SIMD and floating-point registers V0-V31, using
a parameter n to index the required register.
The AArch64 Vpart[] function is used to read or write a part of one of V0-V31, using a parameter n to index the
required register, and a parameter part to indicate the required part of the register, see the function description for
more information.
The SP[], PC[], V[], and Vpart[] functions are defined in Chapter J1 ARMv8 Pseudocode.
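The following sketch shows how the Bn, Hn, Sn, Dn, and Qn names select the least significant bits of a 128-bit Vn value, and how a vector view splits Vn into elements. It assumes the usual numbering in which element 0 occupies the least significant bits. The function names are hypothetical, and only reads are modeled.

```python
WIDTHS = {"B": 8, "H": 16, "S": 32, "D": 64, "Q": 128}


def scalar_view(v: int, prefix: str) -> int:
    """Read Bn/Hn/Sn/Dn/Qn: the least significant bits of the 128-bit Vn value."""
    return v & ((1 << WIDTHS[prefix]) - 1)


def elements(v: int, esize: int, datasize: int = 128) -> list:
    """Split the low datasize bits of Vn into esize-bit elements, element 0 first."""
    mask = (1 << esize) - 1
    return [(v >> (i * esize)) & mask for i in range(datasize // esize)]
```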
B1.2.2
Process state, PSTATE
Process state or PSTATE is an abstraction of process state information. All of the instruction sets provide
instructions that operate on elements of PSTATE.
The following PSTATE information is accessible at EL0:
The Condition flags
Flag-setting instructions set these. They are:
N
Negative Condition flag. If the result of the instruction is regarded as a two's
complement signed integer, the PE sets this to:
•
1 if the result is negative.
•
0 if the result is positive or zero.
Z
Zero Condition flag. Set to:
•
1 if the result of the instruction is zero.
•
0 otherwise.
A result of zero often indicates an equal result from a comparison.
C
Carry Condition flag. Set to:
•
1 if the instruction results in a carry condition, for example an unsigned overflow
that is the result of an addition.
•
0 otherwise.
V
Overflow Condition flag. Set to:
•
1 if the instruction results in an overflow condition, for example a signed
overflow that is the result of an addition.
•
0 otherwise.
Conditional instructions test the N, Z, C and V Condition flags, combining them with the Condition
code for the instruction to determine whether the instruction must be executed. In this way,
execution of the instruction is conditional on the result of a previous operation. For more
information about conditional execution, see Condition flags and related instructions on
page C6-689.
The exception masking bits
D
Debug exception mask bit. When EL0 is enabled to modify the mask bits, this bit is
visible and can be modified. However, this bit is architecturally ignored at EL0.
A
SError interrupt mask bit.
I
IRQ interrupt mask bit.
F
FIQ interrupt mask bit.
For each bit, the values are:
0
Exception not masked.
1
Exception masked.
Access at EL0 using AArch64 state depends on SCTLR_EL1.UMA. See Traps to EL1 of EL0
accesses to the PSTATE.{D, A, I, F} interrupt masks on page D1-2212.
See Process state, PSTATE on page D1-2161 for the system level view of PSTATE.
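To make the Condition flag definitions concrete, the following Python sketch computes N, Z, C, and V for flag-setting addition and subtraction, following the structure of the AddWithCarry() pseudocode function, and then evaluates condition codes against the result. The condition mnemonics and their flag tests are the standard A64 ones, included here as an assumption because this section defers them to Condition flags and related instructions. The helper names are hypothetical.

```python
def add_with_carry(x: int, y: int, carry_in: int, bits: int = 64):
    """Return (result, (N, Z, C, V)) for x + y + carry_in on bits-wide operands."""
    mask = (1 << bits) - 1
    x &= mask
    y &= mask

    def signed(u: int) -> int:
        # Interpret a bits-wide value as a two's complement integer.
        return u - (1 << bits) if u >> (bits - 1) else u

    unsigned_sum = x + y + carry_in
    signed_sum = signed(x) + signed(y) + carry_in
    result = unsigned_sum & mask
    n = result >> (bits - 1)               # sign bit of the result
    z = int(result == 0)
    c = int(result != unsigned_sum)        # carry out of the top bit
    v = int(signed(result) != signed_sum)  # signed overflow
    return result, (n, z, c, v)


def adds(x: int, y: int, bits: int = 64):
    return add_with_carry(x, y, 0, bits)


def subs(x: int, y: int, bits: int = 64):
    # x - y is computed as x + NOT(y) + 1, so C == 1 means "no unsigned borrow".
    return add_with_carry(x, ~y & ((1 << bits) - 1), 1, bits)


# Standard A64 condition mnemonics and the flag test each one performs.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "HS": lambda n, z, c, v: c == 1,  # also written CS
    "LO": lambda n, z, c, v: c == 0,  # also written CC
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: not (c == 1 and z == 0),
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: not (z == 0 and n == v),
    "AL": lambda n, z, c, v: True,
}


def condition_holds(cond: str, flags) -> bool:
    """Evaluate a condition mnemonic against an (N, Z, C, V) tuple."""
    return CONDITIONS[cond](*flags)
```

For example, comparing 3 with 5 using subs() leaves N set and C clear, so both LT (signed less than) and LO (unsigned lower) hold.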
Accessing PSTATE fields at EL0
At EL0 using AArch64 state, PSTATE fields can be accessed using Special-purpose registers that can be directly
read using the MRS instruction and directly written using the MSR (register) instruction. Table B1-1 shows the
Special-purpose registers that access the PSTATE fields that hold AArch64 state when the PE is at EL0 using
AArch64. No other PSTATE fields can be directly read or written at EL0.
Table B1-1 Accessing PSTATE fields at EL0 using MRS and MSR (register)
Special-purpose register    PSTATE fields
NZCV                        N, Z, C, V
DAIF                        D, A, I, F
Software can also use the MSR (immediate) instruction to directly write to PSTATE.{D, A, I, F}. Table B1-2 shows
the MSR (immediate) operands that can directly write to PSTATE.{D, A, I, F} when the PE is at EL0 using AArch64
state.
Table B1-2 Accessing PSTATE.{D, A, I, F} at EL0 using MSR (immediate)
Operand     PSTATE fields    Notes
DAIFSet     D, A, I, F       Directly sets any of the PSTATE.{D, A, I, F} bits to 1
DAIFClr     D, A, I, F       Directly clears any of the PSTATE.{D, A, I, F} bits to 0
However, access to the PSTATE.{D, A, I, F} fields at EL0 using AArch64 state depends on SCTLR_EL1.UMA. See
Traps to EL1 of EL0 accesses to the PSTATE.{D, A, I, F} interrupt masks on page D1-2212.
Writes to the PSTATE fields have side-effects on various aspects of the PE operation. All of these side-effects are
guaranteed:
•
Not to be visible to earlier instructions in the execution stream.
•
To be visible to later instructions in the execution stream.
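The following sketch shows the layout of the values that MRS returns for NZCV and DAIF, and the effect of MSR DAIFSet and DAIFClr on PSTATE.{D, A, I, F}. The bit positions (N, Z, C, V at bits[31:28]; D, A, I, F at bits[9:6]) and the 4-bit immediate layout (D, A, I, F at bits[3:0]) are assumptions taken from the register and instruction descriptions rather than from this section. The helper names are hypothetical, and trapping controlled by SCTLR_EL1.UMA is not modeled.

```python
# Assumed bit positions in the values read by MRS NZCV and MRS DAIF.
NZCV_SHIFT = {"N": 31, "Z": 30, "C": 29, "V": 28}
DAIF_SHIFT = {"D": 9, "A": 8, "I": 7, "F": 6}
# Assumed layout of the 4-bit immediate used with DAIFSet and DAIFClr.
DAIF_IMM_BIT = {"D": 3, "A": 2, "I": 1, "F": 0}


def pack(fields: dict, shifts: dict) -> int:
    """Build the register value for the given single-bit field values."""
    return sum((fields[name] & 1) << shift for name, shift in shifts.items())


def unpack(value: int, shifts: dict) -> dict:
    """Split a register value back into single-bit fields."""
    return {name: (value >> shift) & 1 for name, shift in shifts.items()}


def msr_daif_imm(pstate: dict, op: str, imm4: int) -> dict:
    """Apply MSR DAIFSet/DAIFClr, #imm4 to a dict of PSTATE.{D, A, I, F} bits."""
    new = dict(pstate)
    for name, bit in DAIF_IMM_BIT.items():
        if (imm4 >> bit) & 1:
            new[name] = 1 if op == "DAIFSet" else 0
    return new
```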
B1.2.3
System registers
System registers provide support for execution control, status and general system configuration. The majority of the
System registers are not accessible at EL0.
However, some System registers can be configured to allow access from software executing at EL0. Any access
from EL0 to a System register with the access right disabled causes the instruction to behave as UNDEFINED. The
registers that can be accessed from EL0 are:
Cache ID registers
The CTR_EL0 and DCZID_EL0 registers provide implementation parameters for EL0
cache management support.
Debug registers
A debug communications channel is supported by the MDCCSR_EL0, DBGDTR_EL0,
DBGDTRRX_EL0 and DBGDTRTX_EL0 registers.
Performance Monitors registers
The Performance Monitors Extension provides counters and configuration registers.
Software executing at EL1 or a higher Exception level can configure some of these registers
to be accessible at EL0.
For more details, see Chapter D6 The Performance Monitors Extension.
Activity Monitors registers
The Activity Monitors Extension provides counters and configuration registers. Software
executing at EL1 or a higher Exception level can configure these registers to be accessible
at EL0.
For more details, see Chapter D7 The Activity Monitors Extension.
Thread ID registers
The TPIDR_EL0 and TPIDRRO_EL0 registers are two thread ID registers with different
access rights.
Timer registers
In ARMv8, the following timer functionality can be made accessible at EL0:
•
Read access to the system counter clock frequency using CNTFRQ_EL0.
•
Physical and virtual timer count registers, CNTPCT_EL0 and CNTVCT_EL0.
•
Physical up-count comparison, down-count value and timer control registers,
CNTP_CVAL_EL0, CNTP_TVAL_EL0, and CNTP_CTL_EL0.
•
Virtual up-count comparison, down-count value and timer control registers,
CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0.
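The relationship between the up-count comparison (CVAL) and down-count (TVAL) views of a timer can be sketched as 64-bit and 32-bit arithmetic. This assumes the Generic Timer behavior specified elsewhere in this Manual: reading TVAL returns (CVAL minus the count) truncated to a signed 32-bit value, writing TVAL sets CVAL to the count plus the sign-extended TVAL, and the timer condition is met once the count reaches CVAL. The function names are hypothetical, and CNTFRQ_EL0 is used only to convert ticks to seconds.

```python
MASK64 = (1 << 64) - 1


def sign_extend32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= 0xFFFF_FFFF
    return x - (1 << 32) if x & 0x8000_0000 else x


def tval_read(cval: int, count: int) -> int:
    """Down-count view: (CVAL - count) truncated to a signed 32-bit value."""
    return sign_extend32(cval - count)


def tval_write(tval: int, count: int) -> int:
    """Writing TVAL sets CVAL = count + SignExtend(TVAL), modulo 2**64."""
    return (count + sign_extend32(tval)) & MASK64


def timer_condition_met(cval: int, count: int) -> bool:
    """Up-count view: the condition is met once the count reaches CVAL."""
    return count >= cval


def ticks_to_seconds(ticks: int, cntfrq: int) -> float:
    """Convert a tick count to seconds using the CNTFRQ_EL0 frequency in Hz."""
    return ticks / cntfrq
```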
B1.3
Software control features and EL0
The following sections describe the EL0 view of the ARMv8 software control features:
•
Exception handling.
•
Wait for Interrupt and Wait for Event.
•
The YIELD instruction.
•
Application level cache management on page B1-87.
•
Instructions relating to Debug on page B1-87.
B1.3.1
Exception handling
In the Arm architecture, an exception causes a change of program flow. Execution of an exception handler starts, at
an Exception level higher than EL0, from a defined vector that relates to the exception taken.
Exceptions include:
•
Interrupts.
•
Memory system aborts.
•
Exceptions generated by attempting to execute an instruction that is UNDEFINED.
•
System calls.
•
Secure monitor or Hypervisor traps.
•
Debug exceptions.
Most details of exception handling are not visible to application level software, and are described in Chapter D1 The
AArch64 System Level Programmers’ Model.
The SVC instruction causes a Supervisor Call exception. This provides a mechanism for unprivileged software to
make a system call to an operating system.
The BRK instruction generates a Breakpoint Instruction exception. This provides a mechanism for debugging
software using a debugger executing on the same PE, see Breakpoint Instruction exceptions on page D2-2294.
Note
The BRK instruction is supported only in the A64 instruction set. The equivalent instruction in the T32 and A32
instruction sets is BKPT.
B1.3.2
Wait for Interrupt and Wait for Event
Issuing a WFI instruction indicates that no further execution is required until a WFI wake-up event occurs, see Wait
For Interrupt on page D1-2258. This permits entry to a low-power state.
Issuing a WFE instruction indicates that no further execution is required until a WFE wake-up event occurs, see Wait
for Event mechanism and Send event on page D1-2255. This permits entry to a low-power state.
B1.3.3
The YIELD instruction
The YIELD instruction provides a hint that the task performed by a thread is of low importance so that it could yield,
see YIELD on page C6-1266. This mechanism can be used to improve overall performance in a Symmetric
Multithreading (SMT) or Symmetric Multiprocessing (SMP) system.
Examples of when the YIELD instruction might be used include a thread that is sitting in a spin-lock, or where the
arbitration priority of the snoop bit in an SMP system is modified. The YIELD instruction permits binary
compatibility between SMT and SMP systems.
The YIELD instruction is a NOP (No Operation) hint instruction.
The YIELD instruction has no effect in a single-threaded system, but developers of such systems can use the
instruction to flag its intended use for future migration to a multiprocessor or multithreading system. Operating
systems can use YIELD in places where a yield hint is wanted, knowing that it will be treated as a NOP if there is no
implementation benefit.
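As a hedged illustration of the spin-lock case, the Python sketch below marks where a YIELD hint would sit in a spin loop. Python cannot issue YIELD, so `yield_hint()` is a hypothetical stand-in that calls `time.sleep(0)` to let another thread run, and `threading.Lock.acquire(blocking=False)` provides the atomic test-and-set; an A64 implementation would use exclusive or atomic instructions instead.

```python
import threading
import time


def yield_hint() -> None:
    # Hypothetical stand-in for the A64 YIELD hint: Python cannot issue YIELD,
    # so time.sleep(0) gives the OS a chance to run another thread.
    time.sleep(0)


class SpinLock:
    """Spin-lock sketch; threading.Lock supplies the atomic test-and-set."""

    def __init__(self) -> None:
        self._flag = threading.Lock()

    def acquire(self) -> None:
        # Retry the non-blocking test-and-set, hinting after each failed attempt.
        while not self._flag.acquire(blocking=False):
            yield_hint()

    def release(self) -> None:
        self._flag.release()
```

Replacing `yield_hint()` with a no-op leaves the lock correct, which mirrors the point that YIELD only affects performance and is treated as a NOP where it has no benefit.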
B1.3.4
Application level cache management
A small number of cache management instructions can be enabled at EL0 from higher levels of privilege using the
SCTLR_EL1 System register. Any access from EL0 to an operation with the access right disabled causes the
instruction to behave as UNDEFINED.
For information about the available operations, see Application level access to functionality related to caches on page B2-113.
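Software that uses the EL0 cache operations usually needs the line and block sizes reported by CTR_EL0 and DCZID_EL0. This sketch decodes them under assumed field layouts (CTR_EL0.IminLine at bits[3:0] and DminLine at bits[19:16], DCZID_EL0.BS at bits[3:0] and DZP at bit[4], with each size field holding log2 of a number of 4-byte words); check them against the register descriptions. The helper names, including `lines_covering()`, are hypothetical.

```python
def dczid_block_bytes(dczid: int) -> int:
    """DC ZVA block size in bytes: BS (bits[3:0]) is log2 of the size in 4-byte words."""
    return 4 << (dczid & 0xF)


def dc_zva_permitted(dczid: int) -> bool:
    """DZP (bit[4]) set to 1 means DC ZVA is prohibited."""
    return not ((dczid >> 4) & 1)


def ctr_min_line_bytes(ctr: int) -> tuple:
    """Return (smallest I-cache line, smallest D-cache line) in bytes from CTR_EL0.

    IminLine is bits[3:0] and DminLine is bits[19:16], each log2 of a count of words.
    """
    imin = ctr & 0xF
    dmin = (ctr >> 16) & 0xF
    return 4 << imin, 4 << dmin


def lines_covering(start: int, length: int, line: int) -> list:
    """Line-aligned addresses a maintenance loop would visit for [start, start + length)."""
    first = start & ~(line - 1)
    return list(range(first, start + length, line))
```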
B1.3.5
Instructions relating to Debug
Exception handling on page B1-86 refers to the BRK instruction, which generates a Breakpoint Instruction exception.
In addition, in both AArch64 state and AArch32 state, the HLT instruction causes the PE to halt execution and enter
Debug state. This provides a mechanism for debugging software using a debugger that is external to the PE, see
Chapter H1 About External Debug.
Note
In AArch32 state, previous versions of the architecture defined the DBG instruction, that could provide a hint to the
debug system. In ARMv8, this instruction executes as a NOP. Arm deprecates the use of the DBG instruction.
B1.3.6
About PSTATE.DIT
When the value of PSTATE.DIT is 1:
•
The instructions listed in DIT are required to have:
—
Timing that is independent of the values of the data supplied in any of their registers, and the values
of the NZCV flags.
—
Responses to asynchronous exceptions that do not vary based on the values supplied in any of their
registers, or the values of the NZCV flags.
•
All loads and stores must have their timing insensitive to the value of the data being loaded or stored.
Note
ARM recommends that the ARMv8.3 pointer authentication instructions do not have their timing dependent on the
key value used in the pointer authentication, regardless of the PSTATE.DIT bit.
When the value of PSTATE.DIT is 0, the architecture makes no statement about the timing properties of any
instructions. However, it is likely that these instructions have timing that is invariant of the data in many situations.
A corresponding DIT bit is added to PSTATE in AArch64 state, and to CPSR in AArch32 state.
When an exception is taken from AArch64 state to AArch64 state, PSTATE.DIT is copied to SPSR_ELx.DIT.
When an exception is taken from AArch32 state to AArch64 state, CPSR.DIT is copied to SPSR_ELx.DIT.
When an exception returns to AArch64 state from AArch64 state, SPSR_ELx.DIT is copied to PSTATE.DIT.
When an exception returns to AArch32 state from AArch64 state, SPSR_ELx.DIT is copied to CPSR.DIT.
PSTATE.DIT can be written and read at all Exception levels.
Note
PSTATE.DIT is unchanged on entry into Debug state.
PSTATE.DIT is not guaranteed to have any effect in Debug state.
Chapter B2
The AArch64 Application Level Memory Model
This chapter gives an application level view of the memory model. It contains the following sections:
•
About the Arm memory model on page B2-90.
•
Atomicity in the Arm architecture on page B2-92.
•
Definition of the ARMv8 memory model on page B2-97.
•
Caches and memory hierarchy on page B2-111.
•
Alignment support on page B2-116.
•
Endian support on page B2-119.
•
Memory types and attributes on page B2-122.
•
Mismatched memory attributes on page B2-132.
•
Synchronization and semaphores on page B2-135.
Note
In this chapter, System register names usually link to the description of the register in Chapter D12 AArch64 System
Register Descriptions, for example SCTLR_EL1.
B2.1
About the Arm memory model
The Arm architecture is a weakly ordered memory architecture that permits the observation and completion of
memory accesses in a different order from the program order. The following sections of this chapter provide the
complete definition of the ARMv8 memory model; this introduction is not intended to contradict the definition
found in those sections. In general, the basic principles of the ARMv8 memory model are:
•
To provide a memory model that has similar weaknesses to those found in the memory models used by
high-level programming languages such as C or Java. For example, by permitting independent memory
accesses to be reordered as seen by other observers.
•
To avoid the requirement for multi-copy atomicity in the majority of memory types.
•
The provision of instructions and memory barriers to compensate for the lack of multi-copy atomicity in the
cases where it would be needed.
•
The use of address, data, and control dependencies in the creation of order so as to avoid having excessive
numbers of barriers or other explicit instructions in common situations where some order is required by the
programmer or the compiler.
This section contains:
•
Address space.
•
Memory type overview.
B2.1.1
Address space
Address calculations are performed using 64-bit registers. However, supervisory software can configure the top
eight address bits for use as a tag, as described in Address tagging in AArch64 state on page D5-2386. If this is done,
address bits[63:56]:
•
Are not considered when determining whether the address is valid.
•
Are never propagated to the program counter.
Supervisory software determines the valid address range. Attempting to access an address that is not valid generates
an MMU fault.
Simple sequential execution of instructions might overflow the valid address range. For more information, see
Virtual address space overflow on page D4-2351.
Memory accesses use the Mem[] function. This function makes an access of the required type. If supervisory software
configures the top eight address bits for use as a tag, the top eight address bits are ignored.
The AccType{} enumeration defines the different access types.
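A minimal sketch of the tagging rule, assuming 64-bit virtual addresses held as Python integers: with tagging enabled, bits[63:56] can carry an arbitrary tag without changing which location is addressed. `tagging_enabled` is a hypothetical stand-in for the supervisory configuration, and the sketch does not model address validity checks or translation.

```python
TAG_SHIFT = 56
ADDR_MASK = (1 << TAG_SHIFT) - 1  # bits[55:0]


def tag_of(va: int) -> int:
    """Return bits[63:56] of a 64-bit virtual address."""
    return (va >> TAG_SHIFT) & 0xFF


def with_tag(va: int, tag: int) -> int:
    """Replace bits[63:56] of va with an 8-bit tag."""
    return (va & ADDR_MASK) | ((tag & 0xFF) << TAG_SHIFT)


def same_location(a: int, b: int, tagging_enabled: bool) -> bool:
    """With tagging enabled, bits[63:56] play no part in which location is accessed."""
    if tagging_enabled:
        return (a & ADDR_MASK) == (b & ADDR_MASK)
    return a == b
```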
Note
•
Chapter D4 The AArch64 System Level Memory Model and Chapter D5 The AArch64 Virtual Memory System
Architecture include descriptions of memory system features that are transparent to the application, including
memory access, address translation, memory maintenance instructions, and alignment checking and the
associated fault handling. These chapters also include pseudocode descriptions of these operations.
•
For information on the pseudocode that relates to memory accesses, see Basic memory access on
page D4-2380, Unaligned memory access on page D4-2381, and Aligned memory access on page D4-2381.
B2.1.2
Memory type overview
ARMv8 provides the following mutually-exclusive memory types:
Normal
This is generally used for bulk memory operations, both read/write and read-only operations.
Device
The Arm architecture forbids Speculative reads of any type of Device memory. This means Device
memory types are suitable attributes for read-sensitive Locations.
Locations of the memory map that are assigned to peripherals are usually assigned the Device
memory attribute.
Device memory has additional attributes that have the following effects:
•
They prevent aggregation of reads and writes, maintaining the number and size of the
specified memory accesses. See Gathering on page B2-128.
•
They preserve the access order and synchronization requirements, both for accesses to a
single peripheral and where there is a synchronization requirement on the observability of
one or more memory write and read accesses. See Reordering on page B2-129.
•
They indicate whether a write can be acknowledged other than at the end point. See Early
Write Acknowledgement on page B2-130.
For more information on Normal memory and Device memory, see Memory types and attributes on page B2-122.
Note
Earlier versions of the Arm architecture defined a single Device memory type and a Strongly-ordered memory type.
A Note in Device memory on page B2-126 describes how these memory types map onto the ARMv8 memory types.
B2.2
Atomicity in the Arm architecture
Atomicity is a feature of memory accesses, described as atomic accesses. The Arm architecture description refers to
two types of atomicity, single-copy atomicity and multi-copy atomicity. In the Armv8 architecture, the atomicity
requirements for memory accesses depend on the memory type, and whether the access is explicit or implicit. For
more information, see:
•
Requirements for single-copy atomicity.
•
Properties of single-copy atomic accesses on page B2-93.
•
Multi-copy atomicity on page B2-94.
•
Requirements for multi-copy atomicity on page B2-94.
•
Concurrent modification and execution of instructions on page B2-94.
For more information about the memory types, see Memory type overview on page B2-90.
B2.2.1
Requirements for single-copy atomicity
For explicit memory accesses generated from an Exception level, the following rules apply:
•
A read that is generated by a load instruction that loads a single general-purpose register and is aligned to the
size of the read in the instruction is single-copy atomic.
•
A write that is generated by a store instruction that stores a single general-purpose register and is aligned to
the size of the write in the instruction is single-copy atomic.
•
Reads that are generated by a Load Pair instruction that loads two general-purpose registers and are aligned
to the size of the load to each register are treated as two single-copy atomic reads, one for each register being
loaded.
•
Writes that are generated by a Store Pair instruction that stores two general-purpose registers and are aligned
to the size of the store of each register are treated as two single-copy atomic writes, one for each register being
stored.
•
Load-Exclusive Pair instructions of two 32-bit quantities and Store-Exclusive Pair instructions of 32-bit
quantities are single-copy atomic.
•
When the Store-Exclusive of a Load-Exclusive/Store-Exclusive pair instruction using two 64-bit quantities
succeeds, it causes a single-copy atomic update of the entire memory location being updated.
Note
To atomically load two 64-bit quantities, perform a Load-Exclusive pair/Store-Exclusive pair sequence of
reading and writing the same value for which the Store-Exclusive pair succeeds, and use the read values from
the Load-Exclusive pair.
•
Where translation table walks generate a read of a translation table entry, this read is single-copy atomic.
•
For the atomicity of instruction fetches, see Concurrent modification and execution of instructions on
page B2-94.
•
Reads to SIMD and floating-point registers of a single 64-bit or smaller quantity that is aligned to the size of
the quantity being loaded are treated as single-copy atomic reads.
•
Writes from SIMD and floating-point registers of a single 64-bit or smaller quantity that is aligned to the size
of the quantity being stored are treated as single-copy atomic writes.
•
Element or Structure Reads to SIMD and floating-point registers of 64-bit or smaller elements, where each
element is aligned to the size of the element being loaded, have each element treated as a single-copy atomic
read.
•
Element or Structure Writes from SIMD and floating-point registers of 64-bit or smaller elements, where
each element is aligned to the size of the element being stored, have each element treated as a single-copy
atomic store.
•
Reads to SIMD and floating-point registers of a 128-bit value that is 64-bit aligned in memory are treated as
a pair of single-copy atomic 64-bit reads.
•
Writes from SIMD and floating-point registers of a 128-bit value that is 64-bit aligned in memory are treated
as a pair of single-copy atomic 64-bit writes.
All other memory accesses are regarded as streams of accesses to bytes, and no atomicity between accesses to
different bytes is ensured by the architecture.
All accesses to any byte are single-copy atomic.
Note
In AArch64 state, no memory accesses from a DC ZVA have single-copy atomicity of any quantity greater than
individual bytes.
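The base rules can be summarized by a function that splits an explicit access into its single-copy atomic pieces. This is a simplified sketch under stated assumptions: it covers only single general-purpose register accesses, Load/Store Pair of general-purpose registers, and single SIMD&FP register accesses, and it ignores exclusives, translation table walks, element and structure accesses, and the ARMv8.4 changes. The `kind` strings and function names are hypothetical labels.

```python
def bytes_of(addr: int, size: int) -> list:
    """Fallback: a stream of independent single-byte accesses."""
    return [(addr + i, 1) for i in range(size)]


def sca_units(kind: str, addr: int, size: int) -> list:
    """Split an explicit access into (address, size) single-copy atomic pieces.

    kind: "gpr"      single general-purpose register, size 1, 2, 4 or 8
          "gpr_pair" Load/Store Pair, size is the per-register size (4 or 8)
          "fp"       single SIMD&FP register, size 1, 2, 4, 8 or 16
    """
    if kind == "gpr" and addr % size == 0:
        return [(addr, size)]
    if kind == "gpr_pair":
        if addr % size == 0:
            # Two single-copy atomic accesses, one per register.
            return [(addr, size), (addr + size, size)]
        return bytes_of(addr, 2 * size)
    if kind == "fp":
        if size <= 8 and addr % size == 0:
            return [(addr, size)]
        if size == 16 and addr % 8 == 0:
            # A 64-bit aligned 128-bit access is a pair of 64-bit atomic accesses.
            return [(addr, 8), (addr + 8, 8)]
    return bytes_of(addr, size)
```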
If, according to these rules, an instruction is executed as a sequence of accesses, exceptions, including interrupts,
can be taken during that sequence, regardless of the memory type being accessed. If any of these exceptions are
returned from using their preferred return address, the instruction that generated the sequence of accesses is
re-executed, and so any access performed before the exception was taken is repeated. See also Taking an interrupt
or other exception during a multiple-register load or store on page D1-2207.
Note
The exception behavior for these multiple-access instructions means that they are not suitable for writes to memory
that are used for software synchronization.
Changes to single-copy atomicity in ARMv8.4
Instructions that are introduced in ARMv8.4-RCpc are single-copy atomic when the following conditions are true:
•
All bytes being accessed are within the same 16-byte quantity aligned to 16 bytes.
•
Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.
Otherwise it is IMPLEMENTATION DEFINED whether they are single-copy atomic.
If ARMv8.4-LSE is implemented, all loads and stores are single-copy atomic when the following conditions are
true:
•
Accesses are unaligned to their data size, but all bytes accessed are within a single 16-byte quantity that is aligned to 16 bytes.
•
Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.
Otherwise it is IMPLEMENTATION DEFINED whether loads and stores are single-copy atomic.
If ARMv8.4-LSE is implemented, LDP, LDNP, and STP instructions that load or store two 64-bit registers are
single-copy atomic when the following conditions are true:
•
The overall memory access is aligned to 16 bytes.
•
Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.
If ARMv8.4-LSE is implemented, LDP, LDNP, and STP instructions that access fewer than 16 bytes are single-copy
atomic when the following conditions are true:
•
All bytes being accessed are within a 16-byte quantity aligned to 16 bytes.
•
Accesses are to Inner Write-Back, Outer Write-Back Normal cacheable memory.
Otherwise it is IMPLEMENTATION DEFINED whether LDP, LDNP, or STP instructions that access fewer than 16 bytes are
single-copy atomic.
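Each of the ARMv8.4 conditions above reduces to the same address test: every byte accessed lies within one 16-byte quantity that is aligned to 16 bytes. For a 16-byte LDP, LDNP, or STP, that test holds exactly when the access is aligned to 16 bytes. The Python sketch below is illustrative only. The function names are hypothetical, and the `wb_cacheable` flag stands in for "Inner Write-Back, Outer Write-Back Normal cacheable memory". A False result means only that these ARMv8.4 conditions are not met. It does not mean the access is non-atomic: naturally aligned accesses remain single-copy atomic under the earlier rules, and other cases are IMPLEMENTATION DEFINED.

```python
def within_aligned_16(addr: int, size: int) -> bool:
    """True if every byte in [addr, addr + size) lies in one 16-byte-aligned quantity."""
    return size >= 1 and addr // 16 == (addr + size - 1) // 16


def armv84_conditions_met(addr: int, size: int, wb_cacheable: bool) -> bool:
    """Hypothetical helper: True when the ARMv8.4 single-copy atomicity conditions hold.
    `wb_cacheable` models Inner Write-Back, Outer Write-Back Normal cacheable memory."""
    return wb_cacheable and within_aligned_16(addr, size)


# An unaligned 8-byte access at 0x1004 stays inside the 16-byte quantity at 0x1000.
print(armv84_conditions_met(0x1004, 8, True))   # True
# The same access at 0x100C crosses into the next 16-byte quantity.
print(armv84_conditions_met(0x100C, 8, True))   # False
```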
B2.2.2 Properties of single-copy atomic accesses
A memory access instruction that is single-copy atomic has the following properties:
1.
For a pair of overlapping single-copy atomic store instructions, all of the overlapping writes generated by one
of the stores are Coherence-after the corresponding overlapping writes generated by the other store.
2.
For a single-copy atomic load instruction L1 that overlaps a single-copy atomic store instruction S2, if one of
the overlapping reads generated by L1 Reads-from one of the overlapping writes generated by S2, then none
of the overlapping writes generated by S2 are Coherence-after the corresponding overlapping reads generated
by L1.
For more information, see Definition of the ARMv8 memory model on page B2-97.
B2.2.3 Multi-copy atomicity
In a multiprocessing system, writes to a memory location are multi-copy atomic if the following conditions are both
true:
•
All writes to the same location are serialized, meaning they are observed in the same order by all observers,
although some observers might not observe all of the writes.
•
A read of a location does not return the value of a write until all observers observe that write.
Note
Writes that are not coherent are not multi-copy atomic.
B2.2.4 Requirements for multi-copy atomicity
For Normal memory, writes are not required to be multi-copy atomic.
For Device memory, writes are not required to be multi-copy atomic.
The ARMv8 memory model is Other-multi-copy atomic. For more information, see Ordering constraints on
page B2-101.
B2.2.5 Concurrent modification and execution of instructions
The Armv8 architecture limits the set of instructions that can be executed by one thread of execution as they are
being modified by another thread of execution without requiring explicit synchronization.
Concurrent modification and execution of instructions can lead to the resulting instruction performing any behavior
that can be achieved by executing any sequence of instructions that can be executed from the same Exception level,
except where the instruction before modification and the instruction after modification are each a B, BL, BRK,
HVC, ISB, NOP, SMC, or SVC instruction.
For the B, BL, BRK, HVC, ISB, NOP, SMC, and SVC instructions the architecture guarantees that, after modification of the
instruction, behavior is consistent with execution of either:
•
The instruction originally fetched.
•
A fetch of the modified instruction.
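The exception for concurrent modification can be expressed as a simple membership test on the instruction before modification and the instruction after modification. This Python sketch is illustrative. It models instructions by mnemonic only, which is a hypothetical simplification that ignores operands and encodings, and `needs_explicit_sync` is a made-up name.

```python
# Instructions that may be concurrently modified and executed without explicit
# synchronization, provided both the old and the new instruction are in this set.
CMODX_PERMITTED = frozenset({"B", "BL", "BRK", "HVC", "ISB", "NOP", "SMC", "SVC"})


def needs_explicit_sync(old: str, new: str) -> bool:
    """Hypothetical helper: True if replacing `old` with `new` while another
    thread might execute it requires the explicit synchronization sequence."""
    return not (old in CMODX_PERMITTED and new in CMODX_PERMITTED)


print(needs_explicit_sync("NOP", "B"))    # False: both instructions are in the set
print(needs_explicit_sync("NOP", "ADD"))  # True: ADD is not in the permitted set
```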
If one thread of execution changes a conditional branch instruction (B.cond in A64) to another conditional branch
instruction, and the change affects both the condition field and the branch target, execution of the changed instruction
by another thread of execution before the change is synchronized can lead to either:
•
The old condition being associated with the new target address.
•
The new condition being associated with the old target address.
These possibilities apply regardless of whether the condition, either before or after the change to the branch
instruction, is the always condition.
For all other instructions, to avoid UNPREDICTABLE or CONSTRAINED UNPREDICTABLE behavior, instruction
modifications must be explicitly synchronized before they are executed. The required synchronization is as follows:
1.
No PE must be executing an instruction when another PE is modifying that instruction.
2.
To ensure that the modified instructions are observable, a PE that is writing the instructions must issue the
following sequence of instructions and operations:
; Coherency example for data and instruction accesses within the same Inner Shareable domain.
; Enter this code with Wt containing a new 32-bit instruction,
; to be held in Cacheable space at a location pointed to by Xn.
STR Wt, [Xn]
DC CVAU, Xn     ; Clean data cache by VA to point of unification (PoU)
DSB ISH         ; Ensure visibility of the data cleaned from cache
IC IVAU, Xn     ; Invalidate instruction cache by VA to PoU
DSB ISH         ; Ensure completion of the invalidations
Note
•
The DC CVAU operation is not required if the area of memory is either Non-cacheable or Write-Through
Cacheable.
•
If the contents of physical memory differ between the mappings, changing the mapping of VAs to PAs
can cause the instructions to be concurrently modified by one PE and executed by another PE. If the
modifications affect instructions other than those listed as being acceptable for modification,
synchronization must be used to avoid UNPREDICTABLE or CONSTRAINED UNPREDICTABLE behavior.
3.
In a multiprocessor system, the IC IVAU is broadcast to all PEs within the Inner Shareable domain of the PE
running this sequence. However, when the modified instructions are observable, each PE that is executing
the modified instructions must issue the following instruction to ensure execution of the modified
instructions:
ISB             ; Synchronize fetched instruction stream
For more information about the required synchronization operation, see Synchronization and coherency issues
between data and instruction accesses on page B2-114.
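The ordering requirement on the writer in step 2 can be checked mechanically: the required operations must appear in the stated order, although other instructions can be interleaved between them. The Python sketch below is a hypothetical illustration that treats instructions as plain strings. It omits DC CVAU when the memory is Non-cacheable or Write-Through Cacheable, as the Note permits. It checks only the writer's ordering, not the ISB that each executing PE requires.

```python
def writer_sequence(write_back_cacheable: bool) -> list:
    """Required writer-side operations from the example, in program order."""
    steps = ["STR Wt, [Xn]"]
    if write_back_cacheable:
        # DC CVAU is not required for Non-cacheable or Write-Through Cacheable memory.
        steps.append("DC CVAU, Xn")
    steps += ["DSB ISH", "IC IVAU, Xn", "DSB ISH"]
    return steps


def is_ordered_subsequence(required: list, observed: list) -> bool:
    """True if every step of `required` appears in `observed` in the same order."""
    remaining = iter(observed)
    # `step in remaining` consumes the iterator up to the match, which enforces order.
    return all(step in remaining for step in required)


trace = ["STR Wt, [Xn]", "DC CVAU, Xn", "DSB ISH", "IC IVAU, Xn", "DSB ISH"]
print(is_ordered_subsequence(writer_sequence(True), trace))  # True
```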
Note
For information about memory accesses caused by instruction fetches, see Ordering relations on page B2-100.
B2.2.6 Possible implementation restrictions on using atomic instructions
In some implementations, and for some memory types, the properties of atomicity can be met only by functionality
outside the PE. Some system implementations might not support atomic instructions for all regions of the memory.
In particular, this can apply to:
•
Any type of memory in the system that does not support hardware cache coherency.
•
Device, Non-cacheable memory, or memory that is treated as Non-cacheable, in an implementation that does
support hardware cache coherency.
In such implementations, it is defined by the system:
•
Whether the atomic instructions are atomic in regard to other agents that access memory.
•
If the atomic instructions are atomic in regard to other agents that access memory, which address ranges or
memory types this applies to.
An implementation can choose which memory type is treated as Non-cacheable.
The memory types for which it is architecturally guaranteed that the atomic instructions will be atomic are:
•
Inner Shareable, Inner Write-Back, Outer Write-Back Normal memory with Read allocation hints and Write
allocation hints and not transient.
•
Outer Shareable, Inner Write-Back, Outer Write-Back Normal memory with Read allocation hints and Write
allocation hints and not transient.
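The two architecturally guaranteed cases differ only in Shareability. As an illustration, they can be encoded as a predicate over a hypothetical record of Normal memory attributes. The field names and string values below are assumptions of this sketch, not architectural encodings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalMemoryAttrs:
    shareability: str     # hypothetical labels: "Inner", "Outer" or "Non"
    inner_cache: str      # "WB" (Write-Back), "WT" (Write-Through) or "NC"
    outer_cache: str
    read_allocate: bool
    write_allocate: bool
    transient: bool


def atomics_guaranteed(m: NormalMemoryAttrs) -> bool:
    """True for the memory types on which atomic instructions are architecturally
    guaranteed to be atomic; False means the behavior is defined by the system."""
    return (
        m.shareability in ("Inner", "Outer")
        and m.inner_cache == "WB"
        and m.outer_cache == "WB"
        and m.read_allocate
        and m.write_allocate
        and not m.transient
    )


wbwa = NormalMemoryAttrs("Inner", "WB", "WB", True, True, False)
print(atomics_guaranteed(wbwa))  # True
```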
If the atomic instructions are not atomic in regard to other agents that access memory, then performing an atomic
instruction to such a location can have one or more of the following effects:
•
The instruction generates a synchronous External abort.
•
The instruction generates a System Error interrupt.
•
The instruction generates an IMPLEMENTATION DEFINED MMU fault reported using the Data Abort Fault
status code of ESR_ELx.DFSC = 110101.
For the EL1&0 translation regime, if the atomic instruction is not supported because of the memory type that
is defined in the first stage of translation, or the second stage of translation is not enabled, then this exception
is a first stage abort and is taken to EL1. Otherwise, the exception is a second stage abort and is taken to EL2.
•
The instruction is treated as a NOP.
•
The instructions are performed, but there is no guarantee that the memory accesses were performed
atomically in regard to other agents that access memory. In this case, the instruction might also generate a
System Error interrupt.
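For the EL1&0 translation regime, the routing rule for the DFSC 0b110101 fault can be written as a small decision function. This Python sketch is illustrative only. The parameter names are hypothetical, and the sketch models only this one fault and only the EL1&0 translation regime.

```python
# ESR_ELx.DFSC value for the IMPLEMENTATION DEFINED fault on an unsupported atomic.
DFSC_UNSUPPORTED_ATOMIC = 0b110101


def unsupported_atomic_target_el(stage1_memtype_unsupported: bool,
                                 stage2_enabled: bool) -> int:
    """Hypothetical helper: Exception level that takes the abort (EL1&0 regime)."""
    if stage1_memtype_unsupported or not stage2_enabled:
        return 1  # first stage abort, taken to EL1
    return 2      # second stage abort, taken to EL2


print(DFSC_UNSUPPORTED_ATOMIC)                     # 53
print(unsupported_atomic_target_el(False, True))   # 2
```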
B2.3 Definition of the ARMv8 memory model
This section describes observation and ordering in the ARMv8 memory model. It contains the following
subsections:
•
Locations.
•
Ordering and observability on page B2-98.
•
Ordering constraints on page B2-101.
•
Completion and endpoint ordering on page B2-102.
•
Memory barriers on page B2-103.
•
Limited ordering regions on page B2-109.
For more information about endpoint ordering of memory accesses, see Reordering on page B2-129.
In the ARMv8 memory model, the Shareability memory attribute indicates the degree to which hardware must
ensure memory coherency between a set of observers, see Memory types and attributes on page B2-122.
The Armv8 architecture defines additional memory attributes and associated behaviors, which are defined in the
system level section of this manual. See:
•
Chapter D4 The AArch64 System Level Memory Model.
•
Chapter D5 The AArch64 Virtual Memory System Architecture.
See also Mismatched memory attributes on page B2-132.
B2.3.1 Locations
The ARMv8 memory model provides a set of definitions that are used to constrain the permitted sequences of
accesses to memory. The ARMv8 memory model defines:
•
The ordering of observation of memory accesses between different observers.
•
The ordering of arrival of memory accesses arriving at an endpoint.
•
The mechanisms to control the ordering of observation of memory accesses and the arrival of memory
accesses at an endpoint.
Locations, Memory effects, and Observers
The ARMv8 memory model provides the following definition of a Location in memory:
Location
A Location refers to a single byte in memory.
As part of its execution an instruction might generate a Memory effect. Observers in the system might observe the
Memory effects of that instruction on a Location. The ARMv8 memory model provides the following definitions
of a Memory effect and an Observer:
Memory effect
The Memory effects of an instruction are the read, write, or barrier effects of that instruction. For an
instruction that accesses memory:
•
A read effect is generated for each Location that is read by the instruction.
•
A write effect is generated for each Location that is written by the instruction.
An instruction can generate both read and write effects.
The Memory effects of an instruction I1 are said to appear in program order before the Memory
effects of instruction I2 if and only if I1 occurs before I2 in program order.
For the purposes of describing the ARMv8 memory model, all read and write effects access only
Normal memory locations in a Common Shareability Domain. Where this section refers to a read,
write, or memory barrier without any qualification, then it is referring to the corresponding Memory
effect.
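Because a Location is a single byte, an instruction generates one read or write effect for each byte that it accesses. The Python sketch below illustrates this. `memory_effects` is a hypothetical name. The `kind` string is an assumption of the sketch: "R", "W", or "RW" for an instruction, such as an atomic, that both reads and writes.

```python
def memory_effects(kind: str, addr: int, size: int) -> list:
    """Hypothetical helper: list (effect, Location) pairs for an access of `size`
    bytes at `addr`; `kind` contains "R" for reads and/or "W" for writes."""
    return [(effect, addr + i) for effect in kind for i in range(size)]


print(memory_effects("W", 0x10, 2))  # [('W', 16), ('W', 17)]
```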
Observer
An Observer refers to either a processing element, or some other memory accessing agent that can
generate reads from or writes to memory.
Common Shareability Domain
A Common Shareability Domain for a program is the smallest Shareability domain that contains all
of the active Observers of the Memory effects generated by a program.
B2.3.2 Ordering and observability
The ARMv8 memory model permits reordering of memory accesses. This section defines the constraints placed on
the reordering of memory accesses using the following:
•
Register value dependencies to establish order between instructions on a PE.
•
Ordering constraints to establish order between accesses to a Location.
Register value dependencies
The ARMv8 memory model defines the following dependencies between instructions:
Register dependency
A Register dependency from a first data value V1 to a second data value V2 exists within a PE if and
only if either:
•
The register, excluding the AArch64 zero register (XZR or WZR), that is used to hold V1 is
used in the calculation of V2.
•
There is a Register dependency from V1 to a third data value V3 and there is a Register
dependency from V3 to V2.
Register data dependency
A Register data dependency from a first data value V1 to a second data value V2 exists within a PE
if and only if either:
•
The register, excluding the AArch64 zero register (XZR or WZR) and the AArch32 PC, that
is used to hold V1 is used in the calculation of V2, and the calculation between V1 and
V2 does not consist of either:
—
A conditional branch whose condition is determined by V1.
—
A conditional selection, move, or computation whose condition is determined by V1,
where the input data values for the selection, move, or computation do not have a data
dependency on V1.
•
There is a Register data dependency from V1 to a third data value V3, and there is a Register
data dependency from V3 to V2.
Address dependency
An Address dependency from a read R1 to a subsequent read R2 exists if and only if there is a
Register data dependency from the data value that is returned by R1 to the address used by R2.
An Address dependency from a read R1 to a subsequent write W2 exists if and only if there is a
Register dependency from the data value that is returned by R1 to the address used by W2.
Data dependency
A Data dependency from a read R1 to a subsequent write W2 exists if and only if there is a Register
dependency from the data value returned by R1 to the data value written by W2.
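Register dependencies can be tracked by propagating a "taint" from the register that holds a loaded value through later register-to-register instructions. The Python sketch below uses this to decide whether a later access has an Address dependency on a load. It is illustrative only. Instructions are modeled as hypothetical (destination, sources) pairs, and the zero registers are excluded as the definition requires. The sketch does not model the conditional-selection exclusion in Register data dependency. The example uses the pattern in which EOR X2, X1, X1 always yields zero but still carries a dependency from X1, because X1 is used in the calculation.

```python
ZERO_REGISTERS = {"XZR", "WZR"}


def dependent_registers(source: str, ops: list) -> set:
    """Registers whose values have a Register dependency on the value in `source`.
    `ops` is a list of (destination, [source registers]) in program order."""
    tainted = {source}
    for dst, srcs in ops:
        if dst in ZERO_REGISTERS:
            continue  # writes to the zero register are discarded
        if any(s in tainted and s not in ZERO_REGISTERS for s in srcs):
            tainted.add(dst)
        else:
            tainted.discard(dst)  # overwritten with an independent value
    return tainted


def has_address_dependency(load_dst: str, ops: list, addr_regs: list) -> bool:
    """True if a later access whose address uses `addr_regs` depends on the load."""
    return bool(dependent_registers(load_dst, ops) & set(addr_regs))


# LDR X1, [X0]; EOR X2, X1, X1; LDR X3, [X4, X2]
print(has_address_dependency("X1", [("X2", ["X1", "X1"])], ["X4", "X2"]))  # True
```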
Control dependency
A Control dependency from a read R1 to a subsequent instruction I2 exists if and only if either:
•
There is a Register dependency from the data value returned by R1 to the data value used in
the evaluation of a conditional branch, and I2 is only executed as a result of one of the
possible outcomes of that conditional branch.
•
There is a Register dependency from the data value returned by R1 to the data value used in
the determination of a synchronous exception on an instruction I3, and I2 appears in program
order after I3.
Ordering and observability at a Location
Memory effects on a Location are related by the following relations:
Reads-from
The Reads-from relation couples reads and writes to the same Location such that each read is
paired with a single write in the program. A read R2 of a Location Reads-from a write W1 to the
same Location if and only if R2 takes its data from W1.
Note
The Reads-from relation represents a read being satisfied by a write and then returning the written
data.
Coherence order
The Coherence order relation for each Location in the program provides a total order on all writes
from all coherent Observers to that Location, starting with a notional write of the initial value.
Note
The Coherence order of a Location represents the order in which writes to the Location arrive at
memory.
Coherence-after
A write W2 to a Location is Coherence-after another write W1 to the same Location if and only if
W2 is sequenced after W1 in the Coherence order of the Location.
A write W2 to a Location is Coherence-after a read R1 of the same location if and only if R1
Reads-from a write W3 to the same Location and W2 is Coherence-after W3.
Overlapping accesses
Two Memory effects overlap if and only if they access the same Location. Two instructions overlap
if and only if one or more of their generated Memory effects overlap.
Observed-by
A read or a write RW1 from an Observer is Observed-by a write W2 from a different Observer if and
only if W2 is Coherence-after RW1.
A write W1 from an Observer is Observed-by a read R2 from a different Observer if and only if R2
Reads-from W1.
Note
The Observed-by relation only relates accesses generated by different Observers.
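Coherence-after and Observed-by can both be derived from the Reads-from relation and the per-Location Coherence order. The following Python sketch illustrates this under stated assumptions. Effects are named by hypothetical string identifiers. Each pair (a, b) in `coherence_after` means that b is Coherence-after a, and each pair (a, b) in `observed_by` means that a is Observed-by b. The notional initial write is given its own pseudo-Observer.

```python
def coherence_after(co: dict, rf: dict) -> set:
    """co maps each Location to its writes in Coherence order (initial write first);
    rf maps each read to the write it Reads-from."""
    pairs = {(w1, w2) for writes in co.values()
             for i, w1 in enumerate(writes) for w2 in writes[i + 1:]}
    # A write is Coherence-after a read if it is Coherence-after the write that
    # the read Reads-from.
    pairs |= {(r, w2) for r, w3 in rf.items() for (w1, w2) in pairs if w1 == w3}
    return pairs


def observed_by(co: dict, rf: dict, observer: dict) -> set:
    """Pairs (a, b) such that a is Observed-by b; only different Observers count."""
    pairs = {(a, b) for (a, b) in coherence_after(co, rf) if observer[a] != observer[b]}
    pairs |= {(w, r) for r, w in rf.items() if observer[w] != observer[r]}
    return pairs


# P0 writes x (Wx); P1 reads x (Rx) but gets the initial value x0.
co = {"x": ["x0", "Wx"]}
rf = {"Rx": "x0"}
observer = {"x0": "init", "Wx": "P0", "Rx": "P1"}
print(("Rx", "Wx") in observed_by(co, rf, observer))  # True: Wx is Coherence-after Rx
```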
DMB FULL
A DMB FULL is a DMB with neither the LD nor the ST qualifier.
Where this section refers to DMB without any qualification, then it is referring to all types of DMB.
Unless a specific shareability domain is defined, a DMB applies to the Common Shareability Domain.
All properties that apply to DMB also apply to the corresponding DSB.
Ordering relations
In addition to the ordering relations for a single Location, the ARMv8 memory model also provides ordering
relations to describe the ordering of Memory effects to multiple Locations. These are as follows:
Dependency-ordered-before
A dependency creates externally-visible order between a read and another Memory effect generated
by the same Observer. A read R1 is Dependency-ordered-before a read or write RW2 from the same
Observer if and only if R1 appears in program order before RW2 and any of the following cases
apply:
•
There is an Address dependency or a Data dependency from R1 to RW2.
•
RW2 is a write W2 and there is a Control dependency from R1 to W2.
•
RW2 is a read R2 generated by an instruction appearing in program order after an instruction
I3 that generates a Context synchronization event, and there is a Control dependency from R1
to I3.
•
RW2 is a write W2 appearing in program order after a read or a write RW3 and there is an
Address dependency from R1 to RW3.
•
RW2 is a write W2 that is Coherence-after a write W3 and there is a Control dependency or a
Data dependency from R1 to W3.
•
RW2 is a read R2 that Reads-from a write W3 and there is an Address dependency or a Data
dependency from R1 to W3.
Atomic-ordered-before
Load-Exclusive and Store-Exclusive instructions provide some ordering guarantees, even in the
absence of dependencies. A read or a write RW1 is Atomic-ordered-before a read or a write RW2
from the same Observer if and only if RW1 appears in program order before RW2 and either of the
following cases apply:
•
RW1 is a read R1 and RW2 is a write W2 such that R1 and W2 are generated by an atomic
instruction or a successful Load-Exclusive/Store-Exclusive instruction pair to the same
Location.
•
RW1 is a write W1 generated by an atomic instruction or a successful Store-Exclusive
instruction and RW2 is a read R2 generated by an instruction with Acquire or AcquirePC
semantics such that R2 Reads-from W1.
For more information, see Synchronization and semaphores on page B2-135.
Barrier-ordered-before
Barrier instructions order prior Memory effects before subsequent Memory effects generated by the
same Observer. A read or a write RW1 is Barrier-ordered-before a read or a write RW2 from the
same Observer if and only if RW1 appears in program order before RW2 and any of the following
cases apply:
•
RW1 appears in program order before a DMB FULL or an atomic instruction with both Acquire
and Release semantics that appears in program order before RW2.
•
RW1 is a write W1 generated by an instruction with Release semantics and RW2 is a read R2
generated by an instruction with Acquire semantics.
•
RW1 is a read R1 and either:
—
R1 appears in program order before a DMB LD that appears in program order before RW2.
—
R1 is generated by an instruction with Acquire or AcquirePC semantics.
•
RW2 is a write W2 and either:
—
RW1 is a write W1 appearing in program order before a DMB ST that appears in program
order before W2.
—
W2 is generated by an instruction with Release semantics.
—
RW1 appears in program order before a write W3 generated by an instruction with
Release semantics and W2 is Coherence-after W3.
Ordered-before
An arbitrary pair of Memory effects is ordered if it can be linked by a chain of ordered accesses
consistent with external observation. A read or a write RW1 is Ordered-before a read or a write RW2
if and only if any of the following cases apply:
•
RW1 is Observed-by RW2.
•
RW1 is Dependency-ordered-before RW2.
•
RW1 is Atomic-ordered-before RW2.
•
RW1 is Barrier-ordered-before RW2.
•
RW1 is Ordered-before a read or a write that is Ordered-before RW2.
B2.3.3 Ordering constraints
The ARMv8 memory model is described as being Other-multi-copy atomic. The definition of Other-multi-copy
atomic is as follows:
Other-multi-copy atomic
In an Other-multi-copy atomic system, it is required that a write from an Observer, if observed by a
different Observer, is then observed by all other Observers that access the Location coherently. It is,
however, permitted for an Observer to observe its own writes prior to making them visible to other
observers in the system.
The Other-multi-copy atomic property of the ARMv8 memory model is enforced by placing constraints on the
possible executions of a program. Those executions that meet the constraints given by the ordering model are said
to be architecturally well-formed. An implementation that is executing a program is only permitted to exhibit
behavior consistent with an architecturally well-formed execution:
Architecturally well-formed
An architecturally well-formed execution must satisfy both of the following requirements:
Internal visibility requirement
For a read or a write RW1 that appears in program order before a read or a write RW2 to
the same Location, the internal visibility requirement requires that exactly one of the
following statements is true:
•
RW2 is a write W2 that is Coherence-after RW1.
•
RW1 is a write W1 and RW2 is a read R2 such that either:
—
R2 Reads-from W1.
—
R2 Reads-from another write that is Coherence-after W1.
•
RW1 and RW2 are both reads R1 and R2 such that R1 Reads-from a write W3 and
either:
—
R2 Reads-from W3.
—
R2 Reads-from another write that is Coherence-after W3.
Note
If a Memory effect M1 from an Observer appears in program order before a Memory
effect M2 from the same Observer, then M1 will be seen to occur before M2 by that
Observer.
External visibility requirement
For a read or a write RW1 from an Observer that is Ordered-before a read or a write RW2
from a different Observer, the external visibility requirement requires that RW2 is not
Observed-by RW1. This means that an Architecturally well-formed execution must not
exhibit a cycle in the Ordered-before relation.
Note
If a Memory effect M1 from an Observer is Ordered-before another Memory effect M2,
from a different Observer, then M1 will be seen to occur before M2 by all Observers in
the system.
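The external visibility requirement can be checked on a candidate execution by building the Ordered-before edges and searching for a cycle. Because Ordered-before is the transitive closure of these edges, Ordered-before has a cycle exactly when the edge graph does. The Python sketch below applies this check to the message-passing pattern. P0 writes x (Wx), executes DMB ST, then writes y (Wy). P1 reads y (Ry) and then reads x (Rx). The candidate outcome, in which Ry sees Wy but Rx sees the initial value of x, is checked both with and without an ordering edge from Ry to Rx. An Address dependency, a DMB LD, or Acquire semantics on Ry would each provide that edge. The edge lists are written out by hand for this example, and the function name is hypothetical.

```python
def has_cycle(edges: list) -> bool:
    """Depth-first search for a cycle in a directed graph given as (from, to) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    state = {}  # node -> "visiting" or "done"

    def visit(node) -> bool:
        state[node] = "visiting"
        for nxt in graph.get(node, []):
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


# Outcome under test: Ry Reads-from Wy, but Rx Reads-from the initial value of x.
base = [
    ("Wx", "Wy"),  # Barrier-ordered-before: DMB ST between the two writes on P0
    ("Wy", "Ry"),  # Observed-by: Ry Reads-from Wy
    ("Rx", "Wx"),  # Observed-by: Wx is Coherence-after the write that Rx Reads-from
]
print(has_cycle(base))                   # False: outcome allowed without P1 ordering
print(has_cycle(base + [("Ry", "Rx")]))  # True: outcome forbidden once Ry is ordered before Rx
```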
B2.3.4 Completion and endpoint ordering
Interaction between Observers in a system is not restricted to communication via shared variables in coherent
memory. For example, an Observer could configure an interrupt controller to raise an interrupt on another Observer
as a form of message passing. These interactions typically involve an additional agent, which defines the instruction
sequence that is required to establish communication links between different Observers. When these forms of
interaction are used in conjunction with shared variables, a DSB instruction can be used to enforce ordering between
them.
For all memory, the completion rules are defined as:
•
A read R1 to a Location is complete for a shareability domain when all of the following are true:
—
Any write to the same Location by an Observer within the shareability domain will be Coherence-after
R1.
—
Any translation table walks associated with R1 are complete for that shareability domain.
•
A write W1 to a Location is complete for a shareability domain when all of the following are true:
—
Any write to the same Location by an Observer within the shareability domain will be Coherence-after
W1.
—
Any read to the same Location by an Observer within the shareability domain will either Reads-from
W1 or Reads-from a write that is Coherence-after W1.
—
Any translation table walks associated with the write are complete for that shareability domain.
•
A translation table walk is complete for a shareability domain when the memory accesses, including the
updates to translation table entries, associated with the translation table walk are complete for that
shareability domain, and the TLB is updated.
•
A cache maintenance instruction is complete for a shareability domain when the memory effects of the
instruction are complete for that shareability domain, and any translation table walks that arise from the
instruction are complete for that shareability domain.
•
A TLB invalidate instruction is complete when all memory accesses using the TLB entries that have been
invalidated are complete.
The completion of any cache or TLB maintenance instruction includes its completion on all PEs that are affected
by both the instruction and the DSB operation that is required to guarantee visibility of the maintenance instruction.
Note
These completion rules mean that, for example, a cache maintenance instruction that operates by VA to the PoC
completes only after memory at the PoC has been updated.
Additionally, for Device-nGnRnE memory, a read or write of a Location in a Memory-mapped peripheral that
exhibits side-effects is complete only when the read or write both:
•
Can begin to affect the state of the Memory-mapped peripheral.
•
Can trigger all associated side-effects, whether they affect other peripheral devices, PEs, or memory.
Note
This requirement for Device-nGnRnE memory is consistent with the memory access having reached the peripheral
endpoint.
Peripherals
This section defines a Memory-mapped peripheral and the total order of reads and writes to a peripheral which is
defined as the Peripheral coherence order:
Memory-mapped peripheral
A Memory-mapped peripheral occupies a memory region of IMPLEMENTATION DEFINED size and
can be accessed using load and store instructions. Memory effects to a Memory-mapped peripheral
can have side-effects, such as causing the peripheral to perform an action. Values that are read from
addresses within a Memory-mapped peripheral might not correspond to the last data value written
to those addresses. As such, Memory effects to a Memory-mapped peripheral might not appear in
the Reads-from or Coherence order relations.
Peripheral coherence order
The Peripheral coherence order of a Memory-mapped peripheral is a total order on all reads and
writes to that peripheral.
Note
The Peripheral coherence order for a Memory-mapped peripheral signifies the order in which
accesses arrive at the endpoint.
For a read or a write RW1 and a read or a write RW2 to the same peripheral, then RW1 will appear
in the Peripheral coherence order for the peripheral before RW2 if either of the following cases
apply:
•
RW1 and RW2 are accesses using Non-cacheable or Device attributes and RW1 is
Ordered-before RW2.
•
RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE attributes and RW1
appears in program order before RW2.
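The two cases that place accesses to the same peripheral in Peripheral coherence order can be expressed as a predicate. In this Python sketch, the attribute strings are hypothetical labels, with "Normal-NC" standing for Normal Non-cacheable. The caller supplies the Ordered-before and program-order facts. A False result means only that these two rules do not guarantee the order.

```python
DEVICE_TYPES = {"Device-nGnRnE", "Device-nGnRE", "Device-nGRE", "Device-GRE"}
NC_OR_DEVICE = DEVICE_TYPES | {"Normal-NC"}
NO_REORDER_DEVICE = {"Device-nGnRnE", "Device-nGnRE"}


def peripheral_order_guaranteed(attr1: str, attr2: str,
                                rw1_ordered_before_rw2: bool,
                                rw1_program_order_before_rw2: bool) -> bool:
    """True if RW1 must appear before RW2 in the Peripheral coherence order."""
    if attr1 in NC_OR_DEVICE and attr2 in NC_OR_DEVICE and rw1_ordered_before_rw2:
        return True
    return (attr1 in NO_REORDER_DEVICE and attr2 in NO_REORDER_DEVICE
            and rw1_program_order_before_rw2)
```

For example, two program-ordered accesses to one peripheral that are mapped as Device-nGnRE reach the endpoint in program order without a barrier. With Device-GRE, the accesses also need to be Ordered-before, for example by a DMB between them.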
Out-of-band-ordered-before
A read or a write RW1 is Out-of-band-ordered-before a read or a write RW2 if and only if either of
the following cases apply:
•
RW1 appears in program order before a DSB instruction that begins an IMPLEMENTATION
DEFINED instruction sequence indirectly leading to the generation of RW2.
•
RW1 is Ordered-before a read or a write RW3 and RW3 is Out-of-band-ordered-before RW2.
If a Memory effect M1 is Out-of-band-ordered-before a read or a write M2, then M1 is seen to occur
before M2 by all Observers.
B2.3.5 Memory barriers
Memory barrier is the general term applied to an instruction, or sequence of instructions, that forces synchronization
events by a PE with respect to retiring Load/Store instructions. The memory barriers defined by the Armv8
architecture provide a range of functionality, including:
•
Ordering of Load/Store instructions.
•
Completion of Load/Store instructions.
•
Context synchronization.
The following subsections describe the ARMv8 memory barrier instructions:
•
Instruction Synchronization Barrier (ISB) on page B2-104.
•
Data Memory Barrier (DMB) on page B2-104.
•
Data Synchronization Barrier (DSB) on page B2-106.
•
Consumption of Speculative Data Barrier (CSDB) on page B2-105.
•
Speculative Store Bypass Barrier (SSBB) on page B2-105.
•
Physical Speculative Store Bypass Barrier (PSSBB) on page B2-105.
•
Trace Synchronization Barrier (TSB CSYNC) on page B2-106.
•
Shareability and access limitations on the data barrier operations on page B2-107.
•
Load-Acquire, Load-AcquirePC, and Store-Release on page B2-108.
•
LoadLOAcquire, StoreLORelease on page B2-109.
Note
Depending on the required synchronization, a program might use memory barriers on their own, or it might use them
in conjunction with cache maintenance and memory management instructions that in general are only available
when software execution is at EL1 or higher.
DMB and DSB instructions affect reads and writes to the memory system generated by Load/Store instructions and data
or unified cache maintenance instructions being executed by the PE. Instruction fetches or accesses caused by a
hardware translation table access are not explicit accesses.
Instruction Synchronization Barrier (ISB)
An ISB instruction ensures that all instructions that come after the ISB instruction in program order are fetched from
the cache or memory after the ISB instruction has completed. Using an ISB ensures that the effects of
context-changing operations executed before the ISB are visible to the instructions fetched after the ISB instruction.
Examples of context-changing operations that require the insertion of an ISB instruction to ensure the effects of the
operation are visible to instructions fetched after the ISB instruction are:
•
Completed cache and TLB maintenance instructions.
•
Changes to System registers.
Any context-changing operations appearing in program order after the ISB instruction only take effect after the ISB
has been executed.
The pseudocode function for the operation of an ISB is InstructionSynchronizationBarrier().
See also Memory barriers on page D4-2382.
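The following is a minimal sketch of why an ISB is needed after a context-changing operation. It is a hypothetical toy model, not the InstructionSynchronizationBarrier() pseudocode. It assumes that instructions already fetched ahead keep the context they were fetched under. That "stale view" stands in for "not guaranteed to see the change". The class ToyPE, its methods, and the register name CTRL are illustrative inventions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPE:
    """Hypothetical toy PE; not the architecture's pseudocode."""

    sysregs: dict = field(default_factory=dict)
    # Context under which the instructions now in the pipeline were fetched.
    fetched_context: dict = field(default_factory=dict)

    def fetch_ahead(self) -> None:
        # Later instructions are fetched early and capture the current context.
        self.fetched_context = dict(self.sysregs)

    def write_sysreg(self, name: str, value: int) -> None:
        # A context-changing operation updates state, but already-fetched
        # instructions are not guaranteed to see it: the model keeps their old view.
        self.sysregs[name] = value

    def isb(self) -> None:
        # ISB: instructions after it are refetched, so they see every
        # context-changing operation executed before it.
        self.fetched_context = dict(self.sysregs)

    def seen_by_later_instructions(self, name: str):
        return self.fetched_context.get(name)


pe = ToyPE(sysregs={"CTRL": 0})  # CTRL is a hypothetical register name
pe.fetch_ahead()
pe.write_sysreg("CTRL", 1)
print(pe.seen_by_later_instructions("CTRL"))  # 0: no ISB yet
pe.isb()
print(pe.seen_by_later_instructions("CTRL"))  # 1
```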
Data Memory Barrier (DMB)
The DMB instruction is a memory barrier instruction that ensures the relative order of memory accesses before the
barrier with memory accesses after the barrier. The DMB instruction does not ensure the completion of any of the
memory accesses for which it ensures relative order.
The full definition of the DMB is covered formally in the Definition of the ARMv8 memory model on page B2-97
and this introduction to the DMB instruction is not intended to contradict that section.
The basic principle of a DMB instruction is to introduce order between memory accesses that are specified to be
affected by the DMB options supplied as arguments to the DMB instruction. The DMB instruction ensures that the
following affected memory accesses are Observed-by each PE, to the extent required by the DMB options, before any
affected memory accesses that appear in program order after the DMB are Observed-by that PE:
•
All affected memory accesses by the PE executing the DMB that appear in program order before the DMB.
•
Affected memory accesses that originate from a different PE, to the extent required by the DMB options, and that
have been Observed-by the PE executing the DMB before the DMB is executed.
The use of a DMB creates order between the Memory effects of instructions as described in the definition of
Barrier-ordered-before.
DMB only affects memory accesses and the operation of data cache and unified cache maintenance instructions, see
A64 Cache maintenance instructions on page D4-2364. It has no effect on the ordering of any other instructions
executing on the PE. A DMB instruction intended to ensure the completion of cache maintenance instructions must
have an access type of both loads and stores.
The pseudocode function for the operation of a DMB is DataMemoryBarrier().
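The following sketch illustrates the program-order part of the rule for a full DMB on a single PE. Every memory access before the DMB is ordered with every memory access after it, and non-memory instructions are not affected. It deliberately ignores accesses from other PEs and DMB options. The string encoding of instructions and the function name dmb_ordered_pairs are hypothetical.

```python
from itertools import product

MEMORY_OPS = ("R", "W")  # loads and stores; other mnemonics are non-memory instructions


def dmb_ordered_pairs(program):
    """Return index pairs (i, j) of memory accesses ordered by a full DMB in `program`.

    Simplified single-PE sketch: each entry is "R <loc>", "W <loc>", "DMB", or any
    other instruction, which the DMB does not order.
    """
    pairs = set()
    for k, op in enumerate(program):
        if op != "DMB":
            continue
        before = [i for i in range(k) if program[i].split()[0] in MEMORY_OPS]
        after = [j for j in range(k + 1, len(program)) if program[j].split()[0] in MEMORY_OPS]
        pairs.update(product(before, after))
    return pairs


program = ["W x", "ADD", "DMB", "R y", "W z"]
print(sorted(dmb_ordered_pairs(program)))  # [(0, 3), (0, 4)]
```

The pair (3, 4) is absent because both accesses are after the DMB, so the DMB imposes no order between them.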
Consumption of Speculative Data Barrier (CSDB)
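The CSDB instruction is a memory barrier instruction that controls speculative execution and data value prediction.
No instruction, other than a branch instruction, that appears in program order after the CSDB can be speculatively
executed using the results of any of the following: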
•
Data value predictions of any instructions.
•
PSTATE.{N,Z,C,V} predictions of any instructions other than conditional branch instructions appearing in
program order before the CSDB that have not been architecturally resolved.
•
Predictions of SVE predication state for any SVE instructions.
For the purposes of the definition of CSDB, PSTATE.{N,Z,C,V} is not considered a data value. This definition permits:
•
Control flow speculation before and after the CSDB.
•
Speculative execution of conditional data processing instructions after the CSDB, unless they use the results
of data value or PSTATE.{N,Z,C,V} predictions of instructions appearing in program order before the CSDB
that have not been architecturally resolved.
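The following sketch encodes the permissions listed above as a predicate. A branch after the CSDB may still be speculated. A non-branch instruction after the CSDB may not consume data value or PSTATE.{N,Z,C,V} predictions from unresolved instructions before the CSDB. This is a simplified model under stated assumptions: it omits SVE predication predictions, and the class Prediction and its "data"/"nzcv" labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """A prediction consumed by an instruction after the CSDB (hypothetical model)."""

    kind: str  # "data" (data value prediction) or "nzcv" (PSTATE.{N,Z,C,V} prediction)
    producer_before_csdb: bool  # producing instruction is before the CSDB in program order
    producer_resolved: bool  # producing instruction has been architecturally resolved


def csdb_blocks(p: Prediction) -> bool:
    # Only unresolved predictions of instructions before the CSDB are restricted.
    return p.kind in ("data", "nzcv") and p.producer_before_csdb and not p.producer_resolved


def may_speculate_after_csdb(is_branch: bool, uses) -> bool:
    if is_branch:
        return True  # control flow speculation is permitted before and after the CSDB
    return not any(csdb_blocks(p) for p in uses)


# A conditional select after the CSDB consuming predicted flags from an unresolved compare.
cmp_flags = Prediction("nzcv", producer_before_csdb=True, producer_resolved=False)
print(may_speculate_after_csdb(False, [cmp_flags]))  # False
print(may_speculate_after_csdb(True, [cmp_flags]))  # True
```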
Speculative Store Bypass Barrier (SSBB)
The SSBB is a memory barrier that prevents speculative loads from bypassing earlier stores to the same virtual
address under certain conditions.
The semantics of the Speculative Store Bypass Barrier are:
•
When a load to a location appears in program order after the SSBB, then the load does not speculatively read
an entry earlier in the coherence order for that location than the entry generated by the latest store satisfying
all of the following conditions:
—
The store is to the same location as the load.
—
The store uses the same virtual address as the load.
—
The store appears in program order before the SSBB.
•
When a load to a location appears in program order before the SSBB, then the load does not speculatively read
data from any store satisfying all of the following conditions:
—
The store is to the same location as the load.
—
The store uses the same virtual address as the load.
—
The store appears in program order after the SSBB.
Physical Speculative Store Bypass Barrier (PSSBB)
The PSSBB is a memory barrier that prevents speculative loads from bypassing earlier stores to the same physical
address under certain conditions.
The semantics of the Physical Speculative Store Bypass Barrier are:
•
When a load to a location appears in program order after the PSSBB, then the load does not speculatively read
an entry earlier in the coherence order for that location than the entry generated by the latest store satisfying
all of the following conditions:
—
The store is to the same location as the load.
—
The store appears in program order before the PSSBB.
•
When a load to a location appears in program order before the PSSBB, then the load does not speculatively read
data from any store satisfying all of the following conditions:
—
The store is to the same location as the load.
—
The store appears in program order after the PSSBB.
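The following sketch covers both the SSBB and PSSBB rules. It computes which coherence-order entries of a location the barrier rules do not forbid a load to read speculatively.

* For a load after the barrier, entries older than the latest covered store before the barrier are excluded.
* For a load before the barrier, covered stores after the barrier are excluded.

An SSBB covers only stores that use the load's virtual address. A PSSBB covers any store to the same location. The model makes two assumptions. First, there is a single PE, so the location's coherence order is its initial value followed by the same-location stores in program order. Second, the physical address identifies the location. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Store:
    pa: int  # physical address, standing in for "the location"
    va: int  # virtual address used by the store
    value: int


@dataclass(frozen=True)
class Load:
    pa: int
    va: int


@dataclass(frozen=True)
class Barrier:
    physical: bool  # False: SSBB (same-VA stores only); True: PSSBB (any VA)


Event = Union[Store, Load, Barrier]
INIT = "init"  # the location's initial entry in coherence order


def speculative_sources(program: List[Event], load_idx: int) -> list:
    """Coherence-order entries that the barrier rules do NOT forbid the load to read speculatively."""
    load = program[load_idx]
    # Assumption: single PE, so coherence order is the initial value followed by
    # the same-location stores in program order.
    order = [INIT] + [i for i, e in enumerate(program) if isinstance(e, Store) and e.pa == load.pa]
    allowed = set(order)
    for b, barrier in enumerate(program):
        if not isinstance(barrier, Barrier):
            continue
        covered = [i for i in order[1:] if barrier.physical or program[i].va == load.va]
        if b < load_idx:
            # Load after the barrier: nothing older than the latest covered store before it.
            earlier = [i for i in covered if i < b]
            if earlier:
                allowed -= set(order[: order.index(earlier[-1])])
        else:
            # Load before the barrier: no covered store that appears after the barrier.
            allowed -= {i for i in covered if i > b}
    return [e for e in order if e in allowed]


aliased = [Store(pa=0x80, va=0x1000, value=1), Barrier(physical=False), Load(pa=0x80, va=0x2000)]
print(speculative_sources(aliased, 2))  # ['init', 0]: SSBB does not cover a different VA
aliased[1] = Barrier(physical=True)
print(speculative_sources(aliased, 2))  # [0]: PSSBB covers the location through any VA
```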
Trace Synchronization Barrier (TSB CSYNC)
The TSB CSYNC is a memory barrier instruction that preserves the relative order of memory accesses to System
registers due to trace operations and other memory accesses to the same registers.
A trace operation is an operation of the PE Trace Unit generating trace for an instruction when ARMv8.4-Trace is
implemented and enabled.
A TSB CSYNC is not required to execute in program order with respect to other instructions, and it can be
reordered with respect to other trace instructions. One or more context synchronization events are required to
ensure that the TSB CSYNC is executed in the necessary order.
If trace is generated between a context synchronization event and a TSB CSYNC operation, these trace operations
might be reordered with respect to the TSB CSYNC operation, and therefore might not be synchronized.
The following situations are synchronized using a TSB CSYNC:
•
A direct write B to a System register is ordered after an indirect read or indirect write of the same register by
a trace operation A, if all of the following are true:
—
A is executed in program order before a context synchronization event C.
—
C is in program order before a TSB CSYNC operation T.
—
B is executed in program order after T.
•
A direct read B of a System register is ordered after an indirect write to the same register by a trace operation
A, if all of the following are true:
—
A is executed in program order before a context synchronization event C1.
—
C1 is in program order before TSB CSYNC operation T.
—
T is executed in program order before a second context synchronization event C2.
—
B is executed in program order after C2.
A TSB CSYNC operation is not needed to ensure a direct write B to a System register is ordered before an indirect read
or indirect write of the same register by a trace operation A, if all the following are true:
•
A is executed in program order after a context synchronization event C.
•
B is executed in program order before C.
The pseudocode function for the operation of a TSB CSYNC is TraceSynchronizationBarrier().
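The following sketch checks the three TSB CSYNC situations above over a program-order listing.

* A direct write needs A < C < T < B.
* A direct read needs A < C1 < T < C2 < B.
* A direct write before trace needs B < C < A, and no TSB CSYNC.

Choosing the earliest matching event at each step is enough to decide whether a valid sequence exists. The sketch rests on one assumption: ISB is the only context synchronization event modelled. The helper names and the string listing are hypothetical.

```python
CSE = "ISB"  # assumption: the only context synchronization event modelled
TSB = "TSB CSYNC"


def _find(program, op, start, stop):
    """Index of the first `op` in program[start:stop], or None."""
    for i in range(start, stop):
        if program[i] == op:
            return i
    return None


def direct_write_after_trace(program, a, b):
    """Is direct write B (index b) ordered after trace operation A (index a)? Needs A < C < T < B."""
    c = _find(program, CSE, a + 1, b)
    return c is not None and _find(program, TSB, c + 1, b) is not None


def direct_read_after_trace_write(program, a, b):
    """Is direct read B ordered after trace write A? Needs A < C1 < T < C2 < B."""
    c1 = _find(program, CSE, a + 1, b)
    if c1 is None:
        return False
    t = _find(program, TSB, c1 + 1, b)
    return t is not None and _find(program, CSE, t + 1, b) is not None


def write_before_trace_without_tsb(program, b, a):
    """Direct write B is ordered before trace operation A, with no TSB CSYNC, if B < C < A."""
    return _find(program, CSE, b + 1, a) is not None


prog = ["trace op A", "ISB", "TSB CSYNC", "direct access B"]
print(direct_write_after_trace(prog, 0, 3))  # True
print(direct_read_after_trace_write(prog, 0, 3))  # False: no second ISB before B
```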
Data Synchronization Barrier (DSB)
A DSB is a memory barrier that ensures that memory accesses that occur before the DSB have completed before
the completion of the DSB instruction. In doing this, it acts as a stronger barrier than a DMB: all ordering that
is created by a DMB with specific options is also generated by a DSB with the same options.
Execution of a DSB:
•
At EL2 ensures that any memory accesses caused by speculative translation table walks from the EL1&0
translation regime have been observed.
•
At EL3 ensures that any memory accesses caused by speculative translation table walks from the EL2 or
EL2&0 translation regime have been observed.
For more information, see Use of out-of-context translation regimes on page D5-2406.
A DSB executed by a PE, PEe, completes when all of the following apply:
•
All explicit memory accesses of the required access types appearing in program order before the DSB are
complete for the set of observers in the required shareability domain.
•
If the required access types of the DSB are reads and writes, then all cache maintenance instructions and all TLB
maintenance instructions issued by PEe before the DSB are complete for the required shareability domain.
In addition, no instruction that appears in program order after the DSB instruction can alter any state of the system
or perform any part of its functionality until the DSB completes other than:
•
Being fetched from memory and decoded.
•
Reading the general-purpose, SIMD and floating-point, Special-purpose, or System registers that are directly
or indirectly read without causing side-effects.
The pseudocode function for the operation of a DSB is DataSynchronizationBarrier().
See also Memory barriers on page D4-2382.
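The following sketch encodes the two DSB completion conditions as a check over the operations that PEe issued before the DSB.

* Explicit reads and writes of the required access types must be complete for the required shareability domain.
* Cache and TLB maintenance count only when the DSB covers reads and writes.

The sketch assumes that each operation records the largest domain it is complete for, and that the domains nest (Non-shareable inside Inner Shareable inside Outer Shareable inside Full system). It does not model the EL2 and EL3 translation table walk guarantees. PendingOp and dsb_completes are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Shareability domains from smallest to largest; each contains the previous one.
DOMAIN_RANK = {"Non-shareable": 0, "Inner Shareable": 1, "Outer Shareable": 2, "Full system": 3}


@dataclass(frozen=True)
class PendingOp:
    kind: str  # "read", "write", "cache maintenance", or "TLB maintenance"
    complete_for: Optional[str]  # largest domain it is complete for, or None


def dsb_completes(ops_before, access_types: str, domain: str) -> bool:
    """Hypothetical check of the DSB completion conditions.

    access_types: "reads", "writes", or "reads and writes" (the accesses before the
    DSB that its option covers); domain: the required shareability domain.
    """
    need = DOMAIN_RANK[domain]
    for op in ops_before:
        if op.kind == "read":
            relevant = access_types in ("reads", "reads and writes")
        elif op.kind == "write":
            relevant = access_types in ("writes", "reads and writes")
        else:
            # Cache and TLB maintenance only count when the DSB covers reads and writes.
            relevant = access_types == "reads and writes"
        if relevant and (op.complete_for is None or DOMAIN_RANK[op.complete_for] < need):
            return False
    return True


ops = [PendingOp("write", "Inner Shareable"), PendingOp("TLB maintenance", None)]
print(dsb_completes(ops, "writes", "Inner Shareable"))  # True
print(dsb_completes(ops, "reads and writes", "Inner Shareable"))  # False
```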
Shareability and access limitations on the data barrier operations
The DMB and DSB instructions take an argument that specifies:
•
The shareability domain over which the instruction must operate. This is one of:
—
Full system.
—
Outer Shareable.
—
Inner Shareable.
—
Non-shareable.
•
The accesses for which the instruction operates. This is one of:
—
Read and write accesses, both before and after the barrier instruction.
—
Write accesses only, before and after the barrier instruction.
—
Read accesses before the barrier instruction, and read and write accesses after the barrier instruction.
Note
This form of a DMB or DSB instruction can be described as a Load-Load/Store barrier.
For more information on whether an access is before or after a barrier instruction, see Data Memory Barrier (DMB)
on page B2-104 or Data Synchronization Barrier (DSB) on page B2-106.
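The following sketch decodes a barrier option into the shareability domain and access types listed above. It also checks whether an option orders an earlier read or write with a later one. It assumes the A64 layout of the 4-bit CRm option field, with domain in CRm[3:2] and access types in CRm[1:0], together with the usual assembler mnemonics such as ISH, ISHST, and SY. Treat this layout as our understanding to be checked against Table B2-1, not a reproduction of it. The function names are hypothetical.

```python
# Assumed CRm layout: bits [3:2] select the domain, bits [1:0] the access types.
DOMAIN = {0b00: ("Outer Shareable", "OSH"), 0b01: ("Non-shareable", "NSH"),
          0b10: ("Inner Shareable", "ISH"), 0b11: ("Full system", "")}
ACCESS = {0b01: ("reads before; reads and writes after", "LD"),
          0b10: ("writes before and after", "ST"),
          0b11: ("reads and writes before and after", "")}


def decode_option(crm: int):
    """Split a 4-bit CRm value into (domain, accesses, mnemonic)."""
    dom, acc = (crm >> 2) & 0b11, crm & 0b11
    if acc not in ACCESS:
        raise ValueError(f"CRm={crm:#06b} is not one of the DMB/DSB options listed")
    domain, prefix = DOMAIN[dom]
    accesses, suffix = ACCESS[acc]
    return domain, accesses, (prefix + suffix) or "SY"


def option_orders(crm: int, earlier: str, later: str) -> bool:
    """Does the option order an `earlier` access ("R"/"W") before the barrier with a `later` one after it?"""
    acc = crm & 0b11
    if acc == 0b11:
        return True
    if acc == 0b10:
        return earlier == "W" and later == "W"
    if acc == 0b01:
        return earlier == "R"  # Load-Load/Store barrier
    raise ValueError("not a DMB/DSB access option")


print(decode_option(0b1010))  # ('Inner Shareable', 'writes before and after', 'ISHST')
print(option_orders(0b1001, "R", "W"))  # True
print(option_orders(0b1010, "W", "R"))  # False
```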
Table B2-1 shows how these options are encoded in the